Manage international considerations

May include but is not limited to: collations, defining custom errors, filtering data, sort order, nvarchar, database collation, column collation

SQL Server supports a wide range of cultures and languages, and that has some interesting consequences. For a second, try to imagine my situation. As a native Hungarian, I constantly have to battle with characters like ‘ő’ or ‘í’. Let’s say I send you a table with the names of my customers, who happen to be Hungarians, too. How would you sort this table, with names like Álmos or Éva?

Every language has its own sorting rules, and SQL Server implements those rules through collations. In Hungarian, á is a letter in its own right and comes immediately after a, not after z. In the Latin collations, á is treated as an accented variant of a. So the same list of names can sort differently depending on the collation! Even worse, what if we’re filtering? Under an accent-insensitive Latin collation, ‘á’ = ‘a’ returns true, but to a Hungarian they are certainly not the same letter.
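To make the difference concrete, here is a small sketch you can run in any database. It assumes the Latin1_General and Hungarian Windows collations are available on your instance, and the sample names are made up for illustration.

```sql
-- The same comparison under an accent-insensitive (AI)
-- and an accent-sensitive (AS) Latin collation.
SELECT
    CASE WHEN N'á' = N'a' COLLATE Latin1_General_CI_AI
         THEN 'equal' ELSE 'different' END AS AccentInsensitive,  -- 'equal'
    CASE WHEN N'á' = N'a' COLLATE Latin1_General_CI_AS
         THEN 'equal' ELSE 'different' END AS AccentSensitive;    -- 'different'

-- Sorting a few sample names with an explicit collation.
-- Swap Hungarian_CI_AS for another collation and compare the order.
SELECT Name
FROM (VALUES (N'Zoltán'), (N'Álmos'), (N'Éva'), (N'Adél'), (N'Erik')) AS Customers(Name)
ORDER BY Name COLLATE Hungarian_CI_AS;
```

The COLLATE clause on an expression only affects that one comparison or sort; it does not change how the data is stored.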

When you create a database, you can specify which collation to use with the COLLATE clause (if you don’t specify anything, SQL Server uses the server’s default collation). You can override this setting at the column level, using the COLLATE keyword followed by the collation name. For example:

CREATE TABLE GreekTable
( Name nvarchar(20) COLLATE Greek_CI_AS );

I’m sure you figured out which part of the collation name selects Greek. As for the other two parts: CI means case-insensitive (CS is the case-sensitive one), and AS means accent-sensitive (guess how they mark accent-insensitive).
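Here is a rough sketch of how the database default and a column-level override fit together. The database name CollationDemo and the table dbo.Customers are hypothetical, chosen only for this example; GO is the batch separator used by SSMS and sqlcmd.

```sql
-- Database-level default collation (CollationDemo is a made-up name).
CREATE DATABASE CollationDemo COLLATE Hungarian_CI_AS;
GO
USE CollationDemo;
GO
CREATE TABLE dbo.Customers
(
    HungarianName nvarchar(20),                     -- inherits the database default
    GreekName     nvarchar(20) COLLATE Greek_CI_AS  -- column-level override
);
GO
-- Check what the database and each column actually ended up with.
SELECT DATABASEPROPERTYEX(N'CollationDemo', 'Collation') AS DatabaseCollation;

SELECT name, collation_name
FROM sys.columns
WHERE object_id = OBJECT_ID(N'dbo.Customers');

-- List the Greek collations available on this instance.
SELECT name, description
FROM sys.fn_helpcollations()
WHERE name LIKE N'Greek%';
```

If you are not sure which collation names exist on your server, sys.fn_helpcollations() is the place to look before you type one into a COLLATE clause.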

The last topic in this post is the SQL string types. We’ve all encountered them: char, nchar, varchar, nvarchar, text, ntext… But what do they do? The good news is that text and ntext are deprecated, so avoid using them (varchar(max) and nvarchar(max) are the replacements). For the others, n means Unicode and var means variable length. When you use char and varchar (the ones without the n prefix), the collation also determines the code page of the string, so characters outside that code page cannot be stored faithfully. This is not the case with the Unicode types, which are stored as UTF-16 and can hold any character: most characters take two bytes, and supplementary characters take four. In both cases, the specified collation determines the filtering and sorting rules.
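A quick way to see the code page effect is to store the same Hungarian letter in all three kinds of column. This is only a sketch: the temp table #StringTypes and its column names are invented for the example, and what exactly ends up in the Latin column (a look-alike letter or a question mark) is something to check on your own instance.

```sql
CREATE TABLE #StringTypes
(
    AnsiLatin     varchar(10)  COLLATE Latin1_General_CI_AS,  -- code page 1252, has no 'ő'
    AnsiHungarian varchar(10)  COLLATE Hungarian_CI_AS,       -- code page 1250, has 'ő'
    UnicodeText   nvarchar(10)                                -- Unicode, code page does not matter
);

INSERT INTO #StringTypes (AnsiLatin, AnsiHungarian, UnicodeText)
VALUES (N'ő', N'ő', N'ő');

-- Compare the stored values and their size in bytes.
SELECT AnsiLatin,
       AnsiHungarian,
       UnicodeText,
       DATALENGTH(AnsiLatin)     AS AnsiLatinBytes,
       DATALENGTH(AnsiHungarian) AS AnsiHungarianBytes,
       DATALENGTH(UnicodeText)   AS UnicodeBytes
FROM #StringTypes;

DROP TABLE #StringTypes;
```

Note the N prefix on the literals: without it, the string is converted to the database’s code page before it ever reaches the column.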
