RE: Finding special characters within character strings

Hall of Fame

Points: 3992

August 19, 2014 at 11:17 am

Thanks for the article! One clarification:

Also notice that “Ça va” is missing. If the column is case sensitive, should that not have been captured? Again this is not a course on regular expressions but note: the regular expression does not see the accented characters as special. Meaning (é and e) or (Ä and A) are evaluated as the same, as far as the expression is concerned.

This is incorrect, they are not evaluated the same. When checking ranges in regular expressions, the collation setting is used. When you specify [A-Z], this includes every character alphabetically sorted between 'A' and 'Z' according to the collation. 'Ç' is sorted between 'C' and 'D' alphabetically in this collation, and so is included in [A-Z].

SELECT 'IT WORKS'
WHERE 'C' COLLATE SQL_Latin1_General_CP1_CI_AS LIKE ('%[C-C]%');
SELECT 'IT WORKS'
WHERE 'C' COLLATE SQL_Latin1_General_CP1_CI_AS LIKE ('%[B-C]%');
SELECT 'IT WORKS'
WHERE 'C' COLLATE SQL_Latin1_General_CP1_CI_AS LIKE ('%[C-D]%');
SELECT 'IT WORKS'
WHERE 'Ç' COLLATE SQL_Latin1_General_CP1_CI_AS LIKE ('%[C-C]%');
SELECT 'IT WORKS'
WHERE 'Ç' COLLATE SQL_Latin1_General_CP1_CI_AS LIKE ('%[B-C]%');
SELECT 'IT WORKS'
WHERE 'Ç' COLLATE SQL_Latin1_General_CP1_CI_AS LIKE ('%[C-D]%');

In 1-3, you can see that a plain 'C' matches all 3 ranges. However, in 4-6, the accented 'Ç' character only matches the range [C-D].