Henrik Staun Poulsen wrote:
Your solution does also find those FirstNames that does not really have a lowercase version, i.e. where FirstName starts with a digit or a dash or a dot...
I would probably try something like:
where left(su.name,1) = lower(left(su.name,1)) collate SQL_Latin1_General_CP1_CS_AS
and left(su.name,1) <> upper(left(su.name,1)) collate SQL_Latin1_General_CP1_CS_AS
Hello Henrik. Excellent point, thanks for catching that. Yes, I think that I would also approach it that way (as long as it's ok to exclude characters from alphabets that do not have the concept of upper-case/lower-case).
The specific collation to use would be the case-sensitive version of the collation already being used by that column. In this case, @btylerwhite did use the correct collation of
Also keep in mind that when working with non-Latin alphabets, it helps to use the most recent version of the desired collation (i.e. a 100 version instead of a 90 or non-versioned) as the newer versions have more upper-case to lower-case mappings which are used by the
LOWER functions. In some cases, using the
Japanese_XJIS_140_* series is best as those are the most recent and have the most mappings (e.g.
Japanese_XJIS_140_CS_AS ). Even if you aren't working with Japanese characters, the upper-case/lower-case mappings are correct for all cultures except for Turkish and one other that uses the same dotted upper-case "I" and dotless lower-case "i".
Henrik Staun Poulsen wrote:
My database (Azure SQL database) is using SQL_Latin1_General_CP1_CI_AS, so should I be using SQL_Latin1_General_CP1_CS_AS?
Whether or not to use a case-sensitive, or even binary, collation for the instance and/or database levels depends on you and your application. There is no correct answer. Case-insensitive collations are easier to interact with as you can use PascalCase for your object names (i.e. the correct approach 😉 ) yet not worry about the casing when writing ad hoc queries / debugging. However, that also allows for some sloppiness if the casing used in the code does not match the actual casing of the objects. Binary and case-sensitive collations enforce consistent casing between code and objects but are less forgiving when you might be in a hurry (i.e. debugging) and allow for a different form of sloppiness: objects having similar names that only differ in casing (e.g.
The bigger issue, for me, would be the use of the older SQL Server collations (i.e. names starting with
SQL_ ) as they have fewer mappings (not important for US English, but certainly can be for other languages), fewer characters with defined sort weights, and they allow for performance degradation when
VARCHAR columns are indexed but then compared to
NVARCHAR literals or variables (for full details on that, please see: Impact on Indexes When Mixing VARCHAR and NVARCHAR Types ). Hence, when working on SQL Server 2012 or newer, I would recommend using the
Latin1_General_100_ series, preferably one ending in
_SC when not using a