String collation

  • Hi:

    Given any random data string, is there a way to know the collation of the string? I guess more than one may apply and that is okay too. I am basically trying to find out if any given string has one or more cyrillic characters and really don't want to use any kind of a looping mechanism.

    Please note that I am not asking about the default collation of a database or a column

    Thanks in advance for your time and help.

    Regards,

    krish.

  • krishp (12/14/2012)


    Hi:

    Given any random data string, is there a way to know the collation of the string? I guess more than one may apply and that is okay too. I am basically trying to find out if any given string has one or more cyrillic characters and really don't want to use any kind of a looping mechanism.

    Please note that I am not asking about the default collation of a database or a column

    Thanks in advance for your time and help.

    Regards,

    krish.

    Collation is not determined by the content of string. Basically string itself has nothing to do with collation.

    Collation is a property of the storage which defines the use of characters in the language (sort order for unicode and non-unicode character datatypes and code page used to store non-unicode datatype).

    To determine if string contains Cyrillic characters, your best try would be creating CLR function (with RegEx) to do so, as you cannot use ASCII code to find out if the character is actually a Cyrillic one.

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Strings are just numbers to computers. Without metadata in the storage, they're no different than any other data.

    You wouldn't need to loop through a string to find whether it has Cyrilic letters in it.

    Note: Cyrilic hex-range from: http://en.wikipedia.org/wiki/Cyrillic_(Unicode_block)

    Try something like this:

    DECLARE @S1 NVARCHAR(100) = N'ABC?EFG??' -- N'ABCDEFGHIJKLMNOP';

    IF EXISTS ( SELECT *

    FROM dbo.Numbers AS N -- Numbers table has integer values from 0 to 10,0000

    WHERE N.Number BETWEEN 1 AND LEN(@S1)

    AND CAST(UNICODE(SUBSTRING(@S1, Number, 1)) AS VARBINARY(100)) BETWEEN 0x00000400 AND 0x000004FF )

    PRINT 'Yep, Cyrilic'

    ELSE

    PRINT 'Nope';

    If you don't have a Numbers table (they're very useful and you might want to create one), there are ways to create one on-the-fly. http://www.bing.com/search?q=sql+server+numbers+table&qs=n&form=QBLH&pq=sql+server+numbers+table&sc=1-24&sp=-1&sk=

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply