character ordering

  • Ross.M (11/1/2012)


    honza.mf (11/1/2012)


    Ross.M (11/1/2012)


    ... however for the sake of understanding string comparisons, we have no use for knowing how many characters are between 0 and Z, we just need to know the collation type and have an ascii table handy.

    This is one of the tricks of this question. Collation and ascii tables are two different things.

    Collation ordering does not copy ascii ordering of characters, it's little bit more sofisticated.

    I understand that, my point though, is that the order and total are irrelevant, we never need to know how many characters are between 0 and Z on a particular collation.

    See Tom's answer above. Yes the number is pointless. The methods how to obtain it are important.



    See, understand, learn, try, use efficient
    © Dr.Plch

  • I'd agree that the intent was more important than the answer. Well done.

    ------------
    Buy the ticket, take the ride. -- Hunter S. Thompson

  • Nils Gustav Stråbø (11/1/2012)


    vk-kirov (11/1/2012)


    Note that the QOD query will return 139 characters if it runs on a database with the Latin1_General_CI_AS (or similar) collation. When running on a database with another collation, the results may vary. For example, for a Vietnamese_CI_AS database the query returns 131 characters, for a Cyrillic_General_CS_AS database – 158 characters, for a Japanese_CI_AS_KS_WS database – 122 characters, for a SQL_EBCDIC273_CP1_CS_AS (?!) database – 15 characters. But the answer given is correct though.

    That is because the query in the answer is missing a COLLATE for the '0' in BETWEEN, or you can simply use SQL Kiwi's example.

    Not so simple 🙂

    Try the following code (based on Paul's query):

    CREATE DATABASE qod_collation_db COLLATE Japanese_CI_AS_KS_WS;

    GO

    USE qod_collation_db;

    GO

    WITH Numbers AS

    ( SELECT 0 AS n

    UNION ALL

    SELECT n + 1

    FROM Numbers

    WHERE n <= 255

    )

    SELECT n AS code, CHAR(n) AS symbol

    FROM Numbers AS n

    WHERE CHAR(n) COLLATE Latin1_General_CI_AS BETWEEN '0' AND 'Z'

    OPTION(MAXRECURSION 256);

    GO

    USE master;

    GO

    DROP DATABASE qod_collation_db;

    GO

    It returns 62 characters. With the Cyrillic_General_CS_AS collation, you'll get 63 characters; with Vietnamese_CI_AS – 131 etc.

  • SQL Kiwi (11/1/2012)


    derek.colley (11/1/2012)


    I'm afraid this question was just too obscure for me. It's a good question, sure, but why would I need to know this or reference it in any way? I mean, we have hundreds of databases, most of which actually use this collation, and this nugget has never, and will never come in useful.

    The specific answer isn't useful, but the process of writing a query to answer it surely could be. I had no idea of the correct answer so I wrote a query very similar to that given in the answer:

    SELECT *

    FROM dbo.Numbers AS n

    WHERE

    CHAR(n) COLLATE Latin1_General_CI_AS

    BETWEEN '0' AND 'Z';

    The idea of QotDs that require the reader to write T-SQL code to find the answer intrigues me. If that was Tom's intention (as I suspect it was) and/or to highlight the usefulness of a Numbers table, I applaud him.

    What's interesting for me is that the query from the original question returns 120 rows.

    select CHAR(I),I from Tally

    where char(I) between '0' and 'Z' collate latin1_general_ci_as

    and I < 256 order by CHAR(I)

    However, your query gives 139. I also realize why here (one's apply the collation to 0 thru Z the other to the result of the CHAR function. But I did find that interesting.



    --Mark Tassin
    MCITP - SQL Server DBA
    Proud member of the Anti-RBAR alliance.
    For help with Performance click this link[/url]
    For tips on how to post your problems[/url]

  • Oh and of course I got it wrong... because I just went with

    SELECT ASCII('Z')-ASCII('0')

    Sadly I'm just too dang American... A thru Z, 0 thru 9... sure... but ASCII isn't Latin1_General_CI_AS



    --Mark Tassin
    MCITP - SQL Server DBA
    Proud member of the Anti-RBAR alliance.
    For help with Performance click this link[/url]
    For tips on how to post your problems[/url]

  • mtassin (11/1/2012)


    Oh and of course I got it wrong... because I just went with

    SELECT ASCII('Z')-ASCII('0')

    That's pretty much what I did, but I did it in my head rather than using code. I was 100% sure this would be the wrong answer, mind you, I was just curious to see what the right one was! 🙂

  • I was dumbfounded after the correct answer was not listed: 11. The question clearly states "How many 8-bit characters are there between '0' and 'Z' (inclusive) in the collation Latin1_General_CI_AS?"

    There are exactly 11 8-bit characters in the string ['0' and 'Z'] 😛

    I thought this crowd liked to be literal to the point of absurdity?

    Aigle de Guerre!

  • mtassin (11/1/2012)


    What's interesting for me is that the query from the original question returns 120 rows.

    select CHAR(I),I from Tally

    where char(I) between '0' and 'Z' collate latin1_general_ci_as

    and I < 256 order by CHAR(I)

    However, your query gives 139. I also realize why here (one's apply the collation to 0 thru Z the other to the result of the CHAR function. But I did find that interesting.

    Actually, I suspect that the correct way to write the condition, because a collate clause applies only to a single column or constant or variable, is

    where CHAR(I) COLLATE Latin1_General_CI_AS

    BETWEEN '0' COLLATE Latin1_General_CI_AS

    AND 'Z' COLLATE Latin1_General_CI_AS

    and I < 256

    That would (I hope) ensure that default collations are not used anywhere, so whatever the server default and the database default are the query will always deliver the same answer.

    It was careless of me to use a query that doesn't correctly eliminate default collations in my explanation.

    Tom

  • paul.knibbs (11/1/2012)


    mtassin (11/1/2012)


    Oh and of course I got it wrong... because I just went with

    SELECT ASCII('Z')-ASCII('0')

    That's pretty much what I did, but I did it in my head rather than using code. I was 100% sure this would be the wrong answer, mind you, I was just curious to see what the right one was! 🙂

    ditto!

  • I don't think the collation specification is needed after the single character constants ('0' or 'Z') in the "between"-expression, but it is after the expression 'CHAR(i)' - at least, this query gives me the right number of characters (139), although my server collation is different than latin1_general_ci_as. (it is sql_latin1_general_cp1_ci_as).

    SELECT i,CHAR(i)

    FROM [Test].[dbo].[Tally]

    where CHAR(i) collate latin1_general_ci_as between '0' and 'Z'

    and i<256

    order by CHAR(i) collate latin1_general_ci_as

    I added the collation specification in the order by clause: this shows the correct order : 0, 1/4, ...etc.

    btw: my first answer was wrong, nice question!

  • Strange,

    when i run the query that's in my previous post (with the "order by char(i) collate latin1_general_ci_as") i

    have the order .... g G H h ...

    Sometimes the capital is listed before and sometimes after ....

    What am I missing here ?? :w00t:

  • Right! it's collate latin1_general_ci_as - Case INsensitive... (sorry for posting previous post too soon! )

  • L' Eomot Inversé (11/1/2012)


    Actually, I suspect that the correct way to write the condition, because a collate clause applies only to a single column or constant or variable, is

    where CHAR(I) COLLATE Latin1_General_CI_AS

    BETWEEN '0' COLLATE Latin1_General_CI_AS

    AND 'Z' COLLATE Latin1_General_CI_AS

    and I < 256

    That would (I hope) ensure that default collations are not used anywhere, so whatever the server default and the database default are the query will always deliver the same answer.

    Incorrect. Try the following script and you'll get 62 characters:

    CREATE DATABASE qod_collation_db COLLATE Japanese_CI_AS_KS_WS;

    GO

    USE qod_collation_db;

    GO

    WITH Numbers AS

    ( SELECT 0 AS n

    UNION ALL

    SELECT n + 1

    FROM Numbers

    WHERE n < 255

    )

    SELECT n AS code, CHAR(n) AS symbol

    FROM Numbers AS n

    WHERE CHAR(n) COLLATE Latin1_General_CI_AS BETWEEN '0' COLLATE Latin1_General_CI_AS AND 'Z' COLLATE Latin1_General_CI_AS

    OPTION(MAXRECURSION 255);

    GO

    USE master;

    GO

    DROP DATABASE qod_collation_db;

    GO

    What's interesting if we execute 'SELECT CHAR(n)' on a Japanese_CI_AS_KS_WS database we'll get NULLs for the values of 'n' from 129 to 159 and from 224 to 252.

    Another example (sorry for using a non-Latin character here; I don't know how you will see that character – in its original form, as a rectangle or something else; actually I'm using a cyrillic letter http://en.wikipedia.org/wiki/Tse_(Cyrillic)):

    CREATE DATABASE qod_collation_db COLLATE Cyrillic_General_CI_AS;

    GO

    USE qod_collation_db;

    GO

    SELECT

    CHAR(246) AS Cyrillic,

    CHAR(246) COLLATE Latin1_General_CI_AS AS Latin,

    CHAR(246) COLLATE Japanese_CI_AS_KS_WS AS Japanese;

    GO

    USE master;

    GO

    DROP DATABASE qod_collation_db;

    GO

    Here is the result set:

    Cyrillic Latin Japanese

    -------- ----- --------

    ? ?

    (Note that the 'Japanese' value is a whitespace.)

  • Great question Tom, thanks!

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • 😎 (actually I used extreme advanced skill to make the choice, and I still got it wrong.... but got to learn on load of things)

    Awesome question. 🙂 Thank you Tom

    ww; Raghu
    --
    The first and the hardest SQL statement I have wrote- "select * from customers" - and I was happy and felt smart.

Viewing 15 posts - 16 through 30 (of 34 total)

You must be logged in to reply to this topic. Login to reply