SQL Sorting Rules???

  • Guys,

    I'm sure this is a silly issue, but I can't figure it out.

    I need to know why SQL Server 2005 with collation SQL_Latin1_General_CP1_CI_AS (set on the server, the database, and the table) sorts non-alphanumeric characters the way it does.

    I expected rows to be sorted by the Unicode value of the first character, i.e. unicode(substring(upper(col1),1,1)).

    However, the sorting does not follow this rule at all; see the example below.

    Note that Unicode values 58 and 123 are both sorted before 49.

    Please help me understand how MSSQL applies its sorting rules! 🙂

    /****** Object: Table [dbo].[table_3] Script Date: 8/15/2014 3:25:21 PM ******/
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    CREATE TABLE [dbo].[table_3](
        [col1] [nvarchar](200) NULL
    ) ON [PRIMARY]
    GO

    insert into table_3 (col1)
    select '!"#$%&''()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ '
    union all
    select '&*(^)PO<>012ABC@'
    union all
    select ':;<=>?@ABc1678 '
    union all
    select '{}ABC456rtyGHijkl_)('
    union all
    select '{ab}c|":eFTY34'
    union all
    select '12345abc#$'
    union all
    select 'ABC123)#$.,/;''ghtf'
    union all
    select 'abcd678_)(&^'
    union all
    select 'ae~!#1234DABCTY '
    union all
    select 'CAYMANABC1d'

    select SERVERPROPERTY(N'Collation')

    SQL_Latin1_General_CP1_CI_AS

    (1 row(s) affected)

    select upper(col1), substring(upper(col1),1,1),UNICODE(substring(upper(col1),1,1)) from table_3 order by upper(col1)

    -------------------------------------------------- ---- -----------
    !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQR !    33
    &*(^)PO<>012ABC@                                   &    38
    :;<=>?@ABC1678                                     :    58
    {}ABC456RTYGHIJKL_)(                               {    123
    {AB}C|":EFTY34                                     {    123
    12345ABC#$                                         1    49
    ABC123)#$.,/;'GHTF                                 A    65
    ABCD678_)(&^                                       A    65
    AE~!#1234DABCTY                                    A    65
    CAYMANABC1D                                        C    67

    (10 row(s) affected)

  • SQL Server doesn't order by Unicode values. The collation defines the order, up to a point. Why do you need to understand the order of non-alphanumeric characters?

    Maybe you could use this as a test:

    WITH E1(N) AS (
        SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
        SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
        SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
    ),                                      -- 10 rows
    E2(N) AS (SELECT 1 FROM E1 a, E1 b),    -- 10 x 10 = 100 rows
    E4(N) AS (SELECT 1 FROM E2 a, E2 b),    -- 100 x 100 = 10,000 rows max
    cteTally(N) AS (
        SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
    )
    -- List each code N next to its character, sorted by the collation that applies to ch
    SELECT N, NCHAR(N) ch
    FROM cteTally
    ORDER BY ch
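
    A further sketch, not part of the original reply: one way to see how far the collation's order departs from plain code-point order is to rank the same characters twice, once under the collation the column inherits and once with a binary collation forced through COLLATE. The table variable @chars, its column names, and the choice of Latin1_General_BIN2 as the binary collation are assumptions made for illustration; the code range 33 to 126 simply matches the printable ASCII characters used in the test string above.

    ```sql
    -- Hypothetical scratch table: one row per printable ASCII character.
    DECLARE @chars TABLE (code int NOT NULL, ch nchar(1) NOT NULL);

    -- Recursive CTE producing the codes 33..126 (93 recursion levels, below the default limit of 100).
    WITH Numbers(N) AS (
        SELECT 33
        UNION ALL
        SELECT N + 1 FROM Numbers WHERE N < 126
    )
    INSERT INTO @chars (code, ch)
    SELECT N, NCHAR(N) FROM Numbers;

    -- pos_default: rank under the collation ch inherited (ties, e.g. 'A' and 'a'
    --              in a case-insensitive collation, are broken by code).
    -- pos_binary:  rank under a binary collation, which compares code points.
    SELECT code,
           ch,
           ROW_NUMBER() OVER (ORDER BY ch, code) AS pos_default,
           ROW_NUMBER() OVER (ORDER BY ch COLLATE Latin1_General_BIN2) AS pos_binary
    FROM @chars
    ORDER BY pos_default;
    ```

    Rows where pos_default and pos_binary differ are the characters the collation moves away from their code-point position. If code-point order is what the spec really requires, adding the same COLLATE clause to the ORDER BY of the query against table_3 should give it, but test that against your own data first.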

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Hi, thank you for your response.

    We need to provide functional specs for unit/business testing. In our documentation we assumed that sorting was based on Unicode values; however, this is not how the system behaves.

    We need to describe the actual behavior.

    Thanks again.

    Luis Cazares (8/15/2014)


    SQL Server doesn't order based on unicode values. The collation defines order to a certain point. Why do you need to understand the order of non-alphanumeric characters?


  • You can find some reference here:

    http://msdn.microsoft.com/en-US/library/ms143515(v=sql.90).aspx

    Luis C.
  • Thanks, saw that but the info provided is not specific or does not reflect the actual behavior.

    Luis Cazares (8/15/2014)


    You can find some reference here:

    http://msdn.microsoft.com/en-US/library/ms143515(v=sql.90).aspx

  • rizzefcazz (8/15/2014)


    Thanks, saw that but the info provided is not specific or does not reflect the actual behavior.

    Luis Cazares (8/15/2014)


    You can find some reference here:

    http://msdn.microsoft.com/en-US/library/ms143515(v=sql.90).aspx

    Why not? It never states that it will sort based on Unicode values.

    Luis C.
  • Anyone else with any suggestion?
