Natural Sort Order

  • Microsoft have said over here on the Azure feedback portal that they'll add natural sorting as a built in feature if it gets enough votes.

  • peter.aylett - Monday, November 19, 2018 4:10 PM

    Microsoft have said over here on the Azure feedback portal that they'll add natural sorting as a built in feature if it gets enough votes.

    I wouldn't count on it.  The request for a sequence (Tally Table-like) generator at the machine language level was submitted on 1 Feb 2008.  It has 287 votes and has been in the "Unplanned" status since.
    https://feedback.azure.com/forums/908035-sql-server/suggestions/32890519-add-a-built-in-table-of-numbers

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Have a look at the functions that was a team effort at the following link.
    http://www.sqlservercentral.com/scripts/String+Function/141686/

    The "DigitsOnlyEE" function was actually developed on a different thread.  Big hat's off to Eirikur Erikson, which is what the "EE" in the name of the function stands for.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Monday, November 19, 2018 8:44 PM

    Have a look at the functions that was a team effort at the following link.
    http://www.sqlservercentral.com/scripts/String+Function/141686/

    The "DigitsOnlyEE" function was actually developed on a different thread.  Big hat's off to Eirikur Erikson, which is what the "EE" in the name of the function stands for.

    That still isn't going to allow numbers to sort like numbers. for example "009" will sort as a lower number than "3" and "11111" will sort as a lower number than "8".
    The only way, that I've found, is to break up the string into columns at every transition between number and non-number... and then ordering by the resulting columns.
    Here's a function I wrote for someone on SO last week... (Natural (human alpha-numeric) sort in Microsoft SQL Server)

    SET QUOTED_IDENTIFIER ON
    SET ANSI_NULLS ON
    GO

    CREATE FUNCTION dbo.tfn_SplitForSort
    /* ===================================================================
    11/11/2018 JL, Created: Comments    
    =================================================================== */
    --===== Define I/O parameters
    (
        @string    VARCHAR(8000)
    )
    RETURNS TABLE WITH SCHEMABINDING AS
    RETURN
        WITH
            cte_n1 (n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (n)),
            cte_n2 (n) AS (SELECT 1 FROM cte_n1 a CROSS JOIN cte_n1 b),
            cte_Tally (n) AS (
                SELECT TOP (LEN(@string))
                    ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
                FROM
                    cte_n2 a CROSS JOIN cte_n2 b
                ),
            cte_split_string AS (
                SELECT
                    col_num = ROW_NUMBER() OVER (ORDER BY t.n) + CASE WHEN LEFT(@string, 1) LIKE '[0-9]' THEN 0 ELSE 1 END,
                    string_part = SUBSTRING(@string, t.n, LEAD(t.n, 1, 8000) OVER (ORDER BY t.n) - t.n)
                FROM
                    cte_Tally t
                    CROSS APPLY ( VALUES (SUBSTRING(@string, t.n, 2)) ) s (str2)
                WHERE
                    t.n = 1
                    OR SUBSTRING(@string, t.n - 1, 2) LIKE '[0-9][^0-9]'
                    OR SUBSTRING(@string, t.n - 1, 2) LIKE '[^0-9][0-9]'
                )

        SELECT
            so_01 = ISNULL(MAX(CASE WHEN ss.col_num = 1 THEN CONVERT(FLOAT, ss.string_part) END), 99999999),
            so_02 = MAX(CASE WHEN ss.col_num = 2 THEN ss.string_part END),
            so_03 = MAX(CASE WHEN ss.col_num = 3 THEN CONVERT(FLOAT, ss.string_part) END),
            so_04 = MAX(CASE WHEN ss.col_num = 4 THEN ss.string_part END),
            so_05 = MAX(CASE WHEN ss.col_num = 5 THEN CONVERT(FLOAT, ss.string_part) END),
            so_06 = MAX(CASE WHEN ss.col_num = 6 THEN ss.string_part END),
            so_07 = MAX(CASE WHEN ss.col_num = 7 THEN CONVERT(FLOAT, ss.string_part) END),
            so_08 = MAX(CASE WHEN ss.col_num = 8 THEN ss.string_part END),
            so_09 = MAX(CASE WHEN ss.col_num = 9 THEN CONVERT(FLOAT, ss.string_part) END),
            so_10 = MAX(CASE WHEN ss.col_num = 10 THEN ss.string_part END)
        FROM
            cte_split_string ss;
    GO

    It works by putting numeric strings into into the odd columns and non-numerics into the even columns... and then converting the odd columns to floats for sorting.
    The obvious down side is that, a) as written, it only looks at the 1st 10 transitions and b) it's a little cumbersome having to add 10 columns to the outer ORDER BY clause.
    I haven't torture tested it yet but so far it's provided the correct sort for the stings I've thrown at it and it seems to perform well (if you ignore the sort operation that uses it).

  • Jason A. Long - Monday, November 19, 2018 11:09 PM

    Jeff Moden - Monday, November 19, 2018 8:44 PM

    Have a look at the functions that was a team effort at the following link.
    http://www.sqlservercentral.com/scripts/String+Function/141686/

    The "DigitsOnlyEE" function was actually developed on a different thread.  Big hat's off to Eirikur Erikson, which is what the "EE" in the name of the function stands for.

    That still isn't going to allow numbers to sort like numbers. for example "009" will sort as a lower number than "3" and "11111" will sort as a lower number than "8".
    The only way, that I've found, is to break up the string into columns at every transition between number and non-number... and then ordering by the resulting columns.
    Here's a function I wrote for someone on SO last week... (Natural (human alpha-numeric) sort in Microsoft SQL Server)

    SET QUOTED_IDENTIFIER ON
    SET ANSI_NULLS ON
    GO

    CREATE FUNCTION dbo.tfn_SplitForSort
    /* ===================================================================
    11/11/2018 JL, Created: Comments    
    =================================================================== */
    --===== Define I/O parameters
    (
        @string    VARCHAR(8000)
    )
    RETURNS TABLE WITH SCHEMABINDING AS
    RETURN
        WITH
            cte_n1 (n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (n)),
            cte_n2 (n) AS (SELECT 1 FROM cte_n1 a CROSS JOIN cte_n1 b),
            cte_Tally (n) AS (
                SELECT TOP (LEN(@string))
                    ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
                FROM
                    cte_n2 a CROSS JOIN cte_n2 b
                ),
            cte_split_string AS (
                SELECT
                    col_num = ROW_NUMBER() OVER (ORDER BY t.n) + CASE WHEN LEFT(@string, 1) LIKE '[0-9]' THEN 0 ELSE 1 END,
                    string_part = SUBSTRING(@string, t.n, LEAD(t.n, 1, 8000) OVER (ORDER BY t.n) - t.n)
                FROM
                    cte_Tally t
                    CROSS APPLY ( VALUES (SUBSTRING(@string, t.n, 2)) ) s (str2)
                WHERE
                    t.n = 1
                    OR SUBSTRING(@string, t.n - 1, 2) LIKE '[0-9][^0-9]'
                    OR SUBSTRING(@string, t.n - 1, 2) LIKE '[^0-9][0-9]'
                )

        SELECT
            so_01 = ISNULL(MAX(CASE WHEN ss.col_num = 1 THEN CONVERT(FLOAT, ss.string_part) END), 99999999),
            so_02 = MAX(CASE WHEN ss.col_num = 2 THEN ss.string_part END),
            so_03 = MAX(CASE WHEN ss.col_num = 3 THEN CONVERT(FLOAT, ss.string_part) END),
            so_04 = MAX(CASE WHEN ss.col_num = 4 THEN ss.string_part END),
            so_05 = MAX(CASE WHEN ss.col_num = 5 THEN CONVERT(FLOAT, ss.string_part) END),
            so_06 = MAX(CASE WHEN ss.col_num = 6 THEN ss.string_part END),
            so_07 = MAX(CASE WHEN ss.col_num = 7 THEN CONVERT(FLOAT, ss.string_part) END),
            so_08 = MAX(CASE WHEN ss.col_num = 8 THEN ss.string_part END),
            so_09 = MAX(CASE WHEN ss.col_num = 9 THEN CONVERT(FLOAT, ss.string_part) END),
            so_10 = MAX(CASE WHEN ss.col_num = 10 THEN ss.string_part END)
        FROM
            cte_split_string ss;
    GO

    It works by putting numeric strings into into the odd columns and non-numerics into the even columns... and then converting the odd columns to floats for sorting.
    The obvious down side is that, a) as written, it only looks at the 1st 10 transitions and b) it's a little cumbersome having to add 10 columns to the outer ORDER BY clause.
    I haven't torture tested it yet but so far it's provided the correct sort for the stings I've thrown at it and it seems to perform well (if you ignore the sort operation that uses it).

    Of course it will help someone sort by numbers. 😉 It says right in the comments that the results are character based but it's an easy thing to do a convert for number sorts.  Agreed that it's not so straight for a NO_Sort. especially since it returns ALL of the digits even if interleaved with Alpha Characters.  The "Edge Detectors" you built into your WHERE clause in your code take care of that nuance if it needs to be taken care of.  It's still not clear on this thread what someone wants to do on this thread if there's more than one number "field" within any given string.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Tuesday, November 20, 2018 7:31 AM

    Of course it will help someone sort by numbers. 😉 It says right in the comments that the results are character based but it's an easy thing to do a convert for number sorts.  Agreed that it's not so straight for a NO_Sort. especially since it returns ALL of the digits even if interleaved with Alpha Characters.  The "Edge Detectors" you built into your WHERE clause in your code take care of that nuance if it needs to be taken care of.  It's still not clear on this thread what someone wants to do on this thread if there's more than one number "field" within any given string.

    Jeff - I can't tell if you're agreeing with me or making the case that a solution that only works under certain, very specific, conditions is preferable. 😀

  • Jason A. Long - Tuesday, November 20, 2018 1:16 PM

    Jeff Moden - Tuesday, November 20, 2018 7:31 AM

    Of course it will help someone sort by numbers. 😉 It says right in the comments that the results are character based but it's an easy thing to do a convert for number sorts.  Agreed that it's not so straight for a NO_Sort. especially since it returns ALL of the digits even if interleaved with Alpha Characters.  The "Edge Detectors" you built into your WHERE clause in your code take care of that nuance if it needs to be taken care of.  It's still not clear on this thread what someone wants to do on this thread if there's more than one number "field" within any given string.

    Jeff - I can't tell if you're agreeing with me or making the case that a solution that only works under certain, very specific, conditions is preferable. 😀

    Sorry.  I was totally agreeing with the concept of the "edge detectors" you built into the code.  But the OP hasn't stated if they want something like 123AB456 to be sorted as 123456 and then AB, 123 then AB, 456 then AB, or AB first in all of those cases.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Tuesday, November 20, 2018 9:19 PM

    Sorry.  I was totally agreeing with the concept of the "edge detectors" you built into the code.  But the OP hasn't stated if they want something like 123AB456 to be sorted as 123456 and then AB, 123 then AB, 456 then AB, or AB first in all of those cases.

    Given that this is a 14 year old tread... I'm not holding out for any additional clarification from the OP. I'm pretty sure we're in the clear to make our own reasonable assumptions.
    So, given your example of "123AB456"... I can't think of a real world example where it would make sense to recombobulate "123" and "456" into "123456" and then treat it like the number 123,456.
    I can, however, see pulling out 123 as a number, AB, as a text string followed by 456 as a number.

    For example... To me, this makes sense...

    Family Guy S2_E3     ('Family Guy S' / 2 / '_E' / 3)
    Family Guy S2_E10     ('Family Guy S' / 2 / '_E' / 10)
    Family Guy S11_E6    ('Family Guy S' / 11 / '_E' / 6 )

    But this doesn't...

    Family Guy S2_E3     (23 / 'Family Guy S_E')
    Family Guy S11_E6     (116 / 'Family Guy S_E')
    Family Guy S2_E10     (210 / 'Family Guy S_E')

    Of course, the mere fact that I can't think of a legitimate use case for the one doesn't mean there isn't one. So, as always, I'm open to having my mind changed.

  • Jason A. Long - Tuesday, November 20, 2018 10:50 PM

    Of course, the mere fact that I can't think of a legitimate use case for the one doesn't mean there isn't one. So, as always, I'm open to having my mind changed.

    Precisely my dilemma... even though the original post goes back 14 years, I was hoping that the OP may still have his "ears on" and maybe provide some clarification as to what the use case was and maybe even what he ended up doing because this IS an interesting problem.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 9 posts - 16 through 23 (of 23 total)

You must be logged in to reply to this topic. Login to reply