Natural Sort Order

  • Hello.

    I have a table that has a column of nvarchar that has rows containing both numbers and letters... Ex: "15B" or "A15"

    How can i sort this column so that numbers are compared based on their numerical value and not their textual value?

    I really need the value "2" to come before "12"

    Any ideas?

    Thanks a lot!

  • Will the data always be in the formats that you showed us?

    ie, there won't be a value of 'AB2' or '3A4', but always a number followed by an alpha, or an alpha followed by a number?

    Steve

  • Unfortunately i cant make any assumptions about the format. Basically any number or letter combination is valid.

     

    Thanks!

  • Are the values in order in the table.  If so, maybe there is a key column you can sort off of.

  • Heh. Another no.

    Unfortunately all the nice little tricks that work for sorting data (that i know of anyway) are not applicable.

    Other than doing some kind of crazy conversion of the entire text value to an integer value and stashing it in another column to base sorting on, the actual comparison must come from a natural order comparison.

    I've seen the shell function StrCmpLogicalW (http://msdn.microsoft.com/library/default.asp?url=/library/en-us/shellcc/platform/shell/reference/shlwapi/string/strcmplogicalw.asp) thrown around but i am not quite sure how i can tap into this shell function from Sql Server and perform the sort.

    By the way, the above mentioned function is what windows uses to base sorting of filenames off... So that, in theory would be able to tell if "Picture2.gif" would come before "Picture10.gif"

  • Hi, this is not to difficult if the numbers are only in the front.

    ~~begin Sample~~

    Create table #temp (value varchar(20))

    Insert into #Temp (value) values ('15')

    Insert into #Temp (value) values ('15B')

    Insert into #Temp (value) values ('151')

    Insert into #Temp (value) values ('A13')

    Insert into #Temp (value) values ('14B')

    Insert into #Temp (value) values ('141')

    --Fisrt Sample

    Select * from #Temp

    order by right('000000000000000000' + value , 20)

    --Second Sample

    Select value , IsNumeric(value) IsNumber from #Temp

    order by Isnumber DESC , right('000000000000000000' + value , 20)

    --Third Sample (INCORRECT NUMBERING)

    Select value , IsNumeric(value) IsNumber from #Temp

    order by Isnumber DESC , value

    --Third Sample (INCORRECT NUMBERING) end

    Drop table #Temp

    ~~end Sample~~

    Regards

       Anton

  • krypton,

    you say you need 2 before 12... that's something I can understand - but do you also need 2B9 before 12A?

    I'm afraid that it is almost impossible to catch all possible combinations correctly... Do you have some detailed description of the ordering? The condition as stated in the post is IMHO not clear enough to create any SQL that would work as required.

    cheers, Vladan

  • create function leftNumberPart(@code varchar(10))
    returns int
    as
    begin
     declare @number varchar(10)
     declare @position tinyint
     set @position = 1
     set @number = ''
     while @position < len(@code)
     begin
      if isnumeric(substring(@code, @position, 1)) = 1
       set @number = @number + substring(@code, @position, 1)
      else
       break
      set @position = @position + 1
     end
     if len(@number) = 0
      set @number = 2147483647 -- i've set this to the max number for an int - only matters if you want alpha starting codes to come after numeric starting codes
     return cast(@number as int)
    end
    select *, dbo.leftNumberPart(code) from testorder
    order by dbo.leftNumberPart(code), code
    

    Not great, I know, will probably be slow to sort on large datasets.

    Dave Hilditch.

  • You can use a combination of SUBSTRING and PATINDEX to get to the first number in the string.

    PATINDEX accepts ranges as a search expression in the form ' %[from-to]%'. This can be used

    as the start parameter of SUBSTRING so

    SELECT SUBSTRING(CharColumn,                     

               PATINDEX('%[0-9]%', CharColumn),       -- Displacement of first number

               LEN(CharColumn)                        --  Length of column

    FROM    UserTable

    WHERE  PATINDEX('%[0-9]%', CharColumn) > 0       -- Checks there is a number

    will return the number at the start of the result. This should make subsequent processing a bit easier.

  • If you are using SQL 2K and performance is not an issue,

    you can do this:

    SELECT *, SortColumn = Cast(b as sql_variant)

    FROM   Table1

    ORDER BY SortColumn

     

    Igor

  • This is bit complex but it seems to work . . .

    Create table #Temp1 (value varchar(20))

    Insert into #Temp1 (value) values ('15A')

    Insert into #Temp1 (value) values ('15B')

    Insert into #Temp1 (value) values ('151')

    Insert into #Temp1 (value) values ('A13')

    Insert into #Temp1 (value) values ('14B')

    Insert into #Temp1 (value) values ('141')

    Create table #Temp2 (numb int, prefix varchar(20)NULL, postfix varchar(20) NULL)

    DECLARE @AlphaNumb nchar(20)

    DECLARE @Pos1 int

    DECLARE @Pos2 int

    DECLARE @Length int

    DECLARE @Start int

    DECLARE @Prefix nchar(20)

    DECLARE @postfix nchar(20)

    DECLARE NumericSort_Cursor CURSOR FOR SELECT value FROM #Temp1

    OPEN NumericSort_Cursor

    FETCH NEXT FROM NumericSort_Cursor INTO @AlphaNumb

    SET @Length = LEN(@AlphaNumb)

    SET @Start = 1

    SET @Pos2 = 0

    -- find the position first numeric digit

    WHILE @Start <= @Length

     BEGIN

      IF ISNUMERIC(SUBSTRING(@AlphaNumb, @Start,1)) = 1

       BEGIN

        SET @Pos1 = @Start

        BREAK

       END

      ELSE

       SET @Start = @Start + 1

     END

    --increment the start position

    SET @Start = @Start + 1

    --find the position of the next non numeric digit numeric digit, if it exist

    WHILE @Start <= @Length

     BEGIN

      IF ISNUMERIC(SUBSTRING(@AlphaNumb, @Start,1)) = 0

       BEGIN

        SET @Pos2 = @Start - 1

        BREAK

       END

      ELSE

       BEGIN

        SET @Pos2 = @Start

        SET @Start = @Start + 1

       END

     END

    --PRINT CONVERT(nvarchar(20), @Pos1) + ', ' + CONVERT(nvarchar(20), @Pos2)

    IF @Pos1 > 1

     SET @Prefix = CONVERT(nchar(20), @Pos1 - 1)

    ELSE

     SET @Prefix = NULL

    IF @Pos2 < @Length

     SET @postfix = @Length

    ELSE

     SET @postfix = NULL

    IF (ISNULL(@Prefix, 0) = 0 AND ISNULL(@Postfix, 0) = 0)

     INSERT INTO #Temp2 (numb, prefix, postfix) VALUES(SUBSTRING(@AlphaNumb, @Pos1, @Pos2),NULL,NULL)

    IF (ISNULL(@Prefix, 0) <> 0 AND ISNULL(@Postfix, 0) = 0)

     INSERT INTO #Temp2 (numb, prefix, postfix) VALUES(SUBSTRING(@AlphaNumb, @Pos1, @Pos2),SUBSTRING(@AlphaNumb, @Pos1 - 1, 1),NULL)

    IF (ISNULL(@Prefix, 0) = 0 AND ISNULL(@Postfix, 0) <> 0)

     INSERT INTO #Temp2 (numb, prefix, postfix) VALUES(SUBSTRING(@AlphaNumb, @Pos1, @Pos2),NULL,SUBSTRING(@AlphaNumb, @Pos2 + 1, 1))

    IF (ISNULL(@Prefix, 0) <> 0 AND ISNULL(@Postfix, 0) <> 0)

     INSERT INTO #Temp2 (numb, prefix, postfix) VALUES(SUBSTRING(@AlphaNumb, @Pos1, @Pos2),SUBSTRING(@AlphaNumb, @Pos1 - 1, 1),SUBSTRING(@AlphaNumb, @Pos2 + 1, 1))

    WHILE @@FETCH_STATUS = 0

     BEGIN

      FETCH NEXT FROM NumericSort_Cursor INTO @AlphaNumb

      

      SET @Length = LEN(@AlphaNumb)

      SET @Start = 1

      SET @Pos2 = 0

      -- find the position first numeric digit

      WHILE @Start <= @Length

      

       BEGIN

        IF ISNUMERIC(SUBSTRING(@AlphaNumb, @Start,1)) = 1

         BEGIN

          SET @Pos1 = @Start

          BREAK

         END

        ELSE

         SET @Start = @Start + 1

      

       END

      --increment the start position

      SET @Start = @Start + 1

      --find the position of the next non numeric digit numeric digit, if it exist

      

      WHILE @Start <= @Length

      

       BEGIN

        IF ISNUMERIC(SUBSTRING(@AlphaNumb, @Start,1)) = 0

         BEGIN

          SET @Pos2 = @Start - 1

          BREAK

         END

        ELSE

         BEGIN

          SET @Pos2 = @Start

          SET @Start = @Start + 1

         END

      

       END

      --PRINT CONVERT(nvarchar(20), @Pos1) + ', ' + CONVERT(nvarchar(20), @Pos2)

      IF @Pos1 > 1

       SET @Prefix = CONVERT(nchar(20), @Pos1 - 1)

      ELSE

       SET @Prefix = NULL

      IF @Pos2 < @Length

       SET @postfix = @Length

      ELSE

       SET @postfix = NULL

      IF @@FETCH_STATUS <> -1

       BEGIN

      

       IF (ISNULL(@Prefix, 0) = 0 AND ISNULL(@Postfix, 0) = 0)

        INSERT INTO #Temp2 (numb, prefix, postfix) VALUES(SUBSTRING(@AlphaNumb, @Pos1, @Pos2),NULL,NULL)

       IF (ISNULL(@Prefix, 0) <> 0 AND ISNULL(@Postfix, 0) = 0)

        INSERT INTO #Temp2 (numb, prefix, postfix) VALUES(SUBSTRING(@AlphaNumb, @Pos1, @Pos2),SUBSTRING(@AlphaNumb, @Pos1 - 1, 1),NULL)

       IF (ISNULL(@Prefix, 0) = 0 AND ISNULL(@Postfix, 0) <> 0)

        INSERT INTO #Temp2 (numb, prefix, postfix) VALUES(SUBSTRING(@AlphaNumb, @Pos1, @Pos2),NULL,SUBSTRING(@AlphaNumb, @Pos2 + 1, 1))

       IF (ISNULL(@Prefix, 0) <> 0 AND ISNULL(@Postfix, 0) <> 0)

        INSERT INTO #Temp2 (numb, prefix, postfix) VALUES(SUBSTRING(@AlphaNumb, @Pos1, @Pos2),SUBSTRING(@AlphaNumb, @Pos1 - 1, 1),SUBSTRING(@AlphaNumb, @Pos2 + 1, 1))

      END

     END

    CLOSE NumericSort_Cursor

    DEALLOCATE NumericSort_Cursor

    SELECT ISNULL(prefix, '') + RTRIM(CONVERT(nchar(20), numb)) + ISNULL(postfix, '') AS numbs FROM #Temp2 ORDER BY numb ASC

    Drop table #Temp1

    Drop table #Temp2

  • Bob, your code is the second real attempt to solve this problem. However, it has two problems:

    1. The code uses cursors (which makes it big and slow)

    2. You assume that there is exactly one number in the string.

    Dave Hilditch also gave it a try, but his code assumes that the number is at the beginning and chokes on inputs like '1-2' or '12.A'.

    I have tried to solve this problem using a single SELECT query, and here it is:

    SELECT Value, Prefix, 
     CONVERT(numeric(38,0),LEFT(Reminder, 
      CASE WHEN PATINDEX('%[^0-9]%',Reminder)=0 
      THEN LEN(Reminder) 
      ELSE PATINDEX('%[^0-9]%',Reminder)-1 
      END)) AS Number,
     SUBSTRING(Reminder,
      CASE WHEN PATINDEX('%[^0-9]%',Reminder)<>0 
      THEN PATINDEX('%[^0-9]%',Reminder) 
      END, LEN(Reminder)) as Postfix
    FROM (
     SELECT Value,
      CASE WHEN Value LIKE '[^0-9]%'
      THEN LEFT(Value,
       CASE WHEN PATINDEX('%[0-9]%',Value)=0
       THEN LEN(Value) 
       ELSE PATINDEX('%[0-9]%',Value)-1 
       END) 
      END AS Prefix, 
      CASE WHEN Value LIKE '[^0-9]%'
      THEN SUBSTRING(Value,
       CASE WHEN PATINDEX('%[0-9]%',Value)<>0 
       THEN PATINDEX('%[0-9]%',Value) 
       END, LEN(Value)) 
      ELSE Value 
      END AS Reminder
     FROM #Temp1
    ) A ORDER BY Number, Prefix, Postfix

    This query is also based on the assumption that there is only one number in the string. If there is more than one number, only the first one is compared by value (the others are compared as strings). If there is a known number of numbers in the string, this query could be expanded to sort each of them, but it could get ugly if there may be more than a few numbers in a string.

    Krypton, are you still here ? If you want us to continue trying to solve this problem, please give us a more detailed specification, including sample data and expected results.

    For example, what is the expected ordering of the following values:

    10AB7

    10A30

    10-3

    B200C

    B7A20

    B7A100

    10B15

    B200C1

    100e4

    10.3

    10.23

    10.3.3

    10.3.17

    Is there a maximum number of characters ?

    Razvan

  • if you look at the following using the pubs database

    select au_id from authors

    order by null

    seems pretty redundant but it is just to illustrate what happens when an expression returns null in an order by clause the output is the same as

    select au_id from authors

    Now try an expression

    select au_id from authors

    order by substring(au_id,5,2)

    works just fine as does the following...

    select au_id from authors

    order by substring(au_id,5,2),null

    see what I mean....

    select au_id from authors

    order by substring(au_id,5,2),substring(au_id,1,3),substring(au_id,8,3)

    select au_id from authors

    order by substring(au_id,5,2),null,null

     

    you can see that you can order by an expression and that nulls have no effect in their place in the order

    so there are a number of solutions possible

    the simple solutiion where there is one number

    write a udf lets call this function simplefunc(myField) that takes your data field, removes the number from it and returns that number as an integer - put that udf in the order by clause - and if all the data is the simple case then problem solved.

    Now for many numbers it depends on what you want i.e. do you want the data to be ordered in the number order within the field as well as by the field or do you want the data to be sorted in number position within the field as well as by the field.

    If you scan your data to determine what are the maximum number of 'numbers' in the string fields you can construct an

    order by myfunc(mydata,1), myfunc(mydata,2),.....

    and run it against all of the data in the field - it will then order your column and by the order of the numbers appearing in the field and it will not matter whether there are multiple numbers or not because null reults have no effect in their position of an order by clause

     

    Is this clear? It has been a very long day

  • Razvan Socol,

    Yes, I agree that using the cursor is heavy and slow. 

    Since this problem is a little short on business rules (it doesn't have any) and we really don't know the maximum length of the string or the positions to expect numeric or alpha characters in the string, this may be beyond any eloquent solutiion using TSQL.

    An alternative here may be to use a another programming language that would use an array of arrays and store each character in it's original order inside the referenced array at each subscript.  Then procede to sort the base array (based on the values in the referenced array) using an index array as the base arrays subscripts.  This way the actual values represented in the referenced arrays don't have to be moved (much faster).  Once sorted the data could be written back to a table in the desired order. 

    Does anyone hear ActiveX Script task here?

    Example:

    ' A module-level variable

    Dim alphnumbs(1 To NUMBER_OF RECORDS) As Variant

    ' Add a character from the aphla numeric string to the array of arrays at position in the array.

    Sub AddNewAlphNumb(pos As Integer, alphnumbchar As String)

        Dim arr As Variant

        If IsEmpty(apps(pos)) Then

            ' This is the first character in the array.

            alphnumbs(pos) = Array(alphnumbchar)

        Else

            ' Add the character to the existing array.

            arr = alphnumbs(pos)

            ReDim Preserve arr(0 To UBound(arr) + 1) As Variant

            arr(UBound(arr)) = alphnumbchar

            alphnumbs(pos) = arr

        End If

    End Sub

    ' Change the value for the Nth character in the string

    ' at position in the base array.

    alphnumbs(pos)(n) = alphnumbchar

  • Correction . . .

    If IsEmpty(alphnumbs(pos)) Then

Viewing 15 posts - 1 through 15 (of 23 total)

You must be logged in to reply to this topic. Login to reply