Checking sort order and fixing it


  • CREATE TABLE #tblTasks
    (
            Sort
    ID int,
            Task varchar(50)

            )

    GO

    INSERT INTO #tblTasks (SortID, Task)
    SELECT 1, 'run' UNION ALL
    SELECT
    2, 'jump' UNION ALL
    SELECT
    3, 'hop' UNION ALL
    SELECT 3
    , 'skip' UNION ALL
    SELECT 4
    , 'rest' UNION ALL
    SELECT 4
    , 'sleep'

    The data in #tblTasks should have consecutive SortIDs.

    First question is - how can I tell that the SortIDs are not consecutive? The Sort Order has nothing to do with the varchar data in the Task column. That is irrelevant. Whenever I return this particular set of data, I need to check that the sort order has not gone wrong. In this example the rows should have SortIDs like ... 1;2;3;4;5;6
    But, let's say it has gone wrong and it has become 1,2,3,3,4,4

    First I need to test if it has gone wrong and, if it has, fix it - just as it is. i.e.

    The SortID of 3 in the fourth row should become 4
    The SortID of 4 in the fifth row should become 5
    The SortID of 4 in the sixth row should become 6

    which will turn it back to 1,2,3,4,5,6

    Thank you for any help.

  • The first problem is, the 4th row by what? Tables have no defined order, so saying the 4th row in the table is a meaningless statement.

    You're probably better off putting an identity column onto the table, or having a date column that defaults to getdate to track insert order (if the row order means the  order inserted), and then using Row_Number() to generate that sequence when you query the table.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Gail is correct.   The problem here is that there's nothing else in the table on which you can base any form of sort on, other than the column that is already the problem.   As indicated, tables don't have any kind of inherent knowledge that says which row was inserted in what order.   That kind of information has to be preserved by doing as Gail has suggested, by adding either an IDENTITY column, or a datetime column with a default value of GETDATE().   Such a column, however, can't do anything about existing data, because any knowledge of the existing proper sort value has already been lost, and is not recoverable by any means other than a human being making a decision that chooses a value for the sort ordering column value.   Because of that, it is probably easier to use an IDENTITY column.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • GilaMonster - Wednesday, March 28, 2018 6:31 AM

    The first problem is, the 4th row by what? Tables have no defined order, so saying the 4th row in the table is a meaningless statement.

    You're probably better off putting an identity column onto the table, or having a date column that defaults to getdate to track insert order (if the row order means the  order inserted), and then using Row_Number() to generate that sequence when you query the table.

    Thank you for your reply. The real table has an identity column but, in terms of sorting, it has no meaning. Nor, really, does anything else. Because users can add rows to the table (or, in fact, a subset of the data in the table) and can change the sort order themselves. Despite my apparently fruitless attempts, they are still managing to get duplicate Sort IDs. They might be looking at 150 rows (which are tasks they have to do) and decide to add one and make it the 42nd task by setting the SortID to 42. When adding the step, I increment every task's SortID that is greater than, for this example, 42. And when they move a step from say, 27 to 51, I decrement or increment as necessary. But, as I say, and I haven't worked out why yet because whenever I test it works okay, users are managing to duplicate SortIDs. At which point all I want to do is ... if there are two SortIDs of, say, 21, I want to move one of them (don't care which one) to 22 and increment any that are later - bearing in mind that if there were two, say, 47s, one would become 48 and so on.

    At the moment I am joining on the identity column to the same data set - with a derived table that has Row_Number() as a column and am updating the SortID to that - which works and gives me what I want. Not sure if it is the best way to do it - and still can't work out how to identify which subsets of data have duplicated - or out of sequence - SortIDs. (If I sort by SortID and it was, for example, 1,2,3,5,6 - I need to spot that so I can run an update.

  • webskater - Wednesday, March 28, 2018 9:33 AM

    GilaMonster - Wednesday, March 28, 2018 6:31 AM

    The first problem is, the 4th row by what? Tables have no defined order, so saying the 4th row in the table is a meaningless statement.

    You're probably better off putting an identity column onto the table, or having a date column that defaults to getdate to track insert order (if the row order means the  order inserted), and then using Row_Number() to generate that sequence when you query the table.

    Thank you for your reply. The real table has an identity column but, in terms of sorting, it has no meaning. Nor, really, does anything else. Because users can add rows to the table (or, in fact, a subset of the data in the table) and can change the sort order themselves. Despite my apparently fruitless attempts, they are still managing to get duplicate Sort IDs. They might be looking at 150 rows (which are tasks they have to do) and decide to add one and make it the 42nd task by setting the SortID to 42. When adding the step, I increment every task's SortID that is greater than, for this example, 42. And when they move a step from say, 27 to 51, I decrement or increment as necessary. But, as I say, and I haven't worked out why yet because whenever I test it works okay, users are managing to duplicate SortIDs. At which point all I want to do is ... if there are two SortIDs of, say, 21, I want to move one of them (don't care which one) to 22 and increment any that are later - bearing in mind that if there were two, say, 47s, one would become 48 and so on.

    At the moment I am joining on the identity column to the same data set - with a derived table that has Row_Number() as a column and am updating the SortID to that - which works and gives me what I want. Not sure if it is the best way to do it - and still can't work out how to identify which subsets of data have duplicated - or out of sequence - SortIDs. (If I sort by SortID and it was, for example, 1,2,3,5,6 - I need to spot that so I can run an update.

    Are there multiple users making changes to the sort order at the same time?  Seems to me that they should be changing sort order via some kind of application that can keep the database at arm's length, and manage the changes under it's control.   The difficulty is what to do to manage concurrency.   That can be tricky.   However, no matter how the application handles things, the database query that updates might want to hold a table lock in order to force just one update at a time, if the number of people doing this is relatively small..  In the long run, the only good way to solve this problem is with some kind of application that can let the user drag and drop an object that represents a data row and having a unique key separate from the sort identifier, that can then be combined with a generated sequence of numbers within the application, with THAT data then used to update the sequence numbers.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • sgmunson - Wednesday, March 28, 2018 9:46 AM

    webskater - Wednesday, March 28, 2018 9:33 AM

    GilaMonster - Wednesday, March 28, 2018 6:31 AM

    The first problem is, the 4th row by what? Tables have no defined order, so saying the 4th row in the table is a meaningless statement.

    You're probably better off putting an identity column onto the table, or having a date column that defaults to getdate to track insert order (if the row order means the  order inserted), and then using Row_Number() to generate that sequence when you query the table.

    Thank you for your reply. The real table has an identity column but, in terms of sorting, it has no meaning. Nor, really, does anything else. Because users can add rows to the table (or, in fact, a subset of the data in the table) and can change the sort order themselves. Despite my apparently fruitless attempts, they are still managing to get duplicate Sort IDs. They might be looking at 150 rows (which are tasks they have to do) and decide to add one and make it the 42nd task by setting the SortID to 42. When adding the step, I increment every task's SortID that is greater than, for this example, 42. And when they move a step from say, 27 to 51, I decrement or increment as necessary. But, as I say, and I haven't worked out why yet because whenever I test it works okay, users are managing to duplicate SortIDs. At which point all I want to do is ... if there are two SortIDs of, say, 21, I want to move one of them (don't care which one) to 22 and increment any that are later - bearing in mind that if there were two, say, 47s, one would become 48 and so on.

    At the moment I am joining on the identity column to the same data set - with a derived table that has Row_Number() as a column and am updating the SortID to that - which works and gives me what I want. Not sure if it is the best way to do it - and still can't work out how to identify which subsets of data have duplicated - or out of sequence - SortIDs. (If I sort by SortID and it was, for example, 1,2,3,5,6 - I need to spot that so I can run an update.

    Are there multiple users making changes to the sort order at the same time?  Seems to me that they should be changing sort order via some kind of application that can keep the database at arm's length, and manage the changes under it's control.   The difficulty is what to do to manage concurrency.   That can be tricky.   However, no matter how the application handles things, the database query that updates might want to hold a table lock in order to force just one update at a time, if the number of people doing this is relatively small..  In the long run, the only good way to solve this problem is with some kind of application that can let the user drag and drop an object that represents a data row and having a unique key separate from the sort identifier, that can then be combined with a generated sequence of numbers within the application, with THAT data then used to update the sequence numbers.

    There are multiple users of the application. They change the Sort Order of the tasks using a .net based web application. There is clearly something wrong in my logic when I handle their changes to SortID - although I can't find what and have tested many times. Whenever I change the Sort Order it works. They can only change the SortID of one row at a time.

    I have the update covered ... joining to the same data as a derived table using Row_Number is doing that job. It is identifying the data that is wrong that I am stuck on. Let's say a subset of the data in the table has 100 rows. I just want to make sure the SortIDs in that set of data go from 1 to 100 consecutively. I.e. that there are no duplicates and no gaps.

  • Assuming you have something in the table that identifies the task list to which each of those items belongs, identifying problem data should be as simple as doing a ROW_NUMBER partitioned by whatever identifies the task list and ordered by SortID, then returning rows where the ROW_NUMBER<>SortID.

    In your original example, something like this:

    WITH numbered AS (SELECT rn=ROW_NUMBER() OVER (ORDER BY SortID), * FROM #tblTasks)

    SELECT *
    FROM numbered
    WHERE rn<>SortID;

    In the real data, assuming you have multiple task lists in the table and something to identify them, like a ListID, then the OVER clause for the ROW_NUMBER would change to OVER (PARTITION BY ListID ORDER BY SortID), or some such thing.

    Cheers!

  • webskater - Wednesday, March 28, 2018 10:07 AM

    sgmunson - Wednesday, March 28, 2018 9:46 AM

    webskater - Wednesday, March 28, 2018 9:33 AM

    GilaMonster - Wednesday, March 28, 2018 6:31 AM

    The first problem is, the 4th row by what? Tables have no defined order, so saying the 4th row in the table is a meaningless statement.

    You're probably better off putting an identity column onto the table, or having a date column that defaults to getdate to track insert order (if the row order means the  order inserted), and then using Row_Number() to generate that sequence when you query the table.

    Thank you for your reply. The real table has an identity column but, in terms of sorting, it has no meaning. Nor, really, does anything else. Because users can add rows to the table (or, in fact, a subset of the data in the table) and can change the sort order themselves. Despite my apparently fruitless attempts, they are still managing to get duplicate Sort IDs. They might be looking at 150 rows (which are tasks they have to do) and decide to add one and make it the 42nd task by setting the SortID to 42. When adding the step, I increment every task's SortID that is greater than, for this example, 42. And when they move a step from say, 27 to 51, I decrement or increment as necessary. But, as I say, and I haven't worked out why yet because whenever I test it works okay, users are managing to duplicate SortIDs. At which point all I want to do is ... if there are two SortIDs of, say, 21, I want to move one of them (don't care which one) to 22 and increment any that are later - bearing in mind that if there were two, say, 47s, one would become 48 and so on.

    At the moment I am joining on the identity column to the same data set - with a derived table that has Row_Number() as a column and am updating the SortID to that - which works and gives me what I want. Not sure if it is the best way to do it - and still can't work out how to identify which subsets of data have duplicated - or out of sequence - SortIDs. (If I sort by SortID and it was, for example, 1,2,3,5,6 - I need to spot that so I can run an update.

    Are there multiple users making changes to the sort order at the same time?  Seems to me that they should be changing sort order via some kind of application that can keep the database at arm's length, and manage the changes under it's control.   The difficulty is what to do to manage concurrency.   That can be tricky.   However, no matter how the application handles things, the database query that updates might want to hold a table lock in order to force just one update at a time, if the number of people doing this is relatively small..  In the long run, the only good way to solve this problem is with some kind of application that can let the user drag and drop an object that represents a data row and having a unique key separate from the sort identifier, that can then be combined with a generated sequence of numbers within the application, with THAT data then used to update the sequence numbers.

    There are multiple users of the application. They change the Sort Order of the tasks using a .net based web application. There is clearly something wrong in my logic when I handle their changes to SortID - although I can't find what and have tested many times. Whenever I change the Sort Order it works. They can only change the SortID of one row at a time.

    I have the update covered ... joining to the same data as a derived table using Row_Number is doing that job. It is identifying the data that is wrong that I am stuck on. Let's say a subset of the data in the table has 100 rows. I just want to make sure the SortIDs in that set of data go from 1 to 100 consecutively. I.e. that there are no duplicates and no gaps.

    Have them show you exactly how they change the sort order, and if they know how to consistently reproduce the problem.   Then verify by query that the problem exists in the data.  Then you can start the debugging process inside your application.  I'm guessing that preserving the existing sort order in some form may not be happening within the application in some fashion.  These can be tricky to find, but there's an old rule I follow:  NEVER DESTROY YOUR INPUTS...  It's something to look for that might help solve the problem.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Jacob Wilkins - Wednesday, March 28, 2018 10:19 AM

    Assuming you have something in the table that identifies the task list to which each of those items belongs, identifying problem data should be as simple as doing a ROW_NUMBER partitioned by whatever identifies the task list and ordered by SortID, then returning rows where the ROW_NUMBER<>SortID.

    In your original example, something like this:

    WITH numbered AS (SELECT rn=ROW_NUMBER() OVER (ORDER BY SortID), * FROM #tblTasks)

    SELECT *
    FROM numbered
    WHERE rn<>SortID;

    In the real data, assuming you have multiple task lists in the table and something to identify them, like a ListID, then the OVER clause for the ROW_NUMBER would change to OVER (PARTITION BY ListID ORDER BY SortID), or some such thing.

    Cheers!

    That's just the job - thank you. I had variations of the same thing, but when I was comparing to see if rn <> SortID - it kept telling me rn did not exist.

  • Hi,

    ROW_NUMER()  analytical function can be used to generate a new number for each row.

    Select t.*, ROW_NUMBER () over (order by sortid)   RowNumber
    from #tblTasks t

    Hope that helps.!!

    =======================================================================

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply