Grouping Out Of Order

  • Greetings,

    I have a bit of a strange problem. I am trying to find out how to group data together but keep the data in a different order. For example, lets say I have a table in the following format.

    DECLARE @BookLoans table

    (

    BookID int,

    StudentID int,

    LoanDate datetime,

    ReturnDate datetime

    )

    Then load it with the following data:

    INSERT INTO @BookLoans

    SELECT 1, 56, '2010-01-01', '2010-01-07'

    UNION

    SELECT 1, 56, '2010-01-08', '2010-01-10'

    UNION

    SELECT 1, 78, '2010-01-11', '2010-01-18'

    UNION

    SELECT 1, 56, '2010-01-19', '2010-01-23'

    UNION

    SELECT 5, 23, '2010-03-23', '2010-04-01'

    UNION

    SELECT 5, 23, '2010-04-02', '2010-04-07'

    UNION

    SELECT 5, 23, '2010-04-08', '2010-04-26'

    UNION

    SELECT 8, 56, '2010-01-01', '2010-01-07'

    UNION

    SELECT 8, 56, '2010-01-08', '2010-01-10'

    UNION

    SELECT 8, 78, '2010-01-11', '2010-01-18'

    UNION

    SELECT 8, 56, '2010-01-19', '2010-01-23'

    UNION

    SELECT 8, 56, '2010-01-24', '2010-02-14'

    UNION

    SELECT 8, 56, '2010-02-14', '2010-03-22'

    UNION

    SELECT 8, 23, '2010-03-23', '2010-04-01'

    UNION

    SELECT 8, 23, '2010-04-02', '2010-04-07'

    UNION

    SELECT 8, 23, '2010-04-08', '2010-04-26'

    The desire is to list each book in order. Then list each student that had loaned the book and the first LoanDate and the last ReturnDate with in the date sequence that they had sequencially loaned the book. Then do the same for the next student. The final result should look like the following table.

    |---------------------------------------------|

    |BookID | StudentID | LoanStart | LoanEnd |

    |---------------------------------------------|

    | 1 | 56 | 2010-01-01 | 2010-01-10 |

    |---------------------------------------------|

    | 1 | 78 | 2010-01-11 | 2010-01-18 |

    |---------------------------------------------|

    | 1 | 56 | 2010-01-19 | 2010-01-23 |

    |---------------------------------------------|

    | 5 | 23 | 2010-03-23 | 2010-04-26 |

    |---------------------------------------------|

    | 8 | 56 | 2010-01-01 | 2010-01-10 |

    |---------------------------------------------|

    | 8 | 78 | 2010-01-11 | 2010-01-18 |

    |---------------------------------------------|

    | 8 | 56 | 2010-01-19 | 2010-03-22 |

    |---------------------------------------------|

    | 8 | 23 | 2010-03-23 | 2010-04-26 |

    |---------------------------------------------|

    I have tried grouping by StudentID, but that gives the wrong answer as it also needs to group by the LoanDate first. But, this fields is changing with each record so each row becomes its own group. I am currently using a CLR to recursively run through the table in proper order and find the start and end of each section, but the performance is poor.

    Is there a way to retrieve the desired results with a set based SQL script or am I doomed to have to use the "RBAR" method (thanks Jeff Moden).

    Thank you for your time.

    Terry Steadman

  • Here is what I did, and thank you for making the testing easy!

    Select BookID, StudentID, LoanDate, ReturnDate

    From @BookLoans

    Group By BookID, StudentID, LoanDate, ReturnDate

    Order by BookID ASC, LoanDate ASC, ReturnDate ASC

    And the results:

    1 56 2010-01-01 00:00:00.000 2010-01-07 00:00:00.000

    1 56 2010-01-08 00:00:00.000 2010-01-10 00:00:00.000

    1 78 2010-01-11 00:00:00.000 2010-01-18 00:00:00.000

    1 56 2010-01-19 00:00:00.000 2010-01-23 00:00:00.000

    5 23 2010-03-23 00:00:00.000 2010-04-01 00:00:00.000

    5 23 2010-04-02 00:00:00.000 2010-04-07 00:00:00.000

    5 23 2010-04-08 00:00:00.000 2010-04-26 00:00:00.000

    8 56 2010-01-01 00:00:00.000 2010-01-07 00:00:00.000

    8 56 2010-01-08 00:00:00.000 2010-01-10 00:00:00.000

    8 78 2010-01-11 00:00:00.000 2010-01-18 00:00:00.000

    8 56 2010-01-19 00:00:00.000 2010-01-23 00:00:00.000

    8 56 2010-01-24 00:00:00.000 2010-02-14 00:00:00.000

    8 56 2010-02-14 00:00:00.000 2010-03-22 00:00:00.000

    8 23 2010-03-23 00:00:00.000 2010-04-01 00:00:00.000

    8 23 2010-04-02 00:00:00.000 2010-04-07 00:00:00.000

    8 23 2010-04-08 00:00:00.000 2010-04-26 00:00:00.000

  • Greetings g33kspeak,

    Nice try, but this does not summarize the start and end dates for the book loan. It is only listing all records as they have been entered. Since you have grouped on the date fields, each row then became its own group and so you really don't have a group at all, just an order by. If you look at the result table I have included at the end of my question, you will see what the result should look like.

    Thank you for your time.

    Terry Steadman

  • I see... I missed the "totaling" of the time in the initial example... πŸ˜›

    Back to the drawing board!

  • DECLARE @BookLoans table

    (

    BookID int,

    StudentID int,

    LoanDate datetime,

    ReturnDate datetime

    )

    INSERT INTO @BookLoans

    SELECT 1, 56, '2010-01-01', '2010-01-07'

    UNION

    SELECT 1, 56, '2010-01-08', '2010-01-10'

    UNION

    SELECT 1, 78, '2010-01-11', '2010-01-18'

    UNION

    SELECT 1, 56, '2010-01-19', '2010-01-23'

    UNION

    SELECT 5, 23, '2010-03-23', '2010-04-01'

    UNION

    SELECT 5, 23, '2010-04-02', '2010-04-07'

    UNION

    SELECT 5, 23, '2010-04-08', '2010-04-26'

    UNION

    SELECT 8, 56, '2010-01-01', '2010-01-07'

    UNION

    SELECT 8, 56, '2010-01-08', '2010-01-10'

    UNION

    SELECT 8, 78, '2010-01-11', '2010-01-18'

    UNION

    SELECT 8, 56, '2010-01-19', '2010-01-23'

    UNION

    SELECT 8, 56, '2010-01-24', '2010-02-14'

    UNION

    SELECT 8, 56, '2010-02-14', '2010-03-22'

    UNION

    SELECT 8, 23, '2010-03-23', '2010-04-01'

    UNION

    SELECT 8, 23, '2010-04-02', '2010-04-07'

    UNION

    SELECT 8, 23, '2010-04-08', '2010-04-26'

    ;WITH FirstOut AS (

    SELECT

    BookID

    ,StudentID

    ,MIN(LoanDate) AS LoanDate

    FROM

    @BookLoans

    GROUP BY

    BookID

    ,StudentID

    )

    , LastIn AS (

    SELECT

    BookID

    ,StudentID

    ,MAX(ReturnDate) AS ReturnDate

    FROM

    @BookLoans

    GROUP BY

    BookID

    ,StudentID

    )

    SELECT

    f.BookID

    ,f.StudentID

    ,f.LoanDate

    ,l.ReturnDate

    FROM

    FirstOut f

    JOIN

    LastIn l

    ON

    f.BookID = l.BookID

    AND

    f.StudentID = l.StudentID

    John

  • Greetings John,

    Thank you for your reply, but it does not quite answer the problem. Your solution is grouping by the StudentID first, and then getting the Loan and Return dates. This loses the places where the book switched possession from student (56) to student (78) and then back to student (56) and their respective date ranges.

    Terry Steadman

  • Greetings all,

    As a heads up, I do have some code that will return a correct result. The problem that I am having is that it is too slow. If applied against a table with several thousand entries, it will start taking a very long time to process. This leads me to believe that it is using a "RBAR" method of findind the answer. Definitely not good when this script would be called quite often for larger reports.

    The current, but slow, script I have is:

    ;WITH

    BookLoans AS

    (

    SELECT

    BookID,

    StudentID,

    LoanDate,

    ReturnDate,

    ROW_NUMBER() OVER (ORDER BY BookID, LoanDate) AS BookRow_ID,

    DENSE_RANK() OVER (PARTITION BY BookID ORDER BY LoanDate) AS BookOrder_ID

    FROM @BookLoans

    ),

    BookList AS

    (

    SELECT

    BookID,

    StudentID,

    LoanDate,

    ReturnDate,

    BookRow_ID

    FROM BookLoans

    WHERE

    BookOrder_ID = 1

    ),

    BookAnchor AS

    (

    SELECT

    BookID,

    StudentID,

    LoanDate,

    ReturnDate,

    BookRow_ID,

    1 AS BookStudentRow,

    StudentID AS LastStudent

    FROM BookList

    UNION All

    SELECT

    bt.BookID,

    bt.StudentID,

    bt.LoanDate,

    bt.ReturnDate,

    bt.BookRow_ID,

    (

    CASE

    WHEN bt.StudentID <> ba.LastStudent THEN

    ba.BookStudentRow + 1

    ELSE

    ba.BookStudentRow

    END

    ) AS BookStudentRow,

    bt.StudentID AS LastStudent

    FROM BookLoans AS bt

    JOIN BookAnchor AS ba ON

    bt.BookID = ba.BookID

    AND (bt.BookRow_ID - 1) = ba.BookRow_ID

    WHERE

    bt.BookRow_ID > 1

    )

    SELECT

    BookID,

    StudentID,

    MIN(LoanDate) AS LoanStart,

    MAX(ReturnDate) AS LoanEnd

    FROM BookAnchor

    GROUP BY

    BookID,

    BookStudentRow,

    StudentID

    This script is actually a simplified version of the actual tables and data, but the logical problem and solution are still the same.

    Thank you for your help.

    Terry Steadman

  • Out of curiosity..what would you want to show up if this happened:

    Student loans a book 01-01 to 01-07, then rents it again 01-09 to 01-10

    Do you want it to just show up as the 1st to the 10th? Or as two separate, since it wasn't rented on the 8th?

    Also...I don't see any way this can be done in a set.

  • Greetings Derrick,

    In this case, it would show up as 1 line (2010-01-01 to 2010-01-10). I have extra code that would check for vacant or non-loaned lines as needed but for this problem, it would have only caused extra complexity and confusion.

    I tend to agree that it does appear that there isn't a set based solution to this problem. That is sad as I had worked with other SQL databases that could group correctly based on an alternate sort method. I was just hoping that the SQL gurus here would notice my dilema and be able to provide a solution that also works with Microsoft SQL. I still hold hope that there is a way. πŸ˜€

    Thank you for your help.

    Terry Steadman

  • I think the order difference technique should work well here.

    Try the following and let me know how it performs:

    ;WITH BLOrderDif

    AS

    (

    SELECT *

    ,ROW_NUMBER() OVER (PARTITION BY BookID ORDER BY LoanDate)

    - ROW_NUMBER() OVER (PARTITION BY BookID ORDER BY StudentID, LoanDate) AS OrderDiff

    FROM @BookLoans

    )

    , BLRange

    AS

    (

    SELECT BookID, StudentID, OrderDiff, MIN(LoanDate) AS LoanStart, MAX(ReturnDate) AS LoadEnd

    FROM BLOrderDif

    GROUP BY BookID, StudentID, OrderDiff

    )

    SELECT BookID, StudentID, LoanStart, LoadEnd

    FROM BLRange

    ORDER BY BookID, LoanStart

  • I have just had time to look at this again.

    To allow for when a book is not on loan try using a numbers/tally table.

    (I am just generating the numbers table here for convenience.)

    ;WITH Numbers5(N) AS ( SELECT 1 UNION ALL SELECT 0 )

    ,Numbers4(N) AS (SELECT 1 FROM Numbers5 N1 CROSS JOIN Numbers5 N2)

    ,Numbers3(N) AS (SELECT 1 FROM Numbers4 N1 CROSS JOIN Numbers4 N2)

    ,Numbers2(N) AS (SELECT 1 FROM Numbers3 N1 CROSS JOIN Numbers3 N2)

    ,Numbers1(N) AS (SELECT 1 FROM Numbers2 N1 CROSS JOIN Numbers2 N2)

    ,Numbers0(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY N) FROM Numbers1)

    ,Numbers AS (SELECT N FROM Numbers0 WHERE N <= 1000)

    ,BRange

    AS

    (

    SELECT BookID, MIN(LoanDate) AS LoanDate, MAX(ReturnDate) AS ReturnDate

    FROM @BookLoans

    GROUP BY BookID

    )

    , BL

    AS

    (

    SELECT BookID, StudentID, LoanDate, ReturnDate

    FROM @BookLoans

    UNION ALL

    SELECT R.BookID, -1, R.LoanDate + N.N -1, R.LoanDate + N.N -1

    FROM BRange R

    JOIN Numbers N

    ON N.N <= DATEDIFF(day, R.LoanDate, R.ReturnDate) + 1

    WHERE NOT EXISTS

    (

    SELECT *

    FROM @BookLoans L1

    WHERE L1.BookID = R.BookID

    AND R.LoanDate + N.N -1 BETWEEN L1.LoanDate AND L1.ReturnDate

    )

    )

    ,BLOrderDif

    AS

    (

    SELECT BookID, StudentID, LoanDate, ReturnDate

    ,ROW_NUMBER() OVER (PARTITION BY BookID ORDER BY LoanDate)

    - ROW_NUMBER() OVER (PARTITION BY BookID ORDER BY StudentID, LoanDate) AS OrderDiff

    FROM BL

    )

    , BLRange

    AS

    (

    SELECT BookID, StudentID, OrderDiff, MIN(LoanDate) AS LoanStart, MAX(ReturnDate) AS LoanEnd

    FROM BLOrderDif

    GROUP BY BookID, StudentID, OrderDiff

    )

    SELECT BookID, StudentID, LoanStart, LoanEnd

    FROM BLRange

    WHERE StudentID > 0

    ORDER BY BookID, LoanStart

  • Greetings Ken,

    We Have A Winner!!! πŸ˜€

    I can't say I fully understand how or why it works yet, but it does seem to do what I have been wanting to do before - reduce the date ranges into a single value that changes when the StudentID changes.

    I have converted your example to my live test (not really books :-P) and the speed difference was enormous. Before, my fastest time was 29 seconds. Now, with your change in place, it sped up to less than a second (maybe .5 seconds).

    Thank you for your help.

    Terry Steadman

  • Nice one Ken. Very elegant solution to this..I didn't think of grouping on the rowdiff.

  • I am glad the order difference technique worked but unfortunately I cannot claim credit for the idea. The technique was used to win a competition but I cannot remember the details or who won it. (It may have been a databasejournal.com competition with either Joe Celko or Itzik Ben-Gan as one of the judges.)

    It is an extremely elegant solution to work out when state changes – in this case who has borrowed a book. Like all good ideas, the underlying concept is simple – work out two orders so that the difference is the same while the state is the same.

  • That's nicely condensed, Ken. I'd been working on a version

    ;WITH CTE1 AS (

    SELECT BookID, StudentID, LoanDate, ReturnDate,

    RowNum1 = ROW_NUMBER() OVER (ORDER BY BookID, LoanDate),

    RowNum2 = ROW_NUMBER() OVER (PARTITION BY StudentID ORDER BY BookID, LoanDate)

    FROM @BookLoans

    ), CTE2 AS (

    SELECT BookID, StudentID, LoanDate, ReturnDate,

    LoanStretch = DENSE_RANK() OVER (ORDER BY BookID, StudentID, RowNum1-RowNum2)

    FROM CTE1

    )

    SELECT BookID, StudentID, LoanStart = MIN(LoanDate), LoanEnd = MAX(ReturnDate)

    FROM CTE2

    GROUP BY BookID, StudentID, LoanStretch

    ORDER BY BookID, MIN(LoanDate)

    which does the same thing but includes unnecessary workings - sometimes it's handy to keep it in to help folks understand what's going on.

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply