2 records at a time?

  • Hi everyone,

    I'm not even sure how to word this succinctly enough to search for this topic, so I apologize in advance if this is covered elsewhere! It almost feels like a gaps and islands issue, but not quite....

    Background Info:

    I am developing a form for a mortgage company. There can be any number of borrowers on a given loan, and the business has asked that this form return only 2 borrowers at a time for a loan. For example, if there are 3 borrowers for a loan, they want the first copy of the form to print the first 2 borrowers and then another copy of the form to print the 3rd. No matter how many copies are printed, they want the borrower information to be labeled as 'Borrower1' xyz and 'Borrower2' xyz. Also, there will be a LOT more fields returned on the real form, so the sample information below is very simplified test data.

    Sample Data:

    CREATE TABLE #t (LoanID VARCHAR(5), BorrowerName VARCHAR(20), BorrowerOrder INT);

    GO

    INSERT INTO #t VALUES

    ('::E', 'John Smith', 0)

    , ('::E', 'Jane Smith', 1)

    , ('::E', 'Rob Jackson', 2)

    , ('AF_CF', 'Sloan Burton', 1)

    , ('AF_CF', 'Virginia Burton', 2)

    , ('AF_CF', 'John Hardeman', 0)

    , ('::E', 'Mary Lamb', 4)

    , ('::E', 'Patrick Stewart', 3)

    So for loan 'AF_CF', the form should print twice:

    1st copy: Borrower1 = Sloan Burton, Borrower2 = Virginia Burton

    2nd copy: Borrower1 = John Hardeman, Borrower2 = NULL

    We are on 2012, so I tried to do this with LEAD():

    SELECT

    Borrower1 = BorrowerName

    , Borrower2 = LEAD(BorrowerName) OVER(ORDER BY BorrowerOrder)

    FROM

    #t AS t

    WHERE

    FileID = 'AF_CF'

    But this returns a record for every borrower, so the form is printing 3 times:

    1st: Borrower1 = Sloan Burton, Borrower2 = Virginia Burton

    2nd: Borrower1 = Virginia Burton, Borrower2 = John Hardeman

    3rd: Borrower1 = John Hardeman, Borrower2 = NULL

    I don't want that 2nd record to return. This result is what makes me think of gaps and islands, but I don't know if the 2nd record is really an island since it's (1) not stored this way...it's returning this way because of the query and (2) it's not sequential data....

    Finally, I tried restricting this by putting this into a CTE and then returning only the odd numbered records like I have below. This runs pretty quickly when dealing with one loan. But...I am concerned that the CTE will be slow when we run batches of loans.

    Attempt with CTE:

    --With CTE

    ;WITH cte AS

    (SELECT

    Borrower1 = BorrowerName

    , Borrower2 = LEAD(BorrowerName) OVER(ORDER BY BorrowerOrder)

    , RowNumber = ROW_NUMBER() OVER(ORDER BY BorrowerOrder)

    FROM

    #t AS t

    WHERE

    FileID = '::E'

    )

    SELECT

    *

    FROM

    cte

    WHERE

    RowNumber % 2 = 1

    Is there a better, cleaner way to do this? Or is the CTE the best way to go?

  • Since you already have the BorrowerOrder column I'm not sure why it would need to be so complicated...

    Wouldn't a simple BorrowerOrder < 2 be enough?

    SELECT

    *

    FROM

    #t AS t

    WHERE BorrowerOrder < 2



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • LutzM (11/17/2013)


    Since you already have the BorrowerOrder column I'm not sure why it would need to be so complicated...

    Wouldn't a simple BorrowerOrder < 2 be enough?

    SELECT

    *

    FROM

    #t AS t

    WHERE BorrowerOrder < 2

    The trouble with that is that it's only going to return 1 row, so what happens when the file has more than 2 borrowers? Also, the sequence of the values in the BORROWERORDER field may contain gaps, so that may only get one borrower when it should get two. Or even worse, return no borrowers at all.

    I think the OP's solution with a CTE is probably OK; roughly half of the CTE's rows are excluded from the output, but if the BorrowerOrder values are not guaranteed to be set of integers without gaps (and I imagine they can't be, since if they were there would be no need to use LEAD instead of just adding 1) there is no way to avoid that, and the performance penalty for having the extra data in the CTE is not enormous as all the rows with the right FILEID have to be read whether they are first of a pair or second of a pair. If the table is clustered on a key of which the FILEID is the first column, or there is a suitable covering index for this query where FILEID is the first column, the solution using a CTE will be pretty fast.

    Of course a slightly different query consisting of a CTE which adds a row number and doesn't combine records and a main select which joins with a tally table might be faster, but it might actually be slower. Probably it would be easy enough to measure and compare the two approaches if there's adequate test data.

    Tom

  • jessica.green2312 (11/15/2013)


    Hi everyone,

    I'm not even sure how to word this succinctly enough to search for this topic, so I apologize in advance if this is covered elsewhere! It almost feels like a gaps and islands issue, but not quite....

    Background Info:

    I am developing a form for a mortgage company. There can be any number of borrowers on a given loan, and the business has asked that this form return only 2 borrowers at a time for a loan. For example, if there are 3 borrowers for a loan, they want the first copy of the form to print the first 2 borrowers and then another copy of the form to print the 3rd. No matter how many copies are printed, they want the borrower information to be labeled as 'Borrower1' xyz and 'Borrower2' xyz. Also, there will be a LOT more fields returned on the real form, so the sample information below is very simplified test data.

    Sample Data:

    CREATE TABLE #t (LoanID VARCHAR(5), BorrowerName VARCHAR(20), BorrowerOrder INT);

    GO

    INSERT INTO #t VALUES

    ('::E', 'John Smith', 0)

    , ('::E', 'Jane Smith', 1)

    , ('::E', 'Rob Jackson', 2)

    , ('AF_CF', 'Sloan Burton', 1)

    , ('AF_CF', 'Virginia Burton', 2)

    , ('AF_CF', 'John Hardeman', 0)

    , ('::E', 'Mary Lamb', 4)

    , ('::E', 'Patrick Stewart', 3)

    So for loan 'AF_CF', the form should print twice:

    1st copy: Borrower1 = Sloan Burton, Borrower2 = Virginia Burton

    2nd copy: Borrower1 = John Hardeman, Borrower2 = NULL

    We are on 2012, so I tried to do this with LEAD():

    SELECT

    Borrower1 = BorrowerName

    , Borrower2 = LEAD(BorrowerName) OVER(ORDER BY BorrowerOrder)

    FROM

    #t AS t

    WHERE

    FileID = 'AF_CF'

    But this returns a record for every borrower, so the form is printing 3 times:

    1st: Borrower1 = Sloan Burton, Borrower2 = Virginia Burton

    2nd: Borrower1 = Virginia Burton, Borrower2 = John Hardeman

    3rd: Borrower1 = John Hardeman, Borrower2 = NULL

    I don't want that 2nd record to return. This result is what makes me think of gaps and islands, but I don't know if the 2nd record is really an island since it's (1) not stored this way...it's returning this way because of the query and (2) it's not sequential data....

    Finally, I tried restricting this by putting this into a CTE and then returning only the odd numbered records like I have below. This runs pretty quickly when dealing with one loan. But...I am concerned that the CTE will be slow when we run batches of loans.

    Attempt with CTE:

    --With CTE

    ;WITH cte AS

    (SELECT

    Borrower1 = BorrowerName

    , Borrower2 = LEAD(BorrowerName) OVER(ORDER BY BorrowerOrder)

    , RowNumber = ROW_NUMBER() OVER(ORDER BY BorrowerOrder)

    FROM

    #t AS t

    WHERE

    FileID = '::E'

    )

    SELECT

    *

    FROM

    cte

    WHERE

    RowNumber % 2 = 1

    Is there a better, cleaner way to do this? Or is the CTE the best way to go?

    This is actually pretty easy to do but I need some clarification on the Borrower Order column. 0 typically comes before 1 or 2 in the world of numbers. Why, in you example, do the BorrowOrder of 1 and 2 come before 0?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thank you all for your replies!

    Tom, you have it correct. There are, at times, gaps in the BorrowerOrder numbers, and there are times when there might be up to 7 or 8 borrowers.

    Jeff, I apologize! This was my error in creating the test data. For the example resultset I gave, it should have been Sloan - 0, Virginia - 1, John - 2. What do you recommend?

  • jessica.green2312 (11/18/2013)


    Jeff, I apologize! This was my error in creating the test data. For the example resultset I gave, it should have been Sloan - 0, Virginia - 1, John - 2. What do you recommend?

    Heh... my first recommendation is to stop cutting yourself short by posting test-data-creation code that works only in 2008 and above. There are a whole lot of some really helpful and smart folks that could help you that are still stuck in 2005 depending on what environment they might currently be sitting in. Other than that, you did an outstanding job of posting readily consumable data we can understand. Well done.

    With that thought in mind, I've converted your test data creation script to work in 2005 and up. I've also made the correction you identified and I made the BorrowerOrder non-sequential as you suggested it may be in real life.

    CREATE TABLE #t (LoanID VARCHAR(5), BorrowerName VARCHAR(20), BorrowerOrder INT);

    GO

    INSERT INTO #t

    SELECT '::E' , 'John Smith' , 0 UNION ALL

    SELECT '::E' , 'Jane Smith' , 1 UNION ALL

    SELECT '::E' , 'Rob Jackson' , 2 UNION ALL

    SELECT 'AF_CF' , 'Sloan Burton' , 0 UNION ALL

    SELECT 'AF_CF' , 'Virginia Burton' , 2 UNION ALL

    SELECT 'AF_CF' , 'John Hardeman' , 4 UNION ALL

    SELECT '::E' , 'Mary Lamb' , 4 UNION ALL

    SELECT '::E' , 'Patrick Stewart' , 3

    ;

    I may have read a little too much into your request but my understanding is that you actually need to control the letters and the copies of the letters. No better way to do that than to create a row-per-copy relationship for a result set so that all you have to do is "field replacement" from a table. In other words, denormalize the hell out of the data in the form of a result set. 😀 Here's the code to do that. This uses an ancient method known as a "CrossTab" and it can be much faster than a Pivot. Please see the following article for more on that subject (which is followed by the code for this problem)...

    [font="Arial Black"]http://www.sqlservercentral.com/articles/T-SQL/63681/[/font][/url]

    A little math along with the CrossTab code makes short work of it all.

    WITH ctePreSort AS

    ( --=== Create unbroken sequences starting at zero for each LoanID in BorrowerOrder order

    SELECT LoanID

    ,BorrowerName

    ,SortOrder = ROW_NUMBER() OVER (PARTITION BY LoanID ORDER BY BorrowerOrder)-1

    FROM #t

    ) --=== Do a string aggregation (CrossTab) to pivot the data correctly for use as a print control table.

    SELECT LoanID

    ,CopyNumber = SortOrder/2+1

    ,Borrower1 = MAX(CASE WHEN SortOrder%2 = 0 THEN BorrowerName ELSE '' END)

    ,Borrower2 = MAX(CASE WHEN SortOrder%2 = 1 THEN BorrowerName ELSE '' END)

    FROM ctePreSort

    GROUP BY LoanID,SortOrder/2

    ORDER BY LoanID,SortOrder/2

    ;

    Here are the results...

    LoanID CopyNumber Borrower1 Borrower2

    ------ ---------- ------------- ---------------

    ::E 1 John Smith Jane Smith

    ::E 2 Rob Jackson Patrick Stewart

    ::E 3 Mary Lamb

    AF_CF 1 Sloan Burton Virginia Burton

    AF_CF 2 John Hardeman

    (5 row(s) affected)

    Finally, I tried restricting this by putting this into a CTE and then returning only the odd numbered records like I have below. This runs pretty quickly when dealing with one loan. But...I am concerned that the CTE will be slow when we run batches of loans.

    I think you were right to be concerned. I haven't used it myself, yet, but I've seen some performance tests by people I trust and LEAD/LAG doesn't measure up to what I had expected (doesn't come close, actually). I can tell you that, especially with an index on the GROUP BY columns, the CrossTab type of string aggregation I did above absolutely flies (although I admit that I have NOT tested this particular bit of code but have severely tested nearly identical code).

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I can second what Jeff said about performance of LAG/LEAD as I've done some testing myself. Frankly I was not impressed in the admittedly limited testing that I have done, even though they do tend to make the code look simpler (in some cases).

    In this case, I think what Jeff offered looks pretty simple. At least to me. But perhaps I am not objective as I've used that same technique before.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Ahhh! Thank you so much, Jeff!! I've used crosstab before, but didn't realize that's what that was called. Scary, right? I feel like a kid who drops a perfectly good toy in favor of a shiny new one. Hahaha. This is great and works really well - I appreciate you helping me out. You explained everything so well and thank you for sharing the link to your crosstab/pivot article! I'm sharing this post and the article link with my group at work. And I also appreciate you pointing out the use of the backwards compatible test code to broaden my audience. I'll be sure to do this going forward.

    Have a great day, everyone!

  • Thank you, Dwain - good to know!

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply