Tying MAX() value to <= value from another table

  • This one is giving me fits. I'm querying mainframe data that has been imported to SQL. An "Invoice" table and an "Address" table. What I need to do is tie the Invoice record to the Address record nearest to the invoice payment date, but under the invoice payment date. MAX() in Address won't work since an address record can be updated after a payment. MIN() won't work because there can be multiple address updates and the oldest one is too old.

    Both tables can contain multiple records (many to many relationship). I'm not including all columns in my samples, just relevant scrubbed ones. I tried a correlated subquery (included below) but it dropped a bunch of records instead of finding my values.

    Thoughts? (FYI: I can't change data or table schema.)


    IF (SELECT OBJECT_ID('tempdb..#Invoice')) IS NOT NULL
     DROP TABLE #Invoice;
    IF (SELECT OBJECT_ID('tempdb..#Address')) IS NOT NULL
     DROP TABLE #Address;

    CREATE TABLE #Invoice (InvoiceNo VARCHAR(20), AddressCode CHAR(2), InvoiceType INT, PayDate DATETIME);
    CREATE TABLE #Address (AddressCode CHAR(2), InvoiceNo VARCHAR(20), LastAccountDate DATETIME, Address1 VARCHAR(50), City VARCHAR(50), [State] CHAR(2));

    INSERT INTO #Invoice (InvoiceNo, AddressCode, InvoiceType, PayDate)
    VALUES ('INV17896279','08',123,'2017-03-28 00:00:00.000'),
    ('INV17896279','08',123,'2017-03-28 00:00:00.000'),
    ('INV17618372','15',456,'2017-09-06 00:00:00.000'),
    ('INV17618372','15',456,'2017-09-06 00:00:00.000'),
    ('INV10015672','03',789,'2017-12-05 00:00:00.000'),
    ('INV20088561','01',789,'2017-10-31 00:00:00.000'),
    ('INV15586390','02',789,'2017-06-15 00:00:00.000');

    INSERT INTO #Address (AddressCode, InvoiceNo, LastAccountDate, Address1, City, [State])
    VALUES ('08','INV17896279','2017-03-27 00:00:00.000','555 Mouse Way', 'Orlando', 'FL'),
    ('15','INV17618372','2017-09-05 00:00:00.000','555 Mouse Way', 'Orlando', 'FL'),
    ('03','INV10015672','2017-12-05 00:00:00.000','555 Mouse Way', 'Orlando', 'FL'),
    ('03','INV10015672','2017-07-07 00:00:00.000','111 Duck Ave', 'Orlando', 'FL'),
    ('01','INV20088561','2017-10-25 00:00:00.000','727 McDuck Manor', 'Orlando', 'FL'),
    ('01','INV20088561','2017-09-18 00:00:00.000','824 Goofy Dr', 'Orlando', 'FL'),
    ('01','INV20088561','2017-05-10 00:00:00.000','359 Minnie Ct', 'Orlando', 'FL'),
    ('02','INV15586390','2017-07-22 00:00:00.000','111 Duck Ave', 'Orlando', 'FL'),
    ('02','INV15586390','2017-06-05 00:00:00.000','824 Goofy Dr', 'Orlando', 'FL');

    SELECT i.InvoiceNo, i.AddressCode, i.PayDate, a.LastAccountDate, a.Address1, a.City, a.[State]
    FROM #Invoice i
    INNER JOIN #Address a
    ON i.InvoiceNo = a.InvoiceNo
    AND i.AddressCode = a.AddressCode
    AND i.PayDate = a.LastAccountDate;
    --This is the code we had before discovering non-matching dates


    SELECT i.InvoiceNo, i.AddressCode, i.PayDate, a.LastAccountDate, a.Address1, a.City, a.[State]
    FROM #Invoice i
    INNER JOIN #Address a
    ON i.InvoiceNo = a.InvoiceNo
    AND i.AddressCode = a.AddressCode
    AND i.PayDate = (SELECT MAX(LastAccountDate) AS LastAccountDate
         FROM #Address
         WHERE LastAccountDate <= i.PayDate
         AND InvoiceNo = i.InvoiceNo
         AND AddressCode = i.AddressCode
         GROUP by InvoiceNo, AddressCode);
    --Gives me the wrong results

    Expected results:

    InvoiceNo AddressCode PayDate LastAccountDate Address1 City State
    INV10015672 03 2017-12-05 00:00:00.000 2017-12-05 00:00:00.000 555 Mouse Way Orlando FL
    INV17618372 15  2017-09-06 00:00:00.000 2017-09-05 00:00:00.000 555 Mouse Way Orlando FL
    INV17896279 08  2017-09-06 00:00:00.000 2017-09-05 00:00:00.000 555 Mouse Way Orlando FL
    INV20088561 01  2017-10-31 00:00:00.000 2017-10-25 00:00:00.000 727 McDuck Manor Orlando FL
    INV15586390 02  2017-06-15 00:00:00.000 2017-06-05 00:00:00.000 824 Goofy Dr Orlando FL

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • There are two approaches.  Which one performs better depends on how dense the address table is with respect to the invoice table, but since the invoice table also has duplicates, I suspect that the CTE will perform better.


    SELECT DISTINCT i.InvoiceNo, i.AddressCode, i.PayDate, a.LastAccountDate, a.Address1, a.City, a.[State]
    FROM #Invoice i
    CROSS APPLY
    (
        SELECT TOP 1 *
        FROM #Address a
        WHERE a.InvoiceNo = i.InvoiceNo
            AND a.AddressCode = i.AddressCode
            AND i.PayDate >= a.LastAccountDate
        ORDER BY a.LastAccountDate DESC
    ) a

    ;
    WITH Ordered_Invoices AS
    (
        SELECT i.InvoiceNo, i.AddressCode, i.PayDate, a.LastAccountDate, a.Address1, a.City, a.[State],
            ROW_NUMBER() OVER(PARTITION BY i.InvoiceNo ORDER BY a.LastAccountDate DESC) AS rn
        FROM #Invoice i
        INNER JOIN #Address a
        ON i.InvoiceNo = a.InvoiceNo
        AND i.AddressCode = a.AddressCode
        AND i.PayDate >= a.LastAccountDate
    )
    SELECT InvoiceNo, AddressCode, PayDate, LastAccountDate, Address1, City, [State]
    FROM Ordered_Invoices
    WHERE rn = 1

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Thanks. I'll check those out... I didn't even think about CROSS APPLY. I keep forgetting about that.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • How about this, Brandie. You could use DATEDIFF() to find which of two address rows is closest in date to the invoice.
    SELECT i.InvoiceNo, i.AddressCode, i.PayDate, '#' '#', a.LastAccountDate, a.Address1, a.City, a.[State], '#' '#', x1.*, '#' '#', x2.*
    FROM #Invoice i
    LEFT JOIN #Address a
     ON i.InvoiceNo = a.InvoiceNo
     AND i.AddressCode = a.AddressCode
     AND i.PayDate = a.LastAccountDate
    OUTER APPLY (
     SELECT TOP 1 *
     FROM #Address a2
     WHERE i.InvoiceNo = a2.InvoiceNo
      AND i.AddressCode = a2.AddressCode
      AND i.PayDate > a2.LastAccountDate
      AND a.InvoiceNo IS NULL
     ORDER BY i.PayDate
    ) x1
    OUTER APPLY (
     SELECT TOP 1 *
     FROM #Address a3
     WHERE i.InvoiceNo = a3.InvoiceNo
      AND i.AddressCode = a3.AddressCode
      AND i.PayDate < a3.LastAccountDate
      AND a.InvoiceNo IS NULL
     ORDER BY i.PayDate DESC
    ) x2

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • It occurs to me that no matter what way I go, all of these solutions will need to be put in a CTE since I'm joining to multiple tables (not included in my sample data) for this query. These all look good so far, so it's just a matter of me choosing which one will work best for performance.

    Thank you both for your assistance. I appreciate it.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • ChrisM@Work - Tuesday, January 23, 2018 7:50 AM

    How about this, Brandie. You could use DATEDIFF() to find which of two address rows is closest in date to the invoice.
    SELECT i.InvoiceNo, i.AddressCode, i.PayDate, '#' '#', a.LastAccountDate, a.Address1, a.City, a.[State], '#' '#', x1.*, '#' '#', x2.*
    FROM #Invoice i
    LEFT JOIN #Address a
     ON i.InvoiceNo = a.InvoiceNo
     AND i.AddressCode = a.AddressCode
     AND i.PayDate = a.LastAccountDate
    OUTER APPLY (
     SELECT TOP 1 *
     FROM #Address a2
     WHERE i.InvoiceNo = a2.InvoiceNo
      AND i.AddressCode = a2.AddressCode
      AND i.PayDate > a2.LastAccountDate
      AND a.InvoiceNo IS NULL
     ORDER BY i.PayDate
    ) x1
    OUTER APPLY (
     SELECT TOP 1 *
     FROM #Address a3
     WHERE i.InvoiceNo = a3.InvoiceNo
      AND i.AddressCode = a3.AddressCode
      AND i.PayDate < a3.LastAccountDate
      AND a.InvoiceNo IS NULL
     ORDER BY i.PayDate DESC
    ) x2

    I think the phrasing was a bit confusing.  Brandie said she needs the "Address record nearest to the invoice payment date, but under the invoice payment date."  The second OUTER APPLY won't meet the second part of that criteria.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen - Tuesday, January 23, 2018 8:45 AM

    ChrisM@Work - Tuesday, January 23, 2018 7:50 AM

    How about this, Brandie. You could use DATEDIFF() to find which of two address rows is closest in date to the invoice.
    SELECT i.InvoiceNo, i.AddressCode, i.PayDate, '#' '#', a.LastAccountDate, a.Address1, a.City, a.[State], '#' '#', x1.*, '#' '#', x2.*
    FROM #Invoice i
    LEFT JOIN #Address a
     ON i.InvoiceNo = a.InvoiceNo
     AND i.AddressCode = a.AddressCode
     AND i.PayDate = a.LastAccountDate
    OUTER APPLY (
     SELECT TOP 1 *
     FROM #Address a2
     WHERE i.InvoiceNo = a2.InvoiceNo
      AND i.AddressCode = a2.AddressCode
      AND i.PayDate > a2.LastAccountDate
      AND a.InvoiceNo IS NULL
     ORDER BY i.PayDate
    ) x1
    OUTER APPLY (
     SELECT TOP 1 *
     FROM #Address a3
     WHERE i.InvoiceNo = a3.InvoiceNo
      AND i.AddressCode = a3.AddressCode
      AND i.PayDate < a3.LastAccountDate
      AND a.InvoiceNo IS NULL
     ORDER BY i.PayDate DESC
    ) x2

    I think the phrasing was a bit confusing.  Brandie said she needs the "Address record nearest to the invoice payment date, but under the invoice payment date."  The second OUTER APPLY won't meet the second part of that criteria.

    Drew

    Ah. Good catch, Drew. I was just in the middle of testing this code too. You're right, it doesn't work for me. I think I'm leaning toward your CTE.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Almost 3 hours later and the CTE is still trying to run. I'm going to yank it out, run the code for PayDate = LastAccountDate, then do an update for all the remaining NULL records with the CTE (see how that works).

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Had to fiddle with your code, Drew. It didn't take into account multiple Address codes on a single invoice. Then again, neither did my sample data because I didn't see this until now.

    WITH Ordered_Invoices AS
    (
    SELECT i.InvoiceNo, i.AddressCode, i.PayDate, a.LastAccountDate, a.Address1, a.City, a.[State],
    ROW_NUMBER() OVER(PARTITION BY i.InvoiceNo,i.AddressCode ORDER BY a.LastAccountDate DESC) AS rn
    FROM #Invoice i
    INNER JOIN #Address a
    ON i.InvoiceNo = a.InvoiceNo
    AND i.AddressCode = a.AddressCode
    AND i.PayDate >= a.LastAccountDate
    )
    SELECT InvoiceNo, AddressCode, PayDate, LastAccountDate, Address1, City, [State]
    FROM Ordered_Invoices
    WHERE rn = 1

    This works for me.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Brandie Tarvin - Tuesday, January 23, 2018 1:20 PM

    Had to fiddle with your code, Drew. It didn't take into account multiple Address codes on a single invoice. Then again, neither did my sample data because I didn't see this until now.

    WITH Ordered_Invoices AS
    (
    SELECT i.InvoiceNo, i.AddressCode, i.PayDate, a.LastAccountDate, a.Address1, a.City, a.[State],
    ROW_NUMBER() OVER(PARTITION BY i.InvoiceNo,i.AddressCode ORDER BY a.LastAccountDate DESC) AS rn
    FROM #Invoice i
    INNER JOIN #Address a
    ON i.InvoiceNo = a.InvoiceNo
    AND i.AddressCode = a.AddressCode
    AND i.PayDate >= a.LastAccountDate
    )
    SELECT InvoiceNo, AddressCode, PayDate, LastAccountDate, Address1, City, [State]
    FROM Ordered_Invoices
    WHERE rn = 1

    This works for me.

    Did that change allow it to run in a reasonable time?  I did think about adding AddressCode to the PARTITION BY clause, but it didn't appear to be necessary from your data.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen - Tuesday, January 23, 2018 1:47 PM

    Brandie Tarvin - Tuesday, January 23, 2018 1:20 PM

    Had to fiddle with your code, Drew. It didn't take into account multiple Address codes on a single invoice. Then again, neither did my sample data because I didn't see this until now.

    WITH Ordered_Invoices AS
    (
    SELECT i.InvoiceNo, i.AddressCode, i.PayDate, a.LastAccountDate, a.Address1, a.City, a.[State],
    ROW_NUMBER() OVER(PARTITION BY i.InvoiceNo,i.AddressCode ORDER BY a.LastAccountDate DESC) AS rn
    FROM #Invoice i
    INNER JOIN #Address a
    ON i.InvoiceNo = a.InvoiceNo
    AND i.AddressCode = a.AddressCode
    AND i.PayDate >= a.LastAccountDate
    )
    SELECT InvoiceNo, AddressCode, PayDate, LastAccountDate, Address1, City, [State]
    FROM Ordered_Invoices
    WHERE rn = 1

    This works for me.

    Did that change allow it to run in a reasonable time?  I did think about adding AddressCode to the PARTITION BY clause, but it didn't appear to be necessary from your data.

    Drew

    It only runs in a reasonable time when I make it an update after my initial table load. The initial load hits too many tables, too many columns that when I add this on top of it, it just caused a massive slowdown (from 20 minutes to 2+ hours and counting when I stopped it). But the update runs in a couple of minutes after the fact just fine.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply