January 23, 2018 at 7:19 am
This one is giving me fits. I'm querying mainframe data that has been imported to SQL. An "Invoice" table and an "Address" table. What I need to do is tie the Invoice record to the Address record nearest to the invoice payment date, but under the invoice payment date. MAX() in Address won't work since an address record can be updated after a payment. MIN() won't work because there can be multiple address updates and the oldest one is too old.
Both tables can contain multiple records (many to many relationship). I'm not including all columns in my samples, just relevant scrubbed ones. I tried a correlated subquery (included below) but it dropped a bunch of records instead of finding my values.
Thoughts? (FYI: I can't change data or table schema.)
IF (SELECT OBJECT_ID('tempdb..#Invoice')) IS NOT NULL
DROP TABLE #Invoice;
IF (SELECT OBJECT_ID('tempdb..#Address')) IS NOT NULL
DROP TABLE #Address;
CREATE TABLE #Invoice (InvoiceNo VARCHAR(20), AddressCode CHAR(2), InvoiceType INT, PayDate DATETIME);
CREATE TABLE #Address (AddressCode CHAR(2), InvoiceNo VARCHAR(20), LastAccountDate DATETIME, Address1 VARCHAR(50), City VARCHAR(50), [State] CHAR(2));
INSERT INTO #Invoice (InvoiceNo, AddressCode, InvoiceType, PayDate)
VALUES ('INV17896279','08',123,'2017-03-28 00:00:00.000'),
('INV17896279','08',123,'2017-03-28 00:00:00.000'),
('INV17618372','15',456,'2017-09-06 00:00:00.000'),
('INV17618372','15',456,'2017-09-06 00:00:00.000'),
('INV10015672','03',789,'2017-12-05 00:00:00.000'),
('INV20088561','01',789,'2017-10-31 00:00:00.000'),
('INV15586390','02',789,'2017-06-15 00:00:00.000');
INSERT INTO #Address (AddressCode, InvoiceNo, LastAccountDate, Address1, City, [State])
VALUES ('08','INV17896279','2017-03-27 00:00:00.000','555 Mouse Way', 'Orlando', 'FL'),
('15','INV17618372','2017-09-05 00:00:00.000','555 Mouse Way', 'Orlando', 'FL'),
('03','INV10015672','2017-12-05 00:00:00.000','555 Mouse Way', 'Orlando', 'FL'),
('03','INV10015672','2017-07-07 00:00:00.000','111 Duck Ave', 'Orlando', 'FL'),
('01','INV20088561','2017-10-25 00:00:00.000','727 McDuck Manor', 'Orlando', 'FL'),
('01','INV20088561','2017-09-18 00:00:00.000','824 Goofy Dr', 'Orlando', 'FL'),
('01','INV20088561','2017-05-10 00:00:00.000','359 Minnie Ct', 'Orlando', 'FL'),
('02','INV15586390','2017-07-22 00:00:00.000','111 Duck Ave', 'Orlando', 'FL'),
('02','INV15586390','2017-06-05 00:00:00.000','824 Goofy Dr', 'Orlando', 'FL');
SELECT i.InvoiceNo, i.AddressCode, i.PayDate, a.LastAccountDate, a.Address1, a.City, a.[State]
FROM #Invoice i
INNER JOIN #Address a
ON i.InvoiceNo = a.InvoiceNo
AND i.AddressCode = a.AddressCode
AND i.PayDate = a.LastAccountDate;
--This is the code we had before discovering non-matching dates
SELECT i.InvoiceNo, i.AddressCode, i.PayDate, a.LastAccountDate, a.Address1, a.City, a.[State]
FROM #Invoice i
INNER JOIN #Address a
ON i.InvoiceNo = a.InvoiceNo
AND i.AddressCode = a.AddressCode
AND i.PayDate = (SELECT MAX(LastAccountDate) AS LastAccountDate
FROM #Address
WHERE LastAccountDate <= i.PayDate
AND InvoiceNo = i.InvoiceNo
AND AddressCode = i.AddressCode
GROUP by InvoiceNo, AddressCode);
--Gives me the wrong results
Expected results:
InvoiceNo AddressCode PayDate LastAccountDate Address1 City State
INV10015672 03 2017-12-05 00:00:00.000 2017-12-05 00:00:00.000 555 Mouse Way Orlando FL
INV17618372 15 2017-09-06 00:00:00.000 2017-09-05 00:00:00.000 555 Mouse Way Orlando FL
INV17896279 08 2017-09-06 00:00:00.000 2017-09-05 00:00:00.000 555 Mouse Way Orlando FL
INV20088561 01 2017-10-31 00:00:00.000 2017-10-25 00:00:00.000 727 McDuck Manor Orlando FL
INV15586390 02 2017-06-15 00:00:00.000 2017-06-05 00:00:00.000 824 Goofy Dr Orlando FL
January 23, 2018 at 7:39 am
There are two approaches. Which one performs better depends on how dense the address table is with respect to the invoice table, but since the invoice table also has duplicates, I suspect that the CTE will perform better.
SELECT DISTINCT i.InvoiceNo, i.AddressCode, i.PayDate, a.LastAccountDate, a.Address1, a.City, a.[State]
FROM #Invoice i
CROSS APPLY
(
SELECT TOP 1 *
FROM #Address a
WHERE a.InvoiceNo = i.InvoiceNo
AND a.AddressCode = i.AddressCode
AND i.PayDate >= a.LastAccountDate
ORDER BY a.LastAccountDate DESC
) a
;
WITH Ordered_Invoices AS
(
SELECT i.InvoiceNo, i.AddressCode, i.PayDate, a.LastAccountDate, a.Address1, a.City, a.[State],
ROW_NUMBER() OVER(PARTITION BY i.InvoiceNo ORDER BY a.LastAccountDate DESC) AS rn
FROM #Invoice i
INNER JOIN #Address a
ON i.InvoiceNo = a.InvoiceNo
AND i.AddressCode = a.AddressCode
AND i.PayDate >= a.LastAccountDate
)
SELECT InvoiceNo, AddressCode, PayDate, LastAccountDate, Address1, City, [State]
FROM Ordered_Invoices
WHERE rn = 1
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
January 23, 2018 at 7:44 am
Thanks. I'll check those out... I didn't even think about CROSS APPLY. I keep forgetting about that.
January 23, 2018 at 7:50 am
How about this, Brandie. You could use DATEDIFF() to find which of two address rows is closest in date to the invoice.SELECT i.InvoiceNo, i.AddressCode, i.PayDate, '#' '#', a.LastAccountDate, a.Address1, a.City, a.[State], '#' '#', x1.*, '#' '#', x2.*
FROM #Invoice i
LEFT JOIN #Address a
ON i.InvoiceNo = a.InvoiceNo
AND i.AddressCode = a.AddressCode
AND i.PayDate = a.LastAccountDate
OUTER APPLY (
SELECT TOP 1 *
FROM #Address a2
WHERE i.InvoiceNo = a2.InvoiceNo
AND i.AddressCode = a2.AddressCode
AND i.PayDate > a2.LastAccountDate
AND a.InvoiceNo IS NULL
ORDER BY i.PayDate
) x1
OUTER APPLY (
SELECT TOP 1 *
FROM #Address a3
WHERE i.InvoiceNo = a3.InvoiceNo
AND i.AddressCode = a3.AddressCode
AND i.PayDate < a3.LastAccountDate
AND a.InvoiceNo IS NULL
ORDER BY i.PayDate DESC
) x2
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
January 23, 2018 at 8:40 am
It occurs to me that no matter what way I go, all of these solutions will need to be put in a CTE since I'm joining to multiple tables (not included in my sample data) for this query. These all look good so far, so it's just a matter of me choosing which one will work best for performance.
Thank you both for your assistance. I appreciate it.
January 23, 2018 at 8:45 am
ChrisM@Work - Tuesday, January 23, 2018 7:50 AMHow about this, Brandie. You could use DATEDIFF() to find which of two address rows is closest in date to the invoice.SELECT i.InvoiceNo, i.AddressCode, i.PayDate, '#' '#', a.LastAccountDate, a.Address1, a.City, a.[State], '#' '#', x1.*, '#' '#', x2.*
FROM #Invoice i
LEFT JOIN #Address a
ON i.InvoiceNo = a.InvoiceNo
AND i.AddressCode = a.AddressCode
AND i.PayDate = a.LastAccountDate
OUTER APPLY (
SELECT TOP 1 *
FROM #Address a2
WHERE i.InvoiceNo = a2.InvoiceNo
AND i.AddressCode = a2.AddressCode
AND i.PayDate > a2.LastAccountDate
AND a.InvoiceNo IS NULL
ORDER BY i.PayDate
) x1
OUTER APPLY (
SELECT TOP 1 *
FROM #Address a3
WHERE i.InvoiceNo = a3.InvoiceNo
AND i.AddressCode = a3.AddressCode
AND i.PayDate < a3.LastAccountDate
AND a.InvoiceNo IS NULL
ORDER BY i.PayDate DESC
) x2
I think the phrasing was a bit confusing. Brandie said she needs the "Address record nearest to the invoice payment date, but under the invoice payment date." The second OUTER APPLY won't meet the second part of that criteria.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
January 23, 2018 at 8:53 am
drew.allen - Tuesday, January 23, 2018 8:45 AMChrisM@Work - Tuesday, January 23, 2018 7:50 AMHow about this, Brandie. You could use DATEDIFF() to find which of two address rows is closest in date to the invoice.SELECT i.InvoiceNo, i.AddressCode, i.PayDate, '#' '#', a.LastAccountDate, a.Address1, a.City, a.[State], '#' '#', x1.*, '#' '#', x2.*
FROM #Invoice i
LEFT JOIN #Address a
ON i.InvoiceNo = a.InvoiceNo
AND i.AddressCode = a.AddressCode
AND i.PayDate = a.LastAccountDate
OUTER APPLY (
SELECT TOP 1 *
FROM #Address a2
WHERE i.InvoiceNo = a2.InvoiceNo
AND i.AddressCode = a2.AddressCode
AND i.PayDate > a2.LastAccountDate
AND a.InvoiceNo IS NULL
ORDER BY i.PayDate
) x1
OUTER APPLY (
SELECT TOP 1 *
FROM #Address a3
WHERE i.InvoiceNo = a3.InvoiceNo
AND i.AddressCode = a3.AddressCode
AND i.PayDate < a3.LastAccountDate
AND a.InvoiceNo IS NULL
ORDER BY i.PayDate DESC
) x2I think the phrasing was a bit confusing. Brandie said she needs the "Address record nearest to the invoice payment date, but under the invoice payment date." The second OUTER APPLY won't meet the second part of that criteria.
Drew
Ah. Good catch, Drew. I was just in the middle of testing this code too. You're right, it doesn't work for me. I think I'm leaning toward your CTE.
January 23, 2018 at 11:32 am
Almost 3 hours later and the CTE is still trying to run. I'm going to yank it out, run the code for PayDate = LastAccountDate, then do an update for all the remaining NULL records with the CTE (see how that works).
January 23, 2018 at 1:20 pm
Had to fiddle with your code, Drew. It didn't take into account multiple Address codes on a single invoice. Then again, neither did my sample data because I didn't see this until now.
WITH Ordered_Invoices AS
(
SELECT i.InvoiceNo, i.AddressCode, i.PayDate, a.LastAccountDate, a.Address1, a.City, a.[State],
ROW_NUMBER() OVER(PARTITION BY i.InvoiceNo,i.AddressCode ORDER BY a.LastAccountDate DESC) AS rn
FROM #Invoice i
INNER JOIN #Address a
ON i.InvoiceNo = a.InvoiceNo
AND i.AddressCode = a.AddressCode
AND i.PayDate >= a.LastAccountDate
)
SELECT InvoiceNo, AddressCode, PayDate, LastAccountDate, Address1, City, [State]
FROM Ordered_Invoices
WHERE rn = 1
This works for me.
January 23, 2018 at 1:47 pm
Brandie Tarvin - Tuesday, January 23, 2018 1:20 PMHad to fiddle with your code, Drew. It didn't take into account multiple Address codes on a single invoice. Then again, neither did my sample data because I didn't see this until now.
WITH Ordered_Invoices AS
(
SELECT i.InvoiceNo, i.AddressCode, i.PayDate, a.LastAccountDate, a.Address1, a.City, a.[State],
ROW_NUMBER() OVER(PARTITION BY i.InvoiceNo,i.AddressCode ORDER BY a.LastAccountDate DESC) AS rn
FROM #Invoice i
INNER JOIN #Address a
ON i.InvoiceNo = a.InvoiceNo
AND i.AddressCode = a.AddressCode
AND i.PayDate >= a.LastAccountDate
)
SELECT InvoiceNo, AddressCode, PayDate, LastAccountDate, Address1, City, [State]
FROM Ordered_Invoices
WHERE rn = 1This works for me.
Did that change allow it to run in a reasonable time? I did think about adding AddressCode to the PARTITION BY clause, but it didn't appear to be necessary from your data.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
January 23, 2018 at 3:40 pm
drew.allen - Tuesday, January 23, 2018 1:47 PMBrandie Tarvin - Tuesday, January 23, 2018 1:20 PMHad to fiddle with your code, Drew. It didn't take into account multiple Address codes on a single invoice. Then again, neither did my sample data because I didn't see this until now.
WITH Ordered_Invoices AS
(
SELECT i.InvoiceNo, i.AddressCode, i.PayDate, a.LastAccountDate, a.Address1, a.City, a.[State],
ROW_NUMBER() OVER(PARTITION BY i.InvoiceNo,i.AddressCode ORDER BY a.LastAccountDate DESC) AS rn
FROM #Invoice i
INNER JOIN #Address a
ON i.InvoiceNo = a.InvoiceNo
AND i.AddressCode = a.AddressCode
AND i.PayDate >= a.LastAccountDate
)
SELECT InvoiceNo, AddressCode, PayDate, LastAccountDate, Address1, City, [State]
FROM Ordered_Invoices
WHERE rn = 1This works for me.
Did that change allow it to run in a reasonable time? I did think about adding AddressCode to the PARTITION BY clause, but it didn't appear to be necessary from your data.
Drew
It only runs in a reasonable time when I make it an update after my initial table load. The initial load hits too many tables, too many columns that when I add this on top of it, it just caused a massive slowdown (from 20 minutes to 2+ hours and counting when I stopped it). But the update runs in a couple of minutes after the fact just fine.
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply