SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Grouping dates and IDs together


Grouping dates and IDs together

Author
Message
Brandie Tarvin
Brandie Tarvin
SSC Guru
SSC Guru (126K reputation)SSC Guru (126K reputation)SSC Guru (126K reputation)SSC Guru (126K reputation)SSC Guru (126K reputation)SSC Guru (126K reputation)SSC Guru (126K reputation)SSC Guru (126K reputation)

Group: General Forum Members
Points: 126207 Visits: 9604

I have what at first (and second and fifth) glance appears to be an impossible task. I have to take a table with multiple rows per ID and group it across different dates. I've tried CTEs, cursors, WHILE loops and MERGE Upserts all in an effort to get around this problem and at the end of the day, something is always going wrong with the data. The worst part is I can't use MAX() and MIN() effectively. (Note: this problem will look familiar to those who helped me with a similar but slightly different issue a couple of weeks ago).

Here's what I've got (data obfuscated, so if something seems off, please ask):


CREATE TABLE #MyTemp (Office VARCHAR(20), POID INT, DeliverySite INT, AmtDollars MONEY, StartDate DATE, EndDate DATE, DelSite_Order TINYINT, POID_Order TINYINT)

INSERT INTO #MyTemp (Office, POID, DeliverySite, AmtDollars, StartDate, EndDate, DelSite_Order, POID_Order)
VALUES ('ABC3566', 1071581, 1821, 50.00, '08/18/2010', '02/23/2011', 1, 1),
('ABC3566', 1071578, 1821, 50.00, '09/03/2010', '05/01/2013', 1, 2),
('ABC3566', 1092732, 1821, 50.00, '07/01/2011', '02/11/2014', 1, 3),
('ABC3566', 1195618, 1821, 75.00, '02/12/2014', '06/05/2014', 1, 4),
('ABC3566', 1195621, 1821, 75.00, '02/12/2014', '06/03/2014', 1, 5),
('ABC3566', 1200240, 1821, 125.00, '06/06/2014', '07/31/2014', 1, 6),
('ABC3566', 1204971, 1821, 125.00, '08/01/2014', '10/31/2014', 1, 7),
('ABC3566', 1204980, 1821, 125.00, '08/01/2014', '10/31/2014', 1, 8),
('ABC3566', 1214303, 1821, 50.00, '11/01/2014', '01/12/2015', 1, 9),
('ABC3566', 1214303, 3882, 50.00, '11/01/2014', '01/12/2015', 2, 9),
('ABC3566', 1216923, 1821, 50.00, '01/13/2015', '09/30/2015', 1, 10),
('ABC3566', 1216923, 3882, 50.00, '01/13/2015', '09/30/2015', 2, 10),
('ABC3566', 1216926, 1821, 50.00, '01/13/2015', '09/30/2015', 1, 11),
('ABC3566', 1216926, 3882, 50.00, '01/13/2015', '09/30/2015', 2, 11),
('ABC3566', 1228334, 1821, 50.00, '10/01/2015', '06/30/2016', 1, 12),
('ABC3566', 1228334, 3882, 50.00, '10/01/2015', '06/30/2016', 2, 12),
('EFJ9833', 1072026, 2607, 25.00, '11/01/2010', '05/01/2013', 1, 1),
('EFJ9833', 1072026, 2608, 25.00, '11/01/2010', '05/01/2013', 2, 1),
('EFJ9833', 1099563, 2607, 25.00, '07/01/2011', '06/04/2013', 1, 2),
('EFJ9833', 1099563, 2608, 25.00, '07/01/2011', '06/04/2013', 2, 2),
('EFJ9833', 1189651, 2607, 25.00, '06/05/2013', '05/14/2014', 1, 3),
('EFJ9833', 1189651, 2608, 25.00, '06/05/2013', '05/14/2014', 2, 3),
('EFJ9833', 1189651, 3504, 25.00, '06/05/2013', '05/14/2014', 3, 3),
('EFJ9833', 1199690, 2607, 25.00, '05/15/2014', '06/30/2014', 1, 4),
('EFJ9833', 1199690, 3784, 25.00, '05/15/2014', '06/30/2014', 2, 4),
('EFJ9833', 1199690, 3785, 25.00, '05/15/2014', '06/30/2014', 3, 4),
('EFJ9833', 1209430, 2607, 25.00, '07/01/2014', '07/31/2014', 1, 5),
('EFJ9833', 1209430, 3784, 25.00, '07/01/2014', '07/31/2014', 2, 5),
('EFJ9833', 1209430, 3828, 25.00, '07/01/2014', '07/31/2014', 3, 5),
('EFJ9833', 1209513, 2607, 25.00, '08/01/2014', '09/30/2015', 1, 6),
('EFJ9833', 1209513, 3784, 25.00, '08/01/2014', '09/30/2015', 2, 6),
('EFJ9833', 1209513, 3828, 25.00, '08/01/2014', '09/30/2015', 3, 6);



What I've got to do is combine data sets based on the start / end date, the POID, and the amount. So, for example, office ABC3566 has 3 rows initially from 08/18/2010 to the "max" end date of 2/11/2014 for $50.00 heading to delivery site 1821. Since the amount then changes to $75.00 (but for the same site), I should get a new record and the same when it increases to $125.00. When the delivery sites break into 2 different sites, there should be one new record each for each site. Therefore the records should look like this:

ABC3566 1821 $50.00 08/18/2010 02/11/2014
ABC3566 1821 $75.00 02/12/2014 06/05/2014 <- Note the "05", not the "03" day even though the order puts "03" last.
ABC3566 1821 $125.00 06/06/2014 10/31/2014
ABC3566 1821 $50.00 11/01/2014 06/30/2016
ABC3566 3882 $50.00 11/01/2014 06/30/2016


My biggest problem here is that because 1821 has two different time frames in which the amount is $50.00, the MAX EndDate ends up being 6/30/2016 for the first entry when it shouldn't be. No matter how I group things, I can't seem to fix this problem.

With the next one, EFJ9833, the results should look like this:

EFJ9833 2607 $25.00 11/01/2010 06/04/2013
EFJ9833 2608 $25.00 11/01/2010 06/04/2013
EFJ9833 2607 $25.00 06/05/2013 05/14/2014
EFJ9833 2608 $25.00 06/05/2013 05/14/2014
EFJ9833 3504 $25.00 06/05/2013 05/14/2014
EFJ9833 2607 $25.00 05/15/2014 06/30/2014
EFJ9833 3784 $25.00 05/15/2014 06/30/2014
EFJ9833 3785 $25.00 05/15/2014 06/30/2014
EFJ9833 2607 $25.00 07/01/2014 09/30/2015
EFJ9833 3784 $25.00 07/01/2014 09/30/2015
EFJ9833 3828 $25.00 07/01/2014 09/30/2015


And then, just when the dates seem to be making sense, i.e., a StartDate occurring a day after a EndDate, there's data like this:


INSERT INTO #MyTemp (Office, POID, DeliverySite, AmtDollars, StartDate, EndDate, DelSite_Order, POID_Order)
VALUES ('XKA9523', 1043624, 1987, 25.00, '11/01/2010', '05/01/2013', 1, 1),
('XKA9523', 1100518, 2005, 25.00, '12/01/2011', '05/01/2013', 1, 1),
('XKA9523', 1096836, 1592, 40.00, '09/01/2015', '06/04/2016', 1, 2);


Which needs to be:

XKA9523 1987 $25.00 11/01/2010 05/01/2013
XKA9523 2005 $25.00 12/01/2011 05/01/2013 <- Notice this StartDate starts before the last ENDDAte
XKA9523 1592 $40.00 09/01/2015 06/04/2016 <- Notice this StartDate has skipped several months between the last


So if a StartDate happens before or after the last EndDate but is for the same Office and the same DeliverySite with the same dollar amount, it gets grouped in with the previous record. If it happens before or after the last EndDate for a different DeliverySite *or* a different amount, it gets its own record. The "order" columns were done by me using RANK() and DENSE_RANK() in an effort to group DeliverySites by Office and in order of site #. Because I had no other way of effectively grouping this information it was the best I could do.

I've been banging my head against this particular wall for 2 weeks now and still am no closer to a solution. Each time I solve the problem for one office, I break it for another.

Any thoughts on where I can start?

EDIT: <headdesk> My apologies. I thought I posted this in the T-SQL forum but my mouse must have skipped up when I hit the forum header. I'm not going to double-post, so please answer here.



Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.
drew.allen
drew.allen
SSC Guru
SSC Guru (50K reputation)SSC Guru (50K reputation)SSC Guru (50K reputation)SSC Guru (50K reputation)SSC Guru (50K reputation)SSC Guru (50K reputation)SSC Guru (50K reputation)SSC Guru (50K reputation)

Group: General Forum Members
Points: 50527 Visits: 14720
This gives the results that you are looking for. I had to group based on the delivery sites per POID in order to get the "splits" correct.

;
WITH DeliverySites AS
(
SELECT *,
STUFF(
(
SELECT ',', DeliverySite AS [text()]
FROM #MyTemp sub
WHERE main.POID = sub.POID
ORDER BY DeliverySite
FOR XML PATH(''), TYPE
).value('.', 'VARCHAR(MAX)')
, 1, 1, '') AS DeliverySites
FROM #MyTemp main
)
, Starts AS
(
SELECT *, CASE WHEN AmtDollars = LAG(AmtDollars) OVER(PARTITION BY Office, DeliverySites, DeliverySite ORDER BY StartDate) THEN 0 ELSE 1 END IsStart
FROM DeliverySites
)
, Groups AS
(
SELECT Office, DeliverySite, AmtDollars, StartDate, EndDate, DeliverySites, SUM(IsStart) OVER(PARTITION BY Office, DeliverySites, DeliverySite ORDER BY StartDate ROWS UNBOUNDED PRECEDING) AS Grp
FROM Starts
)
SELECT Office, DeliverySite, AmtDollars, MIN(StartDate), MAX(EndDate)
FROM Groups
GROUP BY Office, DeliverySites, DeliverySite, AmtDollars, Grp
ORDER BY Office, MIN(StartDate), DeliverySite


Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
How to post data/code on a forum to get the best help.
How to Post Performance Problems
Make sure that you include code in the appropriate IFCode tags, e.g. [code=sql]<your code here>[/code]. You can find the IFCode tags under the INSERT options when you are writing a post.
Lynn Pettis
Lynn Pettis
SSC Guru
SSC Guru (310K reputation)SSC Guru (310K reputation)SSC Guru (310K reputation)SSC Guru (310K reputation)SSC Guru (310K reputation)SSC Guru (310K reputation)SSC Guru (310K reputation)SSC Guru (310K reputation)

Group: General Forum Members
Points: 310535 Visits: 41515
I am curious why in some cases an adjacent date range is merged but in others they aren't. What I mean is the enddate for one range is the day before the start of the next (range a enddate 2014-07-31 and range b start date 2014-08-01 for the same Office, DeliverySite, and AmtDollars.

Cool
Lynn Pettis

For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here or when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here and here
Managing Transaction Logs

SQL Musings from the Desert Fountain Valley SQL (My Mirror Blog)
drew.allen
drew.allen
SSC Guru
SSC Guru (50K reputation)SSC Guru (50K reputation)SSC Guru (50K reputation)SSC Guru (50K reputation)SSC Guru (50K reputation)SSC Guru (50K reputation)SSC Guru (50K reputation)SSC Guru (50K reputation)

Group: General Forum Members
Points: 50527 Visits: 14720
If you look at POID 1099563, it delivers to sites 2607 and 2608, but POID 1189651 delivers to sites 2607, 2608, and 3504. I treated those as a break even though there is no break if you just look at sites 2607 and 2608, because that was the only way that I could get my results to match the desired results

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
How to post data/code on a forum to get the best help.
How to Post Performance Problems
Make sure that you include code in the appropriate IFCode tags, e.g. [code=sql]<your code here>[/code]. You can find the IFCode tags under the INSERT options when you are writing a post.
Lynn Pettis
Lynn Pettis
SSC Guru
SSC Guru (310K reputation)SSC Guru (310K reputation)SSC Guru (310K reputation)SSC Guru (310K reputation)SSC Guru (310K reputation)SSC Guru (310K reputation)SSC Guru (310K reputation)SSC Guru (310K reputation)

Group: General Forum Members
Points: 310535 Visits: 41515
drew.allen - Friday, February 9, 2018 3:29 PM
If you look at POID 1099563, it delivers to sites 2607 and 2608, but POID 1189651 delivers to sites 2607, 2608, and 3504. I treated those as a break even though there is no break if you just look at sites 2607 and 2608, because that was the only way that I could get my results to match the desired results

Drew


That really isn't very intuitive. You must have spent some time on this or something pattern-wise just struck you as I didn't catch that.

Cool
Lynn Pettis

For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here or when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here and here
Managing Transaction Logs

SQL Musings from the Desert Fountain Valley SQL (My Mirror Blog)
Brandie Tarvin
Brandie Tarvin
SSC Guru
SSC Guru (126K reputation)SSC Guru (126K reputation)SSC Guru (126K reputation)SSC Guru (126K reputation)SSC Guru (126K reputation)SSC Guru (126K reputation)SSC Guru (126K reputation)SSC Guru (126K reputation)

Group: General Forum Members
Points: 126207 Visits: 9604
drew.allen - Friday, February 9, 2018 3:29 PM
If you look at POID 1099563, it delivers to sites 2607 and 2608, but POID 1189651 delivers to sites 2607, 2608, and 3504. I treated those as a break even though there is no break if you just look at sites 2607 and 2608, because that was the only way that I could get my results to match the desired results

Drew


That's exactly what I need to do. And yes, Lynn, it's not intuitive. In fact, I told our project manager that our data was incompatible with the requirements. But the requirements are the way it needs to be done because Reasons.

Thanks, Drew. I'll check your code out. I toyed with the idea of using STUFF() for about 5 minutes but couldn't see how to apply it, so went on to try nested cursors and nested while loops, etc. Maybe I should have actually tried STUFF() instead of just looked at it. @=)

Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.
Brandie Tarvin
Brandie Tarvin
SSC Guru
SSC Guru (126K reputation)SSC Guru (126K reputation)SSC Guru (126K reputation)SSC Guru (126K reputation)SSC Guru (126K reputation)SSC Guru (126K reputation)SSC Guru (126K reputation)SSC Guru (126K reputation)

Group: General Forum Members
Points: 126207 Visits: 9604
Drew, thank you so very much. I'd never heard of LAG() or the ROWS UNBOUNDED PRECEDING stuff until now. I love this job because I learn new things every day.

The kicker to this specific project is that I have to check delivery sites for overlapping dates / amounts and error them out. Which would be my next hurdle if I hadn't gotten your help on the "Tying MAX() value" thread for the last project a couple of weeks ago. You just sorted out a whole bunch of problems with two posts. Thank you so very much.

Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.
Brandie Tarvin
Brandie Tarvin
SSC Guru
SSC Guru (126K reputation)SSC Guru (126K reputation)SSC Guru (126K reputation)SSC Guru (126K reputation)SSC Guru (126K reputation)SSC Guru (126K reputation)SSC Guru (126K reputation)SSC Guru (126K reputation)

Group: General Forum Members
Points: 126207 Visits: 9604
BAH. I thought I was through with this, but I just found an outlier.

When testing with the this data:

INSERT INTO #MyTemp (Office, POID, DeliverySite, AmtDollars, StartDate, EndDate, DelSite_Order, POID_Order)
VALUES ('KTY4568', 1071581, 2056, 50.00, '01/30/2012', '07/31/2013', 1, 1),
('KTY4568', 1071578, 5943, 50.00, '08/01/2013', '09/02/2013', 1, 2),
('KTY4568', 1092732, 2056, 50.00, '09/03/2013', '12/12/2013', 1, 3)


I get
KTY4568 2056 $50.00 01/30/2012 12/12/2013
KTY4568 5943 $50.00 08/01/2013 09/02/2013


When I should be getting:
KTY4568 2056 $50.00 01/30/2012 07/31/2013
KTY4568 5943 $50.00 08/01/2013 09/02/2013
KTY4568 2056 $50.00 09/03/2013 12/12/2013


Thoughts?

EDIT: I think I may have to do these ones manually. Fortunately, there only appears to be about 9 offices affected by this.

Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.
Lynn Pettis
Lynn Pettis
SSC Guru
SSC Guru (310K reputation)SSC Guru (310K reputation)SSC Guru (310K reputation)SSC Guru (310K reputation)SSC Guru (310K reputation)SSC Guru (310K reputation)SSC Guru (310K reputation)SSC Guru (310K reputation)

Group: General Forum Members
Points: 310535 Visits: 41515
Brandie Tarvin - Monday, February 12, 2018 11:46 AM
BAH. I thought I was through with this, but I just found an outlier.

When testing with the this data:

INSERT INTO #MyTemp (Office, POID, DeliverySite, AmtDollars, StartDate, EndDate, DelSite_Order, POID_Order)
VALUES ('KTY4568', 1071581, 2056, 50.00, '01/30/2012', '07/31/2013', 1, 1),
('KTY4568', 1071578, 5943, 50.00, '08/01/2013', '09/02/2013', 1, 2),
('KTY4568', 1092732, 2056, 50.00, '09/03/2013', '12/12/2013', 1, 3)


I get
KTY4568 2056 $50.00 01/30/2012 12/12/2013
KTY4568 5943 $50.00 08/01/2013 09/02/2013


When I should be getting:
KTY4568 2056 $50.00 01/30/2012 07/31/2013
KTY4568 5943 $50.00 08/01/2013 09/02/2013
KTY4568 2056 $50.00 09/03/2013 12/12/2013


Thoughts?


So this goes back to sometimes adjacent time spans aren't joined together. Is there something that tells you that or is it institutional knowledge that you have to code for in the query?

Cool
Lynn Pettis

For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here or when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here and here
Managing Transaction Logs

SQL Musings from the Desert Fountain Valley SQL (My Mirror Blog)
drew.allen
drew.allen
SSC Guru
SSC Guru (50K reputation)SSC Guru (50K reputation)SSC Guru (50K reputation)SSC Guru (50K reputation)SSC Guru (50K reputation)SSC Guru (50K reputation)SSC Guru (50K reputation)SSC Guru (50K reputation)

Group: General Forum Members
Points: 50527 Visits: 14720
Brandie Tarvin - Monday, February 12, 2018 11:46 AM
BAH. I thought I was through with this, but I just found an outlier.

When testing with the this data:

INSERT INTO #MyTemp (Office, POID, DeliverySite, AmtDollars, StartDate, EndDate, DelSite_Order, POID_Order)
VALUES ('KTY4568', 1071581, 2056, 50.00, '01/30/2012', '07/31/2013', 1, 1),
('KTY4568', 1071578, 5943, 50.00, '08/01/2013', '09/02/2013', 1, 2),
('KTY4568', 1092732, 2056, 50.00, '09/03/2013', '12/12/2013', 1, 3)


I get
KTY4568 2056 $50.00 01/30/2012 12/12/2013
KTY4568 5943 $50.00 08/01/2013 09/02/2013


When I should be getting:
KTY4568 2056 $50.00 01/30/2012 07/31/2013
KTY4568 5943 $50.00 08/01/2013 09/02/2013
KTY4568 2056 $50.00 09/03/2013 12/12/2013


Thoughts?

EDIT: I think I may have to do these ones manually. Fortunately, there only appears to be about 9 offices affected by this.

You just need to go back and adjust the CASE expression to make sure that the IsStart matches your expectations. You will probably need to add additional WHEN clauses. As Lynn says, it's still not clear what the exact criteria for packing intervals are.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
How to post data/code on a forum to get the best help.
How to Post Performance Problems
Make sure that you include code in the appropriate IFCode tags, e.g. [code=sql]<your code here>[/code]. You can find the IFCode tags under the INSERT options when you are writing a post.
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum








































































































































































SQLServerCentral


Search