January 27, 2009 at 1:45 am
I have a set of data, approximately 15,000 lines which contains information about products and their status by week. I would like to reduce the number of rows such that the date range is truly the beginning and ending dates for the item by units, place, status and flag. Any of these (units, place, status and flag) can change week by week, and so grouping by them and taking the max/min dates doesn't work.
ItemCode Units Place Status StartDate EndDate Flag
-------------------- ----------- ----- ----------- ----------------------- ----------------------- -----------
B100 1 FFF 1 2009-04-04 00:00:00.000 2009-04-10 00:00:00.000 0
B100 1 FFF 1 2009-04-11 00:00:00.000 2009-04-17 00:00:00.000 0
B100 1 FFF 3 2008-06-28 00:00:00.000 2008-07-04 00:00:00.000 0
B100 1 FFF 3 2008-07-05 00:00:00.000 2008-07-11 00:00:00.000 0
B100 1 FFF 3 2008-07-12 00:00:00.000 2008-07-18 00:00:00.000 0
B100 1 FFF 3 2008-07-19 00:00:00.000 2008-07-25 00:00:00.000 0
B100 1 FFF 3 2008-07-26 00:00:00.000 2008-08-01 00:00:00.000 0
In the sample set above I would want 2 rows returned.
Row 1:
ItemCode Units Place Status StartDate EndDate Flag
-------------------- ----------- ----- ----------- ----------------------- ----------------------- -----------
B100 1 FFF 1 2008-06-28 00:00:00.000 2008-08-01 00:00:00.000 0
Row2:
ItemCode Units Place Status StartDate EndDate Flag
-------------------- ----------- ----- ----------- ----------------------- ----------------------- -----------
B100 1 FFF 3 2009-04-17 00:00:00.000 2009-04-17 00:00:00.000 0
Any ideas? Thanks in advance.
January 27, 2009 at 3:04 am
This isn't particularly efficient, but should work okay
WITH LBounds AS(
SELECT s1.ItemCode,s1.Units,s1.Place,s1.Status,s1.StartDate,s1.Flag
FROM mytable s1
WHERE NOT EXISTS(SELECT * FROM mytable s2
WHERE s1.ItemCode=s2.ItemCode
AND s1.Units=s2.Units
AND s1.Place=s2.Place
AND s1.Status=s2.Status
AND s1.Flag=s2.Flag
AND s1.StartDate > s2.StartDate AND s1.StartDate <= s2.EndDate+1)
),
UBounds AS (
SELECT t1.ItemCode,t1.Units,t1.Place,t1.Status,t1.EndDate,t1.Flag
FROM mytable t1
WHERE NOT EXISTS(SELECT * FROM mytable t2
WHERE t1.ItemCode=t2.ItemCode
AND t1.Units=t2.Units
AND t1.Place=t2.Place
AND t1.Status=t2.Status
AND t1.Flag=t2.Flag
AND t1.EndDate >= t2.StartDate-1 AND t1.EndDate < t2.EndDate)
)
SELECT s1.ItemCode,s1.Units,s1.Place,s1.Status,s1.StartDate,MIN(t1.EndDate) AS EndDate,s1.Flag
FROM LBounds s1
INNER JOIN UBounds t1 ON t1.ItemCode=s1.ItemCode
AND t1.Units=s1.Units
AND t1.Place=s1.Place
AND t1.Status=s1.Status
AND t1.Flag=s1.Flag
AND s1.StartDate<t1.EndDate
GROUP BY s1.ItemCode,s1.Units,s1.Place,s1.Status,s1.StartDate,s1.Flag
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537January 27, 2009 at 3:09 am
Provided that there aren't any gaps in each group, then this will work:
DROP TABLE #ProductStatus
CREATE TABLE #ProductStatus (ItemCode varchar(20), Units int, Place varchar(5), Status int, StartDate datetime, EndDate datetime, Flag INT)
INSERT INTO #ProductStatus (ItemCode, Units, Place, Status, StartDate, EndDate, Flag)
SELECT 'B100', 1, 'FFF', 1, '2009-04-04 00:00:00.000', '2009-04-10 00:00:00.000', 0 UNION ALL
SELECT 'B100', 1, 'FFF', 1, '2009-04-11 00:00:00.000', '2009-04-17 00:00:00.000', 0 UNION ALL
SELECT 'B100', 1, 'FFF', 3, '2008-06-28 00:00:00.000', '2008-07-04 00:00:00.000', 0 UNION ALL
SELECT 'B100', 1, 'FFF', 3, '2008-07-05 00:00:00.000', '2008-07-11 00:00:00.000', 0 UNION ALL
SELECT 'B100', 1, 'FFF', 3, '2008-07-12 00:00:00.000', '2008-07-18 00:00:00.000', 0 UNION ALL
SELECT 'B100', 1, 'FFF', 3, '2008-07-19 00:00:00.000', '2008-07-25 00:00:00.000', 0 UNION ALL
SELECT 'B100', 1, 'FFF', 3, '2008-07-26 00:00:00.000', '2008-08-01 00:00:00.000', 0
SELECT a.ItemCode, a.Units, a.Place, a.Status, MIN(a.StartDate) AS StartDate, MAX(b.EndDate) AS EndDate, a.Flag
FROM #ProductStatus a
LEFT JOIN #ProductStatus b ON b.ItemCode = a.ItemCode
AND b.Status = a.Status
AND b.StartDate = a.EndDate+1
GROUP BY a.ItemCode, a.Units, a.Place, a.Status, a.Flag
Oops Mark beat me to it.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
January 27, 2009 at 8:45 am
Thank you both for your assistance. It appears that Mark's solution works well (and better than the loop solution I had attempted). Chris - your solution worked exactly right for the data I had provided, however, on the full data set there are instances where the item only has a particular status for a single week. In these cases the end date is returned as null.
January 27, 2009 at 8:52 am
ssetzer (1/27/2009)
Thank you both for your assistance. It appears that Mark's solution works well (and better than the loop solution I had attempted). Chris - your solution worked exactly right for the data I had provided, however, on the full data set there are instances where the item only has a particular status for a single week. In these cases the end date is returned as null.
No problem...
DROP TABLE #ProductStatus
CREATE TABLE #ProductStatus (ItemCode varchar(20), Units int, Place varchar(5), Status int, StartDate datetime, EndDate datetime, Flag INT)
INSERT INTO #ProductStatus (ItemCode, Units, Place, Status, StartDate, EndDate, Flag)
SELECT 'B100', 1, 'FFF', 1, '2009-04-04 00:00:00.000', '2009-04-10 00:00:00.000', 0 UNION ALL
SELECT 'B100', 1, 'FFF', 1, '2009-04-11 00:00:00.000', '2009-04-17 00:00:00.000', 0 UNION ALL
SELECT 'B100', 1, 'FFF', 2, '2008-06-28 00:00:00.000', '2008-07-04 00:00:00.000', 0 UNION ALL
SELECT 'B100', 1, 'FFF', 3, '2008-07-05 00:00:00.000', '2008-07-11 00:00:00.000', 0 UNION ALL
SELECT 'B100', 1, 'FFF', 3, '2008-07-12 00:00:00.000', '2008-07-18 00:00:00.000', 0 UNION ALL
SELECT 'B100', 1, 'FFF', 3, '2008-07-19 00:00:00.000', '2008-07-25 00:00:00.000', 0 UNION ALL
SELECT 'B100', 1, 'FFF', 3, '2008-07-26 00:00:00.000', '2008-08-01 00:00:00.000', 0
SELECT a.ItemCode, a.Units, a.Place, a.Status,
MIN(a.StartDate) AS StartDate,
ISNULL(MAX(b.EndDate), MIN(a.EndDate)) AS EndDate,
a.Flag
FROM #ProductStatus a
LEFT JOIN #ProductStatus b ON b.ItemCode = a.ItemCode
AND b.Status = a.Status
AND b.StartDate = a.EndDate+1
GROUP BY a.ItemCode, a.Units, a.Place, a.Status, a.Flag
Cheers
ChrisM
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
January 27, 2009 at 3:53 pm
Hmm, so a given item can't go to a different status *and back* on sequential weeks:
SELECT 'B100', 1, 'FFF', 1, '2009-04-04 00:00:00.000', '2009-04-10 00:00:00.000', 0 UNION ALL
SELECT 'B100', 1, 'FFF', 1, '2009-04-11 00:00:00.000', '2009-04-17 00:00:00.000', 0 UNION ALL
SELECT 'B100', 1, 'FFF', 3, '2008-06-28 00:00:00.000', '2008-07-04 00:00:00.000', 0 UNION ALL
SELECT 'B100', 1, 'FFF', 3, '2008-07-05 00:00:00.000', '2008-07-11 00:00:00.000', 0 UNION ALL
SELECT 'B100', 1, 'FFF', 2, '2008-07-12 00:00:00.000', '2008-07-18 00:00:00.000', 0 UNION ALL --<<--
SELECT 'B100', 1, 'FFF', 3, '2008-07-19 00:00:00.000', '2008-07-25 00:00:00.000', 0 UNION ALL
SELECT 'B100', 1, 'FFF', 3, '2008-07-26 00:00:00.000', '2008-08-01 00:00:00.000', 0
ItemCode Units Place Status StartDate EndDate Flag
-------------------- ----------- ----- ----------- ------------------------------------------------------ ------------------------------------------------------ -----------
B100 1 FFF 1 2009-04-04 00:00:00.000 2009-04-17 00:00:00.000 0
B100 1 FFF 2 2008-07-12 00:00:00.000 NULL 0
B100 1 FFF 3 2008-06-28 00:00:00.000 2008-08-01 00:00:00.000 0
SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.
January 27, 2009 at 3:58 pm
That scenario was the limitation with Chris's original reply (triggered by my limited data sample). It seems that Mark's application of CTE's and joining on all non-date columns did the trick. I haven't run into any issues as I've QAed the data. BTW the effect of the reduction is 15000 rows before = 1000 rows after.
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply