Reducing common data into fewer rows.

  • I have a set of data, approximately 15,000 lines which contains information about products and their status by week. I would like to reduce the number of rows such that the date range is truly the beginning and ending dates for the item by units, place, status and flag. Any of these (units, place, status and flag) can change week by week, and so grouping by them and taking the max/min dates doesn't work.

    ItemCode Units Place Status StartDate EndDate Flag

    -------------------- ----------- ----- ----------- ----------------------- ----------------------- -----------

    B100 1 FFF 1 2009-04-04 00:00:00.000 2009-04-10 00:00:00.000 0

    B100 1 FFF 1 2009-04-11 00:00:00.000 2009-04-17 00:00:00.000 0

    B100 1 FFF 3 2008-06-28 00:00:00.000 2008-07-04 00:00:00.000 0

    B100 1 FFF 3 2008-07-05 00:00:00.000 2008-07-11 00:00:00.000 0

    B100 1 FFF 3 2008-07-12 00:00:00.000 2008-07-18 00:00:00.000 0

    B100 1 FFF 3 2008-07-19 00:00:00.000 2008-07-25 00:00:00.000 0

    B100 1 FFF 3 2008-07-26 00:00:00.000 2008-08-01 00:00:00.000 0

    In the sample set above I would want 2 rows returned.

    Row 1:

    ItemCode Units Place Status StartDate EndDate Flag

    -------------------- ----------- ----- ----------- ----------------------- ----------------------- -----------

    B100 1 FFF 1 2008-06-28 00:00:00.000 2008-08-01 00:00:00.000 0

    Row2:

    ItemCode Units Place Status StartDate EndDate Flag

    -------------------- ----------- ----- ----------- ----------------------- ----------------------- -----------

    B100 1 FFF 3 2009-04-17 00:00:00.000 2009-04-17 00:00:00.000 0

    Any ideas? Thanks in advance.

  • This isn't particularly efficient, but should work okay

    WITH LBounds AS(

    SELECT s1.ItemCode,s1.Units,s1.Place,s1.Status,s1.StartDate,s1.Flag

    FROM mytable s1

    WHERE NOT EXISTS(SELECT * FROM mytable s2

    WHERE s1.ItemCode=s2.ItemCode

    AND s1.Units=s2.Units

    AND s1.Place=s2.Place

    AND s1.Status=s2.Status

    AND s1.Flag=s2.Flag

    AND s1.StartDate > s2.StartDate AND s1.StartDate <= s2.EndDate+1)

    ),

    UBounds AS (

    SELECT t1.ItemCode,t1.Units,t1.Place,t1.Status,t1.EndDate,t1.Flag

    FROM mytable t1

    WHERE NOT EXISTS(SELECT * FROM mytable t2

    WHERE t1.ItemCode=t2.ItemCode

    AND t1.Units=t2.Units

    AND t1.Place=t2.Place

    AND t1.Status=t2.Status

    AND t1.Flag=t2.Flag

    AND t1.EndDate >= t2.StartDate-1 AND t1.EndDate < t2.EndDate)

    )

    SELECT s1.ItemCode,s1.Units,s1.Place,s1.Status,s1.StartDate,MIN(t1.EndDate) AS EndDate,s1.Flag

    FROM LBounds s1

    INNER JOIN UBounds t1 ON t1.ItemCode=s1.ItemCode

    AND t1.Units=s1.Units

    AND t1.Place=s1.Place

    AND t1.Status=s1.Status

    AND t1.Flag=s1.Flag

    AND s1.StartDate<t1.EndDate

    GROUP BY s1.ItemCode,s1.Units,s1.Place,s1.Status,s1.StartDate,s1.Flag

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Provided that there aren't any gaps in each group, then this will work:

    DROP TABLE #ProductStatus

    CREATE TABLE #ProductStatus (ItemCode varchar(20), Units int, Place varchar(5), Status int, StartDate datetime, EndDate datetime, Flag INT)

    INSERT INTO #ProductStatus (ItemCode, Units, Place, Status, StartDate, EndDate, Flag)

    SELECT 'B100', 1, 'FFF', 1, '2009-04-04 00:00:00.000', '2009-04-10 00:00:00.000', 0 UNION ALL

    SELECT 'B100', 1, 'FFF', 1, '2009-04-11 00:00:00.000', '2009-04-17 00:00:00.000', 0 UNION ALL

    SELECT 'B100', 1, 'FFF', 3, '2008-06-28 00:00:00.000', '2008-07-04 00:00:00.000', 0 UNION ALL

    SELECT 'B100', 1, 'FFF', 3, '2008-07-05 00:00:00.000', '2008-07-11 00:00:00.000', 0 UNION ALL

    SELECT 'B100', 1, 'FFF', 3, '2008-07-12 00:00:00.000', '2008-07-18 00:00:00.000', 0 UNION ALL

    SELECT 'B100', 1, 'FFF', 3, '2008-07-19 00:00:00.000', '2008-07-25 00:00:00.000', 0 UNION ALL

    SELECT 'B100', 1, 'FFF', 3, '2008-07-26 00:00:00.000', '2008-08-01 00:00:00.000', 0

    SELECT a.ItemCode, a.Units, a.Place, a.Status, MIN(a.StartDate) AS StartDate, MAX(b.EndDate) AS EndDate, a.Flag

    FROM #ProductStatus a

    LEFT JOIN #ProductStatus b ON b.ItemCode = a.ItemCode

    AND b.Status = a.Status

    AND b.StartDate = a.EndDate+1

    GROUP BY a.ItemCode, a.Units, a.Place, a.Status, a.Flag

    Oops Mark beat me to it.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Thank you both for your assistance. It appears that Mark's solution works well (and better than the loop solution I had attempted). Chris - your solution worked exactly right for the data I had provided, however, on the full data set there are instances where the item only has a particular status for a single week. In these cases the end date is returned as null.

  • ssetzer (1/27/2009)


    Thank you both for your assistance. It appears that Mark's solution works well (and better than the loop solution I had attempted). Chris - your solution worked exactly right for the data I had provided, however, on the full data set there are instances where the item only has a particular status for a single week. In these cases the end date is returned as null.

    No problem...

    DROP TABLE #ProductStatus

    CREATE TABLE #ProductStatus (ItemCode varchar(20), Units int, Place varchar(5), Status int, StartDate datetime, EndDate datetime, Flag INT)

    INSERT INTO #ProductStatus (ItemCode, Units, Place, Status, StartDate, EndDate, Flag)

    SELECT 'B100', 1, 'FFF', 1, '2009-04-04 00:00:00.000', '2009-04-10 00:00:00.000', 0 UNION ALL

    SELECT 'B100', 1, 'FFF', 1, '2009-04-11 00:00:00.000', '2009-04-17 00:00:00.000', 0 UNION ALL

    SELECT 'B100', 1, 'FFF', 2, '2008-06-28 00:00:00.000', '2008-07-04 00:00:00.000', 0 UNION ALL

    SELECT 'B100', 1, 'FFF', 3, '2008-07-05 00:00:00.000', '2008-07-11 00:00:00.000', 0 UNION ALL

    SELECT 'B100', 1, 'FFF', 3, '2008-07-12 00:00:00.000', '2008-07-18 00:00:00.000', 0 UNION ALL

    SELECT 'B100', 1, 'FFF', 3, '2008-07-19 00:00:00.000', '2008-07-25 00:00:00.000', 0 UNION ALL

    SELECT 'B100', 1, 'FFF', 3, '2008-07-26 00:00:00.000', '2008-08-01 00:00:00.000', 0

    SELECT a.ItemCode, a.Units, a.Place, a.Status,

    MIN(a.StartDate) AS StartDate,

    ISNULL(MAX(b.EndDate), MIN(a.EndDate)) AS EndDate,

    a.Flag

    FROM #ProductStatus a

    LEFT JOIN #ProductStatus b ON b.ItemCode = a.ItemCode

    AND b.Status = a.Status

    AND b.StartDate = a.EndDate+1

    GROUP BY a.ItemCode, a.Units, a.Place, a.Status, a.Flag

    Cheers

    ChrisM

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Hmm, so a given item can't go to a different status *and back* on sequential weeks:

    SELECT 'B100', 1, 'FFF', 1, '2009-04-04 00:00:00.000', '2009-04-10 00:00:00.000', 0 UNION ALL

    SELECT 'B100', 1, 'FFF', 1, '2009-04-11 00:00:00.000', '2009-04-17 00:00:00.000', 0 UNION ALL

    SELECT 'B100', 1, 'FFF', 3, '2008-06-28 00:00:00.000', '2008-07-04 00:00:00.000', 0 UNION ALL

    SELECT 'B100', 1, 'FFF', 3, '2008-07-05 00:00:00.000', '2008-07-11 00:00:00.000', 0 UNION ALL

    SELECT 'B100', 1, 'FFF', 2, '2008-07-12 00:00:00.000', '2008-07-18 00:00:00.000', 0 UNION ALL --<<--

    SELECT 'B100', 1, 'FFF', 3, '2008-07-19 00:00:00.000', '2008-07-25 00:00:00.000', 0 UNION ALL

    SELECT 'B100', 1, 'FFF', 3, '2008-07-26 00:00:00.000', '2008-08-01 00:00:00.000', 0

    ItemCode Units Place Status StartDate EndDate Flag

    -------------------- ----------- ----- ----------- ------------------------------------------------------ ------------------------------------------------------ -----------

    B100 1 FFF 1 2009-04-04 00:00:00.000 2009-04-17 00:00:00.000 0

    B100 1 FFF 2 2008-07-12 00:00:00.000 NULL 0

    B100 1 FFF 3 2008-06-28 00:00:00.000 2008-08-01 00:00:00.000 0

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • That scenario was the limitation with Chris's original reply (triggered by my limited data sample). It seems that Mark's application of CTE's and joining on all non-date columns did the trick. I haven't run into any issues as I've QAed the data. BTW the effect of the reduction is 15000 rows before = 1000 rows after.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply