Is this query possible?

  • Hi, I work for a large transit agency and would like to know if this query is possible (in order to make a certain report for managers).

    If I start out with this table:

    CREATE TABLE Test1

    (

    interval int NULL,

    interval_prime int NULL,

    cnt int NULL

    )

    with this data in it:

    insert into test1 (interval, interval_prime, cnt)

    values (1, 1, 1)

    insert into test1 (interval, interval_prime, cnt)

    values (2, null, 1)

    insert into test1 (interval, interval_prime, cnt)

    values (3, null, 1)

    insert into test1 (interval, interval_prime, cnt)

    values (4, 2, 1)

    insert into test1 (interval, interval_prime, cnt)

    values (5, null, 1)

    insert into test1 (interval, interval_prime, cnt)

    values (6, 5, 1)

    insert into test1 (interval, interval_prime, cnt)

    values (7, null, 1)

    insert into test1 (interval, interval_prime, cnt)

    values (8, null, 1)

    insert into test1 (interval, interval_prime, cnt)

    values (9, 7, 1)

    insert into test1 (interval, interval_prime, cnt)

    values (10, 10, 1)

    insert into test1 (interval, interval_prime, cnt)

    values (11, 11, 1)

    insert into test1 (interval, interval_prime, cnt)

    values (12, null, 1)

    insert into test1 (interval, interval_prime, cnt)

    values (13, null, 1)

    insert into test1 (interval, interval_prime, cnt)

    values (14, null, 1)

    insert into test1 (interval, interval_prime, cnt)

    values (15, null, 1)

    insert into test1 (interval, interval_prime, cnt)

    values (16, null, 1)

    insert into test1 (interval, interval_prime, cnt)

    values (17, 12, 1)

    insert into test1 (interval, interval_prime, cnt)

    values (18, 18, 1)

    and I'd like to write a query that returns this table:

    CREATE TABLE Test2

    (

    interval int NULL,

    cnt int NULL

    )

    with this data in it:

    insert into test2 (interval, cnt)

    values (1, 1)

    insert into test2 (interval, cnt)

    values (2, null)

    insert into test2 (interval, cnt)

    values (3, null)

    insert into test2 (interval, cnt)

    values (4, 3)

    insert into test2 (interval, cnt)

    values (5, null)

    insert into test2 (interval, cnt)

    values (6, 2)

    insert into test2 (interval, cnt)

    values (7, null)

    insert into test2 (interval, cnt)

    values (8, null)

    insert into test2 (interval, cnt)

    values (9, 3)

    insert into test2 (interval, cnt)

    values (10, 1)

    insert into test2 (interval, cnt)

    values (11, 1)

    insert into test2 (interval, cnt)

    values (12, null)

    insert into test2 (interval, cnt)

    values (13, null)

    insert into test2 (interval, cnt)

    values (14, null)

    insert into test2 (interval, cnt)

    values (15, null)

    insert into test2 (interval, cnt)

    values (16, null)

    insert into test2 (interval, cnt)

    values (17, 6)

    insert into test2 (interval, cnt)

    values (18, 1)

    The rule is that the interval list is preserved, intervals which have a null interval_prime stay as Zero or Null, but for intervals that have a value in interval_prime, the data is grouped (from interval_prime to interval).

    I hope this is not written in a confusing manner!!

    Thanks for any help you can give,

    Martin

  • The following works in SQL 2012 & 2014, but I'm not certain about 2008 (or R2). Unfortunately, I don't have 2008 to test with.

    (So... Sorry in advance if it doesn't work in 2008.)

    IF OBJECT_ID('tempdb..#Test1') IS NOT NULL

    DROP TABLE #Test1;

    CREATE TABLE #Test1

    (

    interval int NULL,

    interval_prime int NULL,

    cnt int NULL

    );

    IF OBJECT_ID('tempdb..#Test2') IS NOT NULL

    DROP TABLE #Test2;

    CREATE TABLE #Test2 (

    interval int NULL,

    cnt int NULL

    );

    INSERT #Test1 (interval,interval_prime,cnt) VALUES

    (1, 1, 1), (2, null, 1), (3, null, 1), (4, 1, 1), (5, null, 1), (6, 4, 1),

    (7, null, 1), (8, null, 1), (9, 6, 1), (10, 10, 1), (11, 11, 1), (12, null, 1),

    (13, null, 1), (14, null, 1), (15, null, 1), (16, null, 1), (17, 12, 1), (18, 18, 1);

    insert into #Test2 (interval, cnt) VALUES

    (1, 1), (2, null), (3, null), (4, 3), (5, null), (6, 2), (7, null), (8, null), (9, 3),

    (10, 1), (11, 1), (12, null), (13, null), (14, null), (15, null), (16, null), (17, 6), (18, 1);

    --SELECT * FROM #Test1 t1;

    --SELECT * FROM #Test2 t2;

    WITH IntervalGroup AS (

    SELECT

    CAST(SUBSTRING(MIN(CAST(t1.interval AS BINARY(4)) + CAST(t1.interval_prime AS BINARY(4))) OVER (ORDER BY t1.interval DESC), 1, 4) AS INT) AS IntervalGroup,

    t1.cnt

    FROM

    #Test1 t1

    ), IntervalRollup AS (

    SELECT

    ig.IntervalGroup,

    SUM(ig.cnt) AS InvervalGroupCount

    FROM

    IntervalGroup ig

    GROUP BY

    ig.IntervalGroup

    )

    SELECT

    t1.interval,

    t1.interval_prime,

    t1.cnt,

    ir.InvervalGroupCount

    FROM

    #Test1 t1

    LEFT JOIN IntervalRollup ir

    ON t1.interval = ir.IntervalGroup;

    Here are the results... (tested using 2014)

    interval interval_prime cnt InvervalGroupCount

    ----------- -------------- ----------- ------------------

    1 1 1 1

    2 NULL 1 NULL

    3 NULL 1 NULL

    4 1 1 3

    5 NULL 1 NULL

    6 4 1 2

    7 NULL 1 NULL

    8 NULL 1 NULL

    9 6 1 3

    10 10 1 1

    11 11 1 1

    12 NULL 1 NULL

    13 NULL 1 NULL

    14 NULL 1 NULL

    15 NULL 1 NULL

    16 NULL 1 NULL

    17 12 1 6

    18 18 1 1

    HTH,

    Jason

  • For a 2008-friendly version (as Jason indicated, his won't work prior to 2012, because ORDER BY was not supported for aggregate window functions yet; it only worked for ranking window functions):

    SELECT OuterReference.Interval,

    cnt=SUM(InnerReference.cnt)

    FROM test1 OuterReference LEFT JOIN test1 InnerReference

    ON InnerReference.interval BETWEEN OuterReference.interval_prime AND OuterReference.interval

    GROUP BY OuterReference.interval

    Cheers!

    EDIT: I added the parenthetical explaining why Jason correctly suspected his required 2012+.

  • Jacob Wilkins (8/19/2015)


    For a 2008-friendly version (as Jason indicated, his won't work prior to 2012, because ORDER BY was not supported for aggregate window functions yet; it only worked for ranking window functions):

    SELECT OuterReference.Interval,

    cnt=SUM(InnerReference.cnt)

    FROM test1 OuterReference LEFT JOIN test1 InnerReference

    ON InnerReference.interval BETWEEN OuterReference.interval_prime AND OuterReference.interval

    GROUP BY OuterReference.interval

    Cheers!

    EDIT: I added the parenthetical explaining why Jason correctly suspected his required 2012+.

    Very nice solution! I didn't even notice the relationship between interval & interval_prime.

    I did notice 2 things...

    1) Now that I see the pattern, the interval_prime value on row 17 appears to be set incorrectly. If it's following the pattern for the rest of the data, it should be 11 not 12.

    2) The use of the BETWEEN is causing an over count on anything greater than 1. The following should be a simple fix...

    CASE WHEN SUM(InnerReference.cnt) = 1 THEN 1 ELSE SUM(InnerReference.cnt) -1 END

  • Hmmm...I'm not sure I follow. The results of my original query exactly match the contents of test2, his desired results, with his sample data.

    It looks like your sample data changed some of the interval_prime values from his without correctly changing the corresponding rows in the desired output table (specifically for intervals 4,6, and 9).

    I may be missing something, but I'm not sure what yet 🙂

    Cheers!

  • Jacob Wilkins (8/19/2015)


    Hmmm...I'm not sure I follow. The results of my original query exactly match the contents of test2, his desired results, with his sample data.

    It looks like your sample data changed some of the interval_prime values from his without correctly changing the corresponding rows in the desired output table (specifically for intervals 4,6, and 9).

    I may be missing something, but I'm not sure what yet 🙂

    Cheers!

    I think we have a shiesty OP who edited the data after I grabbed my copy (and before you grabbed yours).

  • Jason A. Long (8/19/2015)


    Jacob Wilkins (8/19/2015)


    Hmmm...I'm not sure I follow. The results of my original query exactly match the contents of test2, his desired results, with his sample data.

    It looks like your sample data changed some of the interval_prime values from his without correctly changing the corresponding rows in the desired output table (specifically for intervals 4,6, and 9).

    I may be missing something, but I'm not sure what yet 🙂

    Cheers!

    I think we have a shiesty OP who edited the data after I grabbed my copy (and before you grabbed yours).

    Ah, excellent point! I just noticed that the original post was edited. I did think it was weird that you would change the values for the sample data. This explanation makes a lot more sense 🙂

  • I apologize for the sheistiness... I did indeed edit the OP because I had a couple of the values incorrect. The [interval_prime field] should indicate the first interval of the group, while [interval] field should indicate the last interval of the group. So the way I have it now is the way the data sits.

    Also, I wasn't able to run Jason's answer successfully. My sample data is in SQL server, but the actual query is being written in Oracle. I'm going to try Jacob's answer, and if it works I'll adapt it for oracle.

    Thanks for the help so far!

    (I guess it would help if I had my example correct from the beginning...)

  • No worries. 🙂

    In any case Jacob's solution is what you're looking for. It should work in pretty much any RDBMS.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply