Sub-Grouping by consecutive rows

  • I have a dataset like the one below. I would like to group by Col be and cola. But by the sequence of col A. See my desired output below. Thank you in advance.

    Starting dataset -

    ABTo Sum

    1AL2

    1Al3

    1AL4

    23NY5

    1Al6

    34NY7

    34NY8

    32CA8

    42NY9

    1Al10

    1Al11

    1Al12

    Desired output

    ABTo Sum

    1AL9

    23NY5

    1Al6

    34NY15

    32CA8

    42NY9

    1Al33

  • I've not tested it (no readily consumable data in post... see first link in my signature line for how to do that and I strongly recommend you do that before your next post) but this is simple, provided that you understand that there's no way to enforce the output order that you posted.

    SELECT A,B,ToSum = SUM(ToSum)

    FROM dbo.YourTable

    GROUP BY A,B

    ;

    You can certainly add an ORDER BY after the GROUP BY but there's nothing in the data that will preserve the original order of data to get the same order as your desired output.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hello, Thanks for your reply.

    This will group all the rows where A = 1 and B = Al to one row.

    I only want to sum those up where they are consecutive in the data set.

  • mishka-723908 (1/22/2016)


    Hello, Thanks for your reply.

    This will group all the rows where A = 1 and B = Al to one row.

    I only want to sum those up where they are consecutive in the data set.

    I suggest you read Jeff's entire post again, then follow his advice about how to ask your question so that people can help.

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • I tried to paste the result to make it looks nice, but could not.

    Here is the code for the dataset in question. Thanks for your help.

    CREATE TABLE #testa

    (

    A INTEGER,

    B VARCHAR(2),

    ToSum INTEGER

    )

    INSERT INTO #testa

    ( A, B, ToSum )

    select 1, 'AL', 2 UNION ALL

    SELECT 1, 'AL', 3 UNION ALL

    SELECT 1, 'AL', 4 UNION ALL

    SELECT 23, 'NY', 5 UNION ALL

    SELECT 1, 'AL', 6 UNION ALL

    SELECT 34, 'NY', 7 UNION ALL

    SELECT 34, 'NY', 8 UNION ALL

    SELECT 32, 'CA', 8 UNION ALL

    SELECT 42, 'NY', 9 UNION ALL

    SELECT 1, 'AL', 2 UNION ALL

    SELECT 1, 'AL', 2 UNION ALL

    SELECT 1, 'AL', 2

    SELECT * FROM #testa

    I would like the result to look like -

    ABTo Sum

    1AL9

    23NY5

    1Al6

    34NY15

    32CA8

    42NY9

    1Al33

  • I think this what you're after... Please note that this much easier on later editions of SQL server using the LAG/LEAD functions...

    Some consumable test data...

    IF OBJECT_ID('tempdb..#temp', 'U') IS NOT NULL

    DROP TABLE #temp;

    CREATE TABLE #temp (

    SeqNum INT IDENTITY (1,1) NOT NULL PRIMARY KEY CLUSTERED, -- adding an indetity column to estabolist an insertion order

    -- and a clustered primary key. A properly order clustered index is going to be essential for the next step.

    ColA INT NOT NULL,

    ColB CHAR(2) NOT NULL,

    ToSum INT NOT NULL,

    SeqGroup INT NULL -- This value will be filled in to estabolish "sequense groups"

    );

    INSERT #temp (ColA, ColB, ToSum) VALUES

    (1,'AL',2),

    (1,'AL',3),

    (1,'AL',4),

    (23,'NY',5),

    (1,'AL',6),

    (34,'NY',7),

    (34,'NY',8),

    (32,'CA',8),

    (42,'NY',9),

    (1,'AL',10),

    (1,'AL',11),

    (1,'AL',12);

    The actual solution...

    -- Use the "Quirky Method" to populate the SeqGroup column.

    DECLARE

    @ColA INT,

    @ColB CHAR(2),

    @SeqNum INT,

    @SeqGroup INT;

    UPDATE t SET

    @SeqGroup = t.SeqGroup = CASE WHEN @ColA <> t.ColA AND @ColB <> t.ColB THEN t.SeqNum ELSE ISNULL(@SeqGroup, t.SeqNum) END,

    @ColA = t.ColA,

    @ColB = t.ColB,

    @SeqNum = t.SeqNum

    FROM

    #temp t WITH (TABLOCKX) -- not really necessary w/ a temp table but you do need a 100% guarantee that you are the only one accessing the table at this time.

    OPTION(MAXDOP 1); -- necessary to prevent a parallel execution plan.

    -- The final select

    SELECT

    t.ColA,

    t.ColB,

    ToSum = SUM(t.ToSum)

    FROM

    #temp t

    GROUP BY

    t.SeqGroup,

    t.ColA,

    t.ColB;

    The output...

    ColA ColB ToSum

    ----------- ---- -----------

    1 AL 9

    23 NY 5

    1 AL 6

    34 NY 15

    32 CA 8

    42 NY 9

    1 AL 33

    Side note... Jeff will be back shortly to show a better quirky update... I always manage to mangle it to some degree... 😛

  • mishka-723908 (1/22/2016)


    Hello, Thanks for your reply.

    This will group all the rows where A = 1 and B = Al to one row.

    I only want to sum those up where they are consecutive in the data set.

    One of the "problems" with data in a table is that you simply can't rely on the "natural order". There has to be something in the table that enforces the order. What you're asking for may look like it will work but without a column to enforce the correct sequence will break sooner than later and produce the wrong answers.

    No solution is possible for what you ask unless such a column exists in the table. Anything that anyone posts that looks like it will work will be totally unreliable without it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jason A. Long (1/22/2016)


    I think this what you're after... Please note that this much easier on later editions of SQL server using the LAG/LEAD functions...

    Some consumable test data...

    IF OBJECT_ID('tempdb..#temp', 'U') IS NOT NULL

    DROP TABLE #temp;

    CREATE TABLE #temp (

    SeqNum INT IDENTITY (1,1) NOT NULL PRIMARY KEY CLUSTERED, -- adding an indetity column to estabolist an insertion order

    -- and a clustered primary key. Aproperly order clustered index is going to be important.

    ColA INT NOT NULL,

    ColB CHAR(2) NOT NULL,

    ToSum INT NOT NULL,

    SeqGroup INT NULL -- This value will be filled in to estabolish "sequense groups"

    );

    INSERT #temp (ColA, ColB, ToSum) VALUES

    (1,'AL',2),

    (1,'AL',3),

    (1,'AL',4),

    (23,'NY',5),

    (1,'AL',6),

    (34,'NY',7),

    (34,'NY',8),

    (32,'CA',8),

    (42,'NY',9),

    (1,'AL',10),

    (1,'AL',11),

    (1,'AL',12);

    The actual solution...

    -- Use the "Quirky Method" to populate the SeqGroup column.

    DECLARE

    @ColA INT,

    @ColB CHAR(2),

    @SeqNum INT,

    @SeqGroup INT;

    UPDATE t SET

    @SeqGroup = t.SeqGroup = CASE WHEN @ColA <> t.ColA AND @ColB <> t.ColB THEN t.SeqNum ELSE ISNULL(@SeqGroup, t.SeqNum) END,

    @ColA = t.ColA,

    @ColB = t.ColB,

    @SeqNum = t.SeqNum

    FROM

    #temp t WITH (TABLOCKX) -- not really necessary w/ a temp table but you do need a 100% guarantee that you are the only one accessing the table at this time.

    OPTION(MAXDOP 1); -- necessary to prevent a parallel execution plan.

    -- The final select

    SELECT

    t.ColA,

    t.ColB,

    ToSum = SUM(t.ToSum)

    FROM

    #temp t

    GROUP BY

    t.SeqGroup,

    t.ColA,

    t.ColB;

    The output...

    ColA ColB ToSum

    ----------- ---- -----------

    1 AL 9

    23 NY 5

    1 AL 6

    34 NY 15

    32 CA 8

    42 NY 9

    1 AL 33

    Side note... Jeff will be back shortly to show a better quirky update... I always manage to mangle it to some degree... 😛

    Nope. won't be demonstrating a Quirky Update for this because there is nothing to enforce the desired order. This problem has no solution without such a column to enforce the order.

    There must be a disturbance in the force because this is the second such request for an impossible solution due to unenforceable ordered data in two days.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hello, Thank you for the reply. It did work. This is the way I get the data in the file from a source system and need to properly insert it. There will be a column to keep the table properly ordered.

    Thank you all for your help.

  • mishka-723908 (1/22/2016)


    Hello, Thank you for the reply. It did work. This is the way I get the data in the file from a source system and need to properly insert it. There will be a column to keep the table properly ordered.

    Thank you all for your help.

    What is that column because that will change everything. Jason's code will need to be changed to take advantage of that column.

    It you want the Quirky Update solution, I need for you to repost the readily consumable data including that column because we can't do the Quirky Update without it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • mishka-723908 (1/22/2016)


    Hello, Thank you for the reply. It did work. This is the way I get the data in the file from a source system and need to properly insert it. There will be a column to keep the table properly ordered.

    Thank you all for your help.

    Mishka - Before you scamper off and attempt to adapt this to your own data, heed Jeff's request for consumable test data that includes your actual ordering column.

    The "Quirky Method" is more than capable of generating some nasty surprises, in the form of difficult to detect, incorrect results, if you don't know what you're doing and why you're doing it.

    If you're considering it for a production, Jeff's article Solving the Running Total and Ordinal Rank Problems (Rewritten)[/url] is a worthwhile read (if not 100% compulsory).

    It's a lengthy article, so you may want to concentrate on the 2nd 1/2 that introduces the "Quirky Update"...

    The running totals problem that he address in the article is obviously different that you current task, but the rules he lays out for using the Quirky method are universal and should be fully understood before using this method in any production code.

    Just my 2 cents...

  • Hello,

    Here is the set. There will be a begin and end date where we get the min(begin) and max(end) for each set. The cols will be ordered by the date combination. Sorry if I asked the question in the wrong area. We are using SQL 2014 for this.

    IF OBJECT_ID('tempdb..#testa', 'U') IS NOT NULL

    DROP TABLE #testa;

    CREATE TABLE #testa

    (

    A INTEGER,

    B VARCHAR(2),

    ToSum INTEGER,

    beginDate BIGINT,

    endDate bigint

    )

    INSERT INTO #testa

    ( A, B, ToSum, beginDate, endDate )

    select 1, 'AL', 2, 20081003153540,20081003160918 UNION ALL

    SELECT 1, 'AL', 3, 20081003160918,20090824171732 UNION ALL

    SELECT 1, 'AL', 4, 20090824171732,20090824172124 UNION ALL

    SELECT 23, 'NY', 5, 20090824172124,20100805131507 UNION ALL

    SELECT 1, 'AL', 6, 20100805131507,20100805134025 UNION ALL

    SELECT 34, 'NY', 7, 20100805134025,20100805135736 UNION ALL

    SELECT 34, 'NY', 8, 20100805135736,20110720153831 UNION ALL

    SELECT 32, 'CA', 8, 20110720153831,20110815143945 UNION ALL

    SELECT 42, 'NY', 9, 20110815143945,20111017170554 UNION ALL

    SELECT 1, 'AL', 2, 20111017170554,20111017170640 UNION ALL

    SELECT 1, 'AL', 2, 20111017170640,20120809174002 UNION ALL

    SELECT 1, 'AL', 2, 20120809174002,20130809174002

    SELECT * FROM #testa

  • There's no need for the Quirky Update in such a case. The "Difference Between Two Row Numbers" will do it, will be faster that the Quirky Update if the right clustered index is available, and will use methods that are "supported".

    Thanks for posting the readily consumable data. I'll be back soon.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Here's the latest readily consumable data with a couple of comments added for other experimenters...

    --===== Conditionally drop the test table to make reruns in SSMS easier.

    IF OBJECT_ID('tempdb..#testa', 'U') IS NOT NULL

    DROP TABLE #testa

    ;

    --===== Create the test table. Note the modifications

    CREATE TABLE #testa

    (

    A INTEGER NOT NULL,

    B CHAR(2) NOT NULL,

    ToSum INTEGER NOT NULL,

    beginDate BIGINT NOT NULL,

    endDate BIGINT NOT NULL--,

    CONSTRAINT PK_#testa PRIMARY KEY CLUSTERED (beginDate,A,B)

    )

    ;

    --===== Populate the test table

    INSERT INTO #testa

    (A, B, ToSum, beginDate, endDate)

    SELECT 1, 'AL', 2, 20081003153540, 20081003160918 UNION ALL

    SELECT 1, 'AL', 3, 20081003160918, 20090824171732 UNION ALL

    SELECT 1, 'AL', 4, 20090824171732, 20090824172124 UNION ALL

    SELECT 23, 'NY', 5, 20090824172124, 20100805131507 UNION ALL

    SELECT 1, 'AL', 6, 20100805131507, 20100805134025 UNION ALL

    SELECT 34, 'NY', 7, 20100805134025, 20100805135736 UNION ALL

    SELECT 34, 'NY', 8, 20100805135736, 20110720153831 UNION ALL

    SELECT 32, 'CA', 8, 20110720153831, 20110815143945 UNION ALL

    SELECT 42, 'NY', 9, 20110815143945, 20111017170554 UNION ALL

    SELECT 1, 'AL', 2, 20111017170554, 20111017170640 UNION ALL

    SELECT 1, 'AL', 2, 20111017170640, 20120809174002 UNION ALL

    SELECT 1, 'AL', 2, 20120809174002, 20130809174002

    ;

    --===== Show what we have, so far.

    SELECT * FROM #testa

    ;

    Here's one possible solution. I say "possible" solution because I don't have a 2012+ box at my disposal right now to test a Preceding Rows thing, which might cut down on the number of sorts and shorten the code, to boot.

    --===== Solve the problem using the ordered difference between two row numbers.

    WITH

    cteEnumerate AS

    ( --=== Create the difference between row numbers to form aggregate group markers.

    SELECT A, B, ToSum, beginDate,

    GrpNum = ROW_NUMBER() OVER (ORDER BY beginDate)

    - ROW_NUMBER() OVER (PARTITION BY A,B ORDER BY beginDate)

    FROM #testa

    ),

    cteAggregate AS

    ( --=== Do the aggregates by GrpNum and figure out the display order

    SELECT A, B, ToSum = SUM(ToSum), SortDisplay = MIN(beginDate)

    FROM cteEnumerate

    GROUP BY GrpNum, A, B

    ) --=== Display the aggregated data in the correct order.

    SELECT A, B, ToSum

    FROM cteAggregate

    ORDER BY SortDisplay

    ;

    Here are the results with sequence groups in the original order as they appear in the original data...

    A B ToSum

    ----------- ---- -----------

    1 AL 9

    23 NY 5

    1 AL 6

    34 NY 15

    32 CA 8

    42 NY 9

    1 AL 6

    (7 row(s) affected)

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Nice job, Jeff. I especially like the way you calculated the group number using two row number functions. Kind of reminiscent of something Itzik wrote once, but I don't remember where.

  • Viewing 15 posts - 1 through 15 (of 19 total)

    You must be logged in to reply to this topic. Login to reply