Help with aggregate function

  • have table:
    col1 col2 col3
    A       B    10
    B       A     5
    C       D    8

    Want to get:
    A    B   15
    C    D    8

  • And how are you determining that 

    A B 10
    B A 5

    Should aggregate to A B 15

  • col1 and col2 represent company names and col3 is the transaction value, could be negative, between the companies. So I want to know how much company A spent?

  • Lengthy, but there might not be a better alternative without fixing the data.

    CREATE TABLE SampleData(
      col1 varchar(10),
      col2 varchar(10),
      col3 int
    );
    INSERT INTO SampleData
    VALUES
      ('A', 'B', 10),
      ('B', 'A', 5 ),
      ('C', 'D', 8 );

    SELECT CASE WHEN col1 <= col2 THEN col1 ELSE col2 END AS col1,
       CASE WHEN col1 <= col2 THEN col2 ELSE col1 END AS col2,
       SUM( col3) AS col3
    FROM SampleData
    GROUP BY CASE WHEN col1 <= col2 THEN col1 ELSE col2 END,
       CASE WHEN col1 <= col2 THEN col2 ELSE col1 END;

    GO
    DROP TABLE SampleData

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • It won't make a bit of improvement performance-wise over what Luis posted, but it does simplify the code a bit.  The derived table could be a CTE or CROSS APPLY, as well.  Since this is also for 2012 or greater, we can use IIF for further code simplification although they do resolve to CASE operators behind the scenes.


     SELECT d.Col1, d.Col2, Col3 = SUM(d.COL3)
       FROM (
             SELECT  Col1 = IIF(Col1<=Col2, Col1, Col2)
                    ,Col2 = IIF(Col1<=Col2, Col2, Col1)
                    ,Col3
               FROM dbo.SampleData
            ) d
      GROUP BY d.Col1, d.Col2
    ;

    Also, I recommend that you always use the 2 part naming convention for all tables and objects in all of your code.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • That is excellent guys. Both queries work as expected. I have modified the query a little bit to accomodate a case when col3 is negative. In that case it means that for company col1 is a loss but for comany col2 is a plus.so for examble for the following values;
    col1 col2 col3
    A    B    100
    B    A    -50
    A    C    200
    A    D    100
    A    B    150
    B    F    200
    The result of the modified queries:
    SELECT CASE WHEN col1 <= col2 THEN col1 ELSE col2 END AS col1,
     CASE WHEN col1 <= col2 THEN col2 ELSE col1 END AS col2,
     SUM( CASE WHEN col3 < 0 THEN col3 * -1 ELSE col3 END ) AS col3
    FROM SampleData
    GROUP BY CASE WHEN col1 <= col2 THEN col1 ELSE col2 END,
     CASE WHEN col1 <= col2 THEN col2 ELSE col1 END;

    SELECT d.col1, d.col2, col3 = SUM(IIF (d.col3 < 0, d.col3 * -1, d.col3))
     FROM (
       SELECT col1 = IIF(col1<=col2, col1, col2)
          ,col2 = IIF(col1<=col2, col2, col1)
          ,col3
        FROM SampleData
       ) d
    GROUP BY d.col1, d.col2
    ;
    Result:
    A    B    300
    A    C    200
    A    D    100
    B    F    200

    That is what I wanted!!
    I got all the help I requested. Thank you very much. From my prespective we can close this thread, Not sure who is going to do it.

  • ger_serrano - Monday, November 20, 2017 9:11 AM

    That is excellent guys. Both queries work as expected. I have modified the query a little bit to accomodate a case when col3 is negative. In that case it means that for company col1 is a loss but for comany col2 is a plus.so for examble for the following values;
    col1 col2 col3
    A    B    100
    B    A    -50
    A    C    200
    A    D    100
    A    B    150
    B    F    200
    The result of the modified queries:
    SELECT CASE WHEN col1 <= col2 THEN col1 ELSE col2 END AS col1,
     CASE WHEN col1 <= col2 THEN col2 ELSE col1 END AS col2,
     SUM( CASE WHEN col3 < 0 THEN col3 * -1 ELSE col3 END ) AS col3
    FROM SampleData
    GROUP BY CASE WHEN col1 <= col2 THEN col1 ELSE col2 END,
     CASE WHEN col1 <= col2 THEN col2 ELSE col1 END;

    SELECT d.col1, d.col2, col3 = SUM(IIF (d.col3 < 0, d.col3 * -1, d.col3))
     FROM (
       SELECT col1 = IIF(col1<=col2, col1, col2)
          ,col2 = IIF(col1<=col2, col2, col1)
          ,col3
        FROM SampleData
       ) d
    GROUP BY d.col1, d.col2
    ;
    Result:
    A    B    300
    A    C    200
    A    D    100
    B    F    200

    That is what I wanted!!
    I got all the help I requested. Thank you very much. From my prespective we can close this thread, Not sure who is going to do it.

    Thanks for the kind feedback and about your modification.  Threads are never closed, though, just in case someone comes up with a better idea that will help us all for similar problems.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Sorry . that was only half of the work I had to do. tuens out that every time I have negative value in col3 I have to turn col3 positive and flip col1 and col2 values and then group so if have:
    col1 col2 col3
    A B 100
    B A -50
    A C 200
    A D 100
    A B 150
    B F 200
    C D -50
    I have to produce
    col1 col2 col3
    A B 300
    A C 200
    A D 100
    B F 200
    D C 50

  • Following query produced the expected results:
    A    B    300
    A    C    200
    A    D    100
    B    F    200
    D    C    50

    SELECT CASE WHEN COL1 <= COL2 AND COL3 > 0 THEN COL1 ELSE COL2 END AS COL1,
     CASE WHEN COL1 <= COL2 AND COL3 > 0 THEN COL2 ELSE COL1 END AS COL2,
     SUM( CASE WHEN COL3 < 0 THEN COL3 *-1 ELSE COL3 END ) AS COL3
    FROM SampleData
    GROUP BY CASE WHEN COL1 <= COL2 AND COL3 > 0 THEN COL1 ELSE COL2 END,
     CASE WHEN COL1 <= COL2 AND COL3 > 0 THEN COL2 ELSE COL1 END
     ORDER BY CASE WHEN COL1 <= COL2 AND COL3 > 0 THEN COL1 ELSE COL2 END,
     CASE WHEN COL1 <= COL2 AND COL3 > 0 THEN COL2 ELSE COL1 END

  • unfortunately taht query did not encompass all scenarios such as:
    col1 col2 col3
    ('A','B',100),
    ('B','A',-50),
    ('A','C',200),
    ('A','D',100),
    ('A','B',150),
    ('B','F',200),
    ('d','K',400),
    ('K','D',-20),
    ('C','D',-50),
    ('D','C',-60);
    I need to display which one of col1 or col2 had the max net value (col3). I mean I have to have as a result:
    A,B,300
    A,C,200
    A,D,100
    B,F,200
    D,K,420
    C,D,10
    hopefully this table will help to cover all 4 scenarios of values between col1 and col2 (first two rows). Last four rows is the answer we need to get:

    AB3030-30-30
    BA90-9090-90
    BA60   
    AB 120  
    BA  120 
    AB   60

    s???9w

  • ger_serrano - Monday, December 18, 2017 1:26 PM

    unfortunately taht query did not encompass all scenarios such as:
    col1 col2 col3
    ('A','B',100),
    ('B','A',-50),
    ('A','C',200),
    ('A','D',100),
    ('A','B',150),
    ('B','F',200),
    ('d','K',400),
    ('K','D',-20),
    ('C','D',-50),
    ('D','C',-60);
    I need to display which one of col1 or col2 had the max net value (col3). I mean I have to have as a result:
    A,B,300
    A,C,200
    A,D,100
    B,F,200
    D,K,420
    C,D,10
    hopefully this table will help to cover all 4 scenarios of values between col1 and col2 (first two rows). Last four rows is the answer we need to get:

    AB3030-30-30
    BA90-9090-90
    BA60   
    AB 120  
    BA  120 
    AB   60

    sԾ��9w

    Two things...

    First, help us help you... you need to start posting your data in a readily consumable format.  Please see the first link in my signature line below under "Helpful Links" for how to do that.

    Second, I'm just not baggin' what you're rakin' on your post above.  Since your previous post said that all negative values need to be made positive, I have no idea how you came up with the result of "C,D,10"  nor do I have any idea how you came up with the numbers you have in your "all 4 scenarios" example above because there's neither hide nor hair of any combination of numbers for "A,B" or "B,A" that will produce any number not evenly divisible by 50.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Really sorry but business keep changing the requirements. My table is:
    create table SampleData ( col1 VARCHAR(10), col2 VARCHAR(10), MTM numeric(10));
    INSERT INTO SampleData VALUES
    ('A','B',100),
    ('B','A',-50),
    ('A','C',200),
    ('A','D',100),
    ('A','B',150),
    ('B','F',200),
    ('d','K',400),
    ('K','D',-20),
    ('C','D',-50),
    ('B','A',15),
    ('D','C',-60),
    ('X','Y',-25),
    ('Y','X',-25);
    Here is the meaning of col1, col2, and MTM: col1 and col2 are companies and MTM is the result of a trade between col1 and col2 but reported by col1. There is no restriction in the number of trades between companies and MTM could be positive or negative. What we need to report is the net position between col1 and col2. Based on the data I posted I would like to have
    col1    col2    MTM
    A      B        285
    A      C        200
    A      D        100
    B      F        200
    C        D        10
    d        K        420
    X        Y        0

    So back to your question about result C D 10: C reports trade with D -50 which means that C lost 50 in the trade but D won 50. Now D reports another trade with C -60 which means that D lost 60 but C won 60. If we compute those two trades then C trading with D won at the end 10 and that is why we report C D 10, C is in the first position and D in the second position. Hopefuly that makes sense. Same login applies to the trades between X and Y. The net result is X Y 0 or Y X 0 (the same for me).

  • ger_serrano - Tuesday, December 19, 2017 8:39 AM

    Really sorry but business keep changing the requirements. My table is:
    create table SampleData ( col1 VARCHAR(10), col2 VARCHAR(10), MTM numeric(10));
    INSERT INTO SampleData VALUES
    ('A','B',100),
    ('B','A',-50),
    ('A','C',200),
    ('A','D',100),
    ('A','B',150),
    ('B','F',200),
    ('d','K',400),
    ('K','D',-20),
    ('C','D',-50),
    ('B','A',15),
    ('D','C',-60),
    ('X','Y',-25),
    ('Y','X',-25);
    Here is the meaning of col1, col2, and MTM: col1 and col2 are companies and MTM is the result of a trade between col1 and col2 but reported by col1. There is no restriction in the number of trades between companies and MTM could be positive or negative. What we need to report is the net position between col1 and col2. Based on the data I posted I would like to have
    col1    col2    MTM
    A      B        285
    A      C        200
    A      D        100
    B      F        200
    C        D        10
    d        K        420
    X        Y        0

    So back to your question about result C D 10: C reports trade with D -50 which means that C lost 50 in the trade but D won 50. Now D reports another trade with C -60 which means that D lost 60 but C won 60. If we compute those two trades then C trading with D won at the end 10 and that is why we report C D 10, C is in the first position and D in the second position. Hopefuly that makes sense. Same login applies to the trades between X and Y. The net result is X Y 0 or Y X 0 (the same for me).

    It's actually very easy.  You need to have some method to standardize the order of the companies and then make sure that the records match that standard.  Once you've standardized your records it's a simple grouping.  For example, force the companies in your records to be in alphabetical order ( and switch the sign where they're not).  I've used a CROSS APPLY, but you could also use three CASE expressions or a simple UNION ALL.

    SELECT s.col1, s.col2, SUM(s.MTM)
    FROM #SampleData
    CROSS APPLY
    (
        SELECT col1, col2, mtm
        WHERE col1 < col2

        UNION ALL

        SELECT col2, col1, -mtm
        WHERE col1 >= col2
    ) AS s -- standardized
    GROUP BY s.col1, s.col2

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Awesome Jeff!!.  Thanks a lot. I think the following query is also rdbms agnostic:
    with newtab (col1,col2, mtm)
    as
    (SELECT upper(col1), upper(col2), mtm FROM sampledata
      WHERE upper(col1) < upper(col2)
        
     UNION ALL

      SELECT upper(col2), upper(col1), -mtm FROM sampledata
      WHERE upper(col1) >= upper(col2)
        )
    select col1, col2, sum(mtm) as mtm from newtab
    group by col1, col2;

  • ger_serrano - Wednesday, December 20, 2017 7:34 AM

    Awesome Jeff!!.  Thanks a lot. I think the following query is also rdbms agnostic:
    with newtab (col1,col2, mtm)
    as
    (SELECT upper(col1), upper(col2), mtm FROM sampledata
      WHERE upper(col1) < upper(col2)
        
     UNION ALL

      SELECT upper(col2), upper(col1), -mtm FROM sampledata
      WHERE upper(col1) >= upper(col2)
        )
    select col1, col2, sum(mtm) as mtm from newtab
    group by col1, col2;

    That came from me, not Jeff.

    I believe that the CTE will require two separate scans of the table, whereas the CROSS APPLY will only require one.  I don't have the data set up anymore to test.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply