Multi-column, single result aggregation

  • Hey folks, hoping I can shortcut the full test harness thing and see if anyone's done some multi-column (to single result) aggregation recently. A quickie example:

    CREATE TABLE #tmp
    (tID INT IDENTITY(1, 1),
     Acct_ID INT,
     Category VARCHAR(20),
     Value1 MONEY,
     Value2 MONEY
    )
    GO

    SELECT
        Acct_ID,
        SUM( SumMe) AS Value
    FROM
        (SELECT
            Acct_ID,
            CASE WHEN Category IN ( 'a', 'bdq', 'rr') THEN Value1
                 WHEN Category IN ( 'zz', 'xx', 'qrq') THEN Value2
                 ELSE 0
            END AS SumMe
         FROM
            #tmp
        ) AS drv
    GROUP BY
        Acct_ID

    What I'm dealing with is variable entry cycles where a 'positive increase to an account' could land in either Value1 or Value2 depending on the category, because the category may apply to another account (either taking money from, or sending money to, the account I'm looking at).

    The real query is wrapped around dates and a few other conditions, but my primary concern is that this is a 60-million-plus row recordset that's wide. While I can control the indexing, I'm hoping someone will share their research/results/links to a well-defined, already-done, generic optimization in this style. I just can't be the first person to bang my head on this wall. 🙂
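
    Just so the "I can control the indexing" part isn't hand-waving, the rough shape I'd expect to try first against the toy table is below. Column names come from the sample above; it's purely illustrative, not the production index:

    -- Covering index for the per-account aggregation: Category feeds the CASE,
    -- Value1/Value2 are only read, so they ride along in INCLUDE.
    CREATE NONCLUSTERED INDEX IX_tmp_Acct_Category
        ON #tmp (Acct_ID, Category)
        INCLUDE (Value1, Value2);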


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • Generic optimizations are difficult to give since how long something takes will vary depending on the schema. One thing that may come in handy is using a CTE to handle some of this. You may be able to get the sums of the values separately that way and then sum them after that.

    declare @Tmp table (tID INT IDENTITY(1, 1),
        Acct_ID INT,
        Category VARCHAR(20),
        Value1 MONEY,
        Value2 MONEY
    );

    with sum_stage as
        (select acct_id, 'value1' [col], sum(value1) [value] from @Tmp where category in ('a', 'bdq', 'rr')
         group by acct_id
         union all
         select acct_id, 'value2' [col], sum(value2) [value] from @Tmp where category in ('zz', 'xx', 'qrq')
         group by acct_id)
    select acct_id, sum(value) [value] from sum_stage
    group by acct_id;

    If you're taking in an account ID as a parameter then that can be included in the CTE. Or, since you were a little vague on the description, this may end up being too difficult to do in your environment depending on the number of columns.
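
    Something like this is what I mean, purely as an untested sketch (the @Acct_ID parameter name is made up):

    -- Same staging CTE as above, but filtered to a single account up front
    declare @Acct_ID int;
    set @Acct_ID = 42;  -- hypothetical value

    with sum_stage as
        (select acct_id, sum(value1) [value] from @Tmp
         where acct_id = @Acct_ID and category in ('a', 'bdq', 'rr')
         group by acct_id
         union all
         select acct_id, sum(value2) [value] from @Tmp
         where acct_id = @Acct_ID and category in ('zz', 'xx', 'qrq')
         group by acct_id)
    select acct_id, sum(value) [value] from sum_stage
    group by acct_id;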

  • cfradenburg (9/16/2010)


    Generic optimizations are difficult to give since how long something takes will vary depending on the schema.

    Yeah, I know, it's painful, but I was figuring this had been done a billion times and I could just apply it.

    One thing that may come in handy is using a CTE to handle some of this. You may be able to get the sums of the values separately that way and then sum them after that.

    declare @Tmp table (tID INT IDENTITY(1, 1),
        Acct_ID INT,
        Category VARCHAR(20),
        Value1 MONEY,
        Value2 MONEY
    );

    with sum_stage as
        (select acct_id, 'value1' [col], sum(value1) [value] from @Tmp where category in ('a', 'bdq', 'rr')
         group by acct_id
         union all
         select acct_id, 'value2' [col], sum(value2) [value] from @Tmp where category in ('zz', 'xx', 'qrq')
         group by acct_id)
    select acct_id, sum(value) [value] from sum_stage
    group by acct_id;

    If you're taking in an account ID as a parameter then that can be included in the CTE. Or, since you were a little vague on the description, this may end up being too difficult to do in your environment depending on the number of columns.

    Yeah, the problem is I already know I'm scanning the index/table I'll be working from because, as your last comment indicates, I'm not passing any form of filtering other than some date work. This is for an ETL process and I'm working off the original schema, which is vendor controlled. I'm trying to keep it simple because, instead of ending up lost in the details, I was hoping to find a good baseline. The inside of that WITH statement is equivalent to what I'm looking at at the moment, with a wrapper SUM on the outside that's pretty cheap.

    I'm still building out the general first pass against the structure. Once I do I'll post up my code/structures/etc. I do appreciate the initial attempt, however. Thank you.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • Would this work as an easy baseline?

    SELECT
        Acct_ID,
        SUM( CASE WHEN Category IN ( 'a', 'bdq', 'rr') THEN Value1
                  WHEN Category IN ( 'zz', 'xx', 'qrq') THEN Value2
                  ELSE 0
             END) AS Value
    FROM #tmp
    GROUP BY Acct_ID

    It still performs a table scan and the execution plan is identical to your original query.

    All that's changed is syntax (maybe readability...).

    So I'm not sure if it will be of any value for you...:unsure:
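
    If you want to see it on your side rather than take my word for it, running both versions with the IO and time counters switched on (on a test copy) will show whether anything actually differs:

    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    -- run the original derived-table version, then the SUM(CASE...) version,
    -- and compare logical reads / CPU time in the Messages tab

    SET STATISTICS IO OFF;
    SET STATISTICS TIME OFF;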



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question
    How to post performance related questions
    Links for Tally Table, Cross Tabs and Dynamic Cross Tabs, Delimited Split Function

  • Yeah, Lutz, pretty much the same result, but I see what you're driving at. I had originally been tempted not to put in any code at all and just ask for resources, but I knew some folks would get a better idea from a sample. 🙂

    Here's a sample of what I'm trying to optimize (realize it's incomplete and this isn't the final thing I'd want to optimize, but it shows the beginning of the complexity). I believe the problem is I was hoping to find a few ideas towards the principles of multi-field aggregation optimization.

    CREATE TABLE [dbo].[vPortfolioTransaction](
        [PortfolioID] [int] NOT NULL,
        [PortfolioTransactionID] [int] NOT NULL,
        [TradeDate] [datetime] NULL,
        [SequenceNo] [int] NOT NULL,
        [RecID] [tinyint] NOT NULL,
        [TransactionCode] [char](2) NOT NULL,
        [TranCodeLabel] [varchar](50) NULL,
        [Comment] [varchar](70) NULL,
        [SecTypeCode1] [char](2) NULL,
        [SecurityID1] [int] NULL,
        [SettleDate] [datetime] NULL,
        [OriginalCostDate] [datetime] NULL,
        [Quantity] [float] NULL,
        [ClosingMethodCode] [char](1) NULL,
        [SecTypeCode2] [char](2) NULL,
        [SecurityID2] [int] NULL,
        [TradeDateFX] [float] NULL,
        [SettleDateFX] [float] NULL,
        [OriginalFX] [float] NULL,
        [MarkToMarket] [bit] NULL,
        [TradeAmount] [float] NULL,
        [OriginalCost] [float] NULL,
        [WithholdingTax] [float] NULL,
        [ExchangeID] [tinyint] NULL,
        [ExchangeFee] [float] NULL,
        [Commission] [float] NULL,
        [ImpliedCommission] [bit] NULL,
        [OtherFees] [float] NULL,
        [CommissionPurposeID] [tinyint] NOT NULL,
        [IsPledge] [bit] NULL,
        [CustodianID] [int] NULL,
        [IsDestPledge] [bit] NULL,
        [DestCustodianID] [int] NULL,
        [OriginalFace] [float] NULL,
        [YieldOnCost] [float] NULL,
        [DurationOnCost] [float] NULL,
        [TransUserDef1ID] [tinyint] NOT NULL,
        [TransUserDef2ID] [tinyint] NOT NULL,
        [TransUserDef3ID] [tinyint] NOT NULL,
        [TranID] [int] NULL,
        [IPCounter] [varchar](16) NULL,
        [SourceID] [int] NULL,
        [PostDate] [datetime] NULL,
        [LotNumber] [smallint] NULL,
        [ReclaimAmount] [float] NULL,
        [StrategyID] [int] NULL,
        [RecordDate] [datetime] NULL,
        [DivTradeDate] [datetime] NULL,
        [PerfContributionOrWithdrawal] [bit] NULL,
        [VersusDate] [datetime] NULL,
        [FeePeriodDate] [datetime] NULL,
        [BrokerFirmID] [int] NULL,
        [BrokerRepSecurityID] [int] NULL,
        [TradeBlotterLineID] [int] NULL
    ) ON [PRIMARY]

    The initial aggregation code I'm currently working off of (I'm still fleshing out the logic, trying to reduce the number of passes, etc.):

    (Note to self: test even half-done code before posting. Sorry, it's better now...)

    DECLARE @DataUntilDate DATETIME
    SET @DataUntilDate = GETDATE()

    SELECT
        ISNULL( drvPos.pID, drvNeg.pID) AS pID,
        ISNULL( drvPos.sID, drvNeg.sID) AS sID,
        ISNULL( drvPos.qty, 0) - ISNULL( drvNeg.qty, 0) AS qty,
        ISNULL( drvPos.amt, 0) - ISNULL( drvNeg.amt, 0) AS amt
    FROM
        (SELECT
            pID,
            sID,
            ISNULL( SUM( qty), 0) AS qty,
            SUM( amt) AS amt
         FROM
            (SELECT
                portfolioID AS pID,
                securityID1 AS sID,
                SUM( CASE WHEN sectypecode1 IN ( 'ca', 'ex', 'ep') THEN TradeAmount ELSE quantity END) AS qty,
                SUM( TradeAmount) AS amt
             FROM
                vPortfolioTransaction
             WHERE
                transactionCode IN ( 'ti', 'li', 'dp', 'by', 'cs')
                AND tradeDate <= @DataUntilDate
             GROUP BY
                portfolioID,
                securityID1
             UNION ALL
             SELECT
                PortfolioID,
                securityID2,
                SUM( CASE WHEN sectypecode2 IN ( 'ca', 'ex', 'ep') THEN TradeAmount ELSE quantity END),
                SUM( TradeAmount)
             FROM
                vPortfolioTransaction
             WHERE
                transactionCode IN ( 'sl', 'wd', 'in', 'dv')
                AND tradeDate <= @DataUntilDate
             GROUP BY
                portfolioID,
                securityID2
            ) AS drv
         GROUP BY
            pID,
            sID
        ) AS drvPos
    FULL OUTER JOIN
        (SELECT
            pID,
            sID,
            ISNULL( SUM( qty), 0) AS qty,
            SUM( amt) AS amt
         FROM
            (SELECT
                portfolioID AS pID,
                securityID1 AS sID,
                SUM( CASE WHEN sectypecode1 IN ( 'ca', 'ex', 'ep') THEN TradeAmount ELSE quantity END) AS qty,
                SUM( TradeAmount) AS amt
             FROM
                vPortfolioTransaction
             WHERE
                transactionCode IN ( 'to', 'lo', 'wd', 'sl', 'ss')
                AND tradeDate <= @DataUntilDate
             GROUP BY
                portfolioID,
                securityID1
             UNION ALL
             SELECT
                PortfolioID,
                securityID2,
                SUM( CASE WHEN sectypecode2 IN ( 'ca', 'ex', 'ep') THEN TradeAmount ELSE quantity END),
                SUM( TradeAmount)
             FROM
                vPortfolioTransaction
             WHERE
                transactionCode IN ( 'by', 'dp')
                AND tradeDate <= @DataUntilDate
             GROUP BY
                portfolioID,
                securityID2
            ) AS drv
         GROUP BY
            pID,
            sID
        ) AS drvNeg
        ON drvPos.pID = drvNeg.pID
        AND drvPos.sID = drvNeg.sID
    ORDER BY
        ISNULL( drvPos.pID, drvNeg.pID),
        ISNULL( drvPos.sID, drvNeg.sID)

    No, it's not pretty, and I'm sure I'll come up with ways to reduce the number of passes, but this gives you an idea of where I'm going with it. It will end up a lot more complex, which is part of why I was hoping for help with generic principles. :blink:
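
    For what it's worth, the kind of index I'm planning to experiment with under this looks something like the following. It's only a sketch against the vendor schema above, nothing I've settled on or tested yet:

    -- TransactionCode and TradeDate carry the only filters; everything else the
    -- aggregation touches rides along in INCLUDE so the wide base table isn't hit.
    CREATE NONCLUSTERED INDEX IX_vPortfolioTransaction_TranCode_TradeDate
        ON dbo.vPortfolioTransaction (TransactionCode, TradeDate)
        INCLUDE (PortfolioID, SecurityID1, SecurityID2,
                 SecTypeCode1, SecTypeCode2, Quantity, TradeAmount);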


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • What I'd basically do is to move the decision whether to use securityID1 or securityID2 into a CASE statement instead of a UNION to avoid running a table scan/seek more often than needed.

    I'm not sure if the FULL JOIN can be replaced with a CASE statement as well. At the moment it looks like it's possible. It would be great to have some sample data and expected result set to play with.

    DECLARE @DataUntilDate DATETIME
    SET @DataUntilDate = GETDATE()

    SELECT
        ISNULL( drvPos.pID, drvNeg.pID) AS pID,
        ISNULL( drvPos.sID, drvNeg.sID) AS sID,
        ISNULL( drvPos.qty, 0) - ISNULL( drvNeg.qty, 0) AS qty,
        ISNULL( drvPos.amt, 0) - ISNULL( drvNeg.amt, 0) AS amt
    FROM
        (SELECT
            portfolioID AS pID,
            CASE WHEN transactionCode IN ( 'sl', 'wd', 'in', 'dv') THEN securityID2 ELSE securityID1 END AS sID,
            SUM( CASE WHEN sectypecode1 IN ( 'ca', 'ex', 'ep') THEN TradeAmount ELSE quantity END) AS qty,
            SUM( TradeAmount) AS amt
         FROM
            vPortfolioTransaction
         WHERE
            transactionCode IN ( 'ti', 'li', 'dp', 'by', 'sl', 'wd', 'in', 'dv')
            AND tradeDate <= @DataUntilDate
         GROUP BY
            portfolioID,
            CASE WHEN transactionCode IN ( 'sl', 'wd', 'in', 'dv') THEN securityID2 ELSE securityID1 END
        ) AS drvPos
    FULL OUTER JOIN
        (SELECT
            portfolioID AS pID,
            CASE WHEN transactionCode IN ( 'by', 'dp') THEN securityID2 ELSE securityID1 END AS sID,
            SUM( CASE WHEN sectypecode1 IN ( 'ca', 'ex', 'ep') THEN TradeAmount ELSE quantity END) AS qty,
            SUM( TradeAmount) AS amt
         FROM
            vPortfolioTransaction
         WHERE
            transactionCode IN ( 'to', 'lo', 'wd', 'sl', 'by', 'dp')
            AND tradeDate <= @DataUntilDate
         GROUP BY
            portfolioID,
            CASE WHEN transactionCode IN ( 'by', 'dp') THEN securityID2 ELSE securityID1 END
        ) AS drvNeg
        ON drvPos.pID = drvNeg.pID
        AND drvPos.sID = drvNeg.sID
    ORDER BY
        ISNULL( drvPos.pID, drvNeg.pID),
        ISNULL( drvPos.sID, drvNeg.sID)
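
    And to show what I mean about possibly replacing the FULL JOIN as well: something along these lines, one pass per security column with the sign folded into a CASE so the join (and the second pair of aggregates) disappears. Completely untested without sample data, and I've taken the transactionCode lists from your original version, so treat it strictly as a rough sketch:

    SELECT
        pID,
        sID,
        SUM( qty) AS qty,
        SUM( amt) AS amt
    FROM
        (SELECT
            portfolioID AS pID,
            securityID1 AS sID,
            CASE WHEN transactionCode IN ( 'ti', 'li', 'dp', 'by', 'cs') THEN 1 ELSE -1 END
                * CASE WHEN sectypecode1 IN ( 'ca', 'ex', 'ep') THEN TradeAmount ELSE quantity END AS qty,
            CASE WHEN transactionCode IN ( 'ti', 'li', 'dp', 'by', 'cs') THEN 1 ELSE -1 END
                * TradeAmount AS amt
         FROM vPortfolioTransaction
         WHERE transactionCode IN ( 'ti', 'li', 'dp', 'by', 'cs', 'to', 'lo', 'wd', 'sl', 'ss')
            AND tradeDate <= @DataUntilDate
         UNION ALL
         SELECT
            portfolioID,
            securityID2,
            CASE WHEN transactionCode IN ( 'sl', 'wd', 'in', 'dv') THEN 1 ELSE -1 END
                * CASE WHEN sectypecode2 IN ( 'ca', 'ex', 'ep') THEN TradeAmount ELSE quantity END,
            CASE WHEN transactionCode IN ( 'sl', 'wd', 'in', 'dv') THEN 1 ELSE -1 END
                * TradeAmount
         FROM vPortfolioTransaction
         WHERE transactionCode IN ( 'sl', 'wd', 'in', 'dv', 'by', 'dp')
            AND tradeDate <= @DataUntilDate
        ) AS drv
    GROUP BY
        pID,
        sID
    ORDER BY
        pID,
        sID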



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question
    How to post performance related questions
    Links for Tally Table, Cross Tabs and Dynamic Cross Tabs, Delimited Split Function

  • LutzM (9/16/2010)


    I'm not sure if the FULL JOIN can be replaced with a CASE statement as well. At the moment it looks like it's possible. It would be great to have some sample data and expected result set to play with.

    Yeah, that's part of the whole NDA problem at the moment. The results I'm looking for and the sample data are proprietary, and I'd need to set up a whole separate build to create the test data. I'll see what I can come up with on that front, but it's not simple.

    Still going through your code, and I believe you're right... I'll get back to you on that after a few rounds with the data. 🙂

    EDIT: I should have said manually create the test data. I have no way of automating test data that will also give intelligent results.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • I think all that's needed so far are 14 sample rows (one for each transactionCode value). If there are more values than the 14 mentioned so far, you should include a sample of the ones not mentioned yet.

    From my point of view, the easy part is setting up the sample data (a bare-bones sketch of what I mean is below). Coming up with values for all the remaining columns that lead to a result set which can be verified as correct or not is a totally different story... Absolutely understood.

    In any case, you'll have plenty of time: I'm going to take a nap (it's 1am over here) followed by a few hours at the place where I make the money for a living. So it's going to be at least 15 hours before I'll have a chance to get back here.
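
    To illustrate what I mean by "the easy part": a couple of rows along these lines would be enough to get started. Only the columns the query touches get meaningful values, the rest just have to satisfy the NOT NULL constraints, and every value here is made up:

    INSERT INTO dbo.vPortfolioTransaction
        (PortfolioID, PortfolioTransactionID, SequenceNo, RecID, TransactionCode,
         CommissionPurposeID, TransUserDef1ID, TransUserDef2ID, TransUserDef3ID,
         TradeDate, SecTypeCode1, SecurityID1, SecTypeCode2, SecurityID2, Quantity, TradeAmount)
    VALUES (1, 1, 1, 0, 'by', 0, 0, 0, 0, '20100901', 'cs', 101, 'ca', 900, 100, 1500.00);

    INSERT INTO dbo.vPortfolioTransaction
        (PortfolioID, PortfolioTransactionID, SequenceNo, RecID, TransactionCode,
         CommissionPurposeID, TransUserDef1ID, TransUserDef2ID, TransUserDef3ID,
         TradeDate, SecTypeCode1, SecurityID1, SecTypeCode2, SecurityID2, Quantity, TradeAmount)
    VALUES (1, 2, 1, 0, 'sl', 0, 0, 0, 0, '20100902', 'cs', 101, 'ca', 900, 40, 650.00);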



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question
    How to post performance related questions
    Links for Tally Table, Cross Tabs and Dynamic Cross Tabs, Delimited Split Function
