Need help with query tuning

  • Hi folks,

    I'm struggling to come up with ideas of how to improve the performance of the following query:

    SELECT

    datename(mm, dbo.Trade.TradeDate),

    dbo.DataSource.Name,

    datename(yy, dbo.Trade.TradeDate),

    sum(dbo.Trade.SalesValueCcy * SalesAttribution.Percentage),

    SUM(dbo.Trade.SalesCreditCcy * SalesAttribution.Percentage),

    sum(dbo.Trade.VolumeCcy * SalesAttribution.Percentage),

    SUM(dbo.Trade.MarkupCcy * SalesAttribution.Percentage),

    TradingDepartment.Description,

    TradingDesk.Description,

    SalesDepartment.Description,

    SalesDesk.Description,

    TPIndustrySubSector.Description,

    TPReportingResponsibility.Description,

    ImmediateParent.Description,

    TopParent.Description,

    Customer.Description,

    CustReportingCountry.Description,

    CustReportingResponsibility.Description,

    CustRepCustomerIndustry.Description,

    ParentProduct.Description,

    Product.Description

    FROM

    dbo.Trade

    INNER JOIN dbo.TradeMember ON dbo.Trade.TradeID=dbo.TradeMember.TradeID

    INNER JOIN dbo.Member Product ON dbo.TradeMember.ProductID=Product.MemberID

    INNER JOIN dbo.Dimension ParentProduct ON Product.ParentDimensionID=ParentProduct.DimensionID

    INNER JOIN dbo.DataSource ON dbo.DataSource.DataSourceID=dbo.Trade.DataSourceID

    INNER JOIN SalesAttribution ON dbo.Trade.SalesAttributionGroupID=SalesAttribution.SalesAttributionGroupID

    INNER JOIN dbo.Member SalesDesk ON SalesDesk.MemberID=SalesAttribution.SalespersonDeskID

    INNER JOIN dbo.Dimension SalesDepartment ON SalesDepartment.DimensionID=SalesDesk.ParentDimensionID

    INNER JOIN dbo.Member Customer ON Customer.MemberID=dbo.TradeMember.CustomerID

    INNER JOIN dbo.CustomerMemberDetail TradeCustomer ON TradeCustomer.MemberID=Customer.MemberID

    INNER JOIN dbo.Member CustRepCustomerIndustry ON CustRepCustomerIndustry.MemberID=TradeCustomer.ReportingIndustryID

    INNER JOIN dbo.Dimension CustRepCustomerIndustrySubSector ON CustRepCustomerIndustrySubSector.DimensionID=CustRepCustomerIndustry.ParentDimensionID

    INNER JOIN dbo.Member CustReportingCountry ON CustReportingCountry.MemberID=TradeCustomer.ReportingCountryID

    INNER JOIN dbo.Dimension CustReportingResponsibility ON CustReportingResponsibility.DimensionID=CustReportingCountry.ParentDimensionID

    INNER JOIN dbo.Member ImmediateParent ON ImmediateParent.MemberID=TradeCustomer.ImmediateParentID

    INNER JOIN dbo.CustomerMemberDetail ImmediateParentDetails ON ImmediateParentDetails.MemberID=ImmediateParent.MemberID

    INNER JOIN dbo.Member IPRepCustomerIndustry ON IPRepCustomerIndustry.MemberID=ImmediateParentDetails.ReportingIndustryID

    INNER JOIN dbo.Dimension IPRepCustomerIndustrySubSector ON IPRepCustomerIndustrySubSector.DimensionID=IPRepCustomerIndustry.ParentDimensionID

    INNER JOIN dbo.Member IPReportingCountry ON IPReportingCountry.MemberID=ImmediateParentDetails.ReportingCountryID

    INNER JOIN dbo.Dimension IPReportingResponsibility ON IPReportingResponsibility.DimensionID=IPReportingCountry.ParentDimensionID

    INNER JOIN dbo.Member TopParent ON TopParent.MemberID=TradeCustomer.TopParentID

    INNER JOIN dbo.CustomerMemberDetail TopParentDetails ON TopParentDetails.MemberID=TopParent.MemberID

    INNER JOIN dbo.Member TPIndustry ON TPIndustry.MemberID=TopParentDetails.IndustryID

    INNER JOIN dbo.Dimension TPIndustrySubSector ON TPIndustrySubSector.DimensionID=TPIndustry.ParentDimensionID

    INNER JOIN dbo.Member TPReportingCountry ON TPReportingCountry.MemberID=TopParentDetails.ReportingCountryID

    INNER JOIN dbo.Dimension TPReportingResponsibility ON TPReportingResponsibility.DimensionID=TPReportingCountry.ParentDimensionID

    INNER JOIN dbo.Member TradingDesk ON TradingDesk.MemberID=dbo.TradeMember.TradingDeskID

    INNER JOIN dbo.Dimension TradingDepartment ON TradingDepartment.DimensionID=TradingDesk.ParentDimensionID

    WHERE

    (

    ( dbo.Trade.TradeDate >= DATEADD(DD, -1 * DAY(GETDATE()), DATEADD(MM, -1 * MONTH(GETDATE()) + 1, GETDATE())))

    )

    GROUP BY

    datename(mm, dbo.Trade.TradeDate),

    dbo.DataSource.Name,

    datename(yy, dbo.Trade.TradeDate),

    TradingDepartment.Description,

    TradingDesk.Description,

    SalesDepartment.Description,

    SalesDesk.Description,

    TPIndustrySubSector.Description,

    TPReportingResponsibility.Description,

    ImmediateParent.Description,

    TopParent.Description,

    Customer.Description,

    CustReportingCountry.Description,

    CustReportingResponsibility.Description,

    CustRepCustomerIndustry.Description,

    ParentProduct.Description,

    Product.Description

    Here are the tables involved:

    dbo.Member

    CREATE TABLE [dbo].[Member](

    [MemberID] [int] IDENTITY(1,1) NOT NULL,

    [Code] [nvarchar](100) NOT NULL,

    [Description] [nvarchar](200) NOT NULL,

    [ParentDimensionID] [int] NOT NULL,

    [DimensionTypeID] [int] NOT NULL,

    CONSTRAINT [PK__Member] PRIMARY KEY CLUSTERED

    (

    [MemberID] ASC

    )

    dbo.Dimension

    CREATE TABLE [dbo].[Dimension](

    [DimensionID] [int] IDENTITY(1,1) NOT NULL,

    [ParentDimensionID] [int] NULL,

    [Code] [nvarchar](100) NOT NULL,

    [Description] [nvarchar](200) NOT NULL,

    [DimensionLevelID] [int] NOT NULL,

    CONSTRAINT [PK__Dimension] PRIMARY KEY CLUSTERED

    (

    [DimensionID] ASC

    )

    dbo.Trade and dbo.TradeMember

    Two tables that consist of captured trade features: Trade table mainly includes trade values and TradeMember stores all the IDs that are referenced to Member table(things like Product, Customer etc)

    These tables are quite big(several millions rows usually) and wide: Trade table has about 70 columns, TradeMember - 15.

    dbo.SalesAttribution

    Again quite wide table(about 15 columns), but here are the most important ones

    CREATE TABLE [dbo].[SalesAttribution](

    [SalesAttributionID] [int] IDENTITY(1,1) NOT NULL,

    [SalespersonID] [int] NOT NULL,

    [SalespersonDeskID] [int] NOT NULL,

    [Percentage] [numeric](11, 10) NOT NULL,

    [SalesAttributionGroupID] [int] NOT NULL,

    CONSTRAINT [PK_SalesAttributionID] PRIMARY KEY CLUSTERED

    (

    [SalesAttributionID] ASC

    )

    So every row in Trade table is tagged with SalesAttributionGroupID, which can correspond to at least one(could be more) rows in SalesAttribution table.

    dbo.CustomerMemberDetail

    Again really wide table(about 25 columns) that includes more detailed info about Customer. I will not include this table description here to save space, but just to mention that all the columns used in joins are indexed.

    Problem

    When I pull all data from big and most important tables(Trade JOIN TradeMember JOIN SalesAttribution) everything goes fine(takes appropriate amount of time), but when I start adding Member hierarchical info on top(joins with Member and Dimension tables, and therefore more grouping) this is when it all starts to go nuts.

    Here are IO stats:

    Table 'DataSource'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Dimension'. Scan count 0, logical reads 5580024, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Member'. Scan count 1860008, logical reads 11160048, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'SalesAttribution'. Scan count 465002, logical reads 930592, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'CustomerMemberDetail'. Scan count 0, logical reads 2790012, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'TradeMember'. Scan count 0, logical reads 1395006, physical reads 1029, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Trade'. Scan count 3, logical reads 204444, physical reads 335, read-ahead reads 169197, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    I am concerned about stats on Member and Dimension table(too many scans and logical reads). It's interesting that Clustered index seek is being used on Dimension table(I guess this is why scans = 0) and NonClustered on Member. I did try to play a bit with those, but no luck.

    The problem with the query in general is that I can't implement covering indexes on Trade and TradeMember tables as this is Business Objects generated query and the fields are picked by user(so can be different every time). And in this case the index on TradeDate would not be selective enough anyway. And as I said before when I drop joins with Member and Dimension tables( and therefore grouping), everything works fine.

    Please find the actual execution plan attached. I didn't include the indexes existing on used table, as then the post would be too long(it already is, sorry), but if needed I can provide that info.

    Can I do anything to get rid of those scans and reduce the reads on Member and Dimension tables? Or do anything else to speed up the query? Any help or ideas will be highly appreciated.

    Regards,

    Igor

  • Hmm, odd.

    Im interested in this part of the plan

    Which is pretty much the very first join between datasource and trade.

    The estimate from the hash join is 1 , the actual row count is 465,002 !

    Not much of a difference.

    2005 is getting a bit old now as is my memory of its differences over 2008/2012.

    You may be better of 'caching' the result of this join into a temp table and driving the rest from that.

    Edit:

    Just to elaborate on why this is bad :

    As it thinks that there is 1 row, it has chosen to do loop joins to the other tables in the plan.

    This would be good if there were 1 row but theres not, theres 465,000.

    If it knew there were 465,000 it would have chosen a hash (or merge) join which will be more efficient

    Dave



    Clear Sky SQL
    My Blog[/url]

  • Hello Dave,

    Thank you very much for reply.

    I was wondering about the fact that you noticed. I tried to execute a similar query in our different client's TEST environment

    Here is the code(just to emphasize that the queries are similar)

    SELECT

    datename(mm, dbo.Trade.TradeDate),

    dbo.DataSource.Name,

    datename(yy, dbo.Trade.TradeDate),

    sum(dbo.Trade.TradeValueCcy),

    SUM(dbo.Trade.SalesCreditCcy),

    sum(dbo.Trade.TradeVolumeCcy),

    SUM(dbo.Trade.MarginValue),

    TradingDepartment.Description,

    TradingDesk.Description,

    SalesDepartment.Description,

    SalesDesk.Description,

    Customer.Description,

    ParentProduct.Description,

    Product.Description

    FROM

    dbo.Trade

    INNER JOIN dbo.TradeMember ON dbo.Trade.TradeID=dbo.TradeMember.TradeID

    INNER JOIN dbo.Member Product ON dbo.TradeMember.ProductID=Product.MemberID

    INNER JOIN dbo.Dimension ParentProduct ON Product.ParentDimensionID=ParentProduct.DimensionID

    INNER JOIN dbo.Member SalesDesk ON SalesDesk.MemberID=TradeMember.SalesDeskID

    INNER JOIN dbo.Dimension SalesDepartment ON SalesDepartment.DimensionID=SalesDesk.ParentDimensionID

    INNER JOIN dbo.Member Customer ON Customer.MemberID=dbo.TradeMember.CustomerID

    INNER JOIN dbo.CustomerMemberDetail TradeCustomer ON TradeCustomer.MemberID=Customer.MemberID

    INNER JOIN dbo.Member TradingDesk ON TradingDesk.MemberID=dbo.TradeMember.TradingDeskID

    INNER JOIN dbo.Dimension TradingDepartment ON TradingDepartment.DimensionID=TradingDesk.ParentDimensionID

    INNER JOIN dbo.DataSource ON dbo.DataSource.DataSourceID=dbo.Trade.DataSourceID

    WHERE

    (

    ( dbo.Trade.TradeDate >= DATEADD(DD, -1 * DAY(GETDATE()), DATEADD(MM, -1 * MONTH(GETDATE()) + 1, GETDATE())))

    )

    GROUP BY

    datename(mm, dbo.Trade.TradeDate),

    dbo.DataSource.Name,

    datename(yy, dbo.Trade.TradeDate),

    TradingDepartment.Description,

    TradingDesk.Description,

    SalesDepartment.Description,

    SalesDesk.Description,

    Customer.Description,

    ParentProduct.Description,

    Product.Description

    And the execution plan proves that your idea might be right. Please find it attached.

    Here are the IO stats:

    Table 'DataSource'. Scan count 1, logical reads 2, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Dimension'. Scan count 9, logical reads 5694, physical reads 4, read-ahead reads 1715, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Member'. Scan count 12, logical reads 11492, physical reads 4, read-ahead reads 2620, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'TradeMember'. Scan count 3, logical reads 224456, physical reads 398, read-ahead reads 204206, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Trade'. Scan count 3, logical reads 2451785, physical reads 485, read-ahead reads 2176232, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    I appreciate that the databases and queries are different and the performance difference can be caused by that.

    Dave Ballantyne (12/20/2012)


    Just to elaborate on why this is bad :

    As it thinks that there is 1 row, it has chosen to do loop joins to the other tables in the plan.

    This would be good if there were 1 row but theres not, theres 465,000.

    If it knew there were 465,000 it would have chosen a hash (or merge) join which will be more efficient

    Sorry for probably stupid question, but how to let SQL engine know that there are 465,000 rows? I rebuilt all the indexes(thought it might help), but had no luck with it.

    And one not related question: what program do you use to view SQL executions plans? Really liked number of estimated rows displayed next to the edge.

    Thanks a lot once again.

    Small update: the query I'm working on is generated by Business objects(reporting tool), so can't really use a temporary table or modify SQL(this is not 100 correct, I can change joins in BOBJ designer, which affects the way it generates SQL)

    Igor

  • I would be interested in seeing the plan for this ..

    Select TradeID,DataSource.DataSourceID

    FROM

    dbo.Trade

    INNER JOIN dbo.DataSource ON dbo.DataSource.DataSourceID=dbo.Trade.DataSourceID

    WHERE

    (

    ( dbo.Trade.TradeDate >= DATEADD(DD, -1 * DAY(GETDATE()), DATEADD(MM, -1 * MONTH(GETDATE()) + 1, GETDATE())))

    )

    As the orignal plan timed-out this *could* come back with a decent plan.

    Sorry for probably stupid question, but how to let SQL engine know that there are 465,000 rows? I rebuilt all the indexes(thought it might help), but had no luck with it.

    Thats the problem, both of the tables here have a decent estimate of rows but its the join output that doesent.

    This could be a bug/hole in the optimizer the above will help prove/disprove that.

    Im anycase it still sounds to me a better option to use the above query , store the results in a temp table and drive the remainder from that.

    HTH

    The plan view i use is sql sentry plan explorerer , its a free download too 🙂

    Dave



    Clear Sky SQL
    My Blog[/url]

  • It looks like cardinality estimation error. Please update statistics WITH FULLSCAN on DataSource and Trade tables.

    Also in the original query you can try to replace

    INNER JOIN dbo.DataSource ON dbo.DataSource.DataSourceID=dbo.Trade.DataSourceID

    with

    LEFT JOIN dbo.DataSource ON dbo.DataSource.DataSourceID=dbo.Trade.DataSourceID


    Alex Suprun

  • Thank you both very much.

    Updating statistics on Trade and DataSource tables resolved the issue.

    Regards,

    Igor

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply