Help with tuning stored proc

  • Hi all

    I have a proc here where the estimated number of rows is around 2 trillion, while the actual number of rows is 300. I have updated statistics on the relevant tables to no avail; there seems to be a hash match at the point where the row estimates go miles out of sync.

    Does anyone have any suggestions? Currently there are no indexes on the temp tables; perhaps that would help, but I am unsure.

    I have attached the plan and the code is below.

    Thanks for any input

    DECLARE
        @paramCompany varchar(3),
        @paramStartDate datetime,
        @paramEndDate datetime,
        @paramSalesOffice varchar(2000),
        @paramSalesResponsible varchar(max),
        @paramQwerty varchar(2)

    SET @paramCompany = @Company
    SET @paramStartDate = @StartDate
    SET @paramEndDate = @EndDate
    SET @paramSalesOffice = @SalesOffice
    SET @paramSalesResponsible = @SalesResponsible
    SET @paramQwerty = @qwerty-2

    SELECT
        svcitem.DESPATCHPOINT,
        svcitem.DATAAREAID,
        svcitem.ITEMID
    INTO #d1
    FROM [BIDATAWSQL].DynamicsV5Realtime.dbo.MARDESPATCHSERVICEITEMS svcitem
    INNER JOIN [BIDATAWSQL].DynamicsV5Realtime.dbo.marreportenums enum
        ON svcitem.MARSALESCHARGEITEMTYPE = enum.ENUMVALUEINT
    WHERE enum.ENUMNAME = 'MARSalesChargeItemType'
        and enum.ENUMVALUESTR = 'Excess Haulage'
        and svcitem.DATAAREAID = @paramCompany

    -- to get Original Sales Orders that have had an RTW order created within the date parameters
    SELECT DISTINCT
        st.MARIMSSALESID,
        st.DATAAREAID
    INTO #d2
    FROM [BIDATAWSQL].DynamicsV5Realtime.dbo.SALESTABLE st -- RTW Order
    WHERE st.DATAAREAID = @paramCompany
        and st.SALESTYPE = 4 -- Returned Order
        and st.CUSTACCOUNT not like ('K' + '%')
        and st.CUSTACCOUNT not like ('E' + '%')
        and (CAST(FLOOR(CAST(st.CREATEDDATETIME AS FLOAT)) AS DATETIME) between @paramStartDate and @paramEndDate)

    SELECT
        st.MARIMSSALESID as 'OriginalOrder',
        st2.MARTRANSPORTGROUP as 'DespatchPoint',
        st2.MARVEHICLETYPE as 'OriginalOrderVehicleType',
        origwt.OriginalOrderWeight,
        isnull(servchrg.OriginalServiceCharges, 0) as 'OriginalServiceCharges',
        CASE WHEN origwt.OriginalOrderWeight = ABS(SUM((sl.QTYORDERED * it.NETWEIGHT) / 1000))
             THEN isnull(servchrg.OriginalServiceCharges, 0) ELSE 0 END as 'ReturnableServiceCharges',
        st.SALESID as 'RTWOrder',
        st.MARSALESOFFICE as 'RTWSalesOffice',
        st.MARVEHICLETYPE as 'RTWOrderVehicleType',
        pm.name as 'RTWSalesResponsible',
        st.CREATEDDATETIME as 'RTWOrderCreated',
        st.DELIVERYCITY + ' - ' + st.DELIVERYZIPCODE as 'SiteTown',
        dbo.udf_get_PricingArea(st.DELIVERYZIPCODE, #d1.ITEMID) as 'PricingArea',
        st.MARTRANSPORTGROUP as 'Returnto',
        ABS(SUM((sl.QTYORDERED * it.NETWEIGHT) / 1000)) as 'RTWWeight', -- in tons
        ho.SALESID as 'HaulageOrder',
        sl1.ITEMID as 'Item',
        sl1.SALESQTY as 'Qty',
        sl1.LINEAMOUNT as 'HaulageCharged',
        CASE WHEN st.MARVEHICLETYPE = '0' or st2.MARVEHICLETYPE = '0' THEN 1 ELSE 2 END as 'haulcalcfactor',
        coalesce(dbo.udf_get_YardPrice(st.DELIVERYZIPCODE, #d1.ITEMID), dbo.udf_get_SitePrice(st.DELIVERYZIPCODE, #d1.ITEMID)) as 'ListPrice',
        coalesce(dbo.udf_get_YardPrice(st.DELIVERYZIPCODE, #d1.ITEMID), dbo.udf_get_SitePrice(st.DELIVERYZIPCODE, #d1.ITEMID))
            * ABS(SUM((sl.QTYORDERED * it.NETWEIGHT) / 1000))
            * (CASE WHEN st.MARVEHICLETYPE = '0' or st2.MARVEHICLETYPE = '0' THEN 1 ELSE 2 END) as 'ExpectedHaulage', -- ListPrice * RTWWeight * 2 or 1 now - see CASE statement
        #d1.ITEMID as 'ExcessHaulageItem',
        pm.emplid as 'EmplId',
        IsNull(pm.email, 'gill.wilton.@marshalls.co.uk') as 'SalesResponsibleEmail'
    INTO #d3
    FROM [BIDATAWSQL].DynamicsV5Realtime.dbo.SALESTABLE st -- RTW Order
    LEFT OUTER JOIN udf_R000_EmployeeList() pm
        ON st.SALESRESPONSIBLE = pm.emplid
    INNER JOIN [BIDATAWSQL].DynamicsV5Realtime.dbo.SALESLINE sl
        ON sl.SALESID = st.SALESID
        and sl.DATAAREAID = st.DATAAREAID
    INNER JOIN #d2
        ON #d2.MARIMSSALESID = st.MARIMSSALESID
        and #d2.DATAAREAID = st.DATAAREAID
    INNER JOIN [BIDATAWSQL].DynamicsV5Realtime.dbo.INVENTTABLE it
        ON it.ITEMID = sl.ITEMID
        and it.DATAAREAID = sl.DATAAREAID
    LEFT OUTER JOIN [BIDATAWSQL].DynamicsV5Realtime.dbo.INVENTDIM id
        ON id.INVENTDIMID = sl.INVENTDIMID
        and id.DATAAREAID = sl.DATAAREAID
    INNER JOIN
        (SELECT
            st.SALESID,
            st.MARIMSSALESID,
            st.DATAAREAID,
            st.CREATEDDATETIME
         FROM [BIDATAWSQL].DynamicsV5Realtime.dbo.SALESTABLE st
         INNER JOIN [BIDATAWSQL].DynamicsV5Realtime.dbo.SALESLINE sl
            ON sl.SALESID = st.SALESID
            and sl.DATAAREAID = st.DATAAREAID
         WHERE st.DATAAREAID = @paramCompany
            and st.CUSTACCOUNT not like ('K' + '%')
            and st.CUSTACCOUNT not like ('E' + '%')
            and sl.ITEMID = 'JZEX111RTW0'
        ) ho -- Haulage Order
        ON ho.MARIMSSALESID = st.MARIMSSALESID
        and ho.DATAAREAID = st.DATAAREAID
        and ho.SALESID <> st.SALESID
    INNER JOIN [BIDATAWSQL].DynamicsV5Realtime.dbo.SALESLINE sl1
        ON sl1.SALESID = ho.SALESID
        and sl1.DATAAREAID = ho.DATAAREAID
        and sl1.ITEMID = 'JZEX111RTW0'
    INNER JOIN [BIDATAWSQL].DynamicsV5Realtime.dbo.SALESTABLE st1
        ON st1.SALESID = sl1.SALESID
        and st1.DATAAREAID = sl1.DATAAREAID
    LEFT OUTER JOIN [BIDATAWSQL].DynamicsV5Realtime.dbo.SALESTABLE st2
        ON st2.SALESID = st.MARIMSSALESID
        and st2.DATAAREAID = st.DATAAREAID
    -- to get total value of original order service charges
    LEFT OUTER JOIN
        (SELECT
            sl.SALESID,
            sl.DATAAREAID,
            SUM(sl.SALESQTY * sl.SALESPRICE) as 'OriginalServiceCharges'
         FROM [BIDATAWSQL].DynamicsV5Realtime.dbo.SALESLINE sl
         WHERE sl.ITEMID like ('JZEX' + '%')
         GROUP BY sl.SALESID, sl.DATAAREAID) servchrg
        ON servchrg.SALESID = st2.SALESID
        and servchrg.DATAAREAID = st2.DATAAREAID
    -- to get net weight of original order items sent
    LEFT OUTER JOIN
        (SELECT
            sl.SALESID,
            sl.DATAAREAID,
            ABS(SUM(sl.SALESQTY * it.NETWEIGHT / 1000)) as 'OriginalOrderWeight'
         FROM [BIDATAWSQL].DynamicsV5Realtime.dbo.SALESLINE sl
         INNER JOIN [BIDATAWSQL].DynamicsV5Realtime.dbo.INVENTTABLE it
            ON it.ITEMID = sl.ITEMID
            and it.DATAAREAID = sl.DATAAREAID
         WHERE sl.ITEMID not like ('JZEX' + '%')
         GROUP BY sl.SALESID, sl.DATAAREAID) origwt
        ON origwt.SALESID = st2.SALESID
        and origwt.DATAAREAID = st2.DATAAREAID
    LEFT OUTER JOIN #d1
        ON #d1.DESPATCHPOINT = st2.MARTRANSPORTGROUP
        and #d1.DATAAREAID = st2.DATAAREAID
    WHERE st.DATAAREAID = @paramCompany
        and st.SALESTYPE = 4 -- Returned Order
        and st.CUSTACCOUNT not like ('K' + '%')
        and st.CUSTACCOUNT not like ('E' + '%')
        and st1.MARVEHICLETYPE = '100' -- DUMMY T.M.G. (ADMINISTRATION)
        --and st1.MARFOEORDERTYPE = 0 -- commented out for Gill to test and advise ???
        and (st.MARSALESOFFICE in (select * from udf_MultiValueParameterHandlingString(@paramSalesOffice)) or @paramSalesOffice = 'All')
        and (pm.emplid in (select * from udf_MultiValueParameterHandlingString(@paramSalesResponsible)) or @paramSalesResponsible = 'All')
    GROUP BY
        st.SALESID,
        st.MARIMSSALESID,
        st2.MARTRANSPORTGROUP,
        st.DELIVERYCITY + ' - ' + st.DELIVERYZIPCODE,
        st.MARTRANSPORTGROUP,
        ho.SALESID,
        sl1.ITEMID,
        sl1.SALESQTY,
        sl1.LINEAMOUNT,
        st.DELIVERYZIPCODE,
        #d1.ITEMID,
        st.CREATEDDATETIME,
        servchrg.OriginalServiceCharges,
        origwt.OriginalOrderWeight,
        st.MARVEHICLETYPE,
        st2.MARVEHICLETYPE,
        pm.name,
        st.MARSALESOFFICE,
        pm.emplid,
        IsNull(pm.email, 'gill.wilton.@marshalls.co.uk')

    SELECT
        #d3.OriginalOrder,
        #d3.DespatchPoint,
        #d3.OriginalOrderVehicleType,
        #d3.OriginalOrderWeight,
        #d3.OriginalServiceCharges,
        #d3.ReturnableServiceCharges,
        #d3.RTWOrder,
        #d3.RTWSalesOffice,
        #d3.RTWOrderVehicleType,
        #d3.RTWSalesResponsible,
        #d3.RTWOrderCreated,
        #d3.SiteTown,
        #d3.PricingArea,
        #d3.Returnto,
        #d3.RTWWeight,
        #d3.HaulageOrder,
        #d3.Item,
        #d3.Qty,
        #d3.HaulageCharged,
        #d3.ListPrice,
        #d3.ExpectedHaulage,
        #d3.ExcessHaulageItem,
        #d3.HaulageCharged - #d3.ExpectedHaulage as 'OldValDiff',
        (#d3.HaulageCharged - #d3.ExpectedHaulage) / (#d3.ExpectedHaulage) * 100 as 'OldPercentDiff',
        #d3.HaulageCharged - #d3.ExpectedHaulage - #d3.ReturnableServiceCharges as 'ValDiff',
        (#d3.HaulageCharged - #d3.ExpectedHaulage - #d3.ReturnableServiceCharges) / (#d3.ExpectedHaulage + #d3.ReturnableServiceCharges) * 100 as 'PercentDiff',
        #d3.haulcalcfactor,
        #d3.EmplId,
        #d3.SalesResponsibleEmail
    INTO #d4
    FROM #d3
    WHERE (#d3.HaulageCharged - #d3.ExpectedHaulage - #d3.ReturnableServiceCharges) / (#d3.ExpectedHaulage + #d3.ReturnableServiceCharges) * 100 <= -1 --< -0.05

    IF @paramQwerty = 'D' SELECT * FROM #d4

    IF @paramQwerty = 'S' SELECT DISTINCT #d4.EmplId, #d4.SalesResponsibleEmail FROM #d4 ORDER BY 1

    DROP TABLE #d1
    DROP TABLE #d2
    DROP TABLE #d3
    DROP TABLE #d4
    GO

  • " ... Thanks for any input ..."

    I hate "Order by 1" type code. Why not put the column name so it's clear?

    And no comments and meaningless temp table names... Very frustrating for anyone else who has to work on it.

  • Yes, I am aware it's a bit of a nightmare; this is what happens when BAs write code. I am currently tearing my hair out rewriting it.

  • It looks like the query is accessing tables on a different server.

    Unlike the local server, distribution statistics from the remote server will not be returned if the user does not have permission on the remote server (e.g. sysadmin, db_owner) to retrieve the statistics distribution results; the optimizer will fall back on default assumptions instead.
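    If that turns out to be the cause, one possible fix (a sketch, assuming you are allowed to change permissions on [BIDATAWSQL], and with [LinkedLogin] as a placeholder for whatever login the linked server actually maps to) is to add the linked-server login to a role that may read statistics:

    ```sql
    -- Run on the remote server [BIDATAWSQL], in the DynamicsV5Realtime database.
    -- [LinkedLogin] is a placeholder for the login the linked server uses.
    USE DynamicsV5Realtime;
    GO
    -- db_ddladmin (or db_owner / sysadmin) is sufficient for the local
    -- optimizer to retrieve distribution statistics over the linked server.
    ALTER ROLE db_ddladmin ADD MEMBER [LinkedLogin];
    GO
    ```

    Granting the minimum role that exposes statistics is preferable to db_owner if your DBA will allow it.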

  • An index on #d2 supporting this join ([#d2].MARIMSSALESID) = ([DynamicsV5Realtime].[dbo].[SALESTABLE].MARIMSSALESID) would reduce the exaggerated estimated row counts, but that's beside the point. This lot is unlikely to run well unless the whole caboodle is run remotely. I'd recommend you create a reporting db on server [BIDATAWSQL] and package these queries into a stored procedure in it. Not only would it perform far better than it does now, it would be a heck of a lot easier to perform the remaining tuning. You can only go so far with it as it is.
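    Something along these lines, run right after #d2 is populated, would give the optimizer an access path and, more usefully, statistics on the temp table. The column choice is based on the join predicates in the proc, so treat it as a sketch:

    ```sql
    -- Supports the join:
    --   #d2.MARIMSSALESID = st.MARIMSSALESID AND #d2.DATAAREAID = st.DATAAREAID
    CREATE CLUSTERED INDEX IX_d2_MARIMSSALESID
        ON #d2 (MARIMSSALESID, DATAAREAID);
    ```

    A similar index on #d1 (DESPATCHPOINT, DATAAREAID) may help the later LEFT JOIN for the same reason.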

    If your udf's, particularly udf_R000_EmployeeList, aren't already configured as inline table-valued functions, then consider recoding them.
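    For example, if udf_MultiValueParameterHandlingString is currently a multi-statement TVF, an inline rewrite keeps the optimizer's estimates sane. This is a guess at its behaviour (a simple comma-delimited splitter), so the body below is illustrative, not a drop-in replacement:

    ```sql
    -- Hypothetical inline rewrite of a multi-statement string splitter.
    -- Assumes a comma-delimited input such as 'A,B,C'; limited to lists
    -- of up to ~2047 characters by the spt_values number range.
    CREATE FUNCTION dbo.udf_MultiValueParameterHandlingString_Inline
        (@list varchar(max))
    RETURNS TABLE
    AS
    RETURN
        SELECT LTRIM(RTRIM(
                   SUBSTRING(@list, n.Number,
                       CHARINDEX(',', @list + ',', n.Number) - n.Number))) AS Value
        FROM   master.dbo.spt_values n
        WHERE  n.type = 'P'
          AND  n.Number BETWEEN 1 AND LEN(@list)
          AND  SUBSTRING(',' + @list, n.Number, 1) = ',';
    ```

    A dedicated tally table is a better numbers source than spt_values in production, but the shape of the rewrite is the same.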

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • You can put an index on the columns which are used in your WHERE and ON clauses.

    _______________________________________________________________
    To get quick answer follow this link:
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

  • ChrisM@Work (4/9/2013)


    An index on #d2 supporting this join ([#d2].MARIMSSALESID) = ([DynamicsV5Realtime].[dbo].[SALESTABLE].MARIMSSALESID) would reduce the exaggerated estimated row counts, but that's besides the point. This lot is unlikely to run well unless the whole caboodle is run remotely. I'd recommend you create a reporting db on server [BIDATAWSQL] and package these queries into a stored procedure in it. Not only would it perform far better than it does now, it would be a heck of a lot easier to perform the remaining tuning. You can only go so far with it as it is.

    If your udf's, particularly udf_R000_EmployeeList, aren't already configured as inline table-valued functions, then consider recoding them.

    Hi all, and thanks for the input. I will check out ChrisM's suggestion in an attempt to fix the hash match issue. As for your suggestion of creating a reporting DB, that is actually what we have; however, the tester was pointing from a test box to a live server. He was doing this in SSRS, and the code was failing validation when executing, which I believe was caused by the large number of estimated rows.

  • This article by Benjamin Nevarez confirms what @sharky pointed out - if the user isn't the owner of the table or doesn't belong to the sysadmin, db_owner, or db_ddladmin roles on the remote server, the optimizer on the local server cannot get statistics for that data on the remote server. Lack of statistics almost always results in a suboptimal query plan.

    As @ChrisM@Work suggested, this query likely will perform a lot better if all the work to get the data from the remote server happens on the remote server. When a query references only tables on a remote server, SQL Server is smart enough to package the whole thing up and send it to the remote server for execution there. When a query joins local tables and remote tables, though, the optimizer has to decide whether it's better to request the data from the remote server and perform the join locally or to send the local data to the remote server to execute the join there and return the results. Based on the execution plan you provided, it looks like SQL Server is requesting the data from the remote machine and performing joins locally because of the joins to #d1 and #d2 in the portion of your proc that populates #d3. Without good statistics on the data from the remote server, the optimizer can't make that choice well.
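    One way to test that theory without restructuring the whole proc is to force a purely remote portion to execute entirely on [BIDATAWSQL] with OPENQUERY. A sketch, using the #d2 query as the example (the query text is illustrative, and note that OPENQUERY takes only a literal string, so in the real proc the parameters would have to be inlined via a dynamically built string and sp_executesql):

    ```sql
    -- Runs wholly on the linked server; only the small result set comes back.
    -- DATAAREAID and date filters omitted here because OPENQUERY cannot
    -- reference local variables directly.
    SELECT *
    INTO   #d2
    FROM   OPENQUERY([BIDATAWSQL],
           'SELECT DISTINCT st.MARIMSSALESID, st.DATAAREAID
            FROM   DynamicsV5Realtime.dbo.SALESTABLE st
            WHERE  st.SALESTYPE = 4
              AND  st.CUSTACCOUNT NOT LIKE ''K%''
              AND  st.CUSTACCOUNT NOT LIKE ''E%''');
    ```

    If the plan improves markedly, that's good evidence the remote/local join placement is the real problem.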

    One thing you didn't mention in your post, though, is how you perceive that the erroneous estimates are affecting the actual execution of the proc. Usually, execution plans tank when the actual row counts exceed the estimated row counts, because the resources allocated and operations chosen for the execution are inadequate to deal with the larger actual row counts. In your case, the actual row counts are much, much smaller than the estimates - hundreds or thousands rather than billions. I wouldn't expect this to drag down the performance of this query too much - it may run slower than necessary, but maybe on the order of a few seconds rather than the hours and hours that can result from underestimated row counts. I might expect to see the performance of other queries on the server suffer, though, because this query will have much-larger-than-necessary memory grants and other resources allocated to it. Is this thing running unacceptably slowly, or are you just trying to understand the discrepancy between the estimated and actual row counts?

    Jason Wolfkill

  • Yes, that's right, I was really just trying to understand the difference between the estimated and actual number of rows, and the explanations you have given go some way toward doing that.

    In fact, the reason I was looking at this at all was because in SSRS the code validation fails on a subscription, yet the subscription runs fine. I am currently looking into the use of temp tables when validating code this way, as I believe the issue may lie there.

    Thanks

  • Kwisatz78 (4/10/2013)


    Yes, that's right, I was really just trying to understand the difference between the estimated and actual number of rows, and the explanations you have given go some way toward doing that.

    In fact, the reason I was looking at this at all was because in SSRS the code validation fails on a subscription, yet the subscription runs fine. I am currently looking into the use of temp tables when validating code this way, as I believe the issue may lie there.

    Thanks

    Do you mean the T-SQL generates a syntax error when parsed or that a run-time error occurs on execution when SSRS calls the stored proc? Using temp tables in the stored proc shouldn't be a problem. The first thing I always check in this situation is whether the credentials passed to the remote server have the appropriate permissions there. Linked server configurations default to using the security credentials of the current local login, but you can map a local login to a different set of credentials to use on the remote server. What credentials is your SSRS report running with, what credentials does the linked server connection pass to the remote server, and are the necessary permissions present in both places?
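    To see how the linked server currently maps logins, and to map one explicitly, something along these lines would work (the remote user name and password below are placeholders, not values from this thread):

    ```sql
    -- Inspect existing linked servers and their login mappings
    EXEC sp_linkedservers;

    SELECT s.name, ll.local_principal_id, ll.remote_name
    FROM   sys.servers s
    JOIN   sys.linked_logins ll ON ll.server_id = s.server_id
    WHERE  s.is_linked = 1;

    -- Map all local logins to one specific remote SQL login
    -- ('ReportReader' / password are placeholders)
    EXEC sp_addlinkedsrvlogin
         @rmtsrvname  = 'BIDATAWSQL',
         @useself     = 'false',
         @locallogin  = NULL,
         @rmtuser     = 'ReportReader',
         @rmtpassword = '********';
    ```

    Whatever remote login you map to needs the permissions discussed above to return distribution statistics, or you'll be back to default estimates.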

    Jason Wolfkill
