Need to Optimize Update Query

  • I am implementing Type 1 SCD - Update changed attributes and disregard older attributes.

    Load new data to DB-->Populate the STG table(Derive a flag based on data) --> update the dbo.customerdetails table if the values are different for an existing customer

    The update is taking for around 150 minutes to process 300,000 records to dbo.customerdetails table. i did tried disabling the indexes but it didnt help. Also i have attached the execution plan..

    FYI. The dbo.customerdetails tables has around 3 million records and the processing does happen in batches.

    UPDATE cd

    SET cd.DetailID = T1.DetailID ,

    cd.DetailValue = T1.DetailValue ,

    cd.DetailTag = T1.DetailTag ,

    cd.Batch = T1.Batch ,

    cd.updateddate = 'Jun 16 2014 10:26AM'

    FROM STG.CustomerDetails T1

    INNER JOIN dbo.Customer C ON T1.RowID = c.RowID

    AND T1.TypeID = c.TypeID

    INNER JOIN dbo.CustomerDetails CD ON c.CustomerID = cd.CustomerID

    AND ( cd.DetailID = T1.DetailID )

    AND cd.DateTo IS NULL

    WHERE T1.Batch = 11

    AND ( ISNULL(cd.DetailID, 0) != ISNULL(T1.DetailID, 0)

    OR ISNULL(cd.DetailValue, 0) != ISNULL(T1.DetailValue, 0)

    OR ISNULL(cd.DetailTag, '') != ISNULL(T1.DetailTag, '')

    )

    Thanks,
    Chinna

    Its the Journey which gives you Happiness not the Destination-- Dan Millman

  • Are you updating a view?

    There is a stream aggregate which suggests some grouping going on somewhere. You'll need to attach the actual plan as a *.sqlplan to properly interpret it.

  • There is a view which is built on dbo.customerdetail table.. Also attached the actual plan.

    Thanks,
    Chinna

    Its the Journey which gives you Happiness not the Destination-- Dan Millman

  • Your query has an inner join between tables STG.personattribute (stg) and GDW.personattribute (tar)

    which includes this predicate: Tar.AttributeID = stg.AttributeID.

    The WHERE clause includes this predicate: ISNULL(Tar.AttributeID, 0) != ISNULL(stg.AttributeID, 0),

    which fortunately is ORred or you wouldn't get any results at all. However, it's sufficient "red flag dodgy coding"

    to put the query back into the test stream rather than the optimisation stream.

    Other points to note:

    The estimated plan is of limited use for optimisation studies.

    Using functions - including ISNULL - on columns in joins or the WHERE clause is likely to impact performance.

    Table GDW.Person is joined to both tar and stg. Without the rowcounts available from an "actual" execution plan

    it's difficult to tell, but this could be a significant bottleneck.

    The plan includes a key lookup to fetch AttributeValue and AttributeTag from tar. Consider adding these two columns

    to the INCLUDE column clause of index UQ__PersonAttribute__4F1CD4FC.

    The table which is the update target should ideally be the first table in the FROM list, like this:

    UPDATE tar

    SET AttributeID = stg.AttributeID ,

    AttributeValue = stg.AttributeValue ,

    AttributeTag = stg.AttributeTag ,

    BatchFileID = stg.BatchFileID ,

    DateUpdated = 'Jun 16 2014 10:26AM'

    FROM GDW.personattribute tar

    INNER JOIN STG.personattribute stg

    ON tar.AttributeID = stg.AttributeID

    AND tar.DateTo IS NULL

    INNER JOIN GDW.Person p

    ON p.SourceRowID = stg.SourceRowID

    AND p.SourceID = stg.SourceID

    and p.PersonID = tar.PersonID

    WHERE stg.BatchFileID = 5502

    AND ( ISNULL(Tar.AttributeID, 0) != ISNULL(stg.AttributeID, 0)

    OR ISNULL(Tar.AttributeValue, 0) != ISNULL(stg.AttributeValue, 0)

    OR ISNULL(Tar.AttributeTag, '') != ISNULL(stg.AttributeTag, '')

    )

    Try optimising this query in two phases, beginning with the SELECT equivalent of the UPDATE:

    SELECT

    tar.AttributeID, stg.AttributeID ,

    tar.AttributeValue, stg.AttributeValue ,

    tar.AttributeTag, stg.AttributeTag ,

    tar.BatchFileID, stg.BatchFileID ,

    tar.DateUpdated

    FROM GDW.personattribute tar

    INNER JOIN STG.personattribute stg

    ON tar.AttributeID = stg.AttributeID

    AND tar.DateTo IS NULL

    INNER JOIN GDW.Person p

    ON p.SourceRowID = stg.SourceRowID

    AND p.SourceID = stg.SourceID

    and p.PersonID = tar.PersonID

    WHERE stg.BatchFileID = 5502

    AND (

    ISNULL(Tar.AttributeID, 0) != ISNULL(stg.AttributeID, 0) OR

    ISNULL(Tar.AttributeValue, 0) != ISNULL(stg.AttributeValue, 0) OR

    ISNULL(Tar.AttributeTag, '') != ISNULL(stg.AttributeTag, '')

    )

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply