Query optimisation

  • Hi all

     

    I've got a query that has to update a table containing several million records.

    I've attached the create statements for the table and the stored procedure in question.

     

    Because of the nature of the query, we have to run it against every record (which I think is the bit that has slowed everything down).

     

    Can anyone see any improvements that can be made?

    I've checked the execution plan and it's not recommending any indexes.

     

    Any help anyone can give would be greatly appreciated.

     

    We're running SQL2016 Developer and this is to be deployed to SQL2016 Enterprise when it's ready.

    ::edit::

    I've had to rename the files with a .txt extension (from the original .sql extension) as they wouldn't upload.

     

    TIA

    Richard

     

     

    • This topic was modified 3 years, 11 months ago by  richardmgreen1. Reason: Issues with file uploads
    Attachments:
    You must be logged in to view attached files.
  • I think you do need at least a couple of indexes.  Also, you should capture both pat and ref flags in the same pass of the table and update them both in one UPDATE statement.  I don't have time to post more details on how to do that now, but I will do so later.

    The temp table should be clustered on the CodedResponseKey, not a separate identity column (there is no commandment that all tables must have an identity column, let alone cluster on it!).  That could allow a more efficient join.  Like this:

    CREATE TABLE #pat_flags

    (

    ,CodedResponseKey BIGINT NOT NULL PRIMARY KEY

    ,IsFirstRecordedCodePatient INT NOT NULL

    ,IsFirstRecordedTemplatePatient INT NOT NULL

    ,IsLatestRecordedCodePatient INT NOT NULL

    ,IsLatestRecordedTemplatePatient INT NOT NULL

    );

     

    CREATE UNIQUE NONCLUSTERED INDEX Fact_Community_Coded_Responses__IX_CTV3CodeKey 
    ON dbo.Fact_Community_Coded_Responses ( CTV3CodeKey, CurrentPatientKey, CareContactDateKey, CareContactTimeKey, pkCodedResponseKey )
    INCLUDE ( ReferralKey )
    WITH ( DATA_COMPRESSION = ROW, FILLFACTOR = 98 /*change fillfactor if/as needed*/, SORT_IN_TEMPDB = ON )
    ON [PRIMARY] /*change filegroup if/as needed*/


    CREATE UNIQUE NONCLUSTERED INDEX Fact_Community_Coded_Responses__IX_CommunityTemplateKey
    ON dbo.Fact_Community_Coded_Responses ( CommunityTemplateKey, CurrentPatientKey, CareContactDateKey, CareContactTimeKey, pkCodedResponseKey )
    INCLUDE ( ReferralKey )
    WITH ( DATA_COMPRESSION = ROW, FILLFACTOR = 98 /*change fillfactor if/as needed*/, SORT_IN_TEMPDB = ON )
    ON [PRIMARY] /*change filegroup if/as needed*/

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • It would be helpful if you knew which particular statements within the stored procedure are the long running ones. The first thing to do is identify which statement(s) are taking the most time.

    Also, you are updating a table from a LEFT JOIN. This makes no sense to me. Why are you using a LEFT JOIN not an INNER JOIN in the following two statements?

    UPDATE fccr
    SET IsFirstRecordedCodePatient = f.IsFirstRecordedCodePatient,
    IsFirstRecordedTemplatePatient = f.IsFirstRecordedTemplatePatient,
    IsLatestRecordedCodePatient = f.IsLatestRecordedCodePatient,
    IsLatestRecordedTemplatePatient = f.IsLatestRecordedTemplatePatient
    FROM #pat_flags f
    LEFT JOIN dbo.Fact_Community_Coded_Responses fccr
    ON fccr.pkCodedResponseKey = f.CodedResponseKey
    WHERE f.ID BETWEEN @ID AND @ID + 999999;


    UPDATE fccr
    SET IsFirstRecordedCodeReferral = f.IsFirstRecordedCodeReferral,
    IsFirstRecordedTemplateReferral = f.IsFirstRecordedTemplateReferral,
    IsLatestRecordedCodeReferral = f.IsLatestRecordedCodeReferral,
    IsLatestRecordedTemplateReferral = f.IsLatestRecordedTemplateReferral
    FROM #ref_flags f
    LEFT JOIN dbo.Fact_Community_Coded_Responses fccr
    ON fccr.pkCodedResponseKey = f.CodedResponseKey
    WHERE f.ID BETWEEN @ID AND @ID + 999999;

    I would create indexes on the two temporary tables in the stored procedure immediately after you have populated them:

    CREATE INDEX IX_#pat_flags_1 ON #pat_flags
    (
    ID,
    CodedResponseKey
    )
    INCLUDE
    (
    IsFirstRecordedCodePatient,
    IsFirstRecordedTemplatePatient,
    IsLatestRecordedCodePatient,
    IsLatestRecordedTemplatePatient
    );

    CREATE INDEX IX_#ref_flags_1 ON #ref_flags
    (
    ID,
    CodedResponseKey
    )
    INCLUDE
    (
    IsFirstRecordedCodeReferral,
    IsFirstRecordedTemplateReferral,
    IsLatestRecordedCodeReferral,
    IsLatestRecordedTemplateReferral
    );

    I'm not 100% about the order of ID, CodedResponseKey so it would be worth experimenting with it in the opposite order, i.e. CodedResponseKey, ID in the leading part of the indexes.

    There is no need to have DISTINCT in statements like: WHERE colX IN (SELECT DISTINCT myCol...

    as SQL server will work out you don't need duplicated values.

  • Thanks both

     

    It looks like it's the ROW_NUMBER() section that's taking the time to work out.

    Is there any sort of alternative to that bit?

    I need to get the records in order and then put a marker against the first and last ones.

  • I've tried recoding the query with *some* success (it's now down to 42 minutes on a single run from an hour).

     

    Can someone have anther look and check I've got the indexes on the right things please?

    I'm also open to other suggestions for speeding it up.

     

    The new code is now just a SELECT rather than an update (but I can soon change that).

     

    TIA

    Attachments:
    You must be logged in to view attached files.
  • richardmgreen1 wrote:

    Thanks both

    It looks like it's the ROW_NUMBER() section that's taking the time to work out.

    Is there any sort of alternative to that bit?

    I need to get the records in order and then put a marker against the first and last ones.

    I don't know if you have alternatives, I don't know  your data that well.  But I suggested you needed the indexes to address the ROW_NUMBER issue.  The indexes will (should) avoid the sort for two of the ROW_NUMBERs and reduce the overhead of the other two sorts.  Sorting is a very expensive operation.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply