table with 3 million plus records taking half a minute, can I improve performance

  • Thank you in advance for any help.

    I have a table that I need to do some computations on all the data but first I need to remove the duplicate records and insert the results into a destination table. Here's the example below. My table has 3.1 million rows. I have tried using the DISTINCT and the GROUP BY but both ways to select the data takes about half a minute to run. I'm wondering if there is a way to increase performance. Users are ok with this time since the process runs overnight but improving it won't hurt. I do have a clustered index on these fields but that doesn't seem to improve any.

    SELECTDateYear ,

    DateMonth ,

    Nbr ,

    Nbr1 ,

    Nbr2 ,

    Datafield1 ,

    Datafield2,

    ID

    FROM [dbo].[TableName]

    WHERE 1 = 1

    AND COALESCE([ID],0) > 0 -- ID sometimes is NULL, we don't want these records

    GROUP BYDateYear ,

    DateMonth ,

    Nbr ,

    Nbr1 ,

    Nbr2 ,

    Datafield1 ,

    Datafield2,

    ID

  • I don't think you need that COALESCE call. NULL > 0 is not true, so [ID] > 0 should be equivalent. My memory is a little rusty on this, so you may want to confirm.

    Alternatively, the function call to COALESCE will be called on every row. Try replacing that COALESCE call with ([ID] > 0 AND [ID] IS NOT NULL).

  • HildaJ (8/6/2015)


    Thank you in advance for any help.

    I have a table that I need to do some computations on all the data but first I need to remove the duplicate records and insert the results into a destination table. Here's the example below. My table has 3.1 million rows. I have tried using the DISTINCT and the GROUP BY but both ways to select the data takes about half a minute to run. I'm wondering if there is a way to increase performance. Users are ok with this time since the process runs overnight but improving it won't hurt. I do have a clustered index on these fields but that doesn't seem to improve any.

    SELECTDateYear ,

    DateMonth ,

    Nbr ,

    Nbr1 ,

    Nbr2 ,

    Datafield1 ,

    Datafield2,

    ID

    FROM [dbo].[TableName]

    WHERE 1 = 1

    AND COALESCE([ID],0) > 0 -- ID sometimes is NULL, we don't want these records

    GROUP BYDateYear ,

    DateMonth ,

    Nbr ,

    Nbr1 ,

    Nbr2 ,

    Datafield1 ,

    Datafield2,

    ID

    Get rid of the COALESCE, as Andrew points out it's not required and will punish performance.

    The speed of this query will depend on whether or not there's an index in place to support a streaming aggregate. With the COALESCE removed, can you post an Actual execution plan as an attachment file, and also ddl for the table and indexes. Cheers.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Andrew P (8/6/2015)


    I don't think you need that COALESCE call. NULL > 0 is not true, so [ID] > 0 should be equivalent. My memory is a little rusty on this, so you may want to confirm.

    Alternatively, the function call to COALESCE will be called on every row. Try replacing that COALESCE call with ([ID] > 0 AND [ID] IS NOT NULL).

    To help clear out the "fuzzies", you don't need to check for IS NOT NULL. Simply ID > 0 will remove and rows that are NULL. NULL is not equal to, greater than or less than 0. It is NULL.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • --Keep your query as simple as possible

    SELECT DISTINCT DateYear ,

    DateMonth , Nbr , Nbr1 , Nbr2 , Datafield1 , Datafield2, ID

    FROM [dbo].[TableName]

    WHERE [ID] > 0

    If possible, try to decrease the amount of column you want to run distinct against. And than use join to get rest of the Columns.

  • Thanks for the advice, I remove the COALESCE and it improve the query by a few three seconds. Unfortunately, I do have to have all those columns.

  • GROUP BY will typically perform less well than DISTINCT:

    SELECT DISTINCT DateYear ,

    DateMonth ,

    Nbr ,

    Nbr1 ,

    Nbr2 ,

    Datafield1 ,

    Datafield2,

    ID

    FROM [dbo].[TableName]

    WHERE 1 = 1

    AND ID > 0

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Can you not use SSIS and remove the duplicates using the SORT transformation since you said its a nightly job? If you have to do it only the T-SQL way, why not use the ROW_NUMBER() and determine if there are duplicate rows and delete everything that is greater than 1.

    https://www.youtube.com/watch?v=4fl1tN2DSQE

    https://www.youtube.com/watch?v=O4pf7m-VTSk

    just a thought.. hope it helps...

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply