Distinct performance

  • When I've created a table filled it with data then select distinct data from it, it takes hours and is still running up to 17 hours.

    Removing the distinct returns all the rows. Creating the same table in the master or MSDB database and running the select distinct statement it returns in a second.

    I can't see any apparent differences in the databases in terms or collation or compatibility.

    The script I'm using is below:

    if exists (select 1 from sys.tables where name = 'temp_table')

    drop table temp_table

    CREATE TABLE temp_table (data(100) null)

    -- Populating the temp_table takes a few seconds

    begin tran

    insert temp_table

    SELECT data

    FROM myTable with (nolock)

    commit tran

    -- Takes 17 hours and is still running

    select distinct data from temp_table

    Any ideas why?

    I've check disk I/O, CPU and memory and they all seem fine.

  • berki (7/19/2015)


    When I've created a table filled it with data then select distinct data from it, it takes hours and is still running up to 17 hours.

    Removing the distinct returns all the rows. Creating the same table in the master or MSDB database and running the select distinct statement it returns in a second.

    I can't see any apparent differences in the databases in terms or collation or compatibility.

    The script I'm using is below:

    if exists (select 1 from sys.tables where name = 'temp_table')

    drop table temp_table

    CREATE TABLE temp_table (data(100) null)

    -- Populating the temp_table takes a few seconds

    begin tran

    insert temp_table

    SELECT data

    FROM myTable with (nolock)

    commit tran

    -- Takes 17 hours and is still running

    select distinct data from temp_table

    Any ideas why?

    I've check disk I/O, CPU and memory and they all seem fine.

    Thousands of ideas as to "why". To start with, have you checked for blocking and uncommitted transactions?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Yep - nothing else is running and there's no blocking I can remove the transaction commit statements but still the same.

    I've created a new database on local storage, same outcome the distinct query never completes.

  • Quick questions, what are the server's specs? What is the cost threshold for parallelism? What are the tempdb configurations? Can you post the execution plans? Any indices on the table?

    😎

    Edit:typo

  • berki (7/19/2015)


    Yep - nothing else is running and there's no blocking I can remove the transaction commit statements but still the same.

    I've created a new database on local storage, same outcome the distinct query never completes.

    What happens if you manually execute a COMMIT (and just the word COMMIT)? If it does NOT return with an error, then you had an open transaction somewhere in the mix.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • If the same query is run as a group by command instead of Select Distinct does it complete?

    How many rows are you expecting as the output from the total? have you created any indexes and updated the Stats for the table after the insert?

    Is Autocreate stats disabled at the db level?

    Jayanth Kurup[/url]

  • Well I can't believe it but turning off the Auto create statistics at the database level meant the query ran successfully. But I left auto update statistics enabled.

    Why would the addition of statistics make the query so slow?

    Would I have to manually create statistics on any new table? If so is there a T-SQL script to do this for all tables that don't have statistics?

    Many thanks for you help Jayanth

  • Hmm, no reason to insert a large number of rows and then do a distinct. Do either of the following, depending on if you need a count for each unique value or not:

    begin tran

    insert temp_table

    SELECT distinct data

    FROM myTable with (nolock)

    /*OR

    insert temp_table

    SELECT data,count(*)

    FROM myTable with (nolock)

    GROUP BY data

    */

    commit tran

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Why would you wrap single insert statement in transaction?

    There is no reason to use nolock hint, untill you want some records to be missed or duplicated.

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply