Distinct performance

Question

Distinct performance

berki

Mr or Mrs. 500

Points: 510
More actions
July 19, 2015 at 9:15 am

#300326

When I've created a table filled it with data then select distinct data from it, it takes hours and is still running up to 17 hours.
Removing the distinct returns all the rows. Creating the same table in the master or MSDB database and running the select distinct statement it returns in a second.
I can't see any apparent differences in the databases in terms or collation or compatibility.
The script I'm using is below:
if exists (select 1 from sys.tables where name = 'temp_table')
drop table temp_table
CREATE TABLE temp_table (data(100) null)
-- Populating the temp_table takes a few seconds
begin tran
insert temp_table
SELECT data
FROM myTable with (nolock)
commit tran
-- Takes 17 hours and is still running
select distinct data from temp_table
Any ideas why?
I've check disk I/O, CPU and memory and they all seem fine.

Viewing 9 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1004749 More actions · Answer 1

berki (7/19/2015)
When I've created a table filled it with data then select distinct data from it, it takes hours and is still running up to 17 hours.
Removing the distinct returns all the rows. Creating the same table in the master or MSDB database and running the select distinct statement it returns in a second.
I can't see any apparent differences in the databases in terms or collation or compatibility.
The script I'm using is below:
if exists (select 1 from sys.tables where name = 'temp_table')
drop table temp_table
CREATE TABLE temp_table (data(100) null)
-- Populating the temp_table takes a few seconds
begin tran
insert temp_table
SELECT data
FROM myTable with (nolock)
commit tran
-- Takes 17 hours and is still running
select distinct data from temp_table
Any ideas why?
I've check disk I/O, CPU and memory and they all seem fine.

Thousands of ideas as to "why". To start with, have you checked for blocking and uncommitted transactions?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

berki Mr or Mrs. 500 Points: 510 More actions · Answer 2

Yep - nothing else is running and there's no blocking I can remove the transaction commit statements but still the same.

I've created a new database on local storage, same outcome the distinct query never completes.

Eirikur Eiriksson SSC Guru Points: 182939 More actions · Answer 3

Quick questions, what are the server's specs? What is the cost threshold for parallelism? What are the tempdb configurations? Can you post the execution plans? Any indices on the table?

😎

Edit:typo

Jeff Moden SSC Guru Points: 1004749 More actions · Answer 4

berki (7/19/2015)
Yep - nothing else is running and there's no blocking I can remove the transaction commit statements but still the same.
I've created a new database on local storage, same outcome the distinct query never completes.

What happens if you manually execute a COMMIT (and just the word COMMIT)? If it does NOT return with an error, then you had an open transaction somewhere in the mix.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jayanth_Kurup SSC-Insane Points: 22967 More actions · Answer 5

If the same query is run as a group by command instead of Select Distinct does it complete?

How many rows are you expecting as the output from the total? have you created any indexes and updated the Stats for the table after the insert?

Is Autocreate stats disabled at the db level?

Jayanth Kurup[/url]

berki Mr or Mrs. 500 Points: 510 More actions · Answer 6

Well I can't believe it but turning off the Auto create statistics at the database level meant the query ran successfully. But I left auto update statistics enabled.

Why would the addition of statistics make the query so slow?

Would I have to manually create statistics on any new table? If so is there a T-SQL script to do this for all tables that don't have statistics?

Many thanks for you help Jayanth

ScottPletcher SSC Guru Points: 101244 More actions · Answer 7

Hmm, no reason to insert a large number of rows and then do a distinct. Do either of the following, depending on if you need a count for each unique value or not:

begin tran

insert temp_table

SELECT distinct data

FROM myTable with (nolock)

/*OR

insert temp_table

SELECT data,count(*)

FROM myTable with (nolock)

GROUP BY data

*/

commit tran

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Eugene Elutin SSC Guru Points: 59322 More actions · Answer 8

Why would you wrap single insert statement in transaction?

There is no reason to use nolock hint, untill you want some records to be missed or duplicated.

_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help[/url]