Looking for opinions on a slow Select-Distinct problem

Question

Looking for opinions on a slow Select-Distinct problem

angelo_p

SSC Journeyman

Points: 75
More actions
January 8, 2010 at 9:12 am

#72820

One of the tables in our company was designed to be denormalized, the problem now is that I need to select all the records distinctly (but with specific fields included). Using Distinct is an option but growing very slow as the data gets larger:
The denormalized table (let's call it tblDenormalized) has an Identity-type Primary Key as well as an IdentifierID that's used to link to other tables. It has about 25 fields with only 5 being different, the 20 remaining fields are always repeated every single time. We need to select all records but we only need to include those fields that are repeating.
The original SQL is written as such:
SELECT DISTINCT EAV.field1, EAV.field2, EAV.field3, EAV.field4 (repeat until field20),
schLoc.fieldA,schLoc.fieldB,schEAL.field100
FROM tblDenormalized EAV
INNER JOIN tlkpDetailsTable1 schEAL ON schEAL.IdentifierID = EAV.IdentifierID
INNER JOIN tlkpLookupTable2 schLoc ON schLoc.cityID = schEAL.cityID
this is of course slow as the fields selected go to more than 20 so using Select-Distinct is very slow (taking 3 mins. to return the first record).
The solution I've come up with is written below - it's about 4 times faster than the original one but it's still slow (taking about 50 seconds to start returning records). Is there a faster way to do this? Thanks.
SELECT DISTINCT EAV.IdentifierID, EAV.field1, EAV.field2, EAV.field3, EAV.field4 (repeat until field20),
schLoc.fieldA,schLoc.fieldB,schEAL.field100
FROM tblDenormalized EAV
INNER JOIN tlkpDetailsTable1 schEAL ON schEAL.IdentifierID = EAV.IdentifierID
INNER JOIN tlkpLookupTable2 schLoc ON schLoc.cityID = schEAL.cityID
INNER JOIN (
SELECT EAV.IdentifierID, MIN(EAV.PkID) as PkID
FROM tblDenormalized .tblEGPLAvailability EAV
GROUP BY EAV.IdentifierID
) tblDistinctRecords ON EAV.PkID = tblDistinctRecords.PkID

Viewing 3 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply

Bru Medishetty SSC-Insane Points: 23606 More actions · Answer 1

Query Execution Plan might provide how optimizer is executing the query and information of Index existing on the table is not mentioned by you.

Bru Medishetty

Blog -- LearnSQLWithBru

Join on Facebook Page Facebook.comLearnSQLWithBru

Twitter -- BruMedishetty

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 2

If you really do need to get distinct rows out of a table with duplicates, you have few options. Distinct is the most likely one to work. Honestly, I'd look at redesigning the tables and normalising them properly. There are lots of disadvantages to a denormalised design, other than this one.

If you just want help with this one specific query, please post table definitions, index definitions and execution plan, as per http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass