January 8, 2010 at 9:12 am
One of the tables in our company was designed to be denormalized, the problem now is that I need to select all the records distinctly (but with specific fields included). Using Distinct is an option but growing very slow as the data gets larger:
The denormalized table (let's call it tblDenormalized) has an Identity-type Primary Key as well as an IdentifierID that's used to link to other tables. It has about 25 fields with only 5 being different, the 20 remaining fields are always repeated every single time. We need to select all records but we only need to include those fields that are repeating.
The original SQL is written as such:
SELECT DISTINCT EAV.field1, EAV.field2, EAV.field3, EAV.field4 (repeat until field20),
schLoc.fieldA,schLoc.fieldB,schEAL.field100
FROM tblDenormalized EAV
INNER JOIN tlkpDetailsTable1 schEAL ON schEAL.IdentifierID = EAV.IdentifierID
INNER JOIN tlkpLookupTable2 schLoc ON schLoc.cityID = schEAL.cityID
this is of course slow as the fields selected go to more than 20 so using Select-Distinct is very slow (taking 3 mins. to return the first record).
The solution I've come up with is written below - it's about 4 times faster than the original one but it's still slow (taking about 50 seconds to start returning records). Is there a faster way to do this? Thanks.
SELECT DISTINCT EAV.IdentifierID, EAV.field1, EAV.field2, EAV.field3, EAV.field4 (repeat until field20),
schLoc.fieldA,schLoc.fieldB,schEAL.field100
FROM tblDenormalized EAV
INNER JOIN tlkpDetailsTable1 schEAL ON schEAL.IdentifierID = EAV.IdentifierID
INNER JOIN tlkpLookupTable2 schLoc ON schLoc.cityID = schEAL.cityID
INNER JOIN (
SELECT EAV.IdentifierID, MIN(EAV.PkID) as PkID
FROM tblDenormalized .tblEGPLAvailability EAV
GROUP BY EAV.IdentifierID
) tblDistinctRecords ON EAV.PkID = tblDistinctRecords.PkID
January 8, 2010 at 9:23 am
Query Execution Plan might provide how optimizer is executing the query and information of Index existing on the table is not mentioned by you.
Blog -- LearnSQLWithBru
Join on Facebook Page Facebook.comLearnSQLWithBru
Twitter -- BruMedishetty
January 8, 2010 at 9:24 am
If you really do need to get distinct rows out of a table with duplicates, you have few options. Distinct is the most likely one to work. Honestly, I'd look at redesigning the tables and normalising them properly. There are lots of disadvantages to a denormalised design, other than this one.
If you just want help with this one specific query, please post table definitions, index definitions and execution plan, as per http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
Viewing 3 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply