Looking for opinions on a slow Select-Distinct problem

  • One of the tables in our company was designed to be denormalized, the problem now is that I need to select all the records distinctly (but with specific fields included). Using Distinct is an option but growing very slow as the data gets larger:

    The denormalized table (let's call it tblDenormalized) has an Identity-type Primary Key as well as an IdentifierID that's used to link to other tables. It has about 25 fields with only 5 being different, the 20 remaining fields are always repeated every single time. We need to select all records but we only need to include those fields that are repeating.

    The original SQL is written as such:

    SELECT DISTINCT EAV.field1, EAV.field2, EAV.field3, EAV.field4 (repeat until field20),

    schLoc.fieldA,schLoc.fieldB,schEAL.field100

    FROM tblDenormalized EAV

    INNER JOIN tlkpDetailsTable1 schEAL ON schEAL.IdentifierID = EAV.IdentifierID

    INNER JOIN tlkpLookupTable2 schLoc ON schLoc.cityID = schEAL.cityID

    this is of course slow as the fields selected go to more than 20 so using Select-Distinct is very slow (taking 3 mins. to return the first record).

    The solution I've come up with is written below - it's about 4 times faster than the original one but it's still slow (taking about 50 seconds to start returning records). Is there a faster way to do this? Thanks.

    SELECT DISTINCT EAV.IdentifierID, EAV.field1, EAV.field2, EAV.field3, EAV.field4 (repeat until field20),

    schLoc.fieldA,schLoc.fieldB,schEAL.field100

    FROM tblDenormalized EAV

    INNER JOIN tlkpDetailsTable1 schEAL ON schEAL.IdentifierID = EAV.IdentifierID

    INNER JOIN tlkpLookupTable2 schLoc ON schLoc.cityID = schEAL.cityID

    INNER JOIN (

    SELECT EAV.IdentifierID, MIN(EAV.PkID) as PkID

    FROM tblDenormalized .tblEGPLAvailability EAV

    GROUP BY EAV.IdentifierID

    ) tblDistinctRecords ON EAV.PkID = tblDistinctRecords.PkID

  • Query Execution Plan might provide how optimizer is executing the query and information of Index existing on the table is not mentioned by you.


    Bru Medishetty

    Blog -- LearnSQLWithBru

    Join on Facebook Page Facebook.comLearnSQLWithBru

    Twitter -- BruMedishetty

  • If you really do need to get distinct rows out of a table with duplicates, you have few options. Distinct is the most likely one to work. Honestly, I'd look at redesigning the tables and normalising them properly. There are lots of disadvantages to a denormalised design, other than this one.

    If you just want help with this one specific query, please post table definitions, index definitions and execution plan, as per http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 3 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply