Partitioned Table performance Question.

  • I've been having a look at a project that is underperforming when selecting from a view, and noticed that in the primary View there are a number of interesting joins going on.

    The structure of the tables is basically

    CREATE Table #Partitioned (

    Id Int Identity(1,1) NOT FOR REPLICATION

    ,Col1Id int

    ,Col2Id int

    ,Col3Id int

    ,SysId int

    )

    Create Table #Lookup (

    ColId Int Identity(1,1) NOT FOR REPLICATION

    SysId Int

    ColDesc varchar(255)

    )

    With the table #Partitioned Partitioned by SysId, with a compound PK consisting of Id and SysId

    The table #Lookup has a Compound Primary key of ColId and SysId, even though ColId is Unique, due to it being an Identity column

    The query in view that joins these is basically

    Select *

    from #Partitioned p

    LEFT JOIN #Lookup l

    on p.Col1Id=l.ColId

    AND p.SysId=l.SysId

    LEFT JOIN #Lookup l2

    on p.Col2Id=l2.ColId

    AND p.SysId=l2.SysId

    LEFT JOIN #Lookup l3

    on p.Col2Id=l3.ColId

    AND p.SysId=l3.SysId

    Row counts approx 10 million rows in #Partitioned and 50,000 rows in #Lookup.

    I've done some preliminary testing by removing the SysId, from the Join and had around 4-5% performance increase with no visible impact on the result set.

    In both instances the SQL server Query plan (Actual and Estimated) both show a Table Scan of the #Lookup

    I've asked why this was done and I received an answer i didnt think was correct, so I would like to understand the following

    1) Is the SysId in the PK on the #lookup table pointless? As the col1Id is Unique by default so it doesnt need the additional column to make it unique.

    2) Why doesnt SQL server use the PK on the join?

    3) Would an Index hint on the join Increase the performance?

    I cant give actual query plans or code, but would just love to understand the rational behind this design.

    _________________________________________________________________________
    SSC Guide to Posting and Best Practices

  • The optimizer is choosing to scan the lookup table because it thinks that is the fastest way for it to deliver the data. Consider that for each row in #Partitioned, there are 3 LEFT JOINs, meaning it would have to do thirty million index seek operations against the #Lookup table. I suspect it opted instead for a hash join. The execution plan would confirm this.

    The SysID in the primary key of the #lookup table isn't necessarily pointless, but it really doesn't help this query, because the SysIDs are all spread out due to the fact that ColID is the primary column in the key. The index you described would only be truly helpful for queries that requested ColID and SysID only, because those queries could be answered without going to the leaf level of the index.

    As for the index hint, try it and see what happens. 😉 Be sure to turn STATISTICS IO ON and get a baseline of the number of logical reads before you use the index hint.

    Hope this helps.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Jason-299789 (8/6/2012)


    1) Is the SysId in the PK on the #lookup table pointless? As the col1Id is Unique by default so it doesnt need the additional column to make it unique.

    From your description, yes... but so is including it as a PK component on #Partitioned, since that's ALSO an identity. As part of the clustered index however it will help if you have a more restrictive where clause that only uses certain sysIDs. This may be a case of overusage.

    2) Why doesnt SQL server use the PK on the join?

    Depends, is the PK also the clustered? It'll also depend on column ordering and the like. Without the execution plan (which I realize you can't provide) pretty much everything we can offer would be a guess.

    3) Would an Index hint on the join Increase the performance?

    For a specific use case, possibly. Generically, not usually. They're the last-ditch effort to try to nail down what's wrong in the design so you can eventually remove the hints anyway and let the optimizer run properly.

    In general, when you're pulling that many rows back, you need to evaluate the design in general rather than what the process is doing. The purpose of it, the expectations, etc etc. Without more details though, it's hard to help optimize a process that could have over a hundred different tweaks that are causing you more headaches than expected.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Thank you to both of the replies so far its confirmed a few things, and given me a few things to ponder.

    The PK is Non-clustered on the #Lookup table, and looking at the table there is no clustered Index at all, unfortunately id assumed the PK (or secondary index) was clustered, which probably explains the table scan. Id like to see what happens if you make the PK clustered.

    I did some 'off-record' testing by dropping the SysId from the joins inthe actual query, and over a couple of runs there was an approx 5% performance improvement.

    Also in the actual query the #Lookup table is referenced around 8 times. At the moment the query returns around 25% of the data in a little over 13-14 minutes.

    _________________________________________________________________________
    SSC Guide to Posting and Best Practices

  • Jason-299789 (8/7/2012)


    Thank you to both of the replies so far its confirmed a few things, and given me a few things to ponder.

    The PK is Non-clustered on the #Lookup table, and looking at the table there is no clustered Index at all, unfortunately id assumed the PK (or secondary index) was clustered, which probably explains the table scan. Id like to see what happens if you make the PK clustered.

    I did some 'off-record' testing by dropping the SysId from the joins inthe actual query, and over a couple of runs there was an approx 5% performance improvement.

    Also in the actual query the #Lookup table is referenced around 8 times. At the moment the query returns around 25% of the data in a little over 13-14 minutes.

    Assuming the optimizer thinks there is 25% of the table's rows coming back and that estimate is accurate you will almost certainly get scans in your case since you have no WHERE clause. The only thing that will help this is a) better hardware (especially RAM and IO) and b) not trying to return too much data to SSMS if that is what you are doing with your SELECT.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply