Index - Whether or not my table should have one

  • Hi,

    I have a typical DWH scenario of  Source  -->   Destination ETL.

    During this ETL, there are some intermediate tables constructed and dropped.   Source  -->  Stage1  --->  Stage 2.....Stage N --->  Destination.

    Each of these Stage tables are dedicated to this particular ETL and are not used by another process/user.    Since any INDEX has its own housekeeping overhead, having one need not necessarily guarantee enhanced performance.   

       1)  Thus,   Record-count-wise, Depth-wise or Width-wise  is there a thumb rule when we should think of Indexing a table....in my scenario, the Staging tables?
       2)  if not from Volume standpoint as said in  (1)  above, from the corresponding Execution Plan,  is there anything to be watched for,  that ALARMs the fact that, a particular area  can be tuned?   (such as RID Lookups as candidates for Covering Index  or    Hash Join/Merge/Nested joins to be converted into a  more efficient mechanism etc)

    thank you

  • If indexing will enhance the performance of your intermediary staging tables then it's a good idea. Without a lot more information no one can give you meaningful advise. Try testing the load process with indexes vs without.

  • etl2016 - Saturday, July 28, 2018 3:44 PM

    Hi,

    I have a typical DWH scenario of  Source  -->   Destination ETL.

    During this ETL, there are some intermediate tables constructed and dropped.   Source  -->  Stage1  --->  Stage 2.....Stage N --->  Destination.

    Each of these Stage tables are dedicated to this particular ETL and are not used by another process/user.    Since any INDEX has its own housekeeping overhead, having one need not necessarily guarantee enhanced performance.   

       1)  Thus,   Record-count-wise, Depth-wise or Width-wise  is there a thumb rule when we should think of Indexing a table....in my scenario, the Staging tables?
       2)  if not from Volume standpoint as said in  (1)  above, from the corresponding Execution Plan,  is there anything to be watched for,  that ALARMs the fact that, a particular area  can be tuned?   (such as RID Lookups as candidates for Covering Index  or    Hash Join/Merge/Nested joins to be converted into a  more efficient mechanism etc)

    thank you

    No arguments with Joe. You have to make the determination yourself through testing what, where, and how to index any given table within your own system. Staging or not, the right indexes in the right place help performance.

    As to looking at the execution plans, there isn't a single set of "do this/don't do this" that will work in all situations. Yes, an RID lookup tells you that you don't have a clustered index, you do have nonclustered indexes, and that you are having to go to the heap to retrieve the data that is not a part of that nonclustered index. Is that a problem? Well, is it for one row or one million rows? Will adding a clustered index slow down the insert or update that you're doing as part of the data load? Can you do the data load and then add the index in support of queries against the loaded data? After adding the cluster do you still need the nonclustered index? Will making the nonclustered index covering work as well or better than creating a clustered index? Can we make the index unique? Is the query doing an aggregation? What if we added a columnstore index? Clustered columnstore or nonclustered?

    See what I mean? It's all far too dependent on the situation to give you a 10 point check list that will, guaranteed, cover 90% of issues.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Non-clustered index(es) are almost always a waste of resources on a staging table.  

    A clustered index is much more likely to be useful, although it's obviously not guaranteed to be.  If you join the staging table a lot, and the join column(s) are the same as the clus key in the final table the staging table will be loaded into, then definitely clus the staging table that way.

    Otherwise, as everyone has noted, it's on a case by case basis, but with special scrutiny (review) of any non-clus index(es), as there has to be a big, known advantage to bother with a non-clus index on a staging table.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • If the staging tables are a decent size (over a million rows), consider creating them with a clustered columnstore index.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply