Temporary Indexes

  • Comments posted to this topic are about the item Temporary Indexes

  • I use them all the time on temporary tables. When temp tables hold lots of data, or join onto normal tables with lots of data, adding an index can have a massive positive impact on join performance. The improvement far outweighs any overhead caused by SQL Server creating and updating the index.
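
    For illustration, a minimal T-SQL sketch of the pattern (table and column names here are hypothetical):

        -- Load the temp table, then index the join column before the big join
        SELECT CustomerID, OrderTotal
        INTO #Orders
        FROM dbo.Orders
        WHERE OrderDate >= '20110101';

        CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
            ON #Orders (CustomerID);

        -- The join can now seek into #Orders instead of scanning it
        SELECT c.CustomerName, o.OrderTotal
        FROM dbo.Customers AS c
        JOIN #Orders AS o
          ON o.CustomerID = c.CustomerID;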

  • Most definitely.

    The most common situation is a version upgrade of the application, where the schema is changing and data needs to be migrated (especially when a normalization change is involved).

    The standard pattern for this is to add one or more additional (physical or computed) columns to the old tables that expose the new key or other information needed for the migration. Create an index on those new columns if they are going to be used to filter the old data for migration purposes, then build the new tables from the old (here the indexes can seriously speed up the migration process).
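
    A minimal sketch of that pattern, with invented table and column names:

        -- Pre-release: expose the new key on the old table and index it
        ALTER TABLE dbo.OldCustomer
            ADD NewKey AS (LegacyRegion + RIGHT('000000'
                + CAST(LegacyID AS varchar(10)), 6)) PERSISTED;

        CREATE INDEX IX_OldCustomer_NewKey ON dbo.OldCustomer (NewKey);

        -- Migration: build the new table from the old, keyed on the new column
        INSERT INTO dbo.NewCustomer (CustomerKey, CustomerName)
        SELECT NewKey, CustomerName
        FROM dbo.OldCustomer;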

    BTW, all this could be part of the multi-step release pattern described in Solomon's article "Restructure 100 Million Row (or more) Tables in Seconds. SRSLY!" (http://www.sqlservercentral.com/articles/Data+Modeling/72814/), where those additional columns/indexes are created in the pre-release step (as a separate new table).

    Another case is investigating a bug, where I want to analyse the data in ways never foreseen in the original development. Some additional indexes can help me play with the query iteratively in a timely fashion (having to wait tens of seconds for a result is very annoying when you want to refine the query after seeing the results). I normally do this in a database copied to my local machine, since this sort of querying could seriously hamper the production tables (locking them for long periods).

    Marco

  • I do use temporary indexes when I'm crunching large datasets. This is usually the case for our month-end reports. In one report it cut our execution time from over 8 minutes to under 2. Even when I use table variables I've found performance gains by declaring a Primary Key if I can.
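
    A minimal sketch of that last point (names invented); the PRIMARY KEY constraint is what gives the table variable its clustered index:

        DECLARE @MonthEnd TABLE
        (
            AccountID int   NOT NULL,
            PeriodEnd date  NOT NULL,
            Balance   money NOT NULL,
            PRIMARY KEY (AccountID, PeriodEnd)  -- clustered index on the variable
        );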

  • Marco +1. I'll also use them in a version upgrade where the schema is changing and I am converting tables to the new schema: I may need to store an old key in a table in order to use it to convert the data in the detail tables below the current one. In that case I'll create a temporary (working) index on that old key, then drop the index and the column at the end of the patch. Performance in the conversion is vastly improved.
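
    As a sketch of that workflow (all names invented for illustration):

        -- During the patch: keep the old key around and index it
        ALTER TABLE dbo.OrderDetailNew ADD OldOrderID int NULL;
        CREATE INDEX IX_OrderDetailNew_OldOrderID
            ON dbo.OrderDetailNew (OldOrderID);

        -- ... convert the detail tables, joining on OldOrderID ...

        -- End of the patch: drop the working index, then the column
        DROP INDEX IX_OrderDetailNew_OldOrderID ON dbo.OrderDetailNew;
        ALTER TABLE dbo.OrderDetailNew DROP COLUMN OldOrderID;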

  • I would guess that about 50-75% of our SQL use is to fill Datasets on the .NET side of things. We once had a talented DBA who suggested we use temporary indexes on a number of the SQL queries used to fill these Datasets, in an effort to increase performance. We gave it a try...

    What we found was that on queries that fill large Datasets, we did see some improvement, but on smaller (or some medium) fills, the temporary indexes were actually slowing performance. We were able to increase performance there by making a number of these indexes permanent so we avoided the create-on-the-fly element involved.

    What is interesting about all this is that in "our world" we have to do a bit of experimenting each time we get into these situations. For example, sometimes it's better to take just a plain old 'give me the data' Dataset fill and then play with code on the .NET side to get whatever result we might be after, where other times (especially in reporting) we find it's better to work on the SQL side and tweak around with temporary indexes and in some cases temporary tables/cursors to maximize performance.

    I think temporary indexes have their place, but like most 'get the data' situations in software development, they are not the be-all-end-all answer to performance. They can be in individual cases, but it really depends on what the end goal is, and that takes some experimenting to determine.

    There's no such thing as dumb questions, only poorly thought-out answers...

  • blandry (4/29/2011)


    What we found was that on queries that fill large Datasets, we did see some improvement, but on smaller (or some medium) fills, the temporary indexes were actually slowing performance. We were able to increase performance there by making a number of these indexes permanent so we avoided the create-on-the-fly element involved.

    I was actually wondering how many of these would work better as permanent indexes. The last place I worked used them for upgrades, but that was it. We did index temporary tables, but in those cases the indexes were permanent relative to the lifespan of the table, so I wouldn't consider those temporary indexes.

  • SQL Server routinely creates temporary "index" structures of its own in memory or tempdb, for example the hash table it builds for a hash join on large tables.

    Staging tables and temporary indexes are common in scenarios where a moderate amount of reporting is performed directly out of the OLTP database rather than from a snapshot or external data mart. This can speed up the reporting process, but of course it will also result in a lot of transaction logging.

    When I stage data for something like month end reporting purposes it's in summary form, and I generally retain it for archival purposes in case the reports need to be re-run later or leveraged for some other BI process. In that case the indexes are on the staging tables, so there is not so much reason to drop them afterward.
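
    A rough sketch of that kind of retained staging table (names and schema invented):

        -- Month-end summary staged once, indexed, and kept for later re-runs
        SELECT AccountID,
               DATEADD(MONTH, DATEDIFF(MONTH, 0, TranDate), 0) AS PeriodStart,
               SUM(Amount) AS MonthTotal
        INTO dbo.Stage_MonthEndSummary
        FROM dbo.Transactions
        GROUP BY AccountID, DATEADD(MONTH, DATEDIFF(MONTH, 0, TranDate), 0);

        -- The index stays with the table for as long as it is archived
        CREATE CLUSTERED INDEX CIX_Stage_MonthEndSummary
            ON dbo.Stage_MonthEndSummary (PeriodStart, AccountID);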

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Quite common: indexes on #temp tables.

    Uncommon, but not unheard of: temporary indexes on permanent tables, always as a result of benchmarking against our actual production dataset, showing that the aggregate stats (Profiler SQL:BatchCompleted, particularly Reads and Writes) are significantly better with the temporary index than without, even after including all the overhead of creating and dropping it.
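
    A minimal sketch of that benchmarking loop (index, table, and column names invented):

        -- Build the candidate index for the batch
        CREATE NONCLUSTERED INDEX IX_BigTable_Candidate
            ON dbo.BigTable (StatusCode) INCLUDE (CustomerID);

        -- ... replay the workload and compare Profiler SQL:BatchCompleted
        --     Reads/Writes against the no-index baseline ...

        DROP INDEX IX_BigTable_Candidate ON dbo.BigTable;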

    As another user mentioned, these really depend.

  • Temporary indexes are something that we use on a regular basis for reporting needs, ETL needs, and those once-in-a-blue-moon processes.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw
    Learn Extended Events

  • While I don't currently use temporary indexes, I did want to thank you for this editorial. It is a neat idea to keep in the back of my head. The article gave me one of those 'doh' moments: it seems like an obvious tool to consider and test, once someone points it out.

  • We use them in ETL processing. We BCP into a heap, create a clustered index, run multiple queries on the CL index and then drop it in preparation for the next BCP load....I'd think this is pretty common, too.
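
    A minimal sketch of that load cycle (table, key, and file names invented; the bcp line runs from a command prompt):

        -- bcp StageDB.dbo.LoadHeap in orders.dat -n -T -S myserver

        -- Index the heap for this batch's queries
        CREATE CLUSTERED INDEX CIX_LoadHeap_BatchKey
            ON dbo.LoadHeap (BatchKey);

        -- ... run the batch queries against the clustered index ...

        -- Back to a heap, ready for the next BCP load
        DROP INDEX CIX_LoadHeap_BatchKey ON dbo.LoadHeap;
        TRUNCATE TABLE dbo.LoadHeap;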

  • I've used temporary indexes both on temporary and permanent objects for a variety of reasons, but usually due to one-off (like quarterly) processing, where I don't want those 20-30 indexes in my OLTP system otherwise. I also use them pretty heavily when using #tmp tables, especially when I'm using them as connection tables between two permanent tables.
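
    A sketch of that last pattern (invented names), with the #tmp connection table getting its index from the primary key:

        CREATE TABLE #CustOrder
        (
            CustomerID int NOT NULL,
            OrderID    int NOT NULL,
            PRIMARY KEY CLUSTERED (CustomerID, OrderID)
        );

        INSERT INTO #CustOrder (CustomerID, OrderID)
        SELECT CustomerID, OrderID
        FROM dbo.Orders
        WHERE OrderDate >= '20110101';

        -- Bridge two permanent tables through the indexed temp table
        SELECT c.CustomerName, i.ProductID
        FROM dbo.Customers AS c
        JOIN #CustOrder AS co ON co.CustomerID = c.CustomerID
        JOIN dbo.OrderItems AS i ON i.OrderID = co.OrderID;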


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • mikewhi (4/29/2011)


    We use them in ETL processing. We BCP into a heap, create a clustered index, run multiple queries on the CL index and then drop it in preparation for the next BCP load....I'd think this is pretty common, too.

    +1

    One ETL design pattern is to create indexes on source tables, perform the ETL, then drop the indexes (or drop the ETL indexes and replace the original source table indexes). What works for the ETL process may not be good for "normal" users of the source data.
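
    A minimal sketch of that pattern (index, table, and column names invented):

        -- Before the ETL window: add the index only the ETL queries need
        CREATE NONCLUSTERED INDEX IX_SourceOrders_ETL
            ON dbo.SourceOrders (ModifiedDate) INCLUDE (OrderID);

        -- ... extract the changed rows, transform, load ...

        -- After the window: drop it so normal users don't pay the write cost
        DROP INDEX IX_SourceOrders_ETL ON dbo.SourceOrders;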

    :{>

    Andy Leonard, Chief Data Engineer, Enterprise Data & Analytics

  • mikewhi (4/29/2011)


    We use them in ETL processing. We BCP into a heap, create a clustered index, run multiple queries on the CL index and then drop it in preparation for the next BCP load....I'd think this is pretty common, too.

    That surprises me. It's normally much more efficient to bulk insert (with minimal logging) into an empty clustered table, ensuring the data arrives in clustered key order.

    I could understand bulk loading a (perhaps partitioned) heap and creating non-clustered indexes afterward, but by building the clustered index after the load you write all the data twice...
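
    A minimal sketch of the alternative being suggested (file, table, and key names invented); minimal logging assumes an empty target, TABLOCK, and SIMPLE or BULK_LOGGED recovery:

        -- Target already has its clustered index; data file is presorted on OrderID
        BULK INSERT dbo.StageOrders
        FROM 'C:\loads\orders.dat'
        WITH (TABLOCK, ORDER (OrderID ASC));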
