Creating Indices on Temp Tables from Stored Procedures

  • Hi all - I have a question regarding indices and temp tables.

    The situation we are facing is this - we have a "main" table which stores all the information about our users. This table is a mess - it's full of fields that are obsolete or seldom used, and it has a *ton* of indices, most of which exist for specific applications or processes, and some of which are used once in a blue moon.

    The problem we are facing is that, when we get new applications or tasks which require us to access the data in this table, it is very difficult to create new indices to cover them, since so many already exist - the table has over 20 indices!

    So, the solution which I've come up with, and which I want some feedback on, is to use temp tables and index those temp tables instead.

    Picture this scenario - let's say I have a query which I need to run that accesses the data in that table in some form, and which needs an index on the fields it is accessing.

    Rather than creating an index on the actual table, I instead use an existing index to filter down the records as much as possible, and store the results into a temp table. I then create an index on the temp table, using the columns from the temp table which I need as part of the index. The rest of the query then accesses this temp table, and at the end of the procedure, the temp table and the resulting index are dropped.
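
    To make it concrete, here's a rough sketch of the pattern in T-SQL - all table, column, and index names are made up for illustration:

    -- All names here are hypothetical; the real table and columns differ.
    -- 1. Use an existing index on the base table to pre-filter the rows.
    SELECT UserID, Region, SignupDate
    INTO #ActiveUsers
    FROM dbo.Users
    WHERE Status = 'Active';   -- assumes an existing index on Status

    -- 2. Index the temp table on the columns the rest of the query needs.
    CREATE NONCLUSTERED INDEX IX_ActiveUsers_Region
        ON #ActiveUsers (Region, SignupDate);

    -- 3. The rest of the procedure works against the temp table.
    SELECT Region, COUNT(*) AS UserCount
    FROM #ActiveUsers
    GROUP BY Region;

    -- 4. Clean up (this also happens automatically when the proc ends).
    DROP TABLE #ActiveUsers;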

    Are there any reasons why this is a bad idea? It seems to me like the perfect solution for indexes that only serve specific procedures.

  • kramaswamy (3/24/2010)


    Hi all - I have a question regarding indices and temp tables.

    The situation we are facing is this - we have a "main" table which stores all the information about our users. This table is a mess - it's full of fields that are obsolete or seldom used, and it has a *ton* of indices, most of which exist for specific applications or processes, and some of which are used once in a blue moon.

    The problem we are facing is that, when we get new applications or tasks which require us to access the data in this table, it is very difficult to create new indices to cover them, since so many already exist - the table has over 20 indices!

    Does the data in this base table change much?

    20 indexes may seem like a bunch, but I have seen more. You could create many, many more indexes on this table if need be.

    So, the solution which I've come up with, and which I want some feedback on, is to use temp tables and index those temp tables instead.

    Picture this scenario - let's say I have a query which I need to run that accesses the data in that table in some form, and which needs an index on the fields it is accessing.

    Rather than creating an index on the actual table, I instead use an existing index to filter down the records as much as possible, and store the results into a temp table. I then create an index on the temp table, using the columns from the temp table which I need as part of the index. The rest of the query then accesses this temp table, and at the end of the procedure, the temp table and the resulting index are dropped.

    Are there any reasons why this is a bad idea? It seems to me like the perfect solution for indexes that only serve specific procedures.

    This solution adds steps and thus increases processing time as well as disk IO. This could very likely increase the cost of the query and make it unacceptable.

    I would lean toward creating appropriate indexes on the table.

    Another option would be to create an indexed view that the application could use.
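
    A minimal sketch of that, assuming a hypothetical dbo.Users table - the view must be schema-bound, and it is the unique clustered index that actually materializes it:

    CREATE VIEW dbo.vw_ActiveUsers
    WITH SCHEMABINDING
    AS
    SELECT UserID, Region, SignupDate
    FROM dbo.Users
    WHERE Status = 'Active';
    GO

    -- The unique clustered index materializes the view on disk.
    CREATE UNIQUE CLUSTERED INDEX IX_vw_ActiveUsers
        ON dbo.vw_ActiveUsers (UserID);

    Keep in mind that indexed views carry restrictions (required SET options, deterministic expressions only), and outside Enterprise Edition queries generally need the NOEXPAND hint to use them.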

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw
    Learn Extended Events

  • I don't understand what a temporary table would do that a covering index wouldn't do.

    If your data isn't read-only, maintaining a standing set of temporary tables derived from the primary table strikes me as being doomed from the start. That is a much different matter from creating a temporary table with a query, building an index over it to set up later joins, and then getting rid of it.

    Instead, why don't you look into partitioning your main table (and its indexes too!)? That bought us some excellent performance benefits.
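
    As a rough sketch - the boundary dates and column names below are invented, and a real design would pick the partitioning column to match how the data is queried:

    -- Partition a hypothetical Users table by signup year.
    CREATE PARTITION FUNCTION pfSignupYear (datetime)
    AS RANGE RIGHT FOR VALUES ('2008-01-01', '2009-01-01', '2010-01-01');

    CREATE PARTITION SCHEME psSignupYear
    AS PARTITION pfSignupYear ALL TO ([PRIMARY]);

    -- Build (or rebuild) the table and its indexes on the scheme,
    -- so that the indexes are partition-aligned.
    CREATE TABLE dbo.UsersPartitioned
    (
        UserID     int         NOT NULL,
        SignupDate datetime    NOT NULL,
        Region     varchar(50) NULL
    ) ON psSignupYear (SignupDate);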

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • The Dixie Flatline (3/24/2010)


    Instead, why don't you look into partitioning your main table (and its indexes too!)? That bought us some excellent performance benefits.

    I like the idea of partitioning the table and indexes.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw
    Learn Extended Events

    Rather than creating an index on the actual table, I instead use an existing index to filter down the records as much as possible, and store the results into a temp table. I then create an index on the temp table, using the columns from the temp table which I need as part of the index. The rest of the query then accesses this temp table, and at the end of the procedure, the temp table and the resulting index are dropped.

    Just re-read this. I've done that in practice and sometimes it bought some performance benefits, primarily by enabling more efficient joins against the temporary table. The downside is that creating a temporary table and then indexing it are both "blocking" operations that interrupt the flow of a query until they are completed.

    It's probably a technique worth trying to speed up reports or big batch jobs, but not something that you would want to rely on for delivering sub-second run times. For that, you usually need a preexisting index that fits the specific needs of a query, including (at least) all the columns the query will select or filter on, indexed in an order that eliminates a sort from the query plan.
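
    For instance, something along these lines (hypothetical names) - the key order supports the filter and avoids the sort, while the INCLUDE column covers the select list without key lookups:

    CREATE NONCLUSTERED INDEX IX_Users_Status_Region
        ON dbo.Users (Status, Region)
        INCLUDE (SignupDate);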

    If 20 indexes seem like a ton to you, audit them both for similar content and for how often they are used. If any are used once a month for a single report, just build them on demand... you don't need them around the rest of the time. If two indexes are ordered by the same columns in the same sequence, and have 90% of their columns in common, consider combining them into a single index that covers multiple queries.
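
    The "how often are they used" half of that audit is easy to script against the usage DMV. A sketch, with a hypothetical table name - remember these counters reset when the instance restarts:

    SELECT i.name AS index_name,
           s.user_seeks, s.user_scans, s.user_lookups, s.user_updates
    FROM sys.indexes AS i
    LEFT JOIN sys.dm_db_index_usage_stats AS s
           ON s.object_id   = i.object_id
          AND s.index_id    = i.index_id
          AND s.database_id = DB_ID()
    WHERE i.object_id = OBJECT_ID('dbo.Users')
    ORDER BY ISNULL(s.user_seeks, 0) + ISNULL(s.user_scans, 0)
           + ISNULL(s.user_lookups, 0);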

    Partitioning is still worth looking into however. Glad you like the idea, Jason 😉

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

    Yeah, this sort of idea is not for an action which is going to be taken on a regular basis - more so for reporting purposes. It's kind of a hack, I suppose, but I mainly wanted to find out if there was some huge negative associated with it. Employing this solution took the procedure from a 2-hour run time to a 2-minute run time, so in terms of time alone it was definitely worth it.

  • It's not a hack. It's just one more tool in the toolbox. If it significantly reduces the run time of a job, then it's a good thing. Like all tools, just be aware of its limitations and the situations where it isn't appropriate.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • I agree with Bob. This is a method that can be used where appropriate. It is not appropriate in all situations and can be worse in some. In some situations, the temp table method can speed up the process significantly.

    It is something that must be tested.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw
    Learn Extended Events

    In 95+% of the cases where I have seen clients put indexes on temp tables, that operation has been counterproductive to the efficient operation of the query involved.

    Do a proper indexing strategy analysis on this table and a) get rid of indexes that shouldn't be there and b) create ones that should. Any other action is almost certainly suboptimal. I have taken tables with over 60 indexes (wonderful DTA actions for you there!) to under 20 with an unmeasurable decline in SELECT performance but HUGE increases in DML throughput.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on Google's mail service

  • TheSQLGuru (3/26/2010)


    ...

    Do a proper indexing strategy analysis on this table and a) get rid of indexes that shouldn't be there and b) create ones that should. Any other action is almost certainly suboptimal. ...

    I agree with these points. When dealing with straightforward selects rather than drawn-out processes, I have not seen an improvement in speed from loading a temp table and then indexing it over querying the base table directly - provided the base table is properly indexed.

    I have seen the temp table approach used in long, drawn-out processes where it has helped improve performance dramatically.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw
    Learn Extended Events

  • Any other action is almost certainly suboptimal.

    I can't disagree with this statement, and yet I feel it is incomplete. While optimum performance is always a goal, sometimes the quest for perfection gets in the way of "good enough". If a report is run once or twice a day, and you can take the runtime down to 10 seconds instead of several minutes, is it really worth the human effort and extra disk space to rethink your indexing strategy for the sake of dropping the time down to two seconds? Are there more productive places you could be spending your time?

    There is code in our system that was written years ago, and I know it is suboptimal, but it remains in place because it is performing correctly and quickly enough for our users. Improving it isn't our highest priority because there are new projects that can generate more revenue and need attention. When major changes to functionality are required, we rewrite and improve them.

    This isn't to say that an audit of the indexing scheme isn't a worthwhile goal in and of itself. Unquestionably, proper indexing can make a phenomenal difference in the performance of a query. But kramaswamy is trying to deal with the limitations imposed upon him by the system he has to work with right now. That includes time limitations.

    Could we come up with a better solution than the temp table given a free hand with the indexes and plenty of time? In most cases, yes. Are we going to do it for free? No. Is the man who needs the assistance given a budget to hire us? Probably not. Will he learn or figure out better ways to get it done without the temp table in the future? Almost certainly.

    Until then, if the temp table gives him acceptable (though suboptimal) performance, I'm not going to discourage him from using it.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

    @Dixie Flatline, that's pretty much spot on. The tables being badly designed / indexed is a major issue, but it's not something we can just drop everything to resolve at this instant. In a perfect world, we would sit down, rethink the whole thing, set up some migration patterns, move the data to new tables, set them up properly, etc...

    But, this is a production database, and it is working correctly right now - yeah it's messy, but it is still working. And, it is interconnected with a billion other processes. There's just not enough time to re-do everything.

    Which is why this kind of solution is so appealing to me.

  • kramaswamy (3/26/2010)


    @Dixie Flatline, that's pretty much spot on. The tables being badly designed / indexed is a major issue, but it's not something we can just drop everything to resolve at this instant. In a perfect world, we would sit down, rethink the whole thing, set up some migration patterns, move the data to new tables, set them up properly, etc...

    But, this is a production database, and it is working correctly right now - yeah it's messy, but it is still working. And, it is interconnected with a billion other processes. There's just not enough time to re-do everything.

    Which is why this kind of solution is so appealing to me.

    There's no inherent problem in doing this; sometimes temp tables are just what the doctor ordered for these long-running queries with lots of joins / large tables / bad indexes / bad data, etc. That's not to say you couldn't beat the temp table method most of the time with a fully optimized approach, but the difference in dev time / impact on other processes is also a very valid factor.

    One thing to keep in mind is the order of operations for the temp table creation / data insertion / index creation. Depending on the quantity of data and the complexity of the procedure, doing one or the other first might have performance trade-offs.
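
    By that I mean whether you build the index before or after loading the rows. A sketch of the two orderings, with invented names:

    -- Option A: create the index first, then insert.
    -- Rows are indexed as they arrive; can be slower for big loads.
    CREATE TABLE #t (UserID int NOT NULL, Region varchar(50) NULL);
    CREATE INDEX IX_t_Region ON #t (Region);
    INSERT INTO #t (UserID, Region)
    SELECT UserID, Region FROM dbo.Users WHERE Status = 'Active';

    -- Option B: insert first, then create the index.
    -- One bulk sort at the end; usually faster for large row counts.
    CREATE TABLE #t2 (UserID int NOT NULL, Region varchar(50) NULL);
    INSERT INTO #t2 (UserID, Region)
    SELECT UserID, Region FROM dbo.Users WHERE Status = 'Active';
    CREATE INDEX IX_t2_Region ON #t2 (Region);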

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data :: Running Totals :: Tally Table :: Cross Tabs/Pivots :: String Concatenation

    I find it hard to believe that your proposed solution - using an existing index to retrieve some subset of a table into a temporary table, indexing that temp table, and then selecting off of the temp table - will be more efficient, or even better, than adding an index (or extending an existing one) on said base table. Are you already getting a lot of long-term blocking that is causing application performance problems? Will you be hitting the temp table iteratively (and by iteratively I mean perhaps 5+ times)? If the answer to both of those is no, I wouldn't even consider your solution - I would just add an appropriate index.

    Also, have you checked the index usage DMV to see if all of those 20 indexes are actually being used? Have you compared their create scripts to see if they overlap, so that maybe you can eliminate a few without any effect on system performance (it could actually go up)? These things are very simple and will take all of 10 minutes of your time.
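
    Listing each index's key columns in order makes the overlap check quick - roughly this, with a hypothetical table name:

    SELECT i.name AS index_name,
           ic.key_ordinal,
           c.name AS column_name
    FROM sys.indexes AS i
    JOIN sys.index_columns AS ic
      ON ic.object_id = i.object_id AND ic.index_id = i.index_id
    JOIN sys.columns AS c
      ON c.object_id = ic.object_id AND c.column_id = ic.column_id
    WHERE i.object_id = OBJECT_ID('dbo.Users')
      AND ic.is_included_column = 0
    ORDER BY i.name, ic.key_ordinal;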

    Oh, and what is the capacity of your tempdb database? Have you checked it for IO stalls? If you, like so many of my clients, have an under-powered tempdb storage system, your solution will be even worse off...
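
    Checking for stalls is one quick query against the virtual file stats DMV - the numbers are cumulative since the last restart:

    SELECT DB_NAME(vfs.database_id) AS database_name,
           vfs.file_id,
           vfs.num_of_reads, vfs.io_stall_read_ms,
           vfs.num_of_writes, vfs.io_stall_write_ms
    FROM sys.dm_io_virtual_file_stats(DB_ID('tempdb'), NULL) AS vfs;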

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on Google's mail service

    Oh, and what is the capacity of your tempdb database? Have you checked it for IO stalls? If you, like so many of my clients, have an under-powered tempdb storage system, your solution will be even worse off...

    Guru: This is an extremely strong point you just made. But, I've already conceded that performance of the temp table solution will be suboptimal. My support for it in spite of the performance hit is based on the prioritization of human time to effect changes in the database. It's like triage in a battlefield hospital. You choose to save one man's life instead of another man's leg. You don't treat this man because he's too healthy, or that man because he's too far gone. The use of a temp table is often a crutch.

    For a report that is run once a week, it probably won't kill them if the query takes 48 seconds instead of 4.8 seconds. In fact, if a temp table solution can quickly get them to 48 seconds instead of 4.8 minutes, it's worthwhile even as a stopgap solution. If it does start killing them, then more time is required to develop an optimal solution.

    Performance is your business. When you go to work for a client you have that foremost in mind, and you can get it done fast. Don't hold it against someone (who isn't as fast as you) that they do the best they can with something that saves them time.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
