• pduplessis (5/29/2008)


    I agree that in most cases you can avoid the use of temp tables by structuring your query slightly differently, but for sure, sometimes they are unavoidable.

    [snip]...

    Firstly, for ANYTHING going into a WHERE clause, I queried the dimensions first and got the respective keys (for example, only certain products are deemed to be "activating" products). I popped these into a temporary products table, which I could then use in an inner join against my monster fact table.

    [snip]...
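
    A minimal sketch of the pattern being described there, assuming hypothetical table and column names (dim_product, fact_product_sales, is_activating, and the keys are all invented for illustration):

        -- Stage the keys of the "activating" products into a temp table...
        SELECT p.product_key
        INTO   #activating_products
        FROM   dbo.dim_product AS p
        WHERE  p.is_activating = 1;

        -- ...then inner join the small key set against the monster fact table,
        -- instead of repeating the dimension predicate inside the big query.
        SELECT   f.account_key, SUM(f.sales_amount) AS total_sales
        FROM     dbo.fact_product_sales AS f
                 INNER JOIN #activating_products AS ap
                     ON ap.product_key = f.product_key
        GROUP BY f.account_key;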

    By forcing the temporary table you are, in effect, saying that before the optimiser merge joins, hash matches, or whatever, creating the subset of records once with a single set of index seeks can be more performant than a repeating series of clustered (or even covering non-clustered) index seeks each time the subset is referenced. Even though the number of index pages touched per seek grows only logarithmically with the cardinality of the table, it can still be non-trivial for large tables. At some point, copying a small amount of data into a new table is cheaper than a certain number of repeated index seeks.

    Or, in other words: I know that there are a couple of (generally key, always indexed) columns in this table, and I know that if I restrict my working table to these before I start evaluating the rest of the query, I can eliminate a whole pile of index page reads. The smaller set of rows I am interested in can be pre-fetched, and it is much easier to index and join over because it covers a much smaller range than the index on the entire, much larger table. (See the sketch below.)
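
    A hedged sketch of that pre-restriction step, again with assumed names; indexing the temp table is what lets the later joins seek over the small range rather than the index on the full table:

        -- Restrict the working set to the indexed key columns first.
        SELECT f.account_key, f.product_key, f.sales_amount
        INTO   #working_set
        FROM   dbo.fact_product_sales AS f
        WHERE  f.account_key = 123456789;

        -- Index the (now small) working set so subsequent joins seek over
        -- a narrow range instead of the index on the entire fact table.
        CREATE CLUSTERED INDEX cx_working_set ON #working_set (product_key);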

    But in the converse case, where the predicate is not very selective (e.g. account < 100,000 when there are in fact 110,000 accounts), the pre-restriction won't help: the subset won't be much smaller than the original table, and creating an indexed temporary table will be a waste of effort. For a given database design, row cardinality, and list of selection predicates, you can tell whether this method is going to be a benefit or not.
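
    One way to sanity-check that up front (a sketch; the product_sales table and account column are assumptions carried over from the example above) is to compare the predicate's row count against the table total:

        -- If matching_rows is close to total_rows, the predicate is not
        -- selective and materialising an indexed temp table buys little.
        SELECT COUNT(*)                                 AS matching_rows,
               (SELECT COUNT(*) FROM dbo.product_sales) AS total_rows
        FROM   dbo.product_sales
        WHERE  account < 100000;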

    It's forcing the data access design back towards procedural programming (well, two steps of it) and away from the logical, set-based definition of SQL (i.e. stating the desired result and leaving the determination of the best evaluation method to the query optimiser).

    The fact that this is necessary in practice is a pity, but I guess it gives no benefit until you get large sets, and a DB never really goes back to being smaller. Whether this method is better is really a property of how selective a predicate is against an index, e.g. account = 123456789 on the product_sales table.
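
    For what it's worth, you can also ask SQL Server for its own view of that selectivity from the index statistics (a sketch; the index name here is an assumption):

        -- Histogram and density information for the account column; low
        -- density means a highly selective predicate, i.e. a good candidate
        -- for the pre-restriction approach.
        DBCC SHOW_STATISTICS ('dbo.product_sales', 'IX_product_sales_account');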