Advantages and Disadvantages of Using a Temp Table or a Physical Table

  • Please share your thoughts on the advantages and disadvantages of processing using a temp table versus processing using an actual table in SQL Server 2016 and above.

    These are the SQL queries for processing using a temporary table:

    DROP TABLE IF EXISTS #temp1;

    SELECT a.col1 AS col1, a.col2 AS col2,
    b.col1 AS col3, b.col2 AS col4
    INTO #temp1
    FROM schema1.table1 AS a
    INNER JOIN schema1.table2 AS b ON a.id = b.id
    WHERE a.col1 = 'best_practice'
    OR a.col1 = 'performance' ;

    INSERT INTO schema1.destination_table
    (col1, col2, col3, col4)
    SELECT col1, col2, col3, col4
    FROM #temp1 ;

    DROP TABLE IF EXISTS #temp1;

    These are the SQL queries using the actual table:

    TRUNCATE TABLE schema1.process_table;

    INSERT INTO schema1.process_table
    (col1, col2, col3, col4)
    SELECT
    a.col1 AS col1, a.col2 AS col2,
    b.col1 AS col3, b.col2 AS col4
    FROM
    schema1.table1 AS a
    INNER JOIN
    schema1.table2 AS b ON a.id = b.id
    WHERE
    a.col1 = 'best_practice'
    OR a.col1 = 'performance' ;

    INSERT INTO schema1.destination_table (col1, col2, col3, col4)
    SELECT col1, col2, col3, col4
    FROM schema1.process_table;

    TRUNCATE TABLE schema1.process_table;

    DDL (Data Definition Language) script for the destination table:

    CREATE TABLE schema1.destination_table
    (
    col1 VARCHAR(10) NULL,
    col2 VARCHAR(10) NULL,
    col3 VARCHAR(10) NULL,
    col4 VARCHAR(10) NULL
    );


  • I can think of one possible reason but let me ask you... why has someone made the code so that it first populates an interim "process" table only to turn right around and use that data to populate the final destination table?

    The reason I ask is because knowing the intent of doing such a thing may help answer your question.  To be sure, the "normal" way of doing such a thing is to insert into the destination directly from the query without doubling the insert-work by going through an interim table.

    I'm also hoping that someone doesn't think this is a panacea "Best Practice" for reasons of "Performance" that should be used all the time.  It CAN help with performance but, like all else in SQL Server, "It Depends" on a whole lot of variables.  It can also be a severe duplication of effort that beats the hell out of log files and consumes twice as much memory as needed.

    Also, there is no need to drop a temp table at the beginning or end of a stored procedure, assuming, of course, this code is part of a stored procedure.  Dropping temp tables should usually be reserved for reruns of code in SSMS when troubleshooting.  As with all else, "It Depends" and there can be reasons for dropping temp tables in stored procedures.

    As for which code you posted is better, like I said... we need to know what the intention of having an interim table is.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thank you for your feedback and your time.

    I meant to write the code as below, and to give more detailed information about the number of rows in schema1.table1 and schema1.table2. There are about 100 million rows in each table, and joining these two tables takes a lot of resources on the server, so one possible solution is to limit the rows read from schema1.table1 and schema1.table2 into #temp tables so that fewer rows have to be joined.  This way the amount of resources required is not as much as with a direct join. I hope this clarifies my intention and gives more perspective to all.

    SELECT a.id AS id, a.col1 AS col1, a.col2 AS col2
    INTO #temp1
    FROM schema1.table1 AS a
    WHERE a.col1 = 'best_practice'
    OR a.col1 = 'performance' ;

    SELECT a.col1 AS col1, a.col2 AS col2,
    b.col1 AS col3, b.col2 AS col4
    INTO #temp2
    FROM #temp1 AS a
    INNER JOIN schema1.table2 AS b ON a.id = b.id
    WHERE a.col1 = 'best_practice'
    OR a.col1 = 'performance' ;

    INSERT INTO schema1.destination_table
    (col1, col2, col3, col4)
    SELECT col1, col2, col3, col4
    FROM #temp2 ;
  • Better code:

    SELECT a.id AS id, a.col1 AS col1, a.col2 AS col2
    INTO #temp1
    FROM schema1.table1 AS a WITH (NOLOCK)
    WHERE (a.col1 = 'best_practice'
    OR a.col1 = 'performance') ;

    CREATE INDEX IdxA ON #temp1 (col1 ASC , col2 ASC)

    SELECT b.col1 AS col3, b.col2 AS col4
    INTO #temp2
    FROM schema1.table2 AS b WITH (NOLOCK)
    WHERE (b.col1 = 'best_practice'
    OR b.col1 = 'performance');

    CREATE INDEX IdxA ON #temp2 (col1 ASC , col2 ASC);

    INSERT INTO schema1.destination_table
    (col1, col2, col3, col4)
    SELECT col1, col2, col3, col4
    FROM #temp1 AS a WITH (NOLOCK)
    INNER JOIN #temp2 AS b WITH (NOLOCK)
    ON a.id = b.id ;


  • Ok.  That's a bit different from what you originally posted and I somewhat agree with the premise (it's called "Divide'n'Conquer"), although I question why, even with 100 million rows, it would be so arduous.  Something is wrong there and I'd seriously be tempted to figure out what's wrong and fix it instead of doing a workaround like this.

    That notwithstanding, it also looks like this is a once-a-day thing and there will be only one run doing this at a time, so we don't have to worry about contention between two or more simultaneous runs.

    Also understand that I don't know how many rows you end up with for either #temp1 or #temp2 nor what the total number of bytes for either is nor what condition your TempDB database is in for usage.

    As your code is written in your last post above, #temp1 already contains only those IDs that meet the criteria of the WHERE clause, so you shouldn't actually need a WHERE clause on the INSERT into #temp2.  It's just an added complication for the optimizer and will burn more CPU unnecessarily.
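    For example, using your earlier version where #temp2 was built from #temp1, it would just be something like this:

    SELECT a.col1 AS col1, a.col2 AS col2,
    b.col1 AS col3, b.col2 AS col4
    INTO #temp2
    FROM #temp1 AS a
    INNER JOIN schema1.table2 AS b ON a.id = b.id;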

    I would, however, make sure that #temp1 had a Unique Clustered Index on the ID column, provided, of course, that the IDs are actually unique.  If they aren't, then you're going to have some problems with duplication and will also still need the WHERE clause I just told you you could get rid of.
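    Something like this (the index name is just illustrative):

    CREATE UNIQUE CLUSTERED INDEX UCI_temp1 ON #temp1 (id);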

    Since both of these tables reside in TempDB, which is in the SIMPLE Recovery Model, the use of WITH (TABLOCK), along with the "order" provided by the indexes on the ID columns, will allow you to achieve "Minimal Logging" on your inserts, which is twice as fast as a normal insert.  If the Clustered Indexes on the temp tables are already present, that is sometimes faster than inserting into a temp table and then adding the Clustered Index.  "It Depends" so you'll need to test both methods.
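    A rough sketch of the create-the-index-first pattern, assuming the IDs are unique (the table shape and names are illustrative):

    CREATE TABLE #temp1 (id INT NOT NULL, col1 VARCHAR(10) NULL, col2 VARCHAR(10) NULL);
    CREATE UNIQUE CLUSTERED INDEX UCI_temp1 ON #temp1 (id);

    INSERT INTO #temp1 WITH (TABLOCK) (id, col1, col2)
    SELECT a.id, a.col1, a.col2
    FROM schema1.table1 AS a
    WHERE a.col1 IN ('best_practice', 'performance');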

    If you have a LOT of data (forget the row count... I'm talking byte count), you might overwhelm TempDB during the run.  What you could do is make a "scratch" database in the SIMPLE Recovery Model and use that instead of TempDB with the understanding that everything in the "scratch" database is only temporary and that the database should never be backed up.  You should also be diligent in truncating and dropping any such work tables you place in the "scratch" database.
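    Setting up such a "scratch" database is simple enough (the database name is just a placeholder):

    CREATE DATABASE Scratch;
    ALTER DATABASE Scratch SET RECOVERY SIMPLE;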

    In your code above, you'll also need to carry the ID column into #temp2... the final join references b.id, but #temp2 as posted doesn't contain it.

    So, to summarize, the code you have above looks like it'll work.  If it overwhelms TempDB (as in "contention" or causes it to grow too much), you could create a "scratch" database that uses the SIMPLE Recovery Model.  That will be a little bit slower than TempDB because TempDB starts out in memory and only spills to disk if it gets something too big.  I don't know what the threshold for that is.

    I'd also take a look at why you need to do this workaround.  There's usually something wrong when you have to resort to such trickery.  For example, I'm thinking that a filtered index on the big tables might work a treat.  I'd also look to see if the big tables have any in-row LOBs or some silly long variable-width columns that aren't of the MAX datatype and haven't been bumped out of row... that slows down just about everything and can also produce a very low average page density (which uses a LOT of extra memory) that cannot be improved with index maintenance because of "short trapped pages" that can be as little as 1% full and will never get any fuller.
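    As a sketch of the filtered-index idea, using the two filter values from your posted code (the index name and column choices are illustrative):

    CREATE NONCLUSTERED INDEX IX_table1_filtered
    ON schema1.table1 (id)
    INCLUDE (col1, col2)
    WHERE col1 IN ('best_practice', 'performance');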

    --Jeff Moden



  • Thank you for your feedback.

    Actually the number of rows is more than 100 million; I just used 100 million as an example.  Tables schema1.table1 and schema1.table2 have primary keys and indexes that are correct according to the SQL Server statistics.  This actually comes from one of the slowest queries listed on the server.  It is used in a stored procedure that runs periodically.  The destination table is actually a cache table designed to avoid running the same query over and over, so at the beginning, before running this query, we check whether the result is already available in the destination table.
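    The check at the start of the procedure is roughly like this (the wrapper procedure name is hypothetical):

    IF NOT EXISTS (SELECT 1 FROM schema1.destination_table)
    BEGIN
        EXEC schema1.LoadDestinationCache; -- hypothetical procedure wrapping the load below
    END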

    This is "Divide'N'Conquer" code

    SELECT  a.id as id, a.col1 AS col1, a.col2 AS col2 
    INTO #temp1
    FROM schema1.table1 AS a WITH (NOLOCK)
    WHERE (a.col1 = 'best_practice'
    OR a.col1 = 'performance') ;

    CREATE INDEX IdxA ON #temp1 (col1 ASC , col2 ASC)

    SELECT b.id as id , b.col1 AS col3, b.col2 AS col4
    INTO #temp2
    FROM schema1.table2 AS b WITH (NOLOCK)
    WHERE (b.col1 = 'best_practice'
    OR b.col1 = 'performance');

    CREATE INDEX IdxA ON #temp2 (col1 ASC , col2 ASC);

    INSERT INTO schema1.destination_table
    (col1, col2, col3, col4)
    SELECT a.col1, a.col2, b.col3, b.col4
    FROM #temp1 as a WITH (TABLOCK)
    INNER JOIN #temp2 as b WITH (TABLOCK)
    ON a.id = b.id ;
  • Have you ever tried to shoot an arrow through the hole of a donut rolling down a hill from a quarter mile away?  That's how I'm starting to feel with your code because you keep changing the requirements through the code itself.

    Look at the final insert of that and figure out why the two indexes on the temp tables you've created are wrong.

    --Jeff Moden



  • Once again, thank you for your Great Review.  Ten Thumbs up!!!

    I hope this donut is not rolling down further.

    "Divide'N'Conquer" code:

    SELECT a.id as id, a.col1 AS col1, a.col2 AS col2
    INTO #temp1
    FROM schema1.table1 AS a WITH (NOLOCK)
    WHERE (a.col1 = 'best_practice'
    OR a.col1 = 'performance') ;

    CREATE INDEX Idxtemp1 ON #temp1 (col1 ASC , col2 ASC)

    SELECT b.id as id , b.col1 AS col3, b.col2 AS col4
    INTO #temp2
    FROM schema1.table2 AS b WITH (NOLOCK)
    WHERE (b.col1 = 'best_practice'
    OR b.col1 = 'performance');

    CREATE INDEX Idxtemp2 ON #temp2 (col1 ASC , col2 ASC);

    INSERT INTO schema1.destination_table
    (col1, col2, col3, col4)
    SELECT a.col1, a.col2, b.col3, b.col4
    FROM #temp1 as a WITH (TABLOCK)
    INNER JOIN #temp2 as b WITH (TABLOCK)
    ON a.id = b.id ;

  • Your indexes on the temp tables are still incorrect.  You're joining them by ID, not col1 and col2.  And, they should be Clustered Indexes.  At least, that's what I see according to the join in your final insert.
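    In other words, assuming the IDs are unique in each temp table, something along these lines (the index names are illustrative; drop UNIQUE if the IDs can repeat):

    CREATE UNIQUE CLUSTERED INDEX UCI_temp1 ON #temp1 (id);
    CREATE UNIQUE CLUSTERED INDEX UCI_temp2 ON #temp2 (id);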



    --Jeff Moden



  • This is the revised code.

    I should clarify that we join on the id column between schema1.table1 and schema1.table2 because it is a foreign key join.  That is why we have a.id = b.id in the last query.  I also added a GROUP BY clause to make sure my CLUSTERED index is not violated by duplicate rows.

    SELECT a.id as id, a.col1 AS col1, a.col2 AS col2
    INTO #temp1
    FROM schema1.table1 AS a WITH (NOLOCK)
    WHERE (a.col1 = 'best_practice'
    OR a.col1 = 'performance')
    GROUP BY a.id , a.col1 , a.col2
    ;

    CREATE CLUSTERED INDEX Idxtemp1 ON #temp1 (id, col1 , col2);

    SELECT b.id as id , b.col1 AS col3, b.col2 AS col4
    INTO #temp2
    FROM schema1.table2 AS b WITH (NOLOCK)
    WHERE (b.col1 = 'best_practice'
    OR b.col1 = 'performance')
    GROUP BY b.id , b.col1 , b.col2
    ;

    CREATE CLUSTERED INDEX Idxtemp2 ON #temp2 (id, col1 , col2 );

    INSERT INTO schema1.destination_table
    (col1, col2, col3, col4)
    SELECT a.col1, a.col2, b.col3, b.col4
    FROM #temp1 as a WITH (TABLOCK)
    INNER JOIN #temp2 as b WITH (TABLOCK)
    ON a.id = b.id ;

    ---- The schema1.table2.id column is a foreign key referencing schema1.table1.id;
    ---- this is why we have a join between the #temp1 table and the #temp2 table.
