Usage of #TempTables and an Index in Stored Procedure

  • Jeff Moden wrote:

    Jonathan AC Roberts wrote:

    To fix it I had to create the temporary table first with the clustered index then insert the data.

    When I said "we", I should have said "they".  What you wrote above is my preferred method along with following the requirements (which are pretty easy, as you know) for "Minimal Logging" to make it light-weight and nasty fast.  That's if a clustered index is actually necessary on a Temp Table.  Ostensibly, you've only got the correct data in the temp table anyway and probably don't actually need the CI.

    Indexes are needed as the temp tables are joined to other tables, without indexes the queries would take a lot longer. The clustered index creation below takes 43 seconds on a local instance of SQL Server on my Dell work laptop but 458 seconds on a db.m5d.4xlarge AWS RDS machine.

    Set up data:

    DROP TABLE IF EXISTS #temp;
    GO
    -- Create a temporary table with 10 million rows
    DECLARE @string nvarchar(1000) = REPLICATE('X', 120)
    SELECT ABS(CHECKSUM(NewId())) % 20000 [VKey],
    ABS(CHECKSUM(NewId())) % 2000 [DurableVKey],
    ABS(CHECKSUM(NewId())) % 65000 [DateKey],
    ABS(CHECKSUM(NewId())) % 100000 [UKey],
    ABS(CHECKSUM(NewId())) % 200000 [DurableUKey],
    @string col1,
    @string col2,
    @string col3
    INTO #temp
    FROM fnTally(1,10000000) -- 10 million rows
    GO

    Create clustered index:

    DECLARE @StartTime datetime = getdate()
    CREATE CLUSTERED INDEX [IX_#temp] ON #temp
    (
    [VKey] ASC,
    [DurableVKey] ASC,
    [DateKey] ASC,
    [UKey] ASC,
    [DurableUKey] ASC
    )

    -- Display the runtime in seconds
    SELECT CONCAT('Create clustered index took ', DATEDIFF(ss, @StartTime, GETDATE()), ' seconds')

    I still don't understand why an AWS server that costs more per month to rent than the entire cost of buying my laptop new should take 10 times longer to create a clustered index.

  • Jonathan AC Roberts wrote:

    Indexes are needed as the temp tables are joined to other tables, without indexes the queries would take a lot longer.

    You and I have know each other for a very long time and so you know I'm going to ask, "Have you proven that"? 😀

    Jonathan AC Roberts wrote:

    The clustered index creation below takes 43 seconds on a local instance of SQL Server on my Dell work laptop but 458 seconds on a db.m5d.4xlarge AWS RDS machine.

    I'm just not surprised at all to hear that.  It's one of the many reasons why I'm desperately trying to convince some people that I work with that moving to the Cloud is not synonymous with performance improvements.  Even our vendor for the telephone system is stopping all support for on-premise systems and moving to the cloud.  The same goes for Atlassian products.  We're apparently going to make that shift for both in the same month.  Looks like I'm going to be forced into getting some cloud experience after all. 🙁

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Okay to comment on the (A) CREATE TABLE with INDEX then INSERT INTO versus the (B) SELECT INTO then create INDEX there is no set solution for which is better. We have ran extensive tests on this (and continue to do each time it comes up) and sometimes Option A is moderately faster and sometimes Option B is moderately faster and sometimes the difference is negligible. Keep in mind when clocking this Option A is simple but Option B requires you to add the time it took to load table plus the time it took to create the index on that table after loading it.  The only way we have found to discern this is to test it for each specific case and that is what we do.

    The suggestion of making sure you do minimal logging (aka WITH (TABLOCK) or WITH (TABLOCKX) ) that is very important. Also it helps if you can use SET TRANSACTION ISOLATION LEVEL SNAPSHOT in your queries especially if you are hitting large tables. However if you are launching your stored procedures using JAVA code the Java Wrapper interferes with SNAPSHOT so you will have to use SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED instead. Yeah it is a dirty read but it does not lock the table in case it is a table that sees frequent updates and you do not want to block the updates. Otherwise the likelihood of a dirty read is minimal so a win-win either way.

    I could be wrong on this but unless your Stored Procedures are updating the tables you are using to create the Temporary Tables there should be no concurrency issues. Further if you can use SNAPSHOT then the data you are initially reading might be a few seconds old  which is usually not an issue especially if its a report but if it is well you got bigger fish to fry and a totally different solution to implement (we have one but working on making a better one for our high update tables that get poled frequently as well).

    In many queries we use temporary tables in order to speed up the efficiency of the query, I mean which is worse a daily report that takes all day to run (which we had) or a daily report that might be a few minutes behind the database but takes an hour or less to run (and is thus now about an hour plus those couple minutes behind the database).

    Lastly I know many DBAs and some of them are not always at the top of their game. I would do as was suggested, get that DBA to explain why they feel there will be a concurrency issue using the temporary tables the way you are using and for what you are using them for. Perhaps his information is out of date or there is something else going on with the database that developers should be aware of when they create stored procedures.  DBAs should not live secluded within an ivory tower but some do. Instead if their is an issue they should be working with the developers to fix them and helping the developers create more efficient running stored procedures. However, this means the DBA has to be at the top their game and some are not. I have learned some of my best SQL from high quality DBAs but bumped heads with lesser DBAs due to my exporsure to those higher quality DBAs. If they say no ask them to explain why in detail and terms that are easily understood. If they cannot or will not chances are they are wrong and they might even know they are wrong. They just do not want something happening because they simply do not fully understand it or have some personal beef with it. Bumped into of couple of those DBAs before, considered it weird and moved on.

    Basically never take a surface answer, make sure you get detailed answers and then validate those answers. You do not know how much misinformation is floating around on the internet being propogated by wanna-be-know-it-alls who are simply regurgitating a wrong answer someone else posted until you have that wrong answer out there so many times that many folks think its a correct answer.

  • Jonathan AC Roberts wrote:

    I still don't understand why an AWS server that costs more per month to rent than the entire cost of buying my laptop new should take 10 times longer to create a clustered index.

    It took 13 seconds on an x1e.2xlarge VM running SQL 2017, which probably costs more to rent than several laptops. I then used an x1e.4xlarge and it took 20 seconds. Different server, different tempdb config and different throughput settings for the volumes. I changed the 4x to a 1x and it took 21 seconds, so the 4 times the memory and increased cpu's didn't make much difference.

    How much control to you have over tempdb in RDS? I know we couldn't use RDS because we needed simple recovery model.

  • Jeff Moden wrote:

    Jonathan AC Roberts wrote:

    Indexes are needed as the temp tables are joined to other tables, without indexes the queries would take a lot longer.

    You and I have know each other for a very long time and so you know I'm going to ask, "Have you proven that"? 😀

    It was years ago that I changed the processing to use intermediate temporary tables. I did a lot of testing on many different ways of producing the data so I'm pretty sure that adding a clustered index made the processing faster. The example I gave above was creating a table with 10 million rows, in the production there are actually over 100 million rows in the temporary table. The clustered index creation was taking about 6 minutes on the old system and this went to 77 minutes on the RDS machine.

  • Ed B wrote:

    Jonathan AC Roberts wrote:

    I still don't understand why an AWS server that costs more per month to rent than the entire cost of buying my laptop new should take 10 times longer to create a clustered index.

    It took 13 seconds on an x1e.2xlarge VM running SQL 2017, which probably costs more to rent than several laptops. I then used an x1e.4xlarge and it took 20 seconds. Different server, different tempdb config and different throughput settings for the volumes. I changed the 4x to a 1x and it took 21 seconds, so the 4 times the memory and increased cpu's didn't make much difference.

    How much control to you have over tempdb in RDS? I know we couldn't use RDS because we needed simple recovery model.

    That's interesting, I think the main bottleneck is the disk speed. My laptop has an M.2 card for its disk which is really fast, RDS runs on a virtual machine but I think db.m5d.4xlarge has a dedicated "fast" drive for its tempdb. I think the machine you ran the script on is an EC2 server which is not virtual. I know you get some reduced performance from running on a virtual machine but I thought this would be maybe 20% slower, the creation of the clustered index is over 10 times slower.

  • Dennis Jensen wrote:

    Okay to comment on the (A) CREATE TABLE with INDEX then INSERT INTO versus the (B) SELECT INTO then create INDEX there is no set solution for which is better. We have ran extensive tests on this (and continue to do each time it comes up) and sometimes Option A is moderately faster and sometimes Option B is moderately faster and sometimes the difference is negligible. Keep in mind when clocking this Option A is simple but Option B requires you to add the time it took to load table plus the time it took to create the index on that table after loading it.  The only way we have found to discern this is to test it for each specific case and that is what we do.

    I agree with most of that but you also have to consider the "Pay to Read" environment when it comes to cost.  The insert into a CI generates no reads if it's the first insert.  Using SELECT INTO will generate no reads but the addition of a CI generates reads equivalent to all of the pages in the heap.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jonathan AC Roberts wrote:

    I know you get some reduced performance from running on a virtual machine by I thought this would be maybe 20% slower, the creation of the clustered index is over 10 times slower.

    Wouldn't it be a hoot if the reason for that was that the extra time was caused by it measuring the reads so that they could charge you for them. 😀

    As the old science saying reminds us, "To measure something is to change it". 😀

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden perhaps that is the reason why Option A is sometimes faster and why Option B is sometimes faster my knowledge of DBA aspects is not as deep as yours *tips hat* but what we did was simply time the two and run them against a version of the large data they would be handling then use the version that has the best results.

    Still thanks for that information I will share that with my colleagues.

  • Dennis Jensen wrote:

    Jeff Moden perhaps that is the reason why Option A is sometimes faster and why Option B is sometimes faster my knowledge of DBA aspects is not as deep as yours *tips hat* but what we did was simply time the two and run them against a version of the large data they would be handling then use the version that has the best results.

    Still thanks for that information I will share that with my colleagues.

    Totally understood and appreciated.  To your very good point in your previous post, even if you're an actual "expert" with oodles of personal experience, etc, etc, etc... you STILL have to test it every time.  Heh... in other words, if you really do have such grand experience, you'll know that you STILL have to test it... every time.

    You also need to be "environment aware" especially in "Pay to Read" environments.

    BTW... good "meeting" you.  It IS a pleasure, especially with what you're doing for folks for free, as you stated on the other thread.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I've seen issues repeatedly with "SELECT ... INTO <new_table>" causing blocking (on SQL 2016).  When we adjust the code to split the table creation and the table load the problems go away.  So you can say "it doesn't exist" but, since I've seen it many times, I disagree, even though I can't give you an appeal-to-authority on it that you would accept.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher wrote:

    I've seen issues repeatedly with "SELECT ... INTO <new_table>" causing blocking (on SQL 2016).  When we adjust the code to split the table creation and the table load the problems go away.  So you can say "it doesn't exist" but, since I've seen it many times, I disagree, even though I can't give you an appeal-to-authority on it that you would accept.

    My point is that you said "The DBA's right about the concurrency issue."  My point is that neither you nor he are correct on this issue.  The correct answer is "It Depends". 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    ScottPletcher wrote:

    I've seen issues repeatedly with "SELECT ... INTO <new_table>" causing blocking (on SQL 2016).  When we adjust the code to split the table creation and the table load the problems go away.  So you can say "it doesn't exist" but, since I've seen it many times, I disagree, even though I can't give you an appeal-to-authority on it that you would accept.

    My point is that you said "The DBA's right about the concurrency issue."  My point is that neither you nor he are correct on this issue.  The correct answer is "It Depends". 😉

    Since "it depends", how do you know he is not correct in his specific situation?  He may have seen the same things I have.  I won't assume people are incompetent or lying until I have evidence of that.  I presume him to be a professional who has some idea of what he's talking about (again, until I see evidence to the contrary).

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Jeff Moden wrote:

    ScottPletcher wrote:

    I've seen issues repeatedly with "SELECT ... INTO <new_table>" causing blocking (on SQL 2016).  When we adjust the code to split the table creation and the table load the problems go away.  So you can say "it doesn't exist" but, since I've seen it many times, I disagree, even though I can't give you an appeal-to-authority on it that you would accept.

    My point is that you said "The DBA's right about the concurrency issue."  My point is that neither you nor he are correct on this issue.  The correct answer is "It Depends". 😉

    Actually your point seemed to be that their DBA was wrong.  Basically you said, "I've never seen it, it really doesn't happen like that all, unless you prove to me by some know authority that it can."

    For me, seeing it was believing it.  I'm not sure why we had so much of an issue with it.  Maybe because the data volume was high and/or came from remote sources, and thus could often take 30 minutes or more (sometimes much more) to load?

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher wrote:

    Jeff Moden wrote:

    ScottPletcher wrote:

    I've seen issues repeatedly with "SELECT ... INTO <new_table>" causing blocking (on SQL 2016).  When we adjust the code to split the table creation and the table load the problems go away.  So you can say "it doesn't exist" but, since I've seen it many times, I disagree, even though I can't give you an appeal-to-authority on it that you would accept.

    My point is that you said "The DBA's right about the concurrency issue."  My point is that neither you nor he are correct on this issue.  The correct answer is "It Depends". 😉

    Actually your point seemed to be that their DBA was wrong.  Basically you said, "I've never seen it, it really doesn't happen like that all, unless you prove to me by some know authority that it can."

    For me, seeing it was believing it.  I'm not sure why we had some of an issue with it.  Maybe because the data volume was high and/or came from remote sources, and thus could often take 30 minutes or more (sometimes much more) to load?

    Yes... their DBA is wrong.  He didn't say "It Depends".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 16 through 30 (of 37 total)

You must be logged in to reply to this topic. Login to reply