Are best practices really the best (fastest) solution?

  • Hello all,

    I was "optimizing" a procedure that was taking a long time to run, and I am really confused about the results,

    That procedures did theses things

    part a)

    select into #temptable1 from table1 join table2 join table3 (tables 2 and 3 weren't really needed)

    (about 50 million records)

    create indexes

    part b)

    select into #temptable2 from ...

    create indexes

    part c)

    select into #temptable3 from ...

    create indexes

    part d)

    select into #temptable4 from #temptable1 join #temptable2 join #temptable3

    (about 5 million records)

    Resulting time: 50 minutes
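
    For illustration, the original version was shaped roughly like the sketch below. All table, column, and index names here are placeholders, not the real ones:

    -- part a) load everything first, index afterwards
    select t1.KeyCol, t1.Col1, t1.Col2
    into #temptable1
    from table1 t1
    join table2 t2 on t2.KeyCol = t1.KeyCol    -- these two joins turned out to be unnecessary
    join table3 t3 on t3.KeyCol = t1.KeyCol;

    create clustered index cix_temp1 on #temptable1 (KeyCol);      -- indexes created after the load
    create nonclustered index ix_temp1_col1 on #temptable1 (Col1);

    -- parts b) and c) follow the same select-into-then-index pattern for their own sources

    -- part d)
    select t1.KeyCol, t1.Col1, t2.Col2, t3.Col3
    into #temptable4
    from #temptable1 t1
    join #temptable2 t2 on t2.KeyCol = t1.KeyCol
    join #temptable3 t3 on t3.KeyCol = t1.KeyCol;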

    I changed it to something like this

    Part a)

    create table #temptable1

    create clustered index

    insert into #temptable1 select from table1 (all needed data was here)

    create non clustered indexes

    Part b)

    create table #temptable2

    create clustered index

    insert into #temptable2 select from ...

    create non clustered indexes

    Part c)

    create table #temptable3

    create clustered index

    insert into #temptable3 select from ...

    create non clustered indexes

    Part d)

    select into #temptable4 from #temptable1 join #temptable2 join #temptable3

    Resulting time: 4 hours
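
    For comparison, the rewritten part a) was shaped like this (same placeholder names as above); parts b) and c) were restructured the same way, and part d) stayed as it was:

    -- part a) create the table and its clustered index first, then load, then the non-clustered indexes
    create table #temptable1
    (
        KeyCol int      not null,
        Col1   int      not null,
        Col2   datetime not null
    );

    create clustered index cix_temp1 on #temptable1 (KeyCol);

    insert into #temptable1 (KeyCol, Col1, Col2)
    select t1.KeyCol, t1.Col1, t1.Col2
    from table1 t1;                        -- all the needed data was here, so no joins

    create nonclustered index ix_temp1_col1 on #temptable1 (Col1);

    Apart from part a) dropping the two unneeded joins, the structural difference is that the clustered index now exists before the 50 million rows are inserted instead of being built afterwards.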

    Can somebody explain to me why such a thing is possible? Where can I find the very best practices for tuning?

    I believed I was following the rules, and look what happened!

    Thanks in advance

  • Your question isn't entirely easy to answer. You're missing a few pieces. Like, were you creating clustered indexes in the first sequence, or non-clustered indexes? Secondly, did you change any of the other Select queries like you did the first one?

    Things like that will change performance. None of us can give you a decent answer to your problem with the pseudo-code you've listed. However... I can tell you this. Clustered indexes suck for data insertion. They're great for reads, not so much for writes. And 50 million rows? That's a LOT of data. I'm betting your second attempt spent most of that 4 hours just on part a.

    Brandie Tarvin, MCITP Database Administrator

  • Thanks for the quick response

    I didn't change the indexes, only the order.

    I didn't change the other queries, since they already use the corresponding indexes. I only changed the first one, since it used 3 tables and all the data was in the first one and reachable with a clustered index.

  • Okay, the best way to figure this out is to try Part A of both methods and time them. I'm betting this is where the problem is. Because you're creating the clustered index prior to data load, it's forcing the data to get reordered as it's inserting.

    But that's a guess. As I said, the best way to know for sure is to ignore the other parts, and just do Part A of both methods in two separate SSMS windows. Time it, have it Display Execution Plan, and look at the plan for both bits when it's done running.
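
    A minimal way to set that test up in each window, if you also want timing and I/O numbers and not just the plan, is something like this:

    -- run before the test in each SSMS window
    set statistics io on;      -- logical/physical reads per statement
    set statistics time on;    -- CPU and elapsed time per statement

    -- window 1: original part a)   (select ... into #temptable1, then create the indexes)
    -- window 2: rewritten part a)  (create table + clustered index, then insert ... select)

    -- also turn on "Include Actual Execution Plan" in SSMS before running,
    -- then compare the Messages tab output and the two plans side by side.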

    Brandie Tarvin, MCITP Database Administrator

  • I think that best practice is to create indexes after the load.

    Otherwise you are doing index maintenance for each insert operation, which I would think would be substantially slower.

    -Dan B

  • Oops! Ditto what Brandie wrote...

    Already did that test.

    Part A: original sproc 14 minutes / new sproc 15 minutes

  • Then keep doing the tests one at a time until you find out which part of your method changed so much that it's taking 4 hours to do.

    Brandie Tarvin, MCITP Database Administrator

    My first question is, what are you doing with the data that you need all the temp tables? That's probably a lot of spill to disk that might be avoided by avoiding temp tables altogether.

    Why can't you just do your final join on the base tables that are filling the temp tables? That "might" be faster than loading temp tables with 50 million rows to return 5 million.
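
    In other words, something shaped like this might replace parts a) through d) in one pass (the names are placeholders, since the real sources of #temptable2 and #temptable3 weren't posted):

    select t1.KeyCol, t1.Col1, b.Col2, c.Col3
    into #temptable4
    from table1 t1
    join sourceOfTemp2 b on b.KeyCol = t1.KeyCol   -- whatever part b) was reading from
    join sourceOfTemp3 c on c.KeyCol = t1.KeyCol;  -- whatever part c) was reading from

    That way the 50 million intermediate rows never have to be written to tempdb at all.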

    The second most important part of optimizing is to know how long each part took using the old way and how long it takes using the new way. Otherwise, you're tuning blindly and are likely going to make things very much worse. 🙂
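
    A cheap way to get those per-part numbers without changing any logic is to timestamp each part, for example:

    declare @t datetime;
    set @t = getdate();

    -- ... part a) goes here ...
    print 'part a: ' + convert(varchar(10), datediff(ss, @t, getdate())) + ' sec';
    set @t = getdate();

    -- ... part b) goes here ...
    print 'part b: ' + convert(varchar(10), datediff(ss, @t, getdate())) + ' sec';
    set @t = getdate();

    -- ... and so on for parts c) and d) ...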

    --Jeff Moden

    You really have given minimal information here. No join conditions, no WHERE clauses or column names, no execution plans, no tables or table details. Sorry, we can't help without some details.

    I've just been through a similar process of optimising a poorly written query that was reading 48 million rows into a temp table as part of a complex select. It turned out all it needed was to add an INCLUDE clause with 4 int columns to an existing index to improve the query plan. The execution plan went from a table scan to an index seek, and the report went from 40 minutes to 3 seconds. :-D
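
    The change was roughly of this shape (the index, table, and column names are invented for the example):

    -- carry the four int columns in the leaf level of the existing index,
    -- so the query is covered and no longer has to scan the base table
    create nonclustered index IX_BigTable_SearchCol
        on dbo.BigTable (SearchCol)
        include (IntCol1, IntCol2, IntCol3, IntCol4)
        with (drop_existing = on);   -- replaces the existing index of the same name in place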

    If you give us enough info there are people on the forum who can help you.

    Cheers

    Leo

  • Hi,

    Gail Shaw has an excellent article on here covering how to post queries relating to performance issues, so that people have the information they need to assist.

    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

    Have a look at this and follow her advice, and I'm sure you'll get an answer to your problem, and in all probability a way to improve your response time and resource use dramatically.

  • Jeff Moden (9/20/2010)


    The second most important part of optimizing is to know how long each part took using the old way and how long it takes using the new way. Otherwise, you're tuning blindly and are likely going to make things very much worse. 🙂

    Exactly the point I was trying to make. Thanks for clarifying it, Jeff.

    Brandie Tarvin, MCITP Database Administrator

    Actually, I left that info out on purpose. I just wanted to show a case where the best practices aren't the best, and the question at the end of my post is "where can I find the very best practices for tuning?", since I followed all of the ones I know. All my queries use an index seek or a clustered index seek, while the original ones use an index scan in every case, so I want to know if someone has a set of rules that we should follow when we write or try to optimize something like this.

    Thanks

  • ricardo_chicas (9/21/2010)


    Actually, I left that info out on purpose. I just wanted to show a case where the best practices aren't the best, and the question at the end of my post is "where can I find the very best practices for tuning?", since I followed all of the ones I know. All my queries use an index seek or a clustered index seek, while the original ones use an index scan in every case, so I want to know if someone has a set of rules that we should follow when we write or try to optimize something like this.

    Thanks

    I think you missed one of the most important ones, which I alluded to in my original post: only return the rows that you need to return. Is it really necessary to insert 50 million rows into a temp table in order to return 5 million rows? The answer may be "Yes", but without details no one can suggest a better way. You could be experiencing a ton of contention in tempdb, especially if there are other processes that work the same way.
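
    If the temp tables do turn out to be necessary, the same idea still applies while loading them: only let rows into tempdb that part d) will actually use. Very roughly, with placeholder names:

    select t1.KeyCol, t1.Col1, t1.Col2
    into #temptable1
    from table1 t1
    where exists (select 1 from sourceOfTemp2 b where b.KeyCol = t1.KeyCol);  -- keep only rows the later join can match,
                                                                              -- so ~5 million rows land in tempdb, not all 50 million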
