Primary Key creation - before and after bulk insert

  • Bhuvnesh (8/16/2010)


    First, why would the optimizer get confused by ROW_NUMBER (it returns a bigint value) when the output is int-typed?

    The optimiser can't trust the conversion to preserve the sort order, so an explicit sort is added to the plan.
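
    For illustration, a minimal sketch (the table and column names here are made up):

    -- ROW_NUMBER() returns bigint; converting it to int means the
    -- optimiser cannot assume the result still follows the ORDER BY,
    -- so the plan may gain an explicit Sort operator.
    SELECT CONVERT(int, ROW_NUMBER() OVER (ORDER BY s.SomeKey)) AS rn,
           s.SomeKey
    INTO   dbo.Staging
    FROM   dbo.Source AS s;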

    Second, why is the SELECT...INTO approach a minimally logged operation? Is there a related article?

    See Operations That Can Be Minimally Logged

    SELECT...INTO is always minimally logged - though exactly what gets logged depends on the recovery model (even FULL).

    I gave a full explanation of that in this thread: http://www.sqlservercentral.com/Forums/Topic912916-360-1.aspx#bm918624
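
    If you want to see the difference for yourself, one rough check on a throwaway test database is to count log records with the undocumented fn_dblog function (the object names here are made up):

    CHECKPOINT; -- in SIMPLE recovery, this clears most existing log records
    SELECT * INTO dbo.Target FROM dbo.Source;
    SELECT COUNT(*) AS log_record_count FROM fn_dblog(NULL, NULL);
    -- Repeat the test with INSERT INTO a pre-created table and compare the counts.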

  • Arjun, when you create your indexes, the statistics are normally created for that index. I have seen major differences between statistics created on an empty table and on a loaded table. Granted, it would take longer to create the indexes on a loaded table than on an empty one, but the results should be worth it.

    Steve Jimmo
    Sr DBA
    “If we ever forget that we are One Nation Under God, then we will be a Nation gone under.” - Ronald Reagan

  • sjimmo (8/16/2010)


    Arjun, when you create your indexes, the statistics are normally created for that index. I have seen major differences between statistics created on an empty table and on a loaded table. Granted, it would take longer to create the indexes on a loaded table than on an empty one, but the results should be worth it.

    Statistics are always created at the same time as an index, and always as if FULLSCAN had been specified.

    SQL Server automatically updates statistics as the optimiser needs them (by default).

    See the following links for details:

    Statistics Used By the Query Optimiser

    Batch Compilation, Recompilation, and Plan Caching Issues in SQL Server 2005
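
    As an aside, one way to see when the statistics on a table were last updated (the table name here is illustrative):

    -- List each statistics object on the table and its last update time.
    SELECT s.name,
           STATS_DATE(s.object_id, s.stats_id) AS last_updated
    FROM   sys.stats AS s
    WHERE  s.object_id = OBJECT_ID(N'dbo.MyTable');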

  • SELECT...INTO is always minimally logged ... (even FULL).

    That's not my understanding, especially for SQL 2005.

    In FULL recovery mode, I thought you got full logging, since otherwise point-in-time recovery would not be possible, and FULL mode must support that.

    I think even your own link -- Operations That Can Be Minimally Logged -- makes that point 🙂 .

    So, for SQL 2005, I think you must drop down from FULL recovery to get minimal logging. Whether or not you can afford to do that in a production system to cut load time is up to each DBA, of course. But make sure you check the plan and verify that you are actually getting minimal logging, since other things can affect that as well.

    For example, see the table at the end of this blog post:

    http://blogs.msdn.com/b/sqlserverstorageengine/archive/2008/02/05/bulk-logging-optimizations-minimal-logging.aspx

    [MS made some enhancements in 2008 to increase the chances of using minimal logging, but even then the restrictions can catch you.]
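
    For completeness, the switch itself might look something like this (the database name and backup path are just placeholders; remember that a log backup containing minimally logged operations cannot be restored to a point in time within that backup):

    ALTER DATABASE MyDb SET RECOVERY BULK_LOGGED;
    -- ... perform the bulk load here ...
    ALTER DATABASE MyDb SET RECOVERY FULL;
    -- Take a log backup to restart the point-in-time recovery chain.
    BACKUP LOG MyDb TO DISK = N'C:\Backups\MyDb_log.trn';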

    Scott Pletcher, SQL Server MVP 2008-2010

  • Scott,

    My views are fully expressed in the forum link I gave.

    Go with whatever definition works best for you!

    Thanks,

    Paul

    P.S. One more link for you:

    http://www.developersdex.com/sql/message.asp?p=580&r=6876205

    In which Kalen Delaney says:

    "SELECT INTO is always a minimally logged operation. Exactly how much gets logged is different in the different recovery models, but it is never logged as a row at a time operation."

    It's a bit like TRUNCATE TABLE in a way 😉

  • And a short time later she posted this [emphasis added], after a post from an MS person further pointing out the restrictions for minimal logging:

    "

    The document has a nice chart showing exactly what conditions are

    needed to allow minimal logging, depending on your existing table structure.

    The article also describes that even if minimal logging is done for

    INSERT/SELECT, it may not be done for every ROW inserted. You should

    probably reread the article.

    "

    She seems to have had her memory refreshed on the topic by that other post 🙂 .

    Scott Pletcher, SQL Server MVP 2008-2010

  • scott.pletcher (8/16/2010)


    And a short time later she posted this [emphasis added], after a post from an MS person further pointing out the restrictions for minimal logging:

    "

    The document has a nice chart showing exactly what conditions are

    needed to allow minimal logging, depending on your existing table structure.

    The article also describes that even if minimal logging is done for

    INSERT/SELECT, it may not be done for every ROW inserted. You should

    probably reread the article.

    "

    She seems to have had her memory refreshed on the topic by that other post 🙂 .

    No, she's talking about INSERT...SELECT there, with reference to the mis-typed Trace Flag (160) mentioned earlier in the thread.

    Anyway, look, Scott. I'm familiar with your posting history here, and I'm not getting involved.

    Choose to think what you will: minimally logged, optimally logged, logged with bulk load optimizations - I don't much care.

    Paul

  • I'm trying to make sure people don't get a mistaken impression of how SELECT ... INTO works based on absolutist statements that aren't true.

    It's very clear from all the articles that SELECT ... INTO is not necessarily minimally logged in SQL Server 2005. So it's unfair to others to categorically state that it is over and over when that's not true.

    Scott Pletcher, SQL Server MVP 2008-2010

  • Statistics are always created at the same time as an index, and always as if FULLSCAN had been specified. SQL Server automatically updates statistics as the optimiser needs them (by default).

    It is important to note the phrase "by default" (which is why I said normally). If auto create/auto update of statistics is turned off, then you will have to create or update them manually. And Paul is absolutely correct about the statistics being created as with FULLSCAN.
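
    For reference, one way to check those settings and run a manual update (the database and table names are illustrative):

    -- Are the auto-create / auto-update options enabled?
    SELECT is_auto_create_stats_on, is_auto_update_stats_on
    FROM   sys.databases
    WHERE  name = N'MyDb';

    -- Manual update if the automatic options are off:
    UPDATE STATISTICS dbo.MyTable WITH FULLSCAN;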

    Thanks Paul.

    Steve Jimmo
    Sr DBA
    “If we ever forget that we are One Nation Under God, then we will be a Nation gone under.” - Ronald Reagan

  • Hey, thanks guys. I will run the particular query in question both ways, INSERT INTO and SELECT INTO, and go with whichever runs in less time. Thanks for the links.

    - arjun

    https://sqlroadie.com/

  • Guys,

    In case you were wondering about this:

    scott.pletcher (8/16/2010)


    But make sure you check the plan and verify that you are actually getting minimal logging, since other things can affect that as well.

    There's no way to tell from the query plan whether minimal logging has occurred or not.

    Let us know how you get on, Arjun.

    Paul

  • Arjun Sivadasan (8/16/2010)


    Hey, thanks guys. I will run the particular query in question both ways, INSERT INTO and SELECT INTO, and go with whichever runs in less time. Thanks for the links.

    - arjun

    Try it with a good volume of data.
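
    One quick way to generate a large test set (all names here are made up; the row count is arbitrary):

    -- Cross join a catalog view against itself to manufacture rows.
    SELECT TOP (1000000)
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS Id,
           NEWID() AS Payload
    INTO   dbo.TestSource
    FROM   sys.all_columns AS a
    CROSS JOIN sys.all_columns AS b;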

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • Paul White NZ (8/17/2010)


    Guys,

    In case you were wondering about this:

    scott.pletcher (8/16/2010)


    But make sure you check the plan and verify that you are actually getting minimal logging, since other things can affect that as well.

    There's no way to tell from the query plan whether minimal logging has occurred or not.

    Paul

    Yes, I could make that out from the links you shared, Paul. I checked my query, and it runs in less time if I use SELECT ... INTO and create the clustered index after the bulk insertion. I use SQL Server 2005, so I think I will stick with that, and even in the event of an upgrade in the future, this will work just fine. INSERT INTO performs only slightly better than SELECT ... INTO in 2008.
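
    In outline, the pattern I settled on looks like this (the names are illustrative):

    -- Bulk load first, minimally logged where the conditions allow it.
    SELECT src.KeyCol, src.OtherCol
    INTO   dbo.NewTable
    FROM   dbo.SourceTable AS src;

    -- Then build the clustered primary key on the loaded table.
    ALTER TABLE dbo.NewTable
    ADD CONSTRAINT PK_NewTable PRIMARY KEY CLUSTERED (KeyCol);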

    So, in short, all my questions are answered Paul and I learned a few other things as well. Thanks everyone.

    - arjun

    https://sqlroadie.com/

  • I know this is an old thread, but I just wanted to add some recent experience.

    I have a load process that, under normal operating conditions, loads anywhere from under a million to a bit over a million rows. It creates a table, fills it, then applies the PK. From there, it does a merge into the main table. Typically, this works fine.

    Over the weekend, we wanted to refresh a much larger range of history in the main table (about 12 months' worth). The result was something on the order of a quarter of a billion rows. The step to create the primary key ran for about 3 hours before ultimately causing so much disk thrashing that the entire server became disk-bound. Website connections were failing, and GUI functions in SSMS were failing. I couldn't even stop the responsible Agent job using the GUI; I had to use the T-SQL command for it.

    As far as I can tell, the takeaway is that if you have an exceedingly fast disk array, maybe you can pull off clustered index/PK creation after the table load for extremely large tables. But in this instance, it probably would have worked better if I had created the PK first. While that would have slowed down the load, it would have prevented the performance-destroying disk thrashing at the end.
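
    If anyone hits the same wall, one mitigation worth testing is pushing the index build's sort into tempdb, assuming tempdb sits on separate or faster storage (the names here are made up):

    -- SORT_IN_TEMPDB moves the sort work off the user database's files.
    ALTER TABLE dbo.Staging
    ADD CONSTRAINT PK_Staging PRIMARY KEY CLUSTERED (Id)
    WITH (SORT_IN_TEMPDB = ON);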
