Primary Key creation - before and after bulk insert

  • Bhuvnesh (8/16/2010)


First, why would the optimizer get confused by ROW_NUMBER (it returns a bigint value) for an int-type output?

    The optimiser can't trust the conversion to preserve the sort order, so an explicit sort is added to the plan.

Second, why is the SELECT ... INTO approach a minimally logged operation? Any related article?

    See Operations That Can Be Minimally Logged

    SELECT...INTO is always minimally logged - though exactly what gets logged depends on the recovery model (even FULL).

    I gave a full explanation of that in this thread: http://www.sqlservercentral.com/Forums/Topic912916-360-1.aspx#bm918624
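A minimal sketch of the bigint-to-int point above (table and column names are hypothetical): ROW_NUMBER() returns bigint, and once its result is converted to int the optimiser can no longer rely on the converted values preserving the sort order, so a plan that needs that order may pick up an explicit Sort operator.

```sql
-- Hypothetical table for illustration
CREATE TABLE dbo.Orders (OrderID int NOT NULL, OrderDate datetime NOT NULL);

-- ROW_NUMBER() returns bigint; the optimiser knows its output order
SELECT ROW_NUMBER() OVER (ORDER BY OrderDate) AS rn, OrderID
FROM dbo.Orders;

-- Converting to int breaks that guarantee for the optimiser, so a
-- later ORDER BY rn (or an index build keyed on rn) can add a Sort
SELECT CAST(ROW_NUMBER() OVER (ORDER BY OrderDate) AS int) AS rn, OrderID
FROM dbo.Orders;
```

Compare the actual execution plans of the two statements when the order is consumed downstream to see where the extra Sort appears.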

  • Arjun, when you create your indexes, the statistics are normally created for that index. I have seen major differences between statistics created on an empty table and on a loaded table. Granted, it would take longer to create the indexes on a loaded table than an empty one but the results should be worth it.

    Steve Jimmo
    Sr DBA
    “If we ever forget that we are One Nation Under God, then we will be a Nation gone under." - Ronald Reagan

  • sjimmo (8/16/2010)


    Arjun, when you create your indexes, the statistics are normally created for that index. I have seen major differences between statistics created on an empty table and on a loaded table. Granted, it would take longer to create the indexes on a loaded table than an empty one but the results should be worth it.

    Statistics are always created at the same time as an index, and always as if FULLSCAN had been specified.

SQL Server automatically updates statistics as the optimiser needs them (by default).

    See the following links for details:

    Statistics Used By the Query Optimiser

    Batch Compilation, Recompilation, and Plan Caching Issues in SQL Server 2005
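For example (index and table names are hypothetical), you can confirm that the statistics created by an index build are fullscan-quality:

```sql
-- Building an index creates its statistics as if WITH FULLSCAN
CREATE INDEX IX_Orders_OrderDate ON dbo.Orders (OrderDate);

-- Inspect them: in the header, Rows and Rows Sampled
-- will match for statistics created by the index build
DBCC SHOW_STATISTICS ('dbo.Orders', 'IX_Orders_OrderDate');
```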

  • SELECT...INTO is always minimally logged ... (even FULL).

    That's not my understanding, especially for SQL 2005.

In the FULL recovery model, I thought you got full logging; otherwise point-in-time recovery would not be possible, which the FULL model must support.

    I think even your own link -- Operations That Can Be Minimally Logged -- makes that point 🙂 .

    So, for SQL 2005, I think you must drop down from FULL recovery to get minimal logging. Whether or not you can afford to do that in a production system to gain load time is up to each db controller, of course. But make sure you check the plan and verify that you are actually getting minimal logging, since other things can affect that as well.

For example, see the table at the end of this blog post:

    http://blogs.msdn.com/b/sqlserverstorageengine/archive/2008/02/05/bulk-logging-optimizations-minimal-logging.aspx

[MS made some enhancements in 2008 to increase the chances of using minimal logging, but even then the restrictions can catch you.]
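One way to check what actually got logged (a sketch only, not production code; database and table names are hypothetical, and sys.fn_dblog is undocumented and unsupported):

```sql
-- Run in a scratch database. Switch to a recovery model that
-- permits minimal logging for SELECT...INTO on SQL 2005.
ALTER DATABASE Scratch SET RECOVERY BULK_LOGGED;

SELECT *
INTO dbo.StagingCopy
FROM dbo.SourceTable;

-- Count log records touching the new table's allocation units:
-- a minimally logged load produces far fewer LOP_INSERT_ROWS
-- records than a fully logged, row-at-a-time insert would
SELECT COUNT(*) AS LogRecords
FROM sys.fn_dblog(NULL, NULL)
WHERE AllocUnitName LIKE 'dbo.StagingCopy%';
```

Comparing the count under FULL versus BULK_LOGGED recovery makes the difference in logging behaviour concrete.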

    Scott Pletcher, SQL Server MVP 2008-2010

  • Scott,

    My views are fully expressed in the forum link I gave.

    Go with whatever definition works best for you!

    Thanks,

    Paul

    P.S. One more link for you:

    http://www.developersdex.com/sql/message.asp?p=580&r=6876205

    In which Kalen Delaney says:

"SELECT INTO is always a minimally logged operation. Exactly how much gets logged is different in the different recovery models, but it is never logged as a row at a time operation."

    It's a bit like TRUNCATE TABLE in a way 😉

  • And a short time later she posted this [emphasis added], after a post from a MS person further pointing out the restrictions for minimal logging:

"The document has a nice chart showing exactly what conditions are needed to allow minimal logging, depending on your existing table structure. The article also describes that even if minimal logging is done for INSERT/SELECT, it may not be done for every ROW inserted. You should probably reread the article."

    She seems to have had her memory refreshed on the topic by that other post 🙂 .

    Scott Pletcher, SQL Server MVP 2008-2010

  • scott.pletcher (8/16/2010)


    And a short time later she posted this [emphasis added], after a post from a MS person further pointing out the restrictions for minimal logging:

"The document has a nice chart showing exactly what conditions are needed to allow minimal logging, depending on your existing table structure. The article also describes that even if minimal logging is done for INSERT/SELECT, it may not be done for every ROW inserted. You should probably reread the article."

    She seems to have had her memory refreshed on the topic by that other post 🙂 .

    No, she's talking about INSERT...SELECT there, with reference to the mis-typed Trace Flag (160) mentioned earlier in the thread.

    Anyway, look, Scott. I'm familiar with your posting history here, and I'm not getting involved.

    Choose to think what you will: minimally logged, optimally logged, logged with bulk load optimizations - I don't much care.

    Paul

  • I'm trying to make sure people don't get a mistaken impression of how SELECT ... INTO works based on absolutist statements that aren't true.

    It's very clear from all the articles that SELECT ... INTO is not necessarily minimally logged in SQL Server 2005. So it's unfair to others to categorically state that it is over and over when that's not true.

    Scott Pletcher, SQL Server MVP 2008-2010

Statistics are always created at the same time as an index, and always as if FULLSCAN had been specified. SQL Server automatically updates statistics as the optimiser needs them (by default).

It is important to note (which is why I said "normally") the phrase "by default." If auto create/auto update of statistics is turned off, then you will have to create and update them manually. And Paul is absolutely correct about the statistics being created as with FULLSCAN.
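To check where a database stands on those defaults (a small sketch; the table name in the manual-update example is hypothetical):

```sql
-- Check whether the automatic statistics options are on
SELECT is_auto_create_stats_on, is_auto_update_stats_on
FROM sys.databases
WHERE name = DB_NAME();

-- If they are off, statistics must be maintained manually, e.g.:
UPDATE STATISTICS dbo.Orders WITH FULLSCAN;
```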

    Thanks Paul.

    Steve Jimmo
    Sr DBA
    “If we ever forget that we are One Nation Under God, then we will be a Nation gone under." - Ronald Reagan

  • Hey, thanks guys. I will run the particular query in question in both ways - insert into and select into; and go with whichever works in less time. Thanks for the links.

    - arjun

    https://sqlroadie.com/

  • Guys,

    In case you were wondering about this:

    scott.pletcher (8/16/2010)


    But make sure you check the plan and verify that you are actually getting minimal logging, since other things can affect that as well.

    There's no way to tell from the query plan whether minimal logging has occurred or not.

    Let us know how you get on Arjun.

    Paul

  • Arjun Sivadasan (8/16/2010)


    Hey, thanks guys. I will run the particular query in question in both ways - insert into and select into; and go with whichever works in less time. Thanks for the links.

    - arjun

Try it with a good volume of data.

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • Paul White NZ (8/17/2010)


    Guys,

    In case you were wondering about this:

    scott.pletcher (8/16/2010)


    But make sure you check the plan and verify that you are actually getting minimal logging, since other things can affect that as well.

    There's no way to tell from the query plan whether minimal logging has occurred or not.

    Paul

Ya, I could make that out from the links you had shared, Paul. I checked my query, and it runs in less time if I use SELECT ... INTO and create the clustered index after the bulk insert. I use SQL Server 2005, so I think I will stick with that, and even in the event of a future upgrade this will work just fine. INSERT INTO performs only slightly better than SELECT ... INTO in 2008.
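The pattern Arjun settled on looks roughly like this (table and column names are hypothetical):

```sql
-- 1. Bulk load into a new heap with SELECT...INTO
SELECT OrderID, OrderDate, Amount
INTO dbo.OrderStaging
FROM dbo.OrderSource;

-- 2. Create the clustered primary key after the load completes
ALTER TABLE dbo.OrderStaging
ADD CONSTRAINT PK_OrderStaging PRIMARY KEY CLUSTERED (OrderID);
```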

    So, in short, all my questions are answered Paul and I learned a few other things as well. Thanks everyone.

    - arjun

    https://sqlroadie.com/

  • I know this is an old thread, but I just wanted to add some recent experience.

    I have a load process that, under normal operating conditions, loads anywhere from under a million to a bit over a million rows. It creates a table, fills it, then applies the PK. From there, it does a merge into the main table. Typically, this works fine.

Over the weekend, we wanted to refresh a much larger range of history in the main table (about 12 months' worth). The result was something on the order of a quarter of a billion rows. The step to create the primary key ran for about 3 hours before ultimately causing so much disk thrashing that the entire server became disk-bound. Website connections were failing, and GUI functions in SSMS were failing. I couldn't even stop the responsible Agent job through the GUI; I had to use the T-SQL command for it.

As far as I can tell, the takeaway is that if you have an exceedingly fast disk array, maybe you can pull off clustered index/PK creation after the table load for extremely large tables. But in this instance, it probably would have worked better if I had created the PK first. While this would have slowed down the load, it would have prevented the performance-destroying disk thrashing at the end.
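The alternative ordering described above, sketched with hypothetical names: create the key up front and load in key order, so there is never a massive index build (and sort) over a quarter-billion rows at the end.

```sql
-- Create the table with its primary key up front
CREATE TABLE dbo.History (
    HistoryID bigint NOT NULL CONSTRAINT PK_History PRIMARY KEY CLUSTERED,
    LoadDate  datetime NOT NULL
);

-- Load in clustered-key order; slower per row, but it avoids
-- the huge sort/spill of building the index after the load
INSERT INTO dbo.History WITH (TABLOCK) (HistoryID, LoadDate)
SELECT HistoryID, LoadDate
FROM dbo.HistorySource
ORDER BY HistoryID;
```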
