SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Randomizing Result Sets with NEWID


Randomizing Result Sets with NEWID

Author
Message
GabyYYZ
GabyYYZ
Ten Centuries
Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)

Group: General Forum Members
Points: 1123 Visits: 2336
Neil Franken (3/1/2010)
Hi There

Cool article but here is what I have found with newid().

...

Right lets say the profile(salary) matches 1,5 million rows in the database. SQL server will return all 1,5 million rows then ORDER BY. Now I mentioned that a GUID is very very random. This causes a high cpu load on the server as the poor Server now has to first sort the 1,5 million rows then it can return the top 1000. Think about it. It first has to sort before it can return. I have tested this and it does not matter if I return 1 row or 750 000 out of the 1,5 million rows that matches the query it constantly runs at the same speed. The top can only be applied once the sorting is done. Granted for small tables and non mission critical queries this technique can work well I would not use it on large tables as you will create a bottle neck.

...


One option, especially if you have an indexed identity column on your source table, is to generate a separate table of random row numbers, create a clustered index on it, and join with the original table.

create table #lookup_table(row_num int)
declare @ctr int, @samplesize int
set @ctr = 0
set @samplesize = 1000 -- for example, a sample size of 1,000 is needed
while @ctr < @samplesize
BEGIN
insert into #lookup_table select abs(checksum(newid()))
set @ctr = @ctr + 1
END
create clustered index idxc on #lookup_table(row_num)


Do a join on this table and it should go much more quickly, so the entire original table would not be loaded. Not much chance there will be duplicate rows with this method as INT can be up to 2,147,483,647.

Gaby
________________________________________________________________
"In theory, theory and practice are the same. In practice, they are not."
- Albert Einstein

Paul White
Paul White
SSCoach
SSCoach (19K reputation)SSCoach (19K reputation)SSCoach (19K reputation)SSCoach (19K reputation)SSCoach (19K reputation)SSCoach (19K reputation)SSCoach (19K reputation)SSCoach (19K reputation)

Group: General Forum Members
Points: 19982 Visits: 11359
adish (3/1/2010)
I've been using GUID as PKs, but this is a novel way of using it. Great.

Not a clustered Primary Key, I trust?



Paul White
SQLPerformance.com
SQLblog.com
@SQL_Kiwi
Paul White
Paul White
SSCoach
SSCoach (19K reputation)SSCoach (19K reputation)SSCoach (19K reputation)SSCoach (19K reputation)SSCoach (19K reputation)SSCoach (19K reputation)SSCoach (19K reputation)SSCoach (19K reputation)

Group: General Forum Members
Points: 19982 Visits: 11359
GabyYYZ (3/1/2010)
[quote]One option, especially if you have an indexed identity column on your source table, is to generate a separate table of random row numbers, create a clustered index on it, and join with the original table.

Nice idea. Of course, the 'random' numbers are then a bit, er, 'fixed' aren't they?
Can't believe you used a RBAR method to populate your table. :-P
For smallish numbers of random rows, I prefer an approach very similar to the one posted by Gary earlier.
It does require a table with a sequential ID, but that's pretty common - excepting those that like GUIDs as a PK *shudder*



Paul White
SQLPerformance.com
SQLblog.com
@SQL_Kiwi
Jonathan Kehayias
Jonathan Kehayias
Hall of Fame
Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)

Group: General Forum Members
Points: 3658 Visits: 1807
Paul White (3/1/2010)
adish (3/1/2010)
I've been using GUID as PKs, but this is a novel way of using it. Great.

Not a clustered Primary Key, I trust?


It probably was clustered, its common for App Developers to do this kind of thing. It happened at Microsoft around the Windows 7 RC downloads...

http://www.sqlskills.com/BLOGS/PAUL/post/Why-did-the-Windows-7-RC-failure-happen.aspx

Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
My Blog | Twitter | MVP Profile
Training | Consulting | Become a SQLskills Insider
Troubleshooting SQL Server: A Guide for Accidental DBAs
SQLBOT
SQLBOT
Ten Centuries
Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)

Group: General Forum Members
Points: 1158 Visits: 836
Paul White (3/1/2010)
adish (3/1/2010)
I've been using GUID as PKs, but this is a novel way of using it. Great.

Not a clustered Primary Key, I trust?



obviously the super-best data type for a PK is one that increments by 1 each time so the index is appended on each insert and there is no fragmentation,

Any other data type is just as likely to fragment as a guid. I left-handedly proved this in one of my shamefully RBAR riddled articles: http://www.sqlservercentral.com/articles/Indexing/64424/


It's the random insertion, not the datatype that causes the problem.
What's the differece if the data inserted is Johnson, Jonsonn, Johnsen or three guids?
Under the hood, there's not a difference.

The only way you're not going to frag your index at an equal rate as a guid is if the data is inserted in the same order as the clustered index key. In OLTP, that's pretty unlikely except with and identity (which has it's own set of issues).

~Craig O.

Craig Outcalt



Tips for new DBAs: http://www.sqlservercentral.com/articles/Career/64632
My other articles: http://www.sqlservercentral.com/Authors/Articles/Craig_Outcalt/560258
SQLBOT
SQLBOT
Ten Centuries
Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)

Group: General Forum Members
Points: 1158 Visits: 836
I also forgot to say that there is the NEWSEQUENTIALID() function if you're going to batch load.
That should wreck your cluidx a little less, too.

Craig Outcalt



Tips for new DBAs: http://www.sqlservercentral.com/articles/Career/64632
My other articles: http://www.sqlservercentral.com/Authors/Articles/Craig_Outcalt/560258
Paul White
Paul White
SSCoach
SSCoach (19K reputation)SSCoach (19K reputation)SSCoach (19K reputation)SSCoach (19K reputation)SSCoach (19K reputation)SSCoach (19K reputation)SSCoach (19K reputation)SSCoach (19K reputation)

Group: General Forum Members
Points: 19982 Visits: 11359
Jonathan Kehayias (3/1/2010)
It probably was clustered, its common for App Developers to do this kind of thing. It happened at Microsoft around the Windows 7 RC downloads...

Ah yes, I remember that one :-D
Highly amusing at the time...



Paul White
SQLPerformance.com
SQLblog.com
@SQL_Kiwi
Jonathan Kehayias
Jonathan Kehayias
Hall of Fame
Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)

Group: General Forum Members
Points: 3658 Visits: 1807
SQLBOT (3/1/2010)


It's the random insertion, not the datatype that causes the problem.
What's the differece if the data inserted is Johnson, Jonsonn, Johnsen or three guids?
Under the hood, there's not a difference.



Craig,

Respectfully, the rate of fragmenation partially depends on what the datatype is for the column. If you are inserting random values into a varchar(8) column, the end impact for fragmentation would be different than a char(8), nchar(8) or nvarchar(8) column because the storage size is different for each so fragmentation rates would be different. A GUID is 16 bytes so it takes more space = fuller pages faster = more page splits = faster fragmentation rates.

Your point is accurate, just playing semantics with you is all. ;-)

Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
My Blog | Twitter | MVP Profile
Training | Consulting | Become a SQLskills Insider
Troubleshooting SQL Server: A Guide for Accidental DBAs
adish
adish
SSC Veteran
SSC Veteran (263 reputation)SSC Veteran (263 reputation)SSC Veteran (263 reputation)SSC Veteran (263 reputation)SSC Veteran (263 reputation)SSC Veteran (263 reputation)SSC Veteran (263 reputation)SSC Veteran (263 reputation)

Group: General Forum Members
Points: 263 Visits: 628
Paul White (3/1/2010)
Jonathan Kehayias (3/1/2010)
It probably was clustered, its common for App Developers to do this kind of thing. It happened at Microsoft around the Windows 7 RC downloads...

Ah yes, I remember that one :-D
Highly amusing at the time...


It is clustered. Am I doing something terribly wrong? Some light on this please, I am not an expert in SQL and would appreciate guidance from SQL gurus.
Thanks
Jonathan Kehayias
Jonathan Kehayias
Hall of Fame
Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)

Group: General Forum Members
Points: 3658 Visits: 1807
adish (3/1/2010)


It is clustered. Am I doing something terribly wrong? Some light on this please, I am not an expert in SQL and would appreciate guidance from SQL gurus.
Thanks


adish,

Read the blog post on the link I provided by Paul Randal. It explains why having a clustered index/primary key on GUID is suboptimal.

Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
My Blog | Twitter | MVP Profile
Training | Consulting | Become a SQLskills Insider
Troubleshooting SQL Server: A Guide for Accidental DBAs
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search