Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase ««1234»»»

Randomizing Result Sets with NEWID Expand / Collapse
Author
Message
Posted Monday, March 1, 2010 7:51 AM


SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Yesterday @ 11:40 AM
Points: 810, Visits: 2,121
Neil Franken (3/1/2010)
Hi There

Cool article but here is what I have found with newid().

...

Right lets say the profile(salary) matches 1,5 million rows in the database. SQL server will return all 1,5 million rows then ORDER BY. Now I mentioned that a GUID is very very random. This causes a high cpu load on the server as the poor Server now has to first sort the 1,5 million rows then it can return the top 1000. Think about it. It first has to sort before it can return. I have tested this and it does not matter if I return 1 row or 750 000 out of the 1,5 million rows that matches the query it constantly runs at the same speed. The top can only be applied once the sorting is done. Granted for small tables and non mission critical queries this technique can work well I would not use it on large tables as you will create a bottle neck.

...


One option, especially if you have an indexed identity column on your source table, is to generate a separate table of random row numbers, create a clustered index on it, and join with the original table.
create table #lookup_table(row_num int)
declare @ctr int, @samplesize int
set @ctr = 0
set @samplesize = 1000 -- for example, a sample size of 1,000 is needed
while @ctr < @samplesize
BEGIN
insert into #lookup_table select abs(checksum(newid()))
set @ctr = @ctr + 1
END
create clustered index idxc on #lookup_table(row_num)

Do a join on this table and it should go much more quickly, so the entire original table would not be loaded. Not much chance there will be duplicate rows with this method as INT can be up to 2,147,483,647.


Gaby
________________________________________________________________
"In theory, theory and practice are the same. In practice, they are not."
- Albert Einstein
Post #874423
Posted Monday, March 1, 2010 11:48 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 2:50 AM
Points: 11,192, Visits: 11,098
adish (3/1/2010)
I've been using GUID as PKs, but this is a novel way of using it. Great.

Not a clustered Primary Key, I trust?




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #874608
Posted Monday, March 1, 2010 11:54 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 2:50 AM
Points: 11,192, Visits: 11,098
GabyYYZ (3/1/2010)
[quote]One option, especially if you have an indexed identity column on your source table, is to generate a separate table of random row numbers, create a clustered index on it, and join with the original table.

Nice idea. Of course, the 'random' numbers are then a bit, er, 'fixed' aren't they?
Can't believe you used a RBAR method to populate your table.
For smallish numbers of random rows, I prefer an approach very similar to the one posted by Gary earlier.
It does require a table with a sequential ID, but that's pretty common - excepting those that like GUIDs as a PK *shudder*




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #874613
Posted Monday, March 1, 2010 12:06 PM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Monday, June 23, 2014 11:55 AM
Points: 1,708, Visits: 1,792
Paul White (3/1/2010)
adish (3/1/2010)
I've been using GUID as PKs, but this is a novel way of using it. Great.

Not a clustered Primary Key, I trust?


It probably was clustered, its common for App Developers to do this kind of thing. It happened at Microsoft around the Windows 7 RC downloads...

http://www.sqlskills.com/BLOGS/PAUL/post/Why-did-the-Windows-7-RC-failure-happen.aspx



Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
My Blog | Twitter | MVP Profile
Training | Consulting | Become a SQLskills Insider
Troubleshooting SQL Server: A Guide for Accidental DBAs
Post #874621
Posted Monday, March 1, 2010 2:36 PM


SSChasing Mays

SSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing Mays

Group: General Forum Members
Last Login: Friday, January 3, 2014 10:59 AM
Points: 626, Visits: 836
Paul White (3/1/2010)
adish (3/1/2010)
I've been using GUID as PKs, but this is a novel way of using it. Great.

Not a clustered Primary Key, I trust?



obviously the super-best data type for a PK is one that increments by 1 each time so the index is appended on each insert and there is no fragmentation,

Any other data type is just as likely to fragment as a guid. I left-handedly proved this in one of my shamefully RBAR riddled articles: http://www.sqlservercentral.com/articles/Indexing/64424/


It's the random insertion, not the datatype that causes the problem.
What's the differece if the data inserted is Johnson, Jonsonn, Johnsen or three guids?
Under the hood, there's not a difference.

The only way you're not going to frag your index at an equal rate as a guid is if the data is inserted in the same order as the clustered index key. In OLTP, that's pretty unlikely except with and identity (which has it's own set of issues).

~Craig O.








Craig Outcalt



Tips for new DBAs: http://www.sqlservercentral.com/articles/Career/64632
My other articles: http://www.sqlservercentral.com/Authors/Articles/Craig_Outcalt/560258
Post #874727
Posted Monday, March 1, 2010 2:42 PM


SSChasing Mays

SSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing Mays

Group: General Forum Members
Last Login: Friday, January 3, 2014 10:59 AM
Points: 626, Visits: 836
I also forgot to say that there is the NEWSEQUENTIALID() function if you're going to batch load.
That should wreck your cluidx a little less, too.



Craig Outcalt



Tips for new DBAs: http://www.sqlservercentral.com/articles/Career/64632
My other articles: http://www.sqlservercentral.com/Authors/Articles/Craig_Outcalt/560258
Post #874732
Posted Monday, March 1, 2010 7:14 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 2:50 AM
Points: 11,192, Visits: 11,098
Jonathan Kehayias (3/1/2010)
It probably was clustered, its common for App Developers to do this kind of thing. It happened at Microsoft around the Windows 7 RC downloads...

Ah yes, I remember that one
Highly amusing at the time...




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #874821
Posted Monday, March 1, 2010 7:23 PM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Monday, June 23, 2014 11:55 AM
Points: 1,708, Visits: 1,792
SQLBOT (3/1/2010)


It's the random insertion, not the datatype that causes the problem.
What's the differece if the data inserted is Johnson, Jonsonn, Johnsen or three guids?
Under the hood, there's not a difference.



Craig,

Respectfully, the rate of fragmenation partially depends on what the datatype is for the column. If you are inserting random values into a varchar(8) column, the end impact for fragmentation would be different than a char(8), nchar(8) or nvarchar(8) column because the storage size is different for each so fragmentation rates would be different. A GUID is 16 bytes so it takes more space = fuller pages faster = more page splits = faster fragmentation rates.

Your point is accurate, just playing semantics with you is all.


Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
My Blog | Twitter | MVP Profile
Training | Consulting | Become a SQLskills Insider
Troubleshooting SQL Server: A Guide for Accidental DBAs
Post #874823
Posted Monday, March 1, 2010 8:07 PM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Tuesday, July 22, 2014 6:57 PM
Points: 231, Visits: 489
Paul White (3/1/2010)
Jonathan Kehayias (3/1/2010)
It probably was clustered, its common for App Developers to do this kind of thing. It happened at Microsoft around the Windows 7 RC downloads...

Ah yes, I remember that one
Highly amusing at the time...


It is clustered. Am I doing something terribly wrong? Some light on this please, I am not an expert in SQL and would appreciate guidance from SQL gurus.
Thanks
Post #874831
Posted Monday, March 1, 2010 8:13 PM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Monday, June 23, 2014 11:55 AM
Points: 1,708, Visits: 1,792
adish (3/1/2010)


It is clustered. Am I doing something terribly wrong? Some light on this please, I am not an expert in SQL and would appreciate guidance from SQL gurus.
Thanks


adish,

Read the blog post on the link I provided by Paul Randal. It explains why having a clustered index/primary key on GUID is suboptimal.


Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
My Blog | Twitter | MVP Profile
Training | Consulting | Become a SQLskills Insider
Troubleshooting SQL Server: A Guide for Accidental DBAs
Post #874835
« Prev Topic | Next Topic »

Add to briefcase ««1234»»»

Permissions Expand / Collapse