Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Randomizing Result Sets with NEWID


Randomizing Result Sets with NEWID

Author
Message
seth delconte
seth delconte
SSCommitted
SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)

Group: General Forum Members
Points: 1711 Visits: 1360
Comments posted to this topic are about the item Randomizing Result Sets with NEWID

_________________________________
seth delconte
http://sqlkeys.com
Norman Rasmussen
Norman Rasmussen
Old Hand
Old Hand (383 reputation)Old Hand (383 reputation)Old Hand (383 reputation)Old Hand (383 reputation)Old Hand (383 reputation)Old Hand (383 reputation)Old Hand (383 reputation)Old Hand (383 reputation)

Group: General Forum Members
Points: 383 Visits: 55
For the purpose of the article, you probably want to select all customers with order_count >= min(order_count of top 10). That way you don't exclude customers that sort 'later' and might otherwise get excluded.
Neil Franken
Neil Franken
SSC Veteran
SSC Veteran (241 reputation)SSC Veteran (241 reputation)SSC Veteran (241 reputation)SSC Veteran (241 reputation)SSC Veteran (241 reputation)SSC Veteran (241 reputation)SSC Veteran (241 reputation)SSC Veteran (241 reputation)

Group: General Forum Members
Points: 241 Visits: 84
Hi There

Cool article but here is what I have found with newid().

Doing a order by newid() is a performance killer. Let me explain. We have a table of 36 million prospective customers. We send leads to sales agent centres daily. They want random data. Great so we have been using order by newid() for ages. The problem is that is it extremely slow. Painfully slow. Here is why. A GUID returned by a newid() operation is essentially a very random number. Keep this fact in mind. In my daily tasks we send various amounts of leads out of our system for different centres. Basically we have a query like this(simplified the select for readability).

SELECT TOP 1000 leadname,contactdetails -- The TOP is variable per call centre
FROM Prospects
WHERE Salary=>2500 -- we match our prospect profile here

Right lets say the profile(salary) matches 1,5 million rows in the database. SQL server will return all 1,5 million rows then ORDER BY. Now I mentioned that a GUID is very very random. This causes a high cpu load on the server as the poor Server now has to first sort the 1,5 million rows then it can return the top 1000. Think about it. It first has to sort before it can return. I have tested this and it does not matter if I return 1 row or 750 000 out of the 1,5 million rows that matches the query it constantly runs at the same speed. The top can only be applied once the sorting is done. Granted for small tables and non mission critical queries this technique can work well I would not use it on large tables as you will create a bottle neck.

For larger tables it might worth randomizing the data on insert and not having to worry about the randomization during extraction. By the way newid() on a table as the clustered key is not a good idea as the fragmentation of your tables will remain consistently high.

Hope that helps.

Regards
adish
adish
SSC Veteran
SSC Veteran (233 reputation)SSC Veteran (233 reputation)SSC Veteran (233 reputation)SSC Veteran (233 reputation)SSC Veteran (233 reputation)SSC Veteran (233 reputation)SSC Veteran (233 reputation)SSC Veteran (233 reputation)

Group: General Forum Members
Points: 233 Visits: 607
I've been using GUID as PKs, but this is a novel way of using it. Great.
gary-915906
gary-915906
Forum Newbie
Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)

Group: General Forum Members
Points: 6 Visits: 23
Id agree - newid() is a performance killer.

its much faster to do something like this...


DECLARE @RandomNumber float
DECLARE @RandomInteger int
DECLARE @MaxValue int
DECLARE @MinValue int

SELECT @MinValue = MIN(Id),
@MaxValue = MAX(Id)
FROM dbo.SomeTable

SELECT @RandomNumber = RAND()

SELECT @RandomInteger = ((@MaxValue + 1) - @MinValue) * @RandomNumber + @MinValue

SELECT TOP (1) *
FROM dbo.SomeTable
WHERE Id >= @RandomInteger


DannyS
DannyS
Forum Newbie
Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)

Group: General Forum Members
Points: 6 Visits: 57
GUID and SQL Server function Newid() create globally unique identifiers. This is not they same as random and likely does not have very good random properties.

One digit (16 bits) is used to id the algorithm. I'm sure a good many bits represent the time of generation.

If your using SQL Server 2008, then CRYPT_GEN_RANDOM(n) (n= number of digits), creates a cryptographically secure pseudo random number. These are usually the best available without a true hardware random number generator and they do execute for each row, unlike rand()

If your on an earlier version, but have 2008 available consider a linked query to obtain the numbers, or write a custom CLR ( see System.Security.Cryptography.RNGCryptoServiceProvider. )
David.Poole
David.Poole
SSCarpal Tunnel
SSCarpal Tunnel (4.6K reputation)SSCarpal Tunnel (4.6K reputation)SSCarpal Tunnel (4.6K reputation)SSCarpal Tunnel (4.6K reputation)SSCarpal Tunnel (4.6K reputation)SSCarpal Tunnel (4.6K reputation)SSCarpal Tunnel (4.6K reputation)SSCarpal Tunnel (4.6K reputation)

Group: General Forum Members
Points: 4609 Visits: 3186
I've done quite a bit of testing of NEWID() and randomness and it is a very effective random number generator.

As the GUID comprises of blocks of hexadecimal these blocks can be converted into integer values.
I took a customer file where the PK was a GUID and based a partitioning scheme on a converted integer value and modulo 16 and it ended up with near ideal partition distribution.

Try the following

DECLARE
@Test char(4),
@MyInt INT,
@SQL NVARCHAR(200),
@ParmDefinition NVARCHAR(200)

SET @Test=LEFT(NEWID(),4)
SET @SQL='SET @MyInt=CONVERT(INT,0x'+@Test+')'
SET @ParmDefinition=N'@MyInt int OUTPUT'

EXEC sp_Executesql
@SQL,
@ParmDefinition,
@MyInt = @MyInt OUTPUT


SELECT @Test,@MyInt

LinkedIn Profile

Newbie on www.simple-talk.com
DannyS
DannyS
Forum Newbie
Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)

Group: General Forum Members
Points: 6 Visits: 57
The values generated have some nice properties, but aren't random.

Converting the first 8 digits to an integer, hex2dec(left([guid],8)), the numbers have definate correlation with the previous generated value, and the 16th previous value, even when the generated ID are generated by deecidely unpredictable times (visitors to a website taking a particular action)

In the prize example, every 16th person might have a 10% or more greater chance than the others.
Martin Vrieze
Martin Vrieze
Old Hand
Old Hand (358 reputation)Old Hand (358 reputation)Old Hand (358 reputation)Old Hand (358 reputation)Old Hand (358 reputation)Old Hand (358 reputation)Old Hand (358 reputation)Old Hand (358 reputation)

Group: General Forum Members
Points: 358 Visits: 125
I like the concept of using a guid as a pseudo random number. Very novel approach!!!

I've always used auxillary tables in tempdb combined with forward-only cursors to assign unique, random numbers to each record for setting up direct marketing test panels...I generally do not use cursors but this was quite fast for my needs and direct marketing / database marketing queries are mostly ad-hoc in nature anyhow and my systems have not had to worry about performance hits like a production OLTP system would.

This alternate approach you outline IS going to get tested.

Well Done!!!
Kendal Van Dyke
Kendal Van Dyke
SSC-Addicted
SSC-Addicted (466 reputation)SSC-Addicted (466 reputation)SSC-Addicted (466 reputation)SSC-Addicted (466 reputation)SSC-Addicted (466 reputation)SSC-Addicted (466 reputation)SSC-Addicted (466 reputation)SSC-Addicted (466 reputation)

Group: General Forum Members
Points: 466 Visits: 983
Using NEWID to do a random sort or grab a random number of rows from a result set is a HUGE performance killer and does not scale well. I've had developers slip this kind of stuff into production and in less than a minute the CPUs were pegged at 100%.

Wile this method works, I do not recommend it on anything beyond one time ad-hoc DBA queries or infrequently used applications. You can sort randomly much more efficiently using RAND. Rather than type a lengthy explanation here of how I will submit an article.

Kendal Van Dyke
http://kendalvandyke.blogspot.com/
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search