Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 1234»»»

Randomizing Result Sets with NEWID Expand / Collapse
Author
Message
Posted Saturday, February 27, 2010 11:44 AM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Friday, February 21, 2014 7:54 AM
Points: 1,619, Visits: 1,233
Comments posted to this topic are about the item Randomizing Result Sets with NEWID

_________________________________
seth delconte
http://sqlkeys.com
Post #874034
Posted Sunday, February 28, 2010 11:02 PM
Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Monday, June 16, 2014 3:57 AM
Points: 379, Visits: 55
For the purpose of the article, you probably want to select all customers with order_count >= min(order_count of top 10). That way you don't exclude customers that sort 'later' and might otherwise get excluded.
Post #874310
Posted Monday, March 1, 2010 12:13 AM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Friday, April 16, 2010 2:43 AM
Points: 239, Visits: 84
Hi There

Cool article but here is what I have found with newid().

Doing a order by newid() is a performance killer. Let me explain. We have a table of 36 million prospective customers. We send leads to sales agent centres daily. They want random data. Great so we have been using order by newid() for ages. The problem is that is it extremely slow. Painfully slow. Here is why. A GUID returned by a newid() operation is essentially a very random number. Keep this fact in mind. In my daily tasks we send various amounts of leads out of our system for different centres. Basically we have a query like this(simplified the select for readability).

SELECT TOP 1000 leadname,contactdetails -- The TOP is variable per call centre
FROM Prospects
WHERE Salary=>2500 -- we match our prospect profile here

Right lets say the profile(salary) matches 1,5 million rows in the database. SQL server will return all 1,5 million rows then ORDER BY. Now I mentioned that a GUID is very very random. This causes a high cpu load on the server as the poor Server now has to first sort the 1,5 million rows then it can return the top 1000. Think about it. It first has to sort before it can return. I have tested this and it does not matter if I return 1 row or 750 000 out of the 1,5 million rows that matches the query it constantly runs at the same speed. The top can only be applied once the sorting is done. Granted for small tables and non mission critical queries this technique can work well I would not use it on large tables as you will create a bottle neck.

For larger tables it might worth randomizing the data on insert and not having to worry about the randomization during extraction. By the way newid() on a table as the clustered key is not a good idea as the fragmentation of your tables will remain consistently high.

Hope that helps.

Regards
Post #874323
Posted Monday, March 1, 2010 12:18 AM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Tuesday, July 22, 2014 6:57 PM
Points: 231, Visits: 489
I've been using GUID as PKs, but this is a novel way of using it. Great.
Post #874324
Posted Monday, March 1, 2010 12:57 AM


Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Friday, April 9, 2010 8:36 AM
Points: 2, Visits: 23
Id agree - newid() is a performance killer.

its much faster to do something like this...

DECLARE @RandomNumber float
DECLARE @RandomInteger int
DECLARE @MaxValue int
DECLARE @MinValue int

SELECT @MinValue = MIN(Id),
@MaxValue = MAX(Id)
FROM dbo.SomeTable

SELECT @RandomNumber = RAND()

SELECT @RandomInteger = ((@MaxValue + 1) - @MinValue) * @RandomNumber + @MinValue

SELECT TOP (1) *
FROM dbo.SomeTable
WHERE Id >= @RandomInteger

Post #874329
Posted Monday, March 1, 2010 1:00 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Monday, August 23, 2010 9:31 AM
Points: 6, Visits: 57
GUID and SQL Server function Newid() create globally unique identifiers. This is not they same as random and likely does not have very good random properties.

One digit (16 bits) is used to id the algorithm. I'm sure a good many bits represent the time of generation.

If your using SQL Server 2008, then CRYPT_GEN_RANDOM(n) (n= number of digits), creates a cryptographically secure pseudo random number. These are usually the best available without a true hardware random number generator and they do execute for each row, unlike rand()

If your on an earlier version, but have 2008 available consider a linked query to obtain the numbers, or write a custom CLR ( see System.Security.Cryptography.RNGCryptoServiceProvider. )

Post #874330
Posted Monday, March 1, 2010 2:04 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Today @ 12:02 PM
Points: 2,892, Visits: 1,785
I've done quite a bit of testing of NEWID() and randomness and it is a very effective random number generator.

As the GUID comprises of blocks of hexadecimal these blocks can be converted into integer values.
I took a customer file where the PK was a GUID and based a partitioning scheme on a converted integer value and modulo 16 and it ended up with near ideal partition distribution.

Try the following

DECLARE
@Test char(4),
@MyInt INT,
@SQL NVARCHAR(200),
@ParmDefinition NVARCHAR(200)

SET @Test=LEFT(NEWID(),4)
SET @SQL='SET @MyInt=CONVERT(INT,0x'+@Test+')'
SET @ParmDefinition=N'@MyInt int OUTPUT'

EXEC sp_Executesql
@SQL,
@ParmDefinition,
@MyInt = @MyInt OUTPUT


SELECT @Test,@MyInt


LinkedIn Profile
Newbie on www.simple-talk.com
Post #874338
Posted Monday, March 1, 2010 6:06 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Monday, August 23, 2010 9:31 AM
Points: 6, Visits: 57
The values generated have some nice properties, but aren't random.

Converting the first 8 digits to an integer, hex2dec(left([guid],8)), the numbers have definate correlation with the previous generated value, and the 16th previous value, even when the generated ID are generated by deecidely unpredictable times (visitors to a website taking a particular action)

In the prize example, every 16th person might have a 10% or more greater chance than the others.
Post #874376
Posted Monday, March 1, 2010 6:52 AM
Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Tuesday, December 7, 2010 8:19 AM
Points: 327, Visits: 123
I like the concept of using a guid as a pseudo random number. Very novel approach!!!

I've always used auxillary tables in tempdb combined with forward-only cursors to assign unique, random numbers to each record for setting up direct marketing test panels...I generally do not use cursors but this was quite fast for my needs and direct marketing / database marketing queries are mostly ad-hoc in nature anyhow and my systems have not had to worry about performance hits like a production OLTP system would.

This alternate approach you outline IS going to get tested.

Well Done!!!
Post #874393
Posted Monday, March 1, 2010 7:20 AM


Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Today @ 10:48 AM
Points: 361, Visits: 858
Using NEWID to do a random sort or grab a random number of rows from a result set is a HUGE performance killer and does not scale well. I've had developers slip this kind of stuff into production and in less than a minute the CPUs were pegged at 100%.

Wile this method works, I do not recommend it on anything beyond one time ad-hoc DBA queries or infrequently used applications. You can sort randomly much more efficiently using RAND. Rather than type a lengthy explanation here of how I will submit an article.


Kendal Van Dyke
http://kendalvandyke.blogspot.com/
Post #874405
« Prev Topic | Next Topic »

Add to briefcase 1234»»»

Permissions Expand / Collapse