Randomizing Result Sets with NEWID

Question

Randomizing Result Sets with NEWID

Viewing 8 posts - 31 through 38 (of 38 total)

You must be logged in to reply to this topic. Login to reply

seth delconte SSCertifiable Points: 6388 More actions · Answer 1

nick.mcdermaid (3/7/2010)
Someone appears to have drawn some inspiration from your article.
http://subhrosaha.wordpress.com/

Yes it certainly looks that way - even the reference links are the same. A little credit would have been nice!

_________________________________
seth delconte
http://sqlkeys.com

GabyYYZ SSCertifiable Points: 7913 More actions · Answer 2

Seth Delconte (3/8/2010)
nick.mcdermaid (3/7/2010)
Someone appears to have drawn some inspiration from your article.
http://subhrosaha.wordpress.com/
Yes it certainly looks that way - even the reference links are the same. A little credit would have been nice!

He has been served.

EDIT: The comment I submitted is awaiting moderation. If he rejects it, this is what I wrote

At least properly credit the appropriate article if you’re going to copy from it (not the author of this original article but on the same forum) <link back to the article here>

Gaby________________________________________________________________"In theory, theory and practice are the same. In practice, they are not." - Albert Einstein

seth delconte SSCertifiable Points: 6388 More actions · Answer 3

GabyYYZ (3/9/2010)
Seth Delconte (3/8/2010)
nick.mcdermaid (3/7/2010)
Someone appears to have drawn some inspiration from your article.
http://subhrosaha.wordpress.com/
Yes it certainly looks that way - even the reference links are the same. A little credit would have been nice!
He has been served.
EDIT: The comment I submitted is awaiting moderation. If he rejects it, this is what I wrote
At least properly credit the appropriate article if you’re going to copy from it (not the author of this original article but on the same forum) <link back to the article here>

Ahh thank you Gaby!

_________________________________
seth delconte
http://sqlkeys.com

rd_in_sd Grasshopper Points: 16 More actions · Answer 4

You're missing one really important point.

Using GUID's is fine, but when the primary key is a clustered index, then you end up with half-or-less empty data pages which creates a lot of I/O and useless, partially empty data pages.

When you use a GUID as a primary key, make sure it isn't a clustered index or you'll get dog-poor performance.

Ron

Paul White SSC Guru Points: 150468 More actions · Answer 5

rd_in_sd (3/14/2010)
Using GUID's is fine, but when the primary key is a clustered index, then you end up with half-or-less empty data pages which creates a lot of I/O and useless, partially empty data pages. When you use a GUID as a primary key, make sure it isn't a clustered index or you'll get dog-poor performance.

This point has been raised numerous times already in this thread...and the answer is always the same: use the NEWSEQUENTIALID in a default constraint to avoid this issue completely. Full pages, minimal page-splits...globally-unique goodness.

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

DannyS SSC Veteran Points: 231 More actions · Answer 6

Ok, this isn't the most elegant solution, but is is a proper "Simple Random Sample" and is very fast

/*

FAST Simple Random Sample from a source table using pseudo random number generator RAND()

Usage:

Sample Size is set following the "select top"

Copy and paste more random value subqueries than the sample size

If the sample size is more than 20% of the source table size, you may have to include many time the sample size of random value subqueries

and if more than 50%, consider changing the query to exlude the random rows.

Modify the @SourceRowCount and the first part of the query for your source table name (dbo.phrases in this example)

A random seed is used, but a fixed one can be used if desired

Advantages:

Does not use NEWID for randomness, which is not technically a Random Number Genrator

Is FAST, even when the source table is large Select 10 from 10 million in 6 seconds, 1000 in 8 seconds

Does not rely on data page assumptions (see TABLESAMPLE)

Disadvantages:

Not TRUE random numbers

See Also:

http://msdn.microsoft.com/en-us/library/aa175776(SQL.80).aspx

Presents three methods for random samples

1) Cursors and RAND() Cursor Overhead

2) Rand( ROW ID) Bad randomness

3) NewID

http://msdn.microsoft.com/en-us/library/ms189108.aspx

Documentation for TABLESAMPLE

1) Can only be used on true tables, not linked, views, XML, etc

2) Samples pages, not rows, if pages contain correlated row its no longer a simple random sample

*/

/*

Find the number of rows in the source table

*/

declare @SourceRowCount integer

select @SourceRowCount=COUNT(*) from dbo.phrases

/*

Initialize the random number generator with a random seed (or a particular seed)

*/

declare @d integer

set @d=RAND() --set @d=RAND(an integer)

/*

Create Source table from the desired table and ROW_NUMBER()

Join Source to a table of random values between

Select top n from a slightly larger set

*/

select * from

(select *, ROW_NUMBER() over(order by GETDATE()) as Row from dbo.phrases)

as source,

(select top 10 n from

(select convert(integer, RAND()*@SourceRowCount) as n union all

select convert(integer, RAND()*@SourceRowCount) as n union all

select convert(integer, RAND()*@SourceRowCount) as n)

as tmp)

as R where source.Row=R.n

Paul White SSC Guru Points: 150468 More actions · Answer 7

DannyS (3/14/2010)
Ok, this isn't the most elegant solution, but is is a proper "Simple Random Sample" and is very fast

Yes it is neat. If you are interested in a more generic version, but using the same basic idea, check this active thread out:

http://www.sqlservercentral.com/Forums/FindPost882261.aspx

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

jtrudo Ten Centuries Points: 1168 More actions · Answer 8

A nice alternative to RAND and one of which I wasn't aware. Thanks!

John Trudo

Randomizing Result Sets with NEWID

Cookies on SQLServerCentral