Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««1234

Randomizing Result Sets with NEWID Expand / Collapse
Author
Message
Posted Monday, March 8, 2010 6:29 AM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Friday, February 21, 2014 7:54 AM
Points: 1,619, Visits: 1,233
nick.mcdermaid (3/7/2010)
Someone appears to have drawn some inspiration from your article.

http://subhrosaha.wordpress.com/



Yes it certainly looks that way - even the reference links are the same. A little credit would have been nice!


_________________________________
seth delconte
http://sqlkeys.com
Post #878552
Posted Tuesday, March 9, 2010 1:45 PM


SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Yesterday @ 3:30 PM
Points: 810, Visits: 2,120
Seth Delconte (3/8/2010)
nick.mcdermaid (3/7/2010)
Someone appears to have drawn some inspiration from your article.

http://subhrosaha.wordpress.com/



Yes it certainly looks that way - even the reference links are the same. A little credit would have been nice!

He has been served.

EDIT: The comment I submitted is awaiting moderation. If he rejects it, this is what I wrote
At least properly credit the appropriate article if you’re going to copy from it (not the author of this original article but on the same forum) <link back to the article here>


Gaby
________________________________________________________________
"In theory, theory and practice are the same. In practice, they are not."
- Albert Einstein
Post #879709
Posted Tuesday, March 9, 2010 3:00 PM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Friday, February 21, 2014 7:54 AM
Points: 1,619, Visits: 1,233
GabyYYZ (3/9/2010)
Seth Delconte (3/8/2010)
nick.mcdermaid (3/7/2010)
Someone appears to have drawn some inspiration from your article.

http://subhrosaha.wordpress.com/



Yes it certainly looks that way - even the reference links are the same. A little credit would have been nice!

He has been served.

EDIT: The comment I submitted is awaiting moderation. If he rejects it, this is what I wrote
At least properly credit the appropriate article if you’re going to copy from it (not the author of this original article but on the same forum) <link back to the article here>


Ahh thank you Gaby!




_________________________________
seth delconte
http://sqlkeys.com
Post #879776
Posted Sunday, March 14, 2010 3:02 PM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Monday, February 7, 2011 7:26 PM
Points: 2, Visits: 28
You're missing one really important point.

Using GUID's is fine, but when the primary key is a clustered index, then you end up with half-or-less empty data pages which creates a lot of I/O and useless, partially empty data pages.

When you use a GUID as a primary key, make sure it isn't a clustered index or you'll get dog-poor performance.

Ron
Post #882622
Posted Sunday, March 14, 2010 9:41 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 1:55 AM
Points: 11,192, Visits: 11,095
rd_in_sd (3/14/2010)
Using GUID's is fine, but when the primary key is a clustered index, then you end up with half-or-less empty data pages which creates a lot of I/O and useless, partially empty data pages. When you use a GUID as a primary key, make sure it isn't a clustered index or you'll get dog-poor performance.

This point has been raised numerous times already in this thread...and the answer is always the same: use the NEWSEQUENTIALID in a default constraint to avoid this issue completely. Full pages, minimal page-splits...globally-unique goodness.




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #882681
Posted Sunday, March 14, 2010 10:11 PM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Monday, August 23, 2010 9:31 AM
Points: 6, Visits: 57
Ok, this isn't the most elegant solution, but is is a proper "Simple Random Sample" and is very fast



/*
FAST Simple Random Sample from a source table using pseudo random number generator RAND()

Usage:
Sample Size is set following the "select top"

Copy and paste more random value subqueries than the sample size
If the sample size is more than 20% of the source table size, you may have to include many time the sample size of random value subqueries
and if more than 50%, consider changing the query to exlude the random rows.

Modify the @SourceRowCount and the first part of the query for your source table name (dbo.phrases in this example)

A random seed is used, but a fixed one can be used if desired


Advantages:
Does not use NEWID for randomness, which is not technically a Random Number Genrator
Is FAST, even when the source table is large Select 10 from 10 million in 6 seconds, 1000 in 8 seconds
Does not rely on data page assumptions (see TABLESAMPLE)


Disadvantages:
Not TRUE random numbers


See Also:
http://msdn.microsoft.com/en-us/library/aa175776(SQL.80).aspx
Presents three methods for random samples
1) Cursors and RAND() Cursor Overhead
2) Rand( ROW ID) Bad randomness
3) NewID


http://msdn.microsoft.com/en-us/library/ms189108.aspx
Documentation for TABLESAMPLE
1) Can only be used on true tables, not linked, views, XML, etc
2) Samples pages, not rows, if pages contain correlated row its no longer a simple random sample

*/


/*
Find the number of rows in the source table
*/
declare @SourceRowCount integer
select @SourceRowCount=COUNT(*) from dbo.phrases


/*
Initialize the random number generator with a random seed (or a particular seed)
*/
declare @d integer
set @d=RAND() --set @d=RAND(an integer)


/*
Create Source table from the desired table and ROW_NUMBER()
Join Source to a table of random values between
Select top n from a slightly larger set
*/

select * from
(select *, ROW_NUMBER() over(order by GETDATE()) as Row from dbo.phrases)
as source,
(select top 10 n from
(select convert(integer, RAND()*@SourceRowCount) as n union all
select convert(integer, RAND()*@SourceRowCount) as n union all
select convert(integer, RAND()*@SourceRowCount) as n union all
select convert(integer, RAND()*@SourceRowCount) as n union all
select convert(integer, RAND()*@SourceRowCount) as n union all
select convert(integer, RAND()*@SourceRowCount) as n union all
select convert(integer, RAND()*@SourceRowCount) as n union all
select convert(integer, RAND()*@SourceRowCount) as n union all
select convert(integer, RAND()*@SourceRowCount) as n union all
select convert(integer, RAND()*@SourceRowCount) as n union all
select convert(integer, RAND()*@SourceRowCount) as n union all
select convert(integer, RAND()*@SourceRowCount) as n union all
select convert(integer, RAND()*@SourceRowCount) as n union all
select convert(integer, RAND()*@SourceRowCount) as n union all
select convert(integer, RAND()*@SourceRowCount) as n union all
select convert(integer, RAND()*@SourceRowCount) as n union all
select convert(integer, RAND()*@SourceRowCount) as n union all
select convert(integer, RAND()*@SourceRowCount) as n union all
select convert(integer, RAND()*@SourceRowCount) as n union all
select convert(integer, RAND()*@SourceRowCount) as n)
as tmp)
as R where source.Row=R.n


Post #882695
Posted Sunday, March 14, 2010 10:47 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 1:55 AM
Points: 11,192, Visits: 11,095
DannyS (3/14/2010)
Ok, this isn't the most elegant solution, but is is a proper "Simple Random Sample" and is very fast

Yes it is neat. If you are interested in a more generic version, but using the same basic idea, check this active thread out:

http://www.sqlservercentral.com/Forums/FindPost882261.aspx




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #882709
Posted Tuesday, March 23, 2010 2:52 PM
Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Friday, July 25, 2014 7:40 AM
Points: 1,140, Visits: 650
A nice alternative to RAND and one of which I wasn't aware. Thanks!

Post #888540
« Prev Topic | Next Topic »

Add to briefcase «««1234

Permissions Expand / Collapse