﻿<?xml version='1.0' encoding='UTF-8'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/"><channel><title>SQLServerCentral / Article Discussions / Article Discussions by Author / Discuss content posted by Seth Delconte  / Randomizing Result Sets with NEWID / Latest Posts</title><generator>InstantForum.NET v2.9.0</generator><description>SQLServerCentral</description><link>http://www.sqlservercentral.com/Forums/</link><webMaster>notifications@sqlservercentral.com</webMaster><lastBuildDate>Tue, 21 May 2013 17:35:53 GMT</lastBuildDate><ttl>20</ttl><item><title>RE: Randomizing Result Sets with NEWID</title><link>http://www.sqlservercentral.com/Forums/Topic874034-1700-1.aspx</link><description>A nice alternative to RAND and one of which I wasn't aware.  Thanks!</description><pubDate>Tue, 23 Mar 2010 14:52:45 GMT</pubDate><dc:creator>jtrudo</dc:creator></item><item><title>RE: Randomizing Result Sets with NEWID</title><link>http://www.sqlservercentral.com/Forums/Topic874034-1700-1.aspx</link><description>[quote][b]DannyS (3/14/2010)[/b][hr]Ok, this isn't the most elegant solution, but is is a proper "Simple Random Sample" and is very fast[/quote]Yes it is neat.  If you are interested in a more generic version, but using the same basic idea, check this active thread out:[url]http://www.sqlservercentral.com/Forums/FindPost882261.aspx[/url]</description><pubDate>Sun, 14 Mar 2010 22:47:29 GMT</pubDate><dc:creator>Paul White</dc:creator></item><item><title>RE: Randomizing Result Sets with NEWID</title><link>http://www.sqlservercentral.com/Forums/Topic874034-1700-1.aspx</link><description>Ok, this isn't the most elegant solution, but is is a proper "Simple Random Sample" and is very fast[code="sql"]/*FAST Simple Random Sample from a source table using pseudo random number generator RAND() Usage:Sample Size is set following the "select top" Copy and paste more random value subqueries than the sample size	If the sample size is more than 20% of the source table size, you may have to include many time the sample size of random value subqueries		and if more than 50%, consider changing the query to exlude the random rows.Modify the @SourceRowCount and the first part of the query for your source table name (dbo.phrases in this example)A random seed is used, but a fixed one can be used if desired		Advantages:	Does not use NEWID for randomness, which is not technically a Random Number Genrator	Is FAST, even when the source table is large Select 10 from 10 million in 6 seconds, 1000 in 8 seconds	Does not rely on data page assumptions (see TABLESAMPLE)		Disadvantages:	Not TRUE random numbers		See Also:http://msdn.microsoft.com/en-us/library/aa175776(SQL.80).aspx Presents three methods for random samples		1) Cursors and RAND()   Cursor Overhead		2) Rand( ROW ID)        Bad randomness		3) NewID        http://msdn.microsoft.com/en-us/library/ms189108.aspxDocumentation for TABLESAMPLE				1) Can only be used on true tables, not linked, views, XML, etc		2) Samples pages, not rows, if pages contain correlated row its no longer a simple random sample		        *//*Find the number of rows in the source table*/declare @SourceRowCount integerselect @SourceRowCount=COUNT(*) from dbo.phrases/*Initialize the random number generator with a random seed (or a particular seed)*/declare @d integerset @d=RAND()      --set @d=RAND(an integer)/*Create Source table from the desired table and ROW_NUMBER()Join Source to a table of random values betweenSelect top n from a slightly larger set*/select * from     (select *, ROW_NUMBER() over(order by GETDATE()) as Row from dbo.phrases) as source,(select top 10 n from(select convert(integer, RAND()*@SourceRowCount) as n union allselect convert(integer, RAND()*@SourceRowCount) as n union allselect convert(integer, RAND()*@SourceRowCount) as n union allselect convert(integer, RAND()*@SourceRowCount) as n union allselect convert(integer, RAND()*@SourceRowCount) as n union allselect convert(integer, RAND()*@SourceRowCount) as n union allselect convert(integer, RAND()*@SourceRowCount) as n union allselect convert(integer, RAND()*@SourceRowCount) as n union allselect convert(integer, RAND()*@SourceRowCount) as n union allselect convert(integer, RAND()*@SourceRowCount) as n union allselect convert(integer, RAND()*@SourceRowCount) as n union allselect convert(integer, RAND()*@SourceRowCount) as n union allselect convert(integer, RAND()*@SourceRowCount) as n union allselect convert(integer, RAND()*@SourceRowCount) as n union allselect convert(integer, RAND()*@SourceRowCount) as n union allselect convert(integer, RAND()*@SourceRowCount) as n union allselect convert(integer, RAND()*@SourceRowCount) as n union allselect convert(integer, RAND()*@SourceRowCount) as n union allselect convert(integer, RAND()*@SourceRowCount) as n union allselect convert(integer, RAND()*@SourceRowCount) as n) as tmp)as R where source.Row=R.n[/code]</description><pubDate>Sun, 14 Mar 2010 22:11:03 GMT</pubDate><dc:creator>DannyS</dc:creator></item><item><title>RE: Randomizing Result Sets with NEWID</title><link>http://www.sqlservercentral.com/Forums/Topic874034-1700-1.aspx</link><description>[quote][b]rd_in_sd (3/14/2010)[/b][hr]Using GUID's is fine, but when the primary key is a clustered index, then you end up with half-or-less empty data pages which creates a lot of I/O and useless, partially empty data pages.  When you use a GUID as a primary key, make sure it isn't a clustered index or you'll get dog-poor performance.[/quote]This point has been raised numerous times already in this thread...and the answer is always the same: use the NEWSEQUENTIALID in a default constraint to avoid this issue completely.  Full pages, minimal page-splits...globally-unique goodness.</description><pubDate>Sun, 14 Mar 2010 21:41:40 GMT</pubDate><dc:creator>Paul White</dc:creator></item><item><title>RE: Randomizing Result Sets with NEWID</title><link>http://www.sqlservercentral.com/Forums/Topic874034-1700-1.aspx</link><description>You're missing one really important point.Using GUID's is fine, but when the primary key is a clustered index, then you end up with half-or-less empty data pages which creates a lot of I/O and useless, partially empty data pages.When you use a GUID as a primary key, make sure it isn't a clustered index or you'll get dog-poor performance.Ron</description><pubDate>Sun, 14 Mar 2010 15:02:04 GMT</pubDate><dc:creator>rd_in_sd</dc:creator></item><item><title>RE: Randomizing Result Sets with NEWID</title><link>http://www.sqlservercentral.com/Forums/Topic874034-1700-1.aspx</link><description>[quote][b]GabyYYZ (3/9/2010)[/b][hr][quote][b]Seth Delconte (3/8/2010)[/b][hr][quote][b]nick.mcdermaid (3/7/2010)[/b][hr]Someone appears to have drawn some inspiration from your article.http://subhrosaha.wordpress.com/[/quote]Yes it certainly looks that way - even the reference links are the same.  A little credit would have been nice![/quote]He has been [url=http://subhrosaha.wordpress.com/2010/03/01/randomizing-the-result-set-with-newid/#comment-6]served[/url].EDIT: The comment I submitted is awaiting moderation.  If he rejects it, this is what I wrote[i]At least properly credit the appropriate article if you’re going to copy from it (not the author of this original article but on the same forum)[/i] &amp;lt;link back to the article here&amp;gt;[/quote]Ahh thank you Gaby!</description><pubDate>Tue, 09 Mar 2010 15:00:15 GMT</pubDate><dc:creator>seth delconte</dc:creator></item><item><title>RE: Randomizing Result Sets with NEWID</title><link>http://www.sqlservercentral.com/Forums/Topic874034-1700-1.aspx</link><description>[quote][b]Seth Delconte (3/8/2010)[/b][hr][quote][b]nick.mcdermaid (3/7/2010)[/b][hr]Someone appears to have drawn some inspiration from your article.http://subhrosaha.wordpress.com/[/quote]Yes it certainly looks that way - even the reference links are the same.  A little credit would have been nice![/quote]He has been [url=http://subhrosaha.wordpress.com/2010/03/01/randomizing-the-result-set-with-newid/#comment-6]served[/url].EDIT: The comment I submitted is awaiting moderation.  If he rejects it, this is what I wrote[i]At least properly credit the appropriate article if you’re going to copy from it (not the author of this original article but on the same forum)[/i] &amp;lt;link back to the article here&amp;gt;</description><pubDate>Tue, 09 Mar 2010 13:45:11 GMT</pubDate><dc:creator>GabyYYZ</dc:creator></item><item><title>RE: Randomizing Result Sets with NEWID</title><link>http://www.sqlservercentral.com/Forums/Topic874034-1700-1.aspx</link><description>[quote][b]nick.mcdermaid (3/7/2010)[/b][hr]Someone appears to have drawn some inspiration from your article.http://subhrosaha.wordpress.com/[/quote]Yes it certainly looks that way - even the reference links are the same.  A little credit would have been nice!</description><pubDate>Mon, 08 Mar 2010 06:29:30 GMT</pubDate><dc:creator>seth delconte</dc:creator></item><item><title>RE: Randomizing Result Sets with NEWID</title><link>http://www.sqlservercentral.com/Forums/Topic874034-1700-1.aspx</link><description>Someone appears to have drawn some inspiration from your article.http://subhrosaha.wordpress.com/</description><pubDate>Sun, 07 Mar 2010 16:21:59 GMT</pubDate><dc:creator>nick.mcdermaid</dc:creator></item><item><title>RE: Randomizing Result Sets with NEWID</title><link>http://www.sqlservercentral.com/Forums/Topic874034-1700-1.aspx</link><description>All that means from MSDN is that a random sample MAY be grouped.  Its still a random sample.Say you had 26 buckets - 1 for each letter of the alphabet - and they were filled with names and you wanted to choose a persons name randomly from one of those buckets.Tablesample + NewID() would still get you a random person.Tablesample would randomly get you one of the 26 letters and then newid() would get you a random person from that letter.Seems as random as any other method.[quote][b]sqlservercentral-1070393 (3/3/2010)[/b][hr]Mr Random again....tablesample limitations per msdn:Rows on individual pages of the table are not correlated with other rows on the same page.Never saw a database where that condition could be assumed. most are entered sequentially which is very likley to have correlations.newid() and any function of it is "too perfect" the nice properties, such as good distribution of digits, has to be built in. true random aren't so perfect exept in very large samples, and good psuedo random should be difficult to distinguish from true random.If your doing a lottery, or a statistical study, definately look for better solutions.[/quote]</description><pubDate>Wed, 03 Mar 2010 03:03:36 GMT</pubDate><dc:creator>Barry-193141</dc:creator></item><item><title>RE: Randomizing Result Sets with NEWID</title><link>http://www.sqlservercentral.com/Forums/Topic874034-1700-1.aspx</link><description>Mr Random again....tablesample limitations per msdn:Rows on individual pages of the table are not correlated with other rows on the same page.Never saw a database where that condition could be assumed. most are entered sequentially which is very likley to have correlations.newid() and any function of it is "too perfect" the nice properties, such as good distribution of digits, has to be built in. true random aren't so perfect exept in very large samples, and good psuedo random should be difficult to distinguish from true random.If your doing a lottery, or a statistical study, definately look for better solutions.</description><pubDate>Wed, 03 Mar 2010 01:49:53 GMT</pubDate><dc:creator>DannyS</dc:creator></item><item><title>RE: Randomizing Result Sets with NEWID</title><link>http://www.sqlservercentral.com/Forums/Topic874034-1700-1.aspx</link><description>[quote][b]Barry-193141 (3/3/2010)[/b][hr]To counter performance issues, the easiest thing to do is add tablesample (10 percent) to the query.This way the newid() function only needs to run on an already randomized sample of 10 percent instead of against the entire data set.[/quote]Just to sure never to use the technique with small tables - you'll likely get no rows at all.One other point for the general discussion: If a good distribution of random values is important to you, ORDER BY CHECKSUM(NEWID()) is better in that respect.Paul</description><pubDate>Wed, 03 Mar 2010 01:11:59 GMT</pubDate><dc:creator>Paul White</dc:creator></item><item><title>RE: Randomizing Result Sets with NEWID</title><link>http://www.sqlservercentral.com/Forums/Topic874034-1700-1.aspx</link><description>To counter performance issues, the easiest thing to do is add tablesample (10 percent) to the query.This way the newid() function only needs to run on an already randomized sample of 10 percent instead of against the entire data set.Barry</description><pubDate>Wed, 03 Mar 2010 00:53:30 GMT</pubDate><dc:creator>Barry-193141</dc:creator></item><item><title>RE: Randomizing Result Sets with NEWID</title><link>http://www.sqlservercentral.com/Forums/Topic874034-1700-1.aspx</link><description>[quote][b]Paul White (3/1/2010)[/b][hr][quote][b]GabyYYZ (3/1/2010)[/b][hr][quote]One option, especially if you have an indexed identity column on your source table, is to generate a separate table of random row numbers, create a clustered index on it, and join with the original table.[/quote]Nice idea.  Of course, the 'random' numbers are then a bit, er, 'fixed' aren't they?Can't believe you used a RBAR method to populate your table.  :-PFor smallish numbers of random rows, I prefer an approach very similar to the one posted by Gary earlier.It does require a table with a sequential ID, but that's pretty common - excepting those that like GUIDs as a PK *shudder*[/quote]LOL, just tried it like I suggested, it did NOT work.  Strange, so for now, newid() is still the best way. :-)</description><pubDate>Tue, 02 Mar 2010 07:55:15 GMT</pubDate><dc:creator>GabyYYZ</dc:creator></item><item><title>RE: Randomizing Result Sets with NEWID</title><link>http://www.sqlservercentral.com/Forums/Topic874034-1700-1.aspx</link><description>[quote][b]SQLBOT (3/1/2010)[/b][hr]I also forgot to say that there is the NEWSEQUENTIALID() function if you're going to batch load.That should wreck your cluidx a little less, too.[/quote]If NEWSEQUENTIALID() is the default value on a column, it would be great, but that's the only way to use it as it normally can't be generated on the fly.  To get around that, a kluge someone taught me once is:[code]create proc GenerateSequentialIDas  create table #temp_seqid (rowval uniqueidentifier default newsequentialid())  insert into #temp_seqid default values  select rowval from #temp_seqidgo[/code]</description><pubDate>Tue, 02 Mar 2010 07:35:42 GMT</pubDate><dc:creator>GabyYYZ</dc:creator></item><item><title>RE: Randomizing Result Sets with NEWID</title><link>http://www.sqlservercentral.com/Forums/Topic874034-1700-1.aspx</link><description>I'm curious after reading this thread...why would you assign a random number via RAND() to a varchar instead of simply using one of the numeric data types available?  If you are going to index or sort on a column, my recollection from a past read is that numeric data types are more efficient for indexing / sorting.Any thoughts based on experience from the group on this?</description><pubDate>Tue, 02 Mar 2010 07:31:59 GMT</pubDate><dc:creator>Martin Vrieze</dc:creator></item><item><title>RE: Randomizing Result Sets with NEWID</title><link>http://www.sqlservercentral.com/Forums/Topic874034-1700-1.aspx</link><description>[quote][b]Jonathan Kehayias (3/1/2010)[/b][hr][quote][b]SQLBOT (3/1/2010)[/b][hr]It's the random insertion, not the datatype that causes the problem. What's the differece if the data inserted is Johnson, Jonsonn, Johnsen or three guids?Under the hood, there's not a difference.[/quote]Craig,Respectfully, the rate of fragmenation partially depends on what the datatype is for the column.  If you are inserting random values into a varchar(8) column, the end impact for fragmentation would be different than a char(8), nchar(8) or nvarchar(8) column because the storage size is different for each so fragmentation rates would be different.  A GUID is 16 bytes so it takes more space = fuller pages faster = more page splits = faster fragmentation rates.Your point is accurate, just playing semantics with you is all.   ;-)[/quote]Hey, that's a great point.If one were to write up a list of best/worst clustered index keys, I think the guid will fall somewhere in the middle... that's all I'm saying.  Worst would be (I think) a long composite key based on random insertions for the reasons we both pointed out.I smell another article coming on!PM me if you want to contribute.Thanks,~Craig</description><pubDate>Tue, 02 Mar 2010 07:20:39 GMT</pubDate><dc:creator>SQLBOT</dc:creator></item><item><title>RE: Randomizing Result Sets with NEWID</title><link>http://www.sqlservercentral.com/Forums/Topic874034-1700-1.aspx</link><description>Thanks for the article.</description><pubDate>Mon, 01 Mar 2010 23:03:16 GMT</pubDate><dc:creator>SQLRNNR</dc:creator></item><item><title>RE: Randomizing Result Sets with NEWID</title><link>http://www.sqlservercentral.com/Forums/Topic874034-1700-1.aspx</link><description>[quote][b]adish (3/1/2010)[/b][hr]It [i]is[/i] clustered. Am I doing something terribly wrong? Some light on this please, I am not an expert in SQL and would appreciate guidance from SQL gurus.Thanks[/quote]adish,Read the blog post on the link I provided by Paul Randal.  It explains why having a clustered index/primary key on GUID is suboptimal.</description><pubDate>Mon, 01 Mar 2010 20:13:11 GMT</pubDate><dc:creator>Jonathan Kehayias</dc:creator></item><item><title>RE: Randomizing Result Sets with NEWID</title><link>http://www.sqlservercentral.com/Forums/Topic874034-1700-1.aspx</link><description>[quote][b]Paul White (3/1/2010)[/b][hr][quote][b]Jonathan Kehayias (3/1/2010)[/b][hr]It probably was clustered, its common for App Developers to do this kind of thing.  It happened at Microsoft around the Windows 7 RC downloads...[/quote]Ah yes, I remember that one :-DHighly amusing at the time...[/quote]It [i]is[/i] clustered. Am I doing something terribly wrong? Some light on this please, I am not an expert in SQL and would appreciate guidance from SQL gurus.Thanks</description><pubDate>Mon, 01 Mar 2010 20:07:09 GMT</pubDate><dc:creator>adish</dc:creator></item><item><title>RE: Randomizing Result Sets with NEWID</title><link>http://www.sqlservercentral.com/Forums/Topic874034-1700-1.aspx</link><description>[quote][b]SQLBOT (3/1/2010)[/b][hr]It's the random insertion, not the datatype that causes the problem. What's the differece if the data inserted is Johnson, Jonsonn, Johnsen or three guids?Under the hood, there's not a difference.[/quote]Craig,Respectfully, the rate of fragmenation partially depends on what the datatype is for the column.  If you are inserting random values into a varchar(8) column, the end impact for fragmentation would be different than a char(8), nchar(8) or nvarchar(8) column because the storage size is different for each so fragmentation rates would be different.  A GUID is 16 bytes so it takes more space = fuller pages faster = more page splits = faster fragmentation rates.Your point is accurate, just playing semantics with you is all.   ;-)</description><pubDate>Mon, 01 Mar 2010 19:23:44 GMT</pubDate><dc:creator>Jonathan Kehayias</dc:creator></item><item><title>RE: Randomizing Result Sets with NEWID</title><link>http://www.sqlservercentral.com/Forums/Topic874034-1700-1.aspx</link><description>[quote][b]Jonathan Kehayias (3/1/2010)[/b][hr]It probably was clustered, its common for App Developers to do this kind of thing.  It happened at Microsoft around the Windows 7 RC downloads...[/quote]Ah yes, I remember that one :-DHighly amusing at the time...</description><pubDate>Mon, 01 Mar 2010 19:14:46 GMT</pubDate><dc:creator>Paul White</dc:creator></item><item><title>RE: Randomizing Result Sets with NEWID</title><link>http://www.sqlservercentral.com/Forums/Topic874034-1700-1.aspx</link><description>I also forgot to say that there is the NEWSEQUENTIALID() function if you're going to batch load.That should wreck your cluidx a little less, too.</description><pubDate>Mon, 01 Mar 2010 14:42:47 GMT</pubDate><dc:creator>SQLBOT</dc:creator></item><item><title>RE: Randomizing Result Sets with NEWID</title><link>http://www.sqlservercentral.com/Forums/Topic874034-1700-1.aspx</link><description>[quote][b]Paul White (3/1/2010)[/b][hr][quote][b]adish (3/1/2010)[/b][hr]I've been using GUID as PKs, but this is a novel way of using it. Great.[/quote]Not a clustered Primary Key, I trust?[/quote]obviously the super-best data type for a PK is one that increments by 1 each time so the index is appended on each insert and there is no fragmentation, Any other data type is just as likely to fragment as a guid.  I left-handedly proved this in one of my shamefully RBAR riddled articles: [url]http://www.sqlservercentral.com/articles/Indexing/64424/[/url]It's the random insertion, not the datatype that causes the problem. What's the differece if the data inserted is Johnson, Jonsonn, Johnsen or three guids?Under the hood, there's not a difference.The only way you're not going to frag your index at an equal rate as a guid is if the data is inserted in the same order as the clustered index key.  In OLTP, that's pretty unlikely except with and identity (which has it's own set of issues).~Craig O.</description><pubDate>Mon, 01 Mar 2010 14:36:53 GMT</pubDate><dc:creator>SQLBOT</dc:creator></item><item><title>RE: Randomizing Result Sets with NEWID</title><link>http://www.sqlservercentral.com/Forums/Topic874034-1700-1.aspx</link><description>[quote][b]Paul White (3/1/2010)[/b][hr][quote][b]adish (3/1/2010)[/b][hr]I've been using GUID as PKs, but this is a novel way of using it. Great.[/quote]Not a clustered Primary Key, I trust?[/quote]It probably was clustered, its common for App Developers to do this kind of thing.  It happened at Microsoft around the Windows 7 RC downloads...[url]http://www.sqlskills.com/BLOGS/PAUL/post/Why-did-the-Windows-7-RC-failure-happen.aspx[/url]</description><pubDate>Mon, 01 Mar 2010 12:06:07 GMT</pubDate><dc:creator>Jonathan Kehayias</dc:creator></item><item><title>RE: Randomizing Result Sets with NEWID</title><link>http://www.sqlservercentral.com/Forums/Topic874034-1700-1.aspx</link><description>[quote][b]GabyYYZ (3/1/2010)[/b][hr][quote]One option, especially if you have an indexed identity column on your source table, is to generate a separate table of random row numbers, create a clustered index on it, and join with the original table.[/quote]Nice idea.  Of course, the 'random' numbers are then a bit, er, 'fixed' aren't they?Can't believe you used a RBAR method to populate your table.  :-PFor smallish numbers of random rows, I prefer an approach very similar to the one posted by Gary earlier.It does require a table with a sequential ID, but that's pretty common - excepting those that like GUIDs as a PK *shudder*</description><pubDate>Mon, 01 Mar 2010 11:54:22 GMT</pubDate><dc:creator>Paul White</dc:creator></item><item><title>RE: Randomizing Result Sets with NEWID</title><link>http://www.sqlservercentral.com/Forums/Topic874034-1700-1.aspx</link><description>[quote][b]adish (3/1/2010)[/b][hr]I've been using GUID as PKs, but this is a novel way of using it. Great.[/quote]Not a clustered Primary Key, I trust?</description><pubDate>Mon, 01 Mar 2010 11:48:53 GMT</pubDate><dc:creator>Paul White</dc:creator></item><item><title>RE: Randomizing Result Sets with NEWID</title><link>http://www.sqlservercentral.com/Forums/Topic874034-1700-1.aspx</link><description>[quote][b]Neil Franken (3/1/2010)[/b][hr]Hi ThereCool article but here is what I have found with newid()....Right lets say the profile(salary) matches 1,5 million rows in the database. SQL server will return all 1,5 million rows then ORDER BY. Now I mentioned that a GUID is very very random. This causes a high cpu load on the server as the poor Server now has to first sort the 1,5 million rows then it can return the top 1000. Think about it. It first has to sort before it can return. I have tested this and it does not matter if I return 1 row or 750 000 out of the 1,5 million rows that matches the query it constantly runs at the same speed. The top can only be applied once the sorting is done. Granted for small tables and non mission critical queries this technique can work well I would not use it on large tables as you will create a bottle neck....[/quote]One option, especially if you have an indexed identity column on your source table, is to generate a separate table of random row numbers, create a clustered index on it, and join with the original table.[code]create table #lookup_table(row_num int)declare @ctr int, @samplesize intset @ctr = 0set @samplesize = 1000 -- for example, a sample size of 1,000 is neededwhile @ctr &amp;lt; @samplesizeBEGIN  insert into #lookup_table select abs(checksum(newid()))  set @ctr = @ctr + 1ENDcreate clustered index idxc on #lookup_table(row_num)[/code]Do a join on this table and it should go much more quickly, so the entire original table would not be loaded.  Not much chance there will be duplicate rows with this method as INT can be up to 2,147,483,647.</description><pubDate>Mon, 01 Mar 2010 07:51:28 GMT</pubDate><dc:creator>GabyYYZ</dc:creator></item><item><title>RE: Randomizing Result Sets with NEWID</title><link>http://www.sqlservercentral.com/Forums/Topic874034-1700-1.aspx</link><description>Using NEWID to do a random sort or grab a random number of rows from a result set is a HUGE performance killer and does not scale well. I've had developers slip this kind of stuff into production and in less than a minute the CPUs were pegged at 100%.Wile this method works, I do not recommend it on anything beyond one time ad-hoc DBA queries or infrequently used applications. You can sort randomly much more efficiently using RAND. Rather than type a lengthy explanation here of how I will submit an article.</description><pubDate>Mon, 01 Mar 2010 07:20:23 GMT</pubDate><dc:creator>Kendal Van Dyke</dc:creator></item><item><title>RE: Randomizing Result Sets with NEWID</title><link>http://www.sqlservercentral.com/Forums/Topic874034-1700-1.aspx</link><description>I like the concept of using a guid as a pseudo random number.  Very novel approach!!!I've always used auxillary tables in tempdb combined with forward-only cursors to assign unique, random numbers to each record for setting up direct marketing test panels...I generally do not use cursors but this was quite fast for my needs and direct marketing / database marketing queries are mostly ad-hoc in nature anyhow and my systems have not had to worry about performance hits like a production OLTP system would.This alternate approach you outline IS going to get tested.Well Done!!!</description><pubDate>Mon, 01 Mar 2010 06:52:47 GMT</pubDate><dc:creator>Martin Vrieze</dc:creator></item><item><title>RE: Randomizing Result Sets with NEWID</title><link>http://www.sqlservercentral.com/Forums/Topic874034-1700-1.aspx</link><description>The values generated have some nice properties, but aren't random.Converting the first 8 digits to an integer, hex2dec(left([guid],8)), the numbers have definate correlation with the previous generated value, and the 16th previous value, even when the generated ID are generated by deecidely unpredictable times (visitors to a website taking a particular action)In the prize example, every 16th person might have a 10% or more greater chance than the others.</description><pubDate>Mon, 01 Mar 2010 06:06:40 GMT</pubDate><dc:creator>DannyS</dc:creator></item><item><title>RE: Randomizing Result Sets with NEWID</title><link>http://www.sqlservercentral.com/Forums/Topic874034-1700-1.aspx</link><description>I've done quite a bit of testing of NEWID() and randomness and it is a very effective random number generator.As the GUID comprises of blocks of hexadecimal these blocks can be converted into integer values.I took a customer file where the PK was a GUID and based a partitioning scheme on a converted integer value and modulo 16 and it ended up with near ideal partition distribution.Try the followingDECLARE 	@Test char(4), 	@MyInt INT,	@SQL NVARCHAR(200),	@ParmDefinition NVARCHAR(200)SET @Test=LEFT(NEWID(),4)SET @SQL='SET @MyInt=CONVERT(INT,0x'+@Test+')'SET @ParmDefinition=N'@MyInt int OUTPUT'EXEC sp_Executesql 	@SQL,	@ParmDefinition,	@MyInt = @MyInt OUTPUTSELECT @Test,@MyInt</description><pubDate>Mon, 01 Mar 2010 02:04:28 GMT</pubDate><dc:creator>David.Poole</dc:creator></item><item><title>RE: Randomizing Result Sets with NEWID</title><link>http://www.sqlservercentral.com/Forums/Topic874034-1700-1.aspx</link><description>GUID and SQL Server function Newid() create globally unique identifiers. This is not they same as random and likely does not have very good random properties.One digit (16 bits) is used to id the algorithm. I'm sure a good many bits represent the time of generation.If your using SQL Server 2008, then CRYPT_GEN_RANDOM(n) (n= number of digits), creates a cryptographically secure pseudo random number. These are usually the best available without a true hardware random number generator and they do execute for each row, unlike rand()If your on an earlier version, but have 2008 available consider a linked query to obtain the numbers, or write a custom CLR ( see System.Security.Cryptography.RNGCryptoServiceProvider. )</description><pubDate>Mon, 01 Mar 2010 01:00:36 GMT</pubDate><dc:creator>DannyS</dc:creator></item><item><title>RE: Randomizing Result Sets with NEWID</title><link>http://www.sqlservercentral.com/Forums/Topic874034-1700-1.aspx</link><description>Id agree - newid() is a performance killer.its much faster to do something like this...[code="sql"]DECLARE @RandomNumber floatDECLARE @RandomInteger intDECLARE @MaxValue intDECLARE @MinValue intSELECT @MinValue = MIN(Id),         @MaxValue = MAX(Id)FROM dbo.SomeTableSELECT @RandomNumber = RAND()SELECT @RandomInteger = ((@MaxValue + 1) - @MinValue) * @RandomNumber + @MinValueSELECT TOP (1) *FROM dbo.SomeTableWHERE Id &amp;gt;= @RandomInteger[/code]</description><pubDate>Mon, 01 Mar 2010 00:57:34 GMT</pubDate><dc:creator>gary-915906</dc:creator></item><item><title>RE: Randomizing Result Sets with NEWID</title><link>http://www.sqlservercentral.com/Forums/Topic874034-1700-1.aspx</link><description>I've been using GUID as PKs, but this is a novel way of using it. Great.</description><pubDate>Mon, 01 Mar 2010 00:18:36 GMT</pubDate><dc:creator>adish</dc:creator></item><item><title>RE: Randomizing Result Sets with NEWID</title><link>http://www.sqlservercentral.com/Forums/Topic874034-1700-1.aspx</link><description>Hi ThereCool article but here is what I have found with newid().Doing a order by newid() is a performance killer. Let me explain. We have a table of 36 million prospective customers. We send leads to sales agent centres daily. They want random data. Great so we have been using order by newid() for ages. The problem is that is it extremely slow. Painfully slow. Here is why. A GUID returned by a newid() operation is essentially a very random number. Keep this fact in mind. In my daily tasks we send various amounts of leads out of our system for different centres. Basically we have a query like this(simplified the select for readability).SELECT  TOP 1000 leadname,contactdetails -- The TOP is variable per call centreFROM ProspectsWHERE Salary=&amp;gt;2500 -- we match our prospect profile hereRight lets say the profile(salary) matches 1,5 million rows in the database. SQL server will return all 1,5 million rows then ORDER BY. Now I mentioned that a GUID is very very random. This causes a high cpu load on the server as the poor Server now has to first sort the 1,5 million rows then it can return the top 1000. Think about it. It first has to sort before it can return. I have tested this and it does not matter if I return 1 row or 750 000 out of the 1,5 million rows that matches the query it constantly runs at the same speed. The top can only be applied once the sorting is done. Granted for small tables and non mission critical queries this technique can work well I would not use it on large tables as you will create a bottle neck.For larger tables it might worth randomizing the data on insert and not having to worry about the randomization during extraction. By the way newid() on a table as the clustered key is not a good idea as the fragmentation of your tables will remain consistently high.Hope that helps.Regards</description><pubDate>Mon, 01 Mar 2010 00:13:23 GMT</pubDate><dc:creator>Neil Franken</dc:creator></item><item><title>RE: Randomizing Result Sets with NEWID</title><link>http://www.sqlservercentral.com/Forums/Topic874034-1700-1.aspx</link><description>For the purpose of the article, you probably want to select all customers with order_count &amp;gt;= min(order_count of top 10).  That way you don't exclude customers that sort 'later' and might otherwise get excluded.</description><pubDate>Sun, 28 Feb 2010 23:02:37 GMT</pubDate><dc:creator>Norman Rasmussen</dc:creator></item><item><title>Randomizing Result Sets with NEWID</title><link>http://www.sqlservercentral.com/Forums/Topic874034-1700-1.aspx</link><description>Comments posted to this topic are about the item [B]&lt;A HREF="/articles/T-SQL/69111/"&gt;Randomizing Result Sets with NEWID&lt;/A&gt;[/B]</description><pubDate>Sat, 27 Feb 2010 11:44:15 GMT</pubDate><dc:creator>seth delconte</dc:creator></item></channel></rss>