|
|
|
SSCertifiable
       
Group: Moderators
Last Login: Thursday, May 09, 2013 12:38 PM
Points: 6,462,
Visits: 1,384
|
|
|
|
|
|
SSC-Enthusiastic
      
Group: General Forum Members
Last Login: Monday, November 19, 2007 4:44 AM
Points: 175,
Visits: 2
|
|
How about this? http://askgabbar.blogspot.com/2007/03/pick-up-random-records-from-table.html
Didn't check it for performance though.
|
|
|
|
|
SSC-Enthusiastic
      
Group: General Forum Members
Last Login: Thursday, November 01, 2012 1:18 PM
Points: 110,
Visits: 261
|
|
I had a need for pseudo randomness (display a few "random" images from a gallery) that I addressed thusly:
-Assign a random value to each row as a column value and have a lookup index on this. -Each time I want 10 random images, I grab TOP 10 order by abs(@currentRandom - persistedRandom) ASC. So this gives me the nearest 10 rows to whatever my random seek point is. The problem is you will get clusters of results for a given set of persistedRandom values.
I'm sure there are better ways but this worked decent for my needs.
|
|
|
|
|
SSC Rookie
      
Group: General Forum Members
Last Login: Yesterday @ 1:20 PM
Points: 36,
Visits: 40
|
|
I have also not had time to benchmark performance, but here is another technique that gives the illusion of randomisation: ORDER BY SUBSTRING(REVERSE(REPLACE(table.textcolumn,' ','')),convert(int,rand()*10),1) ASCThis uses the text in a column from the query, reverses it and removes spaces, and then extracts a randomly selected character to order the results. Obviously this is limited to situations where you have a text column to work with...
|
|
|
|
|
Forum Newbie
      
Group: General Forum Members
Last Login: Tuesday, April 05, 2011 8:25 AM
Points: 1,
Visits: 78
|
|
| how do you calculate the reads?
|
|
|
|
|
Ten Centuries
      
Group: General Forum Members
Last Login: Sunday, September 16, 2012 3:26 AM
Points: 1,038,
Visits: 443
|
|
Minor point... At the end of the article where you say you could set the uniqueID column as a pkey I think you mean to set it as a clustered index thus allowing returning of rows in the GUID order without any sorting effort. Common to mix up the pkeys & the clustered index of a table as they often tend to be one and the same if the defaults are accepted  It is tricky getting random data in a set-based fashion. Perhaps you could do something along the lines of 1. Having a unique row number for each row in your table without gaps - an identity column would be perfect 2. Indexing by this column either clustered or using at least a covering index 3. Use some mechanism to generate row numbers to select. Perhaps the pairs of digits in a generated GUID (or triplets, quads, etc depending on how many rows you have).... Step #3 might be the lengthy & wasteful bit though..... Will give it some more thought 
|
|
|
|
|
SSC-Dedicated
           
Group: General Forum Members
Last Login: Today @ 1:24 PM
Points: 32,913,
Visits: 26,806
|
|
When I first started to read this, I thought "Jeez... that's a little obvious, Andy. Hardly worth an article." Now that I see how some folks are trying to randomize data, I have to admit that such a short and sweet article about it is well overdue.  I don't believe that anyone will come up with a method that is either faster or more consistently random than just doing the ORDER BY NEWID()... but it is fun to watch folks try. 
--Jeff Moden "RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".
First step towards the paradigm shift of writing Set Based code: Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."
For better, quicker answers on T-SQL questions, click on the following... http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following... http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
|
|
|
|
|
SSCommitted
      
Group: General Forum Members
Last Login: Tuesday, May 29, 2012 11:22 AM
Points: 1,755,
Visits: 4,652
|
|
Using "ORDER BY CHECKSUM(NEWID())" rather than "ORDER BY NEWID()" apparently produces values with a better random distribution.
Ryan Randall
Solutions are easy. Understanding the problem, now, that's the hard part.
|
|
|
|
|
SSC-Enthusiastic
      
Group: General Forum Members
Last Login: Thursday, November 01, 2012 1:18 PM
Points: 110,
Visits: 261
|
|
Ian, I think this is a good approach and it is actually what some coworkers and I ended up with awhile back (more them than me ). Our problem statement involved noncontiguous ranges, so using an identity column was not sufficient. The solution was to use row_number() ranking function and join against a temp table of random values.
I find that with 3.3 MM rows, this approach is about half the cpu and twice as fast as doing order by newid() for this contrived test. Subsequent executions take about 20% of the cpu that order by newid(), and still about twice as fast.
So when you have many rows and a noncontiguous key to index off of, this approach appears to be superior if not as simple.
-- BEGIN TEST PREP create table test_table (number int primary key clustered, payload varchar(1));
WITH -- first CTE which returns 10 rows (0-9) digits AS ( SELECT 0 as Number UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9 ) -- second CTE which returns 10 million rows by using -- a CROSS JOIN on the first CTE , dig AS ( SELECT (millions.Number * 1000000) + (hThousands.Number * 100000) + (tThousands.Number * 10000) + (thousands.Number * 1000) + (hundreds.Number * 100) + (tens.Number * 10) + ones.Number AS Number FROM digits AS ones CROSS JOIN digits AS tens CROSS JOIN digits AS hundreds CROSS JOIN digits AS thousands CROSS JOIN digits AS tThousands CROSS JOIN digits AS hThousands CROSS JOIN digits as millions ) INSERT test_table(number, payload) SELECT number, 'z' FROM dig WHERE number % 3 = 0 go
--END TEST PREP
set statistics time on set statistics io on
--BEGIN TEST 1 select top 10 number, payload from test_table order by newid(); GO 2 --END TEST 1
--BEGIN TEST 2
DECLARE @tv table(rand_num int) DECLARE @max int, @num int select @num=10, @max = count(*) from test_table;
DECLARE @i int SET @i = 0 WHILE (@i < @num) BEGIN INSERT @tv values (ceiling(rand() * @max)) SET @i = @i + 1 END
SELECT * FROM ( select row_number() over (order by number) row, * from test_table ) a WHERE a.row in (SELECT rand_num from @tv) GO 2 --END TEST 2
|
|
|
|
|
SSCertifiable
       
Group: Moderators
Last Login: Thursday, May 09, 2013 12:38 PM
Points: 6,462,
Visits: 1,384
|
|
|
|
|