

SSCertifiable
Group: Moderators
Last Login: Thursday, July 2, 2015 10:15 PM
Points: 6,899,
Visits: 2,152





SSCEnthusiastic
Group: General Forum Members
Last Login: Monday, November 19, 2007 4:44 AM
Points: 175,
Visits: 2


How about this? http://askgabbar.blogspot.com/2007/03/pickuprandomrecordsfromtable.html
Didn't check it for performance though.




SSCEnthusiastic
Group: General Forum Members
Last Login: Thursday, November 1, 2012 1:18 PM
Points: 110,
Visits: 261


I had a need for pseudo randomness (display a few "random" images from a gallery) that I addressed thusly:
Assign a random value to each row as a column value and have a lookup index on this. Each time I want 10 random images, I grab TOP 10 order by abs(@currentRandom  persistedRandom) ASC. So this gives me the nearest 10 rows to whatever my random seek point is. The problem is you will get clusters of results for a given set of persistedRandom values.
I'm sure there are better ways but this worked decent for my needs.




SSC Rookie
Group: General Forum Members
Last Login: Thursday, May 23, 2013 1:20 PM
Points: 36,
Visits: 40


I have also not had time to benchmark performance, but here is another technique that gives the illusion of randomisation: ORDER BY SUBSTRING(REVERSE(REPLACE(table.textcolumn,' ','')),convert(int,rand()*10),1) ASC This uses the text in a column from the query, reverses it and removes spaces, and then extracts a randomly selected character to order the results. Obviously this is limited to situations where you have a text column to work with...




Forum Newbie
Group: General Forum Members
Last Login: Tuesday, April 5, 2011 8:25 AM
Points: 1,
Visits: 78


how do you calculate the reads?




Ten Centuries
Group: General Forum Members
Last Login: Wednesday, January 14, 2015 8:59 PM
Points: 1,038,
Visits: 445


Minor point... At the end of the article where you say you could set the uniqueID column as a pkey I think you mean to set it as a clustered index thus allowing returning of rows in the GUID order without any sorting effort. Common to mix up the pkeys & the clustered index of a table as they often tend to be one and the same if the defaults are accepted It is tricky getting random data in a setbased fashion. Perhaps you could do something along the lines of 1. Having a unique row number for each row in your table without gaps  an identity column would be perfect 2. Indexing by this column either clustered or using at least a covering index 3. Use some mechanism to generate row numbers to select. Perhaps the pairs of digits in a generated GUID (or triplets, quads, etc depending on how many rows you have).... Step #3 might be the lengthy & wasteful bit though..... Will give it some more thought




SSCDedicated
Group: General Forum Members
Last Login: Yesterday @ 9:10 PM
Points: 37,867,
Visits: 34,744





SSCommitted
Group: General Forum Members
Last Login: Tuesday, May 29, 2012 11:22 AM
Points: 1,755,
Visits: 4,652


Using "ORDER BY CHECKSUM(NEWID())" rather than "ORDER BY NEWID()" apparently produces values with a better random distribution.
Ryan Randall
Solutions are easy. Understanding the problem, now, that's the hard part.




SSCEnthusiastic
Group: General Forum Members
Last Login: Thursday, November 1, 2012 1:18 PM
Points: 110,
Visits: 261


Ian, I think this is a good approach and it is actually what some coworkers and I ended up with awhile back (more them than me ). Our problem statement involved noncontiguous ranges, so using an identity column was not sufficient. The solution was to use row_number() ranking function and join against a temp table of random values.
I find that with 3.3 MM rows, this approach is about half the cpu and twice as fast as doing order by newid() for this contrived test. Subsequent executions take about 20% of the cpu that order by newid(), and still about twice as fast.
So when you have many rows and a noncontiguous key to index off of, this approach appears to be superior if not as simple.
 BEGIN TEST PREP create table test_table (number int primary key clustered, payload varchar(1));
WITH  first CTE which returns 10 rows (09) digits AS ( SELECT 0 as Number UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9 )  second CTE which returns 10 million rows by using  a CROSS JOIN on the first CTE , dig AS ( SELECT (millions.Number * 1000000) + (hThousands.Number * 100000) + (tThousands.Number * 10000) + (thousands.Number * 1000) + (hundreds.Number * 100) + (tens.Number * 10) + ones.Number AS Number FROM digits AS ones CROSS JOIN digits AS tens CROSS JOIN digits AS hundreds CROSS JOIN digits AS thousands CROSS JOIN digits AS tThousands CROSS JOIN digits AS hThousands CROSS JOIN digits as millions ) INSERT test_table(number, payload) SELECT number, 'z' FROM dig WHERE number % 3 = 0 go
END TEST PREP
set statistics time on set statistics io on
BEGIN TEST 1 select top 10 number, payload from test_table order by newid(); GO 2 END TEST 1
BEGIN TEST 2
DECLARE @tv table(rand_num int) DECLARE @max int, @num int select @num=10, @max = count(*) from test_table;
DECLARE @i int SET @i = 0 WHILE (@i < @num) BEGIN INSERT @tv values (ceiling(rand() * @max)) SET @i = @i + 1 END
SELECT * FROM ( select row_number() over (order by number) row, * from test_table ) a WHERE a.row in (SELECT rand_num from @tv) GO 2 END TEST 2




SSCertifiable
Group: Moderators
Last Login: Thursday, July 2, 2015 10:15 PM
Points: 6,899,
Visits: 2,152




