Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Getting Random Results


Getting Random Results

Author
Message
Andy Warren
Andy Warren
SSCertifiable
SSCertifiable (7.3K reputation)SSCertifiable (7.3K reputation)SSCertifiable (7.3K reputation)SSCertifiable (7.3K reputation)SSCertifiable (7.3K reputation)SSCertifiable (7.3K reputation)SSCertifiable (7.3K reputation)SSCertifiable (7.3K reputation)

Group: Moderators
Points: 7261 Visits: 2680
Comments posted here are about the content posted at http://www.sqlservercentral.com/columnists/awarren/3175.asp

Andy
SQLAndy - My Blog!
Connect with me on LinkedIn
Follow me on Twitter
~Prasad Puranik
~Prasad Puranik
SSC-Enthusiastic
SSC-Enthusiastic (175 reputation)SSC-Enthusiastic (175 reputation)SSC-Enthusiastic (175 reputation)SSC-Enthusiastic (175 reputation)SSC-Enthusiastic (175 reputation)SSC-Enthusiastic (175 reputation)SSC-Enthusiastic (175 reputation)SSC-Enthusiastic (175 reputation)

Group: General Forum Members
Points: 175 Visits: 2
How about this?
http://askgabbar.blogspot.com/2007/03/pick-up-random-records-from-table.html

Didn't check it for performance though.
Adrian Hains
Adrian Hains
SSC-Enthusiastic
SSC-Enthusiastic (112 reputation)SSC-Enthusiastic (112 reputation)SSC-Enthusiastic (112 reputation)SSC-Enthusiastic (112 reputation)SSC-Enthusiastic (112 reputation)SSC-Enthusiastic (112 reputation)SSC-Enthusiastic (112 reputation)SSC-Enthusiastic (112 reputation)

Group: General Forum Members
Points: 112 Visits: 261
I had a need for pseudo randomness (display a few "random" images from a gallery) that I addressed thusly:

-Assign a random value to each row as a column value and have a lookup index on this.
-Each time I want 10 random images, I grab TOP 10 order by abs(@currentRandom - persistedRandom) ASC. So this gives me the nearest 10 rows to whatever my random seek point is. The problem is you will get clusters of results for a given set of persistedRandom values.

I'm sure there are better ways but this worked decent for my needs.
bredins
bredins
SSC Rookie
SSC Rookie (36 reputation)SSC Rookie (36 reputation)SSC Rookie (36 reputation)SSC Rookie (36 reputation)SSC Rookie (36 reputation)SSC Rookie (36 reputation)SSC Rookie (36 reputation)SSC Rookie (36 reputation)

Group: General Forum Members
Points: 36 Visits: 40

I have also not had time to benchmark performance, but here is another technique that gives the illusion of randomisation:

ORDER BY SUBSTRING(REVERSE(REPLACE(table.textcolumn,' ','')),convert(int,rand()*10),1) ASC

This uses the text in a column from the query, reverses it and removes spaces, and then extracts a randomly selected character to order the results.

Obviously this is limited to situations where you have a text column to work with...





james zhang-339284
james zhang-339284
Forum Newbie
Forum Newbie (1 reputation)Forum Newbie (1 reputation)Forum Newbie (1 reputation)Forum Newbie (1 reputation)Forum Newbie (1 reputation)Forum Newbie (1 reputation)Forum Newbie (1 reputation)Forum Newbie (1 reputation)

Group: General Forum Members
Points: 1 Visits: 92
how do you calculate the reads?
Ian Yates
Ian Yates
Ten Centuries
Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)

Group: General Forum Members
Points: 1058 Visits: 445

Minor point... At the end of the article where you say you could set the uniqueID column as a pkey I think you mean to set it as a clustered index thus allowing returning of rows in the GUID order without any sorting effort. Common to mix up the pkeys & the clustered index of a table as they often tend to be one and the same if the defaults are accepted

It is tricky getting random data in a set-based fashion. Perhaps you could do something along the lines of

1. Having a unique row number for each row in your table without gaps - an identity column would be perfect
2. Indexing by this column either clustered or using at least a covering index
3. Use some mechanism to generate row numbers to select. Perhaps the pairs of digits in a generated GUID (or triplets, quads, etc depending on how many rows you have)....

Step #3 might be the lengthy & wasteful bit though..... Will give it some more thought





Jeff Moden
Jeff Moden
SSC-Forever
SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)

Group: General Forum Members
Points: 45388 Visits: 39940

When I first started to read this, I thought "Jeez... that's a little obvious, Andy. Hardly worth an article." Now that I see how some folks are trying to randomize data, I have to admit that such a short and sweet article about it is well overdue.

I don't believe that anyone will come up with a method that is either faster or more consistently random than just doing the ORDER BY NEWID()... but it is fun to watch folks try.



--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
RyanRandall
RyanRandall
SSCommitted
SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)

Group: General Forum Members
Points: 1761 Visits: 4652

Using "ORDER BY CHECKSUM(NEWID())" rather than "ORDER BY NEWID()" apparently produces values with a better random distribution.



Ryan Randall

Solutions are easy. Understanding the problem, now, that's the hard part.
Adrian Hains
Adrian Hains
SSC-Enthusiastic
SSC-Enthusiastic (112 reputation)SSC-Enthusiastic (112 reputation)SSC-Enthusiastic (112 reputation)SSC-Enthusiastic (112 reputation)SSC-Enthusiastic (112 reputation)SSC-Enthusiastic (112 reputation)SSC-Enthusiastic (112 reputation)SSC-Enthusiastic (112 reputation)

Group: General Forum Members
Points: 112 Visits: 261
Ian,
I think this is a good approach and it is actually what some coworkers and I ended up with awhile back (more them than me ). Our problem statement involved noncontiguous ranges, so using an identity column was not sufficient. The solution was to use row_number() ranking function and join against a temp table of random values.

I find that with 3.3 MM rows, this approach is about half the cpu and twice as fast as doing order by newid() for this contrived test. Subsequent executions take about 20% of the cpu that order by newid(), and still about twice as fast.

So when you have many rows and a noncontiguous key to index off of, this approach appears to be superior if not as simple.

-- BEGIN TEST PREP
create table test_table (number int primary key clustered, payload varchar(1));

WITH
-- first CTE which returns 10 rows (0-9)
digits AS (
SELECT 0 as Number
UNION SELECT 1
UNION SELECT 2
UNION SELECT 3
UNION SELECT 4
UNION SELECT 5
UNION SELECT 6
UNION SELECT 7
UNION SELECT 8
UNION SELECT 9
)
-- second CTE which returns 10 million rows by using
-- a CROSS JOIN on the first CTE
, dig AS (
SELECT
(millions.Number * 1000000)
+ (hThousands.Number * 100000)
+ (tThousands.Number * 10000)
+ (thousands.Number * 1000)
+ (hundreds.Number * 100)
+ (tens.Number * 10)
+ ones.Number AS Number
FROM digits AS ones
CROSS JOIN digits AS tens
CROSS JOIN digits AS hundreds
CROSS JOIN digits AS thousands
CROSS JOIN digits AS tThousands
CROSS JOIN digits AS hThousands
CROSS JOIN digits as millions
)
INSERT test_table(number, payload)
SELECT number, 'z'
FROM dig
WHERE number % 3 = 0
go

--END TEST PREP

set statistics time on
set statistics io on


--BEGIN TEST 1
select top 10 number, payload from test_table order by newid();
GO 2
--END TEST 1

--BEGIN TEST 2

DECLARE @tv table(rand_num int)
DECLARE @max int, @num int
select @num=10, @max = count(*) from test_table;

DECLARE @i int
SET @i = 0
WHILE (@i < @num)
BEGIN
INSERT @tv values (ceiling(rand() * @max))
SET @i = @i + 1
END

SELECT *
FROM
(
select
row_number() over (order by number) row,
*
from test_table
) a
WHERE a.row in (SELECT rand_num from @tv)
GO 2
--END TEST 2
Andy Warren
Andy Warren
SSCertifiable
SSCertifiable (7.3K reputation)SSCertifiable (7.3K reputation)SSCertifiable (7.3K reputation)SSCertifiable (7.3K reputation)SSCertifiable (7.3K reputation)SSCertifiable (7.3K reputation)SSCertifiable (7.3K reputation)SSCertifiable (7.3K reputation)

Group: Moderators
Points: 7261 Visits: 2680
Jeff, I'll grant it is a simple topic - I figured by now I'd get clobbered by some statistics major for even using the word random! Maybe someone with a stats background could analyze all the options posed to see which is most random/best random?:-)

Andy
SQLAndy - My Blog!
Connect with me on LinkedIn
Follow me on Twitter
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search