Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

Getting Random Results Expand / Collapse
Author
Message
Posted Thursday, August 9, 2007 11:53 AM
SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: Moderators
Last Login: Today @ 2:51 PM
Points: 6,790, Visits: 1,902
Comments posted here are about the content posted at http://www.sqlservercentral.com/columnists/awarren/3175.asp

Andy
SQLAndy - My Blog!
Connect with me on LinkedIn
Follow me on Twitter
Post #389404
Posted Wednesday, August 22, 2007 9:59 PM


SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Monday, November 19, 2007 4:44 AM
Points: 175, Visits: 2
How about this?
http://askgabbar.blogspot.com/2007/03/pick-up-random-records-from-table.html

Didn't check it for performance though.
Post #393059
Posted Wednesday, August 22, 2007 10:39 PM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Thursday, November 1, 2012 1:18 PM
Points: 110, Visits: 261
I had a need for pseudo randomness (display a few "random" images from a gallery) that I addressed thusly:

-Assign a random value to each row as a column value and have a lookup index on this.
-Each time I want 10 random images, I grab TOP 10 order by abs(@currentRandom - persistedRandom) ASC. So this gives me the nearest 10 rows to whatever my random seek point is. The problem is you will get clusters of results for a given set of persistedRandom values.

I'm sure there are better ways but this worked decent for my needs.
Post #393063
Posted Thursday, August 23, 2007 2:51 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Thursday, May 23, 2013 1:20 PM
Points: 36, Visits: 40

I have also not had time to benchmark performance, but here is another technique that gives the illusion of randomisation:

ORDER BY SUBSTRING(REVERSE(REPLACE(table.textcolumn,' ','')),convert(int,rand()*10),1) ASC

This uses the text in a column from the query, reverses it and removes spaces, and then extracts a randomly selected character to order the results.

Obviously this is limited to situations where you have a text column to work with...




Post #393116
Posted Thursday, August 23, 2007 6:50 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Tuesday, April 5, 2011 8:25 AM
Points: 1, Visits: 78
how do you calculate the reads?
Post #393197
Posted Thursday, August 23, 2007 7:45 AM
Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Thursday, January 30, 2014 10:08 PM
Points: 1,038, Visits: 444

Minor point... At the end of the article where you say you could set the uniqueID column as a pkey I think you mean to set it as a clustered index thus allowing returning of rows in the GUID order without any sorting effort.  Common to mix up the pkeys & the clustered index of a table as they often tend to be one and the same if the defaults are accepted

It is tricky getting random data in a set-based fashion.  Perhaps you could do something along the lines of

1. Having a unique row number for each row in your table without gaps - an identity column would be perfect
2. Indexing by this column either clustered or using at least a covering index
3. Use some mechanism to generate row numbers to select. Perhaps the pairs of digits in a generated GUID (or triplets, quads, etc depending on how many rows you have)....

Step #3 might be the lengthy & wasteful bit though.....  Will give it some more thought

 




Post #393240
Posted Thursday, August 23, 2007 8:00 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 1:53 PM
Points: 35,366, Visits: 31,905

When I first started to read this, I thought "Jeez... that's a little obvious, Andy.  Hardly worth an article."   Now that I see how some folks are trying to randomize data, I have to admit that such a short and sweet article about it is well overdue.

I don't believe that anyone will come up with a method that is either faster or more consistently random than just doing the ORDER BY NEWID()... but it is fun to watch folks try.



--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #393256
Posted Thursday, August 23, 2007 8:36 AM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Tuesday, May 29, 2012 11:22 AM
Points: 1,755, Visits: 4,652

Using "ORDER BY CHECKSUM(NEWID())" rather than "ORDER BY NEWID()" apparently produces values with a better random distribution.




Ryan Randall

Solutions are easy. Understanding the problem, now, that's the hard part.
Post #393300
Posted Thursday, August 23, 2007 9:23 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Thursday, November 1, 2012 1:18 PM
Points: 110, Visits: 261
Ian,
I think this is a good approach and it is actually what some coworkers and I ended up with awhile back (more them than me ). Our problem statement involved noncontiguous ranges, so using an identity column was not sufficient. The solution was to use row_number() ranking function and join against a temp table of random values.

I find that with 3.3 MM rows, this approach is about half the cpu and twice as fast as doing order by newid() for this contrived test. Subsequent executions take about 20% of the cpu that order by newid(), and still about twice as fast.

So when you have many rows and a noncontiguous key to index off of, this approach appears to be superior if not as simple.

-- BEGIN TEST PREP
create table test_table (number int primary key clustered, payload varchar(1));

WITH
-- first CTE which returns 10 rows (0-9)
digits AS (
SELECT 0 as Number
UNION SELECT 1
UNION SELECT 2
UNION SELECT 3
UNION SELECT 4
UNION SELECT 5
UNION SELECT 6
UNION SELECT 7
UNION SELECT 8
UNION SELECT 9
)
-- second CTE which returns 10 million rows by using
-- a CROSS JOIN on the first CTE
, dig AS (
SELECT
(millions.Number * 1000000)
+ (hThousands.Number * 100000)
+ (tThousands.Number * 10000)
+ (thousands.Number * 1000)
+ (hundreds.Number * 100)
+ (tens.Number * 10)
+ ones.Number AS Number
FROM digits AS ones
CROSS JOIN digits AS tens
CROSS JOIN digits AS hundreds
CROSS JOIN digits AS thousands
CROSS JOIN digits AS tThousands
CROSS JOIN digits AS hThousands
CROSS JOIN digits as millions
)
INSERT test_table(number, payload)
SELECT number, 'z'
FROM dig
WHERE number % 3 = 0
go

--END TEST PREP

set statistics time on
set statistics io on


--BEGIN TEST 1
select top 10 number, payload from test_table order by newid();
GO 2
--END TEST 1

--BEGIN TEST 2

DECLARE @tv table(rand_num int)
DECLARE @max int, @num int
select @num=10, @max = count(*) from test_table;

DECLARE @i int
SET @i = 0
WHILE (@i < @num)
BEGIN
INSERT @tv values (ceiling(rand() * @max))
SET @i = @i + 1
END

SELECT *
FROM
(
select
row_number() over (order by number) row,
*
from test_table
) a
WHERE a.row in (SELECT rand_num from @tv)
GO 2
--END TEST 2
Post #393347
Posted Thursday, August 23, 2007 10:28 AM
SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: Moderators
Last Login: Today @ 2:51 PM
Points: 6,790, Visits: 1,902
Jeff, I'll grant it is a simple topic - I figured by now I'd get clobbered by some statistics major for even using the word random! Maybe someone with a stats background could analyze all the options posed to see which is most random/best random?

Andy
SQLAndy - My Blog!
Connect with me on LinkedIn
Follow me on Twitter
Post #393372
« Prev Topic | Next Topic »

Add to briefcase 12»»

Permissions Expand / Collapse