Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

Random rows generator: Expand / Collapse
Author
Message
Posted Tuesday, October 14, 2008 1:54 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Thursday, October 16, 2008 8:53 AM
Points: 22, Visits: 49

I have a table of 100 records. I need a random of 2.5% of the reocrds. If I use Select Top (2.5) Percent , I get the first 2.5 % rows. I want a random of 2.5 rows. Any suggestions how to go about doing it ?
Post #585758
Posted Tuesday, October 14, 2008 2:41 PM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Thursday, June 5, 2014 10:54 AM
Points: 9,902, Visits: 9,480
You can use TABLESAMPLE, like this:
SELECT *
FROM YourTable TABLESAMPLE (2.5 PERCENT)

However, BOL notes the following limitations of TABLESAMPLE:
You can use TABLESAMPLE to quickly return a sample from a large table when either of the following conditions is true:

The sample does not have to be a truly random sample at the level of individual rows.


Rows on individual pages of the table are not correlated with other rows on the same page.


If you really need a "random" sample then BOL suggests this:
Important:
If you really want a random sample of individual rows, modify your query to filter out rows randomly, instead of using TABLESAMPLE. For example, the following query uses the NEWID function to return approximately one percent of the rows of the Sales.SalesOrderDetail table:

SELECT * FROM Sales.SalesOrderDetail

WHERE 0.01 >= CAST(CHECKSUM(NEWID(), SalesOrderID) & 0x7fffffff AS float)

/ CAST (0x7fffffff AS int)

The SalesOrderID column is included in the CHECKSUM expression so that NEWID() evaluates once per row to achieve sampling on a per-row basis. The expression CAST(CHECKSUM(NEWID(), SalesOrderID) & 0x7fffffff AS float / CAST (0x7fffffff AS int) evaluates to a random float value between 0 and 1.



-- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung
Proactive Performance Solutions, Inc.
"Performance is our middle name."
Post #585804
Posted Saturday, March 1, 2014 9:02 PM
SSChasing Mays

SSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing Mays

Group: General Forum Members
Last Login: Yesterday @ 11:56 AM
Points: 605, Visits: 1,410
Hi RBarryYoung,
Can you please explain use of 0x7fffffff ?




it helps to talk it out
Post #1546685
Posted Sunday, March 2, 2014 11:14 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 12:18 AM
Points: 36,751, Visits: 31,202
KoldCoffee (3/1/2014)
Hi RBarryYoung,
Can you please explain use of 0x7fffffff ?


I don't know why they did that in Books Online except, maybe, that it's easier to remember than 2147483647, which is the maximum postive number that an INT can contain.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1546717
Posted Sunday, March 2, 2014 11:30 AM
SSChasing Mays

SSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing Mays

Group: General Forum Members
Last Login: Yesterday @ 11:56 AM
Points: 605, Visits: 1,410

I don't know why they did that in Books Online except, maybe, that it's easier to remember than 2147483647, which is the maximum postive number that an INT can contain.


by which algorithm (trying to sound smart with that word) does 0x7fffffff convert to 2147483647?


it helps to talk it out
Post #1546719
Posted Sunday, March 2, 2014 11:33 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 12:18 AM
Points: 36,751, Visits: 31,202
Shifting gears, I find the formula in Books Online to be inaccurate. The number of rows that it returns varies quite a bit. The following will always return the same number of rows although they will be random. Yeah... it takes longer than the BOL formula but it does return an accurate number of rows in random order. It's also a hell of a lot easier to understand for most people.

SELECT TOP 2.5 PERCENT * FROM Sales.SalesOrderDetail 
ORDER BY NEWID()
;

And, yeah... I know this is a six year old thread but I'm answering the question from yesterday.



--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1546720
Posted Sunday, March 2, 2014 11:48 AM
SSChasing Mays

SSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing Mays

Group: General Forum Members
Last Login: Yesterday @ 11:56 AM
Points: 605, Visits: 1,410
Well, I thank you very much for advising on the NEWID() function.
I went searching for the answer to my question in an existing thread and found these. I hope it's ok to revive an old thread to see if there's any update since time has passed.


it helps to talk it out
Post #1546724
Posted Sunday, March 2, 2014 2:29 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 12:18 AM
Points: 36,751, Visits: 31,202
KoldCoffee (3/2/2014)

I don't know why they did that in Books Online except, maybe, that it's easier to remember than 2147483647, which is the maximum postive number that an INT can contain.


by which algorithm (trying to sound smart with that word) does 0x7fffffff convert to 2147483647?


It's not an algorithm. It's a simple Hex number once you remove the "0x" hex indicator. From right to left, each character represents a power of 16 (starting at 0) just like each character in a decimal number represents a power of 10.

To be sure about the right most character position is most whole numbering systems, any "base" value (16 for hex, 10 for decimal) raised to the 0 power is "1". That's why (for example), 5 hex = 5 dec.

An over simplified but effective primer on the subject can be found at the following URL (scroll down and hit the "Next" button on the screen):
http://www.wisc-online.com/Objects/ViewObject.aspx?ID=DIG1102


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1546732
Posted Friday, March 14, 2014 1:12 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Tuesday, July 22, 2014 12:22 AM
Points: 43, Visits: 481
thanx i did`t know that really exists....


===========================================
performance issue:
(1) Have you update the statistics on all tables of underling query?
(2) Are you using index well on the tables?
(3) Can you simplify your Query by re-writing it .
(4) are indexes are Defragmented well ?

Post #1551036
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse