Random Sample

Question

Post reply

Random Sample

Viewing 12 posts - 16 through 26 (of 26 total)

You must be logged in to reply to this topic. Login to reply

PChiragS SSCarpal Tunnel Points: 4965 More actions · Answer 1

PChiragS

SSCarpal Tunnel

Points: 4965

July 7, 2014 at 9:31 am

#1727452

nice question Chris.

christian_nowicki Valued Member Points: 54 More actions · Answer 2

The only problem is that the rand function returns the same value for each row. Changing the seed value changes the value returned by the rand function, but it is still the same value for all rows. The only exception is if you base the seed on the newid function, but its easier just to use the newid function to generate random values.

Zagyg SSCrazy Points: 2091 More actions · Answer 3

christian_nowicki (7/7/2014)
The only problem is that the rand function returns the same value for each row. Changing the seed value changes the value returned by the rand function, but it is still the same value for all rows. The only exception is if you base the seed on the newid function, but its easier just to use the newid function to generate random values.

How does "basing the seed on the newid function" make any difference? rand() will still return the same value for every row, whatever you base it on I think.

SQLRNNR SSC Guru Points: 281334 More actions · Answer 4

Thomas Abraham (7/7/2014)
Nice question Christian. Thanks for contributing.

+1

Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events

Primo Dang SSCrazy Points: 2643 More actions · Answer 5

Toreador (7/7/2014)
BWFC (7/7/2014)
I'm sure I'm missing something very basic here but I was under the impression that because there is no ordering automatically on a table, the row returned by TOP n is not ever guaranteed to be the same. Admittedly you could run a query 20000 times and get the same result but I didn't think run 20001 would necessarily produce the same value. Can somebody enlighten me please?
You are right. Option A, with no Order By, does (apparently) return row 1 every time, at least in SQL2014. But there is no guarantee, as the sequence is not defined. You might get a different result if you ran it 20001 times. It's quite possible (though unlikely) that SQL2016 will always return row 5. Or row 2. Or a random row.

Hugo Kornelis has a great article on result order without an order by clause: http://sqlblog.com/blogs/hugo_kornelis/archive/2006/12/31/Beatles-vs-Stones.aspx

andy.brown (7/7/2014)
christian_nowicki (7/7/2014)
The only problem is that the rand function returns the same value for each row. Changing the seed value changes the value returned by the rand function, but it is still the same value for all rows. The only exception is if you base the seed on the newid function, but its easier just to use the newid function to generate random values.
How does "basing the seed on the newid function" make any difference? rand() will still return the same value for every row, whatever you base it on I think.

As each row has a unique seed, rand is guaranteed return unique values for each one. As the MSDN page says:

If seed is not specified, the SQL Server Database Engine assigns a seed value at random. For a specified seed value, the result returned is always the same.

When seed is not specified, it seems a single random value is generated to be used as seed for the whole query.

If you're still in doubt, like I was, you can try it yourself:

select top 20 RAND(CAST(NEWID() as varbinary(16))) from sys.columns

Ed Wagner SSC Guru Points: 287026 More actions · Answer 6

Good question to get you to think. Thanks.

Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions

christian_nowicki Valued Member Points: 54 More actions · Answer 7

Using the checksum of a newid as the seed in the rand function does work because the newid is reinterpreted for each row. The only time I'd do it was if I wanted random numbers in a certain range, but I could accomplish this more easily by manipulating the result of the checksum. Try the following query.

select checksum(newid()) [checksum(newid())],

rand(checksum(newid())) [rand(checksum(newid()))]

from information_schema.tables

Which produces the following

checksum(newid())rand(checksum(newid()))

16358037610.986924437232023

4473023160.681755274955645

2546138230.41800577680907

-3939144280.916135911188986

-14742183580.834318560936923

-2246331400.59610060597369

5113407530.131479052219459

3026621430.144764000770371

19953135300.67384831581366

10774383140.646245068324023

7781166490.981185604553419

13188310180.796331062731996

8842640550.907034626273068

-12349050270.437798511528195

-7251237140.638412548866134

11756891450.977022118022555

21320417380.263610315709295

-10223100170.127302936488477

14839420740.587850586235753

Iulian -207023 SSCertifiable Points: 7639 More actions · Answer 8

christian_nowicki (7/7/2014)
The only problem is that the rand function returns the same value for each row. Changing the seed value changes the value returned by the rand function, but it is still the same value for all rows. The only exception is if you base the seed on the newid function, but its easier just to use the newid function to generate random values.

I understand now, thank you

Carlo Romagnano SSC-Insane Points: 22642 More actions · Answer 9

Thank you for question!

Theoretically, any query would return different value!

In practice, Only ORDER BY NEWID() works for us.

Hany Helmy SSChampion Points: 13921 More actions · Answer 10

Carlo Romagnano (7/9/2014)
Thank you for question!
Theoretically, any query would return different value!
In practice, Only ORDER BY NEWID() works for us.

Yes, that`s true, same found here 🙂

Thanks & Best Regards,
Hany Helmy
SQL Server Database Consultant

Raghavendra Mudugal SSChampion Points: 10658 More actions · Answer 11

Hany Helmy (7/9/2014)
Carlo Romagnano (7/9/2014)
Thank you for question!
Theoretically, any query would return different value!
In practice, Only ORDER BY NEWID() works for us.
Yes, that`s true, same found here 🙂

+1

(to be honest, in this area, I have not done any proper analysis, I just used this once, that also I got it from the net and using the same method, ever since. Now is the time to make a actual visit for me:-) )

ww; Raghu
--
The first and the hardest SQL statement I have wrote- "select * from customers" - and I was happy and felt smart.

Raghavendra Mudugal SSChampion Points: 10658 More actions · Answer 12

christian_nowicki (7/8/2014)
Using the checksum of a newid as the seed in the rand function does work because the newid is reinterpreted for each row. The only time I'd do it was if I wanted random numbers in a certain range, but I could accomplish this more easily by manipulating the result of the checksum. Try the following query.
select checksum(newid()) [checksum(newid())],
rand(checksum(newid())) [rand(checksum(newid()))]
from information_schema.tables
Which produces the following
checksum(newid())rand(checksum(newid()))
16358037610.986924437232023
4473023160.681755274955645
2546138230.41800577680907
-3939144280.916135911188986
-14742183580.834318560936923
-2246331400.59610060597369
5113407530.131479052219459
3026621430.144764000770371
19953135300.67384831581366
10774383140.646245068324023
7781166490.981185604553419
13188310180.796331062731996
8842640550.907034626273068
-12349050270.437798511528195
-7251237140.638412548866134
11756891450.977022118022555
21320417380.263610315709295
-10223100170.127302936488477
14839420740.587850586235753

Thank you, Christian, even knowing all these three function (I mean, referring to the BOL) - never used them together like this, this is new and fine example. 🙂

ww; Raghu
--
The first and the hardest SQL statement I have wrote- "select * from customers" - and I was happy and felt smart.