Random Sample

  • nice question Chris.

  • The only problem is that the rand function returns the same value for each row. Changing the seed value changes the value returned by the rand function, but it is still the same value for all rows. The only exception is if you base the seed on the newid function, but its easier just to use the newid function to generate random values.

  • christian_nowicki (7/7/2014)


    The only problem is that the rand function returns the same value for each row. Changing the seed value changes the value returned by the rand function, but it is still the same value for all rows. The only exception is if you base the seed on the newid function, but its easier just to use the newid function to generate random values.

    How does "basing the seed on the newid function" make any difference? rand() will still return the same value for every row, whatever you base it on I think.

  • Thomas Abraham (7/7/2014)


    Nice question Christian. Thanks for contributing.

    +1

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Toreador (7/7/2014)


    BWFC (7/7/2014)


    I'm sure I'm missing something very basic here but I was under the impression that because there is no ordering automatically on a table, the row returned by TOP n is not ever guaranteed to be the same. Admittedly you could run a query 20000 times and get the same result but I didn't think run 20001 would necessarily produce the same value. Can somebody enlighten me please?

    You are right. Option A, with no Order By, does (apparently) return row 1 every time, at least in SQL2014. But there is no guarantee, as the sequence is not defined. You might get a different result if you ran it 20001 times. It's quite possible (though unlikely) that SQL2016 will always return row 5. Or row 2. Or a random row.

    Hugo Kornelis has a great article on result order without an order by clause: http://sqlblog.com/blogs/hugo_kornelis/archive/2006/12/31/Beatles-vs-Stones.aspx

    andy.brown (7/7/2014)


    christian_nowicki (7/7/2014)


    The only problem is that the rand function returns the same value for each row. Changing the seed value changes the value returned by the rand function, but it is still the same value for all rows. The only exception is if you base the seed on the newid function, but its easier just to use the newid function to generate random values.

    How does "basing the seed on the newid function" make any difference? rand() will still return the same value for every row, whatever you base it on I think.

    As each row has a unique seed, rand is guaranteed return unique values for each one. As the MSDN page says:

    If seed is not specified, the SQL Server Database Engine assigns a seed value at random. For a specified seed value, the result returned is always the same.

    When seed is not specified, it seems a single random value is generated to be used as seed for the whole query.

    If you're still in doubt, like I was, you can try it yourself:

    select top 20 RAND(CAST(NEWID() as varbinary(16))) from sys.columns

  • Using the checksum of a newid as the seed in the rand function does work because the newid is reinterpreted for each row. The only time I'd do it was if I wanted random numbers in a certain range, but I could accomplish this more easily by manipulating the result of the checksum. Try the following query.

    select checksum(newid()) [checksum(newid())],

    rand(checksum(newid())) [rand(checksum(newid()))]

    from information_schema.tables

    Which produces the following

    checksum(newid())rand(checksum(newid()))

    16358037610.986924437232023

    4473023160.681755274955645

    2546138230.41800577680907

    -3939144280.916135911188986

    -14742183580.834318560936923

    -2246331400.59610060597369

    5113407530.131479052219459

    3026621430.144764000770371

    19953135300.67384831581366

    10774383140.646245068324023

    7781166490.981185604553419

    13188310180.796331062731996

    8842640550.907034626273068

    -12349050270.437798511528195

    -7251237140.638412548866134

    11756891450.977022118022555

    21320417380.263610315709295

    -10223100170.127302936488477

    14839420740.587850586235753

  • christian_nowicki (7/7/2014)


    The only problem is that the rand function returns the same value for each row. Changing the seed value changes the value returned by the rand function, but it is still the same value for all rows. The only exception is if you base the seed on the newid function, but its easier just to use the newid function to generate random values.

    I understand now, thank you

  • Thank you for question!

    Theoretically, any query would return different value!

    In practice, Only ORDER BY NEWID() works for us.

  • Carlo Romagnano (7/9/2014)


    Thank you for question!

    Theoretically, any query would return different value!

    In practice, Only ORDER BY NEWID() works for us.

    Yes, that`s true, same found here 🙂

    Thanks & Best Regards,
    Hany Helmy
    SQL Server Database Consultant

  • Hany Helmy (7/9/2014)


    Carlo Romagnano (7/9/2014)


    Thank you for question!

    Theoretically, any query would return different value!

    In practice, Only ORDER BY NEWID() works for us.

    Yes, that`s true, same found here 🙂

    +1

    (to be honest, in this area, I have not done any proper analysis, I just used this once, that also I got it from the net and using the same method, ever since. Now is the time to make a actual visit for me:-) )

    ww; Raghu
    --
    The first and the hardest SQL statement I have wrote- "select * from customers" - and I was happy and felt smart.

  • christian_nowicki (7/8/2014)


    Using the checksum of a newid as the seed in the rand function does work because the newid is reinterpreted for each row. The only time I'd do it was if I wanted random numbers in a certain range, but I could accomplish this more easily by manipulating the result of the checksum. Try the following query.

    select checksum(newid()) [checksum(newid())],

    rand(checksum(newid())) [rand(checksum(newid()))]

    from information_schema.tables

    Which produces the following

    checksum(newid())rand(checksum(newid()))

    16358037610.986924437232023

    4473023160.681755274955645

    2546138230.41800577680907

    -3939144280.916135911188986

    -14742183580.834318560936923

    -2246331400.59610060597369

    5113407530.131479052219459

    3026621430.144764000770371

    19953135300.67384831581366

    10774383140.646245068324023

    7781166490.981185604553419

    13188310180.796331062731996

    8842640550.907034626273068

    -12349050270.437798511528195

    -7251237140.638412548866134

    11756891450.977022118022555

    21320417380.263610315709295

    -10223100170.127302936488477

    14839420740.587850586235753

    Thank you, Christian, even knowing all these three function (I mean, referring to the BOL) - never used them together like this, this is new and fine example. 🙂

    ww; Raghu
    --
    The first and the hardest SQL statement I have wrote- "select * from customers" - and I was happy and felt smart.

Viewing 12 posts - 16 through 26 (of 26 total)

You must be logged in to reply to this topic. Login to reply