random records

  • How to select random (10 or 20) records from table?

  • Use Top clause without order by clause.

    Select top 10 columnname from yourtable.

    --------------------------------------------------------------------------------------------------
    I am just an another naive wannabe DBA trying to learn SQL Server

  • Sachin Nandanwar (12/16/2010)


    Use Top clause without order by clause.

    Select top 10 columnname from yourtable.

    This would mostly give the same result again and again. Use NEWID() in the ORDER BY Clause

    SELECT TOP 10 * FROM YourTable ORDER BY NEWID()


    Kingston Dhasian

    How to post data/code on a forum to get the best help - Jeff Moden
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

  • What about the TABLESAMPLE clause?

    EG.

    SELECT *

    FROM tableX

    TABLESAMPLE (10 percent)

    More info here

  • Here is another method I've seen/used that is pretty similar to the ORDER BY NEWID(), but much faster:

    SELECT TOP 10 * FROM MyTable

    WHERE 0.01 >= CAST(CHECKSUM(NEWID(), SomeColumnName) & 0x7fffffff AS float) / CAST (0x7fffffff AS int)

    NOTE: Repalce "SomeColumnName" with some columns or set of columns from your table, PKs work well.

  • Using ORDER BY NEWID() with huge files is slow, because of the sort time.

    Generating TOP (X) WHERE (random number test) is fast, but tends to skew towards the front of the table, whatever that may be.

    If you have an indexed IDENTITY (ID) column in your primary table, I would generate a temp table of 10-20 distinct random numbers between min(ID) and max(ID) and join it to the primary.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Here is an example of pulling 20 random rows from a million row table.

    Note to Jeff Moden: It may not be "Nasty Fast" but it's "Wicked Quick" 😀

    Had to make this an attachment. For some reason, SSC is blocking my posts.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • The Dixie Flatline (12/17/2010)


    Note to Jeff Moden: It may not be "Nasty Fast" but it's "Wicked Quick"

    Heh... too funny, Bob. 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • The Dixie Flatline (12/16/2010)


    Using ORDER BY NEWID() with huge files is slow, because of the sort time.

    Generating TOP (X) WHERE (random number test) is fast, but tends to skew towards the front of the table, whatever that may be.

    If you have an indexed IDENTITY (ID) column in your primary table, I would generate a temp table of 10-20 distinct random numbers between min(ID) and max(ID) and join it to the primary.

    I am in a place where I also need to select random rows. I like the NEWID() solution because it's simple. Can you say when a file is too huge to use it on?

    Is the temp table of random numbers between min(ID) and max(ID) etc....really the superior one?

    Thanks.

  • Would this work? DECLARE @Rand INT

    SET @Rand = (RAND() * 20) + 10

    SELECT TOP @Rand FROM MYTable WHERE SomeIntColumn > @Rand

    ______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

  • KoldCoffee (3/1/2014)


    Is the temp table of random numbers between min(ID) and max(ID) etc....really the superior one?

    For performance, absolutely. For accuracy in returning the proper number of rows, probably not because the IDENTITY column can have gaps in it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Here is another way to improve performance while ensuring the desired number of rows. (see attachment) Whether or not you really need the performance gain depends on the size of the gain, how often you are going to run the query, etc.

    One of the problems with the TOP/ORDER BY technique and large tables is that it can takes huge sorts in order to select a small number of rows. The more rows you throw into a sort, the slower it runs. The code below avoids this by selecting a random subset of rows prior to the sort (with a cushion to make sure that more than the desired number of rows is available.) It still uses the ORDER BY technique to defeat the front-end loading problem, but runs consistently faster simply because fewer rows have to be sorted. For larger number of rows it is outperforming the original solution. Good hunting.

    P.S. to Jeff: The subject table now deliberately includes gaps, but its a moot point since the newer technique doesn't generate the table of random N values.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Hi Dixie Flatline, I saw this on another post and was trying to understand it so I appreciate that you explained the code more in this version. I will study it and thanks.

  • I know Bob (Dixie) has done a masterful job with this script.

    Here is a slightly different route I have taken in the past that was also quite efficient and performed well.

    http://jasonbrimhall.info/2010/12/13/sql-confessions/

    Yes it is similar to other methods mentioned.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Masterful? Do you need to borrow money or something, Jason?

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

Viewing 15 posts - 1 through 15 (of 24 total)

You must be logged in to reply to this topic. Login to reply