Generating Test Data: Part 1 - Generating Random Integers and Floats

  • Comments posted to this topic are about the item Generating Test Data: Part 1 - Generating Random Integers and Floats

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Outstanding article Jeff! Just what the doctor ordered for something I'm working on at this instant.

    I can't wait for the purists to berate you for using "pseudo" random numbers though. :w00t:

    And let me guess:

    DECLARE @Range      INT
           ,@StartValue DATETIME
           ,@EndValue   DATETIME

    SELECT @StartValue = '2012-02-15', @EndValue = '2012-12-31'

    SELECT @Range = DATEDIFF(day, @StartValue, @EndValue)

    SELECT TOP 20 -- Random dates
           DATEADD(day, ABS(CHECKSUM(NEWID()) % @Range), @StartValue) AS SomeRandomDate
      FROM sys.all_columns ac1
     CROSS JOIN sys.all_columns ac2

    SELECT @Range = DATEDIFF(second, @StartValue, @EndValue)

    SELECT TOP 20 -- Random times (to the second)
           DATEADD(second, ABS(CHECKSUM(NEWID()) % @Range), @StartValue) AS SomeRandomTime
      FROM sys.all_columns ac1
     CROSS JOIN sys.all_columns ac2

    🙂
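    One detail worth noting in the code above: ABS(CHECKSUM(NEWID()) % @Range) returns values from 0 through @Range - 1, so @EndValue itself can never come out. The following is a minimal sketch of an inclusive version, assuming you want the end date to be a possible result; the #RandomDates temp table is a hypothetical name used only for illustration.

```sql
-- Inclusive random dates: both @StartValue and @EndValue can be returned.
-- #RandomDates is a hypothetical scratch table used only for this example.
DECLARE @Range      INT
       ,@StartValue DATETIME
       ,@EndValue   DATETIME

SELECT @StartValue = '2012-02-15', @EndValue = '2012-12-31'

-- Add 1 so the modulo can return the full day count, making @EndValue reachable
SELECT @Range = DATEDIFF(day, @StartValue, @EndValue) + 1

IF OBJECT_ID('tempdb..#RandomDates') IS NOT NULL DROP TABLE #RandomDates

SELECT TOP 100000
       SomeRandomDate = DATEADD(day, ABS(CHECKSUM(NEWID()) % @Range), @StartValue)
  INTO #RandomDates
  FROM sys.all_columns ac1
 CROSS JOIN sys.all_columns ac2

-- With 100,000 rows spread over 321 possible days, both ends should show up
SELECT MinDate = MIN(SomeRandomDate), MaxDate = MAX(SomeRandomDate)
  FROM #RandomDates
```

    The same "+ 1" adjustment applies to the seconds-based version if the end time should be reachable.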


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
    Since random numbers are too important to be left to chance, let's generate some!
    Learn to understand recursive CTEs by example.

  • Hi Jeff,

    Nice article, keep going.

    I have a question about the system tables: most of the examples use them to generate random numbers. Assuming developers don't have access to system tables, how can they generate random numbers?

    Thanks,

    Karthik

  • Krtyknm (3/26/2012)

    I have a question about the system tables: most of the examples use them to generate random numbers. Assuming developers don't have access to system tables, how can they generate random numbers?

    He's only using a system table (or view, strictly speaking) because of the number of rows it has--any table with a decent number of rows will work, whether it's a tally table or one of your main data tables.

  • This article was called "Generating Test Data: Part 1 - Generating Random Integers and Floats". It should have been called "Generating Test Data: Part 1 - A Way Cool Compendium of Well Explained El Neato T-SQL Techniques".

    Just one minor gripe - not about the article - about us, the SQL Server community. I know times are tough, but we should all be ashamed of ourselves for allowing a true luminary of the T-SQL world to slip into such penury that he can only afford the sort of machine that my kids wouldn't (well, probably couldn't) watch YouTube on. What say we pass the hat around and get a natty 2006 machine off eBay for Jeff? Maybe we could start up a One Laptop Per MVP project.

    ...One of the symptoms of an approaching nervous breakdown is the belief that one's work is terribly important.... Bertrand Russell

  • Good idea. If you count all the posters except Jeff, you are up to 10 cents :-D

    ------------
    Buy the ticket, take the ride. -- Hunter S. Thompson

  • Krtyknm (3/26/2012)


    Hi Jeff,

    Nice article, keep going.

    I have a question about the system tables: most of the examples use them to generate random numbers. Assuming developers don't have access to system tables, how can they generate random numbers?

    Thanks,

    Karthik

    They can use virtually any table. Any table with just 1,000 rows will allow them to build a million-row table using a CROSS JOIN (1,000 x 1,000 = 1,000,000).

    That notwithstanding, in such situations, and as I said in the article, I'll build a Tally table for them. My Tally tables usually have 11,000 rows in them so I can use them to build 30 years of dates. That's enough for a CROSS JOIN to build a million rows... or 121 million if they want. 🙂
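    If even creating a Tally table isn't an option, one commonly used alternative (not from the article; it's the "cascading CTE" technique popularized by Itzik Ben-Gan) builds the rows from nothing but VALUES row constructors, so no system or user table is read at all. This is a minimal sketch, assuming SQL Server 2008 or later; the variable names and the #RandomInts temp table are hypothetical. Note how a 1,000-row set cross joined to itself gives the million rows described above.

```sql
-- Generates @Rows random integers without reading any system or user table.
-- Assumes SQL Server 2008 or later (VALUES row constructors, DECLARE with initializers).
DECLARE @Rows     INT = 1000000
       ,@MinValue INT = 1
       ,@MaxValue INT = 400
       ,@Range    INT;

SELECT @Range = @MaxValue - @MinValue + 1; -- +1 makes @MaxValue reachable

IF OBJECT_ID('tempdb..#RandomInts') IS NOT NULL DROP TABLE #RandomInts;

WITH
 E1(N) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) v(N)) -- 10 rows
,E3(N) AS (SELECT 1 FROM E1 a CROSS JOIN E1 b CROSS JOIN E1 c)                   -- 1,000 rows
,E6(N) AS (SELECT 1 FROM E3 a CROSS JOIN E3 b)                                    -- 1,000,000 rows
SELECT TOP (@Rows)
       SomeRandomInt = ABS(CHECKSUM(NEWID()) % @Range) + @MinValue
  INTO #RandomInts
  FROM E6;

SELECT MinInt = MIN(SomeRandomInt), MaxInt = MAX(SomeRandomInt), RowsMade = COUNT(*)
  FROM #RandomInts;
```

    Swapping E6 for any real table with enough rows (a Tally table, or one of your own data tables) works the same way; the CTEs only serve as a row source.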

    --Jeff Moden

  • paul.knibbs (3/26/2012)


    Krtyknm (3/26/2012)

    I have a question about the system tables: most of the examples use them to generate random numbers. Assuming developers don't have access to system tables, how can they generate random numbers?

    He's only using a system table (or view, strictly speaking) because of the number of rows it has--any table with a decent number of rows will work, whether it's a tally table or one of your main data tables.

    Ah... I should have scrolled down a bit more before replying. Thanks for the cover, Paul!

    --Jeff Moden

  • GPO (3/26/2012)


    This article was called "Generating Test Data: Part 1 - Generating Random Integers and Floats". It should have been called "Generating Test Data: Part 1 - A Way Cool Compendium of Well Explained El Neato T-SQL Techniques".

    Just one minor gripe - not about the article - about us, the SQL Server community. I know times are tough, but we should all be ashamed of ourselves for allowing a true luminary of the T-SQL world to slip into such penury that he can only afford the sort of machine that my kids wouldn't (well, probably couldn't) watch YouTube on. What say we pass the hat around and get a natty 2006 machine off eBay for Jeff? Maybe we could start up a One Laptop Per MVP project.

    That's a much cooler title! Thanks for the great feedback.

    I actually do have a more modern HP G71 laptop that comparatively screams and will do parallelism, etc., but I like my ol' war horse. If I can make something run fast on it, I know you folks with "real" machines are going to love it. 🙂

    That, notwithstanding, maybe we could get Al Gore to start a "No MVP left behind" project. 😀

    --Jeff Moden

  • G Bryant McClellan (3/26/2012)


    Good idea. If you count all the posters except Jeff, you are up to 10 cents :-D

    Heh... that's more than Al Gore has given me so far. :hehe:

    --Jeff Moden

  • Another great article! And it explains the reasons to use that technique going back to basics.

    Sometimes we know that we should do things one way or another because it's recommended everywhere but sometimes we don't exactly know "why" it's better. Articles that explain "why" are needed. Thank you.

    Best regards,

    Andre Guerreiro Neto

    Database Analyst
    http://www.softplan.com.br
    MCITPx1/MCTSx2/MCSE/MCSA

  • dwain.c (3/26/2012)


    Outstanding article Jeff! Just what the doctor ordered for something I'm working on at this instant.

    I can't wait for the purists to berate you for using "pseudo" random numbers though. :w00t:

    And let me guess:

    DECLARE @Range      INT
           ,@StartValue DATETIME
           ,@EndValue   DATETIME

    SELECT @StartValue = '2012-02-15', @EndValue = '2012-12-31'

    SELECT @Range = DATEDIFF(day, @StartValue, @EndValue)

    SELECT TOP 20 -- Random dates
           DATEADD(day, ABS(CHECKSUM(NEWID()) % @Range), @StartValue) AS SomeRandomDate
      FROM sys.all_columns ac1
     CROSS JOIN sys.all_columns ac2

    SELECT @Range = DATEDIFF(second, @StartValue, @EndValue)

    SELECT TOP 20 -- Random times (to the second)
           DATEADD(second, ABS(CHECKSUM(NEWID()) % @Range), @StartValue) AS SomeRandomTime
      FROM sys.all_columns ac1
     CROSS JOIN sys.all_columns ac2

    🙂

    I might be safe for the next 10 minutes or so. Although the "next" random value is certainly predictable, you'd have to know a fair bit about how NEWID() is generated to predict the next value 😀

    You're just about spot on in your code. Just substitute @Range for the 20 in TOP 20 (written as TOP (@Range), since a variable in TOP must be in parentheses) and Bob's your non-hardcoded Uncle. 🙂 In Part 2, I'll explain how to easily include random times as a part of generating random dates. I'll also cover making period bins.
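    A minimal sketch of that substitution applied to the random-dates query, assuming the same start and end dates; the #SomeRandomDates temp table is a hypothetical name used only so the result can be checked. For these dates DATEDIFF returns 320 (2012 is a leap year), so the query returns 320 rows.

```sql
-- Dwain's random-date query with the hardcoded TOP 20 replaced by TOP (@Range).
-- #SomeRandomDates is a hypothetical scratch table used only for illustration.
DECLARE @Range      INT
       ,@StartValue DATETIME
       ,@EndValue   DATETIME;

SELECT @StartValue = '2012-02-15', @EndValue = '2012-12-31';

SELECT @Range = DATEDIFF(day, @StartValue, @EndValue); -- 320 days for these dates

IF OBJECT_ID('tempdb..#SomeRandomDates') IS NOT NULL DROP TABLE #SomeRandomDates;

-- A variable in TOP must be wrapped in parentheses: TOP (@Range), not TOP @Range
SELECT TOP (@Range)
       SomeRandomDate = DATEADD(day, ABS(CHECKSUM(NEWID()) % @Range), @StartValue)
  INTO #SomeRandomDates
  FROM sys.all_columns ac1
 CROSS JOIN sys.all_columns ac2;

SELECT SomeRandomDate FROM #SomeRandomDates;
```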

    Thanks for the feedback, Dwain!

    --Jeff Moden

  • codebyo (3/26/2012)


    Another great article! And it explains the reasons to use that technique going back to basics.

    Sometimes we know that we should do things one way or another because it's recommended everywhere but sometimes we don't exactly know "why" it's better. Articles that explain "why" are needed. Thank you.

    You've hit the nail on the head, Andre. Random data generation is just like the Tally Table used to be. A lot of people were using it thanks to some posted code examples but they may not have known the "Why" of how it all worked. I wanted to make sure that people knew "Why" things worked the way they did so they can really think outside the box when the time comes.

    Of course, the other reason I'm writing this is so that people who might not otherwise have been able to can easily generate some test data to see whether that "works for 10 rows" code example they found on the internet is worth its salt. :hehe:
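    As a sketch of that kind of "is it worth its salt?" check, the following builds a million-row table of random integers and floats to test a query against. It assumes SQL Server 2008 or later and that sys.all_columns has at least 1,000 rows (so its CROSS JOIN yields a million); the #TestData table and the range variables are hypothetical names. The float column uses RAND(CHECKSUM(NEWID())), which re-seeds RAND for every row; plain RAND() without a seed would return the same value on every row.

```sql
-- Builds a hypothetical 1,000,000-row test table of random integers and floats.
-- Assumes SQL Server 2008+ and that sys.all_columns returns at least 1,000 rows.
DECLARE @Rows     INT   = 1000000
       ,@MinInt   INT   = 1
       ,@MaxInt   INT   = 50000
       ,@MinFloat FLOAT = 0.0
       ,@MaxFloat FLOAT = 100.0;

IF OBJECT_ID('tempdb..#TestData') IS NOT NULL DROP TABLE #TestData;

SELECT TOP (@Rows)
       RowNum    = IDENTITY(INT, 1, 1),
       -- Random integer from @MinInt through @MaxInt (inclusive)
       SomeInt   = ABS(CHECKSUM(NEWID()) % (@MaxInt - @MinInt + 1)) + @MinInt,
       -- RAND seeded per row returns a float between 0 and 1, scaled to the range
       SomeFloat = RAND(CHECKSUM(NEWID())) * (@MaxFloat - @MinFloat) + @MinFloat
  INTO #TestData
  FROM sys.all_columns ac1
 CROSS JOIN sys.all_columns ac2;

SELECT TOP 10 * FROM #TestData ORDER BY RowNum;
```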

    Thanks for the feedback and for "getting it".

    --Jeff Moden

  • Excellent article Jeff. Yet another home run!!! Your explanations are always so clear that even us lessers can understand. Keep them coming!!!

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Moden's splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean Lange (3/26/2012)


    Excellent article Jeff. Yet another home run!!! Your explanations are always so clear that even us lessers can understand. Keep them coming!!!

    Heh... "lessers". I've seen your posts and I wouldn't associate the word "lesser" with someone like you. As you said, "Keep them coming!!!"

    Thanks for the feedback, Sean.

    --Jeff Moden

Viewing 15 posts - 1 through 15 (of 60 total)
