Generating Test Data: Part 1 - Generating Random Integers and Floats

  • Jeff Moden

    SSC Guru

    Points: 993883

    Comments posted to this topic are about the item Generating Test Data: Part 1 - Generating Random Integers and Floats

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
        Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think it's expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."
    When you put the right degree of spin on it, the number 3|8 is also a glyph that describes the nature of a DBA's job. 😉

    Helpful Links:
    How to post code problems

  • Dwain Camps

    SSC Guru

    Points: 86873

    Outstanding article Jeff! Just what the doctor ordered for something I'm working on at this instant.

    I can't wait for the purists to berate you for using "pseudo" random numbers though. :w00t:

    And let me guess:

    -- Random dates and times from @StartValue (inclusive) up to @EndValue (exclusive)
    DECLARE @Range      INT
           ,@StartValue DATETIME
           ,@EndValue   DATETIME

    SELECT @StartValue = '2012-02-15', @EndValue = '2012-12-31'

    -- Random dates (whole days): @Range is the number of days in the period
    SELECT @Range = DATEDIFF(day, @StartValue, @EndValue)

    SELECT TOP 20
           DATEADD(day, ABS(CHECKSUM(NEWID()) % @Range), @StartValue) AS SomeRandomDate
      FROM sys.all_columns ac1
     CROSS JOIN sys.all_columns ac2

    -- Random times (to the second): @Range is the number of seconds in the period
    SELECT @Range = DATEDIFF(second, @StartValue, @EndValue)

    SELECT TOP 20
           DATEADD(second, ABS(CHECKSUM(NEWID()) % @Range), @StartValue) AS SomeRandomTime
      FROM sys.all_columns ac1
     CROSS JOIN sys.all_columns ac2

    🙂


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
    Since random numbers are too important to be left to chance, let's generate some!
    Learn to understand recursive CTEs by example.

  • Krtyknm

    SSC Eights!

    Points: 884

    Hi Jeff,

    Nice Article, Keep going.

    I have a question about the sys tables: most of the examples use them as the row source for generating random numbers. If developers don't have access to the system tables, how can they generate the random numbers?

    Thanks,

    Karthik

  • paul.knibbs

    SSCoach

    Points: 15270

    Krtyknm (3/26/2012)

    I have a question about the sys tables: most of the examples use them as the row source for generating random numbers. If developers don't have access to the system tables, how can they generate the random numbers?

    He's only using a system table (or view, strictly speaking) because of the number of rows it has--any table with a decent number of rows will work, whether it's a tally table or one of your main data tables.

  • GPO

    SSCarpal Tunnel

    Points: 4450

    This article was called "Generating Test Data: Part 1 - Generating Random Integers and Floats". It should have been called "Generating Test Data: Part 1 - A Way Cool Compendium of Well Explained El Neato T-SQL Techniques".

    Just one minor gripe - not about the article - about us, the SQL Server community. I know times are tough but we should all be ashamed of ourselves for allowing a true luminary of the T-SQL world to slip into such penury that he is only able to afford the sort of machine that my kids wouldn't (well probably couldn't) watch YouTube on. What say we pass the hat around and get a natty 2006 machine off ebay for Jeff? Maybe we could start up a One Laptop Per MVP project.

    ...One of the symptoms of an approaching nervous breakdown is the belief that one's work is terribly important.... Bertrand Russell

  • Bryant McClellan

    SSCarpal Tunnel

    Points: 4212

    Good idea. If you count all the posters except Jeff, you are up to 10 cents. :-D

    ------------
    Buy the ticket, take the ride. -- Hunter S. Thompson

  • Jeff Moden

    SSC Guru

    Points: 993883

    Krtyknm (3/26/2012)


    Hi Jeff,

    Nice Article, Keep going.

    I have a question about the sys tables: most of the examples use them as the row source for generating random numbers. If developers don't have access to the system tables, how can they generate the random numbers?

    Thanks,

    Karthik

    They can use virtually any table. Any table with just 1000 rows will allow them to build a million row table using the Cross Join.

    That notwithstanding, in such situations, and as I said in the article, I'll build a Tally table for them. My Tally tables usually have 11,000 rows in them so I can use them to build 30 years of dates. That's enough for a Cross Join to build a million rows... or 121 million if they want. 🙂
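
    As a rough illustration of the point above, here is a minimal sketch that builds an 11,000-row Tally table without reading any system table or view, then cross joins it with itself as the row source. The names #Tally, SomeInt, E1 and E5 are hypothetical, and the VALUES row constructor assumes SQL Server 2008 or later.

    ```sql
    -- Assumption: SQL Server 2008+ (VALUES row constructor). All names are illustrative.
    -- Step 1: build an 11,000-row Tally table from nothing but constants.
    WITH
      E1(N) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS v(N)), -- 10 rows
      E5(N) AS (SELECT 1 FROM E1 a CROSS JOIN E1 b CROSS JOIN E1 c
                              CROSS JOIN E1 d CROSS JOIN E1 e)                           -- 100,000 rows
    SELECT TOP (11000)
           N = CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS INT)
      INTO #Tally
      FROM E5;

    -- Step 2: cross join the Tally table with itself (up to 121 million rows available)
    -- and take the first million rows as the source for random integers from 1 to 50000.
    SELECT TOP (1000000)
           SomeInt = ABS(CHECKSUM(NEWID()) % 50000) + 1
      FROM #Tally t1
     CROSS JOIN #Tally t2;
    ```

    The cross join is only there to supply rows; none of the values in the Tally table are actually used in the second query.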

    --Jeff Moden



  • Jeff Moden

    SSC Guru

    Points: 993883

    paul.knibbs (3/26/2012)


    Krtyknm (3/26/2012)

    I have a question about the sys tables: most of the examples use them as the row source for generating random numbers. If developers don't have access to the system tables, how can they generate the random numbers?

    He's only using a system table (or view, strictly speaking) because of the number of rows it has--any table with a decent number of rows will work, whether it's a tally table or one of your main data tables.

    Ah... I should have scrolled down a bit more before replying. Thanks for the cover, Paul!

    --Jeff Moden



  • Jeff Moden

    SSC Guru

    Points: 993883

    GPO (3/26/2012)


    This article was called "Generating Test Data: Part 1 - Generating Random Integers and Floats". It should have been called "Generating Test Data: Part 1 - A Way Cool Compendium of Well Explained El Neato T-SQL Techniques".

    Just one minor gripe - not about the article - about us, the SQL Server community. I know times are tough but we should all be ashamed of ourselves for allowing a true luminary of the T-SQL world to slip into such penury that he is only able to afford the sort of machine that my kids wouldn't (well probably couldn't) watch YouTube on. What say we pass the hat around and get a natty 2006 machine off ebay for Jeff? Maybe we could start up a One Laptop Per MVP project.

    That's a much cooler title! Thanks for the great feedback.

    I actually do have a more modern HP G71 laptop that comparatively screams and will do parallelism, etc., but I actually like my ol' war horse. If I can make something run fast on it, I know you folks with "real" machines are going to love it. 🙂

    That notwithstanding, maybe we could get Al Gore to start a "No MVP left behind" project. 😀

    --Jeff Moden



  • Jeff Moden

    SSC Guru

    Points: 993883

    G Bryant McClellan (3/26/2012)


    Good idea. If you count all the posters except Jeff, you are up to 10 cents. :-D

    Heh... that's more than Al Gore has given me so far. :hehe:

    --Jeff Moden



  • Andre Guerreiro

    SSCertifiable

    Points: 7319

    Another great article! And it explains the reasons to use that technique going back to basics.

    Sometimes we know that we should do things one way or another because it's recommended everywhere but sometimes we don't exactly know "why" it's better. Articles that explain "why" are needed. Thank you.

    Best regards,

    Andre Guerreiro Neto

    Database Analyst
    http://www.softplan.com.br
    MCITPx1/MCTSx2/MCSE/MCSA

  • Jeff Moden

    SSC Guru

    Points: 993883

    dwain.c (3/26/2012)


    Outstanding article Jeff! Just what the doctor ordered for something I'm working on at this instant.

    I can't wait for the purists to berate you for using "pseudo" random numbers though. :w00t:

    And let me guess:

    -- Random dates and times from @StartValue (inclusive) up to @EndValue (exclusive)
    DECLARE @Range      INT
           ,@StartValue DATETIME
           ,@EndValue   DATETIME

    SELECT @StartValue = '2012-02-15', @EndValue = '2012-12-31'

    -- Random dates (whole days): @Range is the number of days in the period
    SELECT @Range = DATEDIFF(day, @StartValue, @EndValue)

    SELECT TOP 20
           DATEADD(day, ABS(CHECKSUM(NEWID()) % @Range), @StartValue) AS SomeRandomDate
      FROM sys.all_columns ac1
     CROSS JOIN sys.all_columns ac2

    -- Random times (to the second): @Range is the number of seconds in the period
    SELECT @Range = DATEDIFF(second, @StartValue, @EndValue)

    SELECT TOP 20
           DATEADD(second, ABS(CHECKSUM(NEWID()) % @Range), @StartValue) AS SomeRandomTime
      FROM sys.all_columns ac1
     CROSS JOIN sys.all_columns ac2

    🙂

    I might be safe for the next 10 minutes or so. Although the "next" random value is certainly predictable, you'd have to know a fair bit about how NEWID() is generated to predict the next value 😀

    You're just about spot on in your code. Just substitute @Range for the 20 in TOP 20 and Bob's your non-hardcoded Uncle. 🙂 In Part 2, I'll explain how to easily include random times as a part of generating random dates. I'll also cover making period bins.
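
    For anyone following along, here is a minimal sketch of that substitution. It reuses the variable names from the quoted code; the only real change is TOP (@Range), where the parentheses are required because TOP is given a variable. The alias SomeRandomDate is just illustrative.

    ```sql
    -- Sketch only: ties the number of rows returned to the number of days in the period.
    DECLARE @Range      INT
           ,@StartValue DATETIME
           ,@EndValue   DATETIME;

    SELECT @StartValue = '2012-02-15', @EndValue = '2012-12-31';

    -- Number of whole days in the period
    SELECT @Range = DATEDIFF(day, @StartValue, @EndValue);

    -- One random date per day in the period; offsets run from 0 to @Range - 1,
    -- so the dates fall on or after @StartValue and before @EndValue.
    SELECT TOP (@Range)
           SomeRandomDate = DATEADD(day, ABS(CHECKSUM(NEWID()) % @Range), @StartValue)
      FROM sys.all_columns ac1
     CROSS JOIN sys.all_columns ac2;
    ```

    Because each row gets its own NEWID(), the same date can appear more than once and some dates may not appear at all.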

    Thanks for the feedback, Dwain!

    --Jeff Moden



  • Jeff Moden

    SSC Guru

    Points: 993883

    codebyo (3/26/2012)


    Another great article! And it explains the reasons to use that technique going back to basics.

    Sometimes we know that we should do things one way or another because it's recommended everywhere but sometimes we don't exactly know "why" it's better. Articles that explain "why" are needed. Thank you.

    You've hit the nail on the head, Andre. Random data generation is just like the Tally Table used to be. A lot of people were using it thanks to some posted code examples but they may not have known the "Why" of how it all worked. I wanted to make sure that people knew "Why" things worked the way they did so they can really think outside the box when the time comes.

    Of course, the other reason I'm writing this is so that people who might not have otherwise been able to do so, can easily write some test data to see if that "works for 10 rows" code example they find on the internet is worth its salt. :hehe:
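
    To make that concrete, here is a minimal sketch of a million-row test table with one random integer column and one random float column. It is not the article's exact code; the table and column names (#TestData, RowNum, SomeInt, SomeFloat) and the value ranges are assumptions chosen for illustration.

    ```sql
    -- Sketch only: names and ranges are illustrative assumptions.
    IF OBJECT_ID('tempdb..#TestData') IS NOT NULL
        DROP TABLE #TestData;

    SELECT TOP (1000000)
           RowNum    = IDENTITY(INT, 1, 1),                  -- sequential row number
           SomeInt   = ABS(CHECKSUM(NEWID()) % 50000) + 1,   -- random integer from 1 to 50000
           SomeFloat = RAND(CHECKSUM(NEWID())) * 100         -- random float, at least 0 and below 100
      INTO #TestData
      FROM sys.all_columns ac1            -- used only as a source of rows
     CROSS JOIN sys.all_columns ac2;
    ```

    Point the "works for 10 rows" query at #TestData instead of its original sample and the difference in run time usually tells the story.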

    Thanks for the feedback and for "getting it".

    --Jeff Moden



  • Sean Lange

    SSC Guru

    Points: 286402

    Excellent article Jeff. Yet another home run!!! Your explanations are always so clear that even us lessers can understand. Keep them coming!!!

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Moden's splitter: http://www.sqlservercentral.com/articles/Tally+Table/72993/

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Jeff Moden

    SSC Guru

    Points: 993883

    Sean Lange (3/26/2012)


    Excellent article Jeff. Yet another home run!!! Your explanations are always so clear that even us lessers can understand. Keep them coming!!!

    Heh... "lessers". I've seen your posts and I wouldn't associate the word "lesser" with someone like you. As you said, "Keep them coming!!!"

    Thanks for the feedback, Sean.

    --Jeff Moden



Viewing 15 posts - 1 through 15 (of 61 total)