Generating Test Data: Part 2 - Generating Sequential and Random Dates

  • Dev (5/20/2012)


    Nice Article Jeff!!!

    Thanks, Dev. I appreciate both the read and the feedback.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Great article Jeff.

    One thing to mention... in the article, you pointed out that you cannot directly add / subtract a number to the new DATE data time. This applies to all of the new date data types: DATETIME2, DATETIMEOFFSET and TIME as well as DATE.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Jeff, thanks for the great articles (part 1 and part 2 and look forward to part3).

    Is there a way to pick a random value (like a color) from a subquery? here is what I tried but getting the same value on every run:

    SELECT TOP (100)

    ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) as ID,--Sequential number from 1 to ..

    (SELECT top 1 color FROM ( VALUES (0,'Red'),(1,'Green'),(2,'Yellow') ) colors(id,color) ORDER BY NEWID()) as RandomColor,

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2

    but if I run this piece alone, I am getting different colors:

    SELECT top 1 color FROM ( VALUES (0,'Red'),(1,'Green'),(2,'Yellow') ) colors(id,color) ORDER BY NEWID()

  • No... not from a sub-query. At least I haven't figured out a way to do it with a TOP 1/ORDER BY like that.

    You could do it using a CASE function based on ABS(CHECKSUM(NEWID()))%3 but that will actually come up with 4 values because, since NEWID() isn't deterministic, the CASE function will recalculate each and every WHEN even if you use CASE ABS(CHECKSUM(NEWID()))%3. You could then change the formula to ABS(CHECKSUM(NEWID()))%2 and use ELSE but that will give you an uneven distribution.

    So, the only thing to do is to gen the numbers that control the color separately and then CASE that number. Here's one way of doing that in a single query.

    DROP TABLE #MyHead;

    WITH

    cteRandom AS

    ( --=== Generate the random number first

    SELECT TOP (1000)

    ID = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),

    RandomColor# = ABS(CHECKSUM(NEWID()))%3

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2

    ) --=== Now, pick the color according to the generated number

    SELECT ID,

    RandomColor = CASE RandomColor#

    WHEN 1 THEN 'Red'

    WHEN 2 THEN 'Green'

    ELSE 'Yellow'

    END

    INTO #MyHead

    FROM cteRandom

    ;

    --===== Show the distribution

    SELECT RandomColor, COUNT(*)

    FROM #MyHead

    GROUP BY RandomColor

    ORDER BY RandomColor

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Here's another way... same idea, though.

    DROP TABLE #MyHead;

    SELECT TOP (1000)

    ID = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),

    RandomColor = CASE RandomColor#

    WHEN 1 THEN 'Red'

    WHEN 2 THEN 'Green'

    ELSE 'Yellow'

    END

    INTO #MyHead

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2

    CROSS APPLY (SELECT ABS(CHECKSUM(NEWID()))%3) ca (RandomColor#)

    ;

    --===== Show the distribution

    SELECT RandomColor, COUNT(*)

    FROM #MyHead

    GROUP BY RandomColor

    ORDER BY RandomColor

    ;

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (12/16/2012)


    Here's another way... same idea, though.

    awesome, it will work! thanks for the quick response, Jeff! and I am going to use this opportunity to say BIG THANK YOU for all your articles and knowledge you shared with us!

  • Thanks, Boriskey :blush: I really appreciate the feedback. I aim to please... I sometimes miss but I'm always aiming. πŸ˜€

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff, moving the (SELECT ABS(CHECKSUM(NEWID()))%3) from the CROSS APPLY into the CASE seems to work as well. When should CROSS APPLY be used, or is it equivalent in this case?

    SELECT TOP (1000)

    ID = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),

    RandomColor = CASE (SELECT ABS(CHECKSUM(NEWID()))%3)

    WHEN 1 THEN 'Red'

    WHEN 2 THEN 'Green'

    ELSE 'Yellow'

    END

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2

    ;

    Thanks much. I always enjoy reading your articles and comments. And I always learn or re-learn something.

  • Sorry... I didn't see the new code correctly. I'll be back.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • John Rempel (8/28/2013)


    Jeff, moving the (SELECT ABS(CHECKSUM(NEWID()))%3) from the CROSS APPLY into the CASE seems to work as well. When should CROSS APPLY be used, or is it equivalent in this case?

    SELECT TOP (1000)

    ID = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),

    RandomColor = CASE (SELECT ABS(CHECKSUM(NEWID()))%3)

    WHEN 1 THEN 'Red'

    WHEN 2 THEN 'Green'

    ELSE 'Yellow'

    END

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2

    ;

    If you look deep in the properties of some of the compute scalars in the actual execution plan, you see that this has the same problem I spoke of before. Here's where...

    Scalar Operator(CASE WHEN [Expr1006]=(1) THEN 'Red' ELSE CASE WHEN [Expr1007]=(2) THEN 'Green' ELSE 'Yellow' END END)

    Expr1006 and Expr1007 are separate copies of the Scalar Operator(abs(checksum(newid()))%(3)). What ends up happening is that Yellow will turn up more than the other two colors because if the first random forumula isn't a 1, then it calculates a new random formula. Each random number generator only has a 1 out of 3 chance of finding its mark. That means that 2/3rds of the time, each will miss its mark. I don't know what that works out to odds wise (no coffee in the last 5 hours), but it means that Yellow will always come out with the most hits followed by Red, followed by Green. The CROSS APPLY doesn't have the same problem.

    Thanks much. I always enjoy reading your articles and comments. And I always learn or re-learn something.

    Thank you very much for the feedback. I'm glad that I can return something to the community that has helped me so much.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • boriskey (12/16/2012)


    Jeff, thanks for the great articles (part 1 and part 2 and look forward to part3).

    Is there a way to pick a random value (like a color) from a subquery? here is what I tried but getting the same value on every run:

    SELECT TOP (100)

    ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) as ID,--Sequential number from 1 to ..

    (SELECT top 1 color FROM ( VALUES (0,'Red'),(1,'Green'),(2,'Yellow') ) colors(id,color) ORDER BY NEWID()) as RandomColor,

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2

    but if I run this piece alone, I am getting different colors:

    SELECT top 1 color FROM ( VALUES (0,'Red'),(1,'Green'),(2,'Yellow') ) colors(id,color) ORDER BY NEWID()

    You have to make two changes to your subquery to make this work:

    1. Correlate it to the outer query, or it will run once per query rather than once per row

    2. Make it do something early on, or it will still get "optimized leftwards" (what the hell am I talking about? Is there a proper term for this?)

    What you are looking for in the constant scan operator at the rightmost end of the subquery branch is 'Number of Executions = 1000' and 'Actual number of Rows' = 3000.

    Something like this:

    SELECT TOP 1000 ID = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),

    x.*

    FROM sys.all_columns ac1

    OUTER APPLY (

    SELECT TOP 1

    RandomColor = color,

    IDi = ac1.object_id + ac1.column_id + ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM (VALUES (0,'Red'),(1,'Green'),(2,'Yellow')) colors (id,color)

    ORDER BY NEWID()

    ) x

    If you remove IDi from the outer SELECT, the whole optimiser spoof collapses πŸ˜‰

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 11 posts - 31 through 40 (of 40 total)

You must be logged in to reply to this topic. Login to reply