Inserting a words from a range randomly into a table

  • Im in the process of creating test data

    Now I have a field in the test data called SerialNo

    this has the following allowed values  '111111','222222', '333333','444444','555555','666666'

    Now I want to be able insert rows randomly into a target table which can take any one of the values above but

    im not sure how to do this?

    I thought of using a local temp table like the following

    CREATE TABLE [#LoggerNotifications](

    [LoggerNotifcationID] [int] IDENTITY(1,1) NOT NULL,

    [SerialNumber] [nvarchar](60) NOT NULL

    )

    insert into #LoggerNotifications

    Select '111111'

    union all

    Select '222222'

    union all

    Select '333333'

    union all

    Select '444444'

    union all

    Select '555555'

    union all

    Select '666666'

    Then I can randonmly select from this table between 1 and 6 and grab the serial number and insert it into the table

    But this is quite a bit of work to insert a serial number range and I was wondering if there is any easier way to just randomly

    select from a selection of values?

     

     

  • SELECT TOP (n) ID

    FROM MyTable

    ORDER BY NewID()

    then build the insert from that. ORDER BY NEWID()

  • Here's an easy way that doesn't require much in the form of prep...

    --===== Demonstrate one of many "right" ways to do this.
    DROP TABLE IF EXISTS #MyHead;
    SELECT TOP 1000000
    SomeSerialNo = CHOOSE(ca.N,'111111','222222', '333333','444444','555555','666666')
    ,SomeOtherRandomCols = 'SomeOtherRandomCols'
    INTO #MyHead
    FROM sys.all_columns ac1
    CROSS JOIN sys.all_columns ac2
    CROSS APPLY (VALUES (ABS(CHECKSUM(NEWID())%6)+1))ca(N)
    ;
    --===== Demonstrate that the distribution is random but reasonably flat.
    SELECT SomeSerialNo, COUNT(*)
    FROM #MyHead
    GROUP BY SomeSerialNo
    ORDER BY SomeSerialNo
    ;

    Notice that I say it's one "right" way meaning that the distribution is random but reasonably flat like you'd expect random values to be.

    Results from above (and the numbers will change each time you run it but will always be close to "flat" distribution)...

    There are "wrong" ways to do it.  CHOOSE() actually resolves to multiple cases with multiple executions of the NEWID() formula (and so will a supposed single value CASE, which is annoying as hell).  That will result in NULLs and value usage being severely skewed if you use the formula directly in the CHOOSE.  You can see that in the execution plan but here's the code for one of the many ways to do it incorrectly.

     

    --===== Demonstrate one of many "wrong" ways to do this.
    DROP TABLE IF EXISTS #MyHead;
    SELECT TOP 1000000
    SomeSerialNo = CHOOSE(ABS(CHECKSUM(NEWID())%6)+1,'111111','222222', '333333','444444','555555','666666')
    ,SomeOtherRandomCols = 'SomeOtherRandomCols'
    INTO #MyHead
    FROM sys.all_columns ac1
    CROSS JOIN sys.all_columns ac2
    ;
    --===== Demonstrate that the distribution is random but improperly skewed.
    SELECT SomeSerialNo, COUNT(*)
    FROM #MyHead
    GROUP BY SomeSerialNo
    ORDER BY SomeSerialNo
    ;

    Here are the results from that run, which shows the skews and the NULLs I was talking about.

    The difference between the two snippets is that the CROSS APPLY in the first "correct" snippet forces the result of the NEWID()-based formula to materialize before it is used by the evaluation in the CHOOSE.  The second "wrong" snippet allows the multi-CASE statements behind the scenes of CHOOSE to actually generate as many as 6 times if it doesn't match an evaluation right away.

    So here's from the execution plan from the first "correct" snipped.  You can see that it's using a pre-caclulated expression in the CASE statement behind the scenes and so the cascading CASE statement actually works correctly.

    [Expr1204] = Scalar Operator(CASE 
    WHEN [Expr1203] = (1)
    THEN '111111'
    ELSE CASE
    WHEN [Expr1203] = (2)
    THEN '222222'
    ELSE CASE
    WHEN [Expr1203] = (3)
    THEN '333333'
    ELSE CASE
    WHEN [Expr1203] = (4)
    THEN '444444'
    ELSE CASE
    WHEN [Expr1203] = (5)
    THEN '555555'
    ELSE CASE
    WHEN [Expr1203] = (6)
    THEN '666666'
    ELSE NULL
    END
    END
    END
    END
    END
    END)

     

    Here's what happens (again, you can pick it up from the "Compute Scalar" block properties in the execution plan) if you do it one of the "wrong" ways and you can clearly see why it's a problem.

    Scalar Operator(CASE 
    WHEN (abs(checksum(newid()) % (6)) + (1)) = (1)
    THEN '111111'
    ELSE CASE
    WHEN (abs(checksum(newid()) % (6)) + (1)) = (2)
    THEN '222222'
    ELSE CASE
    WHEN (abs(checksum(newid()) % (6)) + (1)) = (3)
    THEN '333333'
    ELSE CASE
    WHEN (abs(checksum(newid()) % (6)) + (1)) = (4)
    THEN '444444'
    ELSE CASE
    WHEN (abs(checksum(newid()) % (6)) + (1)) = (5)
    THEN '555555'
    ELSE CASE
    WHEN (abs(checksum(newid()) % (6)) + (1)) = (6)
    THEN '666666'
    ELSE NULL
    END
    END
    END
    END
    END
    END)

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks Guys

    Some very useful stuff there

    I elected to use this as I need to generate a 2nd column of random values

    SELECT TOP 10

    SomeSerialNo = CHOOSE(ca.N,'111111','222222', '333333','444444','555555','666666')

    ,SomeOtherRandomCols = CHOOSE(ca2.N,'Open','Closed', 'Polling','Listening','Error','Transmitting')

    -- INTO #MyHead

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2

    CROSS APPLY (VALUES (ABS(CHECKSUM(NEWID())%6)+1))ca(N)

    CROSS APPLY (VALUES (ABS(CHECKSUM(NEWID())%6)+1))ca2(N)

     

  • Thanks for the feedback.  Looks to me like you totally "get it"! 😀

    Just in case you want to do even more in the area of rapidly generating random but constrained data of a bunch of different types, have a look at the following articles... the methods are also useful for generating things like on-the-fly calendar tables, etc.

    https://www.sqlservercentral.com/articles/generating-test-data-part-1-generating-random-integers-and-floats-1

    https://www.sqlservercentral.com/articles/generating-test-data-part-2-generating-sequential-and-random-dates

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply