How using SQL to generate random string?

  • It's on my to-do list



    Clear Sky SQL
    My Blog[/url]

  • Article shown by Dave, give me a hope.

  • And even though these random string generators may be very good for what they do, none of them get you out of having to do a loop to create a random string, check your table, then possibly create another, recheck your table, etc. etc. Think about what happens as you start to cover a higher % of your available strings. Granted, that's a lot of combinations, but data does have a tendency to grow over time, and I'd hate to be the guy who walked into this in a few years. Theoretically, once you hit the 95%+ coverage range, that thing could run for minutes repeatedly trying random combinations and not finding one that fit.

    If this is just a training exercise or you want to see what you can do, then by all means, go at it. Challenges/puzzles like this are always good(although I'd suggest you coming up with your own solution before you put it on the forums so that you get the full value out of the exercise), but I'd still strongly caution you against putting this into any real world situation.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • Anyone interested in Randomness , should read this

    http://www.amazon.co.uk/Drunkards-Walk-Randomness-Rules-Lives/dp/0713999225



    Clear Sky SQL
    My Blog[/url]

  • Garadin

    And even though these random string generators may be very good for what they do, none of them get you out of having to do a loop to create a random string, check your table, then possibly create another, recheck your table, etc. etc. Think about what happens as you start to cover a higher % of your available strings.

    O.K. forced to disagree with you on several points. Tested a few different methods inserting into this table structure, which diffuses your comment

    check your table, then possibly create another, recheck your table, etc. etc

    :

    CREATE TABLE [dbo].[tStrRmdm](

    [Id] [int] IDENTITY(1,1) NOT NULL,

    [BoardCd] [varchar](10) NOT NULL

    ) ON [PRIMARY]

    END

    GO

    SET ANSI_PADDING OFF

    GO

    /****** Object: Index [IX_tStrRmdm] Script Date: 11/23/2009 16:35:30 ******/

    IF NOT EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[dbo].[tStrRmdm]') AND name = N'IX_tStrRmdm')

    CREATE UNIQUE NONCLUSTERED INDEX [IX_tStrRmdm] ON [dbo].[tStrRmdm]

    ([BoardCd] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    And to use Lynn Pettis' function to generate the random strings:

    create function [dbo].[ufn_RandomString](

    @pStringLength int = 20

    ) returns varchar(max)

    /* Requires create view dbo.MyNewID as

    select newid() as NewIDValue;

    Run as: SELECT ufn_RandomString(10)

    Function developed by Lynn Pettis */

    as begin

    declare @RandomString varchar(max);

    with

    a1 as (select 1 as N union all

    select 1 union all

    select 1 union all

    select 1 union all

    select 1 union all

    select 1 union all

    select 1 union all

    select 1 union all

    select 1 union all

    select 1),

    a2 as (select

    1 as N

    from

    a1 as a

    cross join a1 as b),

    a3 as (select

    1 as N

    from

    a2 as a

    cross join a2 as b),

    a4 as (select

    1 as N

    from

    a3 as a

    cross join a2 as b),

    Tally as (select

    row_number() over (order by N) as N

    from

    a4)

    , cteRandomString (

    RandomString

    ) as (

    select top (@pStringLength)

    substring(x,(abs(checksum((select NewIDValue from MyNewID)))%36)+1,1)

    from

    Tally cross join (select x=''0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'') a

    )

    select @RandomString =

    replace((select

    '','' + RandomString

    from

    cteRandomString

    for xml path ('''')),'','','''');

    return (@RandomString);

    end

    Then liberally taking from Jeff Moden's work on using a Tally table:

    CREATE PROCEDURE [dbo].[Load_tStrRmdm_2]

    @Num INT = 1,

    @Size INT = 3

    AS

    --===== Build the table 100 rows at a time to "mix things up"

    --===== Taken from Jeff Moden's work on use of a Tally table

    SET NOCOUNT ON --Supress the auto-display of rowcounts for appearance/speed

    DECLARE @Counter INT

    SET @Counter = 0

    DECLARE @Stime DATETIME

    SET @Stime = GETDATE()

    DECLARE @Etime DATETIME

    WHILE @Counter < @Num * 100

    BEGIN

    --===== Add 1000 rows to the test table

    INSERT INTO dbo.tSTRrmdm

    SELECT TOP 100 --100

    BoardCD = Dbo.ufn_RandomString(@Size)

    FROM Master.dbo.SysColumns t1

    CROSS JOIN Master.dbo.SysColumns t2

    --===== Increment the counter

    SET @Counter = @Counter + 100

    END

    SET @Etime = GETDATE()

    SELECT DATEDIFF(ms,@Etime,@Stime) AS 'runtime'

    Ran the following to double check for duplicate values, and each time it returned 0 (Zero). No duplicates found:

    SELECT BoardCD FROM Dbo.tStrRmdm GROUP BY BoardCD HAVING COUNT(BoardCD) > 1

    Ran the above code with 2 different string lenths. With the following results.

    String Len 6 characters

    Attempted to add 1,000,000 rows

    Rows written to the table 975,800

    Writes rejected: 24,200

    String Len 8 characters

    Attempted to add 1,000,000 rows

    Rows written 1,000,000

    Writes rejected 0

    String Len 6 characters

    Add in batches of 100,000 rows in other words ran the procedure 11 (eleven) times

    Rows attempted 1,100,000

    Rows written to the table 1,072,100

    Rows rejected 72,100

    Based on the above results I am forced to reject your disparagement of what was proposed to the OP as a valid, useful technique.

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us, help you -before posting a question please read[/url]
    Before posting a performance problem please read[/url]

  • Nice bit of work there bitbucket. I'd have expected more duplicates in the shorter string with it only being six characters long.

  • I'm actually a bit surprised at the high numbers of rejects... seems like for a base 36 set of 6 characters, there should only have been a "chance" of about 2176 dupes per million instead of 24,000 or 72,000. Very interesting...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I guess it depends how random the random numbers really are in this case. Would probably need to see if there is a way to capture the distribution. Unfortunately, I don't have the time to do that at the moment. I have several other pressing items on my plate for outside work at this time.

  • bitbucket-25253 (11/23/2009)


    O.K. forced to disagree with you on several points. Tested a few different methods inserting into this table structure, which diffuses your comment

    Always happy to be disagreed with! :hehe:

    I re-evaluated and I partially agree with you. I still believe my objections to be valid ones, but they do require some clarification. I mentioned the amount of possible looping required as you approached a high % of coverage, which your situations fail to disprove. What I didn't mention(or calculate) is how many combinations were actually possible with his pattern (A-Z0-9). Assuming I'm calculating this properly, that would give him 36^6 combinations, or 2,176,782,336. Testing with a million rows doesn't really come close to putting a dent in the coverage % I was referring to.

    However, at 2.1 billion possibilities, you're probably right that it's safe to assume you won't have to loop repeatedly to get a valid value very often if you're looking at a dataset in the million row range... but you still have to *have* a loop there to cover the situations where you do.

    It is also important that anyone who sees this and wants to do the same with... say numbers only (10^6 = only 1M rows), or a 3 character sequence (36^3 = 46656 combinations) etc. be aware of how bad coming close to filling your available combinations could be.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • Jeff Moden (11/23/2009)


    I'm actually a bit surprised at the high numbers of rejects... seems like for a base 36 set of 6 characters, there should only have been a "chance" of about 2176 dupes per million instead of 24,000 or 72,000. Very interesting...

    I did do some testing on this a while back with my script ..

    To Quote myself from here

    http://www.sqlservercentral.com/Forums/FindPost750890.aspx

    checksumming the newid value is pretty safe due to the size (in terms of bytes) or the uniqueidentifier type. You could reverse engineer 2 ids to have the same checksum , but after abs(checksum) 1 million newid()'s i have 217 collisions. I think generating an 8 character random string with this method is as random as any method can be. Having populated 2million rows with my test script , i only have 1 duplicate string. I think that is is 'fit for purpose'.

    BTW Sorted my blog entry 😉



    Clear Sky SQL
    My Blog[/url]

  • To further clarify / agree with Bit, I'm reminded by my more math inclined buddies that even at 95% coverage, on average you're still only going to be doing 20 checks, so again, probably not the horrible picture I painted in the beginning.

    All in all, it's probably not as scary as I was making it out to be. You'll get some repeats, you'll get some extra work, but it isn't likely to bring your system to its knees until you get extremely close to full on your combinations or you hit an especially bad deviation. A lot still depends on the usage you put it to though. It could still create a bottleneck.

    I mainly object to this in the same way I object to say... using a cursor in a trigger. Sure, they work, but there's almost always a better way to do it and because the OP isn't willing to share his reasons for doing so, I tend to assume they aren't valid.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • Garadin

    Perhaps, no, I was off base, having only experimented with 6 and 8 characters in the random string. For grins and giggles and to help some other OP who might misinterpret my post where I quoted you. I decided to run some additional tests this time for a shorter length random string and for those the procedure is, to say the least, not good.

    All tests attempted to write 1,000,000 rows to table

    String Len 3 characters

    Rows written to the table 3,500

    Errors begin after 2,322 rows written

    a 3 character sequence (36^3 = 46656 combinations)

    Percentage of valid attempts (SELECT (3500./46656.) * 100.) = 7.501700

    Now the question becomes, why so few (less than 10 percent) of all possible combinations, or how many rows must we attempt to write to even come close to 46,656 valid entries?

    Garadin - are you sure of the formula for the number of possible combinations?

    String len 4 characters

    Rows written to the table 68,600

    Errors begin after 18,717 rows written

    String Len 5 characters

    Rows written to the table 591,200

    Errors begin after rows written - not checked

    String Len 10 characters

    Rows written to the table 1,000,000

    Errors - none

    So it all boils down to old SQL adage "It depends". But this can be said in a generalization: "The shorter the length of the random character string the more difficult it will become to generate unique values"

    For Lynn Pettis - a little more information about the characteristics of your function.

    If anyone wants additional testing to develop whatever, please advise, and I will attempt to find time to do so.

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us, help you -before posting a question please read[/url]
    Before posting a performance problem please read[/url]

  • A couple facts about this setup:

    You can never rely on 2 random numbers to be unique, no matter how big the pool. Even in a pool of 40 billion numbers, you could generate the same string twice in a row... it's just a matter of percentages.

    Each successful row that you write increases the probability that the next one will fail if you must maintain uniqueness in the dataset.

    The closer you get to filling your entire pool the more attempts(on average) will have to be made to generate each number to fill in those slots.

    As far as the formula, yes, I'm sure, I had to do some rapid scrambling to check earlier, but it's correct. To figure out number of permutations with repeats it is n^r, with N being the number of possible choices per repetition(36) and r being the number of repetitions(6). If you didn't want to allow repeats it'd be n!/(n-r!).

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • bitbucket-25253 (11/23/2009)


    String len 4 characters

    Rows written to the table 68,600

    Errors begin after 18,717 rows written

    String Len 5 characters

    Rows written to the table 591,200

    Errors begin after rows written - not checked

    String Len 10 characters

    Rows written to the table 1,000,000

    Errors - none

    So it all boils down to old SQL adage "It depends". But this can be said in a generalization: "The shorter the length of the random character string the more difficult it will become to generate unique values"

    For Lynn Pettis - a little more information about the characteristics of your function.

    If anyone wants additional testing to develop whatever, please advise, and I will attempt to find time to do so.

    For anything below 4 - you'd be better off generating the entire cartesian product, and picking however many you need at random from the list. Even 5 is only 60 million rows, so that might be arguable.

    The 5 character test seems to crank out 1M within a minute. One pass.

    declare @allchars varchar(100)

    set @ALLCHARS='ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'

    ;WITH ALLOWEDCHARSCTE AS

    (

    select substring(@ALLCHARS,n,1) let

    from tally

    where n between 1 and 36

    ),

    RandomCTE as (

    select a1.let+a2.let+a3.let+a4.let+a5.let RandomStr,

    newid() rn

    from

    ALLOWEDCHARSCTE a1 cross join

    ALLOWEDCHARSCTE a2 cross join

    ALLOWEDCHARSCTE a3 cross join

    ALLOWEDCHARSCTE a4 cross join

    ALLOWEDCHARSCTE a5

    )

    select top (1000000) RandomStr

    from RandomCTE

    order by RN

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Sharul Nizam (11/21/2009)


    My statement as follow,

    create table tStrRndm

    (

    BoardCd varchar(6) -- this is a unique

    )

    DECLARE @myString AS varchar(50)

    SET @myString= '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'

    DECLARE @gStrRndm as varchar(6)

    1. I'm looking a solution to build a T-SQL to generate random string from @myString?

    2. Random string is a 6 character

    3. Let's say, random string generated is 7MW72X

    4. If random string = 7MW72X exist in a tStrRndm, then SQL re-generate random string --- i believe this in a loop

    5. If random string = 7MW72X not exist in a tStrRndm, random string insert into tStrRndm and exit loop. As a result @gStrRndm=random string

    Looking for help to built the T-SQL for above scenario

    After several days, me develop as below as above requirement,

    CREATE TABLE [dbo].[tIDTest](

    [NewIDVal] [char](6)

    )

    declare @help char(6)

    declare @rowcount int

    set @rowcount=0

    while @rowcount=0

    begin

    select @help= right(newid(),6)

    insert tIDTest select @help

    where not exists(select * from tIDTest where NewIDVal = @help)

    select @rowcount=@@rowcount

    end

    It's work.

    Any comment, welcome. tq

Viewing 15 posts - 16 through 30 (of 33 total)

You must be logged in to reply to this topic. Login to reply