How using SQL to generate random string?

Question

Post reply

How using SQL to generate random string?

Viewing 15 posts - 16 through 30 (of 33 total)

You must be logged in to reply to this topic. Login to reply

Dave Ballantyne SSC-Dedicated Points: 33667 More actions · Answer 1

Dave Ballantyne

SSC-Dedicated

Points: 33667

November 23, 2009 at 6:08 am

#1082899

It's on my to-do list

Clear Sky SQL
My Blog[/url]

Sharul Nizam Ten Centuries Points: 1302 More actions · Answer 2

Dave Ballantyne (11/22/2009)
Here's another method (requires 2005+)
http://sqlblogcasts.com/blogs/sqlandthelike/archive/2009/08/27/a-faster-tsql-random-length-random-string-generator.aspx

Article shown by Dave, give me a hope.

Garadin One Orange Chip Points: 29613 More actions · Answer 3

And even though these random string generators may be very good for what they do, none of them get you out of having to do a loop to create a random string, check your table, then possibly create another, recheck your table, etc. etc. Think about what happens as you start to cover a higher % of your available strings. Granted, that's a lot of combinations, but data does have a tendency to grow over time, and I'd hate to be the guy who walked into this in a few years. Theoretically, once you hit the 95%+ coverage range, that thing could run for minutes repeatedly trying random combinations and not finding one that fit.

If this is just a training exercise or you want to see what you can do, then by all means, go at it. Challenges/puzzles like this are always good(although I'd suggest you coming up with your own solution before you put it on the forums so that you get the full value out of the exercise), but I'd still strongly caution you against putting this into any real world situation.

Seth Phelabaum

Consistency is only a virtue if you're not a screwup.

Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

Dave Ballantyne SSC-Dedicated Points: 33667 More actions · Answer 4

Anyone interested in Randomness , should read this

http://www.amazon.co.uk/Drunkards-Walk-Randomness-Rules-Lives/dp/0713999225

Clear Sky SQL
My Blog[/url]

Ron McCullough SSC Guru Points: 63877 More actions · Answer 5

Garadin
And even though these random string generators may be very good for what they do, none of them get you out of having to do a loop to create a random string, check your table, then possibly create another, recheck your table, etc. etc. Think about what happens as you start to cover a higher % of your available strings.

O.K. forced to disagree with you on several points. Tested a few different methods inserting into this table structure, which diffuses your comment

check your table, then possibly create another, recheck your table, etc. etc

:

CREATE TABLE [dbo].[tStrRmdm](

[Id] [int] IDENTITY(1,1) NOT NULL,

[BoardCd] [varchar](10) NOT NULL

) ON [PRIMARY]

END

GO

SET ANSI_PADDING OFF

GO

/****** Object: Index [IX_tStrRmdm] Script Date: 11/23/2009 16:35:30 ******/

IF NOT EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[dbo].[tStrRmdm]') AND name = N'IX_tStrRmdm')

CREATE UNIQUE NONCLUSTERED INDEX [IX_tStrRmdm] ON [dbo].[tStrRmdm]

([BoardCd] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

And to use Lynn Pettis' function to generate the random strings:

create function [dbo].[ufn_RandomString](

@pStringLength int = 20

) returns varchar(max)

/* Requires create view dbo.MyNewID as

select newid() as NewIDValue;

Run as: SELECT ufn_RandomString(10)

Function developed by Lynn Pettis */

as begin

declare @RandomString varchar(max);

with

a1 as (select 1 as N union all

select 1 union all

select 1),

a2 as (select

1 as N

from

a1 as a

cross join a1 as b),

a3 as (select

1 as N

from

a2 as a

cross join a2 as b),

a4 as (select

1 as N

from

a3 as a

cross join a2 as b),

Tally as (select

row_number() over (order by N) as N

from

a4)

, cteRandomString (

RandomString

) as (

select top (@pStringLength)

substring(x,(abs(checksum((select NewIDValue from MyNewID)))%36)+1,1)

from

Tally cross join (select x=''0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'') a

)

select @RandomString =

replace((select

'','' + RandomString

from

cteRandomString

for xml path ('''')),'','','''');

return (@RandomString);

end

Then liberally taking from Jeff Moden's work on using a Tally table:

CREATE PROCEDURE [dbo].[Load_tStrRmdm_2]

@Num INT = 1,

@Size INT = 3

AS

--===== Build the table 100 rows at a time to "mix things up"

--===== Taken from Jeff Moden's work on use of a Tally table

SET NOCOUNT ON --Supress the auto-display of rowcounts for appearance/speed

DECLARE @Counter INT

SET @Counter = 0

DECLARE @Stime DATETIME

SET @Stime = GETDATE()

DECLARE @Etime DATETIME

WHILE @Counter < @Num * 100

BEGIN

--===== Add 1000 rows to the test table

INSERT INTO dbo.tSTRrmdm

SELECT TOP 100 --100

BoardCD = Dbo.ufn_RandomString(@Size)

FROM Master.dbo.SysColumns t1

CROSS JOIN Master.dbo.SysColumns t2

--===== Increment the counter

SET @Counter = @Counter + 100

END

SET @Etime = GETDATE()

SELECT DATEDIFF(ms,@Etime,@Stime) AS 'runtime'

Ran the following to double check for duplicate values, and each time it returned 0 (Zero). No duplicates found:

SELECT BoardCD FROM Dbo.tStrRmdm GROUP BY BoardCD HAVING COUNT(BoardCD) > 1

Ran the above code with 2 different string lenths. With the following results.

String Len 6 characters

Attempted to add 1,000,000 rows

Rows written to the table 975,800

Writes rejected: 24,200

String Len 8 characters

Attempted to add 1,000,000 rows

Rows written 1,000,000

Writes rejected 0

String Len 6 characters

Add in batches of 100,000 rows in other words ran the procedure 11 (eleven) times

Rows attempted 1,100,000

Rows written to the table 1,072,100

Rows rejected 72,100

Based on the above results I am forced to reject your disparagement of what was proposed to the OP as a valid, useful technique.

If everything seems to be going well, you have obviously overlooked something.

Ron

Please help us, help you -before posting a question please read[/url]
Before posting a performance problem please read[/url]

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 6

Nice bit of work there bitbucket. I'd have expected more duplicates in the shorter string with it only being six characters long.

Jeff Moden SSC Guru Points: 1004601 More actions Follow · Answer 7

I'm actually a bit surprised at the high numbers of rejects... seems like for a base 36 set of 6 characters, there should only have been a "chance" of about 2176 dupes per million instead of 24,000 or 72,000. Very interesting...

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 8

I guess it depends how random the random numbers really are in this case. Would probably need to see if there is a way to capture the distribution. Unfortunately, I don't have the time to do that at the moment. I have several other pressing items on my plate for outside work at this time.

Garadin One Orange Chip Points: 29613 More actions · Answer 9

bitbucket-25253 (11/23/2009)
O.K. forced to disagree with you on several points. Tested a few different methods inserting into this table structure, which diffuses your comment

Always happy to be disagreed with! :hehe:

I re-evaluated and I partially agree with you. I still believe my objections to be valid ones, but they do require some clarification. I mentioned the amount of possible looping required as you approached a high % of coverage, which your situations fail to disprove. What I didn't mention(or calculate) is how many combinations were actually possible with his pattern (A-Z0-9). Assuming I'm calculating this properly, that would give him 36^6 combinations, or 2,176,782,336. Testing with a million rows doesn't really come close to putting a dent in the coverage % I was referring to.

However, at 2.1 billion possibilities, you're probably right that it's safe to assume you won't have to loop repeatedly to get a valid value very often if you're looking at a dataset in the million row range... but you still have to *have* a loop there to cover the situations where you do.

It is also important that anyone who sees this and wants to do the same with... say numbers only (10^6 = only 1M rows), or a 3 character sequence (36^3 = 46656 combinations) etc. be aware of how bad coming close to filling your available combinations could be.

Seth Phelabaum

Consistency is only a virtue if you're not a screwup.

Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

Dave Ballantyne SSC-Dedicated Points: 33667 More actions · Answer 10

Jeff Moden (11/23/2009)
I'm actually a bit surprised at the high numbers of rejects... seems like for a base 36 set of 6 characters, there should only have been a "chance" of about 2176 dupes per million instead of 24,000 or 72,000. Very interesting...

I did do some testing on this a while back with my script ..

To Quote myself from here

http://www.sqlservercentral.com/Forums/FindPost750890.aspx

checksumming the newid value is pretty safe due to the size (in terms of bytes) or the uniqueidentifier type. You could reverse engineer 2 ids to have the same checksum , but after abs(checksum) 1 million newid()'s i have 217 collisions. I think generating an 8 character random string with this method is as random as any method can be. Having populated 2million rows with my test script , i only have 1 duplicate string. I think that is is 'fit for purpose'.

BTW Sorted my blog entry

Clear Sky SQL
My Blog[/url]

Garadin One Orange Chip Points: 29613 More actions · Answer 11

To further clarify / agree with Bit, I'm reminded by my more math inclined buddies that even at 95% coverage, on average you're still only going to be doing 20 checks, so again, probably not the horrible picture I painted in the beginning.

All in all, it's probably not as scary as I was making it out to be. You'll get some repeats, you'll get some extra work, but it isn't likely to bring your system to its knees until you get extremely close to full on your combinations or you hit an especially bad deviation. A lot still depends on the usage you put it to though. It could still create a bottleneck.

I mainly object to this in the same way I object to say... using a cursor in a trigger. Sure, they work, but there's almost always a better way to do it and because the OP isn't willing to share his reasons for doing so, I tend to assume they aren't valid.

Seth Phelabaum

Consistency is only a virtue if you're not a screwup.

Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

Ron McCullough SSC Guru Points: 63877 More actions · Answer 12

Garadin

Perhaps, no, I was off base, having only experimented with 6 and 8 characters in the random string. For grins and giggles and to help some other OP who might misinterpret my post where I quoted you. I decided to run some additional tests this time for a shorter length random string and for those the procedure is, to say the least, not good.

All tests attempted to write 1,000,000 rows to table

String Len 3 characters

Rows written to the table 3,500

Errors begin after 2,322 rows written

a 3 character sequence (36^3 = 46656 combinations)

Percentage of valid attempts (SELECT (3500./46656.) * 100.) = 7.501700

Now the question becomes, why so few (less than 10 percent) of all possible combinations, or how many rows must we attempt to write to even come close to 46,656 valid entries?

Garadin - are you sure of the formula for the number of possible combinations?

String len 4 characters

Rows written to the table 68,600

Errors begin after 18,717 rows written

String Len 5 characters

Rows written to the table 591,200

Errors begin after rows written - not checked

String Len 10 characters

Rows written to the table 1,000,000

Errors - none

So it all boils down to old SQL adage "It depends". But this can be said in a generalization: "The shorter the length of the random character string the more difficult it will become to generate unique values"

For Lynn Pettis - a little more information about the characteristics of your function.

If anyone wants additional testing to develop whatever, please advise, and I will attempt to find time to do so.

If everything seems to be going well, you have obviously overlooked something.

Ron

Please help us, help you -before posting a question please read[/url]
Before posting a performance problem please read[/url]

Garadin One Orange Chip Points: 29613 More actions · Answer 13

A couple facts about this setup:

You can never rely on 2 random numbers to be unique, no matter how big the pool. Even in a pool of 40 billion numbers, you could generate the same string twice in a row... it's just a matter of percentages.

Each successful row that you write increases the probability that the next one will fail if you must maintain uniqueness in the dataset.

The closer you get to filling your entire pool the more attempts(on average) will have to be made to generate each number to fill in those slots.

As far as the formula, yes, I'm sure, I had to do some rapid scrambling to check earlier, but it's correct. To figure out number of permutations with repeats it is n^r, with N being the number of possible choices per repetition(36) and r being the number of repetitions(6). If you didn't want to allow repeats it'd be n!/(n-r!).

Seth Phelabaum

Consistency is only a virtue if you're not a screwup.

Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 14

bitbucket-25253 (11/23/2009)
String len 4 characters
Rows written to the table 68,600
Errors begin after 18,717 rows written
String Len 5 characters
Rows written to the table 591,200
Errors begin after rows written - not checked
String Len 10 characters
Rows written to the table 1,000,000
Errors - none
So it all boils down to old SQL adage "It depends". But this can be said in a generalization: "The shorter the length of the random character string the more difficult it will become to generate unique values"
For Lynn Pettis - a little more information about the characteristics of your function.
If anyone wants additional testing to develop whatever, please advise, and I will attempt to find time to do so.

For anything below 4 - you'd be better off generating the entire cartesian product, and picking however many you need at random from the list. Even 5 is only 60 million rows, so that might be arguable.

The 5 character test seems to crank out 1M within a minute. One pass.

declare @allchars varchar(100)

set @ALLCHARS='ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'

;WITH ALLOWEDCHARSCTE AS

(

select substring(@ALLCHARS,n,1) let

from tally

where n between 1 and 36

),

RandomCTE as (

select a1.let+a2.let+a3.let+a4.let+a5.let RandomStr,

newid() rn

from

ALLOWEDCHARSCTE a1 cross join

ALLOWEDCHARSCTE a2 cross join

ALLOWEDCHARSCTE a3 cross join

ALLOWEDCHARSCTE a4 cross join

ALLOWEDCHARSCTE a5

)

select top (1000000) RandomStr

from RandomCTE

order by RN

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Sharul Nizam Ten Centuries Points: 1302 More actions · Answer 15

Sharul Nizam (11/21/2009)
My statement as follow,
create table tStrRndm
(
BoardCd varchar(6) -- this is a unique
)
DECLARE @myString AS varchar(50)
SET @myString= '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'
DECLARE @gStrRndm as varchar(6)
1. I'm looking a solution to build a T-SQL to generate random string from @myString?
2. Random string is a 6 character
3. Let's say, random string generated is 7MW72X
4. If random string = 7MW72X exist in a tStrRndm, then SQL re-generate random string --- i believe this in a loop
5. If random string = 7MW72X not exist in a tStrRndm, random string insert into tStrRndm and exit loop. As a result @gStrRndm=random string
Looking for help to built the T-SQL for above scenario

After several days, me develop as below as above requirement,

CREATE TABLE [dbo].[tIDTest](

[NewIDVal] [char](6)

)

declare @help char(6)

declare @rowcount int

set @rowcount=0

while @rowcount=0

begin

select @help= right(newid(),6)

insert tIDTest select @help

where not exists(select * from tIDTest where NewIDVal = @help)

select @rowcount=@@rowcount

end

It's work.

Any comment, welcome. tq

How using SQL to generate random string?

Cookies on SQLServerCentral