November 23, 2009 at 6:08 am
It's on my to-do list
November 23, 2009 at 6:59 am
Dave Ballantyne (11/22/2009)
Here's another method (requires 2005+)
Article shown by Dave, give me a hope.
November 23, 2009 at 8:14 am
And even though these random string generators may be very good for what they do, none of them get you out of having to do a loop to create a random string, check your table, then possibly create another, recheck your table, etc. etc. Think about what happens as you start to cover a higher % of your available strings. Granted, that's a lot of combinations, but data does have a tendency to grow over time, and I'd hate to be the guy who walked into this in a few years. Theoretically, once you hit the 95%+ coverage range, that thing could run for minutes repeatedly trying random combinations and not finding one that fit.
If this is just a training exercise or you want to see what you can do, then by all means, go at it. Challenges/puzzles like this are always good(although I'd suggest you coming up with your own solution before you put it on the forums so that you get the full value out of the exercise), but I'd still strongly caution you against putting this into any real world situation.
November 23, 2009 at 8:58 am
Anyone interested in Randomness , should read this
http://www.amazon.co.uk/Drunkards-Walk-Randomness-Rules-Lives/dp/0713999225
November 23, 2009 at 3:07 pm
Garadin
And even though these random string generators may be very good for what they do, none of them get you out of having to do a loop to create a random string, check your table, then possibly create another, recheck your table, etc. etc. Think about what happens as you start to cover a higher % of your available strings.
O.K. forced to disagree with you on several points. Tested a few different methods inserting into this table structure, which diffuses your comment
check your table, then possibly create another, recheck your table, etc. etc
:
CREATE TABLE [dbo].[tStrRmdm](
[Id] [int] IDENTITY(1,1) NOT NULL,
[BoardCd] [varchar](10) NOT NULL
) ON [PRIMARY]
END
GO
SET ANSI_PADDING OFF
GO
/****** Object: Index [IX_tStrRmdm] Script Date: 11/23/2009 16:35:30 ******/
IF NOT EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[dbo].[tStrRmdm]') AND name = N'IX_tStrRmdm')
CREATE UNIQUE NONCLUSTERED INDEX [IX_tStrRmdm] ON [dbo].[tStrRmdm]
([BoardCd] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
And to use Lynn Pettis' function to generate the random strings:
create function [dbo].[ufn_RandomString](
@pStringLength int = 20
) returns varchar(max)
/* Requires create view dbo.MyNewID as
select newid() as NewIDValue;
Run as: SELECT ufn_RandomString(10)
Function developed by Lynn Pettis */
as begin
declare @RandomString varchar(max);
with
a1 as (select 1 as N union all
select 1 union all
select 1 union all
select 1 union all
select 1 union all
select 1 union all
select 1 union all
select 1 union all
select 1 union all
select 1),
a2 as (select
1 as N
from
a1 as a
cross join a1 as b),
a3 as (select
1 as N
from
a2 as a
cross join a2 as b),
a4 as (select
1 as N
from
a3 as a
cross join a2 as b),
Tally as (select
row_number() over (order by N) as N
from
a4)
, cteRandomString (
RandomString
) as (
select top (@pStringLength)
substring(x,(abs(checksum((select NewIDValue from MyNewID)))%36)+1,1)
from
Tally cross join (select x=''0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'') a
)
select @RandomString =
replace((select
'','' + RandomString
from
cteRandomString
for xml path ('''')),'','','''');
return (@RandomString);
end
Then liberally taking from Jeff Moden's work on using a Tally table:
CREATE PROCEDURE [dbo].[Load_tStrRmdm_2]
@Num INT = 1,
@Size INT = 3
AS
--===== Build the table 100 rows at a time to "mix things up"
--===== Taken from Jeff Moden's work on use of a Tally table
SET NOCOUNT ON --Supress the auto-display of rowcounts for appearance/speed
DECLARE @Counter INT
SET @Counter = 0
DECLARE @Stime DATETIME
SET @Stime = GETDATE()
DECLARE @Etime DATETIME
WHILE @Counter < @Num * 100
BEGIN
--===== Add 1000 rows to the test table
INSERT INTO dbo.tSTRrmdm
SELECT TOP 100 --100
BoardCD = Dbo.ufn_RandomString(@Size)
FROM Master.dbo.SysColumns t1
CROSS JOIN Master.dbo.SysColumns t2
--===== Increment the counter
SET @Counter = @Counter + 100
END
SET @Etime = GETDATE()
SELECT DATEDIFF(ms,@Etime,@Stime) AS 'runtime'
Ran the following to double check for duplicate values, and each time it returned 0 (Zero). No duplicates found:
SELECT BoardCD FROM Dbo.tStrRmdm GROUP BY BoardCD HAVING COUNT(BoardCD) > 1
Ran the above code with 2 different string lenths. With the following results.
String Len 6 characters
Attempted to add 1,000,000 rows
Rows written to the table 975,800
Writes rejected: 24,200
String Len 8 characters
Attempted to add 1,000,000 rows
Rows written 1,000,000
Writes rejected 0
String Len 6 characters
Add in batches of 100,000 rows in other words ran the procedure 11 (eleven) times
Rows attempted 1,100,000
Rows written to the table 1,072,100
Rows rejected 72,100
Based on the above results I am forced to reject your disparagement of what was proposed to the OP as a valid, useful technique.
November 23, 2009 at 3:28 pm
Nice bit of work there bitbucket. I'd have expected more duplicates in the shorter string with it only being six characters long.
November 23, 2009 at 3:44 pm
I'm actually a bit surprised at the high numbers of rejects... seems like for a base 36 set of 6 characters, there should only have been a "chance" of about 2176 dupes per million instead of 24,000 or 72,000. Very interesting...
--Jeff Moden
Change is inevitable... Change for the better is not.
November 23, 2009 at 3:49 pm
I guess it depends how random the random numbers really are in this case. Would probably need to see if there is a way to capture the distribution. Unfortunately, I don't have the time to do that at the moment. I have several other pressing items on my plate for outside work at this time.
November 23, 2009 at 3:57 pm
bitbucket-25253 (11/23/2009)
O.K. forced to disagree with you on several points. Tested a few different methods inserting into this table structure, which diffuses your comment
Always happy to be disagreed with! :hehe:
I re-evaluated and I partially agree with you. I still believe my objections to be valid ones, but they do require some clarification. I mentioned the amount of possible looping required as you approached a high % of coverage, which your situations fail to disprove. What I didn't mention(or calculate) is how many combinations were actually possible with his pattern (A-Z0-9). Assuming I'm calculating this properly, that would give him 36^6 combinations, or 2,176,782,336. Testing with a million rows doesn't really come close to putting a dent in the coverage % I was referring to.
However, at 2.1 billion possibilities, you're probably right that it's safe to assume you won't have to loop repeatedly to get a valid value very often if you're looking at a dataset in the million row range... but you still have to *have* a loop there to cover the situations where you do.
It is also important that anyone who sees this and wants to do the same with... say numbers only (10^6 = only 1M rows), or a 3 character sequence (36^3 = 46656 combinations) etc. be aware of how bad coming close to filling your available combinations could be.
November 23, 2009 at 4:13 pm
Jeff Moden (11/23/2009)
I'm actually a bit surprised at the high numbers of rejects... seems like for a base 36 set of 6 characters, there should only have been a "chance" of about 2176 dupes per million instead of 24,000 or 72,000. Very interesting...
I did do some testing on this a while back with my script ..
To Quote myself from here
http://www.sqlservercentral.com/Forums/FindPost750890.aspx
checksumming the newid value is pretty safe due to the size (in terms of bytes) or the uniqueidentifier type. You could reverse engineer 2 ids to have the same checksum , but after abs(checksum) 1 million newid()'s i have 217 collisions. I think generating an 8 character random string with this method is as random as any method can be. Having populated 2million rows with my test script , i only have 1 duplicate string. I think that is is 'fit for purpose'.
BTW Sorted my blog entry
November 23, 2009 at 4:14 pm
To further clarify / agree with Bit, I'm reminded by my more math inclined buddies that even at 95% coverage, on average you're still only going to be doing 20 checks, so again, probably not the horrible picture I painted in the beginning.
All in all, it's probably not as scary as I was making it out to be. You'll get some repeats, you'll get some extra work, but it isn't likely to bring your system to its knees until you get extremely close to full on your combinations or you hit an especially bad deviation. A lot still depends on the usage you put it to though. It could still create a bottleneck.
I mainly object to this in the same way I object to say... using a cursor in a trigger. Sure, they work, but there's almost always a better way to do it and because the OP isn't willing to share his reasons for doing so, I tend to assume they aren't valid.
November 23, 2009 at 6:35 pm
Garadin
Perhaps, no, I was off base, having only experimented with 6 and 8 characters in the random string. For grins and giggles and to help some other OP who might misinterpret my post where I quoted you. I decided to run some additional tests this time for a shorter length random string and for those the procedure is, to say the least, not good.
All tests attempted to write 1,000,000 rows to table
String Len 3 characters
Rows written to the table 3,500
Errors begin after 2,322 rows written
a 3 character sequence (36^3 = 46656 combinations)
Percentage of valid attempts (SELECT (3500./46656.) * 100.) = 7.501700
Now the question becomes, why so few (less than 10 percent) of all possible combinations, or how many rows must we attempt to write to even come close to 46,656 valid entries?
Garadin - are you sure of the formula for the number of possible combinations?
String len 4 characters
Rows written to the table 68,600
Errors begin after 18,717 rows written
String Len 5 characters
Rows written to the table 591,200
Errors begin after rows written - not checked
String Len 10 characters
Rows written to the table 1,000,000
Errors - none
So it all boils down to old SQL adage "It depends". But this can be said in a generalization: "The shorter the length of the random character string the more difficult it will become to generate unique values"
For Lynn Pettis - a little more information about the characteristics of your function.
If anyone wants additional testing to develop whatever, please advise, and I will attempt to find time to do so.
November 23, 2009 at 11:01 pm
A couple facts about this setup:
You can never rely on 2 random numbers to be unique, no matter how big the pool. Even in a pool of 40 billion numbers, you could generate the same string twice in a row... it's just a matter of percentages.
Each successful row that you write increases the probability that the next one will fail if you must maintain uniqueness in the dataset.
The closer you get to filling your entire pool the more attempts(on average) will have to be made to generate each number to fill in those slots.
As far as the formula, yes, I'm sure, I had to do some rapid scrambling to check earlier, but it's correct. To figure out number of permutations with repeats it is n^r, with N being the number of possible choices per repetition(36) and r being the number of repetitions(6). If you didn't want to allow repeats it'd be n!/(n-r!).
November 24, 2009 at 8:33 am
bitbucket-25253 (11/23/2009)
String len 4 characters
Rows written to the table 68,600
Errors begin after 18,717 rows written
String Len 5 characters
Rows written to the table 591,200
Errors begin after rows written - not checked
String Len 10 characters
Rows written to the table 1,000,000
Errors - none
So it all boils down to old SQL adage "It depends". But this can be said in a generalization: "The shorter the length of the random character string the more difficult it will become to generate unique values"
For Lynn Pettis - a little more information about the characteristics of your function.
If anyone wants additional testing to develop whatever, please advise, and I will attempt to find time to do so.
For anything below 4 - you'd be better off generating the entire cartesian product, and picking however many you need at random from the list. Even 5 is only 60 million rows, so that might be arguable.
The 5 character test seems to crank out 1M within a minute. One pass.
declare @allchars varchar(100)
set @ALLCHARS='ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'
;WITH ALLOWEDCHARSCTE AS
(
select substring(@ALLCHARS,n,1) let
from tally
where n between 1 and 36
),
RandomCTE as (
select a1.let+a2.let+a3.let+a4.let+a5.let RandomStr,
newid() rn
from
ALLOWEDCHARSCTE a1 cross join
ALLOWEDCHARSCTE a2 cross join
ALLOWEDCHARSCTE a3 cross join
ALLOWEDCHARSCTE a4 cross join
ALLOWEDCHARSCTE a5
)
select top (1000000) RandomStr
from RandomCTE
order by RN
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
November 24, 2009 at 8:40 am
Sharul Nizam (11/21/2009)
My statement as follow,create table tStrRndm
(
BoardCd varchar(6) -- this is a unique
)
DECLARE @myString AS varchar(50)
SET @myString= '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'
DECLARE @gStrRndm as varchar(6)
1. I'm looking a solution to build a T-SQL to generate random string from @myString?
2. Random string is a 6 character
3. Let's say, random string generated is 7MW72X
4. If random string = 7MW72X exist in a tStrRndm, then SQL re-generate random string --- i believe this in a loop
5. If random string = 7MW72X not exist in a tStrRndm, random string insert into tStrRndm and exit loop. As a result @gStrRndm=random string
Looking for help to built the T-SQL for above scenario
After several days, me develop as below as above requirement,
CREATE TABLE [dbo].[tIDTest](
[NewIDVal] [char](6)
)
declare @help char(6)
declare @rowcount int
set @rowcount=0
while @rowcount=0
begin
select @help= right(newid(),6)
insert tIDTest select @help
where not exists(select * from tIDTest where NewIDVal = @help)
select @rowcount=@@rowcount
end
It's work.
Any comment, welcome. tq
Viewing 15 posts - 16 through 30 (of 33 total)
You must be logged in to reply to this topic. Login to reply
This website stores cookies on your computer.
These cookies are used to improve your website experience and provide more personalized services to you, both on this website and through other media.
To find out more about the cookies we use, see our Privacy Policy