Finding Unique Non-Repeating Random Numbers

  • Comments posted to this topic are about the item Finding Unique Non-Repeating Random Numbers

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Regarding Method 1:

    Running this code against a table with 55,000 records took me 4 minutes (longer if you have more columns in your table).

    I currently have a table with 17 columns, containing 278,998 records. The "key" field (used for URLs) is a varchar(8) filled with random alphanumeric strings.

    Creating a new unique key, using the loop method, takes 0.0620035 seconds according to Management Studio.

    Why would the amount of columns have anything to do with it?

    (Did you create an Index on the TaxID column?)

    Bye

    A.

  • I know this is a much more 'techie' type of solution, but for an alternative to your methods, have a look at this:

    Linear Feedback Shift Registers

    LFSRs are basically very simple, very fast pseudo-random number generators that generate a set of unique, non-repeating (but cyclic) numbers.

    This is possibly not a solution for implementing in T-SQL, but if you're able to use C# (or maybe even C++) assemblies, it might just work for you.

    Alex.

  • Great article! I found it as I was looking for a solution to a slightly different problem. Creating a table seating chart for 70 people who are seated seven to a table, there are 10 tables. The goal is to have everyone meet each other at the end of ten rounds, each round is seven minutes.

    So I am looking to show a matrix of numbers in column row format from 1 to 70. 7 rows deep, 10 columns wide. The seven rows represent a 'table' of people, each person represented by a number.

    The first row of numbers one to seven are the seven people sitting at that table. 8 - 14 the people at the second table, etc up to the last table - numbers 64 -70. This is were they are start.

    From there each iteration, or next round would represent shifting the numbers to show a new set of rows with unique numbers, but different from the previous set and so forth till every number has been matched with another. So number one would have been matched with 2 - 70, and two would have been matched with one - 70, etc.

    I have been looking for scripts that already exisit for this, no luck yet. Any help would be greatly appreciated and of course the solution credited to the author. 🙂 HELP

    Using SQL to solve this problem is vexing me to no end. I am an intermediate SQL user.

  • Great post. Good example of use of random numbers.

    Rand versus NewID has been discussed. Rand only runs once per query, NewID is like other functions, but requires conversion and is not proven to have good distributions and seems a bit slow.

    An alternative is to union a bunch of "select Rand()" statments to create the list of unique random values.

    Select top 10 from

    (

    select cast(rand()*100000 as int) as rnd union

    select cast(rand()*100000 as int) as rnd union

    ...

    select cast(rand()*100000 as int) as rnd union

    select cast(rand()*100000 as int) as rnd

    ) as t

    Union All if you want non-unique numbers

    it may be a long sql query, but its very fast

    BTW if the multiple can easily be a count(*) from some other table, instead of a constant.

  • There is a terminolgy issue here. The numbers in this sequence are not actually random as we could predict the next number in the sequence by looking at the previous values, the accuracy of this prediction would get better as we went along.

    For example, if there were 9 numbers in the range and we specified that they would not repeat then we might get a sequence

    3,2,5,4,7,8,1,6

    The next number would have to be 9 as it's the only number left in the sequence.

    I'm not sure if what you described would clasifiy as pseudo random but I know that the Linear Feedback Shift Register technique definately is.

    However, with regards to generating test data both these techniques are great and it's a well written article so many thanks.

    Other functions to play with with regards to generating test data are the HashBytes function and CHECKSUM function which both have the useful property of being able to take a string and generate a unique value from it.

    So for example Checksum('John Doe') would always return the same value.

    See also:

    http://msdn.microsoft.com/en-us/library/aa175776(SQL.80).aspx

  • wppatton (8/12/2010)


    I have been looking for scripts that already exisit for this, no luck yet. Any help would be greatly appreciated and of course the solution credited to the author. 🙂 HELP

    Have you posted this problem to the T-SQL Forums yet? There are several people lurking there who love assisting with problems that might not be reading the this particular thread.

    Just an FYI: It would help for you to post what you've already come up with in terms of table structure, data, and the actual T-SQL code. People around here help with problems, they don't solve the entire problem for you. If you can show you've been making an effort, that will get you a lot more responses than if you just post the problem and wait for someone else's solution.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Alexander-1013105 (8/12/2010)


    LFSRs are basically very simple, very fast pseudo-random number generators that generate a set of unique, non-repeating (but cyclic) numbers.

    This is possibly not a solution for implementing in T-SQL, but if you're able to use C# (or maybe even C++) assemblies, it might just work for you.

    It's definitely something to keep in mind, but for my solution, I needed T-SQL code since I'm restoring backups to dev environments and "washing" the numbers before letting the developers & testers at the back end data.

    Thanks for the link, though. I like hearing about new stuff.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • alexander.schaaf (8/12/2010)


    Why would the amount of columns have anything to do with it?

    (Did you create an Index on the TaxID column?)

    Answering the questions backwards:

    2) No, I did not create an Index on TaxID. I can't because it belongs to a table that already has more than enough indexes on it as it is. And the index wouldn't be used because of this.

    1) In my experience, wider tables actually do consume more processing resources. Part of that, though, is the design of our database and the fact that our tables aren't just wide, but super long. You can have a wide, short table that processes fast, or a narrow, long table that processes fast. But put the wide and long together and you've just created a monster that takes forever to update. Especially when (as I said in #2) you have developers that like to index every column they call in query.

    We're working on breaking that habit, but it's a long row to hoe. @=)

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • I've tried to implement a LFSR as follows but it's not quite working. But in principle it should be possible.

    set nocount on

    declare @lfsrTable table(period int,value int NOT NULL)

    declare @lfsr int

    declare @period int

    declare @nextbit int

    set @lfsr = 1

    set @period = 0

    while (@period=0 or @lfsr <> 1)

    begin

    -- taps: 16 14 13 11; characteristic polynomial: x^16 + x^14 + x^13 + x^11 + 1

    set @nextbit = ((@lfsr & 65536)/65536) ^ ((@lfsr & 16384)/16384) ^ ((@lfsr & 8192)/8192) ^ ((@lfsr & 2048)/2048)

    set @lfsr = ((@lfsr * 2) & 65535) + @nextbit

    set @period = @period + 1

    insert @lfsrTable values(@period,@lfsr)

    end

    set nocount off

    select * from @lfsrTable

  • wppatton,

    I'm not going to give you the answer because that looks way too much like a homework problem. I will give you a fairly broad hint however--The mod operator returns the remainder of an integer division.

    --

    JimFive

  • Great article Brandie, like how you offered multiple solutions and showed how you worked through them.

    Jeff likes to use ABS(CHECKSUM(NEWID())) to generate random numbers, I haven't done any testing on it like you did, so don't know if it has issues with repetition or not. I've been treating it like it doesn't for non-critical use, suppose I should check that.;-)

    See http://www.sqlservercentral.com/Forums/Topic666309-338-1.aspx for an example of his use.

    ---------------------------------------------------------
    How best to post your question[/url]
    How to post performance problems[/url]
    Tally Table:What it is and how it replaces a loop[/url]

    "stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."

  • As AndyC mentions this is not a random set.

    However, for this case you don't need a random set, you need an arbitrary set of numbers. (More accurately, you need an arbitrary set that is easy to generate and impossible to reverse.) One way to achieve this would be to use RANK over the TaxID and then pad them out to the proper length. For nonnumeric data I would look at using CHECKSUM.

    --

    JimFive

  • James Goodwin (8/12/2010)


    As AndyC mentions this is not a random set.

    However, for this case you don't need a random set, you need an arbitrary set of numbers. (More accurately, you need an arbitrary set that is easy to generate and impossible to reverse.) One way to achieve this would be to use RANK over the TaxID and then pad them out to the proper length. For nonnumeric data I would look at using CHECKSUM.

    --

    JimFive

    Good point Jim, I actually had the opposite problem when generating person name test data in that everyone had a unique surnames where more typical would be that several people have the same name e.g. Smith and some surnames have just one person. Same principle goes for dates of birth

  • Jim,

    It's not even close to being a homework problem. I wouldn't post homework for someone else to do. It's real world problem being brought to me by a business associate.

Viewing 15 posts - 1 through 15 (of 51 total)

You must be logged in to reply to this topic. Login to reply