Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Finding Unique Non-Repeating Random Numbers


Finding Unique Non-Repeating Random Numbers

Author
Message
magarity kerns
magarity kerns
Old Hand
Old Hand (384 reputation)Old Hand (384 reputation)Old Hand (384 reputation)Old Hand (384 reputation)Old Hand (384 reputation)Old Hand (384 reputation)Old Hand (384 reputation)Old Hand (384 reputation)

Group: General Forum Members
Points: 384 Visits: 397
The benefits of random numbers are distribution and unpredictability. If you don't need those, then asking for randomness just makes your life more difficult.

I think this is the same basic point I was asking about; if this article is about a theoretical exercise it is worded poorly. I can't figure out why anyone would jump through these hoops in the given situation except maybe a consultant paid by the hour.

Stores in franchise #1, obfuscated EIN = 1. Stores in franchise 2, obfuscated EIN = 2. What am I missing??

PS - I wish some people would have pity on my underpowered netbook and its tiny screen trying to render these pages and go easy on the animated gifs and 10 line sigs attached to one word comments Crying
Brandie Tarvin
Brandie Tarvin
SSCertifiable
SSCertifiable (7.8K reputation)SSCertifiable (7.8K reputation)SSCertifiable (7.8K reputation)SSCertifiable (7.8K reputation)SSCertifiable (7.8K reputation)SSCertifiable (7.8K reputation)SSCertifiable (7.8K reputation)SSCertifiable (7.8K reputation)

Group: General Forum Members
Points: 7767 Visits: 8731
It's not a theoretical exercise. I actually had to do this for a project. And setting up encryption was not an option given me.

EDIT: Since I had to jump through so many hoops and I didn't find a lot of articles on the subject, I thought I'd post for anyone else who is looking for something like this.

Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.
Jeff Moden
Jeff Moden
SSC-Forever
SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)

Group: General Forum Members
Points: 45018 Visits: 39887
James Goodwin (8/13/2010)
Since you brought it up, please explain the semantical differences between a random set and an arbitrary set of numbers.

Jeff,
An arbitrary number is a number that you don't care how it is generated. As in the example in my previous post, since you just need an arbitrary number you can use Rank() to generate the numbers. They're not random, but they solve the deidentification issue presented.

A random number, on the other hand, is a number generated through a random (or pseudo-random) process. This means that any number has an equal chance of coming up on any given test and therefore a set of random numbers is not guaranteed to contain unique numbers. The benefits of random numbers are distribution and unpredictability. If you don't need those, then asking for randomness just makes your life more difficult. Putting constraints (such as uniqueness) on a set of random numbers makes the set less random.


To see why asking another person for a random number is bad, see http://scienceblogs.com/cognitivedaily/2007/02/is_17_the_most_random_number.php

--
JimFive


Thanks, James. As Magarity Kerns just posted, that ties it all together. I think a lot of people spend a lot of time making "random" numbers in an attempt to "obfuscate" data and all they really need are "arbitrary" place holders. It makes life even simpler when folks are made to understand that a list of test SSN's can actually be just a bunch of sequential 9 digit numbers.

Anyway, thank you and Magarity for putting a very different and still very practical spin on all of this subject.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Jeff Moden
Jeff Moden
SSC-Forever
SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)

Group: General Forum Members
Points: 45018 Visits: 39887
Bert De Haes (8/13/2010)
You can easely resolve the "unique" problem by adding a distinct to the select top().
I tried this, and it works fine on SQL 2005. The distinct will not generate an ordered list ... ( you can use newid() or rand() )

CREATE TABLE dbo.#RandomTaxID (TaxID int);
GO
INSERT INTO dbo.#RandomTaxID (TaxID)
SELECT distinct TOP (70000)
RIGHT(CAST(CAST(CAST(NEWID() AS VARBINARY) AS BIGINT) AS VARCHAR(30)),9)
FROM dbo.Tally;

ALTER TABLE dbo.#RandomTaxID ADD MyID int Identity(1,1) NOT NULL primary key clustered;
go
-- verify uniqueness :
create unique index UQ_RandomTaxID on #RandomTaxID(TaxID);

select top 10 * from #RandomTaxID order by MyID;




TaxID MyID
514443407 1
959020860 2
287024448 3
269261913 4
575137863 5
423204402 6
566434316 7
66537729 8
82584852 9
752384001 10




Works just fine... until you need more than what your Tally table may have in it. Try it to gen a million numbers using a single cross join on an 11000 row Tally table. On second thought, don't try it... I stopped the code after 2.5 hours just trying to gen 70K rows.

Using a Million row Tally, it generated 999541 rows after the distinct in about 27 seconds. Deduping appears to cost a lot on the fly both in your code and mine. :-)

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Jeff Moden
Jeff Moden
SSC-Forever
SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)

Group: General Forum Members
Points: 45018 Visits: 39887
Brandie Tarvin (8/13/2010)
Jeff Moden (8/13/2010)

You can see that an average of about 520 numbers out of the million duplicate on each run. The run above (not including the dupe check) takes about 2 seconds to execute. The following run consitently gens a million UNIQUE random numbers in random order in about 26 seconds (again, not including the verification times).


Oh. Now I get to play with more of your code. Thanks, Jeff. @=)


Having code to play with is always fun. Nice article, Brandie. It brought out several good discussions including the discussion that you may not need "random" numbers at all... just "arbitrary" numbers.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
pete callaghan
pete callaghan
Grasshopper
Grasshopper (10 reputation)Grasshopper (10 reputation)Grasshopper (10 reputation)Grasshopper (10 reputation)Grasshopper (10 reputation)Grasshopper (10 reputation)Grasshopper (10 reputation)Grasshopper (10 reputation)

Group: General Forum Members
Points: 10 Visits: 82
Support the animated giff comment - they get do very old very quickly.
jcrawf02
jcrawf02
Ten Centuries
Ten Centuries (1.4K reputation)Ten Centuries (1.4K reputation)Ten Centuries (1.4K reputation)Ten Centuries (1.4K reputation)Ten Centuries (1.4K reputation)Ten Centuries (1.4K reputation)Ten Centuries (1.4K reputation)Ten Centuries (1.4K reputation)

Group: General Forum Members
Points: 1420 Visits: 19324
pete callaghan (8/22/2010)
Support the animated giff comment - they get do very old very quickly.
For those of you who are bothered, go to your Control Panel (top left of the page), Forum Settings, under Topic Display Options, uncheck the Show Avatars checkbox. Save your settings, and you're good to go.

Or, upgrade your computer. :-P

---------------------------------------------------------
How best to post your question
How to post performance problems
Tally Table:What it is and how it replaces a loop

"stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."
Brandie Tarvin
Brandie Tarvin
SSCertifiable
SSCertifiable (7.8K reputation)SSCertifiable (7.8K reputation)SSCertifiable (7.8K reputation)SSCertifiable (7.8K reputation)SSCertifiable (7.8K reputation)SSCertifiable (7.8K reputation)SSCertifiable (7.8K reputation)SSCertifiable (7.8K reputation)

Group: General Forum Members
Points: 7767 Visits: 8731
I had no idea what that comment was supposed to mean. Since you replied, maybe you can explain it to me? @=)

Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.
jcrawf02
jcrawf02
Ten Centuries
Ten Centuries (1.4K reputation)Ten Centuries (1.4K reputation)Ten Centuries (1.4K reputation)Ten Centuries (1.4K reputation)Ten Centuries (1.4K reputation)Ten Centuries (1.4K reputation)Ten Centuries (1.4K reputation)Ten Centuries (1.4K reputation)

Group: General Forum Members
Points: 1420 Visits: 19324
Brandie Tarvin (8/23/2010)
I had no idea what that comment was supposed to mean. Since you replied, maybe you can explain it to me? @=)
Related to the PS on magarity's post
PS - I wish some people would have pity on my underpowered netbook and its tiny screen trying to render these pages and go easy on the animated gifs and 10 line sigs attached to one word comments
Actually, there's a Show Signatures checkbox there too, so if you wanted to disable both for your view, you could do so.

---------------------------------------------------------
How best to post your question
How to post performance problems
Tally Table:What it is and how it replaces a loop

"stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."
Brandie Tarvin
Brandie Tarvin
SSCertifiable
SSCertifiable (7.8K reputation)SSCertifiable (7.8K reputation)SSCertifiable (7.8K reputation)SSCertifiable (7.8K reputation)SSCertifiable (7.8K reputation)SSCertifiable (7.8K reputation)SSCertifiable (7.8K reputation)SSCertifiable (7.8K reputation)

Group: General Forum Members
Points: 7767 Visits: 8731
jcrawf02 (8/23/2010)
Brandie Tarvin (8/23/2010)
I had no idea what that comment was supposed to mean. Since you replied, maybe you can explain it to me? @=)
Related to the PS on magarity's post


Ah. Totally missed that. Thanks for the clarification.

And ditto on the editing the views to meet your personal preferences. That way you can have everything just the way you like it, or just the way your PC/Laptop/Netbook can handle it.

Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search