Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««23456»»

Finding Unique Non-Repeating Random Numbers Expand / Collapse
Author
Message
Posted Friday, August 13, 2010 11:16 AM
Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Thursday, May 8, 2014 10:08 PM
Points: 358, Visits: 397
The benefits of random numbers are distribution and unpredictability. If you don't need those, then asking for randomness just makes your life more difficult.

I think this is the same basic point I was asking about; if this article is about a theoretical exercise it is worded poorly. I can't figure out why anyone would jump through these hoops in the given situation except maybe a consultant paid by the hour.

Stores in franchise #1, obfuscated EIN = 1. Stores in franchise 2, obfuscated EIN = 2. What am I missing??

PS - I wish some people would have pity on my underpowered netbook and its tiny screen trying to render these pages and go easy on the animated gifs and 10 line sigs attached to one word comments
Post #969126
Posted Friday, August 13, 2010 11:33 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 4:38 AM
Points: 7,211, Visits: 6,353
It's not a theoretical exercise. I actually had to do this for a project. And setting up encryption was not an option given me.

EDIT: Since I had to jump through so many hoops and I didn't find a lot of articles on the subject, I thought I'd post for anyone else who is looking for something like this.


Brandie Tarvin, MCITP Database Administrator

Webpage: http://www.BrandieTarvin.net
LiveJournal Blog: http://brandietarvin.livejournal.com/
On LinkedIn!, Google+, and Twitter.

Freelance Writer: Shadowrun
Latchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.
Post #969139
Posted Friday, August 13, 2010 6:13 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 10:00 AM
Points: 37,099, Visits: 31,650
James Goodwin (8/13/2010)
Since you brought it up, please explain the semantical differences between a random set and an arbitrary set of numbers.

Jeff,
An arbitrary number is a number that you don't care how it is generated. As in the example in my previous post, since you just need an arbitrary number you can use Rank() to generate the numbers. They're not random, but they solve the deidentification issue presented.

A random number, on the other hand, is a number generated through a random (or pseudo-random) process. This means that any number has an equal chance of coming up on any given test and therefore a set of random numbers is not guaranteed to contain unique numbers. The benefits of random numbers are distribution and unpredictability. If you don't need those, then asking for randomness just makes your life more difficult. Putting constraints (such as uniqueness) on a set of random numbers makes the set less random.


To see why asking another person for a random number is bad, see http://scienceblogs.com/cognitivedaily/2007/02/is_17_the_most_random_number.php

--
JimFive


Thanks, James. As Magarity Kerns just posted, that ties it all together. I think a lot of people spend a lot of time making "random" numbers in an attempt to "obfuscate" data and all they really need are "arbitrary" place holders. It makes life even simpler when folks are made to understand that a list of test SSN's can actually be just a bunch of sequential 9 digit numbers.

Anyway, thank you and Magarity for putting a very different and still very practical spin on all of this subject.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #969324
Posted Friday, August 13, 2010 9:16 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 10:00 AM
Points: 37,099, Visits: 31,650
Bert De Haes (8/13/2010)
You can easely resolve the "unique" problem by adding a distinct to the select top().
I tried this, and it works fine on SQL 2005. The distinct will not generate an ordered list ... ( you can use newid() or rand() )
CREATE TABLE dbo.#RandomTaxID (TaxID int);
GO
INSERT INTO dbo.#RandomTaxID (TaxID)
SELECT distinct TOP (70000)
RIGHT(CAST(CAST(CAST(NEWID() AS VARBINARY) AS BIGINT) AS VARCHAR(30)),9)
FROM dbo.Tally;

ALTER TABLE dbo.#RandomTaxID ADD MyID int Identity(1,1) NOT NULL primary key clustered;
go
-- verify uniqueness :
create unique index UQ_RandomTaxID on #RandomTaxID(TaxID);

select top 10 * from #RandomTaxID order by MyID;


TaxID MyID
514443407 1
959020860 2
287024448 3
269261913 4
575137863 5
423204402 6
566434316 7
66537729 8
82584852 9
752384001 10




Works just fine... until you need more than what your Tally table may have in it. Try it to gen a million numbers using a single cross join on an 11000 row Tally table. On second thought, don't try it... I stopped the code after 2.5 hours just trying to gen 70K rows.

Using a Million row Tally, it generated 999541 rows after the distinct in about 27 seconds. Deduping appears to cost a lot on the fly both in your code and mine.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #969336
Posted Friday, August 13, 2010 9:19 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 10:00 AM
Points: 37,099, Visits: 31,650
Brandie Tarvin (8/13/2010)
Jeff Moden (8/13/2010)

You can see that an average of about 520 numbers out of the million duplicate on each run. The run above (not including the dupe check) takes about 2 seconds to execute. The following run consitently gens a million UNIQUE random numbers in random order in about 26 seconds (again, not including the verification times).


Oh. Now I get to play with more of your code. Thanks, Jeff. @=)


Having code to play with is always fun. Nice article, Brandie. It brought out several good discussions including the discussion that you may not need "random" numbers at all... just "arbitrary" numbers.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #969337
Posted Sunday, August 22, 2010 7:33 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Tuesday, March 4, 2014 2:50 AM
Points: 10, Visits: 80
Support the animated giff comment - they get do very old very quickly.
Post #973046
Posted Monday, August 23, 2010 8:49 AM


SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Today @ 7:04 AM
Points: 2,678, Visits: 19,271
pete callaghan (8/22/2010)
Support the animated giff comment - they get do very old very quickly.
For those of you who are bothered, go to your Control Panel (top left of the page), Forum Settings, under Topic Display Options, uncheck the Show Avatars checkbox. Save your settings, and you're good to go.

Or, upgrade your computer.


---------------------------------------------------------
How best to post your question
How to post performance problems
Tally Table:What it is and how it replaces a loop

"stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."
Post #973499
Posted Monday, August 23, 2010 10:19 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 4:38 AM
Points: 7,211, Visits: 6,353
I had no idea what that comment was supposed to mean. Since you replied, maybe you can explain it to me? @=)

Brandie Tarvin, MCITP Database Administrator

Webpage: http://www.BrandieTarvin.net
LiveJournal Blog: http://brandietarvin.livejournal.com/
On LinkedIn!, Google+, and Twitter.

Freelance Writer: Shadowrun
Latchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.
Post #973599
Posted Monday, August 23, 2010 10:38 AM


SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Today @ 7:04 AM
Points: 2,678, Visits: 19,271
Brandie Tarvin (8/23/2010)
I had no idea what that comment was supposed to mean. Since you replied, maybe you can explain it to me? @=)
Related to the PS on magarity's post
PS - I wish some people would have pity on my underpowered netbook and its tiny screen trying to render these pages and go easy on the animated gifs and 10 line sigs attached to one word comments
Actually, there's a Show Signatures checkbox there too, so if you wanted to disable both for your view, you could do so.


---------------------------------------------------------
How best to post your question
How to post performance problems
Tally Table:What it is and how it replaces a loop

"stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."
Post #973614
Posted Monday, August 23, 2010 10:44 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 4:38 AM
Points: 7,211, Visits: 6,353
jcrawf02 (8/23/2010)
Brandie Tarvin (8/23/2010)
I had no idea what that comment was supposed to mean. Since you replied, maybe you can explain it to me? @=)
Related to the PS on magarity's post


Ah. Totally missed that. Thanks for the clarification.

And ditto on the editing the views to meet your personal preferences. That way you can have everything just the way you like it, or just the way your PC/Laptop/Netbook can handle it.


Brandie Tarvin, MCITP Database Administrator

Webpage: http://www.BrandieTarvin.net
LiveJournal Blog: http://brandietarvin.livejournal.com/
On LinkedIn!, Google+, and Twitter.

Freelance Writer: Shadowrun
Latchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.
Post #973616
« Prev Topic | Next Topic »

Add to briefcase «««23456»»

Permissions Expand / Collapse