REAL Sample Data available for free - better than fake!


jpSQLDude
I was reading the discussion:
http://www.sqlservercentral.com/Forums/Topic1013494-2758-1.aspx
about this article:
http://www.sqlservercentral.com/articles/BETWEEN/71395/
having to do with a faster BETWEEN for dates.

One issue that came up was the trouble caused by fake or unrealistic sample data.

I had already bookmarked many places that have sample data. Much of it is "real", meaning it is data people actually use, so you should get more realistic examples, very large data sets, better-optimized queries and plans, etc. And once you get familiar with some of these data sets, using them is probably faster than building some loop to generate fake data.

So here are some sources. Do you know of any more?

MASSIVE: http://www.datawrangling.com/some-datasets-available-on-the-web

http://www.guardian.co.uk/data-store
http://data.un.org/
http://infochimps.com/datasets
http://theinfo.org/get/data
http://stackoverflow.com/questions/57068/good-databases-with-sample-data
http://www.readwriteweb.com/archives/where_to_find_open_data_on_the.php
http://www.ferc.gov/docs-filing/eqr/soft-tools/sample-csv.asp
http://www.baseball1.com/
http://www.findingdulcinea.com/guides.html?pg=00&topic=/categories/sports/football
http://www.postneo.com/2007/09/09/accidental-apis-nfl-edition
http://www.livefantasyscoring.com/2008.shtml
http://www.red-gate.com/products/SQL_Data_Generator/index.htm
http://www.mauvais.com/Publish/ZD-Northwind.htm
http://www.microsoft.com/downloads/en/details.aspx?FamilyID=06616212-0356-46A0-8DA2-EEBC53A68034&displaylang=en
http://www.codeplex.com/Wikipage?ProjectName=SqlServerSamples#databases
http://msftdbprodsamples.codeplex.com/
Jeff Moden
I, for one, really appreciate the lists of data you've published. My problem, though, has frequently been that it takes way too much time to find a database among those types of samples that resembles what I need closely enough. For example, let's say I need to simulate something that has 1 million rows of random but constrained dates and amounts across 2 years and 200 accounts. How long would it take someone to find just the right data? Chances are they would never find it quite the way they want it, so they may have to find two or three databases and write some code to glean the data into the format they want.

Why not just write code to make the data meet the requirements to begin with? It's really not that hard. I know, I know... what about some random names? Do they have to be real names, or just variable lengths of characters that we could derive from a GUID? What about addresses? Again, do they have to be real street addresses, or can we generate a random number and concatenate it with parts of a GUID? Sure, there will be times when the addresses actually have to be valid for testing... then maybe one of those databases you posted will come in handy. But for that date problem you cited? It's easier to build the data than to find it in one of the databases you were kind enough to post.
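
A minimal sketch of that idea, written in Python rather than T-SQL purely for illustration. The function name `make_rows`, the 2010-01-01 start date, the 0.00 to 9999.99 amount range, and the name and address formats are all assumptions, not anything specified in this thread. It uses a plain loop for readability, which is not how you would want to do it inside SQL Server.

```python
import random
import uuid
from datetime import date, timedelta


def make_rows(n_rows, n_accounts=200, start=date(2010, 1, 1), days=730, seed=42):
    """Return n_rows tuples of (account_id, some_date, amount, name, address).

    All defaults are illustrative assumptions: 200 accounts, a 730-day
    (roughly two-year) window starting 2010-01-01, amounts below 10000.
    """
    rng = random.Random(seed)  # seeded so repeated runs produce the same rows
    rows = []
    for _ in range(n_rows):
        account_id = rng.randint(1, n_accounts)                   # 1..n_accounts inclusive
        some_date = start + timedelta(days=rng.randrange(days))   # inside the date window
        amount = rng.randrange(0, 1_000_000) / 100                # 0.00 .. 9999.99
        # Build a GUID from the seeded generator so the output is repeatable.
        guid = uuid.UUID(int=rng.getrandbits(128), version=4)
        name = guid.hex[: rng.randint(5, 12)]                     # variable-length fake "name"
        address = f"{rng.randint(1, 9999)} {guid.hex[12:20]} St"  # random number + GUID part
        rows.append((account_id, some_date, amount, name, address))
    return rows


if __name__ == "__main__":
    # Print a handful of rows; pass 1_000_000 for the full-size set.
    for row in make_rows(5):
        print(row)
```

The rows could then be written to a CSV file and bulk-loaded into a test table. Changing `seed` gives a different but equally constrained data set.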

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think it's expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

jpSQLDude
Data sets with addresses, etc:

http://aws.amazon.com/publicdatasets/
http://www.manifold.net/updates/product_downloads.shtml
http://www.gsd.harvard.edu/gis/manual/realestate/index.htm
http://www.ibm.com/developerworks/xml/library/x-geomap2/index.html
http://www.gearthblog.com/blog/archives/2006/06/huge_database_u.html


Free data set generator, including names, addresses, phone numbers, emails, etc. Not real-world data, but probably pretty close and much easier than creating your own script...

Demo: http://www.generatedata.com/#generator
Download: http://www.generatedata.com/#download
Jeff Moden
jpSQLDude (11/2/2010)
Free data set generator, including names, addresses, phone numbers, emails, etc. Not real-world data, but probably pretty close and much easier than creating your own script...

Demo: http://www.generatedata.com/#generator
Download: http://www.generatedata.com/#download



From the data generator site...
Requirements

MySQL 4+
PHP 4+
Any modern, JS-enabled browser


Have you tried it with T-SQL?

--Jeff Moden
