Why Make Up Test Data? Snag Some Government Data

2009-07-28

… and by government data, I mean the mountain of data recently made available by the G-Men on Data.gov.  This site contains what must be terabytes of data on every topic from environmental measurements to crime statistics, from geographical data to labor statistics.  The Obama administration has committed to greater transparency, and the availability of this data is a significant step toward that goal.  The trendy geek magazine Wired.com recently did a feature on Data.gov that is worth reading.

It’s obvious that Data.gov is an immature portal. Delivery types are inconsistent – some files are available only as flat files, others only as Excel, and a few claim to offer XML feeds. The formatting can vary wildly from one set of data to the next, and files often include headers and footers which muddy up otherwise clean raw data.
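For the flat files, one way to deal with those stray headers and footers before loading is to keep only the lines that split into the expected number of columns. Here is a minimal Python sketch of that idea. The function name `extract_data_rows` and the sample text are made up for illustration (this is not an actual Data.gov file), and it assumes the header and footer notes never happen to have the same field count as the real rows.

```python
import csv
import io


def extract_data_rows(text, expected_fields, delimiter=","):
    """Return only the rows that have exactly `expected_fields` columns.

    Assumption: header and footer notes do not split into the same
    number of fields as real data rows.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    # Blank lines come back as empty lists, so they are filtered out too.
    return [row for row in reader if len(row) == expected_fields]


# Made-up sample, shaped like a report with a title block and a footer.
SAMPLE = """Hypothetical Crime Report
Generated for illustration only

State,Year,Offenses
Ohio,2007,100
Texas,2007,200

Source: made-up sample footer
"""

rows = extract_data_rows(SAMPLE, expected_fields=3)
# The first surviving row is the column-name line; the rest are records.
column_names, records = rows[0], rows[1:]
```

A column-count filter is crude, but it is easy to reason about. For a real file you would probably want to spot-check the rows it discards before trusting the result.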

So why should you, as a database professional, care about this information? If you’re trying to improve your skills in database technologies (and especially in this economy, who isn’t trying to improve him/herself?), this data store is a great place to start. Because of the sheer size and sometimes unusual layouts, this information is an excellent test bed for honing one’s skills with Integration Services or Analysis Services, or for creating VLDBs (very large databases) on which to practice. And if you’re truly ambitious, there’s a contest to come up with the best application of this data, with a $10,000 bounty to the winner.

As for me, I’m currently pulling down some FBI crime data with the intention of using it in an upcoming SSIS class I’m presenting.  Perhaps I’ll think up an app that could win the $10K as well….




