Public Data Sets

  • Comments posted to this topic are about the item Public Data Sets

  • Here's another list for healthcare data. This download contains info for healthcare providers. It includes name, business address, certification (MD, RN, LPN, etc.). The primary key is the National Provider Identifier or NPI number.

    This is the web site: http://download.cms.gov/nppes/NPI_Files.html

    The file is updated monthly. So with a little coding, you could make a process to grab the current file each month. This is a sample link: http://download.cms.gov/nppes/NPPES_Data_Dissemination_Feb_2016.zip

    These files are huge... over 500 MB zipped. Once unzipped and imported, the table contains over 4 million rows.
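    As a rough sketch of the "little coding" mentioned above, the monthly file name could be built from the date. This assumes the naming pattern seen in the single sample link (three-letter month abbreviation, then the year) holds for every month, which is not confirmed anywhere in this thread. The function names are made up for illustration.

```python
from datetime import date
from typing import Optional
import urllib.request

# Assumption: pattern inferred from the one sample link in the post above.
# The real site may use different month spellings or file names.
BASE_URL = "http://download.cms.gov/nppes/"
MONTH_ABBR = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def nppes_url(year: int, month: int) -> str:
    """Build the (assumed) download URL for a given year and month."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be 1-12, got {month}")
    name = f"NPPES_Data_Dissemination_{MONTH_ABBR[month - 1]}_{year}.zip"
    return BASE_URL + name


def download_current(target_path: str, today: Optional[date] = None) -> str:
    """Download this month's file to target_path and return the URL used."""
    today = today or date.today()
    url = nppes_url(today.year, today.month)
    # The file is over 500 MB zipped, so expect this call to take a while.
    urllib.request.urlretrieve(url, target_path)
    return url
```

    Calling nppes_url(2016, 2) reproduces the sample link. A scheduled job could call download_current once a month, but it would be worth checking the listing page first in case the naming has changed.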

  • I rather like the UK Electoral Commission's data sets on voting.

    Thomas Rushton
    blog: https://thelonedba.wordpress.com

  • Here's an interesting site by NANPA, the North American Numbering Plan Administrator. It has a lot of different things concerning telephony, specifically about NPA/NXX (area code/exchange), etc.

    https://www.nationalnanpa.com/reports/reports_cocodes.html

    Most interesting, fun to play with, and useful for a lot of things is the following link on that page.

    Central Office Code Assignment Records

    It contains all of the active NPA/NXX combinations in various files (including one larger file that contains them all), which is easy and useful for testing import methods such as BCP, Bulk Insert, SSIS, and what have you. It's not as extensive as some of the paid files available in that it doesn't contain VnH coordinates, Lat/Lon, or other geo-location information, but it does contain enough simple data in a consistent format while still echoing some of the mistakes in data format that we have to contend with. For example, the files are tab delimited but the columns have also been right padded to make the files "Fixed Field" in nature. NPA/NXX are listed as NPA-NXX, using a dash as the delimiter instead of a tab. And the company name isn't just right padded, it's also enclosed in quotes. It's a really easy data set to understand, and the typical problems it contains are great for practicing.
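    To make those quirks concrete, here is a minimal sketch of cleaning one such line outside of SQL Server. The column order (NPA-NXX first, company name second), the use of double quotes, and the sample line itself are all assumptions made up for illustration. They are not taken from the actual NANPA file layout.

```python
def parse_cocode_line(line: str) -> dict:
    """Clean one tab-delimited, right-padded line (hypothetical layout)."""
    # Split on tabs, then strip the padding that makes the file "fixed field".
    fields = [f.strip() for f in line.rstrip("\r\n").split("\t")]

    # The first field is assumed to be NPA-NXX, joined by a dash.
    npa, nxx = fields[0].split("-", 1)

    # The second field is assumed to be the quoted company name.
    company = fields[1]
    if len(company) >= 2 and company[0] == '"' and company[-1] == '"':
        company = company[1:-1].strip()

    return {"npa": npa, "nxx": nxx, "company": company, "rest": fields[2:]}


# Made-up sample line, not real NANPA data.
sample = '201-200\t"EXAMPLE TELECOM, INC."      \tNJ        \n'
row = parse_cocode_line(sample)
print(row["npa"], row["nxx"], row["company"], row["rest"])
```

    The same three steps (trim the padding, split the dash-joined key, remove the quotes) apply whichever import method is used. Only the syntax changes.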

    On the first URL above, there's a good mix of zip and xls files of all shapes and sizes. If you think you know how to import and normalize Excel files, there are some really good typical examples that will test that for you. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • This page would truly work better as a wiki than a post and comments.

    412-977-3526 call/text

  • Sector7G (2/26/2016)


    Here's another list for healthcare data. This download contains info for healthcare providers. It includes name, business address, certification (MD, RN, LPN, etc.). The primary key is the National Provider Identifier or NPI number.

    This is the web site: http://download.cms.gov/nppes/NPI_Files.html

    The file is updated monthly. So with a little coding, you could make a process to grab the current file each month. This is a sample link: http://download.cms.gov/nppes/NPPES_Data_Dissemination_Feb_2016.zip

    These files are huge... over 500 MB zipped. Once unzipped and imported, the table contains over 4 million rows.

    Added

  • ThomasRushton (2/26/2016)


    I rather like the UK Electoral Commission's data sets on voting.

    Added, and if you'd like to show how to get data from here and analyze..... 😀

  • Jeff Moden (2/26/2016)


    Here's an interesting site by NANPA, the North American Numbering Plan Administrator.

    It contains all of the active NPA/NXX combinations in various files (including one larger file that contains them all), which is easy and useful for testing import methods such as BCP, Bulk Insert, SSIS, and what have you.

    Added, and maybe you'd like to show some ways to pull in tricky data? 😛

  • robert.sterbal 56890 (2/26/2016)


    This page would truly work better as a wiki than a post and comments.

    I work with what I have. I don't have a wiki here, nor resources to put one up right now

  • Well, if you are willing to post updates to your article as they drip in, that will work just as well.

    Thanks for your efforts.

    412-977-3526 call/text

  • Steve Jones - SSC Editor (2/26/2016)


    ThomasRushton (2/26/2016)


    I rather like the UK Electoral Commission's data sets on voting.

    Added, and if you'd like to show how to get data from here and analyze..... 😀

    I'm finishing up the "Devils in the Data" article with the suggestions you made and some things I discovered when creating a presentation on the subject. I have several partial articles in the works for importing and handling data including a special one on how to normalize spreadsheet data automatically even when the spreadsheet has new months and categories added to it.

    You're going to like it. 😀

    I do wish that you could hum a rock at the folks that maintain the code part of the site and find out why I can't do a paste into the "Contribute an Article" pages. Some of the articles are quite complicated to format, and if I just send you Word documents instead, you're going to spend a month of Sundays doing the formatting and maybe even get it wrong just because of the complexity.

    --Jeff Moden



  • Tiger data:

    TIGER/Line® Shapefiles and TIGER/Line® Files

    http://www.census.gov/geo/maps-data/data/tiger-line.html?eml=gd&utm_medium=email&utm_source=govdelivery

    412-977-3526 call/text

  • I'm always looking for free geography related data sets. Here is one that can serve up census data and more on US locations. I've used it in the past to gather place names for validation and reporting:

    http://mcdc2.missouri.edu/websas/geocorr2k.html

    Kindest Regards,

    Clayton

  • If anyone knows of a good online table other than stock data, that would be appreciated.

    I'm taking an online Excel course and the first assignment is to pull in some data using the Excel Data => Get External Data => From the Web functionality and then do some (very) low-level analysis. It's an intro level lesson, and I can do the assignment easily, but the course used an online stock table for the examples, and if I can, I'd like to find something a bit different and more interesting.

    In addition, I'd like for it to update at a frequency of two hours or less, although I could work with something that even only updates once a day. It should also have at least five columns (I know, pretty picky for someone who's begging for help, right?). I've been searching for the last three or four days now. I can find plenty of data sets to download, but I specifically need a table that's hosted on a web page. So, again, if anyone has any good suggestions, that would be greatly appreciated.

    Thanks.

  • robert.sterbal 56890 (2/26/2016)


    Tiger data:

    TIGER/Line® Shapefiles and TIGER/Line® Files

    http://www.census.gov/geo/maps-data/data/tiger-line.html?eml=gd&utm_medium=email&utm_source=govdelivery

    Oohh, that's good. Need to add a spatial section.

