Public Data Sets

  • jetboy2k (2/26/2016)


    If anyone knows of a good online data table other than stock data, I'd appreciate hearing about it.

    I'm taking an online Excel course and the first assignment is to pull in some data using the Excel Data => Get External Data => From the Web functionality and then do some (very) low-level analysis. It's an intro level lesson, and I can do the assignment easily, but the course used an online stock table for the examples, and if I can, I'd like to find something a bit different and more interesting.

    In addition, I'd like it to update every two hours or more often, although I could work with something that updates only once a day. It should also have at least five columns (I know, pretty picky for someone who's begging for help, right?). I've been searching for the last three or four days now. I can find plenty of data sets to download, but I specifically need a table that's hosted on a web page. So, again, if anyone has any good suggestions, they would be greatly appreciated.

    Thanks.

    Not sure about hourly updates. I'd say that during the baseball season, you might find a web page that updates stats during the game.

    This might be interesting to you: http://www.worldometers.info/

    There are places that update during games, like this: http://stats.statbroadcast.com/statmonitr/?id=106595

    I suppose if you limited your "refresh" cycle, this might work.

  • Stack Exchange is a so-so community, but their database is awesome!

    https://archive.org/details/stackexchange

    http://meta.stackexchange.com/questions/2677/database-schema-documentation-for-the-public-data-dump-and-sede


  • The Netflix Prize database is no longer available, but here is something similar.

    20 million ratings and 465,000 tag applications applied to 27,000 movies by 138,000 users. Released 4/2015.

    http://grouplens.org/datasets/movielens/20m/

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Steve, thanks very much. Starting to really hate this assignment.

    Both of those would be great, but I'm running into issues trying to access them via Excel. The World Monitor page is just cool in general, and I'd love to use it, but apparently the way the data is embedded in the page prevents Excel from recognizing it as an importable table. You may already know this, but when you use the Data => Get External Data => From the Web functionality and load a web page, anything that Excel sees as a table gets a small box with an arrow in its upper-left corner. You click that arrow to import the table, and unfortunately, none of the counters on that page has the box.

    The Sports Monitor page would be great, but it asks me to log in when I try to access it in Excel, which is a bit odd since the link you provided works just fine. I can only assume that you have an account with that site and that somehow your username and password are stored with the direct link but lost when I cut and paste.

    Anyway, I did want to say thank you; these are both very cool links that should work for me, so I do appreciate the information.
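The import dialog described above only offers content that Excel recognizes as a table. A rough way to check a page before trying it in Excel is to look for HTML table markup. The Python sketch below treats <table> elements as the only importable content, which is a simplification of what Excel actually does; the class name TableFinder and the sample page are hypothetical. Numbers rendered in <div> or <span> elements (such as script-driven counters) are ignored, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableFinder(HTMLParser):
    """Collect the text of every <table> on a page as rows of cells.

    Simplified model of a table-oriented web import: text outside
    <table> markup is ignored. Nested tables are not handled.
    """

    def __init__(self):
        super().__init__()
        self.tables = []          # each table: list of rows; each row: list of cell strings
        self._open_tables = 0
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._open_tables += 1
            self.tables.append([])
        elif self._open_tables and tag == "tr":
            self.tables[-1].append([])
        elif self._open_tables and tag in ("td", "th"):
            if not self.tables[-1]:          # tolerate a cell with no <tr>
                self.tables[-1].append([])
            self.tables[-1][-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "table" and self._open_tables:
            self._open_tables -= 1
        elif tag in ("td", "th") and self._in_cell:
            self.tables[-1][-1][-1] = self.tables[-1][-1][-1].strip()
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.tables[-1][-1][-1] += data


# Hypothetical page: one script-style counter and one real HTML table.
SAMPLE = """
<div class="counter"><span id="count">12345</span></div>
<table>
  <tr><th>Time</th><th>Water level</th></tr>
  <tr><td>00:00</td><td>1.23</td></tr>
</table>
"""

parser = TableFinder()
parser.feed(SAMPLE)
print(len(parser.tables))
print(parser.tables[0])
```

Only the <table> shows up; the counter value in the <span> does not, which matches the behavior described for the counter page.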

  • Here is another website hosting 10,688 public datasets.

    https://datahub.io/dataset

  • Have you tried using Power Query to extract data from the website instead? Power Query has a lot more capability than the standard web data snagging wizard. Check out Chris Webb's blog for a ton of great content on Power Query.

    Kindest Regards,

    Clayton

  • Ah, sorry. I don't have an account, but there must be something embedded in the link that Excel sees as credentials.

    I'll look around. Or maybe create something. That would be a neat project.

  • Found one!!!

    http://tidesonline.nos.noaa.gov/data_read.shtml?station_info=9444900+Port+Townsend,+WA

    The data itself is not particularly interesting, but it gave me the opportunity to do what I thought was some pretty fun stuff, given the low level of analysis the assignment required. The part I enjoyed most was converting the wind direction, which I assumed was a reading in degrees, into a compass point. I built a range table and assigned each compass point a 22.5-degree range centered on its actual position on the compass, extending 11.25 degrees on either side. For example, NNE sits at 22.5 degrees, so anything between 11.25 and 33.75 degrees maps to NNE. Yeah, I'm a geek, but this is a pretty good site to be a member of if you're a geek 😀

    I also did some simple calculations to determine whether the observed water level was above or below the predicted water level and by what percentage, whether the wind speed was higher or lower than the average wind speed, and whether a record had data at all. According to the site, a record without data has a value of -99.99 in every column except the predicted water level.

    I'm not sure what the update frequency is, but the data contains a row for every 6 minutes going back what looks like approximately 48 hours. It also includes rows from the current time to approximately 24 hours ahead; those future rows are, of course, the ones without data.

    If anyone is at all interested, the spreadsheet is here. I will reiterate, though, that this is really low-level stuff.

    All that said, Steve Jones, thank you again for your assistance. It is greatly appreciated.
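The wind-direction conversion in jetboy2k's last post is a range-table lookup: 16 compass points, each owning 22.5 degrees centered on its position. A minimal Python sketch of the same idea, assuming the readings are degrees clockwise from north, uses a sorted list of lower bounds and bisect, which behaves much like an approximate-match lookup in a spreadsheet. The names LOWER_BOUNDS, LABELS, and degrees_to_compass are hypothetical. The post doesn't say which side a boundary value falls on; this sketch assumes half-open ranges, so exactly 33.75 maps to NE rather than NNE.

```python
from bisect import bisect_right

# Range table: lower bound (degrees) of each compass-point range.
# Each point owns 22.5 degrees centered on its position, so NNE (22.5)
# covers [11.25, 33.75). North wraps around, so it appears at both ends.
LOWER_BOUNDS = [0.0] + [11.25 + 22.5 * i for i in range(16)]
POINTS = ["N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
          "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"]
LABELS = ["N"] + POINTS[1:] + ["N"]


def degrees_to_compass(degrees):
    """Map a wind direction in degrees to one of 16 compass points."""
    degrees %= 360  # normalize, e.g. 360 -> 0, -10 -> 350
    # Find the last lower bound <= degrees (like an approximate-match lookup).
    return LABELS[bisect_right(LOWER_BOUNDS, degrees) - 1]


for d in (0, 11.25, 22.5, 33.75, 180, 350):
    print(d, degrees_to_compass(d))
```

Keeping the bounds in a table rather than computing an index arithmetically mirrors the spreadsheet approach and makes it easy to switch to 8 points by editing two lists.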
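The per-row calculations jetboy2k describes (observed vs. predicted water level with a percentage, wind speed vs. the average, and a has-data flag based on the -99.99 sentinel) can be sketched in Python as below. The column names and sample rows are hypothetical, not taken from the NOAA page. The percentage is assumed to be relative to the predicted level, and rows where only some measured columns are missing are not handled specially, so a -99.99 wind speed in an otherwise valid row would skew the average.

```python
import math

MISSING = -99.99   # sentinel value the site uses for "no reading"
# Hypothetical column names; the real table's headers may differ.
MEASURED = ("wind_dir", "wind_speed", "observed")


def has_data(row):
    """False when every measured column holds the sentinel.

    The predicted water level is left out because it is filled in even
    for future times.
    """
    return not all(math.isclose(row[col], MISSING) for col in MEASURED)


def pct_vs_predicted(row):
    """Observed minus predicted, as a percentage of the predicted level."""
    predicted = row["predicted"]
    if predicted == 0:
        return None  # percentage undefined
    # abs() keeps "above predicted" positive even if the prediction is negative.
    return 100 * (row["observed"] - predicted) / abs(predicted)


def summarize(rows):
    """Return (time, level vs prediction, % difference, wind vs average) per row."""
    valid = [r for r in rows if has_data(r)]
    avg_wind = sum(r["wind_speed"] for r in valid) / len(valid) if valid else None
    result = []
    for r in rows:
        if not has_data(r):
            result.append((r["time"], "no data", None, None))
            continue
        level = "above" if r["observed"] > r["predicted"] else "at or below"
        wind = "above average" if r["wind_speed"] > avg_wind else "at or below average"
        result.append((r["time"], level, pct_vs_predicted(r), wind))
    return result


# Hypothetical rows, not real readings.
SAMPLE_ROWS = [
    {"time": "00:00", "wind_dir": 20.0, "wind_speed": 8.0, "observed": 5.5, "predicted": 5.0},
    {"time": "00:06", "wind_dir": 25.0, "wind_speed": 12.0, "observed": 4.5, "predicted": 5.0},
    {"time": "00:12", "wind_dir": MISSING, "wind_speed": MISSING, "observed": MISSING, "predicted": 5.2},
]

for line in summarize(SAMPLE_ROWS):
    print(line)
```

The average wind speed is taken over rows with data only, so the future rows padded with -99.99 don't drag it down.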
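As a quick check on the size of that table: one row every 6 minutes is 10 rows per hour, so roughly 48 hours back and 24 hours ahead comes to about 720 rows, about a third of them future rows without data. The Python sketch below rebuilds a hypothetical time column around an arbitrary "now" to show the split; the 48- and 24-hour windows are the approximate figures from the post, not documented values.

```python
from datetime import datetime, timedelta

STEP = timedelta(minutes=6)        # one row every 6 minutes
HOURS_BACK, HOURS_AHEAD = 48, 24   # approximate window described in the post


def timestamps(now):
    """Hypothetical reconstruction of the table's time column around `now`."""
    t = now - timedelta(hours=HOURS_BACK)
    end = now + timedelta(hours=HOURS_AHEAD)
    while t <= end:
        yield t
        t += STEP


now = datetime(2016, 2, 26, 12, 0)  # arbitrary example time
times = list(timestamps(now))
past = sum(1 for t in times if t <= now)   # rows up to the current time
future = len(times) - past                 # rows that won't have data yet
print(len(times), past, future)
```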
