The keynote started out with Rob Farley and Buck Woody singing an awesome song about a slow-running query. What a great way to start out the day. Next, Wayne Snyder was recognized. He spoke at his first Summit in 1999. Both Wayne and Rick Heiges are rolling off the PASS board this year. Wayne gave a good roast of Rick and congratulated him on helping create 24 Hours of PASS. Well done, Rick.
Wayne got a bit emotional up on the stage, which he said was either gratitude or indigestion. Wayne gave a great quote: “As you slide down the banister of life, may the splinters of success stick in your career”. Here’s to Wayne for making such an impact on the SQL community. Thanks for all you have done.
May 10th – May 11th – SQL Rally. SQL Saturdays are everywhere. PASS Summit 2012 will be November 6-9 in Seattle, WA, with two days of pre-cons starting on Nov 5th. $995 for Summit; $1395 includes both days of pre-cons. Free ebook by Manning on the MVP Deep Dives collection. This is a combination of both MVP Deep Dives books. It is 96 pages and FREE.
Birds of a Feather lunch is today. This is always a huge hit.
Last day to get the DVD set for $125 plus S/H. That is only $.73 per session.
David DeWitt came on stage to the largest applause yet. He is clearly a fan favorite. David created two new hashtags so we can tweet which specs we want him to wear. This data will be analyzed as big data.
What is big data? Think petabytes. 2700 nodes and 60 PB is what Facebook deals with. Now that is BIG DATA. Estimates put the world’s data at 35 ZB by 2020. That is enough DVDs to be stacked almost to Mars. What is generating the increase? More data, web searches, tweets, people realizing data is too valuable to delete, and the decreasing cost of storage. The old guard (eBay) uses 10 PB on 256 nodes, the young turks (Facebook) use 20 PB on 2700 nodes, and Bing uses 150 PB on 40k nodes.
NoSQL does not mean NO to SQL. It really means Not Only SQL. So why NoSQL? More data model flexibility: JSON as a data model, no “schema first”, relaxed consistency models. They are willing to trade consistency for availability. Low upfront software costs. The folks just don’t understand SQL. (applause from the audience)
We now have two universes. Structured and Unstructured. Relational DB and NoSQL systems. ACID and NoACID. Relational DBs provide maturity, stability, and efficiency. NoSQL provides a large amount of flexibility.
This is not a shift to a new DB platform. SQL is not going away. RDBMS will dominate transaction processing and ALL small to medium sized data warehouses. Many businesses will end up with data in both universes.
Dr. DeWitt started explaining how Hadoop came to be and how it stores data by splitting large files into smaller chunks and storing them across the cluster nodes. The chunks are stored in a distributed file system (HDFS). Because it stores copies of the data on cluster nodes in different racks with different switches, fault tolerance and speed are great. One of the nodes could actually be in another datacenter. Sounds like some complicated algorithms are making this happen.
When a data node fails, the data that was stored on that node is then copied to other available nodes in the cluster. When a new node is brought online, the file system will start spreading data around to this new node. These events are all done under the covers.
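To make the chunk-and-replicate idea a bit more concrete, here is a minimal Python sketch of it. This is not how Hadoop is actually implemented; the block size, the replication factor of 3, the node and rack names, and the helper functions are all assumptions made up for illustration.

```python
import random

BLOCK_SIZE = 64    # bytes, tiny on purpose; an assumed value for the demo
REPLICATION = 3    # assumed number of copies kept of each block


def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Cut one large file into fixed-size chunks (the last one may be short)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(nodes, rng, copies=REPLICATION):
    """Pick `copies` distinct nodes, covering at least two racks when possible."""
    names = sorted(nodes)
    first = rng.choice(names)
    chosen = [first]
    other_rack = [n for n in names if nodes[n] != nodes[first]]
    if other_rack:
        chosen.append(rng.choice(other_rack))
    remaining = [n for n in names if n not in chosen]
    while len(chosen) < copies and remaining:
        pick = rng.choice(remaining)
        chosen.append(pick)
        remaining.remove(pick)
    return chosen


def re_replicate(placement, nodes, failed, rng, copies=REPLICATION):
    """Return a new placement where blocks lost on `failed` are copied elsewhere."""
    live = sorted(n for n in nodes if n != failed)
    healed = {}
    for block_id, holders in placement.items():
        survivors = [n for n in holders if n != failed]
        candidates = [n for n in live if n not in survivors]
        while len(survivors) < copies and candidates:
            pick = rng.choice(candidates)
            survivors.append(pick)
            candidates.remove(pick)
        healed[block_id] = survivors
    return healed


# Hypothetical five-node cluster spread over three racks.
nodes = {"n1": "rackA", "n2": "rackA", "n3": "rackB", "n4": "rackB", "n5": "rackC"}
rng = random.Random(0)

blocks = split_into_blocks(b"x" * 200)
placement = {i: place_replicas(nodes, rng) for i in range(len(blocks))}
healed = re_replicate(placement, nodes, failed="n3", rng=rng)
```

In this toy version, losing `n3` leaves some blocks with only two copies, and `re_replicate` tops them back up to three using the surviving nodes. The caller never has to deal with the failure itself, which is the "under the covers" part.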
Pros – Highly fault tolerant, relatively easy to write, MR framework removes the burden of dealing with failures from programmers.
Cons – Schema embedded in application code, a lack of shared schema.
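The pros and cons above are easier to see in a tiny example. Below is a rough, single-process Python sketch of the map, shuffle, and reduce steps; it is not Hadoop's API. The log format (`user,url,millis`) and every function name are assumptions invented for the demo.

```python
from collections import defaultdict


def map_hits(line):
    """Emit (url, 1) for each log line."""
    # The "schema" lives right here in application code: we simply assume
    # the line is user,url,millis. Nothing shared enforces that layout.
    user, url, millis = line.split(",")
    yield url, 1


def reduce_hits(url, counts):
    """Add up all the 1s emitted for a single URL."""
    return url, sum(counts)


def run_job(records, mapper, reducer):
    """Toy driver: map every record, group values by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)          # the "shuffle" step
    return [reducer(key, values) for key, values in sorted(groups.items())]


log = ["ann,/home,120", "bob,/home,95", "ann,/cart,210"]
result = run_job(log, map_hits, reduce_hits)
print(result)
```

The programmer only writes `map_hits` and `reduce_hits`; in a real framework the driver is what handles distribution and retries. The flip side is that if another job reads the same log, it has to repeat the same `split(",")` assumption, which is the "lack of shared schema" con.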
Dr. DeWitt went on to say that Facebook created Hive and Yahoo created Pig in order to query Hadoop data. MapReduce jobs are difficult to write when you have to join data. Hive is more relational DBMS-like, with data stored in tables.
Connecting the universes – Sqoop. Reasons were given for why we would want to connect the universes, such as being able to use a procedural language to query. You may also need to access data that is in both relational and NoSQL environments to meet a business need. Makes sense to me.
Ok, so I got caught up in DeWitt’s speech and didn’t type as much. He basically laid it all down on how Hadoop stores all the data. He covered the pros and cons of it all. Very well done. I suggest everyone who is interested stream the keynote from the SQLPASS.ORG website. It is recorded and available. Really good stuff.
Basically to sum it up – NoSQL tools: Hive, Pig, and Sqoop. We learned their history and some of the things they are useful for. NoSQL = Not Only SQL. Relational databases are not going anywhere and there is a marketplace for both. There are now TWO universes. Structured and “Not So” Structured.
Watch the keynote. One of the final slides is worth it alone. Great visual aid.
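Here is a small Python sketch of why a join is awkward in MapReduce style. It imitates a reduce-side join in a single process: each mapper tags its records with where they came from, and the reducer has to sort the tags back out. The `users` and `orders` data, their layouts, and all function names are hypothetical.

```python
from collections import defaultdict

# Two hypothetical inputs: (user_id, name) and (order_id, user_id, amount).
users = [(1, "ann"), (2, "bob")]
orders = [(100, 1, 25.0), (101, 1, 10.0), (102, 2, 7.5)]


def map_users(rec):
    user_id, name = rec
    yield user_id, ("U", name)        # tag the value with its source


def map_orders(rec):
    order_id, user_id, amount = rec
    yield user_id, ("O", amount)


def reduce_join(user_id, tagged):
    """Pair every user value with every order value that shares the key."""
    names = [v for tag, v in tagged if tag == "U"]
    amounts = [v for tag, v in tagged if tag == "O"]
    for name in names:
        for amount in amounts:
            yield name, amount


def run_join():
    groups = defaultdict(list)
    for rec in users:
        for key, value in map_users(rec):
            groups[key].append(value)
    for rec in orders:
        for key, value in map_orders(rec):
            groups[key].append(value)
    joined = []
    for key in sorted(groups):
        joined.extend(reduce_join(key, groups[key]))
    return joined


# A SQL-like layer would express roughly the same thing in one statement:
#   SELECT u.name, o.amount FROM users u JOIN orders o ON u.user_id = o.user_id
joined = run_join()
print(joined)
```

All of the tagging and regrouping is plumbing that a SQL-like layer such as Hive hides behind a single `JOIN`, which is the appeal of putting tables on top of Hadoop data.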