Last night’s Oregon SQL user group meeting was very enlightening. Mark Simms from the SQLCAT team came down from Redmond to talk about StreamInsight, a substantial new feature in SQL Server 2008 R2 that allows data to be analyzed and aggregated in-memory as it flows through the server. I’ve read a little about this new capability and have seen some short demos, but Mark gave us a comprehensive look under the hood after walking the group through a few scenarios that made the technology feel very relevant.
StreamInsight is part of the SQL Server product suite, which makes sense because it is a platform for working with and analyzing data. However, it has very little to do with traditional database storage or the relational and stored analytical query paradigms. It looks more like a programming tool than a database technology. StreamInsight sits on top of .NET and uses data adapters to receive and send streams of data that travel through memory – a little bit like SQL Server Integration Services (SSIS). Queries are written in LINQ, which is not a separate scripting language but a set of query extensions built into .NET, typically used from C#. The tenets of StreamInsight are events, windows, and various operators such as aggregates. As in object-oriented .NET method coding, nested commands and operators can be composed into named code constructs called query macros, but this code doesn’t have to be pre-compiled into assemblies. In fact, the programming and debugging experience feels a lot like writing and testing SQL using rich programming tools.
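To make the shape of that programming model concrete, here’s a minimal sketch of a windowed LINQ query in the StreamInsight 1.x style. The SensorReading payload and AvgResult output type are hypothetical stand-ins I’ve made up for illustration, and the adapter wiring that would feed the stream is omitted; Mark’s actual demo code will be in the recording.

```csharp
using System;
using System.Linq;
using Microsoft.ComplexEventProcessing;
using Microsoft.ComplexEventProcessing.Linq;

// Hypothetical payload type -- not from Mark's demo.
public class SensorReading
{
    public string DeviceId { get; set; }
    public double Value { get; set; }
}

// Hypothetical output type for the aggregated results.
public class AvgResult
{
    public string DeviceId { get; set; }
    public double Average { get; set; }
}

public static class Queries
{
    // Given an unbound event stream, compute a per-device average
    // over five-second tumbling windows.
    public static CepStream<AvgResult> AverageByDevice(CepStream<SensorReading> input)
    {
        return from e in input
               group e by e.DeviceId into deviceGroup
               from win in deviceGroup.TumblingWindow(
                   TimeSpan.FromSeconds(5),
                   HoppingWindowOutputPolicy.ClipToWindowEnd)
               select new AvgResult
               {
                   DeviceId = deviceGroup.Key,
                   Average  = win.Avg(e => e.Value)
               };
    }
}
```

Notice how close this reads to a GROUP BY query in SQL, except that the “table” is an endless stream and the window operator decides when each aggregate is emitted.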
Mark did a great job of demonstrating some very meaningful scenarios. By itself, StreamInsight has nothing to do with data storage. The data flowing through an application can originate as live transactions, streaming video, messages, or something like chats and traffic on a social network. Most applications use it to analyze data in flight, without the latency and overhead of disk I/O; but all or part of this data could also be read from persistent storage, or written to a database after it’s been processed. It would normally run on a dedicated server that just pipes the data through for analysis and uses the results to control other components of a larger solution.
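As a rough illustration of that deployment model, the host process can run an embedded StreamInsight server and bind a query to input and output adapters. This is only a sketch against the 1.x hosting API; MyInputFactory and MyOutputFactory are hypothetical adapter factories standing in for whatever actually feeds and consumes the stream.

```csharp
using System;
using Microsoft.ComplexEventProcessing;
using Microsoft.ComplexEventProcessing.Linq;

class Host
{
    static void Main()
    {
        // Create an in-process (embedded) StreamInsight server and application.
        using (Server server = Server.Create("Default"))
        {
            Application app = server.CreateApplication("MonitoringApp");

            // Define the input stream; MyInputFactory is a hypothetical
            // input-adapter factory that turns a live feed into point events.
            CepStream<SensorReading> input = CepStream<SensorReading>.Create(
                "sensorInput", typeof(MyInputFactory), null, EventShape.Point);

            // Reuse the windowed-average query from the earlier sketch.
            CepStream<AvgResult> averages = Queries.AverageByDevice(input);

            // Bind the query to a hypothetical output adapter and start it;
            // events flow through memory without touching disk.
            Query query = averages.ToQuery(
                app, "AvgByDevice", "Per-device rolling averages",
                typeof(MyOutputFactory), null,
                EventShape.Point, StreamEventOrder.FullyOrdered);

            query.Start();
            Console.ReadLine();   // let events flow until a key is pressed
            query.Stop();
        }
    }
}
```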
He gave examples of reading flow information from oil pipelines, monitoring large-scale streaming media events like the Olympics or a championship game, tracking active stock trading, and monitoring traffic patterns on a highway system. There’s little value in storing all of these events, but it’s critically important to monitor them in real time and to capture samples and states from time to time. I think I get StreamInsight now, and I can envision a variety of useful applications.
One of my takeaways as a BI practitioner is this: in our practice, we are often asked to work with a consulting client to build business dashboards. It’s common to sit down with a business user who says something like, “I would like to see a dashboard that shows me a timeline of historical KPIs, measures and dimensions.” That’s typical BI analysis that we would read from a data warehouse. “Then I’d like to see the continuum of that data as it relates to our daily production activity.” This is operational data, stored in a live, transactional system; the volume of data is lower, but the data can change, and it comes from a system that stores records as they occur. “Finally, I want to see what’s going on in our live processes: how our production equipment is operating and which components are performing above or below capacity.” That’s not necessarily information they need to keep around, but it is certainly useful during and shortly after the moments when thousands of events are taking place.
A recording of Mark’s session will be available soon, along with samples and code. I’ll update this post accordingly.