This is one of my favorite technologies for SQL Server 2008 R2. This session was from Torsten Grabs, the program manager.
It starts looking at the value of timely analysis. Being able to view, and use data in a timely manner. A trend line of analysis with time. Data that is months and years old is for historical analysis. Days and hours is primarily used in forecasting. The data that is minutes and seconds old, can be very valuable. This is stuff from web analytics, etc. More and more verticals are seeing value in real time information, and they want to query it. Health case, financial services, utilities, etc.
Many companies have built home grown solutions for this, but they are expensive and require lots of development resources as the number of events grows.
MS decided to build this into SQL Server since it was a missing part of their stack. Where does this fit into SQL Server?
Operational Intelligence as a platform comes from multiple sources of data being cached, processed, distributed, and then visualized in multiple ways.
From the slide, you want data from the left to move to the right as quickly as possible for the top, push items. The pull items, reports, mining, etc, you also want to be able to refresh each other quickly.
How do event driven applications differ from standard database applications? There is a good set of contrasts. The query paradigm is from adhoc queries. For event driven apps, there are continuous standing queries.
For latency, a standard db looks for days, minutes, seconds. The event driven app wants ms or less.
For the data rate, standard db wants hundreds of events per sec. For event driven, there are tens of thousands per second.
For the query semantics, both use relational queries, but event driven items need temporal queries as well.
The target applications are for operational analytics apps, web analytics, manufacturing apps.
For an overview, there are development with VS 2010 using .NET, C#, LINQ.
The application at runtime takes input adapters, perhaps some static reference data from SQL Server, and it sends the data through output adapters to consumers. These can be devices, or they can be event or data stores.
Inside, StreamInsight takes your LINQ queries and compiles them to be used against the streams of data. They become triggers against events.
Out of the box, this is an in memory query processing engine. You do not need to persist any data to work with it in StreamInsight. Persisting data is expensive, and introduces latency. You have the choice of where, and if, you want to deal with this latency.
StreamInsight query is very composable, so you can take out smaller portions of the queries and reuse them, or build on them for other queries.
This is needed because data acquisition costs are falling, and with low storage requirements of StreamInsight, you can now process data while it is in flight and then avoid loading huge amounts of data.
Utility Scenario: households can be instrumented with smart grid power meter. This provides continuous insight into usage, and/or demand. StreamInsight scales to millions of meters, with 100,000s meter readings per second. You can then write validation, editing, estimation rules in LINQ, that might allow them to see a constant view of their grid, and perhaps help prevent power outages.
Financial Services scenario: You can get continuous low latency market waching, and managing risk better. You can actually detect and notify users in real time of actual risk. You can also take the same models or LINQ queries and run them against historical data.
Web Analytics: StreamInsight allows continuous analysis of online behavior. This can identify relevant content before the next click, and perhaps define that new content based on the click behavior. It also allows analysis of web logs online, in near real time instead of offline in a batch.
Event Types – basic structure of what you are writing queries against. Structure, using the .NET type “system” and have a series of fields. The SI engine provides timestamp fields to capture all temporal event characteristics. All calculations are based on business time.
Event streams can be continuous, or bursty. They can go on indefinitely, and potentially arrive out of order.
There are a number of operators that provide all sorts of things you might expect. JOINs, existence, filtering, grouping, APPLY operations, sum, count, ranking, and more. There are a number of temporal operations. Four are provided out of the box: hopping window, sliding window, tumbling window, and a count window.
The demo shows some real time data management. In this case, it is a financial application, examining MSFT and IBM as stock tickets. Incoming data is partitioned with a temporal function by stock, and data is pushed to some applications, and the stored in SQL Server as well. We saw the stream start coming into an application, which could then display the data as a stream.
By enabling various output adapters, we could see graphs updating in real time, some data being stored in SQL Server, and even Excel pulling refreshed data for someone. It’s very cool.
There is a Linqpad like application that can connect to a stream and allow you to write some LNQ queries and see results from the stream.
We also saw the debugger. You can see what is happening in your application, and even record a stream. If you pick an output row, you can do a root cause analysis, and see at each stop of the query processing, what rows contributed to that output row. That would be amazing for T-SQL as well. If you could see a complex query and find out which rows “contributed” to that final result.
This is an extensible framework, allowing you to build your own operators or extensions into the StreamInsight applications. These can be temporal or non-temporal operators.
There are different deployment scenarios. You can process close to the data source, or you can deploy further downstream and capture events from multiple types of sources.
StreamInsight is being deployed as an embedded engine, a few DLLs inside your application. This gives a low footprint and very little overhead. It can run as a Windows Service as well.
The packaging is something I’ve noted for tomorrow’s blog, and the latency and event size varies according to the edition.
There are a number of partners that are working with MS on solutions. Check the StreamInsight site for more information. Lab 49, OSISoft, Matrikon, and more can help you.
This stuff is cool. Mostly high end, but I can see lots of smaller sites thinking of using this for more real time web analytic analysis.