SQLServerCentral Article

Building a Yahoo Finance input adapter for SQL Server StreamInsight

,

Introduction

This article describes Microsoft SQL Server StreamInsight, including the what, why and how questions, with a very simplified example how to stream data from Yahoo Finance. It is mainly divided in the following sections:

  1. What is StreamInsight and why should you bother?
  2. Step by step how to build a StreamInsight application
  3. Debugging and conclusion

StreamInsight overview

What is StreamInsight and what is it good for? StreamInsight is a platform for developing and deploying applications that handle high-speed streaming data. It could be used for near real-time processing of data from production environments, structured data such as financial information and unstructured data such as Facebook, Twitter and blogs. Multiple sources can be combined and refined before they are being output. This technology is called CEP - complex event processing.

Combine StreamInsight with data mining, and you can have real-time fraud detection, stock market prediction, you name it... Imagine a real-time Business Intelligence-application where the management can actually see the KPI gauges moving on the screen. I would say that monitoring real-time flow of information is one of the key success factors for tomorrow's Business Intelligence. This is also supported by Ralph Kimball, saying in an interview that Business Intelligence is moving from the strategic level towards operational level processes such as customer support, logistics and sales. At the operational level data must be continuously updated, not just once per day. I would add also that the new generation, that has grown up with Facebook and Twitter, will make it necessary to monitor new sources for successful Business Intelligence.

You develop your application in Visual Studio 2008, using libraries from StreamInsight. Basically you develop (or use existing) input and output adapters and connect them through LINQ queries. The picture below (picture used with permission from Microsoft) shows a good overview of the product:

There are three different development models supported by StreamInsight:

  • Explicit model, which provides full control and access to the CEP-server.
  • Implicit model, which can handle only a single query and runs all in memory. Good for debugging.
  • IObservable/IObserver interface, which hides much of the complexity of the CEP-server API

All data in StreamInsight is organized into event streams. The events are of two kinds:

  • INSERT events, that contains a header with timestamp information and a payload with application specific data. These events can be of different models, such as Interval, Point and Edge, to support different temporal characteristics.
  • CTI events, that indicates the completeness of previous events in the stream.

Adapters can be of two types:

  • Untyped adapters that work with multiple types of INSERT events
  • Typed adapters that work with a single type of INSERT events

Queries are written in LINQ with the possibility to do calculations, filtering, grouping, windowing, aggregations, combining and invoke user-defined functions. A query template is bound to specific input and output adapters to produce a query instance.

 

Our example - building an input adapter for Yahoo Finance stock quotes

For testing StreamInsight, I chose to build an input adapter that reads stock quotes from Yahoo Finance. Note however that quotes from Yahoo Finance are delayed at least 15 minutes unless you pay for their premium service. To test the input adapter I also built an output adapter the displays the quotes on the screen (just a simple console application). These I joined together in an application that I built using the implicit development model.


The point here is not the functionality of the application, but that you could easily replace the output adapter with something much more useful. By adding SQL Server 2008 Data Mining, using any algorithm for predictive analysis, you could build something much more useful. Adding an output adapter that places buy and sell orders into a trading system could make you rich...

The picture below shows the Visual Studio 2008 project. The steps involved in building it will be described next.

 

Step 1 - choosing a development model

The implicit development model was chosen because it uses the full scale input (and output) adapters and at the same time makes debugging easier. An adapter developed with the implicit model can be reused for the explicit model.

Step 2 - defining stream data

A payload class has to be defined for each kind of event that is supported. StockTicker supports only one kind of event and thus needs only one payload class, StockPayload, defined as below:

public class StockPayload
{

///
/// Unique ID of stock or index
///
public string StockID { get; set; }

///
/// Current value of stock or index
///
public double Value { get; set; }
}

The data members can only be basic types such as numbers, strings, etc. This is probably because the serialization mechanism in StreamInsight requires a fixed size payload.

Step 3 - input adapter

Since there is only one kind of event (stock quote) and it occurs at points in time, we can use TypedPointInputAdapter as base class. Basically the adapter has to implement methods for Start() and Resume(), monitor the AdapterState and enqueue events to the output stream. The input adapter can work with asynchronous sources through callback functions, and synchronous sources. I've implemented the input adapter as a loop that reads data from Yahoo Finance and then waits for a determined time. A CTI event is sent after each quote, to indicate there are no other events awaiting.

Associated with the Input Adapter is a configuration class, StockTickerInputConfig, that enables passing parameters to input adapter instances. It is used here to specify the stock symbol to retrieve.

The input adapter user two helper classes (not specific for StreamInsight). ScreenScraper is a helper class that reads the HTML contents of a web page and extracts a string using pattern matching (a regular expression). YahooFinance is a helper class that provides the ScreenScraper with suitable URL and pattern matching expression for the desired stock symbol.

The source code of the input adapter is attached to this article and can also be downloaded from http://stockticker.codeplex.com.

Step 4 - factory class

A factory class is provided to instantiate the adapters. The factory implements a Create method that takes a configuration and an event shape as parameter. Based on these parameters it instantiates the input adapter.

/// <summary>
/// Factory for instantiating of input adapter
/// </summary>
public class StockTickerInputFactory : ITypedInputAdapterFactory<StockTickerInputConfig>
{
public InputAdapterBase Create<TPayload>(StockTickerInputConfig config, EventShape eventShape)
{
// Only support the point event model
if (eventShape == EventShape.Point)
return new StockTickerTypedPointInput(config);
else
return default(InputAdapterBase);
}

The output adapter and output adapter factory is built similarly as the input adapter, so I don't describe it here.

Step 5 - joining the adapters

The adapters are tied together by a LINQ query. In the implicit development model we can do it as shown below. In the explicit model, the query would instead be bound through the CreateQuery method of the application.

// Join input adapters with a simple LINQ query
var combinedInputStream = (from e in input1Stream
select e).Union
(from e in input2Stream
select e);

// Connect input adapters with output adapter
var query = combinedInputStream.ToQuery(outputSink);

Finally the query is being run:

// Run the query
query.Start();
adapterStopSignal.WaitOne();
query.Stop();

Testing and debugging

SQL Server 2008 R2 StreamInsight comes with a tool, StreamInsight Debugger, that can visualize a recorded event flow. To use it you first need to write tracing data to a file. Enable tracing, run your application, and the disable tracing. This is done from command-line as shown below:

Opening the trace data file, sample.etl, in StreamInsight Debugger you can monitor the stream pipeline. The picture below shows data from the input adapter instances and how they are unioned in the LINQ query.

You even have an option in StreamInsight to replay the sequence so you can watch it like a movie. To do that you click on the clock icon in the toolbar.

Conclusion

Working with near real-time data is very different from traditional database development such as data warehouses, which are unfit for real-time data. Therefore a new approach is required. The CTP of StreamInsight looks promising as a technology for monitoring and responding to new data sources such as production systems, Facebook, Twitter and blogs.

Resources

Rate

4.8 (20)

You rated this post out of 5. Change rating

Share

Share

Rate

4.8 (20)

You rated this post out of 5. Change rating