
A Rickety Stairway to SQL Server Data Mining, Part 12: Accessing SSDM with AMO, ADOMD, SSIS, XMLA and RS


by Steve Bolton

It may set a SQLServerCentral record for the thickest alphabet soup of acronyms ever crammed into a single title, but this post neatly wraps up numerous alternative means of accessing SQL Server Data Mining (SSDM) programmatically. In four previous posts in this series of amateur self-tutorials on SSDM (as well as a brief wrap-up of four validation procedures in last week's article), I introduced Data Mining Expressions (DMX), the SQL-like language normally used to access SQL Server mining data. It takes far less time and effort to cover the other programmatic means, since many of them have quite limited use cases and act as wrappers for DMX code. When training in new software packages, users typically learn to perform automation tasks and write custom code at the tail end of the process, since these are advanced lessons that few of them will ever need to put to practical use. Furthermore, users need to understand the features they're automating or customizing first, which is why we tackled such heavy but indispensable topics as SSDM's nine algorithms and how to validate mining models earlier in the series. My usual disclaimer that I'm writing this series in order to familiarize myself with SSDM while giving the most under-utilized component of SQL Server some badly needed free press is especially true in this week's post, because I've only recently practiced some of the functionality I'll be introducing. I have been vaguely familiar for years with the whole alphabet soup of AMO (Analysis Management Objects), ActiveX Data Objects Multi-Dimensional for .Net (ADOMD.Net), XML for Analysis (XMLA), and the SQL Server Integration Services (SSIS) and Reporting Services (RS) functionality pertaining to data mining, but never used them in depth until preparing to write this post. As usual, I learned a few things along the way, which means one of the primary purposes of this series was served. Hopefully it will also contribute to another purpose of this series, by demonstrating to other SQL Server DBAs that they too can make productive use of SSDM without a formal background in statistics, just as they don't need the ability to build a custom motorcycle in order to ride one.

One of the benefits of a series like this is that it can illuminate some of the pitfalls fellow amateurs should watch out for when learning the product. A couple of these "gotchas" occur in Reporting Services, which has fairly simple functionality for one-way retrieval of DMX queries, solely for presentation purposes, in the singular medium of RS webpages. I'll assume for the sake of convenience that readers already understand the basics of Reporting Services and how to create reports and related objects through two rival means, Report Builder and Reporting Services projects built with Visual Studio. Many tutorials posted on computing topics are bloated with steps that are self-evident or are basic to the point of being unnecessary, so throughout this series I've avoided in-depth discussions of how to do such simplistic tasks as creating projects in Visual Studio. Likewise, I'll assume readers are already familiar with Report Builder, since executing DMX queries within it would be considered an advanced lesson in a tutorial on RS itself. Suffice it to say that Report Builder is an application developers download through their browsers to write reports with, while Visual Studio Reporting Services projects represent an alternate means that provides additional functionality. Some of the GUI elements of these tools are similar in appearance and are used for nearly identical purposes, but there are some subtle differences between them. The most important dissimilarity for our purposes is that DMX queries apparently can't be written with Report Builder.

Figures 1 and 2: The Visual Studio Report Designer Interface vs. Report Builder
ReportDesignerWindow
ReportBuilderWindow

In Figure 1, the mouse is positioned near a hammer icon that allows users to toggle the Query Designer to DMX mode; hovering over it shows the label "Command Type DMX." The GUI in Report Builder is practically identical, except there is no hammer icon at the same mouse point in the main menu. The column names in the left cell are for a fake cube I had to add to the MonitorMiningProject we've been using throughout this series, in order to bypass a warning that "No cubes can be found" when creating a data source with Report Builder. If you try to generate a report model from an RS data source through Report Builder, it only includes the cubes in your database but not the mining models. The interoperability problems between Report Builder and Visual Studio's Report Designer don't end there though. After creating an RS project in Visual Studio, followed by an RS data source and RS dataset, you can access the DMX Query Designer depicted in Figure 3. It works almost exactly like the Mining Model Prediction tab we already covered in A Rickety Stairway to SQL Server Data Mining, Part 10.3: DMX Prediction Queries, so I won't belabor the point. The icon next to the mouse point allows you to edit the DMX query text directly, which is often easier for the same reasons that writing T-SQL by hand is often preferable to writing queries through SQL Server Management Studio (SSMS)'s GUI tools. The icon in between the hammer and the Import… menu item allows you to toggle back to writing Multidimensional Expressions (MDX) code for SQL Server Analysis Services (SSAS) cubes. The process of writing and deploying the datasets that use DMX is straightforward with Report Designer, until you try to open reports written with them in Report Builder. You may receive an error to the effect that "one or more data sources is missing credentials," because the data sources you created in Report Designer are left blank, even though Report Builder is capable of opening the datasets you built from them. Adding the correct data source from your Report Server folders through Report Builder leads to the same error. If you click on the Data Source Properties and click Test Connection, it may succeed. Yet if you switch to the Credentials tab, you may see it set to "Do not use credentials," regardless of what the setting actually is in the original data source; furthermore, all of the radio buttons to change this to the correct setting may be greyed out. I leave this for Reporting Services aficionados to puzzle over, since it's a little beyond the scope of this series. Suffice it to say that a data miner is likely to have less difficulty using Report Designer than Report Builder, whenever it is possible to choose between them.

Figure 3: The Query Designer in the Reporting Services Report Designer
RSQueryDesigner

There isn't much more to say about the data mining features SSIS provides, which are only a little more sophisticated than what we've already covered with Reporting Services. The two components of SQL Server are designed for different purposes though, one for extraction, transformation and loading (ETL) and the other for publishing views of data on the Web, so any functionality they duplicate isn't really redundant. SSIS provides two tasks specifically for SSAS, but the Analysis Services Execute DDL Task does not work with mining objects; I verified this by using the DMX code DROP MINING MODEL [SQQueryTextIDView 1], which threw a "DDL is not valid" error even before I could run the package.

Figure 4: The Analysis Services Processing Task Using Mining Models
SSISProcessingTask

As you can see from the Analysis Services Processing Task Editor in Figure 4, it is possible to use the other SSAS task to process mining objects; many of the mining models we used for practice purposes throughout this series are listed in the Add Analysis Services Object dialog in the bottom left corner. SSIS also provides a dedicated Data Mining Query Task with three tabs, the first of which allows you to select a mining model from the SSAS database you're currently connected to. The DMX code you want to execute is supplied on the Query tab, which also has child Parameter Mapping and Result Set tabs for coordinating the data returned by the query with SSIS objects like package variables; a sample of the kind of query you might supply there follows the figures below. The Build New Query button brings up the same aforementioned graphical DMX query builder used in SSMS and Report Designer. As depicted in Figure 7, the Output tab allows users to insert the results into a relational table using the specified relational connection. It appears to automatically prepend the relational database's default schema – i.e., dbo in most cases – so don't try tacking a schema onto the table name in the Output tab. One day I may attempt a performance comparison between the SSIS method of inserting DMX results (which seems to perform a bulk insert, from what I can glean from a Profiler trace) and the one I discussed in A Rickety Stairway to SQL Server Data Mining, Part 10.4: Data Mining Dimensions vs. Drastic Dynamic DMX, using OpenQuery. I mentioned in that post that there are three further operations we can perform on data once it has been mined: presentation to the end user for human analysis by a real intelligence, plugging it back into the mining process for further refinement, or automated actions that are taken without human intervention. SSIS is a good candidate tool for the second and is really the only option for the third. Recursive mining pipelines, which pump data in a circle from relational tables into cubes and mining algorithms, then back to the relational side again for another round trip, reside on the cutting edge of data mining; neither SQL Server nor, as far as I know, any of its competitors provides a way to do this out of the box. Automated action is beyond the bleeding edge, in uncharted regions only futurists dare to explore.

Figures 5 to 7: The SSIS Data Mining Query Task and Its Tabs
DataMiningQueryTaskMiningModelTab
DataMiningQueryTaskQueryTab
DataMiningQueryTaskOutputTab
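To give a concrete idea of what belongs on the Query tab, here is a hypothetical singleton prediction query of the sort you might paste there. It assumes the VBPracticeMiningModel clustering model we'll build with AMO later in this post, and uses the standard Cluster() and ClusterProbability() DMX functions to return the most likely cluster for a single made-up case:

SELECT Cluster(), ClusterProbability()
FROM [VBPracticeMiningModel]
NATURAL PREDICTION JOIN
(SELECT 15 AS [WaitTypeID]) AS T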

                Whenever that future comes to pass, automated action based on mining results will be performed by tools like SSIS. It is unlikely to be performed by XMLA, which is merely the language SSAS uses to communicate with other components, using HTTP and the Simple Object Access Protocol (SOAP). Thankfully, server communication is ordinarily performed through a compressed binary version of XMLA, which is otherwise verbose to the point that it would waste server resources. Both the code and data returned by XMLA are illegible and obese, but SSMS provides a button for XMLA queries for those souls intrepid or foolhardy enough to attempt the feat of writing an XMLA query by hand. Except for checking performance counters or performing backup operations, I have yet to find a use case for writing XMLA by hand, and I question why examples like these are even necessary. The relational side of SQL Server doesn’t demand that you access management information or perform important tasks like backups in such a wasteful manner; imagine the outcry – or laughter by our competitors at Oracle – if our relational DBAs had to hand-code XML each time they wanted to check a performance counter or perform a backup. Figure 8 shows an example of how to query the DMSCHEMA_MINING_COLUMNS rowset, which provides metadata in a manner similar to a dynamic management view (DMV). Using SELECT * FROM [$system].[DMSCHEMA_MINING_COLUMNS] we can perform the same query with one line of code that takes milliseconds to type. Equivalent relational code likewise takes one line: SELECT * FROM sys.columns. The query in Figure 8 takes eight lines, while the output requires a long scrollbar. Simply reading through the code to figure out what table is being referenced is difficult enough, but tuning out all of the noise in the XML results is practically impossible. I hereby throw down the gauntlet and challenge anyone on the planet to a duel, to see how much time it takes to type XMLA code or read its results, vs. alternative means like DMX. It is possible to run DMX Select queries in XMLA by encapsulating them with tags like <Execute> and <Statement>, but they suffer from the same illegibility.

Figure 8: A Typical Bloated XMLA Query
XMLAQuery
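For reference, here is a minimal sketch of what such an envelope looks like, wrapping the same schema rowset query in the <Execute> and <Statement> tags mentioned above; the requests and responses SSMS actually generates are considerably more verbose:

<Execute xmlns="urn:schemas-microsoft-com:xml-analysis">
  <Command>
    <Statement>SELECT * FROM [$system].[DMSCHEMA_MINING_COLUMNS]</Statement>
  </Command>
  <Properties>
    <PropertyList>
      <Catalog>MonitorMiningProject</Catalog>
      <Format>Tabular</Format>
    </PropertyList>
  </Properties>
</Execute>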

You're much better off simply issuing a DMX query, or even encapsulating it in a T-SQL OpenQuery statement, as I discussed in the last tutorial; no matter how complex the syntax gets, it can't be harder than wading through such a mess. It is a mystery to me why languages like MDX and DMX haven't been upgraded with T-SQL style keywords like BACKUP; perhaps it's all part of the XML programming craze that began a decade ago, which went overboard on occasion by reviving verbose declarative programming and reversing the long-term industry trend towards encapsulation of code. I would love Windows Presentation Foundation (WPF), except for the fact that I now find myself hand-typing Extensible Application Markup Language (XAML) code, provoking flashbacks of slaving over typesetters at various newspapers in the 1980s. I get the same sinking feeling of retrogression when I'm forced to use PowerShell, which can be utilized to issue ADOMD and XMLA commands, thereby further eroding encapsulation and legibility, in addition to vastly increasing the amount of time it takes to type commands. As a courtesy, I will post a few links here for those who want to use PowerShell with SSDM, such as ScriptingGuy's Use PowerShell to Perform SQL Data Mining page and the Marktab.net pages Programming SQL Server Data Mining with PowerShell 2.0 and Ed Wilson Interview — Microsoft PowerShell. I also ran across an example of someone who attempted to perform data mining entirely within PowerShell, without accessing any tool like SSDM. That's impressive in the same sense that walking on stilts deserves applause, but I wouldn't use it in a production environment, for the same reason that I wouldn't try to run from a hungry lion on stilts. It's just not practical. Data mining is difficult enough without the added clutter imposed by verbose programming languages, so whenever possible, substitute DMX code, SSIS tasks or ADOMD.Net. Perhaps there are use cases for XMLA and PowerShell with SSDM that I haven't thought of yet, so I will bite my lip about my true inner feelings about both, which are whispered about by other SQL Server users in hushed tones. At the risk of starting a nuclear flame war that might burn down half of the SQL Server blogosphere, however, I will question PowerShell's usefulness as a SQL Server tool in a Windows environment, until someone gives me clear examples of use cases where writing PowerShell code is faster than equivalent SQL Server Management Object (SMO) code. If it can access functionality that doesn't yet exist in SMO, that functionality needs to be added to SMO ASAP, because SMO beats PowerShell hands down in code reuse. This is one of the many Achilles' heels which make command line tools such a drag on the whole computing industry; they usually perform the exact same tasks as their GUI counterparts, but with a lot more unnecessary and arcane code that is bound to slow down even those who are proficient at using them. Usually I'm open-minded about programming tools, where "the more the merrier" is sometimes the case – as long as there is a valid use case, no matter how narrow, that the tool can serve better than others. There's no point in keeping a hand crank telephone in your kitchen though, except perhaps for aesthetic reasons. I suppose that a new market could be created for hand crank telephones if AT&T and the other leading telecommunications companies added a few bells and whistles and mounted a big propaganda campaign, painting them as a "hot technology." But I wouldn't want to have to dial 911 with one in an emergency.

Data mining is a high-level activity that is not going to leave you much time or mental energy for monkeying around with low-level tasks like hand-editing XML or PowerShell scripts on a regular basis. In practically every use case, you're better off writing .Net code using the AMO and ADOMD classes, which perform the same automation tasks as SMO, except for Analysis Services. They are used for DDL and DML respectively and can be programmed with any of the .Net languages. I went to the trouble of getting a Microsoft Certified Solution Developer (MCSD) certification in Visual Basic 6.0 back in the day and have kept up with the language ever since, so I'll stick with VB in the examples I provide. It's a matter of taste, but I consider the syntax to be more legible than C#. Unfortunately, I've watched the quality of Microsoft's Visual Studio documentation decline precipitously with each passing release since VB 5.0, with the latest manifestation being the elimination of many useful code examples in favor of C#, which is clearly Microsoft's primus inter pares among programming languages now. That is also one of the reasons I have seen calls for AMO and ADOMD code samples written in VB.Net proliferate in recent years. I don't recall running across any yet, so I'll provide a few in this post.

The DLLs Visual Studio needs to reference for AMO and ADOMD programming may not already be installed with SQL Server 2012 Developer Edition, in which case you will need to download SQL_AS_AMO.msi and SQL_AS_ADOMD.msi from the SQL Server 2012 Feature Pack webpage. You will also see a SQL_AS_OLEDB.msi installer for OLEDB programming, which I'm not familiar with. You can also download SQL_AS_DMViewer.msi if you want to use the Data Mining Viewer control in a Visual Studio project, but the disappointing thing is that it only works in Windows Forms 2.0, which is rapidly becoming obsolete. It was already superseded by WPF in Visual Studio 9.0 and 10.0, but even WPF may be on its way out, should Microsoft succeed in its wildest dreams with its unworkable Metro interface. After finishing installation of the AMO and ADOMD .msi packages (should it prove necessary), set references in your Visual Studio project to Analysis Management Objects and Microsoft.AnalysisServices.AdomdClient.dll. If the latter is not visible in the list of available references, try navigating to your application's Properties command under the Project menu item, then select the Compile tab and click on the Advanced Compile Options… button hidden at the bottom of the page. Changing the Target framework from .Net Framework 4 Client Profile to .Net Framework 4 in the Advanced Compiler Settings dialog window fixed this problem for me.[ii] For some tasks we will cover in next week's tutorial on AdomdServer stored procedures it may also be necessary to set a reference in your Visual Studio project to msmgdsrv.dll, which may be located in your Program Files\Microsoft Analysis Services\AS OLEDB\110 folder.

As usual, I'm not going to clutter this tutorial with unnecessary or irrelevant steps, so I'll assume readers have a working knowledge of the Visual Basic language and how to create and manage solutions in Visual Studio .Net 2010. After creating a project, you must add an Imports Microsoft.AnalysisServices statement at the top of any code window in which you want to reference the AMO or ADOMD object models. The next task is usually to instantiate some items at the apex of the object model, such as the server and database objects in the AMO example in Figure 9. Typically when working with SQL Server, one of the first things we need to do is open a connection – which can be trickier than it seems, given that connection strings seem to be poorly documented for database server software in general. Visual Studio simplifies the rest of the process, because you can use a combination of the Object Browser, Intellisense and watches to figure out what an object model's items can do. Quite often I can figure out how to write code for an object model on my own faster this way than by consulting the documentation or professional tutorials. I suppose I should use some of that saved time to learn how to post VB code properly; this is my first attempt, and I obviously have much to learn about incorporating line numbers and wrapping text. The word "Temp" is just the flag I've used for years to indicate local variables. Yet it is fairly easy to decipher the code in Figure 9, which loops through the mining structure collection of a database and the mining models collection of each structure to display some important properties. In most cases with AMO and ADOMD objects, you can also use standard VB collection syntax to perform operations like Add, Remove, Count and Item. If we chose to, we could also dig down deeper into lower-level collections, like the Columns collections for both the structures and the models. This example goes four levels below the parent database, into the model algorithms and their parameters.
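Before diving in, here are the namespace imports the examples below assume; the second is needed only for the ADOMD.Net example near the end of the post:

Imports Microsoft.AnalysisServices 'AMO classes like Server, Database and MiningStructure
Imports Microsoft.AnalysisServices.AdomdClient 'ADOMD.Net client classes like AdomdConnection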

Figure 9: AMO Code to View Mining Object Properties
Dim TempServer As New Microsoft.AnalysisServices.Server
Dim TempDatabase As New Microsoft.AnalysisServices.Database

TempServer.Connect("Data Source=127.0.0.1:2384;Initial Catalog=MonitorMiningProject")
TempDatabase = TempServer.Databases("MonitorMiningProject")

For I = 0 To TempDatabase.MiningStructures.Count - 1
    Debug.Print("Structure Name: " + TempDatabase.MiningStructures(I).Name)
    Debug.Print("HoldoutActualSize: " + TempDatabase.MiningStructures(I).HoldoutActualSize.ToString)
    Debug.Print("HoldoutMaxPercent: " + TempDatabase.MiningStructures(I).HoldoutMaxPercent.ToString) 'you can set these structure properties as well
    Debug.Print("HoldoutMaxCases: " + TempDatabase.MiningStructures(I).HoldoutMaxCases.ToString)
    Debug.Print("HoldoutSeed: " + TempDatabase.MiningStructures(I).HoldoutSeed.ToString)

    For J = 0 To TempDatabase.MiningStructures(I).MiningModels.Count - 1
        Debug.Print("     Model Name: " + TempDatabase.MiningStructures(I).MiningModels(J).Name)
        Debug.Print("     Algorithm: " + TempDatabase.MiningStructures(I).MiningModels(J).Algorithm)

        For K = 0 To TempDatabase.MiningStructures(I).MiningModels(J).AlgorithmParameters.Count - 1
            Debug.Print("          " + TempDatabase.MiningStructures(I).MiningModels(J).AlgorithmParameters(K).Name + _
                " " + TempDatabase.MiningStructures(I).MiningModels(J).AlgorithmParameters(K).Value.ToString)
        Next K
    Next J
Next I

We can also instantiate or delete objects and change most of their properties. In fact, we can do things with AMO that aren't possible in DMX, such as creating data sources and data source views (DSVs). It is necessary, however, to call the Update command on parent objects before creating objects that reference them, which is why the data source in Figure 10 is created and updated first, then the DSV, followed by the mining structure and its model. It is also necessary to set a unique ID and often the Name property for many new objects as well. The data source is little more than a glorified connection, but the DSV requires code showing SSDM how to load data from the relational table or cube that supplies the data to be mined. In this case, we'll be selecting some records from Monitoring.dm_os_wait_stats, a table of wait stat data that we've been using throughout this series for practice data. The dataset/adapter/SQLCommand code depicted below is fairly standard; I double-checked the syntax against Data Mining with Microsoft SQL Server 2008, the classic reference written by former members of Microsoft's Data Mining Team, but there really aren't many different ways you can code this. I did depend on their reference to learn how to do data bindings though.[iii]

Figure 10: AMO Code to Create New Mining Objects
'create the data source
Dim NewRelationalDataSource As New RelationalDataSource
NewRelationalDataSource.ConnectionString = "Provider=SQLNCLI11.1;Data Source=MYSERVER;Integrated Security=SSPI;Initial Catalog=Monitoring"
NewRelationalDataSource.Name = "NewRelationalDataSource"
NewRelationalDataSource.ID = "NewRelationalDataSource"

TempDatabase.DataSources.Add(NewRelationalDataSource)

'in this case, the DataSource must exist server side before we can create the DSV below that references it
TempDatabase.Update(UpdateOptions.ExpandFull)

'create a DSV that references the dataset
Dim TempDataset As New System.Data.DataSet

Dim TempDataAdapter As New System.Data.SqlClient.SqlDataAdapter
Dim TempSQLCommand As New System.Data.SqlClient.SqlCommand
Dim TempSQLConnection As New System.Data.SqlClient.SqlConnection

TempSQLConnection.ConnectionString = "Data Source=MYSERVER;Integrated Security=SSPI;Initial Catalog=Monitoring"
TempSQLCommand.Connection = TempSQLConnection
TempSQLConnection.Open()

TempSQLCommand.CommandText = "SELECT TOP 100 ID, WaitTypeID FROM Monitoring.dm_os_wait_stats" 'this is a table of practice data we used throughout the series of tutorials
TempDataAdapter.SelectCommand = TempSQLCommand
TempDataAdapter.Fill(TempDataset, "WaitTypeTable")
TempSQLConnection.Close()

Dim NewDataSourceView As New Microsoft.AnalysisServices.DataSourceView
NewDataSourceView.DataSourceID = NewRelationalDataSource.ID
NewDataSourceView.Schema = TempDataset 'bind the dataset we just filled to the DSV; without this step the DSV has no schema
NewDataSourceView.Name = "NewDataSourceView"
NewDataSourceView.ID = "NewDataSourceView"
TempDatabase.DataSourceViews.Add(NewDataSourceView)

'use the Update command after performing DDL operations on the database
'in this case, the DSV must exist server side before we can create the mining structure below that references it
TempDatabase.Update(UpdateOptions.ExpandFull)

'create a new mining structure based on the new data source
Dim NewMiningStructure As New Microsoft.AnalysisServices.MiningStructure
Dim IDStructureColumn As New Microsoft.AnalysisServices.ScalarMiningStructureColumn
Dim WaitTypeIDStructureColumn As New Microsoft.AnalysisServices.ScalarMiningStructureColumn

'set the structure binding to the new data source view
NewMiningStructure.Source = New DataSourceViewBinding(NewDataSourceView.ID)

'now set various column properties and bindings
IDStructureColumn.Content = "Key" 'don't forget to set this for at least 1 column
IDStructureColumn.IsKey = True 'another "Key" property to set ;)
IDStructureColumn.Type = "Long"
IDStructureColumn.Distribution = "Uniform"
IDStructureColumn.Name = "ID"
IDStructureColumn.ID = "ID"
IDStructureColumn.KeyColumns.Add("WaitTypeTable", "ID", System.Data.OleDb.OleDbType.BigInt) 'set the data bindings

WaitTypeIDStructureColumn.Content = "Discretized" 'the Content property can also refer to ClassifiedColumns types, which are relevant to the advanced topic of plug-in algorithms
WaitTypeIDStructureColumn.DiscretizationBucketCount = 15
WaitTypeIDStructureColumn.DiscretizationMethod = "EqualAreas"
WaitTypeIDStructureColumn.Type = "Long"
WaitTypeIDStructureColumn.Distribution = "Uniform"
WaitTypeIDStructureColumn.Name = "WaitTypeID"
WaitTypeIDStructureColumn.ID = "WaitTypeID"
WaitTypeIDStructureColumn.ModelingFlags.Add("NOT NULL")
WaitTypeIDStructureColumn.KeyColumns.Add("WaitTypeTable", "WaitTypeID", System.Data.OleDb.OleDbType.BigInt) 'set the data bindings

'name the structure and add the new columns to it
NewMiningStructure.Name = "VBPracticeMiningStructure"
NewMiningStructure.ID = "VBPracticeMiningStructure"
NewMiningStructure.Columns.Add(IDStructureColumn)
NewMiningStructure.Columns.Add(WaitTypeIDStructureColumn)

'use the Update command after performing DDL operations on the database
TempDatabase.MiningStructures.Add(NewMiningStructure)
TempDatabase.Update(UpdateOptions.ExpandFull)

Dim NewMiningModel As New Microsoft.AnalysisServices.MiningModel
NewMiningModel.Name = "VBPracticeMiningModel"
NewMiningModel.Algorithm = "Microsoft_Clustering"
NewMiningModel.AlgorithmParameters.Add("CLUSTER_COUNT", 15)

Dim NewMiningModelColumn As New Microsoft.AnalysisServices.MiningModelColumn
NewMiningModelColumn.Name = "ID" 'the mining model must include the case key column
NewMiningModelColumn.Usage = "Key" 'yet another "Key" property to set ;)
NewMiningModelColumn.SourceColumnID = "ID" 'this is the ID of the structure column
NewMiningModel.Columns.Add(NewMiningModelColumn)

Dim NewMiningModelColumn2 As New Microsoft.AnalysisServices.MiningModelColumn
NewMiningModelColumn2.Name = "WaitTypeID"
NewMiningModelColumn2.SourceColumnID = "WaitTypeID" 'this is the ID of the structure column
NewMiningModel.Columns.Add(NewMiningModelColumn2)

'use the Update command after performing DDL operations on the database
TempDatabase.MiningStructures("VBPracticeMiningStructure").MiningModels.Add(NewMiningModel)
NewMiningStructure.Update()
NewMiningModel.Update()

Debug.Print(TempDatabase.MiningStructures(1).LastProcessed.ToString)

'process a structure or model
TempDatabase.MiningStructures(1).Process()
TempDatabase.MiningStructures(1).MiningModels(1).Process()

The example above creates two columns from the new DSV, ID and WaitTypeID, and adds them to the columns collection of a new mining structure. Pretty much all of the same structure, model and column properties you can set in the GUI of SSDM are available in the object model, plus a few that aren't. At least one of the columns must have a Content type of Key and have its IsKey property set to True, otherwise you'll receive an error like "Error (Data mining): The 'VBPracticeMiningStructure' mining structure does not have a case key column." After this we can create mining model columns and correlate them with the appropriate structure columns using their SourceColumnID properties. We can add the model columns to their parent models after this, then add the new mining model to its parent structure, followed by a couple of Update statements. For the sake of simplicity I left out nested tables, but it would be fairly simple to include a second table in the dataset code that defines the DSV, then create an object of type AnalysisServices.TableMiningStructureColumn rather than the AnalysisServices.ScalarMiningStructureColumn we used above, as in the sketch that follows this paragraph; just remember to add some child columns, otherwise you'll receive an error when you try to deploy the new objects. That big mass of code up there may look intimidating, but it's all really quite elementary. If you prefer C# examples I suggest you refer to the DM Team's book, which goes into much more depth and tackles other important use cases, such as how to process models. In a nutshell, that last task is fairly simple; both structure and model objects have a Process method, which you can call like so: TempDatabase.MiningStructures(1).MiningModels(1).Process(). Voila. If you can think of an operation that can be performed on an SSDM object, it's certain to be in the object model – plus there are a few, such as creating DSVs, that can't be done in the GUI or in DMX. SSMS and Business Intelligence Development Studio, i.e. the 2008 version of SQL Server Data Tools (SSDT), were both written with AMO, so any functionality you find there has to be in the object model somewhere.[iv] Nonetheless, it's a fairly simple object model that isn't difficult to navigate.
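To flesh out the nested table option a bit, here is a minimal, untested sketch of the kind of code involved, using a hypothetical nested table named WaitTypeNestedTable that would also have to exist in the DSV; the crucial detail is that the table column receives at least one child column, including its own key:

Dim NewTableColumn As New Microsoft.AnalysisServices.TableMiningStructureColumn
NewTableColumn.Name = "WaitTypeNestedTable" 'hypothetical nested table
NewTableColumn.ID = "WaitTypeNestedTable"
NewTableColumn.ForeignKeyColumns.Add("WaitTypeNestedTable", "ID", System.Data.OleDb.OleDbType.BigInt) 'ties the nested rows back to the case table

Dim NestedKeyColumn As New Microsoft.AnalysisServices.ScalarMiningStructureColumn
NestedKeyColumn.Name = "WaitTypeID"
NestedKeyColumn.ID = "NestedWaitTypeID"
NestedKeyColumn.Content = "Key" 'the nested table needs its own key column
NestedKeyColumn.IsKey = True
NestedKeyColumn.Type = "Long"
NestedKeyColumn.KeyColumns.Add("WaitTypeNestedTable", "WaitTypeID", System.Data.OleDb.OleDbType.BigInt)

NewTableColumn.Columns.Add(NestedKeyColumn) 'add the child column, or deployment will fail
NewMiningStructure.Columns.Add(NewTableColumn)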

ADOMD.Net is probably even easier to use, although it doesn't look like it from the three figures that follow. I'm more accustomed to using .Net datagrids and datasets, so I decided to build some of those for my example rather than use an ADOMD DataReader, as the DM Team explains how to do in their book. In Figure 11 you'll see an ordinary .Net 2010 dataset, which will hold the results of our DMX query. In this case, we will be retrieving a subset of the columns of DMSCHEMA_MINING_SERVICES, the equivalent of a relational system table for SSDM algorithms, which we touched on briefly in A Rickety Stairway to SQL Server Data Mining, Part 10.2: DMX DML. Keep in mind that your column names must match those returned by the query, unless you use column aliases. I had a devil of a time with data type conversion errors until I realized that I couldn't remove the underscores from the column names in the dataset, as I usually do whenever possible for legibility purposes, because ADOMD.Net matches columns by name rather than by their ordinal position in a query; some of my columns were populated properly and others were left null, back when I was retrieving the full set of DMSCHEMA_MINING_SERVICES columns. Also make sure to assign the .Net data types that correspond to the OLE DB data types that ADOMD will return; you may need to consult Books Online (BOL) to see the OLE DB types that system objects like this return, then look up the conversion table at a webpage like Mapping .NET Framework Data Provider Data Types to .NET Framework Data Types. I'll skip over an explanation of Figure 12, which is merely the XAML that creates the WPF datagrid we will display our results in; there's nothing of special interest there unless you want to reverse engineer this project.
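By way of illustration, here is a quick sketch of how a couple of those columns might be declared in code rather than in the dataset designer, with the .Net types chosen to match the OLE DB types the rowset returns (the string columns come back as DBTYPE_WSTR, the MSOLAP flags as booleans):

Dim TempTable As New System.Data.DataTable("MiningServicesTable")
TempTable.Columns.Add("Service_Name", GetType(String)) 'keep the underscores so ADOMD.Net can match the column by name
TempTable.Columns.Add("MSOLAP_Supports_Drillthrough", GetType(Boolean)) 'OLE DB DBTYPE_BOOL maps to System.Boolean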

Figure 11: Dataset Structure for the ADOMD Query Example
Dataset

 Figure 12: XAML for the ADOMD Query Example
<Window x:Class="MainWindow"
    xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
    xmlns:my="clr-namespace:WpfApplication1"
    Title="MainWindow" Height="617" Width="1481">
    <Window.Resources>
        <my:ADOMDResultDataset x:Key="ADOMDResultDataset" />
        <!-- assumed wiring: a CollectionViewSource of the kind Visual Studio generates when binding a dataset table -->
        <CollectionViewSource x:Key="MiningServicesViewSource" Source="{Binding Source={StaticResource ADOMDResultDataset}, Path=MiningServicesTable}" />
    </Window.Resources>
    <Grid Width="1466">
        <Grid.ColumnDefinitions>
            <ColumnDefinition Width="1437*" />
            <ColumnDefinition Width="12*" />
        </Grid.ColumnDefinitions>
        <DataGrid AutoGenerateColumns="False" Height="524" HorizontalAlignment="Left" Margin="12,42,0,0" Name="DataGrid1" VerticalAlignment="Top" Width="1422" ItemsSource="{Binding Source={StaticResource MiningServicesViewSource}}" FontSize="9">
            <DataGrid.Columns>
                <DataGridTextColumn x:Name="Column1" Binding="{Binding Path=ID}" Header="ID" Width="30" IsReadOnly="True" FontSize="8" />
                <DataGridTextColumn x:Name="Column2" Binding="{Binding Path=Service_Name}" Header="ServiceName" Width="100" />
                <DataGridTextColumn x:Name="Column5" Binding="{Binding Path=Service_GUID}" Header="ServiceGUID" Width="80" />
                <DataGridTextColumn x:Name="Column8" Binding="{Binding Path=Supported_Distribution_Flags}" Header="DistributionFlags" Width="100" />
                <DataGridTextColumn x:Name="Column9" Binding="{Binding Path=Supported_Input_Content_Types}" Header="InputContentTypes" Width="100" />
                <DataGridTextColumn x:Name="Column10" Binding="{Binding Path=Supported_Prediction_Content_Types}" Header="PredictionContentTypes" Width="100" />
                <DataGridTextColumn x:Name="Column11" Binding="{Binding Path=Supported_Modeling_Flags}" Header="ModelingFlags" Width="100" />
                <DataGridTextColumn x:Name="Column13" Binding="{Binding Path=Training_Complexity}" Header="TrainingComplexity" Width="100" />
                <DataGridTextColumn x:Name="Column14" Binding="{Binding Path=Prediction_Complexity}" Header="PredictionComplexity" Width="100" />
                <DataGridTextColumn x:Name="Column15" Binding="{Binding Path=Expected_Quality}" Header="ExpectedQuality" Width="100" />
                <DataGridCheckBoxColumn x:Name="Column18" Binding="{Binding Path=Allow_PMML_Initialization}" Header="AllowPMMLInitialization" Width="100" />
                <DataGridCheckBoxColumn x:Name="Column24" Binding="{Binding Path=MSOLAP_Supports_Analysis_Services_DDL}" Header="SupportsSSASDDL" Width="100" />
                <DataGridCheckBoxColumn x:Name="Column25" Binding="{Binding Path=MSOLAP_Supports_OLAP_Mining_Models}" Header="SupportsOLAPModels" Width="100" />
                <DataGridCheckBoxColumn x:Name="Column26" Binding="{Binding Path=MSOLAP_Supports_Data_Mining_Dimensions}" Header="SupportsMiningDimensions" Width="100" />
                <DataGridCheckBoxColumn x:Name="Column27" Binding="{Binding Path=MSOLAP_Supports_Drillthrough}" Header="SupportsDrillthrough" Width="100" />
            </DataGrid.Columns>
        </DataGrid>
        <Button Content="Fill Grid" Height="32" HorizontalAlignment="Left" Margin="12,0,0,0" Name="Button1" VerticalAlignment="Top" Width="75" />
        <Button Content="Create New Structure" Height="37" HorizontalAlignment="Left" Margin="93,-2,0,0" Name="Button2" VerticalAlignment="Top" Width="176" Visibility="Hidden" />
        <Button Content="Create SSAS SP" Height="28" HorizontalAlignment="Left" Margin="287,4,0,0" Name="Button3" VerticalAlignment="Top" Width="103" Visibility="Hidden" />
    </Grid>
</Window>

The actual ADOMD code is much shorter than either the dataset or XAML code above, or the AMO code from the previous example. The first line is just a reference to the dataset we just created, which has project-wide scope and requires a reference in the Window.Resources section of the XAML. The code to create data command and data adapter objects is of little consequence, since it's nearly identical to ADO and SMO code. We just create a connection, open it, set the command text, then fill the only table in our dataset and close the connection. After we start the project, all we need to do to fill the datagrid depicted in Figure 14 with the results of our query is click the Fill Grid button, whose event handler includes the ADOMD code in Figure 13.

Figure 13: ADOMD Code to Display a Simple DMX Query in WPF
Dim ADOMDResultDataset As WpfApplication1.ADOMDResultDataset = CType(Me.FindResource("ADOMDResultDataset"), WpfApplication1.ADOMDResultDataset)
Dim TempConnection As New Microsoft.AnalysisServices.AdomdClient.AdomdConnection
Dim TempCommand As New Microsoft.AnalysisServices.AdomdClient.AdomdCommand
Dim TempAdapter As New Microsoft.AnalysisServices.AdomdClient.AdomdDataAdapter

TempConnection.ConnectionString = "Data Source=127.0.0.1:2384;Initial Catalog=MonitorMiningProject"
TempConnection.Open()

TempCommand = TempConnection.CreateCommand() 'make sure the column names match
TempCommand.CommandText = "SELECT Service_Name, Service_GUID, Supported_Distribution_Flags, " + _
    "Supported_Input_Content_Types, Supported_Prediction_Content_Types, Supported_Modeling_Flags, Training_Complexity, " + _
    "Prediction_Complexity, Expected_Quality, Allow_PMML_Initialization, MSOLAP_Supports_Analysis_Services_DDL, MSOLAP_Supports_OLAP_Mining_Models, " + _
    "MSOLAP_Supports_Data_Mining_Dimensions, MSOLAP_Supports_Drillthrough FROM [$system].[DMSCHEMA_MINING_SERVICES]" 'select some data from the server

TempAdapter.SelectCommand = TempCommand
TempAdapter.Fill(ADOMDResultDataset, "MiningServicesTable")
TempConnection.Close()

Figure 14: Results of an ADOMD Query Displayed in a VB WPF Datagrid
GridResults

The ADOMD object model also exposes pretty much every property or method you need to perform anything you'd ordinarily do in DMX code or the SSDM GUI. It's also fairly simple as object models go, especially since most of the class members are relevant to OLAP functionality, such as collections and properties for cubes, tuples, named sets, KPIs, members, levels, hierarchies, dimensions and cells. If you go exploring through the ADOMD and AMO object models, you may even find curious references to functionality that doesn't seem to be anywhere else. For example, the model column DiscretizationMethod property allows you to specify two values I've never seen anywhere else, Thresholds and EqualRanges, which I've only found documented in the AMO object model. I also ran across a curious method called GetFullStatistics while experimenting with Microsoft.AnalysisServices.AdomdServer, a version of ADOMD.Net which can conserve processing resources by performing its operations server-side. This was a feature of SSDM I was completely unaware of until I read the DM Team's book, or so I thought; I had planned ages ago to learn how to write SSAS stored procedures and was glad to learn in the middle of my experiments that they're the same thing. This is one of the most neglected features of Analysis Services at present, but I suspect that in the long run it may prove to be a hidden gem. Writing the code for my first SSAS procedures was a beneficial learning experience for me, one that really calls for a separate article. I've also done little experimentation with similar CLR functionality on the relational side, like managed code aggregates and user defined functions (UDFs), so it might be appropriate to compare the workflows, use cases and pitfalls of them all in one gulp, in next week's installment. After that, there will be no place left to climb up our stairway except the ricketiest, most challenging and perhaps most rewarding one of them all, writing plug-in algorithms.


[i] p. 480, MacLennan, Jamie; Tang, ZhaoHui and Crivat, Bogdan, 2009, Data Mining with Microsoft SQL Server 2008. Wiley Publishing: Indianapolis.

[ii] I found this solution in a post by Robert Williams at the webpage “How to Change VS2010 Add Reference Box Filter?” published April 16, 2010 at StackOverflow.com.

[iii] pp. 510-511, MacLennan, et al.

[iv] Ibid., p. 502.
