Integrating Other Data Mining Tools with SQL Server, Part 2.1: The Minuscule Hassles of Minitab

By Steve Bolton

…………It may be called Minitab, but SQL Server users can derive maximum benefits from the Windows version of this professional data mining and statistics tool – provided that they use it for tasks that SQL Server doesn’t do natively. This was one of the caveats I also… Read more

0 comments, 87 reads

Posted in Multidimensional Mayhem on 30 June 2015

Integrating Other Data Mining Tools with SQL Server, Part 1.2: Finding Use Cases for WEKA

By Steve Bolton

…………As recounted in the first installment of this occasional series of amateur self-tutorials, there are some serious limitations to using Waikato Environment for Knowledge Analysis (WEKA), a popular open source data mining tool, in a SQL Server environment. The documentation is full of white space and… Read more

0 comments, 490 reads

Posted in Multidimensional Mayhem on 2 June 2015

Integrating Other Data Mining Tools with SQL Server, Part 1.1: The Weaknesses of WEKA

By Steve Bolton

…………The same rules that applied to my amateur mistutorial series on A Rickety Stairway to SQL Server Data Mining and Outlier Detection with SQL Server are in play for this series of occasional articles, which will provide a brief overview of using various third-party data mining… Read more

0 comments, 202 reads

Posted in Multidimensional Mayhem on 28 May 2015

Outlier Detection with SQL Server, part 6.3: Visual Outlier Detection with Reporting Services Plots and SSDM Clustering

By Steve Bolton

…………When the goal is to illustrate just how outlying an outlier may be, the efficiency with which scatter plots represent distances really can’t be beaten. It doesn’t take any training in mathematics to look at one and notice that a few data points are further… Read more

0 comments, 146 reads

Posted in Multidimensional Mayhem on 17 May 2015

Outlier Detection with SQL Server, part 6.2: Finding Outliers Visually with Reporting Services Box Plots

By Steve Bolton

…………Throughout this series of amateur mistutorials in using SQL Server to identify outliers, we have repeatedly seen that the existing tried-and-true methods of detection long used for such purposes as hypothesis testing are actually poorly suited for finding aberrant values in large databases. The same problem… Read more

0 comments, 182 reads

Posted in Multidimensional Mayhem on 29 April 2015

Outlier Detection with SQL Server, part 6.1: Visual Outlier Detection with Reporting Services

By Steve Bolton

…………Most of the previous articles in this series of self-tutorials on using SQL Server to find outliers required us to implement statistical formulas, in order to derive measures that required some explanation before they could be interpreted correctly. In this segment of the series, we’ll be discussing a… Read more

0 comments, 192 reads

Posted in Multidimensional Mayhem on 21 April 2015

Outlier Detection with SQL Server, part 5: Interquartile Range

By Steve Bolton

…………The last seven articles in this series of mistutorials on identifying outlying values in SQL Server databases were clunkers, in the sense that the methods had many properties in common that made them inapplicable to the scenarios DBAs typically need them for. Chauvenet’s Criterion, Peirce’s Criterion,… Read more

0 comments, 222 reads

Posted in Multidimensional Mayhem on 30 March 2015

Outlier Detection with SQL Server, part 4: Peirce’s Criterion

By Steve Bolton

…………In the last couple of installments of this amateur series of self-tutorials on outlier identification with SQL Server, we dealt with detection methods that required recursive recomputation of the underlying aggregates. This week’s topic, Peirce’s Criterion, also flags outliers in an iterative manner, but doesn’t require… Read more

2 comments, 5,571 reads

Posted in Multidimensional Mayhem on 20 March 2015

Outlier Detection with SQL Server, part 3.6: Chauvenet’s Criterion

By Steve Bolton

…………This is the last of six articles I’ve segregated in the middle of my mistutorial series on identifying outlying values with SQL Server, because they turned out to be difficult to apply to the typical use cases DBAs encounter. After this detour we’ll get back on… Read more

0 comments, 284 reads

Posted in Multidimensional Mayhem on 28 February 2015

Outlier Detection with SQL Server, part 3.5: The Modified Thompson Tau Test

By Steve Bolton

…………Based on what little experience I’ve gained from writing this series on finding outliers in SQL Server databases, I expected the Modified Thompson Tau test to be a clunker. It marries the math underpinning one of the most ubiquitous means of outlier detection, Z-Scores, with the… Read more

0 comments, 6,291 reads

Posted in Multidimensional Mayhem on 14 February 2015

Outlier Detection with SQL Server, part 3.4: Dixon’s Q-Test

By Steve Bolton

…………In the last three installments of this amateur series of mistutorials on finding outliers using SQL Server, we delved into a subset of standard detection methods taken from the realm of statistical hypothesis testing. These are generally more difficult to apply to tables of thousands of… Read more

0 comments, 468 reads

Posted in Multidimensional Mayhem on 30 January 2015

Outlier Detection with SQL Server, part 3.3: The Limitations of the Tietjen-Moore Test

By Steve Bolton

…………The Tietjen-Moore test may have the coolest-sounding name of any of the outlier detection methods I’ll be surveying haphazardly in this amateur series of mistutorials, yet it suffers from some debilitating limitations that may render it among the least useful for SQL Server DBAs. It is… Read more

0 comments, 252 reads

Posted in Multidimensional Mayhem on 20 January 2015

Outlier Detection with SQL Server, part 3.2: GESD

By Steve Bolton

…………In the last edition of this amateur series of self-tutorials on finding outlying values in SQL Server columns, I mentioned that Grubbs’ Test has a number of limitations that sharply constrain its usefulness to DBAs. The Generalized Extreme Studentized Deviate Test (GESD) suffers from some of… Read more

0 comments, 5,349 reads

Posted in Multidimensional Mayhem on 17 December 2014

Outlier Detection with SQL Server, part 3.1: Grubbs’ Test


By Steve Bolton

…………In the last two installments of this series of amateur self-tutorials, I mentioned that the various means of detecting outliers with SQL Server might best be explained as a function of the use cases, the context determined by the questions one chooses to ask of the… Read more

2 comments, 6,175 reads

Posted in Multidimensional Mayhem on 29 November 2014

Outlier Detection with SQL Server, part 2.2: Modified Z-Scores

By Steve Bolton

…………There are apparently many subtle variations on Z-Scores, a ubiquitous measure that is practically a cornerstone in the foundation of statistics. The popularity and ease of implementation of Z-Scores are what made me decide to tackle them early on in this series of amateur self-tutorials, on… Read more

0 comments, 492 reads

Posted in Multidimensional Mayhem on 13 November 2014

Outlier Detection with SQL Server, part 2.1: Z-Scores

By Steve Bolton

…………Using SQL Server to ferret out those aberrant data points we call outliers may call for some complex T-SQL, Multidimensional Expressions (MDX) or Common Language Runtime (CLR) code. Yet thankfully, the logic and math that underpin the standard means of outlier detection I’ll delve into in… Read more

2 comments, 767 reads

Posted in Multidimensional Mayhem on 28 October 2014

Outlier Detection with SQL Server, part 1: Finding Fraud and Folly with Benford’s Law

By Steve Bolton

…………My last blog series, A Rickety Stairway to SQL Server Data Mining, often epitomized a quip by University of Connecticut statistician Daniel T. Larose, to the effect that “data mining is easy to do badly.”[1] It is clear that today’s sophisticated mining algorithms can still… Read more

2 comments, 1,521 reads

Posted in Multidimensional Mayhem on 19 September 2014

Stay Tuned…for a SQL Server Tutorial Series Juggling Act

By Steve Bolton

…………If all goes according to plan, my blog will return in a few weeks with two brand new series, Using Other Data Mining Tools with SQL Server and Information Measurement with SQL Server. Yes, I will be attempting what amounts to a circus act among SQL… Read more

0 comments, 277 reads

Posted in Multidimensional Mayhem on 1 July 2014

A Rickety Stairway to SQL Server Data Mining, Part 15, The Grand Finale: Custom Data Mining Viewers


By Steve Bolton

…………As mentioned previously in this amateur self-tutorial series on the most neglected component of Microsoft’s leading database server software, SQL Server Data Mining (SSDM) can be extended through many means, such as Analysis Services stored procedures, CLR functionality, custom mining functions and plug-in algorithms. I had… Read more

0 comments, 1,215 reads

Posted in Multidimensional Mayhem on 11 February 2014

A Rickety Stairway to SQL Server Data Mining, Part 14.8: PMML Hell


By Steve Bolton

…………In A Rickety Stairway to SQL Server Data Mining, Part 14.3: Debugging and Deployment, we passed the apex of this series of amateur self-tutorials on SQL Server Data Mining (SSDM) and have seen the difficulty level and real-world usefulness of the material decline on a… Read more

0 comments, 608 reads

Posted in Multidimensional Mayhem on 15 January 2014
