Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

Outlier Detection with SQL Server, part 7: Cook’s Distance

By Steve Bolton[

…………I originally intended to save Cook’s and Mahalanobis Distances to close out this series not only because the calculations and concepts are more difficult yet worthwhile to grasp, but also in part to serve as a bridge to a future series of tutorials on using information… Read more

0 comments, 885 reads

Posted in Multidimensional Mayhem on 24 August 2015

Integrating Other Data Mining Tools with SQL Server, Part 2.2: Minitab vs. SSDM and Reporting Services

By Steve Bolton

…………Professional statistical software like Minitab can fill some important gaps in SQL Server’s functionality, as I addressed in the last post of this occasional series of pseudo-reviews. I’m only concerned here with assessing how well a particular data mining tool might fit into a SQL Server… Read more

0 comments, 203 reads

Posted in Multidimensional Mayhem on 8 July 2015

Integrating Other Data Mining Tools with SQL Server, Part 2.1: The Minuscule Hassles of Minitab

By Steve Bolton

…………It may be called Minitab, but SQL Server users can derive maximum benefits from the Windows version of this professional data mining and statistics tool – provided that they use it for tasks that SQL Server doesn’t do natively. This was one the caveats I also… Read more

0 comments, 184 reads

Posted in Multidimensional Mayhem on 30 June 2015

Integrating Other Data Mining Tools with SQL Server, Part 1.2: Finding Use Cases for WEKA

By Steve Bolton

…………As recounted in the first installment of this occasional series of amateur self-tutorials, there are some serious limitations to using Waikato Environment for Knowledge Analysis (WEKA), a popular open source data mining tool, in a SQL Server environment. The documentation is full of white space and… Read more

0 comments, 777 reads

Posted in Multidimensional Mayhem on 2 June 2015

Integrating Other Data Mining Tools with SQL Server, Part 1.1: The Weaknesses of WEKA

By Steve Bolton

…………The same rules that applied to my amateur mistutorial series on A Rickety Stairway to SQL Server Data Mining and Outlier Detection with SQL Server are in play for this series of occasional articles, which will provide a brief overview of using various third-party data mining… Read more

0 comments, 271 reads

Posted in Multidimensional Mayhem on 28 May 2015

Outlier Detection with SQL Server, part 6.3: Visual Outlier Detection with Reporting Services Plots and SSDM Clustering

By Steve Bolton

…………When the goal is to illustrate how just how outlying an outlier may be, the efficiency with which scatter plots represent distances really can’t be beaten. It doesn’t take any training in mathematics to look at one and notice that a few data points are further… Read more

0 comments, 204 reads

Posted in Multidimensional Mayhem on 17 May 2015

Outlier Detection with SQL Server, part 6.2: Finding Outliers Visually with Reporting Services Box Plots

 By Steve Bolton

…………Throughout this series of amateur mistutorials in using SQL Server to identify outliers, we have repeatedly seen that the existing tried-and-true methods of detection long used for such purposes as hypothesis testing are actually poorly suited for finding aberrant values in large databases. The same problem… Read more

0 comments, 247 reads

Posted in Multidimensional Mayhem on 29 April 2015

Outlier Detection with SQL Server, part 6.1: Visual Outlier Detection with Reporting Services

 By Steve Bolton

…………Most of the previous articles in this self-tutorials on using SQL Server to find outliers required us to implement statistical formulas, in order to derive measures that required some explanation before they could be interpreted correctly. In this segment of the series, we’ll be discussing a… Read more

0 comments, 253 reads

Posted in Multidimensional Mayhem on 21 April 2015

Outlier Detection with SQL Server, part 5: Interquartile Range

By Steve Bolton

…………The last seven articles in this series of mistutorials on identifying outlying values in SQL Server database were clunkers, in the sense that the methods had many properties in common that made them inapplicable to the scenarios DBAs typically need them for. Chauvenet’s Criterion, Peirce’s Criterion,… Read more

0 comments, 293 reads

Posted in Multidimensional Mayhem on 30 March 2015

Outlier Detection with SQL Server, part 4: Peirce’s Criterion

By Steve Bolton

…………In the last couple of installments of this amateur series of self-tutorials on outlier identification with SQL Server, we dealt with detection methods that required recursive recomputation of the underlying aggregates. This week’s topic, Peirce’s Criterion, also flags outliers in an iterative manner, but doesn’t require… Read more

2 comments, 5,681 reads

Posted in Multidimensional Mayhem on 20 March 2015

Outlier Detection with SQL Server, part 3.6: Chauvenet’s Criterion

By Steve Bolton

…………This is the last of six articles I’ve segregated in this middle of my mistutorial series on identifying outlying values with SQL Server, because they turned out to be difficult to apply to the typical use cases DBAs encounter. After this detour we’ll get back on… Read more

0 comments, 358 reads

Posted in Multidimensional Mayhem on 28 February 2015

Outlier Detection with SQL Server, part 3.5: The Modified Thompson Tau Test

By Steve Bolton

…………Based on what little experience I’ve gained from writing this series on finding outliers in SQL Server databases, I expected the Modified Thompson Tau test to be a clunker. It marries the math underpinning one of the most ubiquitous means of outlier detection, Z-Scores, with the… Read more

0 comments, 6,494 reads

Posted in Multidimensional Mayhem on 14 February 2015

Outlier Detection with SQL Server, part 3.4: Dixon’s Q-Test

By Steve Bolton

…………In the last three installments of this amateur series of mistutorials on finding outliers using SQL Server, we delved into a subset of standard detection methods taken from the realm of statistical hypothesis testing. These are generally more difficult to apply to tables of thousands of… Read more

0 comments, 560 reads

Posted in Multidimensional Mayhem on 30 January 2015

Outlier Detection with SQL Server, part 3.3: The Limitations of the Tietjen-Moore Test

By Steve Bolton

…………The Tietjen-Moore test may have the coolest-soundest name of any of the outlier detection methods I’ll be surveying haphazardly in this amateur series of mistutorials, yet it suffers from some debilitating limitations that may render it among the least useful for SQL Server DBAs. It is… Read more

0 comments, 303 reads

Posted in Multidimensional Mayhem on 20 January 2015

Outlier Detection with SQL Server, part 3.2: GESD

By Steve Bolton

…………In the last edition of this amateur series of self-tutorials on finding outlying values in SQL Server columns, I mentioned that Grubbs’ Test has a number of limitations that sharply constrain its usefulness to DBAs. The Generalized Extreme Studentized Deviate Test (GESD) suffers from some of… Read more

0 comments, 5,393 reads

Posted in Multidimensional Mayhem on 17 December 2014

Outlier Detection with SQL Server, part 3.1: Grubbs’ Test


By Steve Bolton

…………In the last two installments of this series of amateur self-tutorials, I mentioned that the various means of detecting outliers with SQL Server might best be explained as a function of the uses cases, the context determined by the questions one chooses to ask of the… Read more

2 comments, 6,248 reads

Posted in Multidimensional Mayhem on 29 November 2014

Outlier Detection with SQL Server, part 2.2: Modified Z-Scores

By Steve Bolton

…………There are apparently many subtle variations on Z-Scores, a ubiquitous measure that is practically a cornerstone in the foundation of statistics. The popularity and ease of implementation of Z-Scores are what made me decide to tackle them early on in this series of amateur self-tutorials, on… Read more

0 comments, 561 reads

Posted in Multidimensional Mayhem on 13 November 2014

Outlier Detection with SQL Server, part 2.1: Z-Scores

By Steve Bolton

…………Using SQL Server to ferret out those aberrant data points we call outliers may call for some complex T-SQL, Multidimensional Expressions (MDX) or Common Language Runtime (CLR) code. Yet thankfully, the logic and math that underpin the standard means of outlier detection I’ll delve into in… Read more

2 comments, 857 reads

Posted in Multidimensional Mayhem on 28 October 2014

Outlier Detection with SQL Server, part 1: Finding Fraud and Folly with Benford’s Law

By Steve Bolton

…………My last blog series, A Rickety Stairway to SQL Server Data Mining, often epitomized a quip by University of Connecticut statistician Daniel T. Larose, to the effect that “data mining is easy to do badly.”[1] It is clear that today’s sophisticated mining algorithms can still… Read more

2 comments, 1,670 reads

Posted in Multidimensional Mayhem on 19 September 2014

Stay Tuned…for a SQL Server Tutorial Series Juggling Act

by Steve Bolton

…………If all goes according to plan, my blog will return in a few weeks with two brand new series, Using Other Data Mining Tools with SQL Server and Information Measurement with SQL Server. Yes, I will be attempting what amounts to a circus act among SQL… Read more

0 comments, 306 reads

Posted in Multidimensional Mayhem on 1 July 2014

Older posts