SQLServerCentral is supported by Redgate

Goodness-of-Fit Testing with SQL Server Part 4.2: The Hosmer–Lemeshow Test with Logistic Regression

By Steve Bolton

…………The last installment of this amateur series of self-tutorials was the beginning of a short detour into using SQL Server to perform goodness-of-fit testing on regression lines, rather than on probability distributions. These are actually quite simple concepts; any college freshman ought to be able to…

0 comments, 681 reads

Posted in Multidimensional Mayhem on 26 January 2016

Goodness-of-Fit Testing with SQL Server Part 4.1: R2, RMSE and Regression-Related Routines

By Steve Bolton

…………Throughout most of this series of amateur self-tutorials, the main topic has been, and will continue to be, using SQL Server to perform goodness-of-fit testing on probability distributions. Don’t let the long syllables (or the alliteration practice in the title) fool you, because the underlying concept…

0 comments, 808 reads

Posted in Multidimensional Mayhem on 13 January 2016

Goodness-of-Fit Testing with SQL Server, part 3.2: D’Agostino’s K-Squared Test

By Steve Bolton

…………In the last edition of this amateur series of self-tutorials on goodness-of-fit testing with SQL Server, we discussed the Jarque-Bera Test, a measure that unfortunately doesn’t scale well on datasets of the size that DBAs are accustomed to using. The problem is not with the usefulness…

0 comments, 1,379 reads

Posted in Multidimensional Mayhem on 21 December 2015

Goodness-of-Fit Testing with SQL Server, part 3.1: Skewness, Kurtosis and the Jarque-Bera Test

By Steve Bolton

…………In the last installment of this series of amateur self-tutorials on using SQL Server to identify probability distributions, we saw how devices like probability plots can provide simple visual confirmation of a dataset’s shape. I considered doing a quick detour into Q-Q plots, but decided against…

0 comments, 563 reads

Posted in Multidimensional Mayhem on 2 December 2015

Goodness-of-Fit Testing with SQL Server, part 2.1: Implementing Probability Plots in Reporting Services

By Steve Bolton

…………In the first installment of this series of amateur self-tutorials, I explained how to implement the most basic goodness-of-fit tests in SQL Server. All of those produced simple numeric results that are trivial to calculate, but in terms of interpretability, you really can’t beat the straightforwardness…

0 comments, 619 reads

Posted in Multidimensional Mayhem on 3 November 2015

Goodness-of-Fit Testing with SQL Server, part 1: The Simplest Methods

By Steve Bolton

…………In the last series of mistutorials I published in this amateur SQL Server blog, the outlier detection methods I explained were often of limited usefulness because of a chicken-and-egg problem: some of the tests could tell us that certain data points did not fit a particular…

0 comments, 1,688 reads

Posted in Multidimensional Mayhem on 17 October 2015

Outlier Detection with SQL Server, part 8: A T-SQL Hack for Mahalanobis Distance

By Steve Bolton

…………Longer code and substantial performance limitations were the prices we paid in return for greater sophistication with Cook’s Distance, the topic of the last article in this series of amateur self-tutorials on identifying outliers with SQL Server. The same tradeoff was even more conspicuous in this…

0 comments, 1,459 reads

Posted in Multidimensional Mayhem on 12 September 2015

Outlier Detection with SQL Server, part 7: Cook’s Distance

By Steve Bolton

…………I originally intended to save Cook’s and Mahalanobis Distances to close out this series not only because the calculations and concepts are more difficult yet worthwhile to grasp, but also in part to serve as a bridge to a future series of tutorials on using information…

2 comments, 1,258 reads

Posted in Multidimensional Mayhem on 24 August 2015

Integrating Other Data Mining Tools with SQL Server, Part 2.2: Minitab vs. SSDM and Reporting Services

By Steve Bolton

…………Professional statistical software like Minitab can fill some important gaps in SQL Server’s functionality, as I addressed in the last post of this occasional series of pseudo-reviews. I’m only concerned here with assessing how well a particular data mining tool might fit into a SQL Server…

0 comments, 404 reads

Posted in Multidimensional Mayhem on 8 July 2015

Integrating Other Data Mining Tools with SQL Server, Part 2.1: The Minuscule Hassles of Minitab

By Steve Bolton

…………It may be called Minitab, but SQL Server users can derive maximum benefits from the Windows version of this professional data mining and statistics tool, provided that they use it for tasks that SQL Server doesn’t do natively. This was one of the caveats I also…

0 comments, 416 reads

Posted in Multidimensional Mayhem on 30 June 2015

Integrating Other Data Mining Tools with SQL Server, Part 1.2: Finding Use Cases for WEKA

By Steve Bolton

…………As recounted in the first installment of this occasional series of amateur self-tutorials, there are some serious limitations to using Waikato Environment for Knowledge Analysis (WEKA), a popular open source data mining tool, in a SQL Server environment. The documentation is full of white space and…

0 comments, 1,183 reads

Posted in Multidimensional Mayhem on 2 June 2015

Integrating Other Data Mining Tools with SQL Server, Part 1.1: The Weaknesses of WEKA

By Steve Bolton

…………The same rules that applied to my amateur mistutorial series on A Rickety Stairway to SQL Server Data Mining and Outlier Detection with SQL Server are in play for this series of occasional articles, which will provide a brief overview of using various third-party data mining…

0 comments, 446 reads

Posted in Multidimensional Mayhem on 28 May 2015

Outlier Detection with SQL Server, part 6.3: Visual Outlier Detection with Reporting Services Plots and SSDM Clustering

By Steve Bolton

…………When the goal is to illustrate just how outlying an outlier may be, the efficiency with which scatter plots represent distances really can’t be beaten. It doesn’t take any training in mathematics to look at one and notice that a few data points are further…

0 comments, 330 reads

Posted in Multidimensional Mayhem on 17 May 2015

Outlier Detection with SQL Server, part 6.2: Finding Outliers Visually with Reporting Services Box Plots

By Steve Bolton

…………Throughout this series of amateur mistutorials in using SQL Server to identify outliers, we have repeatedly seen that the existing tried-and-true methods of detection long used for such purposes as hypothesis testing are actually poorly suited for finding aberrant values in large databases. The same problem…

0 comments, 443 reads

Posted in Multidimensional Mayhem on 29 April 2015

Outlier Detection with SQL Server, part 6.1: Visual Outlier Detection with Reporting Services

By Steve Bolton

…………Most of the previous articles in this series of self-tutorials on using SQL Server to find outliers required us to implement statistical formulas in order to derive measures that needed some explanation before they could be interpreted correctly. In this segment of the series, we’ll be discussing a…

0 comments, 429 reads

Posted in Multidimensional Mayhem on 21 April 2015

Outlier Detection with SQL Server, part 5: Interquartile Range

By Steve Bolton

…………The last seven articles in this series of mistutorials on identifying outlying values in SQL Server databases were clunkers, in the sense that the methods had many properties in common that made them inapplicable to the scenarios DBAs typically need them for. Chauvenet’s Criterion, Peirce’s Criterion,…

0 comments, 521 reads

Posted in Multidimensional Mayhem on 30 March 2015

Outlier Detection with SQL Server, part 4: Peirce’s Criterion

By Steve Bolton

…………In the last couple of installments of this amateur series of self-tutorials on outlier identification with SQL Server, we dealt with detection methods that required recursive recomputation of the underlying aggregates. This week’s topic, Peirce’s Criterion, also flags outliers in an iterative manner, but doesn’t require…

2 comments, 5,913 reads

Posted in Multidimensional Mayhem on 20 March 2015

Outlier Detection with SQL Server, part 3.6: Chauvenet’s Criterion

By Steve Bolton

…………This is the last of six articles I’ve segregated in the middle of my mistutorial series on identifying outlying values with SQL Server, because they turned out to be difficult to apply to the typical use cases DBAs encounter. After this detour we’ll get back on…

0 comments, 551 reads

Posted in Multidimensional Mayhem on 28 February 2015

Outlier Detection with SQL Server, part 3.5: The Modified Thompson Tau Test

By Steve Bolton

…………Based on what little experience I’ve gained from writing this series on finding outliers in SQL Server databases, I expected the Modified Thompson Tau test to be a clunker. It marries the math underpinning one of the most ubiquitous means of outlier detection, Z-Scores, with the…

0 comments, 6,981 reads

Posted in Multidimensional Mayhem on 14 February 2015

Outlier Detection with SQL Server, part 3.4: Dixon’s Q-Test

By Steve Bolton

…………In the last three installments of this amateur series of mistutorials on finding outliers using SQL Server, we delved into a subset of standard detection methods taken from the realm of statistical hypothesis testing. These are generally more difficult to apply to tables of thousands of…

0 comments, 855 reads

Posted in Multidimensional Mayhem on 30 January 2015
