Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 

Goodness-of-Fit Testing with SQL Server Part 7.3: The Anderson-Darling Test

By Steve Bolton

…………As mentioned in previous installments of this series of amateur self-tutorials, goodness-of-fit tests can be differentiated in many ways, including by the data and content types of the inputs and the mathematical properties, data types and cardinality of the outputs, not to mention the performance impact… Read more

0 comments, 661 reads

Posted in Multidimensional Mayhem on 2 May 2016

Goodness-of-Fit Testing with SQL Server Part 7.2: The Lilliefors Test

By Steve Bolton

…………Since I’m teaching myself as I go in this series of self-tutorials, I often have only a vague idea of the challenges that will arise when trying to implement the next goodness-of-fit test with SQL Server. In retrospect, had I known that the Lilliefors Test was… Read more

4 comments, 1,067 reads

Posted in Multidimensional Mayhem on 15 April 2016

Goodness-of-Fit Testing with SQL Server Part 7.1: The Kolmogorov-Smirnov and Kuiper’s Tests

By Steve Bolton

…………“The names statisticians use for non-parametric analyses are misnomers too, in my opinion: Kruskal-Wallis tests and Kolmogorov-Smirnov statistics, for example. Good grief! These analyses are simple applications of parametric modeling that belie their intimidating exotic names.”[i]
                Apparently even experts like Will G. Hopkins, the author of… Read more

1 comments, 1,229 reads

Posted in Multidimensional Mayhem on 24 March 2016

Goodness-of-Fit Testing with SQL Server Part 6.2: The Ryan-Joiner Test

By Steve Bolton

…………In the last installment of this amateur series of self-tutorials, we saw how the Shapiro-Wilk Test might probably prove less useful to SQL Server users, despite the fact that it is one of the most popular goodness-of-fit tests among statisticians and researchers. Its impressive statistical power… Read more

3 comments, 229 reads

Posted in Multidimensional Mayhem on 12 March 2016

Goodness-of-Fit Testing with SQL Server Part 6.1: The Shapiro-Wilk Test

By Steve Bolton

…………Just as a good garage mechanic will fill his or her Craftsman with tools designed to fix specific problems, it is obviously wise for data miners to stockpile a wide range of algorithms, statistical tools, software packages and the like to deal with a wide variety… Read more

0 comments, 1,133 reads

Posted in Multidimensional Mayhem on 29 February 2016

Goodness-of-Fit Testing with SQL Server Part 5: The Chi-Squared Test

By Steve Bolton

…………As I’ve cautioned before, I’m writing this series of amateur self-tutorials in order to learn how to use SQL Server to perform goodness-of-fit testing on probability distributions and regression lines, not because I already know the topic well. Along the way, one of the things I’ve… Read more

0 comments, 345 reads

Posted in Multidimensional Mayhem on 13 February 2016

Goodness-of-Fit Testing with SQL Server Part 4.2: The Hosmer–Lemeshow Test with Logistic Regression

By Steve Bolton

…………The last installment of this amateur series of self-tutorials was the beginning of a short detour into using SQL Server to perform goodness-of-fit testing on regression lines, rather than on probability distributions. These are actually quite simple concepts; any college freshman ought to be able to… Read more

0 comments, 959 reads

Posted in Multidimensional Mayhem on 26 January 2016

Goodness-of-Fit Testing with SQL Server Part 4.1: R2, RMSE and Regression-Related Routines

By Steve Bolton

…………Throughout most of this series of amateur self-tutorials, the main topic has been and will continue to be in using SQL Server to perform goodness-of-testing on probability distributions. Don’t let the long syllables (or the alliteration practice in the title) fool you, because the underlying concept… Read more

0 comments, 1,119 reads

Posted in Multidimensional Mayhem on 13 January 2016

Goodness-of-Fit Testing with SQL Server, part 3.2: D’Agostino’s K-Squared Test

By Steve Bolton

…………In the last edition of this amateur series of self-tutorials on goodness-of-fit testing with SQL Server, we discussed the Jarque-Bera Test, a measure that unfortunately doesn’t scale well on datasets of the size that DBAs are accustomed to using. The problem is not with the usefulness… Read more

0 comments, 1,613 reads

Posted in Multidimensional Mayhem on 21 December 2015

Goodness-of-Fit Testing with SQL Server, part 3.1: Skewness, Kurtosis and the Jarque-Bera Test

By Steve Bolton

…………In the last installment of this series of amateur self-tutorials on using SQL Server to identify probability distributions, we saw how devices like probability plots can provide simple visual confirmation of a dataset’s shape. I considered doing a quick detour into Q-Q plots, but decided against… Read more

0 comments, 665 reads

Posted in Multidimensional Mayhem on 2 December 2015

Goodness-of-Fit Testing with SQL Server, part 2.1: Implementing Probability Plots in Reporting Services

By Steve Bolton

…………In the first installment of this series of amateur self-tutorials, I explained how to implement the most basic goodness-of-fit tests in SQL Server. All of those produced simple numeric results that are trivial to calculate, but in terms of interpretability, you really can’t beat the straightforwardness… Read more

0 comments, 696 reads

Posted in Multidimensional Mayhem on 3 November 2015

Goodness-of-Fit Testing with SQL Server, part 1: The Simplest Methods

By Steve Bolton

…………In the last series of mistutorials I published in this amateur SQL Server blog, the outlier detection methods I explained were often of limited usefulness because of a chicken-and-egg problem: some of the tests could tell us that certain data points did not fit a particular… Read more

1 comments, 1,797 reads

Posted in Multidimensional Mayhem on 17 October 2015

Outlier Detection with SQL Server, part 8: A T-SQL Hack for Mahalanobis Distance

By Steve Bolton

…………Longer code and substantial performance limitations were the prices we paid in return for greater sophistication with Cook’s Distance, the topic of the last article in this series of amateur self-tutorials on identifying outliers with SQL Server. The same tradeoff was even more conspicuous in this… Read more

0 comments, 1,666 reads

Posted in Multidimensional Mayhem on 12 September 2015

Outlier Detection with SQL Server, part 7: Cook’s Distance

By Steve Bolton[

…………I originally intended to save Cook’s and Mahalanobis Distances to close out this series not only because the calculations and concepts are more difficult yet worthwhile to grasp, but also in part to serve as a bridge to a future series of tutorials on using information… Read more

2 comments, 1,376 reads

Posted in Multidimensional Mayhem on 24 August 2015

Integrating Other Data Mining Tools with SQL Server, Part 2.2: Minitab vs. SSDM and Reporting Services

By Steve Bolton

…………Professional statistical software like Minitab can fill some important gaps in SQL Server’s functionality, as I addressed in the last post of this occasional series of pseudo-reviews. I’m only concerned here with assessing how well a particular data mining tool might fit into a SQL Server… Read more

0 comments, 528 reads

Posted in Multidimensional Mayhem on 8 July 2015

Integrating Other Data Mining Tools with SQL Server, Part 2.1: The Minuscule Hassles of Minitab

By Steve Bolton

…………It may be called Minitab, but SQL Server users can derive maximum benefits from the Windows version of this professional data mining and statistics tool – provided that they use it for tasks that SQL Server doesn’t do natively. This was one the caveats I also… Read more

0 comments, 568 reads

Posted in Multidimensional Mayhem on 30 June 2015

Integrating Other Data Mining Tools with SQL Server, Part 1.2: Finding Use Cases for WEKA

By Steve Bolton

…………As recounted in the first installment of this occasional series of amateur self-tutorials, there are some serious limitations to using Waikato Environment for Knowledge Analysis (WEKA), a popular open source data mining tool, in a SQL Server environment. The documentation is full of white space and… Read more

0 comments, 1,658 reads

Posted in Multidimensional Mayhem on 2 June 2015

Integrating Other Data Mining Tools with SQL Server, Part 1.1: The Weaknesses of WEKA

By Steve Bolton

…………The same rules that applied to my amateur mistutorial series on A Rickety Stairway to SQL Server Data Mining and Outlier Detection with SQL Server are in play for this series of occasional articles, which will provide a brief overview of using various third-party data mining… Read more

0 comments, 622 reads

Posted in Multidimensional Mayhem on 28 May 2015

Outlier Detection with SQL Server, part 6.3: Visual Outlier Detection with Reporting Services Plots and SSDM Clustering

By Steve Bolton

…………When the goal is to illustrate how just how outlying an outlier may be, the efficiency with which scatter plots represent distances really can’t be beaten. It doesn’t take any training in mathematics to look at one and notice that a few data points are further… Read more

0 comments, 406 reads

Posted in Multidimensional Mayhem on 17 May 2015

Outlier Detection with SQL Server, part 6.2: Finding Outliers Visually with Reporting Services Box Plots

 By Steve Bolton

…………Throughout this series of amateur mistutorials in using SQL Server to identify outliers, we have repeatedly seen that the existing tried-and-true methods of detection long used for such purposes as hypothesis testing are actually poorly suited for finding aberrant values in large databases. The same problem… Read more

0 comments, 597 reads

Posted in Multidimensional Mayhem on 29 April 2015

Older posts