Multidimensional Mayhem

Information Measurement with SQL Server, Part 5.1: A Digression on the Unsuitability of Fisher Information for SQL Use Cases

By Steve Bolton

…………One of the major thrusts of my various mistutorial series has been to drive home the point that SQL Server's set-based languages can be efficiently applied to a wider variety of data mining tasks than is generally appreciated. In some instances a well-indexed and coded stored…

Posted in Multidimensional Mayhem on 25 January 2019

Information Measurement with SQL Server, Part 4.3: Euclidean Distance and the Minkowski Family

By Steve Bolton

…………“The shortest distance between two points is a straight line.”
…………In this famous remark, Archimedes[1] (287-212 B.C.) managed to sum up all of the uses for Euclidean Distance, i.e. the method that laymen and data scientists ordinarily use to measure ranges of space on a…

Posted in Multidimensional Mayhem on 30 November 2018

Information Measurement with SQL Server, Part 4.2: The Jensen–Shannon Divergence and Its Relatives

By Steve Bolton

…………In the last installment of this amateur series of self-tutorials on DIY data mining metrics, we saw how a mathematical property known as absolute continuity hampered the Kullback-Leibler Divergence. It is still perhaps the most widely mentioned information distance measure based on Shannon's Entropy, but a…

Posted in Multidimensional Mayhem on 31 July 2018

Information Measurement with SQL Server, Part 4.1: The Kullback-Leibler Divergence

By Steve Bolton

…………This informal series of tutorials on how to implement a hodgepodge of information metrics in SQL Server is somewhat disorganized by necessity, given that I'm trying to cram in every noteworthy type of measure into one series while simultaneously acquainting myself with the topics along the…

Posted in Multidimensional Mayhem on 14 June 2018

Information Measurement with SQL Server, Part 3: Inverse Probability and Bayes Factors

By Steve Bolton

…………In the last segment of this series of amateur self-tutorials, we discussed how to code various ways of quantifying how much we don't know about the data in SQL Server tables and cubes. The various probabilistic entropies I translated into T-SQL in those six articles can…

Posted in Multidimensional Mayhem on 1 March 2018

Information Measurement with SQL Server, Part 2.5: Mutual, Lautum and Shared Information

By Steve Bolton

…………The sample T-SQL I posted in the last article wasn't as difficult as it looked, considering that it merely implemented the same code on the same data we used in Information Measurement with SQL Server, Part 2.1: The Uses and Abuses of Shannon's Entropy, except…

Posted in Multidimensional Mayhem on 25 January 2018

Information Measurement with SQL Server, Part 2.4: Conditional and Joint Entropy

By Steve Bolton

…………Since this series on using SQL Server to implement the whole gamut of information metrics is wide-ranging in nature, it will also be somewhat disorganized by necessity; this is doubly true, given that I'm writing it in order to learn the material faster, not because I'm…

Posted in Multidimensional Mayhem on 30 November 2017

Information Measurement with SQL Server, Part 2.3: Thermodynamic and Quantum Entropies

By Steve Bolton

…………When I was about 12 years old, I suddenly discovered football. Many lessons still awaited far in the future – such as the risks of being a Buffalo Bills fan, the explosive sound footballs make when they hit a Saguaro cactus, or the solid reasons for…

Posted in Multidimensional Mayhem on 31 October 2017

Information Measurement with SQL Server, Part 2.2: The Rényi Entropy and Its Kin

By Steve Bolton

…………I kicked off this far-ranging series on using SQL Server to quantify information by discussing two of the earliest and most important measures, the Hartley function and Shannon's Entropy. These foundations of information theory are intimately related to a more general measure, Rényi Entropy, which is…

Posted in Multidimensional Mayhem on 22 September 2017

Information Measurement with SQL Server, Part 2.1: The Uses and Abuses of Shannon’s Entropy

By Steve Bolton

…………In the first installment of this wide-ranging series of amateur tutorials, I noted that the Hartley function indeed returns "information," but of a specific kind that could be described as "newsworthiness." This week's measure also quantifies how much we add to our existing knowledge from each…

Posted in Multidimensional Mayhem on 31 August 2017

Information Measurement with SQL Server, Part 1: A Quick Review of the Hartley Function

By Steve Bolton

…………This long-delayed series of amateur self-tutorials has been in the works ever since I began writing my A Rickety Stairway to SQL Server Data Mining series, which made it clear to me that I didn't know enough about what was going on under the hood in…

Posted in Multidimensional Mayhem on 28 July 2017

Implementing Fuzzy Sets in SQL Server, Part 11: Fuzzy Addenda

By Steve Bolton

…………One of the key reasons I looked into the topic of fuzzy sets in the first place was my suspicion that T-SQL, as a set-based language, would be ideal for modeling them. That turned out to be an understatement of sorts: I definitely was not prepared…

Posted in Multidimensional Mayhem on 30 June 2017

Implementing Fuzzy Sets in SQL Server, Part 10.2: Measuring Uncertainty in Evidence Theory

By Steve Bolton

…………To avoid overloading readers with too many concepts at once, I split my discussion of Dempster-Shafer Evidence Theory into two parts, with the bulk of the data modeling aspects and theory occurring in the last article. This time around, I'll cover how fuzzy measures can be…

Posted in Multidimensional Mayhem on 31 May 2017

Implementing Fuzzy Sets in SQL Server, Part 10.1: A Crude Introduction to Dempster-Shafer Evidence Theory

By Steve Bolton

…………Early on in this series, we learned how the imprecision in natural language statements like "the weather is hot" can be modeled using fuzzy sets. Ordinarily, the membership grades assigned to fuzzy sets are not to be interpreted as probabilities, even though they're both implemented on…

Posted in Multidimensional Mayhem on 12 April 2017

Implementing Fuzzy Sets in SQL Server, Part 9: Measuring Nonspecificity with the Hartley Function

By Steve Bolton

…………Imagine how empowering it would be to quantify what you don't know. Even an inaccurate measure might be helpful in making better decisions in any area of life, but particularly in the business world, where change is the only certainty. This is where a program of…

Posted in Multidimensional Mayhem on 8 March 2017

Implementing Fuzzy Sets in SQL Server, Part 8: Possibility Theory and Alpha Cuts

By Steve Bolton

…………To get the point across that fuzzy sets require membership grades of some sort, throughout this series I've borrowed the stored procedure I coded for Outlier Detection with SQL Server, part 2.1: Z-Scores and rescaled the results on the customary range of 0 to 1. The…

Posted in Multidimensional Mayhem on 13 February 2017

Implementing Fuzzy Sets in SQL Server, Part 7: The Significance of Fuzzy Stats

By Steve Bolton

…………In the world of fuzzy sets and imprecision modeling, the concept of cardinality takes on new shades of meaning that are not applicable to ordinary "crisp" sets, i.e. those without membership grades. In the last article in this series of amateur self-tutorials, I mentioned one type of…

Posted in Multidimensional Mayhem on 20 January 2017

Implementing Fuzzy Sets in SQL Server, Part 6: Fuzzy Numbers and Linguistic Modifiers

By Steve Bolton

…………I've written several amateur tutorial series on this blog in order to more quickly absorb difficult data mining, statistical and machine learning topics, while hopefully helping other SQL Server users avoid some of my inevitable mistakes. Since I don't know what I'm talking about, I'm occasionally…

Posted in Multidimensional Mayhem on 20 December 2016

Implementing Fuzzy Sets in SQL Server, Part 5: The Mystery of the Missing Left Join

By Steve Bolton

…………Information on set operations like complements, intersections and unions is plentiful in the literature on fuzzy sets, which made the last three articles in this series of amateur self-tutorials easier to write in a certain sense. These topics are far more complex than with ordinary "crisp"…

Posted in Multidimensional Mayhem on 7 November 2016

Implementing Fuzzy Sets in SQL Server, Part 4: From Fuzzy Unions to Fuzzy Logic

By Steve Bolton

…………Fuzzy set relations carry an added layer of complexity not seen in ordinary "crisp" sets, due to the need to derive new grades for membership in the resultset from the scores in the original sets. As I explained two weeks ago in this series of amateur…