Mala's Data Blog

My name is Malathi, a.k.a. Mala. I am a DBA turned BI/data science person, and I have worked with SQL Server since version 6.5. I am also the founder of the Louisville SQL Server user group, organizer of 8 SQL Saturdays, Regional Mentor for the Northeast, and a 12-year PASS conference attendee. In my spare time I love to garden, travel, read, paint, and do yoga.

Understanding ANOVA

ANOVA, or analysis of variance, is the name given to a set of statistical models used to analyze differences among groups and to determine whether those differences are statistically significant. The models were developed by statistician and evolutionary biologist Ronald Fisher. To give a…

Posted in Mala's Data Blog on 9 October 2017
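To make the ANOVA idea concrete, here is a minimal Python sketch of the one-way case. It compares the spread between group means with the spread inside each group, which is what the F statistic summarizes. The function name `one_way_anova_f` and the three sample groups are hypothetical, chosen only for illustration; turning F into a p-value needs the F distribution, which this sketch leaves out.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n_total

    # Between-group sum of squares: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # hypothetical sample data
    print(one_way_anova_f(groups))  # (27.0, 2, 6)
```

A large F relative to the F distribution with those degrees of freedom suggests the group means really differ. In R, `summary(aov(...))` reports the F value together with its p-value.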

Box-and-whisker plot and data patterns with R and T-SQL

R is particularly good at drawing graphs from data. Some graphs are familiar to most DBAs because we have seen and used them over time: bar charts, pie charts and so on. Some are not. Understanding exploratory graphics is vitally important to the R programmer/data science…

Posted in Mala's Data Blog on 25 September 2017
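A box-and-whisker plot is drawn from a handful of summary numbers. The Python sketch below computes them, assuming the common convention that whiskers reach the most extreme points within 1.5 × IQR of the quartiles and anything beyond is plotted as an outlier. The helper `box_plot_stats` and the data are hypothetical. Quartiles use `statistics.quantiles` with `method="inclusive"`, which matches R's default `quantile()` (type 7); R's `boxplot()` instead uses Tukey's hinges from `fivenum()`, so quartiles can differ slightly on small samples.

```python
import statistics


def box_plot_stats(data):
    """Summary numbers behind a box-and-whisker plot, using 1.5 * IQR fences."""
    values = sorted(data)
    # 'inclusive' interpolation corresponds to R's default quantile() (type 7).
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observations still inside the fences.
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


print(box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))  # hypothetical data
```

For this made-up sample, the value 100 falls beyond the upper fence (14.5) and is flagged as an outlier, while the whiskers stop at 1 and 9.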

Confidence Intervals for a proportion – using R

What is the difference between reading numbers as they are presented and interpreting them in a mature, deeper way? One way to look at the latter is through what statisticians call a ‘confidence interval’.

Suppose I look at a sample of 100 Americans who are asked if they approve…

Posted in Mala's Data Blog on 18 September 2017
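A confidence interval for a proportion can be sketched with the normal (Wald) approximation: the sample proportion p̂ plus or minus z × √(p̂(1 − p̂)/n). The excerpt stops before giving the survey result, so the 60 approvals out of 100 below are hypothetical, as is the helper `wald_interval`. R's `prop.test()` uses a Wilson score interval with a continuity correction by default, so its limits will differ slightly from this simpler formula.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(successes, n, confidence=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = successes / n
    # Critical value from the standard normal, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Hypothetical survey: 60 of 100 respondents approve.
low, high = wald_interval(60, 100)
print(f"95% CI: {low:.3f} to {high:.3f}")  # 95% CI: 0.504 to 0.696
```

Read informally: if the survey were repeated many times, about 95% of the intervals built this way would contain the true approval rate.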

What is networking, really?

I am still trying to get up to speed on blogging after a gap. Today I managed to push myself to write some R code and test it, and it worked. I am getting there, although it needs more work to turn it into a blog post. So, here is another on…

Posted in Mala's Data Blog on 10 September 2017

14 years of Summit…

I have been trying to get my blogging going again after a gap of two months. It has been incredibly hard. To warm up, I decided to try some non-technical posts. One of them is something I have been wanting to write for a long time – with this year…

Posted in Mala's Data Blog on 4 September 2017

Getting back to blogging

The past two months have been very hectic for me. I had an unexpected job offer towards the end of July, which I gladly accepted. That was followed by some much-needed home renovation, and a long vacation/tour of the west coast with my beloved sister. All of this has…

Posted in Mala's Data Blog on 28 August 2017

Understanding Relative Risk – with T-SQL

In this post we will explore a common statistical term, Relative Risk, also called the Risk Ratio. Relative Risk is important to understand when you are doing comparative studies of two groups that differ in some specific way. The most common usage of this is…

Posted in Mala's Data Blog on 19 June 2017
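Relative risk is the ratio of two proportions: the risk of the outcome in the exposed (or treated) group divided by the risk in the comparison group. The Python sketch below uses a hypothetical helper `relative_risk` and made-up counts.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk of the outcome in the exposed group divided by risk in the unexposed group."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed


# Hypothetical cohort: 20 of 100 exposed and 10 of 100 unexposed people get the outcome.
print(relative_risk(20, 100, 10, 100))  # 2.0
```

A result of 2.0 means the exposed group has twice the risk. A value of 1 means no difference, and a value below 1 means the exposure is associated with lower risk.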

Cochran-Mantel-Haenszel Method with T-SQL and R – Part I

This test is an extension of the Chi Square test I blogged about earlier. It is applied when we have to compare two groups over several levels, and the comparison may involve a third variable.
Let us consider a cohort study as an example – we have two medications A and…

Posted in Mala's Data Blog on 12 June 2017
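The CMH statistic pools the 2 x 2 tables from each level (stratum) into a single chi-square test with 1 degree of freedom. This minimal Python sketch assumes each stratum is given as [[a, b], [c, d]]; the helper `cmh_statistic` and the example strata are hypothetical. The optional 0.5 continuity correction is meant to mirror what R's `mantelhaen.test()` applies by default (`correct = TRUE`) for 2 x 2 x K tables.

```python
def cmh_statistic(strata, correct=True):
    """Cochran-Mantel-Haenszel chi-square for a list of 2x2 tables [[a, b], [c, d]]."""
    delta = 0.0     # sum over strata of (observed a - expected a)
    variance = 0.0  # sum over strata of the variance of a
    for (a, b), (c, d) in strata:
        n = a + b + c + d
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        delta += a - row1 * col1 / n
        variance += row1 * row2 * col1 * col2 / (n ** 2 * (n - 1))
    # Continuity correction, only applied when |delta| is at least 0.5.
    yates = 0.5 if correct and abs(delta) >= 0.5 else 0.0
    return (abs(delta) - yates) ** 2 / variance


# Two hypothetical strata with identical tables.
strata = [[[15, 5], [5, 15]], [[15, 5], [5, 15]]]
print(cmh_statistic(strata, correct=False))
print(cmh_statistic(strata))
```

Compare the result against a chi-square distribution with 1 degree of freedom; larger values mean stronger evidence of an association that holds across the strata.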

Dataset for Cochran-Mantel-Haenszel Test

Below is the script to create the table and dataset I used. This is just test data, not copied from anywhere.

USE [yourdb]
/****** Object: Table [dbo].[DrugResponse] Script Date: 6/12/2017 6:45:46 AM ******/
CREATE TABLE [dbo].[DrugResponse](…

Posted in Mala's Data Blog on 12 June 2017

Fisher’s Exact Test – with T-SQL and R

This post is a long overdue second part to the post on the Chi Square Test that I did a few months ago. It addresses relationships between two categorical variables in cases where the data is sparse and the numbers (in any cell) are less than 5. The Chi Square…

Posted in Mala's Data Blog on 22 May 2017
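Fisher's exact test computes the p-value directly from the hypergeometric distribution instead of relying on the chi-square approximation, which is why it suits sparse tables. The Python sketch below, with the hypothetical helper `fisher_exact_two_sided`, holds the row and column totals fixed, enumerates every possible top-left cell count, and adds up the probabilities of tables no more likely than the observed one. This is the two-sided convention R's `fisher.test()` uses for 2 x 2 tables, including a small relative tolerance for ties.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Add every table no more likely than the observed one; the relative
    # tolerance keeps floating-point ties from being dropped.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= observed * (1 + 1e-7))


print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # 0.4857
```

The table [[3, 1], [1, 3]] is made up; its p-value is 34/70, far from significant, as you would expect from such a small sample.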

SQL Saturday Louisville precon – interview with Andy Leonard

This will be year #9 of SQL Saturdays in Louisville. Every year (starting with the 3rd or 4th), it has been a tradition to hold 'precons' on Fridays. For those who don't know, precons are day-long paid training sessions by an expert in the subject, held on the Friday…

Posted in Mala's Data Blog on 8 May 2017

SQL Saturdays – down memory lane

A casual Twitter conversation with Karla Landrum and some other peeps led me down memory lane to older events. Our SQL Saturday at Louisville will be 9 years old this year. We were event #23, in 2009. SQL Saturdays started two years before, in 2007.

Our first event was held at…

Posted in Mala's Data Blog on 1 May 2017

The Birthday Problem – with T-SQL and R

When I was on SQLCruise recently, Buck Woody made an interesting statement: in a room of 23 people, there is over a 50% chance that two or more share the same birthday. And sure enough, we did end up having more than…

Posted in Mala's Data Blog on 24 April 2017
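Buck Woody's claim is easy to check. The chance that everyone has a different birthday is the product (365/365) × (364/365) × (363/365) and so on, one factor per person, and the chance of at least one shared birthday is 1 minus that product. This Python sketch assumes 365 equally likely birthdays and ignores leap years; `prob_shared_birthday` is a hypothetical helper name.

```python
def prob_shared_birthday(people, days=365):
    """Probability that at least two of `people` share a birthday (uniform days)."""
    p_all_distinct = 1.0
    for i in range(people):
        # The i-th person must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1 - p_all_distinct


for n in (22, 23, 30):
    print(n, round(prob_shared_birthday(n), 4))
```

With 23 people the probability comes out at about 0.507, just over 50%, while 22 people give about 0.476.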

Normal approximation to binomial distribution using T-SQL and R

In the previous post I demonstrated the use of the binomial formula to calculate probabilities of events occurring in certain situations. In this post I am going to explore the same situation with a bigger sample. Let us assume, for example, that instead of 7 smokers we had 100 smokers. We…

Posted in Mala's Data Blog on 17 April 2017
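For large n, the binomial distribution is well approximated by a normal distribution with mean np and standard deviation √(np(1 − p)). This Python sketch compares the exact binomial probability P(X ≤ k) with the normal approximation using a 0.5 continuity correction. The 100 smokers come from the excerpt, but p = 0.2 and k = 15 are assumptions made up for illustration, as are the helper names. A common rule of thumb is that the approximation is reasonable when np and n(1 − p) are both at least 5 (some texts use 10).

```python
from math import comb, sqrt
from statistics import NormalDist


def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def normal_approx_cdf(k, n, p):
    """P(X <= k) via the normal approximation with a continuity correction."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    return NormalDist(mean, sd).cdf(k + 0.5)


# 100 smokers as in the excerpt; p = 0.2 and k = 15 are hypothetical values.
n, p, k = 100, 0.2, 15
print(round(binom_cdf(k, n, p), 4), round(normal_approx_cdf(k, n, p), 4))
```

With these assumed values the two printed probabilities are close (both roughly 0.13), and the normal version needs only the mean and standard deviation rather than a sum of 16 binomial terms.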

The Binomial Formula with T-SQL and R

In a previous post I explained the basics of probability. In this post I will use some of those principles to see how to solve certain problems. I will pick a very simple problem that I found in a statistics textbook. Suppose I have 7 friends who are smokers. The…

Posted in Mala's Data Blog on 10 April 2017
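The binomial formula gives the probability of exactly k "successes" in n independent trials, each with the same success probability p: P(X = k) = C(n, k) p^k (1 − p)^(n − k). The Python sketch below applies it to the 7 friends from the excerpt. Since the excerpt is cut off before the probability is stated, p = 0.5 is a hypothetical value, and `binomial_pmf` is an illustrative helper name. In R, `dbinom(k, n, p)` computes the same quantity.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# 7 trials as in the excerpt; p = 0.5 is a hypothetical success probability.
n, p = 7, 0.5
for k in range(n + 1):
    print(k, round(binomial_pmf(k, n, p), 4))
```

For example, with p = 0.5 the probability of exactly 2 successes out of 7 is 21/128, about 0.164.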

Sampling Distribution and Central Limit Theorem

In this post I am going to explain (in highly simplified terms) two very important statistical concepts: the sampling distribution and the central limit theorem.

The sampling distribution (of the mean) is the distribution of means collected from random samples taken from a population. So, for example, if I have a population of life…

Posted in Mala's Data Blog on 3 April 2017
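A quick simulation shows both ideas at once: draw many random samples from a skewed population and record each sample's mean. The collection of means (the sampling distribution) centers on the population mean with a spread close to σ/√n, and it looks roughly normal even though the population is not. In this Python sketch the exponential population (mean 1, standard deviation 1), the sample size of 30, the 2,000 repetitions and the `sample_means` helper are all assumptions for illustration.

```python
import random
import statistics


def sample_means(draw, sample_size, num_samples, seed=42):
    """Means of `num_samples` random samples, each of `sample_size` draws."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean(draw(rng) for _ in range(sample_size))
        for _ in range(num_samples)
    ]


# Hypothetical skewed population: exponential with mean 1 and standard deviation 1.
means = sample_means(lambda rng: rng.expovariate(1.0), sample_size=30, num_samples=2000)
print(round(statistics.fmean(means), 3))  # should be close to the population mean, 1
print(round(statistics.stdev(means), 3))  # should be close to 1 / sqrt(30), about 0.183
```

Increasing `sample_size` shrinks the spread of the means in proportion to 1/√n, which is the practical payoff of the central limit theorem.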

Basics of Probability

In this post I am going to introduce some of the basic principles of probability, which I will use in other posts going forward. Quite a number of people would have learned these things in high school math and then forgotten them; I personally needed a refresher. These concepts are…

Posted in Mala's Data Blog on 20 March 2017

TSQL2sday – Daily database WTF

This month's TSQL Tuesday is organized by Kennie T Pontoppidan, and the topic is 'Daily Database WTF', or a horror story from the database world. Having worked with databases for nearly two decades, I have several of these. I picked one of…

Posted in Mala's Data Blog on 12 March 2017

Generating Frequency Table

This week's blog post is rather simple. One of the main characteristics of a data set involving classes, or discrete variables, is frequency. The number of times each data element or class is observed is called its frequency. A table that displays the discrete variable and number of times…

Posted in Mala's Data Blog on 6 March 2017
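A frequency table simply counts how often each class appears. This Python sketch uses `collections.Counter` and adds the relative frequency (count divided by the total); the helper `frequency_table` and the class labels are hypothetical. In T-SQL, the same counts typically come from a `GROUP BY` with `COUNT(*)`.

```python
from collections import Counter


def frequency_table(values):
    """Rows of (class, frequency, relative frequency), most frequent class first."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(cls, freq, freq / total) for cls, freq in counts.most_common()]


data = ["A", "B", "A", "C", "A", "B"]  # hypothetical classes
for cls, freq, rel in frequency_table(data):
    print(cls, freq, round(rel, 3))
```

The relative frequencies always sum to 1, which makes tables of different sizes easy to compare.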

The Empirical Rule

I am resuming technical blogging after a gap of nearly a month. I will continue to blog about my re-learning of statistics and basic concepts, and illustrate them to the best of my ability using R and T-SQL where appropriate.

For this week I have chosen a statistical concept called…

Posted in Mala's Data Blog on 27 February 2017
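The empirical rule (also known as the 68-95-99.7 rule) says that for normally distributed data about 68% of values fall within one standard deviation of the mean, about 95% within two, and about 99.7% within three. This Python sketch checks those figures with `statistics.NormalDist`; the helper `share_within` is a hypothetical name.

```python
from statistics import NormalDist


def share_within(k, mean=0.0, sd=1.0):
    """Fraction of a normal distribution lying within k standard deviations of its mean."""
    dist = NormalDist(mean, sd)
    return dist.cdf(mean + k * sd) - dist.cdf(mean - k * sd)


for k in (1, 2, 3):
    print(f"within {k} sd: {share_within(k):.4f}")
```

The printed shares are about 0.6827, 0.9545 and 0.9973, and they come out the same for any mean and standard deviation, which is why the rule works as a quick sanity check on roughly normal data.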

