## Mala's Data Blog

### Understanding ANOVA

ANOVA, or analysis of variance, is the name given to a set of statistical models used to analyze differences among groups and to test whether those differences are statistically significant. The models were developed by statistician and evolutionary biologist Ronald Fisher. To give a… Read more

0 comments, 1,340 reads

Posted in Mala's Data Blog on 9 October 2017
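As a rough illustration of what "analyzing differences among groups" means in practice, here is a minimal Python sketch of the one-way ANOVA F statistic (the post itself is not shown here, so this is not its code). The function name `one_way_anova_f` and the three small groups are hypothetical. The sketch stops at the F statistic; turning it into a p-value needs the F distribution, which is left out.

```python
from statistics import fmean

def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups."""
    all_values = [x for group in groups for x in group]
    grand_mean = fmean(all_values)
    k = len(groups)       # number of groups
    n = len(all_values)   # total number of observations

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within

# Hypothetical data: three groups of three observations each.
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
f_stat = one_way_anova_f(groups)
print(f"F = {f_stat:.2f}")
```

A large F means the group means differ by more than the spread within the groups would suggest.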

### Box-and-whisker plot and data patterns with R and T-SQL

R is particularly good at drawing graphs from data. Some graphs are familiar to most DBAs because we have seen and used them over time – bar charts, pie diagrams and so on. Some are not. Understanding exploratory graphics is vitally important to the R programmer/data science… Read more

0 comments, 1,349 reads

Posted in Mala's Data Blog on 25 September 2017

### Confidence Intervals for a proportion – using R

What is the difference between reading numbers as they are presented, and interpreting them in a mature, deeper way? One way to approach the latter is what statisticians call a ‘confidence interval’.

Suppose I look at a sample of 100 Americans who are asked if they approve… Read more

0 comments, 786 reads

Posted in Mala's Data Blog on 18 September 2017
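To make the idea concrete, here is a minimal sketch of a normal-approximation (Wald) confidence interval for a proportion, written in Python rather than the R used in the post. The function name `proportion_ci` and the figure of 60 approvals out of 100 are illustrative assumptions, not numbers from the post.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Wald (normal-approximation) interval for a proportion.

    z=1.96 corresponds to roughly 95% confidence.
    """
    p_hat = successes / n
    std_err = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * std_err
    return p_hat - margin, p_hat + margin

# Hypothetical sample: 60 of 100 respondents approve.
low, high = proportion_ci(60, 100)
print(f"95% CI: {low:.3f} to {high:.3f}")
```

The Wald interval is the simplest choice and works reasonably when the sample is not tiny and the proportion is not close to 0 or 1; other intervals exist for those cases.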

### What is networking, really?

I am still trying to get up to speed on blogging after a gap. Today I managed to push myself to write some R code and test it, and it worked. I am getting there, although it needs more work to turn it into a blog post. So, here is another on… Read more

0 comments, 1,176 reads

Posted in Mala's Data Blog on 10 September 2017

### 14 years of Summit…

I have been trying to get my blogging going again after a gap of two months. It has been incredibly hard. To warm up, I decided to try some non-technical posts. One of them is stuff I have been wanting to write for a long time – with this year… Read more

0 comments, 154 reads

Posted in Mala's Data Blog on 4 September 2017

### Getting back to blogging

The past two months have been very hectic for me. I had an unexpected job offer towards the end of July, which I gladly accepted – that was followed by some much-needed home renovation, and a long vacation/tour of the west coast with my beloved sister. All of this has… Read more

0 comments, 180 reads

Posted in Mala's Data Blog on 28 August 2017

### Understanding Relative Risk – with T-SQL

In this post we will explore a common statistical term – Relative Risk, otherwise called Risk Ratio. Relative Risk is a term that is important to understand when you are doing comparative studies of two groups that are different in some specific way. The most common usage of this is… Read more

0 comments, 1,504 reads

Posted in Mala's Data Blog on 19 June 2017
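For readers who want the arithmetic behind the term, here is a minimal Python sketch (the post itself uses T-SQL). Relative risk is the risk of an outcome in one group divided by the risk in the other. The function name `relative_risk` and the counts are hypothetical.

```python
def relative_risk(exposed_events, exposed_total, control_events, control_total):
    """Risk in the exposed group divided by risk in the comparison group."""
    risk_exposed = exposed_events / exposed_total
    risk_control = control_events / control_total
    return risk_exposed / risk_control

# Hypothetical counts: 30 of 100 exposed and 10 of 100 unexposed
# people experience the outcome.
rr = relative_risk(30, 100, 10, 100)
print(f"relative risk: {rr:.2f}")
```

A value above 1 means the outcome is more common in the first group, a value below 1 means it is less common, and 1 means no difference.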

### Cochran-Mantel-Haenszel Method with T-SQL and R – Part I

This test is an extension of the Chi Square test I blogged about earlier. It is applied when we have to compare two groups over several levels and the comparison may involve a third variable.

Let us consider a cohort study as an example – we have two medications A and… Read more

0 comments, 191 reads

Posted in Mala's Data Blog on 12 June 2017

### Dataset for Cochran-Mantel-Haenszel Test

Below is the script to create the table and dataset I used. This is just test data and not copied from anywhere.

USE [yourdb] GO /****** Object: Table [dbo].[DrugResponse] Script Date: 6/12/2017 6:45:46 AM ******/ SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO SET ANSI_PADDING ON GO CREATE TABLE [dbo].[DrugResponse](…

0 comments, 166 reads

Posted in Mala's Data Blog on 12 June 2017

### Fisher’s Exact Test – with T-SQL and R

This post is a long overdue second part to the post on Chi Square Test that I did a few months ago. This post addresses relationships between two categorical variables, but in cases where data is sparse, and the numbers (in any cell) are less than 5. The Chi Square… Read more

3 comments, 1,104 reads

Posted in Mala's Data Blog on 22 May 2017

### SQL Saturday Louisville precon – interview Andy Leonard

This will be year #9 of SQL Saturdays in Louisville. Every year (starting with the 3rd or 4th), it has been a tradition to do ‘precons’ on Fridays. For those who don’t know – precons are day-long paid training sessions by an expert in the subject, held on the Friday… Read more

0 comments, 213 reads

Posted in Mala's Data Blog on 8 May 2017

### SQL Saturdays – down memory lane

A casual Twitter conversation with Karla Landrum and some other peeps led me down memory lane on older events. Our SQL Saturday at Louisville will be 9 years old this year. We were event #23, in 2009. SQL Saturdays started two years before, in 2007.

Our first event was held at… Read more

0 comments, 214 reads

Posted in Mala's Data Blog on 1 May 2017

### The Birthday Problem – with T-SQL and R

When I was on SQLCruise recently, Buck Woody made an interesting statement: that in a room of 23 people, there is over a 50% chance that two or more share a birthday. And sure enough, we did end up having more than… Read more

0 comments, 1,573 reads

Posted in Mala's Data Blog on 24 April 2017
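The 23-person claim is easy to check. Here is a minimal Python sketch (the post itself uses T-SQL and R) that computes the chance of at least one shared birthday as one minus the chance that everyone's birthday is different. The function name is hypothetical, and the calculation assumes 365 equally likely birthdays with no leap years.

```python
def shared_birthday_probability(people, days=365):
    """Probability that at least two of `people` share a birthday.

    Assumes birthdays are independent and uniform over `days` days.
    """
    p_all_distinct = 1.0
    for i in range(people):
        # The (i+1)-th person must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct

print(f"{shared_birthday_probability(23):.4f}")  # about 0.5073
```

With 22 people the probability is still just under one half, which is why 23 is the number usually quoted.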

### Normal approximation to binomial distribution using T-SQL and R

In the previous post I demonstrated the use of the binomial formula to calculate probabilities of events occurring in certain situations. In this post I am going to explore the same situation with a bigger sample set. Let us assume, for example, that instead of 7 smokers we had 100 smokers. We… Read more

0 comments, 385 reads

Posted in Mala's Data Blog on 17 April 2017

### The Binomial Formula with T-SQL and R

In a previous post I explained the basics of probability. In this post I will use some of those principles to see how to solve certain problems. I will pick a very simple problem that I found in a statistics textbook. Suppose I have 7 friends who are smokers. The… Read more

3 comments, 1,543 reads

Posted in Mala's Data Blog on 10 April 2017
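As a quick reference, here is a minimal Python sketch of the binomial formula (the post itself uses T-SQL and R). It gives the probability of exactly k "successes" in n independent trials, each with success probability p. The excerpt is cut off before the textbook problem is stated, so n = 7 comes from the excerpt but p = 0.3 is purely a hypothetical value for illustration.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n = 7     # group size mentioned in the excerpt
p = 0.3   # hypothetical per-person probability, not from the post

for k in range(n + 1):
    print(f"P(X = {k}) = {binomial_pmf(k, n, p):.4f}")
```

The probabilities over k = 0 to n add up to 1, which is a handy sanity check on any implementation.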

### Sampling Distribution and Central Limit Theorem

In this post I am going to explain (in highly simplified terms) two very important statistical concepts – the sampling distribution and the central limit theorem.

The sampling distribution is the distribution of means collected from random samples taken from a population. So, for example, if I have a population of life… Read more

0 comments, 270 reads

Posted in Mala's Data Blog on 3 April 2017
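A small simulation can make both ideas visible. The Python sketch below draws many random samples from a deliberately skewed population and looks at the means of those samples. The population (exponential, with a mean of about 70), the sample size of 50, and the function name `sample_means` are all assumptions made for illustration, not data from the post.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical, deliberately skewed population (exponential, mean about 70).
population = [random.expovariate(1 / 70) for _ in range(20_000)]

def sample_means(values, sample_size, n_samples):
    """Draw n_samples random samples and return the mean of each one."""
    return [
        statistics.fmean(random.sample(values, sample_size))
        for _ in range(n_samples)
    ]

means = sample_means(population, sample_size=50, n_samples=2000)

pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

print(f"population mean:         {pop_mean:.2f}")
print(f"mean of sample means:    {statistics.fmean(means):.2f}")
print(f"population sd / sqrt(n): {pop_sd / math.sqrt(50):.2f}")
print(f"sd of sample means:      {statistics.stdev(means):.2f}")
```

The sample means should centre on the population mean, and their spread should be close to the population standard deviation divided by the square root of the sample size, even though the population itself is far from bell-shaped.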

### Basics of Probability

In this post I am going to introduce some of the basic principles of probability and use them in other posts going forward. Quite a number of people would have learned these things in high school math and then forgotten – I personally needed a refresher. These concepts are… Read more

0 comments, 2,358 reads

Posted in Mala's Data Blog on 20 March 2017

### TSQL2sday – Daily database WTF

This month’s TSQL Tuesday is organized by Kennie T Pontoppidan – the topic is ‘Daily Database WTF’ – or a horror story from the database world. As someone who has worked with databases for nearly two decades, I have several of these – I picked one of… Read more

0 comments, 264 reads

Posted in Mala's Data Blog on 12 March 2017

### Generating Frequency Table

This week’s blog post is rather simple. One of the main characteristics of a data set involving classes, or discrete variables, is its frequencies. The number of times each data element or class is observed is called its frequency. A table that displays the discrete variable and number of times… Read more

0 comments, 2,294 reads

Posted in Mala's Data Blog on 6 March 2017
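Here is a minimal Python sketch of a frequency table, counting how often each class appears and what share of the total it represents. The function name `frequency_table` and the list of grades are hypothetical; the post itself is not shown here, so this is an illustration rather than its code.

```python
from collections import Counter

def frequency_table(values):
    """Return (value, count, relative frequency) rows, sorted by value."""
    counts = Counter(values)
    total = len(values)
    return [(value, count, count / total) for value, count in sorted(counts.items())]

# Hypothetical discrete variable: letter grades.
grades = ["B", "A", "C", "B", "A", "B"]

print("value\tcount\tshare")
for value, count, share in frequency_table(grades):
    print(f"{value}\t{count}\t{share:.2f}")
```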

### The Empirical Rule

I am resuming technical blogging after a gap of nearly a month. I will continue to blog my re-learning of statistics and basic concepts, and illustrate them to the best of my ability using R and T-SQL where appropriate.

For this week I have chosen a statistical concept called… Read more

0 comments, 1,870 reads

Posted in Mala's Data Blog on 27 February 2017
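The empirical rule says that, for a normal distribution, roughly 68%, 95% and 99.7% of values fall within one, two and three standard deviations of the mean. Here is a minimal Python sketch that computes those shares from the error function. The function name `share_within` is hypothetical, and the post itself may illustrate the rule differently.

```python
import math

def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sd: {share_within(k):.4f}")
```

The computed shares are about 0.6827, 0.9545 and 0.9973, which round to the familiar 68-95-99.7 figures.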