Calculating odds ratio with T-SQL and R

Diligentdba 46159

SSCommitted

Points: 1554
August 30, 2016 at 12:03 am

#312538

Comments posted to this topic are about the item Calculating odds ratio with T-SQL and R

Jonathan Mallia SSCertifiable Points: 5192 More actions · Answer 1

August 30, 2016 at 2:04 am

#1898233

Thanks for the article!

reece.watkins SSC Rookie Points: 36 More actions · Answer 2

Perhaps my maths are a bit rusty from years of disuse, but it seems the formulas stated and the formulas coded for the confidence levels don't match. The article states:

Lower confidence interval = Log(OR) - 1.96* Standard Error* LN(OR)

Upper confidence interval = Log(OR) + 1.96* Standard Error* LN(OR)

but the examples are coded as if the formulas should be:

Lower confidence interval = (Log(OR) - 1.96* Standard Error)* LN(OR)

Upper confidence interval = (Log(OR) + 1.96* Standard Error)* LN(OR)

My apologies if I've missed something!

Thanks and regards,

Reece Watkins
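For comparison with the formulas quoted above, here is a minimal sketch of one common textbook way to get a 95% confidence interval for an odds ratio: work on the log scale, use the standard error of ln(OR), then exponentiate back. This is not taken from the article's code. The function name `odds_ratio_ci`, the cell labels a, b, c, d and the counts in the example are all hypothetical, chosen only for illustration.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and log-scale confidence interval for a 2x2 table.

    a: exposed, with outcome      b: exposed, without outcome
    c: not exposed, with outcome  d: not exposed, without outcome
    All four counts are assumed to be greater than zero.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of ln(OR), not of the odds ratio itself.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    # Build the interval on the log scale, then convert back with exp().
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts, for illustration only.
or_est, lower, upper = odds_ratio_ci(20, 80, 10, 90)
print(f"OR = {or_est:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Note where the brackets fall in this form: 1.96 multiplies only the standard error, and the whole expression ln(OR) ± 1.96 × SE is what gets exponentiated.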

akljfhnlaflkj SSC Guru Points: 76202 More actions · Answer 3

August 30, 2016 at 6:30 am

#1898282

Thanks for the education.

Robert Sterbal SSChampion Points: 11079 More actions · Answer 4

Are you blogging about this elsewhere? I know a small group of people who might enjoy the content without the technical stuff.

412-977-3526 call/text

Diligentdba 46159 SSCommitted Points: 1554 More actions · Answer 5

Sorry for missing the brackets there. Will get that corrected, good find, thank you.

Diligentdba 46159 SSCommitted Points: 1554 More actions · Answer 6

Hello, yes, I do plan to write along the same lines at curiousaboutdata.com, my personal blog. I write one post every week and there are a few there already. I will submit the more interesting ones to be published here. My posts will have technical stuff though, as my goal is to educate and learn about T-SQL versus R too. I hope that is OK. For a basic understanding you can skip that part. Thank you.

Robert Sterbal SSChampion Points: 11079 More actions · Answer 7

That is exactly what I wanted to know. Thanks!

This is the data load part of the article? https://curiousaboutdata.com/category/sql-server/dba/

412-977-3526 call/text

Diligentdba 46159 SSCommitted Points: 1554 More actions · Answer 8

Yes it is. And I have a link to that in the article too. Thank you!!

MMartin1 One Orange Chip Points: 27879 More actions · Answer 9

It seems to me that the odds ratio is equivalent to a percentage. n1/n2 only tells me how many of the bottom subgroup there are for each one of the top subgroup. The 'odds' is relative to the total population: n1/(n1+n2). This is the odds of someone being in group n1 relative to the total population/universe. So maybe I am mistaken, but the statistical math may have been laid out incorrectly here, based on using the term "odds" incorrectly. There could have been many subgroups here, not just two.

The definition of a confidence interval is not explained as well as it could be. It reads more like a refresher for those who already know it.

Still, it is a good illustration of the basic syntactical structure of R.

----------------------------------------------------

Diligentdba 46159 SSCommitted Points: 1554 More actions · Answer 10

Hello, the difference between probability and odds is best explained here: http://mathforum.org/library/drmath/view/56706.html

I intentionally did not go into too much math on the confidence interval. This is a very basic-level post. The SQL Server world is not full of statisticians, and many people get very intimidated by too much explanation with strange terms. Keeping it simple is wholly intentional, and clearly if you know advanced statistics this is absolutely low level for you.

Thank you.
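To make the probability versus odds distinction concrete, here is a small sketch. The function names and the counts are hypothetical and are not from the article; exact fractions are used so the results are easy to check by hand.

```python
from fractions import Fraction

def probability(chances_for, chances_against):
    # Probability: chances for, out of ALL chances.
    return Fraction(chances_for, chances_for + chances_against)

def odds(chances_for, chances_against):
    # Odds: chances for, compared with chances AGAINST.
    return Fraction(chances_for, chances_against)

def odds_from_probability(p):
    # Only defined for p < 1.
    return p / (1 - p)

def probability_from_odds(o):
    return o / (1 + o)

# Hypothetical example: 1 chance for, 4 chances against.
p = probability(1, 4)   # 1/5
o = odds(1, 4)          # 1/4, often written as 1:4
print(p, o, odds_from_probability(p), probability_from_odds(o))

# Flip it: 4 chances for, 1 against. The probability stays below 1,
# while the odds go above 1.
print(probability(4, 1), odds(4, 1))
```

A probability always lies between 0 and 1, while odds can be any non-negative number. An odds ratio is then simply the odds in one group divided by the odds in another.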

MMartin1 One Orange Chip Points: 27879 More actions · Answer 11

Odds ratio is better explained here since there is a more detailed example https://en.wikipedia.org/wiki/Odds_ratio

It makes better sense now, though I wouldn't call this simplistic necessarily.

"the sql server world is not full of statisticians and many people get very intimidated by too much explanation with strange terms."

Is this a fact or an assumption? ( I am getting into statistics here 😀 )

Still, those who are new to stats are the reason I think a more involved definition of terms would be nicer. I would not underestimate the competence level of members here to pick up new knowledge.

----------------------------------------------------

Diligentdba 46159 SSCommitted Points: 1554 More actions · Answer 12

'I would not underestimate the competence level of members here to pick up new knowledge.' Neither do I, but I do think the average SQL Server person generally does not need an in-depth understanding of confidence levels and other statistical terms. R programming is picking up just now, and the vast majority of SQL Server jobs do not necessarily demand it as a skill. The problem is where to draw the line between getting terribly statistical and keeping it relevant to the readers, who are basically SQL Server people, not statisticians. That is why I like to start from simple basics and keep new terms minimal, and to take it to the level of data analysis and presenting findings rather than explaining a whole lot of raw statistics. Thanks in advance for your understanding. I appreciate insightful comments and criticism on the other posts I have coming.

MMartin1 One Orange Chip Points: 27879 More actions · Answer 13

Diligentdba 46159 (8/31/2016)
'I would not underestimate the competence level of members here to pick up new knowledge.' Neither do I, but I do think the average SQL Server person generally does not need an in-depth understanding of confidence levels and other statistical terms. R programming is picking up just now, and the vast majority of SQL Server jobs do not necessarily demand it as a skill. The problem is where to draw the line between getting terribly statistical and keeping it relevant to the readers, who are basically SQL Server people, not statisticians. That is why I like to start from simple basics and keep new terms minimal, and to take it to the level of data analysis and presenting findings rather than explaining a whole lot of raw statistics. Thanks in advance for your understanding. I appreciate insightful comments and criticism on the other posts I have coming.

No problem at all. I do appreciate the column and your feedback as well. Though if you'll allow me, I'd like to be more specific on a few items:

1.

"what can be deemed to be the most commonly used statistical concept - the odds ratio"

I still think I see probabilities way more often. I've not heard of odds ratios, for example, in election polls.

2.

"Simply put, odds are expressed as ratios while probability is expressed as a fraction or a percentage of an outcome."

Here people can still be confused: what is the difference between a ratio and a fraction? A percentage value can be 400% (4/1), whereas a ratio is always >=0 and <=1, if I recall correctly. Is this right? I think it is worth explaining with an extra line or two, to give a better understanding of the rest of the article.

3.

"We have to be able to say that 95% of the time the correlation between smoking status and health is in the range of x and y, where x and y are considered upper and lower confidence intervals."

What is meant by 95% of the time? This is what I was thinking of specifically when it came to the confidence interval. I think it means that if you repeated the experiment 100 times, the ratio would fall between your lower and upper range 95 times ==> 95/100 is a strong case for the odds ratio suggested. I am not sure if it means 95/100 smokers will develop bad health ... though I wouldn't doubt that either. 😛

Thanks again.

----------------------------------------------------

Diligentdba 46159 SSCommitted Points: 1554 More actions · Answer 14

1 & 2: Fraction: chances for / total chances. Odds: chances for : chances against. I am not debating what you have personally heard of more. In my experience people use the odds ratio a *lot*, and a lot of people find probability and math very intimidating.

That does not necessarily mean they use the right concept mathematically. I have seen many use chances as chances for / total chances, without knowing they are technically using a probability, but that is OK in my opinion at least.

3: I am 95% confident that the chances of a smoker getting cancer are between 1.82 and 2.10 times higher than those of a non-smoker.

95% is just a commonly used percentage in this context, like 20% sampling for SQL Server statistics.

http://www.mathbootcamps.com/interpreting-confidence-intervals/

"95% of the time, when we calculate a confidence interval in this way, the true mean will be between the two values. 5% of the time, it will not. Because the true mean (population mean) is an unknown value, we don’t know if we are in the 5% or the 95%. BUT 95% is pretty good so we say something like

“We are 95% confident that the mean time it takes all workers in this city to get to work is between 18.3 and 23.7 minutes.” This is a common shorthand for the idea that the calculations “work” 95% of the time."
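The "works 95% of the time" idea in the quote can be checked with a small simulation. This is only a sketch under assumptions: the population is normal, its true mean of 21.0 minutes and standard deviation of 8.0 are made up (21.0 is just the midpoint of the interval in the quote), and the function name `coverage_rate` is hypothetical. It draws many samples, builds a 1.96-standard-error interval around each sample mean, and counts how often the interval contains the true mean.

```python
import random
import statistics

def coverage_rate(true_mean=21.0, sd=8.0, n=50, trials=2000, z=1.96, seed=0):
    """Fraction of simulated intervals that contain the true mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        mean = statistics.mean(sample)
        # Standard error of the mean, estimated from the sample itself.
        se = statistics.stdev(sample) / n ** 0.5
        if mean - z * se <= true_mean <= mean + z * se:
            hits += 1
    return hits / trials

rate = coverage_rate()
print(f"Intervals containing the true mean: {rate:.1%}")
```

The printed rate should land close to 95%. Any single interval either contains the true mean or it does not; the 95% describes how often the procedure succeeds over many repetitions.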