SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 

Normal Data

By Steve Jones,

This editorial was originally published on Jan 28, 2015. It is being republished as Steve is out of the office.

I thought this was an interesting post: Your data is not normal. It talks about the fact that we often assume some normal distribution when working with data. We seem to think we can extrapolate, often in a linear fashion, based on past experience. We often use functions and write queries that tend to think of normality as having some standard distribution.

However that's not often the case. It's not necessarily even the best way to begin looking at data for patterns. We might need to apply a number of different types of algorithms to a set of data to see what types of analysis might be appropriate. Not sure what I mean? Let me give you a few examples.

As a speaker, I often get feedback from my sessions as a series of forms where people have rated the session on a scale and possibly included comments. Many speakers receive the data in the same format, but I've often seen speakers get one form that rates them poorly, or has a complaint. The speaker will then view that one item as somehow representative of the session. This despite the fact that they might have ten forms with high ratings. They're somehow assigning more weight to the negative rating then the all the others. There's perhaps some value here in doing so, since there might be a legitimate complaint, but often we need to discard this one form as an outlier.

As another example, I worked with an import firm that was trying to estimate future purchases based on past ones for ordering purposes. Initially they looked at an average of the last three months (rolling) to place future orders. However I pointed out that we had some seasonal fluctuations in our business. There were a few times a year that the rolling average would dramatically under or over estimate the product we needed. As we dug deeper into our historical data, we learned that a better estimate required some level of complexity in our formulas, but we also realized we needed to allow for human overrides when other information was available. That was a fun project, but one that was quite complex when we analyzed the data.

There are numerous other examples I could give, but the point is this. Learn about your data. Dig into patterns, and don't assume that normal for one set of data looks anything like normal for other sets.

 
Total article views: 185 | Views in the last 30 days: 1
 
Related Articles
BLOG

Growing the Pool of SQL Server Speakers

In a nutshell, I’m interested in growing and diversifying the pool of speakers that focus on SQL Ser...

ARTICLE

Normalizing Performance

Determining the best performance value you can get for your effort is not as straightforward as you ...

ARTICLE

Rollbacks Are Normal

At Google, rollbacks from deployments are normal.

FORUM

Is this normalized?

...or have done incorrect table normalization?

FORUM

What is Normalizations?

What is Normalizations?

Tags
data analysis    
editorial    
 
Contribute