Normal Data

Steve Jones, 2018-10-10 (first published: 2015-01-28)

This editorial was originally published on Jan 28, 2015. It is being republished as Steve is out of the office.

I thought this was an interesting post: Your data is not normal. It talks about the fact that we often assume a normal distribution when working with data. We seem to think we can extrapolate, often in a linear fashion, based on past experience. We often use functions and write queries that assume "normal" means some standard distribution.

However, that's often not the case. Assuming a standard distribution is not necessarily even the best way to begin looking at data for patterns. We might need to apply a number of different types of algorithms to a set of data to see what types of analysis might be appropriate. Not sure what I mean? Let me give you a few examples.

As a speaker, I often get feedback from my sessions as a series of forms where people have rated the session on a scale and possibly included comments. Many speakers receive the data in the same format, but I've often seen speakers get one form that rates them poorly, or has a complaint. The speaker will then view that one item as somehow representative of the session, despite the fact that they might have ten forms with high ratings. They're somehow assigning more weight to the negative rating than to all the others. There's perhaps some value in doing so, since there might be a legitimate complaint, but often we need to discard this one form as an outlier.
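To put some numbers on the feedback example, here is a minimal sketch in Python. The ratings are made up for illustration (ten high scores and one poor one on a 1-5 scale), and comparing the mean with the median is just one simple way to see how much a single form moves the summary.

```python
from statistics import mean, median

# Hypothetical feedback forms on a 1-5 scale: ten high ratings and one poor one.
ratings = [5, 5, 4, 5, 5, 4, 5, 5, 4, 5, 1]

average = mean(ratings)    # pulled down by the single low score
middle = median(ratings)   # the middle form, barely affected by one outlier
share_high = sum(r >= 4 for r in ratings) / len(ratings)

print(f"mean={average:.2f}  median={middle}  rated 4 or higher={share_high:.0%}")
```

The median and the share of high ratings describe the typical attendee. The low form is still worth reading for a legitimate complaint, but it shouldn't define the session.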

As another example, I worked with an import firm that was trying to estimate future purchases based on past ones for ordering purposes. Initially they looked at a rolling average of the last three months to place future orders. However, I pointed out that we had some seasonal fluctuations in our business. There were a few times a year when the rolling average would dramatically underestimate or overestimate the product we needed. As we dug deeper into our historical data, we learned that a better estimate required some level of complexity in our formulas, but we also realized we needed to allow for human overrides when other information was available. That was a fun project, but one that was quite complex when we analyzed the data.
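The rolling-average problem is easy to reproduce. The sketch below is not the formula the firm ended up with. It uses invented monthly sales with a December spike, and it compares a three-month rolling average against a simple "same month last year" forecast, which is an assumption chosen only to show the seasonal effect. The function names are hypothetical.

```python
from statistics import mean

# Hypothetical monthly unit sales for two years, with a spike every December.
sales = [100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 300,
         100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 300]

def rolling_forecast(history, window=3):
    """Forecast the next month as the mean of the last `window` months."""
    return mean(history[-window:])

def seasonal_forecast(history, season=12):
    """Forecast the next month as the value from the same month one season ago."""
    return history[-season]

# Forecast January of year two (index 12) from the twelve months before it.
jan_rolling = rolling_forecast(sales[:12])    # includes the December spike
jan_seasonal = seasonal_forecast(sales[:12])  # looks at the previous January
jan_actual = sales[12]

# Forecast December of year two (index 23) from the months before it.
dec_rolling = rolling_forecast(sales[:23])    # sees only three quiet months
dec_seasonal = seasonal_forecast(sales[:23])  # looks at the previous December
dec_actual = sales[23]

print(f"January:  rolling={jan_rolling:.0f} seasonal={jan_seasonal} actual={jan_actual}")
print(f"December: rolling={dec_rolling:.0f} seasonal={dec_seasonal} actual={dec_actual}")
```

With this data the rolling average overestimates January and underestimates December, while the seasonal lookup matches both. Real sales are noisier than this, which is where more complex formulas and human overrides come in.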

There are numerous other examples I could give, but the point is this. Learn about your data. Dig into patterns, and don't assume that normal for one set of data looks anything like normal for other sets.
