A Lightweight, Self-adjusting, Baseline-less Data Monitor

Question

A Lightweight, Self-adjusting, Baseline-less Data Monitor

Alex Chamchourine

SSC Eights!

Points: 912
More actions
February 22, 2016 at 9:54 pm

#308659

Comments posted to this topic are about the item A Lightweight, Self-adjusting, Baseline-less Data Monitor

Viewing 10 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

kenambrose SSCrazy Points: 2513 More actions · Answer 1

I've advocated for years that many of our metrics for data quality should be based on statistical analysis rather than brute force counting of every single value in every single row.

At one of my contracts we were collecting data from a consumer "internet streaming" device that "called home" constantly while in use in millions of consumer locations.

There were times when a firmware update could - and did - unintentionally break "part" of the data collection but not all of it. For example the volume of data coming in might be normal, but a particular data value might be completely broken.

I implemented a sampling algorithm in TSQL that dynamically calculated the standard deviation for occurrence and range of selected values in a table on a daily basis and could then determine if the current/latest data load contained any value ranges that exceeded the standard deviation.

So we could quickly detect any unplanned/unexpected loss of data in the collection system. Without an automated system it would have been prohibitively expensive to manually verify the data on a daily basis...

And without statistical analysis it would have been too expensive in computing resources and time.

GPO SSCarpal Tunnel Points: 4574 More actions · Answer 2

Excellent article and some very cogent comments by kenambcrose.

...One of the symptoms of an approaching nervous breakdown is the belief that ones work is terribly important.... Bertrand Russell

Alan Burstein SSC Guru Points: 61152 More actions · Answer 3

Excellent article, well written and an easy read. I need to look at/play around with your code a little more to understand what you're doing but I love the overall approach.

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

David McKinney SSChampion Points: 10472 More actions · Answer 4

Very interesting article...having done a little reading recently on Kaggle and analysing massive datasets with Python and R, I wonder if the recent R extensions to SQL Server would not be best suited for this type of analysis? Any thoughts?

Steve Jones - SSC Editor SSC Guru Points: 740271 More actions · Answer 5

The R or Python stuff is useful if the statistical work is more complex. Most of what can be done in R can easily work in Python, which is a great language for linking systems or data together.

The R Services will allow you to run R in SQL Server without the data movement in/out of SQL, which is nice, but that's only 2016. For most people, and most systems, this won't be reality for years.

akljfhnlaflkj SSC Guru Points: 76202 More actions · Answer 6

akljfhnlaflkj

SSC Guru

Points: 76202

March 9, 2016 at 4:43 pm

#1864139

Great article, thanks.

zintp SSChasing Mays Points: 614 More actions · Answer 7

Very interesting article, that I tucked away and recently had a reason to revisit. I am not sure I am understanding it completely. Do you have any other references to the technique you are using that you could share? Is this just a standard statistical method of forming a probability distribution from actual data?

Thanks for any pointers you can provide.

logitestus SSCrazy Points: 2928 More actions · Answer 8

I just spent 30 minutes reading and re-reading this article. It is well thought out, noting it's strong points and where it can be weak. The writer presents the ideas in a clear and concise manner. Bravo! I wish more articles discussing an interesting idea were like this one.

John N Hick Ten Centuries Points: 1320 More actions · Answer 9

I am sorry that I didn't see this last year - great article!