SQLServerCentral Editorial

Data Science Sanity Checks

Why would any data scientist want to stop a business intelligence analyst, or anyone else in the business, from making use of company data? There are many reasons, in my experience, but they boil down to this: all unusual changes in data need careful checking before they can provide business insights.

Imagine that you're working in an IT or telecommunications business, such as online trading. The activities of the business show continuous 'normal' variance on top of an underlying trend. There is a heartbeat in all human activity that is surprisingly predictable. Occasionally, some activity sticks out as being odd, as something that isn't part of the fuzz of normal activity. You investigate, because levels of trading normally change slowly over time, so you need to be able to explain sudden changes.
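As a rough illustration of what "sticks out as being odd" might mean in practice, here is a minimal sketch that flags values far outside the recent normal variance. The function name `flag_unusual`, the seven-day window, the three-standard-deviation threshold and the sample figures are all assumptions made up for this example, not a description of any particular production check.

```python
from statistics import mean, stdev

def flag_unusual(values, window=7, threshold=3.0):
    """Return the indices of values that deviate from the mean of the
    preceding `window` values by more than `threshold` standard deviations.
    (Hypothetical helper; window and threshold are illustrative defaults.)"""
    flagged = []
    for i in range(window, len(values)):
        baseline = values[i - window:i]   # the recent 'normal' activity
        mu = mean(baseline)
        sigma = stdev(baseline)           # sample standard deviation
        if sigma == 0:
            # A perfectly flat baseline: treat any change at all as unusual.
            if values[i] != mu:
                flagged.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged

# Made-up daily trade counts with one sudden jump.
daily_trades = [100, 102, 99, 101, 103, 100, 102, 104, 101, 180, 103]
print(flag_unusual(daily_trades))  # prints [9]
```

A flag like this only tells you where to start asking questions. It says nothing about whether the cause is a data error, fraud, or a real change in the business.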

The first question is "is there an error in our data?" This will take some checking, phone calls and plenty of SQL. No, you decide, it is a real effect.

Second question: is it fraud? Every company I've ever worked for has been the focus of determined efforts at fraud, globally. Not yours, you say. How closely have you looked? This takes much head-scratching, searches through logs, calls to experts, and tapping into the gossip. Sometimes one gasps at the extent of misplaced human ingenuity.

The third question: "is someone using your trade as a money-laundering exercise?" This involves the business, as it requires careful auditing of the trades. You don't want the business to confuse normal activity with money laundering. It always ends in tears.

At that point, having eliminated the obvious three fears, we ask, "Is this a sign of a real trend from which we, as a business, could benefit?" Only now is it time to call in the BI experts.

Why all the fussiness? If you reveal to a business manager a sudden growth in a part of the business, they'll go after it just as surely as a dog goes after a rabbit. It's their nature, and good executives are intuitive and impulsive. If you show the business a trend prematurely, before you've checked the data, the consequences can be dire.

If a business expands a venture based on a trend that, in reality, arises from the activity of fraudsters exploiting the system, then a one million dollar loss can swiftly turn into a ten million dollar loss. I've seen salespeople threatened with the sack when a change in the database introduced a bug that artificially diminished the sales figures for a region. I've had to watch helplessly when a rapid increase in certain trades resulted in the business expanding and investing to exploit it, whereupon it turned out to be a transitory money-laundering swindle.

Can we ever reach the stage where we can stand aside and let the business access data without a range of checks? I don't think so. The complexity of human ingenuity in undermining complex systems is astonishing, so we can never consider data safe and sanitized without constant vigilance.

Phil Factor.
