Why Getting Data Right Matters

Question

Why Getting Data Right Matters

Louis Davidson (@drsql)

SSCommitted

Points: 1664
More actions
January 18, 2020 at 12:00 am

#3714992

Comments posted to this topic are about the item Why Getting Data Right Matters

Viewing 8 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

peter-512402 SSC Rookie Points: 33 More actions · Answer 1

Thanks for this , Louis. BTW, I have your book on DB design and enjoy it thoroughl. I appreciate you bringing attention to the importance of solid , up-front DB design (data modeling ). In this age of agile dev, many devs still fail into the trap of throwing poor models together to quickly get software out the door, only to find that once it's in production and collecting data, that "refactoring" is not simple.

skeleton567 SSCertifiable Points: 5936 More actions · Answer 2

At one company where I worked the business was the design and manufacture of large scale equipment for indoor and outdoor facilities. I was actually constrained by management from creating integrity constraints and carefully validating data as it was entered into the system. This was in the early days of CAD software, and the standard was 'let it be put into the system and we'll worry about getting it correct later'. We saw some really strange blueprints come out of the design department.

Another instance I have mentioned here before was at a company where I discovered a uniform SQL logic error in a subsystem of about a dozen reports. The underlying data was correct but the logic produced invalid results. The sad part was that the timid project managers wouldn't implement the fixes, so the invalid reporting continued until the system was retired a number of years later.

Often the validity of data both in design and logic is of greater import to us than it is to our users, in spite our ability to improve quality and prevent reentry and rework.

Rick
If you do a half-assed job of things, folks will ask 'why did this ass only do half the job?'

Robin Wilson SSCertifiable Points: 5258 More actions · Answer 3

I have had to work on a popular college MI system (still in use today) in a few places where the tables and fields all have very unclear names based around a cryptic naming convention used by some previous versions of government software it exported some of its data to, the last of which was replaced in 2013 (the previous one was about 8 years before this). Data that is supposed to be unique per student and required can be entered multiple times, not created at all or created and left NULL or blank. Other data is stored in a parent/child structure in the same table but correct hierarchy cannot be enforced. This makes useful and accurate reporting from the system a real challenge. Added to this there is little to no validation of entry and the UI design means that often entering a value and pressing save does not actually save the whole screen so people lose what they have entered and there is no warning.

Also the system has no primary keys or clustered indexes making every table a heap and reporting also really slow (to add to the fact I have to rank the possibly duplicated records in various places). I did eventually manage to convince the software company to add these clustered indexes but only after a great deal of pushback and telling me it was "special" and that clustered indexes were not required.

Database design remains poor and it still has no primary keys with a single table holding the next sequence where a lock must be placed on this table in every transaction to ensure no overlap occurs whilst the sequence is read, used then incremented, making this a serious bottleneck, especially for batch operations. This also makes inserting new data via SQL complicated as well as slow.

I feel it is important to point out flaws in structures/software in the hope that one day software companies may fix them but whilst large organisations keep paying them to maintain the substandard version of their software, there is little incentive to improve it and the people who make the purchasing decisions are rarely the ones who must use the software, with any demos based around high level concepts and strategic goals.

When the support desk close your calls with a status of "By Design" (or buy a bigger monitor when it doesn't support scaling) it is quite depressing and it is obvious they either couldn't care less or agree with you but are powerless to enact change.

The software house do have the best user groups and sales incentives so nothing changes. Their software has exactly the same interface it had 30 years ago and some of the same bugs (from checking old user guides and notes).

skeleton567 SSCertifiable Points: 5936 More actions · Answer 4

I've had the same experience with commercial software packages with their own designs. I guess there are reasons that these db' s often are lacking good design, beyond just plain incompetence. Let's face it, if they can have fewer support issues by eliminating validity check errors, their support is easier. Allow the invalid data and avoid a support complaint. If the user enters invalid data, let it go. Not our problem.

I even had a proposal from a support group one time to remove all integrity constraints from production databases so there would be fewer support issues because all problems were supposed to be discovered in development.

Life as a DBA can be scary.

Rick
If you do a half-assed job of things, folks will ask 'why did this ass only do half the job?'

Robin Wilson SSCertifiable Points: 5258 More actions · Answer 5

I think sometimes with these companies, they get a lot of income from paid support too so it may not be in their interests to make things easy to use or logical and maybe allowing things to go badly wrong is a "feature".

Robin

skeleton567 SSCertifiable Points: 5936 More actions · Answer 6

Robin Wilson wrote:

I think sometimes with these companies, they get a lot of income from paid support too so it may not be in their interests to make things easy to use or logical and maybe allowing things to go badly wrong is a "feature".
Robin

Robin, if it were my company, I would be putting top priority on reliability of systems and accuracy and safety of clients data for the simple reason that support is expensive to provide. I don't know but would not expect that many software houses actually find support to be a profit center unless they expend the effort up front on 'getting data right'.

Throughout my 42 years in IT I tried to base my efforts on the premise that support costs arise from how many support events must be handled. Lack of functional reliability and data accuracy would predictably lead to much more support effort, much more client dissatisfaction, and loss of future sales to competition.

Right now I'm sitting in my home office in front of windows looking out on a snowy day. Now, if one of those windows is broken, my furnace is going to run longer and more frequently to keep me at a comfortable temperature, and that is possible as long as I pay the heating bill. On the other hand, if I repair the window, I can have the same temperature result while saving on the fuel cost and discomfort.

During my 11 years of management responsibility combined with being a primary designer and developer, I found that fixing issues when they first occurred meant that I didn't have to spend time doing the same remediation multiple times. That's why it was so frustrating to be in positions where there was so little focus on maintaining existing systems and developers were not held responsible for their own code.

One of the greatest contributors to overall success is getting it right. Over the years I've seen any number of businesses fail, especially smaller ones, due to owners and managers failing to realize that constantly pulling resources out of a company while neglecting problems leads to almost certain failure.

One of my sons, the only one of four who is NOT in IT, started and owns an HVAC company with about sixteen employees. He knows that if they don't get service calls (support) right the first time, on the call back his expenses for travel, wages, and often even materials go on while he is not able to bill additionally for the second trip. This is even more critical in situations of built-in or contract support .

I often recommend a young developer called Sergiy who through his company Propersoft has created and maintains a group of very good financial data manipulation and conversion utilities. If I have a technical question that requires support, I usually get an email answer within a few minutes, and recently he provided a modification that fixed a problem that was not even with his software, all within a 24-hour time frame.

In contrast, one place I spent eleven years as a DBA was such that I propose the slogan:

"We may not be good, but we're slow".

This reply was modified 6 years, 5 months ago by skeleton567.

Rick
If you do a half-assed job of things, folks will ask 'why did this ass only do half the job?'

xsevensinzx One Orange Chip Points: 25560 More actions · Answer 7

I mean, easier said than done. I work primarily in data science (machine learning & AI). My job is to essentially automate, create, manage, deploy, and support these so called data infrastructures for these data scientist to use. Ideally, this article targets people like me tasked with trying to wrangle the beast.

The issue is, most data science projects start with a question about the data you don't have. Remember, MOST data warehousing projects start off by asking the business user what questions they want to solve with the database. This is what essentially will define what the data warehouse or any database for that matter will store. You're not going to just through a bunch of random data into a database and hope the business user will need it. You're going to do some discovery first, then start to do some logical planning of that data before you start implementing the physical models.

That said, sure, once you actually have an idea of what data you eventually will use, then having a structure and system in place to correctly implement and support that data is good. However, a lot of what data science is can be answering a question about the data one time and not needing it again too.

Unfortunately, the "science" part of data science is the chaotic piece that makes using best practices hard for a lot of organizations. It also makes it hard to use the traditional RDBMS in general for some too. I think a better focus on how to quickly ingest and throw away data is better time spent than trying to conform the world to a strict organizational structure of your data that will only slow the business down on critical scientific discoveries (as they say hah).