SQLServerCentral is supported by Redgate
 
Data Quality


Steve Jones
SSC Guru (149K reputation)

Group: Administrators
Points: 149606 Visits: 19448
Comments posted to this topic are about the item Data Quality

Follow me on Twitter: @way0utwest
Forum Etiquette: How to post data/code on a forum to get the best help
My Blog: www.voiceofthedba.com
Rob L-566658
SSC-Enthusiastic (106 reputation)

Group: General Forum Members
Points: 106 Visits: 116
I agree with the points about data quality, and in certain industries it does seem that technology hasn't been fully embraced.

However, as IT professionals, we can make a significant difference with the systems we deliver by putting in strict validation and making data entry elements mandatory to prevent essential facts from being omitted.

I know making an application "idiot proof" isn't an exact science, nor is it always possible to second-guess every crazy thing an end user will try, but we can definitely make a difference.

Also, the example of estate agents (realtors) may be partly down to a liberal attitude to the truth in some cases - sales people, huh :)
P Jones
SSCrazy (2.9K reputation)

Group: General Forum Members
Points: 2916 Visits: 1525
It's not all data quality, much of it in the property business is "marketing" - don't reveal the negatives.

If the norm is three parking spaces and the listing doesn't say anything, people will assume the norm and view the property. Then maybe they'll still like it when they find it only has one or two spaces, whereas they would not even have viewed it if the listing had said only two spaces.

I can only speak from an England and Wales perspective (Scotland may be in the UK but it has a very different house purchasing system), but you rapidly learn to read between the lines of estate agents' blurb.

As for data quality at work, I will always highlight issues that I see in data that can be corrected, and this is usually welcomed by the relevant department because I can put data together in different ways that they cannot. I consider it a part of my DBA/developer role.
Brad Allison
SSC Eights! (988 reputation)

Group: General Forum Members
Points: 988 Visits: 346
My thinking is that years ago, data integrity was not as important as people are realizing it is today. I am just finishing up a data mining class and one of the biggest issues with target variable results is bad data. As a programmer in the food industry, I am mindful of our control limits on some of our critical data that is needed for GMP and FDA purposes. Hopefully years from now when the analysts are mining data that has been entered from one of my applications, they will be happy that somebody took time to work closely with the "key" players (namely our Quality department) and have those data validations in place.
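A minimal sketch of what a control-limit check at data entry might look like, in Python. The measurement name and the limit values are invented for illustration and are not real GMP or FDA limits; `ControlLimit`, `LIMITS`, and `validate_reading` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlLimit:
    lower: float
    upper: float


# Hypothetical limits, for illustration only (not real regulatory values).
LIMITS = {"cook_temp_c": ControlLimit(lower=74.0, upper=95.0)}


def validate_reading(name, value):
    """Return a list of problems; an empty list means the reading is accepted."""
    if value is None:
        return [f"{name} is required"]
    limit = LIMITS.get(name)
    if limit is None:
        return [f"no control limit defined for {name}"]
    if not limit.lower <= value <= limit.upper:
        return [f"{name}={value} is outside {limit.lower}..{limit.upper}"]
    return []
```

Returning a list of problems rather than raising lets the entry form show every issue at once instead of stopping at the first one.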
Ken McKelvey
SSCarpal Tunnel (4.2K reputation)

Group: General Forum Members
Points: 4169 Visits: 8444
Because excessive data validation can sometimes cause users not to enter anything at all, another approach is to produce regular data quality reports grouped by department, user, etc.

If management want information based on missing data, it is amazing how the quality of input can improve!
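As a rough sketch of such a report, the Python below counts missing required values per group. The field names and the `missing_data_report` function are assumptions made up for this example; in practice this would more likely be a grouped query against the database itself.

```python
from collections import defaultdict


def missing_data_report(records, required_fields, group_by="department"):
    """Count missing required values per group (e.g. department or user)."""
    report = defaultdict(lambda: {"records": 0, "missing": 0})
    for record in records:
        # Records with no group value are reported under "(unknown)".
        group = record.get(group_by) or "(unknown)"
        report[group]["records"] += 1
        report[group]["missing"] += sum(
            1 for field in required_fields if record.get(field) in (None, "")
        )
    return dict(report)
```

Passing `group_by="entered_by"` (another hypothetical field) would give the per-user view instead of the per-department one.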
Eric M Russell
One Orange Chip (29K reputation)

Group: General Forum Members
Points: 29593 Visits: 11542
The problem is that third-party websites that aggregate listings from multiple sources can't be too choosy about their data, because they have no control over the data entry etiquette of individual agents, and they don't want to drop listings just because the agent happens to be sloppy (or even a little dishonest).
One solution I'd suggest is that they rank and sort their listings based on the completeness of the records. This would involve not only checking for missing data but also flagging records that are statistically outside the norm: for example, a house in a new subdivision listed as having 5 bedrooms and 2,500 square feet when all the other houses have 4 bedrooms and less than 2,200 square feet, or a property listed as zoned for residential/business multiuse when comparing the ZIP code and street address against a county database suggests otherwise.
These questionable records could be sorted near the bottom of the list along with a special icon or notation advising the user of a possible discrepancy.
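A minimal sketch of that ranking idea in Python, under several assumptions: the field names, the choice of square footage as the only field checked for outliers, and the two-standard-deviation threshold are all illustrative, and a real site would need a more robust test with this few peers.

```python
from statistics import mean, pstdev

# Hypothetical field names; a real listing feed would define its own.
REQUIRED_FIELDS = ("bedrooms", "square_feet", "parking_spaces")


def completeness(listing):
    """Fraction of the required fields that are present (not None)."""
    filled = sum(1 for f in REQUIRED_FIELDS if listing.get(f) is not None)
    return filled / len(REQUIRED_FIELDS)


def is_outlier(value, peers, threshold=2.0):
    """True if value lies more than `threshold` std devs from the peer mean."""
    if value is None or len(peers) < 2:
        return False
    spread = pstdev(peers)
    if spread == 0:
        return value != peers[0]
    return abs(value - mean(peers)) / spread > threshold


def rank_listings(listings):
    """Complete, unflagged listings first; flagged ones sink to the bottom."""
    ranked = []
    for listing in listings:
        # Compare each listing against all the others in the same area.
        peers = [o["square_feet"] for o in listings
                 if o is not listing and o.get("square_feet") is not None]
        flagged = is_outlier(listing.get("square_feet"), peers)
        ranked.append({**listing, "score": completeness(listing),
                       "flagged": flagged})
    # False sorts before True, then higher completeness comes first.
    return sorted(ranked, key=lambda r: (r["flagged"], -r["score"]))
```

The `flagged` value is what would drive the special icon or notation; the listing is kept, just pushed down the list.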


"The universe is complicated and for the most part beyond your control, but your life is only as complicated as you choose it to be."
DougGifford
SSC-Addicted (449 reputation)

Group: General Forum Members
Points: 449 Visits: 860
I think it may be a liability issue. Some agents leave information, like sq ft and car spaces, off the form because they do not want to be sued if the information is wrong.

"When in danger or in doubt. Run in circles, scream and shout!" TANSTAAFL
"Robert A. Heinlein"
Matt Penner
SSC Rookie (31 reputation)

Group: General Forum Members
Points: 31 Visits: 114
I work for a school district in the US that serves grades K-12 and also work with another 21 agencies in my county. The data quality is extremely varied between the districts. As California attempts to switch to a statewide student data system, it is revealing that many of the districts in California simply do not have the infrastructure to even manage the data and, of those that do, for many data quality has been an afterthought.

What I have realized in working with our district (which I have to say is consistently in the top tier regarding quality) is that it is a joint effort between the IT staff, the administration staff and the data entry staff themselves (clerks and teachers for the most part).

It is quite evident that data entry staff are only really concerned with data that affects them personally. For instance, student test scores and grades, their teacher and courses and their attendance record are of very high quality. This is because this affects the teacher/student on a daily basis. The attendance record directly impacts the district financially.

As we started several years ago to clean up other areas we would encourage staff as to their importance in the grand view of things but, unless they had a personal ambition to do high quality work, many felt they couldn't spend the time.

We use several third-party software systems and have many internal applications. Unfortunately many vendors just do not put a value on data quality. Some might argue that we should switch vendors, but our vendor is actually one of the best (some others are far worse) and switching would be prohibitive in both time and money given our current investment. Some of our most trusted systems allow users to enter bad data all the time. Since we have many custom database services that move our data around, bad data in one system can often cause errors when moving to other systems.

One of the best things we did was to create our own internal audit tracking system. It tracks hundreds of different data points every day and emails out reports to the relevant administration staff. Having an audit system that catches errors within hours (not months) has proven to be very effective. As users start to get reports on errors they entered the day before, they start taking a personal investment in their data and they also get better at verifying data they enter in the first place.

We still have a long way to go, but I believe it is systems like this, which empower users and aid accountability, that help us stay consistently at the top.
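To make the audit idea concrete, here is a small Python sketch of a daily audit run. It is not the district's actual system: the two rules, the record fields, and the function names are all hypothetical, and the emailing step is left out. It only shows the shape of "run every check over yesterday's entries and group the errors by whoever entered them".

```python
from collections import defaultdict
from datetime import date


# Hypothetical audit rules: each returns an error message or None.
def check_birth_date(record):
    if record.get("birth_date") is None:
        return "birth date is missing"
    if record["birth_date"] > date.today():
        return "birth date is in the future"
    return None


def check_grade_level(record):
    grade = record.get("grade_level")
    if grade is None or not 0 <= grade <= 12:
        return "grade level must be between 0 (K) and 12"
    return None


AUDIT_CHECKS = (check_birth_date, check_grade_level)


def run_audit(records, entered_on):
    """Return {user: [(student_id, message), ...]} for one day's entries."""
    errors_by_user = defaultdict(list)
    for record in records:
        if record.get("entered_on") != entered_on:
            continue  # only audit records entered on the requested day
        for check in AUDIT_CHECKS:
            message = check(record)
            if message:
                errors_by_user[record["entered_by"]].append(
                    (record["student_id"], message)
                )
    return dict(errors_by_user)
```

Each key in the result corresponds to one report to send, so every user sees only the errors they entered themselves.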
GSquared
SSC Guru (59K reputation)

Group: General Forum Members
Points: 59309 Visits: 9730
The only way I've ever seen this work is to take the data quality, analyze it, report on it, and make it part of the annual review (salary and raise issue).

That will have zero impact on salespeople, since their whole value to the company and to themselves is sales volume, but for anyone else, it can help increase quality.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Eric M Russell
One Orange Chip (29K reputation)

Group: General Forum Members
Points: 29593 Visits: 11542
What I like are the real estate websites where you can see actual floor plans or recorded video allowing a virtual tour. A picture is worth 1,000 data items.


"The universe is complicated and for the most part beyond your control, but your life is only as complicated as you choose it to be."