Data Quality

  • I do agree with the points about data quality, and in certain industries it does seem that technology hasn't been fully embraced.

    However, as IT professionals, we can make a significant difference with the systems we deliver by putting in strict validation and making data entry elements mandatory to prevent essential facts from being omitted.

    I know making an application "idiot proof" isn't an exact science, nor is it always possible to second-guess every crazy thing an end user will try, but we can definitely make a difference.

    Also, the example of estate (real) agents may also be partly down to a liberal attitude to the truth in some cases - sales people huh 🙂
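    The strict-validation idea above can be sketched as a simple server-side check. This is a minimal illustration only; the field names and rules are invented, not taken from any real listing system:

```python
# Minimal sketch of mandatory-field validation for a property listing form.
# REQUIRED_FIELDS and the value rules are hypothetical, for illustration only.
REQUIRED_FIELDS = {"address", "price", "bedrooms", "square_feet"}

def validate_listing(listing: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the listing is acceptable."""
    errors = [f"missing required field: {f}"
              for f in sorted(REQUIRED_FIELDS - listing.keys())]
    # Reject obviously bad values as well as outright omissions.
    if "price" in listing and not (isinstance(listing["price"], (int, float))
                                   and listing["price"] > 0):
        errors.append("price must be a positive number")
    if "bedrooms" in listing and not (isinstance(listing["bedrooms"], int)
                                      and listing["bedrooms"] >= 0):
        errors.append("bedrooms must be a non-negative integer")
    return errors
```

    The point is that the record is rejected at entry time, so essential facts can't simply be left blank.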

  • It's not all data quality, much of it in the property business is "marketing" - don't reveal the negatives.

    If the norm is three parking spaces and the listing doesn't say anything people will assume the norm and view the property. Then maybe they'll still like it when they find it only has one or two spaces whereas they would not even have viewed if the listing had said only two spaces.

    I can only speak from an England and Wales perspective (Scotland may be in the UK, but it has a very different house purchasing system), but you rapidly learn to read between the lines of estate agents' blurb.

    As for data quality at work, I will always highlight issues I see in data that can be corrected, and this is usually welcomed by the relevant department because I can put data together in ways that they cannot. I consider it part of my DBA/developer role.

  • My thinking is that years ago, data integrity was not considered as important as people realize it is today. I am just finishing up a data mining class, and one of the biggest issues with target variable results is bad data. As a programmer in the food industry, I am mindful of our control limits on some of the critical data needed for GMP and FDA purposes. Hopefully, years from now, when analysts are mining data that was entered through one of my applications, they will be happy that somebody took the time to work closely with the "key" players (namely our Quality department) and put those data validations in place.

  • Since excessive data validation can sometimes cause users not to enter anything at all, another approach is to produce regular data quality reports grouped by department, user, etc.

    If management want information based on missing data, it is amazing how the quality of input can improve!
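    A report like that can be sketched in a few lines. The record shape, field names, and departments below are made up purely for illustration:

```python
from collections import Counter

# Hypothetical records: each row notes who entered it and which fields were left blank.
rows = [
    {"department": "Sales",   "entered_by": "alice", "missing": ["phone"]},
    {"department": "Sales",   "entered_by": "bob",   "missing": []},
    {"department": "Support", "entered_by": "carol", "missing": ["phone", "postcode"]},
]

def missing_by_department(rows):
    """Count missing fields per department for a periodic data quality report."""
    counts = Counter()
    for row in rows:
        counts[row["department"]] += len(row["missing"])
    return dict(counts)
```

    The same grouping could just as easily be done per user, which is usually what gets management's attention.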

  • The problem is that 3rd party websites that aggregate listings from multiple sources can't be too choosy about their data, because they have no control over the data entry etiquette of individual agents, and they don't want to drop listings just because the agent happens to be sloppy (or even a little dishonest).

    One solution I'd suggest is that they rank and sort their listings based on the completeness of the records. This would involve not only checking for missing data but also flagging records that are statistically outside the norm: for example, a house in a new subdivision listed as having 5 bedrooms and 2,500 square feet when all the other houses have 4 bedrooms and less than 2,200 square feet, or a property listed as zoned for residential/business multi-use when comparing the zip code and street address against a county database suggests otherwise.

    These questionable records could be sorted near the bottom of the list along with a special icon or notation advising the user of a possible discrepancy.
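    One way to sketch that completeness-plus-outlier check (the z-score threshold and field name are invented for illustration, not a recommendation):

```python
from statistics import mean, stdev

def flag_outliers(listings, field, z_threshold=2.0):
    """Flag listings whose value for `field` is far from the neighbourhood norm.

    Listings missing the field are flagged for incompleteness; listings more than
    `z_threshold` standard deviations from the mean are flagged as suspect.
    Returns {listing index: reason}.
    """
    values = [l[field] for l in listings if field in l]
    mu, sigma = mean(values), stdev(values)
    flags = {}
    for i, l in enumerate(listings):
        if field not in l:
            flags[i] = "missing"
        elif sigma > 0 and abs(l[field] - mu) / sigma > z_threshold:
            flags[i] = "outlier"
    return flags
```

    Flagged listings could then be sorted to the bottom and given the discrepancy icon described above.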

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • I think it may be a liability issue. Some agents leave information, like sq ft and car spaces, off the form because they do not want to be sued if the information is wrong.

    "When in danger or in doubt, run in circles, scream and shout!" - Robert A. Heinlein
    TANSTAAFL

  • I work for a school district in the US that serves grades K-12, and I also work with 21 other agencies in my county. Data quality varies enormously between the districts. As California attempts to switch to a statewide student data system, it is becoming clear that many districts simply do not have the infrastructure to manage the data, and that for many of those that do, data quality has been an afterthought.

    What I have realized in working with our district (which, I have to say, is consistently in the top tier for quality) is that it is a joint effort between the IT staff, the administration staff, and the data entry staff themselves (clerks and teachers, for the most part).

    It is quite evident that data entry staff are only really concerned with data that affects them personally. For instance, student test scores and grades, teacher and course assignments, and attendance records are of very high quality, because they affect teachers and students on a daily basis; the attendance record also directly impacts the district financially.

    When we started cleaning up other areas several years ago, we tried to impress on staff how important their work was in the grand scheme of things, but unless they had a personal ambition to do high-quality work, many felt they couldn't spare the time.

    We use several 3rd party software systems and have many internal applications. Unfortunately, many vendors just do not put a value on data quality. Some might argue that we should switch vendors, but ours is actually one of the best (some others are far worse), and switching would be prohibitive in both time and money. Some of our most trusted systems allow users to enter bad data all the time, and since we have many custom database services that move our data around, bad data in one system often causes errors when it moves to other systems.

    One of the best things we did was to create our own internal audit tracking system. It tracks hundreds of different data points every day and emails reports to the relevant administration staff. Having an audit system that catches errors within hours (not months) has proven very effective: as users start to get reports on errors they entered the day before, they take a personal interest in their data and get better at verifying what they enter in the first place.
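    The core of a daily audit job like that can be sketched roughly as follows. The checks, field names, and record shapes are invented here; a real system would run many such checks against the live databases and email the resulting report:

```python
# Rough sketch of a daily audit pass: run named checks over yesterday's rows and
# collect the failures into a per-department report (emailing is omitted).
def run_audit(rows, checks):
    """Return {department: [error messages]} for every failed check."""
    report = {}
    for row in rows:
        for name, check in checks.items():
            if not check(row):
                report.setdefault(row["department"], []).append(
                    f"row {row['id']}: failed check '{name}'"
                )
    return report

# Illustrative checks only; a real audit system would have hundreds of these.
checks = {
    "grade_in_range": lambda r: 0 <= r.get("grade", -1) <= 12,
    "has_teacher":    lambda r: bool(r.get("teacher")),
}
```

    Because each failure is tied to a specific row and check name, the report can be routed straight back to the person who entered the data.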

    We still have a long way to go, but I believe it is systems like this, which empower users and aid accountability, that keep us at the top.

  • The only way I've ever seen this work is to measure data quality, analyze it, report on it, and make it part of the annual review (a salary-and-raise issue).

    That will have zero impact on salespeople, since their whole value to the company and to themselves is sales volume, but for anyone else, it can help increase quality.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • What I like are the real estate websites where you can see actual floor plans or recorded video allowing a virtual tour. A picture is worth 1,000 data items.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Data quality is one of my biggest pet peeves. It's really not that hard to ensure quality data. The hardest part is convincing the business it's important BEFORE they want reports on the data entered. In many cases if you are at the point you're reporting on data, it's too late.

    My two personal favorites are free text fields and developers who hate foreign keys.

    For me it's part of a larger question about quality in general. What happened to doing it right the first time?

    EDIT

    Forgot to add the infamous 'Let's not make the customer enter the information correctly; instead you should just fix it with code when you're creating the report!'.
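    The foreign-key point is easy to demonstrate with an in-memory SQLite database (the table and column names below are invented for the example). With the constraint in place, the bad row is rejected at entry time instead of being "fixed" with code at report time:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only with this pragma
conn.execute("CREATE TABLE status (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    status_id INTEGER NOT NULL REFERENCES status(id)
)""")
conn.execute("INSERT INTO status VALUES (1, 'open'), (2, 'closed')")

conn.execute("INSERT INTO orders VALUES (100, 1)")      # valid: status 1 exists
try:
    conn.execute("INSERT INTO orders VALUES (101, 99)")  # no status 99: rejected
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

    A lookup table plus a foreign key is also the usual cure for the free-text-field problem: users pick from known values instead of typing variations of them.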

    -------------------------------------------------------------------------------------------------
    My SQL Server Blog

  • My wife is a real estate agent, and she places a great deal of importance on the accuracy of the data she enters into MLS. She does the due diligence before entering the record in the system. That takes effort and footwork, something not always attended to by some other agents.

    When you come right down to it, the efficacy of a human-driven data system is dependent on the dedication and attention to detail provided by the human. I'm convinced that garbage data is the sign of laziness, and I'd be hesitant to work with an agent who doesn't take pride in their work, MLS listings included.
