One Single View

  • Comments posted to this topic are about the item One Single View

  • I very much liked it when the Balanced Scorecard was introduced at one of my previous jobs. It is a good way to get rid of this problem of asynchronicity, and it's good for business. You're forced to rethink what you do (usually improving it) AND you have to agree on standards, so everybody knows WHAT is being measured and HOW and WHEN it's being measured. It's a painstaking and lengthy process, and not all the KPIs may be as solid as they should be, but it makes for good communication and decisiveness, speeding up the business as a whole. Thank God for Kaplan and Norton.

  • We, too, have been searching for the elusive single source of truth. Invariably, the answer is that 'we need a data warehouse'. My thinking is that we already HAVE one, and spending years writing ETL packages to move data from one platform to another just so it fits within the demands of today's flavor-of-the-month BI package is silly. Not to mention costly, time consuming, error prone and, generally, so unrealistic in the end that the problems are bigger than before.

    The problem, IMHO, is not that most companies lack data warehouses; it is that they lack the ability to pick items out of those warehouses, particularly when that data resides on multiple, incompatible platforms. What is needed is a rules-based interface that can merge results from those separate platforms WITHOUT MOVING THE DATA. To begin to accomplish this, the business needs to go through the same exercise described above, defining just what means what in the business process (for example, a single definition of margin rather than definitions that fit the purposes of those who espouse them), and reap the same benefits.

    There are vendors taking this approach, and companies either benefiting or suffering from it, just as with every other approach. But I believe the concept is sound: define the business term and filter the data through it to get a result, instead of constantly shifting the data somewhere else. For every somewhere else, the number of sources of 'truth' increases, and (according to Murphy) they will never agree.
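    To make the idea concrete, here is a minimal Python sketch, assuming two hypothetical source systems whose rows are already reachable as lists of dicts (in practice they would be live queries against each platform). Each source gets an adapter that maps its own field names onto the shared business terms, and one agreed definition of gross margin is applied to whatever the adapters yield; nothing is copied into a new store. The source names, field names and figures are all illustrative.

```python
from typing import Callable, Iterable

# Hypothetical rows as they might come back from two separate platforms.
# The field names differ: each system has its own vocabulary.
erp_orders = [
    {"order_id": 1, "net_sales": 1000.0, "cost_of_goods": 600.0},
    {"order_id": 2, "net_sales": 500.0, "cost_of_goods": 450.0},
]
web_orders = [
    {"id": "W-7", "revenue": 200.0, "cogs": 120.0},
]

# Per-source adapters map each system's fields onto the shared business terms.
# The data stays where it is; only the mapping lives in the rules layer.
ADAPTERS: dict[str, Callable[[dict], dict]] = {
    "erp": lambda r: {"sales": r["net_sales"], "cost": r["cost_of_goods"]},
    "web": lambda r: {"sales": r["revenue"], "cost": r["cogs"]},
}

def gross_margin(rows: Iterable[dict]) -> float:
    """The single agreed definition: (sales - cost) / sales over all rows."""
    sales = cost = 0.0
    for row in rows:
        sales += row["sales"]
        cost += row["cost"]
    return (sales - cost) / sales if sales else 0.0

def federated(sources: dict[str, Iterable[dict]]) -> Iterable[dict]:
    """Yield normalized rows from each source on demand, without copying them anywhere."""
    for name, rows in sources.items():
        adapt = ADAPTERS[name]
        for row in rows:
            yield adapt(row)

margin = gross_margin(federated({"erp": erp_orders, "web": web_orders}))
print(f"{margin:.4f}")  # (1700 - 1170) / 1700
```

    The design choice is that the definition lives in one place and the sources stay where they are; adding a third platform means adding one adapter, not another copy of the data.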

    ------------
    Buy the ticket, take the ride. -- Hunter S. Thompson

  • There is not necessarily a 'single truth'. (And I'm not trying to sound like a po-mo nutcase here.)

    A business of any size has different functions (sales, production, maintenance, finance) that live in different worlds with different definitions. Sales as seen by the sales department (commissions, targets, etc.) can differ markedly from sales as seen by production (what can be shipped from stock, when it needs to be delivered) or sales as seen by purchasing.

    Even with finance, there is no such thing as an exact snapshot. At any given instant the company 'owes' money to vendors and employees, is 'owed' money by customers, and has some (often undetermined) amount of money pending in financial institutions. The deeper you dig, the more you realize that the seemingly precise definitions simply do not fit every item cleanly. Is a customer really a customer? Is it someone who buys one time? What if a customer is a company that gets purchased by another customer? Is it still a customer?

    Then there is the fact that information is usually (always?) incomplete. Some sales may not have been logged yet, a contract may have been signed but funds not yet sent, and there are mistakes and mischaracterizations that will be corrected as they are discovered. And there are simply the inaccuracies due to the normal time lags of information updates.

    I question the assumption in the referenced article that IT somehow has a special grasp of 'truthiness'. IT may know the data, but it does not necessarily understand the nuances of definition that different departments use. It does not necessarily understand the conditions and assumptions under which that data is gathered before it enters the system. IT may not even be in a position to determine which records (coming from different channels) should be considered duplicates.

    ...

    -- FORTRAN manual for Xerox Computers --

  • Interesting idea, Bryant, of a shifting or sliding warehouse.

    From what I understand (which might be little), the warehouse should be the central source where you've consolidated the data from your disparate systems (CRM, Sales, Finance, etc.), and that's a central, stable warehouse. It shouldn't conform to any sort of software or BI package.

    From the warehouse you might spin off various marts or cubes that deal with specific areas or departments. These might be based on some particular software or philosophy.

    Your way might work, but I'm not sure how to move Sales data into CRM to spot trends or work on analyzing the relationships between that data.

    We just stepped right through data, through metadata, and landed right in metaphysics. I think there's a huge difference between data and results on one side, and truth on the other. Truth involves context, meaning rules, definitions, and circumstances that allow the rules to be applicable. It also entails an agenda, which data intrinsically doesn't have (or shouldn't). The fact that we can't always implement systems with complete flexibility often introduces bias of one kind or another.

    I've found that most of the confusion and issues with presupposed "TRUTHS" arise because the contexts are ignored. Items that are measured according to a certain set of rules, and found to be accurate in a restricted set of circumstances, are then compared to other items, often without even checking whether the comparison can be drawn at all, whether the assumptions involved are valid, or whether the methods are sound.

    Most of the analytics systems, or BI, I've come across involve an outstanding amount of user assumptions (what's a fact? what's an important dimension?) that often go unchallenged once they are established. A lot like statistics on data - just because these were "right" at one point doesn't mean they're still appropriate even a short while later.

    I tend to be uniquely suspicious of "single views of the truth" since they often graduate into absolute truths without the rigorous review they deserve. After all - under a "single view of the truth", the earth would still be flat, and we'd still be wondering why those apples keep falling and hitting us in the head.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user... All right - what was my emergency again?

  • To achieve one true view you need "one true data source", and this is the stage we're working towards in our business, i.e. each piece of data is sourced from one particular source system only. A typical example is an employee staff number: the version on the HR system is gospel, and the time and attendance system, the payroll system, Active Directory and anything else have to be amended to match it.

    A vital piece of information is the business data dictionary, which defines where each data item is sourced. We have a number of different bought-in and home-built systems and bring the data together into data dump databases for reporting; they aren't proper data warehouses, but they are the start for building them.

    As soon as you start cross-matching the data from different systems, the discrepancies become apparent, and the decision has been made to show the bad data, because then people will question and correct it.
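    A sketch of the cross-matching step, assuming the data dictionary is kept as a simple mapping from data item to its one source system, and that each system's extract can be keyed by employee. Every non-master system is compared against the master, and discrepancies are returned for display rather than silently corrected, in the spirit of showing the bad data. System names, keys and records are hypothetical; a real match would use a stronger key than a person's name.

```python
# Business data dictionary: each data item has exactly one source system.
DATA_DICTIONARY = {"staff_number": "HR", "email": "ActiveDirectory"}

# Hypothetical staff-number extracts from each system, keyed by employee.
extracts = {
    "HR": {"Ann Lee": "E1001", "Raj Patel": "E1002", "Tom Ng": "E1003"},
    "Payroll": {"Ann Lee": "E1001", "Raj Patel": "E1020"},
    "ActiveDirectory": {"Ann Lee": "E1001", "Raj Patel": "E1002", "Tom Ng": None},
}

def discrepancies(item, extracts):
    """Compare every non-master system with the master named in the data dictionary.

    Returns (system, person, issue, value) tuples so bad data can be shown, not hidden.
    """
    master_name = DATA_DICTIONARY[item]
    master = extracts[master_name]
    found = []
    for name, records in extracts.items():
        if name == master_name:
            continue
        for person, gospel in master.items():
            if person not in records:
                found.append((name, person, "missing", gospel))
            elif records[person] != gospel:
                found.append((name, person, "mismatch", records[person]))
        # Records the master has never heard of are also suspect.
        for person in sorted(records.keys() - master.keys()):
            found.append((name, person, "not in master", records[person]))
    return found

for row in discrepancies("staff_number", extracts):
    print(row)
```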

    The recession and cost cutting have worked to our advantage, as the decision has been made not to buy expensive BI systems but to DIY using SQL Server Integration Services (SSIS) and Reporting Services (SSRS), and to expand into Analysis Services (SSAS), as most of the sources are in SQL Server anyway. Other tools that bolt onto this will be under consideration, particularly as we will probably move to Office 2010 soon.

  • I think in many cases, a 'single view of the truth' is almost impossible to define. The way we tackle the question will depend on why we want the answer.

    Take a seemingly simple question: How many staff does the company have?

    Do we count subcontracted employees, or directly employed staff only?

    How do we count part-time staff? As 1 person, or as a fraction to represent the time they work?

    Do we include or exclude women on maternity leave? How about their temporarily hired replacements?

    Do interns and students count?

    We might use different combinations depending on whether we are trying to work out how many desks to buy, how big the company is compared to last year, how many people to invite to the Xmas party, how much to budget for software licences, how many tax forms we need to complete for staff ...

    Best we can do is to agree on a single source of data. But data <> truth.
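    One way to make the "depends on why we want the answer" point explicit is to keep a named rule set per question instead of a single headcount. The Python sketch below assumes a hypothetical staff list and illustrative rules (for instance, that unpaid interns need no tax forms, and that a year-on-year size comparison uses full-time equivalents and skips temporary cover so the same role is not counted twice); the real rules are a business decision, not something the code can settle.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Person:
    name: str
    subcontracted: bool = False
    fte: float = 1.0                # fraction of a full-time week worked
    on_leave: bool = False          # e.g. maternity leave
    intern: bool = False
    temp_replacement: bool = False  # hired to cover someone on leave

# Hypothetical staff list drawn from one agreed data source.
STAFF = [
    Person("Ann"),
    Person("Bea", fte=0.5),
    Person("Cat", on_leave=True),
    Person("Dan", temp_replacement=True),
    Person("Eve", subcontracted=True),
    Person("Fay", fte=0.5, intern=True),
]

FLAGS = ("subcontracted", "on_leave", "intern", "temp_replacement")

# Illustrative rule sets: True means "people in this category are counted".
RULES = {
    "desks":      {"subcontracted": True,  "on_leave": False, "intern": True,  "temp_replacement": True,  "use_fte": False},
    "xmas_party": {"subcontracted": True,  "on_leave": True,  "intern": True,  "temp_replacement": True,  "use_fte": False},
    "tax_forms":  {"subcontracted": False, "on_leave": True,  "intern": False, "temp_replacement": True,  "use_fte": False},
    "size":       {"subcontracted": False, "on_leave": True,  "intern": False, "temp_replacement": False, "use_fte": True},
}

def headcount(staff, purpose):
    """Count staff under the named purpose's rules, either as heads or as FTE."""
    rule = RULES[purpose]
    total = 0.0
    for person in staff:
        # Skip anyone in a category this purpose does not count.
        if any(getattr(person, flag) and not rule[flag] for flag in FLAGS):
            continue
        total += person.fte if rule["use_fte"] else 1
    return total

for purpose in RULES:
    print(purpose, headcount(STAFF, purpose))
```

    With this sample data the answers are 5 desks, 6 party invitations, 4 tax forms and 2.5 FTE: four defensible answers from one data source.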

  • The single view of the truth is something I think all DBAs and "data people" want. We want to know that a customer is a customer is a customer. We want to normalize data and have relations that ensure we aren't duplicating data.

    Having one version of the truth begins with the business agreeing on and documenting what constitutes a fact and how facts make a higher truth. Within the database, tables must be created such that they contain one and only one record of each fact. A properly designed table has check and referential constraints that prevent a fact that isn't logically true from being inserted in the first place. I've worked on databases where there was no single version of the truth, and this inconsistency wasn't just between departments, it was inconsistent within the primary transactional tables themselves.
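    A small illustration of constraints rejecting facts that can't be true, using Python's built-in sqlite3 module so it runs anywhere. The post is about SQL Server, but CHECK, UNIQUE and FOREIGN KEY constraints are standard SQL with direct SQL Server equivalents. The table and column names are hypothetical. Note that SQLite only enforces foreign keys when the foreign_keys pragma is switched on for the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is on
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    tax_ref     TEXT NOT NULL UNIQUE          -- one row per real-world customer
);
CREATE TABLE sale (
    sale_id     INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    quantity    INTEGER NOT NULL CHECK (quantity > 0),
    unit_price  NUMERIC NOT NULL CHECK (unit_price >= 0)
);
""")
conn.execute("INSERT INTO customer VALUES (1, 'GB123')")
conn.execute("INSERT INTO sale VALUES (1, 1, 3, 9.99)")

# Each of these describes a "fact" that cannot be logically true.
attempts = [
    ("duplicate customer", "INSERT INTO customer VALUES (2, 'GB123')"),
    ("unknown customer",   "INSERT INTO sale VALUES (2, 99, 1, 5.00)"),
    ("negative quantity",  "INSERT INTO sale VALUES (3, 1, -2, 5.00)"),
]
rejected = []
for label, sql in attempts:
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError as exc:
        rejected.append(label)
        print(f"{label}: rejected ({exc})")
```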

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Thanks for that example. Truth is hard, and it gets even harder when we want to communicate the truth. There is so much room for interpretation. If a news report of a crash said there were 114 passengers on the plane, I wonder whether they would count the cello or miss a crew member.

    I work for a water and sewer district and will get requests like "I need to know how many connections we have in a city." It gets tricky because until you know the context and the purpose of the query that term connection has a lot of ambiguity, not to mention the term city.

    What is even worse is that I may save my query or report as, say, "ConnectionsByCity", and someone else will reuse that query in a different context, thinking it is appropriate because they don't question the terms.

    It would be nice to have the different definitions of "connection", and "city" like the airline does for "passenger" and then to have everyone know, or at least know of them. But that would mean work and toil and it seems that most don't see the problem as large enough or the benefits significant enough to work through these issues. Instead we toil away at this on a case by case basis, coming up with different definitions over and over again. Or we make poor assumptions and entertain false perspectives and make poor choices.
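    A sketch of the airline-style approach for terms like "connection" and "city", assuming a hypothetical shared glossary in which each agreed meaning gets its own key. The helper refuses to name a report after an undefined term and bakes the definitions it used into the report name, so a later user of "ConnectionsByCity" can see which meanings it assumed. The glossary entries are invented for illustration; the real definitions would come from the business.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Definition:
    term: str
    meaning: str

# Hypothetical glossary: one entry per agreed meaning, keyed "term/variant".
GLOSSARY = {
    "connection/billed": Definition("connection", "an active account billed for service"),
    "connection/physical": Definition("connection", "a service line tapped into a main, billed or not"),
    "city/limits": Definition("city", "inside the incorporated city boundary"),
    "city/postal": Definition("city", "the city named in the mailing address"),
}

def report_name(base, *keys):
    """Return a report name that records which glossary definitions it depends on."""
    missing = [k for k in keys if k not in GLOSSARY]
    if missing:
        raise KeyError(f"undefined terms: {missing}")
    return f"{base}[{', '.join(keys)}]"

def describe(name):
    """Look up the definitions embedded in a name built by report_name()."""
    keys = name[name.index("[") + 1 : -1].split(", ")
    return {k: GLOSSARY[k].meaning for k in keys}

name = report_name("ConnectionsByCity", "connection/billed", "city/limits")
print(name)
print(describe(name))
```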

  • Todd Payne (7/12/2012)


    Thanks for that example. Truth is hard, and it gets even harder when we want to communicate the truth. There is so much room for interpretation. If a news report of a crash said there were 114 passengers on the plane, I wonder whether they would count the cello or miss a crew member.

    ...

    Imagine how an emergency response team would feel if they rescued 113 passengers from a plane crash, then spent an additional hour searching the burning wreckage for the elusive 114th passenger, only to be informed later that passenger 114 was actually someone's cello.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell (7/12/2012)


    Imagine how an emergency response team would feel if they rescued 113 passengers from a plane crash, then spent an additional hour searching the burning wreckage for the elusive 114th passenger, only to be informed later that passenger 114 was actually someone's cello.

    Or imagine if they recovered 114 bodies of 114 passengers, called off the search, and then found out later that there had been an additional five or six "babies in laps" left in the wreckage.

    Definitions have to be made by context and by comparison to surrounding data. Languages simply work that way. That's why any view of "truth" or "data" has to include contextual definitions to have real utility.

    The statement "we found 5 bodies" means completely different things in rescue work, in hiring temps, and in astronomy.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Ack. My business is currently dealing with just such a problem, and it's giving the accounting team fits. For one marketplace that we sell our goods in, the marketplace is several timezones over, and our sales are calculated on a weekly basis. Hence, the definition of a "week" becomes rather contentious. A week begins at midnight on Sunday, yes, but is that midnight our time or their time?

    This is made rather tricky because we actually get quite a few sales on that channel right around midnight, and those end up tumbling all over the place in the financial reports. On top of that, there's the problem that, even if a sale happens at, say, 5:00 in the afternoon, we may not get the report containing that order for a day or two. Or ten, in some bizarre cases. In that case, when did the order come in? Did it come in when the marketplace recorded the payment, or did it come in when we got the report?
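    The week-boundary problem can be shown in a few lines of Python. The sketch assumes weeks start at Sunday midnight, uses fixed summer-time UTC offsets for "our" time zone and the marketplace's purely for illustration (real code should use zoneinfo.ZoneInfo so daylight-saving changes are handled), and takes a made-up sale recorded at 02:30 UTC on a Sunday that only reached us in a report three days later.

```python
from datetime import date, datetime, timedelta, timezone, tzinfo

# Illustrative fixed offsets (summer time). Production code should use
# zoneinfo.ZoneInfo(...) instead so daylight-saving transitions are handled.
OUR_TZ = timezone(timedelta(hours=-4), "our office")
MARKET_TZ = timezone(timedelta(hours=1), "marketplace")

def week_start(moment: datetime, tz: tzinfo) -> date:
    """Return the Sunday that starts the week containing `moment`, as seen in `tz`."""
    local = moment.astimezone(tz)
    # weekday() is Monday=0 ... Sunday=6, so days since Sunday = (weekday + 1) % 7.
    days_since_sunday = (local.weekday() + 1) % 7
    return local.date() - timedelta(days=days_since_sunday)

# A hypothetical sale recorded at 02:30 UTC on Sunday 2 June 2024 ...
sale_at = datetime(2024, 6, 2, 2, 30, tzinfo=timezone.utc)
# ... that arrived in a marketplace report three days later.
reported_at = sale_at + timedelta(days=3)

print("our week:     ", week_start(sale_at, OUR_TZ))
print("market week:  ", week_start(sale_at, MARKET_TZ))
print("by report day:", week_start(reported_at, OUR_TZ))
```

    The same sale lands in the week of 26 May by our clock but in the week of 2 June by the marketplace's clock or by report date, so the business has to pick one rule and apply it everywhere.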

    Complicating this further is the fact that we'll have to pay taxes on those sales every year, and by the time all is said and done, the figures that we have for weekly sales compared to the figures the marketplace has can differ by a very sizable amount! If we pay more in taxes than we should, that's lost profit, and if we pay less, that leads to a very ugly audit :-P.

    Thankfully, this is mostly the business of the accounting team, so I haven't been involved with the workings of the whole mess, but having seen this, I realize that I'll rapidly develop a headache once I fall into such a situation.

    - 😀

  • We have a data entry system and a data warehouse built from its data.

    Our problems with "single truth" are generally of the form that a report direct from the data entry system doesn't tally with the data in the reporting system. This is usually a timing issue, i.e. one is real time, one isn't.

    For example, we could define passenger consistently across the two, and still get people complaining that the numbers are inconsistent.

    Ultimately, this comes down to educating people to only use the data warehouse for reporting, but still the problem persists.
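    For the timing mismatch, one common mitigation is to stamp every figure with an as-of time and have the live report answer the same question the warehouse can answer. The sketch below uses made-up bookings and a hypothetical nightly load time; it only handles rows added after the load, not rows updated or deleted since, which would need change tracking on top.

```python
from datetime import datetime

# Hypothetical bookings in the live data entry system, with entry timestamps.
live = [
    ("B1", datetime(2012, 7, 11, 9, 0)),
    ("B2", datetime(2012, 7, 11, 17, 45)),
    ("B3", datetime(2012, 7, 12, 8, 15)),  # entered after last night's load
]

# Assume the warehouse was loaded at 02:00 on 12 July and holds what existed then.
last_load = datetime(2012, 7, 12, 2, 0)
warehouse = [row for row in live if row[1] <= last_load]

def count_as_of(rows, cutoff):
    """Count rows entered at or before `cutoff`, so both systems answer the same question."""
    return sum(1 for _, entered in rows if entered <= cutoff)

print("naive live vs warehouse:", len(live), len(warehouse))
print("both as of last load:   ", count_as_of(live, last_load), count_as_of(warehouse, last_load))
```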
