The Consistency Debate

  • It depends entirely on the business requirements.

    As an example, I tried using change data capture (CDC) to represent historical changes to tables.

    However, one of the business requirements was that if the current version of a row is not yet approved, then the last approved version of the row should appear in the view.

    Because CDC works asynchronously through the transaction log, the historical data can be inconsistent (has the agent run yet?) and you sometimes get the wrong answer (which is ALWAYS unacceptable).

    So throw away CDC and base the history on triggers.
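
    As a rough illustration (all table, column and object names here are invented), a trigger-based history might look something like this: the trigger fires in the same transaction as the update, so the history can never lag behind the base table, and a view can fall back to the last approved version whenever the current row is not yet approved.

```sql
-- Illustrative schema only; names are hypothetical.
CREATE TABLE dbo.Product
(
    ProductId   int           NOT NULL PRIMARY KEY,
    Description nvarchar(200) NOT NULL,
    IsApproved  bit           NOT NULL,
    ModifiedAt  datetime2     NOT NULL DEFAULT SYSDATETIME()
);

CREATE TABLE dbo.ProductHistory
(
    ProductId   int           NOT NULL,
    Description nvarchar(200) NOT NULL,
    IsApproved  bit           NOT NULL,
    ModifiedAt  datetime2     NOT NULL,
    ArchivedAt  datetime2     NOT NULL DEFAULT SYSDATETIME()
);
GO

-- The trigger runs synchronously, inside the updating transaction,
-- so the history is always consistent with the base table.
CREATE TRIGGER dbo.trg_Product_History
ON dbo.Product
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.ProductHistory (ProductId, Description, IsApproved, ModifiedAt)
    SELECT ProductId, Description, IsApproved, ModifiedAt
    FROM deleted;   -- the row versions being replaced
END;
GO

-- Show the current row if approved, otherwise the most recent approved version.
CREATE VIEW dbo.ProductApproved
AS
SELECT p.ProductId,
       COALESCE(CASE WHEN p.IsApproved = 1 THEN p.Description END,
                h.Description) AS Description
FROM dbo.Product AS p
OUTER APPLY (SELECT TOP (1) Description
             FROM dbo.ProductHistory
             WHERE ProductId = p.ProductId
               AND IsApproved = 1
             ORDER BY ModifiedAt DESC) AS h;
GO
```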

    I would welcome some examples of business requirements where inconsistency is acceptable.

  • The classic scenario is two people editing the same record: a well-known but infrequent occurrence in most systems. It is less likely where there is a single system interface, e.g. call center software, but even then you could get both partners calling at a similar time (it only needs to be a similar time in a distributed system, and I'm probably teaching you to suck eggs here...).

    In my experience, users find both the blind overwriting of an earlier edit and the blocking of the later update unacceptable. Sometimes they have to choose between the two because the additional cost of coding the UI around the issue is not considered worthwhile. It is a simple problem to state, but a record-version merge is complicated to do from a UI perspective, especially when you cannot guarantee that two non-clashing changes can be accepted automatically, because of their relationship with other data in the set.
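
    A common middle ground, sketched below with invented names, is optimistic concurrency: the save only succeeds if the row is unchanged since it was read, which avoids both the blind overwrite and the up-front block, and leaves the clash for the UI (or the user) to resolve. This is only detection, not the merge itself.

```sql
-- Hypothetical table with a rowversion column for optimistic concurrency.
CREATE TABLE dbo.Customer
(
    CustomerId int         NOT NULL PRIMARY KEY,
    Phone      varchar(30) NOT NULL,
    RowVer     rowversion  NOT NULL
);
GO

-- The UI reads the row and keeps the RowVer value it saw.
-- On save, the update only applies if nobody else has changed the row since.
CREATE PROCEDURE dbo.UpdateCustomerPhone
    @CustomerId     int,
    @Phone          varchar(30),
    @OriginalRowVer binary(8)
AS
BEGIN
    SET NOCOUNT ON;

    UPDATE dbo.Customer
    SET    Phone = @Phone
    WHERE  CustomerId = @CustomerId
      AND  RowVer = @OriginalRowVer;   -- no blind overwrite

    IF @@ROWCOUNT = 0
        RAISERROR('Row was changed by another user; reload and merge.', 16, 1);
END;
GO
```

    The RowVer value travels out to the UI with the record and comes back with the save; a failed save is the cue to offer whatever merge or reload screen the application can afford.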

    Distribution of read-only data is rarely a problem from the users' point of view, in my experience. It is the manipulation of that data that is key.

    Gaz

    -- Stop your grinnin' and drop your linen...they're everywhere!!!

  • Stock trading often runs on milliseconds (the occasional addition of a leap second plays havoc with the Pacific markets).

    In another area, I've found that last-second sniping is a negative aspect of eBay-style auctions (it certainly takes the fun out of it). In my opinion, a random delay of a couple of seconds on all bids (publicized by the auction site) would help. It would encourage bidders to actually bid their highest bid instead of trying to slip it in within the last second.

    ...

    -- FORTRAN manual for Xerox Computers --

  • Why do people like giving the example of Facebook as an application where consistency is not required? Maybe it is not required by the company themselves, because they (or anybody else?) haven't yet worked out a way to make very large distributed databases ACID compliant.* But I as a user have more than once found it frustrating that a post I make on Facebook is not visible to my friends until a few days later, or vice versa. I have in the past wished their database was consistent. Not eventually consistent, but consistent now.

    * The NoSQL movement only exists because we haven't yet worked this out. Once we do...

    Hakim Ali
    www.sqlzen.com

  • In the case of Google and other search engine companies, I don't think they're talking about transactional consistency but rather allowing for some marginal degree of inconsistency across distributed data views. In other words, node #7882 may be five minutes behind node #0001 because it's last in line for the hourly record change rollout.

    Really, each search result is practically a unique publication of data that only one of us sees at one point in time. In addition to having our resultset pulled from a random node, what we see is also filtered by our geographic location and those little cookies that marketing companies use to target ads and track our web traffic.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • It's a principle I've been interested in for a while, because I live in a very remote area (Central Australia) with unreliable and often very slow internet (e.g. 35 kbit/s satellite connections in some places).

    I've been saying for years that networks of peer-to-peer systems with well-defined update regimes are a far better proposition for dealing with intermittent or slow networks, and I suppose that once you're as large and distributed as Facebook, you could just about regard ALL networks as slow and/or intermittent, given the no doubt massive flow of data.

    However, no one has gone this way; the mania for centralisation and web delivery has prevailed, rightly or wrongly, because of the far lower per-client development cost of doing it that way, at least for small to medium-size projects.

    In Australia, if people bleat loud enough about 'broadband in the bush' to service these bandwidth-hungry applications, something will eventually happen, but this simply isn't a choice throughout much of the developing world.

    While initial development costs are substantially higher, distributed applications gain a great deal of flexibility and scalability - witness the success of two of the major peer-to-peer systems: Skype and BitTorrent.

    Depending on the need for distributed data, it could well be the case that time spent on developing good peer-to-peer data synchronisation processes might prove a key advantage for a particular application.

    In particular, I think we need a useful framework for semi-automating complex, business-rule-based synchronisation. The automated data synchronisation offered by database servers is suitable for some applications, but quite unsuitable for others, in particular where only a subset of the rows in a table are to be shared.
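
    As a very rough sketch of the kind of thing I mean (all object names invented), a periodic sync step could merge in only the rows a business rule says this node should hold, for example its own region:

```sql
-- dbo.OrderStage is a hypothetical staging table populated from the upstream
-- node; dbo.OrderLocal is this node's working copy.
DECLARE @LocalRegion char(3) = 'NT';   -- business rule: share only our own region

MERGE dbo.OrderLocal AS tgt
USING (SELECT OrderId, Region, Amount, ModifiedAt
       FROM dbo.OrderStage
       WHERE Region = @LocalRegion) AS src   -- only the subset of rows we share
   ON tgt.OrderId = src.OrderId
WHEN MATCHED AND src.ModifiedAt > tgt.ModifiedAt THEN
    UPDATE SET Amount     = src.Amount,
               ModifiedAt = src.ModifiedAt
WHEN NOT MATCHED BY TARGET THEN
    INSERT (OrderId, Region, Amount, ModifiedAt)
    VALUES (src.OrderId, src.Region, src.Amount, src.ModifiedAt);
```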

    This could be a key feature that will propel database-based distributed applications forwards.

  • In truth, all data is obsolete. The moment a printed report leaves the printer (actually before), it is by definition no longer valid.

    A reasonable approach (in many cases) is to acknowledge this and timestamp the data with a guarantee time... that is, the data is correct as of the time indicated, rather than as of RIGHT NOW (which is never really true anyhow).
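
    For example (table and column names invented), a report query can capture the as-of time once and return it with every row, so the output states the time it was guaranteed correct for rather than implying it is current:

```sql
-- Capture the as-of time once, then stamp it on every row of the report,
-- so the consumer knows exactly what 'now' meant when the data was read.
DECLARE @AsOf datetime2 = SYSDATETIME();

SELECT s.Region,
       SUM(s.Amount) AS TotalSales,
       @AsOf         AS CorrectAsOf   -- the guarantee time
FROM dbo.Sales AS s
WHERE s.SaleDate < @AsOf
GROUP BY s.Region;
```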

    ...

    -- FORTRAN manual for Xerox Computers --

  • william.sisson (3/13/2012)


    I would welcome some examples of business requirements where inconsistency is acceptable.

    It's not that inconsistency is acceptable; it's that there are delays and changes to data that make it inconsistent from moment to moment.

    If I run a sales report of activity up to now, does it really matter if I have all activity from midnight to this second? By the time I process the data mentally, it will have changed. So does it make sense to spend lots of effort to run that report off the production system when I could run it off a secondary system that might not have data to 9:37am, but might have it to 9:27am?

    The consistency among all nodes, when you scale out, is what I was questioning. In some cases it really does matter. In some, not so much.

    Another one: if I have remote ATMs and I upload balances to them every night, and someone comes to withdraw $40, do I need to go and get the up-to-the-minute balance as of that second? Or can I take last night's balance, see it's $1200, assume I can handle $40, and deal with the consequences later?
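
    A sketch of that decision, with purely hypothetical object names: approve the withdrawal against the locally cached balance loaded overnight, record it for later reconciliation with the central system, and accept the bounded inconsistency window.

```sql
-- dbo.CachedBalance is refreshed by the nightly upload; dbo.PendingWithdrawal
-- is drained back to the central system for reconciliation.
CREATE PROCEDURE dbo.ApproveWithdrawal
    @AccountId int,
    @Amount    decimal(10,2)
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @CachedBalance decimal(10,2);

    SELECT @CachedBalance = Balance
    FROM dbo.CachedBalance
    WHERE AccountId = @AccountId;      -- last night's figure, not live

    IF @CachedBalance >= @Amount
    BEGIN
        -- Approve now, settle later: the stale read is an accepted tradeoff.
        INSERT INTO dbo.PendingWithdrawal (AccountId, Amount, RequestedAt)
        VALUES (@AccountId, @Amount, SYSDATETIME());

        SELECT 'Approved' AS Outcome;
    END
    ELSE
        SELECT 'Declined' AS Outcome;
END;
```

    A later reconciliation job applies the queued withdrawals against the authoritative balance and deals with any overdraft that the stale read allowed.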

    In many cases, you might make that consistency tradeoff, not within a database, but across multiple databases.

  • hakim.ali (3/13/2012)


    Why do people like giving the example of Facebook as an application where consistency is not required? Maybe it is not required by the company themselves, because they (or anybody else?) haven't yet worked out a way to make very large distributed databases ACID compliant.*

    We have worked this out. The problem is that it's unacceptably slow. You can't have data magically appear in all places at once. There's always some latency.

    But I as a user have more than once found it frustrating that a post I make on Facebook is not visible to my friends until a few days later, or vice versa. I have in the past wished their database was consistent. Not eventually consistent, but consistent now.

    Days? Never seen that, and I'm not sure it matters. With all the sorting and personalization people can apply, you can't guarantee anyone sees your post when you make it, or even within 5 minutes.

  • Steve Jones - SSC Editor (3/13/2012)


    ...

    Another one. If I have remote ATMs and I upload balances to it every night. Someone comes to withdraw $40 from it. Do I need to go get the last minute balance as of that second? Or can I take last night's balance, see it's $1200, and make an assumption that I can handle $40 and deal with the consequences later?

    ...

    I don't think some banks are too concerned about the consequences of their customers overdrawing their accounts by $40. After all, the customer will get charged for the $40 overdraft... plus a $35 overdraft fee... plus another $35 fee for the $2.00 cup of coffee ...

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • SQL builds on mathematical set theory. NoSQL has no theoretical basis. That is neither good nor bad, but it is a major difference. Transactional consistency is required by SQL to support constraints. If you have been a DBA for several years, you have almost certainly come across applications that use the database just as a bunch of tables. Though these applications always do their best to maintain consistency, it seems inevitable that after a while some litter will spoil the data, like an employee address for a nonexistent employee.
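
    That kind of litter is exactly what declarative constraints rule out; with hypothetical names, a simple foreign key makes the orphaned address impossible rather than merely unlikely:

```sql
CREATE TABLE dbo.Employee
(
    EmployeeId int           NOT NULL PRIMARY KEY,
    FullName   nvarchar(100) NOT NULL
);

CREATE TABLE dbo.EmployeeAddress
(
    AddressId   int NOT NULL PRIMARY KEY,
    EmployeeId  int NOT NULL
        CONSTRAINT FK_EmployeeAddress_Employee
        REFERENCES dbo.Employee (EmployeeId),   -- no address without an employee
    AddressLine nvarchar(200) NOT NULL
);

-- This insert fails: the database, not the application, enforces consistency.
INSERT INTO dbo.EmployeeAddress (AddressId, EmployeeId, AddressLine)
VALUES (1, 9999, N'1 Nowhere Street');
```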

    At least with NoSQL we can blame the computer, just like in the old days: Sorry boss, I really saved that very important letter, but somehow it disappeared. Do you know beforehand how important the information is that your application handles? Haven't we seen the same problems regarding data security with developers (and DBAs) underestimating the vulnerability of really important data? And how do you ensure security if you cannot ensure consistency?

    It is important to distinguish between transactional data and 'additional' data. But who can draw the line? Is a file linked to a database record less important than the record itself? Maybe this file is the legal grounds for a substantial financial transaction. Should each and every save action by the user be accompanied by a warning: your attachment might not be saved?

    There is a group of applications where NoSQL really comes to the rescue. But it will not make SQL obsolete for line-of-business applications. Accountants archive their documents on a regular basis, but today most applications do not have any provision to archive some of the non-current data to another storage medium. Let us hope that fundamental research on distributed data storage will provide solutions to scale out data storage without losing all guarantees on the consistency of the stored data. Until then, if your end users expect consistency, SQL might still be your only option.

  • I think local, up-to-the-hour, or up-to-the-day data is probably good enough for a lot of things. Like ads on websites (to name one with a lot of money attached to it).

    So, in those cases, the editorial is exactly right. Distributing a "recent enough" copy of the data in a read-only, optimized-for-Selects format, makes all kinds of sense.

    In others, it needs to be real-time. I think banking is actually a poor example of this, because most people don't need their balance to be up-to-the-second. On the other hand, there are medical data applications where it could be life or death. "Has Mr. Smith not had his dose of X yet, or is the database a few minutes out of date?" could be a question that kills someone, either through a missed dose of something critical, or a double-dose of something that turns lethal at higher blood concentrations. And so on.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Pragmatically, with so much cr*p out there to wade through, which of us would know if Google's search hits stale data? It'll harvest a ton of manure, a heap of which we'll wade through to find the nugget we hope they'll dish up... If in doubt, hit "search" again! Maybe hitting that button repeatedly won't be madness - you might just get a different result:-D

    If FB was a life-dependent app, I'd be concerned - it ain't!

    Now, if BofA can't tell me up to the second what my balance is, I am rightfully concerned - it's my MONEY!

    It depends - as usual...

  • GSquared (3/13/2012)


    I think local, up-to-the-hour, or up-to-the-day data is probably good enough for a lot of things. Like ads on websites (to name one with a lot of money attached to it).

    So, in those cases, the editorial is exactly right. Distributing a "recent enough" copy of the data in a read-only, optimized-for-Selects format, makes all kinds of sense.

    In others, it needs to be real-time. I think banking is actually a poor example of this, because most people don't need their balance to be up-to-the-second. On the other hand, there are medical data applications where it could be life or death. "Has Mr. Smith not had his dose of X yet, or is the database a few minutes out of date?" could be a question that kills someone, either through a missed dose of something critical, or a double-dose of something that turns lethal at higher blood concentrations. And so on.

    Having worked in both the financial and health care industries, I can tell you it's the financial sector that is much less tolerant of data errors and inconsistencies. In health care, bad data is just accepted with a shrug of the shoulders.
