Dropping a Row

  • Comments posted to this topic are about the item Dropping a Row

  • It falls into the 'right tool for the right job'

    Financials, order entry, etc. need to be exact. By comparison, many of the largest data set operations (such as business/trend analysis) can easily afford to lose a small amount of data (often the noise in the original data is larger than such data loss).

    If you use a highly structured data system for non rigid data, you probably are using too many resources.

    ...

    -- FORTRAN manual for Xerox Computers --

  • I understand your point of view that sometimes a row might be missed. If I compare a nuclear plant, for instance, with a web forum, I understand that one is highly critical and will surely affect human lives, while the other is far from it and almost no one will notice.

    However, from a customer's perspective, if a company, whatever service I'm paying for or using, "loses" a row from an order or anything else I asked of them, in my eyes it's a flaw and I'll be more inclined to change company or service. I will read it as "I'm unimportant" to them, since they didn't take care of my requests as they should have.

    Although it may be utopian, in my view no information should be lost "by design".

    These are two worldviews pulling against each other: IT with its technical limitations, and the expectations of people who, for the most part, don't know IT.

    This is a debatable subject.

    Just my two cents.

  • When it comes to cost, I have been amazed at how much risk management teams are willing to accept. It has always appalled me to see poor backup plans, lack of DR, and no high availability in scenarios where minor improvements and expenditures would significantly reduce risk, yet management says no, regardless of how well it is explained.

    I would bet there are quite a few places where the decision would be made to have an acceptable threshold of loss due to the cost to prevent it.

  • I'm not quite sure I agree that if Facebook lost 1 out of 1,000 posts that no one would care. I do agree that with Google most people wouldn't care if they got slightly different search results with the same terms. It's the difference between instantaneous and persisted data. Facebook posts are essentially persisted data. They get put in and are there forever. Google searches are more transitory. With Google continually crawling the web, if a page gets missed once it'll get hit and put in again later, so it's not a big deal. Neither of these is critical, but it's the nature of how the data is generated and used that makes the difference.

  • A side note on the subject of getting different results from Google: that is the norm. See http://www.ted.com/talks/eli_pariser_beware_online_filter_bubbles.html

  • jay holovacs (5/10/2011)


    It falls into the 'right tool for the right job'

    Financials, order entry, etc. need to be exact. By comparison, many of the largest data set operations (such as business/trend analysis) can easily afford to lose a small amount of data (often the noise in the original data is larger than such data loss).

    If you use a highly structured data system for non rigid data, you probably are using too many resources.

    I think your point about noise is a key consideration. None of the people entering or interpreting the data are perfect, and incorrect data is often more damaging than missing data.

  • cfradenburg (5/10/2011)


    I'm not quite sure I agree that if Facebook lost 1 out of 1,000 posts that no one would care. I do agree that with Google most people wouldn't care if they got slightly different search results with the same terms. It's the difference between instantaneous and persisted data. Facebook posts are essentially persisted data. They get put in and are there forever. Google searches are more transitory. With Google continually crawling the web, if a page gets missed once it'll get hit and put in again later, so it's not a big deal. Neither of these is critical, but it's the nature of how the data is generated and used that makes the difference.

    I'm not sure how many users monitor their guest book or blog posts close enough on a daily basis to notice if one (out of a couple hundred) entries from months back suddenly disappeared. I'm sure somebody eventually would, and they'd be really verklempt about it. However, there generally isn't something like a Service Level Agreement between a social media company and their users. Even if the issue were brought to the company's attention, I doubt they would respond by assigning a DBA the task of digging through backups or transaction logs to locate the missing data.

    On the other hand, if a bank were dropping transactions, within a few hours customers would start calling in with complaints about non-posted paychecks or missing daily deposits. It would become news really fast, and the bank would be required by law to fix it.

    Regarding where NoSQL databases can properly fit in a corporate enterprise environment, there is a lot of non-transactional stuff like documents, images, reference data, and entity-attribute-value records that could be better offloaded from the RDBMS into NoSQL. I could see the merits of a blended architecture, even in an organization like a bank.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell (5/10/2011)


    I'm not sure how many users monitor their guest book or blog posts close enough on a daily basis to notice if one (out of a couple hundred) entries from months back suddenly disappeared. I'm sure somebody eventually would, and they'd be really verklempt about it.

    If it was an old one then chances are very slim that anyone would notice. I was coming from the perspective that it was the write failing meaning it is a new post instead of an old one. If it's a read that fails and it shows up after a refresh no one is going to care if it's Facebook or a blog. Well, no one should care. If it were a medical record, one bad read can have very, very bad consequences even if the data shows up on a refresh.

  • Donald Bustell (5/10/2011)


    A side note on the subject of getting different results from Google: that is the norm. See http://www.ted.com/talks/eli_pariser_beware_online_filter_bubbles.html

    Saw that after I wrote this and was very interested. A failing of algorithmic learning.

    Also: made your link hot

  • cfradenburg (5/10/2011)


    Eric M Russell (5/10/2011)


    I'm not sure how many users monitor their guest book or blog posts close enough on a daily basis to notice if one (out of a couple hundred) entries from months back suddenly disappeared. I'm sure somebody eventually would, and they'd be really verklempt about it.

    If it was an old one then chances are very slim that anyone would notice. I was coming from the perspective that it was the write failing meaning it is a new post instead of an old one. If it's a read that fails and it shows up after a refresh no one is going to care if it's Facebook or a blog. Well, no one should care. If it were a medical record, one bad read can have very, very bad consequences even if the data shows up on a refresh.

    The issue with some systems (not saying Facebook is one) is that you might write an update on Node 1, and you see the update. However, Node 1 is buried, and before it can update Node 2 and Node 3, it fails. When it's rebuilt/recovered, your update is gone. You might not notice, or if you do, do you stop using the service? You might, but depending on your investment in the service, you might not. You might be more careful, or chalk it up to a random glitch in the matrix.

    However, if one of my deposits failed at an ATM because it wasn't fully hardened in the entire system, that's bad.

    Customers don't want to ever lose their data, but it happens and we accept some minor glitches.
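
Editor's sketch of the Node 1 / Node 2 / Node 3 scenario described in the post above. This is a toy in-memory model, not how any particular product works: the `Node` and `Cluster` classes, the `pending` queue, and the `fail_and_rebuild` helper are all hypothetical names invented for illustration, and "replication" here is just a dictionary copy. It assumes writes are acknowledged locally and shipped to peers later, which is the condition under which an acknowledged write can disappear.

```python
class Node:
    """One replica holding a local copy of the data (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.pending = []  # writes accepted locally but not yet sent to peers


class Cluster:
    """Toy cluster with asynchronous replication (illustrative only)."""

    def __init__(self, names):
        self.nodes = {name: Node(name) for name in names}

    def write(self, node_name, key, value):
        # The write is acknowledged as soon as the local node has it.
        node = self.nodes[node_name]
        node.data[key] = value
        node.pending.append((key, value))

    def read(self, node_name, key):
        return self.nodes[node_name].data.get(key)

    def replicate(self, node_name):
        # Ship queued writes to every other node, then clear the queue.
        node = self.nodes[node_name]
        for key, value in node.pending:
            for peer in self.nodes.values():
                if peer is not node:
                    peer.data[key] = value
        node.pending.clear()

    def fail_and_rebuild(self, node_name):
        # The failed node loses its local state, including unreplicated
        # writes, and is rebuilt from a surviving peer's copy.
        peer = next(p for n, p in self.nodes.items() if n != node_name)
        rebuilt = Node(node_name)
        rebuilt.data = dict(peer.data)
        self.nodes[node_name] = rebuilt


def demo():
    cluster = Cluster(["node1", "node2", "node3"])

    cluster.write("node1", "post:1", "first")
    cluster.replicate("node1")  # this write reaches node2 and node3

    cluster.write("node1", "post:2", "second")
    seen_before = cluster.read("node1", "post:2")  # the writer sees the update

    cluster.fail_and_rebuild("node1")  # node1 dies before replicating post:2
    seen_after = cluster.read("node1", "post:2")

    return seen_before, seen_after, cluster.read("node1", "post:1")
```

In this model, a store that waited for at least one peer to confirm before acknowledging the write would close the gap, at the cost of slower writes; that trade-off is the one being debated in this thread.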

  • My takeaway from that talk was not that it was a failure of algorithmic learning but rather it is an algorithm designed to predict my interests or point of view; the problem being that I will no longer see points of view not aligned with my own, thereby creating a "mind-narrowing" experience.

  • AFAIK none of my SSC blog posts have ever mysteriously "vanished". Once in SQL, forever in SQL 🙂

  • jay holovacs (5/10/2011)


    It falls into the 'right tool for the right job'

    I tend to agree

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw
    Learn Extended Events

  • cfradenburg (5/10/2011)


    Eric M Russell (5/10/2011)


    I'm not sure how many users monitor their guest book or blog posts close enough on a daily basis to notice if one (out of a couple hundred) entries from months back suddenly disappeared. I'm sure somebody eventually would, and they'd be really verklempt about it.

    If it was an old one then chances are very slim that anyone would notice. I was coming from the perspective that it was the write failing meaning it is a new post instead of an old one. If it's a read that fails and it shows up after a refresh no one is going to care if it's Facebook or a blog. Well, no one should care. If it were a medical record, one bad read can have very, very bad consequences even if the data shows up on a refresh.

    It would depend on the data that was lost.

    If we're talking blogs, for example, there are entries on some that I reference pretty regularly, even though they are far from recent. For example, just yesterday, I had four devs read an older article on Gail Shaw's blog. If that entry disappeared, it would be noticed, and it would matter.

    How about if older movies started disappearing from IMDB?

    So, it depends on the data. I guess the point is, if your social site of choice is using a non-ACID data repository, don't put any data into it that would matter if it suddenly goes away. "Matter" is always subjective, so pick your fights on that one.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
