• Unfortunately, from what I saw during my years in IT, I firmly believe that 'big data' is often the result of incomplete system design that lacked foresight. With hardware steadily becoming cheaper while also growing more powerful and faster, it was easy to overlook the eventual need to dispose of, if not archive, that volume. We must always bear in mind that the older data becomes, the less value it has for analysis. A truly good system design includes an understanding of how best to summarize 'data' into meaningful 'information' without chewing up the terabytes of storage needed to keep the raw formats.

    For instance, I have a life insurance policy that was taken out on my life in 1943 with a face value of $1000.00. The premium was, and still is, $10.44 per year. Each year the company has paid a dividend into a deposit account and paid interest on that deposit.

    Now, on my side, it's not a huge deal to keep the annual detail for the 74-year history if I so desire. Most likely I will never actually look at it, and will continue to look only at the CURRENT dividend, interest, and total value. I am free to store whatever data I like.

    On the insurer's side, however, it makes absolutely no sense to have accumulated detail for 74 premium payments, 74 annual dividends, and 74 years of interest payments on deposits, on top of the likely thousands of other policies they have issued, especially those no longer in effect. If I contacted the company, I would expect them to tell me they no longer have the historical data. While the effective date and the insured's birthdate (from which age can be calculated) may have analytical value, the premium payment dates and the dividend and interest detail have little, if any.
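
    As a rough illustration of summarizing 'data' into 'information', here is a minimal Python sketch that rolls one policy's yearly premium, dividend, and interest detail up into a single summary record, keeping the effective date, the birthdate, and the age derived from them. The record layouts, field names, and the functions `summarize` and `issue_age` are hypothetical, invented for this example. Amounts are held in integer cents to avoid floating-point rounding, `years_in_force` assumes one detail row per year, and the deposit balance assumes dividends are never withdrawn.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AnnualDetail:
    # One year of raw policy history (hypothetical layout; amounts in cents).
    year: int
    premium_cents: int
    dividend_cents: int
    interest_cents: int


@dataclass(frozen=True)
class PolicySummary:
    # What is kept once the per-year detail rows are archived or purged.
    effective_date: date
    insured_birthdate: date
    age_at_issue: int
    years_in_force: int
    total_premiums_cents: int
    total_dividends_cents: int
    total_interest_cents: int
    deposit_balance_cents: int


def issue_age(effective_date: date, birthdate: date) -> int:
    """Insured's age in whole years on the policy's effective date."""
    before_birthday = (effective_date.month, effective_date.day) < (birthdate.month, birthdate.day)
    return effective_date.year - birthdate.year - int(before_birthday)


def summarize(effective_date: date, birthdate: date, history: list[AnnualDetail]) -> PolicySummary:
    """Roll per-year detail up into one summary record (hypothetical helper)."""
    total_dividends = sum(row.dividend_cents for row in history)
    total_interest = sum(row.interest_cents for row in history)
    return PolicySummary(
        effective_date=effective_date,
        insured_birthdate=birthdate,
        age_at_issue=issue_age(effective_date, birthdate),
        # Assumes exactly one detail row per year in force.
        years_in_force=len(history),
        total_premiums_cents=sum(row.premium_cents for row in history),
        total_dividends_cents=total_dividends,
        total_interest_cents=total_interest,
        # Assumes dividends are left on deposit and never withdrawn.
        deposit_balance_cents=total_dividends + total_interest,
    )
```

    For example, 74 annual premiums of $10.44 roll up to a single total of 77,256 cents ($772.56), after which the 74 detail rows could be archived or purged under whatever retention policy applies.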

    While this seems obvious as you read this explanation now, you would probably be shocked to discover how many times in my IT years I found that absolutely no thought had been given to exactly this situation. That leads to the scenario in which IT has to explain to users why detail that 'we have always had' and that 'we might need' must now disappear, even though no one has looked at it in decades. Many of you may not remember the old variable-length flat-file record, in which repeating data was commonly stored as repeating segments within a single record. Imagine my policy stored as a base record followed by 74 repeating historical segments. And yes, we didn't have relational technology in those days.
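
    To make the repeating-segment idea concrete, the sketch below parses one such record into a base row plus a list of child rows keyed by policy number, which is roughly how a relational design would hold the same data in two tables. The fixed-width layout (a base portion carrying a segment count, followed by that many fixed-length history segments), the field widths, and names such as `parse_record` are assumptions for illustration only, not any real insurer's format.

```python
from typing import NamedTuple

# Hypothetical fixed-width layout, for illustration only:
#   base:    policy number (8) | effective date YYYYMMDD (8) | segment count (3)
#   segment: year (4) | dividend in cents (7) | interest in cents (7)
BASE_LEN = 19
SEG_LEN = 18


class PolicyBase(NamedTuple):
    policy_no: str
    effective: str
    segment_count: int


class HistorySegment(NamedTuple):
    policy_no: str  # foreign key back to the base row in a relational design
    year: int
    dividend_cents: int
    interest_cents: int


def parse_record(record: str) -> tuple[PolicyBase, list[HistorySegment]]:
    """Split one variable-length record into a base row and its child rows."""
    base = PolicyBase(record[0:8], record[8:16], int(record[16:19]))
    expected_len = BASE_LEN + base.segment_count * SEG_LEN
    if len(record) != expected_len:
        raise ValueError(f"expected {expected_len} characters, got {len(record)}")
    segments = []
    for i in range(base.segment_count):
        start = BASE_LEN + i * SEG_LEN
        seg = record[start:start + SEG_LEN]
        segments.append(HistorySegment(base.policy_no, int(seg[0:4]), int(seg[4:11]), int(seg[11:18])))
    return base, segments
```

    The record length depends on the segment count, which is what made these records "variable-length"; splitting them out means each year of history becomes a separate row that can be summarized or purged without rewriting the base record.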

    In summary, we must be sure that 'big data' is not in reality 'no policy' (please pardon the pun).

    Rick
    Disaster Recovery = Backup ( Backup ( Your Backup ) )