Database Nonsense - Unstructured Data

  • Comments posted to this topic are about the item Database Nonsense - Unstructured Data

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • Makes sense...but we would need an alternative term, as Unstructured Data is a term that defines something. It may hint at more and so be inaccurate, but revocation can only occur after replacement.

    Gaz

    -- Stop your grinnin' and drop your linen...they're everywhere!!!

  • Really not understanding the point of the article.

    I used to have similar thoughts, but at the end of the day, if you see a massive crowd of people walking in all directions in the street, you can honestly say there is no structure in what they are doing.

    The definition of structure is, "construct or arrange according to a plan; give a pattern or organization to"

    In my example, there could certainly be some level of organization. They all may be in that location for the same reason that was organized by something or someone. There even may be some global agenda on what they are doing. But overall, there is no structure to how they are flowing and ultimately where they may end up.

    In the RDBMS world, we would try to step into that crowd and try to better organize their flow. We may group them up by attributes, define lines to walk in for those groups of people and maybe even give them set schedules on when to walk. That's the structure we are missing from unstructured data.

    All that work to apply structure is overhead. It costs the business a lot of money to hire someone like you to provide that structure to the massive crowd. The question this article should be asking and hopefully answering is why do we even need you to structure the crowd? Why can't we just let them flow and pick out what we need when we need it?

  • To coin another phrase: Much Ado About Nothing. It's just a phrase that some could take to mean that the data appears to be a mess and you can't make any sense out of it. I say let it be (oh, another phrase).

  • When we speak of no-schema databases and analytical tools, I assume we're referring to a data store essentially intended for staging and exploration. It truly is a bucket of 1s and 0s as far as the physical layer is concerned, and the developer imposes their own semantic meaning and structure on top of it at query runtime, perhaps tweaking and re-running their query 100x before getting it right. However, once you've identified the data you need and a working semantic model, the no-schema nature of this particular database just gets in the way. At this point it would make sense to ETL a useful subset of the data and its semantic model into a relational database.
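
The workflow described above can be sketched roughly as follows. This is a minimal illustration, not anyone's actual pipeline: the sample records, the `explore` and `load_purchases` helpers, and the `purchase` table are all hypothetical names invented for the example, and SQLite stands in for "a relational database".

```python
import json
import sqlite3

# Hypothetical raw records, as they might sit in a schema-less staging store.
# Nothing forces them to share the same fields.
RAW_EVENTS = [
    '{"user": "alice", "action": "login", "ms": 120}',
    '{"user": "bob", "action": "login"}',
    '{"user": "alice", "action": "purchase", "ms": 340, "amount": 19.99}',
]


def explore(raw_lines):
    """Schema-on-read: the query decides, at run time, what the data means."""
    for line in raw_lines:
        doc = json.loads(line)
        if doc.get("action") == "purchase":
            yield doc["user"], doc.get("amount")


def load_purchases(conn, raw_lines):
    """ETL step: once the semantic model is settled, fix it in a schema."""
    conn.execute(
        "CREATE TABLE purchase (user_name TEXT NOT NULL, amount REAL NOT NULL)"
    )
    # Keep only the useful subset; records without an amount are skipped.
    rows = [(user, amount) for user, amount in explore(raw_lines)
            if amount is not None]
    conn.executemany("INSERT INTO purchase VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load_purchases(conn, RAW_EVENTS)
total = conn.execute("SELECT SUM(amount) FROM purchase").fetchone()[0]
```

During exploration, `explore` can be tweaked and re-run as often as needed. After the load, the `NOT NULL` constraints and column types carry the model, so later queries no longer have to re-derive it.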

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • The distinction would be between data and information. Information is far more valuable, because that is what is left after the irrelevant and wrong is removed. Lots of data <> lots of information.

    Think of carbon paper (remember that stuff?). A plain sheet of carbon paper contains a page of every book ever written or ever will be written ... if only the 'unneeded' ink particles could be removed.

    ...

    -- FORTRAN manual for Xerox Computers --

  • This wasn't exactly my favorite editorial of all time.

  • I, for one, enjoyed the editorial. As usual, I agreed in parts and disagreed in others.

    Also, let's not discourage editorials from different authors. Surely we will enjoy the variety and will be forced to read "This one was originally posted in 1897 and we repeat it today because Steve's away" less often.

    Gaz

    -- Stop your grinnin' and drop your linen...they're everywhere!!!

  • Gary Varga (10/25/2016)


    Makes sense...but we would need an alternative term, as Unstructured Data is a term that defines something. It may hint at more and so be inaccurate, but revocation can only occur after replacement.

    The term "Complexly Structured Data" suggested in the editorial works well for me.


  • Well thought out article! And many thanks for writing.

    But sadly, we have all seen crazy, questionable labels used in businesses and organizations.

    Perhaps the best line was: "Yes, you can, but the real question is; should you?"

    The more you are prepared, the less you need it.

  • I think the editorial was good. It put into words some of the thoughts I’ve had for a while, especially in the area of schemaless schema (EAV). If you want to debate that subject, and the issues with using EAV, go read Joe Celko’s stuff on the subject. This editorial, in many ways, parallels that subject.

    As to a different name in place of “unstructured data”, I would suggest something with “normalized” in it. “non-normal variant schema”, “askew normal schema” or something else. Maybe toss in “meta”, that’s a popular buzzword these days.

    Beer's Law: Absolutum obsoletum
    "if it works it's out-of-date"

  • To be honest, the EAV structure has its place and if done correctly can be effective.
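
For readers who have not met the pattern, here is a minimal sketch of an EAV (entity-attribute-value) table and the pivot needed to read it back as rows. The table name `eav`, the attribute names, and the sample data are assumptions made up for illustration; SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One row per (entity, attribute) pair; every value is stored as text.
conn.execute(
    """
    CREATE TABLE eav (
        entity_id INTEGER NOT NULL,
        attribute TEXT    NOT NULL,
        value     TEXT,
        PRIMARY KEY (entity_id, attribute)
    )
    """
)
conn.executemany(
    "INSERT INTO eav VALUES (?, ?, ?)",
    [
        (1, "name", "Widget"),
        (1, "color", "red"),
        (2, "name", "Gadget"),
        (2, "weight_kg", "1.5"),
    ],
)

# Reading an entity back as a conventional row means pivoting by hand:
# one CASE expression per attribute the query cares about.
PIVOT = """
    SELECT entity_id,
           MAX(CASE WHEN attribute = 'name'      THEN value END) AS name,
           MAX(CASE WHEN attribute = 'color'     THEN value END) AS color,
           MAX(CASE WHEN attribute = 'weight_kg' THEN value END) AS weight_kg
    FROM eav
    GROUP BY entity_id
    ORDER BY entity_id
"""
rows = conn.execute(PIVOT).fetchall()
```

The sketch shows both sides of the argument in this thread: new attributes need no schema change, but types and constraints on `value` are lost (the weight comes back as the string `'1.5'`), and every reader has to rebuild the row shape in the query.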
