Code smells versus transgressions

  • Yes, Gail is right, if, from the database perspective, the xml or JSON represents an unsplittable data item, then you can legitimately store it as such. Sure, it could be a serialised object from the application perspective, but if we at the database level never need to shred it to get at the data, or test its equivalence, then who cares? This isn't even a special problem with the XML data type: you can even pack several bitwise flags into integers if the devil whispers in your ear. The sin or smell is, surely, in packing several items of data into a single data type.

    Best wishes,
    Phil Factor

  • Depending on the context of how it's used, seeing how often it's diced and fudged with T-SQL functions, even the common DateTime type could be seen as non-atomic.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell (4/5/2016)


    So here comes the question: if you were do decide on SQL practices that were always wrong, what would you include?

    The following practices are almost always wrong.

    1. A table without a primary key.

    2. VarChar column containing multi-valued string values.

    3. VarChar or Char column containing integer or date/time values.

    4. Multiple indexes covering an identical column set in the same order.

    5. Joining multiple remote tables using 4 part naming convention.

    6. Adding ORDER BY clause to the SELECT statement of a view.

    OK, it's a Friday afternoon, past time for me to leave, and I'm investigating an emergency service request where an ETL process that normally takes a few minutes to complete has now been running since noon and into the evening. It's been stuck on the same SELECT statement for 3+ hours. It's joining two remote linked tables using four part naming convention. Apparently a developer made a minor modification to return a couple of new columns in the resultset.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Has Jeff just been caught out by his old favourite "it depends"?

    Gaz

    -- Stop your grinnin' and drop your linen...they're everywhere!!!

  • Phil Factor (4/8/2016)


    The sin or smell is, surely, in packing several items of data into a single data type.

    Precisely.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Gary Varga (4/8/2016)


    Has Jeff just been caught out by his old favourite "it depends"?

    Heh... I hope not because I'm pretty much against storing it even as a "single unit" blob as Gail suggests. The problem is the tag bloat in the pipe (lots of folks forget about "the pipe"), on the disk, on the backups and for any restore.

    Unfortunately, I can't stop all of it. Despite my best efforts and attempts at education, I'm just one person and the fellow I work for has the final say. A good number of our larger performance issues can be traced back directly to both types of XML storage.

    Actually, I should be enthusiastically endorsing even over use of XML because it has spurred manufacturers to make a whole lot of things bigger, fast, and less expensive. You've just got to love how much memory we have and things like the fact that you can buy a 256GB USB mini "pen" drive the size of a mouse transmitter for less than $11USD including shipping 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • For those that use the "single unit" style of XML in their databases and want to use Stretch Databases to store historical data, you'll have to pick some other data type other than XML. See the following...

    https://msdn.microsoft.com/en-us/library/mt605114.aspx

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Storing XML is not ideal, to be sure. But a particularly frustrating variant of it is when the structure of the XML being stored turns out to be identical in every row of your table. The excuse (sorry "advantage") for XML was that it allowed you to store "semi-structured" data without having to continually mess with your schema. But when the XML getting stored ends up being completely rigidly structured then it might just as well have been stored normalised.

    I don't think you'll find a worse odium than the following (which I'm told has really happened):

    Paper forms were developed (and pre-populated with some fields) using SSRS, printed off, completed in pen (what would have translated into perhaps another hundred bytes of data per row), and then scanned back in and stored as jpegs. No matter how much we might dislike delimited columns and XML, at least with them we have some hope of getting the data back out.

    ...One of the symptoms of an approaching nervous breakdown is the belief that ones work is terribly important.... Bertrand Russell

  • GPO (4/8/2016)


    Storing XML is not ideal, to be sure. But a particularly frustrating variant of it is when the structure of the XML being stored turns out to be identical in every row of your table. The excuse (sorry "advantage") for XML was that it allowed you to store "semi-structured" data without having to continually mess with your schema. But when the XML getting stored ends up being completely rigidly structured then it might just as well have been stored normalised.

    I don't think you'll find a worse odium than the following (which I'm told has really happened):

    Paper forms were developed (and pre-populated with some fields) using SSRS, printed off, completed in pen (what would have translated into perhaps another hundred bytes of data per row), and then scanned back in and stored as jpegs. No matter how much we might dislike delimited columns and XML, at least with them we have some hope of getting the data back out.

    What does the travesty of storing scanned forms as jpeg images have to do with storing delimited/CML data?

    Shifting gears back to that subject, why wouldn't you convert delimited/XML data back to normalized data for final storage? Even with what Gail suggested (use the whole XML only), I still try to normalize the data because it's not only more efficient for storage (especially in the case of XML) and better for the "pipe" when the data is called upon, but it's already shredded and ready for use for other eventualities, as well.

    About the only thing that I've officially "caved in" on the subject of stored XML at work is storing configuration information for certain front-end functionality and that's only because 1) it is only used as "the whole thing", as Gail suggested, 2) the tables where such configuration is stored are very small and will stay that way, and 3) the data isn't called upon hundreds of thousands of times per day.

    Unfortunately, I've not been able to prevent all the "really bad types" (is used by queries) of delimited/XML data because of some legacy code and the occasional insistence of the Enterprise Architect. In every case there, resource usage and performance are both serious problems compared to shredding and storing the data as normalized data. Heh... and don't get me started on my "love" for XML indexes. 😉

    Even using XML as a form of "audit" table is a terrible idea. The tag bloat alone makes the tables larger than even a "columnar" EAV table and related indexes. If you want to use something like Stretch databases, forget it because it won't allow XML.

    Anyway, I'm not trying to convert anyone into my way of thinking on the subject of XML or any other tagged data. I'm just trying to give folks food for thought because, contrary to popular belief, cpu time, disk storage, and the "pipe" are not as cheap as many would have us believe especially when it comes to things like DR.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • GPO (4/8/2016)


    ...The excuse (sorry "advantage") for XML was that it allowed you to store "semi-structured" data without having to continually mess with your schema. But when the XML getting stored ends up being completely rigidly structured then it might just as well have been stored normalised...

    Even though I have heard of people touting it as an advantage, I have yet to see XML data structures treated as having an extendible structure. This has always meant recompilation of all apps sharing a given extended schema.

    Gaz

    -- Stop your grinnin' and drop your linen...they're everywhere!!!

  • Heh... ok... I take it back. I was asked to troubleshoot some problems in some code I've never seen before and I now even hate XML used for configuration settings. There's nothing to ensure that a tag hasn't been duplicated or misspelled and it takes just as much brew-ha-ha to extract the settings from the XML as it does an EAV table. The XML is also a hell of a lot more difficult for a human to read than the return from an EAV. I see no justification nor any advantage for such a thing.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (4/10/2016)


    Heh... ok... I take it back. I was asked to troubleshoot some problems in some code I've never seen before and I now even hate XML used for configuration settings. There's nothing to ensure that a tag hasn't been duplicated or misspelled and it takes just as much brew-ha-ha to extract the settings from the XML as it does an EAV table. The XML is also a hell of a lot more difficult for a human to read than the return from an EAV. I see no justification nor any advantage for such a thing.

    The more I have to deal with XML the more it feels like the work from a dissertation that impresses but is not suitable for the real world. I guess it was obvious from the start if you think about the need for SAX parsers.

    Gaz

    -- Stop your grinnin' and drop your linen...they're everywhere!!!

  • Did some test many moons ago on xml vs the shredding it once into a structure and then rebuilding it. Ballpark was when there were 1000+ rows being queried (even with xml indexes) normal tables and indexes would out perform. Don't know where the cutover is now a days. Of course still faster sending a xml stream rather than building it from the base tables.

    And yes I have seen xml stored in varchar(max) and the requirements made that perfect sense. Normally I would say that it was the code stink of the abomination storing xml as varchar(max), but not always.

  • Jeff Moden (4/10/2016)


    Heh... ok... I take it back. I was asked to troubleshoot some problems in some code I've never seen before and I now even hate XML used for configuration settings. There's nothing to ensure that a tag hasn't been duplicated or misspelled and it takes just as much brew-ha-ha to extract the settings from the XML as it does an EAV table. The XML is also a hell of a lot more difficult for a human to read than the return from an EAV. I see no justification nor any advantage for such a thing.

    This seems like kind of a symptom of application developers just wanting an easy to parse thing and for that XML works great in theory in their dev environment. Then it hits a production setting and you have a huge XML config file.... and have fun....

  • ZZartin (4/11/2016)


    Jeff Moden (4/10/2016)


    Heh... ok... I take it back. I was asked to troubleshoot some problems in some code I've never seen before and I now even hate XML used for configuration settings. There's nothing to ensure that a tag hasn't been duplicated or misspelled and it takes just as much brew-ha-ha to extract the settings from the XML as it does an EAV table. The XML is also a hell of a lot more difficult for a human to read than the return from an EAV. I see no justification nor any advantage for such a thing.

    This seems like kind of a symptom of application developers just wanting an easy to parse thing and for that XML works great in theory in their dev environment. Then it hits a production setting and you have a huge XML config file.... and have fun....

    Usually quite happy to take it on us developers but this is a vendor driven trend.

    Gaz

    -- Stop your grinnin' and drop your linen...they're everywhere!!!

Viewing 15 posts - 31 through 44 (of 44 total)

You must be logged in to reply to this topic. Login to reply