Best Practice Help - Table Structures

  • Hi there, I am being asked to provide a list of all the reasons that we should create actual normalized tables with data fields vice creating one table with a blob column that stores all sorts of data elements in one field as XML. I can’t believe I actually have to do this but wanted some help with my list to make sure I am not forgetting anything or off base. This is for a system that is heavy on database logic and data manipulation. It is also a transactional system that pushes data to other systems and ingests data.

    Pros of normalized rational tables:

    Data integrity

    Data validation capabilities

    Querability

    Query performance

    ACID

    System to system interfaces (having like data)

    Ease of understanding

    Easier object-to-data mapping

    Some of my topics may overlap each other but I plan on fleshing it all out and describing each area better. Just wanted to get my thoughts down and then have some community input. I don’t have any other dbas to bounce ideas off of ... it’s just me! 🙂

  • Wow, that's a huge question. I can take a small part of it - the normalization part, and maybe only part of that. One thing normalization does is make your database smaller, because you don't store repeated information. Instead you include foreign keys to other tables. A properly normalized database will require some documentation, but nothing like one with little or unclear normalization. And querying monolithic tables is a headache. In my case, it was a cancer database, and there are literally thousands of diagnoses that are classified, so having a column for each diagnosis was an epic fail. What do you do if you need to add a new diagnosis? Add more columns?
    Instead of spending ONE day writing maybe 20 easy queries, I was at the place for 8 months. At first I was just querying existing stuff (and I had to use dynamic SQL) to just do frequency counts. Then I started from scratch, and built a properly normalized database as a starting point. Eventually I worked my way out of a job, because it was pretty easy to query everything. A little documentation with some good naming conventions (Hungarian notation is NOT a good naming convention for fields in a table.) go a long way toward making a database easy to understand... and that's only the tip of the iceberg.

  • Good answer above. For querying and manipulation of updates, normalized tables are better. Trying to update a set of values in multiple XML documents is slow and problematic. Is this one transaction, which could introduce locking issues, or multiple ones, which cause integrity issues?

    The effort to  deserialize XML isn't simple, so it's worth discussion. If we tend to query singletons and work with them often, then XML might be OK. If we work in batches, it's probably not.

  • Yikes!!! You are the DBA and you have to justify your design ideas to people who are not in that position? That is tough for sure. This is like an auto mechanic having to justify why he wants his shop laid out a certain way to an accountant. Sounds like the dev group is pushing hard for an EAV style database where they can just add data to the config table and all the data is stored in one big ugly nasty table. When I hear of storing all the data in a single table I have to add a reference to this article. It goes into detail about just how truly god awful a single table for data can be on a large system. All I can say is do everything in your power to avoid this nightmare. https://www.red-gate.com/simple-talk/opinion/opinion-pieces/bad-carma/

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • amy26 - Friday, September 7, 2018 4:17 PM

    Hi there, I am being asked to provide a list of all the reasons that we should create actual normalized tables with data fields vice creating one table with a blob column that stores all sorts of data elements in one field as XML. I can’t believe I actually have to do this but wanted some help with my list to make sure I am not forgetting anything or off base. This is for a system that is heavy on database logic and data manipulation. It is also a transactional system that pushes data to other systems and ingests data.Pros of normalized rational tables:Data integrity Data validation capabilities Querability Query performanceACIDSystem to system interfaces (having like data)Ease of understanding Easier object-to-data mappingSome of my topics may overlap each other but I plan on fleshing it all out and describing each area better. Just wanted to get my thoughts down and then have some community input. I don’t have any other dbas to bounce ideas off of ... it’s just me! 🙂

    You can argue such cases until you're Blue in the face but you can't argue with demonstrable code.  You need to write some code, build some test data, and demonstrate the usage and performance differences or they're just going to continue to argue.  Since there are more of them than you, they will win if you only argue.

    I hate to bring this up but, being female, you're probably also fighting the "Good ol' boys club" in this matter and they're probably also a bunch of folks that don't actually know what they're doing with data.  Take the pool cues away from them (old billiards term for arguments) and stuff demonstrable code down their throats.

    If you can get them to provide you with the XML data they propose, even better.  There's nothing like beating someone with their own bat. 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • p.s.  When you do the comparison, don't forget to measure disk space, which will also translate to memory usage.  Once you have an example of their XML data and you've done the conversion, compare the sizes to emphasize the horror of tag bloat.  When it come to performance, they'll also remind you that there are XML indexes.  Be sure to include those in your measurements, as well, because they're a duplication of data and, behind the scenes, the data is stored in a type of "table" known as indexes and they should simply eliminate the middle man and store the data in tables.

    They're probably also unaware that XML is rendered out as a rather bloated "Adjacency List Hierarchical Structure" behind the scenes and all the indexing does is materialize that data as indexes.

    You could also bring up the DRI nature of things in that you can't auto-magically assign DRI to XML.  That would all have to be done by the front-end code which also means extra-development time in recreating the wheel that tables with FKs already know how to do better than they'll ever be able to do and faster because they'll have to make an extra round trip to the server to do it.

    Again, you'll need to prove all of that with code because "A man forced against his will is of the same opinion still". ;D

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • p.p.s  Also remember that dates with times stored in XML will take quite a bit more space than an 8 byte datetime and that just a date will take more space than a DATE datatype, not to mention all of the conversions that will be necessary to accomplish any temporal math.  The same holds true for integers.  Anything over 4 digits is going to take more space and will need to be converted prior to doing any math.  If they only have things up to 3 digits, that will still be larger than a smallint and the same math conversion problems will exist.  And all of that doesn't include the wanton tag bloat.

    Again, that should all come out in the demonstrable code.  Have THEM write the XML code and you write the T-SQL opponent code.

    Then, because they seem hell bent on making this mistake and likely won't back down if for no other reason than to save-face (especially because you're female) and be "right" according to them, prepare for them to ignore your proofs by saying something stupid like "Well!  That was just one example.  Can you prove that it won't work for every example"?  Or, worse yet, resorting to something even more stupid by accusing you of being emotional and that's a part of the reason why you don't want to argue with them.  They cannot accuse your good code demonstration of being emotional.  It's why even I, as a guy, have to resort to only code because stupid people will resort to all sorts of ad hominem attacks, especially if they've run out of defenses and want to save-face in front of their buddies.  It's a crowd mentality that's difficult to fight even for guys.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Couple o' more thoughts.  If your demonstrable code for the normalized methodology beats the XML, and I have no doubt that it will, and they play the "Well, can you prove that it will beat the XML in all instances?" card, don't go crazy.  Simply state that the normalized method just beat an XML method in at least one case and then ask them if they can prove that the XML method will win in all other cases.  😀

    Also, don't forget to measure the extra network traffic that the XML method can generate in both directions.  I've found that, because of the tag bloat, it's usually in the area of 8 to 16 times more network traffic.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Monday, September 10, 2018 9:43 AM

    p.s.  When you do the comparison, don't forget to measure disk space, which will also translate to memory usage.  Once you have an example of their XML data and you've done the conversion, compare the sizes to emphasize the horror of tag bloat.  When it come to performance, they'll also remind you that there are XML indexes.  Be sure to include those in your measurements, as well, because they're a duplication of data and, behind the scenes, the data is stored in a type of "table" known as indexes and they should simply eliminate the middle man and store the data in tables.

    They're probably also unaware that XML is rendered out as a rather bloated "Adjacency List Hierarchical Structure" behind the scenes and all the indexing does is materialize that data as indexes.

    You could also bring up the DRI nature of things in that you can't auto-magically assign DRI to XML.  That would all have to be done by the front-end code which also means extra-development time in recreating the wheel that tables with FKs already know how to do better than they'll ever be able to do and faster because they'll have to make an extra round trip to the server to do it.

    Again, you'll need to prove all of that with code because "A man forced against his will is of the same opinion still". ;D

    DRI?
    https://acronyms.thefreedictionary.com/DRI

  • Joe Torre - Monday, September 10, 2018 3:13 PM

    Pretty sure he was meaning DRY, aka Don't Repeat Yourself.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • So, thank you everyone for all the insightful help. I am appalled that I was overruled. My contract just got switched and after 20 years we are a sub now instead of the prime. We are trying to redesign the system and trying to smash our entire system into the framework of this other existing system that does some similar things to ours. So just because their simple process dumps everything into this one table, all of our stuff should be that way too.

    I tried to argue technical reasons and how a best practice method would not be to do it that way and the project manager lady practically laughed at me and said not necessarily. Which yes maybe not for another system but for ours, yes it is the best way to do it. I tried to make my points about performance and storage and she flat out told me I don’t want to hear anything about performance or that stuff just tell me why it technically doesn’t do what you need it to do. It was basically them ganging up on us to impose their will.

    It’s just laziness in my opinion. They also want to remove like all logic from the database and do hardly anything in it. Which I don’t get ... why purchase an expensive dbms just to use it for table storage. And do it poorly!!!!

    I’m so livid, I really want my company to reassign me cause I just can’t be part of thr madness anymore.

  • Sean Lange - Monday, September 10, 2018 3:35 PM

    Joe Torre - Monday, September 10, 2018 3:13 PM

    Jeff Moden - Monday, September 10, 2018 9:43 AM

    p.s.  When you do the comparison, don't forget to measure disk space, which will also translate to memory usage.  Once you have an example of their XML data and you've done the conversion, compare the sizes to emphasize the horror of tag bloat.  When it come to performance, they'll also remind you that there are XML indexes.  Be sure to include those in your measurements, as well, because they're a duplication of data and, behind the scenes, the data is stored in a type of "table" known as indexes and they should simply eliminate the middle man and store the data in tables.

    They're probably also unaware that XML is rendered out as a rather bloated "Adjacency List Hierarchical Structure" behind the scenes and all the indexing does is materialize that data as indexes.

    You could also bring up the DRI nature of things in that you can't auto-magically assign DRI to XML.  That would all have to be done by the front-end code which also means extra-development time in recreating the wheel that tables with FKs already know how to do better than they'll ever be able to do and faster because they'll have to make an extra round trip to the server to do it.

    Again, you'll need to prove all of that with code because "A man forced against his will is of the same opinion still". ;D

    DRI?
    https://acronyms.thefreedictionary.com/DRI

    Pretty sure he was meaning DRY, aka Don't Repeat Yourself.

    DRI = Declared Referential Integrity.  You know... the stuff that a lot of people that don't understand databases hate like PKs, FKs, column constraints, etc.

    Apologies for the abbreviation.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • amy26 - Monday, September 10, 2018 3:56 PM

    So, thank you everyone for all the insightful help. I am appalled that I was overruled. My contract just got switched and after 20 years we are a sub now instead of the prime. We are trying to redesign the system and trying to smash our entire system into the framework of this other existing system that does some similar things to ours. So just because their simple process dumps everything into this one table, all of our stuff should be that way too.I tried to argue technical reasons and how a best practice method would not be to do it that way and the project manager lady practically laughed at me and said not necessarily. Which yes maybe not for another system but for ours, yes it is the best way to do it. I tried to make my points about performance and storage and she flat out told me I don’t want to hear anything about performance or that stuff just tell me why it technically doesn’t do what you need it to do. It was basically them ganging up on us to impose their will.It’s just laziness in my opinion. They also want to remove like all logic from the database and do hardly anything in it. Which I don’t get ... why purchase an expensive dbms just to use it for table storage. And do it poorly!!!!I’m so livid, I really want my company to reassign me cause I just can’t be part of thr madness anymore.

    I know exactly how you feel.  I worked as a consultant for a small company that really did need the help.  In the process of doing things, I noticed that they used SSN as the clear text PK and FK for just about all of their tables.  I made the manager aware of the problem with all that and she took it to the compliance manager.  The compliance manager scoffed and said everything was fine.  When I suggested that she (the compliance manager) should enter her SSN and other data into the system to demonstrate her confidence, she made a thousand excuses as to why that she didn't need to do that while "knocking her ring" the whole time.

    Don't be livid.  You can't teach people that think they know it all a bloody thing.  Rather, give them the opportunity to fail.  Do it exactly the way they ask and make sure that you and your good company not only keep all directive documents, but also insist on an NVO policy of direction (NVO = No Verbal Orders) because you know they're going to have problems and you know they're going to blame everyone except themselves.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • amy26 - Monday, September 10, 2018 3:56 PM

    So, thank you everyone for all the insightful help. I am appalled that I was overruled. My contract just got switched and after 20 years we are a sub now instead of the prime. We are trying to redesign the system and trying to smash our entire system into the framework of this other existing system that does some similar things to ours. So just because their simple process dumps everything into this one table, all of our stuff should be that way too.I tried to argue technical reasons and how a best practice method would not be to do it that way and the project manager lady practically laughed at me and said not necessarily. Which yes maybe not for another system but for ours, yes it is the best way to do it. I tried to make my points about performance and storage and she flat out told me I don’t want to hear anything about performance or that stuff just tell me why it technically doesn’t do what you need it to do. It was basically them ganging up on us to impose their will.It’s just laziness in my opinion. They also want to remove like all logic from the database and do hardly anything in it. Which I don’t get ... why purchase an expensive dbms just to use it for table storage. And do it poorly!!!!I’m so livid, I really want my company to reassign me cause I just can’t be part of thr madness anymore.

    Ugh that really stinks!!! Sounds like they shouldn't be using sql at all. Perhaps a no-sql solution would be better for their ultimate flexibility. Hopefully someday they will start to realize the bad decisions they made. It sounds more and more like the link I sent yesterday where the IT guru and his/her followers are going to cause all sorts of problems and make a crap load of cash when they shouldn't. I agree with Jeff 100% here, let them fail but make sure to CYA along the way.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sounds to me like you should start looking for another job! I'd be surprised if that company will last the distance and the stress of working for a company that stores its data in such a fashion would be immense.
    Good luck to you!

Viewing 15 posts - 1 through 15 (of 26 total)

You must be logged in to reply to this topic. Login to reply