Where's the Schema?

  • Comments posted to this topic are about the item Where's the Schema?

  • As someone who has never really worked in a professional development team , I'm more of an all rounder who occasionally writes front ends for internal enterprise use only. I am like yourself Steve slightly skeptical. To me No SQL is an interesting idea that looks like it has some merits in some circumstances . I haven't tried any of the databases themselves but try to understand the concepts.

    I have thought that I might be able to store all data for any conceivable application in two tables - a key value table and a schema table. Such that these keys can be associated with multiple schemas. This to my mind would allow developers to create columns on the fly while not worrying about the schema and then I can retrospectively go back and assign such fields to some kind of schema. I can see that this would have merit in certain circumstances for things like CRM notes applications. I just can't see that it would be good for design of financial systems hr systems / pretty much anything involving money. Master child forms for desktop applications would appear more work if you use this kind of no sql format - interestingly I think standard web frameworks are very very poor with master child forms anyway and I think this is one of the driving forces to why no sql is so appealing to web developers as it makes it very easy to records standard master child information in a flat form while removing additional development time consulting with the DBA. A nice half way house might be implementation of a key value table with the requirement that developers restrict the ability of users to create "columns" in some way and impress upon them the need to make every effort to remove (in some way) duplicate columns that confuse users and lead to problems with schema definition.

    Without some kind of structure users subsequently analyzing information will definitely miss more stuff than with standard rdbms as individuals miss columns or struggle to distil data down.

    Open Street Map uses this key value no sql structure albeit managed through an RDBMS (postgres) where users can create columns. The single table is massive in terms of columns and interestingly it does seem to have come to some kind of stable limit , analyzing data becomes an exercise in fuzzy matching and thus two people can get a different answer to the same question. This was the right decision for them though I think because they have tens of thousands of users and very few developers the amount of data being collected is huge and the system copes well and has flexibility.

  • @dalkeith I would associate NoSQL more with document / JSON stored values rather than EAV. EAV would certainly never scale whilst a JSON type approach can scale, with a schema based on on ignoring things you don't need to know about. It's a decent plans for some use cases anyhow...not ones I use, but that I can envisage. I think Steve is right about schemas though.

  • Thanks CC - noted.

  • I'm a dev who has moved more to lead dev role, and for the record I've predominantly used SQL Server for all data storage.

    For the types of projects I work on anything with out a defined schema = NO! Customers want to be able to create their own reports and as soon as you need to do that then storing arbitrary things like JSON that could represent anything immediately makes this a nightmare. And maintaining one store for JSON etc.. and one for some deserialized versus in a relational database seems like something that is doomed to cause nothing but headaches for all involved.

  • .. There's always a schema, and the rules have to be implemented up front, or later on. Whether you use a RDBMS or a NoSQL store, you are going to be dealing with a schema. The question is do you want to deal with it in a central location or in every application? ..

    The reason why most application developers choose a document database is that they simply don't want to "deal with" the schema at all. A java or C# object can be transparently persisted to the database as a JSON object. What happens on the front end is a single method call submitting the ID or session token, and what happens on the backend is a single bookmark lookup, returning a document containing all relevant related data for that customer. For example, a document database is ideal of an eCommerce shopping cart or a mobile app with microservice based architecture. It's schemaless in the sense that they are not designing the schema for a set of relational database tables, they are simply designing the properties of an object class. Maintaining multiple versions of a schema within a collection is technically possible, but that's not the goal. 

    I'm currently working on a project where a customer base of several million can access their account information and purchase / payment history from a mobile app. We're considering Azure DocumentDB.  As a DBA responsible for the availability and performance of SQL Server, I appreciate the fact that there is another option besides having the production online database (or even the data warehouse) getting pounded by thousands of requests per second. Yes, SQL Server can scale up to the task, it's been done before, but the level of effort and expense would be the equivalent of deploying an army to the moon. 

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • As time goes on, I'm beginning to see some value in the whole "schema-less" concept.  One use-case is for Proofs of Concept.  Being able to rapidly create something and just mash whatever data you're collecting in a single blob in the database.  That makes for quicker development (no need to map individual values to database columns) and yet it still gives you access to the data.  Later on, the application could be changed to store it in columns, or perhaps it doesn't.  Another app could be written to take that "blob" (JSON, etc.) and map it to another table with user-friendly columns, etc. for such things as reporting.

    The old adage of "if I have a hammer, everything looks like a nail" seems to hold true here.  If I know (and love) SQL Server and its relational-concepts, I tend to want to force everything into that paradigm, but it's good to understand and explore other options. ... Not everything is a nail! 😉

  • IowaDave - Tuesday, July 3, 2018 8:26 AM

    As time goes on, I'm beginning to see some value in the whole "schema-less" concept.  One use-case is for Proofs of Concept.  Being able to rapidly create something and just mash whatever data you're collecting in a single blob in the database.  That makes for quicker development (no need to map individual values to database columns) and yet it still gives you access to the data.  Later on, the application could be changed to store it in columns, or perhaps it doesn't.  Another app could be written to take that "blob" (JSON, etc.) and map it to another table with user-friendly columns, etc. for such things as reporting.

    The old adage of "if I have a hammer, everything looks like a nail" seems to hold true here.  If I know (and love) SQL Server and its relational-concepts, I tend to want to force everything into that paradigm, but it's good to understand and explore other options. ... Not everything is a nail! 😉

    Except that there's that other problem where the prototype becomes the real thing and you get lumbered with a load of technical debt.
    Using a relational database is not really a hammer. Your comments main assertion is to save time not because things don't fit into relational concepts.

    My problem with conference vids I've seen on this topic is that as soon as you need reporting they completely fall apart and become more of a burden then just doing relational SQL in the first place.

  • I've hit some nasty messes when people try and store master data in No SQL, the devs have literally lost the ability to rebuild the data hierarchy. After conversations with fellow professionals it seems I'm not alone, and my conclusion is that it's only good for really basic list management (Craig's list is a good example).

    It's a trade off between consistency and hierarchy management vs speed.

    I've spoken to devs working on web performance issues who go running off to build NoSQL solutions to solve web response problems, when really their actual problem is terrible terrible SQL architecture. Result? The architecture is still terrible plus they have a sprawling, unmanageable No SQL mess to manage too.

    Web devs tend to solve the problem in front of them with whatever tools they fancy having a play with. They shouldn't be given their heads on this.

    I am sure eventually web development will be mature enough to build an all encompassing data solution, but not today, they're still generating the same problems I've been solving for 20 years. SQL is king, it's maybe not very trendy or exciting but it's the best tool for managing modelling state transitions in business processes by quite a margin.

  • I think the real appeal of "schema-less" (actually, it should probably be called "ad-hoc schema" since nothing is schema-less) is that new "table structures" can be defined without regard to existing ones. My hobbies usually involve the need for LOTS of data storage and I once tried to create a database for D&D world creation. This meant I needed *thousands* of tables, one per different kind of object (planets, countries, skills, spells, creatures, you name it...).

    I used Access (which is a relational DBMS regardless of what detractors would have you believe) and the result was a set of half a dozen tables to handle "ad hoc" tables. These tables contained the details of the "tables". The user could create new tables (actually, object blueprints) to their hearts' content, and export them to other users (both "schema" and data). It worked very, very well for what it was intended for.

    In short it was almost identical to the way SQL Server works! 😛

    SQL Server does EXACTLY the same thing, in a far more sophisticated way. Every time you create a table it goes in a system table. Ditto for columns, indexes, permissions, etc.

    So in effect a no-SQL database does exactly what a SQL database does, just less efficiently and in a less granular fashion. Developers think this is easier, and then spend all their time replicating features SQL Server provides for almost free.

    It's really no different than building every table with NVARCHAR(MAX) columns for everything, in other words they're a bit ignorant of helpful features.

    I suspect there ARE use cases where no-SQL is a better fit. Maybe important ones (Google indexing comes to mind, or YouTube).

    But IMO the vast majority of cases are better off with structured schema, be it SQL or something similar.

  • The weirdest schema I've ever seen is known as HL7. I just don't get it. Like Richard Campbell on the Dot Net Rocks podcast called it, "It's the non-standard, standard."

    Kindest Regards, Rod Connect with me on LinkedIn.

  • Eric M Russell - Tuesday, July 3, 2018 7:28 AM

    I'm currently working on a project where a customer base of several million can access their account information and purchase / payment history from a mobile app. We're considering Azure DocumentDB.  As a DBA responsible for the availability and performance of SQL Server, I appreciate the fact that there is another option besides having the production online database (or even the data warehouse) getting pounded by thousands of requests per second. Yes, SQL Server can scale up to the task, it's been done before, but the level of effort and expense would be the equivalent of deploying an army to the moon. 

    Have you seen this? https://vincentlauzon.com/2018/06/27/cosmos-db-stored-procedures-handling-continuation/?_lrsc=1c59c385-cfb7-4c28-a1df-9d7d87a627a4
    I think  document stores work well for situations where you almost always deal with singletons. However, when you start to try and work on reports or when your singleton structure changes, you've got to ensure that all apps, which includes ETL, reports, and other potential schema-bound systems (like warehouses) evolve as well. Plus you have to handle the multiple structures in your code.

    I'm not saying that's bad, but it's going to be overhead over time.

  • IowaDave - Tuesday, July 3, 2018 8:26 AM

    As time goes on, I'm beginning to see some value in the whole "schema-less" concept.  One use-case is for Proofs of Concept.  Being able to rapidly create something and just mash whatever data you're collecting in a single blob in the database.  That makes for quicker development (no need to map individual values to database columns) and yet it still gives you access to the data.  Later on, the application could be changed to store it in columns, or perhaps it doesn't.  Another app could be written to take that "blob" (JSON, etc.) and map it to another table with user-friendly columns, etc. for such things as reporting.

    The old adage of "if I have a hammer, everything looks like a nail" seems to hold true here.  If I know (and love) SQL Server and its relational-concepts, I tend to want to force everything into that paradigm, but it's good to understand and explore other options. ... Not everything is a nail! 😉

    I'm leaning this way as well. I think this (https://www.allthingsdistributed.com/2018/06/purpose-built-databases-in-aws.html) is good. We likely can use multiple stores for different purposes

  • richard.gardner87 - Tuesday, July 3, 2018 8:49 AM

    I've hit some nasty messes when people try and store master data in No SQL, the devs have literally lost the ability to rebuild the data hierarchy. After conversations with fellow professionals it seems I'm not alone, and my conclusion is that it's only good for really basic list management (Craig's list is a good example).

    It's a trade off between consistency and hierarchy management vs speed.

    I've spoken to devs working on web performance issues who go running off to build NoSQL solutions to solve web response problems, when really their actual problem is terrible terrible SQL architecture. Result? The architecture is still terrible plus they have a sprawling, unmanageable No SQL mess to manage too.

    Web devs tend to solve the problem in front of them with whatever tools they fancy having a play with. They shouldn't be given their heads on this.

    I am sure eventually web development will be mature enough to build an all encompassing data solution, but not today, they're still generating the same problems I've been solving for 20 years. SQL is king, it's maybe not very trendy or exciting but it's the best tool for managing modelling state transitions in business processes by quite a margin.

    Craigslist is interesting. It is mostly singletons, or a series of related singletons retrieved for display. I have no idea how hard it is to move things around there, but that would be a fun design exercise.

    I tend to agree. I think SQL works really well for most things. Maybe not relationships, where graphs shine, but I can't decide if the graph scenarios are just a small part of a larger business issue and should just be one part of your data.

  • roger.plowman - Tuesday, July 3, 2018 8:50 AM

    But IMO the vast majority of cases are better off with structured schema, be it SQL or something similar.

    I think so, too, but I wonder. Are there more cases where we should have different structures? Use multiple data stores for different needs?

Viewing 15 posts - 1 through 15 (of 29 total)

You must be logged in to reply to this topic. Login to reply