Containers and Databases

  • Comments posted to this topic are about the item Containers and Databases

  • I am in the "containers are stateless" camp so am also looking for enlightenment here.

    Gaz

    -- Stop your grinnin' and drop your linen...they're everywhere!!!

  • I'm using Apache Spark more and more and we are using it to join and query data from many different sources.  It makes sense to have a compute engine in a container but the data storage has to be state-full.  I think we'd need a radical redesign of a database engine to be able to use containers in a meaningful way in production.

  • I'm afraid I don't get it, but I have yet to give containers a go at all so it's not surprising. Awaiting enlightenment!

  • I have no experience with using MS SQL Server in containers, but I have used PostgreSQL in containers using Docker for nearly two years now.  I wrote a post about my initial impressions of that experience here, if you're interested.   For us, at the time, it was absolutely the correct choice.  While we are still using Postgres in Docker, I'm starting to consider un-doing that change and putting it back on the base servers.

    We realized three major benefits by using containers for databases:

    1)  It helps ensure development databases have the same level of security and configuration applied as production (accidental "Oops" protection)
    2)  It made it very easy to migrate one database while leaving the rest alone, since each database has a dedicated container
    3)  It enabled a very easy method to create automated tests of database restorations

    In the past 6 months we have fully embraced Ansible for automating the deployment/configuration of servers.  With this in place, a lot of the benefits I found using Docker can be realized using Ansible.  Containers were a perfect solution for a tiny startup trying to keep resources at a bare minimum, but we've outgrown that stage now.  

    We will continue to use containers for deploying web applications, VPN services, etc., but will likely stop using them for databases in the next 6 months.  I don't think that using containers is inherently a problem for a database server, but at this point I don't see it as a value-add either.

    I hope others have some feedback on this topic too, I am very interested in other experiences in this realm!

  • I've not played with containers yet, but the concept sounds interesting.

    The use cases for development / QA types systems is fairly easy to see, but for a production environment I'm having a hard time coming up with a case where it would be worth the time / effort to get things set up.  About the only situation I've been able to think of would be some sort of reporting type application.
    Set it up so that every morning, or whenever, it creates a SQL container with a blank DB, pulls in the required data from the production SQL instance on another server, then holds the data for the reporting application until the user is done (could be all day, could be a couple hours, whatever.)  The user would be able to generate reports, even if not connected to the network, without impacting the production data.  The user could even modify the data, again without impacting the production data (although if they generate reports and play with the data on a production instance, you might need to re-think your process anyways...)

    But even there, why not just stand up an Express instance on their device?  Less headache, and probably about as much work initially...

  • rustprooflabs - Tuesday, February 28, 2017 6:12 AM

    I have no experience with using MS SQL Server in containers, but I have used PostgreSQL in containers using Docker for nearly two years now.  I wrote a post about my initial impressions of that experience here, if you're interested.   For us, at the time, it was absolutely the correct choice.  While we are still using Postgres in Docker, I'm starting to consider un-doing that change and putting it back on the base servers.

    We realized three major benefits by using containers for databases:

    1)  It helps ensure development databases have the same level of security and configuration applied as production (accidental "Oops" protection)
    2)  It made it very easy to migrate one database while leaving the rest alone, since each database has a dedicated container
    3)  It enabled a very easy method to create automated tests of database restorations

    In the past 6 months we have fully embraced Ansible for automating the deployment/configuration of servers.  With this in place, a lot of the benefits I found using Docker can be realized using Ansible.  Containers were a perfect solution for a tiny startup trying to keep resources at a bare minimum, but we've outgrown that stage now.  

    We will continue to use containers for deploying web applications, VPN services, etc., but will likely stop using them for databases in the next 6 months.  I don't think that using containers is inherently a problem for a database server, but at this point I don't see it as a value-add either.

    I hope others have some feedback on this topic too, I am very interested in other experiences in this realm!

    I'd love to see you write more on each of these topics, pros, cons, etc.  I'm not sure how much overlap there is with the PostgreSQL/SQL Server paradigms for development and administration, but I'd like to learn more.

  • You raise good questions Steve, a couple of points of clarification should help.   First, let's clarify what are "SQL Server containers?"   SQL Server containers are simply SQL Server instances created and managed with Docker commands and APIs.   They have a few added features specific to being a "container," such as a higher level of process and user isolation, and resource governance via the Docker API, and a private file system.   On the other hand they are normal full featured SQL Server instance that operates on the Windows host, and interoperate with all normal SQL Server tools.   Databases can be attached on the local host, or on network or SAN attached storage, and data persistence is no different than other SQL Server instances.   Containers add a private file system, which is a new option and provides a simple non-persistent option for working with data that is suited for dev/QA use.  But, there is nothing otherwise to distinguish Windocks SQL Server containers from other SQL Server instances installed on the host.   You have the same registry settings, the same features and functions, the same performance.  Containres work with all normal SQL Server tools, and are comprised of the same sqlserver.exe and DLL's.   

    With Windocks SQL Server containers you get the best of both worlds . . . fast instance creation and great support for dev/QA use, as well as a complete, full featured SQL Server environment.   SQL Server containers support VSS and existing backup systems too.

    Do "containers" add value in production environments?   While a container instance can be managed and patched just as any instance, the container "adds value" in production only to the extent that containers are a common infrastructure in a Continuous Integration and Delivery strategy.. In time I believe there will be interest in use of containers for production for "continuous delivery" and rollback strategies, as DevOps and CI processes are adopted.    But, today, the particular value of containers is realized in improved support of Dev and QA teams.

    I hope this helps . . . I think this is a great discussion.

  • One additional comment that I think adds to this discussion.  To the extent you believe that Docker containers represent an industry-wide strategy, and certainly Microsoft seems to emphasize Docker containers for their vNext release . . . I would argue that is another reason for exploring these SQL Server containers.   As SQL Server professionals we need to be aware of larger trends, and consider that there is an entire industry innovating on containers, and it may be important in the future to have SQL Server strategies that align with a container oriented world.

  • Thanks, Paul. Appreciate the comments.

  • pauls 72822 - Tuesday, February 28, 2017 1:10 PM

    One additional comment that I think adds to this discussion.  To the extent you believe that Docker containers represent an industry-wide strategy, and certainly Microsoft seems to emphasize Docker containers for their vNext release . . . I would argue that is another reason for exploring these SQL Server containers.   As SQL Server professionals we need to be aware of larger trends, and consider that there is an entire industry innovating on containers, and it may be important in the future to have SQL Server strategies that align with a container oriented world.

    Hi Paul - This is a great point, new technologies/methodologies introduce new problems.  One caveat when working with Docker and Linux (specifically Ubuntu with UFW, possibly other configurations) is that you will run into unexpected firewall issues with open ports you did not expect.  The issue is explained briefly in this post.  The moral of the story is, with Docker and UFW you can end up with ports exposed that you did not intend to be publicly available.

  • Thanks, this is a good discussion.   One difference in our design is WinDocks are application level containers, and we utilize the host for ports and mount points . . . so the firewall settings work, and you don't end up with ports or other stuff happening without the host being involved and managing the resource.

  • Steve Jones - SSC Editor - Tuesday, February 28, 2017 9:38 AM

    rustprooflabs - Tuesday, February 28, 2017 6:12 AM

    I have no experience with using MS SQL Server in containers, but I have used PostgreSQL in containers using Docker for nearly two years now.  I wrote a post about my initial impressions of that experience here, if you're interested.   For us, at the time, it was absolutely the correct choice.  While we are still using Postgres in Docker, I'm starting to consider un-doing that change and putting it back on the base servers.

    We realized three major benefits by using containers for databases:

    1)  It helps ensure development databases have the same level of security and configuration applied as production (accidental "Oops" protection)
    2)  It made it very easy to migrate one database while leaving the rest alone, since each database has a dedicated container
    3)  It enabled a very easy method to create automated tests of database restorations

    In the past 6 months we have fully embraced Ansible for automating the deployment/configuration of servers.  With this in place, a lot of the benefits I found using Docker can be realized using Ansible.  Containers were a perfect solution for a tiny startup trying to keep resources at a bare minimum, but we've outgrown that stage now.  

    We will continue to use containers for deploying web applications, VPN services, etc., but will likely stop using them for databases in the next 6 months.  I don't think that using containers is inherently a problem for a database server, but at this point I don't see it as a value-add either.

    I hope others have some feedback on this topic too, I am very interested in other experiences in this realm!

    I'd love to see you write more on each of these topics, pros, cons, etc.  I'm not sure how much overlap there is with the PostgreSQL/SQL Server paradigms for development and administration, but I'd like to learn more.

    TLDR;
    Containers are simply a different tool for the same job.  In our use cases, they've been very beneficial, we'll continue to use them, but some of their use cases will start to diminish because we can afford more servers.  Containers will continue to power our web applications for a variety of other reasons, but we have realized plenty of benefits from the database perspective as well.

    Paradigms
    I'll start with overlap between PostgreSQL and SQL Server paradigms, with a bit of my experience.  I've developed on SQL Server for roughly 10 years starting with SQL Server 2005 through current.  While I've administered my own dev servers, I've never administered SQL Server in production or been responsible for its backups.  So, I don't have real world experience in those areas with SQL Server.  With PostgreSQL, I've both administered and developed against that platform for about five years.  One set of experience is my "day job" with the other being my business.

    From a development standpoint, the real differences are in the non-ANSI standard features from each platform.  If you religiously follow the standards, there are minimal issues.  The major differences are in the non-ANSI standard features from each platform.  If you religiously follow the standards, such as using COALESCE instead of ISNULL, there are minimal differences.  One major change between the two is that PostgreSQL does not have stored procedures, but I haven't found an instance where I couldn't use an UDF instead.  As much as I love SQL Server, it can't compete when it comes to the geo-spatial abilities provided by the PostGIS extension.  SQL Server is making GREAT progress on this, but PostGIS has a first-to-market advantage there.

    Now onto some pros and cons for my use cases.

    Development Databases:
    For development purposes, using Docker is truly awesome.  (See my developer roots showing?)  A developer can download the latest DB image (images in Docker are sort of like snapshots with a VM), and run a single-line command and have a fully configured (including securing settings) database server running in seconds.  My PostgreSQL Docker image is roughly 150 MB (compressed), so not a terribly large download even over a slow network.  This also allows treating development servers like Cattle instead of Kittens.  No longer do we have development database servers with dozens of random test databases hanging out, with little or no documentation about why they exist.  Developers simply spool up a new container when they need it, and destroy it when they're done.  If they don't, I'll do it for them!  Oh, and don't forget that these images have standardized security for production services, so reduces the chance of a developer slip-up.

    As I mentioned in my previous post, bringing in Ansible has shown I can do the same thing without using containers.  But, I wouldn't really consider that a con for this use case.  In reality, I can spool up as many unique database containers as are needed, while with Ansible the same task would be more difficult without splitting out the load to multiple servers.

    Database Isolation:
    As I previously mentioned, we work with a lot of spatial data and that grows in size quickly.  The OpenStreetMap data for just the state of Colorado is roughly 1GB.  We also work with weather data from NOAA and a single year of data easily hits 10 GB per year, so a multi-year historic analysis can grow to the TB level in a snap.  Shortly after I had containerized our databases, we ran into a storage related issue for a temporary job.  To fix this I simply spooled up a new server with enough storage, created a new container for the process, and moved it over.  This is also a great example of being able to put the power where I needed it.  Our main production server for our database containers has 512 MB RAM and 1 core.  Our daily workload runs there and the server is basically at idle, but these giant data loads would be terribly slow on that hardware.  It's easy (and cheap) to spool up a new server with the appropriate power and let it run.  This allows us to handle our peak loads without having giant servers with wasted resources.

    The problem with this solution is each database container needs its own backup routine, though I suspect that many folks with multiple databases on a single "server" do the same anyway.  But, since everything is scripted and automated, this isn't a major issue.  Another con is that each database container has it's own unique, non-standard, port.  That may be a hassle to some, but again with proper configuration deployment and standards it hasn't been that much of a headache.

    Automated Restoration Tests:
    I think this was the biggest benefit for us, especially when we had ZERO resources.  Being able to deploy production-ready database containers in seconds made it possible to automate the restoration of our databases and validate our backups at no extra cost.  While our resources have grown to where we can afford to have extra servers for this need, we don't have any problems at all with this use and probably won't mess with it.

  • pauls 72822 - Tuesday, February 28, 2017 4:57 PM

    Thanks, this is a good discussion.   One difference in our design is WinDocks are application level containers, and we utilize the host for ports and mount points . . . so the firewall settings work, and you don't end up with ports or other stuff happening without the host being involved and managing the resource.

    Excellent!!

  • At this stage I am only looking into SQL and Docker for development/testing purposes.
    SQL on Linux is probably a better option right now for production databases (if not going the traditional Microsoft on-premise or Azure route).

Viewing 15 posts - 1 through 15 (of 24 total)

You must be logged in to reply to this topic. Login to reply