SQLServerCentral Article

Mongo Jumbo Backups


Taking a basic full SQL Server database backup is a disarmingly simple operation. Most people, within a few minutes of reading the documentation, could back up a live SQL Server database and then shortly after restore it to a consistent state with respect to a certain time. Pause for a second to consider the underlying complexity that makes this possible and one can't help but admire this simplicity. Furthermore, with the FULL recovery model, the DBA can exert very fine-grained control over the restore process, rolling through the log backups to restore a database to a precise point in time.
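
To make "rolling through the log backups" concrete, here is a minimal Python sketch that plans a point-in-time restore: it picks the most recent full backup before the target time, then each later log backup in order, stopping inside the one that covers the target. The Backup class, the file names and the function name are all hypothetical, and the sketch assumes an unbroken log chain identified by completion time alone; a real plan would check the LSNs recorded in the msdb backup history. It only builds the T-SQL text and runs nothing against a server.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Backup:
    kind: str           # "full" or "log"
    path: str           # hypothetical backup file path
    finished: datetime  # when the backup completed


def plan_point_in_time_restore(db: str, backups: List[Backup],
                               target: datetime) -> List[str]:
    """Return the T-SQL statements, in order, to restore `db` to `target`.

    Simplifying assumption: completion times alone identify an unbroken
    log chain. Real tooling would verify LSNs instead.
    """
    # The restore starts from the latest full backup taken before the target.
    fulls = [b for b in backups if b.kind == "full" and b.finished <= target]
    if not fulls:
        raise ValueError("no full backup completed before the target time")
    base = max(fulls, key=lambda b: b.finished)

    # Log backups completed after that full backup, oldest first.
    logs = sorted(
        (b for b in backups if b.kind == "log" and b.finished > base.finished),
        key=lambda b: b.finished,
    )

    statements = [
        f"RESTORE DATABASE [{db}] FROM DISK = N'{base.path}' WITH NORECOVERY;"
    ]
    stop_at = target.strftime("%Y-%m-%dT%H:%M:%S")
    for log in logs:
        if log.finished >= target:
            # This log backup covers the target time: stop there and
            # bring the database online.
            statements.append(
                f"RESTORE LOG [{db}] FROM DISK = N'{log.path}' "
                f"WITH STOPAT = '{stop_at}', RECOVERY;"
            )
            return statements
        # Earlier log backups are applied in full; the database stays
        # in the restoring state.
        statements.append(
            f"RESTORE LOG [{db}] FROM DISK = N'{log.path}' WITH NORECOVERY;"
        )
    raise ValueError("log backups do not cover the target time")
```

Given a 01:00 full backup and hourly log backups, asking for 02:30 would yield the full restore, the 02:00 log restored WITH NORECOVERY, and the 03:00 log restored WITH STOPAT and RECOVERY.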

Of course, backing up a very large database is 'harder', in terms of the time and resources required, but then we have options such as differential, file, and filegroup backups that enable us to customize a backup strategy. Ultimately, except in extreme cases, if you're not regularly backing up your SQL Server databases, arrange a visit from Grant Fritchey and his lead-weighted hickory learning bat.

By contrast, to this relatively untrained eye at least, the world of NoSQL database backups seems a minefield. According to the MongoDB backup documentation, for example, we perform backups by 'copying' MongoDB's underlying data files, using point-in-time file system snapshots, provided that the data volume supports them and that journaling is enabled, so that the snapshot is consistent.

It's akin to backing up SQL Server databases with a Windows tool such as Microsoft DPM (Data Protection Manager). From the DBA's perspective, it removes almost all of their control over the database backup, and more critically, the database restore operation.

Add replicas and sharded clusters into the mix, and obtaining consistent point-in-time backups of a MongoDB database begins to feel like a distant dream. If it's a sharded system, we must "disable the balancer and capture a snapshot from every shard and a config server at approximately the same moment in time." I'm betting this will not sound massively reassuring to many DBAs. The documentation admits that these backups can get very large and "do not support point in time recovery for replica sets and are difficult to manage for larger sharded clusters". It offers a different form of managed backup through its MongoDB Management Service (MMS), although it doesn't support point-in-time restore for sharded systems.

In the world of "big data", some applications collect millions of rows of data per week, or even per day. In such cases, the main attractions of a document database, such as MongoDB, with a 'flexible' schema-less data model, over a traditional relational database, are that it is 'built' for High Availability, through replication, and massive scalability, through sharded clusters. It also seems to be built on the assumption that traditional database backups are just for the ultra-fussy. Let's hope they're right.