Why use a database instead of just saving your data to disk?

For smaller projects, is a database worth it?

This Q&A is part of a weekly series of posts highlighting common questions encountered by technophiles and answered by users at Stack Exchange, a free, community-powered network of 100+ Q&A sites.

Dokkat appears to think that databases are overused. "Instead of a database, I just serialize my data to JSON, saving and loading it to disk when necessary," he writes. "All the data management is made on the program itself, which is faster AND easier than using SQL queries." What is missing here? Why should a developer use a database when saving data to a disk might work just as well?
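For reference, the approach Dokkat describes might look roughly like the sketch below. The record fields, file name, and helper names are hypothetical, since the question does not say what is being stored.

```python
import json
import os
import tempfile


def save_records(path, records):
    """Write the whole data set to disk as a single JSON document."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f)


def load_records(path):
    """Read the whole data set back into memory."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Hypothetical data and location, for illustration only.
path = os.path.join(tempfile.mkdtemp(), "data.json")
records = [{"id": 1, "name": "alpha"}, {"id": 2, "name": "beta"}]
save_records(path, records)

loaded = load_records(path)

# Every "query" is ordinary code over the in-memory list.
matches = [r for r in loaded if r["name"] == "beta"]
```

Note that each save rewrites the entire file and each lookup scans the whole list, which is fine for small data sets and is the trade-off the answers below weigh.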

See the original question here.

The laundry list

Robert Harvey answers (117 votes):

  1. You can query data in a database (ask it questions).
  2. You can look up data from a database relatively rapidly.
  3. You can relate data from two different tables together using JOINs.
  4. You can create meaningful reports from data in a database.
  5. Your data has a built-in structure to it.
  6. Information of a given type is always stored only once.
  7. Databases are ACID.
  8. Databases are fault-tolerant.
  9. Databases can handle very large data sets.
  10. Databases are concurrent; multiple users can use them at the same time without corrupting the data.
  11. Databases scale well.

In short, you benefit from a wide range of well-known, proven technologies developed over many years by a wide variety of very smart people.

If you're worried that a database is overkill, check out SQLite.
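As a rough illustration of how little setup SQLite needs, here is a minimal sketch using Python's built-in `sqlite3` module. It also shows the atomicity part of ACID from the list above: a failed transfer is rolled back as a unit. The `accounts` table and `transfer` helper are made up for this example.

```python
import sqlite3

# An in-memory database; pass a file path instead to persist to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts; both updates apply or neither does."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, dst),
        )
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, src),
        )


transfer(conn, "alice", "bob", 30)  # succeeds

try:
    # The debit violates the CHECK constraint, so the credit is undone too.
    transfer(conn, "alice", "bob", 500)
except sqlite3.IntegrityError:
    pass

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

Getting the same all-or-nothing behavior with a hand-written JSON file would mean implementing something like write-to-temp-and-rename yourself.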

Related: "When would someone use MongoDB (or similar) over traditional RDMS?"

Complexity deserves thought

Yannis Rizos comments:

You don't need a database in the scenario you are describing because your data set is trivial. Databases are meant for more complex data sets; if all you do is read and show a list, your approach works.

When do you make the call?

Sam answers (91 votes):

Whilst I agree with everything Robert said, he didn't tell you when you should use a database as opposed to just saving the data to disk.

So take this in addition to what Robert said about scalability, reliability, fault tolerance, etc.

Use an RDBMS when:

  • You have relational data, e.g. a customer who purchases your products, where those products have a supplier and a manufacturer.
  • You have large amounts of data and you need to be able to locate relevant information quickly.
  • You need to start worrying about the previous issues identified: scalability, reliability, ACID compliance.
  • You need to use reporting or intelligence tools to work out business problems.
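The customer, product, and supplier case from the first bullet could be sketched as follows, using SQLite through Python's `sqlite3` module as a stand-in for any RDBMS. The table layout and sample rows are assumptions made for illustration, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE suppliers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE products (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        supplier_id INTEGER NOT NULL REFERENCES suppliers(id)
    );
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE purchases (
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        product_id INTEGER NOT NULL REFERENCES products(id)
    );

    INSERT INTO suppliers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO products VALUES (1, 'widget', 1), (2, 'gadget', 2);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Ben');
    INSERT INTO purchases VALUES (1, 1), (1, 2), (2, 1);
    """
)

# Who bought what, and which supplier did it come from?
rows = conn.execute(
    """
    SELECT c.name, p.name, s.name
    FROM purchases AS pu
    JOIN customers AS c ON c.id = pu.customer_id
    JOIN products  AS p ON p.id = pu.product_id
    JOIN suppliers AS s ON s.id = p.supplier_id
    ORDER BY c.name, p.name
    """
).fetchall()
```

Each fact (a supplier's name, a product's supplier) is stored once, and the JOINs reassemble it on demand.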

Use a NoSQL database when:

  • You have lots of unstructured data that needs to be stored.
  • You have scalability and speed needs.
  • You generally don't need to define your schema up front, so this can be a good fit if your requirements are changing.

Finally, use files when:

  • You have unstructured data in reasonable amounts that the file system can handle.
  • You don't care about structure or relationships.
  • You don't care about scalability or reliability (although these can be done, depending on the file system).
  • You don't want, or can't deal with, the overhead a database will add.
  • You are dealing with structured binary data that belongs in the file system, for example: images, PDFs, documents, etc.

Think about “what”

Emperor Orionii answers (5 votes):

I see a lot of answers focus on the problems of concurrency and reliability. Databases provide other benefits besides concurrency, reliability, and performance. They free you from worrying about how bytes and characters are represented in memory. In other words, databases let the programmer focus on "what" rather than "how".

One of the answers mentions queries. "Asking an SQL database a question" scales well with the complexity of the question. As code evolves during development, simple queries such as "fetch all" can easily expand to "fetch all where property1 equals this value and then sort by property2" without making it the programmer's concern to optimize the data structure for such a query. Most queries can be sped up by creating an index on the relevant property.
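A minimal sketch of that evolution, again using SQLite via Python's `sqlite3` module: the `items` table and its rows are hypothetical, while `property1` and `property2` are the names used in the answer. The text of the query plan varies between SQLite versions, so it is collected here rather than shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items ("
    " id INTEGER PRIMARY KEY, property1 TEXT, property2 INTEGER)"
)
conn.executemany(
    "INSERT INTO items (property1, property2) VALUES (?, ?)",
    [("red", 3), ("blue", 1), ("red", 1), ("red", 2)],
)

# Day one: "fetch all".
everything = conn.execute("SELECT * FROM items ORDER BY id").fetchall()

# Later: "fetch all where property1 equals this value, sorted by property2".
# Only the query text changes; the storage layout is untouched.
query = "SELECT * FROM items WHERE property1 = ? ORDER BY property2"
filtered = conn.execute(query, ("red",)).fetchall()

# If the query becomes slow, add an index; the query itself stays the same.
conn.execute(
    "CREATE INDEX idx_items_p1_p2 ON items (property1, property2)"
)
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("red",)).fetchall()
filtered_again = conn.execute(query, ("red",)).fetchall()
```

With hand-rolled arrays, the same change would typically mean writing a filter, a sort, and eventually a lookup structure by hand.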

Another benefit is relations. With queries, it's cleaner to cross-reference data from different data sets than with nested loops. For example, searching for all forum posts from users that have fewer than 3 posts, in a system where users and posts are different data sets (or DB tables or JSON objects), can be done with a single query without sacrificing readability.
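The forum example might be written as the single query below. This is one possible formulation, run against SQLite through Python's `sqlite3` module; the `users` and `posts` tables and their contents are assumed for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        title TEXT NOT NULL
    );

    INSERT INTO users VALUES (1, 'ann'), (2, 'ben'), (3, 'cy');
    INSERT INTO posts (user_id, title) VALUES
        (1, 'a1'), (1, 'a2'), (1, 'a3'),
        (2, 'b1'), (2, 'b2'),
        (3, 'c1');
    """
)

# All posts written by users who have fewer than 3 posts.
rows = conn.execute(
    """
    SELECT u.name, p.title
    FROM posts AS p
    JOIN users AS u ON u.id = p.user_id
    WHERE p.user_id IN (
        SELECT user_id FROM posts GROUP BY user_id HAVING COUNT(*) < 3
    )
    ORDER BY u.name, p.title
    """
).fetchall()
```

The equivalent over plain lists needs one pass to count posts per user and another to filter, which is where the nested loops come from.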

All in all, SQL databases are better than plain arrays if the data volume can be big (let's say more than 1000 objects), data access is non-trivial, and different parts of the code access different subsets of the data.

Find more answers or leave your own at the original post. See more Q&A like this at Programmers, a site for conceptual programming questions at Stack Exchange. And of course, feel free to log in and ask your own.
