SQLServerCentral Editorial

Not a cloud to be seen

When reading the marketing flannel that infects the industry, the message seems to be that you can solve your data problems (such as Big Data or Internet of Things) by choosing a particular technology. I'm convinced that what you do with the technology matters far more: the relative values and priorities of the development team, the techniques that you bring to bear on developing the application, the development environment and facilities you provide, and the attention to detail in the processes within the database lifecycle.

The StackOverflow site is a great example, because they take so much care to let us know about their technical solution. At StackOverflow, a single live SQL Server instance (with a hot standby), bedecked with 384 GB of RAM and 2 TB of SSD, hosts a 2.4 TB database and handles 440 million queries a day, peaking at 8500 queries per second, and never even breaks into a sweat (15% CPU usage). It is an intelligent and well-architected solution. Like Wikimedia, it uses Elasticsearch, based on Lucene, to provide sophisticated search, alongside its own custom tag engine. Redis powers the 'awards' amongst other things. The entire family of sites handles 560 million page views a month.
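To put those figures in proportion, here is a quick back-of-envelope calculation in Python. It uses only the two numbers quoted above and assumes, purely for illustration, that the daily total is spread over a full 24 hours.

```python
# Figures quoted in the editorial; the 24-hour spread is an assumption.
SECONDS_PER_DAY = 24 * 60 * 60

queries_per_day = 440_000_000
peak_qps = 8_500

# Average load if the daily total were spread evenly across the day.
average_qps = queries_per_day / SECONDS_PER_DAY

# How far the quoted peak sits above that average.
peak_to_average = peak_qps / average_qps

print(f"average: {average_qps:,.0f} queries/second")
print(f"peak is {peak_to_average:.2f}x the average")
```

On these numbers the average works out at roughly 5,100 queries per second, so the quoted peak is well under twice the average load.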

At first glance, it all seems a bit retro. There is, by deliberate choice, no use of the cloud; it uses Windows servers where appropriate and, like Wikipedia, it is based on a relational database. There isn't a whiff of Microservices. NoSQL, in the form of Redis, takes a subordinate, self-contained role where this is appropriate. However, there is a strong minimalist design ethos, built on the idea of scalability, and the build is lightning fast, at ten seconds, because there are only 110,000 lines of code.

Every member of the 'Stack' family has the same database schema (except for Careers Stack Overflow, stackexchange.com, and Area 51). This makes deployment easier, since the schema changes are applied to all site databases at the same time. There is no partitioning, because the size of the data doesn't warrant it. There is some denormalization to optimize query performance. Obviously, their deep understanding of indexes determines a lot of database-design decisions.
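The idea of applying one set of schema changes to every site database can be sketched as follows. This is not StackOverflow's actual tooling: it is a minimal illustration that uses in-memory SQLite databases as stand-ins for SQL Server, and the site names, the `SchemaVersion` table and the `apply_migrations` helper are all hypothetical.

```python
import sqlite3

# Hypothetical site names; each site gets its own database with the same schema.
SITES = ["site_a", "site_b", "site_c"]

# One ordered list of schema changes, shared by every site database.
MIGRATIONS = [
    "CREATE TABLE Posts (Id INTEGER PRIMARY KEY, Title TEXT NOT NULL)",
    "ALTER TABLE Posts ADD COLUMN Score INTEGER NOT NULL DEFAULT 0",
]

def apply_migrations(conn, migrations):
    """Apply migrations not yet recorded in this database; return how many ran."""
    conn.execute("CREATE TABLE IF NOT EXISTS SchemaVersion (Version INTEGER NOT NULL)")
    current = conn.execute(
        "SELECT COALESCE(MAX(Version), 0) FROM SchemaVersion"
    ).fetchone()[0]
    applied = 0
    for version, statement in enumerate(migrations, start=1):
        if version <= current:
            continue  # this database already has the change
        conn.execute(statement)
        conn.execute("INSERT INTO SchemaVersion (Version) VALUES (?)", (version,))
        applied += 1
    conn.commit()
    return applied

def columns(conn, table):
    """Return the column names of a table, in definition order."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

# In-memory SQLite databases stand in for the per-site SQL Server databases.
databases = {site: sqlite3.connect(":memory:") for site in SITES}
for site, conn in databases.items():
    apply_migrations(conn, MIGRATIONS)
    print(site, columns(conn, "Posts"))
```

Because every database runs the same ordered list and records the version it has reached, running the deployment a second time changes nothing, which is what makes a single schema across many sites cheap to maintain.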

With a rapid deployment process using Puppet and PowerShell DSC, any refactoring can be done in several low-risk stages, often using feature-switching. Unit tests, integration tests and UI tests run on every deployment, and the process is aborted on any failure. They'd like to do up to five deployments a day because it encourages the programming mindset of building the smallest thing that would work. SQL development is done on a SQL Server installation that is the same size as production, with test data that is similar to production in size and content, so performance-testing is easier, and a culture of performance is instilled in the developers.
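Two of the ideas in that paragraph, feature-switching and aborting a deployment on the first failed test stage, can be sketched in a few lines of Python. Everything here is hypothetical: the switch name, the scoring functions and the `run_deployment` helper are made up for illustration and are not taken from StackOverflow's code.

```python
# Hypothetical switch store; a real system would read this from configuration.
FEATURE_SWITCHES = {"use_new_score_calculation": False}

def is_enabled(name):
    """Unknown switches count as off, so new code stays dark by default."""
    return FEATURE_SWITCHES.get(name, False)

def old_score(upvotes, downvotes):
    return upvotes - downvotes

def new_score(upvotes, downvotes):
    # Refactored behaviour shipped behind the switch: never below zero.
    return max(upvotes - downvotes, 0)

def score(upvotes, downvotes):
    """Route to the new code path only when the switch is on."""
    if is_enabled("use_new_score_calculation"):
        return new_score(upvotes, downvotes)
    return old_score(upvotes, downvotes)

def run_deployment(stages):
    """Run (name, check) stages in order and stop at the first failing check."""
    completed = []
    for name, check in stages:
        if not check():
            return False, completed  # abort: later stages never run
        completed.append(name)
    return True, completed

# A deployment whose test stages all pass, with the new code still switched off.
stages = [
    ("unit tests", lambda: score(5, 2) == 3),
    ("integration tests", lambda: True),
    ("UI tests", lambda: True),
]
print(run_deployment(stages))
```

The new code path can be deployed while switched off, turned on later, and turned off again without another deployment, which is what keeps each refactoring stage low-risk.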

So no Cloud, plenty of real, well-crafted hardware, fanatical attention to performance, rapid delivery and SQL Server. Does my broad smile indicate that I'm some sort of IT Hipster?

Phil Factor.
