Extreme SQL

I have made a career of working with SQL and databases. Usually I've looked for interesting companies and people, but I've avoided extreme situations. For me, that often means very large or very real-time environments. I once declined a job for a 13TB database on SQL Server 6.5. My suspicion is that job would have taken me away from my wife and young children far too often.

Facebook has a lot of users, and they run a lot of queries. With over 1 billion daily users and hundreds of TBs of daily uploads, they really need strong databases. While they have multiple databases, including SQL ones, they have struggled with analytic queries in the past. They started using Presto as a solution. Presto is an open source query engine for running analytic queries against data in different storage locations, such as RDBMSes or Hive/HDFS. This sounds like what PolyBase does for SQL Server.

The problem with any engine at Facebook's scale is the load. While they like Presto, they needed to make it work better. They initially built a caching layer that required users to build ETL jobs to load data into SSDs attached to the Presto cluster. However, they outgrew this and ended up turning to a distributed file system called Alluxio.

The article linked above talks a bit about how this works and how it allows users to query petabytes of data. Most of us have users who don't always qualify their queries completely, so we expect that some queries that should only need to scan 100GB will end up reading much more until the users tune them appropriately.

The thing I found interesting here is that some queries were taking up to 10s, which users found unacceptable. The move to Alluxio gave them a 30-50% boost, which doesn't sound like a lot. Getting 5-7s instead of 10s isn't a great savings to me. The reduction in reads is impressive, but I wonder to what extent some management and tuning is needed to ensure the cache works well.
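As a quick check on that arithmetic, here is a small sketch. The 10-second baseline and the 30-50% figure come from the article; whether "boost" means time saved or extra speed is an assumption, so the hypothetical helpers below compute both readings.

```python
# Hypothetical worked example. The 10s baseline is from the article; reading the
# 30-50% "boost" as a latency reduction vs. a speedup is an assumption.
BASELINE_SECONDS = 10.0


def latency_after_reduction(baseline: float, reduction: float) -> float:
    """Remaining latency if `reduction` (0..1) is the fraction of time saved."""
    return baseline * (1.0 - reduction)


def latency_after_speedup(baseline: float, gain: float) -> float:
    """Remaining latency if `gain` (0..1) means the query runs (1 + gain)x faster."""
    return baseline / (1.0 + gain)


for pct in (0.30, 0.50):
    cut = latency_after_reduction(BASELINE_SECONDS, pct)
    fast = latency_after_speedup(BASELINE_SECONDS, pct)
    print(f"{pct:.0%}: {cut:.1f}s as a reduction, {fast:.1f}s as a speedup")
# 30%: 7.0s as a reduction, 7.7s as a speedup
# 50%: 5.0s as a reduction, 6.7s as a speedup
```

The 5-7s range matches the "time saved" reading. If the gain were a speedup instead, queries would still take roughly 6.7-7.7s.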

I have no desire to work on these extreme systems, but I am glad someone does. The lessons and tricks learned here often trickle down to improve the daily performance many of us see in our smaller systems. I think the Hyperscale work Microsoft is doing, and the Big Data Clusters, are fascinating ways of organizing SQL Server-based systems, and some of that technology will likely make its way into our smaller systems and help us continue to improve their performance over time.
