Data Orchestration

We deal with larger and larger data sets all the time. In fact, it seems that most people eventually outgrow the capabilities of some of their OLTP databases, often RDBMS stores, and need to upgrade hardware or re-architect software. Whether you have a 100GB database on a few cores or 10TB on dozens of cores, there is often a need to upgrade to meet workload demands.

In addition to transactional needs, there is a growing demand for reporting and analysis workloads. Some people use a separate warehouse, and some want to just query data where it is. Certainly ETL processes and platforms have grown tremendously over the last few decades for those who want to implement the former approach, but there is plenty of demand for the latter. In fact, I'm amazed how many customers have asked whether Redgate's SQL Clone product will enable them to do this and spread their workload to other systems (it's not designed for this).

I've been thinking that with SQL Server 2019 we will start to access data where it lives, rather than move it to wherever we want it. To me, this is more of what future data orchestration might involve. I ran across an article that takes a slightly different approach, suggesting that AI and other products will help us move data around better, and perhaps that's true, but I do think that more and more we want to query data where it lives and use larger, distributed compute platforms to do this.

The scale-out capabilities of SQL Server 2019, with the separation of compute and storage in Big Data Clusters, are a huge change that I think will be the future for many of us who look to meet reporting needs. The ability to grow hardware to match workload needs is significant. This alone is a good reason to think about doing this in a hybrid or public cloud scenario.

Of course, this doesn't come cheap, easy, or quick. There is work to be done to evolve systems, but it is an area I think is worth experimenting in during the coming year. I bet many companies would be interested in some PoC work here to determine how to better meet the reporting requirements of larger data sets. Perhaps this is something you could suggest to someone in your organization.

SQL Server Integrates Hadoop and Spark out-of-the box: The Why?

by Frank A. Banin

SQLServerCentral

Why has Microsoft added new capabilities in SQL Server to connect to other types of data sources? Read on to learn more.

2021-05-14 (first published: 2019-09-09)

10,604 reads

Do You Have Big Data?

by Steve Jones

SQLServerCentral

Big Data

Data sizes are always growing. Stats on world data are astounding, as are the stats many of us experience in our lives. Plenty of us have moved from MB management to GBs, and I see plenty of people dealing with TB storage at home. Most of that data is likely from images and video, but […]