Democratizing Self Service Cloud Analytics and ELT

By: Ron L'Esteve | Updated: 2023-03-02 | Comments | Related: > Cloud Strategy

Problem

Many organizations struggle with democratizing access to data and analytics in the cloud, particularly for self-service purposes. As a result, data and analytics projects often require significant IT involvement, leading to delays, bottlenecks, and inefficiencies. Additionally, data analytics, ELT, and BI tools often require specialized technical expertise, making them inaccessible to a broader range of users.

Solution

To address these challenges, organizations can adopt cloud-based data and analytics platforms that provide self-service capabilities and support modern cloud data management approaches. These approaches, such as data lakehouse, data virtualization, and serverless computing, can enable organizations to democratize access to data and analytics in the cloud, enabling users across the organization to leverage data and analytics for various purposes. In addition, organizations can implement self-service analytics and BI tools such as dbt, Tableau, Qlik, and Power BI to enable users to build and deploy analytics and BI solutions without IT involvement. Similarly, on the ELT front, cloud-based self-service capabilities such as code-free, reusable ELT frameworks, zero-copy cloning, external table, hybrid transactional analytical processing, and more can reduce time to market and development expertise needed to further democratize the self-service capabilities on the cloud.

Design and Build Self-Service ELT Pipelines

Extract, Load, Transform (ELT) pipelines are a key component of modern data management and analytics architectures, and they enable organizations to extract data from various sources, load it into a central repository, and transform it as needed for analysis and reporting. In this section, we will explore how organizations can design and build self-service ELT pipelines, which enable non-technical users to extract, load, and transform data in a self-service manner.

There are several technologies and approaches that organizations can use to design and build self-service ELT pipelines, including reusable and code-free ELT frameworks, hybrid transactional analytical processing (HTAP), external tables, zero-copy cloning, and Unistore. We will examine these technologies and approaches in detail and explore how they can be used to design and build self-service ELT pipelines.

Reusable and Code-Free ELT Frameworks

Reusable and code-free ELT frameworks enable organizations to extract, load, and transform data in a self-service manner, without the need for specialized technical skills or coding knowledge. They are designed to be easy to use and intuitive. Also, they typically provide a graphical user interface (GUI) that enables non-technical users to extract, load, and transform data using drag-and-drop and point-and-click operations.

AWS Glue, Azure Synapse Pipelines and Mapping Data Flows, Google Cloud Data Fusion, and Talend Cloud are all cloud-based data integration platforms that enable organizations to extract, load, and transform data using a GUI-based interface. They all provide pre-built connectors and transformations that can extract data from a wide range of sources, load it into a central repository, and transform it as needed for analysis and reporting.

These are just a few examples of reusable and code-free ELT frameworks in the cloud, and many other options are available. These platforms enable organizations to extract, load, and transform data in a self-service manner without the need for specialized technical skills or coding knowledge.

Hybrid Transactional Analytical Processing

Hybrid Transactional Analytical Processing (HTAP) is a data management approach that enables organizations to support both transactional and analytical processing on the same data platform. It combines the capabilities of transactional systems, designed to support fast and reliable processing of high volumes of transactions, with those of analytical systems designed to support complex queries and data analysis.

The origins of HTAP can be traced back to the late 1990s and early 2000s when several vendors introduced "hybrid" data management systems that combined the capabilities of transactional and analytical systems. These systems were designed to enable organizations to support both transactional and analytical processing on the same platform. They were intended to address the challenges of managing data in a complex and rapidly changing environment. Today, HTAP is used by organizations in many industries to support various use cases, including real-time analytics, customer 360, fraud detection, and supply chain optimization. It is particularly well-suited for use cases that require fast and reliable access to data, as well as the ability to perform complex queries and data analysis.

One example of how HTAP works is in the retail industry. Imagine a retailer with a system that processes transactions in real-time, such as when a customer makes a purchase in-store or online. With HTAP, the retailer can also immediately analyze this transactional data to identify trends and make informed decisions, such as adjusting inventory levels or targeting promotions to specific customer segments. HTAP systems allow organizations to analyze data in real-time without needing to load it into a separate data warehouse or mart. This can be especially useful in industries where data is constantly changing, and immediate analysis is necessary, such as financial services or internet of things (IoT) applications.

There are several cloud technologies that use HTAP to provide fast and reliable access to data, as well as the ability to perform complex queries and data analysis. These include:

Snowflake Unistore: Snowflake’s Unistore is designed to be easy to use, scalable, and cost-effective, and it is well-suited for organizations that want to enable non-technical users to extract, load, and transform data in a self-service manner. It can support a wide range of data sources and formats. It provides a SQL-based query language that is familiar to many users, making it easy for non-technical users to extract, load, and transform data without the need for specialized technical skills.
Azure Synapse Link: Azure Synapse Link is a real-time data integration feature that enables organizations to perform both transactional and analytical processing on the same data in real-time using a single system or platform. Synapse link is supported by several Microsoft and Azure databases, including Azure SQL, Cosmos, and SQL Server 2022.
GCP AlloyDB: GCP recently introduced AlloyDB for PostgreSQL, which was more than four times faster for transactional workloads and up to 100 times faster for analytical queries. This is enabled by a vectorized columnar accelerator that stores data in memory in an optimized columnar format for faster scans and aggregations.

External Tables

External tables are a feature that allows users to query data stored in external systems as if it were a table in their database. This can be useful for simplifying ELT processes by eliminating the need for data movement. An example of how external tables can simplify ELT is as follows: suppose an organization has a database that stores customer information and a separate database that stores sales data. The organization wants to analyze the sales data for a specific customer segment. Without external tables, they would need to extract the sales data from the separate database, load it into the customer database, and then transform it for analysis. With external tables, they can simply query the sales data directly from the other database without the need for data movement. This saves time and effort in the ELT process.

External tables have been around for several years and are supported by many modern databases, such as Oracle, MySQL, and PostgreSQL. In recent years, the trend towards hybrid cloud and multi-cloud architectures has increased the use of external tables, as organizations increasingly need to access data across different systems and platforms. Several technologies use external tables to simplify ELT operations and enable organizations to access data stored in external sources using SQL-based queries without importing the data into the platform. These include Snowflake, Google BigQuery, Amazon Redshift, and Azure Synapse Analytics (formerly SQL Data Warehouse).

Overall, external tables are a powerful tool for simplifying ELT operations and are widely used in various data management technologies. They can help organizations extract, load, and transform data more efficiently and effectively, enabling them to derive more value from their data assets and drive insights and decision-making throughout the organization.

Zero Copy Cloning

Zero copy cloning is a feature that allows users to quickly create copies of data without physically copying the data itself. This can be useful for simplifying ELT processes in modern cloud platforms. It allows users to create copies of large data sets for testing or development purposes without transferring and storing the data.

Here is an example of how zero copy cloning can simplify ELT: suppose an organization is building an ELT pipeline that involves processing large amounts of data from a source system. They want to test the pipeline with a smaller sample of the data before running it on the full dataset. Without zero copy cloning, they would need to physically copy the sample data from the source system and store it in the target system. This can be time-consuming and requires additional storage capacity. With zero copy cloning, they can simply create a pointer to the data in the source system and use it as if it were a copy. This allows them to test the pipeline without transferring and storing the data.

Zero copy cloning can simplify ELT operations by enabling organizations to create copies of datasets quickly and efficiently without transferring large amounts of data. It can also help reduce storage costs, eliminating the need to store multiple copies of the same data. There are several technologies, such as Snowflake, Google BigQuery, Amazon Aurora, and Databricks, that use zero copy cloning to simplify ELT operations to create copies of data sets quickly and efficiently. They enable organizations to create virtual copies of datasets that refer to the original dataset rather than creating separate copies of the data.

Zero-copy cloning is a powerful tool for simplifying ELT operations, and it is widely used in a variety of data management technologies. It can help organizations to create copies of data sets quickly and efficiently, enabling them to derive more value from their data assets and drive insights and decision-making throughout the organization.

HTAP vs. External Tables vs. Zero Copy Cloning

HTAP, external tables, and zero copy cloning are different technologies or approaches that enable organizations to manage and access data differently. Here is a summary of when these technologies might be used:

HTAP: HTAP might be used when an organization needs to perform both transactional and analytical processing on the same data in real-time. HTAP can be useful in situations where an organization needs to gain insights from its data and make decisions in real-time or where it needs to perform analytics on transactional data stored in a database.
External tables: External tables might be used in situations where an organization needs to access data stored in external sources as if it were stored in the database itself. External tables can be useful when an organization needs to access data from external sources but does not want to physically import or copy the data into the database.
Zero copy cloning: Zero copy cloning might be used when an organization needs to create copies of data or data structures for testing, development, or other purposes but does not want to physically duplicate the data. Zero copy cloning can be useful in situations where the data is large or complex, and physically copying the data would take a long time or require a lot of storage space.

Overall, HTAP, external tables, and zero copy cloning are all different technologies or approaches that enable organizations to manage and access data in different ways. The appropriate technology will depend on the specific needs and goals of the organization.

Implement Self Service Analytics

In addition to the major cloud Unified Analytics Platforms such as Synapse Analytics Workspaces, Databricks, and Snowflake, there are even more analytics-focused tools that are well suited for self-service analytics and BI, which allow users to analyze and visualize data without the need for specialized technical expertise or support from IT or data teams. This can simplify accessing and working with data by eliminating the need for coding expertise and providing a more intuitive way to access and work with data. The proliferation of self-service data and analytics tools allows users to access and analyze data without needing specialized technical expertise or support from IT or data teams. This can simplify the process of accessing and working with data by eliminating the need for coding expertise and providing a more intuitive way to access and work with data.

There are many examples of self-service analytics and BI tools available in the cloud, including:

Tableau: Tableau is a popular self-service analytics and BI tool available in the cloud. It enables users to explore and analyze data using drag-and-drop interfaces, create interactive dashboards and reports, and share insights with others.
Power BI: Power BI is a cloud-based self-service analytics and BI tool from Microsoft that enables users to explore and analyze data, create reports and dashboards, and share insights with others. It also includes machine learning capabilities through its AutoML feature. AutoML allows users to train machine learning models on their data without the need for coding expertise, making it easier to gain insights from data.
Looker: Looker is a cloud-based self-service analytics and BI tool that enables users to explore and analyze data, create reports and dashboards, and share insights with others.
Qlik: Qlik is a cloud-based self-service analytics and BI tool that enables users to explore and analyze data, create reports and dashboards, and share insights with others.
Data Build Tool (dbt): dbt is an open-source command-line tool that helps organizations build, test, and maintain data models for analytics and BI purposes. It is designed to enable data analysts and BI professionals to build and manage data models in a consistent, repeatable, and scalable manner, helping to improve the quality and reliability of the data assets used for analytics and BI purposes. dbt enables self-service analytics and BI by providing tools and practices that will allow data analysts and BI professionals to build and maintain data models that are accessible and understandable to non-technical users. It helps to standardize and automate the process of building and maintaining data models, making it easier for non-technical users to understand and use the data assets that are available to them. dbt is particularly well-suited for use in self-service analytics and BI environments, as it provides a set of tools and practices that enable organizations to build and maintain data models in a scalable, repeatable, and consistent way. It helps to ensure that data assets are reliable and of high quality, enabling non-technical users to trust and use the data assets that are available to them.

Overall, self-service analytics and BI tools are essential for organizations looking to derive more value from their data assets and drive insights and decision-making throughout the organization. They can help to democratize data access and enable non-technical users to explore and analyze data on their own without the need for specialized technical skills. By adopting cloud-based data and analytics platforms and leveraging modern cloud data management approaches, organizations can effectively democratize and decentralize their data management and access, enabling more agile and efficient self-service analytics and ELT in the cloud.

Next Steps

Read more about Data Democratization | The Definitive Guide | Towards Data Science
Read more about Data Democratization - 5 Best Practices - Astera
Read more about Self-Service tools with ETL Capabilities: A quick comparison - Visual BI Solutions

About the author

Ron L'Esteve is a trusted information technology thought leader and professional Author residing in Illinois. He brings over 20 years of IT experience and is well-known for his impactful books and article publications on Data & AI Architecture, Engineering, and Cloud Leadership. Ron completed his Master’s in Business Administration and Finance from Loyola University in Chicago. Ron brings deep tec

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips

Article Last Updated: 2023-03-02

Democratizing Self Service Cloud Analytics and ELT Supporting Modern Cloud Data Management Approaches