The Evolution of Data: From Databases to Spark to Lakebases

The circle cylinder of life

Maybe you’ve noticed all the twenty-somethings tight-rolling their jeans, or people in bell-bottoms, or the ’80s music playing in grocery stores… It’s true: fashion and art are cyclical. I’m certain we’ll be seeing MC Hammer pants as a new trend around 2030.

Technology, much like fashion, often operates in an ever-moving circle. Trends emerge, fade, and then, with a fresh coat of paint and some innovative twists, reappear as the next big thing. In the world of data and Business Intelligence (BI), we’re witnessing a fascinating full-circle moment.

For years, the prevailing narrative was that traditional relational databases were ill-suited for the scale and complexity of modern BI solutions. The marketing was something like: “Databases don’t belong in BI; use Spark!” We embraced distributed computing frameworks, data lakes, and complex ETL pipelines to move data from operational databases into analytical engines. The idea was to separate transactional workloads from analytical ones to ensure performance and scalability. Spark, with its ability to handle massive datasets and flexible processing, became the darling of the data world.

Spark revolutionized how we processed big data, enabling sophisticated analytics and machine learning on previously unwieldy datasets. But the cost was often architectural complexity, increased data movement, and a fragmented view of an organization’s data.

Fast forward to today, and the pendulum is swinging back. Only this time, it’s not a mere return; it’s an evolution. The latest innovations from Databricks, Snowflake, and Microsoft are demonstrating that databases, particularly relational ones, are not just relevant but are becoming innovative and cool again for BI solutions. They’re doing this by integrating transactional capabilities directly into the analytical data layer, offering a unified platform for both operational and analytical workloads. This new breed of “lakebases” (or “unistores”) aims to simplify data architectures, reduce latency, and provide a single source of truth for both real-time operations and long-term analysis.

The “old is new again” isn’t about discarding Spark or data lakes; it’s about integrating the best of both worlds. It’s about bringing the familiarity and transactional integrity of databases to the scalability and flexibility of lakehouse architectures.

Let’s explore these exciting new developments:

Databricks Lakebase, Snowflake Postgres, and Azure Fabric SQL Database

These three offerings are leading the charge in this new era of converged data platforms. While each has its unique flavor and strategic positioning, they all share a common goal: simplifying the data landscape for organizations.

| Feature / Platform | Databricks Lakebase (Public Preview) | Snowflake Postgres (Crunchy Data) | Azure Fabric SQL Database (Preview) |
| --- | --- | --- | --- |
| Core Concept | Fully managed PostgreSQL database for OLTP directly within the Databricks Data Intelligence Platform. | Fully managed, enterprise-grade PostgreSQL service natively integrated into the Snowflake AI Data Cloud. | Developer-friendly transactional database within Microsoft Fabric, based on the Azure SQL DB engine. |
| Primary Workload | OLTP, AI applications & agents, real-time feature serving. | OLTP, AI agents, transactional applications, vector-driven systems. | Operational workloads, application development, real-time reporting. |
| Underlying Technology | PostgreSQL (leveraging Neon technology for compute/storage separation). | PostgreSQL (leveraging Crunchy Data’s open-source expertise). | Azure SQL Database engine. |
| Data Lake/Lakehouse Integration | Deep integration with Delta Lake & Unity Catalog; managed synchronization between Lakebase and Delta tables. | Deeply integrated with the Snowflake AI Data Cloud; unifies OLTP, OLAP, and AI workloads on a single governed platform. | Automatic, near-real-time replication of data to OneLake (Fabric’s unified data lake) in Parquet/Delta format. |
| Compute/Storage Separation | Yes; allows independent scaling and auto-scaling to zero for cost efficiency. | Designed for independent scaling; aims for elastic Postgres clusters. | Yes; serverless compute model automatically scales, integrated with Fabric Capacity Units. |
| Developer Experience | Familiar Postgres, database branching (zero-copy clones), integration with Databricks Apps, online feature store. | Familiar Postgres interfaces/tools, full compatibility with existing Postgres apps, centralized governance. | Familiar T-SQL, SSMS/VS Code support, Copilot assistance, GraphQL API creation, Git integration. |
| Governance & Security | Unity Catalog integration for robust governance, security, and access controls. | Enterprise-grade encryption, key management, centralized governance, compliance (SOC 2, ISO 27001, HIPAA, FedRAMP). | Microsoft Entra authentication, granular access control, Microsoft Purview sensitivity labels. |
| Key Differentiator | Brings operational Postgres directly into the Databricks lakehouse, optimized for AI. | Brings enterprise-grade Postgres capabilities natively into the Snowflake Data Cloud, leveraging Crunchy Data’s operational expertise. | Provides a transactional SQL database automatically mirrored to OneLake, making data instantly available across all Fabric experiences for real-time analytics. |
| Status | Public Preview (announced June 2025) | Intent to acquire Crunchy Data (announced June 2025) | Public Preview (announced November 2024) |
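
Because the Databricks and Snowflake offerings speak plain Postgres, existing drivers and tooling work unchanged. Here’s a minimal sketch of that idea using the standard psycopg2 driver; the hostname, credentials, and `orders` table are hypothetical stand-ins rather than any product’s actual connection details:

```python
# Minimal sketch: an app talking to a lakebase Postgres endpoint with an
# ordinary Postgres driver. Endpoint, credentials, and schema are invented
# for illustration.
import psycopg2

conn = psycopg2.connect(
    host="my-lakebase.example.com",  # hypothetical endpoint
    port=5432,
    dbname="appdb",
    user="app_user",
    password="***",
    sslmode="require",
)

# The connection context manager wraps a transaction and commits on success.
with conn, conn.cursor() as cur:
    # OLTP-style write: record an order the moment it happens.
    cur.execute(
        "INSERT INTO orders (customer_id, amount) VALUES (%s, %s)",
        (42, 19.99),
    )
    # A quick operational read over the same connection.
    cur.execute("SELECT count(*) FROM orders WHERE customer_id = %s", (42,))
    print(cur.fetchone()[0])

conn.close()
```

The same code runs against any vanilla Postgres; the lakebase twist is that those rows can also surface on the analytical side without a hand-built pipeline.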

Why This Matters: The Return to Simplicity and Velocity

This “old is new again” phenomenon isn’t just a marketing ploy; it’s a response to real-world challenges faced by organizations building modern data-driven applications:

  1. Complexity Reduction: Gone are the days of needing separate teams managing distinct operational databases and analytical data warehouses, along with the complex ETL/ELT pipelines between them. These new offerings promise a more unified, simplified architecture.
  2. Real-time Insights: By bringing transactional data directly into the analytical platform, the latency between operational events and actionable insights is drastically reduced. This is crucial for real-time BI, personalized experiences, and AI applications (see the sketch after this list).
  3. Data Consistency and Governance: A unified platform means a single source of truth and consistent governance across all data, reducing data discrepancies and compliance headaches.
  4. AI Acceleration: With transactional data directly accessible within the analytical and AI platform, building and deploying intelligent applications (e.g., real-time fraud detection, personalized recommendations, AI agents) becomes significantly faster and more efficient.
  5. Developer Productivity: Developers can leverage familiar relational database concepts and tools while benefiting from the scalability and modern features of cloud-native platforms.
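
To make point 2 concrete, here’s a minimal sketch of the unified-query idea in PySpark, assuming a lakehouse where the operational table is already synchronized in as a Delta table (as Lakebase’s managed sync and Fabric’s OneLake mirroring each do in their own way). The catalog and table names are invented, and both tables are assumed to share a schema:

```python
# Minimal sketch: combine near-real-time operational rows with long-term
# history in one query, with no ETL pipeline in between. Table names are
# hypothetical; both are assumed to have customer_id and amount columns.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("realtime-bi").getOrCreate()

fresh = spark.table("main.app.orders_synced")        # synced copy of the OLTP table
history = spark.table("main.analytics.orders_hist")  # long-term lakehouse history

report = (
    fresh.unionByName(history)
         .groupBy("customer_id")
         .agg(F.sum("amount").alias("lifetime_amount"))
)
report.show()
```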

The journey of data technology is cyclical, but each revolution builds upon the last. We’re not abandoning the lessons learned from the big data era; rather, we’re integrating them with the enduring strengths of relational databases. The result is a more elegant, efficient, and powerful approach to BI and AI, proving once again that sometimes the most innovative solutions are the ones that cleverly re-imagine what we thought we already knew.
