Data Engineering Books Worth Having on Your Shelf (or Your Tablet)

Good documentation gets you started. Good books get you deep. After years of working with cloud data platforms, SQL engines, and machine learning pipelines, a handful of titles keep coming up as genuinely useful — not just for beginners, but for practitioners who want to go beyond the basics. Here are five books that cover the tools this blog is built around.

Disclosure: the links below are Amazon affiliate links. If you purchase through them, this site earns a small commission at no extra cost to you.

Generative AI on AWS

Chris Fregly, Antje Barth & Shelbee Eigenbrode — O'Reilly

Covers how to build context-aware, multimodal reasoning applications on AWS using generative AI. Written by AWS practitioners, it bridges the gap between the hype and the actual engineering work — prompt engineering, RAG pipelines, fine-tuning, and deploying models at scale. Useful for data engineers who are being asked to build AI-adjacent infrastructure and want a grounded, AWS-native perspective.

Amazon Redshift: The Definitive Guide

Rajesh Francis, Rajiv Gupta & Milind Oke — O'Reilly

The most thorough single reference for Redshift available in print. Goes well beyond the documentation: cluster architecture, distribution styles, sort keys, workload management, Redshift Spectrum, and data sharing. Whether you are running a provisioned cluster or Redshift Serverless, this book will save you hours of trial and error on performance and cost optimization.

Data Science on AWS

Chris Fregly & Antje Barth — O'Reilly

A hands-on guide to building end-to-end, continuous AI and ML pipelines on AWS. Covers SageMaker, data ingestion, feature engineering, model training, deployment, and monitoring — with real code throughout. The focus on continuous pipelines rather than one-off notebooks makes this especially relevant for teams trying to productionize their ML work.

Just Use Postgres!

Denis Magda — Manning

The title says it plainly. PostgreSQL is a remarkably capable database that teams routinely underutilize before reaching for more complex infrastructure. This book is written for application developers and database practitioners who want to squeeze more out of Postgres — from indexing strategies and transactions to JSON, full-text search, and advanced concurrency patterns. A practical counter-argument to unnecessary architectural complexity.

Practical SQL, 2nd Edition

Anthony DeBarros — No Starch Press

SQL books tend to be either too shallow or too academic. This one hits the right balance. Written by a journalist and data analyst, it teaches SQL through real datasets and actual analysis problems using PostgreSQL. The focus on finding the story inside data, rather than just memorizing syntax, makes the concepts stick. Most of what it teaches carries over to MySQL, SQL Server, Redshift, and other relational engines, though dialect details differ.

A note on buying technical books

Technical books go out of date, but the fundamentals they cover rarely do.

A well-chosen book on Redshift architecture or SQL query design will still be useful three or more years after purchase — the core concepts around distribution, joins, and execution plans do not change with every minor release.