Disclaimer: Apress gave me a free copy of the book to review. People who know me also know I always give honest feedback.
I had the pleasure of reading the book Mastering Snowflake Solutions: Supporting Analytics and Data Sharing. The book is aimed at data professionals who have already taken their first steps developing data warehouses (or other data solutions) with Snowflake, but are looking to expand their knowledge of the product. The book is well written, and because it’s only about 230 pages, it’s easy to digest. Which brings us to my main “problem” with the book: it’s too short. If I see a book title with “mastering” in it, I expect a deep dive into the product. Something along the lines of the great SQL Server 2012 Internals, which was almost 1,000 pages long. Don’t get me wrong, the book is great, gives a good overview of most features and has good SQL code examples. I’m just not sure I can call myself a Snowflake master now that I’ve read the book. The book stays too high level for that.
Anyway, here’s an overview of the book:
- Chapter 1 gives an intro to the various components of Snowflake. It’s brief, but it does the trick. Sometimes it feels like the author has drunk a bit too much of the “Snowflake Kool-Aid” and makes it seem as if on-premises relational databases – like SQL Server – are all legacy and stupid (I understand, Snowflake is awesome :). For example, it says how Snowflake manages statistics automatically (so does SQL Server) and how Snowflake does partition pruning automatically (so does SQL Server, although granted, you have to set up partitioning in SQL Server while Snowflake does this for you).
- Chapter 2 talks about data movement, in other words how to get data in and out of Snowflake. Pretty decent chapter, although there’s not much detail on Snowpipe (the built-in streaming feature of Snowflake) and it’s excluded from the final example (most chapters have a final example where the author ties all the features discussed in the chapter together, which is pretty neat).
- Chapter 3 is all about cloning. It’s an awesome feature, and this chapter does a good job explaining it. I appreciated that there’s a script included to check the permissions of the cloned objects, something I hadn’t really given much thought before.
- Chapter 4 dives into security and access control. It’s a bit high level, but if you want more detail, Apress has a whole book on Snowflake security (I’m reading that book as well, there will be a review of it someday in the near future. I hope). There are some good examples on how to handle PII data (you know, for GDPR) and row access policies. I wish the section on roles was a bit more elaborate, because I found this part a bit lacking in Snowflake, especially if you see how easy it is to configure security in SQL Server Management Studio.
- Chapter 5 talks about time travel (another great feature) and data encryption. It’s about 13 pages long, which is … short.
- Chapter 6 is all about business continuity and disaster recovery, which are important topics of course. However, the features discussed (replication) are only available in certain editions (business-critical edition if I’m not mistaken) and I don’t have that edition, so I basically skimmed this chapter.
- Chapter 7 is another interesting but brief chapter. For example, data clean rooms are introduced, but not really elaborated on. At the end there’s a great example, but it’s like “boom here’s a giant SQL script”. Personally, I would have split this into multiple pieces in the book and given more explanation on each of the steps. Or at least added some comments to it.
- Chapter 9 is the chapter I was most looking forward to: performance tuning. It’s the biggest chapter in the book, but again some info was too high level. For example, I would have liked more info on the results of the clustering information function or a more detailed walkthrough of the query plans. Materialized views are discussed as a performance tuning vehicle, but there are no before/after examples on how they improved the performance of a specific function. The book explains that it’s possible to create multiple materialized views on the same table, each with different clustering keys. At runtime, the optimizer chooses the best view to satisfy the query. This sounds like an amazing feature, but again, no example. The search optimization feature is explained – another cool feature which improves performance for highly selective queries – but there’s no info on what the costs are for this feature (if there are any). There’s a very informative section on warehouse utilization with example charts and I really liked that bit. It’s very important to pay attention to this if you want to optimize/reduce costs in Snowflake. Some parts that were missing in my opinion:
  - how data modelling affects performance. For example, a star schema vs a data vault.
  - how you can optimize queries by splitting them into multiple pieces and storing intermediate results in transient tables.
- Chapter 10 is the final chapter and deals with developing applications in Snowflake. It talks about SnowSQL (the CLI for Snowflake), about Java UDFs (similar to how SQL Server enables R, Python and Java directly in the database engine), about Snowpark (very, very briefly) and about connectors such as Python and Kafka (again, very, very briefly).
I may sound harsh because I keep hammering on how brief the book is. However, this doesn’t make the book bad. On the contrary, I really enjoyed it and it’s a very good introduction to many features of Snowflake. But that’s what it is in my opinion: a good introduction. Not really “mastering”. If you’ve worked with Snowflake for some time, you’ll probably know most of the topics covered in this book. If you’re just starting out, with a couple of weeks/months under your belt, I would definitely recommend this book.
The post Book Review – Mastering Snowflake Solutions: Supporting Analytics and Data Sharing first appeared on Under the kover of business intelligence.