External Article

Why Would I Ever Need to Partition My Big ‘Raw’ Data?

Whether you are running an RDBMS or a Big Data system, it is important to consider your data-partitioning strategy. As the volume of data grows, it becomes increasingly important to match the way you partition your data to the way it is queried, so that the engine can apply 'pruning' optimisation and skip the partitions a query does not need. When you have huge imports of data to consider, it can get complicated. Bartosz explains how to get things right: not perfectly, but wisely.
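
To make 'pruning' concrete, here is a minimal sketch in plain Python rather than in any particular engine. It assumes a hypothetical event table partitioned by day. The record layout and the names partition_by_day, total_for_range and total_for_user are illustrative only and do not come from Bartosz's article. A query that filters on the partition key opens only the matching partitions. A query that filters on another column has to scan every partition, which is why the partitioning scheme should follow the query pattern.

```python
from collections import defaultdict
from datetime import date

# Hypothetical records: (event_date, user_id, amount); event_date is the partition key.
EVENTS = [
    (date(2024, 1, 1), "u1", 10.0),
    (date(2024, 1, 1), "u2", 5.0),
    (date(2024, 1, 2), "u1", 7.5),
    (date(2024, 1, 3), "u3", 2.0),
]


def partition_by_day(rows):
    """Group rows into partitions keyed by event_date."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[0]].append(row)
    return dict(partitions)


def total_for_range(partitions, start, end):
    """Sum amounts for start <= event_date <= end.

    The filter is on the partition key, so partitions outside the range
    are pruned: their rows are never read. Returns (total, partitions_scanned).
    """
    total, scanned = 0.0, 0
    for key, rows in partitions.items():
        if not (start <= key <= end):
            continue  # pruned
        scanned += 1
        total += sum(amount for _, _, amount in rows)
    return total, scanned


def total_for_user(partitions, user_id):
    """Sum amounts for one user.

    user_id is not the partition key, so nothing can be pruned and
    every partition must be scanned. Returns (total, partitions_scanned).
    """
    total, scanned = 0.0, 0
    for rows in partitions.values():
        scanned += 1
        total += sum(amount for _, uid, amount in rows if uid == user_id)
    return total, scanned


parts = partition_by_day(EVENTS)
print(total_for_range(parts, date(2024, 1, 2), date(2024, 1, 3)))  # (9.5, 2)
print(total_for_user(parts, "u1"))  # (17.5, 3)
```

With three daily partitions, the date-range query touches two of them. The per-user query has to touch all three. Real engines apply the same idea at the level of files or directories rather than Python lists.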

External Article

How to Start Big Data with Apache Spark

It is worth getting familiar with Apache Spark because it is a fast, general engine for large-scale data processing, and you can use your existing SQL skills to start analysing semi-structured data of a type and volume that would be awkward for a relational database. With a notebook-based environment such as Databricks, you can very quickly get hands-on experience with an interesting technology.

Blogs

AI: Blog a Day – Day 6: Embeddings – How AI Understands

By

Continuing from Day 5 where we covered notebooks, HuggingFace and fine tuning AI now...

The Book of Redgate: Mistakes

By

This is kind of a funny page to look at. The next page has...

ADF Pipeline Debugging Fails with BadRequest – The Sequel

By

A while ago I blogged about a use case where a pipeline fails during...


Forums

Dynamic Unpivot

By pietlinden

I have a table I didn't design that has tons of repeating groups in...

Writing as an Art and a Job

By Steve Jones - SSC Editor

Comments posted to this topic are about the item Writing as an Art and...

String Similarity II

By Steve Jones - SSC Editor

Comments posted to this topic are about the item String Similarity II


Question of the Day

String Similarity II

What is the range for the result from the EDIT_DISTANCE_SIMILARITY() function in SQL Server 2025?
