Comments posted to this topic are about the item Stairway to U-SQL Level 2: Breakdown of a U-SQL Statement
Thanks Mike. I've been following the series so far and it's very enlightening! There's just one bit I'm struggling with:
"Structured data is data loaded into tables in a U-SQL database, whilst unstructured data is what we’re dealing with in this example – files, no matter what the format."
The files we've been using are, to all intents and purposes, extracts from database tables. Each one has a certain number of columns, and each column has the same type of data in every row. Where is the line between structured and unstructured drawn?
Glad you are enjoying the series, and thanks for your question.
You are correct: in the everyday sense, all of your data is structured, regardless of where it comes from or the format it is in. A file contains columns and rows, for example.
It's all about how the data is stored in Azure, and how Microsoft apply terms to your data. Anything stored in a database table, for instance, is defined as structured. Anything in an unmanaged file which we've dumped into the data lake is deemed to be unstructured.
So, the line is drawn as:
- If your data is in a file, Microsoft's terminology defines your data as unstructured (because to Azure, it's just a file)
- If your data is in a table, Azure knows about the structure of the data (columns, data types etc), so Microsoft define that as structured
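To make the distinction concrete, here is a minimal sketch of the two cases. The file path, database, table and column names are all made up for illustration; they are not from the article. With a file, the script has to state the columns and types itself in the EXTRACT statement. With a table, the definition already lives in the U-SQL catalog, so a plain SELECT is enough.

```usql
// Unstructured (file): Azure only sees a file, so the schema is declared on read.
// "/input/postcodes.csv" and the column names are hypothetical.
@fromFile =
    EXTRACT Postcode string,
            Town string
    FROM "/input/postcodes.csv"
    USING Extractors.Csv();

// Structured (table): the columns and data types are already known to the catalog.
// MyDb.dbo.Postcodes is a hypothetical table that would have to exist already.
@fromTable =
    SELECT Postcode,
           Town
    FROM MyDb.dbo.Postcodes;

OUTPUT @fromFile
TO "/output/from_file.csv"
USING Outputters.Csv();

OUTPUT @fromTable
TO "/output/from_table.csv"
USING Outputters.Csv();
```

The data in both rowsets could be identical; the only difference is where the column definitions come from.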
Hope this is clear. Come back to me if you have further questions.
Thanks Mike - that was quick!
I get it now... I assume we'll be looking at tables later, so on with the Stairway!
One other small C# vs. T-SQL difference in your example is the zero-based Substring. In C#, the first character is at position zero, not position one as it is with T-SQL's SUBSTRING.
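A quick sketch of what that looks like in practice, using an inline rowset with made-up values rather than the article's data (the column name and the postcodes are assumptions for illustration only):

```usql
// Hypothetical sample rows, built inline so the script needs no input file.
@postcodes =
    SELECT *
    FROM (VALUES ("AB10 1AA"), ("CW1 2AB")) AS T(Postcode);

// C# Substring(startIndex, length): startIndex 0 is the first character,
// so this returns the first two characters of each value.
// The T-SQL equivalent would be SUBSTRING(Postcode, 1, 2).
@firstTwo =
    SELECT Postcode,
           Postcode.Substring(0, 2) AS FirstTwo
    FROM @postcodes;

OUTPUT @firstTwo
TO "/output/first_two.csv"
USING Outputters.Csv();
```

Carrying the T-SQL habit over and writing Substring(1, 2) would silently skip the first character rather than raise an error, so it is an easy one to miss.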