Stairway to U-SQL

Stairway to U-SQL Level 10: Table-Valued Functions and UDTs

,

Welcome to the next step in our U-SQL journey. This time around, we’re going to take a look at how Table-Valued Functions (TVFs) and User Defined Types (UDTs) are implemented in U-SQL. You’re probably familiar with these terms from T-SQL, but try to forget all that as U-SQL implements these features in a different manner. Before we start – an important message!

Code Download

The code for this series is now available to download via GitHub. I always think typing code into the computer is the best way to learn and understand it, but if you’re too busy and just want to pull the code down to have a look at it, now you can. I’ll continue to update the code repository as the series progresses.

An Overview of Table-Valued Functions

TVFs can be a divisive topic in SQL Server. Some people love them, other people avoid them due to performance considerations. Of course, whether they are the best fit for a particular task comes down to the age-old statement – “it depends”. This question doesn’t really apply in U-SQL, as a TVF is nothing more than a parameterised view. In fact, the U-SQL team generally recommend TVFs should be used instead of views, unless your case is very simple.

I like views in U-SQL, and if you want to return a data set which you can filter in an ad-hoc manner then a view is the ideal solution. If you want to control the filters or perform additional logic on a data set before returning it, a TVF is probably what you need. Some of the advantages TVFs have over views are:

  • One or more parameters can be passed to a TVF – views cannot accept parameters directly
  • TVFs can utilise multiple statements – a view can only consist of a SELECT or EXTRACT statement
  • TVFs can make user of user-defined code (e.g. C# functions)
  • TVFS can make use of user-defined types

Let’s take a look at how we can create TVFs.

The CREATE FUNCTION STATEMENT

During our U-SQL journey, we’ve learnt that most statements are very similar in nature to their T-SQL equivalents, with a bit of C# thrown in. So I’m sure it comes as no surprise to learn the CREATE FUNCTION statement follows a similar path! Here it is:

CREATE FUNCTION IF NOT EXISTS FunctionName (arguments)
RETURNS TABLE DEFINITION or TABLE TYPE
AS
BEGIN
Statements…
END;

This is pretty simple, and we’ll create a real function in a moment which will demonstrate the various concepts. We’ll take a quick look at dropping functions before we do that though.

The DROP FUNCTION Statement

The DROP FUNCTION statement is blindingly simple:

DROP FUNCTION IF EXISTS FunctionName;

That’s it – run this and you can eliminate your function. This is no different to other U-SQL DROP statements.

Creating a TVF

In the previous article, we created a view that returned the following columns:

  • Postcode
  • Postcode Part 1
  • Postcode Part 2
  • County Name
  • District Name

The postcode GL7 1JX would return something along these lines:

  • Postcode: GL7 1JX
  • Part 1: GL7
  • Part 2: 1JX
  • County Name: Gloucestershire
  • District Name: Cotswold

We’ll recreate the function as a TVF. If you’ve been following along with the series, open up the Visual Studio project. If you haven’t (where have you been?!), download the code from GitHub and then open it up in VS. Once you have the project sitting in front of you, either add a new script called 400 Create PostcodeDetails TVF.usql, or open the file of the same name (if you do use the GitHub project, the code won’t look the same at the moment as this isn’t the finished version of the function!). Here’s the code for that script:

USE DATABASE UkPostcodes;
USE SCHEMA Postcodes;
DROP FUNCTION IF EXISTS PostcodeDetailsTvf;
CREATE FUNCTION IF NOT EXISTS PostcodeDetailsTvf ()
RETURNS @result TABLE(Postcode string, Part1 string, 
Part2 string, CountyName string, DistrictName string)
AS
BEGIN
@records = SELECT p.Postcode, 
p.Postcode.IndexOf(" ") == -1 ? p.Postcode : 
  p.Postcode.Substring(0, p.Postcode.IndexOf(" ")) AS Part1,
p.Postcode.IndexOf(" ") == -1 ? p.Postcode : 
  p.Postcode.Substring(p.Postcode.IndexOf(" ") + 1) AS Part2, 
c.CountyName, 
d.DistrictName
FROM Postcodes AS p
LEFT JOIN Counties AS c ON p.CountyCode == c.CountyCode
LEFT JOIN Districts AS d ON p.DistrictCode == d.DistrictCode;
@result = SELECT Postcode, Part1, Part2, 
CountyName, DistrictName FROM @records;
END;

We’ve seen the first few lines before. The CREATE FUNCTION line is where things start to become interesting:

CREATE FUNCTION IF NOT EXISTS PostcodeDetailsTvf ()
RETURNS @result TABLE(Postcode string, Part1 string, 
Part2 string, CountyName string, DistrictName string)

We have created a function called PostcodeDetailsTvf. It returns a table with the columns we talked about earlier, which is referenced by the @result variable. The first SELECT statement is the same SELECT statement we used in the view, except it is assigned to a row set variable called @records (if you want to know what this SELECT is doing, return to article 9 on views). The second SELECT statement assigns the rows returned by the first SELECT statement to the @result TABLE return variable.

Run this by clicking the Submit button on the toolbar (it’s up to you whether you run this locally or directly against your Azure account):

Once you’ve done this, well done! You’ve just created your first U-SQL TVF.

Using the TVF

OK, we have a TVF and we’re not afraid to use it! Either open or add the script 410 Select From PostcodeDetailsTvf.usql, ensuring it has this code embedded within it:

USE DATABASE UkPostcodes;
USE SCHEMA Postcodes;
@pcodes = SELECT Postcode, Part1, Part2, CountyName, DistrictName 
FROM PostcodeDetailsTvf() AS postcodes;
OUTPUT @pcodes
TO "outputs/postcodestvf.csv"
USING Outputters.Csv();

We’ve seen most of this before, the only difference is the SELECT line, which is using the PostcodeDetailsTvf() function. Note it has been aliased as “AS postcodes”. A TVF is deemed to be an anonymous rowset, so it must have an alias specified for it. If you do not specify an alias, you’ll see this error:

Don’t forget, the AS keyword is mandatory in U-SQL (it’s optional in T-SQL).

Run this script to generate the postcodestvf.csv file. Once executed, the Local Run Results window will appear. Right-click on the file name and open it (you can either preview it or open the file’s location, which will then allow you to open it in Notepad (other text editors are available).

Have a scroll through the file and you’ll find it’s packed with postcode goodness.

OK, this is cool. We’ve created a TVF. But what’s the big deal, I hear you cry? This function is no different to the view we created last time out! Time to look at how we can change things to take advantage of TVF-specific capabilities, starting with arguments.

Adding Parameters to a TVF

As I mentioned earlier, one of the compelling features of TVFs is their ability to support parameters. Parameters can be mandatory or optional. We’ll change the function to accept a mandatory @CountyName parameter. In script 400, Change the CREATE FUNCTION line so it looks like this:

CREATE FUNCTION IF NOT EXISTS PostcodeDetailsTvf(@CountyName string)

We’re now telling users that they can only use this function if they specify a county name. We need to change the first SELECT statement to use this parameter, so remove the semi-colon from the last LEFT JOIN line and then add the WHERE clause. Here are those two lines:

LEFT JOIN Districts AS d ON p.DistrictCode == d.DistrictCode
WHERE c.CountyName == @CountyName;

Save and run this script, then return to script 410 and try to execute it. You’ll receive an error:

Ah-ha, we didn’t specify the parameter! Change the SELECT statement line to use it:

@pcodes = SELECT Postcode, Part1, Part2, CountyName, DistrictName 
FROM PostcodeDetailsTvf("Northamptonshire") AS postcodes;

Run this again, and this time results will be returned – just for the fine county of Northamptonshire.

But what if we want to support returning all records as well as records for a specific county?

Default Parameter Values

I did say parameters can be mandatory or optional. We’ll now turn our mandatory parameter into an optional parameter, which will allow us to return all records or just the records for a specific county. Go back to script 400 and change the CREATE FUNCTION line, so it makes @CountyName optional:

CREATE FUNCTION IF NOT EXISTS PostcodeDetailsTvf(@CountyName string = "NotSpecified")

We just assign a default value to the parameter. Next, change the WHERE clause so it can handle the default value:

WHERE c.CountyName == @CountyName OR @CountyName == "NotSpecified";

Run this to update the TVF. Go back to script 410 and execute it again – Northamptonshire’s postcodes will still be returned. But if you replace the SELECT line so DEFAULT is specified, rather than Northamptonshire:

@pcodes = SELECT Postcode, Part1, Part2, CountyName, DistrictName 
FROM PostcodeDetailsTvf(DEFAULT) AS postcodes;

All of the postcodes will be returned. Pretty nifty!

Multiple Parameters

You can specify as many parameters as you wish. Here, I’ve changed the function so it can accept an optional @CountyName parameter, and a mandatory @DistrictName parameter.

USE DATABASE UkPostcodes;
USE SCHEMA Postcodes;
DROP FUNCTION IF EXISTS PostcodeDetailsTvf;
CREATE FUNCTION IF NOT EXISTS PostcodeDetailsTvf
(@CountyName string = "NotSpecified", @DistrictName string)
RETURNS @result TABLE
(Postcode string, Part1 string, Part2 string, 
 CountyName string, DistrictName string)
AS
BEGIN
@records = SELECT p.Postcode,
p.Postcode.IndexOf(" ") == -1 ? p.Postcode : 
  p.Postcode.Substring(0, p.Postcode.IndexOf(" ")) AS Part1,
p.Postcode.IndexOf(" ") == -1 ? p.Postcode : 
  p.Postcode.Substring(p.Postcode.IndexOf(" ") + 1) AS Part2,
c.CountyName, d.DistrictName
FROM Postcodes AS p
  LEFT JOIN Counties AS c
ON p.CountyCode == c.CountyCode
  LEFT JOIN Districts AS d ON p.DistrictCode == d.DistrictCode
WHERE (c.CountyName == @CountyName OR @CountyName == "NotSpecified") 
AND d.DistrictName == @DistrictName;
@result =
SELECT Postcode, Part1, Part2, CountyName, DistrictName
FROM @records;
END;

You can see that @CountyName is still optional (“NotSpecified” is the default), but the new @DistrictName parameter is mandatory. The WHERE clause has been changed to support this:

WHERE (c.CountyName == @CountyName OR @CountyName == "NotSpecified") 
AND d.DistrictName == @DistrictName;

The CountyName piece has now been wrapped up in brackets (because of the OR clause), and the DistrictName clause has been added. With this function submitted, we can try to run the SELECT script again (script 410). Attempting to run it without modification will fail, as only one parameter – the optional CountyName parameter - is specified:

@pcodes = SELECT Postcode, Part1, Part2, CountyName, DistrictName 
FROM PostcodeDetailsTvf(DEFAULT) AS postcodes;

So we can change the SELECT statement:

@pcodes = SELECT Postcode, Part1, Part2, CountyName, DistrictName 
FROM PostcodeDetailsTvf("Northamptonshire", "East Northamptonshire") AS postcodes;

This works perfectly.

Using User Defined Types

The function we’ve created returns a table. The table definition is stored directly in the function definition:

RETURNS @result TABLE(Postcode string, Part1 string, Part2 string, 
CountyName string, DistrictName string)

This is all well and good if this recordset will only ever be returned by this function. But what if we need to reuse this recordset? We’d have to redefine it in whatever database object we were creating. This is where User Defined Types (UDTs) come in. We can declare a table structure and store it as a type. The type can be used just like any other U-SQL type, such as int or string. Types can be passed to TVFs and returned from TVFs.

Creating a UDT

Add a new script called 420 Create PostcodeDetail Table Type.usql if you haven’t downloaded the project. Here’s the code for that script:

USE DATABASE UkPostcodes;
USE SCHEMA Postcodes;
DROP TYPE IF EXISTS PostcodeDetailTbl;
CREATE TYPE IF NOT EXISTS PostcodeDetailTbl 
AS TABLE
(Postcode string, Part1 string, Part2 string, 
 CountyName string, DistrictName string);

The usual DROP and CREATE IF NOT EXISTS statements are present. This script creates a TABLE type. Only TABLE and STRUCT types can be created at the moment (consider STRUCT as out of scope for the time being).

We’ll use this table type to return data from our TVF, replacing the hard-coded table definition. We could just as easily use it to pass multiple rows into the function too. Execute the CREATE TYPE script, then pop back to script 400 (the TVF definition) and change the RETURNS line so it matches the code below:

CREATE FUNCTION IF NOT EXISTS PostcodeDetailsTvf
 (@CountyName string = "NotSpecified", @DistrictName string)
RETURNS @result PostcodeDetailTbl
AS
BEGIN

Submit this script and then execute script 410, which executes a SELECT using the TVF. It still works perfectly!

Much like UDTs in SQL Server, there isn’t much involved in creating and using types. But they give you a lot of power and flexibility when used in the right place.

A Few Other Things

We’re almost at the end of our TVF and TYPE odyssey. There are just a couple of things to make you aware of.

Firstly, you don’t have to return all columns from a TVF when you select the data from it. Take another look at our SELECT statement from script 410:

@pcodes = SELECT Postcode, Part1, Part2, CountyName, DistrictName 
FROM PostcodeDetailsTvf("Northamptonshire", "East Northamptonshire") AS postcodes;

It selects all five columns. If I change this to ignore the Part1 and Part2 columns:

@pcodes = SELECT Postcode, CountyName, DistrictName 
FROM PostcodeDetailsTvf("Northamptonshire", "East Northamptonshire") AS postcodes;

Everything will still work. So you can filter out columns as required.

The second thing I want to mention is an oddity. The TVF we’ve defined returns five columns, as does the SELECT statement we’ve used in the TVF. We can actually change the SELECT statement to return more than five columns. Say I change the final SELECT statement in the TVF, so it returns an additional column:

SELECT Postcode, Part1, Part2, CountyName, DistrictName, 
"Bob" AS ExtraColumn 
FROM @records;

All logic says the compiler should fail this, right? “ExtraColumn” does not exist as part of @records. But it doesn’t! The function is happily created. Now we change the SELECT statement used in script 410 to SELECT *:

@pcodes = SELECT * 
FROM PostcodeDetailsTvf("Northamptonshire", "East Northamptonshire") 
AS postcodes;

When we run this, we hit an error:

Be aware of this. It happens because of how U-SQL inlines rowsets during processing. I hope this will be fixed in a future release.

What About Scalar Functions?

We’ve had a good look at what we can do with TVFs. What about scalar functions, I hear you ask? Unfortunately, these are not supported via U-SQL – you can’t create them. You can create such functions using C#, and then utilise them in your U-SQL scripts. But that’s a topic for another day…

Summary

This article has taken a detailed look at how we can create and use Table Valued Functions and User Defined Types in U-SQL. These features can greatly aid code reusability.

Next time out, we’re going to look at another database concept U-SQL has borrowed from T-SQL – the venerable stored procedure. I look forward to seeing you then!

This article is part of the parent stairway Stairway to U-SQL

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating