Function gotcha with use of asterisk
Introduction
During a recent upgrade of our software, I discovered what I assume is a little-known gotcha that I feel appropriate to document and pass on. The gotcha occurs with user-defined table functions (UDFs), when selecting columns from a table using the asterisk (select all) method. The problem then occurs only when columns are subsequently added or removed from the based over table without recreating the function. When these conditions occur, it results in interesting and undesirable output. The SQL below illustrates this problem using SQL 2000.
Setting up the example
First, we need some tables to work with. These are just two simple tables that we can join within a function to extract column details from both tables.
-- Create sample table A
Create table dbo.TableA
(
recordid dec(5,0) identity(1,1)
, columnA char(10)
, columnB char(10)
, columnC char(10)
)
-- Create sample table B
Create table dbo.TableB
(
recordid dec(5,0) identity(1,1)
, fieldA char(10)
, fieldB char(10)
, fieldC char(10)
)
-- Insert sample data into table A & B
Insert into dbo.TableA values('Robert', 'Jimmy', 'Jones')
Insert into dbo.TableB values('Fish', 'and', 'Chips')
GO
Next, we need a table function to extract the details from the above tables. The thing to note when creating this function is that I created it using the asterisk to select all columns from the first table.
-- Create a function that returns a table result from table A and table B
-- This function uses the select all columns (*) option for table A.
CREATE FUNCTION TestFunction ()
RETURNS TABLE
AS
RETURN
Select TableA.* -- Note: All fields selected from table A
, TableB.fieldA
, TableB.fieldB
, TableB.fieldC
from dbo.TableA as TableA
inner join dbo.TableB as TableB
on TableA.recordID = TableB.recordID
GO
With my real-life example, I used the asterisk being lazy, since the table had over thirty columns in it and I did not wish to replicate all the column names. This was a mistake, as I will demonstrate.
Compare output before and after table change
First run the function as prepared above so that you have a set of results to compare with later.
-- Run the function for the first time
select * from TestFunction ()
GO
The results should look as follows
recordid | columnA | columnB | columnC | fieldA | fieldB | fieldC |
---|---|---|---|---|---|---|
1 | Robert | Jimmy | Jones | Fish | and | Chips |
Now alter the table to include a new column, as done by my software supplier in my real-life example.
-- Alter TableA to include a new column
Alter table dbo.TableA Add columnD char(10) null
GO
To see the effect of this column addition on the function, run the function again.
-- Run the function a second time
select * from TestFunction ()
GO
The results should look as follows
recordid | columnA | columnB | columnC | fieldA | fieldB | fieldC |
---|---|---|---|---|---|---|
1 | Robert | Jimmy | Jones | NULL | Fish | and |
What has changed and why
Note from the results that the number of output fields from the function remains the same, and that these maintain exactly the same column headings as before. The difference is that the data from the second table has now been shifted one column to the right. The column headed "fieldA" now contains the null value from the new column added to table A.
To understand why you must understand that a table function actually creates and stores a table definition within SQL's system tables. The table definition is created at the time the CREATE FUNCTION statement is executed and defines the table layout as set out under the RETURNS clause. The layout is stored away in the sysobjects and syscolumns tables.
To see the function's table layout as defined in the system files use the following
-- Review the function's column layout in system tables
Select objs.name as [Function name]
, objs.crdate as [Function create date]
, cols.name as [Column name]
, cols.length as [Column length]
, cols.colorder as [Column order]
from sysobjects as objs
inner join syscolumns as cols
on cols.id = objs.id
where objs.name = 'TestFunction'
order by cols.colorder
GO
The results should appear as follows
Function name | Function create date | Column name | Column length | Column order |
---|---|---|---|---|
TestFunction | 2008-04-02 14:09:12.433 | recordid | 5 | 1 |
TestFunction | 2008-04-02 14:09:12.433 | columnA | 10 | 2 |
TestFunction | 2008-04-02 14:09:12.433 | columnB | 10 | 3 |
TestFunction | 2008-04-02 14:09:12.433 | columnC | 10 | 4 |
TestFunction | 2008-04-02 14:09:12.433 | fieldA | 10 | 5 |
TestFunction | 2008-04-02 14:09:12.433 | fieldB | 10 | 6 |
TestFunction | 2008-04-02 14:09:12.433 | fieldC | 10 | 7 |
Since the create statement is not executed each time the function is used, the table definition remains constant in the system tables. This being as per the original definition determined when the create statement was first executed, even though the asterisk (select all) method now returns additional data.
To overcome this problem, one must recreate the function every time the underlying table changes, which will then update the system tables with the changed layout caused by the asterisk selection method.
Of course, the better way to overcome this problem is to explicitly state every column name in the function's select statement rather than select these implicitly using the asterisk selection method. In this way, the output columns will always maintain the same position as defined in the system tables.
I have read numerous discussions about the differences between a UDF and stored procedure and never once found mention of the above. You will be pleased to know that the same problem does not happen when using the asterisk on joins within stored procedures, these handle the inclusion of a new column successfully, showing all columns in their output. This is because the output of a stored procedure does not map over a predefined table layout.
If you have created the above tables and functions, remember to clean them up afterwards
-- Clean up afterwards
Drop table dbo.TableA
Drop table dbo.TableB
Drop function TestFunction
GO
Remaining lazy
I'm ever one to remain lazy, so the idea of having to key in thirty odd column names into my function when they all come from the same table does not appeal. There is a short cut to this however that I will explain below.
The solution is to use SP_Help and MS/Excel.
Within SQL and your database, type the code as follows
Sp_help [dbo.TableName]
GO
This produces a list of the table's columns giving output similar to below (I've restricted this to the first 8 columns)
Column_name | Type | Computed | Length | Prec | Scale | Nullable | TrimTrailingBlanks |
---|---|---|---|---|---|---|---|
recordid | decimal | no | 5 | 5 | 0 | no | (n/a) |
columnA | char | no | 10 | yes | no | ||
columnB | char | no | 10 | yes | no | ||
columnC | char | no | 10 | yes | no |
Simply click in the column header for the column "Column_name" to highlight the whole column, then select the drop down menu option >> edit >> copy.
Now open up a worksheet in MS/Excel, position your cursor in cell A1 and select the drop down menu option >> edit >> paste, to get a result as below.
A | B | C | D | E | F | |
---|---|---|---|---|---|---|
1 | recordid | |||||
2 | columnA | |||||
3 | columnB | |||||
4 | columnC |
In cell B1 type in the following formula =", " & A1
Drag this formula down or double click in the right bottom corner of cell B1 to populate this formula to the bottom of the list. The result will look as follows
A | B | C | D | E | F | |
---|---|---|---|---|---|---|
1 | recordid | , recordid | ||||
2 | columnA | , columnA | ||||
3 | columnB | , columnB | ||||
4 | columnC | , columnC |
Now simply reverse the copy and paste by highlighting all the values in column B of the spreadsheet, copying them, and pasting them after the word "Select" in your SQL. You just have to remove the one comma from before the first column name and you have a complete list of fields in your select statement.
i.e.
Select recordid
, columnA
, columnB
, columnC
If you prefer your column names in a horizontal rather than vertical list then do the following.
After doing the copy of values from column B, position your cursor in cell C1 and select the drop down menu option >> edit >> paste special >> values (this converts the formulas to values).
Next highlight the values in column C and select the drop down menu option >> edit >> copy, then position your cursor in cell D1 and select the drop down menu option >> edit >> paste special >> Transpose.
The transpose option takes the vertical cell values and pastes them horizontally from D1 along row 1. You can now highlight all these horizontal values as a block and copy and paste these into your SQL to result as follows.
Select ,recordID, columnA, columnB, columnC
You then just simply remove the leading comma as per before.
Using this technique for long lists of fields is invaluable, saving lots of keying time and removing the chance of introduced typos in the column names. The transpose method above is also a great way of adding column headings in MS/Excel to data results you have previously copied and pasted to a worksheet.
Summary
In summary, this article highlights a simple trap caused by taking a shortcut in coding, being the use of the asterisk to select all columns within a user-defined function. In doing this, it highlights a fundamental difference between a stored procedure and a UDF. It shows how you can review a function's column layout via querying the system tables, and provides methods for overcoming the problem and provides techniques to quickly build a field list to paste into your SQL statement.
I hope the illustration highlights the danger of using the asterisk select method successfully, and that you do not have too many examples of this in your own function library to clean up.
Oh, and in truth I'm not really lazy, I just don't enjoy doing repetative or mundane tasks when I can find a solution to speed the process, and free up my time for more productive work.