Inline vs. Multi-Statement Table-Valued function, Temp Tables vs. CTEs?

Question

Inline vs. Multi-Statement Table-Valued function, Temp Tables vs. CTEs?

Katerine459

SSC Eights!

Points: 945
More actions
March 18, 2015 at 1:30 pm

#305777

So, I've been reading the discussion on the latest Stairway article about CTEs (http://www.sqlservercentral.com/Forums/Topic1660966-34-1.aspx), and a bunch of people are talking about how CTEs are performance hogs. And now I'm really curious (and I don't have a dataset large enough to test this, and I don't actually have anything I need at the moment)...
Let's say that you have a SQL Server back-end and a Microsoft Access front-end (like most of my setups). In Microsoft Access, there are a lot of situations where stored procedures simply will not work for the task at hand - most notably, there's no way to use stored procedure results as a (direct) data source for a report. For that, well, there are a few different options, but definitely the most time-effective in terms of billable programming hours and highest data security and lowest network bottleneck, is to use table-valued-functions and pass-through queries.
So, AFAIK, there are two choices in SQL Server, in that case: inline table-valued function, or multi-statement table-valued function.
So here's my question (finally 😛 ... sorry about that): if CTEs are performance hogs relative to temp tables, and multi-statement table-valued functions are performance hogs relative to inline functions (as I've heard in one of my other threads, here: http://www.sqlservercentral.com/Forums/Topic1664381-1292-1.aspx), and, AFAIK, inline functions can't use temp tables, then which is better: a multi-statement table-valued function that uses temp tables, or an inline function that uses CTE's?
I'm guessing it depends on the frequency of run, right? Like if the potential CTE in question has GROUP BYs and aggregate functions and is being performed against a big table, it's probably better to use a multi-statement table value function and temp tables? But... does it really make a difference? Unless the CTE is also a correlated subquery, wouldn't it just run once, regardless of whether it's writing to a temp table or being run within the query itself?
Just curious. Thanks! 🙂

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

Katerine459 SSC Eights! Points: 945 More actions · Answer 1

Update: looks like I posted too soon. The consensus in that thread now seems to be that CTEs, in and of themselves (as long as they're not recursive) are no worse than regular subqueries (as long as they're not correlated subqueries)... that they are, in fact, the same. Correct?

Sorry about posting these general questions without waiting for the question to answer itself. :ermm:

Alan Burstein SSC Guru Points: 61136 More actions · Answer 2

Katerine459 (3/18/2015)
So, I've been reading the discussion on the latest Stairway article about CTEs (http://www.sqlservercentral.com/Forums/Topic1660966-34-1.aspx), and a bunch of people are talking about how CTEs are performance hogs. And now I'm really curious (and I don't have a dataset large enough to test this, and I don't actually have anything I need at the moment)...

Here's how you can build one real, real fast using CTEs. Thus proving that CTEs, when used correctly can be used to create very fast queries.

IF OBJECT_ID('tempdb..#lds') IS NOT NULL DROP TABLE #lds;

CREATE TABLE #lds (lds_id int not null, lds_txt char(36) not null);

GO

-- here's a great example of where CTEs kick butt

-- this creates a million rows of data in ~1sec

WITH

L1 AS (SELECT n = 1 UNION ALL SELECT 1),--2 rows

L2 AS (SELECT n = 1 FROM L1 a CROSS JOIN L1 b),--4 rows

L3 AS (SELECT n = 1 FROM L2 a CROSS JOIN L2 b),--16 rows

L4 AS (SELECT n = 1 FROM L3 a CROSS JOIN L3 b),-- 256 rows

L5 AS (SELECT n = 1 FROM L4 a CROSS JOIN L4 b),-- 65K rows

ITally AS

(SELECT n = ROW_NUMBER() OVER (ORDER BY (SELECT 1))

FROM L5 a CROSS JOIN L5 b)-- 4billion+ rows

INSERT #lds

SELECT TOP (1000000)

ROW_NUMBER() OVER (ORDER BY (SELECT 1)), newid()

FROM ITally ;

Let's say that you have a SQL Server back-end and a Microsoft Access front-end (like most of my setups). In Microsoft Access, there are a lot of situations where stored procedures simply will not work for the task at hand - most notably, there's no way to use stored procedure results as a (direct) data source for a report. For that, well, there are a few different options, but definitely the most time-effective in terms of billable programming hours and highest data security and lowest network bottleneck, is to use table-valued-functions and pass-through queries.

Inline only. multiline TVFs are always bad for many, many reasons. Never use them. That's my advice.

So here's my question (finally 😛 ... sorry about that): if CTEs are performance hogs relative to temp tables

False, absolutely false. 100% false. Unless you can provide an example, I will aver that this is false.

, and multi-statement table-valued functions are performance hogs relative to inline functions

True, absolutely and completely true. iTVFs can be considered parameterized views. If you understand how views work, one that can accept parameters is pretty special. multi-statement ITVs are something totally different and inferior. Create and iTVF and an mTVF that does the same thing then highlight both queries in SSMS, right-click and select "display estimated execution plan" to understand whey mTVS are so terrible.

which is better: a multi-statement table-valued function that uses temp tables, or an inline function that uses CTE's?

An inline function that uses CTEs. Period. (unless your iTVF is using a recursive CTE for counting then they are both bad and now we're debating which of these two terrible choices is worse).

I'm guessing it depends on the frequency of run, right?

Wrong. Though you can write bad iTVFs you can never write a good mTVFs because they are always bad.

Like if the potential CTE in question has GROUP BYs and aggregate functions and is being performed against a big table, it's probably better to use a multi-statement table value function and temp tables?

CTEs don't have any negative impact on big tables. That's folklore. Again, mTVFs are always bad. Though I have never seen a good article about this topic - read both splitter articles in my signature area. Note that they don't use mTVFs. Read any Itzek Ben Gan book - note that, for TVFs, he always uses inline table value functions. This is not a coincidence.

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

Alan Burstein SSC Guru Points: 61136 More actions · Answer 3

Had a second to kill. I put together a test to demonstrate why multi statement TVFs (mTVF) are terrible

Using TempDB (a DB I know you have) this code will create a 1,000,000 row table with sample data. It will also create an inline table values function (iTVF) and a mTVF that does the same thing. They both take a literal string as a parameter (@pattern) and tell us if that pattern exists in the lds_txt column.

USE tempdb

GO

IF OBJECT_ID('tempdb..lds') IS NOT NULL DROP TABLE lds;

CREATE TABLE lds (lds_id int not null, lds_txt char(36) not null);

GO

-- (1) Create the sample data

WITH

L1 AS (SELECT n = 1 UNION ALL SELECT 1),--2 rows

L2 AS (SELECT n = 1 FROM L1 a CROSS JOIN L1 b),--4 rows

L3 AS (SELECT n = 1 FROM L2 a CROSS JOIN L2 b),--16 rows

L4 AS (SELECT n = 1 FROM L3 a CROSS JOIN L3 b),-- 256 rows

L5 AS (SELECT n = 1 FROM L4 a CROSS JOIN L4 b),-- 65K rows

ITally AS

(SELECT n = ROW_NUMBER() OVER (ORDER BY (SELECT 1))

FROM L5 a CROSS JOIN L5 b)-- 4billion+ rows

INSERT dbo.lds

SELECT TOP (1000000)

ROW_NUMBER() OVER (ORDER BY (SELECT 1)), newid()

FROM ITally ;

ALTER TABLE lds

ADD CONSTRAINT pk_lds_x PRIMARY KEY(lds_id);

GO

-- (2) Create and iTVF

IF OBJECT_ID('tempdb.dbo.mTVF_CheckString') IS NOT NULL DROP FUNCTION dbo.mTVF_CheckString;

GO

CREATE FUNCTION dbo.mTVF_CheckString(@pattern varchar(5))

RETURNS @x TABLE

(

lds_id int not null,

lds_txtvarchar(36) not null,

pattern_txt varchar(5) null,-- user can pass the function a null value

has_pattern bit

)

AS

BEGIN

INSERT @x

SELECT

lds_id,

lds_txt,

@pattern,

CASE WHEN charindex(@pattern, lds_txt) = 0 THEN 0 ELSE 1 END

FROM dbo.lds;

RETURN;

END

GO

IF OBJECT_ID('tempdb.dbo.iTVF_CheckString') IS NOT NULL DROP FUNCTION dbo.iTVF_CheckString;

GO

-- (3) Create and iTVF

CREATE FUNCTION dbo.iTVF_CheckString(@pattern varchar(5))

RETURNS TABLE

AS

RETURN

SELECT

lds_id,

lds_txt,

@pattern AS pattern_txt,

CASE WHEN charindex(@pattern, lds_txt) = 0 THEN 0 ELSE 1 END AS has_pattern

FROM dbo.lds;

GO

No some code to test each function.

-- (4) Let's test these guys...

SET NOCOUNT ON;-- cleaner output

-- let's clean up the query cache

DBCC FREEPROCCACHE with no_infomsgs;

DBCC DROPCLEANBUFFERS with no_infomsgs;

DECLARE @pattern varchar(5) = '123-a';

SET STATISTICS TIME ON-- let's time these queries

PRINT char(10)+'Inline TVF:';

SELECT *

FROM dbo.iTVF_CheckString(@pattern)

WHERE has_pattern = 1;

PRINT char(10)+'Multi-line TVF:' +char(32);

SELECT *

FROM dbo.mTVF_CheckString(@pattern)

WHERE has_pattern = 1;

SET STATISTICS TIME OFF

GO

and the results:

SQL Server Execution Times:

CPU time = 516 ms, elapsed time = 521 ms.

Multi-line TVF:

SQL Server Execution Times:

CPU time = 2506 ms, elapsed time = 2537 ms.

Both functions return the same result set but the iTVF does so 5 times faster than the mTVF.

If you examine the code for each you will notice that the iTVF is simple returning a select statement. The mTVF, on the other hand, is:

(1) Creating a temp table named @x

(2) Inserting the contents of my query into that table

(3) Returning the result set.

To better understand how evil mTVFs are let's look at the query plan. If you run the test above in SSMS with "Include Execution Plan" on you will see this:

The iTVF utilizes the Clustered index, performs a simple Computer Scalar operation and returns the result set. Now look at the mTVF...

Yuck! the mTVF does a TABLE SCAN, and makes 2 calls to the mTVF.

Normally I don't use the option to "Display Estimated Execution plan" option but it's worth doing here because it reveals even more ugliness about mTVF's.

The mTVF actually has a "sub excecution plan"... You can see an index scan, then a table insert in the "sub plan", then, in the main plan, there's that table scan then all this data is tied together using the sequence operator and finally returned to you.

This should help explain why the iTVF is 5 times faster than the mTVF.:-P

Edit: 2nd image was not showing up. Fixed.

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 4

Katerine459 (3/18/2015)
Unless the CTE is also a correlated subquery, wouldn't it just run once

And who said that a correlated subquery doesn't run just once?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Katerine459 SSC Eights! Points: 945 More actions · Answer 5

GilaMonster (3/19/2015)
Katerine459 (3/18/2015)
Unless the CTE is also a correlated subquery, wouldn't it just run once
And who said that a correlated subquery doesn't run just once?

I read that here: http://www.sqlservercentral.com/articles/Stairway+Series/105972/

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 6

Katerine459 (3/19/2015)
GilaMonster (3/19/2015)
Katerine459 (3/18/2015)
Unless the CTE is also a correlated subquery, wouldn't it just run once
And who said that a correlated subquery doesn't run just once?
I read that here: http://www.sqlservercentral.com/articles/Stairway+Series/105972/

This statement is wrong (and trivially provable to be wrong)

It will be run once for each candidate row selected in the outer query.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass