SQLServerCentral Article

The Cascading (CROSS) APPLY

,

Introduction

An old trick nowadays but one which is still underused here on SQLServerCentral is the cascading CROSS APPLY – where output from one CROSS APPLY is used as input for another. The term cascading CROSS APPLY (or "cCA" for short) was coined by our very own Jeff Moden here.

The use of cCAs can provide some rather remarkable set-based "in-row" processing abilities where the use of RBAR, inline functions, or multiple separate queries or CTEs might otherwise be required. Because cCAs can eliminate so many different types of code elements, they also make for some relatively short, easy to read, and easy to maintain code.

Whilst the technique works equally well with OUTER APPLY, you’re most likely to use it as sets of inline row-valued functions, that is, operating on rows from a table on the left side of the CROSS APPLY.

A Simple "Split" Example

One of the simplest examples of cCA usage takes the form of a string splitter with a tightly defined scope.

Defining the Problem

Let’s say we’ve imported a table containing email information in the following format:

 CREATE TABLE #EmailAddress
        (
        EmployeeID INT IDENTITY(1,1),
        EmailAddress VARCHAR(128)
        )
;
 INSERT INTO #EmailAddress
        (EmailAddress)
 SELECT '<mmooney@gmail.com>;mmoney@gmail.com' UNION ALL
 SELECT '<chrisgardener@yahoo.com>)''[chrisgardener@comcast.net)]' UNION ALL
 SELECT '"natan@hotmail.com'':>"natan@hotmail.com?''' UNION ALL
 SELECT '"jennifer@walmart.com}'';jennifer@fedex.com' UNION ALL
 SELECT '''<jennifer.juniper@fedex.com>;"jennifer.juniper@fedex.com"'''
;

Let's also say that we want to compare the domain of the two email addresses in each data element and perform some action depending on whether they are the same or not. To compare the domains we first have to resolve them out of the string.

First Thoughts

If we intend to tackle this problem using only T-SQL, two tools spring to mind.

One of these tools is the built-in scalar function PARSENAME(), which returns the specified part of a 4-part object name delimited with a period(.). Well, perhaps - if the at sign (@) was replaced with a period - but it can be eliminated immediately because it returns NULL if the string contains more than three delimiters.

The other tool would be the popular DelimitedSplit8K inline table-valued function, which returns parsed strings as a derived table. Using it seems simple enough. Just pass in the email address and at sign as delimiter and truncate the returned strings at the first period:

SELECT  ea.EmployeeID, ea.EmailAddress,
        Domain1 = MAX(CASE
                      WHEN ca.ItemNumber = 2
                      THEN LEFT(ca.Item,CHARINDEX('.',ca.Item,1)-1)
                      ELSE ''
                      END),
        Domain2 = MAX(CASE
                      WHEN ca.ItemNumber = 3
                      THEN LEFT(ca.Item,CHARINDEX('.',ca.Item,1)-1)
                      ELSE ''
                      END)
FROM    #EmailAddress ea
CROSS APPLY dbo.DelimitedSplit8K (ea.EmailAddress, '@') ca
WHERE   ca.ItemNumber IN (2,3)
GROUP BY ea.EmployeeID, ea.EmailAddress
HAVING COUNT(*) = 2;

This does generate the correct results:

EmployeeID

EmailAddress

Domain1

Domain2

1

<mmooney@gmail.com>;mmoney@gmail.com

gmail

gmail

2

<chrisgardener@yahoo.com>)'[chrisgardener@comcast.net)]

yahoo

comcast

3

"natan@hotmail.com':>"natan@hotmail.com?'

hotmail

hotmail

4

"jennifer@walmart.com}';jennifer@fedex.com

walmart

fedex

5

<jennifer.juniper@fedex.com>;"jennifer.juniper@fedex.com"'

fedex

fedex

…but seems a bit heavy handed because it has to split everything out, return a result set for each row, and then choose from the result set.  Using our test data it runs about 2.5 times slower than the cCA code we'll end up with using.

Inline Splitting

If we could read from the first at to the first period, then jump to the second at and second period, we’d be there. The next simple query picks up the character position of the first at:

SELECT  EmployeeID, EmailAddress,
        p1 = CHARINDEX('@',EmailAddress,1)
FROM    #EmailAddress;

The results:

EmployeeID

EmailAddress

p1

1

<mmooney@gmail.com>;mmoney@gmail.com

9

2

<chrisgardener@yahoo.com>)'[chrisgardener@comcast.net)]

15

3

"natan@hotmail.com':>natan@hotmail.com?'

7

4

"jennifer@walmart.com}';jennifer@fedex.com

10

5

'<jennifer.juniper@fedex.com>;"jennifer.juniper@fedex.com"'

19

If p1 was passed into a similar CHARINDEX expression as the 3rd parameter it would return the position of the next delimiter, and we can show this with a CTE:

SELECT  EmployeeID, EmailAddress,
        p1 = CHARINDEX('@',EmailAddress,1)
FROM    #EmailAddress;
WITH level1 AS
        (
        SELECT  EmployeeID, EmailAddress,
                p1 = CHARINDEX('@',EmailAddress,1)
        FROM    #EmailAddress
        )
SELECT  EmployeeID, EmailAddress,
        p1,
        p2 = CHARINDEX('.',EmailAddress,p1)
FROM    level1;

with these results:

EmployeeID

EmailAddress

p1

p2

1

<mmooney@gmail.com>;mmoney@gmail.com

9

15

2

<chrisgardener@yahoo.com>)'[chrisgardener@comcast.net)]

15

21

3

"natan@hotmail.com':>natan@hotmail.com?'

7

15

4

"jennifer@walmart.com}';jennifer@fedex.com

10

18

5

<jennifer.juniper@fedex.com>;"jennifer.juniper@fedex.com"'

19

25

Continuing in the same manner eventually yields the 4 character positions required to SUBSTRING the domains out of the email address column:

WITH
level1 AS ( -- find p1 = first '@'
    SELECT EmployeeID, Email,
        p1 = CHARINDEX('@',Email,1)
    FROM @t),
level2 AS ( -- find p2 = first '.'
    SELECT EmployeeID, Email,
        p1, p2 = CHARINDEX('.',Email,p1)
    FROM level1),
level3 AS ( -- find p3 = second '@'
    SELECT EmployeeID, Email,
        p1, p2, p3 = CHARINDEX('@',Email,p2)
    FROM level2),
level4 AS ( -- find p4 = second '.'
    SELECT EmployeeID, Email,
        p1, p2, p3, p4 = CHARINDEX('.',Email,p3)
    FROM level3)
SELECT EmployeeID, Email,
    Domain1 = SUBSTRING(Email,p1+1,p2-(p1+1)),
    Domain2 = SUBSTRING(Email,p3+1,p4-(p3+1))
FROM level4;

This gives us:

EmployeeID

EmailAddress

Domain1

Domain2

1

<mmooney@gmail.com>;mmoney@gmail.com

gmail

gmail

2

<chrisgardener@yahoo.com>)'[chrisgardener@comcast.net)]

yahoo

comcast

3

"natan@hotmail.com':>"natan@hotmail.com?'

hotmail

hotmail

4

"jennifer@walmart.com}';jennifer@fedex.com

walmart

fedex

5

<jennifer.juniper@fedex.com>;"jennifer.juniper@fedex.com"'

fedex

fedex

Which, whilst it works and speaking of "heavy handed,"  is a whole heap of code.

Splitting with Cascading CROSS APPLYs

If you’ve used APPLY before, you will know that it’s very handy for performing inline calculations. Partial calculations can be completed in the output set – passing the results “up”, as it were, like this:

SELECT  EmployeeID, EmailAddress,
        SUBSTRING(EmailAddress,x1.p1,20)
FROM    #EmailAddress
CROSS APPLY(SELECT p1 = CHARINDEX('@',EmailAddress,1)) x1;

The results:

EmployeeID

EmailAddress

1

<mmooney@gmail.com>;mmoney@gmail.com

@gmail.com>;mmoney@g

2

<chrisgardener@yahoo.com>)'[chrisgardener@comcast.net)]

@yahoo.com>)'[chrisg

3

"natan@hotmail.com':>natan@hotmail.com?'

@hotmail.com':>"nata

4

"jennifer@walmart.com}';jennifer@fedex.com

@walmart.com}';jenni

5

<jennifer.juniper@fedex.com>;"jennifer.juniper@fedex.com"'

@fedex.com>;"jennife

The results can be passed “down” too, to the WHERE clause:

SELECT  EmployeeID, EmailAddress,
        SUBSTRING(EmailAddress,x1.p1,20)
FROM    #EmailAddress
CROSS APPLY(SELECT p1 = CHARINDEX('@',EmailAddress,1)) x1
WHERE   x1.p1 = 9;

What happens if we pass x1.p1 down to another CROSS APPLY block? It would look like this:

SELECT  EmployeeID, EmailAddress,
        SUBSTRING(EmailAddress,x1.p1,20),
        SUBSTRING(EmailAddress,x2.p1,20)
FROM    #EmailAddress
CROSS APPLY(SELECT p1 = CHARINDEX('@',EmailAddress,1)) x1
CROSS APPLY(SELECT p1 = CHARINDEX('@',EmailAddress, x1.p1+1)) x2;

x1.p1 is the position of the first at in the string. If we pass this output column into another CHARINDEX as the third parameter (which is the location to start searching from) it will find the same at and hence return the same position, so we add a 1 to it to begin the search at the character immediately after. The results are very close to what we’re looking for:

EmployeeID

EmailAddress

1

<mmooney@gmail.com>;mmoney@gmail.com

@gmail.com>;mmoney@g

@gmail.com

2

<chrisgardener@yahoo.com>)'[chrisgardener@comcast.net)]

@yahoo.com>)'[chrisg

@comcast.net)]

3

"natan@hotmail.com':>natan@hotmail.com?'

@hotmail.com':>"nata

@hotmail.com?'

4

"jennifer@walmart.com}';jennifer@fedex.com

@walmart.com}';jenni

@fedex.com

5

<jennifer.juniper@fedex.com>;"jennifer.juniper@fedex.com"'

@fedex.com>;"jennife

@fedex.com"'

And from here it’s a trivial matter to fetch the position of the period immediately after each at:

SELECT  EmployeeID, EmailAddress,
        Domain1  = SUBSTRING(EmailAddress, x1.p1, x1.p2-x1.p1),
        Domain2 = SUBSTRING(EmailAddress, x2.p1, x2.p2-x2.p1)
FROM    #EmailAddress
CROSS APPLY(SELECT
        p1 = 1+CHARINDEX('@',EmailAddress,1),
        p2 = CHARINDEX('.',EmailAddress,CHARINDEX('@',EmailAddress,1))
        ) x1
CROSS APPLY(SELECT
        p1 = 1+CHARINDEX('@',EmailAddress, x1.p1),
        p2 = CHARINDEX('.',EmailAddress,CHARINDEX('@',EmailAddress, x1.p1))
        ) x2;

Remember the 1 added to p1? Here I’ve put it into the expression for p1 so I can safely forget about it elsewhere in the query. Note that you could introduce two more CROSS APPLY blocks, one for each p2.

Here are the results:

EmployeeID

EmailAddress

Domain1

Domain2

1

<mmooney@gmail.com>;mmoney@gmail.com

gmail

gmail

2

<chrisgardener@yahoo.com>)'[chrisgardener@comcast.net)]

yahoo

comcast

3

"natan@hotmail.com':>"natan@hotmail.com?'

hotmail

hotmail

4

"jennifer@walmart.com}';jennifer@fedex.com

walmart

fedex

5

<jennifer.juniper@fedex.com>;"jennifer.juniper@fedex.com"'

fedex

fedex

This is the cascaded (CROSS) APPLY: the output from one APPLY block is used as an input parameter in a subsequent block.

A note about performance

Cascaded (CROSS) APPLY used to extract elements from strings quickly becomes expensive as the number of elements increases. Give it up to four or five elements and it is likely to be faster than a tally-based string splitter; more than that and it begins to lag behind. Don’t be fooled by the execution plan. Regardless of how many APPLY blocks are in your query, it shows a single Compute Scalar operator. You have to raise the properties of the Compute Scalar Operator to get any idea of how much work is performed.

Here’s a crude comparison between cCA and SSC’s favourite string splitter, DelimitedSplit8K:

Element

cCA

DelimitedSplit8K

count

CPU

Elapsed

CPU

Elapsed

1

250

1,017

4,056

4,186

2

1,248

1,488

4,665

4,760

3

2,824

2,990

5,257

5,414

4

5,148

5,354

5,913

6,023

5

8,409

8,555

6,693

6,994

6

12,481

12,634

7,566

7,788

7

17,191

17,367

8,097

8,347

8

22,137

22,255

8,705

8,920

9

27,581

27,822

9,360

9,590

10

33,243

33,399

9,844

10,190

11

37,487

37,878

10,514

10,743

A few real-world applications

Developing a complex aggregate

Case:

-- whoosh up some sample data
IF object_id('TempDB..#Persons') IS NOT NULL
 DROP TABLE #Persons;
GO
CREATE TABLE #Persons (
 PersonID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
 Forename VARCHAR(20),
 Lastname VARCHAR(20),
 Title VARCHAR(10),
 DOB DATE,
 WeightKg NUMERIC (5,2),
 Heightmm INT)
;
INSERT INTO #Persons (Forename, Lastname, Title, DOB, WeightKg, Heightmm)
 SELECT 
  Forename, 
  Lastname, 
  Title,
  DOB = CONVERT(DATE,DOB,107), 
  WeightKg = CAST(RIGHT(DOB,2) AS INT) * 1.5,
 Heightmm = YEAR(CONVERT(DATE,DOB,107)) 
 FROM (
  VALUES 
  (('Francis'),('Bacon'),('Mr'),('January 22, 1561'),(NULL)),
  (('Jean'),('Baptiste'),('Mr'),('March 21, 1768'),(NULL)), 
  (('Clarence'),('Birdseye'),('Mr'),('December 9, 1886'),(NULL)),
  (('Charles'),('Babbage'),('Mr'),('November 24, 1872'),(NULL)),
  (('John'),('Logie Baird'),('Mr'),('August 13, 1888'),(NULL)),
  (('Nicolaus'),('Copernicus'),('Mr'),('February 19, 1473'),(NULL)),
  (('Christian'),('Doppler'),('Mr'),('November 29, 1803'),(NULL)),
  (('Paul'),('Ehrlich'),('Mr'),('March 14, 1854'),(NULL)),
  (('Alexander'),('Fleming'),('Mr'),('August 6, 1881'),(NULL)),
  (('Michael'),('Faraday'),('Mr'),('September 22, 1791'),(NULL)),
  (('Pierre'),('de Fermat'),('Mr'),('August 17, 1601'),(NULL)),
  (('Robert'),('Fulton'),('Mr'),('November 14, 1765'),(NULL)),
  (('Joseph'),('Fourier'),('Mr'),('March 21, 1768'),(NULL)),
  (('Galileo'),('Galilei'),('Mr'),('February 15, 1564'),(NULL)), 
  (('Stephen'),('Hawking'),('Mr'),('January 8, 1942'),(NULL)), 
  (('Otto'),('Hahn'),('Mr'),('March 8, 1879'),(NULL)), 
  (('Jean-Baptiste'),('Lamarck'),('Mr'),('August 1, 1744'),(NULL)),
  (('Konrad'),('Lorenz'),('Mr'),('November 7, 1903'),(NULL)), 
  (('Gregor'),('Mendel'),('Mr'),('July 22, 1822'),(NULL)),
  (('Guglielmo'),('Marconi'),('Mr'),('April 25, 1874'),(NULL)),
  (('Isaac'),('Newton'),('Mr'),('December 25, 1642'),(NULL)), 
  (('J. Robert'),('Oppenheimer'),('Mr'),('April 22, 1904'),(NULL)), 
  (('Joseph'),('Priestley'),('Mr'),('March 13, 1733'),(NULL)),
  (('Wolfgang'),('Pauli'),('Mr'),('April 25, 1900'),(NULL)), 
  (('Blaise'),('Pascal'),('Mr'),('June 19, 1623'),(NULL)),
  (('Craig'),('Venter'),('Mr'),('October 14, 1946'),(NULL)), 
  (('James'),('Watson'),('Mr'),('April 16, 1970'),(NULL)) 
  ) d (Forename, Lastname, Title, DOB, WeightKg)
;
-- Task: group scientists according to their weight and height 
SELECT 
  Century, 
  LightweightShorties = COUNT(z.LightweightShorty),
  -- more columns
  GihugeousTallies = COUNT(z.GihugeousTallie)
 FROM #Persons
  CROSS APPLY ( -- calculate century of birth and weight band
   SELECT 
    Century = (YEAR(DOB)/100)+1,
    WeightBand = CASE 
                 WHEN WeightKg < 80 THEN 1
                 WHEN WeightKg >= 80 AND WeightKg < 100 THEN 2
                 WHEN WeightKg >= 100 AND WeightKg < 120 THEN 3
                 ELSE 4 END
     ) x
   CROSS APPLY ( -- calculate height band
    SELECT 
     HeightBand = CASE 
                  WHEN Heightmm < 1700 THEN 1
                  WHEN Heightmm >= 1700 THEN 2
                  ELSE 4 END
     ) y
   CROSS APPLY ( -- count scientists in specific weight and height bands
    SELECT 
     LightweightShorty = CASE WHEN WeightBand = 1 AND HeightBand = 1 THEN 1        END,
     -- more columns
    GihugeousTallie = CASE WHEN WeightBand = 4 AND HeightBand = 2 THEN 1 END    ) z
  GROUP BY Century
  ORDER BY Century
-- Once the results are verified, condense the code. 
-- note differences between COUNT and SUM
-- count rows where the result is not null 
SELECT 
  x.Century,
  LightweightShorties = SUM(CASE WHEN WeightKg < 80 AND Heightmm < 1700 THEN 1END), -- SUM(1) 
  -- more columns
  GihugeousTallies = COUNT(CASE WHEN WeightKg >= 120 AND Heightmm >= 1700 THEN 1 END) 
 FROM #Persons
  CROSS APPLY (SELECT Century = (YEAR(DOB)/100)+1) x
 GROUP BY x.Century
 ORDER BY x.Century

Note that the optimizer chooses the same plan for both queries.

Make a complex calculation easier to work with by breaking it down into smaller, simpler elements

Case: calculate the elapsed time in milliseconds (in data type TIME(3)) between two DATETIME2 values truncated to milliseconds. Show workings and determine if the DATETIME datatype would be a sensible choice for intermediate values in the calculation.

WITH MyCTE (aDT2, bDT2)
AS
(
       SELECT CONVERT(datetime2,'1900-01-01 00:10:00.000'), CONVERT(datetime2,'1900-01-01 00:00:00.000')
       UNION ALL SELECT '1900-01-01 00:10:00.000', '1900-01-01 00:00:00.001'
       UNION ALL SELECT '1900-01-01 00:10:00.000', '1900-01-01 00:00:00.002'
       UNION ALL SELECT '1900-01-01 00:10:00.000', '1900-01-01 00:00:00.003'
       UNION ALL SELECT '1900-01-01 00:10:00.000', '1900-01-01 00:00:00.004'
       UNION ALL SELECT '1900-01-01 00:10:00.000', '1900-01-01 00:00:00.005'
)
SELECT
       *,
       Result_DT2 = CAST(msTime_DT2 AS TIME(3)),
       Result_DT = CAST(msTime_DT AS TIME(3)),
       ChosenCalc = CAST(DATEADD(ms,a.msTimeDiff,CAST(CAST(0 AS DATETIME) AS DATETIME2)) AS TIME(3))
FROM myCTE
CROSS APPLY (SELECT msTimeDiff = ABS(DATEDIFF(ms,aDT2, bDT2))) a
CROSS APPLY (SELECT TimeZero_DT2 = CAST(CAST(0 AS DATETIME) AS DATETIME2)) b
CROSS APPLY (SELECT TimeZero_DT = CAST(0 AS DATETIME)) b1
CROSS APPLY (SELECT msTime_DT2 = DATEADD(ms,a.msTimeDiff,TimeZero_DT2)) c
CROSS APPLY (SELECT msTime_DT = DATEADD(ms,a.msTimeDiff,TimeZero_DT)) c1;

Results

Result_DT2

Result_DT

ChosenCalc

00:10:00.000

00:10:00.000

00:10:00.000

00:09:59.999

00:10:00.000

00:09:59.999

00:09:59.998

00:09:59.997

00:09:59.998

00:09:59.997

00:09:59.997

00:09:59.997

00:09:59.996

00:09:59.997

00:09:59.996

00:09:59.995

00:09:59.997

00:09:59.995

There’s our answer - precision is lost if DATETIME is used. A quick scan of the DATETIME datatype section in BOL warns Accuracy: Rounded to increments of .000, .003, or .007 seconds. I could have read BOL first of course, but it wouldn’t have been as much fun and working through an exercise like this helps retention.

Split a string with multiple heterogeneous delimiters

Case: a single data element in a string column may contain several lines of text (delimited with carriage return/linefeed pairs). There are also spaces between words, and at (@) and period (.) act as delimiters either side of domain names. The second line of each string contains two email addresses. We want the domain name of the second email address.

-- Create a small sample table
DECLARE @MyString VARCHAR(8000), @CRLF CHAR(2)
SET @CRLF = CHAR(13) + CHAR(10)
SET @MyString =
      'The quick brown fox slip@snailmail.com over the,lazy,dog¬' + @CRLF +
      'The,quick,abc@hutmail.com fox,jumped,over the,frizzy@boogle.com dog¬' + @CRLF +
      'The,quick,red,fox,jumped,shell@snailmail.com,the,lazy,dog¬' + @CRLF +
      'Fourth line' 
PRINT @MyString

DROP TABLE #Strings
SELECT *
INTO #Strings
FROM ( -- make each string slightly different
      SELECT '1 ' + @MyString AS MyString UNION ALL
      SELECT '12 ' + REPLACE(@MyString,'boogle','woogle') UNION ALL 
      SELECT '123 ' + REPLACE(@MyString,'boogle','doogle') UNION ALL
      SELECT '1234 ' + REPLACE(@MyString,'boogle','noogle') UNION ALL
      SELECT '12345 ' + REPLACE(@MyString,'boogle','foogle') UNION ALL
      SELECT '123456 ' + @MyString UNION ALL
      SELECT '1234567 ' + @MyString UNION ALL
      SELECT '12345678 ' + @MyString UNION ALL
      SELECT '123456789 ' + @MyString UNION ALL
      SELECT '1234567890 ' + @MyString
) d

-- solution
SELECT
      *,
      domain = SUBSTRING(d.MyString, x3.p3+1, x4.p4-(x3.p3+1))
FROM #Strings d
CROSS APPLY (SELECT p1 = CHARINDEX(@CRLF, d.MyString, 0)) x1 -- start of line 2 (end of line 1)
CROSS APPLY (SELECT p2 = CHARINDEX('@', d.MyString, x1.p1)) x2 -- first '@' in line 2
CROSS APPLY (SELECT p3 = NULLIF(CHARINDEX('@', d.MyString, x2.p2+1),0)) x3 -- second '@' in line 2
CROSS APPLY (SELECT p4 = NULLIF(CHARINDEX('.', d.MyString, x3.p3+1),0)) x4; -- first '.' after second '@' in line 2

Solving this puzzle (a real case which came up on SSC sometime last year) is trivial with cCA but would be quite tricky using a string splitter, splitting firstly on CR or LF and choosing the second item from the list, then on at and choosing the second item, and finally CHARINDEX() to capture the position of the terminal period of the domain. Something like this;

SELECT
       s.MyString,
       domain = LEFT(b.Item, c.dotpos-1)
FROM #Strings s
CROSS APPLY dbo.DelimitedSplit8K(MyString,CHAR(10)) a
CROSS APPLY dbo.DelimitedSplit8K(a.Item,'@') b
CROSS APPLY (SELECT dotpos = CHARINDEX('.',b.Item)) c
WHERE a.ItemNumber = 2
       AND b.ItemNumber = 3;
  • which of course uses a cascaded CROSS APPLY!

Thank you for taking the time to read this article. A huge thanks to my good friend Dwain Camps for his endless encouragement and to Jeff Moden for his first-rate editing and comments.

Rate

4.8 (55)

You rated this post out of 5. Change rating

Share

Share

Rate

4.8 (55)

You rated this post out of 5. Change rating