Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««56789

Understanding and Using APPLY (Part 1) Expand / Collapse
Author
Message
Posted Saturday, January 7, 2012 12:52 AM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Today @ 6:23 AM
Points: 9,928, Visits: 11,196
Jeff Moden (1/6/2012)
Would you mind posting your split function so I can take a look, please?

Can I ask that you guys start a new thread for this? Feel free to post a link here though.




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #1231813
Posted Sunday, January 8, 2012 1:15 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 9:05 AM
Points: 35,589, Visits: 32,178
SQL Kiwi (1/7/2012)
Jeff Moden (1/6/2012)
Would you mind posting your split function so I can take a look, please?

Can I ask that you guys start a new thread for this? Feel free to post a link here though.


Thought the discussion would lend itself to the proper application of CROSS APPLY, but sure.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1232131
Posted Sunday, January 8, 2012 9:08 PM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Today @ 6:23 AM
Points: 9,928, Visits: 11,196
Jeff Moden (1/8/2012)
Thought the discussion would lend itself to the proper application of CROSS APPLY, but sure.

Hm, that's actually a good point. As you were, discuss away...




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #1232185
Posted Monday, January 9, 2012 10:28 AM


SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: 2 days ago @ 11:14 AM
Points: 117, Visits: 586
As requested, Jeff:
USE DBA
GO
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[udf_SplitN]') AND type in (N'FN', N'IF', N'TF', N'FS', N'FT'))
DROP FUNCTION [dbo].[udf_SplitN]
GO
CREATE FUNCTION [dbo].[udf_SplitN]
( @inString nvarchar(MAX)
, @delim nvarchar(255)
)
RETURNS @arrSplit TABLE
( IDX int IDENTITY(0, 1) PRIMARY KEY
, [value] nvarchar(1024)
)
AS
/* =============================================================================
** original idea by Anith Sen, posted on www.simple-talk.com at
** http://www.simple-talk.com/sql/t-sql-programming/the-helper-table-workbench/
** Description: Split a nvarchar string into chunks using a delimiter of up to
** 255 characters.
**
** Notes: See the corresponding udf_Split for the varchar version.
**
SELECT *
FROM dbo.udf_Split('one,two,three,four,five,six,seven,eight,nine,ten', ',')

SELECT *
FROM dbo.udf_Split('Monday--Tuesday--Wednesday--thursday--friday--saturday--sunday', '--')
========================================================================== */
BEGIN
INSERT INTO @arrSplit
( [value]
)
SELECT SUBSTRING(@inString + @delim, number, CHARINDEX(@delim, @inString + @delim, number) - number)
FROM dbo.Numbers
WHERE number <= LEN(REPLACE(@inString, N' ', N'`'))
AND SUBSTRING(@delim + @inString, number, LEN(REPLACE(@delim, N' ', N'`'))) = @delim
ORDER BY number

RETURN
END

GO

This is used in preference to the function below in cases where 4000 characters is likely to be exceeded, and sometimes by MB. The msTVF works very well at splitting XML files that contain multiple documents - a problem a third-party vendor presented that could not be changed due to their third party software. That's the primary reason for the msTVF's existence in this form.

When I know the delimited varchar data are short, I'll use the much-discussed and augmented UDF found on SSC (renamed by me to distinguish functionality from my other 2 udf_Split% UDFs):
USE [DBA]
GO
/****** Object: UserDefinedFunction [dbo].[udf_Split8Kv] Script Date: 01/09/2012 09:26:22 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

ALTER FUNCTION [dbo].[udf_Split8Kv]
/***************************************************************************************************
Purpose: Split a given string at a given delimiter and return a list of the split elements (items).
Returns: iTVF containing the following:
ItemNumber = Index Position of the Item as an int, begins at ZERO.
Item = Element value as a VARCHAR(8000)
CROSS APPLY Usage Example:
-----------------------------------------------------------------------------------------------------
===== Conditionally drop the test tables to make reruns easier for testing.
-- (this is NOT a part of the solution)
IF OBJECT_ID('tempdb..#JBMTest') IS NOT NULL
DROP TABLE #JBMTest;
--===== Create and populate a test table on the fly (this is NOT a part of the solution).
SELECT *
INTO #JBMTest
FROM ( SELECT 1,'1,10,100,1000,10000,100000,1000000'
UNION ALL
SELECT 2,'2000000,200000,20000,2000,200,20,2'
UNION ALL
SELECT 3, 'This,is,a,test'
UNION ALL
SELECT 4, 'and so is this'
UNION ALL
SELECT 5, 'This, too (no pun intended)'
UNION ALL
SELECT 6, LEFT(REPLICATE('Thisisanunbrokenstring',400),7999)
) d (SomeID,SomeValue);
GO
--===== Split the CSV column for the whole table using CROSS APPLY (this is the solution)
SELECT test.SomeID
, split.ItemNumber
, split.Item
FROM #JBMTest test
CROSS APPLY ( SELECT ItemNumber
, Item
FROM dbo.udf_Split8Kv(test.SomeValue,',')
) split;
***************************************************************************************************/
--===== Define I/O parameters
( @pString varchar(7999)
, @pDelimiter char(1)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
--===== "Inline" CTE Driven "Tally Table” produces values up to
-- 10,000... enough to cover VARCHAR(8000)
WITH E1 ( N )
AS ( --=== Create Ten 1's
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
)
, E2 ( N )
AS ( SELECT 1
FROM E1 a
, E1 b
) -- 100
, E4 ( N )
AS ( SELECT 1
FROM E2 a
, E2 b
) --10,000
, cteTally ( N )
AS ( SELECT ROW_NUMBER() OVER ( ORDER BY ( SELECT N ) )
FROM E4
)
--===== Do the split
SELECT ROW_NUMBER() OVER ( ORDER BY N ) - 1 AS IDX
, SUBSTRING(@pString, N, CHARINDEX(@pDelimiter, @pString + @pDelimiter, N) - N) AS Value
FROM cteTally
WHERE N < LEN(@pString) + 2
AND SUBSTRING(@pDelimiter + @pString, N, 1) = @pDelimiter

GO

Post #1232646
Posted Monday, January 9, 2012 3:52 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Yesterday @ 7:05 AM
Points: 3,467, Visits: 1,831
Excellent article. I liked the graphics, although I have to say I liked the code file even better. I have a few questions but I think I will wait until I have had a chance to read part 2 before expressing any

Kenneth


Kenneth Fisher
I strive to live in a world where a chicken can cross the road without being questioned about its motives.
--------------------------------------------------------------------------------
For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Link to my Blog Post --> www.SQLStudies.com
Post #1232854
Posted Monday, January 9, 2012 8:39 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 9:05 AM
Points: 35,589, Visits: 32,178
SAinCA (1/9/2012)
As requested, Jeff:


I thought the problem may have been with the split function you had listed but, looking at the code you provided for the split functions, I don't see a particular problem with them other than they're slower (because of the delimiter concatenation) than the new split function available at the URL coming up. The attachments to the article have been updated, so be sure to get the code from the attachments and not the article itself. The code in the attachments is about 20% faster after someone was good enough to provide yet another optimization.

Here's the link to the article with the new splitter functions...
http://www.sqlservercentral.com/articles/Tally+Table/72993/

One of the "problems" (in quotes because it's a human error, not a problem with CROSS APPLY or iTVFs themselves) with iTVFs and CROSS APPLY is a problem that is very similar to a problem with VIEWs. That is, it's "hidden" code or at least "out of site" code AND the code becomes a part of the execution plan as if it had been written as a part of the code. Because of that, you can miss some of the most terrible performance problems due to a very simple oversight... mismatched datatypes which, of course, will cause table and index scans and even full "rewinds" which are worse than mere table scans. That was the same problem that Brad Shultz ran into in the link that Paul previously posted (I never did post back on that thread to show what the problem was). In that case (IIRC... it was a while ago), it was caused by CHARINDEX returning BIGINT instead of just an INT because the first operand of the CHARINDEX Brad used was MAX datatype. The datatype mismatch was a real killer there. Instead of the Numbers table selecting only the numbers it needed, it "rewound" the table 121,316 times and generated a total of 121,317,000 rows internally through a Lazy Table Spool (I have the execution plan but have misplaced the analysis that showed the precise reason, so operating as best I can from memory).

Because you haven't included any information as to the datatypes you used for variables or any of the columns used in your code example, I can't even begin to guess where the datatype mismatch (or, possibly, accidental many-to-many join) may be. It may be in the splitters or it (more likely) is in the fn_GetPermissionByUser function you actually used in the CROSS APPLY. And, yeah... because of the lack of information about your code, I can only guess but anytime something like what you've described happens with a conversion to an iTVF and a CROSS APPLY, it's usually due to a datatype mismatch. You have to remember... iTVFs and VIEWs are a lot alike... they "become one" with the calling code and, as with any fully inline code, data type matches become rather critical for any decent performance.

I agree with your "word of caution" but I wouldn't blame the iTVF or the CROSS APPLY. Chances are it's something like a simple datatype mismatch and that's likely what you have to be most cautious about. Why does multi-line and scalar TVFs let you get away with it (datatype mismatches and accidental many-to-many joins)? Because the execution plan doesn't incorporate the plans for such things in the main-stream execution plan.

Paul is much more of an expert on execution plans than I am. If you were to follow the procedure (see the second link in my signature line at the end of this post) for posting performance problems and opened a new thread for the problem (we've diverged too much from the intent of this thread already), I'm sure that someone, or even Paul himself, would be happy to show you how to correctly incorporate the very high speed methods of combining CROSS APPLY and iTVFs for this particular problem.

As a side bar, you've got a huge amount of unnecessary code in your code example that you posted. Take the following snippet, for example...

  WHERE GETDATE() BETWEEN eff_date AND term_date
AND ( @CompanyID IS NULL
OR ( @CompanyID IS NOT NULL
AND @CompanyID > 0
AND company_id = @CompanyID
)
)


Because NULLs cannot be related to anything else including other NULLs and assuming that you're not actually using negative CompanyIDs, the code can be simplified quite a bit as follows...

  WHERE GETDATE() BETWEEN eff_date AND term_date
AND (@CompanyID IS NULL OR company_id = NULLIF(@CompanyID,0))


The fact that you're using BETWEEN to compare dates is a whole other chapter in performing SQL Harakiri but this thread isn't the place for that particular discussion.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1232892
Posted Tuesday, January 10, 2012 9:51 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 7:45 AM
Points: 7,860, Visits: 9,606
Jeff Moden (1/9/2012)
As a side bar, you've got a huge amount of unnecessary code in your code example that you posted. Take the following snippet, for example...

  WHERE GETDATE() BETWEEN eff_date AND term_date
AND ( @CompanyID IS NULL
OR ( @CompanyID IS NOT NULL
AND @CompanyID > 0
AND company_id = @CompanyID
)
)


Because NULLs cannot be related to anything else including other NULLs and assuming that you're not actually using negative CompanyIDs, the code can be simplified quite a bit as follows...

  WHERE GETDATE() BETWEEN eff_date AND term_date
AND (@CompanyID IS NULL OR company_id = NULLIF(@CompanyID,0))



I can't see any declarations anywhere in any of the code posted that prevents companyID being negative, and unless there is such a restriction this simplification is broken. It seems likely that there is such a restriction (column names with ID that get compared with 0 are often identity columns with 1 as initial seed) but there isn't certain to be, unless I've missed something.


Tom
Post #1233362
Posted Tuesday, January 10, 2012 11:21 AM


SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: 2 days ago @ 11:14 AM
Points: 117, Visits: 586
Thanks, Jeff, for the complete treatment given to the script.

Thanks, too, to Paul for uncovering the treasure that is APPLY.

My only defense is "inherited code; wasn't broken so didn't fix it". I grabbed the newest split code (thanks) and looked at the code for implicit data type conversions. The usage of the split-string within the snippet posted takes a varchar and concatenates strings before comparison against a string, so there are no implicit conversions there.

The get permissions UDF is unchanged in my tenure apart from reformatting it from unreadable spaghetti. It's a msTVF that could probaby be rewritten as an iTVF using some CTEs and UNION ALL statements, but time doesn't permit that change just yet. Folks no longer with the Company said the underlying design needed complete overhaul and I can wait another day until it no longer supports the business - I have bigger fish to fry. The defensive "company > 0" was already there in case the UI sent garbage (outside my control); as I said, it ain't broke and under 2-second response to get all permissions for all users is acceptable by my Director.

BTW: Did you have a URL for the reference to the dates-between-harakiri, please? Wouldn't want to bump into misleading posts, as sometimes is the case. The term_date is defaulted to 9999-12-31 23:59:59 and if someone leaves a client company, we usually find out after the fact, so their term_date bereft of time is OK.

Thanks again for such an in-depth, meaty post on the posted scripts. If I get into performance issues (growth is just beginning in earnest), I'll be sure to post a new thread per your directive.
Post #1233415
Posted Tuesday, January 10, 2012 4:31 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 9:05 AM
Points: 35,589, Visits: 32,178
L' Eomot Inversé (1/10/2012)
Jeff Moden (1/9/2012)
As a side bar, you've got a huge amount of unnecessary code in your code example that you posted. Take the following snippet, for example...

  WHERE GETDATE() BETWEEN eff_date AND term_date
AND ( @CompanyID IS NULL
OR ( @CompanyID IS NOT NULL
AND @CompanyID > 0
AND company_id = @CompanyID
)
)


Because NULLs cannot be related to anything else including other NULLs and assuming that you're not actually using negative CompanyIDs, the code can be simplified quite a bit as follows...

  WHERE GETDATE() BETWEEN eff_date AND term_date
AND (@CompanyID IS NULL OR company_id = NULLIF(@CompanyID,0))



I can't see any declarations anywhere in any of the code posted that prevents companyID being negative, and unless there is such a restriction this simplification is broken. It seems likely that there is such a restriction (column names with ID that get compared with 0 are often identity columns with 1 as initial seed) but there isn't certain to be, unless I've missed something.


Heh... agreed. That's why I very specifically stated...
...and assuming that you're not actually using negative CompanyIDs...


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1233627
Posted Tuesday, January 10, 2012 5:57 PM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 7:45 AM
Points: 7,860, Visits: 9,606
Jeff Moden (1/10/2012)
Heh... agreed. That's why I very specifically stated...
...and assuming that you're not actually using negative CompanyIDs...

Yes, I'm getting really erratic . I even quoted that statement and still didn't notice it. How erratic can I get?


Tom
Post #1233666
« Prev Topic | Next Topic »

Add to briefcase «««56789

Permissions Expand / Collapse