In-Line Function to split string

  • Good afternoon gents,

    I'm having a slight issue with a function i have created and the way it works.

    Some background...

    I have a job which has multiple customers, the data is store as such

    And i need the in the format below

    As it stands the function i have modified/created will return a 2 column values for a single project based on inputting a project number, see below

    /****** Object: UserDefinedFunction [dbo].[SplitList] Script Date: 05/01/2014 16:51:24 ******/

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    Alter Function [dbo].[SplitCust]

    (

    @Project Varchar(20),

    @CustomerCode [varchar](max),

    @CommaDelimiter [varchar](1)

    )

    RETURNS @Table Table (Project varchar(20), Value [varchar](100))

    BEGIN

    ------------------------------------------------------------------------

    ------------------------------------------------------------------------

    DECLARE @sTemp [varchar](max)

    SET @sTemp = ISNULL(@CustomerCode,'')

    + @CommaDelimiter

    WHILE LEN(@sTemp) > 0

    BEGIN

    INSERT INTO @Table

    SELECT @Project as Project,

    SubString(@sTemp,1,

    CharIndex(@CommaDelimiter,@sTemp)-1) as customer

    SET @sTemp = RIGHT(@sTemp,

    LEN(@sTemp)-CharIndex(@CommaDelimiter,@sTemp))

    END

    RETURN

    END

    -----------------------------------------------------------------------------

    Declare @Project varchar(10)

    Set @Project = ('123')

    Select *

    from SplitCust

    (@Project,

    (Select [Customer Code]

    FROM dbo.PROJECT

    where Code in (@Project)),','

    )

    What the ultimate outcome i want to achieve is to have a table which has all the projects and customers...

    Can you guys point me in the right direction???

    Thanks

  • Thinking about it now maybe a SPROC would have been better to use

  • Why not use this splitter?

    http://www.sqlservercentral.com/articles/Tally+Table/72993/

  • I'm pretty sure you could use Jeff Moden's DelimitedSplit8K[/url] function using CROSS APPLY. Something like this:

    SELECT

    P.project AS jobnumber,

    S.item AS customer

    FROM

    dbo.Project AS P

    CROSS APPLY dbo.DelimitedSplit8K(P.code, ',') AS S

  • Just an fyi...the code you posted is what is known as a multi statement table valued function (mstvf). These are very poor performers. In fact, a typical scalar function or even a cursor will generally perform better.

    I triple the suggestion to take a look at Jeff's splitter. It is fast!!!

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean Lange (5/1/2014)


    Just an fyi...the code you posted is what is known as a multi statement table valued function (mstvf). These are very poor performers. In fact, a typical scalar function or even a cursor will generally perform better.

    I triple the suggestion to take a look at Jeff's splitter. It is fast!!!

    Thanks for the suggestion and the advice.

    I have taken a look at the splitter and it works very well.

    However the purpose of this exercise is for my own development, nothing will be implemented as a business solution, I'm just trying to find the best way to get the outcome.

    If i was to change the type of function to scalar, would this end up giving the correct dataset required?

    I have also been looking at using a row_number function to define the data and then to iterate through each line, however i only return the final line (see below)

    /****** Object: UserDefinedFunction [dbo].[SplitList] Script Date: 05/01/2014 16:51:24 ******/

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    ALTER Function [dbo].[SplitCust3]

    (

    @CommaDelimiter [varchar](1)

    )

    RETURNS @Table Table (Value [varchar](100))

    BEGIN

    Declare @CustomerCode [varchar](max)

    Declare @I INT

    Select @I = ROW_NUMBER() OVER (Order by __CODE), @CustomerCode = [Customer Code]

    From PROJECT

    Where [Customer Code] like '%,%'

    ------------------------------------------------------------------------

    ------------------------------------------------------------------------

    Set @I = 1

    WHILE @I =1-- 10

    BEGIN

    DECLARE @sTemp [varchar](max)

    SET @sTemp = ISNULL(@CustomerCode,'')

    + @CommaDelimiter

    While LEN(@sTemp) > 0

    Begin

    INSERT INTO @Table

    SELECT --@Project as Project,

    SubString(@sTemp,1,

    CharIndex(@CommaDelimiter,@sTemp)-1) as customer

    SET @sTemp = RIGHT(@sTemp,

    LEN(@sTemp)-CharIndex(@CommaDelimiter,@sTemp))

    END

    Set @I = @I+1

    END

    RETURN

    END

  • DECLARE @tblNumber TABLE

    (

    ID INT

    )

    INSERT INTO @tblNumber

    SELECT TOP 1000 ROW_NUMBER() OVER(ORDER BY S.Object_id) FROM sys.objects s,sys.objects si

    DECLARE @tbl TABLE

    (

    ID INT, [Value] NVARCHAR(1000)

    )

    INSERT INTO @tbl

    SELECT '13','Mitesh,Sohan,Oswal' UNION

    SELECT '14','Pune,Kalyan,Maharashtra' UNION

    SELECT '15','31,3110,311083' UNION

    SELECT '16','311083,31,3110'

    select t.ID,SUBSTRING(Value+',',tn.id,CHARINDEX(',',Value+',',tn.id)-tn.id)

    from @tbl t INNER JOIN @tblNumber tn

    ON LEN(t.[Value]) >=tn.id

    where SUBSTRING(','+value,tn.id,1)= ','

    Order by t.ID

    Regards,
    Mitesh OSwal
    +918698619998

  • I heard you when you said this is a learning and growth exercise and not for production code. It seems likely that whatever you come up with will form a foundation for code that you do put into production...at least some day. This type of exercise has the ability to change your expectations of performance. I know the Tally table approach sure has changed mine.

    There are, without a doubt, many ways to skin this cat. In addition to getting it to work, the very important point of performance should be one of your goals. If at all possible, my advice would be to approach this with the goal of making it an Inline Table Value Function (ITVF) for the sake of performance. The answer I would give (using DelimitedSplit8K) has already been posted, so I won't post anything further on that, but if you have the time, spend it trying the different ways to solve the problem. This process will help you grow and understand the performance implications of doing things differently.

    If I could suggest one thing, it would be to test whatever you come up with on a 1M-row physical table. I find that it tends to expose weaknesses.

    Have fun with the exercise!

  • waxb18 (5/2/2014)


    Thanks for the suggestion and the advice.

    I have taken a look at the splitter and it works very well.

    However the purpose of this exercise is for my own development, nothing will be implemented as a business solution, I'm just trying to find the best way to get the outcome.

    I'm pretty confident that Jeff's DelimitedSplit8k is going to be the "best" way using T-SQL. "Best" being defined as the fastest way to return the correct results.

    If i was to change the type of function to scalar, would this end up giving the correct dataset required?

    I have also been looking at using a row_number function to define the data and then to iterate through each line, however i only return the final line (see below)

    Changing the code to a scalar function would likely return the correct results, but do it very slowly as it will have to perform the loop through the delimited string for each row. It might give acceptable performance for 1000 rows, but wouldn't scale much beyond it. As Ed said, "There are, without a doubt, many ways to skin this cat", but Jeff's solution is the best way I've seen or used when implementing solely using T-SQL.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply