Processing strings.

  • Hello

    Processing strings.

    Examples are REPLACE and DelimitedSplit8K.

    And lots more.

    What is a good way to process a string without processing the quoted parts?

    declare @S varchar(max) =

    'This doesn''t help. The quick brown fox ''jumps over'' the lazy dog.'

    Process the string but not the quoted part.

    (Of course there can be any number of quotes within the string.)

    A simple example would be to split the string into elements using DelimitedSplit8K, but not the quoted parts.

    Ben

    The end goal is to be able to process any CSV file (imported as a string in SQL Server). For example:

    DECLARE @S varchar(max)

    set @S = 'a,b,'' ''''help, me'''', my name is ''''ben'''' '', c,d,e,''another string with ''''those'''' quotes'',g'

    PRINT '-- >>> ' +@S

    -- result of the Print.

    -- >>> a,b,' ''help, me'', my name is ''ben'' ', c,d,e,'another string with ''those'' quotes',g

    Result:

    ItemNumber  string  Item
    1                   a
    2                   b
    3           s       'help, me', my name is 'ben'
    4                   c
    5                   d
    6                   e
    7           s       another string with 'those' quotes
    8                   g

    At the moment I have a 'partial' solution (no loops, no RBAR), but it is not very elegant.
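    As a cross-check of the expected result, the same record parses this way with a stock CSV reader. Python here is purely to illustrate the target behaviour, not part of the T-SQL solution:

```python
import csv

# The record as PRINTed above, with SQL's doubled '' already resolved:
# single quotes qualify fields, '' inside a field is an escaped quote.
record = ("a,b,' ''help, me'', my name is ''ben'' ', c,d,e,"
          "'another string with ''those'' quotes',g")

# quotechar="'" plus the default doublequote=True matches this format.
fields = next(csv.reader([record], delimiter=',', quotechar="'"))
for n, item in enumerate(fields, start=1):
    print(n, item)
```

    Note that the reader yields the same eight items as the result table above, with the quoted fields kept whole.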

  • Are you really on SQL Server 2008? If you were using SQL 2012 this article would get you 90% there:

    http://www.sqlservercentral.com/articles/SQL+Server+2012/106903/

    I'm going to play around with a couple ideas and see what I come up with but this is a tricky one in SQL Server 2008!

    Update (11:17AM CST)

    This cooked my brain, I may not be able to come up with a good 2008 solution anytime soon. This is as far as I got, perhaps someone can take this and run with it or come up with something different.

    -- @S and @delim are assumed declared elsewhere in the batch; sample
    -- declarations added here so the fragment runs stand-alone.
    DECLARE @S varchar(8000) = 'a,b,''quoted, text'',c';
    DECLARE @delim char(1) = ',';

    WITH
    E1(C) AS (SELECT 1 FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1))t(c)),
    iTally(N) AS
    (
        SELECT 0 UNION ALL
        SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) FROM E1 a, E1 b, E1 c, E1 d
    ),
    delimTq AS
    (
        SELECT TOP (DATALENGTH(@S)+1) rn = ROW_NUMBER() OVER (ORDER BY N), N, t = SUBSTRING(@S,N,1)
        FROM iTally
        WHERE SUBSTRING(@S,N,1) IN (@delim) OR N = 0
    )
    SELECT d1.N, d2.N, SUBSTRING(@S, d1.N+1, ISNULL(d2.N,8000) - d1.N)
    FROM delimTq d1
    LEFT JOIN delimTq d2 ON d1.rn = d2.rn - 1
    ORDER BY d1.rn;

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • In my opinion, you would be better fixing the process that creates the string. Something as simple as changing the delimiter would solve all the headaches.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Alan.B (3/25/2016)


    Are you really on SQL Server 2008? If you were using SQL 2012 this article would get you 90% there:

    http://www.sqlservercentral.com/articles/SQL+Server+2012/106903/

    Thanks for the link. I have to work through this, and that will take a bit of time, but it looks like must-read material at the least. (The LAG and LEAD functions look very promising.)

    I'm going to play around with a couple ideas and see what I come up with but this is a tricky one in SQL Server 2008!

    Update (11:17AM CST)

    This cooked my brain, I may not be able to come up with a good 2008 solution anytime soon. This is as far as I got, perhaps someone can take this and run with it or come up with something different.

    Thanks for the script; I tried to analyse it without running it, but will run it later on.

    In the meantime, I think I have cracked most of the problems. Maybe the performance is going to be poor on large or very large files; maybe I can't get the very large files processed at all.

    The following I think (and hope) I have tackled:

    1. Quoted fields.

    2. Delimiters (for example the comma) within the quoted fields.

    3. Quotes within the quoted fields. (As in SQL double (single) quotes).

    4. New lines (CRLF) within a quoted field. (Maybe this is not needed in a proper CSV, but I included it in the code anyway, to be sure.)
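    The four cases above can all be covered by a single-pass state machine. A minimal sketch in Python for illustration, not the T-SQL of this thread; `split_record` is a hypothetical helper name:

```python
def split_record(s, delim=',', quote="'"):
    """One pass over a record: an in_quotes flag plus one-character
    lookahead for the doubled quote handles all four cases."""
    fields, buf, in_quotes, i = [], [], False, 0
    while i < len(s):
        c = s[i]
        if c == quote:
            if in_quotes and i + 1 < len(s) and s[i + 1] == quote:
                buf.append(quote)          # case 3: '' inside quotes -> '
                i += 2
                continue
            in_quotes = not in_quotes      # case 1: open/close qualifier
        elif c == delim and not in_quotes:
            fields.append(''.join(buf))    # a real field boundary
            buf = []
        else:
            buf.append(c)                  # cases 2 and 4: delimiters and
                                           # CR/LF inside quotes pass through
        i += 1
    fields.append(''.join(buf))
    return fields
```

    The same toggle-and-lookahead idea is what the set-based solutions in this thread express with running totals over a tally table.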

    Still have to polish the code (a lot) and see what the performance is. And if it can still process huge CSV files in an acceptable time.

    After polishing the code, I will present it here.

    And see if the link brings extra ideas, techniques or even a full solution, with better performance.

    I am strongly doubting the performance I can reach with my current version, because the quoted fields get tackled first. Even if there are no quoted fields, the process might be slowed by the size of the 'string'. So I still need further cleaning and testing of the code.

    (No RBAR, no WHILE statements; DelimitedSplit8K is used a few times.)

    Maybe some of the splitting can be combined.

    Thanks for your script and your link,

    Ben

  • Luis Cazares (3/25/2016)


    In my opinion, you would be better fixing the process that creates the string. Something as simple as changing the delimiter would solve all the headaches.

    Probably most CSVs won't be a problem, using double and/or single quotes for different functions. That the delimiter (for example the comma, or the ';') can't be used in free-format text probably will be a problem; in free text one cannot exclude specific characters (in my opinion).

    Also, I do not have control over the format in which the CSVs will be delivered. Most will be no problem, but usage of a comma or a semicolon both as the CSV delimiter and within texts is to be expected.

    Another point: the SQL Server environment is probably not the best environment for parsing CSV files. But because I have not yet seen a generic solution, I am going to give it a go (and see what happens).

    Thanks for your time and attention.

    And by the way I think you are probably right.

    Ben

  • ben.brugman (3/25/2016)


    Luis Cazares (3/25/2016)


    In my opinion, you would be better fixing the process that creates the string. Something as simple as changing the delimiter would solve all the headaches.

    Probably most CSVs won't be a problem, using double and/or single quotes for different functions. That the delimiter (for example the comma, or the ';') can't be used in free-format text probably will be a problem; in free text one cannot exclude specific characters (in my opinion).

    Also, I do not have control over the format in which the CSVs will be delivered. Most will be no problem, but usage of a comma or a semicolon both as the CSV delimiter and within texts is to be expected.

    Another point: the SQL Server environment is probably not the best environment for parsing CSV files. But because I have not yet seen a generic solution, I am going to give it a go (and see what happens).

    Thanks for your time and attention.

    And by the way I think you are probably right.

    Ben

    It doesn't need to be a comma or a semicolon. You could use different control characters that are unlikely to be in any current text. Read the "Unlikely Characters and Collation" section of this article for examples.
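    The point is easy to demonstrate outside T-SQL. Assuming a low ASCII control character such as CHAR(31), the unit separator, quoting rules become unnecessary; a Python sketch:

```python
# chr(31) is the ASCII Unit Separator, designed for exactly this job and
# vanishingly unlikely to appear in free text.
US = chr(31)
items = ["a,b", "free text with 'quotes' and, commas", "c"]
record = US.join(items)           # build the delimited record
roundtrip = record.split(US)      # split back with no quoting logic at all
```

    Commas and quotes in the content no longer matter, because they are never the delimiter.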

  • Luis Cazares (3/25/2016)


    It doesn't need to be a comma or a semicolon. You could use different control characters that are unlikely to be in any current text. Read the "Unlikely Characters and Collation" section of this article for examples.

    ben.brugman (3/25/2016)


    Also I do not have control in which format the CSV's will be delivered.

    Most likely the separation character will be a comma or a semicolon. It is very unlikely that this character does not appear somewhere in the 'content'.

    Even if it were a 'rare' character, it would not often appear in the content, but I have seen enough content to know that if a character is allowed, it will be used somewhere. (And even if a character is not allowed, it is sometimes used.)

    Thanks for the link, that does help,

    Ben

  • Alan.B (3/25/2016)


    Are you really on SQL Server 2008? If you were using SQL 2012 this article would get you 90% there:

    http://www.sqlservercentral.com/articles/SQL+Server+2012/106903/

    I would say more like 99.99% there :-D

    😎

    The article's code as a function

    IF OBJECT_ID(N'dbo.ITVFN_SPLIT_TQCSV_8K') IS NOT NULL DROP FUNCTION dbo.ITVFN_SPLIT_TQCSV_8K;

    GO

    CREATE FUNCTION dbo.ITVFN_SPLIT_TQCSV_8K

    (

    @pString VARCHAR(8000) --= '';

    ,@pDelimiter CHAR(1) --= ',';

    ,@pTxtQualifier CHAR(1) --= '''';

    )

    RETURNS TABLE

    WITH SCHEMABINDING

    AS

    /*********************************************************************

    Splitting and parsing a CSV with "text qualifiers"

    Based on the code from http://www.sqlservercentral.com/articles/SQL+Server+2012/106903/

    2014/03/24 Eirikur Eiriksson

    Using Jeff Moden's DelimitedSplit8K as a base with the addition

    of a Text Qualifier parameter, @pTxtQualifier CHAR(1) = '"'

    *********************************************************************

    cteTally, inline Tally table returning a number sequence equivalent

    to the length of the input string.

    *********************************************************************/

    RETURN

    (

    WITH E1(N) AS ( SELECT N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS X(N)),

    E2(N) AS (SELECT 1 FROM E1 a, E1 b),

    E4(N) AS (SELECT 1 FROM E2 a, E2 b),

    cteTally(N) AS

    (

    SELECT

    ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM E4

    ORDER BY 1 OFFSET 0 ROWS

    FETCH FIRST CAST(LEN(@pString) AS BIGINT) ROWS ONLY

    )

    /********************************************************************

    Retrieve the position (N) and the character code (chrCode)

    for all delimiters (@pDelimiter) and text qualifiers

    (@pTxtQualifier)

    ********************************************************************/

    ,ctePrimer(N,chrCode) AS

    (

    SELECT

    t.N

    ,UNICODE(SUBSTRING(@pString,t.N,1)) AS chrCode

    FROM cteTally t

    WHERE SUBSTRING(@pString,t.N,1) COLLATE Latin1_General_BIN

    = @pDelimiter COLLATE Latin1_General_BIN

    OR SUBSTRING(@pString,t.N,1) COLLATE Latin1_General_BIN

    = @pTxtQualifier COLLATE Latin1_General_BIN

    )

    /********************************************************************

    The cteStart encloses the string in virtual delimiters using

    Union All at the beginning and the end. The main body sets the

    IsDelim and IsTxQf flags.

    ********************************************************************/

    ,cteStart(N,IsDelim,IsTQA) AS

    (

    SELECT

    0 AS N

    ,1 AS IsDelim

    ,0 AS IsTxQf

    UNION ALL

    SELECT

    t.N

    ,(1 - SIGN(ABS(t.chrCode - UNICODE(@pDelimiter)))) AS IsDelim

    ,(1 - SIGN(ABS(t.chrCode - UNICODE(@pTxtQualifier)))) AS IsTxQf

    FROM ctePrimer t

    UNION ALL

    SELECT

    LEN(@pString) + 1 AS N

    ,1 AS IsDelim

    ,0 AS IsTxQf

    )

    /********************************************************************

    cteWorkSet:

    Position (N), Delimiter flag (IsDelim), Text Qualifier flag

    (IsTQA) and the running total of the number of appearances of

    Text Qualifiers. The delimiters which are inside Text Qualifiers

    are cancelled out by multiplying the IsDelim flag with the result

    of ( 1 + the running total of IsTQA ) mod 2.

    ********************************************************************/

    ,cteWorkSet(N,IsDelim,IsTQA) AS

    (

    SELECT

    cST.N

    ,cST.IsDelim * ((1+ SUM(cST.IsTQA) OVER

    (PARTITION BY (SELECT NULL) ORDER BY cST.N

    ROWS UNBOUNDED PRECEDING)) % 2) AS IsDelim

    ,((SUM(cST.IsTQA) OVER (PARTITION BY (SELECT NULL)

    ORDER BY cST.N ROWS UNBOUNDED PRECEDING)) % 2) AS IsTQA

    FROM cteStart cST

    ),

    /********************************************************************

    cteWSTQ:

    Using LEAD and LAG to retrieve the offsets for the Text Qualifiers

    and filtering the results by IsDelim = 1 or IsTQA = 1. The set now

    holds all the information needed for correctly splitting the text.

    ********************************************************************/

    cteWSTQ(P_START,IsDelim,NEXT_IsTQA,LAG_IsTQA) AS

    (

    SELECT

    cWS.N AS P_START

    ,cWS.IsDelim AS IsDelim

    ,LEAD(cWS.IsTQA,1,0) OVER (ORDER BY cWS.N) AS NEXT_IsTQA

    ,LAG(cWS.IsTQA,1,0) OVER (ORDER BY cWS.N) AS LAG_IsTQA

    FROM cteWorkSet cWS

    WHERE cWS.IsDelim = 1

    OR cWS.IsTQA = 1

    )

    /********************************************************************

    cteWSLEN:

    Calculate the start and the length of each field

    ********************************************************************/

    ,cteWSLEN(P_START,P_LEN) AS

    (

    SELECT

    (X.P_START + X.NEXT_IsTQA + SIGN(X.P_START)) AS P_START

    ,(LEAD(X.P_START,1,0) OVER (ORDER BY X.P_START) -

    ((X.P_START + X.NEXT_IsTQA) + SIGN(X.P_START) +

    LEAD(X.LAG_IsTQA,1,0) OVER (ORDER BY X.P_START))) AS P_LEN

    FROM cteWSTQ X WHERE X.IsDelim = 1

    )

    /********************************************************************

    Splitting the string using the output of the cteWSLEN, filtering

    it by the length being non-negative value. The NULLIF returns NULL

    if the field is empty.

    ********************************************************************/

    SELECT

    ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS ItemNumber

    ,NULLIF(REPLACE(SUBSTRING(@pString,cWL.P_START,cWL.P_LEN),@pTxtQualifier+@pTxtQualifier,@pTxtQualifier),'') AS Item

    FROM cteWSLEN cWL

    WHERE cWL.P_LEN > -1

    )

    Usage with the posted example string

    DECLARE @TQCSV VARCHAR(8000) = 'a,b,'' ''''help, me'''', my name is ''''ben'''' '', c,d,e,''another string with ''''those'''' quotes'',g';

    SELECT

    CSVX.ItemNumber

    ,CSVX.Item

    FROM dbo.ITVFN_SPLIT_TQCSV_8K(@TQCSV,CHAR(44),CHAR(39)) CSVX

    Output

    ItemNumber Item

    ----------- -----------------------------------

    1 a

    2 b

    3 'help, me', my name is 'ben'

    4 c

    5 d

    6 e

    7 another string with 'those' quotes

    8 g
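    The heart of the function above is the running total of text qualifiers in cteWorkSet: a delimiter counts only while that total is even. The same arithmetic in a few lines of Python, purely to illustrate the idea, not to replace the T-SQL:

```python
s = "a,'b,c',d"
qualifier, delim = "'", ","

seen = 0            # running count of text qualifiers encountered so far
real_delims = []    # positions of delimiters that are real field boundaries
for pos, ch in enumerate(s):
    if ch == qualifier:
        seen += 1
    elif ch == delim and seen % 2 == 0:
        real_delims.append(pos)   # even parity: we are outside quotes
# the comma at position 4 sits between two qualifiers, so it is ignored
```

    In the CTE the parity test is written as `(1 + SUM(IsTQA) OVER (...)) % 2` multiplied into the IsDelim flag, but the effect is the same.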

  • Eirikur Eiriksson (3/26/2016)


    Alan.B (3/25/2016)


    Are you really on SQL Server 2008? If you were using SQL 2012 this article would get you 90% there:

    http://www.sqlservercentral.com/articles/SQL+Server+2012/106903/[/url]

    I would say more like 99.99% there:-D

    😎

    The article's code as a function

    First of all THANKS :w00t::-D:hehe:

    Sorry for a response this late; I have been on a loooong Easter weekend. (And at home I only have access to a 2008 version and no access to the data I'll be working with.)

    The above code looks much better than the point I got to last Friday. I'll clean up my code and publish it here (today, hopefully).

    What I did add in my code was handling of CRLF within the quoted elements, and some counters for the 'rownumber' and the element number within the row. I read up on some of the links provided here and saw that CRLF within an element is legal. What I also saw was that both double quotes and single quotes can be used around elements; it didn't explicitly say that only one or the other could be used, or that they can be used in the same file. (Although everybody would strongly advise against that.)

    So for today:

    1. Clean up my (working but horrible) version and publish this.

    2. Get the above code working. (In 2008 and adding the CRLF construction).

    (3. Run some performance tests with larger CSV files).

    Later On:

    4. Create a table from a CSV file.

    5. Process a complete folder with CSV files.

    Again ThAnKs,

    Ben

    A sneak preview of my 'horrible' code: for each step I use a temporary table. This helps with understanding what is going on, tackles the problem in reasonable chunks, and provides a proof of concept.

    Why would you want to do string processing using T-SQL? Please see the attached image. I wrote this in C# in less than 20 minutes. It can then easily be integrated as a CLR function.

    --------------------------------------------------------------------------------------------------
    I am just an another naive wannabe DBA trying to learn SQL Server

  • Sachin Nandanwar (3/31/2016)


    Why would you want to do string processing using T-SQL? Please see the attached image. I wrote this in C# in less than 20 minutes. It can then easily be integrated as a CLR function.

    Well done, looks like a good option for SQL Server 2008

    😎

  • Sachin Nandanwar (3/31/2016)


    Why would you want to do string processing using T-SQL? Please see the attached image. I wrote this in C# in less than 20 minutes. It can then easily be integrated as a CLR function.

    The main reason for not doing this in CLR is that I am not familiar with CLR. Yes, I know what it is, but I do not know:

    How to get it in a CLR function.

    How to deploy this to other machines and sites.

    What security issues there are with CLR.

    And how to process a large CSV file into a table.

    So that is the main reason for not using CLR.

    My C is rusty and my C# is very limited. And even if I were a bit more familiar with C#, there would still be the problem that I am not familiar with how to get the 'table' produced in CLR into the database.

    So that is the main reason, it will take me more time to learn this than I have available at the moment.

    But CLR is probably a far more efficient solution to build for somebody with the knowledge, and it probably performs better as well.

    For a bit more extensive answer see below the signature.

    Thanks for the example and the suggestion,

    Ben

    A more extensive answer (for the people who are interested in this):

    Another reason is RBAR.

    Or I like the SQL set based principle.

    And I do like to solve problems within the SQL environment, maybe only to see if it can be done. Although at the moment we require a generic CSV importer; as far as I know this is not available in SSIS, and I could not find a generic CSV importer on the internet.

    (Maybe a commercial product like Altova can do batch .CSV imports into SQL Server, but I do not have confirmation of that.)

    The example in my request did not include multiple lines, which are 'normal' for a CSV file (multiple rows, as is). This is only a problem if there can be values which span multiple lines.

    Also, the example was 'short'. In reality CSV files can get large; in my example set the largest is about 500 MB.

    If this were a simple problem (99.99% done), there would probably already be a ready-to-use solution in this forum or on the internet.

    So as a challenge I thought, well, if it is not there, why not build it (within the technology that I am familiar with). Thanks for the interest.

    The issue with custom text-parsing functions in T-SQL is that they are very cumbersome to maintain, and one needs to be highly proficient in T-SQL to understand each and every line of them well enough to tweak them when modifications are needed, while any benign C# programmer can easily customize the equivalent C# code. If my C# code posted earlier cannot parse a different type of data format, I can change it relatively quickly to fulfill the new format. Now imagine doing the same in a custom T-SQL function. Another issue would be logging of errors. Consider a million-row CSV file with 50 columns: if the data on row 129474 in column 23 could not be parsed, how is that going to be logged as an error in a text file while processing continues with the rest of the rows, using T-SQL? I know it is possible, but implementing this functionality in T-SQL makes me tremble, while I can do it in C# with relative ease.

    Possibly only a couple of custom CSV parsers are available in T-SQL, with the most famous being CSVSplitter, but when you search on GitHub there are tonnes of equivalent C# libraries to handle all types of requirements, apart from the built-in native C# libraries: FileHelpers, CSVReader, CSVParsers, to name a few. I had come across a CSV parser written in Python in one of the clients' applications, and it was so fast that it would give all the .NET CSV libraries a run for their money.

    Not sure if this is relevant, but in one instance I had my hands burned when I tried to show off my XQuery skills for processing XML data in SQL Server. When the XML data was relatively small it simply excelled, but as the files got bigger and bigger my worst fears came true and the parsing became excruciatingly slow, even though the native T-SQL XQuery functions were used and all possible XML indexes that I could think of were implemented. But when one of the C# developers wrote the equivalent C# code using the native XML libraries, the performance was completely untouchable. While T-SQL took about 8-10 seconds to process around 100 XML files, its C# equivalent could easily handle 300+, with the additional benefit of serializing the parsed text as a NetworkStream to be sent over TCP, and to top it off the C# code was easy to understand and maintain compared to my XQuery crap.

    I simply would not implement a T-SQL solution for processing files of around 500 MB, to save myself from further pain down the line.
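    The log-and-continue pattern described above is indeed short in a procedural language. A hedged sketch in Python (the thread's C# code is not shown, and `load_with_log` is an invented name) that records the failing row number and keeps going:

```python
import csv
import io

def load_with_log(text):
    """Parse each row; on failure record (row number, message) and continue."""
    good, errors = [], []
    for lineno, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        try:
            good.append([int(v) for v in row])   # toy per-column parse step
        except ValueError as exc:
            errors.append((lineno, str(exc)))    # log instead of aborting
    return good, errors

rows, errs = load_with_log("1,2\n3,x\n5,6\n")
# rows keeps the parseable lines; errs pinpoints the bad row
```

    Doing the same in a set-based T-SQL splitter would mean carrying an error column through every CTE, which is exactly the maintenance burden described above.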


  • Sachin Nandanwar (3/31/2016)


    Why would you want to do string processing using T-SQL? Please see the attached image. I wrote this in C# in less than 20 minutes. It can then easily be integrated as a CLR function.

    I can't see the "ELSE" on that and correct me if I'm wrong, but that looks like it might not handle trailing delimiters correctly... which is why I do it in T-SQL. That way I don't have to rely on someone doing it possibly wrong in C#. Heh... and no... not going to take the time to learn C# for the edge cases where I might need it. 😛 I'm too busy fixing things in T-SQL that were written by experts in C#. :hehe:
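    The trailing-delimiter behaviour being questioned is easy to pin down with a quick check; in Python for brevity, as a statement of the expected contract rather than a test of the C# code:

```python
# A conforming splitter must return one more field than there are
# (unquoted) delimiters, so a trailing comma means a final empty field.
parts = "a,b,".split(",")          # three fields, the last one empty
rejoined = ",".join(["", "", ""])  # empty fields round-trip to ",,"
```
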

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

    As promised, this is what I reached on Friday, but now cleaned up.

    It is still not 'complete'.

    Known problems:

    Performance.

    In some situations where a string starts with a double quote, it is not handled correctly.

    (Not sure, but I think it is allowed to use a single quote for one element and then a double quote for the next element. My solution (like others here) does not handle this.)

    Haven't had the time to convert the given solution.

    Replace2/Replace3 do multiple replaces in one call.

    Why not in CLR?

    This has to be answered later, because security is removing me from the building. (Yesterday)

    (Today) Later, sorry. So now it's the next morning, a bit later than planned.

    Ben

    -----------------------------------------------------------------------------------------------------

    -- stef ten bras AKA Ben Brugman

    -- 20160325

    --

    -- http://www.sqlservercentral.com/Forums/Topic1772477-391-1.aspx?Update=1

    -----------------------------------------------------------------------------------------------------

    -- Create a single string variable as a CSV file.

    declare @S varchar(max)

    -----------------------------------------------------------------------------------------------------

    -- Create a NewLine variable for CRLF

    DECLARE @NL VARCHAR(2)

    set @NL = char(13)+char(10)

    DECLARE @Delim char(1) = ',' -- Delimiter character.

    DECLARE @TxtQualifier char(1) = '''' -- Text Qualifier

    DECLARE @TxtQualifier2 char(2) = '''''' -- Two Text Qualifiers

    -----------------------------------------------------------------------------------------------------

    -- Test String(s)

    -----------------------------------------------------------------------------------------------------

    -- Multiple rows.

    -- Each row contains 8 elements.

    --

    set @S =

    -- Header row with the name of the 'fields'

    'var_a,var_b,var_string,var_c,var_empty,e,var_string2,var_g'+

    -- First value row with numbers.

    '

    1,2,3,4,5,6,7,8'+

    -- Second row with 'complex' values-

    '

    a,b,'' ''''help, me'''',

    my

    name

    is ''''ben'''' '', c,,e,''another

    string with ''''those'''' quotes'',g'

    +

    -- Third row.

    '

    a2,b2,'' ''''help2, me2'''',

    my2

    name2

    is2 ''''ben2'''' '', c2,,e2,''another2

    string2 with2 ''''those2'''' quotes2'',g2'

    -- Another possible testset (4 rows with 4 values). (Remove the x)

    --SET @S=

    --'a,b,c,d

    -- a2,b2,c2,d2

    -- a3,b3,''c3'',d3

    -- a4,b4,c4,d4'

    -----------------------------------------------------------------------------------------------------

    -----------------------------------------------------------------------------------------------------

    -- Take care of all double quotes.

    set @S = Replace(@S,@TxtQualifier2,'&dq')

    -- Mark all remaining quotes. Note: the string becomes larger.

    set @S = Replace(@S,@TxtQualifier,'&sqo''&sqs') -- & is used as an escape character.

    -- Split on each 'remaining' quote, each row is 'numbered'. Odd is an open quote, even is a close quote.

    select itemnumber OddEvenNr, item OddEvenItem into #split_quote from master.dbo.DelimitedSplit8K(@S,@TxtQualifier)

    --

    -- Replace CRLF with a single character.

    -- (DelimitedSplit8K can only work with single characters.)

    --

    update #split_quote SET OddEvenItem = REPLACE(OddEvenItem,@NL, '|')

    --

    -- Split the different rows. Do not split when the 'OddEvenNr' is odd.

    --

    SELECT

    ROW_NUMBER() OVER(PARTITION BY 1 ORDER BY OddEvenNr, LineSubNumber) OrderItem

    , (ROW_NUMBER() OVER(PARTITION BY 1 ORDER BY (OddEvenNr))-OddEvenNr) LineNumber

    , OddEvenNr,OddEvenItem,Subitem into #split_line

    FROM (select

    S.*, k.Item SubItem, K.ItemNumber LineSubNumber

    from #split_quote s cross apply master.dbo.DelimitedSplit8K(OddEvenItem,'|') k

    WHERE S.OddEvenNr <> s.OddEvenNr/2*2

    UNION

    SELECT s.*, s.OddEvenItem SubItem,s.OddEvenNr LineSubNumber from #split_quote s WHERE S.OddEvenNr = s.OddEvenNr/2*2

    ) XXX ORDER BY OrderItem

    select * into #split_delim from (

    select S.*

    , x.ItemNumber DelimNr

    , x.item DelimItem

    from #split_line S cross apply master.dbo.DelimitedSplit8K(SubItem,@delim) X where OddEvenNr <> OddEvenNr/2*2

    union

    select S.*

    , 0

    , SubItem DelimItem

    from #split_line S where OddEvenNr = OddEvenNr/2*2

    ) xxx order by OrderItem, DelimItem

    --

    -- Remove the extra generated open and close text qualifiers.

    --

    delete #split_delim where DelimItem in ('&sqo','&sqs')

    --

    -- show the result. Characters which are escaped are set to their values.

    --

    select DelimItem

    , LineNumber

    , row_number() over (PARTITION BY LineNumber ORDER BY OrderItem, DelimNr) elementnumber

    into #CSV_Elements

    from #split_delim order by LineNumber, elementnumber

    -- Show result

    update #CSV_Elements set DelimItem = REPLACE(DelimItem, '|',' ') -- End of Line

    update #CSV_Elements set DelimItem = REPLACE(DelimItem, '&sqs','''') -- Single Quote (Open)

    update #CSV_Elements set DelimItem = REPLACE(DelimItem, '&sqo','''') -- Single Quote (Close)

    update #CSV_Elements set DelimItem = REPLACE(DelimItem, '&dq','''''') -- Double Quote

    select * from #CSV_Elements

    -- Clean Up

    drop table #split_quote

    drop table #split_line

    drop table #split_delim

    drop table #CSV_Elements

    Improvements: CTE construction, eliminate some of the replaces.

Viewing 15 posts - 1 through 15 (of 62 total)
