SUBSTRING and CHARINDEX query.... please help.

  • Sergiy - Thursday, June 29, 2017 2:09 AM

    Just tried to load CSV file with freaky quoted values into LibreOffice Calc - works perfectly.
    MS could contract those programmers...

    I only know of one RFC-4180 compliant parser function in T-SQL and that is the one I wrote few years back, always found it strange that this essential functionality is missing in SQL Server.
    😎 

    The function
    CREATE OR ALTER FUNCTION dbo.ITVFN_CSV_SPLITTER_4180_8K
    (
      @pString   VARCHAR(8000)
     ,@pDelimiter  CHAR(1)  
     ,@pTxtQualifier CHAR(1)  
    )
    RETURNS TABLE
    WITH SCHEMABINDING
    AS
    RETURN
    -- *********************************************************************
    --  RFC-4180 compliant CSV splitter
    --  Eirikur Eiriksson
    --
    --  This function is used for Comma-Separated Values (CSV) of MIME
    --  type "text/csv".
    --
    --  Common Format and MIME Type for Comma-Separated Values (CSV) Files
    --  https://tools.ietf.org/html/rfc4180
    --
    --  First published 2014/07/24
    --  http://www.sqlservercentral.com/articles/SQL+Server+2012/106903/
    --
    -- *********************************************************************
    --  USAGE
    --
    --  DECLARE @pString   VARCHAR(8000) = ''
    --  DECLARE @pDelimiter  CHAR(1)   = ',';
    --  DECLARE @pTxtQualifier CHAR(1)   = '"';
    --  SELECT @pString = '100123,"Blues","West Point, Mississippi","Burnett,'
    --        + '""Howlin'' Wolf"" Arthur Chester",1910-06-10,';
    --  SELECT
    --   CSV.ItemNumber
    --   ,csv.Item
    --  FROM dbo.ITVFN_CSV_SPLITTER_4180_8K(@pString,@pDelimiter,@pTxtQualifier) CSV
    -- *********************************************************************
    --  Inline Tally table returning a number sequence equivalent
    --  to the length of the input string. The Cast of the Len() to a
    --  Bigint prevents an implicit cast.
    -- *********************************************************************
    WITH T(N) AS (SELECT X.N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) X(N))
    , NUMS(N) AS (SELECT TOP(CONVERT(BIGINT,LEN(@pString),0)) ROW_NUMBER() OVER (ORDER BY @@VERSION) AS N
         FROM T T1,T T2,T T3,T T4)
    -- ********************************************************************
    -- Retrieve the position (N) and the character code (chrCode)
    -- for all delimiters (@pDelimiter) and text qualifiers
    -- (@pTxtQualifier)
    -- ********************************************************************
    ,ctePrimer(N,chrCode) AS
    (
    SELECT
      t.N
      ,UNICODE(SUBSTRING(@pString,t.N,1)) AS chrCode
    FROM NUMS t
    WHERE SUBSTRING(@pString,t.N,1) COLLATE Latin1_General_BIN
          = @pDelimiter  COLLATE Latin1_General_BIN
    OR  SUBSTRING(@pString,t.N,1) COLLATE Latin1_General_BIN
          = @pTxtQualifier COLLATE Latin1_General_BIN
    )
    -- ********************************************************************
    -- The cteStart encloses the string in virtual delimiters using
    -- Union All at the beginning and the end. The main body sets the
    -- IsDelim and IsTxQf flags.
    -- ********************************************************************
    ,cteStart(N,IsDelim,IsTQA) AS
    (
    SELECT
      0       AS N
      ,1       AS IsDelim
      ,0       AS IsTxQf
    UNION ALL
    SELECT
      t.N
      ,(1 - SIGN(ABS(t.chrCode - UNICODE(@pDelimiter))))  AS IsDelim
      ,(1 - SIGN(ABS(t.chrCode - UNICODE(@pTxtQualifier)))) AS IsTxQf
    FROM ctePrimer t
    UNION ALL
    SELECT      
      LEN(@pString) + 1 AS N
      ,1       AS IsDelim
      ,0       AS IsTxQf
    )
    -- ********************************************************************
    -- Position (N), Delimiter flag (IsDelim), Text Qualifier flag
    -- (IsTQA) and the running total of the number of appearance of
    -- Text Qualifiers
    -- ********************************************************************
    ,cteWorkSet AS
    (
      SELECT
       cST.N
      ,cST.IsDelim
      ,cST.IsTQA
      ,SUM(cST.IsTQA)
       OVER (ORDER BY cST.N
       ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS TQRT
      ,CASE
       WHEN SUM(cST.IsTQA)
       OVER (ORDER BY cST.N
       ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) % 2 = 1 THEN 'X'
          ELSE ''
       END AS Active
      FROM cteStart  cST
    )
    -- ********************************************************************
    -- Determine the token type and the types of the preceding and
    -- following tokens.
    -- ********************************************************************
    ,cteWSTQ(P_START,IsDelim,NEXT_IsTQA,LAG_IsTQA) AS
    (
    SELECT
      cWS.N             AS P_START
      ,cWS.IsDelim           AS IsDelim
      ,LEAD(cWS.IsTQA,1,0) OVER (ORDER BY cWS.N) AS NEXT_IsTQA
      ,LAG(cWS.IsTQA,1,0) OVER (ORDER BY cWS.N) AS LAG_IsTQA
    FROM cteWorkSet  cWS
    WHERE cWS.IsDelim = 1
    OR  cWS.IsTQA = 1
    )
    -- ********************************************************************
    -- Calculate the start position and the length of each element
    -- ********************************************************************
    ,cteWSLEN(P_START,P_LEN) AS
    (
    SELECT
      (X.P_START + X.NEXT_IsTQA + SIGN(X.P_START))   AS P_START
      ,(LEAD(X.P_START,1,0) OVER (ORDER BY X.P_START) -
      ((X.P_START + X.NEXT_IsTQA) + SIGN(X.P_START) +
      LEAD(X.LAG_IsTQA,1,0) OVER (ORDER BY X.P_START))) AS P_LEN
    FROM cteWSTQ X WHERE X.IsDelim = 1
    )
    -- ********************************************************************
    -- Splitting the string using the output of the cteWSLEN, filtering
    -- it by the length being non-negative value. The NULLIF returns NULL
    -- if the field is empty.
    -- 
    -- ********************************************************************

    SELECT
     ROW_NUMBER() OVER (ORDER BY @@VERSION)    AS ItemNumber
    ,NULLIF(REPLACE(SUBSTRING(@pString,cWL.P_START,cWL.P_LEN),@pTxtQualifier+@pTxtQualifier,@pTxtQualifier),'') AS Item
    FROM cteWSLEN  cWL
    WHERE cWL.P_LEN > -1;
    GO

  • OMG... after working with SQL Server for decades, MS has finally done something about one of my decades-old wishes.  They've made OPENROWSET handle "true CSV" (RFC 4180 compliant) files.  Too bad it's only for 2017.  Look for "FORMAT = 'CSV'" without the double quotes in the following article.
    https://docs.microsoft.com/en-us/sql/t-sql/functions/openrowset-transact-sql

    Now if we could just get them to do the same thing with the much faster BCP and BULK INSERT functionality along with allowing you to pass a variable for things like the file path, etc.  It would be a big help if they also did such a thing for OPENROWSET.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Saturday, July 8, 2017 7:34 PM

    OMG... after working with SQL Server for decades, MS has finally done something about one of my decades-old wishes.  They've made OPENROWSET handle "true CSV" (RFC 4180 compliant) files.  Too bad it's only for 2017.  Look for "FORMAT = 'CSV'" without the double quotes in the following article.
    https://docs.microsoft.com/en-us/sql/t-sql/functions/openrowset-transact-sql

    Now if we could just get them to do the same thing with the much faster BCP and BULK INSERT functionality along with allowing you to pass a variable for things like the file path, etc.  It would be a big help if they also did such a thing for OPENROWSET.

    Heh - they already let you use a variable for the filename in BULK INSERT.  Assign the statement to a variable. 😉  It's amazing what we've gotten use to doing to get around the limitations imposed on us, isn't it?

  • Jeff Moden - Saturday, July 8, 2017 7:34 PM

    OMG... after working with SQL Server for decades, MS has finally done something about one of my decades-old wishes.  They've made OPENROWSET handle "true CSV" (RFC 4180 compliant) files.  Too bad it's only for 2017.  Look for "FORMAT = 'CSV'" without the double quotes in the following article.
    https://docs.microsoft.com/en-us/sql/t-sql/functions/openrowset-transact-sql

    Now if we could just get them to do the same thing with the much faster BCP and BULK INSERT functionality along with allowing you to pass a variable for things like the file path, etc.  It would be a big help if they also did such a thing for OPENROWSET.

    Thanks for the heads up Jeff!
    😎

  • Ed Wagner - Saturday, July 8, 2017 10:51 PM

    Jeff Moden - Saturday, July 8, 2017 7:34 PM

    OMG... after working with SQL Server for decades, MS has finally done something about one of my decades-old wishes.  They've made OPENROWSET handle "true CSV" (RFC 4180 compliant) files.  Too bad it's only for 2017.  Look for "FORMAT = 'CSV'" without the double quotes in the following article.
    https://docs.microsoft.com/en-us/sql/t-sql/functions/openrowset-transact-sql

    Now if we could just get them to do the same thing with the much faster BCP and BULK INSERT functionality along with allowing you to pass a variable for things like the file path, etc.  It would be a big help if they also did such a thing for OPENROWSET.

    Heh - they already let you use a variable for the filename in BULK INSERT.  Assign the statement to a variable. 😉  It's amazing what we've gotten use to doing to get around the limitations imposed on us, isn't it?

    😉 There is no spoon.  There are no limitations.  It's all a workaround. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 5 posts - 31 through 34 (of 34 total)

You must be logged in to reply to this topic. Login to reply