Odd behavior from SUBSTRING/CHARINDEX with CHAR(10) and CHAR(13)

  • Hello,

    I realize I'm breaking the first 5 or 6 SSC rules by not posting sample data, but this is an informational request only for now. I'll add later if need be.

    Anyway, I have this NVARCHAR(MAX) column, which contains email body data. I need to pull strings out of a couple specific places. Easy enough to set up the substring call to get me to the first position

    SUBSTRING(column_name,

    CHARINDEX('first_pos', column_name) + LEN('first_pos')

    , ...)

    But the last piece is where it gets odd. Each string I need to pull out is terminated by CHAR(13) CHAR(10) CHAR(13) CHAR(10), but when I try to feed either of those to the remainder of the substring, everything gets thrown off.

    , CHARINDEX(CHAR(13), column_name, CHARINDEX('first_pos') ) - LEN('first_pos') - CHARINDEX('first_pos', column_name) )

    So naturally I fell back to trying NCHAR(13), DATALENGTH, and feeding the positional strings in a N'', but every time, what ended up failing was using CHAR(..) as the termination point. This was also true when I, for squats and giggles, ran it up against DelimitedSplit8k. The only thing that worked at all was using REPLACE(REPLACE(column_name, CHAR(13), '||'), CHAR(10), '~~'), to verify that they were what I was actually looking at. I even used some janky code from MSDN to output the unicode CHAR number of each character in the string.

    Anyway, if you've made it this far: how do you get N/CHAR(10) and N/CHAR(13) to play nice with SUBSTRING? Is there a trick, or am I missing something horribly obvious?

    Thanks

  • Quick thought, the character combination is double windows new-line, 0x0D000A000D000A00 in Unicode or 0x0D0A0D0A in ASCII, construct a variable with this value and use charindex to search for it.

    😎

    Charindex and double new-line sample

    USE tempdb;

    GO

    SET NOCOUNT ON;

    DECLARE @NEWLINE NCHAR(4) = NCHAR(13) + NCHAR(10) + NCHAR(13) + NCHAR(10);

    DECLARE @NVTEXT NVARCHAR(MAX) = N'String with 2 new lines after this point

    and a DOUBLE newline

    before this point';

    SELECT CHARINDEX(@NEWLINE,@NVTEXT,1) AS FPOS;

    Return

    FPOS

    ------

    43

    Edit: typo

  • It's funny, using this with another column of similar data works perfectly, if I stack new line and double new line char/nchar variations in a VALUES construct and use that for positions, but on the column I was originally trying it on, it goes nowhere. I may work on it later, or just stick with my alternate solution using your idea in its place. There's just so much going on in the original column it's hard to account for all the varations. It's like entire email threads :alien:

    DECLARE @NEWLINE CHAR(2) = CHAR(13) + CHAR(10)--+ CHAR(13) + CHAR(10)

    DECLARE @DOUBLENEWLINE CHAR(4) = CHAR(13) + CHAR(10) + CHAR(13) + CHAR(10)

    DECLARE @NEWLINE_N NCHAR(2) = NCHAR(13) + NCHAR(10)--+ CHAR(13) + CHAR(10)

    DECLARE @DOUBLENEWLINE_N NCHAR(4) = NCHAR(13) + NCHAR(10) + NCHAR(13) + NCHAR(10)

    Thanks

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply