Parse multiple dates from Text Field

  • I have a text field containing from one to x number of dates along with comments such as:

    '8/9/08 - out for upgrade; 8/16/08 returned to inventory. 8/31/2008 - deployed to field.'

    I need to extract all dates, as well as all associated status notes from the table into separate DATE & TEXT fields into a new target table.

    Any suggestions as to how to accomplish this would be MUCH appreciated!

    Thank you!

  • By "text" field, do you mean varchar? If so, is it in a SQL Server table? Is there a common delimiter among date/status groupings, like a semicolon or comma? Is there one status field per date?

    There is no "i" in team, but idiot has two.
  • first get everything into a temp table, and make sure you convert to varchar(max)...it's a lot of trouble fiddling with TEXT datatypes.

    something simply like SELECT convert(varchar(max),YourTextField) as BetterTextField,* into #tmp From YourTable

    then you can do something with charindex and pattern matching, like '%[0-9]/[0-9]/[0-9][0-9][0-9][0-9]%' for1/1/xxxx type dates, with multiple passes for two digit months and year combinations.

    That should help to find the start of your date and comment, but you've got to be able to figure out where one item ends and the next begins.

    is it always {date} {space}some text ?

    if we can establish a pattern, we can use a tally table to pull out all the strings, then parse the strings for date/comment separately. I've done that parsing and split for web pages and dictionary definitions and similar stuff.

    can you give us some sample data for the text fields?

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • If you have access to visual studio .NET it may be worth looking at a CLR function. You will have access to the regular expression class which will make pattern maching easier.

    http://blogs.msdn.com/sqlclr/archive/2005/06/29/regex.aspx

    I found that one useful.

  • Thank you Dave, Lowell & Tom for your replies.

    Here are the answers to your questions:

    Dave:

    The source table is SQL Server, and the field is TEXT.

    Unfortunately there is no common delimeter in the field, so I will have to take multiple passes for each type of entry - it may be a space, a comma or a semicolon. And there is a status for each date entry.

    Lowell:

    Thank you for the info on the trouble with the TEXT datatype. I'll take your recommendation to load to a temp table and convert it to varchar.

    I did already start down the path of the CHARINDEX with the '%[0-9]/[0-9]/[0-9][0-9][0-9][0-9]%' , so guess I was headed down the right path afterall.

    Here are a couple of actual entries that show highlight the challenge, NO consistancy!:

    08/04/2003 - REC'D JOB FROM KAYLA - REQUEST FOR PDC TO GET R/W; 9/7/04 JON CANCELLED PER CRAIG BULLOCK BECAUSE OWNER CANNOT BE REACHED (LISA MURPHY OF PDC)

    08/21/2003 - RESEARCH GIGI. NEEDS EXHIBIT A & R/W TO BE OBTAINED BY PDC. 09/12/2003: REC'D NOTE FROM PDC THAT SURVEY WAS COMPLETED.09/16/2003 - FIELD NOTES TO MI FILE, COPY OF DWG SENT TO J. TODD

    Tom:

    I do have access to Visual Studio, just have minimal knowledge of using it. I'll check out the link and see if I can figure it out as well.

  • I'm fiddling with this, and if we have consistent delimiters, it's REALLY easy to pull out with a tally table.

    my dilemma is that it looks like multiple passes of updates would have to occur to insert delimiters before and after each possible date pattern.

    here's what I'm trying so far: I'm trying to exclude a charindex where the pipe character is part of the pattern...i thought [^\|] woudl work, but it is not...

    see if this gets anyone's ideas working better than mine:

    create table #example(exampleid int,

    exampletext varchar(max)

    )

    insert into #example

    SELECT 1,'08/04/2003 - REC''D JOB FROM KAYLA - REQUEST FOR PDC TO GET R/W; 9/7/04 JON CANCELLED PER CRAIG BULLOCK BECAUSE OWNER CANNOT BE REACHED (LISA MURPHY OF PDC)'

    UNION ALL

    SELECT 2,'08/21/2003 - RESEARCH GIGI. NEEDS EXHIBIT A & R/W TO BE OBTAINED BY PDC. 09/12/2003: REC''D NOTE FROM PDC THAT SURVEY WAS COMPLETED.09/16/2003 - FIELD NOTES TO MI FILE, COPY OF DWG SENT TO J. TODD'

    UNION ALL

    SELECT 1,'8/9/08 - out for upgrade; 8/16/08 returned to inventory. 8/31/2008 - deployed to field.'

    --x/x/xx dates

    UPDATE #example

    SET exampletext = SUBSTRING(exampletext,1,CHARINDEX('[^\|][0-9]/[0-9]/[0-9][0-9][^0-9]',exampletext))

    + '|'

    + SUBSTRING(exampletext,CHARINDEX('[^\|][0-9]/[0-9]/[0-9][0-9][^0-9]',exampletext),8000)

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Tim,

    If you can use something other than T-SQL to build a solution to your problem (C#, VB) it will be easier. If so, you should look at regular expressions that can find the date values and insert a delimiter. Then take whatever is left as the comments. I've done a few gigs like this, and always used C# or VB6 to read the data, perform the transformation, and write to a table. Good luck.

    There is no "i" in team, but idiot has two.
  • Hi Tim

    I took Lowells advice to use a tally table based solution in a nested style (first seen from Bob).

    @lowell:

    Thanks for the test data!

    This should work. I just don't know about the performance with very large data or a large count of rows.

    [font="Courier New"]DECLARE @t TABLE (Id INT NOT NULL IDENTITY, Txt TEXT)

    INSERT INTO @t

                 SELECT '8/9/08 8/31/08'

       UNION ALL SELECT '08/04/2003 - REC''D JOB FROM KAYLA - REQUEST FOR PDC TO GET R/W; 9/7/04 JON CANCELLED PER CRAIG BULLOCK BECAUSE OWNER CANNOT BE REACHED (LISA MURPHY OF PDC)'

       UNION ALL SELECT '08/21/2003 - RESEARCH GIGI. NEEDS EXHIBIT A & R/W TO BE OBTAINED BY PDC. 09/12/2003: REC''D NOTE FROM PDC THAT SURVEY WAS COMPLETED.09/16/2003 - FIELD NOTES TO MI FILE, COPY OF DWG SENT TO J. TODD'

       UNION ALL SELECT '8/9/08 - out for upgrade; 8/16/08 returned to inventory. 8/31/2008 - deployed to field.'

    ; WITH

    t (Id, Txt) AS

    (

       SELECT

             Id,

             ',' + CONVERT(VARCHAR(MAX), Txt) + ','

          FROM @t

    )

    SELECT

          Id,

          t2.Item

       FROM t

          CROSS APPLY

             (

                SELECT

                      CONVERT(DATETIME, SUBSTRING(t.Txt, tally1.N + 1, tally2.N - (tally1.N + 1))) Item

                   FROM Tally tally1

                      CROSS APPLY

                         (

                            SELECT TOP(1)

                                  N

                               FROM Tally

                               WHERE

                                  N > tally1.N

                                  AND N <= LEN(t.Txt)

                                  AND SUBSTRING(t.Txt, N, 1) IN (',', ';', ' ', ':')

                               ORDER BY N

                         ) tally2

                   WHERE tally1.N < LEN(t.Txt)

                      AND SUBSTRING(t.Txt, tally1.N, 1) IN (',', ';', ' ', ':')

                      AND ISDATE(SUBSTRING(t.Txt, tally1.N + 1, tally2.N - (tally1.N + 1))) = 1

             ) t2

    [/font]

    Greets

    Flo

    Edit: Just noticed the colon as possible delimiter

  • Thank you Lowell & Flo.....

    I'll work with this code to see if I can't get it to work!

  • Hi Tim

    I didn't notice that this was the "Newbie" forum...

    If you have any questions, feel free to come back here!

    Greets

    Flo

  • Thanks again Flo.....

    I may need to take you up on that! The code as is gives me some errors that I am trying to debug now.

  • Errors? I just copy/pasted into an empty query window and executed. I don't get any errors.

    Err, I think I know what you need...

    You probably don't have a Tally table. Execute this script and the previous should work:

    CREATE TABLE dbo.Tally (N INT NOT NULL)

    INSERT INTO dbo.Tally

    SELECT TOP(11000)

    ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM master.sys.all_columns c1

    CROSS JOIN master.sys.all_columns c2

    ALTER TABLE dbo.Tally

    ADD CONSTRAINT PK_Tally

    PRIMARY KEY CLUSTERED

    (N)

    WITH (FILLFACTOR = 100)

    To know how a tally table works you should read this article by Jeff Moden:

    http://www.sqlservercentral.com/articles/T-SQL/62867/

    Greets

    Flo

  • Well after scratching my head for a bit as to why the code wasn't working, found out that the source server is SQL 2000. :blush:

    Guessing some of these commands are new in SQL 2005 and don't work in 2000.:unsure:

    So unless anyone has any more ideas on how to handle on 2000, just going to have to foregoe the parsing.:satisfied:

    Thanks for everyone's help....in particular you Flo!

  • Hi Tim

    If you are not sure if you are not sure about your SQL Server version you can execute this statement. It shows the detailed version information:

    PRINT @@VERSION

    If you use SSE2k you are not able to use CROSS APPLY. Instead of this you have to use a CROSS JOIN. In addition you have to ensure that your source data are enclosed with a comma (or another specified delimiter).

    Here a SSE2k version of my previous script. I added some comments to explain how it works:

    [font="Courier New"]---==========================

    -- Create a tally table if not yet done

    IF (OBJECT_ID('dbo.Tally') IS NULL)

    BEGIN

       SELECT TOP 11000

             IDENTITY(INT, 1, 1) AS N

          INTO dbo.Tally

          FROM MASTER.dbo.syscolumns c1

             CROSS JOIN MASTER.dbo.syscolumns c2

       ALTER TABLE dbo.Tally

          ADD CONSTRAINT PK_Tally

          PRIMARY KEY CLUSTERED

          (N)

          WITH FILLFACTOR = 100

    END

    GO

    ---==========================

    -- Your source table

    DECLARE @t TABLE (Id INT NOT NULL IDENTITY, Txt VARCHAR(8000))

    INSERT INTO @t

                 SELECT '8/9/08 8/31/08'

       UNION ALL SELECT '08/04/2003 - REC''D JOB FROM KAYLA - REQUEST FOR PDC TO GET R/W; 9/7/04 JON CANCELLED PER CRAIG BULLOCK BECAUSE OWNER CANNOT BE REACHED (LISA MURPHY OF PDC)'

       UNION ALL SELECT '08/21/2003 - RESEARCH GIGI. NEEDS EXHIBIT A & R/W TO BE OBTAINED BY PDC. 09/12/2003: REC''D NOTE FROM PDC THAT SURVEY WAS COMPLETED.09/16/2003 - FIELD NOTES TO MI FILE, COPY OF DWG SENT TO J. TODD'

       UNION ALL SELECT '8/9/08 - out for upgrade; 8/16/08 returned to inventory. 8/31/2008 - deployed to field.'

    -- We need a leading and trailing delimiter

    UPDATE @t SET Txt = ',' + Txt + ','

    ---===========================

    -- Split the source data and return the dates

    SELECT

          t.Id,

          -- Get the text part between startPos and endPos

          CONVERT(DATETIME, SUBSTRING(t.Txt, startPos.N + 1, MIN(endPos.N) - (startPos.N + 1))) Item

       FROM @t t

          -- First tally table to scan for date start position

          CROSS JOIN Tally startPos

          -- Second tally table to scan for date end

          CROSS JOIN Tally endPos

       WHERE

          -- The start position has to be less than complete length of text

          startPos.N < LEN(t.Txt)

          -- Determine if the current position is one of the specified delimiter

          AND SUBSTRING(t.Txt, startPos.N, 1) IN (',', ';', ' ', ':')

          -- For some reason I got duplicate values in case of ". " start position

          -- You can try to remove this

          AND SUBSTRING(t.Txt, startPos.N + 1, 1) NOT IN (',', ';', ' ', ':')

          -- The start position has to be less than the end position

          AND startPos.N < endPos.N

          -- The end position has to be sess than or equal to len of text

          AND endPos.N <= LEN(t.Txt)

          -- End position has to be one of hte specified delimiters

          AND SUBSTRING(t.Txt, endPos.N, 1) IN (',', ';', ' ', ':')

          -- The text between start and end position has to be a date value

          AND ISDATE(SUBSTRING(t.Txt, startPos.N + 1, endPos.N - (startPos.N + 1))) = 1

       GROUP BY

          t.Id,

          t.Txt,

          startPos.N

       ORDER BY

          t.Id,

          startPos.N

    [/font]

    Greets

    Flo

  • The TEXT column in question is that of a "log" maintained by humans. Likely as not, it has some hidden delimiters in it. Please check the data for the presence of CHAR(10) and CHAR(13) separately. Might want to throw in the additional check for CHAR(9), as well. The presence of these characters would greatly simplify this problem.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply