Finding a substring and extracting a number associated with it

  • Hi,

    I have a text string that appears in files as below:

    'There were 6 discharges and 3 admission since reporting 21/04/2016'

    However sometimes, admissions are swapped around with discharges in the file such that on the following day:

    'There were 5 admissions and 3 discharges since reporting 22/04/2016'

    There can be variable amounts of white space on either side and within the string.

    Please note that occasionally discharges is spelt discharge and admissions as admission.

    How can i always pull out the correct number for Admissions and the correct number for Discharges please?

    Q

  • quentin.harris (7/1/2016)


    Hi,

    I have a text string that appears in files as below:

    'There were 6 discharges and 3 admission since reporting 21/04/2016'

    However sometimes, admissions are swapped around with discharges in the file such that on the following day:

    'There were 5 admissions and 3 discharges since reporting 22/04/2016'

    There can be variable amounts of white space on either side and within the string.

    Please note that occasionally discharges is spelt discharge and admissions as admission.

    How can i always pull out the correct number for Admissions and the correct number for Discharges please?

    Q

    can you ALWAYS guarantee the number will come immediately before either "admissions" or "discharges"

    what happens if someone types this "Since reporting 22/04/2016, the admissions are 5 and the discharges are 3"

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • Start by using PATINDEX to find the position of " admission" and " discharge" in the string. You can then work back from there to find the first instance of space followed by a number, and convert that to int. Have a go, and post back if there's anything you're struggling with.

    John

  • How can i always pull out the correct number for Admissions and the correct number for Discharges please?

    can you please give example of how you want the end result to be presented? rows, a single line of text, something else?

    thanks

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • Hi,

    The output would be a row that would populate a table with headers Date,Admissions,Discharges:

    Date Admissions Discharges

    22/04/2016 5 3

    I had used patindex in existing code. I used a variable VarDisch to which i assigned the string based on finding the one occurrence in the file of 'disch'.

    The problem is because i searched on finding a number as opposed to a string.

    The code i have I am unsure how to post as each time i try i get the response: connection reset.

    So any clues on how to do that would help!

    Regards,

    Q

  • can you ALWAYS guarantee the number will come immediately before either "admissions" or "discharges"

    what happens if someone types this "Since reporting 22/04/2016, the admissions are 5 and the discharges are 3"

    please can you advise?

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • Hi,

    I cannot guarantee that. In the template submitted for the last 7 months, the only changes i have seen to this text field have been:

    1. number of spaces (white space within and without);

    2. the spelling of admissions and discharges, and;

    3. the ordering of admissions and discharges.

    The number (of) has always preceded the text 'admissions' or 'discharges', and the date has always been at the end of the string.

    The SQL is a workaround until validation is in place for the template file.

    Q

  • WARNING WARNING WARNING: You MUST get your inputs standardized. You are at risk until then.

    That said. This was a fun problem, and the code below should get you started.

    select * into #input

    from (values (1,'There have been 5 admissions and 3 discharges since 04/14/2016.')

    ,(2,'There has been 1 admission and 2 discharges since 04/22/2016.')

    ,(3,'There have been 2 admissions and 1 discharge since 04/29/2016.')

    ,(4,'There have been 2 discharges and 2 admissions since 05/15/2016.')

    ,(5,'There have been 2 discharges and 1 admission since 05/24/2016.')) dt (Rowid,TextString);

    select * from #input;

    with cte as (select rowID, itemnumber,convert(varchar(50),item) as item

    from #input i

    cross apply DelimitedSplit8K(TextString, ' ')

    )

    ,cte2 as(select rowID, Item

    , casewhen Item like 'Admission%' or Item Like 'Discharge%'

    then LAG(item,1,null) over(order by rowid,itemnumber) else null end as CountX

    , TRY_CONVERT(date, replace(item,'.','')) as DateX

    from cte )

    selectMax(Case when Item Like 'Admission%' then CountX else null end) as Admissions

    ,Max(Case when Item Like 'Discharge%' then CountX else null end) as Discharges

    ,Max(DateX) as DateX

    from cte2

    where CountX is not null or DateX is not null

    group by Rowid

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • The Dixie Flatline (7/1/2016)


    WARNING WARNING WARNING: You MUST get your inputs standardized. You are at risk until then.

    That said. This was a fun problem, and the code below should get you started.

    select * into #input

    from (values (1,'There have been 5 admissions and 3 discharges since 04/14/2016.')

    ,(2,'There has been 1 admission and 2 discharges since 04/22/2016.')

    ,(3,'There have been 2 admissions and 1 discharge since 04/29/2016.')

    ,(4,'There have been 2 discharges and 2 admissions since 05/15/2016.')

    ,(5,'There have been 2 discharges and 1 admission since 05/24/2016.')) dt (Rowid,TextString);

    select * from #input;

    with cte as (select rowID, itemnumber,convert(varchar(50),item) as item

    from #input i

    cross apply DelimitedSplit8K(TextString, ' ')

    )

    ,cte2 as(select rowID, Item

    , casewhen Item like 'Admission%' or Item Like 'Discharge%'

    then LAG(item,1,null) over(order by rowid,itemnumber) else null end as CountX

    , TRY_CONVERT(date, replace(item,'.','')) as DateX

    from cte )

    selectMax(Case when Item Like 'Admission%' then CountX else null end) as Admissions

    ,Max(Case when Item Like 'Discharge%' then CountX else null end) as Discharges

    ,Max(DateX) as DateX

    from cte2

    where CountX is not null or DateX is not null

    group by Rowid

    Hi Dixie

    as you say a fun problem....but your code doesnt seem to cope with the white space issues as posted...

    for example

    select * into #input

    from (values (1,'There have been 5 admissions and 3 discharges since 04/14/2016.')

    ,(2,'There has been 1 admission and 2 discharges since 04/22/2016.')

    ,(3,'There have been 2 admissions and 1 discharge since 04/29/2016.')

    ,(4,'There have been 2 discharges and 2 admissions since 05/15/2016.')

    ,(5,'There have been 2 discharges and 1 admission since 05/24/2016.')) dt (Rowid,TextString);

    select * from #input;

    with cte as (select rowID, itemnumber,convert(varchar(50),item) as item

    from #input i

    cross apply DelimitedSplit8K(TextString, ' ')

    )

    ,cte2 as(select rowID, Item

    , casewhen Item like 'Admission%' or Item Like 'Discharge%'

    then LAG(item,1,null) over(order by rowid,itemnumber) else null end as CountX

    , TRY_CONVERT(date, replace(item,'.','')) as DateX

    from cte )

    selectMax(Case when Item Like 'Admission%' then CountX else null end) as Admissions

    ,Max(Case when Item Like 'Discharge%' then CountX else null end) as Discharges

    ,Max(DateX) as DateX

    from cte2

    where CountX is not null or DateX is not null

    group by Rowid

    DROP TABLE #input

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • well this is an interesting exercise....and I found it fun to have a go.

    I appreciate that that this only a "workaround" until you get some sense into the data entry validation....here below is my "effort"...its far from foolproof and performant...but does cover a lot of permutations.

    That said...we never know what a user is going to type in a "text field" 😉

    SET DATEFORMAT DMY;

    SELECT *

    INTO #input

    FROM (values

    ( 1, 'There were 5 admissions and 3 discharge since reporting 22/04/2016'),

    ( 2, 'today here were 6 discharges and 3 admission since reporting 21/04/2016'),

    ( 3, 'since reporting 30/04/2016 the discharges are 8 and the admissions are 10' ),

    ( 4, ' the discharges are 20 and the admissions are 30 on 10/05/2016 '),

    ( 5, ' the admission was 100 on 15/05/2016 and the discharge was 30 '),

    ( 6, ' on 15/08/2016 5 admissions and discharge 10' ),

    ( 7, ' 100 admission 200 discharges 25/12/2018' ),

    ( 8, ' discharge 500 on 01/01/2020 and 100 admission '),

    ( 9, ' discharge 1 2 admission 01/01/2020' ),

    ( 10, ' 20180101 100 admission 200 discharges' ),

    ( 11, '20160101 admission 1 2 discharges '),

    ( 12, 'here are the results up to 12/12/12 Admission 300 400 Discharge ' ) ,

    ( 13, ' Discharges 9000 20/May/2015 8000 Admissions ' ) ,

    ( 14, ' Discharges 99999 88888 Admissions 161216' ),

    ( 15, ' Discharges I think were 99 and maybe 88 possibly less Admissions on 2016-05-15' )

    )

    dt (rowID,TextString)

    ;

    WITH cte AS (

    SELECT *,

    CASE WHEN ttype = 'tdate' then 3 ELSE ROW_NUMBER() OVER (PARTITION BY rowID,ttype ORDER BY Itemnumber) END rn

    FROM (

    SELECT i.rowid, i.TextString, s.ItemNumber, 'Admissions' as Item, 'category' as ttype

    FROM #input i

    CROSS APPLY dbo.DelimitedSplit8K(i.TextString,' ') s

    where s.Item Like 'admission%'

    UNION ALL

    SELECT i.rowid, i.TextString, s.ItemNumber, 'Discharges', 'category'

    FROM #input i

    CROSS APPLY dbo.DelimitedSplit8K(i.TextString,' ') s

    where s.Item Like 'discharge%'

    UNION ALL

    SELECT i.rowid, i.TextString, s.ItemNumber, s.Item, 'count'

    FROM #input i

    CROSS APPLY dbo.DelimitedSplit8K(i.TextString,' ') s

    where s.Item Like '%[0-9]%' AND LEN(s.item) < 6 -- max count 99999

    UNION ALL

    SELECT i.rowid, i.TextString, s.ItemNumber, s.Item, 'tdate'

    FROM #input i

    CROSS APPLY dbo.DelimitedSplit8K(i.TextString,' ') s

    where s.Item Like '%[0-9]%' AND LEN(s.item) >= 6)x

    )

    , cte_cat as (

    SELECT rowID, Item, ttype, rn

    FROM cte

    WHERE (ttype = 'category')

    )

    , cte_cnt as (

    SELECT rowID, Item, ttype, rn

    FROM cte

    WHERE (ttype = 'count')

    )

    , cte_grp as (

    SELECT cte_cat.rowID,

    cte_cat.Item,

    cte_cnt.Item AS result

    FROM cte_cat

    INNER JOIN cte_cnt ON cte_cat.rowID = cte_cnt.rowID

    AND cte_cat.rn = cte_cnt.rn

    UNION ALL

    SELECT rowID,

    ttype,

    item

    FROM cte

    WHERE(ttype = 'tdate')

    )

    SELECT rowID,

    MAX(CASE WHEN Item = 'Admissions' THEN Result END) as Admissions,

    MAX(CASE WHEN Item = 'Discharges' THEN Result END) as Discharges,

    MAX(CASE WHEN Item = 'tdate' THEN TRY_CONVERT(date,Result) END) as tDate

    FROM cte_grp

    GROUP BY rowId

    DROP TABLE #input

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • Thank you for the replies.

    I will try to adapt what has been posted to suit our sandpit environment where i have discovered that some functions are not available (LAG and TRY_CONVERT).

    Seems 2012 is not used for the sandpit which limits things.

    🙁

  • quentin.harris (7/4/2016)


    Thank you for the replies.

    I will try to adapt what has been posted to suit our sandpit environment where i have discovered that some functions are not available (LAG and TRY_CONVERT).

    Seems 2012 is not used for the sandpit which limits things.

    🙁

    use this instead

    MAX(CASE WHEN Item = 'tdate' THEN CAST(Result as DATE) END) as tDate

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • Thank you.

    What is this line you are using:

    dbo.DelimitedSplit8K?

  • quentin.harris (7/4/2016)


    Thank you.

    What is this line you are using:

    dbo.DelimitedSplit8K?

    sorry...should have posted the link.

    its a nasty fast "splitter"

    http://www.sqlservercentral.com/articles/Tally+Table/72993/

    the DelimittedSplit8K function (code is at end of article)

    and I suggest you read the article and associated discussion....well worth it

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • Awesome.

    It works. Now have to get my head round it what it actually all does.

    Thank you!

Viewing 15 posts - 1 through 15 (of 37 total)

You must be logged in to reply to this topic. Login to reply