Extract first numeric value from a string

  • Sean Lange (7/1/2013)


    valeryk2000 (7/1/2013)


    iTVF - 1:08

    Scalar- 2:07

    Long live iTVF and Sean!

    ROFL. Glad you got it working. πŸ™‚

    valeryk2000 (7/1/2013)


    iTVF - 1:08

    Scalar- 2:07

    Long live iTVF and Sean!

    Limping! πŸ˜›

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (7/1/2013)


    Sean Lange (7/1/2013)


    valeryk2000 (7/1/2013)


    iTVF - 1:08

    Scalar- 2:07

    Long live iTVF and Sean!

    ROFL. Glad you got it working. πŸ™‚

    valeryk2000 (7/1/2013)


    iTVF - 1:08

    Scalar- 2:07

    Long live iTVF and Sean!

    Limping! πŸ˜›

    True dat!!!

    The point we are making here is that 1:08 seems pretty slow for whatever you are doing unless the row count is in the millions.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • This table has about 9,000,000 recs.

    However, when I added clustered index the time dropped to 0:32

    Is it still limping?

  • valeryk2000 (7/1/2013)


    This table has about 9,000,000 recs.

    However, when I added clustered index the time dropped to 0:32

    Is it still limping?

    Depending on what your actual query looks like that doesn't sound horrible to select 9 million rows.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Take me right: the result is 15,547 rows from 9 mln rows

  • valeryk2000 (7/1/2013)


    Take me right: the result is 15,547 rows from 9 mln rows

    I bet you can make that quite a bit faster with some tweaking. If you are satisfied then that is cool. If you want some help to make that even faster you will need to post the ddl for the tables and the indexes. In addition, posting an actual execution plan would be very helpful.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • OK. Table:

    ==============

    CREATE TABLE dbo.CHRT_Tran(

    SITE nvarchar(2) NULL,

    LOCATION nvarchar(13) NULL,

    PMP nvarchar(5) NULL,

    CHARTED_DATE nvarchar(8) NULL,

    CHARTED_TIME nvarchar(5) NULL,

    Real_Date_Time datetime NULL,

    NURSE_ID nvarchar(12) NULL,

    REASON nvarchar(60) NULL,

    PAT_NUM nvarchar(12) NULL,

    PTNAME nvarchar(32) NULL,

    SIG nvarchar(15) NULL,

    SCH_PRN_TKH nvarchar(9) NULL,

    UNSPECIFIED nvarchar(3) NULL,

    FREQUENCY nvarchar(6) NULL,

    START_DATE nvarchar(8) NULL,

    STOP_DATE nvarchar(8) NULL,

    MED_IV nvarchar(3) NULL,

    DRUG_INFO_1 nvarchar(128) NULL,

    DRUG_INFO_2 nvarchar(128) NULL,

    ROUTE nvarchar(10) NULL,

    NO_OF_INGREDIENTS decimal(15, 2) NULL,

    UniqueID nvarchar(30) NULL,

    OCC_DATE nvarchar(8) NULL,

    OCC_TIME nvarchar(5) NULL,

    OCC_Date_Time datetime NULL,

    ADMIN_STATUS nvarchar(20) NULL

    )

    =================

    Index:

    =================

    CREATE CLUSTERED INDEX ind_ClustForDDD ON dbo.CHRT_Tran

    (

    Real_Date_Time ASC,

    UniqueID ASC,

    LOCATION ASC

    )

    ===============

  • And the query:

    ==============

    SELECT

    Real_Date_Time ChartedAt

    ,DRUG_INFO_1

    ,LEFT(drug_info_1,CHARINDEX(CHAR(32),drug_info_1)) Drug

    ,f.Dose

    , Coalesce(case charindex('MG', DRUG_INFO_1)

    when 0 then null

    else 'MG'

    end --Mg

    ,case charindex('GM', DRUG_INFO_1)

    when 0 then null

    else 'GM'

    end --GM

    ,case charindex('UNIT', DRUG_INFO_1)

    when 0 then null

    else 'UNITS'

    end ----UNITS

    )

    ,ROUTE

    ,NO_OF_INGREDIENTS

    ,PMP

    ,pat_num

    ,ADMIN_STATUS

    FROM MAK_Data.dbo.CHRT_Tran

    cross apply dbo.fnExtractDigits_iTVF(DRUG_INFO_1) f

    Where

    LEFT(drug_info_1,CHARINDEX(CHAR(32),drug_info_1))

    IN

    (

    'abacavir', 'acyclovir', 'amikacin', 'amoxicillin', 'amoxicillin-clavulanic', 'amphotericin', 'ampicillin',

    'ampicillin-sulbactam', 'atazanavir', 'atovaquone', 'azithromycin', 'aztreonam',

    'cefazolin', 'cefdinir', 'cefepime', 'cefixime', 'cefotaxime', 'cefotetan',

    'cefoxitin', 'cefpodoxime proxetil', 'ceftazidime', 'ceftriaxone', 'cefuroxime',

    'cefuroxime axetil', 'cephalexin', 'ciprofloxacin', 'clarithromycin', 'clindamycin',

    'colistimethate', 'dapsone', 'daptomycin', 'demeclocycline', 'dicloxacillin', 'doxycycline',

    'efavirenz', 'ertapenem', 'erythromycin', 'erythromycin', 'ethambutol', 'fluconazole', 'gentamicin',

    'hydroxychloroquine', 'imipenem', 'isoniazid', 'itraconazole', 'ketoconazole', 'lamivudine', 'levofloxacin',

    'linezolid', 'meropenem', 'methenamine', 'metronidazole', 'minocycline', 'nafcillin',

    'neomycin', 'nevirapine', 'nitrofurantoin', 'nystatin', 'oseltamivir', '

    penicillin', 'pentamidine', 'piperacillin-tazobactam (single)', 'pyrazinamide', 'ribavirin',

    'rifabutin', 'rifampin', 'ritonavir', 'sulfamethoxazole/trimethoprim', 'sulfasalazine',

    'tenofovir', 'terbinafine', 'tetracycline', 'tigecycline', 'tobramycin', 'trimethoprim',

    'valacyclovir', 'vancomycin', 'voriconazole'

    )

    AND Real_Date_Time >'6/1/2013'

    and ADMIN_STATUS ='admin'

  • Execution plan graph

  • At a glance you seem to be missing an index on DRUG_INFO_1. This is exposed in a nonSARGable predicate with your large value list in the where clause.

    LEFT(drug_info_1,CHARINDEX(CHAR(32),drug_info_1))

    IN

    (

    'abacavir', ....

    This is causing a full table scan. It is hidden in your execution plan as FILTER. Notice the filter is 40% of your execution?

    Essentially you have told the engine to look at the first 32 characters of every single row. I was able to turn this scan into an index seek with a simple index and a slight change to the query.

    I tried to follow the naming convention from the index you posted.

    create nonclustered index ind_NonClusteredDrugInfo on CHRT_Tran(DRUG_INFO_1)

    Then instead of the huge list of strings, I moved that list to a cte and then joined to it.

    with DrugInfo (Info) as

    (

    select * from (values('abacavir'), ('acyclovir'), ('amikacin'), ('amoxicillin'), ('amoxicillin-clavulanic'), ('amphotericin'),

    ('ampicillin'), ('ampicillin-sulbactam'), ('atazanavir'), ('atovaquone'), ('azithromycin'), ('aztreonam'), ('cefazolin'),

    ('cefdinir'), ('cefepime'), ('cefixime'), ('cefotaxime'), ('cefotetan'), ('cefoxitin'), ('cefpodoxime proxetil'), ('ceftazidime'),

    ('ceftriaxone'), ('cefuroxime'), ('cefuroxime axetil'), ('cephalexin'), ('ciprofloxacin'), ('clarithromycin'), ('clindamycin'),

    ('colistimethate'), ('dapsone'), ('daptomycin'), ('demeclocycline'), ('dicloxacillin'), ('doxycycline'), ('efavirenz'),

    ('ertapenem'), ('erythromycin'), ('erythromycin'), ('ethambutol'), ('fluconazole'), ('gentamicin'), ('hydroxychloroquine'),

    ('imipenem'), ('isoniazid'), ('itraconazole'), ('ketoconazole'), ('lamivudine'), ('levofloxacin'), ('linezolid'), ('meropenem'),

    ('methenamine'), ('metronidazole'), ('minocycline'), ('nafcillin'), ('neomycin'), ('nevirapine'), ('nitrofurantoin'), ('nystatin'),

    ('oseltamivir'), ('penicillin'), ('pentamidine'), ('piperacillin-tazobactam (single)'), ('pyrazinamide'), ('ribavirin'), ('rifabutin'),

    ('rifampin'), ('ritonavir'), ('sulfamethoxazole/trimethoprim'), ('sulfasalazine'), ('tenofovir'), ('terbinafine'), ('tetracycline'),

    ('tigecycline'), ('tobramycin'), ('trimethoprim'), ('valacyclovir'), ('vancomycin'), ('voriconazole')) as x(Name)

    )

    SELECT

    Real_Date_Time ChartedAt

    ,DRUG_INFO_1

    ,LEFT(drug_info_1,CHARINDEX(CHAR(32),drug_info_1)) Drug

    -- ,f.Dose

    , Coalesce(case charindex('MG', DRUG_INFO_1)

    when 0 then null

    else 'MG'

    end --Mg

    ,case charindex('GM', DRUG_INFO_1)

    when 0 then null

    else 'GM'

    end --GM

    ,case charindex('UNIT', DRUG_INFO_1)

    when 0 then null

    else 'UNITS'

    end ----UNITS

    )

    ,ROUTE

    ,NO_OF_INGREDIENTS

    ,PMP

    ,pat_num

    ,ADMIN_STATUS

    FROM CHRT_Tran cross apply dbo.fnExtractDigits_iTVF(DRUG_INFO_1) f

    join DrugInfo di on CHRT_Tran.DRUG_INFO_1 like di.Info + '%'

    Where Real_Date_Time >'6/1/2013'

    and ADMIN_STATUS ='admin'

    I don't have any data but the plan generated on my system looks like it would be quite a bit faster. See if you can load this up on a test system and see what happens.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Txs - I'll try it tomorrow. BTW in current configuration (with clustered index) it's done in 0:26

    I might have problems with indexing - every night more than 20 K records are inserted into this table, so I (probably) will need to drop clustered index and then recreate it

  • BTW, I commented out "Dose" because it wasn't in your ddl and it didn't really have any bearing on this. Just don't want to overlook it if you end up using copy/paste from my code. πŸ˜›

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Actually 'dose' is what the function returns

  • I tested your code with CTE and non-clustered - it does not give any advantage (I'll try it later on a larger range of dates - currently it is just one month)

  • Sean,

    On larger chank of data your CTE-based qry runs 2.5 times faster.

    I learned a lot from this discussion.

    Thanks, brothers ....

Viewing 15 posts - 16 through 30 (of 33 total)

You must be logged in to reply to this topic. Login to reply