PatExclude8K

  • Alan Burstein

    SSC Guru

    Points: 61033

    Comments posted to this topic are about the item PatExclude8K

    -- Alan Burstein


    Helpful links:Best practices for getting help on SQLServerCentral -- Jeff ModenHow to Post Performance Problems -- Gail ShawNasty fast set-based string manipulation functions:For splitting strings try DelimitedSplit8K or DelimitedSplit8K_LEAD (SQL Server 2012+)To split strings based on patterns try PatternSplitCMNeed to clean or transform a string? try NGrams, PatExclude8K, PatReplace8K, DigitsOnlyEE, or Translate8KI cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code.  -- Itzik Ben-Gan 2001

  • Matthias Kläy

    Ten Centuries

    Points: 1016

    Hi Alan

    Thanks for this great function. One note:

    SELECT * FROM dbo.PatExclude8K('123# What?!... ', '[^a-z!?.]')

    returns hat?!...

    So it is case sensitive, contrary to your comment in the programmers notes. I guess this comes from the Latin1_General_BIN collation.

    To get the result as expected, I can use

    SELECT * FROM dbo.PatExclude8K('123# What?!... ', '[^a-zA-Z!?.]')

    Matthias Kläy, http://www.kcc.ch

  • Alan Burstein

    SSC Guru

    Points: 61033

    Matthias Kläy (11/28/2014)


    Hi Alan

    Thanks for this great function. One note:

    SELECT * FROM dbo.PatExclude8K('123# What?!... ', '[^a-z!?.]')

    returns hat?!...

    So it is case sensitive, contrary to your comment in the programmers notes. I guess this comes from the Latin1_General_BIN collation.

    To get the result as expected, I can use

    SELECT * FROM dbo.PatExclude8K('123# What?!... ', '[^a-zA-Z!?.]')

    Matthias Kläy, http://www.kcc.ch

    Thanks Matthias for your feedback and good catch! I did realize that the Latin1_General_BIN would make the function case sensitive but never updated my developer comments to reflect that change. This was my first script and I don't know what the SQLServerCentral protocol is for changing a script once it is published. I will update the developer comments once I figure out how 😎

    P.S. Sorry for the late reply, it's been a busy couple weeks of moving and project wrap-up.

    -- Alan Burstein


    Helpful links:Best practices for getting help on SQLServerCentral -- Jeff ModenHow to Post Performance Problems -- Gail ShawNasty fast set-based string manipulation functions:For splitting strings try DelimitedSplit8K or DelimitedSplit8K_LEAD (SQL Server 2012+)To split strings based on patterns try PatternSplitCMNeed to clean or transform a string? try NGrams, PatExclude8K, PatReplace8K, DigitsOnlyEE, or Translate8KI cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code.  -- Itzik Ben-Gan 2001

  • akljfhnlaflkj

    SSC Guru

    Points: 76202

    Nice pattern exclusion script. Thanks.

  • Alan Burstein

    SSC Guru

    Points: 61033

    Thank you for the kind words Iwas

    -- Alan Burstein


    Helpful links:Best practices for getting help on SQLServerCentral -- Jeff ModenHow to Post Performance Problems -- Gail ShawNasty fast set-based string manipulation functions:For splitting strings try DelimitedSplit8K or DelimitedSplit8K_LEAD (SQL Server 2012+)To split strings based on patterns try PatternSplitCMNeed to clean or transform a string? try NGrams, PatExclude8K, PatReplace8K, DigitsOnlyEE, or Translate8KI cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code.  -- Itzik Ben-Gan 2001

  • akljfhnlaflkj

    SSC Guru

    Points: 76202

    Good script, thank you.

  • Alan Burstein

    SSC Guru

    Points: 61033

    Iwas Bornready (6/8/2015)


    Good script, thank you.

    Thanks sir!

    -- Alan Burstein


    Helpful links:Best practices for getting help on SQLServerCentral -- Jeff ModenHow to Post Performance Problems -- Gail ShawNasty fast set-based string manipulation functions:For splitting strings try DelimitedSplit8K or DelimitedSplit8K_LEAD (SQL Server 2012+)To split strings based on patterns try PatternSplitCMNeed to clean or transform a string? try NGrams, PatExclude8K, PatReplace8K, DigitsOnlyEE, or Translate8KI cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code.  -- Itzik Ben-Gan 2001

  • Kristen-173977

    SSCrazy Eights

    Points: 8415

    Very useful function, thank you.

    Hopefully not a stupid question?!! but is there is reason why this is a table-valued function rather than scalar?

  • Alan Burstein

    SSC Guru

    Points: 61033

    Thank you for checking it out.

    Kristen-173977 (9/16/2015)


    Very useful function, thank you.

    Hopefully not a stupid question?!! but is there is reason why this is a table-valued function rather than scalar?

    This is an excellent question.:-D I asked the same question a few years and the answer made me a much better developer.

    The short answer is performance. Inline Table Valued functions (iTVF) generally perform better than scalar user-defined functions. One reason (but certainly not the only reason) is that iTVFs can get a parallel query plan whereas scalar cannot.

    PatExclude8K and PatReplace8K[/url] are what some people call an inline scalar function. Check out this article: How to Make Scalar UDFs Run Faster[/url] it's where I learned the technique I'm talking about.

    -- Alan Burstein


    Helpful links:Best practices for getting help on SQLServerCentral -- Jeff ModenHow to Post Performance Problems -- Gail ShawNasty fast set-based string manipulation functions:For splitting strings try DelimitedSplit8K or DelimitedSplit8K_LEAD (SQL Server 2012+)To split strings based on patterns try PatternSplitCMNeed to clean or transform a string? try NGrams, PatExclude8K, PatReplace8K, DigitsOnlyEE, or Translate8KI cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code.  -- Itzik Ben-Gan 2001

  • akljfhnlaflkj

    SSC Guru

    Points: 76202

    Thanks for the code.

  • Scott In Sydney

    Hall of Fame

    Points: 3256

    Apologies for a reply to an old thread, but this still seems the best place to post this reply, rather than creating a new thread.

    First of all, thanks for PatExclude8K.  I use it all the time.

    -- Using with make_parallel():
    SELECT * FROM dbo.PatExclude8K('XXX 123 ZZZ', '['+char(32)+']')
    CROSS APPLY make_parallel();

    Perhaps explain more about make_parallel() ???  Google brought me here and then here.  TBH I don't feel like registering on yet another site, so I'll keep searching for more details on make_parallel() - perhaps it's on another site where I don't have to register.

    I have an upstream database full of dirty tables, and need to apply data cleansing to every character column in the table.  The discussion is here.

    In this scenario, is using CROSS APPLY for every character column the proper way to use PatExclude8K?  If so, I wrote a code generator to help write the code.  Here is an example output:

    CREATE VIEW [cln].[vwEPISODE]
    AS
    SELECT LTRIM(RTRIM([t01].[NewString]))                     AS [facility_identifier]
      ,LTRIM(RTRIM([t02].[NewString]))                     AS [stay_number]
      ,[src].[episode_sequence_number]                     AS [episode_sequence_number]
      ,[src].[snap_from_date]                        AS [snap_from_date]
      ,[src].[snap_to_date]                        AS [snap_to_date]
      ,LTRIM(RTRIM([t06].[NewString]))                     AS [snap_curr_indicator]
      ,[src].[snap_load_date]                        AS [snap_load_date]
      ,[src].[snap_batch_run_no]                       AS [snap_batch_run_no]
      ,[src].[snap_record_status]                      AS [snap_record_status]
      ,LTRIM(RTRIM([t10].[NewString]))                     AS [person_identifier]
      ,[src].[episode_start_date]                      AS [episode_start_date]
      ,[src].[episode_end_date]                       AS [episode_end_date]
      ,LTRIM(RTRIM([t13].[NewString]))                     AS [mo_code1]
      ,LTRIM(RTRIM([t14].[NewString]))                     AS [mo_code2]
      ,LTRIM(RTRIM([t15].[NewString]))                     AS [episode_of_care_type]
      ,LTRIM(RTRIM([t16].[NewString]))                     AS [an_drg]
      ,LTRIM(RTRIM([t17].[NewString]))                     AS [major_diagnostic_category]
      ,LTRIM(RTRIM([t18].[NewString]))                     AS [mode_of_separation]
      ,LTRIM(RTRIM([t19].[NewString]))                     AS [source_of_referral]
      ,LTRIM(RTRIM([t20].[NewString]))                     AS [financial_program]
      ,[src].[episode_leave_days_total]                    AS [episode_leave_days_total]
      ,[src].[episode_length_of_stay]                     AS [episode_length_of_stay]
      ,LTRIM(RTRIM([t23].[NewString]))                     AS [first_psych_admission_flag]
      ,[src].[days_in_psych_unit]                      AS [days_in_psych_unit]
      ,[src].[hours_in_psych_unit]                      AS [hours_in_psych_unit]
      ,[src].[hours_in_icu]                        AS [hours_in_icu]
      ,LTRIM(RTRIM([t27].[NewString]))                     AS [place_of_occurrence]
      ,LTRIM(RTRIM([t28].[NewString]))                     AS [external_cause_code_1]
      ,LTRIM(RTRIM([t29].[NewString]))                     AS [external_cause_code_2]
      ,LTRIM(RTRIM([t30].[NewString]))                     AS [unplanned_theatre]
      ,LTRIM(RTRIM([t31].[NewString]))                     AS [palliative_care_status]
      ,[src].[unqual_baby_bed_days]                      AS [unqual_baby_bed_days]
      ,[src].[unqualified_bed_days]                      AS [unqualified_bed_days]
      ,[src].[unqualified_bed_time]                      AS [unqualified_bed_time]
      ,[src].[qualified_bed_days]                      AS [qualified_bed_days]
      ,[src].[qualified_bed_time]                      AS [qualified_bed_time]
      ,LTRIM(RTRIM([t37].[NewString]))                     AS [legal_status_on_admit]
      ,LTRIM(RTRIM([t38].[NewString]))                     AS [pension_status]
      ,LTRIM(RTRIM([t39].[NewString]))                     AS [payment_status_on_sep]
      ,LTRIM(RTRIM([t40].[NewString]))                     AS [unit_type_on_admission]
      ,LTRIM(RTRIM([t41].[NewString]))                     AS [mrn]
      ,LTRIM(RTRIM([t42].[NewString]))                     AS [practice_identifier1]
      ,LTRIM(RTRIM([t43].[NewString]))                     AS [practice_identifier2]
      ,[src].[snap_upd_batch_run_no]                     AS [snap_upd_batch_run_no]
      ,LTRIM(RTRIM([t45].[NewString]))                     AS [clinical_codeset_1]
      ,LTRIM(RTRIM([t46].[NewString]))                     AS [clinical_codeset_2]
      ,LTRIM(RTRIM([t47].[NewString]))                     AS [clinical_codeset_3]
      ,LTRIM(RTRIM([t48].[NewString]))                     AS [an_drg_orig]
      ,LTRIM(RTRIM([t49].[NewString]))                     AS [an_drg_orig_version]
      ,LTRIM(RTRIM([t50].[NewString]))                     AS [mdc_orig]
      ,LTRIM(RTRIM([t51].[NewString]))                     AS [an_drg_orig_return_cd]
      ,LTRIM(RTRIM([t52].[NewString]))                     AS [an_drg_orig_pccl]
      ,LTRIM(RTRIM([t53].[NewString]))                     AS [spare_1]
      ,LTRIM(RTRIM([t54].[NewString]))                     AS [spare_2]
      ,LTRIM(RTRIM([t55].[NewString]))                     AS [external_cause_code_3]
      ,LTRIM(RTRIM([t56].[NewString]))                     AS [spare_3]
      ,LTRIM(RTRIM([t57].[NewString]))                     AS [spare_4]
      ,LTRIM(RTRIM([t58].[NewString]))                     AS [spare_5]
      ,LTRIM(RTRIM([t59].[NewString]))                     AS [spare_6]
      ,LTRIM(RTRIM([t60].[NewString]))                     AS [an_drg_version]
      ,LTRIM(RTRIM([t61].[NewString]))                     AS [episode_create_date]
      ,LTRIM(RTRIM([t62].[NewString]))                     AS [episode_update_date]
      ,[src].[involuntary_days_in_psych]                    AS [involuntary_days_in_psych]
      ,[src].[episode_sequence_number_ats]                   AS [episode_sequence_number_ats]
      ,LTRIM(RTRIM([t65].[NewString]))                     AS [episode_start_time]
      ,LTRIM(RTRIM([t66].[NewString]))                     AS [episode_end_time]
      ,LTRIM(RTRIM([t67].[NewString]))                     AS [financial_class]
      ,LTRIM(RTRIM([t68].[NewString]))                     AS [financial_class_local]
      ,LTRIM(RTRIM([t69].[NewString]))                     AS [source_system]
      ,LTRIM(RTRIM([t70].[NewString]))                     AS [upd_source_system]
      ,LTRIM(RTRIM([t71].[NewString]))                     AS [infant_start_weight]
      ,LTRIM(RTRIM([t72].[NewString]))                     AS [hours_on_mech_ventilation]
      ,[src].[hours_on_mech_vent_num]                     AS [hours_on_mech_vent_num]
      ,LTRIM(RTRIM([t74].[NewString]))                     AS [drg_mode_of_separation]
      ,LTRIM(RTRIM([t75].[NewString]))                     AS [drg_mhealth_legal_status]
      ,LTRIM(RTRIM([t76].[NewString]))                     AS [an_drg_current]
      ,LTRIM(RTRIM([t77].[NewString]))                     AS [an_drg_current_version]
      ,LTRIM(RTRIM([t78].[NewString]))                     AS [mdc_current]
      ,LTRIM(RTRIM([t79].[NewString]))                     AS [an_drg_current_return_cd]
      ,LTRIM(RTRIM([t80].[NewString]))                     AS [an_drg_current_pccl]
      ,LTRIM(RTRIM([t81].[NewString]))                     AS [financial_sub_program]
      ,[src].[episode_day_stay_los]                      AS [episode_day_stay_los]
      ,LTRIM(RTRIM([t83].[NewString]))                     AS [clinical_coding_audit]
      ,[src].[clinical_coding_audit_date]                    AS [clinical_coding_audit_date]
      ,[src].[replica_valid_from_date]                     AS [replica_valid_from_date]
      ,[src].[replica_valid_to_date]                     AS [replica_valid_to_date]
      ,LTRIM(RTRIM([t87].[NewString]))                     AS [replica_current_record]
      ,LTRIM(RTRIM([t88].[NewString]))                     AS [replica_uuid]
    FROM [ext].[vwEPISODE] src
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[facility_identifier],'%[^ -~]%') t01
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[stay_number],'%[^ -~]%') t02
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[snap_curr_indicator],'%[^ -~]%') t06
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[person_identifier],'%[^ -~]%') t10
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[mo_code1],'%[^ -~]%') t13
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[mo_code2],'%[^ -~]%') t14
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[episode_of_care_type],'%[^ -~]%') t15
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[an_drg],'%[^ -~]%') t16
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[major_diagnostic_category],'%[^ -~]%') t17
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[mode_of_separation],'%[^ -~]%') t18
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[source_of_referral],'%[^ -~]%') t19
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[financial_program],'%[^ -~]%') t20
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[first_psych_admission_flag],'%[^ -~]%') t23
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[place_of_occurrence],'%[^ -~]%') t27
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[external_cause_code_1],'%[^ -~]%') t28
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[external_cause_code_2],'%[^ -~]%') t29
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[unplanned_theatre],'%[^ -~]%') t30
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[palliative_care_status],'%[^ -~]%') t31
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[legal_status_on_admit],'%[^ -~]%') t37
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[pension_status],'%[^ -~]%') t38
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[payment_status_on_sep],'%[^ -~]%') t39
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[unit_type_on_admission],'%[^ -~]%') t40
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[mrn],'%[^ -~]%') t41
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[practice_identifier1],'%[^ -~]%') t42
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[practice_identifier2],'%[^ -~]%') t43
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[clinical_codeset_1],'%[^ -~]%') t45
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[clinical_codeset_2],'%[^ -~]%') t46
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[clinical_codeset_3],'%[^ -~]%') t47
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[an_drg_orig],'%[^ -~]%') t48
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[an_drg_orig_version],'%[^ -~]%') t49
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[mdc_orig],'%[^ -~]%') t50
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[an_drg_orig_return_cd],'%[^ -~]%') t51
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[an_drg_orig_pccl],'%[^ -~]%') t52
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[spare_1],'%[^ -~]%') t53
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[spare_2],'%[^ -~]%') t54
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[external_cause_code_3],'%[^ -~]%') t55
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[spare_3],'%[^ -~]%') t56
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[spare_4],'%[^ -~]%') t57
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[spare_5],'%[^ -~]%') t58
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[spare_6],'%[^ -~]%') t59
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[an_drg_version],'%[^ -~]%') t60
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[episode_create_date],'%[^ -~]%') t61
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[episode_update_date],'%[^ -~]%') t62
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[episode_start_time],'%[^ -~]%') t65
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[episode_end_time],'%[^ -~]%') t66
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[financial_class],'%[^ -~]%') t67
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[financial_class_local],'%[^ -~]%') t68
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[source_system],'%[^ -~]%') t69
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[upd_source_system],'%[^ -~]%') t70
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[infant_start_weight],'%[^ -~]%') t71
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[hours_on_mech_ventilation],'%[^ -~]%') t72
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[drg_mode_of_separation],'%[^ -~]%') t74
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[drg_mhealth_legal_status],'%[^ -~]%') t75
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[an_drg_current],'%[^ -~]%') t76
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[an_drg_current_version],'%[^ -~]%') t77
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[mdc_current],'%[^ -~]%') t78
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[an_drg_current_return_cd],'%[^ -~]%') t79
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[an_drg_current_pccl],'%[^ -~]%') t80
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[financial_sub_program],'%[^ -~]%') t81
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[clinical_coding_audit],'%[^ -~]%') t83
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[replica_current_record],'%[^ -~]%') t87
    CROSS APPLY dbo.fnPatExclude8K_Table([src].[replica_uuid],'%[^ -~]%') t88
    GO

    This does in fact perform better than my scalar function, but the code is rather "busy" 🙂  I just want to confirm this is the correct approach?

  • Eirikur Eiriksson

    SSC Guru

    Points: 182347

    Scott In Sydney - Tuesday, March 12, 2019 10:16 PM

    This does in fact perform better than my scalar function, but the code is rather "busy" 🙂  I just want to confirm this is the correct approach?

    If you are using the function to clean (almost) all columns, this would be the right approach.
    😎

  • Eirikur Eiriksson

    SSC Guru

    Points: 182347

    Here is an optimization of the code, simply by adding the text() function will cut the servers effort by almost 50%.
    😎


    CREATE FUNCTION dbo.PatExclude8K
    (
      @String VARCHAR(8000),
        @Pattern VARCHAR(50)
    )
    /*******************************************************************************
    Purpose:
    Given a string (@String) and a pattern (@Pattern) of characters to remove,
    remove the patterned characters from the string.

    Usage:
    --===== Basic Syntax Example
    SELECT CleanedString
    FROM dbo.PatExclude8K(@String,@Pattern);

    --===== Remove all but Alpha characters
    SELECT CleanedString
    FROM dbo.SomeTable st
    CROSS APPLY dbo.PatExclude8K(st.SomeString,'%[^A-Za-z]%');

    --===== Remove all but Numeric digits
    SELECT CleanedString
    FROM dbo.SomeTable st
    CROSS APPLY dbo.PatExclude8K(st.SomeString,'%[^0-9]%');

    Programmer Notes:
    1. @Pattern is not case sensitive (the function can be easily modified to make it so)
    2. There is no need to include the "%" before and/or after your pattern since since we
        are evaluating each character individually

    Revision History:
    Rev 00 - 10/27/2014 Initial Development - Alan Burstein

    Rev 01 - 10/29/2014 Mar 2007 - Alan Burstein
            - Redesigned based on the dbo.STRIP_NUM_EE by Eirikur Eiriksson
             (see: http://www.sqlservercentral.com/Forums/Topic1585850-391-2.aspx)
            - change how the cte tally table is created
            - put the include/exclude logic in a CASE statement instead of a WHERE clause
            - Added Latin1_General_BIN Colation
       - Add code to use the pattern as a parameter.

    Rev 02    - 11/6/2014
            - Added final performane enhancement (more cudo's to Eirikur Eiriksson)
            - Put 0 = PATINDEX filter logic into the WHERE clause

    Rev 03 - 5/16/2015
            - Updated code to deal with special XML characters
    *******************************************************************************/
    RETURNS TABLE WITH SCHEMABINDING AS
    RETURN
    WITH
    E1(N) AS (SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N)),
    itally(N) AS
    (
    SELECT TOP(CONVERT(INT,LEN(@String),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM E1 T1 CROSS JOIN E1 T2 CROSS JOIN E1 T3 CROSS JOIN E1 T4
    )
    SELECT NewString =
    ((
    SELECT SUBSTRING(@String,N,1)
    FROM iTally
    WHERE 0 = PATINDEX(@Pattern,SUBSTRING(@String COLLATE Latin1_General_BIN,N,1))
    FOR XML PATH(''),TYPE
    ).value('(./text())[1]','varchar(8000)'));
    GO

  • Scott In Sydney

    Hall of Fame

    Points: 3256

    Thanks @Eirikur, I appreciate you confirming my approach and adding the performance enhancement.

    I do try to learn from answers posted, not merely running with the advice, so I went searching for text().  I didn't find it in the MS docs, but did find the URL below.  So I add it here in case someone finding this thread in the future finds it useful.

    https://stackoverflow.com/questions/32225634/for-xml-path-in-sql-server-and-text

  • Eirikur Eiriksson

    SSC Guru

    Points: 182347

    Scott In Sydney - Wednesday, March 13, 2019 2:34 PM

    Thanks @Eirikur, I appreciate you confirming my approach and adding the performance enhancement.

    I do try to learn from answers posted, not merely running with the advice, so I went searching for text().  I didn't find it in the MS docs, but did find the URL below.  So I add it here in case someone finding this thread in the future finds it useful.

    https://stackoverflow.com/questions/32225634/for-xml-path-in-sql-server-and-text

    Quick explanation

    😎

    The text() function eliminates the need for the SQL Server to reconstruct the XML for constructing the output set. Without the function,the execution plan will contain two XML Reader table valued function instances for each element value and an UDX operator for constructing the output as XML.

    With the function, each element value only requires a single instance of the XML Reader with XPath filter table valued function in the execution plan.

     

    Sample code:

    USE TEEST;

    ='font-size:9.5pt;font-family:consolas;mso-bidi-font-family:consolas;color:black;background:white;mso-highlight:white'>

    GO

    SET NOCOUNTON;

    ='font-size:9.5pt;font-family:consolas;mso-bidi-font-family:consolas;color:black;background:white;mso-highlight:white'>='font-size:9.5pt;font-family:consolas;mso-bidi-font-family:consolas;color:gray;background:white;mso-highlight:white'>='font-size:9.5pt;font-family:consolas;mso-bidi-font-family:consolas;color:blue;background:white;mso-highlight:white'>='font-size:9.5pt;font-family:consolas;mso-bidi-font-family:consolas;color:black;background:white;mso-highlight:white'>

     

    DECLARE @TXML XML ='<root>

    ='font-size:9.5pt;font-family:consolas;mso-bidi-font-family:consolas;color:black;background:white;mso-highlight:white'>='font-size:9.5pt;font-family:consolas;mso-bidi-font-family:consolas;color:red;background:white;mso-highlight:white'>='font-size:9.5pt;font-family:consolas;mso-bidi-font-family:consolas;color:black;background:white;mso-highlight:white'>='font-size:9.5pt;font-family:consolas;mso-bidi-font-family:consolas;color:gray;background:white;mso-highlight:white'>

     <node>A</node>

     <node>B</node>

     <node>C</node>

    </root>';

    -- Without the text() function

    ='font-size:9.5pt;font-family:consolas;mso-bidi-font-family:consolas;color:black;background:white;mso-highlight:white'>

    SELECT

         NODES.DATA.value('.','CHAR(1)')AS NODE_VAL

    ='font-size:9.5pt;font-family:consolas;mso-bidi-font-family:consolas;color:black;background:white;mso-highlight:white'>='font-size:9.5pt;font-family:consolas;mso-bidi-font-family:consolas;color:blue;background:white;mso-highlight:white'>='font-size:9.5pt;font-family:consolas;mso-bidi-font-family:consolas;color:black;background:white;mso-highlight:white'>='font-size:9.5pt;font-family:consolas;mso-bidi-font-family:consolas;color:gray;background:white;mso-highlight:white'>='font-size:9.5pt;font-family:consolas;mso-bidi-font-family:consolas;color:red;background:white;mso-highlight:white'>='font-size:9.5pt;font-family:consolas;mso-bidi-font-family:consolas;color:gray;background:white;mso-highlight:white'>='font-size:9.5pt;font-family:consolas;mso-bidi-font-family:consolas;color:red;background:white;mso-highlight:white'>='font-size:9.5pt;font-family:consolas;mso-bidi-font-family:consolas;color:gray;background:white;mso-highlight:white'>='font-size:9.5pt;font-family:consolas;mso-bidi-font-family:consolas;color:blue;background:white;mso-highlight:white'>='font-size:9.5pt;font-family:consolas;mso-bidi-font-family:consolas;color:gray;background:white;mso-highlight:white'>='font-size:9.5pt;font-family:consolas;mso-bidi-font-family:consolas;color:blue;background:white;mso-highlight:white'>

    FROM @TXML.nodes('root/node') NODES(DATA);

    ='font-size:9.5pt;font-family:consolas;mso-bidi-font-family:consolas;color:black;background:white;mso-highlight:white'>='font-size:9.5pt;font-family:consolas;mso-bidi-font-family:consolas;color:gray;background:white;mso-highlight:white'>='font-size:9.5pt;font-family:consolas;mso-bidi-font-family:consolas;color:blue;background:white;mso-highlight:white'>='font-size:9.5pt;font-family:consolas;mso-bidi-font-family:consolas;color:gray;background:white;mso-highlight:white'>='font-size:9.5pt;font-family:consolas;mso-bidi-font-family:consolas;color:black;background:white;mso-highlight:white'>='font-size:9.5pt;font-family:consolas;mso-bidi-font-family:consolas;color:gray;background:white;mso-highlight:white'>='font-size:9.5pt;font-family:consolas;mso-bidi-font-family:consolas;color:red;background:white;mso-highlight:white'>='font-size:9.5pt;font-family:consolas;mso-bidi-font-family:consolas;color:gray;background:white;mso-highlight:white'>='font-size:9.5pt;font-family:consolas;mso-bidi-font-family:consolas;color:black;background:white;mso-highlight:white'>

    -- With the text() function

    ='font-size:9.5pt;font-family:consolas;mso-bidi-font-family:consolas;color:black;background:white;mso-highlight:white'>

    SELECT

         NODES.DATA.value('(./text())[1]','CHAR(1)')AS NODE_VAL

    ='font-size:9.5pt;font-family:consolas;mso-bidi-font-family:consolas;color:black;background:white;mso-highlight:white'>='font-size:9.5pt;font-family:consolas;mso-bidi-font-family:consolas;color:blue;background:white;mso-highlight:white'>='font-size:9.5pt;font-family:consolas;mso-bidi-font-family:consolas;color:black;background:white;mso-highlight:white'>='font-size:9.5pt;font-family:consolas;mso-bidi-font-family:consolas;color:gray;background:white;mso-highlight:white'>='font-size:9.5pt;font-family:consolas;mso-bidi-font-family:consolas;color:red;background:white;mso-highlight:white'>='font-size:9.5pt;font-family:consolas;mso-bidi-font-family:consolas;color:gray;background:white;mso-highlight:white'>='font-size:9.5pt;font-family:consolas;mso-bidi-font-family:consolas;color:red;background:white;mso-highlight:white'>='font-size:9.5pt;font-family:consolas;mso-bidi-font-family:consolas;color:gray;background:white;mso-highlight:white'>='font-size:9.5pt;font-family:consolas;mso-bidi-font-family:consolas;color:blue;background:white;mso-highlight:white'>='font-size:9.5pt;font-family:consolas;mso-bidi-font-family:consolas;color:gray;background:white;mso-highlight:white'>='font-size:9.5pt;font-family:consolas;mso-bidi-font-family:consolas;color:blue;background:white;mso-highlight:white'>

    FROM @TXML.nodes('root/node') NODES(DATA);

    ='font-size:9.5pt;font-family:consolas;mso-bidi-font-family:consolas;color:black;background:white;mso-highlight:white'>='font-size:9.5pt;font-family:consolas;mso-bidi-font-family:consolas;color:gray;background:white;mso-highlight:white'>='font-size:9.5pt;font-family:consolas;mso-bidi-font-family:consolas;color:blue;background:white;mso-highlight:white'>='font-size:9.5pt;font-family:consolas;mso-bidi-font-family:consolas;color:gray;background:white;mso-highlight:white'>='font-size:9.5pt;font-family:consolas;mso-bidi-font-family:consolas;color:black;background:white;mso-highlight:white'>='font-size:9.5pt;font-family:consolas;mso-bidi-font-family:consolas;color:gray;background:white;mso-highlight:white'>='font-size:9.5pt;font-family:consolas;mso-bidi-font-family:consolas;color:red;background:white;mso-highlight:white'>='font-size:9.5pt;font-family:consolas;mso-bidi-font-family:consolas;color:gray;background:white;mso-highlight:white'>='font-size:9.5pt;font-family:consolas;mso-bidi-font-family:consolas;color:black;background:white;mso-highlight:white'>

     

    The execution plan without the text() function:

     

     

    The execution plan with the text() function:

     

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply