Count(*) with LIKE

  • I am working on converting an Excel formula to Excel.  I have successfully moved two other Excel reports to SQL with queries simple enough to figure out.  Now I'm on to a more challenging one.  The Excel query does the following:

    The data is sorted by a "Region" Column and then the query looks for a keyword with a wildcard * and them counts (and sums for one).

    Commercial:
    =COUNTIF(Membership!B6400:B6703,"*EPO")+COUNTIF(Membership!B6400:B6703,"*HMO")+COUNTIF(Membership!B6400:B6703,"*POS")+COUNTIF(Membership!B6400:B6703,"*PPO")
    Medicare:
    =COUNTIF(Membership!B6400:B6703,"*Medicare")
    Medicaid:
    =COUNTIF(Membership!B6400:B6703,"*Medicaid")

    When I try to do this with T-SQL it counts with LIKE '%PPO' it doesn't behave the same way it does in Excel.  My SQL query is below.

    I'd like it to give me totals from the column [Product Line Total] for any records with "Medicaid" , "Medicare" and total all records with any "PPO" , "HMO" , "EPO" , "POS".

    My other thought is maybe this is possible using a Temporary Table, but I have gotten to those yet in my self-training.


    SELECT

      MYCO.Region as [Region],
        Insurance.[Sub Product Line] as [Product Line], count(*) as [Product Line Total]

    FROM [CoveredLives].[dbo].[Insurance] as Insurance

    inner join MyCompany_Info.dbo.Echo as MYCO

    on Insurance.[PCP NPI] = MYCO.NPI
    where
    Insurance.[Sub Product Line] like '%Medicare%'
    or Insurance.[Sub Product Line] like '%medicaid%'

    or [Sub Product Line] like '%HMO'
    or [Sub Product Line] like '%PPO'
    or [Sub Product Line] like '%EPO'
    or [Sub Product Line] like '%POS'

    Group by MYCO.Region, Insurance.[Sub Product Line]
    order by MYCO.Region, Insurance.[Sub Product Line]

    -Mark
    MSSQL 2019 Standard, Azure Hosted. Techie/Sysadmin by trade; Three years as a "DBA" now.

  • You need to replace your COUNTIF with a COUNT(CASE .... WHEN ...) 

    For example,  COUNT(CASE WHEN [Sub Product Line] LIKE '%HMO' THEN 1 ELSE NULL END).

    While the ELSE clause is optional here, I'm including it to emphasize that you want to return NULL for anything that you don't want to count.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • CASE is definitely my favorite TSQL thingy!! 😎

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • drew.allen - Wednesday, February 14, 2018 10:18 AM

    You need to replace your COUNTIF with a COUNT(CASE .... WHEN ...) 

    For example,  COUNT(CASE WHEN [Sub Product Line] LIKE '%HMO' THEN 1 ELSE NULL END).

    While the ELSE clause is optional here, I'm including it to emphasize that you want to return NULL for anything that you don't want to count.

    Drew

    This got me going in the right direction along with this article and the example in the Remarks section.  I forgot to mention that I had tried the CASE function but I was missing the "THEN 1 / ELSE 0/Null" (I used '0' for my use case) in parts of the statement which caused them to fail.  With your tips I got off on the right track and had better keywords to do more search by.  Once I polish up the case statements I'll post back the results. 🙂

    I need to add more case statements because some records have both "Medicare" and "PPO" in the same data row causing double counts.  I just need the recipient of this report to tell me how they want those handled.  My guess is the Excel sheet was also counting them twice, or not accurately.

    EDIT: Below is my updated SUM statements with CASE.  I learned from this that the order of the 'WHEN' lines makes a difference.  I originally had the first two WHEN statements for the first SUM function 4th and 5th and I was still getting double counts.  I moved them to be first and second conditions(?) and now the counts are what they should be.  Thank you again for the nudges!


       SUM (Case
                when [Sub Product Line] like 'M%PPO%' Then 0
                when [Sub Product Line] like 'P%HMO%' Then 0
                when [Sub Product Line] like '%PPO%' Then 1
                when [Sub Product Line] like '%HMO%' Then 1
                when [Sub Product Line] like '%POS%' Then 1
                when [Sub Product Line] like '%EPO%' Then 1
            Else '0'
            End) as [Commercial Total],
        SUM (Case
            when [Sub Product Line] like '%Medicare%' then 1
            when [Sub Product Line] = '%Medicaid Dual Eligible%' then 1
            Else '0'
            End) as [Medicare Total],
        SUM (Case
            when [Sub Product Line] like '%Medicaid%' Then 1
            when [Sub Product Line] like '%Medicaid Dual Eligible%' then 0
            Else '0'
            End) as [Medicaid Total]

    -Mark
    MSSQL 2019 Standard, Azure Hosted. Techie/Sysadmin by trade; Three years as a "DBA" now.

  • CASE is one of the very few SQL Server things that can short-circuit like this. I note that I don't think this behavior is GUARANTEED in all cases though, although it might be.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • It is guaranteed.  From CASE (Transact-SQL)

    The CASE statement evaluates its conditions sequentially and stops with the first condition whose condition is satisfied.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • If you have a query already that returns the region and product line, the you could try the query below using sql servers OVER clause to partition the sums by region. Just replace my values table with your actual table joins.
    Also, this way the order of case statements doesn't matter, b/c the -1 for 'M%POS%' cancels out the 1 for '%POS%'.


    select distinct
        x.region
        ,sum(case when x.productline like '%HMO%' then 1 else 0 end
            + case when x.productline like '%POS%' then 1 else 0 end
            + case when x.productline like '%PPO%' then 1 else 0 end
            + case when x.productline like '%EPO%' then 1 else 0 end
            + case when x.productline like 'M%POS%' then -1 else 0 end) over (partition by x.region) Commercial_Total
        ,sum(case when x.productline like '%Medicare%' then 1 else 0 end) over (partition by x.region) Medicare_Total
        ,sum(case when x.productline like '%Medicaid%' then 1 else 0 end) over (partition by x.region) Medicaid_Total
    from ( values
            ('1','Medicare')
            ,('1','PPO')
            ,('1','Medicaid')
            ,('4','HMO')
            ,('1','PPO')
            ,('2','Medicaid')
            ,('3','Medicare')
            ,('2','HMO')
            ,('2','EPO')
            ,('1','Medicare')
            ,('2','POS')
            ,('2','Medicaid')
            ,('3','HMO')
            ,('2','HMO')
            ,('3','Medicaid')
            ,('1','EPO')
            ,('3','POS')
            ,('3','M1POS')
            ,('3','Medicaid')
            ,('1','Medicaid Dual Eligible')
            ,('4','Medicare')
            ,('1','POS')
            ,('4','Medicare')
            ,('1','Medicaid')
            ,('4','EPO')
            ,('5','Medicaid')
            ,('5','Medicaid Dual Eligible')
            ,('5','Medicaid')
    ) x(region,productline)

    that would return results like this:

  • usererror - Wednesday, February 14, 2018 11:55 AM

    drew.allen - Wednesday, February 14, 2018 10:18 AM

    You need to replace your COUNTIF with a COUNT(CASE .... WHEN ...) 

    For example,  COUNT(CASE WHEN [Sub Product Line] LIKE '%HMO' THEN 1 ELSE NULL END).

    While the ELSE clause is optional here, I'm including it to emphasize that you want to return NULL for anything that you don't want to count.

    Drew

    This got me going in the right direction along with this article and the example in the Remarks section.  I forgot to mention that I had tried the CASE function but I was missing the "THEN 1 / ELSE 0/Null" (I used '0' for my use case) in parts of the statement which caused them to fail.  With your tips I got off on the right track and had better keywords to do more search by.  Once I polish up the case statements I'll post back the results. 🙂

    I need to add more case statements because some records have both "Medicare" and "PPO" in the same data row causing double counts.  I just need the recipient of this report to tell me how they want those handled.  My guess is the Excel sheet was also counting them twice, or not accurately.

    EDIT: Below is my updated SUM statements with CASE.  I learned from this that the order of the 'WHEN' lines makes a difference.  I originally had the first two WHEN statements for the first SUM function 4th and 5th and I was still getting double counts.  I moved them to be first and second conditions(?) and now the counts are what they should be.  Thank you again for the nudges!


       SUM (Case
                when [Sub Product Line] like 'M%PPO%' Then 0
                when [Sub Product Line] like 'P%HMO%' Then 0
                when [Sub Product Line] like '%PPO%' Then 1
                when [Sub Product Line] like '%HMO%' Then 1
                when [Sub Product Line] like '%POS%' Then 1
                when [Sub Product Line] like '%EPO%' Then 1
            Else '0'
            End) as [Commercial Total],
        SUM (Case
            when [Sub Product Line] like '%Medicare%' then 1
            when [Sub Product Line] = '%Medicaid Dual Eligible%' then 1
            Else '0'
            End) as [Medicare Total],
        SUM (Case
            when [Sub Product Line] like '%Medicaid%' Then 1
            when [Sub Product Line] like '%Medicaid Dual Eligible%' then 0
            Else '0'
            End) as [Medicaid Total]

    You need to reverse the WHEN's in your last sum.  Currently the first condition is a superset of the second, so all Medicaid including Dual will be caught in the first WHEN.

  • gvoshol 73146 - Thursday, February 15, 2018 5:20 AM

    You need to reverse the WHEN's in your last sum.  Currently the first condition is a superset of the second, so all Medicaid including Dual will be caught in the first WHEN.

    Thank you for catching that.  Just fixed it!

    -Mark
    MSSQL 2019 Standard, Azure Hosted. Techie/Sysadmin by trade; Three years as a "DBA" now.

  • Jackie Lowery - Wednesday, February 14, 2018 6:29 PM

    If you have a query already that returns the region and product line, the you could try the query below using sql servers OVER clause to partition the sums by region. Just replace my values table with your actual table joins.
    Also, this way the order of case statements doesn't matter, b/c the -1 for 'M%POS%' cancels out the 1 for '%POS%'.


    select distinct
        x.region
        ,sum(case when x.productline like '%HMO%' then 1 else 0 end
            + case when x.productline like '%POS%' then 1 else 0 end
            + case when x.productline like '%PPO%' then 1 else 0 end
            + case when x.productline like '%EPO%' then 1 else 0 end
            + case when x.productline like 'M%POS%' then -1 else 0 end) over (partition by x.region) Commercial_Total
        ,sum(case when x.productline like '%Medicare%' then 1 else 0 end) over (partition by x.region) Medicare_Total
        ,sum(case when x.productline like '%Medicaid%' then 1 else 0 end) over (partition by x.region) Medicaid_Total
    from ( values
            ('1','Medicare')
            ,('1','PPO')
            ,('1','Medicaid')
            ,('4','HMO')
            ,('1','PPO')
            ,('2','Medicaid')
            ,('3','Medicare')
            ,('2','HMO')
            ,('2','EPO')
            ,('1','Medicare')
            ,('2','POS')
            ,('2','Medicaid')
            ,('3','HMO')
            ,('2','HMO')
            ,('3','Medicaid')
            ,('1','EPO')
            ,('3','POS')
            ,('3','M1POS')
            ,('3','Medicaid')
            ,('1','Medicaid Dual Eligible')
            ,('4','Medicare')
            ,('1','POS')
            ,('4','Medicare')
            ,('1','Medicaid')
            ,('4','EPO')
            ,('5','Medicaid')
            ,('5','Medicaid Dual Eligible')
            ,('5','Medicaid')
    ) x(region,productline)

    that would return results like this:

    The whole purpose of the windowed functions is to include both detail and summary information in the same output.  You are not including any detail information, so there is really nothing to gain by using a windowed function here.  The fact that you needed to include a DISTINCT should be a red flag that this is not the right approach.

    I ran a test comparing the windowed functions with the standard GROUP BY.  Here are the results:

    Windowed functions approach
    Table 'Worktable'. Scan count 3, logical reads 77, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#regions____00000000C37E'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Simple GROUP BY
    Table '#regions____00000000C37E'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Notice that the windowed functions approach requires a 'Worktable' and has to scan that worktable three times (once for each windowed function).

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • @Jackie Lowery - At this point, I don't know the OVER command, yet.  My full query with the updated CASE statements is below.

    @drew Allen - Also I am not familiar with Windowed Functions, yet. I've seen them mentioned here in posts, but don't fully understand them yet either.

    SELECT
      MYCO.Region as [Region],
        SUM (Case
                when [Sub Product Line] like 'M%PPO%' Then 0
                when [Sub Product Line] like 'P%HMO%' Then 0
                when [Sub Product Line] like '%PPO%' Then 1
                when [Sub Product Line] like '%HMO%' Then 1
                when [Sub Product Line] like '%POS%' Then 1
                when [Sub Product Line] like '%EPO%' Then 1
            Else '0'
            End) as [Commercial Total],
        SUM (Case
                when [Sub Product Line] = '%Medicaid Dual Eligible%' then 1
                when [Sub Product Line] like '%Medicare%' then 1
            Else '0'
            End) as [Medicare Total],
        SUM (Case
                when [Sub Product Line] like '%Medicaid Dual Eligible%' then 0
                when [Sub Product Line] like '%Medicaid%' Then 1
            Else '0'
            End) as [Medicaid Total]

    FROM [CoveredLives].[dbo].[InsuranceCO] as Insurance

    inner join MYCompany_Info.dbo.Echo as MYCO on Insurance.[PCP NPI] = MYCO.NPI

    Group by MYCO.Region
    order by MYCO.Region

    -Mark
    MSSQL 2019 Standard, Azure Hosted. Techie/Sysadmin by trade; Three years as a "DBA" now.

  • @drew Allen - Thanks for that info. Still a bit of a noob to SQL query optimization.

    @mark-3 Tarquini - I think i would still include each of your cases in a separate statement and add them together to make the logic a bit simpler and not have to keep them in any particular order.  The negative values act sort of as a NOT operator.


    SUM (
      CASE when [Sub Product Line] like '%PPO%' Then 1 ELSE 0 END
      + CASE when [Sub Product Line] like '%HMO%' Then 1 ELSE 0 END
      + CASE when [Sub Product Line] like '%POS%' Then 1 ELSE 0 END
      + CASE when [Sub Product Line] like '%EPO%' Then 1 ELSE 0 END
      + Case when [Sub Product Line] like 'M%PPO%' Then -1 ELSE 0 END
      + CASE when [Sub Product Line] like 'P%HMO%' Then -1 ELSE 0 END
    ) as [Commercial Total],
    SUM (
        Case when [Sub Product Line] = '%Medicaid Dual Eligible%' then 1 else 0 end
      + Case when [Sub Product Line] like '%Medicare%' then 1 else 0 end
    ) as [Medicare Total],
    SUM (
       Case when [Sub Product Line] like '%Medicaid%' Then 1 else 0 end
       + Case when [Sub Product Line] like '%Medicaid Dual Eligible%' then -1 else 0 end
    ) as [Medicaid Total]

  • usererror - Thursday, February 15, 2018 8:15 AM

    @Jackie Lowery - At this point, I don't know the OVER command, yet.  My full query with the updated CASE statements is below.

    @drew Allen - Also I am not familiar with Windowed Functions, yet. I've seen them mentioned here in posts, but don't fully understand them yet either.

    SELECT
      MYCO.Region as [Region],
        SUM (Case
                when [Sub Product Line] like 'M%PPO%' Then 0
                when [Sub Product Line] like 'P%HMO%' Then 0
                when [Sub Product Line] like '%PPO%' Then 1
                when [Sub Product Line] like '%HMO%' Then 1
                when [Sub Product Line] like '%POS%' Then 1
                when [Sub Product Line] like '%EPO%' Then 1
            Else '0'
            End) as [Commercial Total],
        SUM (Case
                when [Sub Product Line] = '%Medicaid Dual Eligible%' then 1
                when [Sub Product Line] like '%Medicare%' then 1
            Else '0'
            End) as [Medicare Total],
        SUM (Case
                when [Sub Product Line] like '%Medicaid Dual Eligible%' then 0
                when [Sub Product Line] like '%Medicaid%' Then 1
            Else '0'
            End) as [Medicaid Total]

    FROM [CoveredLives].[dbo].[InsuranceCO] as Insurance

    inner join MYCompany_Info.dbo.Echo as MYCO on Insurance.[PCP NPI] = MYCO.NPI

    Group by MYCO.Region
    order by MYCO.Region

    Windowed functions are called that, because they return information based on a "window" on the current set.  The window is defined by the OVER clause and it can include a partition, order, and frame.  Which pieces are required and which are optional depend on the function.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen - Thursday, February 15, 2018 8:01 AM

    Jackie Lowery - Wednesday, February 14, 2018 6:29 PM

    If you have a query already that returns the region and product line, the you could try the query below using sql servers OVER clause to partition the sums by region. Just replace my values table with your actual table joins.
    Also, this way the order of case statements doesn't matter, b/c the -1 for 'M%POS%' cancels out the 1 for '%POS%'.


    select distinct
        x.region
        ,sum(case when x.productline like '%HMO%' then 1 else 0 end
            + case when x.productline like '%POS%' then 1 else 0 end
            + case when x.productline like '%PPO%' then 1 else 0 end
            + case when x.productline like '%EPO%' then 1 else 0 end
            + case when x.productline like 'M%POS%' then -1 else 0 end) over (partition by x.region) Commercial_Total
        ,sum(case when x.productline like '%Medicare%' then 1 else 0 end) over (partition by x.region) Medicare_Total
        ,sum(case when x.productline like '%Medicaid%' then 1 else 0 end) over (partition by x.region) Medicaid_Total
    from ( values
            ('1','Medicare')
            ,('1','PPO')
            ,('1','Medicaid')
            ,('4','HMO')
            ,('1','PPO')
            ,('2','Medicaid')
            ,('3','Medicare')
            ,('2','HMO')
            ,('2','EPO')
            ,('1','Medicare')
            ,('2','POS')
            ,('2','Medicaid')
            ,('3','HMO')
            ,('2','HMO')
            ,('3','Medicaid')
            ,('1','EPO')
            ,('3','POS')
            ,('3','M1POS')
            ,('3','Medicaid')
            ,('1','Medicaid Dual Eligible')
            ,('4','Medicare')
            ,('1','POS')
            ,('4','Medicare')
            ,('1','Medicaid')
            ,('4','EPO')
            ,('5','Medicaid')
            ,('5','Medicaid Dual Eligible')
            ,('5','Medicaid')
    ) x(region,productline)

    that would return results like this:

    The whole purpose of the windowed functions is to include both detail and summary information in the same output.  You are not including any detail information, so there is really nothing to gain by using a windowed function here.  The fact that you needed to include a DISTINCT should be a red flag that this is not the right approach.

    I ran a test comparing the windowed functions with the standard GROUP BY.  Here are the results:

    Windowed functions approach
    Table 'Worktable'. Scan count 3, logical reads 77, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#regions____00000000C37E'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Simple GROUP BY
    Table '#regions____00000000C37E'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Notice that the windowed functions approach requires a 'Worktable' and has to scan that worktable three times (once for each windowed function).

    Drew

    Did your test override the default (and HORRIBLY-performing) RANGE and replace it with ROWS?  As you note the spool crushed the query.
    https://www.red-gate.com/simple-talk/sql/learn-sql-server/window-functions-in-sql-server-part-3-questions-of-performance/

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru - Thursday, February 15, 2018 11:10 AM

    drew.allen - Thursday, February 15, 2018 8:01 AM

    Jackie Lowery - Wednesday, February 14, 2018 6:29 PM

    If you have a query already that returns the region and product line, the you could try the query below using sql servers OVER clause to partition the sums by region. Just replace my values table with your actual table joins.
    Also, this way the order of case statements doesn't matter, b/c the -1 for 'M%POS%' cancels out the 1 for '%POS%'.


    select distinct
        x.region
        ,sum(case when x.productline like '%HMO%' then 1 else 0 end
            + case when x.productline like '%POS%' then 1 else 0 end
            + case when x.productline like '%PPO%' then 1 else 0 end
            + case when x.productline like '%EPO%' then 1 else 0 end
            + case when x.productline like 'M%POS%' then -1 else 0 end) over (partition by x.region) Commercial_Total
        ,sum(case when x.productline like '%Medicare%' then 1 else 0 end) over (partition by x.region) Medicare_Total
        ,sum(case when x.productline like '%Medicaid%' then 1 else 0 end) over (partition by x.region) Medicaid_Total
    from ( values
            ('1','Medicare')
            ,('1','PPO')
            ,('1','Medicaid')
            ,('4','HMO')
            ,('1','PPO')
            ,('2','Medicaid')
            ,('3','Medicare')
            ,('2','HMO')
            ,('2','EPO')
            ,('1','Medicare')
            ,('2','POS')
            ,('2','Medicaid')
            ,('3','HMO')
            ,('2','HMO')
            ,('3','Medicaid')
            ,('1','EPO')
            ,('3','POS')
            ,('3','M1POS')
            ,('3','Medicaid')
            ,('1','Medicaid Dual Eligible')
            ,('4','Medicare')
            ,('1','POS')
            ,('4','Medicare')
            ,('1','Medicaid')
            ,('4','EPO')
            ,('5','Medicaid')
            ,('5','Medicaid Dual Eligible')
            ,('5','Medicaid')
    ) x(region,productline)

    that would return results like this:

    The whole purpose of the windowed functions is to include both detail and summary information in the same output.  You are not including any detail information, so there is really nothing to gain by using a windowed function here.  The fact that you needed to include a DISTINCT should be a red flag that this is not the right approach.

    I ran a test comparing the windowed functions with the standard GROUP BY.  Here are the results:

    Windowed functions approach
    Table 'Worktable'. Scan count 3, logical reads 77, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#regions____00000000C37E'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Simple GROUP BY
    Table '#regions____00000000C37E'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Notice that the windowed functions approach requires a 'Worktable' and has to scan that worktable three times (once for each windowed function).

    Drew

    Did your test override the default (and HORRIBLY-performing) RANGE and replace it with ROWS?  As you note the spool crushed the query.
    https://www.red-gate.com/simple-talk/sql/learn-sql-server/window-functions-in-sql-server-part-3-questions-of-performance/

    A frame requires an ORDER BY clause to determine which rows are preceding/following.  Since there was no ORDER BY specified, it uses the whole (sub)set and you CANNOT add a frame.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply