Windowing function puzzle

  • I have a complex query that produces data for multiple sets.  I need to ensure that for each set of criteria that I get data for each set.  The following query works, but I'd rather use a windowing function to avoid the double execution of the complex query:

    select a.*

    from dbo.ComplexQuery a

    inner join (select ColumnA, ColumnB, ColumnC, count(distinct ColumnD) as ColumnDCount

    from dbo.ComplexQuery

    group by ColumnA, ColumnB, ColumnC

    ) b

    on b.ColumnA = a.ColumnA

    and b.ColumnB = a.ColumnB

    and b.ColumnC = a.ColumnC

    and b.ColumnDCount = 2

    order by a.ColumnA, a.ColumnB, a.ColumnC, a.ColumnD

    This is about as simple of a query that I could produce which replicates the scenario I have.  To complicate things, it's possible to have multiple records of each ColumnD value with some in one group, some in the other, but we're interested in those in both groups only.  I'll have to work on some sample data, but I was hoping in the meantime someone had an idea.

  • You've basically hand-coded a DENSE_RANK() ... WHERE MAX(<DENSE_RANK() value>) = 2. The sample data will help roll a query to test, but you'll likely find a similar query plan with DENSE_RANK() . Window functions kick off a second trip through the data to prepare the results (for each distinct PARTITION BY clause), which are joined back to the base query results to produce the final output.

    Eddie Wuerch
    MCM: SQL

  • Why not use a windowed version of COUNT?

    SELECT a.*
    FROM dbo.ComplexQuery AS a
    WHERE COUNT(DISTINCT ColumnD) OVER(PARTITION BY ColumnA, ColumnB, ColumnC) = 2

    If it won't let you use that in the WHERE clause, you can always just SELECT it in a CTE and then use the WHERE against that new column name in the final SELECT.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Eddie Wuerch - Thursday, March 2, 2017 8:27 PM

    You've basically hand-coded a DENSE_RANK() ... WHERE MAX(<DENSE_RANK() value>) = 2. The sample data will help roll a query to test, but you'll likely find a similar query plan with DENSE_RANK() . Window functions kick off a second trip through the data to prepare the results (for each distinct PARTITION BY clause), which are joined back to the base query results to produce the final output.

    Thanks Eddie, but I already tried the DENSE_RANK() function.  Here's an example of my situation:

    ColumnAColumnBColumnCColumnDDense_RankWhat I Need
    123ABC987A12
    123ABC987B22
    456DEF654A11
    789GHI321A12
    789GHI321B22

    I hope this is clear enough.  Essentially, I need to ensure that I get both of the records as identified in ColumnD.  In the above case, I should get the records with Column A of 123 and 789 but NOT the record of 456 because it didn't have both and A and B values for ColumnD.  If it weren't for the need to return all columns of each record (I didn't display them all), I could simply deal with a GROUP BY.  Having the total row count in the column "What I Need" allows me to properly filter out incomplete sets.

  • sgmunson - Thursday, March 2, 2017 8:32 PM

    Why not use a windowed version of COUNT?

    SELECT a.*
    FROM dbo.ComplexQuery AS a
    WHERE COUNT(DISTINCT ColumnD) OVER(PARTITION BY ColumnA, ColumnB, ColumnC) = 2

    If it won't let you use that in the WHERE clause, you can always just SELECT it in a CTE and then use the WHERE against that new column name in the final SELECT.

    Unfortunately, the DISTINCT keyword is not allowed in a window function with OVER() and without OVER(), it's no longer a window function!  This would have been the ideal solution!!

  • Actually, let me clarify it a bit more and simplify the situation more succinctly:

    ColumnAColumnBOtherData1OtherData2Dense_RankWhat I Need
    123AABC98712
    123BBCD87622
    456ADEF65411
    789AGHI32112
    789BHIJ21022
    789BIJK31232

    The query that works is:
    select a.ColumnA, a.ColumnB, a.OtherData1, a.OtherData2

    from dbo.ComplexQuery a

    inner join (select ColumnA, count(distinct ColumnB) as ColumnBCount

    from dbo.ComplexQuery

    group by ColumnA

    ) b

    on b.ColumnA = a.ColumnA

    and b.ColumnBCount = 2

    order by a.ColumnA, a.ColumnB

    So, while the above works, it's not very efficient.  I had hoped to use something as suggested earlier with a COUNT(DISTINCT ColumnB) OVER(PARTITION BY ColumnA) but the DISTINCT is definitely not allowed.  Thoughts?

  • You can use


    DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB) +
    DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB DESC) - 1

    instead of 


    COUNT(DISTINCT ColumnB) OVER(PARTITION BY ColumnA)

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Window functions are only allowed in the SELECT and ORDER BY clause. So if you really wanted to use one of them then the best option would be to put it in a CTE.  If you're just looking for alternative ways to write your select stmt then here's one option:


    SELECT *
    FROM ComplexQuery
    WHERE ColumnA IN
    (
    SELECT ColumnA
    FROM ComplexQuery
    GROUP BY ColumnA
    HAVING count(distinct ColumnB) = 2
    )

  • Mark Cowne - Friday, March 3, 2017 7:03 AM

    You can use


    DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB) +
    DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB DESC) - 1

    instead of 


    COUNT(DISTINCT ColumnB) OVER(PARTITION BY ColumnA)

    This works when the situation only calls for 2 values for ColumnB (which I showed in the examples).  What about where there are 3, 4, or more?

  • Jeff Atherton - Friday, March 3, 2017 7:12 AM

    Window functions are only allowed in the SELECT and ORDER BY clause. So if you really wanted to use one of them then the best option would be to put it in a CTE.  If you're just looking for alternative ways to write your select stmt then here's one option:


    SELECT *
    FROM ComplexQuery
    WHERE ColumnA IN
    (
    SELECT ColumnA
    FROM ComplexQuery
    GROUP BY ColumnA
    HAVING count(distinct ColumnB) = 2
    )

    This would also work, but it requires me to still have the complex query in there twice.  Unfortunately, I can't make use of a CTE in my particular scenario as I only gave a simplified portion of my overall query structure.  Even then, referencing a CTE twice still causes it to execute twice.  Window functions, as I understand it, execute it once but will navigate the result set multiple times as needed.

  • Aaron N. Cutshall - Friday, March 3, 2017 8:33 AM

    Mark Cowne - Friday, March 3, 2017 7:03 AM

    You can use


    DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB) +
    DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB DESC) - 1

    instead of 


    COUNT(DISTINCT ColumnB) OVER(PARTITION BY ColumnA)

    This works when the situation only calls for 2 values for ColumnB (which I showed in the examples).  What about where there are 3, 4, or more?

    Can you post some sample data.

    This appears to work

    DECLARE @mytable TABLE(ColumnA INT, ColumnB CHAR(1))
    INSERT INTO @mytable(columnA, ColumnB)
    VALUES(123,'A'),(123,'B'),(456,'A'),(789,'A'),(789,'B'),(789,'B'),(888,'A'),(888,'B'),(888,'B'),(888,'D'),(888,'C'),(888,'C');

    SELECT ColumnA, ColumnB,
    DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB) +
    DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB DESC) - 1 AS DistinctCount
    FROM @mytable
    ORDER BY ColumnA;

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Aaron, so is this a performance issue? How much data are we talking about here? How long does the query take to run with the ComplexQuery in there twice? How often will this query need to be run everyday? Do the users need up to the minute results or can you run some ETL overnight to a new table just for reporting?

  • Mark Cowne - Friday, March 3, 2017 8:49 AM

    Can you post some sample data.

    This appears to work

    DECLARE @mytable TABLE(ColumnA INT, ColumnB CHAR(1))
    INSERT INTO @mytable(columnA, ColumnB)
    VALUES(123,'A'),(123,'B'),(456,'A'),(789,'A'),(789,'B'),(789,'B'),(888,'A'),(888,'B'),(888,'B'),(888,'D'),(888,'C'),(888,'C');

    SELECT ColumnA, ColumnB,
    DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB) +
    DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB DESC) - 1 AS DistinctCount
    FROM @mytable
    ORDER BY ColumnA;

    Mark,

    This is PERFECT!!  WOW!!  I'm going to have to study this one to make sure I truly understand it.  Here's your revised query with additional data:
    DECLARE @mytable TABLE(ColumnA INT, ColumnB CHAR(1), OtherData1 CHAR(3), OtherData2 INT)
    INSERT INTO @mytable(columnA, ColumnB, OtherData1, OtherData2)
    VALUES    (123,'A','ABC',987),
            (123,'B','BCD',876),
            (123,'C','CDE',765),
            (456,'A','DEF',654),
            (789,'A','GHI',321),
            (789,'B','HIJ',210),
            (789,'B','IJK',312),
            (258,'A','FDE',753),
            (258,'B','THE',951),
            (258,'C','PKH',873),
            (246,'A','LDJ',983),
            (246,'B','WKS',893),
            (246,'C','FUW',373),
            (246,'C','WLC',583),
            (246,'D','GKD',353);

    SELECT ColumnA, ColumnB, OtherData1, OtherData2,
    DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB) AS DenseRankOnly,
    DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB DESC) AS DenseRankDesc,
    DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB) +
    DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB DESC) - 1 AS DistinctCount
    FROM @mytable
    ORDER BY ColumnA, ColumnB;

    It certainly presents the results just as I needed it in all situations.  I'm totally impressed!!

  • Aaron N. Cutshall - Friday, March 3, 2017 10:08 AM

    Mark Cowne - Friday, March 3, 2017 8:49 AM

    Can you post some sample data.

    This appears to work

    DECLARE @mytable TABLE(ColumnA INT, ColumnB CHAR(1))
    INSERT INTO @mytable(columnA, ColumnB)
    VALUES(123,'A'),(123,'B'),(456,'A'),(789,'A'),(789,'B'),(789,'B'),(888,'A'),(888,'B'),(888,'B'),(888,'D'),(888,'C'),(888,'C');

    SELECT ColumnA, ColumnB,
    DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB) +
    DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB DESC) - 1 AS DistinctCount
    FROM @mytable
    ORDER BY ColumnA;

    Mark,

    This is PERFECT!!  WOW!!  I'm going to have to study this one to make sure I truly understand it.  Here's your revised query with additional data:
    DECLARE @mytable TABLE(ColumnA INT, ColumnB CHAR(1), OtherData1 CHAR(3), OtherData2 INT)
    INSERT INTO @mytable(columnA, ColumnB, OtherData1, OtherData2)
    VALUES    (123,'A','ABC',987),
            (123,'B','BCD',876),
            (123,'C','CDE',765),
            (456,'A','DEF',654),
            (789,'A','GHI',321),
            (789,'B','HIJ',210),
            (789,'B','IJK',312),
            (258,'A','FDE',753),
            (258,'B','THE',951),
            (258,'C','PKH',873),
            (246,'A','LDJ',983),
            (246,'B','WKS',893),
            (246,'C','FUW',373),
            (246,'C','WLC',583),
            (246,'D','GKD',353);

    SELECT ColumnA, ColumnB, OtherData1, OtherData2,
    DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB) AS DenseRankOnly,
    DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB DESC) AS DenseRankDesc,
    DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB) +
    DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB DESC) - 1 AS DistinctCount
    FROM @mytable
    ORDER BY ColumnA, ColumnB;

    It certainly presents the results just as I needed it in all situations.  I'm totally impressed!!

    You're  welcome.

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Here's an alternative. I'm not sure which would perform better as both have issues that can slow them down.

    WITH CTE AS(
      SELECT ColumnA, ColumnB, OtherData1, OtherData2,
      DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB) AS DenseRank
      FROM @mytable
    )
    SELECT ColumnA, ColumnB, OtherData1, OtherData2,
      MAX(DenseRank) OVER( PARTITION BY ColumnA) DistinctCount
    FROM CTE
    ORDER BY ColumnA, ColumnB;

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply