Finding the MIN of a group of values not used in a previous group

Question

Chris Souchik

SSC-Addicted

Points: 419
November 1, 2018 at 12:13 pm

#409244

I have been having trouble finding anything on Google because I'm not sure how to phrase the search. What I need to do is select the MIN R from each Id group that hasn't already been selected by a previous group. This is the trimmed-down version, with an example of a row that has no R value.
I have a simplified table:
CREATE TABLE Data(Id int, R int);
INSERT INTO Data VALUES (1,1),(2,1),(3,2),(3,3),(4,2),(4,3),(4,4),(5,4),(5,5),(6,5),(6,6),(7,6);

I think I should be able to do it using a tally table and window functions, but the grouping is giving me a headache. I want the min in each group that has not been previously used. I know I can create a cursor to do what I want, but I think there should be a way to do this set-based. Any help is appreciated.

Lynn Pettis SSC Guru Points: 442467 · Answer 1

So expected results based on the above data.

ScottPletcher SSC Guru Points: 100948 · Answer 2

This may not be the most efficient, but I believe it works, and it shouldn't be too much overhead anyway unless you have lots of unique Ids:

;WITH cte_group AS (
    SELECT Id, MIN(R) AS R
    FROM Data
    GROUP BY Id
)
SELECT cg.Id, cg_unique.R
FROM cte_group cg
LEFT OUTER JOIN (
    SELECT *, ROW_NUMBER() OVER(PARTITION BY R ORDER BY Id) AS row_num
    FROM cte_group
) AS cg_unique ON cg_unique.row_num = 1 AND cg_unique.Id = cg.Id

Chris Souchik SSC-Addicted Points: 419 · Answer 3

November 1, 2018 at 12:24 pm

#2012096

Correct

Chris Souchik SSC-Addicted Points: 419 · Answer 4

Scott,

Very close, but I am seeing this:

Id 4 should not be NULL; it should be 3. Thanks.

andycadley SSCertifiable Points: 5296 · Answer 5

I think this will do it:

with X as (
    select id, max(R) over (order by id, R) as maxR
    from #Data
),
Y as (
    select id, min(maxR) as R
    from X
    group by id
)
select id, nullif(R, lag(R,1) over (order by id))
from Y

Jason A. Long SSC-Insane Points: 23780 · Answer 6

My entry...

IF OBJECT_ID('tempdb..#TestData', 'U') IS NOT NULL
BEGIN DROP TABLE #TestData; END;

CREATE TABLE #TestData (
    Id INT NOT NULL,
    R INT NOT NULL
);
INSERT #TestData(Id, R) VALUES
    (1,1),(2,1),(3,2),(3,3),(4,2),(4,3),(4,4),(5,4),(5,5),(6,5),(6,6),(7,6);

--================================================================================

SELECT
    x1.Id,
    R = CASE
            WHEN x1.R > ISNULL(MAX(x1.R) OVER (ORDER BY x1.R, x1.Id ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0)
            THEN x1.R
        END
FROM (
    SELECT
        td.Id,
        td.R,
        r1 = ISNULL(MIN(td.R) OVER (PARTITION BY td.Id ORDER BY td.Id, td.R ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), td.R),
        r2 = ISNULL(MAX(td.R) OVER (ORDER BY td.Id, td.R ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0)
    FROM #TestData td
) x1
WHERE x1.R = CASE WHEN x1.r1 >= x1.r2 THEN x1.r1 ELSE x1.r2 END;

Results...
Id          R
----------- -----------
1           1
2           NULL
3           2
4           3
5           4
6           5
7           6

Jason A. Long SSC-Insane Points: 23780 · Answer 7

andycadley - Thursday, November 1, 2018 1:14 PM
I think this will do it:
with X as (
    select id, max(R) over (order by id, R) as maxR
    from #Data
),
Y as (
    select id, min(maxR) as R
    from X
    group by id
)
select id, nullif(R, lag(R,1) over (order by id))
from Y

Very nice!

Chris Souchik SSC-Addicted Points: 419 · Answer 8

Andy and Jason,

Thanks for your solution. It works great on this data set, but unfortunately it doesn't work for all data sets. For example,

I'm starting to think it may be easier to use a cursor, even if it will be slower. The recursive nature of the query may not allow a set-based solution; at least, it has me stumped.
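
The example that followed "For example," has not survived in this copy of the thread. As an illustration only, and not necessarily the case Chris had in mind, the second data set that george_at_sql and sgmunson use later in the thread shows the kind of input where the running-MAX approach and the "smallest R not yet taken" rule disagree. The table variable @Data is a name made up for this sketch.

```sql
-- Illustration only: data set borrowed from later posts in this thread.
DECLARE @Data TABLE (Id int NOT NULL, R int NOT NULL);
INSERT INTO @Data (Id, R) VALUES (1,1),(2,1),(2,2),(2,3),(3,3);

-- andycadley's running-MAX query, unchanged apart from the table name
-- and an alias on the output column.
WITH X AS (
    SELECT Id, MAX(R) OVER (ORDER BY Id, R) AS maxR
    FROM @Data
),
Y AS (
    SELECT Id, MIN(maxR) AS R
    FROM X
    GROUP BY Id
)
SELECT Id, NULLIF(R, LAG(R, 1) OVER (ORDER BY Id)) AS R
FROM Y
ORDER BY Id;
-- Returns (1, 1), (2, NULL), (3, 3).
-- Taking the smallest unused R per Id in Id order would instead give
-- (1, 1), (2, 2), (3, 3): Id 2 loses R = 1 to Id 1 but still has R = 2 free.
```

The running MAX only sees the values in the table, not which of them an earlier Id actually ended up taking, which is why Id 2 collapses to NULL here.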

george_at_sql SSCommitted Points: 1967 · Answer 9

I also think it would need a recursive solution, since a LEAD/LAG-based solution looks at the values as they were when the query was kicked off and isn't influenced by values that change as the query progresses.

I could do this using a recursive CTE as follows:

CREATE TABLE Data(Id int, R int);
INSERT INTO Data VALUES (1,1);
INSERT INTO Data VALUES (2,1);
INSERT INTO Data VALUES (3,2);
INSERT INTO Data VALUES (3,3);
INSERT INTO Data VALUES (4,2);
INSERT INTO Data VALUES (4,3);
INSERT INTO Data VALUES (4,4);
INSERT INTO Data VALUES (5,4);
INSERT INTO Data VALUES (5,5);
INSERT INTO Data VALUES (6,5);
INSERT INTO Data VALUES (6,6);
INSERT INTO Data VALUES (7,6);

/* truncate table data

INSERT INTO Data VALUES (1,1);
INSERT INTO Data VALUES (2,1);
INSERT INTO Data VALUES (2,2);
INSERT INTO Data VALUES (2,3);
INSERT INTO Data VALUES (3,3);*/

with data1 as (
    /*This block only to generate row_numbers based on the records in the table..*/
    select id
          ,r
          ,row_number() over(order by id,r) as rnk
    from data
)
,cte(id,r,rnk,grp_flag,concat_val) as (
    select a.id,a.r,a.rnk,1 as grp_flag,cast(a.r as varchar2(100)) as concat_val
    from data1 a
    where rnk=1
    /*Start with first record and construct the concat_val only if the "r" is not previously used and "r" value is not already populated for the id*/
    union all
    select b.id,b.r,b.rnk
          ,case when a.id=b.id and a.grp_flag=1 then 0
                else case when a.concat_val like '%'||b.r||'%' then 0
                          else 1
                     end
           end as grp_flag
          ,case when a.id=b.id and a.grp_flag=1 then a.concat_val
                else case when a.concat_val like '%'||b.r||'%' then a.concat_val
                          else b.r||'/'||a.concat_val
                     end
           end as concat_val
    from cte a
    join data1 b
      on a.rnk+1=b.rnk
)
select x.id,y.r
from (
    select distinct id
    from data1
) x
left join cte y
  on x.id=y.id
 and y.grp_flag=1
order by x.id

george_at_sql SSCommitted Points: 1967 · Answer 10

November 4, 2018 at 10:10 pm

#2012364

Did the above query work out?

Chris Souchik SSC-Addicted Points: 419 · Answer 11

George,

I will try and test this today. At a quick glance, it looks promising.

Thanks,
Chris

Chris Souchik SSC-Addicted Points: 419 · Answer 12

Unfortunately really busy week. I haven't had a chance to review yet. Will update when I get a chance. Thanks

sgmunson SSC Guru Points: 110639 · Answer 13

Here's an actual cursor. I thought about this one for a long time, and realized I could not use a recursive CTE and still use any form of aggregate, or a 2nd reference to the CTE itself. That was really crimping my style, so I gave up and went this way instead. Run times were about 0.01563 seconds, +/-, before caching took over and both queries would run instantaneously. Fortunately, it works for both sets of input data:

SET NOCOUNT ON;
IF OBJECT_ID(N'tempdb..#Data', N'U') IS NOT NULL
    BEGIN
    DROP TABLE #Data;
    END;

IF OBJECT_ID(N'tempdb..#Results', N'U') IS NOT NULL
    BEGIN
    DROP TABLE #Results;
    END;

CREATE TABLE #Data(
    Id int,
    R int
);
INSERT INTO #Data (Id, R)
VALUES (1,1), (2,1), (3,2), (3,3), (4,2), (4,3), (4,4), (5,4), (5,5), (6,5), (6,6), (7,6);

CREATE TABLE #Results (
    Id int NOT NULL PRIMARY KEY CLUSTERED,
    MIN_R int NULL
);

DECLARE @Id AS int, @r AS int, @First AS int = 1,
        @BegDateTime AS datetime2 = sysdatetime(), @EndDateTime AS datetime2;

DECLARE MYCURSOR CURSOR FAST_FORWARD READ_ONLY FOR
SELECT D.Id, MIN(D.R) AS R
FROM #Data AS D
GROUP BY D.Id;

OPEN MYCURSOR;
FETCH NEXT FROM MYCURSOR INTO @Id, @r;

WHILE @@FETCH_STATUS = 0
    BEGIN;
    IF @First = 1
        BEGIN;
        BEGIN TRAN;
        INSERT INTO #Results (Id, MIN_R) VALUES (@Id, @r);
        COMMIT TRAN;
        END;
    ELSE
        BEGIN;
        BEGIN TRAN;
        INSERT INTO #Results (Id, MIN_R)
        VALUES (@Id,
            CASE
                WHEN (
                        SELECT MIN(D.R)
                        FROM #Data AS D
                        WHERE D.R NOT IN (SELECT R.MIN_R FROM #Results AS R WHERE R.MIN_R IS NOT NULL AND R.Id < @Id)
                            AND D.Id = @Id
                     ) IS NULL
                    AND (SELECT COUNT(1) FROM #Data AS D WHERE D.Id = @Id) = 1
                    THEN NULL
                ELSE (
                        SELECT MIN(D.R)
                        FROM #Data AS D
                        WHERE D.R NOT IN (SELECT R.MIN_R FROM #Results AS R WHERE R.MIN_R IS NOT NULL AND R.Id < @Id)
                            AND D.Id = @Id
                     )
            END)
        COMMIT TRAN;
        END;

    SET @First = @First + 1;
    FETCH NEXT FROM MYCURSOR INTO @Id, @r;
    END;

CLOSE MYCURSOR;
DEALLOCATE MYCURSOR;

SET @EndDateTime = sysdatetime();
PRINT 'Total Run Time, Query Only: '
    + CONVERT(varchar(11), CONVERT(decimal(10,9), DATEDIFF(microsecond, @BegDateTime, @EndDateTime)/1E6))
    + ' Seconds';

SELECT *
FROM #Data
ORDER BY Id;

SELECT *
FROM #Results AS R
ORDER BY R.Id;
GO

SET NOCOUNT ON;
IF OBJECT_ID(N'tempdb..#Data', N'U') IS NOT NULL
    BEGIN
    DROP TABLE #Data;
    END;

IF OBJECT_ID(N'tempdb..#Results', N'U') IS NOT NULL
    BEGIN
    DROP TABLE #Results;
    END;

CREATE TABLE #Data(
    Id int,
    R int
);
INSERT INTO #Data (Id, R)
VALUES (1,1), (2,1), (2,2), (2,3), (3,3);

CREATE TABLE #Results (
    Id int NOT NULL PRIMARY KEY CLUSTERED,
    MIN_R int NULL
);

DECLARE @Id AS int, @r AS int, @First AS int = 1,
        @BegDateTime AS datetime2 = sysdatetime(), @EndDateTime AS datetime2;

DECLARE MYCURSOR CURSOR FAST_FORWARD READ_ONLY FOR
SELECT D.Id, MIN(D.R) AS R
FROM #Data AS D
GROUP BY D.Id;

OPEN MYCURSOR;
FETCH NEXT FROM MYCURSOR INTO @Id, @r;

WHILE @@FETCH_STATUS = 0
    BEGIN;
    IF @First = 1
        BEGIN;
        BEGIN TRAN;
        INSERT INTO #Results (Id, MIN_R) VALUES (@Id, @r);
        COMMIT TRAN;
        END;
    ELSE
        BEGIN;
        BEGIN TRAN;
        INSERT INTO #Results (Id, MIN_R)
        VALUES (@Id,
            CASE
                WHEN (
                        SELECT MIN(D.R)
                        FROM #Data AS D
                        WHERE D.R NOT IN (SELECT R.MIN_R FROM #Results AS R WHERE R.MIN_R IS NOT NULL AND R.Id < @Id)
                            AND D.Id = @Id
                     ) IS NULL
                    --AND (SELECT COUNT(1) FROM #Data AS D WHERE D.Id = @Id) = 1
                    THEN NULL
                ELSE (
                        SELECT MIN(D.R)
                        FROM #Data AS D
                        WHERE D.R NOT IN (SELECT R.MIN_R FROM #Results AS R WHERE R.MIN_R IS NOT NULL AND R.Id < @Id)
                            AND D.Id = @Id
                     )
            END)
        COMMIT TRAN;
        END;

    SET @First = @First + 1;
    FETCH NEXT FROM MYCURSOR INTO @Id, @r;
    END;

CLOSE MYCURSOR;
DEALLOCATE MYCURSOR;

SET @EndDateTime = sysdatetime();
PRINT 'Total Run Time, Query Only: '
    + CONVERT(varchar(11), CONVERT(decimal(10,9), DATEDIFF(microsecond, @BegDateTime, @EndDateTime)/1E6))
    + ' Seconds';

SELECT *
FROM #Data
ORDER BY Id;

SELECT *
FROM #Results AS R
ORDER BY R.Id;

Steve (aka sgmunson) 🙂 🙂 🙂
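
For comparison with the cursor above, the same pass over the Ids can be sketched with a plain WHILE loop. This is a minimal sketch rather than a tuned or definitive solution. It assumes that "previous group" means a lower Id, and that an R counts as used only when an earlier Id actually selected it. The table variables @Data and @Picked are names invented for this example.

```sql
-- Sketch only: @Data and @Picked are hypothetical names, not from the thread.
DECLARE @Data TABLE (Id int NOT NULL, R int NOT NULL);
INSERT INTO @Data (Id, R)
VALUES (1,1),(2,1),(3,2),(3,3),(4,2),(4,3),(4,4),(5,4),(5,5),(6,5),(6,6),(7,6);

-- One row per Id; R stays NULL when every R for that Id is already taken.
DECLARE @Picked TABLE (Id int NOT NULL PRIMARY KEY, R int NULL);

DECLARE @Id int = (SELECT MIN(Id) FROM @Data);

WHILE @Id IS NOT NULL
BEGIN
    -- MIN without GROUP BY always returns exactly one row,
    -- so an Id with nothing left still gets a row with R = NULL.
    INSERT INTO @Picked (Id, R)
    SELECT @Id, MIN(d.R)
    FROM @Data AS d
    WHERE d.Id = @Id
      AND NOT EXISTS (SELECT 1 FROM @Picked AS p WHERE p.R = d.R);

    -- Move on to the next Id in ascending order (NULL ends the loop).
    SET @Id = (SELECT MIN(Id) FROM @Data WHERE Id > @Id);
END;

SELECT Id, R
FROM @Picked
ORDER BY Id;
-- For the data above: (1,1), (2,NULL), (3,2), (4,3), (5,4), (6,5), (7,6).
-- With (1,1),(2,1),(2,2),(2,3),(3,3) instead: (1,1), (2,2), (3,3).
```

The loop still runs one statement per Id, so it is row-by-row processing just like the cursor. It only trims the bookkeeping, and it has not been timed against either version above.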