grouping rows into sets

Question

Post reply

grouping rows into sets

J Livingston SQL

SSC Guru

Points: 51550

August 23, 2021 at 7:11 pm

#3923262

wish to group rows into "sets" based on t_id and t_type.

-- set up, notes, and expected results below.

Thanks JLS

CREATE TABLE #l(
t_id INT NOT NULL
,t_type VARCHAR(1) NOT NULL
,t_date DATE NOT NULL
);
INSERT INTO #l(t_id,t_type,t_date) VALUES
(356778,'R','2012-01-01'),(356778,'R','2012-01-02'),(356778,'C','2015-01-07')
,(356778,'C','2015-01-08'),(356778,'C','2015-01-11'),(356778,'R','2019-01-01')
,(356778,'R','2019-01-02'),(356778,'R','2019-01-11'),(356778,'R','2019-01-12')
,(356778,'R','2020-01-01'),(435677,'R','2012-01-04'),(435677,'R','2012-01-05')
,(435677,'C','2015-01-07'),(435677,'C','2015-01-08'),(435677,'C','2015-01-09')
,(435677,'C','2015-01-10'),(435677,'C','2015-01-11'),(435677,'R','2019-01-01')
,(435677,'R','2019-01-02');

SELECT * FROM #l

-- want to incrementaly group by t_id and t_type
-- with an id for each row per incremental group

-- expected results as follows

CREATE TABLE #results(
t_id INT NOT NULL
,t_type VARCHAR(1) NOT NULL
,t_date DATE NOT NULL
,t_Group INT NOT NULL
,t_no INT NOT NULL
);
INSERT INTO #results(t_id,t_type,t_date,t_Group,t_no) VALUES
(356778,'R','2012-01-01',1,1),(356778,'R','2012-01-02',1,2),(356778,'C','2015-01-07',2,1)
,(356778,'C','2015-01-08',2,2),(356778,'C','2015-01-11',2,3),(356778,'R','2019-01-01',3,1)
,(356778,'R','2019-01-02',3,2),(356778,'R','2019-01-11',3,3),(356778,'R','2019-01-12',3,4)
,(356778,'R','2020-01-01',3,5),(435677,'R','2012-01-04',1,1),(435677,'R','2012-01-05',1,2)
,(435677,'C','2015-01-07',2,1),(435677,'C','2015-01-08',2,2),(435677,'C','2015-01-09',2,3)
,(435677,'C','2015-01-10',2,4),(435677,'C','2015-01-11',2,5),(435677,'R','2019-01-01',3,1)
,(435677,'R','2019-01-02',3,2);

SELECT * FROM #results

DROP table #l
DROP TABLE #results

This topic was modified 4 years, 4 months ago by J Livingston SQL.
This topic was modified 4 years, 4 months ago by J Livingston SQL.
This topic was modified 4 years, 4 months ago by J Livingston SQL.
This topic was modified 4 years, 4 months ago by J Livingston SQL.

________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day

Viewing 12 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply

Mark Cowne One Orange Chip Points: 26988 More actions · Answer 1

WITH CTE1 AS (
SELECT *,
      CASE WHEN LAG(t_type) OVER(PARTITION BY t_id ORDER BY t_date) = t_type THEN 0 ELSE 1 END AS IsStart
FROM #l
),
CTE2 AS (
SELECT *,
       SUM(IsStart) OVER(PARTITION BY t_id ORDER BY t_date) AS t_Group
FROM CTE1
)
SELECT t_id,t_type,t_date,t_Group,
       ROW_NUMBER() OVER(PARTITION BY t_id,t_Group ORDER BY t_date) AS t_no
FROM CTE2;

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537

J Livingston SQL SSC Guru Points: 51550 More actions · Answer 2

Thanks Mark

________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day

homebrew01 SSC Guru Points: 55567 More actions · Answer 3

This might be helpful reading.

https://dotnettutorials.net/lesson/over-clause-sql-server/

This reply was modified 4 years, 4 months ago by homebrew01.

J Livingston SQL SSC Guru Points: 51550 More actions · Answer 4

Thanks Homebrew....

it was the "trick" CASE WHEN LAG(t_type) OVER(PARTITION BY t_id ORDER BY t_date) = t_type THEN 0 ELSE 1 END AS IsStart that Mark used was the bit I had forgotten about.

________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day

jcelko212 32090 SSCrazy Eights Points: 9305 More actions · Answer 5

>> wish to group rows into "sets" based on t_id and t_type. <<

This makes little sense to me. You already have your sets defined by the t_id, So there’s no need to redundantly add some kind of group number. Within the groups, we have no rule given for ordering them. Therefore, I can assume you don’t want any ordering? In fact, since the t_type has duplicates within each t_id, we are stuck. It looks like you might be ordering by the date, but since you didn’t tell us that it’s not part of our specifications.

CREATE TABLE Foobar

(t_id CHAR(6) NOT NULL,

t_type CHAR(1) NOT NULL CHECK(t_type IN (‘R’, ‘C’)),

PRIMARY KEY (t_id, t_type, t_date),

t_date DATE NOT NULL);

Some minor things about your DDL; by definition, not as an option, the table must have a key. What you’ve done is put up a deck of punch cards, written in SQL. . In fact, I love to see the way you used the leading comma at the start of a punch card. We did that back in the 1960s so we can more easily rearrange the deck of punch cards. Ever since about 1970, programmers have simply hit a key and called a pretty printing routine to format their code. Since an identifier has to be on a nominal scale, again, by definition, not an option, it should be a character string. Also, VARCHAR(1) makes no sense.

Finally, why are you materializing the results? We had to do this with punchcards, not with SQL. Also, did you know that the word “group” is a reserved word in SQL?

Remember that one of our goals in RDBMS is to reduce redundancy, not increase it. Another goal is not to do display formatting in the database. That's done in a presentation layer. Just keep it simple.

.

CREATE VIEW Labeled_Foobar

AS

SELECT t_id, t_type, t_date,

ROW_NUMBER() OVER (PARTITION BY t_id ORDER BY t_date) AS group_nbr

FROM Foobar;

Please post DDL and follow ANSI/ISO standards when asking for help.

J Livingston SQL SSC Guru Points: 51550 More actions · Answer 6

JCelko

Thanks for your reply....but your coded response does not meet my expected results (which I posted)

Other posters have resolved the problem for me.

This reply was modified 4 years, 4 months ago by J Livingston SQL.

________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day

Jeff Moden SSC Guru Points: 1004685 More actions · Answer 7

Heh... Welcome back again, JLS. As you can see, some things haven't changed. 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

J Livingston SQL SSC Guru Points: 51550 More actions · Answer 8

Jeff.....had to smile when I saw his post <grin>

Wonder if he will follow it up?

________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day

Jeff Moden SSC Guru Points: 1004685 More actions · Answer 9

Shifting gears back to your good question, I'm curious...

What are you going to do with the t_Group and t_no columns once they've been generated?
How often will you need to generate them?
How many distinct values of t_ID are there?
What are the min and max dates for t_date?
How many distinct t_types are there?
Do you ever have a t_ID for a given date that contains more than 1 t_type?
How many dates does each t_ID usually have?
Are there ever any missing or duplicate dates for a given t_ID?

Depending on those answers to those questions, I may have an alternate suggestion for all of this. Of course, some of the questions are for info I'd need to generate a large test table.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

J Livingston SQL SSC Guru Points: 51550 More actions · Answer 10

Jeff Moden wrote:

Shifting gears back to your good question, I'm curious...
What are you going to do with the t_Group and t_no columns once they've been generated?
How often will you need to generate them?
How many distinct values of t_ID are there?
What are the min and max dates for t_date?
How many distinct t_types are there?
Do you ever have a t_ID for a given date that contains more than 1 t_type?
How many dates does each t_ID usually have?
Are there ever any missing or duplicate dates for a given t_ID?
Depending on those answers to those questions, I may have an alternate suggestion for all of this. Of course, some of the questions are for info I'd need to generate a large test table.

Hi Jeff....my question only really arose because of a conversation with a person I worked with a few years back and they mentioned the report that I had provided for them......I vaguely remembered it but when I saw a similar question elsewhere it piqued my interest and for the life of me I could not remember how I had got the groups....I am retired now , but have recently decided to re-engage with the SQL community to keep the grey cells working...and to be honest I enjoy the problem solving.

So in answering your q's please forgive me if the answers are deliberately vague (confidential) or just vague cos I cant remember !

What are you going to do with the t_Group and t_no columns once they have been generated?
IIRC they were used as the basis for further processes to create a final report.

How often will you need to generate them?
they were run weekly

How many distinct values of t_ID are there?
max c.10K

What are the min and max dates for t_date?
a range as requested by user...........but generally last 12mths

How many distinct t_types are there?
max of 4

Do you ever have a t_ID for a given date that contains more than 1 t_type?
Yes....t-date was a datetime

How many dates does each t_ID usually have?
for a 12mth period maybe 100

Are there ever any missing or duplicate dates for a given t_ID?
date/time per row is ""as and when"" recorded
duplicate dates but datetime is unique

Have fun <grin>

Will be interested in your thoughts......but tis for interest only so if you decide to pass I will understand

This reply was modified 4 years, 4 months ago by J Livingston SQL.
This reply was modified 4 years, 4 months ago by J Livingston SQL.

________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day

Jeff Moden SSC Guru Points: 1004685 More actions · Answer 11

Thanks, JLS... your answers are great... especially the implied answer of "you're all set". I can "play" at my leisure, which I have disturbingly little time to do lately because I'm not yet retired.

As a bit of a sidebar, if you want to keep your mind active, check out the following video where I destroy a myth long held for decades to be the true and, in the process, lay waste to supposed "Best Practice" index maintenance that the whole world has made the serious mistake of also using for decades. It also introduces IndexDNA™, which is a proc and a spreadsheet I wrote to allow you to actually, literally, and graphically see what an index actually looks like.

https://www.youtube.com/watch?v=qfQtY17bPQ4&list=PLr9ab4Dj3ObuaHAUA9JJz-GUbfbwXEyS5&index=4

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)