How to group by the first column and limit its results?

Question

Post reply

How to group by the first column and limit its results?

hurricaneDBA

SSCertifiable

Points: 5027
More actions
August 4, 2019 at 12:19 pm

#3668574

Hi Everyone
I made a query to display the following:

IDENTITY        NAME                                     MAX(VALUE) MIN(VALUE)
--------------- ---------------------------------------- ---------- ----------
SC11-1      pH                                             10.6         10
SC11-1      Iron                                           6.2       4.72
SC11-1      Phosphate                                      51.5       20.2
SC11-1      Conductivity                                    390      160.1

My question is how do i format the output so that i only see Identity once so it should appear like this:
IDENTITY        NAME                                     MAX(VALUE) MIN(VALUE)
--------------- ---------------------------------------- ---------- ----------
SC11-1      pH                                             10.6         10
Iron                                           6.2       4.72
Phosphate                                      51.5       20.2
Conductivity                                    390      160.1

Have a good day
Kal

Viewing 15 posts - 1 through 15 (of 15 total)

The topic ‘How to group by the first column and limit its results?’ is closed to new replies.

Phil Parkin SSC Guru Points: 246695 More actions · Answer 1

As you've not even provided your query, I'm not sure how you expect anyone to help you change it!

Also, this is a presentation issue. SSRS will easily do this for you, for example.

If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

hurricaneDBA SSCertifiable Points: 5027 More actions · Answer 2

Dear Phil

i don’t have access to SSRS. I need to build this as a quick report using TSQL

Heres the query:

SELECT Identity,

RESULT.NAME,

MIN(VALUE),

MAX(VALUE)

FROM SAMPLE_POINT

INNER JOIN SAMPLE ON (IDENTITY = SAMPLING_POINT)

INNER JOIN TEST ON (SAMPLE = ID_NUMERIC)

INNER JOIN RESULT ON (RESULT.TEST_NUMBER = TEST.TEST_NUMBER)

WHERE ( (IDENTITY = '108-SC11-1')

AND (SAMPLED_DATE >= TO_DATE ('22-Jul-19', 'dd-Mon-yy')))

AND (SAMPLED_DATE <= TO_DATE ('29-Jul-19', 'dd-Mon-yy'))

Group by IDENTITY, RESULT.NAME;

Grouping sets didn’t work for me.

Any ideas?

Kal

Phil Parkin SSC Guru Points: 246695 More actions · Answer 3

As you have not provided any DDL, I had to make a guess.

drop table if exists #x;

create table #x
(
    Ident varchar(20)
  , Name varchar(50)
  , MaxValue decimal(5, 2)
  , MinValue decimal(5, 2)
);

insert #x
(
    Ident
  , Name
  , MaxValue
  , MinValue
)
values
('SC11-1', 'pH', 10.6, 10)
, ('SC11-1', 'Iron', 6.2, 4.72);

with SomeData
as (select Ident
         , Name
         , MaxValue
         , MinValue
         , rn = row_number() over (partition by Ident order by Ident, Name)
    from #x)
select Ident = iif(SomeData.rn = 1, SomeData.Ident, '')
     , SomeData.Name
     , SomeData.MaxValue
     , SomeData.MinValue
from SomeData
order by SomeData.Ident
       , SomeData.Name;

If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

hurricaneDBA SSCertifiable Points: 5027 More actions · Answer 4

Phil

i have 3 tables in the query and I’m doing an inner join on column from each table.

How will the partition help my query?

Kal

Phil Parkin SSC Guru Points: 246695 More actions · Answer 5

hurricaneDBA wrote:

Phil
i have 3 tables in the query and I’m doing an inner join on column from each table.
How will the partition help my query?
Kal

As I can't see your tables or your data, I made a best guess as to the query. If it's not right, try submitting sample data in the form of CREATE TABLE, INSERT.

If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

hurricaneDBA SSCertifiable Points: 5027 More actions · Answer 6

Hello J

Since I don’t know your first name do you mind if I call you J?

yes it was my mistake for not posting the DDLs but per security at my organization I’m not allowed to and yes I have more than enough knowledge to know about RDBMS and everything you have mentioned there. I was in a rush and I like any other professional admit my mistake for not providing further information. However that doesn’t give you a complete stranger who doesn’t know what circumstance made me rush with my question the right to talk to me in that tone of writing.

Im not a child for you to take out your frustration on. I come to this site to get help from other professionals who take time from their busy schedule to lend out a helping hand to other professionals

@Phil thank you for the help

@j-2 I hope you realize that everyone makes a mistake but doesn’t make you the judge to pass judgement on them.

Have a good day everyone

Joe Torre SSChampion Points: 10248 More actions · Answer 7

Please provide the sample data in the form mentioned above for the three tables in your query.

hurricaneDBA SSCertifiable Points: 5027 More actions · Answer 8

Hi Joe

when I get back to the office I’ll post the sample data

Kal

hurricaneDBA SSCertifiable Points: 5027 More actions · Answer 9

Hi Joe / Phil

I modified the query and here are the DDLs and insert statements and output

CREATE TABLE SAMPLE

(

ID_NUMERIC VARCHAR(10),

SAMPLED_DATE DATE,

SAMPLING_POINT VARCHAR(15),

)

CREATE TABLE TEST

(

TEST_NUMBER VARCHAR(10),

SAMPLE VARCHAR(10)

)

CREATE TABLE RESULT

(

TEST_NUMBER VARCHAR(10),

NAME VARCHAR(40),

VALUE FLOAT(15)

)

INSERT INTO SAMPLE VALUES ('11925',' 28-JUL-19', '107-SC7-2');

INSERT INTO SAMPLE VALUES ('11596',' 27-JUL-19', '107-SC7-2');

INSERT INTO SAMPLE VALUES ('11704',' 27-JUL-19', '107-SC7-2');

INSERT INTO SAMPLE VALUES ('11729',' 27-JUL-19', '107-SC7-2');




INSERT INTO TEST VALUES ('2165777','11925');

INSERT INTO TEST VALUES ('2165778','11925');

INSERT INTO TEST VALUES ('2165779','11925');

INSERT INTO TEST VALUES ('2164692','11596');

INSERT INTO TEST VALUES ('2164694','11596');

INSERT INTO TEST VALUES ('2164690','11596');

INSERT INTO TEST VALUES ('2164691','11704');

INSERT INTO TEST VALUES ('2165063','11704');

INSERT INTO TEST VALUES ('2165062','11704');

INSERT INTO TEST VALUES ('2165060','11704');

INSERT INTO TEST VALUES ('2165061','11704');

INSERT INTO TEST VALUES ('2165059','11925');

INSERT INTO TEST VALUES ('2165118','11729');

INSERT INTO TEST VALUES ('2165119','11729');

INSERT INTO TEST VALUES ('2165120','11729');




INSERT INTO RESULT VALUES ('2165777', 'Chemical01', 30.8)

INSERT INTO RESULT VALUES ('2165778', 'Chemical02', 184)

INSERT INTO RESULT VALUES ('2165779', 'Chemical03', 10.2)

INSERT INTO RESULT VALUES ('2164692', 'Chemical01', 40.8)

INSERT INTO RESULT VALUES ('2164694', 'Chemical04', 3.8)

INSERT INTO RESULT VALUES ('2164690', 'Chemical04', 30.8)

INSERT INTO RESULT VALUES ('2164691', 'Chemical03', 30.8)

INSERT INTO RESULT VALUES ('2165063', 'Chemical01', 21.2)

INSERT INTO RESULT VALUES ('2165062', 'Chemical02', 162.8)

INSERT INTO RESULT VALUES ('2165060', 'Chemical04', 3.8)

INSERT INTO RESULT VALUES ('2165061', 'Chemical03', 30)

INSERT INTO RESULT VALUES ('2165059', 'Chemical05', 0.28)

INSERT INTO RESULT VALUES ('2165118', 'Chemical01', 30.8)

INSERT INTO RESULT VALUES ('2165119', 'Chemical02', 155.8)

INSERT INTO RESULT VALUES ('2165120', 'Chemical03', 10.8)

select SAMPLING_POINT, Result.name, max(value), min(value)

FROM SAMPLE

INNER JOIN TEST ON (SAMPLE = ID_NUMERIC)

--INNER JOIN TEST ON (SAMPLE = TEST.TEST_NUMBER)

INNER JOIN RESULT ON (RESULT.TEST_NUMBER = TEST.TEST_NUMBER)

WHERE SAMPLING_POINT IN ('107-SC7-2')

AND SAMPLED_DATE BETWEEN '27-Jul-19'

AND '29-Jul-19'

group by SAMPLING_POINT, Result.name

ORDER BY SAMPLING_POINT;?

The output is below but i would like to know how to limit the Sampling_point code to appear only once.

Will grouping sets help here?

107-SC7-2 Chemical01 40.8 21.2

107-SC7-2 Chemical02 184 155.8

107-SC7-2 Chemical03 30.8 10.2

107-SC7-2 Chemical04 30.8 3.8

107-SC7-2 Chemical05 0.28 0.28

This reply was modified 4 years, 8 months ago by hurricaneDBA.
This reply was modified 4 years, 8 months ago by Steve Jones - SSC Editor. Reason: format code

David Burrows SSC Guru Points: 65138 More actions · Answer 10

USE LAG instead of SAMPLING_POINT in the select like this

CASE WHEN SAMPLE.SAMPLING_POINT = LAG(SAMPLE.SAMPLING_POINT,1,'') OVER (ORDER BY SAMPLE.SAMPLING_POINT ASC)
THEN ''
ELSE SAMPLE.SAMPLING_POINT
END AS [SAMPLING_POINT]

Far away is close at hand in the images of elsewhere.
Anon.

Steve Jones - SSC Editor SSC Guru Points: 734449 More actions · Answer 11

LAG is what I'd use as well, though really this is something we want to handle more in the front end as the positioning and rendering of the spaces in something like a CSV could be problematic.The subsequent rows now have no connection to the first row. A result set is a set of unconnected rows, and if any reordering were to occur (user / interface manipulation), this could be a problem.

This also puts a larger load on the db when it's a limited resource.

Joe Torre SSChampion Points: 10248 More actions · Answer 12

USE Testing
GO
DROP TABLE IF EXISTS dbo.[SAMPLE];
GO
CREATE TABLE dbo.[SAMPLE] (
   ID_NUMERIC VARCHAR(10)
 , SAMPLED_DATE DATE
 , SAMPLING_POINT VARCHAR(15)
);
GO
DROP TABLE IF EXISTS dbo.TEST;
GO
CREATE TABLE dbo.TEST (
   TEST_NUMBER VARCHAR(10)
 , [SAMPLE] VARCHAR(10)
);
GO
DROP TABLE IF EXISTS dbo.RESULT;
GO
CREATE TABLE dbo.RESULT (
   TEST_NUMBER VARCHAR(10)
 , NAME VARCHAR(40)
 , [VALUE] FLOAT(15)
);
GO
INSERT INTO dbo.[SAMPLE]
VALUES ('11925',' 28-JUL-19', '107-SC7-2')
     , ('11596',' 27-JUL-19', '107-SC7-2')
     , ('11704',' 27-JUL-19', '107-SC7-2')
     , ('11729',' 27-JUL-19', '107-SC7-2');

INSERT INTO dbo.TEST
VALUES ('2165777','11925'), ('2165778','11925'), ('2165779','11925')
     , ('2164692','11596'), ('2164694','11596'), ('2164690','11596')
     , ('2164691','11704'), ('2165063','11704'), ('2165062','11704')
     , ('2165060','11704'), ('2165061','11704'), ('2165059','11925')
     , ('2165118','11729'), ('2165119','11729'), ('2165120','11729');

INSERT INTO dbo.RESULT
VALUES ('2165777', 'Chemical01', 030.8), ('2165778', 'Chemical02', 184.0), ('2165779', 'Chemical03', 010.2)
     , ('2164692', 'Chemical01', 040.8), ('2164694', 'Chemical04', 003.8), ('2164690', 'Chemical04', 030.8)     
     , ('2164691', 'Chemical03', 030.8), ('2165063', 'Chemical01', 021.2), ('2165062', 'Chemical02', 162.8)
     , ('2165060', 'Chemical04', 003.8), ('2165061', 'Chemical03', 030.0), ('2165059', 'Chemical05', 00.28)
     , ('2165118', 'Chemical01', 030.8), ('2165119', 'Chemical02', 155.8), ('2165120', 'Chemical03', 010.8);

WITH Query
as (
   SELECT
      S.SAMPLING_POINT 
    , R.[NAME] 
    , Max(R.[VALUE]) MaxValue
    , Min(R.[VALUE]) MinValue
    , Row_Number() OVER ( ORDER BY R.[NAME]) RowNum
   FROM [SAMPLE] S
   INNER JOIN TEST T
      ON ([SAMPLE] = ID_NUMERIC)
   INNER JOIN RESULT R
      ON (R.TEST_NUMBER = T.TEST_NUMBER)
   WHERE SAMPLING_POINT IN ('107-SC7-2')
     AND SAMPLED_DATE BETWEEN '27-Jul-19'
     AND '29-Jul-19'
   GROUP BY
      S.SAMPLING_POINT
    , R.[NAME])
SELECT
   CASE WHEN q.RowNum=1 THEN q.SAMPLING_POINT ELSE '' END [Sampling Point]
 , q.[NAME] [Name]
 , q.MaxValue
 , q.MaxValue
FROM Query q
ORDER BY q.RowNum;

DROP TABLE [SAMPLE];
DROP TABLE TEST;
DROP TABLE RESULT;
GO

Attachments:

You must be logged in to view attached files.

jcelko212 32090 SSCrazy Eights Points: 9294 More actions · Answer 13

>> Since I don’t know your first name do you mind if I call you J? <<

Actually, I go by "Celko" and have for the 30 something years I've been writing on SQL and the trade press. I also served on the ANSI X3H2 standards committee for this language for 10 years. I also have a series of books on databases that have been in print for a few decades from Morgan Kaufmann Publishing/Elsevier. I have some authority in this area.

>> yes it was my mistake for not posting the DDLs but per security, at my organization, I’m not allowed to and yes I have more than enough knowledge to know about RDBMS and everything you have mentioned there. <<

How can security in your organization prevent you from following ISO standards, including a key in your sample data and all the other stuff you messed up? When I've had those problems it usually means I am not a mise the data, but keep the data types in the general structure of the schema.

>> I am not a child for you to take out your frustration on. I come to this site to get help from other professionals who take time from their busy schedule to lend out a helping hand to other professionals <<

Then why don't you follow the most basic netiquette that has been established on SQL forums over the decades? Perhaps I'm prejudiced but I consider less than 10 or 20 years in IT to be a newbie. When someone doesn't seem to know the most basic standards in this trade, I regard them as being like an engineer who doesn't know the metric system. Yes, I am frustrated when I see decade after decade able not even bothering to read the forum rules, not knowing the basics of their trade, etc. we've had RDBMS for over 40 years now.

Please post DDL and follow ANSI/ISO standards when asking for help.

jcelko212 32090 SSCrazy Eights Points: 9294 More actions · Answer 14

This reply has been reported for inappropriate content.

You don't seem to remember that the table must have a key by definition. It s not an option! Since every column in your DDL is NULL-able, nothing here can ever be a valid table.

CREATE TABLE Samples

(sample_id VARCHAR(10) NOT NULL PRIMARY KEY,

sampling_date DATE DEFAULT CURRENT_TIMESTAMP NOT NULL,

sample_point VARCHAR(15) NOT NULL);

INSERT INTO Samples

VALUES

('11925', '2019-07-28', '107-SC7-2'),

('11596', '2019-07-27', '107-SC7-2'),

('11704', '2019-07-27', '107-SC7-2'),

('11729', '2019-07-27', '107-SC7-2');

The original row by row insertion syntax that you posted has been replaced by a set oriented syntax that can be optimized. The "R" in RDBMS stands for a relationship or a reference. You have none so here's a guess at something that might make your DDL valid

CREATE TABLE Tests

(test_nbr VARCHAR(10) NOT NULL PRIMARY KEY,

sample_id VARCHAR(10) NOT NULL

REFERENCES Samples(sample_id));

INSERT INTO Tests

VALUES

('2165777', '11925')

('2165778', '11925'),

('2165779', '11925'),

('2164692', '11596'),

('2164694', '11596'),

('2164690', '11596'),

('2164691', '11704'),

('2165063', '11704'),

('2165062', '11704'),

('2165060', '11704'),

('2165061', '11704'),

('2165059', '11925'),

('2165118', '11729'),

('2165119', '11729'),

('2165120', '11729');

Why do you believe there is a generic "name" that is universally understood and doesn't have to be the "name of something in particular"? You probably don't know that "value" besides being vague is also reserved word in ANSI/ISO standard SQL.

CREATE TABLE Test_Results

(test_nbr VARCHAR(10) NOT NULL PRIMARY KEY,

vague_name VARCHAR(40) NOT NULL,

foobar_score DECIMAL (5, 2) NOT NULL);

While I'm sure that you did a lot of careful research before picking up a floating-point number, I just never had to use them in an SQL database. I started my career in the 1960s as a Fortran programmer and had to learn all of the rules about floating-point math. And then I had to learn to live with the inevitable rounding errors. DECIMAL(s,p) seesm to be a better choice in my experience. I later found that most SQLs do not do all of that correction that we had to do by hand. You also don't seem to know that the ANSI/ISO standard SQL syntax uses a date display format based on ISO 8601. After the metric system this is the most popular ISO standard on earth

INSERT INTO Test_Results

VALUES

('2165777', 'Chemical01', 30.8),

('2165778', 'Chemical02', 184),

('2165779', 'Chemical03', 10.2),

('2164692', 'Chemical01', 40.8),

('2164694', 'Chemical04', 3.8),

('2164690', 'Chemical04', 30.8),

('2164691', 'Chemical03', 30.8),

('2165063', 'Chemical01', 21.2),

('2165062', 'Chemical02', 162.8),

('2165060', 'Chemical04', 3.8),

('2165061', 'Chemical03', 30),

('2165059', 'Chemical05', 0.28),

('2165118', 'Chemical01', 30.8),

('2165119', 'Chemical02', 155.8),

('2165120', 'Chemical03', 10.8);

SELECT S.sample_point , T.vague_name, MAX(R.foobar_score), MIN(R.foobar_score)

FROM Samples AS S, Tests AS T, Test_Results AS R

WHERE S.sample_id = T.sample_id

AND R.test_nbr = T.test_nbr

AND S.sample_point IN ('107-SC7-2', '108-SC11-2', '108-SC13-2')

AND S.sampling_date BETWEEN '2019-07-27' AND '2019-07-29'

GROUP BY S.sample_point , R.vague_name;

>> The output is below but I would like to know how to limit the sample_point code to appear only once. <<

That's not how we do it in RDBMS. Each row of table has to stand by itself and give a complete set of attributes. You're confusing SQL with COBOL, a report writing program or spreadsheet. Go back to the first week of your SQL class when we talked about tiered architectures. The idea is it an SQL we maintain the data integrity and retrieve a table that is passed to the next tier in the system. That's a presentation layer and that's where you make it look pretty for people.

Please post DDL and follow ANSI/ISO standards when asking for help.