SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Remove duplicate values


Remove duplicate values

Author
Message
saravanatn
saravanatn
SSC Veteran
SSC Veteran (276 reputation)SSC Veteran (276 reputation)SSC Veteran (276 reputation)SSC Veteran (276 reputation)SSC Veteran (276 reputation)SSC Veteran (276 reputation)SSC Veteran (276 reputation)SSC Veteran (276 reputation)

Group: General Forum Members
Points: 276 Visits: 370
Thanks Thorn and John . Will test with more test data and get back to you guys.

Regards,
Saravanan
saravanatn
saravanatn
SSC Veteran
SSC Veteran (276 reputation)SSC Veteran (276 reputation)SSC Veteran (276 reputation)SSC Veteran (276 reputation)SSC Veteran (276 reputation)SSC Veteran (276 reputation)SSC Veteran (276 reputation)SSC Veteran (276 reputation)

Group: General Forum Members
Points: 276 Visits: 370
Thom A - Wednesday, December 6, 2017 5:23 AM
saravanatn - Wednesday, December 6, 2017 4:49 AM
I am just giving(test data just) an example that (ID=4 )records should be returned even if all the non primary keys columns are null. Because count=1 for ID=4 ....

Regards,
Saravanan

The reason for our questions, however, is that your test data doesn't provide enough scenarios. As a result, we need to ask for more, or provide test data for you to tell us what you expect.

At a pure guess, maybe this is what you're after:
USE Sandbox;
GO

CREATE TABLE details (
id int NOT NULL,
[name] varchar(100),
city varchar(100),
place varchar(100));

--Sample Data: (comments start with a double hyphen)

INSERT INTO details
VALUES (1, 'sara', 'chennai', 't.nagar'),
(1, NULL, NULL, NULL),
(1, 'sara', 'chennai', 't.nagar'),
(2, 'sara', 'chennai', 't.nagar'),
(2, NULL, NULL, NULL),
(3, 'sara', 'chennai', 't.nagar'),
(4, NULL, NULL, NULL),
(5, 'raj', 'vellore', 'kodambakkam'),
(5, 'raj', 'vellore', NULL),
(6, NULL, 'London','Kensington'),
(6,'Steve', 'London',NULL),
(7,'Steve', 'London','Kensington'),
(7,'Steve', 'London','Barnet');
GO

SELECT d.id,
MAX([name]) AS [name],
MAX(city) AS city,
MAX(place) AS place
FROM details d
GROUP BY d.id;

GO
DROP TABLE details;
GO

But, like I said, that's a guess. It doesn't do a logic your specified for ID 6 or 7 (which John and I have asked you about), so if this isn't correct you'll need to let us know what you expect the output to be.

For the test data created by you I want the following output updated in inline commands.

INSERT INTO details
VALUES (1, 'sara', 'chennai', 't.nagar'),
(1, NULL, NULL, NULL),
(1, 'sara', 'chennai', 't.nagar'),
(2, 'sara', 'chennai', 't.nagar'),
(2, NULL, NULL, NULL),
(3, 'sara', 'chennai', 't.nagar'),
(4, NULL, NULL, NULL),
(5, 'raj', 'vellore', 'kodambakkam'),
(5, 'raj', 'vellore', NULL),
(6, NULL, 'London','Kensington'), /* not required as it is having multiple values for ID(Count=2) and it is having Null value/*
(6,'Steve', 'London',NULL), /* not required as it is having multiple values for ID(Count=2) and it is having Null value/*
(7,'Steve', 'London','Kensington'), /* Required as it is not having null values/*
(7,'Steve', 'London','Barnet') /* Required as it is not having null values/*;

Output Results:
id name city place
1 sara chennai t.nagar
1 sara chennai t.nagar
2 sara chennai t.nagar
3 sara chennai t.nagar
4 NULL NULL NULL
5 raj vellore kodambakkam
7 Steve London Kensington
7 Steve London Barnet

Kindly let me know if you requires any additional details.

Regards,
Saravanan

Joe Celko
Joe Celko
SSCrazy
SSCrazy (2.9K reputation)SSCrazy (2.9K reputation)SSCrazy (2.9K reputation)SSCrazy (2.9K reputation)SSCrazy (2.9K reputation)SSCrazy (2.9K reputation)SSCrazy (2.9K reputation)SSCrazy (2.9K reputation)

Group: General Forum Members
Points: 2899 Visits: 2456
[quote]
saravanatn - Wednesday, December 6, 2017 1:00 AM
>>In the below table records [sic] are grouped by ID column. If the count for ID is greater than 1 then remove the records [sic] which are having 'NULL" values in any of the name ,city, place columns and remaining columns need to be fetched. If the count for ID=1 even if we have NULL values on any of the columns then include that records [sic] in the output results. <<

Can you please buy read and study a book on RDBMS and SQL? There is no such thing as a generic "id" in RDBMS. It has to be the identifier of something in particular to be valid. Then on top of that, and identifier can never be a numeric because you don't do math on it.

You might want to read a book on basic data modeling. Things like "city" are too vague generic to be valid column names. They need to have what the ISO 11179 people call "attribute property" to be valid. And you're "name" is one of those attribute properties; it has to be the name of something.

A table by definition, must have a primary key. But this deck of punch cards doesn't have a key and can never have any keys! Let's at least get the column names and how many datatypes correct. I picked a length of 35 for the names based on the International Postal Union requirements for mailing labels. I'm willing to bet that you made up the length of 100, with no research whatsoever. This is not how to be a database professional

CREATE TABLE Details 
(detail_id CHAR(3)NOT NULL PRIMARY KEY,
something_name VARCHAR(35),
city_name VARCHAR(35),
place_name VARCHAR(35)
);

If you have valid data, then you should be able to use the current table constructor insertion syntax. That would look like this. People who think in sets instead of in punchcards really like it. It lets the optimizer do some things, instead of forcing a sequential insertion. But you're trying put garbage in a table. To do this, you of course have to screwup the DDL for that table
you have valid data, then you should be able to use the current table constructor insertion syntax. That would look like this. People who think in sets instead of in punchcards really like it. It lets the optimizer do some things, instead of forcing a sequential insertion. But you're trying put garbage in a table. To do this, you of course have to screwup the DDL for that table

INSERT INTO Details
VALUES
(1, 'sara', 'chennai', 't.nagar'),
(1, NULL, NULL, NULL), -- dup key
(1, 'sara', 'chennai', 't.nagar'), --- dup row!

(2, 'sara', 'chennai', 't.nagar'),
(2, NULL, NULL, NULL), ---dup key

(3, 'sara', 'chennai', 't.nagar'),

(4, NULL, NULL, NULL),

(5, 'raj', 'vellore', 'kodambakkam'),
(5, 'raj', 'vellore', NULL); ---dup key


What are you trying to actually do? My first impulse is that you need to have stored procedure that does the row update for Details.

CREATE PROCEDURE Fill_Details
(@in_detail_id CHAR(3), @in_something_name VARCHAR(35),
@in_city_name VARCHAR(35), @in_place_name )
BEGIN
UPDATE Details
SET something_name = COALESCE(something_name, @in_something_name),
city_name = COALESCE (city_name, @in_city_name),
place_name = COALESCE (place_name, @in_place_name)
WHERE detail_id = @in_detail_id;
-- insert new row, if it does not exist
END;

]What are you trying to actually do? My first impulse is that you need to have stored procedure that does the row update for Details.CREATE [code language="sql"]CREATE PROCEDURE Fill_Details(@in_detail_id CHAR(3), @in_something_name VARCHAR(35), @in_city_name VARCHAR(35), @in_place_name )
BEGIN
UPDATE Details
SET something_name = COALESCE(something_name, @in_something_name),
city_name = COALESCE (city_name, @in_city_name),
place_name = COALESCE (place_name, @in_place_name
)WHERE detail_id = @in_detail_id;-- insert new row, if it does not exist
END;


Please post DDL and follow ANSI/ISO standards when asking for help.
ScottPletcher
ScottPletcher
SSC-Forever
SSC-Forever (42K reputation)SSC-Forever (42K reputation)SSC-Forever (42K reputation)SSC-Forever (42K reputation)SSC-Forever (42K reputation)SSC-Forever (42K reputation)SSC-Forever (42K reputation)SSC-Forever (42K reputation)

Group: General Forum Members
Points: 42319 Visits: 7924
jcelko212 32090 - Thursday, December 7, 2017 11:51 AM
[quote]
saravanatn - Wednesday, December 6, 2017 1:00 AM
>>In the below table records [sic] are grouped by ID column. If the count for ID is greater than 1 then remove the records [sic] which are having 'NULL" values in any of the name ,city, place columns and remaining columns need to be fetched. If the count for ID=1 even if we have NULL values on any of the columns then include that records [sic] in the output results. <<

Can you please buy read and study a book on RDBMS and SQL? There is no such thing as a generic "id" in RDBMS. It has to be the identifier of something in particular to be valid. Then on top of that, and identifier can never be a numeric because you don't do math on it.

You might want to read a book on basic data modeling. Things like "city" are too vague generic to be valid column names. They need to have what the ISO 11179 people call "attribute property" to be valid. And you're "name" is one of those attribute properties; it has to be the name of something.

A table by definition, must have a primary key. But this deck of punch cards doesn't have a key and can never have any keys! Let's at least get the column names and how many datatypes correct. I picked a length of 35 for the names based on the International Postal Union requirements for mailing labels. I'm willing to bet that you made up the length of 100, with no research whatsoever. This is not how to be a database professional

CREATE TABLE Details 
(detail_id CHAR(3)NOT NULL PRIMARY KEY,
something_name VARCHAR(35),
city_name VARCHAR(35),
place_name VARCHAR(35)
);

If you have valid data, then you should be able to use the current table constructor insertion syntax. That would look like this. People who think in sets instead of in punchcards really like it. It lets the optimizer do some things, instead of forcing a sequential insertion. But you're trying put garbage in a table. To do this, you of course have to screwup the DDL for that table
you have valid data, then you should be able to use the current table constructor insertion syntax. That would look like this. People who think in sets instead of in punchcards really like it. It lets the optimizer do some things, instead of forcing a sequential insertion. But you're trying put garbage in a table. To do this, you of course have to screwup the DDL for that table

INSERT INTO Details
VALUES
(1, 'sara', 'chennai', 't.nagar'),
(1, NULL, NULL, NULL), -- dup key
(1, 'sara', 'chennai', 't.nagar'), --- dup row!

(2, 'sara', 'chennai', 't.nagar'),
(2, NULL, NULL, NULL), ---dup key

(3, 'sara', 'chennai', 't.nagar'),

(4, NULL, NULL, NULL),

(5, 'raj', 'vellore', 'kodambakkam'),
(5, 'raj', 'vellore', NULL); ---dup key


What are you trying to actually do? My first impulse is that you need to have stored procedure that does the row update for Details.

CREATE PROCEDURE Fill_Details
(@in_detail_id CHAR(3), @in_something_name VARCHAR(35),
@in_city_name VARCHAR(35), @in_place_name )
BEGIN
UPDATE Details
SET something_name = COALESCE(something_name, @in_something_name),
city_name = COALESCE (city_name, @in_city_name),
place_name = COALESCE (place_name, @in_place_name)
WHERE detail_id = @in_detail_id;
-- insert new row, if it does not exist
END;

]What are you trying to actually do? My first impulse is that you need to have stored procedure that does the row update for Details.CREATE [code language="sql"]CREATE PROCEDURE Fill_Details(@in_detail_id CHAR(3), @in_something_name VARCHAR(35), @in_city_name VARCHAR(35), @in_place_name )
BEGIN
UPDATE Details
SET something_name = COALESCE(something_name, @in_something_name),
city_name = COALESCE (city_name, @in_city_name),
place_name = COALESCE (place_name, @in_place_name
)WHERE detail_id = @in_detail_id;-- insert new row, if it does not exist
END;

Isn't it extraordinarily careless and risky to insert single-digit numeric values into a char(3) key? Even though we will never do math on detail_id, it should still be numeric to provide automatic validity checking. Otherwise:


create table #test1 (
test_id char(3) PRIMARY KEY
)
insert into #test1 values(1)
insert into #test1 values('01')
insert into #test1 values('001')
select * from #test1


Good luck figuring that mess out later! Even having to code CHECK constraints for it is extra effort and somewhat error prone.


SQL DBA,SQL Server MVP(07, 08, 09) Prosecutor James Blackburn, in closing argument in the Fatal Vision murders trial:
If in the future, you should cry a tear, cry one for them [the murder victims]. If in the future, you should say a prayer, say one for them. And if in the future, you should light a candle, light one for them.
saravanatn
saravanatn
SSC Veteran
SSC Veteran (276 reputation)SSC Veteran (276 reputation)SSC Veteran (276 reputation)SSC Veteran (276 reputation)SSC Veteran (276 reputation)SSC Veteran (276 reputation)SSC Veteran (276 reputation)SSC Veteran (276 reputation)

Group: General Forum Members
Points: 276 Visits: 370
jcelko212 32090 - Thursday, December 7, 2017 11:51 AM
[quote]
saravanatn - Wednesday, December 6, 2017 1:00 AM
>>In the below table records [sic] are grouped by ID column. If the count for ID is greater than 1 then remove the records [sic] which are having 'NULL" values in any of the name ,city, place columns and remaining columns need to be fetched. If the count for ID=1 even if we have NULL values on any of the columns then include that records [sic] in the output results. <<

Can you please buy read and study a book on RDBMS and SQL? There is no such thing as a generic "id" in RDBMS. It has to be the identifier of something in particular to be valid. Then on top of that, and identifier can never be a numeric because you don't do math on it.

You might want to read a book on basic data modeling. Things like "city" are too vague generic to be valid column names. They need to have what the ISO 11179 people call "attribute property" to be valid. And you're "name" is one of those attribute properties; it has to be the name of something.

A table by definition, must have a primary key. But this deck of punch cards doesn't have a key and can never have any keys! Let's at least get the column names and how many datatypes correct. I picked a length of 35 for the names based on the International Postal Union requirements for mailing labels. I'm willing to bet that you made up the length of 100, with no research whatsoever. This is not how to be a database professional

CREATE TABLE Details 
(detail_id CHAR(3)NOT NULL PRIMARY KEY,
something_name VARCHAR(35),
city_name VARCHAR(35),
place_name VARCHAR(35)
);

If you have valid data, then you should be able to use the current table constructor insertion syntax. That would look like this. People who think in sets instead of in punchcards really like it. It lets the optimizer do some things, instead of forcing a sequential insertion. But you're trying put garbage in a table. To do this, you of course have to screwup the DDL for that table
you have valid data, then you should be able to use the current table constructor insertion syntax. That would look like this. People who think in sets instead of in punchcards really like it. It lets the optimizer do some things, instead of forcing a sequential insertion. But you're trying put garbage in a table. To do this, you of course have to screwup the DDL for that table

INSERT INTO Details
VALUES
(1, 'sara', 'chennai', 't.nagar'),
(1, NULL, NULL, NULL), -- dup key
(1, 'sara', 'chennai', 't.nagar'), --- dup row!

(2, 'sara', 'chennai', 't.nagar'),
(2, NULL, NULL, NULL), ---dup key

(3, 'sara', 'chennai', 't.nagar'),

(4, NULL, NULL, NULL),

(5, 'raj', 'vellore', 'kodambakkam'),
(5, 'raj', 'vellore', NULL); ---dup key


What are you trying to actually do? My first impulse is that you need to have stored procedure that does the row update for Details.

CREATE PROCEDURE Fill_Details
(@in_detail_id CHAR(3), @in_something_name VARCHAR(35),
@in_city_name VARCHAR(35), @in_place_name )
BEGIN
UPDATE Details
SET something_name = COALESCE(something_name, @in_something_name),
city_name = COALESCE (city_name, @in_city_name),
place_name = COALESCE (place_name, @in_place_name)
WHERE detail_id = @in_detail_id;
-- insert new row, if it does not exist
END;

]What are you trying to actually do? My first impulse is that you need to have stored procedure that does the row update for Details.CREATE [code language="sql"]CREATE PROCEDURE Fill_Details(@in_detail_id CHAR(3), @in_something_name VARCHAR(35), @in_city_name VARCHAR(35), @in_place_name )
BEGIN
UPDATE Details
SET something_name = COALESCE(something_name, @in_something_name),
city_name = COALESCE (city_name, @in_city_name),
place_name = COALESCE (place_name, @in_place_name
)WHERE detail_id = @in_detail_id;-- insert new row, if it does not exist
END;


Thanks Celko. I agree with you. We are moving the records from Oracle to Hive(Hadoop), which is similar to SQL. In hive there is no update concept. In hive(for old version) only insert is available .Every time a record came it is inserted (not updated) in hive and I am fetching the records in hive using ID(primary key) and Max(run_time) for that also we are getting multiple records for the particular ID.we are some how getting null values as well in Hive to eliminate the null record I posted the sample data.
What Thorn suggested almost worked.

Regards,
Saravanan
John Mitchell-245523
John Mitchell-245523
SSC Guru
SSC Guru (72K reputation)SSC Guru (72K reputation)SSC Guru (72K reputation)SSC Guru (72K reputation)SSC Guru (72K reputation)SSC Guru (72K reputation)SSC Guru (72K reputation)SSC Guru (72K reputation)

Group: General Forum Members
Points: 72999 Visits: 17694
saravanatn - Wednesday, December 6, 2017 7:28 PM

For the test data created by you I want the following output updated in inline commands.

INSERT INTO details
VALUES (1, 'sara', 'chennai', 't.nagar'),
(1, NULL, NULL, NULL),
(1, 'sara', 'chennai', 't.nagar'),
(2, 'sara', 'chennai', 't.nagar'),
(2, NULL, NULL, NULL),
(3, 'sara', 'chennai', 't.nagar'),
(4, NULL, NULL, NULL),
(5, 'raj', 'vellore', 'kodambakkam'),
(5, 'raj', 'vellore', NULL),
(6, NULL, 'London','Kensington'), /* not required as it is having multiple values for ID(Count=2) and it is having Null value/*
(6,'Steve', 'London',NULL), /* not required as it is having multiple values for ID(Count=2) and it is having Null value/*
(7,'Steve', 'London','Kensington'), /* Required as it is not having null values/*
(7,'Steve', 'London','Barnet') /* Required as it is not having null values/*;

Output Results:
id name city place
1 sara chennai t.nagar
1 sara chennai t.nagar
2 sara chennai t.nagar
3 sara chennai t.nagar
4 NULL NULL NULL
5 raj vellore kodambakkam
7 Steve London Kensington
7 Steve London Barnet

Kindly let me know if you requires any additional details.

Regards,
Saravanan

This returns the above results:
WITH CountsandNulls AS (
SELECT
id
, name
, city
, place
, COUNT(*) OVER (PARTITION BY id) AS RowsPerID
, CASE
WHEN name + city + place IS NULL THEN 1
ELSE 0
END AS HasNulls
FROM details
)
SELECT
id
, name
, city
, place
FROM CountsandNulls
WHERE RowsPerID = 1
OR HasNulls = 0;


jcelko212 32090 - Thursday, December 7, 2017 11:51 AM
Can you please buy read and study a book on RDBMS and SQL? There is no such thing as a generic "id" in RDBMS. It has to be the identifier of something in particular to be valid. Then on top of that, and identifier can never be a numeric because you don't do math on it.


Can you please buy read and study a book on logic? The (correct) premise "if we need to do arithmetic on a column then the column must have a numeric data type" does not imply "if we do not need to do arithmetic on a column then the column must not have a numeric data type", any more than "if I need to drive a car it must have wheels" implies "if nobody needs to drive a car it must not have wheels".

saravanatn - Thursday, December 7, 2017 10:57 PM
[quote]
jcelko212 32090 - Thursday, December 7, 2017 11:51 AM

Thanks Celko. I agree with you. We are moving the records from Oracle to Hive(Hadoop), which is similar to SQL. In hive there is no update concept. In hive(for old version) only insert is available .Every time a record came it is inserted (not updated) in hive and I am fetching the records in hive using ID(primary key) and Max(run_time) for that also we are getting multiple records for the particular ID.we are some how getting null values as well in Hive to eliminate the null record I posted the sample data.
What Thorn suggested almost worked.


Despite Joe's dogma and shaky logic, I actually agree with most of what he says. In particular, why are you keeping two identical rows for ID 1, and why are you keeping a row of NULLs for ID 4 that doesn't actually tell you anything?

John

saravanatn
saravanatn
SSC Veteran
SSC Veteran (276 reputation)SSC Veteran (276 reputation)SSC Veteran (276 reputation)SSC Veteran (276 reputation)SSC Veteran (276 reputation)SSC Veteran (276 reputation)SSC Veteran (276 reputation)SSC Veteran (276 reputation)

Group: General Forum Members
Points: 276 Visits: 370
Thanks John. I posted the test data similar to what we got in Hive results(test data is coming like this may be due to hive nature as there is no update concept in table and only insert is available in the particular project) .Hadoop-Hive is using Map reduce program and it is very slow. Thorn framed a query which was fast and it almost worked.
I will check with the query you developed with my test data and get back to you .

Regards,
Saravanan
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum







































































































































































SQLServerCentral


Search