SQLServerCentral Article

SQL DISTINCT on Multiple Columns

,

The DISTINCT clause works in combination with SELECT and gives you unique date from a database table or tables. The syntax for DISTINCT is show below

SELECT DISTINCT "column_name"
FROM "table_name"

If you want a DISTINCT combination of more than one column then the syntax is

SELECT DISTINCT column1, column2
FROM "table_name"

Let's look at some examples to understand the usage of the DISTINCT keyword. First, let's create a table for our illustration and insert some data.

CREATE TABLE DuplicateTest(
 Firstname nvarchar (30) NOT NULL,
 Lastname nvarchar(30) NOT NULL,
 PostalCode nvarchar(15) NOT NULL,
 City nvarchar(30) NOT NULL
 )
insert into DuplicateTest 
 (Firstname,Lastname,PostalCode,City)
values
 ('Sarvesh', 'Singh', 'B283SP', 'Birmingham'),
 ('Steve', 'White', 'EC224HQ', 'London'),
 ('Mark', 'Smith', 'L324JK', 'Liverpool'),
 ('Claire', 'whitehood', 'M236DM', 'Manchester'),
 ('Param', 'Singh', 'B263SP', 'Birmingham')
select * 
 from DuplicateTest
DuplicateTest
Firstname Lastname PostalCode City
Sarvesh Singh B263SP Birmingham
Steve White EC224HQ London
Mark Smith L324JK Liverpool
Claire whitehood M236DM Manchester
Param Singh B283SP Birmingham

In the result set above there are repetitions in the City Column. Let's get a list of all cities without repeating them using DISTINCT.

select DISTINCT City
 from DuplicateTest
City
Birmingham
Liverpool
London

Manchester

You can see 'Birmingham' is just returned once in this result, even though it appears more than once in the table. You can get the same result using GROUP BY as shown below.

select city 
 from DuplicateTest
 group by city

Let's now use DISTINCT with more than one column. We will add the Lastname column in as well.

select DISTINCT City,Lastname from DuplicateTest
Distinct City and lastname
City Lastname
Birmingham Singh
Liverpool Smith
London White
Manchester whitehood

We get a list of results that have multiple rows, none of which are duplicated. Again, you can get the same result by using GROUP BY as shown below:

select city,lastname 
 from DuplicateTest
 group by city, lastname

If you look at the original data, there are two users with same Lastname (Singh) who live in the same city (Birmingham). With the DISTINCT keyword you get one unique row. Let's now add another column to our SELECT query.

select DISTINCT City,Lastname,Postalcode
 from DuplicateTest

This returns:

City Lastname PostalCode
Birmingham Singh B263SP
Birmingham Singh B283SP
Liverpool Smith L324JK
London White EC224HQ
Manchester whitehood M236DM

You will notice now that you are seeing two rows with the same lastname of Singh. This is because their 'Postalcode' is different, and the addition of that column makes the rows unique.

Again you will get the same result using GROUP BY as shown below:

select city, lastname, postalcode
 from DuplicateTest
 group by city, lastname, postalcode

Let's look at another example where you can use DISTINCT on multiple columns to find duplicate address. I've taken this example from the post. Please refer to this post for more detail.

SELECT PostCode, COUNT(Postcode) 
 FROM 
  (
    SELECT DISTINCT Address1, Address2, City, Postcode
     FROM AddressTable
  ) AS Sub
 GROUP BY Postcode
 HAVING COUNT(Postcode) > 1

Or you can use GROUP BY as follows:

SELECT Address1,Address2,City,PostCode,Count(PostCode)
 FROM AddressTable
 GROUP BY Address1,Address2,City,PostCode
 HAVING Count(PostCode) > 1

In both of these cases, we are using DISTINCT to find those rows that are duplicates, based on the columns we include in the queries.

DISTINCT can also be used to get unique column values with an aggregate function. In the example below, the query gets the unique values of reorderpoint and then does a SUM.

USE AdventureWorks
GO
SELECT SUM(DISTINCT ReorderPoint) as DistinctSum
 FROM Production.Product
GO

Result: 1848 rows

In the example below query is doing a SUM of ReorderPoint including the duplicates.

SELECT SUM(ReorderPoint) as WithoutDistinct
FROM Production.Product
GO

Result: 202287 rows

As you can see from the above two examples the importance of DISTINCT with an aggregate function. The user could end up un-knowingly using completely incorrect SUM had he used the result from the second query if the requirement was to get the SUM of unique values of ReorderPoint.

The purpose of the article was to show a user simple usage of DISTINCT. I hope you would find this article useful.

Rate

2.78 (142)

You rated this post out of 5. Change rating

Share

Share

Rate

2.78 (142)

You rated this post out of 5. Change rating