Get one type with max id

Question

Post reply

Get one type with max id

kdbarrett 88655

Right there with Babe

Points: 770
More actions
August 3, 2016 at 8:06 am

#312274

I need to get the max address id for each client in this order:
Billing
Mailing
Home
[Null]
So if they have a Billing address type, I need the maxid for that address. If they don't have a Billing, I need the Mailing address, and so on. But only one address id per client.
Mock table data:
USE [tempdb]
GO
CREATE TABLE [dbo].[addresses](
[AddID] [int] NOT NULL PRIMARY KEY CLUSTERED,
[ClientId] [int] NOT NULL,
[AddrType] [nvarchar](10) NOT NULL
)
GO
INSERT INTO addresses VALUES (1001, 9999, 'Mailing')
INSERT INTO addresses VALUES (1002, 9999, 'Billing')
INSERT INTO addresses VALUES (1003, 9999, 'Work')
INSERT INTO addresses VALUES (1004, 9999, 'Billing')
INSERT INTO addresses VALUES (1005, 9998, 'Mailing')
INSERT INTO addresses VALUES (1006, 9998, 'Home')
INSERT INTO addresses VALUES (1007, 9997, '')
This is what I want to see:
+-------+----------+----------+
| AddID | ClientID | AddrType |
+-------+----------+----------+
| 1004 | 9999 | Billing |
| 1005 | 9998 | Mailing |
| 1007 | 9997 | |
+-------+----------+----------+
I've tried many things except/intersect/union, subqueries, etc. Here's a recent work-in-progress:
select
*
from
(select
max(AddID) as 'MaxAddID',
ClientId,
AddrType
from
addresses
group by
ClientId,
AddrType) Keys
left join addresses addresses on Keys.MaxAddID = addresses.AddID
where
addresses.AddrType = 'Billing'
order by
addresses.ClientId
Help, please. Thanks!

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 1

Here is my solution:

with basedata as (

select

rn = row_number() over (partition by ClientId order by case when AddrType = 'Billing' then 1

when AddrType = 'Mailing' then 2

when AddrType = 'Home' then 3

else 4

end, AddID desc),

AddID,

ClientId,

AddrType

from

dbo.addresses

)

select

AddID,

ClientId,

AddrType

from

basedata

where

rn = 1

order by

ClientId desc;

Look it over and ask any questions you need to understand what I did.

kdbarrett 88655 Right there with Babe Points: 770 More actions · Answer 2

Works great! Thank you very much!

One of my many iterations was along those lines but not nearly as sophisticated. My boss thought it should be done with a cursor, but I knew there had to be a better way.

Thanks again!

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 3

Lynn Pettis

SSC Guru

Points: 442467

August 3, 2016 at 8:52 am

#1893517

Glad it worked.

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 4

Just noticed that you have an entry for Work but tat wasn't included in you priority sort order in the original post. Where do you want that to be in the sort order?

Eirikur Eiriksson SSC Guru Points: 182951 More actions · Answer 5

The problem with Lynn's fine solution is that it isn't able to utilise an index on the Address Type, hence a relatively expensive sort operator is introduced in the execution plan. With a little bit of creativity we can eliminate the sort and lower the execution cost about 3/4.

😎

USE TEEST;

GO

SET NOCOUNT ON;

IF OBJECT_ID(N'dbo.addresses') IS NOT NULL DROP TABLE dbo.addresses;

CREATE TABLE [dbo].[addresses](

[AddID] [int] NOT NULL PRIMARY KEY CLUSTERED,

[ClientId] [int] NOT NULL,

[AddrType] [nvarchar](10) NOT NULL

)

INSERT INTO addresses VALUES (1001, 9999, 'Mailing')

,(1002, 9999, 'Billing')

,(1003, 9999, 'Work')

,(1004, 9999, 'Billing')

,(1005, 9998, 'Mailing')

,(1006, 9998, 'Home')

,(1007, 9997, '');

CREATE NONCLUSTERED INDEX NCLIDX_DBO_ADDRESSES_CLIENT_ID_ADDRTYPE_INCL_ADDID ON dbo.addresses (ClientId ASC, AddrType ASC) INCLUDE (AddID)

;WITH BASE_DATA AS

(

SELECT

AD.AddID

,AD.ClientId

,AD.AddrType

,CASE

WHEN AD.AddrType = 'Billing' THEN 1

WHEN AD.AddrType = 'Mailing' THEN 2

WHEN AD.AddrType = 'Work' THEN 3

WHEN AD.AddrType = 'Home' THEN 4

ELSE 5

END AS TRNK

FROM dbo.addresses AD

)

,RANKED_DATA AS

(

SELECT

BD.AddID

,BD.ClientId

,BD.AddrType

,BD.TRNK

,MIN(BD.TRNK) OVER

(

PARTITION BY BD.ClientId

) AS MRNK

FROM BASE_DATA BD

)

,FINAL_RESULTS AS

(

SELECT

RD.AddID

,RD.ClientId

,RD.AddrType

,ROW_NUMBER() OVER

(

PARTITION BY RD.ClientId

ORDER BY (SELECT NULL)

) AS TRID

FROM RANKED_DATA RD

WHERE RD.TRNK = RD.MRNK

)

SELECT

FR.AddID

,FR.ClientId

,FR.AddrType

FROM FINAL_RESULTS FR

WHERE FR.TRID = 1;

IO and Time statistics

--WITHOUT SORT

Table 'Worktable'. Scan count 3, logical reads 27, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'addresses'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 32 ms, elapsed time = 102 ms.

--WITH SORT

Table 'addresses'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 376 ms.

kdbarrett 88655 Right there with Babe Points: 770 More actions · Answer 6

kdbarrett 88655

Right there with Babe

Points: 770

August 4, 2016 at 8:36 am

#1893810

Nice! Thanks!

Brandie Tarvin SSC Guru Points: 173108 More actions · Answer 7

Eirikur Eiriksson (8/4/2016)
The problem with Lynn's fine solution is that it isn't able to utilise an index on the Address Type, hence a relatively expensive sort operator is introduced in the execution plan. With a little bit of creativity we can eliminate the sort and lower the execution cost about 3/4.
😎
USE TEEST;
GO
SET NOCOUNT ON;
IF OBJECT_ID(N'dbo.addresses') IS NOT NULL DROP TABLE dbo.addresses;
CREATE TABLE [dbo].[addresses](
[AddID] [int] NOT NULL PRIMARY KEY CLUSTERED,
[ClientId] [int] NOT NULL,
[AddrType] [nvarchar](10) NOT NULL
)
INSERT INTO addresses VALUES (1001, 9999, 'Mailing')
,(1002, 9999, 'Billing')
,(1003, 9999, 'Work')
,(1004, 9999, 'Billing')
,(1005, 9998, 'Mailing')
,(1006, 9998, 'Home')
,(1007, 9997, '');
CREATE NONCLUSTERED INDEX NCLIDX_DBO_ADDRESSES_CLIENT_ID_ADDRTYPE_INCL_ADDID ON dbo.addresses (ClientId ASC, AddrType ASC) INCLUDE (AddID)
;WITH BASE_DATA AS
(
SELECT
AD.AddID
,AD.ClientId
,AD.AddrType
,CASE
WHEN AD.AddrType = 'Billing' THEN 1
WHEN AD.AddrType = 'Mailing' THEN 2
WHEN AD.AddrType = 'Work' THEN 3
WHEN AD.AddrType = 'Home' THEN 4
ELSE 5
END AS TRNK
FROM dbo.addresses AD
)
,RANKED_DATA AS
(
SELECT
BD.AddID
,BD.ClientId
,BD.AddrType
,BD.TRNK
,MIN(BD.TRNK) OVER
(
PARTITION BY BD.ClientId
) AS MRNK
FROM BASE_DATA BD
)
,FINAL_RESULTS AS
(
SELECT
RD.AddID
,RD.ClientId
,RD.AddrType
,ROW_NUMBER() OVER
(
PARTITION BY RD.ClientId
ORDER BY (SELECT NULL)
) AS TRID
FROM RANKED_DATA RD
WHERE RD.TRNK = RD.MRNK
)
SELECT
FR.AddID
,FR.ClientId
,FR.AddrType
FROM FINAL_RESULTS FR
WHERE FR.TRID = 1;

Hrm. So I ran both sets of code and I see Eirikur's code is getting a different answer than Lynn's code is.

Lynn is getting:

AddIDClientIdAddrType
10049999Billing
10059998Mailing
10079997

And Eirikur is getting:

AddIDClientIdAddrType
10079997
10059998Mailing
10029999Billing

So the issue here is which Billing address does the OP need? The most recent one (i.e., highest AddID) or the oldest or just any old Billing address?

In my office, we'd want the highest ID as the most recent. So I would change the ORDER BY in the last CTE of Eirikur's code to be as below:

WITH BASE_DATA AS

(

SELECT

AD.AddID

,AD.ClientId

,AD.AddrType

,CASE

WHEN AD.AddrType = 'Billing' THEN 1

WHEN AD.AddrType = 'Mailing' THEN 2

WHEN AD.AddrType = 'Work' THEN 3

WHEN AD.AddrType = 'Home' THEN 4

ELSE 5

END AS TRNK

FROM dbo.addresses AD

)

,RANKED_DATA AS

(

SELECT

BD.AddID

,BD.ClientId

,BD.AddrType

,BD.TRNK

,MIN(BD.TRNK) OVER

(

PARTITION BY BD.ClientId

) AS MRNK

FROM BASE_DATA BD

)

,FINAL_RESULTS AS

(

SELECT

RD.AddID

,RD.ClientId

,RD.AddrType

,ROW_NUMBER() OVER

(

PARTITION BY RD.ClientId

ORDER BY (SELECT RD.AddID) DESC

) AS TRID

FROM RANKED_DATA RD

WHERE RD.TRNK = RD.MRNK

)

SELECT

FR.AddID

,FR.ClientId

,FR.AddrType

FROM FINAL_RESULTS FR

WHERE FR.TRID = 1;

EDIT: fixed messed up quote tags.

Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

Eirikur Eiriksson SSC Guru Points: 182951 More actions · Answer 8

Nice input Brandy, appreciate it:-) Maybe the OP can enlighten us on the issue?

😎

Brandie Tarvin SSC Guru Points: 173108 More actions · Answer 9

Eirikur Eiriksson (8/4/2016)
Nice input Brandy, appreciate it:-) Maybe the OP can enlighten us on the issue?
😎

There's also the issue of the final sort order. Lynn's results are sorted differently than yours so I don't know how that would affect the stats you pulled when you put both fixes in.

SELECT

FR.AddID

,FR.ClientId

,FR.AddrType

FROM FINAL_RESULTS FR

WHERE FR.TRID = 1

ORDER BY FR.ClientId

But the question is whether or not the OP needs the results sorted the way provided in the original post.

Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 10

Actually, I think my solution is quite viable. I created a 1,000,000 row test to compare the two solutions. I also noticed a difference between them where Eirikur's solution returned the minimum value for AddID and mine the maximum.

if object_id('dbo.eTally') is not null

DROP FUNCTION [dbo].[eTally]

GO

/****** Object: UserDefinedFunction [dbo].[eTally] Script Date: 8/4/2016 9:32:22 AM ******/

SET ANSI_NULLS ON

GO

SET QUOTED_IDENTIFIER ON

GO

create function [dbo].[eTally] (@Range int, @StartZero bit)

returns table

with schemabinding

as return(

with e1(n) as (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)),

e2(n) as (select 1 from e1 a cross join e1 b),

e4(n) as (select 1 from e2 a cross join e2 b),

etally(n) as (select n = row_number() over (order by (select null)) from e4 a cross join e4 b)

select top (@Range)

n - cast(@StartZero as int) as n

from

etally);

GO

SET NOCOUNT ON;

IF OBJECT_ID(N'dbo.addresses') IS NOT NULL DROP TABLE dbo.addresses;

CREATE TABLE [dbo].[addresses](

[AddID] [int] NOT NULL PRIMARY KEY CLUSTERED,

[ClientId] [int] NOT NULL,

[AddrType] [nvarchar](10) NOT NULL

)

--INSERT INTO addresses VALUES (1001, 9999, 'Mailing')

-- ,(1002, 9999, 'Billing')

-- ,(1003, 9999, 'Work')

-- ,(1004, 9999, 'Billing')

-- ,(1005, 9998, 'Mailing')

-- ,(1006, 9998, 'Home')

-- ,(1007, 9997, '');

--===== Declare some obviously named variables

DECLARE @NumberOfRows INT,

@StartValue INT,

@EndValue INT,

@Range INT

;

--===== Preset the variables to known values

SELECT @NumberOfRows = 1000000,

@StartValue = 1000,

@EndValue = 9999,

@Range = @EndValue - @StartValue + 1

;

--===== Conditionally drop the test table to make reruns easier in SSMS

IF OBJECT_ID('tempdb..#SomeTestTable','U') IS NOT NULL

DROP TABLE #SomeTestTable

;

--===== Create the test table with "random constrained" integers and floats

-- within the parameters identified in the variables above.

INSERT INTO addresses

SELECT TOP (@NumberOfRows)

AddId = n + 1000,

SomeRandomInteger = ABS(CHECKSUM(NEWID())) % @Range + @StartValue,

SomeRandomValue = case ABS(CHECKSUM(NEWID())) % 5 + 1 when 1 then 'Billing' when 2 then 'Mailing' when 3 then 'Work' when 4 then 'Home' else '' end

-- INTO #SomeTestTable

FROM dbo.eTally(@NumberOfRows,0);

CREATE NONCLUSTERED INDEX NCLIDX_DBO_ADDRESSES_CLIENT_ID_ADDRTYPE_INCL_ADDID ON dbo.addresses (ClientId ASC, AddrType ASC) INCLUDE (AddID);

raiserror('Eirikur Eiriksson Solution Starting',10,1) with nowait;

set statistics io,time on;

WITH BASE_DATA AS

(

SELECT

AD.AddID

,AD.ClientId

,AD.AddrType

,CASE

WHEN AD.AddrType = 'Billing' THEN 1

WHEN AD.AddrType = 'Mailing' THEN 2

WHEN AD.AddrType = 'Work' THEN 3

WHEN AD.AddrType = 'Home' THEN 4

ELSE 5

END AS TRNK

FROM dbo.addresses AD

)

,RANKED_DATA AS

(

SELECT

BD.AddID

,BD.ClientId

,BD.AddrType

,BD.TRNK

,MIN(BD.TRNK) OVER

(

PARTITION BY BD.ClientId

) AS MRNK

FROM BASE_DATA BD

)

,FINAL_RESULTS AS

(

SELECT

RD.AddID

,RD.ClientId

,RD.AddrType

,ROW_NUMBER() OVER

(

PARTITION BY RD.ClientId

ORDER BY (SELECT NULL)

) AS TRID

FROM RANKED_DATA RD

WHERE RD.TRNK = RD.MRNK

)

SELECT

FR.AddID

,FR.ClientId

,FR.AddrType

FROM FINAL_RESULTS FR

WHERE FR.TRID = 1;

set statistics io,time off;

raiserror('Eirikur Eiriksson Solution Ending',10,1) with nowait;

raiserror('Lynn Pettis Solution Starting',10,1) with nowait;

set statistics io,time on;

with basedata as (

select

rn = row_number() over (partition by ClientId order by case when AddrType = 'Billing' then 1

when AddrType = 'Mailing' then 2

when AddrType = 'Work' then 3

when AddrType = 'Home' then 4

else 5

end, AddID desc),

AddID,

ClientId,

AddrType

from

dbo.addresses

)

select

AddID,

ClientId,

AddrType

from

basedata

where

rn = 1

order by

ClientId;

set statistics io,time off;

raiserror('Lynn Pettis Solution Ending',10,1) with nowait;

select

*

from

dbo.addresses

order by

ClientId asc,

case when AddrType = 'Billing' then 1

when AddrType = 'Mailing' then 2

when AddrType = 'Work' then 3

when AddrType = 'Home' then 4

else 5

end,

AddID desc;

This is the output of the statistics io and time when I ran it on my laptop.

Eirikur Eiriksson Solution Starting

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 1 ms.

Table 'Worktable'. Scan count 3, logical reads 2036001, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'addresses'. Scan count 1, logical reads 2898, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 1700 ms, elapsed time = 1708 ms.

Eirikur Eiriksson Solution Ending

Lynn Pettis Solution Starting

SQL Server parse and compile time:

CPU time = 1 ms, elapsed time = 1 ms.

Table 'addresses'. Scan count 9, logical reads 2956, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 1608 ms, elapsed time = 333 ms.

Lynn Pettis Solution Ending

Also, if you pull the execution plans you will see that both queries use the nonclustered index.

kdbarrett 88655 Right there with Babe Points: 770 More actions · Answer 11

The "Work" option is not to be included in the data. They only want Billing, Mailing, Home, blank (because address type isn't required).

kdbarrett 88655 Right there with Babe Points: 770 More actions · Answer 12

Sorry, just saw the posts with questions about the sort order.

We DO want the most recent of a given address type, i.e., the highest AddrID, which is what Lynn's code returns. I pulled the code apart (because I haven't used partitions that much and wanted to understand it better) and tested it pretty thoroughly.

And in another response, I mentioned that "Work" is not to be included in the returned data, though a blank address type is. I updated the code to exclude "Work" types.

So in this scenario (added some data to orignal):

IF OBJECT_ID(N'dbo.addresses') IS NOT NULL DROP TABLE dbo.addresses;

CREATE TABLE [dbo].[addresses](

[AddID] [int] NOT NULL PRIMARY KEY CLUSTERED,

[ClientId] [int] NOT NULL,

[AddrType] [nvarchar](10) NOT NULL

)

INSERT INTO addresses VALUES (1001, 9999, 'Mailing')

,(1002, 9999, 'Billing')

,(1003, 9999, 'Work')

,(1004, 9999, 'Billing')

,(1005, 9998, 'Mailing')

,(1006, 9998, 'Home')

,(1007, 9997, 'Work')

,(1008, 9997, '')

,(1009, 9996, 'Work');

I want records 1004, 1005 and 1007, but not 1009.

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 13

Reworking my test suite, as it generated all 5 address type for everyone.

kdbarrett 88655 Right there with Babe Points: 770 More actions · Answer 14

I added a where clause to filter out 'Work' and a case for ''. Seems to be working just fine.

USE [tempdb]

GO

IF OBJECT_ID(N'dbo.addresses') IS NOT NULL DROP TABLE dbo.addresses;

CREATE TABLE [dbo].[addresses](

[AddID] [int] NOT NULL PRIMARY KEY CLUSTERED,

[ClientId] [int] NOT NULL,

[AddrType] [nvarchar](10) NOT NULL

)

GO

INSERT INTO addresses VALUES (1001, 9999, 'Mailing')

,(1002, 9999, 'Billing')

,(1003, 9999, 'Work')

,(1004, 9999, 'Billing')

,(1005, 9998, 'Mailing')

,(1006, 9998, 'Home')

,(1007, 9997, '')

,(1008, 9997, 'Work')

,(1009, 9996, 'Work')

go

with basedata as (

select

rn = row_number() over (partition by ClientId order by case when AddrType = 'Billing' then 1

when AddrType = 'Mailing' then 2

when AddrType = 'Home' then 3

when AddrType = ''then 4

else 999

end, AddID desc),

AddID,

ClientId,

AddrType

from

dbo.addresses

where

AddrType <> 'Work'

)

select

AddID,

ClientId,

AddrType

from

basedata

where

rn = 1

order by

ClientId desc;