August 3, 2016 at 8:06 am
I need to get the max address id for each client in this order:
Billing
Mailing
Home
[Null]
So if they have a Billing address type, I need the maxid for that address. If they don't have a Billing, I need the Mailing address, and so on. But only one address id per client.
Mock table data:
USE [tempdb]
GO
CREATE TABLE [dbo].[addresses](
[AddID] [int] NOT NULL PRIMARY KEY CLUSTERED,
[ClientId] [int] NOT NULL,
[AddrType] [nvarchar](10) NOT NULL
)
GO
INSERT INTO addresses VALUES (1001, 9999, 'Mailing')
INSERT INTO addresses VALUES (1002, 9999, 'Billing')
INSERT INTO addresses VALUES (1003, 9999, 'Work')
INSERT INTO addresses VALUES (1004, 9999, 'Billing')
INSERT INTO addresses VALUES (1005, 9998, 'Mailing')
INSERT INTO addresses VALUES (1006, 9998, 'Home')
INSERT INTO addresses VALUES (1007, 9997, '')
This is what I want to see:
+-------+----------+----------+
| AddID | ClientID | AddrType |
+-------+----------+----------+
| 1004 | 9999 | Billing |
| 1005 | 9998 | Mailing |
| 1007 | 9997 | |
+-------+----------+----------+
I've tried many things except/intersect/union, subqueries, etc. Here's a recent work-in-progress:
select
*
from
(select
max(AddID) as 'MaxAddID',
ClientId,
AddrType
from
addresses
group by
ClientId,
AddrType) Keys
left join addresses addresses on Keys.MaxAddID = addresses.AddID
where
addresses.AddrType = 'Billing'
order by
addresses.ClientId
Help, please. Thanks!
August 3, 2016 at 8:24 am
Here is my solution:
with basedata as (
select
rn = row_number() over (partition by ClientId order by case when AddrType = 'Billing' then 1
when AddrType = 'Mailing' then 2
when AddrType = 'Home' then 3
else 4
end, AddID desc),
AddID,
ClientId,
AddrType
from
dbo.addresses
)
select
AddID,
ClientId,
AddrType
from
basedata
where
rn = 1
order by
ClientId desc;
Look it over and ask any questions you need to understand what I did.
August 3, 2016 at 8:47 am
Works great! Thank you very much!
One of my many iterations was along those lines but not nearly as sophisticated. My boss thought it should be done with a cursor, but I knew there had to be a better way.
Thanks again!
August 3, 2016 at 8:52 am
Glad it worked.
August 3, 2016 at 2:25 pm
Just noticed that you have an entry for Work but tat wasn't included in you priority sort order in the original post. Where do you want that to be in the sort order?
August 4, 2016 at 6:43 am
The problem with Lynn's fine solution is that it isn't able to utilise an index on the Address Type, hence a relatively expensive sort operator is introduced in the execution plan. With a little bit of creativity we can eliminate the sort and lower the execution cost about 3/4.
😎
USE TEEST;
GO
SET NOCOUNT ON;
IF OBJECT_ID(N'dbo.addresses') IS NOT NULL DROP TABLE dbo.addresses;
CREATE TABLE [dbo].[addresses](
[AddID] [int] NOT NULL PRIMARY KEY CLUSTERED,
[ClientId] [int] NOT NULL,
[AddrType] [nvarchar](10) NOT NULL
)
INSERT INTO addresses VALUES (1001, 9999, 'Mailing')
,(1002, 9999, 'Billing')
,(1003, 9999, 'Work')
,(1004, 9999, 'Billing')
,(1005, 9998, 'Mailing')
,(1006, 9998, 'Home')
,(1007, 9997, '');
CREATE NONCLUSTERED INDEX NCLIDX_DBO_ADDRESSES_CLIENT_ID_ADDRTYPE_INCL_ADDID ON dbo.addresses (ClientId ASC, AddrType ASC) INCLUDE (AddID)
;WITH BASE_DATA AS
(
SELECT
AD.AddID
,AD.ClientId
,AD.AddrType
,CASE
WHEN AD.AddrType = 'Billing' THEN 1
WHEN AD.AddrType = 'Mailing' THEN 2
WHEN AD.AddrType = 'Work' THEN 3
WHEN AD.AddrType = 'Home' THEN 4
ELSE 5
END AS TRNK
FROM dbo.addresses AD
)
,RANKED_DATA AS
(
SELECT
BD.AddID
,BD.ClientId
,BD.AddrType
,BD.TRNK
,MIN(BD.TRNK) OVER
(
PARTITION BY BD.ClientId
) AS MRNK
FROM BASE_DATA BD
)
,FINAL_RESULTS AS
(
SELECT
RD.AddID
,RD.ClientId
,RD.AddrType
,ROW_NUMBER() OVER
(
PARTITION BY RD.ClientId
ORDER BY (SELECT NULL)
) AS TRID
FROM RANKED_DATA RD
WHERE RD.TRNK = RD.MRNK
)
SELECT
FR.AddID
,FR.ClientId
,FR.AddrType
FROM FINAL_RESULTS FR
WHERE FR.TRID = 1;
IO and Time statistics
--WITHOUT SORT
Table 'Worktable'. Scan count 3, logical reads 27, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'addresses'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 32 ms, elapsed time = 102 ms.
--WITH SORT
Table 'addresses'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 376 ms.
August 4, 2016 at 8:36 am
Nice! Thanks!
August 4, 2016 at 8:44 am
Eirikur Eiriksson (8/4/2016)
The problem with Lynn's fine solution is that it isn't able to utilise an index on the Address Type, hence a relatively expensive sort operator is introduced in the execution plan. With a little bit of creativity we can eliminate the sort and lower the execution cost about 3/4.😎
USE TEEST;
GO
SET NOCOUNT ON;
IF OBJECT_ID(N'dbo.addresses') IS NOT NULL DROP TABLE dbo.addresses;
CREATE TABLE [dbo].[addresses](
[AddID] [int] NOT NULL PRIMARY KEY CLUSTERED,
[ClientId] [int] NOT NULL,
[AddrType] [nvarchar](10) NOT NULL
)
INSERT INTO addresses VALUES (1001, 9999, 'Mailing')
,(1002, 9999, 'Billing')
,(1003, 9999, 'Work')
,(1004, 9999, 'Billing')
,(1005, 9998, 'Mailing')
,(1006, 9998, 'Home')
,(1007, 9997, '');
CREATE NONCLUSTERED INDEX NCLIDX_DBO_ADDRESSES_CLIENT_ID_ADDRTYPE_INCL_ADDID ON dbo.addresses (ClientId ASC, AddrType ASC) INCLUDE (AddID)
;WITH BASE_DATA AS
(
SELECT
AD.AddID
,AD.ClientId
,AD.AddrType
,CASE
WHEN AD.AddrType = 'Billing' THEN 1
WHEN AD.AddrType = 'Mailing' THEN 2
WHEN AD.AddrType = 'Work' THEN 3
WHEN AD.AddrType = 'Home' THEN 4
ELSE 5
END AS TRNK
FROM dbo.addresses AD
)
,RANKED_DATA AS
(
SELECT
BD.AddID
,BD.ClientId
,BD.AddrType
,BD.TRNK
,MIN(BD.TRNK) OVER
(
PARTITION BY BD.ClientId
) AS MRNK
FROM BASE_DATA BD
)
,FINAL_RESULTS AS
(
SELECT
RD.AddID
,RD.ClientId
,RD.AddrType
,ROW_NUMBER() OVER
(
PARTITION BY RD.ClientId
ORDER BY (SELECT NULL)
) AS TRID
FROM RANKED_DATA RD
WHERE RD.TRNK = RD.MRNK
)
SELECT
FR.AddID
,FR.ClientId
,FR.AddrType
FROM FINAL_RESULTS FR
WHERE FR.TRID = 1;
Hrm. So I ran both sets of code and I see Eirikur's code is getting a different answer than Lynn's code is.
Lynn is getting:
AddIDClientIdAddrType
10049999Billing
10059998Mailing
10079997
And Eirikur is getting:
AddIDClientIdAddrType
10079997
10059998Mailing
10029999Billing
So the issue here is which Billing address does the OP need? The most recent one (i.e., highest AddID) or the oldest or just any old Billing address?
In my office, we'd want the highest ID as the most recent. So I would change the ORDER BY in the last CTE of Eirikur's code to be as below:
WITH BASE_DATA AS
(
SELECT
AD.AddID
,AD.ClientId
,AD.AddrType
,CASE
WHEN AD.AddrType = 'Billing' THEN 1
WHEN AD.AddrType = 'Mailing' THEN 2
WHEN AD.AddrType = 'Work' THEN 3
WHEN AD.AddrType = 'Home' THEN 4
ELSE 5
END AS TRNK
FROM dbo.addresses AD
)
,RANKED_DATA AS
(
SELECT
BD.AddID
,BD.ClientId
,BD.AddrType
,BD.TRNK
,MIN(BD.TRNK) OVER
(
PARTITION BY BD.ClientId
) AS MRNK
FROM BASE_DATA BD
)
,FINAL_RESULTS AS
(
SELECT
RD.AddID
,RD.ClientId
,RD.AddrType
,ROW_NUMBER() OVER
(
PARTITION BY RD.ClientId
ORDER BY (SELECT RD.AddID) DESC
) AS TRID
FROM RANKED_DATA RD
WHERE RD.TRNK = RD.MRNK
)
SELECT
FR.AddID
,FR.ClientId
,FR.AddrType
FROM FINAL_RESULTS FR
WHERE FR.TRID = 1;
EDIT: fixed messed up quote tags.
August 4, 2016 at 8:50 am
Nice input Brandy, appreciate it:-) Maybe the OP can enlighten us on the issue?
😎
August 4, 2016 at 9:01 am
Eirikur Eiriksson (8/4/2016)
Nice input Brandy, appreciate it:-) Maybe the OP can enlighten us on the issue?😎
There's also the issue of the final sort order. Lynn's results are sorted differently than yours so I don't know how that would affect the stats you pulled when you put both fixes in.
SELECT
FR.AddID
,FR.ClientId
,FR.AddrType
FROM FINAL_RESULTS FR
WHERE FR.TRID = 1
ORDER BY FR.ClientId
But the question is whether or not the OP needs the results sorted the way provided in the original post.
August 4, 2016 at 9:44 am
Actually, I think my solution is quite viable. I created a 1,000,000 row test to compare the two solutions. I also noticed a difference between them where Eirikur's solution returned the minimum value for AddID and mine the maximum.
if object_id('dbo.eTally') is not null
DROP FUNCTION [dbo].[eTally]
GO
/****** Object: UserDefinedFunction [dbo].[eTally] Script Date: 8/4/2016 9:32:22 AM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
create function [dbo].[eTally] (@Range int, @StartZero bit)
returns table
with schemabinding
as return(
with e1(n) as (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)),
e2(n) as (select 1 from e1 a cross join e1 b),
e4(n) as (select 1 from e2 a cross join e2 b),
etally(n) as (select n = row_number() over (order by (select null)) from e4 a cross join e4 b)
select top (@Range)
n - cast(@StartZero as int) as n
from
etally);
GO
SET NOCOUNT ON;
IF OBJECT_ID(N'dbo.addresses') IS NOT NULL DROP TABLE dbo.addresses;
CREATE TABLE [dbo].[addresses](
[AddID] [int] NOT NULL PRIMARY KEY CLUSTERED,
[ClientId] [int] NOT NULL,
[AddrType] [nvarchar](10) NOT NULL
)
--INSERT INTO addresses VALUES (1001, 9999, 'Mailing')
-- ,(1002, 9999, 'Billing')
-- ,(1003, 9999, 'Work')
-- ,(1004, 9999, 'Billing')
-- ,(1005, 9998, 'Mailing')
-- ,(1006, 9998, 'Home')
-- ,(1007, 9997, '');
--===== Declare some obviously named variables
DECLARE @NumberOfRows INT,
@StartValue INT,
@EndValue INT,
@Range INT
;
--===== Preset the variables to known values
SELECT @NumberOfRows = 1000000,
@StartValue = 1000,
@EndValue = 9999,
@Range = @EndValue - @StartValue + 1
;
--===== Conditionally drop the test table to make reruns easier in SSMS
IF OBJECT_ID('tempdb..#SomeTestTable','U') IS NOT NULL
DROP TABLE #SomeTestTable
;
--===== Create the test table with "random constrained" integers and floats
-- within the parameters identified in the variables above.
INSERT INTO addresses
SELECT TOP (@NumberOfRows)
AddId = n + 1000,
SomeRandomInteger = ABS(CHECKSUM(NEWID())) % @Range + @StartValue,
SomeRandomValue = case ABS(CHECKSUM(NEWID())) % 5 + 1 when 1 then 'Billing' when 2 then 'Mailing' when 3 then 'Work' when 4 then 'Home' else '' end
-- INTO #SomeTestTable
FROM dbo.eTally(@NumberOfRows,0);
CREATE NONCLUSTERED INDEX NCLIDX_DBO_ADDRESSES_CLIENT_ID_ADDRTYPE_INCL_ADDID ON dbo.addresses (ClientId ASC, AddrType ASC) INCLUDE (AddID);
raiserror('Eirikur Eiriksson Solution Starting',10,1) with nowait;
set statistics io,time on;
WITH BASE_DATA AS
(
SELECT
AD.AddID
,AD.ClientId
,AD.AddrType
,CASE
WHEN AD.AddrType = 'Billing' THEN 1
WHEN AD.AddrType = 'Mailing' THEN 2
WHEN AD.AddrType = 'Work' THEN 3
WHEN AD.AddrType = 'Home' THEN 4
ELSE 5
END AS TRNK
FROM dbo.addresses AD
)
,RANKED_DATA AS
(
SELECT
BD.AddID
,BD.ClientId
,BD.AddrType
,BD.TRNK
,MIN(BD.TRNK) OVER
(
PARTITION BY BD.ClientId
) AS MRNK
FROM BASE_DATA BD
)
,FINAL_RESULTS AS
(
SELECT
RD.AddID
,RD.ClientId
,RD.AddrType
,ROW_NUMBER() OVER
(
PARTITION BY RD.ClientId
ORDER BY (SELECT NULL)
) AS TRID
FROM RANKED_DATA RD
WHERE RD.TRNK = RD.MRNK
)
SELECT
FR.AddID
,FR.ClientId
,FR.AddrType
FROM FINAL_RESULTS FR
WHERE FR.TRID = 1;
set statistics io,time off;
raiserror('Eirikur Eiriksson Solution Ending',10,1) with nowait;
raiserror('Lynn Pettis Solution Starting',10,1) with nowait;
set statistics io,time on;
with basedata as (
select
rn = row_number() over (partition by ClientId order by case when AddrType = 'Billing' then 1
when AddrType = 'Mailing' then 2
when AddrType = 'Work' then 3
when AddrType = 'Home' then 4
else 5
end, AddID desc),
AddID,
ClientId,
AddrType
from
dbo.addresses
)
select
AddID,
ClientId,
AddrType
from
basedata
where
rn = 1
order by
ClientId;
set statistics io,time off;
raiserror('Lynn Pettis Solution Ending',10,1) with nowait;
select
*
from
dbo.addresses
order by
ClientId asc,
case when AddrType = 'Billing' then 1
when AddrType = 'Mailing' then 2
when AddrType = 'Work' then 3
when AddrType = 'Home' then 4
else 5
end,
AddID desc;
This is the output of the statistics io and time when I ran it on my laptop.
Eirikur Eiriksson Solution Starting
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 1 ms.
Table 'Worktable'. Scan count 3, logical reads 2036001, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'addresses'. Scan count 1, logical reads 2898, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1700 ms, elapsed time = 1708 ms.
Eirikur Eiriksson Solution Ending
Lynn Pettis Solution Starting
SQL Server parse and compile time:
CPU time = 1 ms, elapsed time = 1 ms.
Table 'addresses'. Scan count 9, logical reads 2956, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1608 ms, elapsed time = 333 ms.
Lynn Pettis Solution Ending
Also, if you pull the execution plans you will see that both queries use the nonclustered index.
August 4, 2016 at 9:51 am
The "Work" option is not to be included in the data. They only want Billing, Mailing, Home, blank (because address type isn't required).
August 4, 2016 at 10:05 am
Sorry, just saw the posts with questions about the sort order.
We DO want the most recent of a given address type, i.e., the highest AddrID, which is what Lynn's code returns. I pulled the code apart (because I haven't used partitions that much and wanted to understand it better) and tested it pretty thoroughly.
And in another response, I mentioned that "Work" is not to be included in the returned data, though a blank address type is. I updated the code to exclude "Work" types.
So in this scenario (added some data to orignal):
IF OBJECT_ID(N'dbo.addresses') IS NOT NULL DROP TABLE dbo.addresses;
CREATE TABLE [dbo].[addresses](
[AddID] [int] NOT NULL PRIMARY KEY CLUSTERED,
[ClientId] [int] NOT NULL,
[AddrType] [nvarchar](10) NOT NULL
)
INSERT INTO addresses VALUES (1001, 9999, 'Mailing')
,(1002, 9999, 'Billing')
,(1003, 9999, 'Work')
,(1004, 9999, 'Billing')
,(1005, 9998, 'Mailing')
,(1006, 9998, 'Home')
,(1007, 9997, 'Work')
,(1008, 9997, '')
,(1009, 9996, 'Work');
I want records 1004, 1005 and 1007, but not 1009.
August 4, 2016 at 10:06 am
Reworking my test suite, as it generated all 5 address type for everyone.
August 4, 2016 at 11:30 am
I added a where clause to filter out 'Work' and a case for ''. Seems to be working just fine.
USE [tempdb]
GO
IF OBJECT_ID(N'dbo.addresses') IS NOT NULL DROP TABLE dbo.addresses;
CREATE TABLE [dbo].[addresses](
[AddID] [int] NOT NULL PRIMARY KEY CLUSTERED,
[ClientId] [int] NOT NULL,
[AddrType] [nvarchar](10) NOT NULL
)
GO
INSERT INTO addresses VALUES (1001, 9999, 'Mailing')
,(1002, 9999, 'Billing')
,(1003, 9999, 'Work')
,(1004, 9999, 'Billing')
,(1005, 9998, 'Mailing')
,(1006, 9998, 'Home')
,(1007, 9997, '')
,(1008, 9997, 'Work')
,(1009, 9996, 'Work')
go
with basedata as (
select
rn = row_number() over (partition by ClientId order by case when AddrType = 'Billing' then 1
when AddrType = 'Mailing' then 2
when AddrType = 'Home' then 3
when AddrType = ''then 4
else 999
end, AddID desc),
AddID,
ClientId,
AddrType
from
dbo.addresses
where
AddrType <> 'Work'
)
select
AddID,
ClientId,
AddrType
from
basedata
where
rn = 1
order by
ClientId desc;
Viewing 15 posts - 1 through 15 (of 17 total)
You must be logged in to reply to this topic. Login to reply