Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

join/merge 2 status tables (check statuses at any change) Expand / Collapse
Author
Message
Posted Wednesday, October 16, 2013 4:36 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Wednesday, July 9, 2014 8:23 AM
Points: 4, Visits: 110
Hi,

I have 2 tables indicating since when is the status valid for an object. I need to kind of join/merge/union them so I know in one table what are the both stuses at any given change.

Table1:
Object  status1 date
Obj1 Open 1998-01-01 00:00:00.000
Obj1 Closed 2000-06-01 00:00:00.000
Obj1 Open 2008-03-06 00:00:00.000
Obj1 Closed 2013-01-01 00:00:00.000

Table2:
Object  status2 date
Obj1 ACTIVE 1999-01-01 00:00:00.000
Obj1 INACTIV 2000-08-29 00:00:00.000
Obj1 UNKNOWN 2004-05-06 00:00:00.000
Obj1 check 2014-05-01 00:00:00.000

as result I would need such thing:
Object  status1 status2 date
Obj1 Open NULL 1998-01-01 00:00:00.000
Obj1 Open ACTIVE 1999-01-01 00:00:00.000
Obj1 Closed ACTIVE 2000-06-01 00:00:00.000
Obj1 Closed INACTIV 2000-08-29 00:00:00.000
Obj1 Closed UNKNOWN 2004-05-06 00:00:00.000
Obj1 Open UNKNOWN 2008-03-06 00:00:00.000
Obj1 Closed UNKNOWN 2013-01-01 00:00:00.000
Obj1 Closed check 2014-05-01 00:00:00.000

here the script to create the source tables:
CREATE TABLE [dbo].[table2](
[Object] [nvarchar](1000) NULL,
[status2] [nvarchar](1000) NULL,
[date] [datetime] NULL
) ON [PRIMARY]
GO
INSERT [dbo].[table2] ([Object], [status2], [date]) VALUES (N'Obj1', N'ACTIVE', CAST(0x00008D3F00000000 AS DateTime))
INSERT [dbo].[table2] ([Object], [status2], [date]) VALUES (N'Obj1', N'INACTIV', CAST(0x00008F9D00000000 AS DateTime))
INSERT [dbo].[table2] ([Object], [status2], [date]) VALUES (N'Obj1', N'UNKNOWN', CAST(0x000094DF00000000 AS DateTime))
INSERT [dbo].[table2] ([Object], [status2], [date]) VALUES (N'Obj1', N'check', CAST(0x0000A31E00000000 AS DateTime))
go
CREATE TABLE [dbo].[table1](
[Object] [nvarchar](1000) NULL,
[status1] [nvarchar](1000) NULL,
[date] [datetime] NULL
) ON [PRIMARY]
GO
INSERT [dbo].[table1] ([Object], [status1], [date]) VALUES (N'Obj1', N'Open', CAST(0x00008BD200000000 AS DateTime))
INSERT [dbo].[table1] ([Object], [status1], [date]) VALUES (N'Obj1', N'Closed', CAST(0x00008F4400000000 AS DateTime))
INSERT [dbo].[table1] ([Object], [status1], [date]) VALUES (N'Obj1', N'Open', CAST(0x00009A5700000000 AS DateTime))
INSERT [dbo].[table1] ([Object], [status1], [date]) VALUES (N'Obj1', N'Closed', CAST(0x0000A13900000000 AS DateTime))

Post #1505135
Posted Wednesday, October 16, 2013 4:37 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Wednesday, July 9, 2014 8:23 AM
Points: 4, Visits: 110
Well....

I kind of have a solution, but it's messy (especially that I acutally have more tables like these to merge, and obviuosly there come joins with other tables).
Maybe someone could suggest something better?
    ;WITH cte as(
select DISTINCT t.[Object], t.date FROM table1 t
union
select DISTINCT t.[Object], t.date FROM table2 t
)

select * FROM cte
outer APPLY (SELECT TOP 1 status1 FROM table1 tx
WHERE tx.date<=cte.date AND tx.[Object]=cte.[OBJECT]
ORDER BY tx.date DESC) t1
outer APPLY (SELECT TOP 1 status2 FROM table2 tx
WHERE tx.date<=cte.date and tx.[Object]=cte.[OBJECT]
ORDER BY tx.date DESC) t2

Post #1505136
Posted Wednesday, October 16, 2013 11:01 AM
SSChasing Mays

SSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing Mays

Group: General Forum Members
Last Login: Thursday, September 11, 2014 8:23 AM
Points: 625, Visits: 2,128
I suppose it depends how you define 'messy'.

You can probably achieve a better solution with a calendar table, but if you have a lot of these status tables, thats still a lot of steps.

Here is a sample of what I'm talking about:

CREATE TABLE #table2(
[Object] [nvarchar](1000) NULL,
[status2] [nvarchar](1000) NULL,
[date] [datetime] NULL
) ON [PRIMARY]
GO
INSERT #table2 ([Object], [status2], [date]) VALUES (N'Obj1', N'ACTIVE', CAST(0x00008D3F00000000 AS DateTime))
INSERT #table2 ([Object], [status2], [date]) VALUES (N'Obj1', N'INACTIV', CAST(0x00008F9D00000000 AS DateTime))
INSERT #table2 ([Object], [status2], [date]) VALUES (N'Obj1', N'UNKNOWN', CAST(0x000094DF00000000 AS DateTime))
INSERT #table2 ([Object], [status2], [date]) VALUES (N'Obj1', N'check', CAST(0x0000A31E00000000 AS DateTime))
go
CREATE TABLE #table1(
[Object] [nvarchar](1000) NULL,
[status1] [nvarchar](1000) NULL,
[date] [datetime] NULL
) ON [PRIMARY]
GO
INSERT #table1 ([Object], [status1], [date]) VALUES (N'Obj1', N'Open', CAST(0x00008BD200000000 AS DateTime))
INSERT #table1 ([Object], [status1], [date]) VALUES (N'Obj1', N'Closed', CAST(0x00008F4400000000 AS DateTime))
INSERT #table1 ([Object], [status1], [date]) VALUES (N'Obj1', N'Open', CAST(0x00009A5700000000 AS DateTime))
INSERT #table1 ([Object], [status1], [date]) VALUES (N'Obj1', N'Closed', CAST(0x0000A13900000000 AS DateTime))

Create table #Calendar (
calendar_date date

)

insert into #Calendar
SELECT top 10000 dateadd(dd, row_number() over (order by s1.name), 'Dec 31, 1997')
FROM sys.syscolumns S1
CROSS JOIN sys.syscolumns S2

;
WITH table1CTE AS (
SELECT [Object], [status1], [date], row_number() over (partition by [Object] order by [date]) AS RowNum
FROM #table1
),
table1SelfJoin AS (
SELECT t1.Object,t1.status1,t1.date AS StartDate,COALESCE (t2.date,'Dec 31, 2030') AS EndDate
FROM table1CTE t1
LEFT OUTER JOIN table1CTE t2
ON t1.Object = t2.Object
AND t1.RowNum +1 =t2.RowNum
),
table1Calendar AS (
SELECT t1.Object,t2.status1,t2.StartDate,c.calendar_date
FROM #Calendar c
CROSS JOIN (select distinct [Object] from #table1) t1
LEFT OUTER JOIN table1SelfJoin t2
ON t1.Object = t2.Object
AND c.calendar_date BETWEEN t2.StartDate AND t2.EndDate
),
table2CTE AS (
SELECT [Object], [status2], [date], row_number() over (partition by [Object] order by [date]) AS RowNum
FROM #table2
),
table2SelfJoin AS (
SELECT t1.Object,t1.status2,t1.date AS StartDate,COALESCE (t2.date,'Dec 31, 2030') AS EndDate
FROM table2CTE t1
LEFT OUTER JOIN table2CTE t2
ON t1.Object = t2.Object
AND t1.RowNum +1 =t2.RowNum
),
table2Calendar AS (
SELECT t1.Object,t2.status2,t2.StartDate,c.calendar_date
FROM #Calendar c
CROSS JOIN (select distinct [Object] from #table2) t1
LEFT OUTER JOIN table2SelfJoin t2
ON t1.Object = t2.Object
AND c.calendar_date BETWEEN t2.StartDate AND t2.EndDate
)
SELECT t1.Object,t1.status1,t2.status2,t1.calendar_date
FROM table1Calendar t1
INNER JOIN table2Calendar t2
ON t1.Object = t2.Object
AND t1.calendar_date = t2.calendar_date
WHERE t1.calendar_date in (t1.StartDate,t2.StartDate)
ORDER by t1.calendar_date


DRop table #Calendar
DRop table #table1
DRop table #table2

As you can see, as is it is FAR more convoluted than your solution.

But, not only should the calendar table be permanent and built in advance (with some other handy reference columns in case you want to use it elsewhere), but you could also pre-build a view for each of your status tables which essentially mirrors the calendar CTEs.

If you pre-do those steps for your status tables, you can merge any of them per your requirements with just the select at the end. And those views will also help other queries where you want to know what a status was on a specific date.

Post #1505342
Posted Wednesday, October 16, 2013 3:05 PM


Mr or Mrs. 500

Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500

Group: General Forum Members
Last Login: Yesterday @ 8:06 PM
Points: 581, Visits: 2,711
tomek tomek (10/16/2013)
Well....

I kind of have a solution, but it's messy (especially that I acutally have more tables like these to merge, and obviuosly there come joins with other tables).
Maybe someone could suggest something better?
    ;WITH cte as(
select DISTINCT t.[Object], t.date FROM table1 t
union
select DISTINCT t.[Object], t.date FROM table2 t
)

select * FROM cte
outer APPLY (SELECT TOP 1 status1 FROM table1 tx
WHERE tx.date<=cte.date AND tx.[Object]=cte.[OBJECT]
ORDER BY tx.date DESC) t1
outer APPLY (SELECT TOP 1 status2 FROM table2 tx
WHERE tx.date<=cte.date and tx.[Object]=cte.[OBJECT]
ORDER BY tx.date DESC) t2



First, I think this is an excellent solution and, particularly, an excellent use of OUTER APPLY. Here are my suggestions:

1) You can lose the DISTINCT operators in your cte; it causes an extra distinct sort in your query plan which adds cost but does not change your result set. You are using the UNION set operator which will guarantee unique values.

2) I would avoid using reserved SQL words such as 'date' and 'object' in your column names.

3) If you can add an clustered index to that table you would see notable performance improvements, especially if you will be doing a lot of joins.



-- Alan Burstein



Read this article for best practices on asking questions.
Need to split a string? Try this (Jeff Moden)
Need a pattern-based string spitter? Try this (Dwain Camps)
My blog
Post #1505422
Posted Wednesday, October 16, 2013 7:22 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Today @ 3:25 AM
Points: 3,648, Visits: 5,328
I agree with Alan.B that you've got a pretty good way already figured out. I also agree with his comment about named columns, but I assume these are just for example.

There is another way that might be just a smidgen swifter, but you'd need to run a timing test to be sure. That is to use a Quirky Update.

SELECT [Object], [Date], status1, status2
INTO #Temp
FROM
(
SELECT [Object], status1=NULL, status2, [date]
FROM table2
UNION ALL
SELECT [Object], status1, NULL, [date]
FROM table1
) a;

DECLARE @status1 nvarchar(1000)
,@status2 nvarchar(1000);

ALTER TABLE #Temp ALTER COLUMN [Date] DATETIME NOT NULL;
ALTER TABLE #Temp ADD PRIMARY KEY([Date]);

UPDATE #Temp WITH(TABLOCKX)
SET @Status1 = Status1 = CASE WHEN Status1 IS NULL THEN @Status1 ELSE Status1 END
,@Status2 = Status2 = CASE WHEN Status2 IS NULL THEN @Status2 ELSE Status2 END
OPTION (MAXDOP 1);

SELECT * FROM #Temp ORDER BY [Object], [Date];

GO
DROP TABLE #Temp;


You should carefully read the section on the QU in this article: Solving the Running Total and Ordinal Rank Problems by Jeff Moden so that you understand the rules that apply when using the QU.

Initially I wrote that SQL 2012 might have a analytic function (LAG) that might make short work of this problem but I'm now not so sure, as an initial attempt didn't result in anything fruitful. There still could be a way I just don't have the time at the moment to play with it.



My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
Post #1505460
Posted Thursday, October 17, 2013 2:01 AM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Today @ 3:25 AM
Points: 3,648, Visits: 5,328
I forgot to add that if there's a possibility that both status will change on the same day, you should replace the UNION ALL where the #Temp table is created with a FULL OUTER JOIN on date/object.

BTW. Excellent job of providing DDL/consumable sample data for a first time poster. Wish all understood the value of that.



My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
Post #1505519
Posted Thursday, October 17, 2013 3:51 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Wednesday, July 9, 2014 8:23 AM
Points: 4, Visits: 110
Hi All,

First of all - Big Thanks for all your input (and good words for newbie as well:).

@Nevyn - thanks for your solution, but as you yourself suggested - it's quite complicated and I also wonder about performance.
If possible I try to keep my code as simple as possible (unless performance requires to make it more complicated), as it is possible that some other people will work on that code and they need to understand fast.

@Alan.B. - I'm happy you liked my way - I'm starting to see it's not that bad at all:)
1. UNION - I feel so stupid - I always thought that union leaves the doubles in first query and removes those from the next.
I just checked and (of course you're right). thanks for that:)
2. those were just for example. But as I work in ETL and DWH it happens to me a lot to get the tables with SQL Words in names.
3. I have to check the performance with clustered index. But it won't be that much of a problem, as the table isn't really that big (so again I will take cleaner code if the performance isn't the issue).

@Dwain.C
1. quirky update - I've seen it before. I didn't know the name. it's a great article you've sent me. Especially why one should use maxdop(1) and tablockx.
2. full outer join vs union all - yep. I've just noticed it. It could also happen that the same status type could change more than one time during a day. But I gues I would have to filter those somehow first...
3. I know there's lag/lead in sql 2012. I'm doing it on SQL 2008r2 so no go.
4. DDL - it's just something I've seen in many post. Obviously if you want somebody's help, you need to make it as simple as possible.

btw. out of 4 forums I've sent the same request, this is were I got the best responses.
Thanks again.
Post #1505561
Posted Thursday, October 17, 2013 4:42 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Wednesday, July 9, 2014 8:23 AM
Points: 4, Visits: 110
additional question:
The actual situation I have is like this:

StatusType STATUS     Object     date
---------- ---------- ---------- -----------------------
status1 Closed Obj1 2000-06-01 00:00:00.000
status1 Closed Obj1 2013-01-01 00:00:00.000
status1 Open Obj1 1998-01-01 00:00:00.000
status1 Open Obj1 2008-03-06 00:00:00.000
status2 ACTIVE Obj1 1999-01-01 00:00:00.000
status2 check Obj1 2014-05-01 00:00:00.000
status2 INACTIV Obj1 2000-08-29 00:00:00.000
status2 UNKNOWN Obj1 2004-05-06 00:00:00.000

... with more status types.
What I do now is I create separate CTE (or some kind of query/table) for every StatusType. I could of course pivot it as well.
But maybe there's better solution in this situation?

code for source table:
CREATE TABLE [dbo].[StatusTable](
[StatusType] [varchar](7) NOT NULL,
[STATUS] [varchar](10) NULL,
[Object] [varchar](10) NULL,
[date] [datetime] NULL
) ON [PRIMARY]

GO
INSERT [dbo].[StatusTable] ([StatusType], [STATUS], [Object], [date]) VALUES (N'status1', N'Closed', N'Obj1', CAST(0x00008F4400000000 AS DateTime))
INSERT [dbo].[StatusTable] ([StatusType], [STATUS], [Object], [date]) VALUES (N'status1', N'Closed', N'Obj1', CAST(0x0000A13900000000 AS DateTime))
INSERT [dbo].[StatusTable] ([StatusType], [STATUS], [Object], [date]) VALUES (N'status1', N'Open', N'Obj1', CAST(0x00008BD200000000 AS DateTime))
INSERT [dbo].[StatusTable] ([StatusType], [STATUS], [Object], [date]) VALUES (N'status1', N'Open', N'Obj1', CAST(0x00009A5700000000 AS DateTime))
INSERT [dbo].[StatusTable] ([StatusType], [STATUS], [Object], [date]) VALUES (N'status2', N'ACTIVE', N'Obj1', CAST(0x00008D3F00000000 AS DateTime))
INSERT [dbo].[StatusTable] ([StatusType], [STATUS], [Object], [date]) VALUES (N'status2', N'check', N'Obj1', CAST(0x0000A31E00000000 AS DateTime))
INSERT [dbo].[StatusTable] ([StatusType], [STATUS], [Object], [date]) VALUES (N'status2', N'INACTIV', N'Obj1', CAST(0x00008F9D00000000 AS DateTime))
INSERT [dbo].[StatusTable] ([StatusType], [STATUS], [Object], [date]) VALUES (N'status2', N'UNKNOWN', N'Obj1', CAST(0x000094DF00000000 AS DateTime))

Post #1505585
Posted Thursday, October 17, 2013 6:13 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Today @ 3:25 AM
Points: 3,648, Visits: 5,328
tomek tomek (10/17/2013)
Hi All,

First of all - Big Thanks for all your input (and good words for newbie as well:).

@Dwain.C
1. quirky update - I've seen it before. I didn't know the name. it's a great article you've sent me. Especially why one should use maxdop(1) and tablockx.


Long live the Quirky Update, long may it reign! To see its performance characteristic on another classic problem check here:Calculating Values within a Rolling Window in Transact SQL.

tomek tomek (10/17/2013)

2. full outer join vs union all - yep. I've just noticed it. It could also happen that the same status type could change more than one time during a day. But I gues I would have to filter those somehow first...

Because you're using a DATETIME, the FULL JOIN will still work for this case. You will simply get multiple records for the same day with different times in your output results.

tomek tomek (10/17/2013)

btw. out of 4 forums I've sent the same request, this is were I got the best responses.
Thanks again.


SSC rocks! Tell the world and drive them all to this site!

Now, as to your additional question in your post that follows this one. I don't understand the expected results you want. You mention PIVOT but I'm not sure what you want to PIVOT.



My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
Post #1505952
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse