Compare and Delete rows from 2 different tables in 2 different databases - same instance...

Question

Post reply

Compare and Delete rows from 2 different tables in 2 different databases - same instance...

SQL_ME_RICH

SSChampion

Points: 11200
More actions
August 15, 2012 at 3:02 pm

#279834

Title kind of says it all. I have a SQL 2005 Standard Edition, default instance where one of my PROD databases is running out of space. So - one of the bigger, less frequently used tables has been created in an ARCH db in the same instance (residing on a different drive). I copied the original set of rows from the PROD table to this ARCH table, but there are new records that come in to the PROD table on a semi-frequent basis.
I need to construct some sort of logic with T-SQL to compare and see what is already in the ARCH table, and delete those records from the PROD table, while leaving the rows that are not in the ARCH table yet alone for later exporting...
Any help with a PROC or Ad-Hoc way of doing this would be very much appreciated.
SQL_ME_RICH
P.S. SSIS is not an option. 🙁

Viewing 15 posts - 1 through 15 (of 24 total)

You must be logged in to reply to this topic. Login to reply

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 1

Are the table structures the same?

To be sure, you want to delete rows from PROD where they already exist in ARCH, correct?

Also, do both tables have a PK defined?

SQL_ME_RICH SSChampion Points: 11200 More actions · Answer 2

Hi Lynn - thank you so much for your fast reply!

No - the tables are different in that the one in PROD has a PKey/Clustered Index on the leading column, and the one in the ARCH db is just a heap. Here is the table in the ARCH - same as PROD less the PKey/CI...

USE [yQueue_Archive]

GO

/****** Object: Table [dbo].[yConnections] Script Date: 08/14/2012 11:18:19 ******/

SET ANSI_NULLS ON

GO

SET QUOTED_IDENTIFIER ON

GO

SET ANSI_PADDING ON

GO

CREATE TABLE [dbo].[yConnections](

[ConnectionID] [varchar](30) NOT NULL,

[GUID] [uniqueidentifier] NULL,

127.0.0.1 [varchar](50) NULL,

[Interface] [varchar](20) NULL,

[Carrier] [varchar](20) NULL,

[Connects] [int] NULL,

[Disconnects] [int] NULL,

[KeepAlives] [int] NULL,

[TrxIn] [int] NULL,

[TrxOut] [int] NULL,

[TrxOutSMS] [int] NULL,

[TrxErrors] [int] NULL,

[TrxExceptions] [int] NULL,

[BytesIn] [bigint] NULL,

[BytesOut] [bigint] NULL,

[StampStart] [smalldatetime] NULL,

[StampLast] [smalldatetime] NULL,

[Serial] [varchar](20) NULL,

[LastStatus] [varchar](20) NULL,

[MaxQueueTime] [smallint] NULL,

[AvgQueueTime] [smallint] NULL,

[HealthIndex] [smallint] NULL

) ON [PRIMARY]

GO

SET ANSI_PADDING OFF

I've tried a few Ad-Hoc attempts to see if I could even just get a result set, but am running into problems getting them to identify the 2 databases (probably just rookie dev mistake stuff, but here is what I have been trying to use to find records which exist in PROD table but not in ARCH table):

SELECT * FROM t1 WHERE NOT EXISTS (SELECT * FROM t2 WHERE t2.Id = t1.Id)

or

SELECT * FROM t1 LEFT OUTER JOIN T2 on t1.Id = t2.Id WHERE t2.Id IS NULL

Really the idea here is the create this process into a PROC that I can run in the evenings, and then an export via the Imp/Exp Wiz to get the records over into ARCH. Again - no SSIS (don't get me started on that one).

Thanks!

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 3

Something like this should work, but it may need some tweaking depending on how many rows will be deleted, especially the first time.

DELETE FROM t1

FROM dbo.MyTable t1

WHERE EXISTS(SELECT 1 FROM ARCH.dbo.MyTable t2 WHERE t2.PK_Column = t1.PK_Column);

SQL_ME_RICH SSChampion Points: 11200 More actions · Answer 4

Lynn - thank you again! So - based on what you have constructed, it looks like if my PROD db is a db called 'AutoDispatch' and my ARCH db is a db called 'yQueue_Archive', then the T-SQL would be this?

DELETE FROM AutoDispatch

FROM dbo.yConnections t1

WHERE EXISTS(SELECT 1 FROM yQueue_Archive.dbo.yConnections t2 WHERE t2.ConnectionID = t1.ConnectionID);

There are over 140 million rows in both tables right now for this first run. What kinds of tweaks (provided I have this right to begin with) should I consider?

Thank you again!

Michael Valentine Jones SSC Guru Points: 64818 More actions · Answer 5

Can you explain why you didn't just create a new filegroup with with the data file for that filegroup on the drive where you have the ARCH database and move the table to that filegroup so that you don't have these issues?

Or just create a synonym (or view) for that table in the PROD database that points to the table in the ARCH database so that any inserts into the PROD table actually just go into the ARCH database.

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 6

You can also do some reading on EXCEPT and INTERSECT in Books Online. These could be used to identify rows that exist in one table and not another or rows that are common between the two tables.

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 7

SQL_ME_RICH (8/15/2012)
Lynn - thank you again! So - based on what you have constructed, it looks like if my PROD db is a db called 'AutoDispatch' and my ARCH db is a db called 'yQueue_Archive', then the T-SQL would be this?
DELETE FROM AutoDispatch
FROM dbo.yConnections t1
WHERE EXISTS(SELECT 1 FROM yQueue_Archive.dbo.yConnections t2 WHERE t2.ConnectionID = t1.ConnectionID);
There are over 140 million rows in both tables right now for this first run. What kinds of tweaks (provided I have this right to begin with) should I consider?
Thank you again!

No. This is what your delete would look like.

USE AutoDispatch;

go

DELETE FROM t1

FROM dbo.yConnections t1

WHERE EXISTS(SELECT 1 FROM yQueue_Archive.dbo.yConnections t2 WHERE t2.ConnectionID = t1.ConnectionID);

SQL_ME_RICH SSChampion Points: 11200 More actions · Answer 8

Michael - great question, and not one that I hadn't thought of, but the place I am working at is not looking at best practices because in all honesty - they do not much care about this environment right now. I am trying to keep it limping along until they migrate it over fully to MySQL (another story for another tim), but best I can tell you is that I am given directives that are akin to putting a bandaid on a cut-off arm.

SQL_ME_RICH SSChampion Points: 11200 More actions · Answer 9

Lynn Pettis (8/15/2012)
SQL_ME_RICH (8/15/2012)
Lynn - thank you again! So - based on what you have constructed, it looks like if my PROD db is a db called 'AutoDispatch' and my ARCH db is a db called 'yQueue_Archive', then the T-SQL would be this?
DELETE FROM AutoDispatch
FROM dbo.yConnections t1
WHERE EXISTS(SELECT 1 FROM yQueue_Archive.dbo.yConnections t2 WHERE t2.ConnectionID = t1.ConnectionID);
There are over 140 million rows in both tables right now for this first run. What kinds of tweaks (provided I have this right to begin with) should I consider?
Thank you again!
No. This is what your delete would look like.
USE AutoDispatch;
go
DELETE FROM t1
FROM dbo.yConnections t1
WHERE EXISTS(SELECT 1 FROM yQueue_Archive.dbo.yConnections t2 WHERE t2.ConnectionID = t1.ConnectionID);

Perfect Lynn! Thank you, and thank you for the further tips on the EXCEPT and INTERSECT clauses!

🙂

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 10

As for the tweaks:

USE AutoDispatch;

go

declare @BatchSize int;

set @BatchSize = 10000;

WHILE @BatchSize > 0

BEGIN

DELETE TOP (@BatchSize)

FROM t1

FROM dbo.yConnections t1

WHERE EXISTS(SELECT 1 FROM yQueue_Archive.dbo.yConnections t2 WHERE t2.ConnectionID = t1.ConnectionID);

set @BatchSize = @@ROWCOUNT;

END

And that is at a minimum. Please read the following article for more information:

http://www.sqlservercentral.com/articles/T-SQL/67898/

SQL_ME_RICH SSChampion Points: 11200 More actions · Answer 11

I was just going to ask about tweaks (as in this current form it is really slow and the database is in Full Recovery mode, so I do not want to blow the log file up and out of the water). Was going to ask about TRUNCATE instead of DELETE, but I will take your queue here, and see what else I can come up with.

Thank you again, Lynn!

P.S. I did bump the previous thread from yesterday, but ever after that - no response (which was the only reason why I started this new one).

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 12

SQL_ME_RICH (8/15/2012)
I was just going to ask about tweaks (as in this current form it is really slow and the database is in Full Recovery mode, so I do not want to blow the log file up and out of the water). Was going to ask about TRUNCATE instead of DELETE, but I will take your queue here, and see what else I can come up with.
Thank you again, Lynn!
P.S. I did bump the previous thread from yesterday, but ever after that - no response (which was the only reason why I started this new one).

Remember, we are all volunteers and we all have lives outside of here and work. Just as I said, I was leaving work yesterday when I saw it, couldn't do anything as I had to leave. Just forgot about afterward. A bump today would have worked just as well as a new thread.

SQL_ME_RICH SSChampion Points: 11200 More actions · Answer 13

Lynn - Would I need to modify the WHERE clause in the sub-query if I wanted to do this in batches? So it's not trying to eat the elephant all at once?

I tried running this last night, but it ran for over 17 hours and still did not complete.

🙁

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 14

SQL_ME_RICH (8/16/2012)
Lynn - Would I need to modify the WHERE clause in the sub-query if I wanted to do this in batches? So it's not trying to eat the elephant all at once?
I tried running this last night, but it ran for over 17 hours and still did not complete.
🙁

The subquery already has a WHERE clause. What you need, most likely, is an index on the t2.PK_Column column on the table ARCH.dbo.MyTable.