Compare and Delete rows from 2 different tables in 2 different databases - same instance...

  • Title kind of says it all. I have a SQL 2005 Standard Edition, default instance where one of my PROD databases is running out of space. So - one of the bigger, less frequently used tables has been created in an ARCH db in the same instance (residing on a different drive). I copied the original set of rows from the PROD table to this ARCH table, but there are new records that come in to the PROD table on a semi-frequent basis.

    I need to construct some sort of logic with T-SQL to compare and see what is already in the ARCH table, and delete those records from the PROD table, while leaving the rows that are not in the ARCH table yet alone for later exporting...

    Any help with a PROC or Ad-Hoc way of doing this would be very much appreciated.

    SQL_ME_RICH

    P.S. SSIS is not an option. 🙁

  • Are the table structures the same?

    To be sure, you want to delete rows from PROD where they already exist in ARCH, correct?

    Also, do both tables have a PK defined?

  • Hi Lynn - thank you so much for your fast reply!

    No - the tables are different in that the one in PROD has a PKey/Clustered Index on the leading column, and the one in the ARCH db is just a heap. Here is the table in the ARCH - same as PROD less the PKey/CI...

    USE [yQueue_Archive]

    GO

    /****** Object: Table [dbo].[yConnections] Script Date: 08/14/2012 11:18:19 ******/

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    SET ANSI_PADDING ON

    GO

    CREATE TABLE [dbo].[yConnections](

    [ConnectionID] [varchar](30) NOT NULL,

    [GUID] [uniqueidentifier] NULL,

    127.0.0.1 [varchar](50) NULL,

    [Interface] [varchar](20) NULL,

    [Carrier] [varchar](20) NULL,

    [Connects] [int] NULL,

    [Disconnects] [int] NULL,

    [KeepAlives] [int] NULL,

    [TrxIn] [int] NULL,

    [TrxOut] [int] NULL,

    [TrxOutSMS] [int] NULL,

    [TrxErrors] [int] NULL,

    [TrxExceptions] [int] NULL,

    [BytesIn] [bigint] NULL,

    [BytesOut] [bigint] NULL,

    [StampStart] [smalldatetime] NULL,

    [StampLast] [smalldatetime] NULL,

    [Serial] [varchar](20) NULL,

    [LastStatus] [varchar](20) NULL,

    [MaxQueueTime] [smallint] NULL,

    [AvgQueueTime] [smallint] NULL,

    [HealthIndex] [smallint] NULL

    ) ON [PRIMARY]

    GO

    SET ANSI_PADDING OFF

    I've tried a few Ad-Hoc attempts to see if I could even just get a result set, but am running into problems getting them to identify the 2 databases (probably just rookie dev mistake stuff, but here is what I have been trying to use to find records which exist in PROD table but not in ARCH table):

    SELECT * FROM t1 WHERE NOT EXISTS (SELECT * FROM t2 WHERE t2.Id = t1.Id)

    or

    SELECT * FROM t1 LEFT OUTER JOIN T2 on t1.Id = t2.Id WHERE t2.Id IS NULL

    Really the idea here is the create this process into a PROC that I can run in the evenings, and then an export via the Imp/Exp Wiz to get the records over into ARCH. Again - no SSIS (don't get me started on that one).

    Thanks!

  • Something like this should work, but it may need some tweaking depending on how many rows will be deleted, especially the first time.

    DELETE FROM t1

    FROM dbo.MyTable t1

    WHERE EXISTS(SELECT 1 FROM ARCH.dbo.MyTable t2 WHERE t2.PK_Column = t1.PK_Column);

  • Lynn - thank you again! So - based on what you have constructed, it looks like if my PROD db is a db called 'AutoDispatch' and my ARCH db is a db called 'yQueue_Archive', then the T-SQL would be this?

    DELETE FROM AutoDispatch

    FROM dbo.yConnections t1

    WHERE EXISTS(SELECT 1 FROM yQueue_Archive.dbo.yConnections t2 WHERE t2.ConnectionID = t1.ConnectionID);

    There are over 140 million rows in both tables right now for this first run. What kinds of tweaks (provided I have this right to begin with) should I consider?

    Thank you again!

  • Can you explain why you didn't just create a new filegroup with with the data file for that filegroup on the drive where you have the ARCH database and move the table to that filegroup so that you don't have these issues?

    Or just create a synonym (or view) for that table in the PROD database that points to the table in the ARCH database so that any inserts into the PROD table actually just go into the ARCH database.

  • You can also do some reading on EXCEPT and INTERSECT in Books Online. These could be used to identify rows that exist in one table and not another or rows that are common between the two tables.

  • SQL_ME_RICH (8/15/2012)


    Lynn - thank you again! So - based on what you have constructed, it looks like if my PROD db is a db called 'AutoDispatch' and my ARCH db is a db called 'yQueue_Archive', then the T-SQL would be this?

    DELETE FROM AutoDispatch

    FROM dbo.yConnections t1

    WHERE EXISTS(SELECT 1 FROM yQueue_Archive.dbo.yConnections t2 WHERE t2.ConnectionID = t1.ConnectionID);

    There are over 140 million rows in both tables right now for this first run. What kinds of tweaks (provided I have this right to begin with) should I consider?

    Thank you again!

    No. This is what your delete would look like.

    USE AutoDispatch;

    go

    DELETE FROM t1

    FROM dbo.yConnections t1

    WHERE EXISTS(SELECT 1 FROM yQueue_Archive.dbo.yConnections t2 WHERE t2.ConnectionID = t1.ConnectionID);

  • Michael - great question, and not one that I hadn't thought of, but the place I am working at is not looking at best practices because in all honesty - they do not much care about this environment right now. I am trying to keep it limping along until they migrate it over fully to MySQL (another story for another tim), but best I can tell you is that I am given directives that are akin to putting a bandaid on a cut-off arm.

  • Lynn Pettis (8/15/2012)


    SQL_ME_RICH (8/15/2012)


    Lynn - thank you again! So - based on what you have constructed, it looks like if my PROD db is a db called 'AutoDispatch' and my ARCH db is a db called 'yQueue_Archive', then the T-SQL would be this?

    DELETE FROM AutoDispatch

    FROM dbo.yConnections t1

    WHERE EXISTS(SELECT 1 FROM yQueue_Archive.dbo.yConnections t2 WHERE t2.ConnectionID = t1.ConnectionID);

    There are over 140 million rows in both tables right now for this first run. What kinds of tweaks (provided I have this right to begin with) should I consider?

    Thank you again!

    No. This is what your delete would look like.

    USE AutoDispatch;

    go

    DELETE FROM t1

    FROM dbo.yConnections t1

    WHERE EXISTS(SELECT 1 FROM yQueue_Archive.dbo.yConnections t2 WHERE t2.ConnectionID = t1.ConnectionID);

    Perfect Lynn! Thank you, and thank you for the further tips on the EXCEPT and INTERSECT clauses!

    🙂

  • As for the tweaks:

    USE AutoDispatch;

    go

    declare @BatchSize int;

    set @BatchSize = 10000;

    WHILE @BatchSize > 0

    BEGIN

    DELETE TOP (@BatchSize)

    FROM t1

    FROM dbo.yConnections t1

    WHERE EXISTS(SELECT 1 FROM yQueue_Archive.dbo.yConnections t2 WHERE t2.ConnectionID = t1.ConnectionID);

    set @BatchSize = @@ROWCOUNT;

    END

    And that is at a minimum. Please read the following article for more information:

    http://www.sqlservercentral.com/articles/T-SQL/67898/

  • I was just going to ask about tweaks (as in this current form it is really slow and the database is in Full Recovery mode, so I do not want to blow the log file up and out of the water). Was going to ask about TRUNCATE instead of DELETE, but I will take your queue here, and see what else I can come up with.

    Thank you again, Lynn!

    P.S. I did bump the previous thread from yesterday, but ever after that - no response (which was the only reason why I started this new one).

  • SQL_ME_RICH (8/15/2012)


    I was just going to ask about tweaks (as in this current form it is really slow and the database is in Full Recovery mode, so I do not want to blow the log file up and out of the water). Was going to ask about TRUNCATE instead of DELETE, but I will take your queue here, and see what else I can come up with.

    Thank you again, Lynn!

    P.S. I did bump the previous thread from yesterday, but ever after that - no response (which was the only reason why I started this new one).

    Remember, we are all volunteers and we all have lives outside of here and work. Just as I said, I was leaving work yesterday when I saw it, couldn't do anything as I had to leave. Just forgot about afterward. A bump today would have worked just as well as a new thread.

  • Lynn - Would I need to modify the WHERE clause in the sub-query if I wanted to do this in batches? So it's not trying to eat the elephant all at once?

    I tried running this last night, but it ran for over 17 hours and still did not complete.

    🙁

  • SQL_ME_RICH (8/16/2012)


    Lynn - Would I need to modify the WHERE clause in the sub-query if I wanted to do this in batches? So it's not trying to eat the elephant all at once?

    I tried running this last night, but it ran for over 17 hours and still did not complete.

    🙁

    The subquery already has a WHERE clause. What you need, most likely, is an index on the t2.PK_Column column on the table ARCH.dbo.MyTable.

Viewing 15 posts - 1 through 15 (of 24 total)

You must be logged in to reply to this topic. Login to reply