SQLServerCentral Article

Cleaning Up Garbage from Data

,

Introduction

In the world of software development you find lots of problems during software development and to overcome those problems we developers mostly depend on Google searches. But it is very true that all the time you cannot succeed with this strategy; sometimes you have to think about a new solution with a new mindset. I could not easily solve one problem and had to develop my own solution.

Cleaning up garbage from data is a big problem for a developer. It's a time intensive and tedious job. It also involves risk because there is a chance to lose your important data if you have not enough experience to handle this situation. So I want to share a painless and generic way to achieve this goal. Before I go further and describe the generic solution, let me explain the typical solution to achieve this goal.

Typical Solution

If the total number of tables is within 5 to 15 then there is no issue. You can simply delete / truncate the garbage data using simple SQL. Let me show you an example. I am assuming there is a database table for this example named DBTable1. Here is the generic SQL syntax of a DELETE command using the WHERE clause to delete data from table

DELETE FROM DBTable1 WHERE [Any Column Name] = [Some Value]

Here is generic SQL syntax of DELETE command to delete all data from table.

DELETE FROM DBTable1

Here is generic SQL syntax of TRUNCATE command to delete all data from table.

TRUNCATE TABLE DBTable1

Key points to remember:

** TRUNCATE TABLE permissions default to the table owner and are not transferable.

Problem Secnerio

Imagine if you working on a project which has lots of database tables, i.e. 50, 100, 200 or more than that. In my scenario I can imagine 1800 tables in an ERP solution. Now the question that arises here: How you identify the tables from those you want to perform the cleanup operation?

Let me point out the pain points are

  • You have to check tables that are related to another tables with a foreign key relationship.
  • You have to analysis the database diagram to know the relationship.
  • You have to prepare a final list of those tables from which you want to perform the cleanup operation.
  • You have to take the decision on which table you want to run delete / truncate statement

It is a tough job now based on above mentioned typical solution, isn't it? To overcome this problem I found the bellow solution which is very useful for me. That's why I want to share my code here. Check it out.

Generic Solution to Cleaning up garbage from data

My goal is to achieve the above mentioned tedious & painful job in a simplified way. To achieve my goal you have to create stored procedure below (stp_CleanGarbageData) and then you have to create the function GetTableName. Let me explain how you execute the code

Example: Let's assume you have a database with 500 tables. Now you want to remove garbage data from 496 tables and keep the data in [500-496 = 4] tables. So you have to put 4 tables name in any order as comma separated value in the SQL statement below. Please keep in mind you only have to provide those table names from which you don't want to remove data.

SQL Syntax:

EXEC StoredProcedureName 'Table name as comma separated value within String Parameter'

So as per example SQL syntax will be

EXEC stp_CleanGarbageData 'Table_1, Table_2, Table_3, Table_n'

After executing the SQL syntax as I mentioned you get a list of table names in Query Analyzer. It shows on how many tables this query performs a delete operation and truncate operation respectively. And you don't need to worry about the pain points now.

** Please don't forget to replace Table_1 & Table_2, Table_3 ... Table_n with original table Name, Only those table name from which you don't want to remove the data.

Code Snippet: Stored Procedure


SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROCEDURE [dbo].[stp_CleanGarbageData] @TableNames varchar(8000) AS

BEGIN DECLARE @DynamicString varchar(8000) EXEC sp_MSForEachTable '

IF OBJECTPROPERTY(object_id(''?''), ''TableHasForeignRef'') = 1
BEGIN
PRINT ''--DELETED TABLE NAME--''
PRINT ''----------------------''
PRINT ''?''
END
ELSE
BEGIN
PRINT ''--TRUNCATED TABLE NAME--''
PRINT ''----------------------''
PRINT ''?''
END
'

-- disable referential integrity
EXEC sp_MSForEachTable 'ALTER TABLE ? NOCHECK CONSTRAINT ALL'

Set @DynamicString = '

IF (object_id(''?'') not in (select object_id(Id) from dbo.GetTableName(''' + @TableNames + ''')))
BEGIN
IF OBJECTPROPERTY(object_id(''?''), ''TableHasForeignRef'') = 1
BEGIN
DELETE FROM ?
END
ELSE
BEGIN
TRUNCATE TABLE ?
END

END
'

EXEC sp_MSForEachTable @DynamicString -- enable referential integrity again

EXEC sp_MSForEachTable 'ALTER TABLE ? CHECK CONSTRAINT ALL' END GO

Code Snippet:Function


CREATE FUNCTION dbo.GetTableName
(
@Ids varchar(5000)
)
RETURNS @TempTable TABLE (Id varchar(8000))
AS
BEGIN IF(@Ids is not null)
begin
DECLARE @Id varchar(5000), @Pos int
SET @Ids = LTRIM(RTRIM(@Ids))+ ','
SET @Pos = CHARINDEX(',', @Ids, 1)
IF REPLACE(@Ids, ',', '') <> ''
BEGIN
WHILE @Pos > 0
BEGIN
SET @Id = LTRIM(RTRIM(LEFT(@Ids, @Pos - 1)))
IF @Id <> ''
BEGIN
INSERT INTO @TempTable (Id) VALUES (CAST(@Id AS varchar(8000))) --Use Appropriate conversion
END
SET @Ids = RIGHT(@Ids, LEN(@Ids) - @Pos)
SET @Pos = CHARINDEX(',', @Ids, 1)
END
END
end
else
INSERT INTO @TempTable (Id) VALUES (null)
RETURN
END

Explanation

Let me explain my code snippet of Stored Procedure and Function. So I start with stp_CleanGarbageData stored procedure, within this procedure I need to check referential integrity for all tables in a database.

SQL Syntax to check referential integrity

IF OBJECTPROPERTY(object_id(''?''), ''TableHasForeignRef'') = 1

To perform the same actions for all tables in a database, you can create cursor or you can use the sp_MSforeachtable undocumented stored procedure to accomplish the same goal with less work.

For example, you can use the sp_MSforeachtable stored procedure to rebuild all indexes in a database:

EXEC sp_MSforeachtable @command1="print '?' DBCC DBREINDEX ('?')"

I use sp_MSforeachtable to check referential integrity for all tables in a database and prepare a list of that using bellow syntax

 EXEC sp_MSForEachTable '
IF OBJECTPROPERTY(object_id(''?''), ''TableHasForeignRef'') = 1
BEGIN
PRINT ''--DELETED TABLE NAME--''
PRINT ''----------------------''
PRINT ''?''
END
ELSE
BEGIN
PRINT ''--TRUNCATED TABLE NAME--''
PRINT ''----------------------''
PRINT ''?''
END
'

to confirm the table names in which the Delete or Truncate operation done. In my case I want to know the list of table names on which Delete or Truncate operation performed.

Then I disable referential integrity checking for all tables in the database. See the bellow syntax

EXEC sp_MSForEachTable 'ALTER TABLE ? NOCHECK CONSTRAINT ALL'

Then I build a dynamic string to perform the DELETE or TRUNCATE actions respectively for all tables in a database based on referential integrity, excluding those tables from which I don't want to delete data. To build this dynamic string I called GetTableName function, it takes comma separated Table Names (** those table names, which I already passed as a parameter of stp_CleanGarbageData stored procedure) as an input parameter and returns respective table name.

 Set @DynamicString = '
IF (object_id(''?'') not in (select object_id(Id) from dbo.GetTableName(''' + @TableNames + ''')))
BEGIN

IF OBJECTPROPERTY(object_id(''?''), ''TableHasForeignRef'') = 1
BEGIN
DELETE FROM ?
END

ELSE
BEGIN
TRUNCATE TABLE ?
END

END

'
EXEC sp_MSForEachTable @DynamicString

To complete DELETE or TRUNCATE operation after checking 'TableHasForeignRef' I use bellow syntax.

IF OBJECTPROPERTY(object_id(''?''), ''TableHasForeignRef'') = 1
BEGIN
DELETE FROM ?
END

ELSE
BEGIN
TRUNCATE TABLE ?
END

I want to DELETE all data from a table without any condition. See bellow syntax

 DELETE FROM ?

If you want to DELETE data based on condition then please modify bellow syntax based your requirement.

IF OBJECTPROPERTY(object_id(''?''), ''TableHasForeignRef'') = 1
BEGIN
DELETE FROM ? -- Write your own custom delete logic here.
END

** ' ? ' This question mark actually specifies the table name dynamically to complete the DELETE or TRUNCATE operation.

After all, I use sp_MSForEachTable to enable referential integrity checking again and complete my work.

EXEC sp_MSForEachTable 'ALTER TABLE ? CHECK CONSTRAINT ALL'

Conclusion

My approach towards the Cleaning up garbage from data I found it is very useful, reusable and painless solution. Because it will help you a lot to overcome above described problem. You also noticed that I use powerful features of SQL Server with my custom logic to overcome the problem. Hope you like the solution.

🙂 Happy Coding

Rate

2.11 (54)

You rated this post out of 5. Change rating

Share

Share

Rate

2.11 (54)

You rated this post out of 5. Change rating