Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

INSERT with ordered select Expand / Collapse
Author
Message
Posted Wednesday, August 15, 2012 6:21 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Wednesday, August 27, 2014 7:16 AM
Points: 108, Visits: 87
I am trying to optimise a script for performing DELETES in a DW staging table. DELETES are not logged on the source system, only an extract of all current rows is possible and then deleting on the staging table using a ..WHERE NOT EXISTS () query. I am performing the extract through a linked server.

Currently this is done by the following query.
-- Drop table if it should somehow exist at run-time (requirement for using SELECT INTO that the table does not exist)
IF EXISTS (
SELECT * from tempdb.dbo.sysobjects o
WHERE o.xtype in ('U') AND o.id = object_id(N'tempdb..#XALdboStocktransRecIDtable')
) DROP TABLE #XALdboStocktransRecIDtable;

-- Select all DATASET, ROWNUMBER into the temporary table (at tempdb)
SELECT [DATASET], [ROWNUMBER]
INTO #XALdboStocktransRecIDtable
FROM [SQLXAL].[XALDRIFT].[dbo].[STOCKTRANS]
WHERE [DATASET] IN ('FPC','MAL');

-- Delete all rows in the VALID instance where DATASET, ROWNUMBER does no longer exist on the source (SQLXAL)
DELETE A
FROM [TX_Stage].[dbo].[XAL_dbo_STOCKTRANS_V] A
WHERE NOT EXISTS (
SELECT *
FROM #XALdboStocktransRecIDtable B
WHERE [A].[DATASET]=[B].[DATASET] AND [A].[ROWNUMBER]=[B].[ROWNUMBER]
);

-- Drop the table to clean up the tempdb
IF EXISTS (
SELECT * from tempdb.dbo.sysobjects o
WHERE o.xtype in ('U') AND o.id = object_id(N'tempdb..#XALdboStocktransRecIDtable')
) DROP TABLE #XALdboStocktransRecIDtable;

Now, this seems to perform better if there is a clustered index on the temporary table #XALdboStocktransRecIDtable. However, approximately the same time is being consumed inserting into the table with the index as is saved by creating the table with index as in the below script.
-- Drop table if it should somehow exist at run-time 
IF EXISTS (
SELECT * from tempdb.dbo.sysobjects o
WHERE o.xtype in ('U') AND o.id = object_id(N'tempdb..#TestTable')
) DROP TABLE #TestTable;

CREATE TABLE #TestTable(
[DATASET] [varchar](3) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[ROWNUMBER] [int] NOT NULL
)

CREATE CLUSTERED INDEX [PK] ON #TestTable
( [DATASET] ASC,
[ROWNUMBER] ASC
)

INSERT INTO #TestTable ([DATASET],[ROWNUMBER])
SELECT [DATASET], [ROWNUMBER]
FROM [SQLXAL].[XALDRIFT].[dbo].[STOCKTRANS]
WHERE [DATASET] IN ('FPC','MAL')
ORDER BY [DATASET] ASC ,[ROWNUMBER] ASC

-- Delete all rows in the VALID instance where DATASET, ROWNUMBER does no longer exist on the source (SQLXAL)
DELETE A
FROM TX_Stage.dbo.XAL_dbo_STOCKTRANS_V A
WHERE NOT EXISTS (
SELECT *
FROM #TestTable B
WHERE [A].[DATASET]=[B].[DATASET] AND [A].[ROWNUMBER]=[B].[ROWNUMBER]
);

-- Drop the table to clean up the tempdb
IF EXISTS (
SELECT * from tempdb.dbo.sysobjects o
WHERE o.xtype in ('U') AND o.id = object_id(N'tempdb..#TestTable')
) DROP TABLE #TestTable;

The source system table is indexed on DATASET, ROWNUMBER so no additional strain is placed on that system by the "ORDER BY" clause. The staging table has a non-clustered index on DATASET, ROWNUMBER - the clustered index on this table is DW_Id.

INSERT INTO the temporary table defined with the clustered index is roughly one third more costly (time) than the SELECT INTO which is currently used, however the DELETES performed on the staging table is lower by roughly the same amount of time.

I read about the possibility of specifying "ORDER" clause for the source data on a BULK INSERT to tell the SQL Server hosting the staging table that the input is coming in already ordered by the clustered key, but for the life of me, I cannot figure out the correct syntax for it. My hope is to mix the best of both worlds with a lowest time cost insert and lowest time cost DELETES.

I hope someone is able to help with this :)

[edit]
I tried doing the DELETE statement with the source linked server table as test, but that turns out to take much longer than either of the two methods described above.
Post #1345228
Posted Wednesday, August 15, 2012 6:37 AM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Thursday, August 21, 2014 1:54 PM
Points: 1,432, Visits: 3,229
try inserting into the temp table without an index (or just use select into ) and without the order by, then create the clustered index after the insert.

Also, create the index as a unique clustered index if rownumber-dataset are in fact all unique.




The probability of survival is inversely proportional to the angle of arrival.
Post #1345233
Posted Wednesday, August 15, 2012 7:33 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Wednesday, August 27, 2014 7:16 AM
Points: 108, Visits: 87
Thank you for the tips.

Specifying the index as unique did shave off time in the overall processing from what I had tested previously.

But creating the index after the insert adds time to the overall process (both for the "SELECT INTO" and the "CREATE TABLE then INSERT INTO" methods).
Post #1345268
Posted Wednesday, August 15, 2012 7:55 AM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Thursday, August 21, 2014 1:54 PM
Points: 1,432, Visits: 3,229
rrn 62873 (8/15/2012)
Thank you for the tips.

Specifying the index as unique did shave off time in the overall processing from what I had tested previously.

But creating the index after the insert adds time to the overall process (both for the "SELECT INTO" and the "CREATE TABLE then INSERT INTO" methods).


Well it was worth a try, but it does depend upon how much performance your tempdb is configured for.

You might even experiment with a permanent staging table, then create the index using SORT_IN_TEMPDB option. Again, any possible advantage depends upon how your disks are set up and how the database is configured.




The probability of survival is inversely proportional to the angle of arrival.
Post #1345278
Posted Wednesday, August 15, 2012 8:30 AM
SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Friday, July 18, 2014 9:09 AM
Points: 870, Visits: 2,385
Have you tried turning the problem around and doing a select of the Dataset & RowNumbers that are missing between the Staging and source tables

As this should then produce the dataset that you want to delete

something like this (Needs to be checked that you are getting the rows that are deleted back).

SELECT [DATASET], [ROWNUMBER]
INTO #TestTable
FROM TX_Stage.dbo.XAL_dbo_STOCKTRANS A
Where NOT EXISTS (Select 1 From [SQLXAL].[XALDRIFT].[dbo].[STOCKTRANS] B
Where [A].[DATASET]=[B].[DATASET] AND [A].[ROWNUMBER]=[B].[ROWNUMBER])


DELETE B
FROM TX_Stage.dbo.XAL_dbo_STOCKTRANS_V B
JOIN #TestTable A on [A].[DATASET]=[B].[DATASET]
AND [A].[ROWNUMBER]=[B].[ROWNUMBER]



_________________________________________________________________________
SSC Guide to Posting and Best Practices
Post #1345299
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse