INSERT with ordered select

  • I am trying to optimise a script for performing DELETES in a DW staging table. DELETES are not logged on the source system, only an extract of all current rows is possible and then deleting on the staging table using a ..WHERE NOT EXISTS () query. I am performing the extract through a linked server.

    Currently this is done by the following query.

    -- Drop table if it should somehow exist at run-time (requirement for using SELECT INTO that the table does not exist)

    IF EXISTS (

    SELECT * from tempdb.dbo.sysobjects o

    WHERE o.xtype in ('U') AND o.id = object_id(N'tempdb..#XALdboStocktransRecIDtable')

    ) DROP TABLE #XALdboStocktransRecIDtable;

    -- Select all DATASET, ROWNUMBER into the temporary table (at tempdb)

    SELECT [DATASET], [ROWNUMBER]

    INTO #XALdboStocktransRecIDtable

    FROM [SQLXAL].[XALDRIFT].[dbo].[STOCKTRANS]

    WHERE [DATASET] IN ('FPC','MAL');

    -- Delete all rows in the VALID instance where DATASET, ROWNUMBER does no longer exist on the source (SQLXAL)

    DELETE A

    FROM [TX_Stage].[dbo].[XAL_dbo_STOCKTRANS_V] A

    WHERE NOT EXISTS (

    SELECT *

    FROM #XALdboStocktransRecIDtable B

    WHERE [A].[DATASET]=.[DATASET] AND [A].[ROWNUMBER]=.[ROWNUMBER]

    );

    -- Drop the table to clean up the tempdb

    IF EXISTS (

    SELECT * from tempdb.dbo.sysobjects o

    WHERE o.xtype in ('U') AND o.id = object_id(N'tempdb..#XALdboStocktransRecIDtable')

    ) DROP TABLE #XALdboStocktransRecIDtable;

    Now, this seems to perform better if there is a clustered index on the temporary table #XALdboStocktransRecIDtable. However, approximately the same time is being consumed inserting into the table with the index as is saved by creating the table with index as in the below script.

    -- Drop table if it should somehow exist at run-time

    IF EXISTS (

    SELECT * from tempdb.dbo.sysobjects o

    WHERE o.xtype in ('U') AND o.id = object_id(N'tempdb..#TestTable')

    ) DROP TABLE #TestTable;

    CREATE TABLE #TestTable(

    [DATASET] [varchar](3) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,

    [ROWNUMBER] [int] NOT NULL

    )

    CREATE CLUSTERED INDEX [PK] ON #TestTable

    ([DATASET] ASC,

    [ROWNUMBER] ASC

    )

    INSERT INTO #TestTable ([DATASET],[ROWNUMBER])

    SELECT [DATASET], [ROWNUMBER]

    FROM [SQLXAL].[XALDRIFT].[dbo].[STOCKTRANS]

    WHERE [DATASET] IN ('FPC','MAL')

    ORDER BY [DATASET] ASC ,[ROWNUMBER] ASC

    -- Delete all rows in the VALID instance where DATASET, ROWNUMBER does no longer exist on the source (SQLXAL)

    DELETE A

    FROM TX_Stage.dbo.XAL_dbo_STOCKTRANS_V A

    WHERE NOT EXISTS (

    SELECT *

    FROM #TestTable B

    WHERE [A].[DATASET]=.[DATASET] AND [A].[ROWNUMBER]=.[ROWNUMBER]

    );

    -- Drop the table to clean up the tempdb

    IF EXISTS (

    SELECT * from tempdb.dbo.sysobjects o

    WHERE o.xtype in ('U') AND o.id = object_id(N'tempdb..#TestTable')

    ) DROP TABLE #TestTable;

    The source system table is indexed on DATASET, ROWNUMBER so no additional strain is placed on that system by the "ORDER BY" clause. The staging table has a non-clustered index on DATASET, ROWNUMBER - the clustered index on this table is DW_Id.

    INSERT INTO the temporary table defined with the clustered index is roughly one third more costly (time) than the SELECT INTO which is currently used, however the DELETES performed on the staging table is lower by roughly the same amount of time.

    I read about the possibility of specifying "ORDER" clause for the source data on a BULK INSERT to tell the SQL Server hosting the staging table that the input is coming in already ordered by the clustered key, but for the life of me, I cannot figure out the correct syntax for it. My hope is to mix the best of both worlds with a lowest time cost insert and lowest time cost DELETES.

    I hope someone is able to help with this 🙂

    [edit]

    I tried doing the DELETE statement with the source linked server table as test, but that turns out to take much longer than either of the two methods described above.

  • try inserting into the temp table without an index (or just use select into ) and without the order by, then create the clustered index after the insert.

    Also, create the index as a unique clustered index if rownumber-dataset are in fact all unique.

    The probability of survival is inversely proportional to the angle of arrival.

  • Thank you for the tips.

    Specifying the index as unique did shave off time in the overall processing from what I had tested previously.

    But creating the index after the insert adds time to the overall process (both for the "SELECT INTO" and the "CREATE TABLE then INSERT INTO" methods).

  • rrn 62873 (8/15/2012)


    Thank you for the tips.

    Specifying the index as unique did shave off time in the overall processing from what I had tested previously.

    But creating the index after the insert adds time to the overall process (both for the "SELECT INTO" and the "CREATE TABLE then INSERT INTO" methods).

    Well it was worth a try, but it does depend upon how much performance your tempdb is configured for.

    You might even experiment with a permanent staging table, then create the index using SORT_IN_TEMPDB option. Again, any possible advantage depends upon how your disks are set up and how the database is configured.

    The probability of survival is inversely proportional to the angle of arrival.

  • Have you tried turning the problem around and doing a select of the Dataset & RowNumbers that are missing between the Staging and source tables

    As this should then produce the dataset that you want to delete

    something like this (Needs to be checked that you are getting the rows that are deleted back).

    SELECT [DATASET], [ROWNUMBER]

    INTO #TestTable

    FROM TX_Stage.dbo.XAL_dbo_STOCKTRANS A

    Where NOT EXISTS (Select 1 From [SQLXAL].[XALDRIFT].[dbo].[STOCKTRANS] B

    Where [A].[DATASET]=.[DATASET] AND [A].[ROWNUMBER]=.[ROWNUMBER])

    DELETE B

    FROM TX_Stage.dbo.XAL_dbo_STOCKTRANS_V B

    JOIN #TestTable A on [A].[DATASET]=.[DATASET]

    AND [A].[ROWNUMBER]=.[ROWNUMBER]

    _________________________________________________________________________
    SSC Guide to Posting and Best Practices

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply