Replacing a Cursor

  • I just read an article by Clinton Herring titled "Replacing Cursors and While Loops". It got me thinking on if I could save some time in a process that I run on a daily basis. I have a love-hate relationship with cursors and would rather not have them if I could. But there is one place where I cannot get past using a cursor. So maybe you guys here can help me out. Here's the situation: We get a daily file with merchant activity information for the past day. Due to some internal business rules and for faster data retrieval we have to store each merchant's data in its own table. I have a SP that loops thru the daily file and inserts data in each indicidual merchant table. If a table does not exist for the mechant, it first creates a table before the Insert.

    Can this be converted to a non-cursor based solution or am I stuck here? the import job currently takes 45 minutes and the daily file contains 400,000 records in each daily file.

    Here are some details of the tables and process(not real production table and data)

    Thanks for all your help!!!

    DDL for Table with daily data:

    CREATE TABLE [dbo].[DailyData] (

    [DailyDataID] [int] IDENTITY (1, 1) NOT NULL ,

    [MerchID] [varchar] (16) NOT NULL ,

    [FileDate] [datetime] NOT NULL ,

    [TranCode] [varchar] (4) NOT NULL ,

    [TranDetail] [varchar] (100) NOT NULL ,

    )

    --SP Code:

    CREATE PROCEDURE dbo.insDailyData

    @FileDate varchar(10)

    AS

    ----DECLARE variables to be used in this section

    DECLARE

    @MerchID varchar(16)

    , @RecCount int

    , @SQL varchar(4000)

    -----SET Initial value for @RecCount

    SELECT

    @RecCount = 0

    -----DECLARE and Open Cursor for processing

    DECLARE DailyDataCursor CURSOR

    FOR

    SELECT DISTINCT

    T.MerchID

    FROM

    DailyData AS T

    ORDER BY

    T.MerchID

    OPEN DailyDataCursor

    FETCH NEXT FROM DailyDataCursor

    INTO @MerchID

    WHILE @@FETCH_STATUS = 0

    BEGIN

    -- Table Name = Tran+MerchID

    -----Check to see if table exists

    IF EXISTS (SELECT * FROM dbo.sysobjects WHERE [ID] = object_id(N'[dbo].[Tran' + @MerchID + ']') AND OBJECTPROPERTY([ID], N'IsUserTable') = 1)

    BEGIN

    -----Step 1: DELETE data for @FileDate if it exists

    SELECT

    @SQL = '

    DELETE tbl

    FROM

    dbo.Tran' + @MerchID + ' AS tbl

    WHERE

    tbl.FileDate = CONVERT(varchar(10), ''' + @FileDate + ''', 101)'

    EXEC(@SQL)

    END

    ELSE

    BEGIN

    -----Step 1: CREATE table for MerchID

    SELECT

    @SQL = '

    CREATE TABLE [dbo].[Tran' + @MerchID +'] (

    [TranID] [int] IDENTITY (1, 1) NOT NULL ,

    [FileDate] [datetime] NOT NULL ,

    [MerchID] [varchar] (16) NOT NULL ,

    [TranCode] [varchar] (4) NOT NULL ,

    [TranDetail] [varchar] (100) NOT NULL ,

    [DateRecAdded] [datetime] NOT NULL

    )'

    EXEC(@SQL)

    SELECT

    @SQL = '

    ALTER TABLE [dbo].[Tran' + @MerchID +'] WITH NOCHECK ADD

    CONSTRAINT [PK_Tran' + @MerchID +'_TranID] PRIMARY KEY CLUSTERED

    (

    [TranID]

    )'

    EXEC(@SQL)

    SELECT

    @SQL = '

    ALTER TABLE [dbo].[Tran' + @MerchID +'] ADD

    CONSTRAINT [DF_Tran' + @MerchID +'_DateRecAdded_1] DEFAULT (GETDATE()) FOR [DateRecAdded]'

    EXEC(@SQL)

    END

    -----Step 2: INSERT daily data in Merchant table

    SELECT

    @SQL = '

    INSERT INTO dbo.Tran' + @MerchID + ' (FileDate, MerchID, TranCode, TranDetail, DateRecAdded)

    SELECT DISTINCT TOP 100 PERCENT

    D.FileDate

    , D.MerchID

    , D.TranCode

    , D.TranDetail

    , GETDATE() AS DateRecAdded

    FROM

    dbo.DailyDetail AS D

    WHERE

    D.MerchID = ''' + @MerchID + '''

    ORDER BY

    D.MerchID'

    EXEC(@SQL)

    FETCH NEXT FROM DailyDataCursor

    INTO @MerchID

    END

    CLOSE DailyDataCursor

    DEALLOCATE DailyDataCursor

  • There's your problem :

    "Due to some internal business rules and for faster data retrieval we have to store each merchant's data in its own table. I have a SP that loops thru the daily file and inserts data in each indicidual merchant table"

    There should only be one table for that. (unless you have 100M rows in that table, then you can start thinking about splitting it up ).

    Here's a few pointers that might shed 1 minute on the whole process (hey it's a start ).

    SELECT

    @SQL = '

    CREATE TABLE [dbo].[Tran' + @MerchID +'] (

    [TranID] [int] IDENTITY (1, 1) NOT NULL ,

    [FileDate] [datetime] NOT NULL ,

    [MerchID] [varchar] (16) NOT NULL ,

    [TranCode] [varchar] (4) NOT NULL ,

    [TranDetail] [varchar] (100) NOT NULL ,

    [DateRecAdded] [datetime] NOT NULL

    )'

    EXEC(@SQL)

    SELECT

    @SQL = '

    ALTER TABLE [dbo].[Tran' + @MerchID +'] WITH NOCHECK ADD

    CONSTRAINT [PK_Tran' + @MerchID +'_TranID] PRIMARY KEY CLUSTERED

    (

    [TranID]

    EXEC(@SQL)

    SELECT

    @SQL = '

    ALTER TABLE [dbo].[Tran' + @MerchID +'] ADD

    CONSTRAINT [DF_Tran' + @MerchID +'_DateRecAdded_1] DEFAULT (GETDATE()) FOR [DateRecAdded]')

    EXEC(@SQL)

    This code can be executed in a single step. The default can be included in the original create, so can the primary key.

    SELECT DISTINCT TOP 100 PERCENT

    D.FileDate

    , D.MerchID

    , D.TranCode

    , D.TranDetail

    , GETDATE() AS DateRecAdded

    FROM

    dbo.DailyDetail AS D

    WHERE

    D.MerchID = ''' + @MerchID + '''

    ORDER BY

    D.MerchID'

    Remove the order by (unless it performs the insert faster... depends on the clustered index). Also since you have the default on the column, you don't need to select the getdate() (might save a little time here).

  • Thanks for your reply Remi.

    You are right about the suggestion that we should have one table for all the data. Reason we cannot do that is for compliance issues and also majority of the merchant tables have close to 50M records in each table - we have to provide reporting data for 18-month periods.

    Also I can execute the table create process in one SQL statement but we have more columns in the table with different clustered indexes that what has been displayed in my original post.

    You are right too in your suggestion that I should take out “, GETDATE() AS DateRecAdded” in insert statement.

  • I can understand the split if you have 50M rows/merchant .

    The only thing that could speed this up is to make sure that the clustered index of the Details table is on the MerchID column and to remove the order by MerchId in the select since all the columns have the same values, this sort is useless.

    I fear you are stuck with this procedural approach. The final tiny upgrade you could make would be to convert the cursor to a loop

    --don't forget to initialize this variable

    while @Myid not is null

    select @Myid = max(merchid) from @DistinctMerchids where merchid > @MyId order by Merchid

    This will remove the overhead of the cursor from the proc. It's not gonna be much be it's the sum of those upgrades that make the server go fast...

  • I think that if you need that MANY tables you'll need a different approach or your data will become unmanageable.

    I will try any of the following 2 options.

    1. Create a Partitioned view which you will refresh if the customerID does not exists in it and then execute an insert comand against the the Partitioned view

    2. Another option is to try Parallel loading with DTS or BCP+osql

    Just my $0.02

     


    * Noel

  • DECLARE @MerchID varchar(16)

      , @LastMerchID varchar(16)

    SELECT TOP 1 @MerchID = MerchID

    FROM DailyData

    WHERE  MerchID > @LastMerchID

    ORDER BY MerchID

    WHILE LEN(@MerchID) > 0

    BEGIN

      ...

      -- Loop

      SET @LastMerchID = @MerchID

      SELECT TOP 1 @MerchID = MerchID

      FROM DailyData

      WHERE  MerchID > @LastMerchID

      ORDER BY MerchID

      IF @LastMerchID = @MerchID

        BREAK

    END

    Andy

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply