best way to transfer data

  • Hi,

I have two tables (one is the main table and the other is a staging table).

Ex: tableA (contains 1.2 billion rows, a trigger that sets an arrival timestamp on insert and update, and a non-clustered index).

tableA_staging (92 million rows, no trigger and no index)

I am currently running the transfer; it has taken around 10 hours so far and the data has still not been transferred.

Can you please suggest the best way to transfer data from the staging table to the main table?

    Thanks in advance.

  • monilps (3/5/2013)


Hi,

I have two tables (one is main table and another one is staging table)...

Can you please suggest some best way to transfer data from staging table to main table.

In which direction exactly are you transferring your data? Staging to main?

    Please post complete DDL for your tables and trigger.

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

How to post your question to get the best and quickest help

  • /****** Object: Table [dbo].[pp] ******/

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    SET ANSI_PADDING ON

    GO

    CREATE TABLE [dbo].[pp](

    [e_id] [bigint] NULL,

    [p_id] [bigint] NULL,

    [pr_id] [int] NULL,

    [s_name] [varchar](256) NULL,

    [pro_id] [varchar](512) NULL,

    [timestamp] [varchar](30) NULL,

    [extra] [varchar](max) NULL,

    [checked] [bit] NULL,

    [timestamp_ts] [datetime] NULL,

    [arrival_timestamp] [datetimeoffset](7) NULL

    ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

    GO

    SET ANSI_PADDING OFF

    GO

    CREATE TABLE [dbo].[pp_staging](

    [e_id] [bigint] NULL,

    [p_id] [bigint] NULL,

    [pr_id] [int] NULL,

    [s_name] [varchar](256) NULL,

    [pro_id] [varchar](512) NULL,

    [timestamp] [varchar](30) NULL,

    [extra] [varchar](max) NULL,

    [checked] [bit] NULL,

    [timestamp_ts] [datetime] NULL

    ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

    GO

    SET ANSI_PADDING OFF

    GO

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    ALTER TRIGGER [dbo].[patient_property_arrival] ON [dbo].[patient_property]

    AFTER INSERT, UPDATE

    AS

    UPDATE dbo.pp

    SET arrival_timestamp = SYSDATETIMEOFFSET()

    FROM inserted i

    WHERE dbo.pp.p_id = i.p_id

    AND dbo.pp.s_name = i.s_name

    AND dbo.pp.pro_id = i.pro_id

    AND dbo.pp.timestamp_ts = i.timestamp_ts

    AND dbo.pp.checked = i.checked

Let me know if you need any additional info. Thanks.

Please supply the indexes on pp (patient_property) and pp_staging, as well as the INSERT statement used to copy the data.

One thing I can say is that if you were to alter your trigger to skip the update when arrival_timestamp is supplied, and supply "arrival_timestamp = SYSDATETIMEOFFSET()" in your INSERT statement, you would save yourself a ton of I/O. You would need to do some impact analysis, however, to ensure a change like this would not compromise your data should other inserts or updates supply invalid values for that column, thereby circumventing the usefulness of the trigger. If that were a concern, there are other things you could do with CONTEXT_INFO to skip the work in the trigger for only your batch process.

    ALTER TRIGGER [dbo].[patient_property_arrival] ON [dbo].[patient_property]

    AFTER INSERT, UPDATE

    AS

    BEGIN

    IF NOT UPDATE(arrival_timestamp)

    BEGIN

    UPDATE dbo.pp

    SET arrival_timestamp = SYSDATETIMEOFFSET()

    FROM inserted i

    WHERE dbo.pp.p_id = i.p_id

    AND dbo.pp.s_name = i.s_name

    AND dbo.pp.pro_id = i.pro_id

    AND dbo.pp.timestamp_ts = i.timestamp_ts

    AND dbo.pp.checked = i.checked;

    END

    END
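A minimal sketch of the CONTEXT_INFO idea mentioned above. The 0x1234 marker value is arbitrary and purely illustrative; the batch job and the trigger just have to agree on it. (Note that binary comparisons in SQL Server pad the shorter operand with zeros, so the equality test works even though CONTEXT_INFO is varbinary(128).)

```sql
-- In the batch session, set a marker before running the big INSERT...SELECT:
SET CONTEXT_INFO 0x1234;

-- ... run the batch INSERT...SELECT here ...

-- Afterwards, clear the marker in the batch session:
SET CONTEXT_INFO 0x0;

-- Inside the trigger, bail out early when the marker is present:
-- IF CONTEXT_INFO() = 0x1234
--     RETURN;
-- ... normal trigger body follows ...
```

This keeps the trigger behavior intact for every other session while letting the one batch process opt out.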

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • monilps (3/5/2013)


    CREATE TABLE [dbo].[pp]...

    CREATE TABLE [dbo].[pp_staging]...

    ALTER TRIGGER [dbo].[patient_property_arrival] ON [dbo].[patient_property]...

    let me know if you need any additinal info. thanks.

So we have three tables involved here. Which are the source and destination tables for the data being transferred? Are the tables in the same database? Are you transferring the data in one big gulp, or is it broken out into smaller batches of, say, 1,000 to 5,000 records?

So we have three tables involved here. Which are the source and destination tables for the data being transferred? Are the tables in the same database?

-- Same database

Are you transferring the data in one big gulp, or is it broken out into smaller batches of, say, 1,000 to 5,000 records?

-- One big gulp

    ** Correction ** ALTER TRIGGER [dbo].[patient_property_arrival] ON [dbo].[pp]...

    CREATE NONCLUSTERED INDEX [NonClusteredIndex_pp] ON [dbo].[pp]

    (

    [p_id] ASC,

    [s_name] ASC,

    [pro_id] ASC,

    [checked] ASC,

    [timestamp_ts] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    GO

  • monilps (3/6/2013)


    2 tables involved here...

    Same database...

    One big gulp...

OK, trying to move a large number of records in one big batch is probably why it looks like no records have been put into your destination table from your source table. They won't be committed until the statement completes, so the new records are only visible to the session doing the insert. This is a bit of a wild guess, but if you need to remove the records from the staging table and put them into the permanent table, something like this would do (I use a script like this on one of my systems):

WHILE EXISTS (SELECT 1 FROM [dbo].[pp_staging])
BEGIN
    -- Note: an OUTPUT ... INTO clause cannot target a table that has
    -- enabled triggers, so the arrival-timestamp trigger on [dbo].[pp]
    -- would need to be disabled for this loop to run. That is fine here,
    -- because SYSDATETIMEOFFSET() supplies arrival_timestamp directly.
    DELETE TOP (5000) pps
    OUTPUT DELETED.[e_id], DELETED.[p_id], DELETED.[pr_id], DELETED.[s_name], DELETED.[pro_id],
           DELETED.[timestamp], DELETED.[extra], DELETED.[checked], DELETED.[timestamp_ts], SYSDATETIMEOFFSET()
    INTO [dbo].[pp] ([e_id], [p_id], [pr_id], [s_name], [pro_id],
           [timestamp], [extra], [checked], [timestamp_ts], [arrival_timestamp])
    FROM [dbo].[pp_staging] pps;
END

If you need to keep the records in the staging table, it gets a bit trickier, because then you need some way to keep track of which rows in the staging table you have and haven't already transferred.
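If the staging rows do have to stay put, one common pattern is to add a monotonically increasing key to the staging table and walk it in batches. This is a sketch only: the transfer_id IDENTITY column and the @last_id tracking variable are assumptions, not part of the posted schema.

```sql
-- Assumed addition: a surrogate key on the staging table to track progress.
ALTER TABLE dbo.pp_staging ADD transfer_id BIGINT IDENTITY(1,1) NOT NULL;
GO

DECLARE @last_id BIGINT = 0, @max_id BIGINT, @batch INT = 5000;

WHILE 1 = 1
BEGIN
    -- Find the upper boundary of the next batch.
    SELECT @max_id = MAX(b.transfer_id)
    FROM (SELECT TOP (@batch) transfer_id
          FROM dbo.pp_staging
          WHERE transfer_id > @last_id
          ORDER BY transfer_id) AS b;

    IF @max_id IS NULL BREAK;   -- nothing left to copy

    -- Copy the batch without deleting it from staging;
    -- arrival_timestamp is supplied directly, as suggested earlier.
    INSERT INTO dbo.pp (e_id, p_id, pr_id, s_name, pro_id,
                        [timestamp], extra, checked, timestamp_ts, arrival_timestamp)
    SELECT e_id, p_id, pr_id, s_name, pro_id,
           [timestamp], extra, checked, timestamp_ts, SYSDATETIMEOFFSET()
    FROM dbo.pp_staging
    WHERE transfer_id > @last_id
      AND transfer_id <= @max_id;

    SET @last_id = @max_id;
END
```

Persisting @last_id in a small control table between runs would also let the job resume after an interruption instead of starting over.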

When you run your process you can track progress by checking the row count of the table in sys.partitions, or by counting under the READ UNCOMMITTED isolation level.
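For example, a quick progress check along those lines (the row counts in sys.partitions are maintained lazily, so treat the result as approximate):

```sql
-- Approximate row count of the destination table without blocking the load.
-- index_id 0 = heap, 1 = clustered index.
SELECT SUM(p.rows) AS approx_rows
FROM sys.partitions AS p
WHERE p.[object_id] = OBJECT_ID(N'dbo.pp')
  AND p.index_id IN (0, 1);
```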

    Did you understand what I meant about offloading the work the trigger does onto the INSERT...SELECT?

    If it is locking the table for too long you may need to break the insert into smaller batches.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato
