Insert data in multiple batches for millions of records

  • Hi,

    I have two tables, x and y.

    Table x has millions of records, so I want to insert the data into table y in multiple batches.

    Since there are millions of records, the batch size should be configurable, and the loop should end once the count from table x has been reached.

    Thanks in advance🙂

    Regards
    Ravi

  • Use SSIS. It handles inserts in batches behind the scenes without you having to roll your own solution. Alternatively, you can do it manually using ORDER BY with OFFSET.
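A minimal sketch of the manual approach mentioned above, assuming a source table TableX with a unique Id column and an empty target TableY with matching columns (all names here are illustrative, not from the thread):

```sql
-- Batched insert using ORDER BY with OFFSET/FETCH.
DECLARE @BatchSize INT = 50000;   -- configurable batch size
DECLARE @Offset    INT = 0;
DECLARE @TotalRows INT = (SELECT COUNT(*) FROM dbo.TableX);

WHILE @Offset < @TotalRows
BEGIN
    INSERT INTO dbo.TableY
    SELECT *
    FROM dbo.TableX
    ORDER BY Id                    -- a unique ordering key keeps paging stable
    OFFSET @Offset ROWS
    FETCH NEXT @BatchSize ROWS ONLY;

    SET @Offset += @BatchSize;
END;
```

Note that OFFSET has to skip past all previously read rows, so later batches get progressively slower on very large tables; keyset paging (WHERE Id > @LastId ... TOP (@BatchSize) ... ORDER BY Id) scales better.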

  • Joe Torre - Thursday, December 7, 2017 9:30 AM

    Use SSIS. It handles inserts in batches behind the scenes without you having to roll your own solution. Alternatively, you can do it manually using ORDER BY with OFFSET.

    +1

  • s.ravisankar - Thursday, December 7, 2017 4:15 AM

    Hi,

    I have two tables, x and y.

    Table x has millions of records, so I want to insert the data into table y in multiple batches.

    Since there are millions of records, the batch size should be configurable, and the loop should end once the count from table x has been reached.

    Thanks in advance🙂

    Regards
    Ravi

    1.  How many "millions of record"?
    2.  How many bytes does the average "record" contain?
    3.  Are both tables in the same database, different databases on the same instance, different instances, or what ???

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Thursday, December 7, 2017 12:07 PM

    s.ravisankar - Thursday, December 7, 2017 4:15 AM

    Hi,

    I have two tables, x and y.

    Table x has millions of records, so I want to insert the data into table y in multiple batches.

    Since there are millions of records, the batch size should be configurable, and the loop should end once the count from table x has been reached.

    Thanks in advance🙂

    Regards
    Ravi

    1.  How many "millions of record"?
    2.  How many bytes does the average "record" contain?
    3.  Are both tables in the same database, different databases on the same instance, different instances, or what ???

    Is table y empty at the start?
    Do you need to verify that the data in x isn't already in y?

  • Sorry for the late response.

    I need a SQL query for this.

    @batch_count is configurable (common for all tables)
    @incremental_value = 1

    I have Table1 with 500,000 records that need to be inserted into TableA.

    I have Table2 with 100,000 records that need to be inserted into TableB.

    I have Table3 with 200,000 records that need to be inserted into TableC.

    After the batch-wise inserts, each loop should end once the maximum count of its respective table has been reached.

    --- Lynn Pettis
    The tables will be empty.

    --- Jeff Moden
    1. How many "millions of record"?
    I want to do this for 15 tables; some of them have around a million records.

    2. How many bytes does the average "record" contain?
    Only one table has XML data; the others are normal.

    3. Are both tables in the same database, different databases on the same instance, different instances, or what ???
    Actually, they are two different databases on two different servers.

    But the approach is:
    i. Create NewDB on the source server
    ii. Create the 15 tables and push data into NewDB from SourceDB
    iii. Take a backup

    iv. Restore NewDB on the destination server
    v. Push data from the 15 tables into DestinationDB

    Thanks a lot for the response 🙂

  • Thanks, Joe 🙂

    I got it and modified my query based on ORDER BY with OFFSET.

    Can you help me with how to do this for multiple tables?

    DECLARE @Iterator INT = 0;
    DECLARE @Batch    INT = 10;
    DECLARE @n        INT = (SELECT COUNT(*) FROM Table1);

    WHILE @Iterator < @n
    BEGIN
      INSERT INTO TableA        -- insert each batch into the target, not just select it
      SELECT *
      FROM Table1
      ORDER BY Id               -- a unique key keeps the paging stable
      OFFSET @Iterator ROWS
      FETCH NEXT @Batch ROWS ONLY;

      SET @Iterator += @Batch;
    END;
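One way to generalize the loop above to several table pairs is to drive the same batching code from a mapping list using dynamic SQL. This is a hedged sketch, not the poster's actual solution: the table pairs, the Id ordering column, and matching column lists between each source and target are all assumptions based on the post above.

```sql
-- Run the same OFFSET/FETCH batch loop for each (source, target) pair.
DECLARE @BatchSize INT = 50000;

DECLARE @Pairs TABLE (SourceTable SYSNAME, TargetTable SYSNAME);
INSERT INTO @Pairs VALUES ('Table1','TableA'), ('Table2','TableB'), ('Table3','TableC');

DECLARE @Src SYSNAME, @Tgt SYSNAME, @Sql NVARCHAR(MAX);

DECLARE pair_cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT SourceTable, TargetTable FROM @Pairs;
OPEN pair_cur;
FETCH NEXT FROM pair_cur INTO @Src, @Tgt;

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @Sql = N'
        DECLARE @Offset INT = 0;
        DECLARE @Total  INT = (SELECT COUNT(*) FROM ' + QUOTENAME(@Src) + N');
        WHILE @Offset < @Total
        BEGIN
            INSERT INTO ' + QUOTENAME(@Tgt) + N'
            SELECT * FROM ' + QUOTENAME(@Src) + N'
            ORDER BY Id
            OFFSET @Offset ROWS FETCH NEXT @BatchSize ROWS ONLY;
            SET @Offset += @BatchSize;
        END;';
    EXEC sys.sp_executesql @Sql, N'@BatchSize INT', @BatchSize = @BatchSize;

    FETCH NEXT FROM pair_cur INTO @Src, @Tgt;
END;

CLOSE pair_cur;
DEALLOCATE pair_cur;
```

QUOTENAME guards the injected identifiers; the batch size is passed as a real parameter so it stays common to all tables, as requested.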

  • I probably wouldn't use an iterative approach here.  Since the tables are empty, I'd add the correct Clustered Index to the table, set the database to the BULK LOGGED Recovery Model, and use "Minimal Logging" to do the inserts (which necessarily requires that the inserts be done in the same order as the clustered index).

    Without a Clustered Index, the inserts will be approximately twice as fast because of the greatly reduced need for logging.  But, then you'd need to add the Clustered Index, which would temporarily double the footprint of the large table in the MDF and leave empty space in the MDF equal to about 120% of the largest table.  It would also take a comparatively very long time.

    With the Clustered Index in place and using the correct minimal logging technique, you'll end up with a table with the Clustered Index already in place, very little unwanted freespace in the target database, and it'll take much less time than building the Clustered Index separately.
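The minimally logged approach described above can be sketched roughly as follows. Table and database names are illustrative; it assumes the target table is empty, its clustered index on Id is already in place, and the database can be switched to the BULK_LOGGED recovery model for the load:

```sql
-- Switch to a recovery model that permits minimal logging.
ALTER DATABASE TargetDb SET RECOVERY BULK_LOGGED;

-- TABLOCK is required for minimal logging; inserting in clustered-index
-- order avoids intermediate sorts and page splits.
INSERT INTO dbo.TableY WITH (TABLOCK)
SELECT *
FROM dbo.TableX
ORDER BY Id;

-- Return to FULL recovery and take a log backup to restart the log chain.
ALTER DATABASE TargetDb SET RECOVERY FULL;
```

Whether the insert is actually minimally logged depends on the usual prerequisites (recovery model, TABLOCK, empty target or trace-flag/version-specific behavior), so it is worth verifying log growth on a test load first.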

    And, no... there's no way that I'd use SSIS for any of this.  It's just not necessary.


  • And, no... there's no way that I'd use SSIS for any of this. It's just not necessary.

    Completely disagree.  SSIS handles inserts of large row counts very well.  I can load millions of rows in minutes, all batched, without any need for me to do anything else.  I've occasionally had to use T-SQL to batch inserts because it provides finer control over the order.  I do have to use T-SQL to batch updates and deletes; if there's a good way to do those in SSIS, I haven't seen it.

  • RonKyle - Thursday, February 8, 2018 7:49 AM

    And, no... there's no way that I'd use SSIS for any of this. It's just not necessary.

    Completely disagree.  SSIS handles inserts of large row counts very well.  I can load millions of rows in minutes, all batched, without any need for me to do anything else.  I've occasionally had to use T-SQL to batch inserts because it provides finer control over the order.  I do have to use T-SQL to batch updates and deletes; if there's a good way to do those in SSIS, I haven't seen it.

    It's OK.  We'll agree to disagree, partially even because of what you stated about the updates and deletes.  I just don't need SSIS to do any of what most people use SSIS for, especially in the areas related to dynamic mapping and the auto-normalization of even some very complex spreadsheet imports.

