SQL Insert performance tuning

  • Hi,

    Thanks for a great forum ... and site, really nice articles here 🙂

    I'm using .NET & SQL 2008 (there is no performance topic under SQL 2008)

    I'm importing some SQL files from several online servers:

    http://s4.travian.dk/map.sql - it's an online game, and I compute some stats from the information I get.

    The SQL file is updated every day.

    I can see that the thing that takes the most time is inserting the data into the DB. So I found a way to extract the info with regular expressions ... insert it into a list ... and do some comparing from that ... since only some rows change, this part is very fast ... about 1-2 seconds.

    List<WorldRow> list;   // the forum stripped the generic type parameter; restored from the code later in the thread

    But one of the columns holds the Population of the villages ... and this changes almost every day. So from the list I insert only two columns ... VillageID and Population.

    Drop the index if there is one ...

    IF EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[dbo].[x_world_pop]') AND name = N'IX_x_world_pop')
        DROP INDEX IX_x_world_pop ON x_world_pop

    1000 rows in each batch.

    INSERT INTO x_world_pop (VillageID, Population) VALUES (villageid, population), (villageid, population), (villageid, population), (villageid, population)

    Create the Index so the UPDATE join is faster.

    IF NOT EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[dbo].[x_world_pop]') AND name = N'IX_x_world_pop')
        CREATE INDEX IX_x_world_pop ON x_world_pop(VID)

    Update the Villages table ...

    UPDATE V SET V.Population = X.Population
    FROM Villages V
    INNER JOIN x_world_pop X ON V.VID = X.VID
    WHERE V.Population <> X.Population
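
    (As an aside, SQL 2008's new MERGE statement could fold the comparison and the update into one statement; this is only a sketch with the same assumed names, not what I'm actually running:)

    -- Sketch only: a MERGE alternative to the UPDATE join above
    MERGE Villages AS V
    USING x_world_pop AS X
        ON V.VID = X.VID
    WHEN MATCHED AND V.Population <> X.Population THEN
        UPDATE SET V.Population = X.Population;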

    So ... the above is what I'm using ... this is my problem ... and maybe you guys have a better solution 🙂

    INSERTING 130k rows first ... takes 1.3 sec.

    CREATING the index is fast ...

    UPDATING is fast ...

    DROP index is fast ...

    I'm dropping the index so the insert is faster ... the insert usually takes 28 sec with the index on ...

    But ... I need a bit of delay after dropping the index before the INSERT is fast again, probably because the server needs to remove the index data first ... it's not a problem, as I will be doing some other updates in the meantime, which will create a natural delay.

    Can this be optimized in any way? What would you do? As there are about 250 SQL files that need to be imported ... all optimization is great 🙂

    best regards

    Mikael Syska

  • Out of curiosity, are all the inserts and updates happening on the same tables for all of the SQL files? If that's the case, could you drop the index, import all of the data, and only then create the index and perform the update?



    Shameless self promotion - read my blog http://sirsql.net

  • Nicholas Cain (1/11/2009)


    Out of curiosity, are all the inserts and updates happening on the same tables for all of the SQL files? If that's the case, could you drop the index, import all of the data, and only then create the index and perform the update?

    No ... I have made a special SID (Server ID) for every server import.

    Before, I was importing all the information into a table to have it in one place (that took about 80% of all the time used on the update); now I'm using regex to get the information and compare it with the info already in the database. Maybe a player has created a new village, and I need to save that event. Maybe a player has changed alliance ... save an event about that too. And of course I need to add the village to the Villages table and change his AID (Alliance ID).

    But as you just said about the import ... I could save the SID, VID and the new population, wait until all servers are updated ... then update the new population count.

    Then I will just have to DROP the index before the very first server update ... then create it after the last server update ... and then maybe the problem isn't that big anymore 🙂

    Best regards

  • One other thought, if you only need the index for the update, and don't actually need it for queries, why not drop it after the update is completed?



    Shameless self promotion - read my blog http://sirsql.net

  • Nicholas Cain (1/11/2009)


    One other thought, if you only need the index for the update, and don't actually need it for queries, why not drop it after the update is completed?

    Before all the updates start ... I drop the index ...

    gathering all the information ...

    Doing some additional debugging ... it runs great (better than before), but the insert is still TERRIBLY SLOW ... look here:

    FetchMapData took 00:00:01.7230793 - download from the web (I will put this in another thread and start it one minute before the updates begin, so I can avoid this little delay)

    Inserting into x_world_pop took 00:00:04.5375582 (this is only the insert into the temp table, so I can update the population count after all updates are done)

    InsertMapData took 00:00:04.8473752 - (total time for the insert into x_world_pop plus the regex part, so ... 0.33 sec for the regex ...)

    UpdateAlliances took 00:00:00.0146105 - Just some updates

    UpdatePlayers took 00:00:00.0808817 - Just some updates

    UpdateVillages took 00:00:00.1842725 - Just some updates

    InsertEvents took 00:00:00.0010062 - Just some updates

    SubmitChanges took 00:00:00.6827493 - Just some updates

    Commit took 00:00:00.0003631 - Just some updates

    Set the updated Date took 00:00:00.0091737 - Just some updates

    Total time: InsertIntoWorld took 00:00:00.9753588 - Total time for all the "Just some updates"

    So ... as you can see ... my delay is in the insert into x_world_pop ...

    This is a rather small server (the one I get this SQL file from) ... so the percentage will only increase ...

    I'm at a loss as to why it's so slow ...

  • So ... for 4 million rows, 65 MB of data ... it takes about (this is a guess) 20% of the time my import app is running ...

    It's all running on the same machine as the database.

    OMG ...

  • I think that you'll have to show us the actual code for us to help you.

    -- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung
    Proactive Performance Solutions, Inc.
    "Performance is our middle name."

  • RBarryYoung (1/11/2009)


    I think that you'll have to show us the actual code for us to help you.

    Hi Barry,

    Sure ... here it comes.

    With the times above ... this is what takes 0.33 sec

    // Snippet from the import app (usings omitted). The generic type parameters
    // below were stripped by the forum, so they are restored here.
    Regex reg = new Regex("(\\d+),(-?\\d+),(-?\\d+),(-?\\d+),(-?\\d+),'(.+)',(\\d+),'(.+)',(\\d+),'(.*)',(\\d+)", RegexOptions.Compiled);
    Match m;
    int lines = 0;
    List = new List<WorldRow>();   // List is a List<WorldRow> field
    WorldRow wr;
    StreamReader sr = new StreamReader(MapPath + "\\" + FileName, System.Text.Encoding.UTF8);
    while (!sr.EndOfStream)
    {
        lines++;
        m = reg.Match(sr.ReadLine());
        if (m.Success)
        {
            wr = new WorldRow();
            wr.CoorID = int.Parse(m.Groups[1].Value);
            wr.X = short.Parse(m.Groups[2].Value);
            wr.Y = short.Parse(m.Groups[3].Value);
            wr.TID = byte.Parse(m.Groups[4].Value);
            wr.VID = int.Parse(m.Groups[5].Value);
            wr.VillageName = m.Groups[6].Value;
            wr.UID = int.Parse(m.Groups[7].Value);
            wr.PlayerName = m.Groups[8].Value;
            wr.AID = int.Parse(m.Groups[9].Value);
            wr.AllianceName = m.Groups[10].Value;
            wr.Population = short.Parse(m.Groups[11].Value);
            List.Add(wr);
        }
    }

    Here I loop over the items in the list ... this takes 4.5375582 sec, still running locally on the database server.

    // Build the multi-row INSERTs, 1000 row constructors per statement
    // (SQL Server 2008 allows at most 1000 rows in one VALUES list).
    int batches = (List.Count + 999) / 1000;   // avoids an empty trailing batch
    for (int i = 0; i < batches; i++)
    {
        sb = new StringBuilder("INSERT INTO x_world_pop VALUES ");
        foreach (var v in List.Skip(i * 1000).Take(1000))
        {
            c++;
            sb.Append("(" + f_SID.ToString() + "," + v.VID.ToString() + "," + v.Population.ToString() + "),");
        }
        sb.Remove(sb.Length - 1, 1);   // trim the trailing comma
        db.ExecuteCommand(sb.ToString(), 0);
    }

    So ... do I have any other options for doing this?

    Other ideas are very welcome.

    I saw yesterday that of the 4 million rows, only 50% of the villages changed population count ... so ... do I have other smart options for updating the database?

    This bugs me like hell ... I've just started to really use a database with lots of data, so maybe very obvious things are still new to me 🙂

    // ouT

  • Mikael,

    I am no expert :), but I'd like to put my own thoughts on this.

    Instead of reading the file row by row and forming dynamic SQL for every 1000 lines, what you can do is create an SSIS package and extract the script file data into a temp table.

    Then write a procedure which reads the temp data and forms the dynamic SQL to dump it into the main table. And finally you have the UPDATE statement to update the population of the villages.

    So, you'll ask, what are the advantages/improvements here?

    1. A bulk read and dump will surely improve performance, considering you have 250 files.

    2. The entire batch (i.e. drop index, insert, create index & update) will run on the server itself, which will again be faster.

    I know there might be other solutions which are better than this, and I hope someone will provide them.

    --Ramesh


  • Hi,

    Ramesh (1/12/2009)


    Mikael,

    I am no expert :), but I'd like to put my own thoughts on this.

    Instead of reading the file row by row and forming dynamic SQL for every 1000 lines, what you can do is create an SSIS package and extract the script file data into a temp table.

    Then write a procedure which reads the temp data and forms the dynamic SQL to dump it into the main table. And finally you have the UPDATE statement to update the population of the villages.

    So, you'll ask, what are the advantages/improvements here?

    1. A bulk read and dump will surely improve performance, considering you have 250 files.

    2. The entire batch (i.e. drop index, insert, create index & update) will run on the server itself, which will again be faster.

    I know there might be other solutions which are better than this, and I hope someone will provide them.

    Well ... as you can understand from all my posts ... I'm open to all suggestions, as this is really a problem. Why spend all that time in such a simple area, if it can be done in a much faster and maybe simpler way?

    As for the regex, just turning on the Compiled option saved 50% of the time for extracting the data.

    So ... I'm sure there are some good options/possibilities here too

    I will look into yours later today.

    Best regards

  • After reading a little about SSIS packages ... they seem really handy ...

    But with the above ... would that not create some extra steps? As I see it ... would it not be better to do something like this:

    When I'm doing the regex, create a file ... import.txt ... in the following format:

    smallint, int, int - this file is 338 KB, where the original map.sql file was 2539 KB
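
    (e.g., one line per village, SID then VID then population; these sample rows are made up purely for illustration:)

    1,534987,712
    1,535002,1204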

    Then import it with the SSIS Package ...

    Update the Villages table ...

    Either doing this for each server, or waiting until it's all done and then updating ...

    Just some thoughts ... I don't know if it's even possible to call SSIS packages from the client, but it should be; everything else would be strange.
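
    (For what it's worth, a plain BULK INSERT could also load a pre-parsed file like that without SSIS; a sketch only, with a hypothetical path and assumed comma/newline terminators:)

    -- Sketch: load the pre-parsed import.txt into the staging table.
    -- The path and terminators are assumptions; adjust to the real file.
    BULK INSERT dbo.x_world_pop
    FROM 'C:\Import\import.txt'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK);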

  • mikael (1/12/2009)


    After reading a little about SSIS packages ... they seem really handy ...

    But with the above ... would that not create some extra steps? As I see it ... would it not be better to do something like this:

    When I'm doing the regex, create a file ... import.txt ... in the following format:

    smallint, int, int - this file is 338 KB, where the original map.sql file was 2539 KB

    Then import it with the SSIS Package ...

    Update the Villages table ...

    Either doing this for each server, or waiting until it's all done and then updating ...

    Just some thoughts ... I don't know if it's even possible to call SSIS packages from the client, but it should be; everything else would be strange.

    I don't think you'll mind having more steps if it performs better than the existing method. :D

    SSIS is really very handy and easy to use; you can do the entire task in a single package, and you can call the package from the client using the "DTEXEC" utility.
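
    For example (the package path here is hypothetical):

    dtexec /F "C:\Packages\ImportMap.dtsx"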

    --Ramesh


  • Holy mother of GOD ...

    An SSIS package with the "Script Task" seems handy ...

    Now I can move my ugly console app into the SQL server ... and handle it all from there ... 🙂 and use the SQL Agent to schedule the update.

    I will have to look into this in the near future ... hopefully tonight 😉

    So ... now I'm just wondering how to get the best performance out of it regarding the insert. Maybe still insert into the x_world_pop table, update that one server, and loop it like that ... or ... well, I will have to think it over before I begin.

    Is there any performance gain from executing SQL commands from the SSIS Script Task?

  • mikael (1/12/2009)


    RBarryYoung (1/11/2009)


    I think that you'll have to show us the actual code for us to help you.

    Hi Barry,

    Sure ... here it comes.

    ...

    Here I loop over the items in the list ... this takes 4.5375582 sec, still running locally on the database server.

    // Build the multi-row INSERTs, 1000 row constructors per statement.
    int batches = (List.Count + 999) / 1000;   // avoids an empty trailing batch
    for (int i = 0; i < batches; i++)
    {
        sb = new StringBuilder("INSERT INTO x_world_pop VALUES ");
        foreach (var v in List.Skip(i * 1000).Take(1000))
        {
            c++;
            sb.Append("(" + f_SID.ToString() + "," + v.VID.ToString() + "," + v.Population.ToString() + "),");
        }
        sb.Remove(sb.Length - 1, 1);   // trim the trailing comma
        db.ExecuteCommand(sb.ToString(), 0);
    }

    So ... do I have any other options for doing this?

    Yes, use the SqlBulkCopy class for loading this; I think that you will be a lot happier with it. Note that if there is pre-existing data in the target table, you may need to load into a staging table first and then use SQL to get it into your actual table appropriately. It will still be faster than anything else except BCP.
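
    (A minimal sketch of that approach; the table and column names come from the code above, while connStr is a placeholder connection string:)

    // Requires: using System.Data; using System.Data.SqlClient;
    // Sketch only: assumes x_world_pop(SID, VID, Population) as in the thread.
    DataTable table = new DataTable();
    table.Columns.Add("SID", typeof(short));
    table.Columns.Add("VID", typeof(int));
    table.Columns.Add("Population", typeof(short));

    foreach (WorldRow v in List)        // the List<WorldRow> built by the regex pass
        table.Rows.Add(f_SID, v.VID, v.Population);

    using (SqlBulkCopy bulk = new SqlBulkCopy(connStr))
    {
        bulk.DestinationTableName = "dbo.x_world_pop";
        bulk.BatchSize = 5000;          // commit in chunks instead of one huge batch
        bulk.WriteToServer(table);
    }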

    -- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung
    Proactive Performance Solutions, Inc.
    "Performance is our middle name."

  • Oops, I didn't see the later posts: if you are going to SSIS, that is fine too and should be about as fast as SqlBulkCopy. If you are going to keep it in your client code, though, then definitely use SqlBulkCopy.

    -- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung
    Proactive Performance Solutions, Inc.
    "Performance is our middle name."
