Using In-Memory Tables To Improve Performance

  • Hi Everyone,

    I'm working on an ETL tool and I'm focusing on improving performance. What we're doing is pretty simple. We read
    a few input files, create a bunch of tables with the data from those files in them, and then transform the data in those
    tables based on predefined rules that are stored in a separate database. The transformed data is then written out to output
    files on disk. When these steps are completed, the entire contents of the working DB where the data is stored are deleted.

    The files we're dealing with can get pretty big, so the tables can be as big as 2 - 10 million rows. I've already identified some
    areas where we can improve performance, but now I'm trying to see if I can improve performance in areas where we're just
    retrieving all the rows from a table that has millions of rows. As an example, we're doing Select * on a table with 2
    million rows. The total data in the table (without indexes) is 294 MB according to sp_spaceused. It takes 22 seconds to retrieve
    all the data. If I restrict the query to just the first column, it takes 9 seconds to run. The first column is declared as varchar(180),
    but the values in the column are all 3 characters.
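
    For reference, this is roughly the shape of what I'm running (the table and column names below are placeholders for the real ones):

        -- Placeholder names; the real table has roughly 2 million rows
        EXEC sp_spaceused N'dbo.StagingTable';    -- reports ~294 MB of data, excluding indexes

        SELECT *                                  -- all columns: about 22 seconds in SSMS
        FROM dbo.StagingTable;

        SELECT FirstColumn                        -- first column only (varchar(180), 3-character values): about 9 seconds
        FROM dbo.StagingTable;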

    My first question is whether this is a reasonable amount of time for a query like this to take. I know there are lots of variables, like
    disk speed, so this is probably impossible to answer with any certainty, but does anyone think that sounds way too long for that
    amount of data? Is there anything I can do to speed up a query that's retrieving all rows from a table, besides eliminating columns?

    And the real question I came here for is whether In-Memory tables would be a good fit for this situation. The total data stored in the
    database is 7 GB. We can easily require 32 GB on the servers that run the software. Has anyone here used In-Memory tables? I know
    it's still maturing, but is the technology ready for prime time?

    Thanks.

  • The first question I have to ask relates to your statement on how long it takes to retrieve the data. Admittedly, there are a lot of pieces, but stop and
    think for a moment about exactly HOW you measured that time. Did you just wait for all that data to arrive in SSMS? If that's the case, you may be distorting
    your measurement significantly, because you had to wait for all that data to cross the network to your workstation, and for SSMS to receive it, format it,
    and present it. Try instead to use the exact same SELECT statement, but add an INTO #TEMP_TABLE right after the SELECT list and before the FROM, on its own
    line, and see how long that takes. It might be a more realistic way to measure how long it takes the server to move data within the box, while still writing
    data to disk... Since you mentioned in-memory tables, this sounds like a worthy candidate for testing to me... The question is, will 7 GB always be the size
    of the data, or will that grow in the future? Just a couple of things to think about...
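
    Something along these lines is what I have in mind (the table name is just a stand-in for yours):

        SET STATISTICS TIME ON;

        SELECT *
        INTO #TEMP_TABLE          -- result set stays on the server; no network transfer or SSMS rendering in the timing
        FROM dbo.StagingTable;    -- stand-in name for your 2 million row table

        SET STATISTICS TIME OFF;

        DROP TABLE #TEMP_TABLE;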

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • I would suggest that you thoroughly analyse the problem before choosing the tools😉
    😎

    There are several limitations when using memory-optimized tables; have a look here: SQL Server Features Not Supported for In-Memory OLTP

  • Thanks sgmunson. I knew that some piece of the delay was due to the results being pulled into SSMS, but I didn't realize how much until I tried it yesterday after posting here.
    It takes only 2 - 3 seconds to do a Select Into a temp table. But I see the same Select Into taking over a minute when I run the actual program and do a trace.

    I think there might be disk contention because we're running several queries at once. That's why I'm thinking that In-Memory tables might help. Right now I'm working on confirming that
    there is disk contention.
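
    To confirm it, I'm planning to start with the I/O-related DMVs, something like this (nothing fancy, and the filters are just illustrative):

        -- Cumulative I/O stalls per database file since the last restart
        SELECT DB_NAME(vfs.database_id)  AS database_name,
               mf.physical_name,
               vfs.num_of_reads,
               vfs.io_stall_read_ms,
               vfs.num_of_writes,
               vfs.io_stall_write_ms
        FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
        JOIN sys.master_files AS mf
          ON mf.database_id = vfs.database_id
         AND mf.file_id     = vfs.file_id
        ORDER BY vfs.io_stall_read_ms + vfs.io_stall_write_ms DESC;

        -- PAGEIOLATCH_* waits piling up while the ETL runs would also point at disk contention
        SELECT wait_type, waiting_tasks_count, wait_time_ms
        FROM sys.dm_os_wait_stats
        WHERE wait_type LIKE 'PAGEIOLATCH%'
        ORDER BY wait_time_ms DESC;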

    One reason I think the In-Memory tables might help and not be too complicated is that this database is being used by just one user through the ETL app. Each user has their own database
    and the ETL app is the only application using the database. So it's not like we have to worry about concurrent access by multiple users. I don't even think I have to worry about multiple
    processes or threads accessing the same table at the same time. It's just one application spawning multiple threads that I'm pretty sure never access the same table at the same time.
    The 7GB database is the worst case scenario, at least right now. That size will probably grow in the future, but currently that's the size the DB grows to with the biggest input files
    we currently deal with.

    Eirikur, I agree about analyzing the problem thoroughly first. That's what I'm working on now, but In-Memory tables sound like they might be a good fit for our fairly unique situation.
    I realize there are many limitations when using them, but so far I don't think any of them will affect us if we upgrade to SQL Server 2017.
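
    If we do go that route, my understanding is that the basic setup would be something along these lines (all the names and the file path are placeholders,
    and SCHEMA_ONLY durability is just a thought since the working data is wiped after every run anyway):

        -- One-time setup per working database (placeholder names and path)
        ALTER DATABASE EtlWorkDb
            ADD FILEGROUP EtlWorkDb_mod CONTAINS MEMORY_OPTIMIZED_DATA;
        ALTER DATABASE EtlWorkDb
            ADD FILE (NAME = 'EtlWorkDb_mod', FILENAME = 'D:\Data\EtlWorkDb_mod')
            TO FILEGROUP EtlWorkDb_mod;

        -- Staging table kept entirely in memory; SCHEMA_ONLY means the data is not persisted,
        -- which seems acceptable here because the working data is deleted after each run anyway
        CREATE TABLE dbo.StagingTable
        (
            FirstColumn varchar(180) NOT NULL,
            -- ... remaining columns ...
            RowId int IDENTITY(1,1) NOT NULL PRIMARY KEY NONCLUSTERED
        )
        WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY);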

  • John O'Connor - Tuesday, April 10, 2018 10:34 AM

    If you run a trace, you also impose a potentially significant burden on the server, so stats you get out of Profiler may need to be taken with a grain of
    salt... not because they aren't accurate, but because they end up measuring things that include the trace's own load on the server, which can be significant.
    Traces should be kept to the minimum info necessary to achieve your measurement goals, and things like execution time should be compared against the same
    query inserting into a temp table but without the trace running... so you have some indication of just how much burden the trace itself is introducing.
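
    One way to get that comparison without Profiler in the picture at all is to bracket the statement with timestamps right in the batch the app runs,
    something like this (again, the table name is a stand-in), and then run it once with the trace going and once without:

        DECLARE @start datetime2 = SYSDATETIME();

        SELECT *
        INTO #TEMP_TABLE
        FROM dbo.StagingTable;    -- stand-in name

        SELECT DATEDIFF(MILLISECOND, @start, SYSDATETIME()) AS elapsed_ms;

        DROP TABLE #TEMP_TABLE;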

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)
