Balloons and Data

  • Comments posted to this topic are about the item Balloons and Data

  • My comment centers on the use of tempdb versus 'work' tables. What I mean is that when we first started using SQL Server we wrote almost everything to 'work' tables, like work_db.dbo.policy_list. As we developed more SQL we eventually ran into an issue with a table name being used in more than one job. Of course this happened in the middle of the night, and trying to figure out why the code said the table didn't exist when it had just created it was frustrating in the early AM hours. Another job had deleted the table: it used the exact same name and happened to run at the same time that night. From that point forward we have coded more and more tables as temp tables, #policy_list instead of work_db.dbo.policy_list. So now when we run our monthly load of the warehouse, the tempdb space grows; the rest of the month it isn't used as heavily. Our DBA has asked that we look at writing to more 'work' tables instead of putting so much in tempdb. My question is: how do we avoid the situation that happened in the past? Some 'work' tables are deleted at the end of the SQL that created them, so I can't easily check whether anything else is using the same name.

    I'm not a DBA, but from my point of view I would think it would be easier to deal with one database (tempdb) growing each month than with multiple databases growing.

    -------------------------------------------------------------
    we travel not to escape life but for life not to escape us
    Don't fear failure, fear regret.

  • First, you need to change terminology. Worktables are intermediate tables used internally by SQL Server. If you refer to tables you create as worktables, it becomes hard to discuss performance-related items.

    Second, if you need places to manipulate data, like staging tables for ETL, why not just create real tables? Don't drop them, and don't clear them out, except for the process that uses them. If you can't keep naming straight, then use a GUID in each process to create/drop the table. If you have process 1 using the same table as process 2 and you aren't sure when these processes will run, you are making a big architectural mistake.
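
    Roughly, the GUID idea looks like this (just a sketch; the staging table name and columns are made up for illustration):

        -- Each process builds its own uniquely named table, so two processes
        -- can never collide on a name even if they run at the same time.
        DECLARE @Suffix nvarchar(36) = REPLACE(CONVERT(nvarchar(36), NEWID()), '-', '');
        DECLARE @TableName sysname = N'EtlStage_' + @Suffix;
        DECLARE @Sql nvarchar(max);

        SET @Sql = N'CREATE TABLE dbo.' + QUOTENAME(@TableName)
                 + N' (PolicyId int NOT NULL, RenewalDate date NOT NULL);';
        EXEC (@Sql);

        -- ... load and use the table through dynamic SQL ...

        SET @Sql = N'DROP TABLE dbo.' + QUOTENAME(@TableName) + N';';
        EXEC (@Sql);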

  • Sorry for saying 'work' table when I mean an intermediate table. Say I have process A that normally runs at 10 PM and creates an intermediate table, work_db.dbo.policy_list, which is a list of policies to renew in the next 90 days. Process B, which normally runs at midnight, also creates a table called work_db.dbo.policy_list, but that one holds only current policies. Now suppose process A is delayed, say because it has a file dependency on another job that is running late, so process A starts at 11:58 PM. As the SQL runs it creates its policy_list table at 11:59 PM; then process B starts at midnight and the first thing it does is delete the policy_list table so it can create it for its own processing. Process A will either fail or get the wrong results. Neither process uses policy_list beyond the SQL that created it, which is why I prefer to just use the temp table #policy_list in both of these processes.

    As you are writing the SQL for a process that may be 1,000 lines long, how do you make sure your intermediate table name does not already exist in another process? Since most people have coded a delete at the end of their SQL for any intermediate tables they used, the table name may not even exist on the database when you look. Without knowing every line of code you can't be sure your intermediate table name isn't used somewhere else. At least not yet; I'm trying to come up with something in my spare time that lists all the tables used by our processes. My concern is that we are being asked to use intermediate tables instead of temp tables, and I'm afraid we will run into the same situation somewhere down the road. So without some kind of naming convention for intermediate table names, I say use temp tables. Any thoughts on how you could even come up with a naming convention for intermediate table names?
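
    To make it concrete, here's roughly what both processes do today (a sketch; the columns are made up):

        -- Each session gets its own private copy of #policy_list, so process A
        -- and process B can both run at midnight without stepping on each other.
        CREATE TABLE #policy_list
        (
            policy_id   int         NOT NULL,
            policy_type varchar(20) NULL
        );

        -- ... populate and use #policy_list for the rest of this process's SQL ...

        DROP TABLE #policy_list;  -- optional; it goes away when the session ends anyway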

    -------------------------------------------------------------
    we travel not to escape life but for life not to escape us
    Don't fear failure, fear regret.

  • You shouldn't have that many processes working in the same area without knowing what is being done in each one. Even in large development teams of 20 people, I haven't seen naming collisions get beyond test. We find out that the same name is being used, and someone changes theirs.

    This is a team communication issue. Either you have a failing there, or you have an architectural failing that allows the same table to be used for two things.

    If you need a permanent table that you use for temporary storage, then you create a name. The name doesn't matter, but if you create it once and only truncate or clear it, you always know it is there. If you are dropping tables, then there's no reason not to use a tempdb temp table.
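
    Something along these lines, in other words (a rough sketch; the schema and table name are only examples):

        -- One-time setup: a permanent staging table owned by exactly one process.
        IF OBJECT_ID(N'staging.renewal_policy_list', N'U') IS NULL
            CREATE TABLE staging.renewal_policy_list
            (
                policy_id    int  NOT NULL,
                renewal_date date NOT NULL
            );

        -- Every run: clear it and reload it. Never drop it, so you always
        -- know whether it exists and which process it belongs to.
        TRUNCATE TABLE staging.renewal_policy_list;
        -- INSERT INTO staging.renewal_policy_list ...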

  • Steve Jones - SSC Editor (8/8/2013)


    if you need places to manipulate data, like staging tables for ETL, why not just create real tables? Don't drop them, and don't clear them out, except for the process that uses them. If you can't keep naming straight, then use a GUID in each process to create/drop the table. If you have process 1 using the same table as process 2 and you aren't sure when these processes will run, you are making a big architectural mistake.

    Steve - This is so true. Using a permanent staging table has always improved the reliability of any well-written ETL process for me. I used to insert into #temp tables all the time; the difference in performance is amazing. I have heard that you should make these staging tables heaps, but that has not been the better approach for me. What is your opinion on that?

  • What is your opinion on how many TempDB files you should have?

    I know that it is best to have TempDB on its own LUN, separate from all others.

    I have also read that you should create at least one TempDB database file per CPU core.

    What is your opinion on that?

  • Read this: http://www.sqlskills.com/blogs/paul/a-sql-server-dba-myth-a-day-1230-tempdb-should-always-have-one-data-file-per-processor-core/

    In general, here's what I'd say: a separate LUN if possible, but it's not critical. Go with one data file per core for <= 8 cores. Above that, stick with 8 files unless you can test.
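
    For example, on an 8-core box, something like this (a sketch; the logical file names, sizes, and path are placeholders):

        -- Size the existing primary data file, then add files until there are 8,
        -- keeping every file the same size so allocations stay evenly balanced.
        ALTER DATABASE tempdb
            MODIFY FILE (NAME = tempdev, SIZE = 4GB, FILEGROWTH = 512MB);

        ALTER DATABASE tempdb
            ADD FILE (NAME = tempdev2, FILENAME = N'T:\tempdb\tempdev2.ndf',
                      SIZE = 4GB, FILEGROWTH = 512MB);

        -- ... repeat for tempdev3 through tempdev8 ...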

  • Steve Jones - SSC Editor (8/8/2013)


    Read this: http://www.sqlskills.com/blogs/paul/a-sql-server-dba-myth-a-day-1230-tempdb-should-always-have-one-data-file-per-processor-core/

    In general, here's what I'd say: a separate LUN if possible, but it's not critical. Go with one data file per core for <= 8 cores. Above that, stick with 8 files unless you can test.

    LOLZ Steve - That was the article I was thinking about when I said "I have read"....

    It is also the one I pointed to when I started at my current place of work and they asked why.

    Doing this has also allowed us to rebuild indexes on Standard Edition servers without taking our applications offline.

    They have stopped asking why.

    Paul is the go-to master for SQL deep dives, and water ones too. 😉

  • Hi Steve, years back I instituted the use of an Operational Data Store or, as some say, a staging area. The initial thought was to use it as a clearing house to build data flows from an XML-based web service network, but it grew. Once data folks knew there was a place where ETL could be done, with the expectation that you could keep the intermediate tables if you chose, it grew. We have had to institute some controls on who, what, and when so as not to overload the process, but some in the organization are now using it as a main part of their data strategy.

    Why not keep what is reasonable to keep? Much of it could be done in temp tables, but this works nicely.

    Not all gray hairs are Dinosaurs!

  • I don't have a good way to test this anymore, but at my last job we did a data load once a month that consisted of text files with millions of rows and 30-60 columns, loaded into about 60 matching tables.

    I would purposely grow the tempdb files ahead of time. Then do a shrink afterward. It seemed to work well.

    Was that a bad idea?
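
    Roughly what I was doing, for what it's worth (a sketch; the logical file name and sizes are made up):

        -- Before the monthly load: pre-grow the tempdb data file in one step
        -- instead of letting it autogrow in small increments during the load.
        ALTER DATABASE tempdb
            MODIFY FILE (NAME = tempdev, SIZE = 50GB);

        -- After the load: shrink it back down to its normal size.
        USE tempdb;
        DBCC SHRINKFILE (tempdev, 10240);  -- target size in MB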



    ----------------
    Jim P.

    A little bit of this and a little byte of that can cause bloatware.

  • Right-sizing log files and data files for the FULL recovery model is a far bigger PITA than TempDB will ever be, IMHO. For example, even if you tell the system to "Sort In TempDB" when you go to do a clustered index rebuild on a very large table, and you have point-in-time log backups running every 10 minutes, your log file can grow to several times the size of the original table during the process for a multitude of reasons. That's not to mention the bloat it will cause in the MDF file while it makes a copy of the index.

    Of course, there's the fragmentation nightmare that occurs if you try to shrink a database.

    All in all, there are some serious shortcomings that MS has continued to ignore on index maintenance over the years, resulting in Balloons that appear to have check valves in the inflation tubes.
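
    For reference, the rebuild option I'm talking about looks like this (illustrative table and index names):

        -- Even with SORT_IN_TEMPDB, the rebuild is still fully logged in the
        -- user database under the FULL recovery model, and the MDF still needs
        -- room for the new copy of the index while it's being built.
        ALTER INDEX PK_BigTable
            ON dbo.BigTable
            REBUILD WITH (SORT_IN_TEMPDB = ON);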

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I have a job that kicks off every 5 minutes and queries active requests, joining in session info, the T-SQL statement, CPU/IO/duration, etc., into a table on my local instance. A while back I also added tempdb and transaction log usage, so I can monitor and archive that as well.

    Adding an index here and there can switch hash joins or sort operations to merge joins, which not only speeds up processing but also uses less tempdb allocation. To reduce transaction logging, look for procedures that perform multiple updates and consolidate them into a single update.
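
    The collection query is roughly along these lines (a simplified sketch; the real one inserts the results into a logging table):

        -- Active requests joined to session info and the running statement,
        -- plus per-session tempdb page allocations.
        SELECT  r.session_id,
                s.login_name,
                r.cpu_time,
                r.total_elapsed_time,
                r.logical_reads,
                t.text AS sql_text,
                u.user_objects_alloc_page_count
                  + u.internal_objects_alloc_page_count AS tempdb_pages_allocated
        FROM    sys.dm_exec_requests AS r
        JOIN    sys.dm_exec_sessions AS s
                  ON s.session_id = r.session_id
        LEFT JOIN sys.dm_db_session_space_usage AS u
                  ON u.session_id = r.session_id
        OUTER APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
        WHERE   r.session_id <> @@SPID;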

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Jim P. (8/8/2013)


    I don't have a good way to test this anymore, but at my last job we did a data load once a month that consisted of text files with millions of rows and 30-60 columns, loaded into about 60 matching tables.

    I would purposely grow the tempdb files ahead of time. Then do a shrink afterward. It seemed to work well.

    Was that a bad idea?

    Over time there has been one test I have done to answer this question and many like it: did it work? If the answer is yes, then it was a good solution. Was it the best solution? Doesn't matter; it worked, and you cleaned up the effects afterward. Done!

    M...

    Not all gray hairs are Dinosaurs!

  • Jeff Moden (8/8/2013)


    Right-sizing log files and data files for the FULL recovery model is a far bigger PITA than TempDB will ever be, IMHO. For example, even if you tell the system to "Sort In TempDB" when you go to do a clustered index rebuild on a very large table, and you have point-in-time log backups running every 10 minutes, your log file can grow to several times the size of the original table during the process for a multitude of reasons. That's not to mention the bloat it will cause in the MDF file while it makes a copy of the index.

    Of course, there's the fragmentation nightmare that occurs if you try to shrink a database.

    All in all, there are some serious shortcomings that MS has continued to ignore on index maintenance over the years, resulting in Balloons that appear to have check valves in the inflation tubes.

    Sounds like a good article (hint, hint) :hehe::w00t:;-):-P
