Hardware investment - I need to talk to someone knowledgeable and with no intere$t

  • Hi there guys.

    So I'm shaking...

    I was hired as an IS guy, but with time and budget constraints I was forced to embrace the IT realm too. OK... although I knew little about it, it was just enough to rebuild ADs, set Group Policies, set up simple networking, fiddle with firewalls and routers, install printers, add computers and users to the domain, configure Outlook accounts and the like...

    Now even I think this is getting bigger than me. I feel the need to expand both the IT and IS services, and I lack the time to do just that. On top of that, I'm about to reach the limit of my IT knowledge, and I have no one to discuss this stuff with. I guess I'll offload it on you people. Sorry... 🙂

    My problem is very simple... OK, to me it's not. But maybe to some of you it is, and I hope you can help.

    Over the last few years we have been saving a lot on IT infrastructure. Every machine I had was checked and rechecked to see if it could be of use. Some were, some weren't. But now this is getting bigger and denser, and that patchwork can no longer be done. I need serious firepower...

    We are currently using 3 physical servers (if you can call them that...): one as DC and Active Directory, another as the SQL Server, and the third as a virtual host (running ESXi 4.1). The virtual host runs about 6 machines that serve various purposes, such as access for the ERP maintenance guys and our road-warrior colleagues, automatic production processing, RDP for our retail stores (about 15), and a few vWorkstations.

    Besides this, the SQL Server machine runs an app/job every 3 minutes to check whether something of importance happened on the ERP database during that time. If it did, it sends an email message to the stakeholders.
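
    To make the shape of that job concrete, here is a minimal Python sketch of a check-and-notify step with pluggable notification channels. It is not our actual app: `Event`, `check_and_notify`, `fetch_events` and the notifier callables are hypothetical names, and the real ERP query and mail sending are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Iterable, List


@dataclass
class Event:
    """Hypothetical record of something important found in the ERP database."""
    occurred_at: datetime
    summary: str


# A notifier is any callable that accepts a message string (email, SMS, ...).
Notifier = Callable[[str], None]


def check_and_notify(
    fetch_events: Callable[[datetime], Iterable[Event]],
    notifiers: List[Notifier],
    now: datetime,
    window: timedelta = timedelta(minutes=3),
) -> int:
    """Send every event from the last `window` to every notifier.

    Returns the number of events that were reported.
    """
    since = now - window
    # Filter again locally so an over-eager fetch cannot report stale events.
    events = [e for e in fetch_events(since) if since <= e.occurred_at <= now]
    for event in events:
        message = f"[{event.occurred_at:%Y-%m-%d %H:%M}] {event.summary}"
        for notify in notifiers:
            notify(message)
    return len(events)
```

    With this structure, adding another channel is just one more entry in the `notifiers` list, and a scheduler only has to call `check_and_notify` every 3 minutes.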

    Ok, my infrastructure is laid down.

    What I want to do... but first the constraint: money!

    Besides what we have right now, and due to poor service from my provider, I want/need to set up an email server (this is still an open issue, since I'm weighing the use of a VPS for it), and I want to add SMS messaging to the app that runs every 3 minutes. I'd also like to migrate more users to an RDS approach instead of using full-blown workstations. And the retail stores will grow in number, so...

    For this I've been scouting the internet and found a very appealing machine: a DL385 G7 with 32GB RAM and a P410i (512MB). This baby holds 2 eight-core Opteron processors, and I'm thinking of populating it with six 600GB 15k SAS HDDs (arranged in RAID10). But... since I'd like to virtualize all the machines I've talked about, I don't know if it can really handle the task...

    I'm mostly concerned about SQL performance. Our database is about 16GB, but it's heavily pounded (about 45 users), since our ERP processes require constant verifications, which in turn require reads from the database. It gets worse when the verifications need to access different tables and establish not-so-straightforward relations.

    The machine is taken care of, now let's hop into the software bandwagon.

    I'm thinking of getting 2 licenses of Windows Server 2012 Standard, 2 core licenses for SQL Server 2012 and about 25 CALs for RDS (am I referencing that correctly?).

    So, this is how I envision using those licenses:

    - On the first WS2k12 Std license, I would set up DC and AD; then, on the 2 virtual machines allowed, I would (1) install SQL Server (with 2 vCores assigned) and (2) install the Retail Remote Desktop Access with about 20 RDS CALs.

    - On the second WS2k12 Std license, I would create a VM for the other 5 RDS CALs (I want it separate from the retail stores server). On the remaining VM I would install the automatic production processing script.

    - Everything else I'd do with Linux server software (email, etc.).

    OK, this is my best idea... please feel free to comment and lay out your own ideas on how to approach my problem.

    If for some reason I was not as clear as I think I was, let me know and I'll try to explain it a little better.

    Thanks in advance for your patience and wisdom.

    Regards.

    ____________________________________________________________

    If you can't do things right at the first time, don't try skydiving. I won't.

  • Add up the RAM each machine you want to virtualize should have (at peak times), then add some more for overhead and growth.

    If you can keep the data the DB reads in RAM (by adding RAM, or by having it read less data), you'll do better as well.

  • Hi.

    Thanks for posting. 🙂

    I'm thinking something along these lines:

    - AD+DC = 4GB

    - SQL Server + simple messaging services = 10GB

    - Retail (20 CAL) = 8GB

    - RDS (5 CAL) = 4GB

    - Production (1 ERP auto instance) = 1GB

    Total = 27GB/32GB
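
    As a quick sanity check on that budget, here is a minimal Python sketch that adds up the planned allocations and reports what is left on a 32GB host. The numbers come from the list above; the dictionary keys and the `ram_headroom` helper are just illustrative names.

```python
# Planned RAM per VM in GB, taken from the list above.
planned_gb = {
    "AD+DC": 4,
    "SQL Server + simple messaging": 10,
    "Retail RDS (20 CAL)": 8,
    "RDS (5 CAL)": 4,
    "Production (ERP auto instance)": 1,
}
HOST_RAM_GB = 32


def ram_headroom(planned, host_gb):
    """Return (total allocated, RAM left over) in GB."""
    total = sum(planned.values())
    return total, host_gb - total


total_gb, headroom_gb = ram_headroom(planned_gb, HOST_RAM_GB)
print(f"allocated {total_gb} GB of {HOST_RAM_GB} GB, "
      f"{headroom_gb} GB left for hypervisor overhead and growth")
```

    Changing a single entry (for example giving SQL Server more RAM) immediately shows how much headroom remains for the hypervisor and for growth.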

    Right now our SQL Server is running with 8GB and only 5GB are used, so I think 10GB will be more than enough for the next year, at least.

    The other machines are also oversized. The ERP auto instance, for example, doesn't really need 1GB, but I sized them that way because each VM runs its own WS2012 instance.

    8GB is more than enough for 20 guys on RDS, since those sessions will only be used to run our POS system.

    4GB for 5 RDS CALs is also more than enough, since those sessions will only be used to access our ERP when travelling, or by the ERP maintenance guys.

    Do you think 5GB is enough for overhead? Most machines already have enough resources to get by until next year, when I think I may get a second server and begin a small cluster.

    What about I/O usage? Do you think the disks (RAID10) will be able to handle the job with ease?
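
    For anyone who wants to put rough numbers on that question, a back-of-the-envelope estimate could look like the Python sketch below. The disk count and size are the ones I mentioned for the array; the per-disk IOPS figure and the read/write mix are assumptions (rules of thumb, not measurements from our system), and `raid10_estimate` is a made-up helper name.

```python
RAID10_WRITE_PENALTY = 2  # every logical write lands on both disks of a mirror pair


def raid10_estimate(disks, disk_gb, iops_per_disk, read_fraction):
    """Return (usable capacity in GB, estimated front-end IOPS) for RAID10.

    Each front-end read costs one back-end I/O and each front-end write
    costs RAID10_WRITE_PENALTY back-end I/Os, so the array sustains
    raw_iops / (reads + penalty * writes) front-end I/Os per second.
    """
    if disks < 4 or disks % 2:
        raise ValueError("RAID10 needs an even number of disks, at least 4")
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    usable_gb = disks * disk_gb // 2  # half the raw space holds mirror copies
    raw_iops = disks * iops_per_disk
    write_fraction = 1.0 - read_fraction
    cost_per_io = read_fraction + RAID10_WRITE_PENALTY * write_fraction
    return usable_gb, raw_iops / cost_per_io


# ASSUMPTIONS: ~175 IOPS per 15k SAS disk and a 70/30 read/write mix.
capacity_gb, iops = raid10_estimate(disks=6, disk_gb=600,
                                    iops_per_disk=175, read_fraction=0.7)
print(f"usable capacity: {capacity_gb} GB, estimated IOPS: {iops:.0f}")
```

    The estimate only becomes meaningful once it is compared against the I/O the current servers actually generate at peak, which would have to be measured first.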

    Thanks


    For SQL Server usage, modern Intel processors perform much better than modern AMD processors. You would want to be looking at something like an HP DL380p Gen8, with Xeon E5-2600 series processors, which will give you much better single-threaded performance than anything from AMD. SQL Server 2012 Standard Edition is limited to the lesser of four sockets or 16 physical processor cores.

    It also has a memory limit of 64GB for the database engine. You should consider getting more than 32GB of RAM (since DDR3 ECC RAM is very, very affordable). I would actually get slightly more than 64GB of RAM (such as 72GB or 96GB) so you can set your Max Server Memory at 64000. Having lots of RAM will relieve pressure on your I/O subsystem, and RAM is cheaper and faster than any I/O subsystem.
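
    As a rough illustration of that sizing rule, the Python sketch below works out a Max Server Memory value from the host RAM and builds the matching `sp_configure` script. The 64000 MB cap is the figure from this post; the amount reserved for the OS and anything else on the box (`reserved_gb`) is an assumption you would have to choose yourself, and the helper names are hypothetical.

```python
STANDARD_EDITION_CAP_MB = 64_000  # Max Server Memory figure suggested above


def max_server_memory_mb(host_ram_gb, reserved_gb):
    """RAM (in MB) to hand to SQL Server after reserving some for everything else.

    `reserved_gb` is an assumption: memory kept back for the OS and other services.
    """
    if reserved_gb >= host_ram_gb:
        raise ValueError("reserved memory must be smaller than host RAM")
    available_mb = (host_ram_gb - reserved_gb) * 1024
    return min(available_mb, STANDARD_EDITION_CAP_MB)


def sp_configure_script(max_mb):
    """Build the T-SQL that applies the setting (text only, nothing is executed)."""
    return (
        "EXEC sp_configure 'show advanced options', 1;\n"
        "RECONFIGURE;\n"
        f"EXEC sp_configure 'max server memory (MB)', {max_mb};\n"
        "RECONFIGURE;"
    )


# A 72GB host with 8GB held back reaches the 64000 MB cap.
print(sp_configure_script(max_server_memory_mb(host_ram_gb=72, reserved_gb=8)))
```

    On a 32GB host the same function returns far less than the cap, which is the practical argument for buying slightly more than 64GB.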

  • Hi there Glenn.

    I really enjoyed your post. Very concise and straightforward. Nevertheless, I think 64GB of RAM is a bit of overkill for a database that's only 16GB, although, like I said previously, it's eagerly and constantly pounded.

    The system itself may have the 64GB (8x8GB, it's cheaper and leaves slots open), but nowhere near that will be given to SQL Server... maybe something between 16 and 20GB. But hey, you're the expert. I just don't see any benefit from it, especially because I have other VMs to run on the host too that could use the extra RAM.

    I looked into the DL380p Gen8 and it seems decently priced (2P, 16GB RAM baseline, and a P420i w/1GB FBWC). I will now look for the Dell and Fujitsu counterparts.

    Thanks a lot for your input. 😉


  • Speaking of performance... the following is admittedly a bit long but well worth the read (IMHO).

    At one company I worked at, we had a 500GB set of databases that made up a system and it was rapidly growing every day. Many jobs took 8 hours to run. One job, in particular, took 24 hours to usually fail. Because it took so long to run and took multiple reruns to get it to run to completion, they were only having it do 2/3 of the work it was supposed to do.

    To fix these performance problems, they bought a new fire-breathing monster of a server, a whole new high performance SAN, and spent a whole lot of money changing from the Standard Edition of SQL Server to the Enterprise Edition. Of course, there was the cost of migration, testing, ancillary equipment such as routers/switches, etc, etc.

    The 8 hour jobs dropped to about 7 hours for about a month and then went back to 8 hours. The 24 hour job still failed a lot and still took 24 hours for the first failure to usually show up.

    Since the 24 hour job was for PUC compliance, we could get fined for every day that we went past the deadline. As a result, every time the run failed, a half dozen managers, several "experts" in Finance, Accounting, and a couple of other departments, several developers, and a couple of DBAs were all brought together to try and fix the problem. You can imagine how much that cost per hour. The funny thing is, they never fixed it. Every month, the same group of "experts" was summoned and every month, they'd watch it fail over and over and over until, by some small miracle, it would finally run to completion. Even then, they sometimes had to rerun it because when the run failed, it would leave some pretty rotten apples on the trail and people would sometimes miss that bad data before they attempted a rerun.

    The reason why it would fail was just because it took so long to run and something would deadlock with it. The reason why it took so long to run was because of the code itself. To be brief and brutal, it was crap code written by people who didn't understand how to work in a database.

    People thought that it was doomed to continue to take that long. After all, it was checking 63 four million row tables scattered across 63 different dynamically named databases for duplicate CDRs (Call Detail Records). The consensus was that no one would ever be able to do anything to make the job run faster especially since we just installed the new fire breathing hardware and the vendor that wrote the code insisted that no one could handle that much data in less than 24 hours. They were "experts". Everyone believed them.

    That wasn't all of it, either. The code was also executed on a sub-set of 3 databases per day. The code would fail every other day for the previous reasons stated. They tolerated it because each daily run "only" took 45 minutes to run to completion. Besides, the experts said we were lucky it didn't fail more often and that they were surprised that it "only" took 45 minutes considering the amount of data we had (running on 32-bit machines at the time, to boot!).

    Even though I offered to take a look at it, I was told to keep my hands off the code because the "expert" vendor had already stated that it was just because of the "overwhelming amount of data" and that it would be a total waste of time to look at it. If you know me, then you'll also know how much attention I actually paid to that. 😉

    With the help of the Director of Finance (he "owned" the code), who wrote a dandy 2 paragraph requirements document, I wrote the code at home because I'd been told I wasn't allowed to work on it. The Director of Finance understood my problem and had his people test my code instead of using official "QA" channels.

    The first time they tested the daily run, I got an urgent email saying that "it didn't run correctly". When I asked them why, they said "it ran for 7 seconds and then quit". I laughed out loud and told them to check the data. They found out that their 45-minute daily job was now correctly running in less time than it takes to yawn.

    The monthly job also did its trick correctly. The 24-hour job now ran in 11 minutes flat. It also did the full 94 databases in that time instead of the usual 63.

    I also got my hands on some of the 8-hour jobs and made them run, quite literally, "in seconds".

    Alright... sounds like a big brag on my part. That's not my intention. What I want you to understand is that, all told, they spent a bit more than a quarter of a million dollars on new hardware, the Enterprise Edition of SQL Server, the manpower to set it all up with new switches, routers, cabling, etc, etc, the manpower to migrate the databases, synchronize them once the bulk of them had been moved, and to test them. And it did squat for performance.

    Along comes someone with just a bit of database knowledge (I'd only been working with SQL Server for a handful of years) and he did the "impossible". He converted a 24 hour "impossible to improve" job to an 11 minute run that did 50% more work and it hasn't erred in 6 years AND he did it in about 20 hours (4 hours a night for a week).

    So here's the takeaway on this. Yes... buy good fast hardware. In any race, you need a really good car (the server), some really good tires (the SAN), and some really good gas (the Enterprise Edition of SQL Server). BUT!... unless you have a driver that can keep that high performance car on the track, that car isn't any better than a skateboard. If you're going to invest in some hardware, then you really need to invest in someone that knows how to drive it. Even if you buy machines 10 times as fast (and those aren't actually available, yet), where are you going to find a box that will take a run from 24 hours to 11 minutes? It's just not going to happen.

    Spend the money on hiring a full time "Ninja" level database Developer or two and then listen to him/them... or leave the car in the garage. Performance is in the code. 😉
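
    To make the row-by-row versus set-based difference concrete, here is a minimal Python sketch using the standard-library `sqlite3` module. It is not the code from the story (that code isn't shown here); the `cdr` table, its columns, and the function names are made up for illustration, and SQLite stands in for SQL Server only so the example runs anywhere.

```python
import sqlite3


def make_db(rows):
    """Create an in-memory table of (hypothetical) call detail records."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cdr (caller TEXT, callee TEXT, started_at TEXT)")
    conn.executemany("INSERT INTO cdr VALUES (?, ?, ?)", rows)
    return conn


def duplicates_row_by_row(conn):
    """Row by row: one extra query per row to count its matches."""
    found = set()
    all_rows = conn.execute("SELECT caller, callee, started_at FROM cdr").fetchall()
    for row in all_rows:
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM cdr "
            "WHERE caller = ? AND callee = ? AND started_at = ?",
            row,
        ).fetchone()
        if count > 1:
            found.add(row)
    return sorted(found)


def duplicates_set_based(conn):
    """Set based: a single statement; the engine does the grouping."""
    return sorted(conn.execute(
        "SELECT caller, callee, started_at FROM cdr "
        "GROUP BY caller, callee, started_at "
        "HAVING COUNT(*) > 1"
    ).fetchall())
```

    Both functions return the same duplicates, but the first issues one query per row while the second lets the database engine do all the work in one pass, which is where the big differences in run time come from as tables grow.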

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.



  • Hi there Jeff.

    Very interesting story you have there. It wasn't that long, or maybe my interest went sky high once I started reading it.

    The need to buy new hardware is not because of the SQL Server performance per se. In fact, currently that's pretty good and people can do their job pretty fast.

    The SQL Server question is a by-product of the need to install new and better hardware to accommodate all the IT infrastructure (server- and service-wise) the company has and will have in the coming year.

    Then comes my personal touch. I just love virtualization. I think it's a great way to use your hardware to the fullest and reduce ownership costs. But then comes my real-world side, which tells me that although virtualization is great, it has some drawbacks that may jeopardize user experience and the whole project. The main drawback, to me and at this point, is I/O usage. That's why I'm concerned about SQL Server performance in a virtualized environment: SQL Server is the lifeblood of the company. Our ERP is filled with functions, processes, and operation and management maps. Everything that's related to the company's operation (logistics, production, HR, accounting and/or management) is literally done on the ERP.

    To sum it all up, I'm worried about SQL Server because I want to place every single service we run under the same bonnet (just until the company releases more funds to my department), if you know what I mean. That's why I state the need for more firepower. 🙂

    SQL-wise, I have been dealing with it myself for the past 3 years. Maybe we could talk about query code design. I really need to know if I'm doing it by the book.

    Thanks a lot for your input. Really helpful and insightful.

    Regards.

