SQL Needs to scale out!

  • One of the biggest problems i have with SQL Server now, is it scalability. SQL does not scale out and after a while scaling up is not an option. Were I currently work we are using SQL for a web based application. Our current load is in the millions of connections a day (not a problem atm). But we are expecting growth that will take us up to the billions of connections per day. I have serious concerns whether or not SQL will be able to handle it.

    I am taking steps to redesign the database and implementation now to support this kind of load and still met the 10 ms overall response times. If you have scaled out SQL to support your application I would like to hear how you did it. Merge replication seems to be the only answer at present.

    Raymond Laubert
    Exceptional DBA of 2009 Finalist
    MCT, MCDBA, MCITP:SQL 2005 Admin,
    MCSE, OCP:10g

  • to scale our web application out (millions + hits a day) we designed one central database that held all the write information and then this data was replicated out to a number of sql servers that were designed to be read only.

    We were in the fortunate position of having an application that had very few writes, and a very high number of reads so this approach worked well for us. It also allowed for new subscribers to be added if the load demanded.

    Though we planned this from the ground up and it allowed us to build a modular web application that could source the data from one server and write to another server.

    We then used a NLB to handle the traffic between these subscription databases.

    Overall the solution works well, but it took a lot of planning and development

  • Scalability is an architectural and/or a design issue. There is nothing inherent in SQL Server that makes it non-scalable... in fact, in my experience it is a very flexible and scalable platform. It offers all sorts of features in its various flavors including clustering but you can't just throw hardware at it.

    I'd suggest you take a good look at your database design and overall architecture to see where your bottlenecks are.

    The probability of survival is inversely proportional to the angle of arrival.

  • What proportion of reads do you have to writes?

    Can you subdivide what you do along a product line basis?

    Can you break your database out into separate DBs for customers, products etc.

    Sharing data is also an option. The approach that seems to work quite well is having a shard master DB that simply tells the application to which shard a particular piece of data has been assigned.

    If a customer tries to retrieve a piece of information then the first query hits the shard master and asks "where is this data". It then tries to retrieve the data from where it has been told that data is.

    If the shard goes down then the request goes onto a queue for retrieval from a bigger long term store. On retrieval the data is put on a shard and the shard master makes a note of where the data is.

    The beauty of it is you can take shards offline and the system will rebalance the shards that are available. Similarly if you add shards the data will slowly balance out over time.

  • Ray Laubert (7/27/2011)


    One of the biggest problems i have with SQL Server now, is it scalability. SQL does not scale out and after a while scaling up is not an option. Were I currently work we are using SQL for a web based application. Our current load is in the millions of connections a day (not a problem atm). But we are expecting growth that will take us up to the billions of connections per day. I have serious concerns whether or not SQL will be able to handle it.

    I am taking steps to redesign the database and implementation now to support this kind of load and still met the 10 ms overall response times. If you have scaled out SQL to support your application I would like to hear how you did it. Merge replication seems to be the only answer at present.

    No offense but SQL Server is the RDBMS for myspace.com.

    Their design is scale out and it works VERY well.

    You might get in touch with the sqlcat team or at least reaad their whitepapers (might be a good excuse to go to PASS this year, this is where I learnt about this in a session straight from the mouth of the lead DBA of myspace).

    IIRC they were hitting millions of queries per second (minute at the most). They were adding new servers weekly, if not daily. Once they add the design down if was relatively smooth sailing (as much as you can have with exponential growth).

  • I was going to cite myspace.com too. I heard Christa Stelzmuller speak at PASS 2009 and she was flanked by SQLCAT team members. They were explaining how they use Service Broker to scale out to hundreds of servers. They were the pioneers on the Service Broker multi-casting feature. MySpace built their infrastructure on SQL 2005 where multi-casting was not a native feature of Service Broker...it is now and I think it can be attributed to their efforts working with SQLCAT.

    Here is a cover article from SQLMag. It came out less than a month after PASS so it was neat to see them on the cover after I had just heard Christa speak. It's a little light on technical implementation details (i.e. code) but it shows it can be done and explains at a high-level how they got there:

    http://www.sqlmag.com/article/scalability/myspace-answers-the-question-why-sql-server-

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • lol, you know you have a big integration when you need to balance load the machine that balance loads all the requests to the other 100s of servers :w00t:.

    Thanks for the link, I saw her in 06 or 07. I'll see where they are now!

  • Ray here are a couple of great datatbase sharding videos from dbshards.com

    http://vimeo.com/26742356

    http://www.slideshare.net/rightscale/rightscale-webinar-scaling-your-database-in-the-cloud

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply