Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

Scaling Out

By Kumar Part,

Sustaining the steadily increasing work load is essential for an application. If the application uses database as a repository to store data, then that database architecture must be flexible enough to withstand the different ways and means of improvising the scalability. Database’s sustainability and adaptability for the software and hardware level changes, will minimize the work involved in changing your software application.

Scalability Vs Availability

Let’s take an example, where you have a web mart which runs on IIS using SQL Server as a data repository. To serve your day by day increasing number of in-coming traffic 24/7, you need to have your system up and running all the time with an efficient load management at the middle-tier as well as at the data-tier levels.

To achieve the scalability and availability at the middle-tier level, Windows provides network load balancing (NLB), which is part of Windows 2000 Advanced and Datacenter editions. Network load balancing distributes IP traffic to multiple instances of web servers each running on a host within the cluster, from the client perspective the cluster appears to be a single server through the virtual IP Address. NLB can host 32 servers in a cluster to provide scalability and if it detects any failure in one of those servers, it can re-route the request to rest of the servers and hence NLB assures that your middle-tier services are highly available and scalable.

To achieve the availability at the data-tier level, Windows provides Microsoft Cluster Service (MSCS). MSCS manages group of nodes that are responsible for hosting your SQL Server database. Microsoft cluster is a shared nothing cluster, means at a time only one node can host a database. If one of the SQL Server nodes goes down, MSCS can restart the SQL Server instance in another node. MSCS does not balance the load on your SQL Server database. To balance the load on your SQL Server, consider scaling-up and scaling-out solutions for your database environment.

Scaling-up is increasing the resources like memory, CPU speed, etc., Scaling out is adding additional computers and leveraging the benefits of parallel computing. Scaling- up is vertical solution and scaling out is horizontal solution.

Scaling-up is more at hardware level and it doesn’t require any changes in your database design. But, if your database workload demands more processing that you cannot achieve by keep on adding additional resources to your single database server then consider linking additional servers.

Scale Out Solutions

Microsoft SQL Server supports two different types of scale-out solutions; adaptability of these two depends on your environment, database design time and the cost of maintainability.

First solution is at the database level. This involves creating similar databases in all the servers and timely synchronizing data for the transactional consistency. The database can be based on regions, for example your sales database in US can serve America customers and similarly the replica of the sales database in London can serve Europe customers. This way you can minimize the traffic based on regions. This approach is called database replication, the disadvantage in replication is the time involved in doing the synchronization to achieve transactional consistency and the complexity involved in doing the coordinated backup and recoveries between the replicated databases.

Second solution is at the table level. This involves breaking a huge table into smaller tables and creating schematically similar smaller tables on individual database servers and connecting them through an updatable view. The concept of linking individual servers together to provide a consolidated view of data is called database federation. Since data of one or more tables is accessed through an abstracted layer (views), this solution provides transactional consistency and with the SQL Server support for SAN-VI architecture, the federated database solution assures the maximum scalability.

This article focuses on the second scale-out solution.

Database Federation

Database federation can be defined as a group of autonomous servers that work together to take the work load. Each server in a group is managed independently and they can have different maintenance procedures. This group of servers does not have any physical private inter-connect like in MSCS. Servers in these groups can be geographically apart and logically linked.

Database federation can connect number of heterogeneous data sources. Microsoft OLEDB plays a vital role in linking and providing uniform access to data. With the advent of .Net and its newly defined ADO.NET, you can even link web service with your relational data. In simple words, database federation helps you to consolidate heterogeneous data in a monolithic fashion without transforming data from one form to another.

Setting up the Link

To link SQL Server B to SQL Server A, execute the sp_addlinkedserver stored procedure on server A as shown below:

Use master
Go
Exec sp_addlinkedserver ‘ServerB’, N’SQL Server’

When you explicitly specify that you are linking to a SQL server through the second parameter of the sp_addlinkedserver stored procedure, you do not need to specify other parameters such as OLEDB provider name, data source and provider string etc. SQL Server by default uses SQLOLEDB provider for the linkage.

Simply, linking to a server may not let you access the data in the remote server because the remote server characteristics can be different from your local server characteristics. These characteristics are security context, collation etc.

To get the list of linked servers of a local server, execute sp_linkedservers stored procedure on a local server.

Managing the Security Context

When you establish the link, by default the connection’s security context is used to access the remote server. The user account which is used in the local server connection may not exist in the remote server. For this map a local connection account to a remote server account. Use sp_addlinkedsrvlogin stored procedure to control the security context of linked servers.

The following example maps domain\localuser of server A to an account in server B

sp_addlinkedsrvlogin ‘ServerB’, ‘false’, ‘domain\localuser’, ‘ServerB_User’,’ServerB_UserPassword’

To map all the accounts of server A to a specific account in server B use sp_addlinkedsrvlogin as shown below:

sp_addlinkedsrvlogin ‘ServerB’, ‘false’, NULL, ‘ServerB_User’,’ServerB_UserPassword’

If the second parameter is true, then the local server security context is used to connect to the remote server. If your local connection uses Windows authentication, your local server must support security account delegation. To delegate an account, you must be running Windows 2000 with Kerberos support enabled and you must be using Active Directory services of Windows 2000. For the delegation to work enable and disable the following options in your AD domain.

Enable these two:

  • Account is trusted for delegation
  • Computer is trusted for delegation

And disable the option Account is sensitive and cannot be delegated

To drop a specific local login mapping with the linked server, execute the stored procedure sp_droplinkedsrvlogin on the local server.

Use Master
Execute sp_droplinkedsrvlogin ‘ServerB’, ‘locallogin’

To remove the default login mapping originally created by sp_addlinkedserver execute sp_droplinkedsrvlogin as follows:
 

Use Master
Execute sp_droplinkedsrvlogin ‘ServerB’, NULL

Linked Server Options to Improve Query Performance

When you link more than one server for scalability the performance of the query execution is very important. Consider the following server options, to improve the query performance.

Use master
sp_serveroption ‘serverB’, ’lazy schema validation’, ‘true’

When you turn on the lazy schema validation option, the SQL Server query engine will not check the schema until the query execution. If the change of the schema in the remote table is detected at the execution time, the query will fail with an error.

Similarly, turn on collation compatible option when the linked server character set and sort order is same as local server. By turning on this option, local server delegates the character column comparison to the OLEDB provider used for the remote table. Because OLEDB provider caches all the remote table data, the comparison can be performed at the provider level using internal index mechanism of the provider. If this option is false all row data will be sent to local server for the comparison, this will increase the amount of data transfer over the network and will impact the performance.

Use master
sp_serveroption ‘serverB’, ‘collation compatible’, ‘true’

Queries involving bit and uniqueidentifier columns are always evaluated locally and cannot be delegated to the provider

Disconnecting the linked server

To drop the linkage use the sp_dropserver stored procedure as shown below:

Use master
Go
Exec sp_dropserver ‘B’, NULL

If you do not specify the second parameter, the default value of NULL will be assumed..

To drop the linkage and associated remote logins from the local server, execute sp_dropserver stored procedure as shown below:

Exec sp_dropserver ‘B’, ‘droplogins’

Partitioning data

After you set up the linkage, the next step is partitioning your data. This involves, breaking a huge table into smaller tables and distributing across the linked servers. These smaller tables will have the identical schema except the validation criteria on columns on which the whole data partition gets devised. These columns are called as data partition columns (DPC). So when you partition data, note down the following:

  • DPC needs to be part of your primary key constraint and hence all the primary key column rules will apply to DPC.
  • Columns in the base table cannot have default values.
  • If the base table references a foreign key, keep the foreign key table’s partition in the same server along with the base table.

Partition can be symmetric or asymmetric. Symmetric partition is an idle partition to achieve, in symmetric partition data is partitioned uniformly across the member servers. For example, consider a system where a customers table is partitioned across Server A(CustomerID from 1 through 3299999) and ServerB (CustomerID from 3300000 through 6599999). When you have more complex data access patterns, you may not be able to achieve symmetric data partition. In this case, asymmetric partition helps. In an asymmetric partition, the tables which are good candidate for partitioning get partitioned, but the tables which do not qualify for partitioning stays in the local server. This way you moved some data to linked servers and hence minimized the work load on the local server. In other words, asymmetric partitioning is better than having no partition at all.

Accessing Partitioned Data

When the total number of data row increases and it crosses the maximum size of your partition, you need to create a new table and also you need to change your application to reference the newly created table. Whenever you create a new partition, changing your application will incur overhead on time and cost.

To avoid this, SQL Server supports abstracting your data through partitioned views. A partitioned view links one or more tables. If a base table of a view comes from a remote server the partitioned view is called as distributed partitioned view (DPV).

If you do not use partitioned view, you have to code the logic to locate a member table.

For example, the below code fetches customer data from ServerB, if the customerID falls in between 3299999 and 6599999, otherwise locale member table will be accessed.

If(@customer_id < 3299999)

BEGIN
…

Select * from ServerA.northwind.dbo.Customers
…

END
…
ELSE 

IF (@customer_id > 3299999 AND @customer_id < 6599999)
BEGIN
…

Select * from ServerB.northwind.dbo.Customers
…

END
…

If you are adding one more member table, the above code snippet needs to be changed.
Instead, if you use partitioned view, you have to change the underlying view definition to include the new member table and it does not require any change in your application because your application references the partitioned views not the member tables.

For example, the above code snippet will be split into two, one is DPV definition and another one is actual code which calls the DPV.

Here is view definition

CREATE VIEW DPV_Customers AS
SELECT * FROM northwind.dbo.Customers
UNION ALL
SELECT * FROM ServerB.northwind.dbo.Customers

And call the above DPV in your code as shown below:-


Here is the modified code:

SELECT * FROM DPV_Customers WHERE CustomerID=@customer_id

In the above code, SQL Server query processor routes the request and fetches the data from ServerB. This routing mechanism is based on the underlying base table’s CHECK constraint and the link definition. In your query always reference the remote table using four part name as shown below.

<Server Name>.<Database Name>.<Owner Name>.<Object Name>

When you design a query for your federated environment, note down the following constraints:

  • Unlike SQL Server 7.0, in SQL Server 2000, partitioned views are updateable. In order for your partition to be updateable, the underlying table primary key needs to include the partitioned column, otherwise you have to use INSTEAD OF trigger to do the update and remember to call BEGIN DISTRIBUTED TRANSACTION statement in your trigger script.
  • It is essential to use CHECK constraint on the base table’s partition column otherwise the query optimizer will search all the tables in the federation.
  • SELECT statements INTO clause referring to a remote table is not allowed
  • DDL statements like CREATE, ALTER and DROP are not allowed against the linked server.
  • Text manipulation functions like READTEXT, WRITETEXT and UPDATETEXT statements are not allowed.
  • GROUP BY ALL is not allowed when the WHERE clause is specified, but GROUP BY without specifying ALL is supported.
  • ORDER BY clause cannot be used if your query references BLOB columns of a remote table.
  • IS NULL and IS NOT NULL cannot reference BLOB columns of a remote table.

Indexed Views

If you have index on the partitioned views, query performance can be further improved,. When you index a view the result set of the index is materialized at the time the index is created. When you include indexed views in your query, SQL Server will compare the cost of fetching data from the indexed view with the cost of fetching data from the remote query and pick the cheaper strategy. If you design indexed views carefully, the performance benefits are not limited to queries that specify the indexed view in their FROM clause, the performance benefits apply to any query that references data covered by the indexed view.

Indexed views depend on base tables for their data. Such dependency means that if you change a base table contributing to an indexed view, the indexed view might become invalid, so use indexed views on a table, whose data is not going to change. For example, if one of your base table references the last year’s transaction, use index view on that base table.

SAN-VI

In a federated environment, group of servers connected by a low latency and high speed inter connect, this type of LAN or WAN is called System Area Network (SAN). Virtual Interface (VI) architecture is used as a protocol for communication in SAN environment. VI replaces TCP/IP, unlike TCP/IP; VI uses direct memory copy technology between servers through VI communication channels. Microsoft SQL Server 2000 supports VI architecture and also offers tools to work with VIA.

For VIA to work, you must use the supported hardware like cLan from Giganet. Refer to http://www.viaarch.org, http://www.giganet.com and http://www.Microsoft.com for more information about VI architecture.

Backup and Restore of Database Federation

Manageability of your environment can become more challenging when more servers are added to your federation. The backup and restore features that are included with SQL Server 2000 become more complex when working in a federated environment.

SQL Server federated database environment does not require that you coordinate backups across member servers. Backups can be taken from each database independently, without regard for the state of the other member databases. Because the backups do not have to be synchronized, there is no processing overhead for synchronization and no blockage of running tasks.

But you need to coordinate recoveries across member servers to restore databases at the same logical point in time. SQL Server 2000 has made these restores easier with the implementation of marked transactions. You can restore the database exactly at the mark or before the mark. These marks are stored in MSDB database.

Here is the nicer example from SQL Server Books online, which describes recovery using transaction marks.

BEGIN TRANSACTION RoyaltyUpdate 
WITH MARK 'Update royalty values'
GO
USE pubs
GO
UPDATE roysched
SET royalty = royalty * 1.10
WHERE title_id LIKE 'PC%'
GO
COMMIT TRANSACTION RoyaltyUpdate
GO
--Time passes. Regular database 
--and log backups are taken.
--An error occurs.
USE master
GO

RESTORE DATABASE pubs
FROM Pubs1
WITH FILE = 3, NORECOVERY
GO
RESTORE LOG pubs
FROM Pubs1
WITH FILE = 4,
STOPATMARK = 'RoyaltyUpdate'

By restoring, using the transaction mark in all the member servers, you can achieve transactional consistency for your data in the federated database environment.

Conclusion

Setting up the scale out environment using database federation is a trade off between manageability and scalability. By using, SQL Server tools, you can reduce the complexity in managing your federated databases and can achieve the effective scalable environment.

Also the growing research in the areas like Infiniband architecture will answer your future growing demand for a scalable solution.

References

1. Microsoft SQL Server Books Online
2. http://www.dell.com/downloads/global/vectors/via_sql2000.pdf
3. http://www.research.ibm.com/journal/sj/414/haas.pdf
4. http://www.viaarch.org

Total article views: 7265 | Views in the last 30 days: 13
 
Related Articles
FORUM

Accessing tables in remote database server from local server.

Accessing tables in remote database server from local server.

FORUM

Partitioning...

querying partitioned table still slow..

FORUM

cluter server databases backup on local drive

cluter server databases backup on local drive

FORUM

Partition

Partition

FORUM

local server

no local server

Tags
miscellaneous    
performance tuning    
sql server 7    
t-sql    
 
Contribute

Join the most active online SQL Server Community

SQL knowledge, delivered daily, free:

Email address:  

You make SSC a better place

As a member of SQLServerCentral, you get free access to loads of fresh content: thousands of articles and SQL scripts, a library of free eBooks, a weekly database news roundup, a great Q & A platform… And it’s our huge, buzzing community of SQL Server Professionals that makes it such a success.

Join us!

Steve Jones
Editor, SQLServerCentral.com

Already a member? Jump in:

Email address:   Password:   Remember me: Forgotten your password?
Steve Jones