SQLServerCentral Article

Scaling Out

,

Sustaining the steadily increasing work load is essential for an application.

If the application uses database as a repository to store data, then that

database architecture must be flexible enough to withstand the different ways

and means of improvising the scalability. Database’s sustainability and

adaptability for the software and hardware level changes, will minimize the work

involved in changing your software application.

Scalability Vs Availability

Let’s take an example, where you have a web mart which runs on IIS using SQL

Server as a data repository. To serve your day by day increasing number of

in-coming traffic 24/7, you need to have your system up and running all the time

with an efficient load management at the middle-tier as well as at the data-tier

levels.

To achieve the scalability and availability at the middle-tier level, Windows

provides network load balancing (NLB), which is part of Windows 2000 Advanced

and Datacenter editions. Network load balancing distributes IP traffic to

multiple instances of web servers each running on a host within the cluster,

from the client perspective the cluster appears to be a single server through

the virtual IP Address. NLB can host 32 servers in a cluster to provide

scalability and if it detects any failure in one of those servers, it can

re-route the request to rest of the servers and hence NLB assures that your

middle-tier services are highly available and scalable.

To achieve the availability at the data-tier level, Windows provides

Microsoft Cluster Service (MSCS). MSCS manages group of nodes that are

responsible for hosting your SQL Server database. Microsoft cluster is a shared

nothing cluster, means at a time only one node can host a database. If one of

the SQL Server nodes goes down, MSCS can restart the SQL Server instance in

another node. MSCS does not balance the load on your SQL Server database. To

balance the load on your SQL Server, consider scaling-up and scaling-out

solutions for your database environment.

Scaling-up is increasing the resources like memory, CPU speed, etc., Scaling

out is adding additional computers and leveraging the benefits of parallel

computing. Scaling- up is vertical solution and scaling out is horizontal

solution.

Scaling-up is more at hardware level and it doesn’t require any changes in

your database design. But, if your database workload demands more processing

that you cannot achieve by keep on adding additional resources to your single

database server then consider linking additional servers.

Scale Out Solutions

Microsoft SQL Server supports two different types of scale-out solutions;

adaptability of these two depends on your environment, database design time and

the cost of maintainability.

First solution is at the database level. This involves creating similar

databases in all the servers and timely synchronizing data for the transactional

consistency. The database can be based on regions, for example your sales

database in US can serve America customers and similarly the replica of the

sales database in London can serve Europe customers. This way you can minimize

the traffic based on regions. This approach is called database replication, the

disadvantage in replication is the time involved in doing the synchronization to

achieve transactional consistency and the complexity involved in doing the

coordinated backup and recoveries between the replicated databases.

Second solution is at the table level. This involves breaking a huge table

into smaller tables and creating schematically similar smaller tables on

individual database servers and connecting them through an updatable view. The

concept of linking individual servers together to provide a consolidated view of

data is called database federation. Since data of one or more tables is accessed

through an abstracted layer (views), this solution provides transactional

consistency and with the SQL Server support for SAN-VI architecture, the

federated database solution assures the maximum scalability.

This article focuses on the second scale-out solution.

Database Federation

Database federation can be defined as a group of autonomous servers that work

together to take the work load. Each server in a group is managed independently

and they can have different maintenance procedures. This group of servers does

not have any physical private inter-connect like in MSCS. Servers in these

groups can be geographically apart and logically linked.

Database federation can connect number of heterogeneous data sources.

Microsoft OLEDB plays a vital role in linking and providing uniform access to

data. With the advent of .Net and its newly defined ADO.NET, you can even link

web service with your relational data. In simple words, database federation

helps you to consolidate heterogeneous data in a monolithic fashion without

transforming data from one form to another.

Setting up the Link

To link SQL Server B to SQL Server A, execute the sp_addlinkedserver stored

procedure on server A as shown below:

Use master
Go
Exec sp_addlinkedserver ‘ServerB’, N’SQL Server’

When you explicitly specify that you are linking to a SQL server through the

second parameter of the sp_addlinkedserver stored procedure, you do not need to

specify other parameters such as OLEDB provider name, data source and provider

string etc. SQL Server by default uses SQLOLEDB provider for the linkage.

Simply, linking to a server may not let you access the data in the remote

server because the remote server characteristics can be different from your

local server characteristics. These characteristics are security context,

collation etc.

To get the list of linked servers of a local server, execute sp_linkedservers

stored procedure on a local server.

Managing the Security Context

When you establish the link, by default the connection’s security context is

used to access the remote server. The user account which is used in the local

server connection may not exist in the remote server. For this map a local

connection account to a remote server account. Use sp_addlinkedsrvlogin stored

procedure to control the security context of linked servers.

The following example maps domain\localuser of server A to an account in

server B

sp_addlinkedsrvlogin ‘ServerB’, ‘false’, ‘domain\localuser’, ‘ServerB_User’,’ServerB_UserPassword’

To map all the accounts of server A to a specific account in server B use

sp_addlinkedsrvlogin as shown below:

sp_addlinkedsrvlogin ‘ServerB’, ‘false’, NULL, ‘ServerB_User’,’ServerB_UserPassword’

If the second parameter is true, then the local server security context is

used to connect to the remote server. If your local connection uses Windows

authentication, your local server must support security account delegation. To

delegate an account, you must be running Windows 2000 with Kerberos support

enabled and you must be using Active Directory services of Windows 2000. For the

delegation to work enable and disable the following options in your AD domain.

Enable these two:

  • Account is trusted for delegation
  • Computer is trusted for delegation

And disable the option Account is sensitive and cannot be delegated

To drop a specific local login mapping with the linked server, execute the

stored procedure sp_droplinkedsrvlogin on the local server.

Use Master
Execute sp_droplinkedsrvlogin ‘ServerB’, ‘locallogin’

To remove the default login mapping originally created by sp_addlinkedserver

execute sp_droplinkedsrvlogin as follows:

 

Use Master
Execute sp_droplinkedsrvlogin ‘ServerB’, NULL

Linked Server Options to Improve Query Performance

When you link more than one server for scalability the performance of the

query execution is very important. Consider the following server options, to

improve the query performance.

Use master
sp_serveroption ‘serverB’, ’lazy schema validation’, ‘true’

When you turn on the lazy schema validation option, the SQL Server query

engine will not check the schema until the query execution. If the change of the

schema in the remote table is detected at the execution time, the query will

fail with an error.

Similarly, turn on collation compatible option when the linked server

character set and sort order is same as local server. By turning on this option,

local server delegates the character column comparison to the OLEDB provider

used for the remote table. Because OLEDB provider caches all the remote table

data, the comparison can be performed at the provider level using internal index

mechanism of the provider. If this option is false all row data will be sent to

local server for the comparison, this will increase the amount of data transfer

over the network and will impact the performance.

Use master
sp_serveroption ‘serverB’, ‘collation compatible’, ‘true’

Queries involving bit and uniqueidentifier columns are always evaluated

locally and cannot be delegated to the provider

Disconnecting the linked server

To drop the linkage use the sp_dropserver stored procedure as shown below:

Use master
Go
Exec sp_dropserver ‘B’, NULL

If you do not specify the second parameter, the default value of NULL will be

assumed..

To drop the linkage and associated remote logins from the local server,

execute sp_dropserver stored procedure as shown below:

Exec sp_dropserver ‘B’, ‘droplogins’

Partitioning data

After you set up the linkage, the next step is partitioning your data. This

involves, breaking a huge table into smaller tables and distributing across the

linked servers. These smaller tables will have the identical schema except the

validation criteria on columns on which the whole data partition gets devised.

These columns are called as data partition columns (DPC). So when you partition

data, note down the following:

  • DPC needs to be part of your primary key constraint and hence all the

    primary key column rules will apply to DPC.

  • Columns in the base table cannot have default values.
  • If the base table references a foreign key, keep the foreign key table’s

    partition in the same server along with the base table.

Partition can be symmetric or asymmetric. Symmetric partition is an idle

partition to achieve, in symmetric partition data is partitioned uniformly

across the member servers. For example, consider a system where a customers

table is partitioned across Server A(CustomerID from 1 through 3299999) and

ServerB (CustomerID from 3300000 through 6599999). When you have more complex

data access patterns, you may not be able to achieve symmetric data partition.

In this case, asymmetric partition helps. In an asymmetric partition, the tables

which are good candidate for partitioning get partitioned, but the tables which

do not qualify for partitioning stays in the local server. This way you moved

some data to linked servers and hence minimized the work load on the local

server. In other words, asymmetric partitioning is better than having no

partition at all.

Accessing Partitioned Data

When the total number of data row increases and it crosses the maximum size

of your partition, you need to create a new table and also you need to change

your application to reference the newly created table. Whenever you create a new

partition, changing your application will incur overhead on time and cost.

To avoid this, SQL Server supports abstracting your data through partitioned

views. A partitioned view links one or more tables. If a base table of a view

comes from a remote server the partitioned view is called as distributed

partitioned view (DPV).

If you do not use partitioned view, you have to code the logic to locate a

member table.

For example, the below code fetches customer data from ServerB, if the

customerID falls in between 3299999 and 6599999, otherwise locale member table

will be accessed.

If(@customer_id < 3299999)
BEGIN
…
Select * from ServerA.northwind.dbo.Customers
…
END
…
ELSE 
IF (@customer_id > 3299999 AND @customer_id < 6599999)
BEGIN
…
Select * from ServerB.northwind.dbo.Customers
…
END
…

If you are adding one more member table, the above code snippet needs to be

changed.

Instead, if you use partitioned view, you have to change the underlying view

definition to include the new member table and it does not require any change in

your application because your application references the partitioned views not

the member tables.

For example, the above code snippet will be split into two, one is DPV

definition and another one is actual code which calls the DPV.

Here is view definition

CREATE VIEW DPV_Customers AS
SELECT * FROM northwind.dbo.Customers
UNION ALL
SELECT * FROM ServerB.northwind.dbo.Customers

And call the above DPV in your code as shown below:-

Here is the modified code:

SELECT * FROM DPV_Customers WHERE CustomerID=@customer_id

In the above code, SQL Server query processor routes the request and fetches

the data from ServerB. This routing mechanism is based on the underlying base

table’s CHECK constraint and the link definition. In your query always reference

the remote table using four part name as shown below.

<Server Name>.<Database Name>.<Owner Name>.<Object Name>

When you design a query for your federated environment, note down the following

constraints:

  • Unlike SQL Server 7.0, in SQL Server 2000, partitioned views are

    updateable. In order for your partition to be updateable, the underlying

    table primary key needs to include the partitioned column, otherwise you

    have to use INSTEAD OF trigger to do the update and remember to call BEGIN

    DISTRIBUTED TRANSACTION statement in your trigger script.

  • It is essential to use CHECK constraint on the base table’s partition

    column otherwise the query optimizer will search all the tables in the

    federation.

  • SELECT statements INTO clause referring to a remote table is not allowed
  • DDL statements like CREATE, ALTER and DROP are not allowed against the

    linked server.

  • Text manipulation functions like READTEXT, WRITETEXT and UPDATETEXT

    statements are not allowed.

  • GROUP BY ALL is not allowed when the WHERE clause is specified, but

    GROUP BY without specifying ALL is supported.

  • ORDER BY clause cannot be used if your query references BLOB columns of

    a remote table.

  • IS NULL and IS NOT NULL cannot reference BLOB columns of a remote table.

Indexed Views

If you have index on the partitioned views, query performance can be further

improved,. When you index a view the result set of the index is materialized at

the time the index is created. When you include indexed views in your query, SQL

Server will compare the cost of fetching data from the indexed view with the

cost of fetching data from the remote query and pick the cheaper strategy. If

you design indexed views carefully, the performance benefits are not limited to

queries that specify the indexed view in their FROM clause, the performance

benefits apply to any query that references data covered by the indexed view.

Indexed views depend on base tables for their data. Such dependency means

that if you change a base table contributing to an indexed view, the indexed

view might become invalid, so use indexed views on a table, whose data is not

going to change. For example, if one of your base table references the last

year’s transaction, use index view on that base table.

SAN-VI

In a federated environment, group of servers connected by a low latency and

high speed inter connect, this type of LAN or WAN is called System Area Network

(SAN). Virtual Interface (VI) architecture is used as a protocol for

communication in SAN environment. VI replaces TCP/IP, unlike TCP/IP; VI uses

direct memory copy technology between servers through VI communication channels.

Microsoft SQL Server 2000 supports VI architecture and also offers tools to work

with VIA.

For VIA to work, you must use the supported hardware like cLan from Giganet.

Refer to http://www.viaarch.org, http://www.giganet.com and http://www.Microsoft.com

for more information about VI architecture.

Backup and Restore of Database Federation

Manageability of your environment can become more challenging when more

servers are added to your federation. The backup and restore features that are

included with SQL Server 2000 become more complex when working in a federated

environment.

SQL Server federated database environment does not require that you coordinate

backups across member servers. Backups can be taken from each database

independently, without regard for the state of the other member databases.

Because the backups do not have to be synchronized, there is no processing

overhead for synchronization and no blockage of running tasks.

But you need to coordinate recoveries across member servers to restore databases

at the same logical point in time. SQL Server 2000 has made these restores

easier with the implementation of marked transactions. You can restore the

database exactly at the mark or before the mark. These marks are stored in MSDB

database.

Here is the nicer example from SQL Server Books online, which describes recovery

using transaction marks.

BEGIN TRANSACTION RoyaltyUpdate 
WITH MARK 'Update royalty values'
GO
USE pubs
GO
UPDATE roysched
SET royalty = royalty * 1.10
WHERE title_id LIKE 'PC%'
GO
COMMIT TRANSACTION RoyaltyUpdate
GO
--Time passes. Regular database 
--and log backups are taken.
--An error occurs.
USE master
GO
RESTORE DATABASE pubs
FROM Pubs1
WITH FILE = 3, NORECOVERY
GO
RESTORE LOG pubs
FROM Pubs1
WITH FILE = 4,
STOPATMARK = 'RoyaltyUpdate'

By restoring, using the transaction mark in all the member servers, you can

achieve transactional consistency for your data in the federated database

environment.

Conclusion

Setting up the scale out environment using database federation is a trade off

between manageability and scalability. By using, SQL Server tools, you can

reduce the complexity in managing your federated databases and can achieve the

effective scalable environment.

Also the growing research in the areas like Infiniband architecture will

answer your future growing demand for a scalable solution.

References

1. Microsoft SQL Server Books Online

2. http://www.dell.com/downloads/global/vectors/via_sql2000.pdf

3. http://www.research.ibm.com/journal/sj/414/haas.pdf

4. http://www.viaarch.org

Rate

4.75 (4)

You rated this post out of 5. Change rating

Share

Share

Rate

4.75 (4)

You rated this post out of 5. Change rating