jump to navigation

SQL Server 2005 DB Availability Features December 4, 2008

Posted by msrviking in High Availability.
trackback

Sometime ago I had to come up with a document in my IT shop talking and comparing different SQL Server 2005 High Availability features.

With inputs from several SQL Server websites, MSDN, blog posts from my favorite bloggers, and other resources I managed to bring up what is in here. Also, at the end I brought a table which is a checklist of all the available features.

What is availability?

The simple definition of the availability of a system is the amount of time a system is up in a given year. Hence if a system is needed to be available for 99.99%, then it can’t be down for more than about 52 minutes in a year.

What is high availability?

A highly available system is one that can potentially nullify effects of the failure and maintain availability to such a degree that end users are not affected by the failure.

Higher availability in SQL Server world would revolve around these questions

Ø Are the instances up and running?

Ø Are all databases always available for the business?

Ø What if we bring the instance or database down as part of planned downtime, will the business tolerate this planned downtime?

Ø What if the instance or database crashes as part of an unplanned downtime, will the business tolerate such unplanned downtime?

In order to bring the instance or database up and running we have following high availability features in SQL Server 2005

Ø Database mirroring (New in SQL Server 2005)

Ø Failover clustering

Ø Replication

Ø Log shipping

Scalability vs. High Availability:

Availability and Scalability may seem to be conceptually related but they are actually not. With high availability, we guarantee a certain level of up-time, whereas scalability is primarily to get better performance through utilization of more resources.

In scalability, we have two different solutions for achieving better performance and to balance the load on SQL Server – scaling-up and scaling-out. Scaling up is about increasing resources like memory, CPU, disk controllers etc and is at hardware level. Scale out is using additional servers to distribute the load. This document is focused on availability options; a subsequent document will highlight scalability options in SQL Server 2005.

Description of High Availability options

Database mirroring

In database mirroring, an originating SQL Server 2005 instance continuously sends a database’s transaction log records to a copy of the database on another standby SQL Server instance. The originating database and server have the role of principal, and the receiving database and server have the role of mirror. The principal and mirror servers are separate instances of SQL Server 2005.

In all SQL Server databases, data changes are recorded in the transaction log before any changes to actual data pages are made. The transaction log records are placed first in a database’s log buffer in memory, and then flushed to disk (or ‘hardened’) as quickly as possible. In database mirroring, as the principal server writes the principal database’s log buffer to disk, it simultaneously sends that block of log records to the mirror instance. When the mirror server receives a block of log records, it places the log records first into the mirror database’s log buffer and then hardens them to disk as quickly as possible.

The principal and mirror servers are each considered a partner in the database mirroring session. A database mirroring session consists of a relationship between the partner servers when they mirror a database from one partner to another. A given partner server may have the principal role for one database and a mirror role for a different database.

In addition to the two partner servers (principal and mirror) a database mirroring session may have an optional third server, called the witness. The witness server’s role is to enable automatic failover. When database mirroring is used for high availability, if a principal server suddenly fails, if the mirror server has confirmation from the witness, it can automatically take on the role of principal and make its database available within a few seconds. In such a mode, when using ADO.Net or the SQL native client, the application can failover transparently using a connection string that looks, for example, like:

Data Source=myServerAddress; Failover Partner=myMirrorServer; Initial Catalog=myDataBase; Integrated Security=True;

There are two types of transfer mechanism for a database mirroring session: Synchronous and Asynchronous.

Synchronous transfer: The principal server hardens its principal database log records to disk; it also sends them to the mirror. The principal then waits for a response from the mirror server. The mirror responds when it has hardened those same log records to the mirror’s log disk.

Asynchronous transfer: The principal server will not wait for an acknowledgment from the mirror that the mirror has hardened a block of transaction records. The mirror will attempt to keep up with the principal, by recording transactions as quickly as possible, but some transactions may be lost if the principal suddenly fails and you force the mirror into service.

Failover clustering

A failover cluster consists of a set of redundant servers (called nodes) that share an external disk system. Clustering relies on the clustering capability of the Windows operating system. Windows must hence be configured for clustering. The server editions that support clustering in Windows 2003 are Enterprise and Datacenter. This mechanism also relies on a quorum disk that is shared by both instances, and holds all database files. This is typically a shared disk array.

During a cluster failover, a virtual SQL Server instance moves from one node to another. As a result, a cluster failover appears to external applications as if the virtual SQL Server instance is briefly unavailable (usually for less than a minute), then available again. The instance seemingly just stops and restarts. Behind the scenes, when one SQL Server instance located on one physical server becomes unavailable, Windows closes the database data files that the instance had open on a commonly shared disk space, starts another SQL Server instance on another physical server, opens the same data files, and takes over the virtual server name and virtual IP address of the failed instance. Open connections are closed during the failover, all committed transactions are rolled forward during the redo phase of database recovery in secondary node, and the undo phase will rollback the uncommitted records.

There are two types of failover clustering in Windows: Active/Passive and Active/Active.

Active/Passive means that your cluster has an active node and a passive mode. If your active node failed, then its defined resources would shift to the passive node and it would become active. The passive node is not accessible unless an accident occurs and the resources shifted.

In Active/Active clustering, both nodes are accessible and active. If a node fails, then its resources would shift to the other active node. The node that survives would then carry the load for both nodes.

Transactional Replication

Of the various types of replication supported by SQL Server, transactional replication is used for high availability primarily because it can guarantee transactional consistency between the publisher and subscribers, and also because its performance for continuous replication is much better than that of other replication methods. The most common high availability scenario for transactional replication occurs when you copy data from one database – the publisher – to one or more subscribers through a distribution server. If the publisher fails, one of the subscribers can become a read/write server and accept data updates—and even become a publisher to the other subscribers.

Log shipping

Log shipping allows you to automatically send transaction log backups from one database (known as the primary database) to a secondary database on another server (known as the secondary server). At the secondary server, these transaction log backups are restored to the secondary database, keeping it closely synchronized with the primary database. An optional third server, known as the monitor server records the history and status of backup and restores operations and optionally raises alerts if these operations fail to occur as scheduled.

When / Why should you the below availability features:

If any of your answers to the below questions is “Yes”, then that availability feature is for you.

Database mirroring:

1) Do you need a quick (<3-5 sec) automatic failover?

2) Do you want to not have the hardware expenses of a cluster?

3) Are you okay with a small performance overhead due to mirroring in the high protection mode, or with a small potential data loss in the high performance mode?

4) Do you need to protect individual databases (as opposed to all databases on an instance?

5) Do you want an automatic application redirection to standby?

6) Do you want to have standby over WAN?

7) Do you want to access standby for read-only access to a snapshot of the data?

Failover clustering:

1) Do you have hardware like a large shared cluster disk array, a dual NIC, a high capacity network, high capacity CPU and RAM listed in MS Windows 2003 HCL?

2) Do you need databases failover without any performance overhead?

3) Are you okay with a downtime of about 1-2 minutes during failover?

4) Do you need all databases of an instance as highly available?

5) Do you need a scalable solution for database availability?

Transactional replication and Log shipping:

1) Do you tolerate database availability delays that result from a manual failover?

2) Are you okay with clients not being completely transparent to failover?

3) Do you need to spread the standby servers across your WAN?

4) Do you want to balance the load across servers by accessing a read-only copy of the primary database?

Checklist of availability features

Category

Availability Feature

Database Mirroring

Failover Clustering

Log Shipping

Transactional Replication

Failover Characteristics

Standby type

Hot

Hot

Warm

Hot

Failure detection

Yes

Yes

No

No

Automatic failover

Yes

Yes

No

No

Masks disk failure

Yes

No

Yes

Yes

Masks SQL process failure

Yes

Yes

Yes

Yes

Masks other process failure

No

Yes

No

No

Metadata support

User Database

All databases (System and User Database)

User database

User database

Transactional consistency

Yes

Yes

Yes

Yes

Transactions are current

Yes

Yes

No

Yes

Perceived downtime

< 3 seconds

<20 seconds and db recovery time

Seconds plus recovery time

Seconds plus role change time

Transparent to client

Yes

Yes

No

No

Automatic role change

Yes

Yes

No

No

Configuration

Special hardware needed

No

Yes

No

No

Distance limit

Unlimited

Limited

Dispersed

Dispersed

Complexity

Some

More

Some

Some

Standby – accessible

Yes

No

Yes

Yes

Impact on performance

Some

No impact

Some

Some (Depending kind of subscription)

Impact on backup strategy

No impact

No impact

Some

No impact

Version availability

Enterprise of SQL 2005 – All features
Standard of SQL 2005 – Limited features

Enterprise and Standard Edition of SQL 2005

All versions except Express Edition

All versions except Express Edition

Additional server role

Witness

None

Monitor

Distributor

Redundant storage locations

Yes

No

Yes

Yes

Scope of availability

Database

Instance

Database

Database

Management

Load delay on secondary

No

No

Yes

No

Load balancing

No

No

No

No

Scalable

No

Yes

Yes

Yes

Client redirect

Limited support

No

No

No

Monitoring

Third party tools required

No

Yes

Yes

No

Customized scripts

Yes

Yes

Yes

No

Expertise required

Medium

High

Low

Medium

Troubleshooting

Complexity

Medium

High

Low

Medium

Expertise required

Medium

High

Low

Medium

Maintenance and Support

Complexity

Medium

High

Low

Medium

Expertise required

Medium

High

Low

Medium

Support

Support from Microsoft

New feature, so MS support is limited

Well defined support from MS

Well defined support from MS

Difficult to troubleshoot even by MS (sometimes)

Cost

Solution Cost

Low

High

Low

Low

Security

Complexity

Low

High

Low

Low

Expertise required

Low

High

Low

Low

Why don’t you guys tell me how did you find this consolidation?

Cheers!

Advertisements

Comments»

No comments yet — be the first.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: