Connect resiliently to SQL Server

Why you should connect resiliently to SQL Server

Transient failures happen — in the cloud (Azure SQL) and on-prem. A resilient connection strategy lets your app recover gracefully instead of crashing: it waits smartly, retries safely, and doesn’t pound the database when it’s truly unavailable. 


Why care?

  • Less downtime during patching/scaling/failovers

  • Smoother UX — fewer “please try again” moments for users

  • Operational robustness against brief network glitches and load spikes

  • Fewer cascading errors — keep transactions consistent and data clean


Transient vs. persistent errors

Transient errors (good candidates for retry):

  • Brief network interruptions

  • “Server busy”/throttling

  • Deadlocks or timeouts due to temporary load

  • Failovers during maintenance/patching

Persistent errors (should fail fast):

  • Bad connection string

  • Authentication/authorization failures

  • Missing or corrupted DB objects

The point: classify the error, then do the right thing. Retries only help with transient faults.
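
To make the classification concrete, here's a minimal sketch. The error numbers are common illustrative examples, not an exhaustive list, and it checks only the top-level error number for brevity; the full sample further down inspects every SqlError in the exception.

using Microsoft.Data.SqlClient;

// Minimal classification: retry only transient faults, fail fast on everything else.
static bool ShouldRetry(SqlException ex) => ex.Number switch
{
    1205  => true,   // deadlock victim: temporary, safe to retry
    -2    => true,   // timeout: often caused by transient load
    40501 => true,   // Azure SQL service busy / throttling
    18456 => false,  // login failed: retrying will not help
    208   => false,  // invalid object name: fix the code or deployment
    _     => false   // when unsure, fail fast and surface the error
};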


Patterns you actually need

  1. Retry with exponential backoff + jitter (spreads load to avoid stampedes)

  2. Circuit Breaker (stop hammering when the DB is really down; a Polly sketch follows this list)

  3. Connection pooling (reduce login churn, lower latency)

  4. Timeouts & Cancellation (no infinite waits; play nice with callers)

  5. Idempotency (retries must not create duplicates or corrupt state)

  6. Always On optimization: MultiSubnetFailover=True in your connection string
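
Item 2 in practice: a minimal circuit-breaker sketch using Polly (also mentioned in the production tips below). It wraps the retry-enabled repository call from the sample further down; the thresholds here are illustrative assumptions, not recommendations.

using System;
using System.Threading.Tasks;
using Microsoft.Data.SqlClient;
using Polly;
using Polly.CircuitBreaker;

// Open the circuit after 5 consecutive SqlExceptions and keep it open for 30 seconds.
// Tune both values to your workload.
var breaker = Policy
    .Handle<SqlException>()
    .CircuitBreakerAsync(
        exceptionsAllowedBeforeBreaking: 5,
        durationOfBreak: TimeSpan.FromSeconds(30),
        onBreak: (ex, breakDelay) => Console.WriteLine($"Circuit opened for {breakDelay}: {ex.Message}"),
        onReset: () => Console.WriteLine("Circuit closed"));

try
{
    // 'repo' is the DBRepository instance created in the Usage section below
    await breaker.ExecuteAsync(() => repo.ExecuteWithRetryAsync());
}
catch (BrokenCircuitException)
{
    // The database is considered down: fail fast instead of hammering it
}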


Code: Async retry with backoff + jitter (C# / Microsoft.Data.SqlClient)

Below is a C# sample you can paste right in.

using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Data.SqlClient;

public class DBRepository
{
    private readonly string _cs;

    // Typical transient error codes (extend as needed)
    private static readonly HashSet<int> TransientErrors = new()
    {
        4060,   // Cannot open database
        40197,  // Service error (Azure)
        40501,  // Service busy (Azure)
        40613,  // DB not available (Azure)
        49918, 49919, 49920, // Throttling (Azure)
        11001,  // Could not connect (DNS/connection)
        1205,   // Deadlock victim
        -2      // Timeout (command timeout)
    };

    public DBRepository(string connectionString) => _cs = connectionString;

    public async Task ExecuteWithRetryAsync(CancellationToken ct = default)
    {
        const int maxRetries = 6; // or drive via a total time budget
        var rng = new Random();

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                await using var connection = new SqlConnection(_cs);
                await connection.OpenAsync(ct);

                await using var command = new SqlCommand("YourProc", connection)
                {
                    CommandType = CommandType.StoredProcedure,
                    CommandTimeout = 30
                };
                // Example: command.Parameters.AddWithValue("@ParamName", value);

                await command.ExecuteNonQueryAsync(ct);
                return; // success
            }
            catch (SqlException ex) when (IsTransient(ex) && attempt < maxRetries)
            {
                // Exponential backoff + jitter
                var baseDelayMs = (int)Math.Min(200 * Math.Pow(2, attempt - 1), 5000);
                var delay = TimeSpan.FromMilliseconds(baseDelayMs + rng.Next(0, 250));
                // Consider logging attempt, delay, and error numbers for diagnostics
                await Task.Delay(delay, ct);
            }
        }
    }

    private static bool IsTransient(SqlException ex)
    {
        // Check all errors in the exception, not just ex.Number
        return ex.Errors.Cast<SqlError>().Any(e => TransientErrors.Contains(e.Number));
    }
}

Usage

var repo = new DBRepository(
    "Server=tcp:your-sql-listener,1433;" +
    "Database=YourDb;User ID=YourUser;Password=***;" +
    "Encrypt=True;TrustServerCertificate=False;" +
    "MultiSubnetFailover=True"); // recommended with Always On

await repo.ExecuteWithRetryAsync();

Production tips

  • Log with structure (attempt, backoff, error codes, correlation ID).

  • Bound retries (max attempts or a total time budget).

  • Make writes idempotent or wrap in transactions that tolerate retries.

  • Consider a Circuit Breaker (e.g., via Polly) around the most critical calls.

  • Prefer Windows (integrated) authentication over SQL logins where possible, so credentials stay out of connection strings.

  • Leverage SqlClient's built-in retry (RetryLogicProvider) if you want a centralized policy; a sketch follows below.
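
If you prefer that centralized policy, here is a minimal sketch of SqlClient's configurable retry logic (SqlRetryLogicOption plus SqlConfigurableRetryFactory). Exact availability and defaults depend on your Microsoft.Data.SqlClient version, and all values shown are illustrative.

using System;
using Microsoft.Data.SqlClient;

// Earlier SqlClient versions expose configurable retry logic behind this switch;
// setting it is harmless where the feature is already enabled by default.
AppContext.SetSwitch("Switch.Microsoft.Data.SqlClient.EnableRetryLogic", true);

var options = new SqlRetryLogicOption
{
    NumberOfTries = 5,                          // total attempts
    DeltaTime = TimeSpan.FromSeconds(1),        // base delay between attempts
    MaxTimeInterval = TimeSpan.FromSeconds(20), // cap on any single delay
    TransientErrors = new[] { 4060, 40197, 40501, 40613, 49918, 49919, 49920, 1205 }
};

var provider = SqlConfigurableRetryFactory.CreateExponentialRetryProvider(options);

var connectionString = "...same connection string as in the Usage section above...";
await using var connection = new SqlConnection(connectionString);
connection.RetryLogicProvider = provider;       // retries OpenAsync on transient failures
await connection.OpenAsync();

await using var command = connection.CreateCommand();
command.RetryLogicProvider = provider;          // retries Execute* calls on transient failures
command.CommandText = "YourProc";
command.CommandType = System.Data.CommandType.StoredProcedure;
await command.ExecuteNonQueryAsync();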


Bottom line

Resilience isn’t a nice-to-have; it’s part of correctness in data access. With a thin layer of backoff-based retries, clear error classification, cancellation, and the right connection settings, your application will sail through the little storms that would otherwise dent availability and user experience.
