Storage 101: Data Center Storage Configurations

In this article in the series, Robert Sheldon explains the differences between direct-attached storage, network-attached storage, and storage area networks.

The series so far:

  1. Storage 101: Welcome to the Wonderful World of Storage
  2. Storage 101: The Language of Storage
  3. Storage 101: Understanding the Hard-Disk Drive 
  4. Storage 101: Understanding the NAND Flash Solid State Drive
  5. Storage 101: Data Center Storage Configurations
  6. Storage 101: Modern Storage Technologies
  7. Storage 101: Convergence and Composability 
  8. Storage 101: Cloud Storage
  9. Storage 101: Data Security and Privacy 
  10. Storage 101: The Future of Storage
  11. Storage 101: Monitoring storage metrics
  12. Storage 101: RAID

Today’s IT teams are turning to a variety of technologies to provide storage for their users and applications. Not only does this include the wide range of hard disk drives (HDDs) and solid-state drives (SSDs), but also technologies such as cloud storage, software-defined storage, and even converged, hyperconverged, or composable infrastructures.

Despite the various options, many data centers continue to rely on three traditional configurations: direct-attached storage (DAS), network-attached storage (NAS), and the storage area network (SAN). Each approach offers both advantages and disadvantages, but it’s not always clear when to use one over the other or the role they might play in more modern technologies such as cloud storage or hyperconverged infrastructures (HCIs).

This article explores the three configurations to give you a better sense of how they work and when to use them. Later in this series, I’ll cover the more modern technologies so you have a complete picture of the available options and what storage strategies might be best for your IT infrastructure. Keep in mind, however, that even with these technologies, DAS, NAS, and SAN will likely still play a vital role in the modern data center.

Direct-Attached Storage

As the name suggests, DAS is a storage configuration in which HDDs or SSDs are attached directly to a computer, rather than connecting via a network such as Ethernet, Fibre Channel, or InfiniBand. Other storage types, such as optical or tape drives, can theoretically be considered DAS if they connect directly to the computer, but references to DAS nearly always mean HDDs or SSDs, as they do in this article.

DAS can connect to a computer internally or externally. External DAS can be a single drive or part of an array or RAID configuration. Whether internal or external, the DAS device is dedicated to and controlled by the host computer.

A computer’s DAS drive can be shared so that other systems can access the drive across the network. Even in this case, however, the computer connected to the drive still controls that drive. Other systems cannot connect to the drive directly but must communicate with the host computer to access the stored data.

DAS connects to a computer via an interface such as Serial-Attached SCSI (SAS), Serial Advanced Technology Attachment (SATA), Small Computer System Interface (SCSI), or Peripheral Component Interconnect Express (PCIe). Along with the drive’s other characteristics, the interface can have a significant impact on performance and is an important consideration when choosing a DAS drive. (See the first article in this series for information about interfaces and related storage technologies.)
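
On a Linux host, you can get a quick sense of which drives are directly attached and over which interface by inspecting the kernel’s /sys/block tree. The following Python sketch is a minimal illustration under that assumption; the transport detection is a crude heuristic based on the resolved sysfs device path, not a definitive inventory tool.

# List local block devices and guess their interface and media type.
# Linux-only sketch: relies on the /sys/block tree exposed by the kernel.
import os

SYS_BLOCK = "/sys/block"

def read_attr(dev, rel_path):
    """Return the stripped contents of a sysfs attribute, or None."""
    try:
        with open(os.path.join(SYS_BLOCK, dev, rel_path)) as f:
            return f.read().strip()
    except OSError:
        return None

for dev in sorted(os.listdir(SYS_BLOCK)):
    if dev.startswith(("loop", "ram", "dm-")):
        continue  # skip virtual devices
    # The resolved sysfs path hints at the transport (crude heuristic).
    real_path = os.path.realpath(os.path.join(SYS_BLOCK, dev))
    if "nvme" in dev:
        interface = "NVMe (PCIe)"
    elif "/usb" in real_path:
        interface = "USB (external DAS)"
    elif "/ata" in real_path:
        interface = "SATA/ATA"
    else:
        interface = "unknown (possibly SAS/SCSI)"
    rotational = read_attr(dev, "queue/rotational")
    media = "HDD" if rotational == "1" else "SSD/flash"
    print(f"{dev}: {interface}, {media}")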

Some IT teams turn to DAS because it typically provides better performance than networked storage solutions such as NAS and SAN. When using DAS, the host server does not need to contend with potential network bottlenecks such as sluggish network speed or network congestion, and the data is by definition in close proximity to the server. Other systems that connect to the host might run into network issues, but the host itself—and the applications that run on it—have unencumbered access to data hosted on DAS.
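
If you want to see the difference for yourself, a deliberately simple sketch like the one below times small random reads against a file on a local DAS volume and a file on a network mount. Both paths are placeholders for whatever exists in your environment, and the numbers are only rough indicators, since OS caching and current network load will skew them.

# Rough timing of small random reads from a local (DAS) file versus a
# file on a network mount. Paths are placeholders; adjust for your setup.
import os, random, time

def avg_read_latency_ms(path, reads=200, block=4096):
    size = os.path.getsize(path)
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.perf_counter()
        for _ in range(reads):
            offset = random.randrange(0, max(size - block, 1))
            os.pread(fd, block, offset)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return (elapsed / reads) * 1000

for label, path in [("local DAS", "/data/local/testfile.bin"),
                    ("network mount", "/mnt/nas/testfile.bin")]:
    print(f"{label}: {avg_read_latency_ms(path):.3f} ms per 4 KiB read")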

DAS is also cheaper and easier to implement and maintain than networked systems such as NAS or SAN. A DAS device can often be implemented through a simple plug-and-play operation, with little administrative overhead. And because DAS requires few components beyond the SSD or HDD itself, the price tag tends to be much lower than that of the networked alternatives.

DAS is not without its downsides, however. Because a server has only a limited number of expansion slots and external ports, DAS offers limited scalability. In addition, limitations in the server’s compute resources can impact performance when the drive is shared, as can the data center’s network if contention issues arise. DAS also lacks the advanced management and backup features provided by other systems.

Despite these disadvantages, DAS can still play a vital role in some circumstances. For example, high-performing applications or virtualized environments can benefit from DAS because it’s generally the highest-performing option and eliminates potential network bottlenecks. In addition, small-to-medium-sized businesses—or departments within larger organizations—might turn to DAS because it’s relatively simple to implement and manage and costs less than networked storage.

DAS can also be used in hyperscale systems such as Apache Hadoop or Apache Kafka to support large, data-intensive workloads that can be scaled out across a network of distributed computers. More recently, DAS has been gaining traction in HCI appliances, which are made up of multiple server nodes that include both compute and storage resources. The usable storage in each node is combined into a logical storage pool for supporting demanding workloads such as virtual desktop infrastructures (VDIs).

Network-Attached Storage

NAS is a file-level storage device that enables multiple users and applications to access data from a centralized system via the network. With NAS, users have a single access point that is scalable, relatively easy to set up, and cheaper than options such as SAN. NAS also includes built-in fault tolerance, management capabilities, and security protections, and it can support features such as replication and data deduplication.

A NAS device is an independent node on the local area network (LAN) with its own IP address. It is essentially a server that contains multiple HDDs or SSDs, along with processor and memory resources. The device typically runs a lightweight operating system (OS) that manages data storage and file sharing, although in some cases it might run a full OS such as Windows Server or Linux.

Users and applications connect to a NAS device over a TCP/IP network, typically running over Gigabit Ethernet (GigE) or faster. To facilitate data access, NAS also employs a file-sharing protocol. Some of the more common protocols are Network File System (NFS), Common Internet File System (CIFS), and Server Message Block (SMB). A NAS device might also support Apple Filing Protocol (AFP) or legacy network protocols such as Internetwork Packet Exchange (IPX) and NetBIOS Extended User Interface (NetBEUI). Most NAS devices support multiple protocols.
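
As a simple illustration of the file-level access model, the sketch below uses Python to mount a hypothetical NFS export from a NAS device and list its contents. The server name, export path, and mount point are placeholders; the mount requires root privileges and assumes the Linux NFS client utilities are installed.

# Mount a hypothetical NFS export from a NAS device and list its contents.
# Requires root privileges and the Linux NFS client utilities.
import os
import subprocess

NAS_EXPORT = "nas01.example.com:/volume1/shared"  # placeholder export
MOUNT_POINT = "/mnt/nas"                          # placeholder mount point

os.makedirs(MOUNT_POINT, exist_ok=True)

# Equivalent to: mount -t nfs -o vers=4,ro nas01.example.com:/volume1/shared /mnt/nas
subprocess.run(
    ["mount", "-t", "nfs", "-o", "vers=4,ro", NAS_EXPORT, MOUNT_POINT],
    check=True,
)

print(f"Files visible through the NAS share at {MOUNT_POINT}:")
for name in sorted(os.listdir(MOUNT_POINT)):
    print(" ", name)

subprocess.run(["umount", MOUNT_POINT], check=True)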

NAS devices are generally easy to deploy and operate and relatively inexpensive when compared to SANs. In addition, users and applications running on the same network can easily access their files, without the limitations they might encounter when retrieving data from DAS. NAS devices can also be scaled out or integrated with cloud services, and they provide built-in redundancy while offering a great deal of flexibility.

That said, a NAS device must compete with other traffic on the network, so contention can be an issue, especially if network bandwidth is limited (although NAS is often deployed on a private network, which helps mitigate the problem). Too many concurrent users can also impact storage performance, not only on the network but within the NAS device itself. Many NAS devices use HDDs rather than SSDs, increasing the risk of I/O contention as more users try to access storage.
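
To get a rough feel for where contention starts in your own environment, a sketch along these lines reads the same set of files from a NAS mount with an increasing number of threads and reports aggregate throughput. The mount point and file names are placeholders, and client-side caching will flatter repeated runs, so treat the output as a hint rather than a benchmark.

# Rough look at how aggregate read throughput from a NAS mount changes
# as the number of concurrent readers grows. Paths are placeholders.
import glob, time
from concurrent.futures import ThreadPoolExecutor

FILES = glob.glob("/mnt/nas/benchmark/*.bin")  # placeholder test files

def read_file(path, block=1 << 20):
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(block):
            total += len(chunk)
    return total

for workers in (1, 2, 4, 8):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        total_bytes = sum(pool.map(read_file, FILES))
    elapsed = time.perf_counter() - start
    mbps = total_bytes / elapsed / (1 << 20)
    print(f"{workers} readers: {mbps:.1f} MiB/s aggregate")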

Because of the network and concurrency issues, NAS is often best suited to small-to-medium-sized businesses or small departments within larger organizations. NAS might be used for distributing email, collaborating on spreadsheets, or streaming media files. NAS can also be used for network printing, private clouds, disaster recovery, backups, file archives, or any other use case that can work within NAS’s limitations without overwhelming the network or file system.

When deciding whether to implement a NAS device, you should consider the number of users, the types of applications, the available network bandwidth, and any other factors specific to your environment. In some cases, DAS might be the better choice because it’s typically faster, cheaper, and easier to set up than NAS. In others, you might look to a SAN for greater scalability and additional management features.

Storage Area Network

A SAN is a dedicated, high-speed network that interconnects one or more storage systems and presents them as a pool of block-level storage resources. In addition to the storage arrays themselves, a SAN includes multiple application servers for managing data access, storage management software that runs on those servers, host bus adapters (HBAs) to connect to the dedicated network, and the physical components that make up that network’s infrastructure, which include high-speed cabling and special switches for routing traffic.

SAN storage arrays can be made up of HDDs or SSDs or a combination of both in hybrid configurations. A SAN might also include one or more tape drives or optical drives. The management software consolidates the different storage devices into a unified resource pool, which enables each server to access the devices as though they were directly connected to that server. Each server also interfaces with the main LAN so client systems and applications can access the storage.

SANs are sometimes assumed to be high-performing systems, but historically that has rarely been the priority: most SANs have been optimized first and foremost for consolidation and data management, not performance. Now that SSDs are becoming more common, however, hybrid and all-flash SANs are bringing performance to the forefront.

Integral to an effective SAN solution is a reliable, high-performing network capable of meeting workload demands. For this reason, many modern SANs are based on Fibre Channel, a technology for building network topologies that can deliver high bandwidth and exceptional throughput, with speeds up to 128 gigabits per second (roughly 16 gigabytes per second). Unfortunately, Fibre Channel is also known for being complex and pricey, causing some organizations to turn to alternatives such as Internet SCSI (iSCSI), Fibre Channel over Ethernet (FCoE), or even NVMe over Fabrics (NVMe-oF).
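
For example, connecting a Linux host to an iSCSI SAN target is typically a matter of discovery followed by a login, after which the remote LUN appears as a local block device. The sketch below drives the standard open-iscsi iscsiadm utility from Python; the portal address and target IQN are placeholders, root privileges are required, and a production setup would also handle authentication (CHAP) and multipathing.

# Discover and log in to a hypothetical iSCSI target using open-iscsi's
# iscsiadm utility. Requires root and the iSCSI initiator tools installed.
import subprocess

PORTAL = "192.0.2.10:3260"                       # placeholder SAN portal
TARGET = "iqn.2001-05.com.example:storage.lun1"  # placeholder target IQN

# Discover the targets exposed by the portal.
subprocess.run(
    ["iscsiadm", "-m", "discovery", "-t", "sendtargets", "-p", PORTAL],
    check=True,
)

# Log in; the LUN then appears as a local block device (e.g., /dev/sdX).
subprocess.run(
    ["iscsiadm", "-m", "node", "-T", TARGET, "-p", PORTAL, "--login"],
    check=True,
)

# Show the newly attached SCSI block devices.
subprocess.run(["lsblk", "-S"], check=True)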

With the right network topology and internal configuration in place, a SAN can deliver a block-level storage solution that offers high availability and scalability, possibly even high performance. A SAN includes centralized management, failover protection, and disaster recovery, and it can improve storage resource utilization. Because a SAN runs on a dedicated network, the LAN doesn’t have to absorb the SAN-related traffic, eliminating potential contention.

However, a SAN is a complex environment that can be difficult to deploy and maintain, often requiring professionals with specialized skill sets. This alone is enough to drive up costs, but the SAN components themselves can also be pricey. An IT team might try to reduce costs by cutting back in such areas as Fibre Channel or licensed management capabilities, but the result could be lower performance or more application maintenance.

For many organizations—typically larger enterprises—the costs and complexities are worth the investment, especially when dealing with numerous or massive datasets and applications that support a large number of users. SANs can benefit use cases such as email programs, media libraries, database management systems, or distributed applications that require centralized storage and management.

Organizations looking for networked storage solutions often weigh SAN against NAS, taking into account complexity, reliability, performance, management features, and overall cost. NAS is certainly cheaper and easier to deploy and maintain, but it’s not nearly as scalable or fast. Part of the difference is that NAS uses file storage, whereas a SAN uses block storage, which incurs less overhead, although it’s not as easy to work with. Your individual circumstances will determine which storage system is the best fit. (For information about the differences between block and file storage, refer to the first article in this series.)
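
The distinction is easy to see from the host’s point of view: file storage is consumed through a filesystem path that the NAS manages, while block storage is consumed as a raw device that the host formats and manages itself. The Python sketch below reads a few kilobytes both ways; the NAS path and block device name are placeholders, and reading a raw device requires root privileges.

# Contrast file-level access (a path on a NAS share) with block-level
# access (a raw SAN device). Both names are placeholders; the raw read
# requires root privileges. Reads are non-destructive.
import os

FILE_PATH = "/mnt/nas/reports/q3.csv"  # file-level: the NAS owns the filesystem
BLOCK_DEV = "/dev/sdb"                 # block-level: the host owns the filesystem

# File-level: ask the remote filesystem for part of a named file.
with open(FILE_PATH, "rb") as f:
    file_bytes = f.read(4096)

# Block-level: read raw sectors; any filesystem structure is the host's concern.
fd = os.open(BLOCK_DEV, os.O_RDONLY)
try:
    raw_bytes = os.pread(fd, 4096, 0)
finally:
    os.close(fd)

print(f"Read {len(file_bytes)} bytes via the file protocol, "
      f"{len(raw_bytes)} bytes straight from the block device.")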

Moving Ahead with DAS, NAS, and SAN

Like any storage technology, SANs are undergoing a transition. For example, vendors now offer something called unified SAN, which can support both block-level and file-level storage in a single solution. Other technologies are also emerging for bridging the gap between NAS and SAN. One example is VMware vSphere, which makes it possible to use NAS and SAN storage in the same cluster as vSAN, VMware’s virtual SAN technology. Another approach to storage is the converged SAN, which implements the SAN environment on the same network used for other traffic, thus eliminating the network redundancy that comes with a more conventional SAN.

For many organizations, traditional DAS, NAS, and SAN solutions, properly sized and configured, will handle their workloads with ease. If that’s insufficient, they might consider newer technologies that enhance these core configurations, such as converged or hyperconverged infrastructures. Today’s organizations can also take advantage of technologies such as cloud storage, object storage, and software-defined storage, as well as the various forms of intelligent storage taking hold in the enterprise.

There is, in fact, no shortage of storage options, and those options grow more sophisticated and diverse every day as technologies continue to evolve and mature to meet the needs of today’s dynamic and data-intensive workloads.