Commvault
Deduplication Configuration Overview
Jordan Cannata
tdopko@commvault.com
Carl Brault
Commvault recommends using building block guidelines for simplicity and scalability when designing a deduplication solution. The building block approach is the best way to ensure that a deduplication solution meets current and future needs.
For additional guidelines and the latest recommendations for deduplication building blocks, refer to Commvault's online documentation.
Physical and Logical Building Block Layers
There are two layers to a building block, the physical layer and the logical layer:
- Physical Layer – Each building block consists of one or more MediaAgents, one disk library and one or more Deduplication Databases (DDBs).
- Logical Layer – Each building block contains one global deduplication policy and one or more data storage policies.
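The two layers can be pictured as a small data model. The following is a minimal sketch for illustration only, not a Commvault API: the class names, field names, and sample values (`ma01`, `disklib01`, and so on) are all hypothetical, and the checks simply restate the "one or more" rules listed above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PhysicalLayer:
    media_agents: List[str]   # one or more MediaAgents
    disk_library: str         # exactly one disk library
    ddbs: List[str]           # one or more Deduplication Databases


@dataclass
class LogicalLayer:
    global_dedup_policy: str      # exactly one global deduplication policy
    storage_policies: List[str]   # one or more data storage policies


@dataclass
class BuildingBlock:
    physical: PhysicalLayer
    logical: LogicalLayer

    def problems(self) -> List[str]:
        """Return rule violations; an empty list means the block is well formed."""
        issues = []
        if not self.physical.media_agents:
            issues.append("at least one MediaAgent is required")
        if not self.physical.ddbs:
            issues.append("at least one DDB is required")
        if not self.logical.storage_policies:
            issues.append("at least one data storage policy is required")
        return issues


# Hypothetical example values.
block = BuildingBlock(
    physical=PhysicalLayer(["ma01"], "disklib01", ["ddb01"]),
    logical=LogicalLayer("global-dedup-01", ["sp-files"]),
)
print(block.problems())  # prints []
```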
Hardware Requirements
It is critical to provide adequate hardware to achieve maximum performance for a deduplication building block.
Requirements for a deduplication building block should focus on the following:
- MediaAgent – protects data and coordinates signature lookups in the DDB. It is critical to ensure adequate CPU and memory resources. The amount of resources required depends on the amount of data that the building block will manage.
- Deduplication Database (DDB) – must be on high-speed disks that meet the minimum IOPS requirements. The Iometer tool can be used to test IOPS on the planned location for the DDB.
It is strongly recommended that the DDB be located on solid-state disks locally attached to the MediaAgent. Fibre-attached disks may be used if there is a dedicated connection and dedicated disks for the DDB.
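The placement guidance above can be expressed as a simple check. This is a rough sketch under stated assumptions: the `DdbVolume` type, its fields, and the result labels are hypothetical, and `required_iops` must be supplied by the caller from Commvault's current documentation, because this overview does not state a figure.

```python
from dataclasses import dataclass


@dataclass
class DdbVolume:
    disk_type: str              # "ssd" or "hdd" (hypothetical labels)
    attachment: str             # "local" or "fibre" (hypothetical labels)
    dedicated_connection: bool  # connection used only for the DDB
    dedicated_disks: bool       # disks used only for the DDB
    measured_iops: int          # e.g. a result measured with an IOPS test tool


def ddb_placement(volume: DdbVolume, required_iops: int) -> str:
    """Classify a planned DDB location against the guidance in the text."""
    if volume.measured_iops < required_iops:
        return "below-minimum-iops"
    if volume.disk_type == "ssd" and volume.attachment == "local":
        return "recommended"
    if (volume.attachment == "fibre"
            and volume.dedicated_connection
            and volume.dedicated_disks):
        return "acceptable"
    # The text does not address other combinations.
    return "not-covered"


# The IOPS numbers below are placeholders, not Commvault requirements.
local_ssd = DdbVolume("ssd", "local", False, False, measured_iops=20000)
print(ddb_placement(local_ssd, required_iops=10000))  # prints recommended
```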
Deduplication Configuration High-Level Overview
Storage Connections
When using deduplication, disk library data paths are configured using the following connection methods:
- Network Attached Storage or NAS
- Storage Area Network or SAN
- Direct Attached Storage or DAS
Network-Attached Storage (NAS)
This method provides the best connection method from a resiliency standpoint since the storage is accessed directly through the NAS device. Using the Common Internet File System (CIFS) or Network File System (NFS) protocol, Universal Naming Convention (UNC) paths are configured to read and write directly to storage. In this case the library is configured as a shared library, where all MediaAgents can see stored data for data protection and recovery operations.
Data Storage using Network-Attached Storage Device
Storage Area Network (SAN)
This method is very common in many data centers. SAN storage can be zoned and presented to MediaAgents using either Fibre Channel or iSCSI. In this case, the zoned storage is presented directly to the MediaAgent, providing read/write access to the disks.
When using SAN storage, each building block should use a dedicated MediaAgent, DDB and disk library. Although the backend disk storage in the SAN can reside on the same disk array, it should be configured in the Commvault® software as two separate libraries, where logical unit numbers (LUNs) are presented as mount paths in dedicated libraries for specific MediaAgents.
SAN storage provides fast and efficient movement of data, but if the building block MediaAgent fails, data cannot be restored until access to the disk library is re-established. There are two ways to do this: rebuild the MediaAgent, or re-zone the disk library to a different MediaAgent. If the disk library is re-zoned, it must be reconfigured in the Commvault® software to the MediaAgent that has access to the LUN.
Data Storage using Storage Area Network Device
Direct Attached Storage (DAS)
This connection method is used when the disk library is physically attached to the MediaAgent. In this case, each building block is completely self-contained. This provides for high performance but does not provide resiliency. If the MediaAgent controlling the building block fails, data stored in the disk library cannot be recovered until the MediaAgent is repaired or replaced. Keep in mind that, in this case, all the data in the disk library is still completely indexed and recoverable, even if the index directory is lost. Once the MediaAgent is reconstructed, data from the disk library can be restored.
Disk Library Attached to MediaAgent(s) using a Direct Attached Storage Device
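The resiliency trade-offs of the three connection methods can be summarized in a small lookup table. The sketch below only restates the behavior described in this section; the enum, dictionary, and function names are hypothetical and do not correspond to any Commvault interface.

```python
from enum import Enum
from typing import Dict, List


class Connection(Enum):
    NAS = "Network Attached Storage"
    SAN = "Storage Area Network"
    DAS = "Direct Attached Storage"


# What has to happen before data can be restored after the building
# block's MediaAgent fails, as described in the text above.
RECOVERY_AFTER_MEDIAAGENT_FAILURE: Dict[Connection, List[str]] = {
    Connection.NAS: [
        "restore through another MediaAgent that shares the library",
    ],
    Connection.SAN: [
        "rebuild the MediaAgent",
        "re-zone the disk library to a different MediaAgent and "
        "reconfigure it in the software",
    ],
    Connection.DAS: [
        "repair or replace the MediaAgent",
    ],
}


def restorable_without_repair(connection: Connection) -> bool:
    """True when other MediaAgents can already see the stored data."""
    return connection is Connection.NAS


for conn in Connection:
    options = "; or ".join(RECOVERY_AFTER_MEDIAAGENT_FAILURE[conn])
    print(f"{conn.name}: {options}")
```

Only the shared NAS library allows restores to continue without first repairing, rebuilding, or re-zoning anything, which is why the text rates it best for resiliency.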
Deduplication Building Blocks
One-Node Building Block
A one-node building block consists of one MediaAgent, one Deduplication Database (DDB) and one deduplication store. This is the most common building block design and provides the flexibility to add building blocks as an environment grows. However, since each building block is standalone, data is not deduplicated across building blocks, so overall deduplication ratios are not optimal.
A common use case for one-node building blocks is using different building blocks to manage different data types. For example, an environment contains approximately 10TB of unstructured file data, 5TB of virtual machine data, and 15TB of Oracle database data requiring protection. All the data within the environment equals approximately 30TB which could, based on capacity, be handled by a single building block. However, factoring in performance, it may be determined that the best design is to use two building blocks.
Data types with a unique block makeup, such as Oracle and file system data, typically do not deduplicate well against each other. A solution could be to use two building blocks, one for the file and virtual machine data, and the second for Oracle data.
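The sizing arithmetic in this example can be worked through in a few lines. This sketch uses only the figures quoted above; the dictionary keys and block names are hypothetical labels.

```python
# Approximate data volumes from the example, in TB.
workloads_tb = {"file": 10, "vm": 5, "oracle": 15}

# Total data in the environment.
total_tb = sum(workloads_tb.values())
print(total_tb)  # prints 30

# Proposed split: file and VM data in one block, Oracle in the other.
layout = {
    "block-1": ["file", "vm"],
    "block-2": ["oracle"],
}

per_block_tb = {
    name: sum(workloads_tb[data_type] for data_type in data_types)
    for name, data_types in layout.items()
}
print(per_block_tb)  # prints {'block-1': 15, 'block-2': 15}
```

With this split, each building block manages roughly 15TB, half of the 30TB total.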
The following diagram illustrates two standalone building blocks: one building block for file and virtual machine data, and a second building block for Oracle data.
Two-Node Building Block
A two-node building block consists of one or two MediaAgents, two Deduplication Databases (DDBs) and one deduplication store. Although there are multiple databases, a two-node building block is a single deduplication engine. The advantage of using a two-node building block is that it can scale up to twice as large as a single node.
Two-node building block using two MediaAgents, two DDBs, and a NAS attached shared disk library
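To illustrate how multiple databases can still behave as a single deduplication engine, the following sketch routes each block signature to exactly one partition, so a block is recognized as a duplicate no matter which MediaAgent submits it. This is a conceptual illustration only: the class is hypothetical, and the SHA-256 signature and modulo routing are assumptions chosen for simplicity, not a description of how Commvault implements partitioned DDBs.

```python
import hashlib


class PartitionedDdb:
    """Toy model of one deduplication engine backed by several partitions."""

    def __init__(self, partitions: int = 2):
        if partitions < 1:
            raise ValueError("at least one partition is required")
        # Each partition holds the signatures it is responsible for.
        self._partitions = [set() for _ in range(partitions)]

    def _partition_for(self, signature: bytes) -> int:
        # Assumption: route by the leading 8 bytes of the signature.
        return int.from_bytes(signature[:8], "big") % len(self._partitions)

    def is_new_block(self, block: bytes) -> bool:
        """Record the block's signature; return False if it was seen before."""
        signature = hashlib.sha256(block).digest()
        partition = self._partitions[self._partition_for(signature)]
        if signature in partition:
            return False
        partition.add(signature)
        return True

    def signature_count(self) -> int:
        return sum(len(p) for p in self._partitions)


ddb = PartitionedDdb(partitions=2)
print(ddb.is_new_block(b"same data"))  # prints True
print(ddb.is_new_block(b"same data"))  # prints False
```

Because every signature maps to one partition, the second write of the same data is detected as a duplicate even though the signatures are spread across two databases.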
Four-Node Building Block
Commvault® V11 software supports up to four DDB partitions. Although this can provide the best scalability, it may not fit the needs of most situations. Commvault recommends consulting with Professional Services when designing a highly scalable building block solution.
Copyright © 2021 Commvault | All Rights Reserved.