Commvault

Deduplication Database Seeding


Commvault deduplication efficiently backs up data from remote sites to the main data center, or sends a copy of the backup data from the main data center to a secondary data center. Duplicate blocks are dropped at the source, so only changed blocks are sent across the Wide Area Network (WAN). However, running the initial backup or auxiliary copy can be a challenge because no baseline exists yet and all blocks must be sent. With a large amount of data and limited bandwidth, an initial backup or auxiliary copy can take days or even months to complete.
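To put the "days or months" estimate in perspective, the following is a rough back-of-the-envelope sketch rather than a Commvault tool. The function name, the 20 TB data size, the 50 Mbps link and the 80% usable-bandwidth figure are all assumptions chosen for illustration.

```python
def transfer_days(data_tb: float, link_mbps: float, utilization: float = 0.8) -> float:
    """Estimate the days needed to send data_tb (decimal terabytes) over a WAN link.

    utilization is the assumed fraction of the link usable for the transfer.
    """
    total_bits = data_tb * 1e12 * 8                 # terabytes -> bits
    effective_bps = link_mbps * 1e6 * utilization   # usable bits per second
    return total_bits / effective_bps / 86_400      # seconds -> days


# Hypothetical remote site: 20 TB baseline over a 50 Mbps link.
print(f"{transfer_days(20, 50):.1f} days")  # prints "46.3 days"
```

Under these assumed numbers, the baseline alone would occupy the link for about a month and a half, which is the situation DDB Seeding is meant to avoid.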

To avoid that initial transfer over the WAN, Commvault® software offers a procedure called DDB Seeding. This procedure transfers the initial baseline backup between two sites using available removable storage such as tapes, USB drives or an iSCSI appliance. 

Use DDB Seeding when remote office sites are separated from the data center across a WAN and data needs to be either backed up remotely or replicated periodically to a central data center site. Once the initial baseline is established, all subsequent backups and auxiliary copy operations consume less network bandwidth because only the changes are transferred.

Note that this procedure is used to transfer only the initial baseline backup between two sites. It cannot be used for subsequent backups.
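The effect of an established baseline can be illustrated with a toy model of signature-based deduplication. This is only a sketch: the 4-byte block size, the use of SHA-256 and the function names are assumptions for illustration and do not reflect how the Commvault DDB is actually implemented.

```python
import hashlib

BLOCK_SIZE = 4  # tiny blocks for readability; real block sizes are far larger


def signatures(data: bytes) -> list[str]:
    """Split data into fixed-size blocks and return one signature per block."""
    return [
        hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()
        for i in range(0, len(data), BLOCK_SIZE)
    ]


def blocks_to_send(data: bytes, known: set[str]) -> int:
    """Count blocks the target does not hold yet, recording them as known."""
    sent = 0
    for sig in signatures(data):
        if sig not in known:
            known.add(sig)
            sent += 1
    return sent


baseline = b"AAAABBBBCCCCDDDD"   # initial data set (4 blocks)
next_day = b"AAAABBBBXXXXDDDD"   # same data with one changed block

# Without seeding, the target knows nothing: every block crosses the WAN.
first_over_wan = blocks_to_send(baseline, set())

# With seeding, the baseline signatures already exist at the target.
seeded = set(signatures(baseline))
after_seeding = blocks_to_send(next_day, seeded)

print(first_over_wan, after_seeding)  # prints "4 1"
```

In this model the seeded target receives only the single changed block, which mirrors why subsequent operations consume less bandwidth once the baseline is in place.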

DDB Seeding can be used in two scenarios:

  • The initial backup of a large remote client or a large remote site with several clients.
  • The initial auxiliary (DASH) copy between the main data center and the secondary data center.

DDB Seeding for Initial Backup

The deduplication database seeding process for the initial backup uses removable storage (USB drives or an iSCSI appliance) to transfer the data. The steps for this operation are as follows:

  1. Attach the removable storage to a client from the remote site.
  2. Temporarily install the MediaAgent software on the client to which the removable storage is attached.
  3. Define a library for the removable storage using the client/MediaAgent installed in the previous step.
  4. Create a storage policy for the remote site with the following copies.
    1. Primary copy using the removable storage (can use deduplication if needed).
    2. Secondary copy using the main data center disk library (copy typically using deduplication).
  5. Associate the remote client or all of the remote site clients with the storage policy.
  6. Execute the initial backup, which will write the data to the removable storage.
  7. Ship the removable storage to the main data center and attach it to the MediaAgent.
  8. Modify the removable storage library properties to use the main data center MediaAgent from this point.
  9. Execute an auxiliary copy, which will copy the data from the removable storage to the disk library.
  10. Once complete, validate that the data is accessible from the secondary copy.
  11. Promote the secondary copy as the primary copy of the storage policy, resulting in the following.
    1. Primary copy using the main data center disk library.
    2. Secondary copy using the removable storage.
  12. Delete the secondary copy using the removable storage.
  13. Uninstall the MediaAgent software on the remote site client.

From that point on, traditional client-side deduplicated backups are used for the remote site, sending the data directly to the main data center MediaAgent. Because the baseline is now complete, only changed blocks travel across the network.
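The way the storage policy copies change over the course of the steps above can be followed with a small state model. This is a hypothetical sketch: the Copy and StoragePolicy classes and their methods are invented for illustration and are not Commvault APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Copy:
    """Hypothetical stand-in for a storage policy copy."""
    name: str
    library: str
    jobs: set = field(default_factory=set)


@dataclass
class StoragePolicy:
    """Hypothetical stand-in for a storage policy with one primary copy."""
    primary: Copy
    secondaries: list = field(default_factory=list)

    def backup(self, job: str) -> None:
        self.primary.jobs.add(job)            # backups land on the primary copy

    def aux_copy(self, target: Copy) -> None:
        target.jobs |= self.primary.jobs      # copy primary jobs to a secondary

    def promote(self, copy: Copy) -> None:
        self.secondaries.remove(copy)
        self.secondaries.append(self.primary) # old primary becomes a secondary
        self.primary = copy

    def delete_secondary(self, copy: Copy) -> None:
        self.secondaries.remove(copy)


removable = Copy("seed", "removable storage")
disk = Copy("dc", "main data center disk library")

policy = StoragePolicy(primary=removable, secondaries=[disk])  # step 4
policy.backup("initial full")                                  # step 6
# steps 7-8: storage is shipped and re-pointed to the data center MediaAgent
policy.aux_copy(disk)                                          # step 9
assert "initial full" in disk.jobs                             # step 10
policy.promote(disk)                                           # step 11
policy.delete_secondary(removable)                             # step 12

print(policy.primary.library)  # prints "main data center disk library"
```

The model only tracks which copy holds the baseline at each stage; the MediaAgent install and uninstall steps have no equivalent here.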

Commvault® software also offers a workflow that automates most of those steps. For more information about the workflow, consult the Commvault Online Documentation.

DDB seeding process for an initial backup


DDB Seeding for Initial Auxiliary (DASH) Copy

A similar process is used for the initial auxiliary copy between the main site and a secondary site. Removable storage such as tapes, USB drives or an iSCSI appliance can be used to transfer the data. In this scenario, the steps are as follows:

  1. If not done already, attach the storage to the source MediaAgent.
  2. If not done already, define a library for the removable storage using the source MediaAgent (can use deduplication if needed, unless using tapes).
  3. Typically, the storage policy has a primary copy in the source MediaAgent disk library and a secondary copy in the target MediaAgent disk library. Add another secondary copy using the removable storage library. This will result in the following copies:
    1. Primary copy using the source MediaAgent library.
    2. Secondary copy using the target MediaAgent library.
    3. Secondary copy using the removable storage.
  4. By default, a secondary copy uses the primary copy as a source during an auxiliary copy job. Modify the properties of the copy using the target MediaAgent library to now use the removable storage copy as a source for the auxiliary copy instead of the primary copy.
  5. Run an auxiliary copy for the removable storage copy. This will copy the data from the source disk library to the removable storage.
  6. Once completed, ship the removable storage to the secondary data center.
  7. If using tapes, insert them into the library. If using other storage, attach it to the target MediaAgent.
  8. If using any storage other than tapes, modify the library data path to point to the target MediaAgent. If using tapes, skip this step.
  9. Run an auxiliary copy for the target library copy. This will copy the data from the removable storage to the target disk library.
  10. Once completed, validate that the data is accessible from the target disk library.
  11. Modify the storage policy target library copy to use the primary copy as a source for an auxiliary copy.
  12. Delete the removable storage copy from the storage policy.

From this point on, traditional DASH copies will be used to transfer the data between the two sites. But since the baseline exists in the target library, only blocks that have changed will be sent over the WAN. 
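The source switching in steps 4 and 11 is the least obvious part of this procedure, so it can help to trace it in a toy model. The Copy class and aux_copy function below are hypothetical and exist only for illustration; they are not Commvault APIs.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Copy:
    """Hypothetical storage policy copy with a configurable source copy."""
    name: str
    source: Optional["Copy"] = None
    jobs: set = field(default_factory=set)


def aux_copy(copy: Copy) -> int:
    """Copy the jobs missing on `copy` from its source; return how many moved."""
    missing = copy.source.jobs - copy.jobs
    copy.jobs |= missing
    return len(missing)


primary = Copy("primary", jobs={"full-1"})
target = Copy("target", source=primary)   # default: sourced from the primary
seed = Copy("seed", source=primary)       # step 3: removable storage copy

target.source = seed                      # step 4: read from the seed instead
aux_copy(seed)                            # step 5: disk library -> removable
# steps 6-8: storage is shipped and attached to the target MediaAgent
aux_copy(target)                          # step 9: removable -> target library
assert target.jobs == primary.jobs        # step 10: validate
target.source = primary                   # step 11: restore the default source

# A later backup now travels primary -> target directly (a regular DASH copy).
primary.jobs.add("incr-2")
sent_after_seed = aux_copy(target)
print(sent_after_seed)  # prints "1"
```

After the source is switched back, only the new job is copied to the target, since the baseline already arrived by way of the removable storage.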

DDB seeding process for an initial DASH copy


DDB Seeding using Deduplicated Storage

Data can be deduplicated when using block-based storage such as iSCSI to seed the deduplication database (DDB). The DDB used for copying data can be directly created in the storage unit. At the target site, the DDB can be moved (attached) to the target MediaAgent if needed. Since SP11, this DDB move can be executed between MediaAgents using different operating systems. For example, the source MediaAgent can be a Windows system, while the target MediaAgent runs a Linux operating system.

DDB Seeding between MediaAgents using deduplication



Copyright © 2021 Commvault | All Rights Reserved.