Commvault

Partitioned Deduplication

Partitioned deduplication provides higher scalability and deduplication efficiency by allowing more than one Deduplication Database (DDB) partition to exist within a single deduplication engine. It works by logically dividing signatures between multiple databases. Using two deduplication partitions effectively doubles the size of the deduplication store. Currently, Commvault® software supports up to four database partitions.

How Partitioned Databases Work

During data protection jobs, partitioned DDBs work with the data protection operation using the following logic:

  1. A signature is generated at the source. For primary data protection jobs using client-side deduplication, the source is the client. For auxiliary DASH copy jobs, the source MediaAgent generates the signatures.
  2. The signature is sent to its respective database, which is selected based on the signature itself. That database compares the signature against its existing entries to determine whether the block is duplicate or unique.
  3. The defined storage policy data path is used to protect the data. Regardless of which database a signature is compared in, the data path remains consistent throughout the job. If GridStor® Round-Robin is enabled for the storage policy primary copy, jobs are load balanced across MediaAgents.

Figure: Partitioned deduplication, showing the data path and signature lookup paths
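
The lookup logic described above can be illustrated with a minimal Python sketch. It is not Commvault's implementation: the SHA-256 signature, the modulo rule used to choose a partition, and the names `block_signature`, `partition_for`, `PartitionedDDB`, and `protect` are all assumptions made for illustration. The source states only that each signature is routed to "its respective database".

```python
import hashlib


def block_signature(block: bytes) -> str:
    """Generate a signature for a data block (SHA-256 is an assumption)."""
    return hashlib.sha256(block).hexdigest()


def partition_for(signature: str, partition_count: int) -> int:
    """Pick a partition from the signature itself (hypothetical modulo rule)."""
    return int(signature, 16) % partition_count


class PartitionedDDB:
    """Toy deduplication database split into several signature partitions."""

    def __init__(self, partition_count: int) -> None:
        self.partitions = [set() for _ in range(partition_count)]

    def is_duplicate(self, signature: str) -> bool:
        """Look the signature up in its own partition; record it if new."""
        idx = partition_for(signature, len(self.partitions))
        if signature in self.partitions[idx]:
            return True
        self.partitions[idx].add(signature)
        return False


def protect(blocks, ddb: PartitionedDDB):
    """Return only the blocks that would be written to the library."""
    written = []
    for block in blocks:
        signature = block_signature(block)    # step 1: generated at the source
        if not ddb.is_duplicate(signature):   # step 2: lookup in its partition
            written.append(block)             # step 3: unique data is written
    return written
```

In this sketch, calling `protect([b"a", b"b", b"a"], PartitionedDDB(2))` writes only the first two blocks, because the third block produces a signature that is already recorded in its partition.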

Partitioned Databases and Network-Attached Storage (NAS)

If partitioned deduplication is implemented using two MediaAgents, it is recommended to use a shared disk library on a Network-Attached Storage (NAS) device. The NAS storage allows either MediaAgent to recover data even if the other MediaAgent is not available.


Partitioned Database for Scalability

The primary purpose of partitioned DDBs is to provide higher scalability. By balancing signatures between database partitions, you can scale up the size of a single deduplication store. With two partitions, the size of the store doubles; with four partitions, it quadruples.
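
As a rough illustration of this scaling, the sketch below multiplies a per-partition capacity by the number of partitions. The function name `store_capacity` and the capacity figure used in the usage note are hypothetical; only the linear scaling and the four-partition limit come from the text above.

```python
MAX_PARTITIONS = 4  # limit stated in this document


def store_capacity(per_partition_capacity_tb: float, partitions: int) -> float:
    """Estimate store size, assuming capacity scales linearly with partitions."""
    if not 1 <= partitions <= MAX_PARTITIONS:
        raise ValueError(f"partitions must be between 1 and {MAX_PARTITIONS}")
    return per_partition_capacity_tb * partitions
```

For example, if a single partition could manage a hypothetical 100 TB store, `store_capacity(100, 2)` gives 200 and `store_capacity(100, 4)` gives 400.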

Partitioned Database for Resiliency

Partitioned databases also provide resiliency. For instance, if one MediaAgent hosting a Deduplication Database (DDB) goes offline, the other MediaAgent continues data protection jobs because the available DDB continues signature lookups. However, while one database is offline, all signatures it previously managed are looked up in the remaining online database. Because the online database has no record of those signatures, it treats them as unique, and additional data is written to the library.
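
The effect of losing one partition can be simulated with a small Python sketch. As before, this is an illustration under stated assumptions rather than Commvault's actual behavior: the class `FailoverDDB`, the `online` flags, the modulo routing, and the SHA-256 signatures are all hypothetical.

```python
import hashlib


class FailoverDDB:
    """Toy partitioned DDB in which a partition can be marked offline."""

    def __init__(self, partition_count: int = 2) -> None:
        self.partitions = [set() for _ in range(partition_count)]
        self.online = [True] * partition_count

    def lookup(self, signature: str) -> bool:
        """Return True if the signature is a duplicate; record it otherwise."""
        value = int(signature, 16)
        idx = value % len(self.partitions)  # home partition (assumed rule)
        if not self.online[idx]:
            # Home partition is offline: redirect to a remaining online one.
            available = [i for i, up in enumerate(self.online) if up]
            if not available:
                raise RuntimeError("no DDB partition is online")
            idx = available[value % len(available)]
        if signature in self.partitions[idx]:
            return True
        self.partitions[idx].add(signature)
        return False


def blocks_written(blocks, ddb: FailoverDDB) -> int:
    """Count the blocks treated as unique, and therefore written."""
    count = 0
    for block in blocks:
        signature = hashlib.sha256(block).hexdigest()
        if not ddb.lookup(signature):
            count += 1
    return count
```

Running the same set of blocks through `blocks_written` twice with both partitions online writes every block on the first pass and none on the second. If `ddb.online[1]` is then set to `False`, a third pass rewrites exactly the blocks whose signatures were managed by partition 1, because the online partition has never seen them.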

Copyright © 2021 Commvault | All Rights Reserved.