Deduplication V4 Gen 2 (V5) Overview
In SP14, Deduplication V4 Gen 2 (also referred to as Version 5) introduced major changes to the deduplication database (DDB), making it the most efficient deduplication generation to date. Pruning of obsolete signatures and blocks of data is significantly faster. DDB reconstruction was also optimized to speed up recovery from the loss of DDB partitions.
DDB Structure Change
The improvements introduced in SP14 are based on a major structure change to the DDB tables and table files. To understand these changes, you must first look at the structure and roles of the tables.
DDB Structure Prior to SP14
Prior to SP14, a DDB was created with three sets of tables:
- Primary table
- Secondary table
- Zero ref table
The DDB structure for a DDB created prior to SP14
Deduplication Database Prior to SP14:
The DDB primary table contains unique signature entries. Each entry represents a unique block of data encountered during a backup job. Each entry also has a counter indicating the number of occurrences of that unique block among all backup jobs stored in the deduplication store. The counter is increased each time the same signature is encountered during a backup job, and decreased each time a job containing that block ages out. Once a counter reaches zero, the signature is moved to the zero ref table, which indicates that the block can be deleted from storage.
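The counter behavior described above can be sketched as a small in-memory model. This is only an illustration of the reference-counting idea, not the product's implementation: the class name `PreSp14Ddb`, its methods, and the use of short strings as signatures are all hypothetical.

```python
class PreSp14Ddb:
    """Toy model of the pre-SP14 primary and zero ref tables (hypothetical names)."""

    def __init__(self):
        self.primary = {}      # signature -> number of references
        self.zero_ref = set()  # signatures whose blocks can be deleted

    def record_backup(self, signatures):
        """Increase the counter for every signature seen in a backup job."""
        for sig in signatures:
            self.primary[sig] = self.primary.get(sig, 0) + 1

    def age_out(self, signatures):
        """Decrease counters for an aged job; move zeroed entries to zero ref."""
        for sig in signatures:
            self.primary[sig] -= 1
            if self.primary[sig] == 0:
                del self.primary[sig]
                self.zero_ref.add(sig)


ddb = PreSp14Ddb()
ddb.record_backup(["a", "b"])  # job 1
ddb.record_backup(["b", "c"])  # job 2
ddb.age_out(["a", "b"])        # job 1 ages out
print(ddb.primary)             # block "b" is still referenced by job 2
print(ddb.zero_ref)            # block "a" is no longer referenced
```

Note that every backup and every aging operation touches one counter per block, which is the cost the SP14 redesign targets.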
The DDB secondary table files store archive file (AF) entries for all backup job streams. An archive file contains all the metadata of the backup data covered in a job stream. This means that if a backup job has two streams, two archive files are created. The archive files are first logged in the CommServe® server database and then written into a DDB secondary table file. Each archive file contains block references and job information.
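The one-archive-file-per-stream relationship can be pictured with a short sketch. The `ArchiveFile` fields and the `archive_files_for_job` helper are hypothetical names chosen for illustration; the real on-disk layout is not described here.

```python
from dataclasses import dataclass, field


@dataclass
class ArchiveFile:
    """Hypothetical view of one AF entry: job information plus block references."""
    af_id: int
    job_id: int
    stream: int
    block_signatures: list = field(default_factory=list)


def archive_files_for_job(job_id, streams, first_af_id=1):
    """Create one archive file per stream; `streams` is a list of signature lists."""
    return [
        ArchiveFile(af_id=first_af_id + i, job_id=job_id, stream=i + 1,
                    block_signatures=list(sigs))
        for i, sigs in enumerate(streams)
    ]


# A backup job with two streams yields two archive files.
afs = archive_files_for_job(job_id=7, streams=[["a", "b"], ["b", "c"]])
print(len(afs))
```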
Representation of a backup job's archive files
DDB V4 table structure
SP14 DDB Structure
With the introduction of the second generation of V4 deduplication, the table structure was modified for both the primary table and the secondary table files.
Primary Table
In the previous deduplication generation, performance suffered because numerous counters had to be increased or decreased whenever a backup job ran or aged out. This made the process I/O intensive. With DDB V4 Gen 2, the counters are removed. When data ages out, the secondary table files are scanned to create the list of obsolete blocks.
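One way to picture counter-free pruning is as a set difference over the secondary table files. The sketch below assumes each secondary table file can be read as a set of block signatures keyed by archive file ID; the function name `find_obsolete_blocks` and this data layout are assumptions for illustration, not the actual scan logic.

```python
def find_obsolete_blocks(secondary_files, aged_af_ids):
    """Return signatures referenced only by aged archive files.

    secondary_files: dict mapping archive file ID -> set of block signatures
    aged_af_ids: iterable of archive file IDs that have aged out
    """
    aged = set(aged_af_ids)
    candidates = set()         # blocks referenced by aged archive files
    still_referenced = set()   # blocks referenced by retained archive files
    for af_id, signatures in secondary_files.items():
        if af_id in aged:
            candidates |= signatures
        else:
            still_referenced |= signatures
    # A block is obsolete only if no retained archive file still points to it.
    return candidates - still_referenced


secondary_files = {
    101: {"a", "b"},
    102: {"b", "c"},
    103: {"c", "d"},
}
print(find_obsolete_blocks(secondary_files, aged_af_ids=[101]))
```

In this model nothing is updated while backups run; the work is deferred to a scan at aging time.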
Secondary Table Files
In the previous deduplication generation, the DDB secondary table was built on files containing up to 16 archive files (AF) each. As soon as the 16th archive file was written to a secondary file, that file was closed and a new one was created. This process continued, creating as many secondary table files as were needed. In the new generation of deduplication, each archive file is written as a standalone secondary table file.
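The difference between the two layouts can be sketched by grouping archive file IDs into files. Only the limit of 16 comes from the description above; the function names and the representation of a secondary table file as a Python list are hypothetical.

```python
MAX_AFS_PER_FILE = 16  # pre-SP14 limit described above


def pack_pre_sp14(af_ids, limit=MAX_AFS_PER_FILE):
    """Fill a secondary file until it holds `limit` AFs, then start a new one."""
    files = []
    for af_id in af_ids:
        if not files or len(files[-1]) == limit:
            files.append([])  # previous file is full (or none exists yet)
        files[-1].append(af_id)
    return files


def pack_gen2(af_ids):
    """V4 Gen 2: every archive file is its own secondary table file."""
    return [[af_id] for af_id in af_ids]


af_ids = range(1, 41)  # 40 archive files
print([len(f) for f in pack_pre_sp14(af_ids)])
print(len(pack_gen2(af_ids)))
```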
DDB V4 Gen 2 table structure