Commvault

Item Level Retention


Item based retention applies retention to protected data at the level of individual files and email messages. This provides granular retention to meet data recovery and compliance requirements and to optimize the use of storage media.

The following Commvault® agents support item based retention:

  • File system agents using subclient retention settings
  • Exchange Mailbox agent using Configuration policies

Depending on the agent being used, one of two methods is used to implement item based retention:

  • Synthetic full item carry forward – this method does not directly prune items that have exceeded retention. Instead, when an item is deleted by either the user or the agent, it is carried forward with each synthetic full backup until its 'days' retention is exceeded. Once the last synthetic full containing the item ages based on storage policy copy retention, the item no longer exists. This method is used for file system agents using V1 indexing and is configured in the Subclient Properties.
  • Index masking – this method marks the item as unrecoverable by masking the item in the index. It requires V2 indexing and is implemented for file system agents in the Subclient Retention tab and for the Exchange Mailbox agent using Configuration policies.

Item Based Retention Benefits:

  • Compliance – certain compliance regulations require item based retention. Using job based retention can result in items being retained beyond their required retention policies.
  • Defensible deletion – some items, specifically email messages, must be destroyed when they are deleted from the production mail server. Item based retention can provide defensible deletion of items.
  • Efficient media usage – Consider the benefit of managing one year of off-site data on considerably fewer tapes. Typically, when data is sent off-site on tapes, the same stale data exists each time a set of tapes is exported. If data is sent off-site weekly on tape, 52 versions of the same stale item exist.

Example: When secondary tape copies are created using item-based retention, only the items contained within the most recent synthetic full backup are copied to tape. If the retention is set to 365 days, each tape set contains all items from the past year. This means that even with a standard off-site tape rotation of 30 days, 365 days of data exist on each set.




Synthetic Full Item Carry Forward Using V1 Indexing

Retention settings defined in the Subclient Properties currently use the 'synthetic full carry forward' method. To understand how this method works, you first need to understand synthetic full protection jobs.

Synthetic Full Protection Jobs

A synthetic full backup synthesizes a full backup by using previous data protection jobs to generate a new full backup. Objects required for the synthetic full backup are pulled from previous incremental or differential backups and the most recent full. To determine which objects are required for the synthetic full, an image file is used. An image file is a logical view of the folder structure including all objects within the folders and is generated every time a traditional backup is executed. The synthetic full backup uses the image file from the most recent traditional backup that was conducted on the production data to determine which objects are required for the new synthetic full.

When an image file is generated, all objects that exist at the time of the scan phase of the backup job are logged in the image file. This information includes date/time stamp and journal counter information, which is used to select the proper version of the object when the synthetic full runs. If an object is deleted prior to the image file being generated, it is not included in the image file and is not backed up in the next synthetic full operation. The fact that deleted objects are not carried over into the next synthetic full is the key to how object based retention works.
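The selection logic described above can be sketched in a few lines of Python. This is a simplified illustration, not Commvault's implementation: the `StoredObject` class, the `synthesize_full` function, and the use of a plain path-to-timestamp dictionary as the image file are all assumptions made for the example, and journal counters are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredObject:
    path: str
    mtime: int   # modification time stamp recorded when the object was backed up
    job_id: int  # data protection job that holds this version


def synthesize_full(image_file, previous_jobs):
    """Build a synthetic full from earlier jobs (hypothetical model).

    image_file: dict of path -> mtime, as logged by the most recent scan.
    previous_jobs: lists of StoredObject from the last full and the
    incremental or differential jobs that followed it.
    """
    available = {(o.path, o.mtime): o for job in previous_jobs for o in job}
    # Only objects listed in the image file are pulled into the new full.
    return [
        available[(path, mtime)]
        for path, mtime in sorted(image_file.items())
        if (path, mtime) in available
    ]


# Hypothetical data: b.txt was deleted before the latest scan,
# and a.txt was modified after the full backup.
full = [StoredObject("a.txt", 1, 100), StoredObject("b.txt", 1, 100)]
incremental = [StoredObject("a.txt", 2, 101)]
image_file = {"a.txt": 2}

synthetic = synthesize_full(image_file, [full, incremental])
print([o.path for o in synthetic])  # ['a.txt']
```

Because b.txt is absent from the image file, it is not copied into the new synthetic full, while the newer version of a.txt is taken from the incremental job.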

Synthetic full concept diagram





Deleted Items Carry Forward

When subclient retention is configured, items which have been deleted by the user or by the system during an archive job are carried forward to the next synthetic full based on the number of days specified. Once the days have been exceeded, the item is no longer carried forward in the next synthetic full job. The item still exists in the synthetic full already generated until the 'days and cycles' criteria defined in the primary copy are exceeded. This means that the total retention time of the item upon deletion is the sum of the days defined in the subclient and the 'days and cycles' defined in the primary copy.

Multiple Versions Carry Forward

Multiple versions of an item can also be carried forward. This allows an item that has been modified to have all modified versions moved forward with each synthetic full. If the number of versions is set to five, five versions are carried forward. If the item is modified again, upon the next synthetic full, the oldest version is dropped and the most recent five are carried forward. If the item is deleted from the production system, all five versions are carried forward until the defined days have been exceeded.
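As a rough illustration of the version limit, the following Python sketch keeps only the most recent versions of an item. The function name `versions_to_carry_forward` and the use of integer time stamps are assumptions for the example; the product's actual selection logic is not shown in this document.

```python
def versions_to_carry_forward(version_times, max_versions):
    """Return the time stamps of the newest versions, oldest first."""
    if max_versions <= 0:
        return []
    return sorted(version_times)[-max_versions:]


# Hypothetical item with five protected versions and a limit of five.
versions = [1, 2, 3, 4, 5]
versions.append(6)  # the item is modified again before the next synthetic full

kept = versions_to_carry_forward(versions, 5)
print(kept)  # [2, 3, 4, 5, 6]
```

The oldest version (time stamp 1) is dropped, and the five most recent versions move into the next synthetic full.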

The synthetic full carry forward method is used for V1 file system subclients using subclient retention rules.

Synthetic full operation using subclient retention





Subclient and Storage Policy Retention Combination

It is important to note that subclient retention is not used in place of storage policy based retention. Instead, the two retentions are added to determine when an object is pruned from protected storage. If an object is carried forward for 90 days upon deletion, it is carried forward each time a synthetic full job runs until the 90 days elapse.

The synthetic full backups themselves are retained based on the storage policy copy retention rules. So, if the storage policy copy has a retention of 30 days and 4 cycles, then a synthetic full remains in storage until the job exceeds retention. In this instance, the object is carried forward for 90 days and the last synthetic full that copies the object over is retained for 30 days. The object therefore remains in storage for 120 days from the time of deletion: 90 days of subclient retention plus 30 days of storage policy copy retention.
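The arithmetic in this example can be written out as a short Python sketch. The function name `pruning_date` and the sample deletion date are assumptions for illustration, and the sketch considers only the 'days' criteria; cycle-based retention, which can keep a job longer, is ignored.

```python
from datetime import date, timedelta


def pruning_date(deleted_on, subclient_days, copy_days):
    """Earliest date the object can leave storage (days criteria only)."""
    # The object is carried forward by synthetic fulls for subclient_days.
    last_carry_forward = deleted_on + timedelta(days=subclient_days)
    # The last synthetic full holding it is then kept for copy_days.
    return last_carry_forward + timedelta(days=copy_days)


deleted_on = date(2021, 1, 1)  # hypothetical deletion date
pruned = pruning_date(deleted_on, subclient_days=90, copy_days=30)
print((pruned - deleted_on).days)  # 120
```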




Storage Policy Secondary Copies

Item based retention applies to how long an item is carried forward when synthetic full backups are executed. This applies to backup jobs managed by the storage policy primary copy. Secondary copies always have retention applied to the copy in the traditional manner. If subclient retention is set to 90 days, storage policy primary copy retention is 1 cycle and 0 days, and synthetic full backups run daily, a deleted item is retained for 91 days. If a secondary copy has been configured with a retention of 8 cycles and 90 days, the object may be retained for up to an additional 90 days.

How long a deleted object is potentially retained in a secondary copy depends on the copy type. If the secondary copy is a synchronous copy, the deleted object is always retained for the retention defined in the secondary copy, since all synthetic full backups are copied to the secondary copy. Selective copies, however, allow the selection of full backups at a time interval. If synthetic full backups are run daily and a selective copy is set to select the month end full, then any items that are not present in the month end synthetic full are not copied to the selective copy. To ensure all items are preserved in a secondary copy, it is recommended to use synchronous copies and not selective copies.
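The difference between the two copy types can be illustrated with a small Python sketch. It models each synthetic full as a set of item names keyed by day of the month; the function `items_on_secondary` and the sample data are hypothetical and serve only to show which items reach the secondary copy.

```python
from typing import Optional


def items_on_secondary(fulls, selected_days: Optional[list] = None):
    """Return every item copied to a secondary copy.

    fulls: dict of day -> set of item names in that day's synthetic full.
    selected_days: days picked by a selective copy; None models a
    synchronous copy, which receives every synthetic full.
    """
    days = fulls.keys() if selected_days is None else selected_days
    copied = set()
    for day in days:
        copied |= fulls[day]
    return copied


# Hypothetical month: b.txt stops being carried forward before day 30.
fulls = {1: {"a.txt", "b.txt"}, 15: {"a.txt", "b.txt"}, 30: {"a.txt"}}

print(sorted(items_on_secondary(fulls)))        # ['a.txt', 'b.txt']
print(sorted(items_on_secondary(fulls, [30])))  # ['a.txt']
```

In this model the synchronous copy holds b.txt because it received the earlier fulls, while the selective copy that takes only the month end full never sees it.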




Index Masking Using V2 Indexing

Index masking masks deleted items from all restore operations. The V2 index tracks all messages and files at a granular level. When an item is protected, a field in the database is set to 'visible' for each item. When the item exceeds retention, the field is marked to 'mask' the item. When browse or find operations are run, the masked items do not appear. If aging activity is disabled at a client or client group level, no messages belonging to the client or group are aged during the aging process.

By default, a cleanup process runs every 24 hours. This process checks the Retention Policy's 'Retain for' setting for messages or the subclient retention for files and marks all items exceeding retention as invisible. It is important to note that if the 'Retain for' setting or the subclient retention is changed (e.g., by decreasing the number of days), the next aging process immediately follows the new retention value.
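A minimal Python sketch of the masking behavior follows. It is a conceptual model only: the `IndexEntry` class, its `retention_start` field, and the `run_cleanup` and `browse` functions are assumptions for illustration and do not reflect the actual V2 index schema or the event that starts the retention clock.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class IndexEntry:
    name: str
    retention_start: date  # assumed start of the retention clock
    visible: bool = True


def run_cleanup(index, retain_days, today):
    """Mask every visible entry that has exceeded retention."""
    masked = 0
    for entry in index:
        if entry.visible and (today - entry.retention_start).days > retain_days:
            entry.visible = False
            masked += 1
    return masked


def browse(index):
    """Browse and find operations only return visible entries."""
    return [entry.name for entry in index if entry.visible]


index = [
    IndexEntry("msg-1", date(2021, 1, 1)),
    IndexEntry("msg-2", date(2021, 3, 1)),
]
today = date(2021, 4, 15)

run_cleanup(index, retain_days=90, today=today)
print(browse(index))  # ['msg-2']

# Lowering the retention value takes effect at the next cleanup run.
run_cleanup(index, retain_days=30, today=today)
print(browse(index))  # []
```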

If Exchange Mailbox agent data is copied to secondary copy locations, the days setting defined in the Retention Policy is not honored. Instead, standard storage policy copy retention determines how long the messages are retained. In other words, the primary copy manages all items at a granular level and secondary copies manage the retention at the job level. From a compliance standpoint, this is an important distinction and should be taken into consideration when defining data retention and destruction policies.

If the V2 index is lost and restored to a previous point-in-time, previously masked items may be set to visible again. The next time the aging process runs, these items are re-masked, making them unrecoverable.

From a compliance standpoint, defensible deletion of items is crucial. Email messages or files copied to secondary storage, such as tape media, could potentially be recovered using the Media Explorer tool. To ensure that this cannot occur, enable the 'Erase Data' checkbox for any storage policies managing Exchange Mailbox agent data. Note that the 'Erase Data' option is enabled by default for all data management storage policies.




Subclient Retention

Right-click Subclient | Properties | Advanced | Retention tab

Subclient retention should only be used for users' data. When using synthetic full backups, subclient retention can be applied to both backup and archive operations.

These settings apply to files and stubs.

Enable subclient retention key points:

  • Blocks the use of traditional full backups; only synthetic full backups are allowed.
  • Enables the use of modification date and deletion retention options.
  • Enables the selection of older versions or number of versions of files.
  • Enables the subclient 'Archiving Rules' tab, which allows you to configure Commvault OnePass® archive settings.




To configure subclient retention

1 - Right-click client | Properties.

2 - From the retention tab, check to extend retention and select Object Based Retention.

3 - Check to extend data retention based on modification time.

4 - Check to extend data retention for deleted data.

5 - Carries forward deleted items for nn years, months or days in synthetic full jobs. After that period, only standard storage policy copy retention applies.

6 - Retains deleted items indefinitely, meaning they are always carried forward in synthetic full backups.

7 - Carries forward all existing versions for the defined period of time.

8 - Determines how many versions of an item to retain.


Deleted Item Retention

If 'Deleted Item Retention' is selected, only synthetic full backups should be executed and scheduled. This is required to carry forward items. Retention is job-based, measured by both time and cycles. The time specified for 'Backup Retention' is additive to the days criteria specified in the associated storage policy copy.

Example: You enable 'Deleted Item Retention' on the subclient Retention tab and set the 'Retain objects for <period of time>' option time value to 1 month. The 1 month (30 day) count starts from the last time the deleted file appeared in a data protection job's scan. Appearance in a data protection job scan means the file is considered to be "in image." An "in image" file always has a copy in protected storage. A synthetic full backup job keeps the deleted file "in image" for the specified time. Once the backup retention time has passed, storage policy retention is applied. The deleted file appears last in the most recently completed synthetic full backup job. Storage policy copy retention then retains that job for its cycle and days retention criteria. Synthetic full backup jobs must be run to enable aging and pruning of data from media.
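The example above can be modeled with a short Python sketch that finds the last synthetic full in which a deleted file is still "in image." The function `last_full_containing`, the sample dates, and the choice to treat the final day of the retention period as still covered are assumptions for illustration. The sketch also assumes the file existed in every full that ran before its deletion.

```python
from datetime import date, timedelta
from typing import Optional


def last_full_containing(last_seen_in_scan, retain_days, full_dates) -> Optional[date]:
    """Date of the last synthetic full that still carries the deleted file."""
    # The count starts from the file's last appearance in a job scan.
    cutoff = last_seen_in_scan + timedelta(days=retain_days)
    eligible = [d for d in full_dates if d <= cutoff]
    return max(eligible) if eligible else None


last_seen = date(2021, 1, 1)  # hypothetical last scan that saw the file
weekly_fulls = [date(2021, 1, 2) + timedelta(weeks=n) for n in range(6)]

print(last_full_containing(last_seen, 30, weekly_fulls))  # 2021-01-30
```

From that job onward, only storage policy copy retention decides how long the file remains on media.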

If both Extended Retentions are Enabled

If both extended retentions are selected, synthetic full backups should be used. Retention is either time or job-based depending on whether the file is deleted or not.

For files and stubbed files:

  • Retention is cycle and time-based. Files or stubbed files are extended on media by both the archiver and backup retention time based on their file modification time. Once this retention has been exceeded, the storage policy copy retention 'Days and Cycles' criteria are applied. Synthetic full backups must be run to allow aging and pruning of data from media.

Note: A stub file supports non-browse recovery operations (i.e., stub recalls) and acts as a placeholder to persist the associated file on media through synthetic full backups. Stub files have the same modification time as the associated file. Deleting a stub is equivalent to deleting the file.

File Versions Retention

The 'Retention of File versions' is either number-based or time-based. For example, you can retain the last 3 versions of a file or you can retain any versions created in the past 90 days.

Retaining previous file versions essentially applies the same retention clock basis (file modification time) used for the current version to all versions qualified by the criteria.




Deleting Subclients Configured with Subclient Retention

When a file system agent that has the 'Subclient Retention Settings' enabled is deleted, the last cycle has infinite retention applied. This ensures a lock down of all existing protected data since the retention settings defined in the subclient no longer exist. If data within the last cycle is no longer needed, delete the jobs by viewing the job history in the storage policy primary copy. The contents of the subclient are included in the default subclient for future data protection jobs.




Copyright © 2021 Commvault | All Rights Reserved.