Commvault

Compliance and Retention Concepts


Retention of data in protected storage, as well as production data, is a critical component of an organization's overall data management policy. In some organizations, data retention and data destruction policies are well defined, and it can be a challenge for administrators to meet those requirements. In other organizations, policies are poorly defined or not defined at all. In some cases, determining how long data should be kept falls squarely in the administrator's hands. This is not the way it should be, but all too often it is. This section is designed to help administrators get the proper people to define the policies for data retention and data destruction, and to explain how to implement those policies using Commvault® software.


There are four primary concepts for data retention:

  • Disaster recovery
  • Compliance and archiving
  • Data recovery
  • Data destruction


Disaster Recovery policies should always be based on how many complete sets of backups should be kept. A set is referred to as a cycle, and it includes all data protection jobs required to restore an entire system to a specific point in time. In this case, the number of cycles should be used to determine retention policies.
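To make cycle-based retention concrete, the following is a minimal Python sketch. It assumes that a cycle starts with each full backup and includes the incremental jobs that follow it. The `Job` class and both functions are hypothetical names used only for illustration; they are not part of Commvault software, and the sketch does not reproduce the product's actual data aging logic.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Job:
    job_id: int
    kind: str        # "full" or "incremental" (assumed job types)
    run_date: date


def group_into_cycles(jobs):
    """Group jobs into cycles; each full backup starts a new cycle."""
    cycles = []
    for job in sorted(jobs, key=lambda j: j.run_date):
        if job.kind == "full":
            cycles.append([job])
        elif cycles:
            cycles[-1].append(job)
        # Incrementals that precede the first full belong to no
        # complete cycle and are skipped in this sketch.
    return cycles


def jobs_to_retain(jobs, cycles_to_keep):
    """Return the IDs of jobs in the most recent `cycles_to_keep` cycles."""
    if cycles_to_keep <= 0:
        return []
    kept = group_into_cycles(jobs)[-cycles_to_keep:]
    return [job.job_id for cycle in kept for job in cycle]


example_jobs = [
    Job(1, "full", date(2021, 1, 3)),
    Job(2, "incremental", date(2021, 1, 4)),
    Job(3, "full", date(2021, 1, 10)),
    Job(4, "incremental", date(2021, 1, 11)),
    Job(5, "full", date(2021, 1, 17)),
]
print(jobs_to_retain(example_jobs, 2))  # prints [3, 4, 5]
```

With a retention of two cycles, the oldest cycle (jobs 1 and 2) falls outside retention, while every job needed to restore from either of the two newest full backups is kept.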


Compliance and Archiving copies are usually point in time copies of data that will be retained for long periods of time. Month, quarter, or year end point in time full backups are usually retained for months, years or indefinitely. In this case the length of time is the key factor in determining retention, not the number of cycles.


Data Recovery is the ability to go back in time to a certain point and recover specific data as it existed at that point. It could be a file a user deleted two months ago, or an e-mail message. Data recovery policies may include all data for a specific length of time, so keeping both full and incremental backups may be required. In this case, which copies are kept and how long they are kept should determine retention policies.


Data Destruction policies focus on determining the useful lifecycle of protected and production data. Expiring and destroying data is a critical requirement in some organizations. Unfortunately, this aspect of retention policies is often overlooked. Moving data through its useful lifecycle and then destroying it at the end of that period not only helps an organization from legal and compliance standpoints; from a technical standpoint, destroying old data also frees up space on production storage.




Key Requirements in Determining Retention

The primary motive driving retention policies should be business. Retention decisions should not be made solely by backup administrators. Meetings with all key decision makers, auditors, and any outside consultants should be conducted. They should be educated on the basics of what types of protection can be provided. Realistic RTOs and RPOs, realistic retention periods, and costs associated with meeting goals should all be addressed. It can sometimes be difficult for non-technical people to really understand what is best for their needs. When they think backup, they think nothing will ever be lost and anything can be recovered at any time. We know that is not the case. You may not have the power to make the final decision, but you do have the power to educate and influence good decisions.


In a perfect world all data would be kept forever, but in the real world this is not always possible. A key element to remember is the advantage of logically addressing data within an environment. This allows businesses to determine retention strategies based on business systems rather than physical servers.


There are several key issues that must be discussed when planning a retention strategy:

  • Business and government regulations
  • Business reputation & customer confidence
  • Current capacity and planned growth
  • Budgetary limitations
  • Risk assessment

Business and Government Regulations

Regulations such as Sarbanes-Oxley, HIPAA, and Gramm-Leach-Bliley have forced industry to look more closely at how it protects information. This can be a difficult task, especially when there are no clear-cut rules for retaining information. I have seen many businesses that retain critical data such as e-mail and financial records indefinitely. They have no choice, since government regulations provide guidelines that can be interpreted differently depending on which auditor you ask.

Business Reputation and Customer Confidence

Depending on the type of business, reputation and customer confidence could be a huge determining factor in setting retention requirements. Suppose you used a free e-mail service, and one day you logged on to find only one message in your inbox. You would probably be concerned. On opening that e-mail, you find that it is an apology letter stating that all of your mail has been lost. Would you continue to use the service?


I like to qualify disasters as sympathetic and non-sympathetic. Disasters such as Katrina, the 2003 Northeast power outage, and 9/11 all qualify as sympathetic disasters. Customers are more likely to understand the situation and accept data loss or interruption in service. On the other hand, if your building catches fire and burns to the ground, and you never properly protected data or maintained off-site copies, then customers may not be as sympathetic because you were not proactive in being prepared for such an event.
In both cases, determine your customer base and user base and consider how data loss may affect your ability to continue to do business and retain customers.

Current Capacity and Planned Growth

You can only retain as much information as you have the capacity to hold. Analyzing your current ability to store information will give you a starting point for determining retention capability. If storage capacity does not meet retention requirements, you will need to purchase more storage or change retention policies. Consideration must also be given to expected data growth. This includes the following determining factors:

  • Incremental rate of change for existing data.
  • Projected trends based on historical data.
  • New projects which may implement new business systems.
  • Number of copies of protected data and locations for the copies.


You need to consider how much data will be stored and where it will be stored. Historical information could be placed on tape to be archived off-site. User data may be required to be kept on fast disks for easy recovery. Capacity must be thought of not only as a total but broken down based on location of media, ease of access, and speed of recovery. Do not underestimate the amount of protected data that will need to be managed or you may find yourself running out of space.
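The capacity factors above can be turned into rough arithmetic. The following Python sketch is only a back-of-the-envelope estimate under stated assumptions: each cycle is one full backup plus a fixed number of incrementals, each incremental is a fixed fraction of the full, and compression and deduplication are ignored. The function name, parameters, and sample figures are all hypothetical and chosen purely for illustration.

```python
def protected_storage_tb(full_size_tb, daily_change_rate,
                         incrementals_per_cycle, cycles_retained,
                         copies, annual_growth=0.0, years=0):
    """Estimate protected storage (TB) needed to hold the retained cycles.

    Simplifying assumptions: every incremental is `daily_change_rate`
    times the size of the full backup, and no compression or
    deduplication is applied.
    """
    # Project the size of a full backup after the expected growth.
    grown_full = full_size_tb * (1 + annual_growth) ** years
    # One cycle = one full backup plus its incrementals.
    cycle_size = grown_full * (1 + daily_change_rate * incrementals_per_cycle)
    # Total = cycle size x cycles kept x number of copies.
    return cycle_size * cycles_retained * copies


# 10 TB full, 2% daily change, 6 incrementals per cycle,
# 4 cycles retained, 2 copies (for example, on-site and off-site).
today = protected_storage_tb(10, 0.02, 6, 4, 2)
in_three_years = protected_storage_tb(10, 0.02, 6, 4, 2,
                                      annual_growth=0.20, years=3)
print(f"{today:.1f} TB now, {in_three_years:.1f} TB in three years")
# prints: 89.6 TB now, 154.8 TB in three years
```

Even in this simplified model, 10 TB of production data becomes roughly 90 TB of protected data once cycles and copies are counted, which is why the estimate should be broken down per copy and per location rather than treated as a single total.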

Budgetary Limitations

Determining retention requirements may force a company to invest in more equipment to accomplish retention goals. However, budgetary limitations can affect the overall retention strategy. Legacy hardware that has not reached the end of its lifecycle and strict budgets may force you to make do with what you have, which may ultimately force you to readdress retention strategies.

Risk Assessment

There are many different definitions of a disaster. For smaller companies, losing even one server can be catastrophic. Larger companies that cluster servers and store data on RAID arrays may think on a broader scale and consider a building loss as a disaster. Companies implementing site replication technologies could possibly sustain a site loss, but at the greater cost of implementing and maintaining this type of storage infrastructure.




Defining Retention and Destruction Policies

Determining retention is a difficult task in many organizations. The problem with assessing requirements based on the criteria previously discussed is that retention requirements are based on guidelines and not rules. The point is, no one really knows. You can ask 10 people and get 10 different answers.


When defining retention policies, the focus is usually on how long to keep the data. An often overlooked point of discussion is "When does the data have to be destroyed?" Data destruction policies can be just as critical as retention policies. When the shredding truck comes to destroy documents and old tapes, a company may feel safe. With modern data management, however, the data being destroyed may only be a portion of what really needs to be destroyed. When planning retention policies, careful consideration should be given to destruction policies as well.

Who Determines Policies

Business NOT backup administrators! That should be all that is written for this topic, but unfortunately it is not. Getting management to specify retention policies can be incredibly difficult. But the truth is it's not always their fault. Non-technical people don't really understand what is going on behind the scenes. They need to understand there is a limit to the scope of protection. Explaining what can be protected, how long it can be protected for, and the cost associated with protecting data can assist them in making intelligent decisions.


On the other hand, there are also business managers who do not want to accept responsibility for any decision that could affect their career path. This presents a difficult situation for Commvault administrators, who could become the scapegoat if data is lost and there is no accountability. Unfortunately, there is not much that can be done in these cases, but the next two sections may provide some guidance on getting reluctant business managers to commit to retention and destruction policies.

Document Policies

Arbitrarily configuring retention settings in a storage policy copy is bad practice. Not sharing those settings with business managers is even worse. It is by design that when you create a storage policy the default retention is infinite. All retention policies should be well documented and shared with the owners of the business data. Retention and destruction policies should, of course, be determined by management. These policies should be documented by management at least through e-mail, but preferably as official company documents. On the administrator's end, reports should be run regularly and archived so they can be referenced later if needed. The Data Retention Forecast and Compliance report can be used to show data in storage, retention settings, and expected date of aging.
Deciding what the retention policies should be is part of the equation, but equally important is documenting what those policies are. If archived data such as e-mail from six years ago is requested for an investigation, but your documented e-mail retention policy is five years, you will be in better shape than if you had no documented policies.

Default Policies

The major issue regarding retention policies is the lack of cooperation administrators get from the business side of an organization. Many business managers will rely on administrators to determine policies. From the company's perspective, potential loss of data could cost millions. Consider point in time archive copies of financial records that must be maintained for extended periods of time. In the event that the data is needed for investigation and cannot be produced, the company may receive stiff penalties from regulating bodies. From the administrator's perspective, the administrator becomes the scapegoat. The final result is that the company loses money and the administrator loses a job.


One way to avoid this and help guide business managers into making a decision is to have documented default policies. Basically, this would be presented as a multiple choice question by providing several retention policies a business unit can choose from. These default policies should be established with guidance from IT, cooperating business units, executives, and auditors if involved. The business unit can choose which policy best suits its needs. If a business unit requires customized retention policies, they can be worked into an existing storage policy by adding additional secondary copies, or a new policy can be created. The business unit owners would then be required to sign off on the policy, and this of course would be documented. This method provides guidance to business units in making their choice and results in documented policies, with responsibility resting on business managers and not Simpana administrators (which is how it should be). It could also greatly simplify management by limiting the number of storage policies within a CommCell® environment.
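The "multiple choice" approach can be pictured as a small catalog of predefined policies plus a record of who approved each choice. The Python sketch below is purely illustrative: the policy names, retention values, and class and function names are hypothetical placeholders, not recommendations and not Commvault settings or APIs.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DefaultPolicy:
    name: str
    primary_days: int     # retention of the primary copy, in days
    primary_cycles: int   # retention of the primary copy, in cycles
    archive_years: int    # long-term archive copy; 0 means none


# Placeholder catalog; real values must come from the business.
CATALOG = {
    "bronze": DefaultPolicy("bronze", 14, 2, 0),
    "silver": DefaultPolicy("silver", 30, 4, 1),
    "gold": DefaultPolicy("gold", 90, 12, 7),
}


@dataclass(frozen=True)
class SignOff:
    business_unit: str
    policy: DefaultPolicy
    approved_by: str
    approved_on: date


def choose_policy(business_unit, policy_name, approved_by, approved_on):
    """Record a business unit's choice; refuse unknown or unsigned choices."""
    if policy_name not in CATALOG:
        raise KeyError(f"{policy_name!r} is not a documented default policy")
    if not approved_by.strip():
        raise ValueError("a business owner must sign off on the policy")
    return SignOff(business_unit, CATALOG[policy_name],
                   approved_by, approved_on)


record = choose_policy("Finance", "gold", "J. Smith", date(2021, 1, 15))
print(record.business_unit, record.policy.name, record.approved_by)
```

The point of the design is that a policy cannot be recorded without a named approver, so responsibility for the choice stays with the business unit.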





Spoliation and Adverse Inference

In an investigation, if evidence from the defense cannot be presented as a result of intentional or unintentional data destruction by the defendant, the jury can infer that the evidence would have been adverse to the defense. In other words, when documents that could prove innocence are missing, the jury can interpret their destruction as intentional, on the grounds that the documents would have harmed the defense. This allows the jury to adopt the plaintiff's reasonable account of what happened. Spoliation of evidence can make a defendant appear guilty and, even if the defendant did nothing legally wrong, could sway a jury's decision.


When people think about preserving electronic data, they think about Enron or Martha Stewart and how electronic evidence led to guilty verdicts. But in most cases the preservation of data is used to prove the innocence of an individual or a company. Management should know this, and if they do not, it should be explained to them. It should also be explained that the Simpana software provides a wide range of methods to protect data. You don't have to keep everything for ten or twenty years. You don't even have to keep specific data types for that length of time; you could keep specific user data based on retention policies. In this way, Simpana software provides incredibly granular levels of protection to meet the legal needs of an organization.




Data Destruction Policies (Defensible Deletion)

Data retention and destruction policies are used to preserve and destroy data based on its useful lifecycle. There are many methods for implementing data lifecycle policies. The most common from a data protection aspect is retention. The problem with this approach is that the media must still be available in order to destroy the data. This may not be an issue for disks that are always attached to MediaAgents, but it can be a problem for tape media. The following section explains key Commvault features that can assist in implementing destruction policies.

Data or Information?

Data is what we back up. Disks, folders, and databases are just data. Information is what is useful from a business perspective. Simpana administrators look at backing up the home folders disk. Users view the information within their home folders in order to be productive. The approach for implementing data destruction policies can be based on both data and information.

Data Destruction

When a job exceeds its retention, the job is marked as aged. If tape media is in the library, old jobs are overwritten with new jobs. That means that until the data is overwritten, it is recoverable in several ways. The jobs can be browsed and data can be recovered through the CommCell console. The tape can be read using the Media Explorer tool. Third-party tape tools can read the data, though with processes such as compression and multiplexing, serious forensic knowledge would be required to produce the data. Commvault has several methods to greatly reduce the potential of the data being accessible once the data lifecycle is exceeded.

Erase Media

The Erase Media operation physically mounts a tape and overwrites the OML header. This makes the data unrecoverable through the CommCell console, Media Explorer, or the Simpana 9 Catalog feature. Tapes can be erased individually by selecting the tape in the scratch pool and selecting Erase Media. You can also mark all tapes for a storage policy copy to be erased by enabling the Mark Media to be Erased after Recycling option in the Media tab of the policy copy.
In this case, data on the tape is still theoretically accessible through third-party forensic tools, since the data is not physically being destroyed. In some cases this may not be a concern. If it is a concern, Commvault recommends using data encryption. This will make it extremely hard, but theoretically not impossible, to recover data. It is very important to state at this point that the only true way to make data unrecoverable is to physically destroy the media. No encryption, degaussing, or erasing method can guarantee with 100% certainty that data cannot be recovered.

Information Destruction

Erase Data

The erase data feature provides functionality to selectively destroy information within a job. This can be implemented in two different ways:

  • A Commvault administrator can mark data unrecoverable
  • A user can delete an archive stub file, which will mark data unrecoverable in protected storage


The Erase Data feature logically marks the data as unrecoverable in the CommServe database. The information is not physically removed from media. As with the Erase Media feature, Commvault recommends encrypting jobs to greatly reduce the risk of someone using third-party tools to recover the data.


Erase Data can also be used in an archiving environment. Normally, if a user deletes the stub of a file that has been archived, the stub is deleted but the data is still retained in Commvault protected storage. If Erase Data is enabled for both the storage policy and the subclient, you can apply the erase data policy to deleted stubs. When an archiving job runs, it scans for stub files. If any stub files have been deleted, the Erase Data feature marks those files unrecoverable in protected storage.
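The deleted-stub behavior described above can be modeled in a few lines. The Python sketch below is a conceptual illustration only, assuming a simple dictionary stands in for the index of archived files; the function name, its parameters, and the index layout are hypothetical and do not reflect how the software is actually implemented.

```python
def mark_deleted_stubs_unrecoverable(index, stubs_present,
                                     policy_erase_enabled,
                                     subclient_erase_enabled):
    """Mark archived files whose stubs are gone as unrecoverable.

    index: dict mapping an archived file path to {"recoverable": bool}
    stubs_present: set of paths whose stub files were found by the scan
    Returns the sorted list of paths newly marked unrecoverable.
    """
    # Erase Data must be enabled on both the storage policy and the
    # subclient; otherwise deleted stubs leave protected data untouched.
    if not (policy_erase_enabled and subclient_erase_enabled):
        return []
    marked = []
    for path, entry in index.items():
        if entry["recoverable"] and path not in stubs_present:
            # Logical marking only: nothing is removed from media.
            entry["recoverable"] = False
            marked.append(path)
    return sorted(marked)


index = {
    "/home/ann/report.doc": {"recoverable": True},
    "/home/ann/budget.xls": {"recoverable": True},
}
stubs = {"/home/ann/report.doc"}   # the user deleted the budget.xls stub
print(mark_deleted_stubs_unrecoverable(index, stubs, True, True))
# prints ['/home/ann/budget.xls']
```

Note that the sketch only flips a flag, mirroring the point that the data is marked unrecoverable rather than physically removed.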

Is This Right for Me?


Erase Data is a CommCell level license. Once it is applied, you need to enable Erase Data in the General tab of the storage policy. Once this license is applied, the Media Explorer tool and the Catalog option cannot be used to recover backup data for the storage policy. This is because random binaries are written to the OML header in the Media Password location. The password can no longer be used, and you will always get a decryption error when using these tools. The Restore by Job feature will also be disabled for the storage policy.


When the Erase Data license is implemented, it is only effective when writing jobs to new or recycled media. The license cannot be retroactively applied to jobs already in storage. Likewise, if the license is removed, the removal only takes effect when writing to new or recycled media. All jobs written to media for the storage policy while the license was in use cannot be recovered through Media Explorer or the Catalog feature.


If the Erase Data feature is something that would be of value to your organization, then it is worth the risks previously described. If you are not sure, then it is recommended that you do not use it. If you have capacity based licensing arrangements with Simpana software, check to see if this license is installed. If it is, you may want to request that the license be removed from the CommCell environment licensing.

Records Management for Data Destruction

An advanced and powerful feature in the Commvault product suite that can be used to implement data destruction policies is Content Indexing. With Content Indexing, retention and destruction policies can be applied to all granular objects protected by Commvault. The objects can be indexed and searched based on their contents to determine their relevance. Content Director policies can be designed to automate the search process, and relevant data can be moved into legal hold policies with specific retention and destruction policies defined. With this method, files, documents, and e-mail messages can be indexed and searched for key words. Relevant terms such as 'Top Secret', 'Confidential', 'CEO', and 'Insider Trading' can be used along with specific owners and data types. Information can be preserved, destroyed, analyzed, or moved to third-party tools for further management or analysis.
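As a rough picture of what a keyword-driven search feeding a legal hold might look like, here is a minimal Python sketch. It performs a plain case-insensitive substring scan over in-memory items, which is an assumption made for brevity; it is not how Content Indexing or Content Director work internally, and the `Item` class and `find_relevant` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Item:
    item_id: str
    owner: str
    data_type: str   # e.g. "file", "document", "email" (assumed labels)
    text: str


def find_relevant(items, terms, owners=None, data_types=None):
    """Return IDs of items containing any term, optionally filtered
    by owner and data type. Matching is case-insensitive."""
    lowered = [term.lower() for term in terms]
    hits = []
    for item in items:
        if owners is not None and item.owner not in owners:
            continue
        if data_types is not None and item.data_type not in data_types:
            continue
        body = item.text.lower()
        if any(term in body for term in lowered):
            hits.append(item.item_id)
    return hits


items = [
    Item("m1", "ceo", "email", "CONFIDENTIAL: merger terms attached"),
    Item("m2", "ann", "email", "Lunch on Friday?"),
    Item("f1", "ceo", "file", "Quarterly results, top secret draft"),
]
legal_hold = find_relevant(items, ["Top Secret", "Confidential"],
                           owners={"ceo"})
print(legal_hold)  # prints ['m1', 'f1']
```

Items returned by such a search would then be candidates for a legal hold policy with its own retention and destruction settings.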



Copyright © 2021 Commvault | All Rights Reserved.