Troubleshooting Failed or Hung Backup Jobs


A backup job runs in multiple phases, so the point at which a job hangs or fails often helps narrow down the root cause. The sections below list common issues based on how far the job has progressed.




Backup Failing or Hanging at 0-5%


Check communication between components:
The most common reason for a job to fail or hang at 0-5% is a communication issue between the client and the CommServe® server, the MediaAgent, or both.

To check for communication issues, use either of these options from the client:

  • Run Check Readiness
  • Use CVNetworkTestTool

If services are running on the client but it still cannot be reached, recycling (restarting) the services can resolve the issue.


Check DNS resolution:
DNS resolution is a frequent source of problems, especially in the reverse DNS zone, which Commvault® software relies on.


To validate resolution, use either of these options:

  • Run CVIPInfo
  • Use CVNetworkTestTool

Also ensure that the servers do not have hosts files containing erroneous entries that might override DNS resolution.
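
As a rough complement to the tools above, the following Python sketch checks that a name resolves forward and that the resulting address resolves back to the same short host name, and it lists any hosts-file entries for that name. It is not a replacement for CVIPInfo or CVNetworkTestTool. The function names and the sample host name are hypothetical and chosen only for illustration.

```python
import socket


def check_dns(hostname, resolve=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Resolve hostname forward, then reverse, and compare the short names.

    The resolve/reverse parameters exist only so the lookups can be
    replaced in tests; by default the system resolver is used.
    """
    result = {"hostname": hostname, "ip": None, "reverse_name": None,
              "consistent": False, "error": None}
    try:
        ip = resolve(hostname)
        result["ip"] = ip
        reverse_name = reverse(ip)[0]
        result["reverse_name"] = reverse_name
    except OSError as exc:  # socket.gaierror and socket.herror derive from OSError
        result["error"] = str(exc)
        return result
    # Compare only the short (left-most) label, ignoring case.
    short_forward = hostname.split(".")[0].lower()
    short_reverse = reverse_name.split(".")[0].lower()
    result["consistent"] = short_forward == short_reverse
    return result


def hosts_entries_for(hostname, hosts_text):
    """Return the addresses that a hosts file maps to hostname."""
    wanted = hostname.lower()
    found = []
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) >= 2 and wanted in (name.lower() for name in fields[1:]):
            found.append(fields[0])
    return found
```

For example, check_dns("client01.example.com") could be run from both the CommServe® server and the client, and the hosts file contents passed to hosts_entries_for() to spot entries that would override DNS.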


Validate firewall rules:
A firewall can also prevent successful communication if the firewall rules are not properly configured in the CommCell® console, or if the network team has not opened the required ports on the firewall.


To validate the firewall rules:
Run a CVPing test from the CommServe® server toward the client using the listening port defined in the firewall rule. This confirms whether the port is open as expected in the firewall.
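
If CVPing is not at hand, a plain TCP connection attempt gives a similar first indication. The sketch below is an independent stand-in, not CVPing itself: it only reports whether a TCP connection to the given host and port can be established within a timeout. The function name is hypothetical, and the port must be the one defined in your own firewall rule.

```python
import socket


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

A False result does not say where the traffic was blocked; it only shows that the port could not be reached from the machine running the check.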


Another reason can be that the client index or index database is not available on the MediaAgent, or that the index directory is running out of space. This is usually easy to identify because the error description clearly mentions the issue.
Also look in the Job Controller, where you might see an index restore job that started automatically.
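
Free space on the volume holding the index directory can be checked with a few lines of Python. This is a minimal sketch: the function name and the 10% threshold are assumptions made for illustration, and the path of the index directory has to be taken from your own MediaAgent configuration.

```python
import shutil


def free_space_report(path, min_free_fraction=0.10):
    """Report free space on the volume that contains path.

    min_free_fraction is an illustrative threshold, not a Commvault setting.
    """
    usage = shutil.disk_usage(path)
    # Guard against a zero-sized volume to avoid division by zero.
    free_fraction = usage.free / usage.total if usage.total else 0.0
    return {
        "path": path,
        "total_bytes": usage.total,
        "free_bytes": usage.free,
        "free_fraction": free_fraction,
        "low": free_fraction < min_free_fraction,
    }
```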


To gather information about communication between components, review the CVD.log and cvfwd.log log files.
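
When the log files are large, a simple keyword filter can help locate the relevant lines. The sketch below makes no assumption about the log line format; it only returns lines containing any of the given keywords, without regard to case. The function names and the keywords are hypothetical and should be adapted to the errors you are looking for.

```python
def find_matching_lines(lines, keywords):
    """Return (line_number, text) pairs for lines containing any keyword."""
    lowered = [keyword.lower() for keyword in keywords]
    matches = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if any(keyword in text.lower() for keyword in lowered):
            matches.append((number, text))
    return matches


def scan_log(path, keywords, encoding="utf-8"):
    """Apply find_matching_lines to a log file on disk."""
    # errors="replace" keeps the scan going if the file has odd bytes.
    with open(path, encoding=encoding, errors="replace") as handle:
        return find_matching_lines(handle, keywords)
```

For example, scan_log("CVD.log", ["refused", "timed out"]) would list the lines mentioning either phrase, assuming the log file is in the current directory.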




Backup Failing or Hanging at 25%


When the scan phase completes, the progress bar of the job reaches 25%. At that point, the client tries to open the data path pipe to reach the library. If it cannot do so for any reason, the job fails or hangs.


A common reason for a job to hang at 25% is that the computer has multiple network interface cards (NICs), which can confuse the software. If this is the case:

  • Consider using data interface pairs (DIPs).
  • Check the duplex settings on the switch ports and NICs, because mismatched settings can also cause unreliable communication.
  • It is recommended to use defined speed and duplex values, such as 1000 Mb Full Duplex, rather than auto-negotiation.

Another reason can be that a library resource became unavailable during the scan phase:

  • Review the Libraries and ensure that all libraries, mount paths and drives have an online status.
  • Review the Event Viewer for any library-related error events.




Backup Failing or Hanging between 25-100%


The most common issues encountered in the 25-100% range are network related, such as a severed connection or network errors. First, try to resume the job and check whether it completes. If it does not, investigate the following:

  • Check for read errors on the client. Errors such as corrupted blocks can affect the job and can be perceived as a network failure.
  • Check whether permissions changed during the backup job, for example on a SQL database. Such a change can fail or hang the job because the agent can no longer read the data.
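
For file system data, one way to look for read errors or permission problems outside of the backup job is to read every file under a directory and record the ones that fail. The sketch below is a generic check under that assumption; it does not apply to database agents, and the function name is hypothetical. It should be run under the same account that the backup uses, otherwise the permission results are not comparable.

```python
import os


def find_unreadable_files(root, opener=open, chunk_size=1024 * 1024):
    """Read every file under root and return (path, error) for failures.

    The opener parameter exists only so the file access can be replaced
    in tests; by default the built-in open() is used.
    """
    problems = []
    for directory, _subdirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(directory, name)
            try:
                with opener(path, "rb") as handle:
                    # Read in chunks so large files are not loaded into memory.
                    while handle.read(chunk_size):
                        pass
            except OSError as exc:  # includes PermissionError and I/O errors
                problems.append((path, str(exc)))
    return problems
```

An empty result means every file could be read in full at the time of the check; it does not rule out errors that only occur while the job is running.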





Copyright © 2021 Commvault | All Rights Reserved.