Retrospect Virtual - Data Integrity Check



Introduction

Data backup is essential to any business or organization, and a backup plan is only as good as the integrity of the backup data. To help ensure that integrity, the Virtual Console backup server and the Host Server/Virtual Console client provide an improved Data Integrity Check (DIC) feature that lets the end user easily verify the data stored on the backup destination(s) (i.e. Virtual Console, Cloud storage, or Local storage) and confirm that the backup data is recoverable.

Whether the files being backed up are large or small, data corruption can still occur during a backup job or after the job has completed. Some of the possible causes are:

  • Bad program exits (i.e. Host Server/Virtual Console application terminated unexpectedly when an active backup job is in progress)

  • Technical problems on the Host Server/Virtual Console client machine (e.g. hardware failure, unexpected reboot, unexpected loss of power)

  • Technical problems on the Virtual Console backup server (e.g. hardware failure, unexpected reboot, unexpected loss of power, storage issues, human error)

  • Technical problems on the cloud storage service

Since data corruption is always a possibility, the solution is to identify and then remove corrupted files from the backup destination(s). This is mission critical because it verifies the integrity of the backup data and its restorability.

The primary role of the Data Integrity Check is to identify and remove corrupted files from the backup destination(s). This gives the next backup job the opportunity to back up these files again. However, corrupted files located in the retention area will not be backed up again, because their source file(s) no longer exist.

Key Features

  • Identify and remove the files and/or folders in the backup destination(s) which do not appear in the index

  • Identify and remove the files and/or folders which appear in the index but do not actually exist in the backup destinations (i.e. Virtual Console, Cloud storage, or Local storage)

  • Identify and remove corrupted files from the backup destination(s) when the Run Cyclic Redundancy Check (CRC) During Data Integrity Check setting is enabled

  • Identify and remove partially uploaded (orphan) files from the backup destination(s) to free up storage space

  • (TEST MODE) confirmation screen (applicable on Host Server/Virtual Console client)

  • Update storage statistics

Initiating Data Integrity Check (DIC)

Data Integrity Check can be started using the following options:

  • Host Server/Virtual Console client GUI

  • Virtual Console Web Console for Run on Server (Office 365 and Cloud File) Backup

  • RunDataIntegrityCheck.bat batch file (applicable for Windows operating system only)

  • RunDataIntegrityCheck.sh script file (applicable for FreeBSD/Linux (CLI) operating systems only)

Data Integrity Check (DIC) Modes

There are two (2) data integrity check modes:

  • With Run Cyclic Redundancy Check (CRC) disabled (Default mode)

  • With Run Cyclic Redundancy Check (CRC) enabled

With Run Cyclic Redundancy Check (CRC) disabled (Default mode)

This is the default setting of the data integrity check. In this mode, the Host Server/Virtual Console client or Virtual Console backup server compares the files and/or folders on the backup destination(s) with the list of files and/or folders recorded in the current index file.
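The comparison described above can be pictured as two set differences. The following is a minimal sketch only, not the product's implementation: the function name `compare_index_to_destination`, the `IndexComparison` type, and the idea of representing the index and the destination listing as collections of relative paths are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class IndexComparison:
    """Result of comparing an index against a destination listing (hypothetical type)."""
    not_in_index: set        # present on the destination, missing from the index
    not_on_destination: set  # recorded in the index, missing from the destination


def compare_index_to_destination(index_paths, destination_paths):
    """Compare two collections of relative paths using set differences."""
    index_set = set(index_paths)
    destination_set = set(destination_paths)
    return IndexComparison(
        not_in_index=destination_set - index_set,
        not_on_destination=index_set - destination_set,
    )


# Example: "b.txt" is indexed but missing, "partial.tmp" is an orphan.
result = compare_index_to_destination(
    index_paths=["a.txt", "b.txt", "c.txt"],
    destination_paths=["a.txt", "c.txt", "partial.tmp"],
)
print(sorted(result.not_in_index))        # ['partial.tmp']
print(sorted(result.not_on_destination))  # ['b.txt']
```

Both result sets correspond to items that the Key Features section says the check identifies and removes: files on the destination that do not appear in the index, and index entries whose files do not exist on the destination.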

When should I run a Data Integrity Check in default mode?

  • If you encounter index issues on your backup/restore job

  • If you know or suspect that the backup set storage statistics are out of date or incorrect and you cannot wait for the next weekly Periodic Data Integrity Check (PDIC) job

  • If you need to remove partially uploaded (orphan) files from the backup destination(s) to free up space. Such files remain on the backup destination(s) when a backup job with large files (e.g. database, VMware/Hyper-V, or Windows System backups) is terminated unexpectedly or crashes

With Run Cyclic Redundancy Check (CRC) enabled

In this mode, the data integrity check verifies the integrity of the files in the backup destination(s) against the checksum file generated at the time of the backup job.

A discrepancy indicates that the file(s) on the backup destination(s) are corrupted. The Host Server/Virtual Console client or Virtual Console backup server will remove these files from the backup destination(s). If these files still exist on the client machine or backup server at the time of the next backup job, the Host Server/Virtual Console client or Virtual Console backup server will upload the latest copy.
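As an illustration of the idea, the sketch below recomputes a CRC-32 value for each stored file and compares it with the value recorded at backup time. It uses Python's standard `zlib.crc32`; the actual checksum algorithm and checksum file format used by the product are not described here, so the dictionary layout and the function names `crc32_of_chunks` and `find_corrupted` are assumptions.

```python
import zlib


def crc32_of_chunks(chunks):
    """Compute a CRC-32 incrementally so large files need not fit in memory."""
    crc = 0
    for chunk in chunks:
        crc = zlib.crc32(chunk, crc)
    return crc


def find_corrupted(recorded, stored):
    """Return the names whose current CRC-32 differs from the recorded one.

    recorded: name -> CRC-32 recorded at backup time (assumed layout)
    stored:   name -> bytes currently held on the backup destination
    """
    return sorted(
        name
        for name, expected in recorded.items()
        if name in stored and zlib.crc32(stored[name]) != expected
    )


recorded = {"a.doc": zlib.crc32(b"hello"), "b.doc": zlib.crc32(b"world")}
stored = {"a.doc": b"hello", "b.doc": b"w0rld"}  # b.doc changed after backup
print(find_corrupted(recorded, stored))  # ['b.doc']
```

The chunked helper shows why a percentage progress figure is natural for large files: the checksum is built up piece by piece as the data is read.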

For large files, a percentage progress indicator is displayed throughout the data integrity check job when this setting is enabled.

When should I run a Data Integrity Check with Run Cyclic Redundancy Check (CRC) enabled?

Because Host Server/Virtual Console v8.3.2.11 or above includes the Periodic Data Integrity Check (PDIC) and post-backup validation features, it is not necessary to run a Data Integrity Check with Run Cyclic Redundancy Check (CRC) enabled frequently. This option can also take a long time to complete, as the Host Server/Virtual Console needs to download all the files and/or folders from the backup destination(s) to the Host Server/Virtual Console client machine in order to perform the actual Cyclic Redundancy Check (CRC).

To reduce the time taken, you should consider selecting only one backup destination at a time if applicable.

It is recommended to use this option:

  • When the Host Server/Virtual Console client machine encounters “corrupted file” errors during a restore job. Running a data integrity check with Cyclic Redundancy Check (CRC) enabled may help identify and clean up the corrupted files and allow the end user to recover any remaining data from the backup set(s)

  • When a backup destination has encountered a hardware failure (e.g. a disk failure on a Virtual Console user home drive or Host Server/Virtual Console Local destination drive)

Limitations

  • Data Integrity Check has to be started manually from the Host Server/Virtual Console client UI. It cannot be started remotely from the Virtual Console web console or scheduled to run automatically. The only exception is Run on Server (Office 365 or Cloud File) backup sets, where a data integrity check can be started from the Virtual Console web console

  • When a Data Integrity Check has identified issues on the backup set, it may require the end user to confirm the changes before it takes the corrective actions

  • While a data integrity check is running, backup and restore jobs cannot be run, and vice versa: while a backup or restore job is running, a data integrity check cannot be run

How it works

The following diagrams show the detailed flow for each data integrity check mode.

With Run Cyclic Redundancy Check (CRC) disabled (Default mode):

(Diagram: default mode flow, part 1)

(Diagram: default mode flow, part 2)

With Run Cyclic Redundancy Check (CRC) enabled:

(Diagram: CRC-enabled mode flow)

Test Mode Confirmation Screen

As part of the data integrity check job, a (TEST MODE) confirmation screen is normally displayed once the check is completed. It gives a summary report of the corrupted files, invalid indexes, or storage statistics issues for each backup destination.

The (TEST MODE) confirmation screen allows the end user to review the results of the data integrity check, and to decide whether they would like to proceed with the corrective actions.

To further streamline the data integrity check process and improve the user experience, the (TEST MODE) confirmation screen is displayed ONLY if any of the criteria below is met during the data integrity check operation:

  • The number of deleted backup files is over 1,000

  • The total size of the deleted backup files is over 512 MB

  • The number of deleted backup files is over 10% of the total number of backup files

Otherwise, the data integrity check job will automatically take corrective actions.
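The prompting rule above amounts to three threshold checks. Here is a minimal sketch of that decision, assuming that "512 MB" means 512 × 1024 × 1024 bytes and that "over" means strictly greater than; the function name `needs_confirmation` and its parameters are hypothetical.

```python
MAX_DELETED_FILES = 1_000
MAX_DELETED_BYTES = 512 * 1024 * 1024  # 512 MB, assuming binary megabytes
MAX_DELETED_PERCENT = 10


def needs_confirmation(deleted_files, deleted_bytes, total_files):
    """Return True if the (TEST MODE) confirmation screen should be shown."""
    if deleted_files > MAX_DELETED_FILES:
        return True
    if deleted_bytes > MAX_DELETED_BYTES:
        return True
    # Integer arithmetic avoids floating-point rounding at exactly 10%.
    if deleted_files * 100 > total_files * MAX_DELETED_PERCENT:
        return True
    return False


print(needs_confirmation(5, 1024, 1_000))          # False: corrective actions run automatically
print(needs_confirmation(1_001, 0, 1_000_000))     # True: more than 1,000 files
print(needs_confirmation(11, 0, 100))              # True: more than 10% of all files
```

Under these assumptions, a check that removes exactly 10% of the files, exactly 1,000 files, or exactly 512 MB does not trigger the screen.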

The (TEST MODE) screen includes five (5) summary reports for the following items found per backup destination:

(Image: (TEST MODE) summary report items)

(Image: note)

Below is an example of a (TEST MODE) confirmation screen with the following scenario:

  • Multiple backup destinations, corrupted items and index-related issues found with correct and incorrect storage statistics.

Data Integrity Check (DIC) vs. Periodic Data Integrity Check (PDIC)

Periodic Data Integrity Check (PDIC)

The periodic data integrity check is performed at the beginning of a backup job. It provides an additional, regular data integrity check of the backup data and updates the storage statistics for each backup set. The PDIC feature is always enabled and cannot be turned off, to ensure maximum protection of the backup data.

Unlike the data integrity check, the PDIC starts automatically and performs a quick check of all the backup destination(s) without end user intervention.

The PDIC is initiated automatically when either of the following conditions is met:

  • On a weekly basis, usually on the first backup job that runs on a Friday, Saturday, or Sunday

OR

  • If no backup job runs on Friday, Saturday, or Sunday, the PDIC is triggered on the next available backup job

e.g. If the last PDIC job was run more than seven (7) days ago, then the subsequent PDIC job(s) will run seven days from that day onwards.
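One way to read the two conditions together is that a new weekly window opens every Friday, and the first backup job that starts after the window opens runs the PDIC. The sketch below encodes that reading only; it is an interpretation of the text above, not the product's documented scheduling logic, and the function name `pdic_is_due` and its parameters are hypothetical.

```python
from datetime import date, timedelta

FRIDAY = 4  # date.weekday(): Monday is 0, so Friday is 4


def pdic_is_due(today, last_pdic):
    """Return True if a backup job starting `today` should run the PDIC first.

    Assumed rule: the PDIC is due when none has run since the most recent
    Friday (inclusive). `last_pdic` is None if no PDIC has ever run.
    """
    days_since_friday = (today.weekday() - FRIDAY) % 7
    window_start = today - timedelta(days=days_since_friday)
    return last_pdic is None or last_pdic < window_start


# Friday backup, previous PDIC a week earlier: the PDIC runs.
print(pdic_is_due(date(2020, 7, 17), date(2020, 7, 10)))  # True
# Saturday backup, PDIC already ran on Friday: skipped.
print(pdic_is_due(date(2020, 7, 18), date(2020, 7, 17)))  # False
# No backup ran Friday to Sunday: the next available job (Wednesday) runs it.
print(pdic_is_due(date(2020, 7, 22), date(2020, 7, 10)))  # True
```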

Differences between Data Integrity Check (DIC) and Periodic Data Integrity Check (PDIC)

(Image: comparison of DIC and PDIC features)


Last Update: July 17, 2020