Based on business needs and the required recovery time objective (RTO) and recovery point objective (RPO), backup types can be categorised as full, incremental, cumulative (or differential), synthetic, incremental forever, and image backups. Most organizations combine several of these backup types to meet their backup and recovery requirements. Application awareness is another factor to consider when choosing a backup type. Let's see how these backup types work. IBM TSM also offers several of these backup types.
Different Types of Data Backups
Full Backup: As the name implies, it is a full copy of the entire data set. The Storage Networking Industry Association (SNIA) defines a full backup as “A backup in which all of a defined set of data objects are copied, regardless of whether they have been modified since the last backup”. Organizations typically run full backups only periodically because they require more storage space and take longer to complete than other backup types. A full backup also consumes the most network and server resources. In return, it provides faster data recovery and simpler restore processing, because everything needed is in a single backup set.
Cumulative (Differential) Backup: It copies the data that has changed since the last full backup. Suppose, for example, that the administrator wants to create a full backup on Monday and differential backups for the rest of the week. Tuesday’s backup would contain all of the data that has changed since Monday. It would therefore be identical to an incremental backup at this point. On Wednesday, however, the differential backup would back up any data that had changed since Monday (the full backup). The advantage that differential backups have over incremental backups is shorter restore times. Restoring from a differential backup never requires more than two backup sets: the last full backup and the most recent differential. The tradeoff is that, as time progresses, a differential backup can grow to contain much more data than an incremental backup.
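The difference in what gets copied on Wednesday can be illustrated with a minimal sketch. It assumes that changes are detected by file modification time; the file names, dates, and the `select_for_backup` helper are hypothetical and do not reflect how any particular backup product works.

```python
from datetime import datetime

def select_for_backup(mtimes, since):
    """Return the sorted names of files modified after the reference time `since`."""
    return sorted(name for name, mtime in mtimes.items() if mtime > since)

# Hypothetical schedule: full backup on Monday night, another backup on Tuesday night.
monday_full = datetime(2024, 1, 1, 22, 0)
tuesday_backup = datetime(2024, 1, 2, 22, 0)

# Hypothetical last-modified times as seen on Wednesday night.
mtimes = {
    "a.txt": datetime(2023, 12, 30, 9, 0),  # unchanged since the full backup
    "b.txt": datetime(2024, 1, 2, 10, 0),   # changed on Tuesday
    "c.txt": datetime(2024, 1, 3, 11, 0),   # changed on Wednesday
}

# Differential: everything changed since the last FULL backup.
wednesday_differential = select_for_backup(mtimes, since=monday_full)
# Incremental: everything changed since the last backup of ANY kind.
wednesday_incremental = select_for_backup(mtimes, since=tuesday_backup)

print(wednesday_differential)
print(wednesday_incremental)
```

In this sketch the Wednesday differential picks up both `b.txt` and `c.txt`, while the Wednesday incremental picks up only `c.txt`.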
Incremental Backup: It copies the data that has changed since the last backup. For example, a full backup is created on Monday, and incremental backups are created for the rest of the week. Tuesday’s backup would only contain the data that has changed since Monday. Wednesday’s backup would only contain the data that has changed since Tuesday. The primary disadvantage to incremental backups is that they can be time-consuming to restore. Suppose an administrator wants to restore the backup from Wednesday. To do so, the administrator has to first restore Monday’s full backup. After that, the administrator has to restore Tuesday’s copy, followed by Wednesday’s.
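The restore sequence described above can be sketched as a small function that works out which backup sets are needed. This is an illustrative model only: the `restore_chain` name and the `(label, kind)` tuples are assumptions, and the sketch assumes a schedule that does not mix incremental and differential backups after the same full backup.

```python
def restore_chain(backups, target_index):
    """Return the labels of the backups to restore, in order, to reach backups[target_index].

    `backups` is a chronological list of (label, kind) tuples, where kind is
    "full", "incremental", or "differential". At least one full backup must
    exist at or before the target.
    """
    # Locate the most recent full backup at or before the target.
    full_index = max(i for i in range(target_index + 1) if backups[i][1] == "full")
    chain = [backups[full_index][0]]
    if target_index == full_index:
        return chain
    if backups[target_index][1] == "differential":
        # A differential already contains every change since the full backup.
        chain.append(backups[target_index][0])
    else:
        # Incrementals must be replayed one by one, oldest first.
        chain.extend(label for label, _ in backups[full_index + 1:target_index + 1])
    return chain

incremental_week = [("Mon", "full"), ("Tue", "incremental"), ("Wed", "incremental")]
differential_week = [("Mon", "full"), ("Tue", "differential"), ("Wed", "differential")]

print(restore_chain(incremental_week, 2))
print(restore_chain(differential_week, 2))
```

For the Wednesday restore, the incremental schedule needs all three backup sets, whereas a differential schedule would need only Monday's and Wednesday's.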
Synthetic Backup: Another way to implement a full backup is a synthetic backup. This method is used when the production volume resources cannot be exclusively reserved for a backup process for the extended period that a full backup requires. A synthetic backup takes data from an existing full backup and merges it with the data from any existing incremental and cumulative backups. This effectively results in a new full backup of the data. The backup is called synthetic because it is not created directly from production data. A synthetic full backup enables a full backup copy to be created offline without disrupting I/O operations on the production volume. This also frees up network resources from the backup process, making them available for other production uses. IBM TSM's progressive incremental backup is a related approach: after the initial full backup only incremental backups are taken, and the server keeps track of the current version of every file.
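The merge step can be sketched as follows. This is a simplified model under stated assumptions: each backup is represented as a dictionary of path to content, a value of `None` in an incremental marks a deleted file, and `synthesize_full` is a hypothetical name rather than a function of any backup product.

```python
def synthesize_full(full, incrementals):
    """Merge a full backup with later incrementals (oldest first) into a new full backup.

    The production system is never read; only existing backup data is used.
    """
    synthetic = dict(full)  # start from a copy of the last full backup
    for incremental in incrementals:
        for path, content in incremental.items():
            if content is None:
                synthetic.pop(path, None)   # file was deleted after the full backup
            else:
                synthetic[path] = content   # newer version replaces the older one
    return synthetic

monday_full = {"a.txt": "v1", "b.txt": "v1"}
tuesday_inc = {"b.txt": "v2", "c.txt": "v1"}
wednesday_inc = {"a.txt": None, "c.txt": "v2"}

new_full = synthesize_full(monday_full, [tuesday_inc, wednesday_inc])
print(new_full)
```

The resulting dictionary represents the state of the data as of Wednesday, and Monday's full backup is left unmodified.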
Incremental Forever Backup: Rather than scheduling periodic full backups, this backup solution requires only one initial full backup. Afterwards, an ongoing (forever) sequence of incremental backups occurs. The real difference, however, is that the incremental backups are automatically combined with the original full backup in such a way that you never need to perform a full backup again. This method reduces the amount of data that goes across the network and shortens the backup window.
Image Backups: An image-level backup makes a copy of the virtual disk and configuration associated with a particular VM or physical server. The backup is saved as a single entity called an image. This type of backup is suitable for restoring an entire VM in the event of a hardware failure or a human error such as the accidental deletion of the VM. It is also possible to restore individual files and folders/directories within a virtual machine. In an image-level backup, the backup software can back up VMs without installing backup agents inside the VMs or at the hypervisor level. The backup processing is performed by a proxy server that acts as the backup client, thereby offloading the backup processing from the VMs. The proxy server communicates with the management server responsible for managing the virtualized compute environment. It sends commands to create a snapshot of the VM to be backed up and to mount the snapshot on the proxy server. A snapshot captures the configuration and virtual disk data of the target VM and provides a point-in-time view of the VM. The proxy server then performs the backup by using the snapshot. IBM TSM offers the TSM for VE software to take this kind of virtual machine (VM) backup.
To increase the efficiency of image-based backup, some vendors support incremental backup through changed block tracking. This feature identifies and tags any blocks that have changed since the last VM snapshot. This enables the backup application to back up only the blocks that have changed, rather than backing up every block. The changed block tracking technique greatly reduces the amount of data copied before additional data reduction technologies are applied, shortens the backup window, and lowers the amount of storage required to protect VMs.
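The idea behind changed block tracking can be shown with a toy model. In the sketch below, `TrackedDisk` is a hypothetical in-memory virtual disk that records which block indexes were written since the last backup; real hypervisors implement this tracking in their own storage layer, and this is not any vendor's API.

```python
class TrackedDisk:
    """Toy virtual disk that remembers which blocks changed since the last backup."""

    def __init__(self, num_blocks, block_size=4):
        self.block_size = block_size
        self.blocks = [bytes(block_size)] * num_blocks  # all blocks start zero-filled
        self.changed = set()                            # indexes written since last backup

    def write(self, index, data):
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block long")
        self.blocks[index] = data
        self.changed.add(index)  # tag the block as changed

    def incremental_backup(self):
        """Copy only the tagged blocks, then reset the tracking set."""
        delta = {i: self.blocks[i] for i in sorted(self.changed)}
        self.changed.clear()
        return delta

disk = TrackedDisk(num_blocks=8)
disk.write(2, b"AAAA")
disk.write(5, b"BBBB")
first_delta = disk.incremental_backup()   # contains blocks 2 and 5 only

disk.write(5, b"CCCC")
second_delta = disk.incremental_backup()  # contains block 5 only
print(first_delta, second_delta)
```

Only two of the eight blocks are copied in the first backup and one in the second, which is where the savings in backup window and storage come from.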
Changed block tracking also reduces recovery time (RTO) compared to full image restores, because only the delta of changed VM blocks is restored. During a restore, the backup software determines which blocks have changed since the last backup. For example, if a large database is corrupted, a changed block recovery would restore only the parts of the database that have changed since the last backup was made.
Recovery-in-place (instant VM recovery) refers to running a VM directly from the backup device, using a backed-up copy of the VM image instead of restoring that image file. One of the primary benefits of recovery-in-place is that it eliminates the need to transfer the image from the backup area to the primary storage area before the VM is restarted, so the applications running on those VMs can be accessed more quickly. This not only saves recovery time but also reduces the network bandwidth consumed by the restore. A VM that is recovered in place depends on the storage I/O performance of the backup target (the disk backup appliance).
Application-Aware Backups: Application-aware backups, also known as application-consistent backups, require special agent software to be installed on the host in order to back up an application in a consistent state. The backup types discussed above (full and incremental) can only back up flat files in a consistent state. They are not sufficient if you plan to back up an application or database while it is online. You can trigger a full flat-file backup of the database files, but that backup will not be suitable for recovery because the database or application was online, and its files were changing, during the backup. For this purpose, you need to install agent software that communicates with the application or database and sends the data to the backup device. The agent provides the intelligence required to back up the application in a consistent state.
Many applications provide a mechanism to back themselves up both hot and in a consistent state. On Windows, the Volume Shadow Copy Service (VSS) allows Microsoft applications to be backed up in that hot and consistent state. Similarly, RMAN serves this purpose for Oracle databases. IBM TSM offers Tivoli Data Protection software for various types of applications and databases to take backups in a consistent state.
How Microsoft Volume Shadow Copy Service (VSS) works
In the Microsoft VSS space, the backup software, which is known as a requestor in VSS, tells the VSS service running on a server that it wants to back up an application on that server. For example, if the intent is to back up Microsoft SQL Server, the backup software (VSS requestor) will tell VSS on the machine that is running SQL Server that it wants to back up SQL Server. VSS then tells the application (SQL Server) to prepare itself for a hot backup. The application does this by completing in-flight I/O, flushing local buffers, applying logs, and setting any application flags required for smooth recovery. The application then reports its status back to VSS. VSS then instructs the application to quiesce (freeze application I/O) while the snapshot is created. Once the snapshot is taken, VSS instructs the application (SQL Server) to resume operations. All application I/O that was frozen during the quiesce is completed, and normal operations resume. The backup is then taken from the snapshot created by VSS.
It is important to understand that the application remains up and running and keeps serving user requests during the quiesce period. The application I/O that is frozen still occurs, but it is not committed to disk until the quiesce period ends. The quiesce period usually lasts only a few seconds while the snapshot is taken.
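The freeze, snapshot, and resume sequence described above can be modelled with a toy simulation. This is only a conceptual sketch: `ToyWriter` and `snapshot_backup` are hypothetical names, and none of this is the real VSS API, which is a Windows COM interface.

```python
class ToyWriter:
    """Stands in for an application such as a database (not the real VSS writer interface)."""

    def __init__(self):
        self.disk = []       # writes committed to disk
        self.pending = []    # writes accepted but held back while frozen
        self.frozen = False

    def write(self, record):
        # The application keeps accepting requests even while frozen.
        if self.frozen:
            self.pending.append(record)
        else:
            self.disk.append(record)

    def freeze(self):
        self.frozen = True

    def thaw(self):
        # Complete the held I/O and resume normal operation.
        self.frozen = False
        self.disk.extend(self.pending)
        self.pending.clear()

def snapshot_backup(writer, writes_during_freeze=()):
    """Freeze the writer, take a point-in-time copy of its disk, then thaw it."""
    writer.freeze()
    for record in writes_during_freeze:
        writer.write(record)          # user activity that arrives during the quiesce
    snapshot = list(writer.disk)      # consistent point-in-time view
    writer.thaw()
    return snapshot

app = ToyWriter()
app.write("txn-1")
app.write("txn-2")
snapshot = snapshot_backup(app, writes_during_freeze=["txn-3"])
print(snapshot, app.disk)
```

In this model the snapshot holds only the two transactions committed before the freeze, while the third transaction reaches disk once the application is thawed.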
The end result is an application-consistent backup, taken while the application was online and working, that is designed to be a viable recovery option. Users and clients of the application will typically not notice that the application was placed in hot backup mode and a backup was taken.