NetBackup™ for Apache Cassandra Administrator's Guide
Components and Terminologies of the Cassandra backup and recovery
The following table describes the components and terms used in the Cassandra backup and recovery solution.
Components and Terminologies | Purpose and Definition |
---|---|
Application Cluster |
|
Protection plan | A protection plan defines when backups are performed, how long the backups are retained, and the type of storage to use. Once a protection plan is set up, assets can be subscribed to the protection plan. |
Backup host | The backup host acts as a proxy client. All backup and restore operations are executed through the backup host. The Cassandra Backup Recovery (CBR) solution uses the BigData policy with the application type cassandra. The protection plan uses this backup host. The media server that is used to configure the storage server for the CBR solution must be used as the backup host. Note: You can also use a NetBackup client as a backup host. |
Cassandra cluster | Represents the Cassandra production cluster that you want to protect. |
Cassandra Backup Recovery component | The NetBackup thin client that is deployed on the data staging servers and the Cassandra cluster to aid backup and restore operations. |
Data staging servers | In addition to the NetBackup primary server and the backup hosts, NetBackup requires a set of servers to back up a Cassandra cluster. The number of these servers is typically 20% of the total number of servers in the Cassandra cluster. These servers deduplicate the data from the Cassandra cluster during backup and optimize the backup process. During a backup or restore, Cassandra keyspaces are streamed in parallel between the Cassandra cluster and the data staging servers. The data staging servers represent a staging cluster. Deploy these nodes according to the size of the data that needs to be backed up or restored. |
Data reduction | As part of data reduction, the following tasks are performed:
|
NetBackup primary server | All the jobs are executed from the NetBackup primary server. |
Parallel streams | The NetBackup parallel streaming framework allows data blocks from multiple nodes to be backed up using multiple backup hosts simultaneously. |
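
The sizing guideline for data staging servers (typically 20% of the servers in the Cassandra cluster) can be illustrated with a short calculation. The following is a minimal sketch, not part of NetBackup: the function name `estimate_staging_servers` is hypothetical, and rounding up to a whole server with a minimum of one is an assumption. The actual number to deploy also depends on the size of the data to be backed up or restored.

```python
def estimate_staging_servers(cluster_nodes: int, percent: int = 20) -> int:
    """Estimate the number of data staging servers for a Cassandra cluster.

    Hypothetical helper for illustration only; it is not a NetBackup API.
    The 20% default comes from the guideline in the table above.
    Rounding up to a whole server is an assumption made for this sketch.
    """
    if cluster_nodes < 1:
        raise ValueError("cluster_nodes must be at least 1")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range 1..100")
    # Integer ceiling of cluster_nodes * percent / 100, avoiding float rounding.
    return (cluster_nodes * percent + 99) // 100


if __name__ == "__main__":
    # Print the estimate for a few example cluster sizes.
    for nodes in (5, 12, 30):
        print(f"{nodes} Cassandra nodes -> {estimate_staging_servers(nodes)} staging server(s)")
```

For example, under these assumptions a 30-node Cassandra cluster yields an estimate of 6 data staging servers. Treat the result as a starting point and adjust it for the volume of data involved.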