NetBackup™ for Apache Cassandra Administrator's Guide

Product(s): NetBackup & Alta Data Protection (10.5)

Components and terminology of the Cassandra backup and recovery solution

The following table describes the components and terms used in the Cassandra backup and recovery solution.

Table:

Components and Terminologies

Purpose and Definition

Application Cluster

  • The application cluster is the name of the Cassandra production cluster.

  • The cluster name must be a single word with no whitespace, and it must match the actual cluster name that is used in the cassandra.yaml file on the production nodes.
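The naming rule above can be checked before the cluster is registered. The following is a minimal sketch, not part of NetBackup: the function names are hypothetical, and it assumes that the cluster name appears in cassandra.yaml as an uncommented top-level `cluster_name:` entry without a trailing inline comment.

```python
import re


def read_cluster_name(yaml_text):
    """Return the cluster_name value from cassandra.yaml text, or None."""
    for line in yaml_text.splitlines():
        # Match an uncommented, top-level 'cluster_name:' entry.
        match = re.match(r"^cluster_name:\s*(.+?)\s*$", line)
        if match:
            # Strip optional surrounding single or double quotes.
            return match.group(1).strip("'\"")
    return None


def is_valid_application_cluster(name, yaml_text):
    """True if name has no whitespace and equals the configured cluster name."""
    if not name or re.search(r"\s", name):
        return False
    return name == read_cluster_name(yaml_text)
```

For example, `is_valid_application_cluster("ProdCluster", text)` returns True only when the file content in `text` contains `cluster_name: 'ProdCluster'` (or the unquoted equivalent).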

Protection plan

A protection plan defines when backups are performed, how long the backups are retained, and the type of storage to use.

Once a protection plan is set up, assets can be subscribed to the protection plan.

Backup host

The backup host acts as a proxy client. All backup and restore operations are executed through the backup host.

The Cassandra Backup Recovery (CBR) solution uses the BigData policy with the application type cassandra.

The protection plan uses this backup host.

The media server that is used to configure the storage server for the CBR solution must be used as the backup host.

Note:

You can also use a NetBackup client as a backup host.

Cassandra cluster

Represents the Cassandra production cluster that you want to protect.

Cassandra Backup Recovery component

The NetBackup thin client that is deployed on the data staging servers and the Cassandra cluster to aid in backup and restore operations.

Data staging servers

In addition to the NetBackup primary server and the backup hosts, NetBackup requires a set of servers to back up the Cassandra cluster. The number of these servers is typically 20% of the total number of servers in the Cassandra cluster. These servers deduplicate the data from the Cassandra cluster during backup and optimize the backup process.

During a backup or restore, Cassandra keyspaces are streamed in parallel between the Cassandra cluster and the data staging servers.

The data staging servers represent a staging cluster. The number of nodes that you need to deploy depends on the size of the data that needs to be backed up or restored.
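As a rough illustration of the 20% guideline above, the following sketch estimates a starting count of data staging servers from the size of the Cassandra cluster. The function name is hypothetical, and rounding up to at least one server is an assumption; the actual number also depends on the size of the data, as noted above.

```python
def estimate_staging_servers(cassandra_nodes, percent=20):
    """Estimate the data staging server count as a percentage of cluster nodes.

    Rounding up and the minimum of one server are assumptions of this sketch.
    """
    if cassandra_nodes < 1:
        raise ValueError("cassandra_nodes must be at least 1")
    # Integer ceiling division avoids floating-point rounding surprises.
    return max(1, (cassandra_nodes * percent + 99) // 100)
```

With these assumptions, a 10-node cluster yields 2 staging servers and a 12-node cluster yields 3.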

Data reduction

As part of data reduction, the following tasks are performed:

  • Efficient reconciliation

    Data for the same keys from different nodes is transferred to the same node among the backup nodes.

    Reconciliation happens in parallel within each data staging server, without any inter-node communication.

  • Record synthesis

    While iterating over the records, columns of the same key from different SSTables are merged.

  • Semantic deduplication

    Stale and duplicate records (replicas) are identified and removed.
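The three data reduction tasks above can be pictured with a small conceptual sketch. This is not the NetBackup implementation: the record layout, the function names, the hash-based routing, and the rule that the newest timestamp wins are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def staging_node_for(key, staging_count):
    """Deterministically map a key to one staging node (assumed hash routing)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % staging_count


def route_records(records, staging_count):
    """Efficient reconciliation: send all records for a key to the same node."""
    buckets = defaultdict(list)
    for record in records:
        buckets[staging_node_for(record["key"], staging_count)].append(record)
    return buckets


def reduce_on_node(records):
    """Record synthesis and semantic deduplication on a single staging node."""
    merged = {}
    for record in records:
        # Record synthesis: columns of the same key are merged into one row.
        row = merged.setdefault(record["key"], {})
        for column, (value, timestamp) in record["columns"].items():
            # Semantic deduplication: keep only the newest version of each
            # column, so stale values and replica copies are dropped.
            if column not in row or timestamp > row[column][1]:
                row[column] = (value, timestamp)
    return merged
```

Because every record for a given key lands on the same staging node, each node can run `reduce_on_node` on its own bucket without communicating with the other nodes.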

NetBackup primary server

All the jobs are executed from the NetBackup primary server.

Parallel streams

The NetBackup parallel streaming framework allows data blocks from multiple nodes to be backed up using multiple backup hosts simultaneously.
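The idea of several backup hosts working on different nodes at the same time can be sketched as follows. This is a conceptual illustration only: the function names are hypothetical, the round-robin assignment is an assumption, and `stream_from` is a placeholder rather than a real data transfer.

```python
from concurrent.futures import ThreadPoolExecutor


def assign_nodes(nodes, backup_hosts):
    """Distribute nodes across backup hosts in round-robin order (assumed)."""
    if not backup_hosts:
        raise ValueError("at least one backup host is required")
    plan = {host: [] for host in backup_hosts}
    for index, node in enumerate(nodes):
        plan[backup_hosts[index % len(backup_hosts)]].append(node)
    return plan


def stream_from(host, nodes):
    """Placeholder for one stream; a real stream would move data blocks."""
    return [(host, node) for node in nodes]


def run_parallel_streams(nodes, backup_hosts):
    """Run one stream per backup host concurrently and collect the results."""
    plan = assign_nodes(nodes, backup_hosts)
    results = []
    with ThreadPoolExecutor(max_workers=len(backup_hosts)) as pool:
        futures = [pool.submit(stream_from, host, assigned)
                   for host, assigned in plan.items()]
        for future in futures:
            results.extend(future.result())
    return results
```

In this sketch, five nodes and two backup hosts produce two concurrent streams that cover three and two nodes respectively.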