NetBackup™ for Apache Cassandra Administrator's Guide
NetBackup Apache Cassandra support overview
Apache Cassandra is a popular scale-out NoSQL database. Cassandra runs on commodity hardware with direct-attached storage. A typical Cassandra cluster consists of nodes that store data. Cassandra replicates data among the nodes to provide resiliency against node downtime. There is no notion of a primary copy of data, and any node may hold a more recent version of a data record than the other replicas. An important characteristic of Cassandra is that it prefers availability over consistency: the database remains available even if the replicas of the data are not always up to date.
NetBackup provides an advanced solution for protecting Cassandra clusters. The solution has the following characteristics:
Agentless: You do not need to place backup agents on the Cassandra cluster nodes. Effectively, there is no code that hinders your high-performance Cassandra cluster.
Single pass data copy: During backup, a thin client makes a single pass over the Cassandra data files (called SSTables) to minimize the I/O footprint.
Off-host data optimization: Cassandra replicates data for resiliency, whereas backups are kept for longer retention. The NetBackup Cassandra solution processes data to:
Determine a cluster-consistent point-in-time.
Remove replica records.
Remove stale data that is caused by record overwrites.
All this processing happens off-host on Data Staging Servers to ensure that backup processes do not affect your high-performance Cassandra clusters.
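The three processing goals above can be illustrated with a minimal sketch. It assumes a simplified, hypothetical record model (a key, a write timestamp, a value, and the node the copy came from) and a last-write-wins rule per key; the `Record` type and the `optimize` function are illustrative names, not part of NetBackup, and the actual off-host processing may work differently.

```python
from typing import Dict, Iterable, NamedTuple


class Record(NamedTuple):
    key: str        # row identifier (hypothetical flat model)
    timestamp: int  # write time of this copy
    value: str
    node: str       # node the copy was read from


def optimize(records: Iterable[Record], point_in_time: int) -> Dict[str, Record]:
    """Keep one record per key: the newest copy written at or before the cutoff."""
    latest: Dict[str, Record] = {}
    for rec in records:
        # Ignore writes newer than the chosen cluster-wide point-in-time.
        if rec.timestamp > point_in_time:
            continue
        current = latest.get(rec.key)
        # A strictly newer write replaces a stale one; equal timestamps
        # are treated as replica copies and the first one seen is kept.
        if current is None or rec.timestamp > current.timestamp:
            latest[rec.key] = rec
    return latest


staged = [
    Record("user:1", 100, "alice", "node-a"),
    Record("user:1", 100, "alice", "node-b"),     # replica copy
    Record("user:1", 250, "alice-v2", "node-c"),  # overwrite of user:1
    Record("user:2", 300, "bob", "node-a"),
    Record("user:2", 900, "bob-late", "node-b"),  # written after the cutoff
]
result = optimize(staged, point_in_time=500)
for key in sorted(result):
    print(key, result[key].value, result[key].timestamp)
```

In this sample, five staged copies reduce to two records: the replica copy and the overwritten value of `user:1` are dropped, and the write to `user:2` that falls after the cutoff is excluded.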
Incremental backups: NetBackup supports incremental backups of Cassandra to optimize backup times after a full backup. The solution automatically detects new keyspaces or column families and takes a full backup of these new structures, while it performs incremental backups of the previously existing structures.
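The detection of new structures can be pictured as a comparison between what earlier backups already cover and what the cluster currently contains. The following sketch is only an illustration under that assumption: `plan_backup` and the (keyspace, column family) tuples are hypothetical names and sample data, not a NetBackup interface.

```python
from typing import Dict, Iterable, Set, Tuple

Table = Tuple[str, str]  # (keyspace, column family)


def plan_backup(previously_backed_up: Set[Table],
                current_tables: Iterable[Table]) -> Dict[Table, str]:
    """Choose 'full' for structures not seen before, 'incremental' otherwise."""
    plan: Dict[Table, str] = {}
    for table in sorted(current_tables):
        if table in previously_backed_up:
            plan[table] = "incremental"
        else:
            plan[table] = "full"
    return plan


already_protected = {("shop", "orders"), ("shop", "customers")}
in_cluster_now = [
    ("shop", "orders"),
    ("shop", "customers"),
    ("shop", "returns"),   # new column family in an existing keyspace
    ("audit", "events"),   # column family in a new keyspace
]
plan = plan_backup(already_protected, in_cluster_now)
for table, kind in plan.items():
    print(table, kind)
```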
Scalable Backup: Cassandra lets you scale your cluster by adding more nodes whenever required. It automatically redistributes the existing data to the new nodes while the cluster is online. NetBackup Cassandra protection is also scalable and lets you add more Data Staging Servers to meet your backup requirements.
DataCenter Identification: NetBackup Cassandra protection can be configured to back up data from a specific datacenter. It queries the Cassandra cluster and automatically identifies the nodes present in the various datacenters. It then engages only the nodes in the specified datacenter to back up the data.
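As a rough sketch of the node selection described above, the example below groups nodes by datacenter and returns only those in the requested one. How NetBackup obtains the topology is not covered here; the `topology` mapping is hand-written sample data, and `group_by_datacenter` and `nodes_for_backup` are hypothetical helper names.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_datacenter(node_to_dc: Dict[str, str]) -> Dict[str, List[str]]:
    """Group node addresses by the datacenter each node reports."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for node, dc in sorted(node_to_dc.items()):
        groups[dc].append(node)
    return dict(groups)


def nodes_for_backup(node_to_dc: Dict[str, str], datacenter: str) -> List[str]:
    """Return only the nodes that belong to the requested datacenter."""
    groups = group_by_datacenter(node_to_dc)
    if datacenter not in groups:
        raise ValueError(f"unknown datacenter: {datacenter}")
    return groups[datacenter]


# Sample topology: node address -> datacenter name (illustrative values).
topology = {
    "10.0.0.1": "DC1",
    "10.0.0.2": "DC1",
    "10.0.1.1": "DC2",
}
selected = nodes_for_backup(topology, "DC1")
print(selected)
```

Raising an error for an unknown datacenter name, instead of returning an empty list, is a design choice in this sketch so that a misspelled name does not silently produce an empty backup.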
DataCenter aware restore: At the time of restore, NetBackup connects to the restore cluster and determines its current topology. The solution reconciles this topology with the one recorded at backup time to allow for topology changes, and restores the data according to the current topology. The solution also provides options for changing the datacenters, the number of replicas in each datacenter, and the keyspace and column family names to help you with your restore requirements.
Note:
Indexes are not restored when a column family is renamed during the restore. You must add the index to the renamed column family after the restore.
Granular restore: The NetBackup Cassandra solution lets you restore a part of the backup data set. You have the option to restore only a few of the keyspaces or only some of the column families.
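Selecting part of a backup data set can be thought of as filtering the backed-up structures by keyspace or by individual column family. The sketch below shows that idea only; `select_for_restore` and the sample names are hypothetical and do not represent the NetBackup restore interface.

```python
from typing import Iterable, List, Tuple

Table = Tuple[str, str]  # (keyspace, column family)


def select_for_restore(backed_up: Iterable[Table],
                       keyspaces: Iterable[str] = (),
                       column_families: Iterable[Table] = ()) -> List[Table]:
    """Pick tables whose keyspace was requested or that were named individually."""
    wanted_keyspaces = set(keyspaces)
    wanted_tables = set(column_families)
    return [
        table for table in backed_up
        if table[0] in wanted_keyspaces or table in wanted_tables
    ]


backup_contents = [
    ("shop", "orders"),
    ("shop", "customers"),
    ("audit", "events"),
    ("audit", "logins"),
]
# Restore the whole "shop" keyspace plus one column family from "audit".
to_restore = select_for_restore(
    backup_contents,
    keyspaces=["shop"],
    column_families=[("audit", "events")],
)
print(to_restore)
```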
Repair-less Restore: The restore processes ensure that after data is restored, there is no need to perform further recovery steps. The data is available immediately after a restore in your high-performance Cassandra cluster.