NetBackup™ Deduplication Guide
- Introducing the NetBackup media server deduplication option
- Quick start
- Planning your deployment
- About MSDP storage and connectivity requirements
- About NetBackup media server deduplication
- About NetBackup Client Direct deduplication
- About MSDP remote office client deduplication
- About MSDP performance
- About MSDP stream handlers
- MSDP deployment best practices
- Provisioning the storage
- Licensing deduplication
- Configuring deduplication
- Configuring the Deduplication Multi-Threaded Agent behavior
- Configuring the MSDP fingerprint cache behavior
- Configuring MSDP fingerprint cache seeding on the storage server
- About MSDP Encryption using NetBackup KMS service
- Configuring a storage server for a Media Server Deduplication Pool
- Configuring a disk pool for deduplication
- Configuring a Media Server Deduplication Pool storage unit
- About MSDP optimized duplication within the same domain
- Configuring MSDP optimized duplication within the same NetBackup domain
- Configuring MSDP replication to a different NetBackup domain
- About NetBackup Auto Image Replication
- Configuring a target for MSDP replication to a remote domain
- Creating a storage lifecycle policy
- Resilient Network properties
- Editing the MSDP pd.conf file
- About protecting the MSDP catalog
- Configuring an MSDP catalog backup
- About NetBackup WORM storage support for immutable and indelible data
- MSDP cloud support
- About MSDP cloud support
- Cloud space reclamation
- About the disaster recovery for cloud LSU
- About Image Sharing using MSDP cloud
- About MSDP cloud immutable (WORM) storage support
- About immutable object support for AWS S3
- About immutable object support for AWS S3 compatible platforms
- About immutable storage support for Azure blob storage
- About immutable storage support for Google Cloud Storage
- S3 Interface for MSDP
- Configuring S3 interface for MSDP on MSDP build-your-own (BYO) server
- Identity and Access Management (IAM) for S3 interface for MSDP
- S3 APIs for S3 interface for MSDP
- Monitoring deduplication activity
- Managing deduplication
- Managing MSDP servers
- Managing NetBackup Deduplication Engine credentials
- Managing Media Server Deduplication Pools
- Changing a Media Server Deduplication Pool properties
- Configuring MSDP data integrity checking behavior
- About MSDP storage rebasing
- Managing MSDP servers
- Recovering MSDP
- Replacing MSDP hosts
- Uninstalling MSDP
- Deduplication architecture
- Configuring and using universal shares
- Using the ingest mode
- Enabling a universal share with object store
- Configuring isolated recovery environment (IRE)
- Using the NetBackup Deduplication Shell
- Managing users from the deduplication shell
- Managing certificates from the deduplication shell
- Managing NetBackup services from the deduplication shell
- Monitoring and troubleshooting NetBackup services from the deduplication shell
- Managing S3 service from the deduplication shell
- Troubleshooting
- About unified logging
- About legacy logging
- Troubleshooting MSDP installation issues
- Troubleshooting MSDP configuration issues
- Troubleshooting MSDP operational issues
- Trouble shooting multi-domain issues
- Appendix A. Migrating to MSDP storage
- Appendix B. Migrating from Cloud Catalyst to MSDP direct cloud tiering
- About direct migration from Cloud Catalyst to MSDP direct cloud tiering
- Appendix C. Encryption Crawler
About variable-length deduplication on NetBackup clients
Currently, NetBackup deduplication follows a fixed-length deduplication method where the data streams are chunked into fixed-length segments (128 KB) and then processed for deduplication. Fixed-length deduplication has the advantage of being a swift method and it consumes less computing resources. Fixed-length deduplication handles most kinds of data streams efficiently. However, there can be cases where fixed-length deduplication might result in low deduplication ratios.
If your data was modified in a shifting mode, that is, if some data was inserted in the middle of a file, then variable-length deduplication enables you to get higher deduplication ratios when you back up the data. Variable-length deduplication reduces backup storage, improves the backup performance, and lowers the overall cost that is spent on data protection.
Note:
Use variable-length deduplication for data types that do not show a good deduplication ratio with the current MSDP intelligent deduplication algorithm and affiliated streamers. Enabling Variable-length deduplication might improve the deduplication ratio, but consider that the CPU performance might get affected.
In variable-length deduplication, every segment has a variable size with configurable size boundaries. The NetBackup client examines and applies a secure hash algorithm (SHA-2) to the variable-length segments of the data. Each data segment is assigned a unique ID and NetBackup evaluates if any data segment with the same ID exists in the backup. If the data segment already exists, then the segment data is not stored again.
Warning:
If you enable compression for the backup policy, variable-length deduplication does not work even when you configure it.
The following table describes the effect of variable-length deduplication on the data backup:
Table: Effect of variable-length deduplication
Effect on the deduplication ratio |
Variable-length deduplication is beneficial if the data file is modified in a shifting mode, that is when data is inserted, removed, or modified at a binary level. When such modified data is backed up again, variable-length deduplication achieves a higher deduplication ratio. Thus, the second or subsequent backups have higher deduplication ratios. |
Effect on the CPU |
Variable-length deduplication can be a bit more resource-intensive than fixed-length deduplication to achieve a better deduplication ratio. Variable-length deduplication needs more CPU cycles to compute segment boundaries and the backup time might be more than the fixed-length deduplication method. |
Effect on data restore |
Variable-length deduplication does not affect the data restore process. |
By default, the variable-length deduplication is disabled on a NetBackup client. From NetBackup 10.2 onwards, you can enable variable-length deduplication by using cacontrol command-line utility. In the previous versions of NetBackup, you can enable it by adding parameters in the pd.conf
file. To enable the same settings for all NetBackup clients or policies, you must specify all the clients or policies in the pd.conf
file.
From NetBackup 10.2 onwards, the default version of the variable-length deduplication is VLD v2. If you have enabled variable-length deduplication in pd.conf
file and image backups do not exists in the storage, VLD v2 is used by default. If image backups already exist in the storage, NetBackup continues to use VLD v1.
In a deduplication load balancing scenario, you must upgrade the media servers to NetBackup 8.1.1 or later and modify the pd.conf
file on all the media servers. If a backup job selects an older media server (earlier than NetBackup 8.1.1) for the load balancing pool, fixed-length deduplication is used instead of variable-length deduplication. Avoid configuring media servers with different NetBackup versions in a load balancing scenario. The data segments generated from variable-length deduplication are different from the data segments generated from fixed-length deduplication. Therefore, load balancing media servers with different NetBackup versions results in a low deduplication ratio.
See Managing the variable-length deduplication using the cacontrol command-line utility.
See About the MSDP pd.conf configuration file.