NetBackup™ Deduplication Guide
- Introducing the NetBackup media server deduplication option
- Quick start
- Planning your deployment
- About MSDP storage and connectivity requirements
- About NetBackup media server deduplication
- About NetBackup Client Direct deduplication
- About MSDP remote office client deduplication
- About MSDP performance
- About MSDP stream handlers
- MSDP deployment best practices
- Provisioning the storage
- Licensing deduplication
- Configuring deduplication
- Configuring the Deduplication Multi-Threaded Agent behavior
- Configuring the MSDP fingerprint cache behavior
- Configuring MSDP fingerprint cache seeding on the storage server
- About MSDP Encryption using NetBackup Key Management Server service
- Configuring a storage server for a Media Server Deduplication Pool
- Configuring a disk pool for deduplication
- Configuring a Media Server Deduplication Pool storage unit
- About MSDP optimized duplication within the same domain
- Configuring MSDP optimized duplication within the same NetBackup domain
- Configuring MSDP replication to a different NetBackup domain
- About NetBackup Auto Image Replication
- Configuring a target for MSDP replication to a remote domain
- Creating a storage lifecycle policy
- Resilient network properties
- Editing the MSDP pd.conf file
- About protecting the MSDP catalog
- Configuring an MSDP catalog backup
- About NetBackup WORM storage support for immutable and indelible data
- Running MSDP services with the non-root user
- MSDP cloud support
- About MSDP cloud support
- Cloud space reclamation
- About the disaster recovery for cloud LSU
- About Image Sharing using MSDP cloud
- About MSDP cloud immutable (WORM) storage support
- About immutable object support for AWS S3
- About bucket-level immutable storage support for Google Cloud Storage
- About object-level immutable storage support for Google Cloud Storage
- About AWS IAM Role Anywhere support
- About Azure service principal support
- About NetBackup support for AWS Snowball Edge
- S3 Interface for MSDP
- Configuring S3 interface for MSDP on MSDP build-your-own (BYO) server
- Identity and Access Management (IAM) for S3 interface for MSDP
- S3 APIs for S3 interface for MSDP
- Disaster recovery in S3 interface for MSDP
- Monitoring deduplication activity
- Viewing MSDP job details
- Managing deduplication
- Managing MSDP servers
- Managing NetBackup Deduplication Engine credentials
- Managing Media Server Deduplication Pools
- Changing a Media Server Deduplication Pool properties
- Configuring MSDP data integrity checking behavior
- About MSDP storage rebasing
- Managing MSDP servers
- Recovering MSDP
- Replacing MSDP hosts
- Uninstalling MSDP
- Deduplication architecture
- Configuring and using universal shares
- Configuring universal share user authentication
- Using the ingest mode
- Enabling a universal share with object store
- Configure a universal share accelerator
- About the universal share accelerator quota
- Configuring isolated recovery environment (IRE)
- Configuring an isolated recovery environment using the web UI
- Configuring an isolated recovery environment using the command line
- Using the NetBackup Deduplication Shell
- Managing users from the deduplication shell
- About the external MSDP catalog backup
- Managing certificates from the deduplication shell
- Managing NetBackup services from the deduplication shell
- Monitoring and troubleshooting NetBackup services from the deduplication shell
- Managing S3 service from the deduplication shell
- Troubleshooting
- About unified logging
- About legacy logging
- Troubleshooting MSDP configuration issues
- Troubleshooting MSDP operational issues
- Trouble shooting multi-domain issues
- Appendix A. Migrating to MSDP storage
- Appendix B. Migrating from Cloud Catalyst to MSDP direct cloud tiering
- About direct migration from Cloud Catalyst to MSDP direct cloud tiering
- Appendix C. Encryption Crawler
About sampling and predictive cache
MSDP uses a memory up to a size that is configured in MaxCacheSize to cache fingerprints for efficient deduplication lookup. A new fingerprint cache lookup data scheme that is introduced in NetBackup release 10.1 reduces the memory usage. It splits the current memory cache into two components, sampling cache (S-cache) and predictive cache (P-cache). S-cache caches a percentage of the fingerprints from each backup and is used to find similar data from the samples of previous backups for deduplication. P-cache caches the fingerprints that is most likely used in the immediate future for deduplication lookup.
At the start of a job, a small portion of the fingerprints from its last backup is loaded into P-cache as initial seeding. The fingerprint lookup is done with P-cache to find duplicates, and the lookup misses are searched from S-cache samples to find the possible matches of previous backup data. If found, part of the matched backup fingerprints is loaded into P-cache for future deduplication.
The S-cache and P-cache fingerprint lookup method is enabled for local and cloud storage volumes with MSDP non-BYO deployments including Flex, Flex Worm, Flex Scale, NetBackup Appliance, AKS, and EKS deployment. This method is also enabled for cloud-only volumes for MSDP BYO platforms. For the platforms with cloud-only volume support, local volume still uses the original cache lookup method. You can find S-cache and P-cache configuration parameters under Cache section of configuration file contentrouter.cfg
.
From NetBackup 10.2, S-cache and P-cache fingerprint lookup method for local storage is used with the new setup for Flex, Flex WORM, and NetBackup Appliance. Upgrade does not change S-cache and P-cache fingerprint lookup method.
The default values for MSDP BYO platforms:
Configuration | Default value |
---|---|
MaxCacheSize | 50% |
MaxPredictiveCacheSize | 20% |
MaxSamplingCacheSize | 5% |
EnableLocalPredictiveSamplingCache in | false |
EnableLocalPredictiveSamplingCache in | false |
The default values for MSDP non-BYO platforms:
Configuration | Default value |
---|---|
MaxCacheSize | 512MiB |
MaxPredictiveCacheSize | 40% |
MaxSamplingCacheSize | 20% |
EnableLocalPredictiveSamplingCache in | true |
EnableLocalPredictiveSamplingCache in | true |
For MSDP non-BYO deployments, the local volume and cloud volume share the same S-cache and P-cache size. For the BYO deployment, S-cache and P-cache are only for cloud volume, and MaxCacheSize is still used for local volume. In case the system is not used for cloud backup, MaxPredictiveCacheSize and MaxSamplingCacheSize can be set to a small value, for example, 1% or 128MiB. MaxCacheSize can be set to a large value, for example, 50% or 60%. Similarly, if the system is used for cloud backups only, MaxCacheSize can be set to 1% or 128MiB, and MaxPredictiveCacheSize and MaxSamplingCacheSize can be set to a larger value.
The S-cache size is determined by the back-end MSDP capacity or the number of fingerprints from the back-end data. With the assumption that average segment size of 32KB, the S-cache size is about 100MB per TB of back-end capacity. P-cache size is determined by the number of concurrent jobs and data locality or working set of the incoming data. With working set of 250MB per stream (about 5 million fingerprints). For example, 100 concurrent stream needs minimum memory of 25GB (100*250MB). The working set can be larger for certain applications with multiple streams and large data sets. As P-cache is used for fingerprint deduplication lookup and all fingerprints that are loaded into P-cache stay there until its allocated capacity is reached, the larger the P-cache size, the better the potential lookup hit rate, and the more memory usage. Under-sizing S-cache or P-cache leads to reduced deduplication rates and over-sizing increases the memory cost.