NetBackup™ Deduplication Guide

Last Published: 2024-03-31

Product(s): NetBackup & Alta Data Protection (10.4)

About sampling and predictive cache

MSDP uses memory, up to the size that is configured in MaxCacheSize, to cache fingerprints for efficient deduplication lookup. NetBackup 10.1 introduced a new fingerprint cache lookup scheme that reduces memory usage. It splits the memory cache into two components: the sampling cache (S-cache) and the predictive cache (P-cache). S-cache holds a percentage of the fingerprints from each backup and is used to find similar data among the samples of previous backups. P-cache holds the fingerprints that are most likely to be used in the immediate future for deduplication lookup.

At the start of a job, a small portion of the fingerprints from its last backup is loaded into P-cache as initial seeding. Fingerprint lookups go to P-cache first to find duplicates. Lookups that miss in P-cache are then searched in the S-cache samples to find possible matches in previous backup data. When a match is found, part of the matched backup's fingerprints is loaded into P-cache for future deduplication.
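The lookup order described above can be sketched as a small toy model. This is an illustration only, not MSDP's implementation: the class name, the `sample_every` and `prefetch` parameters, and the oldest-first eviction rule are all assumptions made for the example.

```python
from collections import OrderedDict


class TwoTierFingerprintCache:
    """Toy model of P-cache / S-cache lookup (hypothetical, not MSDP code)."""

    def __init__(self, p_capacity, prefetch=4):
        self.p_cache = OrderedDict()  # fingerprints expected to be used soon
        self.s_cache = {}             # sampled fingerprint -> (backup_id, position)
        self.backups = {}             # backup_id -> ordered fingerprint list
        self.p_capacity = p_capacity
        self.prefetch = prefetch      # how many neighbours to load on an S-cache hit

    def _load(self, fingerprints):
        # Assumed policy: evict the oldest entries once capacity is reached.
        for fp in fingerprints:
            self.p_cache[fp] = True
            self.p_cache.move_to_end(fp)
            while len(self.p_cache) > self.p_capacity:
                self.p_cache.popitem(last=False)

    def add_backup(self, backup_id, fingerprints, sample_every=4):
        # S-cache keeps only a sample of each backup's fingerprints.
        self.backups[backup_id] = list(fingerprints)
        for pos in range(0, len(fingerprints), sample_every):
            self.s_cache[fingerprints[pos]] = (backup_id, pos)

    def seed(self, backup_id, count):
        # Initial seeding: load a small portion of the last backup into P-cache.
        self._load(self.backups[backup_id][:count])

    def lookup(self, fp):
        if fp in self.p_cache:
            return "p-hit"
        hit = self.s_cache.get(fp)
        if hit is None:
            return "miss"
        # S-cache match: pull part of the matched backup into P-cache.
        backup_id, pos = hit
        self._load(self.backups[backup_id][pos:pos + self.prefetch])
        return "s-hit"


cache = TwoTierFingerprintCache(p_capacity=8, prefetch=4)
cache.add_backup("b1", [f"f{i}" for i in range(8)], sample_every=4)
cache.seed("b1", 2)
results = [cache.lookup(fp) for fp in ("f0", "f4", "f5", "f2")]
print(results)
```

In this model, a lookup that hits a sample pulls the neighbouring fingerprints of the matched backup into P-cache, so later lookups for nearby segments are served from P-cache directly.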

The S-cache and P-cache fingerprint lookup method is enabled for local and cloud storage volumes in MSDP non-BYO deployments, including Flex, Flex WORM, Flex Scale, NetBackup Appliance, AKS, and EKS deployments. The method is also enabled for cloud-only volumes on MSDP BYO platforms. On the platforms with cloud-only volume support, the local volume still uses the original cache lookup method. The S-cache and P-cache configuration parameters are in the Cache section of the contentrouter.cfg configuration file.

Starting with NetBackup 10.2, the S-cache and P-cache fingerprint lookup method is used for local storage in new setups of Flex, Flex WORM, and NetBackup Appliance. An upgrade does not change the S-cache and P-cache fingerprint lookup method.

The default values for MSDP BYO platforms:

Configuration	Default value
MaxCacheSize	50%
MaxPredictiveCacheSize	20%
MaxSamplingCacheSize	5%
EnableLocalPredictiveSamplingCache in `contentrouter.cfg`	false
EnableLocalPredictiveSamplingCache in `spa.cfg`	false

The default values for MSDP non-BYO platforms:

Configuration	Default value
MaxCacheSize	512MiB
MaxPredictiveCacheSize	40%
MaxSamplingCacheSize	20%
EnableLocalPredictiveSamplingCache in `contentrouter.cfg`	true
EnableLocalPredictiveSamplingCache in `spa.cfg`	true

For MSDP non-BYO deployments, the local volume and the cloud volume share the same S-cache and P-cache size. For BYO deployments, S-cache and P-cache are used only for the cloud volume, and MaxCacheSize is still used for the local volume. If the system is not used for cloud backups, MaxPredictiveCacheSize and MaxSamplingCacheSize can be set to a small value, for example, 1% or 128MiB, and MaxCacheSize can be set to a large value, for example, 50% or 60%. Similarly, if the system is used for cloud backups only, MaxCacheSize can be set to 1% or 128MiB, and MaxPredictiveCacheSize and MaxSamplingCacheSize can be set to larger values.
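To see what such settings mean in bytes on a given host, the following minimal sketch converts a percentage or an absolute value into a byte count. The function name and the 64 GiB host are hypothetical, and the sketch assumes that percentage values are relative to total system memory, which this section does not state.

```python
import re

_UNITS = {"KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}


def resolve_cache_size(value, total_memory_bytes):
    """Convert a setting such as '50%' or '128MiB' into bytes.

    Assumption: percentages are taken relative to total system memory.
    """
    value = value.strip()
    if value.endswith("%"):
        return int(total_memory_bytes * float(value[:-1]) / 100)
    match = re.fullmatch(r"(\d+)\s*(KiB|MiB|GiB)", value)
    if match is None:
        raise ValueError(f"unrecognized size value: {value!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


# Hypothetical BYO host with 64 GiB of memory that does no cloud backups,
# using the example values from the paragraph above.
total = 64 * 1024**3
local_only = {
    "MaxCacheSize": "50%",
    "MaxPredictiveCacheSize": "128MiB",
    "MaxSamplingCacheSize": "128MiB",
}
resolved = {name: resolve_cache_size(v, total) for name, v in local_only.items()}
for name, size in resolved.items():
    print(f"{name}: {size / 1024**2:.0f} MiB")
```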

The S-cache size is determined by the back-end MSDP capacity, that is, by the number of fingerprints in the back-end data. Assuming an average segment size of 32KB, the S-cache size is about 100MB per TB of back-end capacity. The P-cache size is determined by the number of concurrent jobs and by the data locality, or working set, of the incoming data. Assuming a working set of 250MB per stream (about 5 million fingerprints), 100 concurrent streams need a minimum of 25GB of memory (100*250MB). The working set can be larger for certain applications with multiple streams and large data sets. P-cache is used for fingerprint deduplication lookup, and all fingerprints that are loaded into P-cache stay there until its allocated capacity is reached. A larger P-cache therefore gives a better potential lookup hit rate at the cost of more memory. Under-sizing S-cache or P-cache reduces deduplication rates, and over-sizing increases the memory cost.
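The sizing rules of thumb above reduce to simple arithmetic. The sketch below uses only the two figures from this section (100MB of S-cache per TB of back-end capacity and 250MB of P-cache per concurrent stream); the function name and the 200 TB example are hypothetical, and real working sets may be larger.

```python
S_CACHE_MB_PER_TB = 100      # from the text: about 100MB per TB of back-end capacity
P_CACHE_MB_PER_STREAM = 250  # from the text: working set of 250MB per stream


def estimate_cache_mb(backend_tb, concurrent_streams):
    """Return rough minimum S-cache and P-cache sizes in MB."""
    s_cache = backend_tb * S_CACHE_MB_PER_TB
    p_cache = concurrent_streams * P_CACHE_MB_PER_STREAM
    return {"s_cache_mb": s_cache, "p_cache_mb": p_cache, "total_mb": s_cache + p_cache}


# Example: hypothetical 200 TB back end with 100 concurrent streams.
estimate = estimate_cache_mb(backend_tb=200, concurrent_streams=100)
print(estimate)
```

For the 100-stream case this reproduces the 25GB P-cache minimum given in the text (25000MB). The 200 TB back end adds about 20GB of S-cache.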