NetBackup™ Backup Planning and Performance Tuning Guide

Last Published:
Product(s): NetBackup & Alta Data Protection (10.4, 10.3.0.1, 10.3, 10.2.0.1, 10.2, 10.1.1, 10.1, 10.0.0.1, 10.0, 9.1.0.1, 9.1, 9.0.0.1, 9.0, 8.3.0.2, 8.3.0.1, 8.3)

Data flow and SLP design best practices

When constructing Storage Lifecycle Policies (SLPs), it is important to design a data flow that accounts for several factors, so that the result is a sustainable solution. These factors may include, but are not limited to:

  • Typical backup windows and duration

  • Backup deduplication rates

  • Image sizes (many small images less than 5 GB, few large images greater than 10 TB, etc.)

  • Workload types (database, stream handlers, NDMP, VMware, etc.)

  • Source and target storage types (Data Domain, MSDP, cloud, tape, etc.)

  • Retention levels

  • Immutability duration

  • Encryption requirements

  • Total number of desired copies

  • Network bandwidth

  • Performance/throughput expectations

  • Concurrent read/write I/O to the source and destination storage

Definition of a data flow

A data flow refers to the paths the data takes between copies. For example, with optimized duplication or replication over OpenStorage (OST), deduplicated data on the source is cached, and then only the unique segments are transferred to the target. With most workload types, and with deduplication rates in the 90% range, this approach is very efficient and has little I/O impact.

A second example is OST to tape. The data is rehydrated on the source storage server into its raw native format, transferred, and then written to tape storage. This data flow is much more I/O-intensive than optimized duplication or replication; it is best done as a final copy and as sparingly as possible. When an SLP is configured, the following items are defined: the number of copies, the retention of each copy, the storage target for each copy, and the order in which each copy is made. The creation of these copies can be scheduled, but it is managed automatically by the nbstserv service in NetBackup.
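
The I/O difference between these two data flows can be illustrated with a simple back-of-the-envelope model. This is an illustrative sketch only, not how NetBackup internally accounts for segments:

```python
def bytes_transferred(logical_bytes, dedup_rate, optimized):
    """Rough model of data moved by an SLP secondary operation.

    optimized=True  -> OST optimized duplication/replication: only the
                       unique (non-deduplicated) segments cross the wire.
    optimized=False -> rehydration (e.g., OST to tape): the full logical
                       image is reconstructed and transferred.
    """
    if optimized:
        return round(logical_bytes * (1 - dedup_rate))
    return logical_bytes

TB = 1024 ** 4
image = 10 * TB

# At a 90% deduplication rate, optimized duplication moves roughly
# 1 TB for a 10 TB image, while rehydration to tape moves all 10 TB.
opt = bytes_transferred(image, 0.90, optimized=True)
tape = bytes_transferred(image, 0.90, optimized=False)
print(f"optimized: {opt / TB:.1f} TB, rehydrated: {tape / TB:.1f} TB")
```

The same arithmetic explains why rehydrating duplications are best kept to the final copy: every additional rehydrated copy multiplies the full logical size across the data path.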

Types of storage targets

A storage target can be located on a BYO (build-your-own) server, a Veritas appliance, a third-party OST target, a private or public cloud storage target, or tape. When choosing a storage target, it is important to choose the most performant one available so that the primary backup copy completes as quickly as possible. High-performance storage targets, such as MSDP residing on storage hardware that provides the highest levels of speed and resiliency, should be prioritized for the primary backup copy.

It is also important to determine what performance features are supported with the storage target, like deduplication, compression, and I/O bandwidth (that is, the speed of each spindle and number of spindles). When using MSDP as your primary storage target for backup, replications and duplications to MSDP and MSDP-C leverage optimized duplication, which minimizes storage utilization and optimizes duplication speed.

Creating an SLP that copies data between dissimilar OST devices (such as MSDP to Data Domain) is not recommended. Rehydration and an additional deduplication cycle create bottlenecks that significantly reduce performance and also reverse the benefits of other features. For example, a third-party OST storage target uses a different deduplication engine than MSDP/MSDP-C. Therefore, copying a backup image between MSDP/MSDP-C and a third-party OST storage target requires full backup image rehydration followed by full image deduplication. These steps take longer to complete and consume more memory, CPU, network, and I/O resources than the optimized duplication feature requires. In short, once a backup image is deduplicated, subsequent copies made to different storage targets do not require rehydration and another deduplication cycle as long as the OST devices are similar (MSDP to MSDP-C, Data Domain to Data Domain, and so on).

Tape may be chosen for long-term retention (LTR) copies. Although NetBackup supports duplication from OST to tape, a full rehydration of the data is required. This operation is resource intensive and significantly reduces performance when done at scale.

When using public cloud storage, it is important to consider both the performance costs and the financial impacts. Performance factors include the speed, resiliency, and sizing of the storage tier chosen for the copy, as well as the network bandwidth between the source and target storage. Financial factors are equally important: deduplication rates, total bucket size, retention targets, and the cost of warming and restoring the data all affect the bill. Optimized duplication reduces the total data transferred compared to traditional cloud storage (cloud without deduplication).
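
As a rough illustration of the financial side, the following sketch compares the stored capacity and monthly cost of a deduplicated cloud copy against a traditional (non-deduplicating) one. The per-GB price is purely hypothetical; substitute your provider's actual rates and add transfer and retrieval charges as applicable:

```python
def monthly_cloud_cost(logical_tb, dedup_rate, price_per_gb_month):
    """Estimate stored capacity (GB) and monthly storage cost for a
    cloud copy. With a deduplicating cloud tier (e.g., MSDP-C), only
    unique segments are stored; a traditional cloud target stores the
    full logical size. Illustrative model only."""
    stored_gb = logical_tb * 1024 * (1 - dedup_rate)
    return stored_gb, stored_gb * price_per_gb_month

# 100 TB protected, 90% dedup, hypothetical $0.02/GB-month object storage.
dedup_gb, dedup_cost = monthly_cloud_cost(100, 0.90, 0.02)
plain_gb, plain_cost = monthly_cloud_cost(100, 0.0, 0.02)
print(f"deduplicated: {dedup_gb:,.0f} GB -> ${dedup_cost:,.2f}/month")
print(f"traditional:  {plain_gb:,.0f} GB -> ${plain_cost:,.2f}/month")
```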

The use of storage unit groups (STUGs) is not recommended with SLP other than with tape storage units. There are two main reasons for this:

  • NetBackup must evaluate the sum of all the streams of the disk pools, and the sum of all active jobs using all the storage units for those disk pools, and then add or subtract those from the SLP workgroups. This is very resource intensive at scale and leads to significant delays in job submission and excessive queueing, which impacts backup operations as well. This is less significant for tape because the number of concurrent jobs written to a tape storage unit is relatively low (typically dozens, potentially 100) compared to the number of concurrent jobs to a disk storage server (typically hundreds, potentially thousands).

  • Optimized duplication is not supported for disk storage units in a STUG, even when the storage device types are similar. Where a STUG is used, the data is rehydrated, transferred in full, and then deduplicated again at the target, incurring all of the negative impacts of doing so. In addition, immutability is not supported with STUG configurations, with or without SLP.

Sizing a storage target

When designing an SLP configuration, it is important to properly size the storage for the backup copy so that most restore operations are done from it, for the best performance. From a resource perspective, it is more expensive to restore from secondary copies such as cloud or tape than from a local or on-premises copy. Longer-retention copies require more space, which needs to be considered when sizing secondary copy storage.

Number of copies and retention

When determining the number of copies of each backup image to retain, it is best not to apply a single universal approach to all data. For example, less critical images such as QA/dev backups should not require as many copies, or be retained as long, as critical production images. Treating them the same leads to extraneous copies and unnecessary space and network consumption, among other impacts. Consider prioritizing or tiering the data into separate SLPs to remove this inefficiency.

SLP operations should cascade their retentions: copy 1 has the shortest retention on high-performance storage, copy 2 has a longer retention than copy 1, and copy 3 the longest. Creating more than three copies with SLP in a large configuration is not a best practice, because the resource overhead needed to create and track a large volume of additional copies compounds quickly from a database perspective.
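
A rough capacity estimate for each cascaded copy can help validate such a design. The sketch below uses illustrative numbers; the daily backup volume, retention periods, and per-target deduplication rates are assumptions, not recommendations:

```python
def copy_capacity_tb(daily_backup_tb, retention_days, dedup_rate=0.0):
    """Rough raw-capacity estimate for one SLP copy: what is written per
    day, times how long each image is retained, reduced by the target's
    deduplication rate. Ignores change rate and compression."""
    return daily_backup_tb * retention_days * (1 - dedup_rate)

# Cascaded retentions: copy 1 shortest (fast MSDP), copy 3 longest (LTR).
copies = [
    ("copy1-msdp",  14, 0.90),   # 2 weeks on fast dedup storage
    ("copy2-cloud", 90, 0.90),   # 3 months, optimized duplication
    ("copy3-tape", 365, 0.0),    # 1 year LTR, rehydrated (no dedup)
]
for name, days, dedup in copies:
    print(f"{name}: ~{copy_capacity_tb(5.0, days, dedup):.0f} TB")
```

Note how the rehydrated LTR copy dominates raw capacity even though it is the least frequently accessed; this is the trade-off the cascading design accepts.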

Immutability objectives

Immutability objectives should be considered and implemented so that the WORM lock duration increases with each subsequent copy, similar to backup retention. If copy 1 is locked for too long, it may cause capacity management problems and impact backup operations, because data cannot easily be deleted to make space for new backups. If an SLP backlog starts to build, it may also delay the expiration and cleanup of WORM images.

Data characteristics

Data characteristics are attributes of specific data, such as segment object size, encryption (MSDP, third-party, KMS, etc.), compression, databases with stream handlers, and VMs with stream handlers. These characteristics are often most critical for backup copies.

How backup copies are created impacts secondary copies. Take the following scenarios for example:

  • Images with many times more segments (for example, from a database stream handler) have much larger caches, require more memory, and may take longer to process if their cache limit is exceeded.

  • Data that is compressed or encrypted before the backup is taken deduplicates poorly, often at a rate near zero. It is best to use native features such as Data in-Transit Encryption (DTE), MSDP encryption, KMS, and Client Direct, so that NetBackup deduplicates the data first and then encrypts it.

  • Individual large images (for example, over 10 TB) may require much larger cache sizes and take significantly longer to restore, duplicate, or rehydrate. Increasing the cache for these images increases memory use on the storage servers involved, which must be considered when designing an SLP.

  • If third-party encryption must be used on the source data, consider not deduplicating it and instead writing to non-OST storage to avoid the performance impacts.

Failing to consider the data characteristics in an SLP configuration often leads to poor performance, large SLP backlogs, and capacity management problems. The simplest configurations are often the best performing and most scalable. Carefully consider the end-to-end requirements for all copies, and the resources each requires, to avoid large-scale problems in an environment.

Network bandwidth

When designing an SLP configuration, consider the network path that the data will traverse. Keep in mind that the ideal configuration has copy 1 designed for the fastest backup and restore and shortest retentions, where copies 2+ are longer retention copies.

Traversing a slow or unstable network, excessive rehydration, shared network links between sites or applications, and other external factors such as bandwidth limits can all have significant impacts on the SLP configuration.

Order of storage target priority

As a best practice, the fastest and most resilient backup storage should be targeted as the primary location for the initial backup and for most of the anticipated restores. Each additional copy should have the same or an increasing retention period, with the long-term retention (LTR) copy being the final one. LTR copies often reside on cloud, tape, or Access appliances.

SLP tuning

NetBackup provides a number of tunable parameters for SLP processing, such as:

  • SLP windows to defer or schedule secondary operations.

  • Window close behaviors that cancel or stop submitting jobs after a certain time.

  • Batching logic:

    • FIFO: First In, First Out

    • LIFO: Last In, First Out

  • Minimum and maximum batch sizes

  • Resource multipliers (which control how many jobs can be submitted simultaneously)
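
The batching behavior and batch-size limits above can be visualized with a simplified model. This is an illustrative sketch of size-based batching only, not the actual nbstserv algorithm:

```python
def batch_images(image_sizes_gb, min_gb, max_gb, order="FIFO"):
    """Simplified sketch of SLP batching: accumulate queued images into
    duplication jobs, closing a batch once it reaches max_gb. A trailing
    batch below min_gb would normally wait for more images to arrive;
    here it is returned as-is for illustration."""
    queue = list(image_sizes_gb) if order == "FIFO" else list(reversed(image_sizes_gb))
    batches, current, size = [], [], 0
    for img in queue:
        current.append(img)
        size += img
        if size >= max_gb:
            batches.append(current)
            current, size = [], 0
    if current:
        batches.append(current)  # may be under min_gb; would wait in practice
    return batches

# Six queued images (GB), 100 GB maximum batch size, FIFO order.
print(batch_images([40, 30, 50, 20, 60, 10], min_gb=8, max_gb=100))
```

Switching the order to LIFO changes which images are duplicated first, which matters when a backlog builds: FIFO drains the oldest images, while LIFO keeps the most recent copies current at the expense of the backlog.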

Additional parameters are outlined in the NetBackup Administrator's Guide, Volume I for the desired NetBackup version.
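
For example, on recent NetBackup versions, SLP batch-size parameters can be adjusted on the primary server with nbsetconfig. The parameter names below are as documented for SLP tuning, but the values shown are examples only; confirm the names and defaults in the Administrator's Guide for your version before applying:

```shell
# Sketch: adjusting SLP batching on the primary server (example values).
echo "SLP.MAX_SIZE_PER_DUPLICATION_JOB = 100 GB" | nbsetconfig
echo "SLP.MIN_SIZE_PER_DUPLICATION_JOB = 8 GB"   | nbsetconfig
echo "SLP.JOB_SUBMISSION_INTERVAL = 5 minutes"   | nbsetconfig

# Review the effective SLP settings:
nbgetconfig | grep "^SLP\."
```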

Key points

The entire environment must be considered to design an efficient, sustainable SLP configuration. Think end to end about what is required to create each copy, the path the data must take, the resources required for each copy, and the impacts of creating it.

Broadly speaking, the best practices include:

  • Tier the data to protect the most critical first and the less critical second.

  • The simplest configurations are usually the best performing ones.

  • When using OST, do not use dissimilar devices in the same SLP.

  • Avoid rehydration and duplication to tape as much as possible.

  • Use native NetBackup features for encryption, compression, stream handlers, and so on, in preference to third-party features when possible.

  • Create copy 1 on the fastest storage available, and size it to satisfy most of the anticipated resource requests.

  • Create subsequent copies with the same or longer retention in sequence where the final copy has the longest retention in the SLP.