NetBackup™ Backup Planning and Performance Tuning Guide
- NetBackup capacity planning
- Primary server configuration guidelines
- Media server configuration guidelines
- NetBackup hardware design and tuning considerations
- About NetBackup Media Server Deduplication (MSDP)
- MSDP tuning considerations
- MSDP sizing considerations
- Accelerator performance considerations
- Media configuration guidelines
- How to identify performance bottlenecks
- Best practices
- Best practices: NetBackup AdvancedDisk
- Best practices: NetBackup tape drive cleaning
- Best practices: Universal shares
- NetBackup for VMware sizing and best practices
- Best practices: Storage lifecycle policies (SLPs)
- Measuring Performance
- Table of NetBackup All Log Entries report
- Evaluating system components
- Tuning the NetBackup data transfer path
- NetBackup network performance in the data transfer path
- NetBackup server performance in the data transfer path
- About shared memory (number and size of data buffers)
- About the communication between NetBackup client and media server
- Effect of fragment size on NetBackup restores
- Other NetBackup restore performance issues
- Tuning other NetBackup components
- How to improve NetBackup resource allocation
- How to improve FlashBackup performance
- Tuning disk I/O performance
Proper mindset for performance issue RCA
It is said that troubleshooting a performance issue is like looking for a needle in a haystack. The problem is vague and unstructured; moreover, it can be anywhere in the product and can stem from both the H/W components and the software stack. Most non-performance engineers struggle with where to start troubleshooting, and many dive straight into the area of their own expertise. For example, a file system expert starts at the file system component, while a network engineer may start investigating the network layer. The mindset detailed in this section provides a structured approach to guide the resolution of an otherwise unstructured problem.
Following these guidelines makes it easier to find an entry point from which to start drilling down into a performance issue.
Build a block-level understanding of both the hardware and the software components. Understanding the process flow helps narrow down the problem area.
Drill down into the issue systematically - top down and outside in, like peeling an onion. Always start by confirming that the system has enough H/W resource bandwidth to handle the workload before jumping straight into application tuning.
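The following is a minimal sketch of such a headroom check across the four major H/W resources, assuming Python with the third-party psutil package is available on the server; the utilization limits are illustrative assumptions, not NetBackup recommendations.

```python
# Quick H/W headroom check: a hypothetical helper, not a NetBackup tool.
# Assumes the third-party psutil package is installed (pip install psutil).
import psutil

def headroom_check(cpu_limit=85.0, mem_limit=90.0):
    """Print current utilization of CPU, memory, disk I/O, and network."""
    cpu = psutil.cpu_percent(interval=1)      # % CPU busy over a 1-second sample
    mem = psutil.virtual_memory().percent     # % physical memory in use
    disk = psutil.disk_io_counters()          # cumulative disk read/write counters
    net = psutil.net_io_counters()            # cumulative network byte counters

    print(f"CPU busy:      {cpu:5.1f}%  (illustrative limit {cpu_limit}%)")
    print(f"Memory in use: {mem:5.1f}%  (illustrative limit {mem_limit}%)")
    print(f"Disk I/O:      {disk.read_bytes} bytes read, {disk.write_bytes} bytes written")
    print(f"Network:       {net.bytes_recv} bytes in, {net.bytes_sent} bytes out")

    if cpu < cpu_limit and mem < mem_limit:
        print("CPU and memory appear to have headroom; inspect I/O and network trends next.")
    else:
        print("A H/W resource is already saturated; resolve that before tuning the application.")

if __name__ == "__main__":
    headroom_check()
```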
Tailor the tuning for each customer if necessary. Tuning that works for one customer may not work for another, because differences in workload create different tuning needs. Do not blindly apply a known tuning to another system unless the root cause is the same.
Be meticulous in data collection. Troubleshooting performance problems is an iterative process; as one bottleneck is resolved, a new one may emerge. Therefore, automating data collection to ensure consistency throughout the RCA process is critical for efficient problem resolution. In addition, avoid adding jobs or allowing unrelated jobs to run on the system while data collection is in progress.
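As a sketch of what such automation might look like (again assuming Python and psutil; the sample interval, sample count, and output file name are arbitrary choices), the following loop records the same set of counters at fixed intervals so every iteration of the RCA works from consistently collected data.

```python
# Hypothetical periodic collector: samples the four major resources at a fixed
# interval and appends timestamped rows to a CSV file for later comparison.
# Assumes psutil is installed; file name and interval are arbitrary choices.
import csv
import time
import psutil

def collect(outfile="perf_samples.csv", interval=30, samples=120):
    fields = ["timestamp", "cpu_pct", "mem_pct",
              "disk_read_bytes", "disk_write_bytes",
              "net_recv_bytes", "net_sent_bytes"]
    with open(outfile, "a", newline="") as f:
        writer = csv.writer(f)
        if f.tell() == 0:                      # write the header only for a new file
            writer.writerow(fields)
        for _ in range(samples):
            disk = psutil.disk_io_counters()
            net = psutil.net_io_counters()
            writer.writerow([
                time.strftime("%Y-%m-%d %H:%M:%S"),
                psutil.cpu_percent(interval=1),
                psutil.virtual_memory().percent,
                disk.read_bytes, disk.write_bytes,
                net.bytes_recv, net.bytes_sent,
            ])
            f.flush()                          # keep data even if the run is cut short
            time.sleep(interval)

if __name__ == "__main__":
    collect()
```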
Remain relentless in RCA. Do not attempt to tune the system until a root cause is identified. Without knowing the root cause, tuning becomes trial and error, which is time-consuming and risky. Incorrect tuning can destabilize the system and result in further performance degradation.
Keep a laser focus on the four major resources: CPU, memory, I/O, and network. All performance issues manifest themselves in one or more of these four H/W resources. By focusing on the usage patterns of the four major resources, you can quickly identify an entry point for the iterative RCA. Look for patterns that defy common sense or the norm. For example, higher throughput generally consumes more CPU cycles; if throughput decreases while CPU usage increases or remains the same, your entry point should be the CPU, and you may want to look for processes that consume more CPU. Another example is when throughput has plateaued but disk queue length keeps increasing. This indicates an I/O subsystem bottleneck, and the entry point for RCA should be the I/O code path.
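The sketch below (plain Python; the metric names and tolerance values are hypothetical, chosen only to illustrate the two patterns just described) shows this kind of check: given two samples of the same counters, it points at the CPU when throughput drops while CPU does not, and at the I/O path when throughput is flat but disk queue length keeps growing.

```python
# Hypothetical entry-point heuristic based on the usage patterns described above.
# The metric names and the tolerance value are illustrative assumptions.

def suggest_entry_point(previous, current, flat_tolerance=0.05):
    """Compare two samples (dicts of metrics) and suggest where to start the RCA."""
    hints = []

    tput_change = (current["throughput_mb_s"] - previous["throughput_mb_s"]) / previous["throughput_mb_s"]
    cpu_change = current["cpu_pct"] - previous["cpu_pct"]

    # Throughput down while CPU usage is flat or rising: start with the CPU.
    if tput_change < -flat_tolerance and cpu_change >= 0:
        hints.append("Throughput fell but CPU did not: start at the CPU; "
                     "look for processes consuming more cycles.")

    # Throughput flat while the disk queue keeps growing: start with the I/O path.
    if abs(tput_change) <= flat_tolerance and current["disk_queue_len"] > previous["disk_queue_len"]:
        hints.append("Throughput plateaued while disk queue length grew: "
                     "start at the I/O code path.")

    return hints or ["No obvious anti-pattern in these two samples; keep collecting data."]

# Example with made-up numbers matching the first scenario in the text:
before = {"throughput_mb_s": 500, "cpu_pct": 60, "disk_queue_len": 5}
after = {"throughput_mb_s": 420, "cpu_pct": 65, "disk_queue_len": 5}
print(suggest_entry_point(before, after))
```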
Performance numbers, both throughput and performance statistics, are relative. A number is meaningless until you compare it with another number. For example, a disk queue length of 10 is meaningless until you compare it with a similar workload that has a queue length of 5. That is why it is important to keep a set of performance data from when the system is running normally, and to collect the same kind of data for comparison when a performance problem occurs. Having a set of baseline numbers to compare against throughout the iterative process is key to successful problem resolution.
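A minimal sketch of such a baseline comparison (plain Python; the metric names and the 1.5x deviation threshold are assumptions for illustration) might look like this.

```python
# Hypothetical baseline comparison: a raw number only becomes meaningful
# relative to the same metric collected when the system was running normally.

def compare_to_baseline(baseline, current, deviation=1.5):
    """Flag metrics whose current value deviates from baseline by more than 'deviation' x."""
    for name, base_value in baseline.items():
        cur_value = current.get(name)
        if cur_value is None or base_value == 0:
            continue
        ratio = cur_value / base_value
        flag = "  <-- investigate" if ratio >= deviation or ratio <= 1.0 / deviation else ""
        print(f"{name:20s} baseline={base_value:<8g} current={cur_value:<8g} ratio={ratio:4.2f}x{flag}")

# Example: a disk queue length of 10 only stands out next to a baseline of 5.
baseline = {"disk_queue_len": 5, "cpu_pct": 55, "throughput_mb_s": 480}
current = {"disk_queue_len": 10, "cpu_pct": 58, "throughput_mb_s": 300}
compare_to_baseline(baseline, current)
```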
Identify changes in the environment, such as newly implemented security requirements, changes in workloads or applications, hardware or network infrastructure changes, and increases in the size of data in the workloads.