How to use the GEN_DATA file list directives with NetBackup for UNIX/Linux Clients for Performance Tuning
Problem
GEN_DATA functionality can be utilized on all supported NetBackup Linux and UNIX clients. This functionality has been available in NetBackup since version 6.5.4.
Solution
A need was identified to provide a means of generating test data to process through NetBackup. This data should be:
- Repeatable and controllable.
- As 'light-weight' as possible during generation.
- Indistinguishable from regular data, to allow for further processing, such as duplications, verifies, restores, etc.
- These directives can be used to create data in any profile that is desired, with little to no impact on the client machine.
- The client's network will be impacted, just like a regular backup, unless the client is also the media server.
- The images that are created are standard images, and can be verified, imported, duplicated and restored.
- During backup a checksum is embedded in the data stream by bpbkar. This value is used at restore time to provide an internal data integrity check of the data path from bpbkar to storage and back to tar. If a mismatch occurs, the restore progress log will provide details. Please note that any mismatch will also terminate the restore stream without progressing to see if later portions of the stream are also corrupt.
- UNIX/Linux clients only, in a Standard NetBackup policy.
- Client encryption may be used, but not client compression.
- Since no actual data is being used for the backup, restores will not produce any files.
- This will generate real images, using up storage space, and should be dealt with accordingly, i.e. expired, removed, etc.
Implementation:
In the file list of a Standard policy, the following tags are implemented, with the defaults shown:
NEW_STREAM
GEN_DATA
GEN_KBSIZE=100
GEN_MAXFILES=100
GEN_PERCENT_RANDOM=50
NEW_STREAM
GEN_DATA
GEN_KBSIZE=100
GEN_MAXFILES=100
GEN_PERCENT_RANDOM=60
GEN_FILENAME_OFFSET=100
NEW_STREAM
...
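When many streams are needed, the file list can be produced by a script instead of being typed by hand. The following is a minimal sketch, not a NetBackup tool: the function name render_file_list and the dict-per-stream input are assumptions made for illustration, and only the directive names come from this article.

```python
def render_file_list(streams):
    """Return file-list text for one or more GEN_DATA streams.

    Each stream is a dict mapping a directive name (for example
    "GEN_KBSIZE") to its value. NEW_STREAM and GEN_DATA are emitted
    automatically at the start of every stream.
    """
    lines = []
    for directives in streams:
        lines.append("NEW_STREAM")
        lines.append("GEN_DATA")
        for name, value in directives.items():
            lines.append(f"{name}={value}")
    return "\n".join(lines)


# The two streams shown in the file list above.
streams = [
    {"GEN_KBSIZE": 100, "GEN_MAXFILES": 100, "GEN_PERCENT_RANDOM": 50},
    {"GEN_KBSIZE": 100, "GEN_MAXFILES": 100, "GEN_PERCENT_RANDOM": 60,
     "GEN_FILENAME_OFFSET": 100},
]
print(render_file_list(streams))
```

The printed text can be pasted into the policy file list as-is.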
/usr/openv/netbackup/bin/bpbackup -p GEN_Test -s User -w GEN_DATA GEN_KBSIZE=2000000 GEN_MAXFILES=10 GEN_PERCENT_RANDOM=100
This will generate a backup containing 10 files, each approximately 2 GB in size (GEN_KBSIZE=2000000), with all random data.
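Before running a large test it helps to know how much storage the image will consume. The sketch below multiplies GEN_KBSIZE by GEN_MAXFILES, falling back to the defaults of 100 listed in the file list above. The helper name expected_total_kb is hypothetical, and the result ignores any per-file or image overhead that NetBackup may add.

```python
# Defaults taken from the file list shown in this article.
DEFAULTS = {"GEN_KBSIZE": 100, "GEN_MAXFILES": 100}


def expected_total_kb(directives):
    """Estimate the generated data size in KB from a directive string."""
    settings = dict(DEFAULTS)
    for token in directives.split():
        name, sep, value = token.partition("=")
        # Only the two size-related directives affect the estimate.
        if sep and name in settings:
            settings[name] = int(value)
    return settings["GEN_KBSIZE"] * settings["GEN_MAXFILES"]


total = expected_total_kb(
    "GEN_DATA GEN_KBSIZE=2000000 GEN_MAXFILES=10 GEN_PERCENT_RANDOM=100"
)
print(total)  # 20000000
```

For the command above, the estimate is 20000000 KB, which is the same figure that the bptm log line in the Device Characterization example reports for a run with the same GEN_KBSIZE and GEN_MAXFILES values.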
To create a User backup policy, please use the Backup Policy Wizard from Action-New from the Administration console:
1) Select the following options: Backup Type: User Backup; Start Window: All day
2) Backup Selections: <leave blank>
3) Client: Add the name of the client where the command line interface will be run.
Deduplication directives for GEN_DATA
Deduplication engines require special controls to allow the user to control the amount of the generated data that can be deduplicated. Some knowledge of the capabilities of the specific engine is required to gain control of the results. This functionality has been available in NetBackup since version 7.0.
Consideration of the engine's compression capabilities is also required (by default, PDDO does not compress; DD appears to compress on a 128 KB page). The "GEN_SIMPLE_DEDUP" control is useful if the engine does NOT perform compression on the data stream.
GEN_DEDUP_KBSTRIDE must be set to match the engine's deduplication block size (for reference, PDDO uses 128 KB as the default and DD appears to use 8 KB).
The Deduplication directives are:
"GEN_RANDOM_SEED=seed"
"GEN_DEDUP_KBSTRIDE=kb"
"GEN_SIMPLE_DEDUP"
"GEN_PERCENT_DEDUP=%%%"
seed is the 32 bit seed value for the random number generator. Provide a value ONLY if you wish to create the same stream of data from run to run. The default seed is the PID of the bpbkar process and the backup time (48 bit seed), which will be unique in a single NBU domain, and highly likely to be unique between domains.
kb is the size of the "stride" (in KB) to apply when making small changes to the data stream. The default is 64KB. It will have no effect unless GEN_PERCENT_DEDUP < 100%. The correct value of stride is dependent on the deduplication engine and can be 1, 2, 4, 8, 16, 32, or any multiple of 64.
%%% is the percentage of the data that should be deduplicatable. The default is 100%. When set to 0%, the data is modified in every stride; when set between 0% and 100%, the data is modified in only some strides.
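The stride rule above (1, 2, 4, 8, 16, 32, or any multiple of 64) is easy to get wrong, so a quick check before editing the policy can help. This is a small sketch of that rule only; the function name is_valid_stride is an assumption and is not part of NetBackup.

```python
def is_valid_stride(kb):
    """Return True if kb is an accepted GEN_DEDUP_KBSTRIDE value.

    Accepted values per this article: 1, 2, 4, 8, 16, 32,
    or any positive multiple of 64.
    """
    return kb in (1, 2, 4, 8, 16, 32) or (kb > 0 and kb % 64 == 0)


# 128 matches the PDDO default block size; 8 is the value DD appears to use.
for candidate in (8, 48, 64, 128):
    print(candidate, is_valid_stride(candidate))
```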
Operational notes
- Backups will not fail on misspelled or incorrectly formatted directives.
- The bpbkar log on the client will log the directives that are being used.
- The Activity Monitor will show if the number of files and the total image size are as requested.
- The amount of data written vs. the capacity of the cartridge will show if compression settings (GEN_PERCENT_RANDOM) are working as expected.
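Because a backup does not fail on a misspelled directive, a typo can silently produce a different data profile than intended. The sketch below checks a file list against the directive names used in this article before it is pasted into a policy. It is an illustration, not a NetBackup utility: the name lint_file_list is hypothetical, and the rules that values are non-negative integers and that percentages stay within 0 to 100 are assumptions.

```python
# Directive names as they appear in this article.
FLAGS = {"NEW_STREAM", "GEN_DATA", "GEN_SIMPLE_DEDUP"}
VALUED = {
    "GEN_KBSIZE", "GEN_MAXFILES", "GEN_PERCENT_RANDOM",
    "GEN_FILENAME_OFFSET", "GEN_RANDOM_SEED",
    "GEN_DEDUP_KBSTRIDE", "GEN_PERCENT_DEDUP",
}


def lint_file_list(text):
    """Return a list of human-readable problems found in a file list."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for token in line.split():
            name, sep, value = token.partition("=")
            if not sep:
                if name not in FLAGS:
                    problems.append(f"line {lineno}: unknown or value-less directive {name}")
            elif name not in VALUED:
                problems.append(f"line {lineno}: unknown directive {name}")
            elif not value.isdigit():
                # Assumption: values are non-negative integers.
                problems.append(f"line {lineno}: non-integer value for {name}")
            elif name.startswith("GEN_PERCENT_") and int(value) > 100:
                # Assumption: percentages must not exceed 100.
                problems.append(f"line {lineno}: {name} exceeds 100")
    return problems


bad = "GEN_DATA\nGEN_KBSIZ=64\nGEN_PERCENT_RANDOM=150\nGEN_MAXFILES=ten"
for problem in lint_file_list(bad):
    print(problem)
```

The bpbkar log on the client remains the authoritative record of which directives were actually applied.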
Usage examples
Device Characterization
Q: What effect do block size and compressibility of data have on throughput?
/usr/openv/netbackup/bin/bpbackup -w -p GEN_Test -s User GEN_DATA GEN_KBSIZE=2000000 GEN_MAXFILES=10 GEN_PERCENT_RANDOM=10
15:40:28.865 [1069] <4> write_backup: successfully wrote backup id client_1252183197, copy 1, fragment 1, 20000000 Kbytes at 125223.722 Kbytes/sec
Buf_size,.1,.2,.3,.4,.5,.6,.7,.8,.9,1
64,125223.722,123680.141,122988.908,122870.458,110717.241,94962.561,85023.739,77593.657,74000.224,75353.780
128,150592.237,150230.548,145309.733,127522.788,107549.175,94796.926,86482.846,75955.237,73912.171,75954.949
256,168157.583,168161.816,159579.303,124068.494,110119.856,94483.835,84651.689,77347.724,73740.928,75162.581
512,174339.525,169284.475,160540.669,126791.278,109265.862,92515.352,84086.920,76899.015,73046.839,74681.263
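The table above can be summarized programmatically. The sketch below reads it as CSV and reports which buffer size gave the highest rate for each column. It assumes that the first column is the buffer size, that the remaining column labels are the fraction of random data (GEN_PERCENT_RANDOM divided by 100), and that the cells are Kbytes/sec; these readings are inferred from the fact that the 64/.1 cell equals the rate in the log line above and are not stated in the table itself.

```python
import csv
import io

TABLE = """Buf_size,.1,.2,.3,.4,.5,.6,.7,.8,.9,1
64,125223.722,123680.141,122988.908,122870.458,110717.241,94962.561,85023.739,77593.657,74000.224,75353.780
128,150592.237,150230.548,145309.733,127522.788,107549.175,94796.926,86482.846,75955.237,73912.171,75954.949
256,168157.583,168161.816,159579.303,124068.494,110119.856,94483.835,84651.689,77347.724,73740.928,75162.581
512,174339.525,169284.475,160540.669,126791.278,109265.862,92515.352,84086.920,76899.015,73046.839,74681.263
"""


def best_buffer_per_column(text):
    """Map each column label to (buffer size, rate) of its fastest row."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    best = {}
    for col, label in enumerate(header[1:], start=1):
        winner = max(data, key=lambda row: float(row[col]))
        best[label] = (int(winner[0]), float(winner[col]))
    return best


for label, (buf_size, rate) in best_buffer_per_column(TABLE).items():
    print(f"random fraction {label}: buffer {buf_size} at {rate} Kbytes/sec")
```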
System Tuning
Create a test policy with full/incremental schedules, add the clients to be tested, and add the GEN_DATA directives to the file list of the policy. Run the policy as often as desired, changing the system tuning parameters as described in the Veritas NetBackup Backup Planning and Performance Tuning Guide (TechPDF 281842, linked below) to see the effect those parameters have on backup performance.
Sample Examples:
1. Basic policy containing this file list yields 100 files of 64 KB each with 2 to 1 compression:
GEN_DATA
GEN_MAXFILES=100
GEN_KBSIZE=64
GEN_PERCENT_RANDOM=50
2. Policy with two streams yields 100 files of 64 KB each with 2 to 1 compression and 200 files of approximately 1 GB each with no compression:
NEW_STREAM
GEN_DATA
GEN_MAXFILES=100
GEN_KBSIZE=64
GEN_PERCENT_RANDOM=50
NEW_STREAM
GEN_DATA
GEN_MAXFILES=200
GEN_KBSIZE=1000000
GEN_PERCENT_RANDOM=100
3. Policy when targeting PDDO/PDDE storage with their default segment size (128KB):
GEN_DATA
GEN_KBSIZE=1024
GEN_MAXFILES=60
GEN_PERCENT_DEDUP=80
GEN_DEDUP_KBSTRIDE=128
GEN_SIMPLE_DEDUP
GEN_PERCENT_RANDOM=100
When backed up, this file list yields data that generates a message like the following in the NBU job details and bptm logs:
08/12/2009 17:47:58 - Info server (pid=1613) StorageServer=PureDisk:test; Report=PDDO Stats for (test): scanned: 61475 KB, stream rate: 29.71 MB/sec, CR sent: 13092 KB, dedup: 78.7%, cache hits: 0 (0.0%)
When a backup or duplication job completes, the dedup value is provided in the 'Deduplication Rate' column of the NBU job list. This column is disabled by default in the Windows GUI.
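To compare the achieved rate with the requested GEN_PERCENT_DEDUP across many test runs, the PDDO statistics line can be parsed from the job details. The sketch below extracts the scanned and CR sent sizes from the sample message above and recomputes the rate as 1 minus CR sent divided by scanned. That formula is an assumption that happens to reproduce the reported 78.7% for this sample; it is not taken from NetBackup documentation, and the function name parse_pddo_stats is hypothetical.

```python
import re

LINE = (
    "08/12/2009 17:47:58 - Info server (pid=1613) StorageServer=PureDisk:test; "
    "Report=PDDO Stats for (test): scanned: 61475 KB, stream rate: 29.71 MB/sec, "
    "CR sent: 13092 KB, dedup: 78.7%, cache hits: 0 (0.0%)"
)

PATTERN = re.compile(r"scanned: (\d+) KB.*?CR sent: (\d+) KB.*?dedup: ([\d.]+)%")


def parse_pddo_stats(line):
    """Return sizes and dedup rates from a PDDO Stats line, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    scanned = int(match.group(1))
    sent = int(match.group(2))
    return {
        "scanned_kb": scanned,
        "cr_sent_kb": sent,
        "reported_dedup": float(match.group(3)),
        # Assumed formula: share of scanned data that was not sent.
        "computed_dedup": round(100 * (1 - sent / scanned), 1),
    }


print(parse_pddo_stats(LINE))
```

For this sample, the reported rate of 78.7% is close to the 80% requested with GEN_PERCENT_DEDUP=80.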