Problem
On Solaris 10, a ZFS ARC (Adaptive Replacement Cache) left at its default configuration can gradually degrade NetBackup performance at the memory level, forcing NetBackup to use a lot of swap even when several gigabytes of RAM appear to be "available."
In the following example from a Solaris 10 server, 61% of the memory is initially owned by ZFS File Data (the ARC):
# echo ::memstat | mdb -k
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                    1960930             15319   24%
ZFS File Data             5006389             39112   61%
Anon                       746499              5832    9%
Exec and libs               37006               289    0%
Page cache                  22838               178    0%
Free (cachelist)           342814              2678    4%
Free (freelist)            103593               809    1%
Total                     8220069             64219
Physical                  8214591             64176
Error Message
The ARChits.sh script, available at http://dtrace.org/blogs/brendan/2012/01/09/activity-of-the-zfs-arc/, can be used to see how often the operating system's memory requests hit the ARC. In this example, the hit rate reaches 100%:
# ./ARChits.sh
HITS MISSES HITRATE
2147483647 692982 99.99%
518 4 99.23%
2139 0 100.00%
2865 0 100.00%
727 0 100.00%
515 0 100.00%
700 0 100.00%
2032 0 100.00%
4529 0 100.00%
1040 0 100.00%
...
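The HITRATE column reported by ARChits.sh is simply the fraction of ARC lookups served from cache in each interval. A minimal sketch of that calculation (the function name is illustrative, not taken from the script):

```python
def arc_hit_rate(hits: int, misses: int) -> float:
    """Percentage of ARC lookups served from cache in one interval."""
    total = hits + misses
    if total == 0:
        return 0.0
    return 100.0 * hits / total

# Second interval above: 518 hits, 4 misses.
print(f"{arc_hit_rate(518, 4):.2f}%")  # 99.23%
```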
In effect, the ARC acts as a "middle man" between NetBackup and physical memory.
Cause
To identify which processes are hitting the ARC (or requesting memory from it), dtrace can count the hits and misses per process:
# dtrace -n 'sdt:zfs::arc-hit,sdt:zfs::arc-miss { @[execname] = count() }'
...
...
nbproxy 1099
nbpem 1447
nscd 1649
bpstsinfo 1785
find 1806
fsflush 2065
bpclntcmd 2257
bpcompatd 2394
perl 2945
bpimagelist 4019
bprd 4268
avrd 8899
grep 9249
dbsrv11 20782
bpdbm 37955
In the example above, dbsrv11 and bpdbm are the main consumers of ARC memory.
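Conceptually, the @[execname] = count() aggregation is just a per-process event counter: each time an arc-hit or arc-miss probe fires, one event is tallied under the name of the process on CPU. A minimal sketch (the sample process names are illustrative):

```python
from collections import Counter

# Each probe firing contributes one event tagged with the name of the
# process on CPU (dtrace's execname built-in variable).
events = ["bpdbm", "dbsrv11", "bpdbm", "grep", "bpdbm", "dbsrv11"]
by_process = Counter(events)
print(by_process.most_common(2))  # [('bpdbm', 3), ('dbsrv11', 2)]
```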
The next step is to examine the sizes of the memory requests, in order to measure the ARC's impact on NetBackup, since the ARC slices memory into small blocks.
# dtrace -n 'sdt:zfs::arc-hit,sdt:zfs::arc-miss { @["bytes"] = quantize(((arc_buf_hdr_t *)arg0)->b_size); }'
bytes
value ------------- Distribution ------------- count
256 | 0
512 |@@@@@ 10934
1024 | 1146
2048 | 467
4096 | 518
8192 |@@@@ 9485
16384 |@ 1506
32768 | 139
65536 | 356
131072 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 67561
262144 | 0
The majority of the memory requests are for 128KB (131072-byte) blocks. A few are very small; this is the pattern when there are no major requests at the NetBackup level.
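dtrace's quantize() action buckets values into power-of-two bins, which is how the distribution above is produced. A minimal Python sketch of the same bucketing (the helper is illustrative, not part of dtrace):

```python
from collections import Counter

def quantize(values):
    """Count values in power-of-two buckets, like dtrace's quantize()."""
    buckets = Counter()
    for v in values:
        b = 1
        while b * 2 <= v:   # largest power of two not exceeding v
            b *= 2
        buckets[b] += 1
    return dict(sorted(buckets.items()))

# Example: a mix of small requests and 128KB (131072-byte) requests.
print(quantize([600, 700, 9000, 131072, 131072]))
# {512: 2, 8192: 1, 131072: 2}
```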
Things change when many NetBackup requests arrive, suddenly raising the number of small-block requests. The following output comes from a master server running several vmquery commands while pulling data:
# dtrace -n 'sdt:zfs::arc-hit,sdt:zfs::arc-miss { @["bytes"] = quantize(((arc_buf_hdr_t *)arg0)->b_size); }'
bytes
value ------------- Distribution ------------- count
256 | 0
512 |@@@@@@@@@@@@ 78938
1024 |@ 7944
2048 | 1812
4096 |@ 3751
8192 |@@@@@@@@@@@@ 76053
16384 |@ 9030
32768 | 322
65536 | 992
131072 |@@@@@@@@@@@@ 77239
262144 | 0
Not only is vmquery dominating the memory requests; the operating system is also forced to "rehydrate" the memory into bigger blocks to meet NetBackup's block-size requirements, impacting application performance mainly at the NBDB/EMM database level.
# dtrace -n 'sdt:zfs::arc-hit,sdt:zfs::arc-miss { @[execname] = count() }'
...
...
avrd 1210
bpimagelist 2865
dbsrv11 2970
grep 4971
bpdbm 6662
vmquery 94161
This memory rehydration forces the operating system to use a lot of swap, even when a large amount of memory is nominally available under ZFS File Data:
# vmstat 1
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr s1 s2 s3 s4 in sy cs us sy id
0 0 0 19244016 11342680 432 1518 566 604 596 0 0 8 -687 8 -18 8484 30088 9210 10 5 84
0 2 0 11441128 3746680 44 51 8 23 23 0 0 0 0 0 0 6822 19737 7929 9 3 88
0 1 0 11436168 3745440 14 440 8 23 23 0 0 0 0 0 0 6460 18428 7038 9 4 87
0 2 0 11440808 3746856 6 0 15 170 155 0 0 0 0 0 0 6463 18163 6996 9 4 87
0 2 0 11440808 3747000 295 822 15 147 147 0 0 0 0 0 0 7604 27577 8989 11 5 84
0 1 0 11440552 3746872 122 683 8 70 70 0 0 0 0 0 0 5926 20430 6444 9 3 88
In this case, 39GB of RAM is allocated to ZFS File Data (the ARC). That memory is supposed to be released whenever an application needs it, but because the ARC slices memory into small pieces, the operating system takes a long time to reclaim it and respond to the application.
# echo ::memstat | mdb -k
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                    1960930             15319   24%
ZFS File Data             5006389             39112   61%
Anon                       746499              5832    9%
Exec and libs               37006               289    0%
Page cache                  22838               178    0%
Free (cachelist)           342814              2678    4%
Free (freelist)            103593               809    1%
Total                     8220069             64219
Physical                  8214591             64176
When the master is rebooted, there is initially no ZFS File Data allocation, so NetBackup will seem to run "perfectly" - but performance will degrade again over time, depending on how quickly the ARC "eats" the memory:
# echo ::memstat | mdb -k
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     479738              3747    6%
Anon                       422140              3297    5%
Exec and libs               45443               355    1%
Page cache                  83530               652    1%
Free (cachelist)          2200908             17194   27%
Free (freelist)           4988310             38971   61%
Total                     8220069             64219
Physical                  8214603             64176
Solution
To address this issue, limit the size of the ZFS ARC on each affected system. The following procedure can be used to determine the limit value.
Note: As with any changes of this nature, please bear in mind that the setting may have to be tweaked to accommodate additional load and/or memory changes. Monitor and adjust as needed.
1. After the system is fully loaded and running backups, sample the total memory use. Consider the following example:
# prstat -s size -a
NPROC USERNAME SWAP RSS MEMORY TIME CPU
32 sybase 96G 96G 75% 42:38:04 0.2%
72 root 367M 341M 0.3% 9:38:11 0.0%
6 daemon 7144K 9160K 0.0% 0:01:01 0.0%
1 smmsp 2048K 6144K 0.0% 0:00:22 0.0%
2. Compare the percentage of memory in use to the total physical memory:
# prtdiag | grep -i Memory
Memory size: 131072 Megabytes
3. In the above example, approximately 75% of the physical memory is used under typical load. Add a few percent of "headroom" - in this example, 80% will be used.
4. 100% - 80% = 20%. 20% of 128GB is 25.6GB; rounded up to 26GB, this is 27917287424 bytes. This is the new limit to specify for the cache.
5. Configure the new ZFS ARC limit in /etc/system:
set zfs:zfs_arc_max=27917287424
6. Reboot the system for the new value to take effect.
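The sizing arithmetic from steps 1-4 can be sketched as follows. This is a minimal sketch, not a Veritas-supplied tool; the function name and the round-up-to-a-whole-gibibyte choice (which reproduces the 26GB figure above) are this example's own:

```python
import math

GIB = 1024 ** 3  # one gibibyte in bytes

def zfs_arc_max(total_ram_gib, peak_usage_pct, headroom_pct=5):
    """ARC limit in bytes: RAM left over after reserving peak
    application usage plus headroom, rounded up to a whole GiB."""
    free_pct = 100 - (peak_usage_pct + headroom_pct)
    free_gib = math.ceil(total_ram_gib * free_pct / 100)
    return free_gib * GIB

# Example from the procedure: 128GB RAM, ~75% peak usage, 5% headroom,
# leaving 20% of RAM (rounded up to 26GiB) for the ARC.
print(zfs_arc_max(128, 75))  # 27917287424
```

The result is the value to place in /etc/system as set zfs:zfs_arc_max=27917287424.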
References:
High Memory Utilized by ZFS File Data
https://forums.oracle.com/thread/2340011
ZFS Evil Tuning Guide: Limiting the ARC Cache
https://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Limiting_the_ARC_Cache
Activity of the ZFS ARC
https://dtrace.org/blogs/brendan/2012/01/09/activity-of-the-zfs-arc/