Translation Notice
Please note that this content includes text that has been machine-translated from English. Veritas does not guarantee the accuracy or completeness of the translation. You may also refer to the English version of this knowledge base article for up-to-date information.
Infoscale 7.4.2 Component Patch on RHEL7
Abstract
Description
This patch provides a component patch for the InfoScale 7.4.2 Update on the RHEL7 platform.
Prerequisite:
This patch should be installed on IS7.4.2GA + 7.4.2.4900 + 7.4.2.5300 + 7.4.2.5500
SORT ID: 21724
Patch IDs:
VRTSvxfs-7.4.2.5400-RHEL7 for VRTSvxfs
SPECIAL NOTES:
- If internet access is not available, install this patch together with the latest CPI patch downloaded from the Download Center.
* * * READ ME * * *
* * * Veritas File System 7.4.2 * * *
* * * Patch 5400 * * *
Patch Date: 2025-01-08
This document provides the following information:
* PATCH NAME
* OPERATING SYSTEMS SUPPORTED BY THE PATCH
* PACKAGES AFFECTED BY THE PATCH
* BASE PRODUCT VERSIONS FOR THE PATCH
* SUMMARY OF INCIDENTS FIXED BY THE PATCH
* DETAILS OF INCIDENTS FIXED BY THE PATCH
* INSTALLATION PRE-REQUISITES
* INSTALLING THE PATCH
* REMOVING THE PATCH
* KNOWN ISSUES
PATCH NAME
----------
Veritas File System 7.4.2 Patch 5400
OPERATING SYSTEMS SUPPORTED BY THE PATCH
----------------------------------------
RHEL7 x86-64
PACKAGES AFFECTED BY THE PATCH
------------------------------
VRTSvxfs
BASE PRODUCT VERSIONS FOR THE PATCH
-----------------------------------
* InfoScale Enterprise 7.4.2
* InfoScale Foundation 7.4.2
* InfoScale Storage 7.4.2
SUMMARY OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
Patch ID: 7.4.2.5400
* 4187585 (4187617) Exception caught while accessing a directory which resulted in panic
Patch ID: 7.4.2.5300
* 4149891 (4145203) Invoking veki through systemctl inside vxfs-startup script.
* 4149898 (4095890) In Solaris, panic seen with changes related to delegation of FREE EAU to primary.
* 4149904 (4111385) Export of an FS from AIX (big-endian) to Linux (little-endian) using the fscdsconv command disables the FS.
* 4149906 (4085768) Unable to adjust the record of the rct inode at the indirect level.
* 4150621 (4103398) The IOs on filesystem get hung.
* 4155169 (4106777) Enabling ted parameters may cause test failures or assertions.
* 4155832 (4028534) Add VX_HOLE check while reading next extent after allocating partial requested length
* 4164517 (4155961) Panic in vx_rwlock during force unmount.
* 4164519 (4158381) Server panicked with "Kernel panic - not syncing: Fatal exception"
* 4165125 (4142555) Modify reference to running fsck -y in mount.vxfs error message and update fsck_vxfs manpage
* 4165175 (4163337) Intermittent df slowness seen across cluster.
* 4166277 (4076098) FS migration on Linux machines with falcon-sensor enabled might fail.
* 4184397 (4175488) DB2 thread hang seen while trying to acquire vx_rwsleep_rec lock.
Patch ID: 7.4.2.4800
* 4135149 (4129680) Generate and add changelog in VxFS rpm
* 4137139 (4126943) Create lost+found directory in VxFS file system with default ACL permissions as 700.
* 4140587 (4136235) Includes module parameter for changing pnlct merge frequency.
* 4140594 (4116887) Running fsck -y on large size metasave with lots of hardlinks is consuming huge amount of system memory.
* 4140599 (4132435) Failures seen in FSQA cmds->fsck tests, panic in get_dotdotlst
* 4140782 (4137040) System got hung.
* 4141124 (4034246) In case of race condition in cluster filesystem, link count table in VxFS might miss some metadata.
* 4141125 (4008980) In cluster filesystem, due to mismatch between size in Linux inode and VxFS inode, wrong file size may be reported.
Patch ID: 7.4.2.4600
* 4128876 (4104103) File system unmount operation is in hang state due to missing rele of vnode.
* 4134659 (4103045) Veritas File Replication failover(promote) might fail during disaster recovery or upgrade scenarios.
* 4134662 (4134661) Hang seen in the cp command in case of checkpoint promote in cluster filesystem environment.
* 4134665 (4130230) vx_prefault_uio_readable() function is going beyond intended boundaries of the uio->uio_iov structure, potentially causing it to access memory addresses that are not valid.
* 4135000 (4070819) Handling the case where the error is returned while we are going to get the inode from the last clone which is marked as overlay and is going to be removed.
* 4135005 (4068548) File system log replay fails with "inode marked bad, allocation flags (0x0001)"
* 4135008 (4126957) System crashes with VxFS stack.
* 4135017 (4119961) We hit the assert "xted_irwunlock:2" while doing in-house testing of WORM/Aulog features.
* 4135018 (4068201) File system corruption can happen in cases where node which committed the transaction crashed after sending the reply and before flushing to the log.
* 4135022 (4101075) During in-house CFS testing we hit the assert "vx_extfindbig:4" in extent look path.
* 4135027 (4084239) Machine hit a panic because of assert "f:xted_irwunlock:2"
* 4135028 (4058153) FSCK hangs while clearing VX_EXHASH_CLASS attribute in 512 byte FS.
* 4135040 (4092440) FSPPADM gives return code 0 (success) even though policy enforcement is failing.
* 4135042 (4068953) FSCK detected error on 512 byte node FS, in 1 fset ilist while verifying the FS after doing log replay and upgrading the FS to 17 DLV.
* 4135102 (4099740) UX:vxfs mount: ERROR: V-3-21264: <device> is already mounted, <mount-point> is busy,
or the allowable number of mount points has been exceeded.
* 4135105 (4112056) Hitting assert "f:vx_vnode_deinit:1" during in-house FS testing.
* 4136095 (4134194) vxfs/glm worker thread panic with kernel NULL pointer dereference
* 4136238 (4134884) Unable to deport Diskgroup. Volume or plex device is open or attached
Patch ID: 7.4.2.4500
* 4015834 (3988752) Use ldi_strategy() routine instead of bdev_strategy() for IO's in solaris.
* 4093193 (4090032) System might panic in vx_dev_strategy() while Sybase or Oracle configuration.
* 4093943 (4076185) VxODM goes into maintenance mode after reboot.
* 4106387 (4100021) Running setfacl followed by getfacl resulting in "No such device or address" error.
* 4113616 (4027640) resource with type ApplicationNone doesn't come online
* 4116328 (4116329) While checking FS sanity with the help of "fsck -o full -n" command, we tried to correct the FS flag value (WORM/Softworm), but failed because -n (read-only) option was given.
* 4117341 (4117342) System might panic due to hard lock up detected on CPU
* 4119279 (4119281) Higher page-in requests on Solaris 11 SPARC.
* 4120516 (3943232) System panic in vx_unmount_cleanup_notify when unmounting file system.
* 4120526 (4089199) Dynamic reconfiguration operation for CPU takes a lot of time.
* 4120531 (4096561) Running FULLFSCK on the filesystem reports error regarding incorrect state file.
Patch ID: 7.4.2.4200
* 4110765 (4110764) Security Vulnerability observed in Zlib a third party component used by VxFS .
Patch ID: 7.4.2.4100
* 4106702 (4106701) A security vulnerability exists in the third-party component sqlite.
Patch ID: 7.4.2.3900
* 4050870 (3987720) vxms tests are failing.
* 4071105 (4067393) Panic "UG: unable to handle kernel NULL pointer dereference at 00000000000009e0."
* 4074298 (4069116) fsck got stuck in pass1 inode validation.
* 4075873 (4075871) Utility to find possible pending stuck messages.
* 4075875 (4018783) Metasave collection and restore takes significant amount of time.
* 4084881 (4084542) Enhance fsadm defrag report to display if FS is badly fragmented.
* 4088078 (4087036) The fsck binary has been updated to fix a failure while running with the "-o metasave" option on a shared volume.
* 4090573 (4056648) Metasave collection can be executed on a mounted filesystem.
* 4090600 (4090598) Utility to detect culprit nodes while cfs hang is observed.
* 4090601 (4068143) fsck->misc tests are failing.
* 4090617 (4070217) Command fsck might fail with 'cluster reservation failed for volume' message for a disabled cluster-mounted filesystem.
* 4090639 (4086084) VxFS mount operation causes system panic.
* 4091580 (4056420) VFR Hardlink file is not getting replicated after modification in incremental sync.
* 4093306 (4090127) CFS hang in vx_searchau().
Patch ID: VRTSvxfs-7.4.2.3600
* 4089394 (4089392) Security vulnerabilities exist in the OpenSSL third-party components used by VxFS.
Patch ID: 7.4.2.3500
* 4083948 (4070814) Security Vulnerability in VxFS third party component Zlib
Patch ID: VRTSvxfs-7.4.2.3400
* 4079532 (4079869) Security Vulnerability in VxFS third party components
Patch ID: 7.4.2.2600
* 4015834 (3988752) Use ldi_strategy() routine instead of bdev_strategy() for IO's in solaris.
* 4040612 (4033664) Multiple different issues occur with hardlink replication using VFR.
* 4040618 (4040617) Veritas file replicator is not performing as per the expectation.
* 4060549 (4047921) Replication job getting into hung state when pause/resume operations performed repeatedly.
* 4060566 (4052449) Cluster goes in an 'unresponsive' mode while invalidating pages due to duplicate page entries in iowr structure.
* 4060585 (4042925) Intermittent Performance issue on commands like df and ls.
* 4060805 (4042254) A new feature has been added in vxupgrade which fails disk-layout upgrade if sufficient space is not available in the filesystem.
* 4061203 (4005620) Internal counter of inodes from Inode Allocation Unit (IAU) can be negative if IAU is marked bad.
* 4061527 (4054386) If systemd service fails to load vxfs module, the service still shows status as active instead of failed.
Patch ID: 7.4.2.2200
* 4013420 (4013139) The abort operation on an ongoing online migration from the native file system to VxFS on RHEL 8.x systems.
* 4040238 (4035040) vfradmin stats command failed to show all the fields in the command output when a job is paused and resumed.
* 4040608 (4008616) fsck command got hung.
* 4042686 (4042684) ODM resize fails for size 8192.
* 4044184 (3993140) Compclock was not giving accurate results.
* 4046265 (4037035) Added new tunable "vx_ninact_proc_threads" to control the number of inactive processing threads.
* 4046266 (4043084) panic in vx_cbdnlc_lookup
* 4046267 (4034910) Asynchronous access/update of the global list large_dirinfo can corrupt its values in multi-threaded execution.
* 4046271 (3993822) fsck stops running on a file system
* 4046272 (4017104) Deleting a lot of files can cause resource starvation, causing panic or momentary hangs.
* 4046829 (3993943) The fsck utility hit the coredump due to segmentation fault in get_dotdotlst()
* 4047568 (4046169) On RHEL8, while moving a directory from one FS (ext4 or vxfs) to a migration VxFS, the migration can fail and the FS will be disabled.
* 4049091 (4035057) On RHEL8, IOs done on FS, while other FS to VxFS migration is in progress can cause panic.
* 4049097 (4049096) Dalloc changes ctime in the background during extent allocation.
Patch ID: 7.4.2.1600
* 4012765 (4011570) WORM attribute replication support in VxFS.
* 4014720 (4011596) Multiple issues were observed during glmdump using hacli for communication
* 4015287 (4010255) "vfradmin promote" fails to promote target FS with selinux enabled.
* 4015835 (4015278) System panics during vx_uiomove_by_hand.
* 4016721 (4016927) For multi cloud tier scenario, system panic with NULL pointer dereference when we try to remove second cloud tier
* 4017282 (4016801) Filesystem marked for fullfsck
* 4017818 (4017817) VFR performance enhancement changes.
* 4017820 (4017819) Adding cloud tier operation fails while trying to add AWS GovCloud.
* 4019877 (4019876) Remove license library dependency from vxfsmisc.so library
* 4020055 (4012049) Documented "metasave" option and added one new option in fsck binary.
* 4020056 (4012049) Documented "metasave" option and added one new option in fsck binary.
* 4020912 (4020758) Filesystem mount or fsck with -y may see hang during log replay
Patch ID: 7.4.2.1400
* 4020337 (4020334) VxFS Dummy incidents for FLEX patch archival.
Patch ID: 7.4.2.1300
* 4002850 (3994123) Running fsck on a system may show LCT count mismatch errors
* 4005220 (4002222) Code changes have been done to prevent cluster-wide hang in a scenario where the cluster filesystem is FCL enabled and the disk layout version is greater than or equals to 14.
* 4010353 (3993935) Fsck command of vxfs may hit segmentation fault.
* 4012061 (4001378) VxFS module failed to load on RHEL8.2
* 4012522 (4012243) Read/Write performance improvement in VxFS
* 4012765 (4011570) WORM attribute replication support in VxFS.
* 4012787 (4007328) VFR source keeps processing file change log(FCL) records even after connection closure from target.
* 4012800 (4008123) VFR fails to replicate named extended attributes if the job is paused.
* 4012801 (4001473) VFR fails to replicate named extended attributes set on files
* 4012842 (4006192) system panic with NULL pointer de-reference.
* 4012936 (4000465) FSCK binary loops when it detects break in sequence of log ids.
* 4013084 (4009328) In cluster filesystem, unmount hang could be observed if smap is marked bad previously.
* 4013143 (4008352) Using VxFS mount binary inside container to mount any device might result in core generation.
* 4013144 (4008274) Race between compression thread and clone remove thread while allocating reorg inode.
* 4013626 (4004181) Read the value of VxFS compliance clock
* 4013738 (3830300) Degraded CPU performance during backup of Oracle archive logs
on CFS vs local filesystem
DETAILS OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
This patch fixes the following incidents:
Patch ID: 7.4.2.5400
* 4187585 (Tracking ID: 4187617)
SYMPTOM:
Exception caught while accessing a directory which resulted in panic.
Sample stack trace:
invalid_op
[exception RIP: _d_rehash+122]
d_rehash
vx_splice_alias_v2
vx_lookup
lookup_real
DESCRIPTION:
During file access, we splice a disconnected dentry (if found) into the dentry tree. If the given file is a directory then we try to find any existing aliases. if it exists, we instantiate that alias that is basically filling in the details of the inode(directory file) into that alias and choose it to be the dentry for that file. Then we update the dcache to reflect the move of a file name and during this operation we also add an entry in the lookup hash list. Due to a bug in the code base, we retry this rehash operation (adding an entry in the lookup hash list) for the existing alias and that leads to a panic in the kernel code as it was already added of the hash list.
RESOLUTION:
The code changes have been done to skip rehashing the alias found if it is already hashed.
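As an illustrative analogue (plain Python, not kernel code), the defect class is re-inserting an entry that is already present in a lookup hash, which can corrupt the hash chain; the fix is to skip the insert when the entry is already hashed and reuse the existing alias:

```python
# Minimal sketch of "skip rehash if already hashed".
# hash_table, rehash, and the names used here are hypothetical.
hash_table = {}

def rehash(name, dentry):
    if name in hash_table:       # already hashed: skip re-adding,
        return hash_table[name]  # reuse the existing alias instead
    hash_table[name] = dentry
    return dentry

a = rehash("dir1", object())
b = rehash("dir1", object())     # second call returns the existing alias
```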
Patch ID: 7.4.2.5300
* 4149891 (Tracking ID: 4145203)
SYMPTOM:
The vxfs startup script fails to invoke veki for kernel versions higher than 3.x.
DESCRIPTION:
The vxfs startup script failed to start veki because it called the System V init script instead of the systemctl interface.
RESOLUTION:
The code now checks whether the kernel version is greater than 3.x; if systemd is present, the systemctl interface is used, otherwise the System V interface.
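The selection logic can be sketched as follows (a minimal Python illustration; the actual vxfs-startup script is shell, and the function and command strings here are hypothetical):

```python
def pick_service_interface(kernel_release, systemd_present):
    """Decide how to start veki: systemctl on systemd hosts with a
    kernel newer than 3.x, otherwise the System V init script.
    Illustrative only, not the shipped startup-script code."""
    major = int(kernel_release.split(".")[0])
    if major > 3 and systemd_present:
        return "systemctl start veki"
    return "/etc/init.d/veki start"
```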
* 4149898 (Tracking ID: 4095890)
SYMPTOM:
Machine is going into the panic state.
DESCRIPTION:
In Solaris, a panic was seen with changes related to delegation of a FREE EAU to the primary. A code walkthrough showed that the error-handling code was not initialized properly.
RESOLUTION:
Updated the code to initialize the error scenario correctly.
* 4149904 (Tracking ID: 4111385)
SYMPTOM:
FS will be disabled for non-debug bits, and PANIC will happen for debug bits.
DESCRIPTION:
When the FS is migrated from big-endian to little-endian using fscdsconv, essential VxFS structures are converted based on the destination machine's endianness. However, the secure clock fields require special handling because they contain a hash value whose salt depends on endianness.
RESOLUTION:
The fscdsconv code now treats the secure clock value as an exception and re-initializes it during migration.
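The underlying problem can be illustrated in a few lines of Python (a hypothetical record layout, not the VxFS on-disk format): a checksum computed over the raw bytes of a value cannot simply be byte-swapped along with the value, so it must be recomputed (re-initialized) after an endianness conversion:

```python
import struct
import zlib

def pack_record(clock, fmt):
    """Hypothetical on-disk record: a 64-bit clock value followed by a
    CRC over its raw bytes. fmt is '>' (big-endian) or '<' (little)."""
    raw = struct.pack(fmt + "Q", clock)
    return raw + struct.pack(fmt + "I", zlib.crc32(raw))

big = pack_record(12345, ">")
little = pack_record(12345, "<")
# Same logical value, different raw bytes -> different checksums,
# so converting the value alone would leave a stale checksum behind.
```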
* 4149906 (Tracking ID: 4085768)
SYMPTOM:
Machine is going into the panic state.
DESCRIPTION:
The bmap was being adjusted at a lower level of the rct inode where no adjustment was required.
RESOLUTION:
Changes made in code to handle the issue.
* 4150621 (Tracking ID: 4103398)
SYMPTOM:
The vxvm IO threads could hang with following stack trace.
__schedule
schedule
schedule_timeout
io_schedule_timeout
io_schedule
get_request
blk_queue_bio
vxvm_gen_strategy
generic_make_request
submit_bio
vx_dev_strategy
vx_snap_strategy
vx_logbuf_write
vx_logbuf_io
vx_logbuf_flush
vx_logflush_disabled
vx_disable
vx_dataioerr_disable
vx_dataioerr
vx_pageiodone
vx_end_io_v2
bio_endio
volkiodone
volsiodone
vol_mv_write_done
voliod_iohandle
voliod_loop
kthread
ret_from_fork_nospec_begin
DESCRIPTION:
In response to the IO error returned by the VxVM IO thread, VxFS initiated another IO to mark the FS as DISABLED. This new IO got scheduled on the same VxVM IO thread, which was still waiting in the VxFS function stack, creating a deadlock.
RESOLUTION:
Instead of issuing the IO to mark the FS as DISABLED in the VxVM thread context, the VxFS code now delegates the task to a different VxFS thread and returns control to VxVM immediately.
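The shape of this fix can be shown with a small Python analogue (ordinary threads and a queue, not VxFS internals; all names here are hypothetical): a completion handler must not issue new blocking work on the thread that completes I/O, so it enqueues the work for a dedicated worker and returns at once:

```python
import queue
import threading

work_q = queue.Queue()
done = threading.Event()

def worker():
    """Dedicated thread where blocking 'disable the FS' work may run."""
    while True:
        task = work_q.get()
        if task is None:
            break
        task()                      # blocking work happens here, safely

def on_io_error():
    """Completion context: just delegate and return immediately,
    never block on new I/O from this thread."""
    work_q.put(lambda: done.set())

t = threading.Thread(target=worker)
t.start()
on_io_error()                       # returns without blocking
done.wait(timeout=5)
work_q.put(None)                    # ask the worker to exit
t.join()
```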
* 4155169 (Tracking ID: 4106777)
SYMPTOM:
After enabling the ted_hypochondriac and ted_call_back parameters, the conformance test failed.
DESCRIPTION:
An assert in vx_dataioerr() was seen for the conform:ioerror test; the FS can be disabled asynchronously in a different thread context. A check was added to handle this case.
RESOLUTION:
Added a fix to check fs_disabled only when VX_IS_DISABLE_ASYNC is false.
* 4155832 (Tracking ID: 4028534)
SYMPTOM:
Reorg optimisation for extents sandwiched between holes does not consider changes made to the file after the reorg request is decided.
DESCRIPTION:
An additional check was needed in the reorg optimisation to consider changes made after the reorg request is decided.
RESOLUTION:
During extent reorg optimisation, recheck whether a hole has been punched within the reorg length.
* 4164517 (Tracking ID: 4155961)
SYMPTOM:
System panic due to null i_fset in vx_rwlock().
DESCRIPTION:
Panic in vx_rwlock due to race between vx_rwlock() and vx_inode_deinit() function.
Panic stack
[exception RIP: vx_rwlock+174]
.
.
#10 __schedule
#11 vx_write
#12 vfs_write
#13 sys_pwrite64
#14 system_call_fastpath
RESOLUTION:
Code changes have been done to fix this issue.
* 4164519 (Tracking ID: 4158381)
SYMPTOM:
Server panicked with "Kernel panic - not syncing: Fatal exception"
DESCRIPTION:
Server panicked due to accessing a freed dentry; the dentry's hlist had also been corrupted.
There is a difference between VxFS's dentry implementation and the kernel's equivalent.
The VxFS implementation of find_alias and splice_alias is based on some old kernel versions of d_find_alias and d_splice_alias.
They need to be kept in sync with the newer kernel code to avoid such issues.
RESOLUTION:
Addressed the difference between the VxFS dentry-related functions (splice_alias, find_alias) and their kernel equivalents by making kernel-equivalent code changes in the VxFS find_alias and splice_alias functions.
* 4165125 (Tracking ID: 4142555)
SYMPTOM:
Modify reference to running fsck -y in mount.vxfs error message and update fsck_vxfs manpage
DESCRIPTION:
When mounting a corrupted FS, the error message prompts the user to run fsck -y.
Running fsck -y on an FS without understanding its implications can lead to data loss.
RESOLUTION:
Updated the mount.vxfs error messages with a recommendation to refer to the fsck_vxfs manpage.
The fsck_vxfs manpage now also advises contacting Veritas Support for further assistance
in collecting more debug logs before running fsck -y.
* 4165175 (Tracking ID: 4163337)
SYMPTOM:
Intermittent df slowness seen across cluster due to slow cluster-wide file system freeze.
DESCRIPTION:
For certain workload, intent log reset can happen relatively frequently and whenever it happens it will trigger cluster-wide freeze. If there are a lot of dirty buffers that need flushing and invalidation, then the freeze might take long time to finish. The slowest part in the invalidation of cluster buffers is the de-initialisation of its glm lock which requires lots of lock release messages to be sent to the master lock node. This can cause flowcontrol to be set at LLT layer and slow down the cluster-wide freeze and block commands like df, ls for that entire duration.
RESOLUTION:
Code is modified to avoid buffer flushing and invalidation in case freeze is triggered by intent log reset.
* 4166277 (Tracking ID: 4076098)
SYMPTOM:
FS migration from ext4 to vxfs on Linux machines with falcon-sensor enabled, might fail.
DESCRIPTION:
The falcon-sensor driver installed on the test machines taps system calls such as close and makes
additional VFS calls such as read. As a result, the vxfs driver received a read file operation from the fsmigbgcp
process context. Read operations are allowed only on special files from the fsmigbgcp process context; since
the file in question was not a special file, the vxfs debug code asserted.
RESOLUTION:
Reads on non-special files from the fsmigbgcp process context are now allowed.
* 4184397 (Tracking ID: 4175488)
SYMPTOM:
DB2 hang seen with following stacktrace
#0 __schedule
#1 schedule
#2 vx_svar_sleep_unlock
#3 vx_rwsleep_rec_lock
#4 vx_recsmp_rangelock
#5 vx_irwlock
#6 vx_irwglock
#7 vx_setcache
#8 vx_uioctl
#9 vx_unlocked_ioctl
DESCRIPTION:
The VxFS CIO advisory is set to improve performance by enabling concurrent reads and writes on a file. If CIO advisory is being set on a file while another thread is doing a read on the same file/inode (by locking it in SHARED mode) then there can be a condition where the read thread can incorrectly miss unlocking the file and do its processing and exit. As the read thread misses releasing the lock, the inode remains locked in SHARED mode. Later when another thread tries to set CIO advisory to the same file, it needs to lock the inode in EXCLUSIVE mode and it conflicts as the lock is already taken in SHARED mode and never released. This could cause this thread to hang indefinitely.
RESOLUTION:
Code changes have been done to fix the missing unlock.
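This defect class, an early-exit path that skips the unlock so a later exclusive acquire blocks forever, can be illustrated with a small Python analogue (a plain threading.Lock standing in for the inode's shared/exclusive lock; not VxFS code). Releasing in a finally block guarantees every exit path unlocks:

```python
import threading

lock = threading.Lock()

def read_path(fail=False):
    """Reader takes the lock; the early-exit (error) path must still
    release it, otherwise a later exclusive acquirer hangs forever."""
    lock.acquire()
    try:
        if fail:
            raise IOError("simulated early exit in the read path")
        return "data"
    finally:
        lock.release()   # runs on every exit path, fixing the leak

try:
    read_path(fail=True)
except IOError:
    pass
# The lock is free again, so a "writer" can take it exclusively.
acquired = lock.acquire(blocking=False)
if acquired:
    lock.release()
```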
Patch ID: 7.4.2.4800
* 4135149 (Tracking ID: 4129680)
SYMPTOM:
VxFS rpm does not have changelog
DESCRIPTION:
Changelog in rpm will help to find missing incidents with respect to other version.
RESOLUTION:
Changelog is generated and added to VxFS rpm.
* 4137139 (Tracking ID: 4126943)
SYMPTOM:
Create lost+found directory in VxFS file system with default ACL permissions as 700.
DESCRIPTION:
Due to security reasons, there was ask to create lost+found directory in VxFS file system with default ACL permissions as 700. So that, except root, no other users are able to access files under lost+found directory.
RESOLUTION:
VxFS filesystem creation with mkfs command will now result in creation of lost+found directory with default ACL permissions as 700.
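The resulting permissions can be checked from user space; as a minimal sketch (ordinary Python creating a directory with mode 700, not mkfs itself), note that mkdir's mode argument is masked by the umask, so an explicit chmod pins the final permissions:

```python
import os
import stat
import tempfile

# Hypothetical stand-in for a fresh file system root.
root = tempfile.mkdtemp()
lost_found = os.path.join(root, "lost+found")

os.mkdir(lost_found, 0o700)
os.chmod(lost_found, 0o700)   # explicit: mkdir's mode is umask-masked

mode = stat.S_IMODE(os.stat(lost_found).st_mode)  # expect 0o700
```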
* 4140587 (Tracking ID: 4136235)
SYMPTOM:
Systems with a higher number of attribute inodes and pnlct inodes may see a higher number of IOs even when idle.
DESCRIPTION:
Systems with a higher number of attribute inodes and pnlct inodes may see a higher number of IOs on an idle CFS. Reducing the pnlct merge frequency may therefore show
some performance improvement.
RESOLUTION:
Added a module parameter to change the pnlct merge frequency.
* 4140594 (Tracking ID: 4116887)
SYMPTOM:
Running fsck -y on large size metasave with lots of hardlinks is consuming huge amount of system memory.
DESCRIPTION:
An FS with many hardlinks requires a lot of memory for storing dotdot information.
Pass1d populates this dotdot linked list but never frees the space. During the fsck run,
any change to structural files triggers a rebuild, and every rebuild adds to the
already-consumed memory, so total memory consumption becomes huge.
RESOLUTION:
Code changes are done to free the dotdot list.
* 4140599 (Tracking ID: 4132435)
SYMPTOM:
Failures seen in FSQA cmds->fsck tests, panic in get_dotdotlst
DESCRIPTION:
The inode being processed in pass_unload->clean_dotdotlst()
was not in the incore imap table, so its related dotdot list was never created.
Because the dotdot list was not initialized, a null pointer
dereference occurred in clean_dotdotlst, causing the panic.
RESOLUTION:
Code changes check the inode allocation status
in the incore imap table while cleaning the dotdot list.
* 4140782 (Tracking ID: 4137040)
SYMPTOM:
System got hung due to missing unlock on a file directory, this issue could be hit if there are a lot of mv(1) operations happening against one large VxFS directory.
DESCRIPTION:
In a large VxFS directory, LDH (alternate indexing) is activated once the number of directory entries cross the large directory threshold (vx_dexh_sz), LDH creates hash attribute inode into the main directory. The exclusive lock of this LDH inode is required during file system rename operations (mv(1)), in case of multiple rename operations happening against one large directory, the trylock of LDH inode may fail due to the contention, and VX_EDIRLOCK is returned.
On VX_EDIRLOCK, VxFS should release the exclusive lock on the source directory, update the locker list, and retry the operation. However, VxFS wrongly released the exclusive lock on the target directory instead of the source and did not update the locker list. During the retry, although the lock does get released (target equals source if the rename happens within the same directory), the locker list is not updated; the stale record remains in the locker list, and consequently the same lock is never released because of this extra record.
RESOLUTION:
Release the source dir lock instead of target, and update locker list accordingly.
* 4141124 (Tracking ID: 4034246)
SYMPTOM:
In case of race condition in cluster filesystem, link count table in VxFS might miss some metadata.
DESCRIPTION:
Due to race between multiple threads working on link count table in VxFS, flag related to flushing on the buffers for link count table might reset even if there are some pending buffers.
RESOLUTION:
Code is modified to reset the flag related to flushing on the buffers with appropriate locking protection.
* 4141125 (Tracking ID: 4008980)
SYMPTOM:
In cluster filesystem, due to mismatch between size in Linux inode and VxFS inode, wrong file size may be reported.
DESCRIPTION:
In cluster filesystem, when inode ownership is transferred between cluster nodes, due to race condition mismatch between size field in Linux inode and VxFS inode may occur. This will result in reporting of garbage value for file size.
RESOLUTION:
After every ownership change between cluster nodes, synchronize size field between Linux inode and VxFS inode.
Patch ID: 7.4.2.4600
* 4128876 (Tracking ID: 4104103)
SYMPTOM:
File system unmount operation hangs.
DESCRIPTION:
A rele on the inode was missing in an error path, leaking the vnode count. Unmount on the node was stuck waiting for the vnode count to become 1.
RESOLUTION:
Release the hold on vnode in case of error.
* 4134659 (Tracking ID: 4103045)
SYMPTOM:
Veritas File Replication failover(promote) might fail during disaster recovery or upgrade scenarios.
DESCRIPTION:
Veritas File Replication failover swaps the roles of the source and target sites during disaster recovery. As part of failover, the file system is unmounted and mounted again to update the state and other replication configurations. The failover might fail because the unmount (offline) of the file system does not succeed: after a certain number of retries to offline the file system, the process holding the mount point is killed as a final step, and the failover exits.
RESOLUTION:
The fix is to open the replication config file with O_CLOEXEC, which will ensure not to inherit the process on the filesystem from replication context.
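O_CLOEXEC marks a descriptor close-on-exec, so processes spawned from the replication context do not inherit the open config file and cannot keep the mount point busy. A minimal Python demonstration of the flag's effect (using a temporary file, not the actual replication config file):

```python
import os
import tempfile

with tempfile.NamedTemporaryFile() as tf:
    # O_CLOEXEC: the fd is automatically closed across exec(), so a
    # spawned child process never inherits it and cannot pin the file.
    fd = os.open(tf.name, os.O_RDONLY | os.O_CLOEXEC)
    inheritable = os.get_inheritable(fd)   # False: closed across exec
    os.close(fd)
```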
* 4134662 (Tracking ID: 4134661)
SYMPTOM:
Hang seen in the cp command in case of checkpoint promote in cluster filesystem environment.
DESCRIPTION:
The cp command hung because inode blocks marked as overlay were not being pulled.
RESOLUTION:
Code changes were made to pull inode blocks marked as overlay.
* 4134665 (Tracking ID: 4130230)
SYMPTOM:
vx_prefault_uio_readable() function is going beyond intended boundaries of the uio->uio_iov structure, potentially causing it to access memory addresses that are not valid.
DESCRIPTION:
The maximum length that can be pre-faulted is fixed (8K), but the amount of user IO is sometimes less than that fixed value. This leads the code in vx_prefault_uio_readable() to run off the end of the uio->uio_iov structure and access an invalid memory address.
RESOLUTION:
A check was introduced that stops the code from accessing invalid memory once all the requested user-space IO pages have been pre-faulted.
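The bounded walk can be sketched in Python (a list of (base, length) pairs standing in for the iovec array; the function name and 8K cap reflect the description above, everything else is hypothetical): the loop stops at whichever comes first, the prefault ceiling, the caller's requested byte count, or the end of the array, so it can never step past the valid entries:

```python
MAX_PREFAULT = 8192   # fixed prefault ceiling (8K), per the description

def bytes_to_prefault(iovs, requested):
    """Walk (base, length) iovec entries, stopping when either the
    prefault cap or the caller's requested byte count is exhausted,
    so the walk never runs off the end of the array. Illustrative."""
    limit = min(MAX_PREFAULT, requested)
    total = 0
    for _base, length in iovs:      # iteration is bounded by the list
        if total >= limit:
            break
        total += min(length, limit - total)
    return total
```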
* 4135000 (Tracking ID: 4070819)
SYMPTOM:
An error is returned while getting the inode from the last clone, which is marked as overlay and is about to be removed.
DESCRIPTION:
The inode marked as overlay is fetched from the last clone, which is marked for deletion. Code changes handle the scenario where an error occurs while fetching the inode in this case.
RESOLUTION:
Handled this scenario through code changes.
* 4135005 (Tracking ID: 4068548)
SYMPTOM:
Fullfsck set on the file system and message "WARNING: msgcnt 222 mesg 017: V-2-17: vx_nattr_dirremove_1 - <mntpt> file system inode <ino> marked
bad incore" logged in dmesg
DESCRIPTION:
If the file system is full, allocation fails with ENOSPC, and ENOSPC processing is done through inactive processing. If the file was created by a non-root user and that thread itself starts the worklist processing, it may fail with EACCES while processing IEREMOVE on files with root ownership.
RESOLUTION:
Set root credentials while doing ENOSPC processing and restore the old credentials afterward.
* 4135008 (Tracking ID: 4126957)
SYMPTOM:
If "fsadm -o mntunlock=<string> <mountpoint>" and "umount -f <mountpoint>" operations are run in parallel,
system may crash with following stack:
vx_aioctl_unsetmntlock+0xd3/0x2a0 [vxfs]
vx_aioctl_vfs+0x256/0x2d0 [vxfs]
vx_admin_ioctl+0x156/0x2f0 [vxfs]
vxportalunlockedkioctl+0x529/0x660 [vxportal]
do_vfs_ioctl+0xa4/0x690
ksys_ioctl+0x64/0xa0
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x5b/0x1b0
DESCRIPTION:
There is a race condition between these two operations, due to which by the time fsadm thread tries to access
FS data structure, it is possible that umount operation has already freed the structures, which leads to panic.
RESOLUTION:
As a fix, the fsadm thread first checks if the umount operation is in progress. If so, it fails rather than continuing.
* 4135017 (Tracking ID: 4119961)
SYMPTOM:
Machine hit with Kernel PANIC, and generated the core dump.
DESCRIPTION:
During read we were trying to release the lock which was never taken.
RESOLUTION:
Fixed the issue with code changes.
* 4135018 (Tracking ID: 4068201)
SYMPTOM:
File system corruption
DESCRIPTION:
In certain cases the intent log was not flushed for transactions committed by a message handler thread before replying to the message.
RESOLUTION:
Flush the transactions done during ilist pull and push in case of error before sending the response.
* 4135022 (Tracking ID: 4101075)
SYMPTOM:
A core dump is hit with debug bits while searching for a big-size extent.
DESCRIPTION:
While searching for a big extent (32K) with delegation, it is valid for the smap to be NULL; if it is not NULL, it must be unlocked.
RESOLUTION:
Relaxed the assert, which was complaining unnecessarily, and added code to release the smap if it is not NULL.
* 4135027 (Tracking ID: 4084239)
SYMPTOM:
In an OOM (out of memory) situation, the issue may be hit if the IOCTL fails to copy the data.
DESCRIPTION:
On an error while copying the data (here, OOM), the code tried to release a lock that was never taken because of the error.
RESOLUTION:
Fixed the bug with code changes.
* 4135028 (Tracking ID: 4058153)
SYMPTOM:
# mkfs.vxfs -o inosize=512 /dev/vx/dsk/testdg/testvol
# mount.vxfs /dev/vx/dsk/testdg/testvol /mnt1
# mkdir /mnt1/dir1
# nxattrset -n ab -v 012345678901234567890123456789012345678901234567890123456789012345678901 /mnt1/dir1 >>>>>> creating 88 byte nxattr
# ./create_20k_file.sh >>>>>>>>>>>> creating 20k files inside /mnt1/dir1/ to create LDH attribute.
Now if we remove the LDH attribute with some free inodes, fsck will go into an infinite loop.
DESCRIPTION:
There was a calculation error while clearing the LDH attribute from the inode.
RESOLUTION:
Fixed the bug with code changes; fsck no longer hangs and clears the LDH attribute.
* 4135040 (Tracking ID: 4092440)
SYMPTOM:
# /opt/VRTS/bin/fsppadm enforce /mnt4
UX:vxfs fsppadm: ERROR: V-3-27988: Placement policy file does not exist for mount point /mnt4: No such file or directory
# echo $?
0
DESCRIPTION:
The fsppadm command was returning rc 0 even when policy enforcement failed.
RESOLUTION:
Fixed the issue by code change.
* 4135042 (Tracking ID: 4068953)
SYMPTOM:
# /opt/VRTS/bin/fsck /dev/vx/rdsk/testdg/testvol
# mount.vxfs /dev/vx/dsk/testdg/testvol /testfsck
# vxupgrade -n 17 /testfsck
# umount /testfsck
# /opt/VRTS/bin/fsck -o full -n /dev/vx/rdsk/testdg/testvol
pass0 - checking structural files
pass1 - checking inode sanity and blocks
pass2 - checking directory linkage
pass3 - checking reference counts
pass4 - checking resource maps
fileset 1 au 0 imap incorrect - fix (ynq)n >>>>>>> NOT EXPECTED
fileset 1 iau 0 summary incorrect - fix? (ynq)n >>>>>>> NOT EXPECTED
OK to clear log? (ynq)n
DESCRIPTION:
In case of a HOLE in the ilist file, the issue may be hit because of an incorrect calculation of available space.
RESOLUTION:
With the code changes, corrected the way the space is calculated.
* 4135102 (Tracking ID: 4099740)
SYMPTOM:
Mounting a file system fails with an EBUSY error, although the same device is not shown as mounted on the setup.
DESCRIPTION:
While mounting a filesystem, if an error is encountered in kernel space, a hold count on the block device is leaked. This falsely implies to future mounts that the block device is still open. Because of that, when the user retries the mount, it fails with EBUSY. It also causes a memory leak for the same reason.
RESOLUTION:
Code changes are done to release the hold count on the block device properly.
* 4135105 (Tracking ID: 4112056)
SYMPTOM:
The inode fields i_acl and i_default_acl have an incorrect value of 0; the expected value is ACL_NOT_CACHED (-1).
DESCRIPTION:
VxFS does not set the get_acl() callback in inode_operations (i_op). Hence, whenever the kernel (version 4.x and above) checks for the presence of this callback and does not
find it, it sets the i_acl and i_default_acl fields to 0.
RESOLUTION:
Corrected the bug with code changes.
* 4136095 (Tracking ID: 4134194)
SYMPTOM:
vxfs/glm worker thread panic with kernel NULL pointer dereference
DESCRIPTION:
In vx_dir_realloc(), when a directory block is full, it is reallocated into a larger extent to fit the new file entry.
Once the new extent is allocated, the old cbuf becomes part of the new extent.
But the old cbuf is not invalidated during dir_realloc, which leaves a stale cbuf in the cache.
This stale buffer can cause a buffer overflow.
RESOLUTION:
Code changes are done to invalidate the cbuf immediately after the realloc.
* 4136238 (Tracking ID: 4134884)
SYMPTOM:
After unmounting the FS, when the diskgroup deport is initiated, it gives below error:
vxvm:vxconfigd: V-5-1-16251 Disk group deport of testdg failed with error 70 - Volume or plex device is open or attached
DESCRIPTION:
During the VxFS mount operation, the corresponding VxVM device is opened.
If the FS is not clean, mount performs a log replay; once the log replay completes, the mount succeeds.
However, mounting a dirty file system leaked a VxVM device open count,
and this leak caused the subsequent disk group deport to fail.
RESOLUTION:
Code changes are done to address the device open count leak.
Patch ID: 7.4.2.4500
* 4015834 (Tracking ID: 3988752)
SYMPTOM:
Use the ldi_strategy() routine instead of bdev_strategy() for I/Os on Solaris.
DESCRIPTION:
bdev_strategy() is deprecated in Solaris and was causing performance issues when used for I/Os. Solaris recommends using the LDI framework for all I/Os.
RESOLUTION:
Code is modified to use the LDI framework for all I/Os on Solaris.
* 4093193 (Tracking ID: 4090032)
SYMPTOM:
The system might panic in vx_dev_strategy() with a Sybase or Oracle configuration;
the panic stack looks like the following:
vx_dev_strategy
vx_dio_physio
vx_dio_rdwri
vx_write_direct
vx_write1
vx_write_common_slow
vx_write_common
vx_write
fop_write
pwrite
DESCRIPTION:
When a different buffer is allocated, vx_dev_strategy() is unable to find the LDI handle.
RESOLUTION:
Code is modified to fix this issue.
* 4093943 (Tracking ID: 4076185)
SYMPTOM:
VxODM goes into maintenance mode after reboot, if Solaris local zones are configured.
DESCRIPTION:
Solaris changed their booting sequence in SOL11.4 SRU 42. When upgrading to SOL11.4 SRU 42 or greater, after reboot, VxODM in the global zone goes into maintenance mode if the Solaris local zones are configured on the system.
RESOLUTION:
Removed VCS dependency from VxODM and added zones service dependency on VxODM.
* 4106387 (Tracking ID: 4100021)
SYMPTOM:
Running setfacl followed by getfacl results in a "No such device or address" error.
DESCRIPTION:
When running the setfacl command on directories that have the VX_ATTR_INDIRECT type of ACL attribute, it does not remove the existing ACL attribute and add a new one, as it ideally should. This results in getfacl failing with the "No such device or address" error.
RESOLUTION:
Code changes are done to remove the VX_ATTR_INDIRECT type ACL in the setfacl code.
* 4113616 (Tracking ID: 4027640)
SYMPTOM:
Global symbol "$py" requires explicit package name (did you forget to declare "my $py"?) in VCS logs agent/engine logs.
DESCRIPTION:
The issue is a result of the 'py' variable not being declared.
RESOLUTION:
Code changes to declare the variable before using it.
* 4116328 (Tracking ID: 4116329)
SYMPTOM:
fsck -o full -n command will fail with error:
"ERROR: V-3-28446: bc_write failure devid = 0, bno = 8, len = 1024"
DESCRIPTION:
Previously, when correcting the file system WORM/SoftWORM, there was no check whether the user wanted to correct the pflags or only wanted to validate whether the flag value is missing. Also, fsck was not capable of handling the SOFTWORM flag.
RESOLUTION:
Code added to not attempt to fix the problem if the user ran fsck with the -n option. The SOFTWORM scenario is also handled.
* 4117341 (Tracking ID: 4117342)
SYMPTOM:
The system might panic due to a hard lockup detected on a CPU.
DESCRIPTION:
When purging dentries, there is a possible race that can lead to a corrupted
vnode flag. Because of this corrupted flag, VxFS tries to purge the dentry
again and gets stuck waiting for the vnode lock, which was taken in the
current thread context. This leads to a deadlock/soft lockup.
RESOLUTION:
Code is modified to protect vnode flag with vnode lock.
* 4119279 (Tracking ID: 4119281)
SYMPTOM:
Higher page-in requests on Solaris 11 SPARC.
DESCRIPTION:
After upgrading InfoScale, page-in requests are much higher. "vmstat" output looks normal but "sar" output looks abnormal (showing high page-in requests).
"sar" is taking absolute samples for some reason; it is not supposed to use these values.
RESOLUTION:
Code changes are done to solve this issue.
* 4120516 (Tracking ID: 3943232)
SYMPTOM:
System panic in vx_unmount_cleanup_notify when unmounting file system.
DESCRIPTION:
Every vnode having watches on it gets attached to the root vnode of the file system via the vnode hook
v_inotify_list during dentry purge. When the user removes all watches from a vnode, the vnode is destroyed and
VxFS frees its associated memory. But it is possible that this vnode is still attached to the root vnode list.
During unmount, if VxFS picks this vnode from the root vnode list, this can lead to a null pointer dereference
when trying to access the freed memory. To fix this issue, VxFS now removes such vnodes from the root vnode
list.
RESOLUTION:
Code is modified to remove the vnode from the root vnode list.
* 4120526 (Tracking ID: 4089199)
SYMPTOM:
Dynamic reconfiguration operation for CPU takes a lot of time. Temporary I/O hang is also observed during DR.
DESCRIPTION:
DR processing in VxFS is done for each CPU change notified by the kernel. DR processing involves a VxFS reinit and a cluster-wide file system freeze.
If the processor has SMT enabled, the cluster-wide file system freeze happens for each SMT thread per virtual CPU. This causes the slowness and
temporary I/O hangs during CPU DR operations.
RESOLUTION:
Optimised the DR code to do the processing of several CPU DR events together.
* 4120531 (Tracking ID: 4096561)
SYMPTOM:
Running full fsck on the filesystem reports an error regarding an incorrect state file:
au <au number> state file incorrect - fix? (ynq)
DESCRIPTION:
When a Zero Fill on Demand (ZFOD) extent larger than an AU is allocated, it is split into smaller chunks. After splitting, the
Allocation Unit (AU) state must be changed from ALLOCATED to EXPANDED. But this state change is missing in the code, which leads to the incorrect state file scenario.
RESOLUTION:
Code changes have been done to update Extent allocation unit state correctly.
Patch ID: 7.4.2.4200
* 4110765 (Tracking ID: 4110764)
SYMPTOM:
A security vulnerability was observed in Zlib, a third-party component VxFS uses.
DESCRIPTION:
In internal security scans, vulnerabilities in Zlib were found.
RESOLUTION:
Upgraded the third-party component Zlib to address these vulnerabilities.
Patch ID: 7.4.2.4100
* 4106702 (Tracking ID: 4106701)
SYMPTOM:
A security vulnerability exists in the third-party component sqlite.
DESCRIPTION:
VxFS uses a third-party component named sqlite in which a security vulnerability exists.
RESOLUTION:
VxFS is updated to use a newer version of sqlite in which the security vulnerability has been addressed.
Patch ID: 7.4.2.3900
* 4050870 (Tracking ID: 3987720)
SYMPTOM:
vxms test is having failures.
DESCRIPTION:
vxms test is having failures.
RESOLUTION:
updated vxms.
* 4071105 (Tracking ID: 4067393)
SYMPTOM:
System panicked with the following stack trace:
page_fault
[exception RIP: vx_ckptdir_nmspc_match+29]
vx_nmspc_resolve
vx_drevalidate
lookup_dcache
do_last
path_openat
do_filp_open
do_sys_open
sys_open
DESCRIPTION:
A negative path lookup on a force-unmounted file system was not handled, resulting in a NULL pointer
dereference due to accessing the already-freed fs structure of the force-unmounted fs.
RESOLUTION:
Handled the force-unmounted cases before the vx_nmspc_resolve() call, so that the NULL pointer
dereference cannot occur.
* 4074298 (Tracking ID: 4069116)
SYMPTOM:
fsck got stuck in pass1 inode validation.
DESCRIPTION:
fsck could land in an infinite retry loop during inode validation with the following stack trace:
pthread_mutex_unlock()
bc_getfreebuf()
sl_getblk()
bc_rgetblk()
fs_getblk()
bmap_bread()
fs_bmap_typ()
fs_callback_bmap()
fsck_callback_bmap()
bmap_check_overlay()
ivalidate()
pass1()
iproc_do_work()
start_thread()
This is because the inode is so completely corrupted that it matches a known inode type in ivalidate(), which then goes ahead to verify the inode bmap. While trying to do so, it requests a buffer size larger than the maximum fsck buffer cache memory and hence gets stuck in a loop.
RESOLUTION:
Added code changes to skip bmap validation if the inode mode bits are corrupted.
* 4075873 (Tracking ID: 4075871)
SYMPTOM:
Utility to find possible pending stuck messages.
DESCRIPTION:
Utility to find possible pending stuck messages.
RESOLUTION:
Added utility to find possible pending stuck messages.
* 4075875 (Tracking ID: 4018783)
SYMPTOM:
Metasave collection and restore takes significant amount of time.
DESCRIPTION:
Metasave collection and restore takes significant amount of time.
RESOLUTION:
Code changes have been done in metasave code base to improve metasave collection and metasave restore in the range of 30-40%.
* 4084881 (Tracking ID: 4084542)
SYMPTOM:
Enhance fsadm defrag report to display if FS is badly fragmented.
DESCRIPTION:
Enhance fsadm defrag report to display if FS is badly fragmented.
RESOLUTION:
Added method to identify if FS needs defragmentation.
* 4088078 (Tracking ID: 4087036)
SYMPTOM:
FSCK utility exits with an error while running it with the "-o metasave" option on a shared volume.
DESCRIPTION:
FSCK utility exits with an error while running it with the "-o metasave" option on a shared volume. Besides this, while running this utility with "-n" and either "-o metasave" or "-o dumplog", it silently ignores the latter option(s).
RESOLUTION:
Code changes have been done to resolve the above-mentioned failure and also warning messages have been added to inform users regarding mutually exclusive behavior of "-n" and either of "metasave" and "dumplog" options instead of silently ignoring them.
* 4090573 (Tracking ID: 4056648)
SYMPTOM:
Metasave collection can be executed on a mounted filesystem.
DESCRIPTION:
If a metasave image is collected from a mounted filesystem, it might capture an inconsistent state of the filesystem, as there could be ongoing changes happening on it.
RESOLUTION:
Code changes have been done to fail default metasave collection for a mounted filesystem. If metasave needs to be collected from mounted filesystem then this can still be achieved with option "-o inconsistent".
* 4090600 (Tracking ID: 4090598)
SYMPTOM:
Utility to detect culprit nodes while cfs hang is observed.
DESCRIPTION:
Utility to detect culprit nodes when a CFS hang is observed. The customer can reboot and collect a crash from those nodes to get the application up and running. Integrated the msgdump and glmdump utilities with cfshang_check.
RESOLUTION:
Integrated the msgdump and glmdump utilities with cfshang_check.
* 4090601 (Tracking ID: 4068143)
SYMPTOM:
fsck->misc is having failures.
DESCRIPTION:
fsck->misc is having failures.
RESOLUTION:
Updated fsck->misc.
* 4090617 (Tracking ID: 4070217)
SYMPTOM:
Command fsck might fail with 'cluster reservation failed for volume' message for a disabled cluster-mounted filesystem.
DESCRIPTION:
On a disabled cluster-mounted filesystem, release of cluster reservation might fail during unmount operation resulting in a failure of command fsck with 'cluster reservation failed for volume' message.
RESOLUTION:
Code is modified to release the cluster reservation properly in the unmount operation, even for a disabled cluster-mounted filesystem.
* 4090639 (Tracking ID: 4086084)
SYMPTOM:
VxFS mount operation causes system panic when -o context is used.
DESCRIPTION:
VxFS mount operation supports context option to override existing extended attributes, or to specify a different, default context for file systems that do not support extended attributes. System panic observed when -o context is used.
RESOLUTION:
Required code changes are added to avoid panic.
* 4091580 (Tracking ID: 4056420)
SYMPTOM:
VFR Hardlink file is not getting replicated after modification in incremental sync.
DESCRIPTION:
VFR Hardlink file is not getting replicated after modification in incremental sync.
RESOLUTION:
Updated code so that a VFR hardlink file is replicated after modification in incremental sync.
* 4093306 (Tracking ID: 4090127)
SYMPTOM:
CFS hang in vx_searchau().
DESCRIPTION:
As part of the SMAP transaction changes, the allocator changed its logic to always call mdele_tryhold when getting the emap for a particular EAU, and it passes
nogetdele as 1 to mdele_tryhold, which means mdele_tryhold should not ask for delegation when detecting a free EAU without delegation. So in this case,
the allocator finds such an EAU in the device summary tree but without delegation, and it keeps retrying without asking for delegation, hence the hang goes on forever.
RESOLUTION:
In case a FREE EAU is found without delegation, delegate it back to the primary.
Patch ID: VRTSvxfs-7.4.2.3600
* 4089394 (Tracking ID: 4089392)
SYMPTOM:
Security vulnerabilities exist in the OpenSSL third-party components used by VxFS.
DESCRIPTION:
VxFS uses the OpenSSL third-party components in which some security vulnerabilities exist.
RESOLUTION:
VxFS is updated to use a newer version (1.1.1q) of this third-party component in which the security vulnerabilities have been addressed.
Patch ID: 7.4.2.3500
* 4083948 (Tracking ID: 4070814)
SYMPTOM:
A security vulnerability was found in VxFS while running security scans.
DESCRIPTION:
In internal security scans, some vulnerabilities were found in the VxFS third-party component Zlib.
RESOLUTION:
Upgraded the third-party component Zlib to resolve these vulnerabilities.
Patch ID: VRTSvxfs-7.4.2.3400
* 4079532 (Tracking ID: 4079869)
SYMPTOM:
A security vulnerability was found in VxFS while running security scans.
DESCRIPTION:
In internal security scans, some vulnerabilities were found in VxFS third-party components. Attackers can exploit these security vulnerabilities
to attack the system.
RESOLUTION:
Upgraded the third-party components to resolve these vulnerabilities.
Patch ID: 7.4.2.2600
* 4015834 (Tracking ID: 3988752)
SYMPTOM:
Use the ldi_strategy() routine instead of bdev_strategy() for I/Os on Solaris.
DESCRIPTION:
bdev_strategy() is deprecated in Solaris and was causing performance issues when used for I/Os. Solaris recommends using the LDI framework for all I/Os.
RESOLUTION:
Code is modified to use the LDI framework for all I/Os on Solaris.
* 4040612 (Tracking ID: 4033664)
SYMPTOM:
Multiple issues occur with hardlink replication using VFR.
DESCRIPTION:
Multiple different issues occur with hardlink replication using Veritas File Replicator (VFR).
RESOLUTION:
VFR is updated to fix issues with hardlink replication in the following cases:
1. Files with multiple links
2. Data inconsistency after hardlink file replication
3. Rename and move operations dumping core in multiple different scenarios
4. WORM feature support
* 4040618 (Tracking ID: 4040617)
SYMPTOM:
Veritas File Replicator is not performing as expected.
DESCRIPTION:
Veritas File Replicator had bottlenecks at the networking layer as well as at the data transfer level. This was causing additional throttling in the replication.
RESOLUTION:
Performance optimisations done at multiple places so that Veritas File Replicator makes proper use of the available resources.
* 4060549 (Tracking ID: 4047921)
SYMPTOM:
The replication job was getting into a hung state because of a deadlock involving the threads below:
Thread 1:
#0 0x00007f160581854d in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f1605813e9b in _L_lock_883 () from /lib64/libpthread.so.0
#2 0x00007f1605813d68 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x000000000043be1f in replnet_sess_bulk_free ()
#4 0x000000000043b1e3 in replnet_server_dropchan ()
#5 0x000000000043ca07 in replnet_client_connstate ()
#6 0x00000000004374e3 in replnet_conn_changestate ()
#7 0x0000000000437c18 in replnet_conn_evalpoll ()
#8 0x000000000044ac39 in vxev_loop ()
#9 0x0000000000405ab2 in main ()
Thread 2:
#0 0x00007f1605815a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x000000000043902b in replnet_msgq_waitempty ()
#2 0x0000000000439082 in replnet_bulk_recv_func ()
#3 0x00007f1605811ea5 in start_thread () from /lib64/libpthread.so.0
#4 0x00007f1603ef29fd in clone () from /lib64/libc.so.6
DESCRIPTION:
When a replication job is paused/resumed in succession multiple times, a race condition may lead to a deadlock situation involving two threads.
RESOLUTION:
Fixed the locking sequence and added additional holds on resources to avoid the race leading to the deadlock situation.
* 4060566 (Tracking ID: 4052449)
SYMPTOM:
Cluster goes in an 'unresponsive' mode while invalidating pages due to duplicate page entries in iowr structure.
DESCRIPTION:
While finding pages for invalidation of inodes, VxFS traverses the radix tree by taking the RCU lock and fills the IO structure with the dirty/writeback pages that need to be invalidated in an array. This lock is efficient for reads but does not protect against parallel creation/deletion of nodes. Hence, when VxFS finds a page, consistency for the page is checked through radix_tree_exception()/radix_tree_deref_retry(). If that fails, VxFS restarts the page finding from the start offset, but it does not reset the array index, leading to incorrect filling of the IO structure's array and duplicate entries of pages. While trying to destroy these pages, VxFS takes a page lock on each page. Because of the duplicate entries, VxFS tries to take the page lock twice on the same page, leading to a self-deadlock.
RESOLUTION:
Code is modified to reset the array index correctly in case of failure to find pages.
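The defect class can be sketched in a few lines. The names below are illustrative stand-ins, not VxFS internals: on every restart of the walk, the fill array must be reset, or entries collected before the restart are duplicated.

```python
# Sketch of the fix: a tree walk that may restart must reset its fill
# array on restart; otherwise earlier entries appear twice.

def collect_pages(pages, deref_retry):
    out = []
    while True:
        out.clear()             # the fix: reset the array on every restart
        ok = True
        for p in pages:
            if deref_retry(p):  # radix_tree_deref_retry()-style failure
                ok = False
                break           # restart the whole walk
            out.append(p)
        if ok:
            return out

def fail_once_on(target):
    # Helper that makes the consistency check fail exactly once,
    # forcing one restart of the walk.
    seen = {"failed": False}
    def deref_retry(p):
        if p == target and not seen["failed"]:
            seen["failed"] = True
            return True
        return False
    return deref_retry
```

Without the `out.clear()`, the result after one restart would contain page 1 twice, which is exactly the duplicate-entry condition that caused the double page lock.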
* 4060585 (Tracking ID: 4042925)
SYMPTOM:
Intermittent Performance issue on commands like df and ls.
DESCRIPTION:
Commands like "df" and "ls" issue the stat system call on a node to calculate the statistics of the file system. In a CFS, when the stat system call is issued, it compiles statistics from all nodes. When multiple df or ls commands are fired within a specified time limit, VxFS is optimized to return the cached statistics instead of recalculating them from all nodes. If multiple such commands are fired in succession and one of the old callers of the stat system call takes time, this optimization fails and VxFS recompiles statistics from all nodes. This can lead to bad performance of the stat system call, leading to unresponsive situations for df and ls commands.
RESOLUTION:
Code is modified to protect the last modified time of the stat system call with a sleep lock.
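The caching idea behind this fix can be sketched as follows; `StatCache` and its fields are hypothetical names, not VxFS structures.

```python
# Sketch: return cached cluster-wide stats while they are fresh, recompute
# otherwise; the "last updated" timestamp is guarded by a lock so that
# concurrent callers agree on freshness instead of racing.
import threading
import time

class StatCache:
    def __init__(self, ttl, compute):
        self.ttl = ttl
        self.compute = compute        # expensive cluster-wide aggregation
        self.lock = threading.Lock()  # the added sleep lock
        self.value = None
        self.updated = 0.0

    def get(self):
        with self.lock:
            now = time.monotonic()
            if self.value is None or now - self.updated > self.ttl:
                self.value = self.compute()
                self.updated = now
            return self.value
```

With the lock held across the freshness check and the update, a slow old caller can no longer invalidate the cache for everyone else mid-check.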
* 4060805 (Tracking ID: 4042254)
SYMPTOM:
vxupgrade sets fullfsck flag in the filesystem if it is unable to upgrade the disk layout version because of ENOSPC.
DESCRIPTION:
If the filesystem is 100% full and its disk layout version is upgraded by using vxupgrade, the utility starts the upgrade, later fails with ENOSPC, and ends up setting the fullfsck flag in the filesystem.
RESOLUTION:
Code changes introduced which first calculate the required space to perform the disk layout upgrade. If the required space is not available, it fails the upgrade gracefully without setting fullfsck flag.
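The shape of the fix is a precondition check: compute the space the upgrade needs before touching anything. A minimal sketch, with hypothetical names:

```python
# Sketch of the resolution: verify the required space up front and fail
# cleanly with ENOSPC before modifying on-disk state, instead of failing
# mid-upgrade and having to set the fullfsck flag.
import errno

def upgrade_layout(free_blocks, required_blocks):
    if free_blocks < required_blocks:
        return -errno.ENOSPC   # graceful failure; no fullfsck flag set
    # ... perform the actual disk layout upgrade here ...
    return 0
```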
* 4061203 (Tracking ID: 4005620)
SYMPTOM:
Inode count maintained in the inode allocation unit (IAU) can be negative when an IAU is marked bad. An error such as the following is logged.
V-2-4: vx_mapbad - vx_inoauchk - /fs1 file system free inode bitmap in au 264 marked bad
Due to the negative inode count, errors like the following might be observed and processes might be stuck at inode allocation with a stack trace as shown.
V-2-14: vx_iget - inode table overflow
vx_inoauchk
vx_inofindau
vx_findino
vx_ialloc
vx_dirmakeinode
vx_dircreate
vx_dircreate_tran
vx_pd_create
vx_create1_pd
vx_do_create
vx_create1
vx_create0
vx_create
vn_open
open
DESCRIPTION:
The inode count can be negative if somehow VxFS tries to allocate an inode from an IAU where the counter for regular file and directory inodes is zero. In such a situation, the inode allocation fails and the IAU map is marked bad. But the code tries to further reduce the already-zero counters, resulting in negative counts that can cause subsequent unresponsive situation.
RESOLUTION:
Code is modified to not reduce inode counters in vx_mapbad code path if the result is negative. A diagnostic message like the following flashes.
"vxfs: Error: Incorrect values of ias->ifree and Aus rifree detected."
* 4061527 (Tracking ID: 4054386)
SYMPTOM:
VxFS systemd service may show active status despite the module not being loaded.
DESCRIPTION:
If systemd service fails to load vxfs module, the service still shows status as active instead of failed.
RESOLUTION:
The script is modified to show the correct status in case of such failures.
Patch ID: 7.4.2.2200
* 4013420 (Tracking ID: 4013139)
SYMPTOM:
The abort operation fails on an ongoing online migration from the native file system to VxFS on RHEL 8.x systems.
DESCRIPTION:
The following error messages are logged when the abort operation fails:
umount: /mnt1/lost+found/srcfs: not mounted
UX:vxfs fsmigadm: ERROR: V-3-26835: umount of source device: /dev/vx/dsk/testdg/vol1 failed, with error: 32
RESOLUTION:
The fsmigadm utility is updated to address the issue with the abort operation on an ongoing online migration.
* 4040238 (Tracking ID: 4035040)
SYMPTOM:
After a replication job is paused and resumed, some fields are missing from the stats command output and remain missing on subsequent runs.
DESCRIPTION:
rs_start for the current stat is initialized to the start time of the replication, and the default value of rs_start is zero.
The stats do not show some fields in case rs_start is zero.
    if (rs->rs_start && dis_type == VX_DIS_CURRENT) {
        if (!rs->rs_done) {
            diff = rs->rs_update - rs->rs_start;
        } else {
            diff = rs->rs_done - rs->rs_start;
        }
        /*
         * The unit of time is in seconds, hence
         * assigning 1 if the amount of data
         * was too small
         */
        diff = diff ? diff : 1;
        rate = rs->rs_file_bytes_synced /
            (diff - rs->rs_paused_duration);
        printf("\t\tTransfer Rate: %s/sec\n", fmt_bytes(h, rate));
    }
In replication, rs_start is initialized to zero and then updated with the start time, but the stats are not saved to disk. That small window leaves a case where,
if the replication is paused and started again, rs_start is always seen as zero.
Now, after initializing rs_start, it is written to disk in the same function. In the resume case, if rs_start is found to be zero, the rs_start
field is initialized again to the current replication start time.
RESOLUTION:
Write rs_start to disk, and add a check in the resume case to initialize the rs_start value if it is found to be 0.
* 4040608 (Tracking ID: 4008616)
SYMPTOM:
The fsck command hung.
DESCRIPTION:
fsck got stuck in a deadlock: a thread that marked a buffer aliased was waiting on itself for the reference drain, while
the get-block code was called with the NOBLOCK flag.
RESOLUTION:
Honor the NOBLOCK flag.
* 4042686 (Tracking ID: 4042684)
SYMPTOM:
Command fails to resize the file.
DESCRIPTION:
There is a window where a parallel thread can clear the IDELXWRI flag, which it should not.
RESOLUTION:
Set the delayed extending write flag again in case any parallel thread has cleared it.
* 4044184 (Tracking ID: 3993140)
SYMPTOM:
Every 60 seconds, compclock lagged approximately 1.44 seconds behind the actual elapsed time.
DESCRIPTION:
Every 60 seconds, compclock lagged approximately 1.44 seconds behind the actual elapsed time.
RESOLUTION:
Made adjustment to logic responsible for calculating and updating compclock timer.
* 4046265 (Tracking ID: 4037035)
SYMPTOM:
Added new tunable "vx_ninact_proc_threads" to control the number of inactive processing threads.
DESCRIPTION:
On high end servers, heavy lock contention was seen during inactive removal processing, which was caused by the large number of inactive worker threads spawned by VxFS. To avoid the contention, new tunable "vx_ninact_proc_threads" was added so that customer can adjust the number of inactive processing threads based on their server config and workload.
RESOLUTION:
Added new tunable "vx_ninact_proc_threads" to control the number of inactive processing threads.
* 4046266 (Tracking ID: 4043084)
SYMPTOM:
panic in vx_cbdnlc_lookup
DESCRIPTION:
Panic observed in the following stack trace:
vx_cbdnlc_lookup+000140 ()
vx_int_lookup+0002C0 ()
vx_do_lookup2+000328 ()
vx_do_lookup+0000E0 ()
vx_lookup+0000A0 ()
vnop_lookup+0001D4 (??, ??, ??, ??, ??, ??)
getFullPath+00022C (??, ??, ??, ??)
getPathComponents+0003E8 (??, ??, ??, ??, ??, ??, ??)
svcNameCheck+0002EC (??, ??, ??, ??, ??, ??, ??)
kopen+000180 (??, ??, ??)
syscall+00024C ()
RESOLUTION:
Code changes to handle memory pressure while changing FC connectivity.
* 4046267 (Tracking ID: 4034910)
SYMPTOM:
Garbage values inside global list large_dirinfo.
DESCRIPTION:
Garbage values inside the global list large_dirinfo lead to an fsck failure.
RESOLUTION:
Make access/updates to the global list large_dirinfo synchronous throughout the fsck binary, so that garbage values due to the race condition are avoided.
* 4046271 (Tracking ID: 3993822)
SYMPTOM:
Running fsck on a file system core dumps.
DESCRIPTION:
A buffer was marked as busy without taking the buffer lock while getting the buffer from the freelist in one thread, while another thread
was accessing this buffer through its local variable.
RESOLUTION:
Mark the buffer busy while holding the buffer lock when getting a free buffer.
* 4046272 (Tracking ID: 4017104)
SYMPTOM:
Deleting a huge number of inodes can consume a lot of system resources during inactivation, which causes hangs or even panics.
DESCRIPTION:
Delicache inactivation dumps all the inodes in its inventory for inactivation all at once. This causes a surge in resource consumption, due to which other processes can starve.
RESOLUTION:
Gradually process the inode inactivation.
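The gradual-processing idea can be sketched as simple batching; `inactivate_gradually` and the batch size are illustrative, not actual VxFS tunables.

```python
# Sketch of the resolution: instead of queueing the whole delicache
# inventory for inactivation at once, hand it to the workers in small
# batches so other processes are not starved.

def inactivate_gradually(inodes, batch_size=64):
    batches = []
    for i in range(0, len(inodes), batch_size):
        batches.append(inodes[i:i + batch_size])  # stand-in for real work
        # in the kernel, the worker would yield/reschedule between batches
    return batches
```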
* 4046829 (Tracking ID: 3993943)
SYMPTOM:
The fsck utility hit a core dump due to a segmentation fault in get_dotdotlst().
Below is the stack trace of the issue.
get_dotdotlst
check_dotdot_tbl
iproc_do_work
start_thread
clone ()
DESCRIPTION:
Due to a bug in the fsck utility, a core dump was generated while running fsck on the filesystem. The fsck operation aborted midway due to the core dump.
RESOLUTION:
Code changes are done to fix this issue.
* 4047568 (Tracking ID: 4046169)
SYMPTOM:
On RHEL8, while doing a directory move from one FS (ext4 or vxfs) to a migration VxFS, the migration can fail and the FS will be disabled. In debug testing, the issue was caught by an internal assert, with the following stack trace.
panic
ted_call_demon
ted_assert
vx_msgprint
vx_mig_badfile
vx_mig_linux_removexattr_int
__vfs_removexattr
__vfs_removexattr_locked
vfs_removexattr
removexattr
path_removexattr
__x64_sys_removexattr
do_syscall_64
DESCRIPTION:
Due to the different implementation of the "mv" operation in RHEL8 (as compared to RHEL7), there is a removexattr call on the target FS, which in the migration case is the migration VxFS. In this removexattr call, the kernel asks for the "system.posix_acl_default" attribute to be removed from the directory being moved. But since the directory is not present on the target side yet (and hence has no extended attributes), the code returns ENODATA. When the code in vx_mig_linux_removexattr_int() encounters this error, it disables the FS and, in the debug package, calls assert.
RESOLUTION:
The fix is to ignore the ENODATA error and not assert or disable the FS.
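The error-swallowing pattern described above can be sketched in userspace terms; `mig_removexattr` is a hypothetical wrapper, not the VxFS function.

```python
# Sketch of the fix: treat ENODATA from removexattr on the migration
# target as success, since the attribute simply does not exist there yet.
# Any other error still propagates (and would still disable the FS).
import errno

def mig_removexattr(do_remove):
    try:
        do_remove()
    except OSError as e:
        if e.errno != errno.ENODATA:
            raise              # real failures still propagate
    return 0                   # ENODATA: nothing to remove, not an error
```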
* 4049091 (Tracking ID: 4035057)
SYMPTOM:
On RHEL8, I/Os done on an FS while a migration from another FS to VxFS is in progress can cause a panic, with the following stack trace.
machine_kexec
__crash_kexec
crash_kexec
oops_end
no_context
do_page_fault
page_fault
[exception RIP: memcpy+18]
_copy_to_iter
copy_page_to_iter
generic_file_buffered_read
new_sync_read
vfs_read
kernel_read
vx_mig_read
vfs_read
ksys_read
do_syscall_64
DESCRIPTION:
- As part of the RHEL8 support changes, the vfs_read and vfs_write calls were replaced with kernel_read and kernel_write, as the vfs_ calls are no longer exported. The kernel_read and kernel_write calls internally set the memory segment of the thread to KERNEL_DS and expect the buffer passed to have been allocated in kernel space.
- In the migration code, if the read/write operation cannot be completed using the target FS (VxFS), the IO is redirected to the source FS. In doing so, the code passes the same buffer, a user buffer, to the kernel call. This worked well with the vfs_read and vfs_write calls, but it does not work with kernel_read and kernel_write, causing a panic.
RESOLUTION:
- The fix is to use the vfs_iter_read and vfs_iter_write calls, which work with a user buffer. To use these methods, the user buffer needs to be passed as part of a struct iovec (in iov_base).
* 4049097 (Tracking ID: 4049096)
SYMPTOM:
The tar command errors out with exit status 1, throwing warnings.
DESCRIPTION:
This happens because dalloc changes the ctime of the file after allocating the extents (`(worklist thread)->vx_dalloc_flush -> vx_dalloc_off`) in between the two fsstat calls made by tar.
RESOLUTION:
Avoid changing ctime while allocating delayed extents in the background.
Patch ID: 7.4.2.1600
* 4012765 (Tracking ID: 4011570)
SYMPTOM:
WORM attribute replication support in VxFS.
DESCRIPTION:
WORM attribute replication is not supported in VFR. The code is modified to replicate the WORM attribute during attribute processing in VFR.
RESOLUTION:
Code is modified to replicate WORM attributes in VFR.
* 4014720 (Tracking ID: 4011596)
SYMPTOM:
An error is thrown saying "No such file or directory present".
DESCRIPTION:
The bug was observed during parallel communication between all the nodes: some required temp files were not present on the other nodes.
RESOLUTION:
Fixed to maintain consistency during parallel node communication; hacp is now used for transferring the temp files.
* 4015287 (Tracking ID: 4010255)
SYMPTOM:
"vfradmin promote" fails to promote target FS with selinux enabled.
DESCRIPTION:
During the promote operation, VxFS remounts the FS at the target. When remounting the FS to remove the "protected on" flag from the target, VxFS first fetches the current mount options. With SELinux enabled (either permissive or enforcing mode), the OS adds the default "seclabel" option to the mount. When VxFS fetched the current mount options, "seclabel" was not recognized by VxFS, so it failed to mount the FS.
RESOLUTION:
Code is modified to remove "seclabel" mount option during mount processing on target.
* 4015835 (Tracking ID: 4015278)
SYMPTOM:
System panics in vx_uiomove_by_hand.
DESCRIPTION:
During uiomove, VxFS gets pages from the OS through get_user_pages() to copy user data. Oracle uses hugetlbfs internally for performance reasons, which can allocate huge pages. Under low-memory conditions, it is possible that get_user_pages() returns compound pages to VxFS. In a compound page, only the head page has a valid mapping set; all other pages are marked TAIL_MAPPING. During uiomove, if VxFS gets a compound page, it tries to check the writable mapping for every page in the compound page. This can dereference an illegal address (TAIL_MAPPING), which was causing the panic in the stack. VxFS does not support huge pages, but it is possible that a compound page is present on the system and VxFS gets one through get_user_pages().
RESOLUTION:
Code is modified to use the head page when VxFS checks the writable mapping of a tail page of a compound page.
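The shape of this fix can be shown with a toy model: only the head page of a compound page carries a valid mapping, so a tail page must be resolved to its head before the mapping is inspected. The struct below is a simplified stand-in for the kernel's struct page and compound_head(), not real VxFS code.

```c
/* Toy model of the compound-page fix: in a compound page only the
 * head page has a valid mapping; tail pages must be resolved to the
 * head before their mapping is inspected. Simplified stand-in for
 * the kernel's struct page / compound_head(). */
#include <assert.h>
#include <stddef.h>

struct toy_page {
    struct toy_page *head;   /* points to self for a head page */
    void *mapping;           /* valid only on the head page     */
};

/* The fix in miniature: always consult the head page's mapping
 * instead of dereferencing the tail page's (poisoned) mapping. */
void *safe_mapping(struct toy_page *p)
{
    return p->head->mapping;
}
```

The pre-fix bug corresponds to reading p->mapping directly on a tail page, where the field holds a poison value rather than a pointer.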
* 4016721 (Tracking ID: 4016927)
SYMPTOM:
The remove tier command panics the system; the crash has panic reason "BUG: unable to handle kernel NULL pointer dereference at 0000000000000150".
DESCRIPTION:
When fsvoladm removes a device, not all devices are moved, and the device count also remains the same unless it is the last device in the array. So a check for a free slot is needed before trying to access a device.
RESOLUTION:
In the device list, check for a free slot before accessing the device in that slot.
* 4017282 (Tracking ID: 4016801)
SYMPTOM:
Filesystem is marked for full fsck.
DESCRIPTION:
In a cluster environment, some operations can be performed on the primary node only. When such operations are executed from a secondary node, a message is passed to the primary node. During this, it is possible that the sender node has a transaction that has not yet reached disk. In that scenario, if the sender node is rebooted, the primary node can see stale data.
RESOLUTION:
Code is modified to make sure transactions are flushed to the log disk before the message is sent to the primary.
* 4017818 (Tracking ID: 4017817)
SYMPTOM:
NA
DESCRIPTION:
In order to increase the overall throughput of VFR, code changes have been done to replicate files in parallel.
RESOLUTION:
Code changes have been done to replicate a file's data and metadata in parallel over multiple socket connections.
* 4017820 (Tracking ID: 4017819)
SYMPTOM:
Cloud tier add operation fails when user is trying to add the AWS GovCloud.
DESCRIPTION:
Adding AWS GovCloud as a cloud tier was not supported in InfoScale. With these changes, the user is able to add an AWS GovCloud type of cloud tier.
RESOLUTION:
Added support for AWS GovCloud.
* 4019877 (Tracking ID: 4019876)
SYMPTOM:
vxfsmisc.so is a publicly shared library for Samba and does not require an InfoScale license for its usage.
DESCRIPTION:
vxfsmisc.so is a publicly shared library for Samba and does not require an InfoScale license for its usage.
RESOLUTION:
Removed license dependency in vxfsmisc library
* 4020055 (Tracking ID: 4012049)
SYMPTOM:
"fsck" supports the "metasave" option but it was not documented anywhere.
DESCRIPTION:
"fsck" supports the "metasave" option while executing with the "-y" option, but this is not documented anywhere. Also, it tries to store the metasave in a fixed location, and the user has no option to specify the location. If that location does not have enough space, "fsck" fails to take the metasave yet continues to change the filesystem state.
RESOLUTION:
Code changes have been done to add a new option with which the user can specify the location to store the metasave. The "metasave" and "target" options have been added to the "usage" message of the "fsck" binary.
* 4020056 (Tracking ID: 4012049)
SYMPTOM:
"fsck" supports the "metasave" option but it was not documented anywhere.
DESCRIPTION:
"fsck" supports the "metasave" option while executing with the "-y" option, but this is not documented anywhere. Also, it tries to store the metasave in a fixed location, and the user has no option to specify the location. If that location does not have enough space, "fsck" fails to take the metasave yet continues to change the filesystem state.
RESOLUTION:
Code changes have been done to add a new option with which the user can specify the location to store the metasave. The "metasave" and "target" options have been added to the "usage" message of the "fsck" binary.
* 4020912 (Tracking ID: 4020758)
SYMPTOM:
Filesystem mount or fsck with -y may hang during log replay.
DESCRIPTION:
The fsck utility is used to perform log replay. This log replay is performed during the mount operation or, if needed, during a filesystem check with the -y option. In certain cases, if there are a lot of logs to be replayed, fsck ends up consuming the entire buffer cache. This results in an out-of-buffer scenario and a hang.
RESOLUTION:
Code is modified to make sure enough buffers are always available.
Patch ID: 7.4.2.1400
* 4020337 (Tracking ID: 4020334)
SYMPTOM:
VxFS Dummy incidents for FLEX patch archival.
DESCRIPTION:
Incident included e4009779 for the FLEX team patch.
RESOLUTION:
Incident included e4009779 for the FLEX team patch.
Patch ID: 7.4.2.1300
* 4002850 (Tracking ID: 3994123)
SYMPTOM:
Running fsck on a system may show LCT count mismatch errors
DESCRIPTION:
For multi-block merged extents in IFIAT inodes, only the first block of the extent may be processed, leaving some references unprocessed. This leads to LCT counts not matching. Resolving the issue requires a full fsck.
RESOLUTION:
Code changes added to process merged multi-block extents in IFIAT inodes correctly.
* 4005220 (Tracking ID: 4002222)
SYMPTOM:
The cluster can hang if the cluster filesystem is FCL enabled and its disk layout version is greater than or equal to 14.
DESCRIPTION:
VxFS worker threads that are responsible for handling "File Change Log" feature-related operations can be stuck in a deadlock if the disk layout version of the FCL-enabled cluster filesystem is greater than or equal to 14.
RESOLUTION:
Code changes have been done to prevent a cluster-wide hang in the scenario where the cluster filesystem is FCL enabled and the disk layout version is greater than or equal to 14.
* 4010353 (Tracking ID: 3993935)
SYMPTOM:
The fsck command of VxFS may hit a segmentation fault with the following stack.
#0 get_dotdotlst ()
#1 find_dotino ()
#2 dir_sanity ()
#3 pass2 ()
#4 iproc_do_work ()
#5 start_thread ()
#6 sysctl ()
DESCRIPTION:
TURNON_CHUNK() and TURNOFF_CHUNK() modify the values of their arguments.
RESOLUTION:
Code has been modified to fix the issue.
* 4012061 (Tracking ID: 4001378)
SYMPTOM:
The VxFS module failed to load on RHEL8.2.
DESCRIPTION:
RHEL8.2 is a new release, and it has kernel changes that caused the VxFS module to fail to load on it.
RESOLUTION:
Added code to support VxFS on RHEL8.2
* 4012522 (Tracking ID: 4012243)
SYMPTOM:
During I/O, MM semaphore lock contention may reduce performance.
DESCRIPTION:
mmap locks taken during I/O may introduce lock contention and reduce I/O performance.
RESOLUTION:
A new VxFS API is introduced to skip these locks whenever required on a specific file.
* 4012765 (Tracking ID: 4011570)
SYMPTOM:
WORM attribute replication support in VxFS.
DESCRIPTION:
WORM attribute replication is not supported in VFR. The code is modified to replicate the WORM attribute during attribute processing in VFR.
RESOLUTION:
Code is modified to replicate WORM attributes in VFR.
* 4012787 (Tracking ID: 4007328)
SYMPTOM:
After the replication service is stopped on the target, the job fails at the source only after processing all the FCL records.
DESCRIPTION:
After the replication service is stopped on the target, the job fails at the source only after processing all the FCL records; it should fail immediately. If the target breaks the connection, the source should receive the error so that the job can fail while reading FCL records. However, although the source receives notice that the connection is closed, the other thread does not get the signal to stop while processing the FCL, and it ends only after processing is complete.
RESOLUTION:
If the replication service is stopped at the target while FCL records are being processed, the job now fails immediately based on the return status of the connection.
* 4012800 (Tracking ID: 4008123)
SYMPTOM:
If a file has more than one named extended attribute set and the job is paused, the remaining named extended attributes fail to replicate. (This behaviour is intermittent.)
DESCRIPTION:
During a VFR replication, if the job is paused while a file's nxattrs are being replicated, then the next time the job is resumed, the seqno triplet received from the target side causes the source to miss the remaining nxattrs.
RESOLUTION:
Handling of named extended attributes has been reworked to make sure the remaining attributes are not missed on resume.
* 4012801 (Tracking ID: 4001473)
SYMPTOM:
If a file has named extended attributes set, VFR fails to replicate the job and the job goes into a failed state.
DESCRIPTION:
VFR tries to use open(2) on nxattr files; since these files are not visible outside VxFS, it fails with ENOTDIR.
RESOLUTION:
The internal VxFS-specific API is now used to get a valid file descriptor for nxattr files.
* 4012842 (Tracking ID: 4006192)
SYMPTOM:
System panic with a NULL pointer dereference.
DESCRIPTION:
VxFS supports checkpoints, i.e. point-in-time images of a filesystem. For this, it needs to keep a copy of some metadata for the checkpoint. In some cases it misses making the copy. Later, while processing files corresponding to this missed metadata, it gets empty extent information. Extent information is the block map for a given file. This empty extent information causes the NULL pointer dereference.
RESOLUTION:
Code changes are made to fix this issue.
* 4012936 (Tracking ID: 4000465)
SYMPTOM:
The fsck binary loops when it detects a break in the sequence of log ids.
DESCRIPTION:
When a FS is not cleanly unmounted, it ends up with an unflushed intent log. This intent log is flushed either during the next mount or when fsck is run on the FS. Currently, to build the list of transactions that need to be replayed, VxFS uses a binary search to find the head and tail. But if there is a breakage in the intent log, the current code is susceptible to looping. To avoid this loop, VxFS now uses a sequential search instead of a binary search to find the range.
RESOLUTION:
Code is modified to use a sequential search instead of a binary search to find the replayable transaction range.
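The idea behind the change can be sketched in user space: a binary search over log ids assumes the sequence is monotonic and contiguous, so a break in the sequence violates its invariant, while a sequential scan simply walks from the head and stops at the first discontinuity. This is a hypothetical simplified model; real intent-log records carry more state than a bare id.

```c
/* Sketch: finding the replayable range of intent-log ids.
 * A break in the id sequence (e.g. 5,6,7,99,100) breaks the
 * monotonic-contiguous invariant a binary search relies on; a
 * sequential scan is immune to it. Simplified model of the change. */
#include <assert.h>
#include <stddef.h>

/* Returns how many ids, starting at index 0, form a contiguous run
 * (each id exactly one greater than the previous). */
size_t replayable_run(const unsigned *ids, size_t n)
{
    size_t i;
    for (i = 1; i < n; i++)
        if (ids[i] != ids[i - 1] + 1)
            break;
    return n ? i : 0;
}
```

With a broken sequence the scan stops exactly at the break, bounding the replay to records whose ordering is still trustworthy.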
* 4013084 (Tracking ID: 4009328)
SYMPTOM:
In a cluster filesystem, if smap corruption is seen and the smap is marked bad, it can cause a hang while unmounting the filesystem.
DESCRIPTION:
While freeing an extent in vx_extfree1(), for logversion >= VX_LOGVERSION13, if we are freeing whole AUs we set the VX_AU_SMAPFREE flag for those AUs. This ensures that revoking the delegation for such an AU is delayed while the AU has an SMAP free transaction in progress. The flag is cleared either in post-commit/undo processing of the transaction or during error handling in vx_extfree1(). In one scenario, when we try to free a whole AU whose smap is marked bad, we neither return an error to vx_extfree1() nor add the subfunction that frees the extent to the transaction. So the VX_AU_SMAPFREE flag is not cleared and remains set even though no SMAP free transaction is in progress. This can lead to a hang while unmounting the cluster filesystem.
RESOLUTION:
Code changes have been done to add error handling in vx_extfree1 to clear VX_AU_SMAPFREE flag in case where error is returned due to bad smap.
* 4013143 (Tracking ID: 4008352)
SYMPTOM:
Using the VxFS mount binary inside a container to mount any device might result in core generation.
DESCRIPTION:
Using the VxFS mount binary inside a container to mount any device might result in core generation. This issue is caused by improper initialisation of a local pointer and dereferencing a garbage value later.
RESOLUTION:
This fix properly initialises all the pointers before dereferencing them.
* 4013144 (Tracking ID: 4008274)
SYMPTOM:
Race between the compression thread and the clone remove thread while allocating the reorg inode.
DESCRIPTION:
The compression thread does the reorg inode allocation without setting i_inreuse, and it takes HLOCK in exclusive mode; later this lock is downgraded to shared mode. While this processing is happening, the clone delete thread can do an iget on this inode and call vx_getownership without hold. If the inode is of type IFEMR, IFPTI, or FREE, success is returned after the ownership call. Later in the same function, getownership is called with hold set before doing the processing (truncating or marking the inode as IFPTI). The first, redundant ownership call is removed.
RESOLUTION:
Taking ownership of the inode is delayed until the inode mode is checked.
* 4013626 (Tracking ID: 4004181)
SYMPTOM:
VxFS internally maintains a compliance clock; without this API, the user cannot read its value.
DESCRIPTION:
VxFS internally maintains a compliance clock; without this API, the user cannot read its value.
RESOLUTION:
An API is provided on the mount point to read the compliance clock for that filesystem.
* 4013738 (Tracking ID: 3830300)
SYMPTOM:
Heavy CPU usage while Oracle archive processes are running on a clustered FS.
DESCRIPTION:
The cause of the poor read performance in this case was fragmentation. Fragmentation mainly happens when there are multiple archivers running on the same node. The allocation pattern of the Oracle archiver processes is:
1. write the header with O_SYNC
2. ftruncate the file up to its final size (a few GBs typically)
3. do lio_listio with 1MB iocbs
The problem occurs because all allocations done in this manner go through internal allocations, i.e. allocations below the file size instead of allocations past the file size. Internal allocations are done at most 8 pages at a time, so if multiple processes do this, they all get these 8 pages alternately and the FS becomes very fragmented.
RESOLUTION:
Added a tunable which allocates ZFOD extents when ftruncate tries to increase the size of the file, instead of creating a hole. This eliminates the allocations internal to the file size, and thus the fragmentation. Also fixed the earlier implementation of the same fix, which ran into locking issues, and fixed the performance issue while writing from the secondary node.
INSTALLING THE PATCH
--------------------
Run the Installer script to automatically install the patch:
-----------------------------------------------------------
Please note that the installation of this P-Patch will cause downtime.
To install the patch perform the following steps on at least one node in the cluster:
1. Copy the patch fs-rhel7_x86_64-Patch-7.4.2.5400.tar.gz to /tmp
2. Untar fs-rhel7_x86_64-Patch-7.4.2.5400.tar.gz to /tmp/patch
# mkdir /tmp/patch
# cd /tmp/patch
# gunzip /tmp/fs-rhel7_x86_64-Patch-7.4.2.5400.tar.gz
# tar xf /tmp/fs-rhel7_x86_64-Patch-7.4.2.5400.tar
3. Install the patch (note that the installation of this P-Patch will cause downtime.)
# pwd
/tmp/patch
# ./installVRTSvxfs742P5400 [<host1> <host2>...]
You can also install this patch together with the 7.4.2 base release using Install Bundles:
1. Download this patch and extract it to a directory
2. Change to the Veritas InfoScale 7.4.2 directory and invoke the installer script
with the -patch_path option, where -patch_path should point to the patch directory
# ./installer -patch_path [<path to this patch>] [<host1> <host2>...]
Install the patch manually:
--------------------------
rpm -Uvh VRTSvxfs-7.4.2.5400-RHEL7.x86_64.rpm
REMOVING THE PATCH
------------------
rpm -evh VRTSvxfs-7.4.2.5400-RHEL7.x86_64
KNOWN ISSUES
------------
* Tracking ID: 4097111
SYMPTOM: While doing two or more mount (of VxFS file system) operations in parallel underneath an already existing VxFS mount point, if a force umount is attempted on the parent VxFS mount point, the force unmount operation sometimes hangs permanently.
WORKAROUND: None, except rebooting the system.
SPECIAL INSTRUCTIONS
--------------------
NONE
OTHERS
------
NONE
Applies to the following product releases
This update requires
InfoScale 7.4.2 Update 7 Cumulative Patch on RHEL7 Platform