NetBackup™ Deployment Guide for Kubernetes Clusters
- Introduction
- Section I. Deployment
- Prerequisites for Kubernetes cluster configuration
- Deployment with environment operators
- Deploying NetBackup
- Primary and media server CR
- Deploying NetBackup using Helm charts
- Deploying MSDP Scaleout
- Deploying Snapshot Manager
- Section II. Monitoring and Management
- Monitoring NetBackup
- Monitoring MSDP Scaleout
- Monitoring Snapshot Manager
- Managing the Load Balancer service
- Managing MSDP Scaleout
- Performing catalog backup and recovery
- Section III. Maintenance
- MSDP Scaleout Maintenance
- Upgrading
- Uninstalling
- Troubleshooting
- Troubleshooting AKS and EKS issues
- Troubleshooting AKS-specific issues
- Troubleshooting EKS-specific issues
- Appendix A. CR template
Elastic media server related issues
This section describes the issues observed with the elastic media server feature and their respective workarounds.
Issue with autoscaler for scaling in the media server pods
This issue is observed when the maximum number of media server pods/nodes are in the running state even though there is no load or only a few jobs are running.
Verify whether the media server autoscaler is trying to scale in but is unable to shut down the media server pods that are marked to be scaled in.
Verify whether any jobs or bpps processes are running on the highest-indexed running media server pod by referring to the NetBackup operator logs, as shown below:
2023-03-01T08:14:56.470Z INFO controller-runtime.manager.controller.mediaserver Running jobs 0: on Media Server nbux-10-244-33-77.vxindia.veritas.com. {"reconciler group": "netbackup.veritas.com", "reconciler kind": "MediaServer", "name": "media1", "namespace": "netbackup-environment", "Media Server": "nbux-10-244-33-77.vxindia.veritas.com"}
2023-03-01T08:14:56.646Z INFO controller-runtime.manager.controller.mediaserver bpps processes running status. false: on Media Server nbux-10-244-33-77.vxindia.veritas.com. {"reconciler group": "netbackup.veritas.com", "reconciler kind": "MediaServer", "name": "media1", "namespace": "netbackup-environment", "Media Server": "nbux-10-244-33-77.vxindia.veritas.com"}
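As a sketch, the operator logs above can be retrieved by querying the NetBackup operator pod directly; the pod and namespace names below are placeholders that depend on your deployment:

```shell
$ kubectl get pods --namespace <operator_namespace>
$ kubectl logs <netbackup-operator-pod-name> --namespace <operator_namespace> | grep mediaserver
```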
Perform the following steps to identify which bpps processes are running and preventing the media server pod from scaling in:
Exec into the media server pod that has running bpps processes.
Refer to the
/mnt/nblogs/nbprocesscheck
log to get the list of running bpps processes. You must wait until the processes listed under Extra bpps process exit.
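The two steps above can be combined into a single command; the pod and namespace names below are placeholders:

```shell
$ kubectl exec -it <media-server-pod-name> --namespace <namespace_name> -- cat /mnt/nblogs/nbprocesscheck
```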
Media server pod not in ready state due to certificate issues
If the replica count is reduced and the removed media servers are not decommissioned, and the replicas are later increased again, the media server pods may not reach the ready state due to certificate-related errors (for example: certificate expired). Reduce the replicas, decommission the reduced replicas, and then proceed with increasing the replica count.
To reduce the maximum number of replicas, perform the media server decommissioning steps mentioned in References to nonexistent or decommissioned media servers remain in NetBackup.
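As a sketch, the replica count can be changed by editing the media server CR before performing the decommissioning steps; the CR name and namespace below are placeholders:

```shell
$ kubectl edit mediaserver <cr-name> --namespace <namespace_name>
```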
Additional steps
Delete the Load Balancer service created for the media server by running the following commands:
$ kubectl get service --namespace <namespace_name>
$ kubectl delete service <service-name> --namespace <namespace_name>
Identify and delete any outstanding persistent volume claims for the media server by running the following commands:
$ kubectl get pvc --namespace <namespace_name>
$ kubectl delete pvc <pvc-name> --namespace <namespace_name>
Locate and delete any persistent volumes created for the media server by running the following commands:
$ kubectl get pv
$ kubectl delete pv <pv-name> --grace-period=0 --force
Duplication job is not getting completed
The duplication job does not complete (and in turn scale-down does not happen) because
is set on the media server as part of scale-down while the duplication is still running.
Workaround: While configuring the destination storage unit, manually select media servers that are always up and running and that can never be scaled in. The number of media servers that are always running is the same as the value mentioned against the
field in the CR. The above recommendation also applies when upgrading from an older version to NetBackup version 10.2. Post-upgrade, manually select the media servers that are mentioned against the field in the CR during the upgrade. The default is 1.
Error while connecting to media server
For scaled-in media servers, certain resources and configurations are retained to avoid reconfiguration during a subsequent scale out. Host entries for scaled-in media servers are not removed from the NetBackup primary server, and hence if those media servers are used for any operation, a connectivity issue is observed.
Workaround:
It is recommended to use media servers that are always up and running and that would never be scaled in (by the media server autoscaler). The number of media servers that are always up and running is the same as the value mentioned in the
field in the CR.