NetBackup™ Deployment Guide for Kubernetes Clusters
- Introduction
- Section I. Configurations
- Prerequisites
- Recommendations and Limitations
- Configurations
- Configuration of key parameters in Cloud Scale deployments
- Section II. Deployment
- Section III. Monitoring and Management
- Monitoring NetBackup
- Monitoring Snapshot Manager
- Monitoring MSDP Scaleout
- Managing NetBackup
- Managing the Load Balancer service
- Managing PostgreSQL DBaaS
- Performing catalog backup and recovery
- Managing MSDP Scaleout
- Section IV. Maintenance
- MSDP Scaleout Maintenance
- PostgreSQL DBaaS Maintenance
- Patching mechanism for Primary and Media servers
- Upgrading
- Cloud Scale Disaster Recovery
- Uninstalling
- Troubleshooting
- Troubleshooting AKS and EKS issues
- Troubleshooting AKS-specific issues
- Troubleshooting EKS-specific issues
- Appendix A. CR template
Elastic media server related issues
This section describes the issues observed with the elastic media server feature and their respective workarounds.
Issue with autoscaler for scaling in the media server pods
This issue is observed when the maximum number of media server pods/nodes are in the running state even though there is no load or only a few jobs are running.
Verify whether the media server autoscaler is trying to scale in but is unable to shut down the media server pods that are marked to be scaled in.
Verify whether any jobs or bpps processes are running on the higher-indexed running media server pods by referring to the NetBackup operator logs, as shown below:
2023-03-01T08:14:56.470Z INFO controller-runtime.manager.controller.mediaserver Running jobs 0: on Media Server nbux-10-244-33-77.vxindia.veritas.com. {"reconciler group": "netbackup.veritas.com", "reconciler kind": "MediaServer", "name": "media1", "namespace": "netbackup-environment", "Media Server": "nbux-10-244-33-77.vxindia.veritas.com"}
2023-03-01T08:14:56.646Z INFO controller-runtime.manager.controller.mediaserver bpps processes running status. false: on Media Server nbux-10-244-33-77.vxindia.veritas.com. {"reconciler group": "netbackup.veritas.com", "reconciler kind": "MediaServer", "name": "media1", "namespace": "netbackup-environment", "Media Server": "nbux-10-244-33-77.vxindia.veritas.com"}
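To locate these messages, you can view the NetBackup operator logs directly with kubectl. The following is only a sketch; the operator deployment and namespace names are placeholders, so substitute the names used in your environment.
$ kubectl get deployments --all-namespaces | grep -i operator
$ kubectl logs deployment/<netbackup-operator-deployment> --namespace <operator_namespace> | grep -i "Running jobs"
$ kubectl logs deployment/<netbackup-operator-deployment> --namespace <operator_namespace> | grep -i "bpps processes"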
Perform the following to identify which bpps processes are running and preventing the media server pod from scaling in:
Log in to the NetBackup Web UI portal.
Check the Notifications tab for any notifications in the Media server elasticity event category. The notification lists the additional processes running on the specific media server. Wait until the processes listed under additional processes running exit.
Alternatively, you can view the list of processes in the NetBackup operator logs. For example:
2023-07-11T13:33:44.142Z INFO controller-runtime.manager.controller.mediaserver Following processes are still running : bpbkar test1, bpbkar test2 {"reconciler group": "netbackup.veritas.com", "reconciler kind": "MediaServer", "name": "test-media-server", "namespace": "netbackup-environment"}
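You can also check the NetBackup processes directly on a specific media server pod with the bpps command. This is a sketch that assumes the standard NetBackup install path inside the pod; the pod and namespace names are placeholders.
$ kubectl get pods --namespace <namespace_name> | grep media
$ kubectl exec <media-server-pod-name> --namespace <namespace_name> -- /usr/openv/netbackup/bin/bpps -x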
Media server pod not in ready state due to certificate issues
If the replica count is reduced and the removed media servers are not decommissioned, and the replicas are later increased again, the media server pods may not be in the ready state due to certificate-related errors (for example, an expired certificate).
Workaround:
Reduce the replicas, decommission the reduced replicas, and then proceed with increasing the replica count.
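To confirm that the pods are failing readiness because of certificate errors, you can inspect the pod status, events, and logs. The pod and namespace names below are placeholders.
$ kubectl get pods --namespace <namespace_name>
$ kubectl describe pod <media-server-pod-name> --namespace <namespace_name>
$ kubectl logs <media-server-pod-name> --namespace <namespace_name> | grep -i certificate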
To reduce the maximum number of replicas, perform the media server decommissioning steps mentioned in "References to nonexistent or decommissioned media servers remain in NetBackup".
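As a rough illustration only (refer to the section referenced above for the complete procedure), decommissioning is typically driven from the primary server with the nbdecommission command. The pod, namespace, and media server names here are placeholders.
$ kubectl exec -it <primary-server-pod-name> --namespace <namespace_name> -- /usr/openv/netbackup/bin/admincmd/nbdecommission -oldserver <media-server-name>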
Additional steps
Delete the Load Balancer service created for the media server by running the following commands:
$ kubectl get service --namespace <namespace_name>
$ kubectl delete service <service-name> --namespace <namespace_name>
Identify and delete any outstanding persistent volume claims for the media server by running the following commands:
$ kubectl get pvc --namespace <namespace_name>
$ kubectl delete pvc <pvc-name> --namespace <namespace_name>
Locate and delete any persistent volumes created for the media server by running the following commands:
$ kubectl get pv
$ kubectl delete pv <pv-name> --grace-period=0 --force
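To verify that nothing is left behind for the removed media server, list the remaining services, claims, and volumes and confirm that no entries for that media server appear. The grep pattern is only an illustration; adjust it to your media server naming.
$ kubectl get service,pvc --namespace <namespace_name> | grep <media-server-name>
$ kubectl get pv | grep <media-server-name>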
Duplication job is not getting completed
The duplication job does not complete (and in turn, scale-down does not happen) because the media server is marked for scale-down while the duplication is still running.
Workaround:
While configuring the destination storage unit, manually select media servers that are always up and running and would not be scaled in at any time. The number of media servers that are always running is the same as the value mentioned against the corresponding field in the CR. The above recommendation also applies when upgrading from an older version to NetBackup version 10.3. Post-upgrade, manually select the media servers that are mentioned against that field in the CR during the upgrade. The default is 1.
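To review which media servers are currently selected on the destination storage unit, you can list the storage unit configuration from the primary server. This sketch assumes the standard NetBackup install path inside the primary server pod; the pod and namespace names are placeholders.
$ kubectl exec <primary-server-pod-name> --namespace <namespace_name> -- /usr/openv/netbackup/bin/admincmd/bpstulist -L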
Error while connecting to media server
For scaled-in media servers, certain resources and configurations are retained to avoid reconfiguration during a subsequent scale-out. However, entries for the scaled-in media servers are not removed from the NetBackup primary server; hence, if those media servers are used for any operation, a connectivity issue is observed.
Workaround:
It is recommended to use media servers that are always up and running and would never be scaled in by the media server autoscaler. The number of media servers that are always up and running is the same as the value mentioned in the corresponding field in the CR.
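To check whether a particular media server is reachable from the primary server before using it for an operation, you can run a connectivity test from the primary server pod. This is a sketch assuming the standard NetBackup install path; the pod, namespace, and media server names are placeholders.
$ kubectl exec <primary-server-pod-name> --namespace <namespace_name> -- /usr/openv/netbackup/bin/admincmd/bptestbpcd -host <media-server-name>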