NetBackup™ Deployment Guide for Kubernetes Clusters

Last Published: 2023-04-24

Product(s): NetBackup & Alta Data Protection (10.2)

Elastic media server related issues

This section describes the issues observed with the elastic media server feature and their respective workarounds.

Issue with autoscaler for scaling in the media server pods
This issue is observed when there is no load, or only a few jobs are running, yet the maximum number of media server pods/nodes remain in the running state.
- Verify whether the media server autoscaler is trying to scale in but is unable to shut down the media server pods that are marked for scale-in.
- Verify whether any jobs or bpps processes are running on the higher-indexed running media server pods by referring to the NetBackup operator logs, as shown below:
```
2023-03-01T08:14:56.470Z        INFO    controller-runtime.manager.controller.mediaserver       Running jobs 0:  on Media Server nbux-10-244-33-77.vxindia.veritas.com.        {"reconciler group": "netbackup.veritas.com", "reconciler kind": "MediaServer", "name": "media1", "namespace": "netbackup-environment", "Media Server": "nbux-10-244-33-77.vxindia.veritas.com"}
2023-03-01T08:14:56.646Z        INFO    controller-runtime.manager.controller.mediaserver       bpps processes running status. false:  on Media Server nbux-10-244-33-77.vxindia.veritas.com.  {"reconciler group": "netbackup.veritas.com", "reconciler kind": "MediaServer", "name": "media1", "namespace": "netbackup-environment", "Media Server": "nbux-10-244-33-77.vxindia.veritas.com"}
```
  To find out which bpps processes are running and preventing the media server pod from scaling in, perform the following steps:
  - Exec into the media server pod that has running bpps processes.
  - Refer to the /mnt/nblogs/nbprocesscheck log to get the list of running bpps processes. Wait until the processes listed under Extra bpps process exit.
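
The log check described above can be sketched as a small shell filter. This is a minimal sketch, not a NetBackup tool: the function name summarize_scale_in_blockers is hypothetical, and the patterns assume the operator log lines are worded exactly like the two sample lines shown earlier. It reads operator log lines on standard input and prints, for each media server, the last reported job count and bpps status.

```shell
#!/bin/sh
# Hypothetical helper: summarize the job count and bpps status that the
# NetBackup operator last reported for each media server.
# Assumption: log lines are worded like the sample lines in this section.
summarize_scale_in_blockers() {
  awk '
    # Return the host name that follows "on Media Server ", without the
    # trailing period that ends the log message.
    function host_of(line,   s) {
      if (!match(line, /on Media Server [^ \t]+/)) return ""
      s = substr(line, RSTART + 16, RLENGTH - 16)
      sub(/\.$/, "", s)
      return s
    }
    {
      h = host_of($0)
      if (h == "") next
      if (match($0, /Running jobs [0-9]+/)) {
        jobs[h] = substr($0, RSTART + 13, RLENGTH - 13)
        seen[h] = 1
      }
      if (match($0, /bpps processes running status\. (true|false)/)) {
        bpps[h] = substr($0, RSTART + 31, RLENGTH - 31)
        seen[h] = 1
      }
    }
    END {
      for (h in seen)
        printf "%s jobs=%s bpps=%s\n", h, ((h in jobs) ? jobs[h] : "?"), ((h in bpps) ? bpps[h] : "?")
    }
  ' | sort
}

# Example usage (placeholders are hypothetical; substitute your own values):
#   kubectl logs <operator-pod-name> --namespace <operator-namespace> \
#     | summarize_scale_in_blockers
```

A media server that reports a non-zero job count or a bpps status of true is a likely reason the autoscaler cannot complete the scale-in.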

Media server pod not in ready state due to certificate issues
If the maximumReplica count is reduced without decommissioning the removed media servers, and the replica count is later increased again, the media server pods may not reach the ready state because of certificate-related errors, for example, an expired certificate.
To avoid this, reduce the replica count and decommission the removed media servers before increasing the replica count again.
To reduce the maximum number of replicas, perform the media server decommissioning steps described in "References to nonexistent or decommissioned media servers remain in NetBackup".
Additional steps
- Delete the Load Balancer service created for the media server by running the following commands:
  $ kubectl get service --namespace <namespace_name>
  $ kubectl delete service <service-name> --namespace <namespace_name>
- Identify and delete any outstanding persistent volume claims for the media server by running the following commands:
  $ kubectl get pvc --namespace <namespace_name>
  $ kubectl delete pvc <pvc-name> --namespace <namespace_name>
- Locate and delete any persistent volumes created for the media server by running the following commands:
  $ kubectl get pv
  $ kubectl delete pv <pv-name> --grace-period=0 --force
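
Before deleting anything, it can help to narrow the listings to the resources that belong to the media server being removed. The following is a minimal sketch under one assumption that this guide does not confirm: that the service, PVC, and PV names contain the media server name. The function name filter_by_name is hypothetical. It reads the default tabular output of kubectl get and prints only the matching names, so it deletes nothing.

```shell
#!/bin/sh
# Hypothetical helper: print the NAME column of `kubectl get` output for
# rows whose name contains the given substring. The header row is skipped.
# Assumption: resources for a media server contain its name.
filter_by_name() {
  awk -v needle="$1" 'NR > 1 && index($1, needle) > 0 { print $1 }'
}

# Example usage (review the output manually before deleting anything):
#   kubectl get service --namespace <namespace_name> | filter_by_name <media-server-name>
#   kubectl get pvc --namespace <namespace_name> | filter_by_name <media-server-name>
#   kubectl get pv | filter_by_name <media-server-name>
```

Because the match is a plain substring test, check each printed name against the media server you intend to remove before running the delete commands above.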

Duplication job is not getting completed
A duplication job does not complete (and, in turn, scale-down does not happen) because MachineAdministrativePause is set on the media server as part of scale-down while the duplication is still running.
When configuring the destination storage unit, manually select media servers that are always up and running and are never scaled in. The number of media servers that are always running equals the value of the minimumReplicas field in the CR. This recommendation also applies when upgrading from an older version to NetBackup version 10.2: after the upgrade, manually select the media servers that correspond to the minimumReplicas value set in the CR during the upgrade. The default value of minimumReplicas is 1.
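
As a rough illustration of this recommendation, the following shell sketch lists the pods expected to stay running for a given minimumReplicas value. It rests on two assumptions that this guide does not state outright: that media server pod names end in -<index> (a StatefulSet-style convention), and that the autoscaler removes higher-indexed pods first. The function name always_running_pods is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read pod names (one per line) and print those whose
# trailing index is lower than the given minimumReplicas value.
# Assumption: pod names end in "-<index>" and scale-in removes the
# higher-indexed pods first.
always_running_pods() {
  awk -v min="$1" '{
    n = $1
    sub(/^.*-/, "", n)   # keep only the text after the last hyphen
    if (n ~ /^[0-9]+$/ && n + 0 < min + 0) print $1
  }'
}

# Example usage with the default minimumReplicas of 1:
#   kubectl get pods --namespace <namespace_name> -o name | always_running_pods 1
```

The media servers hosted on the printed pods are the candidates to select for the destination storage unit.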

Error while connecting to media server
For scaled-in media servers, certain resources and configurations are retained to avoid reconfiguration during a subsequent scale-out. Entries for scaled-in media servers are therefore not removed from the NetBackup primary server, so connectivity issues are observed if those media servers are used for any operation.
Workaround:
It is recommended to use media servers that are always up and running and are never scaled in by the media server autoscaler. The number of media servers that are always up and running equals the value of the minimumReplicas field in the CR.