NetBackup™ for Kubernetes Administrator's Guide

Last Published:
Product(s): NetBackup & Alta Data Protection (10.5)
  1. Overview of NetBackup for Kubernetes
    1.  
      Overview
    2.  
      Features of NetBackup support for Kubernetes
  2. Deploying and configuring the NetBackup Kubernetes operator
    1.  
      Prerequisites for NetBackup Kubernetes Operator deployment
    2.  
      Deploy service package on NetBackup Kubernetes operator
    3.  
      Port requirements for Kubernetes operator deployment
    4.  
      Upgrade the NetBackup Kubernetes operator
    5.  
      Delete the NetBackup Kubernetes operator
    6.  
      Configure NetBackup Kubernetes data mover
    7.  
      Automated configuration of NetBackup protection for Kubernetes
    8. Configure settings for NetBackup snapshot operation
      1.  
        Kubernetes operators supported configuration parameters
      2.  
        Prerequisites for backup from snapshot and restore from backup operations
      3.  
        DTE client settings supported in Kubernetes
      4.  
        Customization of datamover properties
    9.  
      Troubleshooting NetBackup servers with short names
    10.  
      Data mover pod schedule mechanism support
    11.  
      Validating accelerator storage class
  3. Deploying certificates on NetBackup Kubernetes operator
    1.  
      Deploy certificates on the Kubernetes operator
    2.  
      Perform Host-ID-based certificate operations
    3.  
      Perform ECA certificate operations
    4.  
      Identify certificate types
  4. Managing Kubernetes assets
    1.  
      Add a Kubernetes cluster
    2. Configure settings
      1.  
        Change resource limits for Kuberentes resource types
      2.  
        Configure autodiscovery frequency
      3.  
        Configure permissions
    3.  
      Add protection to the assets
    4. Scan for malware
      1.  
        Assets by workload type
  5. Managing Kubernetes intelligent groups
    1.  
      About intelligent group
    2.  
      Create an intelligent group
    3.  
      Delete an intelligent group
    4.  
      Edit an intelligent group
  6. Managing Kubernetes policies
    1.  
      Create a policy
  7. Protecting Kubernetes assets
    1.  
      Protect an intelligent group
    2.  
      Remove protection from an intelligent group
    3.  
      Configure backup schedule
    4.  
      Configure backup options
    5.  
      Configure backups
    6.  
      Configure Auto Image Replication (A.I.R.) and duplication
    7.  
      Configure storage units
    8.  
      Volume mode support
    9.  
      Configure application consistent backup
  8. Managing image groups
    1. About image groups
      1.  
        Image expire
      2.  
        Image copy
  9. Protecting Rancher managed clusters in NetBackup
    1.  
      Add Rancher managed RKE cluster in NetBackup using automated configuration
    2.  
      Add Rancher managed RKE cluster manually in NetBackup
  10. Recovering Kubernetes assets
    1.  
      Explore and validate recovery points
    2.  
      Restore from snapshot
    3.  
      Restore from backup copy
  11. About incremental backup and restore
    1.  
      Incremental backup and restore support for Kubernetes
  12. Enabling accelerator based backup
    1.  
      About NetBackup Accelerator support for Kubernetes workloads
    2.  
      Controlling disk space for track logs on primary server
    3.  
      Effect of storage class behavior on Accelerator
    4.  
      About Accelerator forced rescan
    5.  
      Warnings and probable reason for Accelerator backup failures
  13. Enabling FIPS mode in Kubernetes
    1.  
      Enable Federal Information Processing Standards (FIPS) mode in Kubernetes
  14. About Openshift Virtualization support
    1.  
      OpenShift Virtualization support
    2.  
      Application consistent virtual machines backup
    3.  
      Troubleshooting for virtualization
  15. Troubleshooting Kubernetes issues
    1.  
      Error during the primary server upgrade: NBCheck fails
    2.  
      Error during an old image restore: Operation fails
    3.  
      Error during persistent volume recovery API
    4.  
      Error during restore: Final job status shows partial failure
    5.  
      Error during restore on the same namespace
    6.  
      Datamover pods exceed the Kubernetes resource limit
    7.  
      Error during restore: Job fails on the highly loaded cluster
    8.  
      Custom Kubernetes role created for specific clusters cannot view the jobs
    9.  
      Openshift creates blank non-selected PVCs while restoring applications installed from OperatorHub
    10.  
      NetBackup Kubernetes operator become unresponsive if PID limit exceeds on the Kubernetes node
    11.  
      Failure during edit cluster in NetBackup Kubernetes 10.1
    12.  
      Backup or restore fails for large sized PVC
    13.  
      Restore of namespace file mode PVCs to different file system partially fails
    14.  
      Restore from backup copy fails with image inconsistency error
    15.  
      Connectivity checks between NetBackup primary, media, and Kubernetes servers.
    16.  
      Error during accelerator backup when there is no space available for track log
    17.  
      Error during accelerator backup due to track log PVC creation failure
    18.  
      Error during accelerator backup due to invalid accelerator storage class
    19.  
      Error occurred during track log pod start
    20.  
      Failed to setup the data mover instance for track log PVC operation
    21.  
      Error to read track log storage class from configmap

NetBackup Kubernetes operator become unresponsive if PID limit exceeds on the Kubernetes node

In Linux systems, there is an initd or system process running as PID 1 to reap zombie processes. Containers that do not have such an initd process would keep spawning zombie processes.

After certain time period these zombie processes accumulates and then reaches the max limit of PIDs set on the Kubernetes node.

In NetBackup Kubernetes operator, nbcertcmdtool spawns child processes to carry out certificate-related operations. On completion of the operation, the processes get orphan and are not reaped. Eventually it hits the max PID limit and NetBackup Kubernetes operator becomes unresponsive.

Error message: login pod/nbukops-controller-manager-67f5498bbb-gn9zw -c netbackupkops -n nbukops ERRO[0005] exec failed: container_linux.go:380: starting container process caused: read init-p: connection reset by peer a command that is terminated with exit code 1.

Recommended actions:

  • To fix the PID limit exceed issue, you can use the Initd script. Initd script acts as parent process or entry point script to the controller pod.

    As a parent process it attaches zombie process to itself after the child process completion to terminate the persistent zombie process. It also helps you to shut down the container gracefully. Initd script is available in NBUKOPs build version 10.0.1.

  • Use the following steps to remove the existing nbcertcmdtool zombie processes:

  1. Describe the NetBackup operator pod and find the Kubernetes node on which the controller pod is running. Run the command:

    kubectl describe -c netbackupkops <NB k8s operator pod name> -n <namespace>

  2. Log on to the Kubernetes node, run the command:

    kubectl debug node/nodename

  3. Terminate the nbcertcmdtool zombie processes, run the command:

    ps -ef | grep "\[nbcertcmdtool\] <defunct>" | awk '{print $3}' | xargs kill -9

Note:

These steps terminate all the zombie processes for that worker node. But it resolves the issue temporarily. For a permanent solution, you must deploy a new KOps build with Initd script.