NetBackup™ Deployment Guide for Amazon Elastic Kubernetes Services (EKS) Cluster

Product(s): NetBackup (10.1)

Configuring the environment.yaml file

The following configurations apply to all the components:

Table: Common environment parameters

Parameter

Description

name: environment-sample

Specify the name of the environment in your cluster.

namespace: example-ns

Specify the namespace where all the NetBackup resources are managed. If you do not specify it here, the namespace defaults to the current namespace in which you run the kubectl apply -f command on this file.

containerRegistry: example.dkr.ecr.us-east-2.amazonaws.com/exampleReg

Specify a container registry that the cluster can access. NetBackup images are pushed to this registry.

tag: '10.1'

This tag is used for all images in the environment. Specifying a `tag` value on a sub-resource affects the images for that sub-resource only. For example, if you apply an EEB that affects only primary servers, you might set `primary.tag` to the custom tag of that EEB. The primary server then runs with that image, but the media servers and MSDP scaleouts continue to run images tagged `10.1`. Be aware that YAML treats values that look like numbers as numbers, even though this field must be a string. Quote the value to avoid misinterpretation.

licenseKeys:

List the license keys that are shared among all the sub-resources. Licenses specified in a sub-resource are appended to this list and applied only to the sub-resource.

paused: false

Specify whether the NetBackup operator attempts to reconcile the differences between this YAML specification and the current Kubernetes cluster state. Only set it to true during maintenance.

configCheckMode: default

This controls whether certain configuration restrictions are checked or enforced during setup. Other allowed values are skip and dryrun.

corePattern: /corefiles/core.%e.%p.%t

Specify the path to use for storing core files in case of a crash.

loadBalancerAnnotations: service.beta.kubernetes.io/aws-load-balancer-subnets: example-subnet1 name

Specify the annotations to be added for the network load balancer.
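Taken together, the common parameters form the top of environment.yaml. The fragment below is a hedged sketch, not a copy of the shipped file. Two points are assumptions: placing name and namespace under metadata follows the standard Kubernetes resource layout, and placing the other parameters under spec follows this section's references to spec.paused and spec.primary.paused. Take apiVersion and kind from the sample environment.yaml file. X stands in for a real license key.

```yaml
# apiVersion and kind omitted: copy them from the sample environment.yaml (assumed layout below).
metadata:
  name: environment-sample
  namespace: example-ns
spec:
  containerRegistry: example.dkr.ecr.us-east-2.amazonaws.com/exampleReg
  # Quoted so YAML keeps it a string; unquoted, a tag such as 10.10 would be read as the number 10.1.
  tag: '10.1'
  licenseKeys:
    - X                          # placeholder key, shared by all sub-resources
  paused: false                  # set to true only during maintenance
  configCheckMode: default       # other allowed values: skip, dryrun
  corePattern: /corefiles/core.%e.%p.%t
  loadBalancerAnnotations:
    service.beta.kubernetes.io/aws-load-balancer-subnets: example-subnet1
```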

The following configurations apply to the primary server. The values specified in the following table can override the values specified in the table above.

Table: Environment parameters for the primary server

Parameter

Description

paused: false

Specifies whether the NetBackup operator attempts to reconcile the differences between this YAML specification and the current Kubernetes cluster state. Set it to true only during maintenance. This applies only to the environment object. For example, to pause reconciliation of the managed primary server, you must set spec.primary.paused. Setting spec.paused: true stops updates to the managed resources, including updates to their `paused` status. Entries in the media servers and MSDP scaleouts lists also support the `paused` field. The default value is false.

primary

Specifies attributes specific to the primary server resources. Every environment has exactly one primary server, so this section cannot be left blank.

name: primary-name

Set resourceNamePrefix to control the name of the primary server. The default value is the same as the environment's name.

tag: 10.1-special

To use a different image tag specifically for the primary server, uncomment this value and provide the desired tag. This overrides the tag specified in the common section.

nodeSelector:

labelKey: kubernetes.io/os

labelValue: linux

Specify a key and value that identify the nodes where the primary server pod runs.

Note:

The labelKey and labelValue must be the same label key:value pair that was used when creating the node group, which is also used as a toleration for the primary server.

networkLoadBalancer:

annotations: service.beta.kubernetes.io/aws-load-balancer-subnets: example-subnet1 name

ipList:
        - ipAddr: 4.3.2.1
          fqdn: primary.example.com

Uncomment the annotations to specify additional annotations for the primary server. These values are merged with the values given in the loadBalancerAnnotations above. For any duplicate keys, the values given here override the corresponding values above.

Next, in ipList, specify the IP address and FQDN of the primary server.

credSecretName: primary-credential-secret

This determines the credentials for the primary server. Media servers use these credentials to register themselves with the primary server.

itAnalyticsPublicKey: ssh-rsaxxx

If using NetBackup IT Analytics, uncomment this and provide the SSH public key. IT Analytics uses this to access the primary server.

kmsDBSecret: kms-secret

Name of the secret that contains the Host Master Key ID (HMKID), Host Master Key passphrase (HMKpassphrase), Key Protection Key ID (KPKID), and Key Protection Key passphrase (KPKpassphrase) for the NetBackup Key Management Service. The secret should be of type Opaque. You can create it either using a YAML or with a command such as the following:

kubectl create secret generic kms-secret --namespace nb-namespace --from-literal=HMKID="HMK@ID" --from-literal=HMKpassphrase="HMK@passphrase" --from-literal=KPKID="KPK@ID" --from-literal=KPKpassphrase="KPK@passphrase"

licenseKeys:

To specify additional license keys that are applied only to the primary server, uncomment this and provide the license key(s). In this example, the primary server would have the "X" license key defined in the previous section, followed by this "Y" key.

catalog:

capacity: 100Gi

storageClassName: <pv_name>

This storage applies to the primary server for the NetBackup catalog, log, and data volumes. The primary server catalog volume must be at least 100Gi.

Note:

Static provisioning using EFS is supported only for the primary server.

Note:

This capacity must be the same as the capacity of the static PVC that has already been created.

log:

capacity: 30Gi

storageClassName: <EBS based storage class>

Log volume must be at least 30Gi.

data:

capacity: 30Gi

storageClassName: <EBS based storage class>

The primary server data volume must be at least 30Gi.
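The fragment below is a sketch of how the primary server rows fit together in a spec.primary section. The nesting is inferred from the parameter names in the table and from the spec.primary.paused reference. Whether catalog, log, and data sit under an additional key is not shown in this section, so check it against the sample environment.yaml file. gp3 stands in for any EBS-based storage class, Y is a placeholder license key, and the optional itAnalyticsPublicKey is left out.

```yaml
spec:
  primary:
    paused: false                # true pauses reconciliation of the primary server only
    tag: '10.1-special'          # optional; overrides spec.tag for the primary server
    nodeSelector:
      # Must match the label applied when the primary server node group was created.
      labelKey: kubernetes.io/os
      labelValue: linux
    networkLoadBalancer:
      # Merged with spec.loadBalancerAnnotations; on a duplicate key, this value wins.
      annotations:
        service.beta.kubernetes.io/aws-load-balancer-subnets: example-subnet1
      ipList:
        - ipAddr: 4.3.2.1
          fqdn: primary.example.com
    credSecretName: primary-credential-secret
    kmsDBSecret: kms-secret
    licenseKeys:
      - Y                        # appended after the common keys; primary server only
    catalog:
      capacity: 100Gi            # minimum 100Gi; must equal the static PVC capacity
      storageClassName: <pv_name>
    log:
      capacity: 30Gi             # minimum 30Gi
      storageClassName: gp3
    data:
      capacity: 30Gi             # minimum 30Gi
      storageClassName: gp3
```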

The following section describes the media server configurations. If you do not have a media server, either remove this section from the configuration file entirely or define it as an empty list.

Note:

The environment name or media server name in the environment.yaml file must always be shorter than 22 characters.

Table: Media server related parameters

Parameter

Description

mediaServers:

- name: media1

This specifies media server configurations. This is given as a list of media servers, but most environments will have just one, with multiple replicas. It's also possible to have zero media servers; in that case, either remove the media servers section entirely, or define it as an empty list: mediaServers: []

replicas: 1

Specifies the number of replicas of this media server. The minimum supported number of replicas is 1.

tag: 10.1-special

To use a different image tag specifically for the media servers, uncomment this value and provide the desired tag. This overrides the tag specified above in the common table.

nodeSelector:

labelKey: kubernetes.io/os

labelValue: linux

Specify a key and value that identify the nodes where media server pods will run.

Note:

The labelKey and labelValue must be the same label key:value pair that was used when creating the node group, which is also used as a toleration for the media server.

data:

capacity: 50Gi

storageClassName: <EBS based storage class>

This storage applies to the media server data volumes.

The minimum data size for a media server is 50Gi.

log:

capacity: 30Gi

storageClassName: <EBS based storage class>

This storage applies to the media server log volumes.

Log volumes must be at least 30Gi.

networkLoadBalancer:

annotations: service.beta.kubernetes.io/aws-load-balancer-subnets: example-subnet1 name

ipList:
        - ipAddr: 4.3.2.2
          fqdn: media1-1.example.com
        - ipAddr: 4.3.2.3
          fqdn: media1-2.example.com

Uncomment the annotations to specify additional annotations for the media server. These values are merged with the values given in the loadBalancerAnnotations. For any duplicate keys, the values given here override the corresponding values in the loadBalancerAnnotations.

The number of entries in the IP list should match the replica count specified above.

Note the following:

To use gp3 (an EBS-based storage class), you must specify ebs.csi.aws.com as the provisioner for the storage class and install the EBS CSI driver. For more information on installing the EBS CSI driver, see Amazon EBS CSI driver. The following is an example of a gp3 storage class:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: gp3
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
allowVolumeExpansion: true
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: gp3
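The fragment below is a sketch of how the media server rows combine into one spec.mediaServers entry with two replicas. The nesting follows the parameter names in the media server table and is an assumption to confirm against the sample environment.yaml. gp3 refers to an EBS-based storage class such as the example above.

```yaml
spec:
  mediaServers:
    - name: media1
      replicas: 2                # minimum 1
      nodeSelector:
        # Must match the label applied when the media server node group was created.
        labelKey: kubernetes.io/os
        labelValue: linux
      data:
        capacity: 50Gi           # minimum 50Gi
        storageClassName: gp3
      log:
        capacity: 30Gi           # minimum 30Gi
        storageClassName: gp3
      networkLoadBalancer:
        # A map of annotation keys to values, not a list.
        annotations:
          service.beta.kubernetes.io/aws-load-balancer-subnets: example-subnet1
        # One entry per replica: two replicas, so two entries.
        ipList:
          - ipAddr: 4.3.2.2
            fqdn: media1-1.example.com
          - ipAddr: 4.3.2.3
            fqdn: media1-2.example.com
```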

The following section describes MSDP-related parameters. You may also deploy without any MSDP scaleouts. In that case, remove the msdpScaleouts section entirely from the configuration file.

Table: MSDP Scaleout related parameters

Parameter

Description

msdpScaleouts:

- name: dedupe1

This specifies MSDP Scaleout configurations. This is given as a list, but it would be rare to need more than one scaleout deployment in a single environment. Use the `replicas` property below to scale out. It's also possible to have zero MSDP scaleouts; in that case, either remove the msdpScaleouts section entirely, or define it as an empty list: msdpScaleouts: []

tag: '17.0'

This tag overrides the one defined in Table 1-3. It is necessary because the MSDP Scaleout images are shipped with tags that differ from those of the NetBackup primary and media images.

replicas: 4

This is the scaleout size of this MSDP Scaleout component. It is a required value, and it must be between 4 and 16 inclusive.

Note:

Scale-down of the MSDP Scaleout replicas after deployment is not supported.

serviceIPFQDNs:
        - ipAddr: 1.2.3.4
          fqdn: dedupe1-1.example.com
        - ipAddr: 1.2.3.5
          fqdn: dedupe1-2.example.com
        - ipAddr: 1.2.3.6
          fqdn: dedupe1-3.example.com
        - ipAddr: 1.2.3.7
          fqdn: dedupe1-4.example.com

These are the IP addresses and host names of the MSDP Scaleout servers. The number of entries should match the number of replicas specified above.

kms:

keyGroup: example-key-group

Specifies the initial key group and key secret to be used for KMS encryption. When reusing storage from a previous deployment, the key group and key secret may already exist. In this case, provide the keyGroup only.

keySecret: example-key-secret

Specify keySecret only if the key group does not already exist and needs to be created. The secret type should be Opaque, and you can create the secret either using a YAML or the following command:

kubectl create secret generic example-key-secret --namespace nb-namespace --from-literal=username="devuser" --from-literal=passphrase="test passphrase"

loadBalancerAnnotations:

service.beta.kubernetes.io/aws-load-balancer-internal: 'true'

For MSDP scaleouts, the default value of the aws-load-balancer-internal annotation is `false`, which may make the MSDP Scaleout services in this environment publicly accessible. To make sure that they use private IP addresses, specify `true` here or in the loadBalancerAnnotations above in Table 1-3.

credential:

secretName: msdp-secret1

This defines the credentials for the MSDP Scaleout server. It refers to a secret in the same namespace as this environment resource. The secret can be of type kubernetes.io/basic-auth or Opaque. You can create the secret using a YAML or with a command such as the following:

kubectl create secret generic msdp-secret1 --namespace nb-namespace --from-literal=username="devuser" --from-literal=password="Y@123abCdEf"

autoDelete: false

Optional parameter. The default value is true. When set to true, the MSDP Scaleout operator deletes the MSDP secret after using it. In that case, the MSDP and primary secrets must be distinct. To use the same secret for both MSDP scaleouts and the primary server, set autoDelete to false.

catalog:

capacity: 1Gi

storageClassName: gp2

This storage applies to MSDP Scaleout to store the catalog and metadata. The catalog size can only be increased, for capacity expansion. Expanding the existing catalog volumes causes a short downtime of the engines. The recommended size is 1/100 of the backend data capacity.

dataVolumes:

capacity: 5Gi

storageClassName: gp2

This specifies the data storage for this MSDP Scaleout resource. You may increase the size of a volume or add more volumes to the end of the list, but do not remove or re-order volumes. A maximum of 16 volumes is allowed. Appending new data volumes or expanding existing ones causes a short downtime of the engines. The recommended volume size is 5Gi to 32Ti.

log:

capacity: 20Gi

storageClassName: gp2

Specifies the log volume size used to provision the Persistent Volume Claims for the Controller and MDS pods. In most cases, a capacity of 5-10Gi is big enough for one MDS or Controller pod.

nodeSelector:

labelKey: kubernetes.io/os

labelValue: linux

Specify a key and value that identify the nodes where MSDP Scaleout pods will run.
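The fragment below is a sketch of how the MSDP Scaleout rows combine into one spec.msdpScaleouts entry. Nesting autoDelete under credential, and the other key placements, are assumptions inferred from the table order, so confirm them against the sample environment.yaml. The catalog comment works through the 1/100 guideline for hypothetical data volume sizes. It also assumes the ratio applies per engine, which this section does not state.

```yaml
spec:
  msdpScaleouts:
    - name: dedupe1
      tag: '17.0'                # MSDP images use their own tag series
      replicas: 4                # 4 to 16; cannot be scaled down after deployment
      serviceIPFQDNs:            # exactly one entry per replica
        - ipAddr: 1.2.3.4
          fqdn: dedupe1-1.example.com
        - ipAddr: 1.2.3.5
          fqdn: dedupe1-2.example.com
        - ipAddr: 1.2.3.6
          fqdn: dedupe1-3.example.com
        - ipAddr: 1.2.3.7
          fqdn: dedupe1-4.example.com
      kms:
        keyGroup: example-key-group
        keySecret: example-key-secret   # omit when the key group already exists
      loadBalancerAnnotations:
        # Annotation values are strings, so true is quoted.
        service.beta.kubernetes.io/aws-load-balancer-internal: 'true'
      credential:
        secretName: msdp-secret1
        autoDelete: false        # keep the secret so the primary server can share it
      catalog:
        # Guideline: about 1/100 of backend data capacity. With the two hypothetical
        # 5Ti data volumes below (10Ti): 10Ti / 100 = 102.4Gi, so roughly 100Gi.
        capacity: 100Gi
        storageClassName: gp2
      dataVolumes:               # append only; never remove or reorder; at most 16
        - capacity: 5Ti
          storageClassName: gp2
        - capacity: 5Ti
          storageClassName: gp2
      log:
        capacity: 20Gi
        storageClassName: gp2
      nodeSelector:
        labelKey: kubernetes.io/os
        labelValue: linux
```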

Edit restricted parameters post deployment

Do not change these parameters after the initial deployment. Changing them may result in an inconsistent deployment.

Table: Edit restricted parameters post deployment

Parameter

Description

name

Specifies the prefix name for the primary, media, and MSDP Scaleout server resources.

ipAddr, fqdn, and loadBalancerAnnotations

Do not change the values of ipAddr, fqdn, and loadBalancerAnnotations in the following fields after the initial deployment. This applies to the primary, media, and MSDP Scaleout servers. For example:

- The loadBalancerAnnotations:

loadBalancerAnnotations:
    service.beta.kubernetes.io/aws-load-balancer-internal-subnet: example-subnet
    service.beta.kubernetes.io/aws-load-balancer-internal: 'true'

- The IP and FQDN values defined for the primary, media, and MSDP Scaleout servers:

# Primary server
ipList:
    - ipAddr: 4.3.2.1
      fqdn: primary.example.com

# Media server
ipList:
    - ipAddr: 4.3.2.2
      fqdn: media1-1.example.com
    - ipAddr: 4.3.2.3
      fqdn: media1-2.example.com

# MSDP Scaleout
serviceIPFQDNs:
    - ipAddr: 1.2.3.4
      fqdn: dedupe1-1.example.com
    - ipAddr: 1.2.3.5
      fqdn: dedupe1-2.example.com
    - ipAddr: 1.2.3.6
      fqdn: dedupe1-3.example.com
    - ipAddr: 1.2.3.7
      fqdn: dedupe1-4.example.com
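The tables above refer to three secrets by name: kms-secret (kmsDBSecret), example-key-secret (kms.keySecret), and msdp-secret1 (credential.secretName). Besides the kubectl create secret commands given in those rows, each secret can be created from a YAML manifest, which is the other option the rows mention. The multi-document file below is a sketch that reuses the same placeholder values. Replace nb-namespace with the namespace of the environment, because the secrets must live in the same namespace as the environment resource.

```yaml
# kms-secret: referenced by spec.primary.kmsDBSecret
apiVersion: v1
kind: Secret
metadata:
  name: kms-secret
  namespace: nb-namespace
type: Opaque
stringData:
  HMKID: "HMK@ID"
  HMKpassphrase: "HMK@passphrase"
  KPKID: "KPK@ID"
  KPKpassphrase: "KPK@passphrase"
---
# example-key-secret: referenced by the MSDP Scaleout kms.keySecret
apiVersion: v1
kind: Secret
metadata:
  name: example-key-secret
  namespace: nb-namespace
type: Opaque
stringData:
  username: "devuser"
  passphrase: "test passphrase"
---
# msdp-secret1: referenced by the MSDP Scaleout credential.secretName
apiVersion: v1
kind: Secret
metadata:
  name: msdp-secret1
  namespace: nb-namespace
type: kubernetes.io/basic-auth   # Opaque is also accepted for this secret
stringData:
  username: "devuser"
  password: "Y@123abCdEf"
```

Apply the file with kubectl apply -f followed by its file name. The stringData field accepts plain-text values, and Kubernetes stores them base64-encoded under data.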