Cleaning up orphaned images in Amazon Glacier vault manually

Article: 100042314
Last Published: 2018-09-17
Product(s): NetBackup & Alta Data Protection

Problem

There may be instances where you cannot clean up orphaned images in an Amazon Glacier vault because a metadata object is missing. A metadata object contains the mapping information between data objects and NetBackup images.

Use the following steps when a backup fails, for example, because of network connectivity problems. A failed backup can also cause the cleanup to fail, which leaves orphaned images in the vault. The following error messages are displayed in such scenarios:

  • For a failure other than access denied, the following message is displayed:
    Failed to clean up archives. Manual cleanup suggested. See NetBackup Cloud Administrator's guide.
  • For orphaned images caused by a lock problem, the following message is displayed:
    Access denied to delete Amazon Glacier archive.

Solution

Complete the following steps to manually clean up orphaned images in an Amazon Glacier vault.

The procedure lists the valid NetBackup images for the vault using bpimmedia and then looks for the same images in the vault inventory. Any archive in the inventory that does not correspond to a valid NetBackup image is an orphaned image. You can also check how much storage these orphaned images consume, which can help you decide whether to clean up the vault.
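
The storage consumed by orphaned images can be estimated once the vault inventory (output.json) and the list of orphaned Archive IDs (delete_these_archives.txt) from the steps below exist. The following is a minimal sketch, not a NetBackup or AWS tool: orphan_bytes is a hypothetical helper, and it assumes that the inventory is the compact JSON written by get-job-output, that the ID file holds one Archive ID per line, and that no ArchiveDescription contains a brace character.

```shell
# Sketch: report how many bytes the listed archives occupy in the vault.
#   $1 = inventory JSON saved from the inventory-retrieval job
#   $2 = file with one orphaned Archive ID per line
# Assumption: no ArchiveDescription contains '{' or '}'.
orphan_bytes() {
    inventory_json=$1
    orphan_ids=$2
    # Break the JSON at every brace so each archive object is on its own line.
    tr '{}' '\n\n' < "$inventory_json" |
    awk -v idfile="$orphan_ids" '
        BEGIN {
            # Load the orphaned Archive IDs, trimming surrounding blanks.
            while ((getline line < idfile) > 0) {
                gsub(/^[ \t]+|[ \t]+$/, "", line)
                if (line != "") wanted[line] = 1
            }
        }
        {
            if (!match($0, /"ArchiveId" *: *"[^"]*"/)) next
            id = substr($0, RSTART, RLENGTH)
            sub(/^"ArchiveId" *: *"/, "", id)
            sub(/"$/, "", id)
            if (!(id in wanted)) next
            if (!match($0, /"Size" *: *[0-9]+/)) next
            size = substr($0, RSTART, RLENGTH)
            sub(/^[^0-9]+/, "", size)
            total += size
            count++
        }
        END { printf "%d archives, %.0f bytes\n", count, total }
    '
}
```

Usage, with the file names used in this article: orphan_bytes output.json delete_these_archives.txt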

Warning: The following steps can lead to permanent deletion of archives.

Note: The examples in the following section are for UNIX systems only. On other platforms, replace the UNIX commands with the equivalent native commands.

The bpimagelist and bpimmedia commands are located in <installpath>/netbackup/bin/admincmd.

  1. List the images for the media ID from the NetBackup catalog using bpimagelist or bpimmedia and fetch the Image IDs from the details.
    Command: bpimmedia -mediaid <media_id> -dp <disk_pool_name> -stype <stype>
    Example: bpimmedia -mediaid @aaaad -dp dp-nbu-vault-worm -stype amazon_cryptc | grep -i IMAGE | cut -d' ' -f3
    Output example:
    myhostname.abc.xyz.qwe.com_1517484967
    myhostname.abc.xyz.qwe.com_1517485050
  2. Copy the output of the above command (the list of Image IDs) to a text file, for example, image_ids_from_nbu_catlog.txt.
    Command: bpimmedia -mediaid @aaaad -dp dp-nbu-vault-worm -stype amazon_cryptc | grep -i IMAGE | cut -d' ' -f3 > image_ids_from_nbu_catlog.txt
  3. List the inventory from the vault and fetch the image description using the AWS CLI.
    Installation details are available in the AWS CLI documentation.
    1. Initiate a job to list an inventory.
      Command: aws glacier initiate-job --account-id <account_id> --vault-name <vault_name> --job-parameters '{"Type": "inventory-retrieval"}'
      Example: aws glacier initiate-job --account-id - --vault-name my-vault-name --job-parameters '{"Type": "inventory-retrieval"}'
      Output example:
      {
          "location": "/SOME_RANDOM_NUMBER/vaults/my-vault-name/jobs/DUMMY_JOB_ID",
          "jobId": "DUMMY_JOB_ID"
      }
    2. List the jobs to get the job status.
      Command: aws glacier list-jobs --account-id <account_id> --vault-name <vault_name>
      Example: aws glacier list-jobs --account-id - --vault-name my-vault-name
      Output example:
      {
          "JobList": [
              {
                  "VaultARN": "arn:aws:glacier:ca-central-1:SOME_RANDOM_NUMBER:vaults/my-vault-name",
                  "RetrievalByteRange": "0-459",
                  "Tier": "Standard",
                  "SHA256TreeHash": "TREE_HASH_STRING",
                  "Completed": false,
                  "JobId": "DUMMY_JOB_ID",
                  "ArchiveId": "ARCHIVE_ID",
                  "ArchiveSizeInBytes": 460,
                  "Action": "ArchiveRetrieval",
                  "ArchiveSHA256TreeHash": "TREE_HASH_STRING",
                  "CreationDate": "2018-02-09T09:13:17.427Z",
                  "StatusCode": "InProgress"
              }
          ]
      }
    3. If the job is marked as completed ("Completed": true) in the previous step, retrieve the job output in JSON format using the Job ID and save it to a file. The get-job-output command takes the output file name as its final argument.
      Command: aws glacier get-job-output --account-id <account_id> --vault-name <vault_name> --job-id <job_id_string> <output_file>
      Example: aws glacier get-job-output --account-id - --vault-name my-vault-name --job-id DUMMY_JOB_ID output.json
      Output example (contents of output.json):
      {"VaultARN":"arn:aws:glacier:ca-central-1:SOME_RANDOM_NUMBER:vaults/my-vault-name","InventoryDate":"2018-01-25T14:56:50Z","ArchiveList":[{"ArchiveId":"ARCHIVE_ID","ArchiveDescription":"ARCHIVE_DESCRIPTION_CONTAINING_NETBACKUP_IMAGE_BASENAME","CreationDate":"2017-12-20T12:20:28Z","Size":1024,"SHA256TreeHash":"TREE_HASH_STRING"}...}]}
    4. Filter the output based on the ArchiveDescription field, which contains the NetBackup Image ID.
      1. Run a script to filter the inventory list based on the ArchiveDescription. A sample script for UNIX is as follows:
        python -m json.tool output.json | grep "ArchiveDescription\|ArchiveId" | tr -s '[:space:]' | cut -d':' -f2 | sed 's/,$//' | sed 's/"$//' | sed 's/"//' | paste -s -d' \n' > myinventorylist.txt
      2. The script writes the filtered list to a file, for example, myinventorylist.txt. Each line is expected to hold the archive description followed by the Archive ID, which relies on json.tool printing keys in sorted order (the default in Python 2.7; with Python 3.5 or later, add the --sort-keys option).
  4. Obtain all the Archive IDs from the inventory list (myinventorylist.txt) whose descriptions do not match any of the Image IDs in the step 2 output file (image_ids_from_nbu_catlog.txt).
    Refer to the following script to complete this step. The -F option makes grep treat the Image IDs as literal strings, so that the dots in host names are not interpreted as wildcards:
    pattern=$(<image_ids_from_nbu_catlog.txt); grep -vF "$pattern" myinventorylist.txt | sed 's/[^ ]*  //' > delete_these_archives.txt
     
  5. Delete the archives whose Archive IDs were obtained in step 4.
    Command: aws glacier delete-archive --account-id <account_id> --vault-name <vault_name> --archive-id <archive_id_string_to_be_deleted>
    Example: aws glacier delete-archive --account-id - --vault-name my-vault-name --archive-id ARCHIVE_ID
    Refer to the following script to complete this step. It reads the file created in step 4 (delete_these_archives.txt):

    while read -r id; do aws glacier delete-archive --account-id - --vault-name my-vault-name --archive-id "$id"; done < delete_these_archives.txt
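
Because deleted archives cannot be recovered, it can help to rehearse step 5 before running it. The following is a minimal sketch under stated assumptions: delete_archives and the DRY_RUN switch are hypothetical names that are not part of NetBackup or the AWS CLI, and the list file is assumed to hold one Archive ID per line. By default the function only prints what it would delete.

```shell
# Sketch: delete the archives listed in a file, defaulting to a dry run.
#   $1 = file with one Archive ID per line (for example, delete_these_archives.txt)
#   $2 = vault name
# Set DRY_RUN=0 (hypothetical switch) to actually call the AWS CLI.
delete_archives() {
    list_file=$1
    vault=$2
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while read -r archive_id || [ -n "$archive_id" ]; do
        [ -n "$archive_id" ] || continue    # skip blank lines
        if [ "${DRY_RUN:-1}" = "0" ]; then
            # The "=" form keeps IDs that begin with a hyphen from being
            # parsed as options; stdin is detached so the loop input is safe.
            aws glacier delete-archive --account-id - \
                --vault-name "$vault" --archive-id="$archive_id" < /dev/null ||
                echo "failed to delete: $archive_id" >&2
        else
            echo "would delete: $archive_id"
        fi
    done < "$list_file"
}
```

Usage: run delete_archives delete_these_archives.txt my-vault-name first and review the printed list, then repeat with DRY_RUN=0 set to perform the deletion.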

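The inventory job started in step 3 can take hours to complete, so checking its status by hand is tedious. The following is a minimal sketch of a polling loop, assuming the AWS CLI is configured for the account that owns the vault. job_completed and wait_for_job are hypothetical helper names, and the 15-minute default interval is an arbitrary choice. The sketch uses the describe-job subcommand with the global --query and --output options instead of scanning the list-jobs output.

```shell
# Sketch: check whether a Glacier job has completed.
# Returns 0 if completed, 1 if still running, 2 if the CLI call failed.
job_completed() {
    vault=$1
    job_id=$2
    status=$(aws glacier describe-job --account-id - \
        --vault-name "$vault" --job-id "$job_id" \
        --query Completed --output text) || return 2
    case $status in
        [Tt]rue) return 0 ;;
        *)       return 1 ;;
    esac
}

# Sketch: poll until the job completes.
#   $3 = seconds between checks (assumed default: 900)
wait_for_job() {
    vault=$1
    job_id=$2
    interval=${3:-900}
    while :; do
        rc=0
        job_completed "$vault" "$job_id" || rc=$?
        case $rc in
            0) return 0 ;;
            2) echo "describe-job failed for $job_id" >&2; return 1 ;;
        esac
        sleep "$interval"
    done
}
```

Usage: wait_for_job my-vault-name DUMMY_JOB_ID && aws glacier get-job-output --account-id - --vault-name my-vault-name --job-id DUMMY_JOB_ID output.json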