Veritas Data Insight Classification Guide

Last Published:
Product(s): Data Insight (6.1.6)

Classification Jobs

This section explains the function of the classification jobs that run in various services. You can view the status of all Data Insight processes from the Settings > Data Insight Servers > Services tab on the Management Console.

Table: Communication service jobs

Job

Description

FileTransferJob_classify

Runs on all Data Insight nodes once every minute.

Distributes the classification events between Data Insight nodes.

FileTransferJob_content

Runs every 10 seconds on the Windows File Server.

Routes the content file and the CSQLite file to the assigned Classification Server.

ClassifyInputJob

Runs every 10 seconds on the Management Server.

Processes the classification requests from the Data Insight console and from reports for the consumption of the book keeping database.

ClassifyBatchJob

Runs every minute on the Indexer.

Splits the classification batch input databases for the scanner's consumption. The split databases are later pushed to the Collector.

ClassifyIndexJob

Runs once every minute on the Indexer node.

Updates the index with classification tags and also updates the status of the book keeping database.

ClassifyMergeStatusJob

Runs once every minute on the Management Server.

Calls the files with the classification update status that are received from each Indexer. These files are automatically created on the Indexer whenever updates are available. It also updates the global book keeping database that is used to show high-level classification status on the Console.

CreateFeaturesJob

Runs once every week on Sunday at 12:01 A.M. on the Indexer.

Checks if sufficient classified data is available for the supervised learning algorithm to create predictions (training sets).

This job has a multi-threaded execution framework which executes actions in parallel. The default thread count is 2. You can set the value using the matrix.classification.sl.features.threads property at global or node level.

Note that the node level property always takes precedence over the global level property.
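The precedence rule for matrix.classification.sl.features.threads can be sketched as follows. This is a minimal illustration only: the resolve_thread_count function and the use of plain dictionaries to hold global-level and node-level properties are assumptions made for the example, not part of Data Insight. Only the property name, the default of 2, and the rule that the node level overrides the global level come from this guide.

```python
from typing import Mapping, Optional

PROPERTY = "matrix.classification.sl.features.threads"
DEFAULT_THREADS = 2  # default thread count stated in this guide


def resolve_thread_count(
    global_props: Mapping[str, str],
    node_props: Optional[Mapping[str, str]] = None,
) -> int:
    """Return the effective thread count for CreateFeaturesJob.

    Lookup order: node-level value, then global-level value, then the default.
    The dictionaries are hypothetical stand-ins for the real configuration store.
    """
    for props in (node_props or {}, global_props):
        if PROPERTY in props:
            return int(props[PROPERTY])
    return DEFAULT_THREADS
```

With this sketch, a node-level value of 8 wins over a global value of 4, and a node that sets nothing falls back to the global value or, failing that, to 2.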

Table: Classification service jobs

Job

Description

ClassifyFetchJob

Runs every minute on the server that is assigned the role of a Classification Server.

Searches the classification/inbox folder for the input files and adds them to the priority queues. One input file can result in multiple snapshots with the name <PRIORITY>_<CRID>_<BATCHID>_<NODEID>_<MSUID>_<TIMESTAMP>_snap<N>.csqlite.

The input file contains the location where the actual file has been kept in the classification/content folder. The job also keeps a list of files that could not be fetched.

Note:

Error logs are created in the <Install directory>/log/fetch folder.
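When troubleshooting, it can help to split a snapshot name from ClassifyFetchJob into its parts. The sketch below parses the <PRIORITY>_<CRID>_<BATCHID>_<NODEID>_<MSUID>_<TIMESTAMP>_snap<N>.csqlite pattern given above. It assumes that no field contains an underscore and that <N> is a decimal number. The guide states neither, and the SnapshotName type and parse_snapshot_name function are hypothetical helpers, not Data Insight APIs.

```python
import re
from typing import NamedTuple, Optional

# Assumption: each field is a non-empty run of characters without underscores.
SNAPSHOT_RE = re.compile(
    r"^(?P<priority>[^_]+)_(?P<crid>[^_]+)_(?P<batch_id>[^_]+)_"
    r"(?P<node_id>[^_]+)_(?P<msu_id>[^_]+)_(?P<timestamp>[^_]+)_"
    r"snap(?P<n>\d+)\.csqlite$"
)


class SnapshotName(NamedTuple):
    priority: str
    crid: str
    batch_id: str
    node_id: str
    msu_id: str
    timestamp: str
    snapshot: int  # the <N> in snap<N>


def parse_snapshot_name(filename: str) -> Optional[SnapshotName]:
    """Return the parsed fields, or None if the name does not fit the pattern."""
    m = SNAPSHOT_RE.match(filename)
    if m is None:
        return None
    return SnapshotName(
        m["priority"], m["crid"], m["batch_id"],
        m["node_id"], m["msu_id"], m["timestamp"], int(m["n"]),
    )
```

For example, a made-up name such as 1_42_7_3_15_1700000000_snap2.csqlite would yield priority "1", CRID "42", and snapshot number 2. Fields are kept as strings because the guide does not specify their types.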

ClassifyFetchPauseJob

Runs once every minute on any node that acts as the Classification Server.

Refreshes the pause or resume status of fetch jobs as per the duration configured for content fetching.

CancelClassifyRequestJob

Runs every 20 seconds in the Communication Service and the Classification Service.

Fetches the list of classification requests that are canceled and distributes this list between Data Insight nodes.

Before classifying files, all classification jobs consult this list to identify the requests that are marked for cancellation. If a newly submitted classification request matches a canceled request, that request is deleted.

ClassifyJob

Runs once every minute on any node that acts as a Classification Server.

Checks the classification/inbox folder for input files submitted for classification and adds them to three separate priority queues. It picks a file from the highest-priority queue in FIFO order and starts classifying content using Veritas Information Classifier. All files listed in that input file are submitted for classification. Once all paths in the file have been classified, the classification results and any resulting errors are written to a database in the classification/outbox folder.
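The queueing behavior described for ClassifyJob (three priority queues, FIFO order within a queue, highest priority served first) can be sketched as below. This is an illustration under stated assumptions, not the product's implementation: the InputQueues class is hypothetical, and the numbering of the priority levels (1 as the highest, 3 as the lowest) is an assumption, since the guide says only that there are three queues.

```python
from collections import deque
from typing import Deque, Dict, Optional

# Assumption: three levels, where a lower number means a higher priority.
PRIORITIES = (1, 2, 3)


class InputQueues:
    """Hypothetical model of the three priority queues used by ClassifyJob."""

    def __init__(self) -> None:
        self._queues: Dict[int, Deque[str]] = {p: deque() for p in PRIORITIES}

    def add(self, priority: int, input_file: str) -> None:
        # Append to the tail so that each queue stays in arrival (FIFO) order.
        self._queues[priority].append(input_file)

    def next_file(self) -> Optional[str]:
        # Scan from highest to lowest priority and take the oldest entry found.
        for priority in PRIORITIES:
            if self._queues[priority]:
                return self._queues[priority].popleft()
        return None  # nothing is waiting in any queue
```

In this model, a file in a lower-priority queue is picked only when every higher-priority queue is empty, which matches the pick order described above.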

UpdateVICPolicyMapJob

Runs every 10 seconds on the Management Server.

Ensures that the Data Insight configuration database is in sync with the Classification Policy Manager.

UpdateConfigJob

Reconfigures jobs based on the configuration changes made on the Management Server.

PredictJob

Runs once every week on Sunday at 5:00 A.M. on the Indexer.

Copies the prediction files from the temp output directory to a classification outbox.

SLCreateBatchesJob

Runs every 2 hours on the Indexer.

Creates batches of files for the consumption of Veritas Information Classifier. These files are classified with high priority.

ClassifyManageWorkloadJob

Runs every minute on the server that is assigned the role of a Classification Server. This job is enabled only on the master Classification Server.

Checks the classification/workload folder on the master Classification Server and counts batches based on their priority. If the workload needs to be distributed, the job fetches the list of servers in its pool and the number of batches, by priority, in each server's classification/inbox folder. If the number of batches on any slave is less than 10, the job distributes batches to that slave and copies them to the slave's classification/inbox folder.