Enterprise Vault™ Classification using the Veritas Information Classifier
Creating or editing patterns
You cannot edit the built-in patterns, but you can edit any custom patterns that you have created.
To create or edit a pattern
- At the left of the Veritas Information Classifier, click Patterns.
- Do one of the following:
To create a pattern, click New.
To edit an existing pattern, select it and then click Edit.
The following diagram shows the New Pattern dialog with the pattern type as Regular expression.
- Set the fields as follows:
Name
Specifies the pattern name. The name must be unique, and it can contain up to 100 alphanumeric, space, and special characters.
Description
(Optional.) Provides a short description of the pattern for display in the Veritas Information Classifier.
Type
Specifies the pattern type.
For a Text or Regular expression pattern, you must specify the value to look for. When you enter it as a pattern value, observe the same guidelines that apply when you enter these values in a policy condition.
Choose Similar document to find items that resemble a supplied template. For example, you can find completed forms by submitting the blank form as a template. Unlike Text and Regular expression patterns, you can set the required confidence levels for Similar document patterns when you incorporate them in a policy condition.
The document similarity feature can find instances where users have created variants of the template document by adding, removing, or reordering paragraphs, sentences, or words. It can also find instances where users have changed individual words. However, the more extensive these word changes are, the less likely the Veritas Information Classifier is to find a match.
You must choose the required similarity mode: Full or Section. In Full mode, the Veritas Information Classifier compares the template document in its entirety with other documents in their entirety. This mode is useful when looking for instances where users have altered the template document in places without greatly affecting its overall size. In Section mode, the Veritas Information Classifier looks for instances where the content of the template document appears as one section within a larger document.
To submit the template document, click Browse and then select the required document.
Choose Exact Data Match to find matches for one or more specific values in an item. Exact Data Match (EDM) lets you set more granular data match conditions, which gives you precise control over the classification process and produces fewer false positives.
With EDM, you can create patterns from database records.
- Test the pattern by clicking Browse and then choosing a document that ought to match it.
Select the Include text in images checkbox to extract text from images by using optical character recognition (OCR) and include that text in the classification.
Note:
The Include text in images checkbox is displayed only when the Tesseract software is installed on the system where Veritas Information Classifier is running.
After a few moments, the Veritas Information Classifier indicates whether it has found a match. If it has, you can click Show details to see the matching text and confidence levels.
When you test a pattern on the Patterns page, the details pane also shows the risk level and risk score as part of the classification results.
- Click Save.
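The guide does not describe how the Veritas Information Classifier measures document similarity, so the following Python sketch is only a toy illustration of why the Full and Section modes behave differently. It swaps in two textbook measures over three-word shingles: Jaccard similarity, which compares a whole template with a whole document, and containment, which asks how much of the template appears anywhere in the document. The function names and sample text are hypothetical, and the scores are not the confidence levels that the classifier reports.

```python
import re

# Toy measures for illustration only; NOT the Veritas Information Classifier's algorithm.

def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Set of overlapping k-word sequences in text, ignoring case and punctuation."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def full_mode_score(template: str, document: str) -> float:
    """Jaccard similarity: shared shingles / shingles in either text.
    Extra material anywhere in the document lowers the score."""
    t, d = shingles(template), shingles(document)
    union = t | d
    return len(t & d) / len(union) if union else 1.0

def section_mode_score(template: str, document: str) -> float:
    """Containment: fraction of the template's shingles found in the document.
    Extra material around the template does not lower the score."""
    t, d = shingles(template), shingles(document)
    return len(t & d) / len(t) if t else 0.0

template = "Please complete this form and return it to the finance team by Friday."
report = ("Quarterly summary of regional sales figures and staffing changes. "
          + template
          + " Appendix: detailed tables for each region follow on later pages.")

print(round(full_mode_score(template, report), 2))     # 0.37
print(round(section_mode_score(template, report), 2))  # 1.0
```

Embedding the template in a longer report leaves the containment score at 1.0 while the Jaccard score falls to about 0.37, which mirrors the guidance to use Section mode when the template content appears as one section of a larger document.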
To create an Exact Data Match based pattern
- Follow the initial steps for creating a pattern as described earlier.
- In the Type box, select Exact Data Match.
- Specify the following configuration options:
First row contains column headers
Select Yes if the first row in the source document contains the names of the fields. The content of the first row is then not used to generate the rule.
Select No if the first row in the source document does not contain the names of the fields.
Column delimiter
Specifies the delimiter character that separates each column/field in the data file.
Note:
The delimiter must be a single character, and it can be any special character: for example, a comma (,), a pipe (|), or a space.
If the source document contains only a single column or field, you can set any delimiter character that is not present in the file.
Perform hashing to secure data fields
Select Yes to hash the data fields in the classification rule that is generated for the EDM pattern, to protect the data. The data fields are hashed with the SHA-256 algorithm when they are stored in the generated classification rule.
Note:
Classification performance decreases if you use hashing when you create an Exact Data Match pattern.
Use case-sensitive matching
Select Yes if the match needs to be case sensitive.
Proximity for matches
Specifies the maximum distance, in characters, between two matching columns or fields for a match to be considered valid. The value must be greater than 0.
Note:
If the source document contains only a single column or field, set the proximity value to 1.
The generateRulePack API, which generates the classification rule, uses the "From the first condition" proximity option. The "Sliding Window" proximity option is not supported for Exact Data Match.
Example:
Suppose that proximity is set to 20 and the CSV source document contains the following:
Goodbye, Hello
and the test document contains the following:
… You say Goodbye and I say Hello …
The proximity between the words "Goodbye" and "Hello" is 19 characters, which is within the proximity value of 20 characters. Therefore, the Veritas Information Classifier shows a match.
Minimum columns to match
Specifies the minimum number of columns that must match to trigger a result. The first column must always match, regardless of the value that you specify in Minimum columns to match.
Note:
The Minimum columns to match value is ignored if the All columns checkbox is selected.
All columns
Select this checkbox if all columns or fields in the source document must match to trigger a result.
- Under the Source Document section, browse to select the EDM source file from which you want to create the classification rule.
Note:
The EDM source document must be a CSV or TXT (plain text only) file.
The maximum document size is configurable. The recommended size is 5 MB.
CSV documents with quoted fields are not supported.
- Click Save.
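Before you browse to an EDM source file, you might pre-check it against the constraints listed in these steps. The following Python sketch is a hypothetical helper, not part of the product: it rejects files that are not CSV or TXT, insists on a single-character delimiter, warns above the recommended 5 MB, rejects any line that contains a double quote (a conservative stand-in for the "no quoted fields" rule), skips the header row when the file has one, and optionally applies SHA-256 to each field. Trimming whitespace around fields, skipping blank lines, UTF-8 encoding, and reading 5 MB as 5 MiB are assumptions. The classifier's own hashing may normalize values differently, so these digests are not guaranteed to match the ones in a generated rule.

```python
import hashlib
import tempfile
from pathlib import Path

# Assumption: "5 MB" read as 5 MiB. The guide says the maximum size is configurable.
RECOMMENDED_MAX_BYTES = 5 * 1024 * 1024

def load_edm_source(path: Path, delimiter: str, has_header: bool,
                    hash_fields: bool = False) -> list[list[str]]:
    """Hypothetical pre-check of an EDM source file; returns its data rows."""
    if path.suffix.lower() not in {".csv", ".txt"}:
        raise ValueError("EDM source must be a CSV or TXT file")
    if len(delimiter) != 1:
        raise ValueError("the delimiter must be a single character")
    if path.stat().st_size > RECOMMENDED_MAX_BYTES:
        print(f"warning: {path.name} exceeds the recommended 5 MB")
    # Skipping blank lines is an assumption.
    lines = [ln for ln in path.read_text(encoding="utf-8").splitlines() if ln.strip()]
    if has_header:
        lines = lines[1:]  # the header row is not used to generate the rule
    rows = []
    for ln in lines:
        if '"' in ln:  # conservative: any double quote may indicate a quoted field
            raise ValueError(f"quoted fields are not supported: {ln!r}")
        fields = [f.strip() for f in ln.split(delimiter)]  # trimming is an assumption
        if hash_fields:
            fields = [hashlib.sha256(f.encode("utf-8")).hexdigest() for f in fields]
        rows.append(fields)
    return rows

# Usage: a two-column CSV with a header row, using the values from the proximity example.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "records.csv"
    src.write_text("First,Second\nGoodbye, Hello\n", encoding="utf-8")
    print(load_edm_source(src, ",", has_header=True))  # [['Goodbye', 'Hello']]
    hashed = load_edm_source(src, ",", has_header=True, hash_fields=True)
    print(hashed[0][0] == hashlib.sha256(b"Goodbye").hexdigest())  # True
```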
The new EDM pattern shows the exact data matching options that you configured. The pattern retains the name of the source document, but not its location or a direct link to it. See the following image.
You can use the EDM pattern created to:
Enhance an existing policy
Create a new policy
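The following Python sketch is a toy model of how the Exact Data Match options described earlier combine for a single source record. It is not the classifier's implementation, and the function names are hypothetical. It assumes plain substring matching and measures proximity forward from each match of the first column, in the spirit of the "From the first condition" option. Distance is counted from the first character of the first-column match through the first character of the other match, inclusive; that convention is an assumption, chosen because it reproduces the 19 characters in the Goodbye/Hello example.

```python
import re

def occurrences(value: str, text: str, case_sensitive: bool) -> list[int]:
    """Start offsets of each non-overlapping occurrence of value in text."""
    flags = 0 if case_sensitive else re.IGNORECASE
    return [m.start() for m in re.finditer(re.escape(value), text, flags)]

def record_matches(record: list[str], text: str, *, proximity: int,
                   min_columns: int = 1, all_columns: bool = False,
                   case_sensitive: bool = False) -> bool:
    """Toy EDM check of one source record against one document (hypothetical)."""
    if proximity < 1:
        raise ValueError("proximity must be greater than 0")
    # All columns overrides Minimum columns to match.
    required = len(record) if all_columns else min_columns
    # The first column must always match, so only its occurrences anchor a match.
    for anchor in occurrences(record[0], text, case_sensitive):
        matched = 1
        for value in record[1:]:
            # Assumed distance: from the first character of the anchor through the
            # first character of the other value, inclusive, looking forward only.
            if any(start >= anchor and start - anchor + 1 <= proximity
                   for start in occurrences(value, text, case_sensitive)):
                matched += 1
        if matched >= required:
            return True
    return False

doc = "… You say Goodbye and I say Hello …"
record = ["Goodbye", "Hello"]
print(record_matches(record, doc, proximity=20, all_columns=True))  # True (19 <= 20)
print(record_matches(record, doc, proximity=18, all_columns=True))  # False (19 > 18)
print(record_matches(["goodbye", "hello"], doc, proximity=20,
                     all_columns=True, case_sensitive=True))        # False
```

Lowering proximity to 18 breaks the match because the second column falls outside the window, and with All columns selected a first-column match alone is not enough. With case-sensitive matching, the lowercase record values no longer match the capitalized words in the document.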