Enterprise Vault™ Classification using the Veritas Information Classifier

Product(s): Enterprise Vault (14.5)

Creating or editing patterns

You cannot edit the built-in patterns, but you can edit any custom patterns that you have created.

To create or edit a pattern

  1. At the left of the Veritas Information Classifier, click Patterns.
  2. Do one of the following:

    • To create a pattern, click New.

    • To edit an existing pattern, select it and then click Edit.

    The following diagram shows the New Pattern dialog with the pattern type as Regular expression.
  3. Set the fields as follows:

    Name

    Specifies the pattern name. The name must be unique, and it can contain up to 100 alphanumeric, space, and special characters.

    Description

    (Optional.) Provides a short description of the pattern for display in the Veritas Information Classifier.

    Type

    Specifies the pattern type.

    For a Text or Regular expression pattern, you must specify the value to look for. The guidelines for entering these values in a policy condition also apply when you enter them as a pattern value.

    See About policy conditions.

    Choose Similar document to find items that resemble a supplied template. For example, you can find completed forms by submitting the blank form as a template. Unlike Text and Regular expression patterns, you can set the required confidence levels for Similar document patterns when you incorporate them in a policy condition.

    The document similarity feature can find instances where users have created variants of the template document by adding, removing, or reordering paragraphs, sentences, or words. It can also find instances where users have changed individual words. However, the more extensive these word changes are, the less likely the Veritas Information Classifier is to find a match.

    You must choose the required similarity mode: Full or Section. In Full mode, the Veritas Information Classifier compares the template document in its entirety with other documents in their entirety. This mode is useful when looking for instances where users have altered the template document in places without greatly affecting its overall size. In Section mode, the Veritas Information Classifier looks for instances where the content of the template document appears as one section within a larger document.

    To submit the template document, click Browse and then select the required document.

    Choose Exact Data Match to find matches of one or more specific values in an item. Exact Data Match (EDM) gives you precise control over the data classification process: it lets you set more granular data match conditions and produces fewer false positives.

    With EDM you can create patterns using database records.

    See “To create an Exact Data Match based pattern”.

  4. Test the pattern by clicking Browse and then choosing a document that ought to match it.

    To extract text from images by using optical character recognition (OCR) and include it in classification, select the Include text in images checkbox.

    Note:

    The Include text in images checkbox is displayed only when the Tesseract software is installed on the system where the Veritas Information Classifier is running.

    After a few moments, the Veritas Information Classifier indicates whether it has found a match. When this is the case, you can click Show details to see the matching text and confidence levels.

  5. Click Save.
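
For a Regular expression pattern, it can help to try the expression against sample text before you enter it as the pattern value. The following Python sketch is illustrative only: the employee ID format and the find_matches helper are hypothetical, and Python's re syntax may differ from the syntax that the Veritas Information Classifier accepts. See About policy conditions for the guidelines that apply.

```python
import re

# Hypothetical pattern: an employee ID of the form "EMP-" plus six digits.
# The syntax shown is Python's; the classifier's regex dialect may differ.
PATTERN = re.compile(r"\bEMP-\d{6}\b")


def find_matches(text):
    """Return every substring of `text` that the pattern matches."""
    return [m.group(0) for m in PATTERN.finditer(text)]


sample = "Badge EMP-004217 was issued; EMP-42 is not a valid ID."
print(find_matches(sample))  # ['EMP-004217']
```

Checking both a value that should match and one that should not gives some confidence in the expression before you test it against a real document in step 4.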

To create an Exact Data Match based pattern

  1. Follow the initial steps for creating a pattern as described earlier.
  2. In the Type box, click to select Exact Data Match.
  3. Specify the following configuration options:

    First row contains column headers

    Select Yes if the first row in the source document contains the names of the fields. In this case, the content of the first row is not considered for rule generation.

    Select No if the first row in the source document does not contain the names of the fields.

    Column delimiter

    (Optional.) Specifies the delimiter character that separates the columns/fields in the data file.

    Note:

    • The delimiter must be a single special character. For example, a comma (,), a pipe (|), a space, and so on.

    • If the source document contains only a single column/field, you can set any delimiter character that is not present in the file.

    Perform hashing to secure data fields

    Select Yes if the rule that is generated for the EDM pattern needs to be hashed to protect the data. The data fields are hashed with the SHA-256 algorithm when they are stored in the generated classification rule.

    Note:

    Classification performance drops if hashing is used when creating an Exact Data Match pattern.

    Use case-sensitive matching

    Select Yes if the match needs to be case sensitive.

    Proximity for matches

    Specifies the maximum distance, in characters, between two columns or fields for a match to be considered valid. Valid values are greater than 0.

    Note:

    • If the source document contains only a single column/field, set the proximity value to 1.

    • The generateRulePack API that generates the classification rule uses the "From the first condition" proximity option. The "Sliding Window" proximity option is not supported for Exact Data Match.

    Example:

    With proximity = 20, if the CSV source document content is as follows,

    Goodbye, Hello

    and test document content is,

    … You say Goodbye and I say Hello …

    Here, the proximity between the two words "Goodbye" and "Hello" is 19 characters, which is within the set proximity value of 20 characters. Therefore, the Veritas Information Classifier shows a match.

    Minimum columns to match

    Specifies the minimum number of columns that must match to trigger a result. Note that the first column must always match, regardless of the value that you specify for Minimum columns to match when you create the EDM pattern.

    Note:

    The Minimum columns to match field is ignored if the All columns checkbox is selected.

    All columns

    Select this checkbox if all columns/fields in the source document must match to trigger a result.

  4. In the Source Document section, browse to and select the EDM source file from which you want to create the classification rule.

    Note:

    • The EDM source document must be of type CSV or TXT (plain text only).

    • The maximum document size is configurable. The recommended size is 5 MB.

    • CSV documents with quoted fields are not supported.

  5. Click Save.

    The saved EDM pattern shows the exact data matching options that you configured. The pattern retains the source document name, but it does not provide the document's location or a direct link to it. See the following image.

    You can use the new EDM pattern to:

    • Enhance an existing policy

    • Create a new policy

For more information, see About policy conditions.
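
To make the interaction of the EDM options more concrete, the following Python sketch models them under stated assumptions. It is not the classifier's implementation. The load_rows and row_matches helpers are hypothetical. Trimming whitespace around fields is an assumption. So is the way distance is counted: inclusively, from the first character of the first-column value to the first character of the other value, on either side of it. That counting happens to reproduce the figure of 19 characters in the proximity example above, but how the product measures distance is not documented here.

```python
def load_rows(source_text, delimiter=",", has_header=True):
    """Split plain-text source data into rows of trimmed field values.

    Quoted fields are not supported for EDM sources, so a plain split
    on the single-character delimiter is sufficient for this sketch.
    """
    if len(delimiter) != 1:
        raise ValueError("delimiter must be a single character")
    lines = [ln for ln in source_text.splitlines() if ln.strip()]
    if has_header:
        lines = lines[1:]  # the header row is not used for matching
    return [[field.strip() for field in ln.split(delimiter)] for ln in lines]


def positions(text, value):
    """Yield every start offset of `value` in `text`."""
    start = text.find(value)
    while start != -1:
        yield start
        start = text.find(value, start + 1)


def row_matches(text, row, proximity, min_columns=None, case_sensitive=False):
    """Return True if `row` matches `text` under the assumed rules.

    min_columns=None models the "All columns" checkbox.
    """
    if not case_sensitive:
        text, row = text.lower(), [value.lower() for value in row]
    needed = len(row) if min_columns is None else max(1, min_columns)
    for anchor in positions(text, row[0]):  # first column is compulsory
        hits = 1
        for value in row[1:]:
            # Assumed counting: inclusive distance between start offsets.
            if any(abs(p - anchor) + 1 <= proximity
                   for p in positions(text, value)):
                hits += 1
        if hits >= needed:
            return True
    return False


rows = load_rows("Goodbye, Hello", has_header=False)
text = "You say Goodbye and I say Hello"
print(row_matches(text, rows[0], proximity=20))  # True
print(row_matches(text, rows[0], proximity=18))  # False
```

With the proximity reduced to 18, the second value falls outside the window, so the row matches only if Minimum columns to match is set to 1 and the first column alone is enough.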

Known issue while editing EDM patterns

While editing EDM patterns, updating the pattern name or description may fail due to an internal system error. If you experience this issue, contact your system administrator or Veritas support.