Enterprise Vault™ Classification using the Veritas Information Classifier
Troubleshooting language detection
By default, Veritas Information Classifier determines the language of a message only if the message contains at least 80 characters. With Veritas Information Classifier 2.4.0, an administrator can configure the minimum number of characters and a higher or lower confidence level for language detection. When small files contain multiple languages, the administrator can also specify a smaller chunk size for language detection.
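The way the minimum text length, chunk size, and confidence level interact can be illustrated with a minimal Python model. This is a sketch under stated assumptions, not the classifier's implementation. The names split_into_chunks, detect_primary_language, and toy_detect are hypothetical. The per-chunk detector is a stand-in that returns a language code and a confidence from 0 to 100. Applying the confidence threshold to each chunk before the majority vote is also an assumption. The default arguments match the documented defaults of 80, 300, and 90, and Python's len() counts Unicode code points.

```python
from collections import Counter
from typing import Callable, List, Tuple

# Hypothetical per-chunk detector: returns (language code, confidence 0-100).
Detector = Callable[[str], Tuple[str, int]]


def split_into_chunks(text: str, chunk_size: int) -> List[str]:
    """Split text into consecutive chunks of at most chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def detect_primary_language(
    text: str,
    detect: Detector,
    minimum_text: int = 80,        # minimumTextRequiredForLanguageDetection
    chunk_size: int = 300,         # chunkSizeForLanguageDetection
    minimum_confidence: int = 90,  # minimumConfidenceForLanguageDetection
) -> str:
    # Text shorter than the minimum length is designated "unknown".
    if len(text) < minimum_text:
        return "unknown"
    votes: Counter = Counter()
    for chunk in split_into_chunks(text, chunk_size):
        language, confidence = detect(chunk)
        # Assumption: a chunk below the confidence threshold casts no vote.
        if confidence >= minimum_confidence:
            votes[language] += 1
    if not votes:
        return "unknown"
    # The language with the most occurrences across chunks is the primary language.
    return votes.most_common(1)[0][0]


def toy_detect(chunk: str) -> Tuple[str, int]:
    """Toy stand-in detector: all-"a" chunks are English, anything else is French."""
    return ("en", 95) if set(chunk) == {"a"} else ("fr", 95)


# A 500-character text split into 100-character chunks: 3 "en" votes, 2 "fr" votes.
print(detect_primary_language("a" * 300 + "b" * 200, toy_detect, chunk_size=100))  # prints "en"
```

With the default chunk size, a 500-character text splits into chunks of 300 and 200 characters, matching the example given for chunkSizeForLanguageDetection. A higher minimum_confidence discards more chunk votes, which is why a stricter setting returns "unknown" more often.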
Perform the following steps:
- Navigate to the C:\Program Files (x86)\Enterprise Vault\Services\vic\Engine directory and open the .vic-overrides-config.yml file with a text editor. This file is used to override the configuration settings of Veritas Information Classifier for customization.
- Ensure that the property languageDetectionEnabled under the classifier section is set to true.
- To override the default values for language detection, set the following properties under the classifier section:
  - minimumTextRequiredForLanguageDetection: Specify the minimum length of text for language detection. Any text shorter than the set value is designated as language "unknown". The default value is 80 Unicode characters.
  - chunkSizeForLanguageDetection: Specify the size, in Unicode characters, of each chunk that language detection is performed on. The default value is 300. For example, if a document is 500 Unicode characters long, Veritas Information Classifier detects the language of the first 300 characters and then of the remaining 200 characters. The language that occurs most often is designated as the primary language. If a document that is shorter than 300 Unicode characters contains multiple languages, use this property to reduce the chunk size for language detection.
  - minimumConfidenceForLanguageDetection: Specify the confidence level required to detect a language. A higher confidence level gives greater accuracy but increases the likelihood that the language is designated as "unknown". The value should be between 1 and 100. The default value is 90.
An example of the override entries:
classifier:
  minimumTextRequiredForLanguageDetection: 200
  chunkSizeForLanguageDetection: 400
  minimumConfidenceForLanguageDetection: 90
- Save the .vic-overrides-config.yml file.
- Recycle the EnterpriseVaultVIC application pool.
The changes are reflected in the .vic-merged-config.yml file in the C:\Program Files (x86)\Enterprise Vault\Services\vic\Engine directory.
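One way to picture how override entries reach the merged configuration is as a recursive dictionary merge in which override values take precedence. The following Python sketch is built on that assumption and is not documented Veritas Information Classifier behavior. deep_merge and validate_language_settings are hypothetical helpers, and the base settings are hypothetical too: they use the documented defaults plus languageDetectionEnabled set to true. Only the 1 to 100 range for the confidence level comes from this section. The positive-length check is an added assumption.

```python
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a merged dict, without modifying base, in which override values
    replace base values and nested sections (such as 'classifier') merge key by key."""
    merged = dict(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def validate_language_settings(classifier: Dict[str, Any]) -> None:
    """Hypothetical sanity checks on the language detection properties."""
    confidence = classifier.get("minimumConfidenceForLanguageDetection", 90)
    if not 1 <= confidence <= 100:
        raise ValueError("minimumConfidenceForLanguageDetection must be between 1 and 100")
    # Assumption: lengths must be positive numbers of characters.
    for name in ("minimumTextRequiredForLanguageDetection", "chunkSizeForLanguageDetection"):
        if classifier.get(name, 1) <= 0:
            raise ValueError(f"{name} must be a positive number of characters")


# Hypothetical base settings using the documented defaults.
base = {"classifier": {"languageDetectionEnabled": True,
                       "minimumTextRequiredForLanguageDetection": 80,
                       "chunkSizeForLanguageDetection": 300,
                       "minimumConfidenceForLanguageDetection": 90}}

# The override entries from the example in this section.
overrides = {"classifier": {"minimumTextRequiredForLanguageDetection": 200,
                            "chunkSizeForLanguageDetection": 400,
                            "minimumConfidenceForLanguageDetection": 90}}

merged = deep_merge(base, overrides)
validate_language_settings(merged["classifier"])
print(merged["classifier"]["chunkSizeForLanguageDetection"])  # prints 400
```

Under this model, properties that are not listed in the overrides file, such as languageDetectionEnabled here, keep their base values in the merged result.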