Enterprise Vault™ Classification using the Microsoft File Classification Infrastructure
Limits on the size of classification files
By default, the File Classification Infrastructure (FCI) can classify files of up to 25 MB. When a text file exceeds this limit, Enterprise Vault automatically splits it into a set of files of approximately 25 MB each, and FCI then classifies each file in the set. Enterprise Vault decides where to split a file as follows:
- If adding a line would cause the text file to exceed the limit, Enterprise Vault places that line in a new text file. For example, the cont property line holds the content of an item and is usually the longest line in the text file. If this line, together with the lines before it, would exceed the limit, Enterprise Vault splits the file immediately before the line and creates a new file for the cont property.
- If a single line on its own still exceeds the limit, Enterprise Vault searches backward from the limit until it finds a space character, and splits the line there. If it cannot find a space character within 300 characters of the limit, it splits the line exactly at the limit.
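The splitting rules above can be sketched roughly as follows. This is a minimal illustration, not Enterprise Vault's implementation. It assumes that the limit is measured in characters rather than bytes and that line terminators are not counted, and it uses a small limit so the output stays readable. The functions split_line and split_file, and the window parameter (standing in for the 300-character search), are hypothetical names.

```python
from typing import List

# Assumptions (not from the guide): sizes are counted in characters rather
# than bytes, and line terminators are not counted toward the limit.


def split_line(line: str, limit: int, window: int = 300) -> List[str]:
    """Split a single over-long line into pieces of at most `limit` characters."""
    pieces = []
    while len(line) > limit:
        # Search backward from the limit for a space, but look no more
        # than `window` characters before the limit.
        cut = line.rfind(" ", max(0, limit - window), limit)
        if cut > 0:
            pieces.append(line[:cut])
            line = line[cut + 1:]  # drop the space at the split point
        else:
            # No space found nearby: split exactly at the limit.
            pieces.append(line[:limit])
            line = line[limit:]
    pieces.append(line)
    return pieces


def split_file(lines: List[str], limit: int, window: int = 300) -> List[List[str]]:
    """Group lines into files whose total size stays within `limit`."""
    files: List[List[str]] = []
    current: List[str] = []
    size = 0
    for line in lines:
        if len(line) > limit:
            # The line cannot fit in any file: close the current file,
            # then split the line itself, one piece per file.
            if current:
                files.append(current)
                current, size = [], 0
            files.extend([piece] for piece in split_line(line, limit, window))
            continue
        if current and size + len(line) > limit:
            # Adding this line would exceed the limit: start a new file.
            files.append(current)
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        files.append(current)
    return files


if __name__ == "__main__":
    sample = ["subj hello", "from bob", "cont aaaa bbbb cccc dddd eeee"]
    for part in split_file(sample, limit=20, window=5):
        print(part)
    # ['subj hello', 'from bob']
    # ['cont aaaa bbbb cccc']
    # ['dddd eeee']
```

With window=5, the backward search on the cont line examines only the five characters before the limit. If no space falls in that range, the sketch splits exactly at the limit, mirroring the 300-character search that the guide describes.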
You can change the 25-MB limit by setting a registry entry, MaxTextFilterBytes. The following article on the Microsoft website describes this registry entry:
https://msdn.microsoft.com/library/ms692103.aspx
You may want to increase the limit if a complex rule fails to match items because different parts of the rule match different files in the set. For example, this issue can arise with a rule that searches for both of the words "fraud" and "corruption" when the first word is in one text file and the second word is in another.
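The following minimal sketch shows why such a rule can miss. The function rule_matches is a hypothetical stand-in for a classification rule that requires both words in the same text. It is not the FCI rule engine. Each string in parts stands in for one file in the split set.

```python
import re


def rule_matches(text: str) -> bool:
    """Hypothetical rule: true only if a single text contains both words."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {"fraud", "corruption"} <= words


item_text = "The audit found evidence of fraud. A later review uncovered corruption."

# Stand-in for the set of files produced when the item's text is split.
parts = [
    "The audit found evidence of fraud.",
    "A later review uncovered corruption.",
]

print(rule_matches(item_text))              # True: both words in one text
print(any(rule_matches(p) for p in parts))  # False: no single file has both
```

Raising the limit so that the whole item fits in one file would let the rule see both words together.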