Veritas Data Insight Classification Guide
About Smart Classification
Smart Classification is enabled by default when classification is enabled.
Data Insight gives on-demand classification requests priority over Smart Classification, which uses machine learning to choose the files to classify.
See Initiating classification.
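The ordering described above can be pictured as a two-level priority queue. The following Python sketch is illustrative only: the `ClassificationQueue` class, the `ON_DEMAND` and `SMART` priority constants, and the file paths are hypothetical and do not reflect how Data Insight schedules requests internally. It shows only that an on-demand request is always dequeued before any Smart Classification request, while requests of the same kind keep their submission order.

```python
import heapq
import itertools
from typing import List, Optional, Tuple

# Hypothetical priority levels: a lower number is dequeued first.
ON_DEMAND = 0
SMART = 1


class ClassificationQueue:
    """Illustrative queue in which on-demand requests always precede Smart Classification requests."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        # An increasing counter keeps first-in, first-out order within one priority level.
        self._counter = itertools.count()

    def submit(self, path: str, priority: int) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), path))

    def next_request(self) -> Optional[str]:
        """Return the next file path to classify, or None when the queue is empty."""
        if not self._heap:
            return None
        _, _, path = heapq.heappop(self._heap)
        return path


if __name__ == "__main__":
    queue = ClassificationQueue()
    queue.submit("finance/q3.xlsx", SMART)
    queue.submit("hr/salaries.csv", ON_DEMAND)
    queue.submit("legal/nda.docx", SMART)
    # The on-demand request comes out first, then the Smart requests in submission order.
    print([queue.next_request() for _ in range(3)])
```

The submission counter is the tie-breaker in each heap entry, so two requests with the same priority never fall back to comparing paths.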
Note:
Indexer requirements are different if you want to enable Smart Classification.
For Smart Classification, Data Insight automatically starts sending files with a high risk score for content scanning. Once these initial scans yield enough files with positive classification outcomes, Data Insight uses predictive analysis to generate a list of other files that are likely to be classified as sensitive. It then sends the files with positive predictions for content scanning at high priority.
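The two phases described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Data Insight's implementation: the function name, the `MIN_POSITIVES` threshold, and the `scan` and `train` callables are hypothetical stand-ins for content scanning and model training, and each file's risk score is assumed to be available as a number.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical number of positive scan outcomes required before predictions are used.
MIN_POSITIVES = 3


def smart_classification_round(
    risk_scores: Dict[str, float],
    scan: Callable[[str], bool],
    train: Callable[[Dict[str, bool]], Callable[[str], bool]],
    min_positives: int = MIN_POSITIVES,
) -> Tuple[Dict[str, bool], List[str]]:
    """Return the bootstrap scan results and the files to queue with high priority.

    risk_scores maps a file path to a risk score (higher means riskier).
    scan(path) stands in for content scanning and returns True for a sensitive file.
    train(results) stands in for model training and returns a predicate path -> bool.
    """
    results: Dict[str, bool] = {}
    # Phase 1: scan the highest-risk files first until enough positives are collected.
    for path in sorted(risk_scores, key=risk_scores.get, reverse=True):
        if sum(results.values()) >= min_positives:
            break
        results[path] = scan(path)

    if sum(results.values()) < min_positives:
        # Too few positive outcomes: there is nothing reliable to learn from yet.
        return results, []

    # Phase 2: train on the scan outcomes and predict on files that were not scanned.
    is_sensitive = train(results)
    high_priority = [p for p in risk_scores if p not in results and is_sensitive(p)]
    return results, high_priority
```

Passing `scan` and `train` in as parameters keeps the sketch focused on the control flow: risk-ordered scanning first, and prediction-driven prioritization only after enough positive outcomes exist.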
Data Insight derives its predictive analysis from the attributes of files that have a high information risk score or that Data Insight has already content scanned.
Data Insight uses machine learning algorithms to build a model that captures patterns, if any exist, in the distribution of sensitive files. To gather training data for the model, Data Insight picks files from shares with a high information risk score and sends them for content scanning. It favors these shares because classified files on shares that are open and accessed by a large number of active users pose a bigger threat to the organization.
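Picking training files from the riskiest shares first can be sketched as below. The function, the `min_risk` cut-off, and the `batch_size` limit are hypothetical; the share risk scores are taken as given numbers, because this sketch does not attempt to reproduce how Data Insight computes the information risk score.

```python
from typing import Dict, List


def pick_training_files(
    share_risk: Dict[str, float],
    share_files: Dict[str, List[str]],
    batch_size: int,
    min_risk: float,
) -> List[str]:
    """Pick up to batch_size files for content scanning, riskiest shares first."""
    picked: List[str] = []
    # Keep only shares at or above the (hypothetical) risk cut-off, riskiest first.
    ranked = sorted(
        (share for share, risk in share_risk.items() if risk >= min_risk),
        key=lambda share: share_risk[share],
        reverse=True,
    )
    for share in ranked:
        for path in share_files.get(share, []):
            if len(picked) == batch_size:
                return picked
            picked.append(path)
    return picked
```

For example, with share risks of 0.9 for `hr`, 0.7 for `fin`, and 0.4 for `eng`, a cut-off of 0.5 and a batch of three files, the sketch takes both files from `hr` and then the first file from `fin`, and never reaches `eng`.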
Once the model is trained, Data Insight uses it to make predictions about files that Veritas Information Classifier has not yet scanned. It then automatically assigns a higher priority to files on shares with a high information risk score that are predicted to be sensitive, and sends those files for content scanning.
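The guide does not name the machine learning algorithm, so the sketch below swaps in a deliberately simple stand-in: for each value of a single file attribute (for example, the owner's department), it records the fraction of scanned files that were sensitive, and it predicts "sensitive" when that fraction reaches a threshold. The function names, the `threshold` of 0.5, and the `min_share_risk` of 0.7 are all hypothetical. The sketch shows only the shape of the step: train on scan outcomes, predict on unscanned files, and raise priority only for predicted-sensitive files on high-risk shares.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def train_rate_model(scanned: List[Tuple[str, bool]]) -> Dict[str, float]:
    """Return, per attribute value, the fraction of scanned files that were sensitive."""
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0])  # value -> [positives, total]
    for value, sensitive in scanned:
        counts[value][0] += int(sensitive)
        counts[value][1] += 1
    return {value: pos / total for value, (pos, total) in counts.items()}


def prioritize(
    unscanned: List[Tuple[str, str, float]],
    rates: Dict[str, float],
    threshold: float = 0.5,
    min_share_risk: float = 0.7,
) -> List[Tuple[str, str]]:
    """Label each unscanned (path, attribute value, share risk) entry 'high' or 'normal'."""
    plan = []
    for path, value, share_risk in unscanned:
        # Attribute values never seen during training are treated as not sensitive.
        predicted_sensitive = rates.get(value, 0.0) >= threshold
        high = predicted_sensitive and share_risk >= min_share_risk
        plan.append((path, "high" if high else "normal"))
    return plan
```

A real model would weigh several attributes at once; the per-value rate is used here only because its behavior is easy to follow by hand.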
The following factors affect the accuracy of the predictions:
- The configured primary analytics attribute. The more distinctive the attribute values are, the better the prediction accuracy. For example, a primary analytics attribute such as Department or designation may yield better accuracy than Email.
- Whether the content sources are configured to receive audit events. If they are not, the recorded owner of a file may not be its true owner, which reduces accuracy. For example, if the Administrators group is the owner of most files, the predictive algorithm does not have enough data to analyze.
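The ownership problem in the Administrators example can be checked roughly before relying on predictions. The sketch below is a hypothetical diagnostic, not a Data Insight feature: it reports the most common value of an attribute and the fraction of files that share it, and treats the attribute as uninformative when one value covers more than an assumed `DOMINANCE_LIMIT` of 0.8. It does not measure prediction accuracy itself.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical cut-off: above this fraction, one value dominates and carries little signal.
DOMINANCE_LIMIT = 0.8


def most_common_value(values: List[str]) -> Optional[Tuple[str, float]]:
    """Return the most common attribute value and the fraction of files that have it."""
    if not values:
        return None
    value, count = Counter(values).most_common(1)[0]
    return value, count / len(values)


def is_informative(values: List[str], limit: float = DOMINANCE_LIMIT) -> bool:
    """Return False when a single value, such as an Administrators owner, covers most files."""
    top = most_common_value(values)
    return top is not None and top[1] <= limit
```

For instance, if nine of ten files list `BUILTIN\Administrators` as the owner, the dominant value covers 90 percent of the files and the owner attribute is flagged as uninformative, whereas a Department attribute spread across several values passes the check.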
Note:
Classification requests submitted by the Smart Classification method cannot be viewed or canceled.