Veritas Data Insight Administrator's Guide

Product(s): Data Insight (7.0)
Platform: Windows

Understanding Data Insight best practices

To optimize the productivity and efficiency of Data Insight, follow the guidelines below:

Sizing & Deployment Best Practices

Do not use the system disk for the data directory. Use a separate disk instead.

Product Configuration Best Practices

Set up event notifications to ensure that errors and warnings are reported. Create a separate email distribution list for these notifications that includes storage administrators, product administrators, and other stakeholders.

Collector Best Practices (Audit and Scans)

Do not schedule scans during peak hours, because scanning can degrade the user experience. Schedule scans during off-peak hours to minimize the impact on users.

Exclusions

Audit Exclusions - Service Account exclusion

  • Exclude service accounts and application accounts from auditing.

  • If a volume hosts a third-party application that generates a large number of events, exclude that volume from auditing.

Scan Exclusions

  • Exclude folders and files such as snapshot~ and other temporary files from scanning. This reduces the amount of data scanned and improves overall performance.

Indexer

  • Use high-performance disks, such as SSDs, for Indexers.

  • General guidelines for calculating index memory:

Dashboard and Reports

  • You can speed up the computation of all reports, including the dashboard, by increasing the number of threads. Decide how far to increase the thread count based on the available resources, such as CPU usage on the Indexer and the Management Server.

  • To view the CPU usage and overall performance of Data Insight servers, navigate to either of the following:

    • Settings >> Health and Monitoring >> Performance

    • Settings >> Inventory >> Data Insight Servers >> Select Node >> Statistics >> Performance

Maintenance

Retention Policy

You can define retention policies to ensure that the database and log files are maintained over time. Retention policies affect sizing guidelines and disk space requirements, especially for product logs that are retained for future troubleshooting.

Upgrade

For any Windows or third-party upgrade:

  • Before the upgrade, ensure that:

    • All Data Insight services are gracefully stopped.

    • No classification requests are running.

    • No reports or Index-Writer jobs are running.

  • After the upgrade, ensure that:

    • The event logs show no errors.

    • All services are up and running.

If possible, perform the upgrade during a maintenance window, when users are less active, and then check the event log for errors.

Anti-virus Exclusion

If you use anti-virus software, ensure that the anti-virus scanner excludes the Data Insight installation folder, the data folder, and the operating system Temp folder on the Management Server and the Indexers.

Security

  • Create containers to logically group related objects together for administration and reporting purposes.

  • Use the latest available version of Data Insight to ensure that the most recent security and defect fixes are applied.

Classification

General

  • Use the recommended system configurations for better throughput.

  • Use a classification server pool of multiple nodes to achieve higher throughput for large classification tasks.

  • Disable smart classification if it is not required. In Data Insight 6.3, this option is disabled by default.

    • Smart classification requires significant resources on the Indexer and Management Server nodes to automatically generate the list of files to classify.

  • Increase the default disk safeguard thresholds, especially when classifying PDF files. Uncompressed files can consume up to 40 GB of disk space (assuming 16 threads and file sizes of around 2.5 GB). The following values safeguard against disk usage reaching the maximum limit:

    • Reset at 50 GB (or higher)

    • Stop at 45 GB (or higher)

  • As part of classification, Data Insight performs text extraction and stores temporary files in the data directory.
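The 40 GB figure quoted above is the number of classification threads multiplied by the file size (16 × 2.5 GB). The following minimal Python sketch reproduces that arithmetic. The helper names are hypothetical, and the way it interprets the Reset and Stop thresholds (as free-space levels on the volume that holds the data directory) is an assumption, not documented Data Insight behavior.

```python
# Sketch only: these helpers are illustrative and are not part of Data Insight.

def worst_case_scratch_gb(threads: int, largest_uncompressed_gb: float) -> float:
    """Upper-bound estimate: every thread is extracting one large file at once."""
    return threads * largest_uncompressed_gb


def thresholds_cover(threads: int, largest_uncompressed_gb: float,
                     stop_gb: float, reset_gb: float) -> bool:
    """ASSUMPTION: stop_gb and reset_gb are free-space thresholds on the volume
    that holds the data directory (processing stops when free space falls to
    stop_gb and resumes once it recovers to reset_gb). Under that reading, the
    stop threshold should cover the worst-case scratch space, and the reset
    threshold should sit above the stop threshold."""
    needed = worst_case_scratch_gb(threads, largest_uncompressed_gb)
    return stop_gb >= needed and reset_gb > stop_gb


if __name__ == "__main__":
    # Figures from the text: 16 threads, PDFs of around 2.5 GB each.
    print(worst_case_scratch_gb(16, 2.5))                      # 40.0
    print(thresholds_cover(16, 2.5, stop_gb=45, reset_gb=50))  # True
```

If you run more threads or classify larger files, the same multiplication gives a first estimate of how far to raise the thresholds.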

Maximum file size supported

  • The default maximum file size for classification is 50 MB. You can change this limit on the Classification Configuration settings page.

  • Text extraction during classification is bounded by the uncompressed size of a file, and this uncompressed size determines whether a file can be classified successfully. The Office Open XML formats introduced in Microsoft Office 2007 (.docx, .pptx, and so on) compress their content.

    • Most Office documents therefore compress by 20% to 70%, depending on the mix of text and images; pure text compresses by around 80%.

    • Files that contain many images compress less, because image formats such as JPEG and PNG are already compressed.

    • PDF files are not compressed unless the 'Optimize PDF' option in Adobe Acrobat, or a similar option in another PDF authoring application, has been used.

  • It has been observed that 16 DOCX files of 400 MB uncompressed size each can be classified concurrently without memory exhaustion.

    • This means that 16 concurrent requests for DOCX files with a logical (on-disk) size of 100 MB to 250 MB would probably succeed, given the average compression ratio.

    • The compression ratio is impossible to predict unless you analyze each file or have some indication of the type of content in the corpus.

    • These figures refer not to volume-level or disk-level compression but to the compression that Microsoft Office applies to the content. A .docx file is simply a ZIP container that can be opened in a tool such as 7-Zip to assess its uncompressed size.
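Because a .docx file is a ZIP container, you can also measure its uncompressed size and compression ratio with a script instead of 7-Zip. The following is a minimal Python sketch that uses only the standard zipfile module. The function names are hypothetical, and expressing compression as a "space saving" fraction (0.6 meaning 60% smaller) is an assumption about how the percentages above are meant.

```python
import io
import zipfile


def office_sizes(path_or_file) -> tuple[int, int]:
    """Return (compressed, uncompressed) byte totals over all members of an
    Office Open XML file (.docx, .pptx, .xlsx), which is a ZIP container.
    The compressed total excludes ZIP headers, so it is slightly smaller than
    the file's size on disk."""
    with zipfile.ZipFile(path_or_file) as zf:
        members = zf.infolist()
        compressed = sum(info.compress_size for info in members)
        uncompressed = sum(info.file_size for info in members)
    return compressed, uncompressed


def space_saving(compressed: int, uncompressed: int) -> float:
    """Fraction of space saved by compression; 0.6 means 60% smaller."""
    return 1 - compressed / uncompressed if uncompressed else 0.0


def estimate_uncompressed_mb(logical_mb: float, saving: float) -> float:
    """Rough inverse of space_saving: estimate the uncompressed size from the
    logical (on-disk) size and an assumed space saving between 0 and 1."""
    return logical_mb / (1 - saving)


def sample_container(text: bytes) -> io.BytesIO:
    """Hypothetical helper: build a tiny in-memory ZIP that mimics a .docx."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("word/document.xml", text)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    # Repetitive text compresses very well; real documents vary widely.
    c, u = office_sizes(sample_container(b"<w:t>quarterly report</w:t>" * 5000))
    print(c, u, round(space_saving(c, u), 2))
    # A 100 MB file whose content shrank by 60% is about 250 MB uncompressed.
    print(round(estimate_uncompressed_mb(100, 0.6)))  # 250
```

To inspect a real document, pass its path instead, for example office_sizes("report.docx") (the file name is a placeholder).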

The table below shows the file types and sizes tested with the recommended Classification Server specification:

  • Recommended maximum file sizes for classification without OCR enabled:

    Document type        | Extensions                             | Maximum compressed file size tested | Maximum uncompressed file size tested
    ---------------------+----------------------------------------+-------------------------------------+--------------------------------------
    Microsoft Word       | doc, docx, docm, dotm, dotx            | 200 MB                              | 450 MB
    Microsoft PowerPoint | ppt, pptx, pps, potm, potx, ppsm, ppsx | 200 MB                              | 450 MB
    Office Tabular       | xls, xlsx, xlt, xltx, xlsb, xlam       | 50 MB                               | 100 MB
    Adobe PDF            | pdf                                    | 1 GB                                | 1 GB (see note)

    Note: Compressed PDFs have not yet been tested. Because PDFs are typically uncompressed, the maximum uncompressed size mirrors the compressed size of 1 GB.

  • Server specification used (the recommended Data Insight Classification Server specification):

    • 16 cores, 32 GB RAM

    • 16 classification threads running in parallel

  • Using Optical Character Recognition (OCR)

    • OCR usually increases memory consumption, which in turn reduces classification performance.

Larger file support

  • Files larger than the tested sizes might still be classified successfully, depending on the size of the other files being classified at the same time. For example, a 300 MB DOCX file that is 1 GB uncompressed could still be classified successfully if the other 15 files being processed in parallel are relatively small, because the total memory used by the classification process would stay within limits.

  • Because there is no way to guarantee that a mix of small and large files is classified at the same time, it is recommended that DQL reports used to select files for classification are not ordered or segregated by file size. This ensures that files are submitted to Veritas Information Classifier (VIC) as randomly as possible.

    • For example, do not classify all small DOCX files first and leave the largest ones until later. Classifying the very largest files together in one classification job increases the risk that the total uncompressed size of 16 large files leads to VIC memory exhaustion. Submitting a mix of file sizes together provides the best chance of large files, and files with a large uncompressed size, being classified successfully.

    • If you use DQL to generate a report of files to classify, do not order the report output by size. Doing so would lead VIC to process the largest files together, whether they are sorted to appear at the start or the end of the report.
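The reasoning above can be illustrated with a short Python sketch. It treats the observed case of 16 concurrent 400 MB (uncompressed) DOCX files as a rough aggregate memory budget, which is a simplifying assumption because actual VIC memory use is not exactly the sum of uncompressed sizes. It also shuffles an exported file list instead of sorting it by size. All names are hypothetical; the sketch does not call any Data Insight, DQL, or VIC interface.

```python
import random

# ASSUMPTION: the observation that 16 concurrent DOCX files of 400 MB
# uncompressed each were classified without memory exhaustion is used here
# as a rough aggregate budget. It is a heuristic, not a Data Insight setting.
THREADS = 16
OBSERVED_OK_TOTAL_MB = THREADS * 400  # 6400 MB


def total_uncompressed_mb(batch):
    """batch: list of (path, uncompressed_mb) tuples classified concurrently."""
    return sum(size for _, size in batch)


def fits_observed_budget(batch):
    """True if the batch stays within the observed 16 x 400 MB total."""
    return total_uncompressed_mb(batch) <= OBSERVED_OK_TOTAL_MB


def submission_order(files, seed=None):
    """Return a shuffled copy of the file list instead of sorting it by size,
    so the largest files are unlikely to be submitted together."""
    shuffled = list(files)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def concurrent_groups(files, size=THREADS):
    """Split an ordered file list into groups of `size` concurrent requests."""
    return [files[i:i + size] for i in range(0, len(files), size)]


if __name__ == "__main__":
    # The text's example: one file that is 1 GB uncompressed plus 15 small files.
    mixed = [("large.docx", 1024)] + [(f"small{i}.docx", 50) for i in range(15)]
    print(total_uncompressed_mb(mixed), fits_observed_budget(mixed))  # 1774 True

    # Sorting by size can place 16 such large files in one group.
    largest_together = [(f"large{i}.docx", 1024) for i in range(16)]
    print(fits_observed_budget(largest_together))  # False
```

Shuffling does not guarantee a good mix in every group; it only makes it unlikely that the largest files land together, which matches the "as randomly as possible" advice.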

Recommendations for creating classification jobs

  • Use DQL reports to filter files based on the recommendations above, and then trigger classification requests accordingly.

  • Enable only the required policies in the VIC configuration.

    • Throughput tends to decrease as the number and complexity of enabled policies increase (for example, policies that use complex regular expressions or hundreds of keywords).

  • Disable OCR if it is not required.

  • Configure the content fetch pause window to reduce the potential impact on the source devices.

    • The content fetch job copies files from the source devices to classify them.

    • By default, the job is paused from 7 AM to 7 PM, which matches normal working hours.

    • Assess the load on the devices during content fetch; many customers have found that the load does not disrupt normal activities. If the job can run 24 hours a day, the classification process has a constant feed of files to classify, which increases throughput.
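As a hypothetical post-processing step for a file list exported from a report, the following Python sketch checks files against the Office rows of the tested-limits table (the PDF row is left out) and models the default 7am to 7pm content fetch pause window. It does not show DQL syntax, and the helper names and the half-open hour interpretation of the pause window are assumptions. Keep in mind that files above the default 50 MB maximum file size are only classified if that limit is raised on the Classification Configuration settings page.

```python
from pathlib import PurePath

# Office rows of the tested-limits table: extension -> (max compressed MB,
# max uncompressed MB). The PDF row is deliberately not included here.
TESTED_LIMITS_MB = {
    **dict.fromkeys(["doc", "docx", "docm", "dotm", "dotx"], (200, 450)),
    **dict.fromkeys(["ppt", "pptx", "pps", "potm", "potx", "ppsm", "ppsx"], (200, 450)),
    **dict.fromkeys(["xls", "xlsx", "xlt", "xltx", "xlsb", "xlam"], (50, 100)),
}


def within_tested_limits(path, compressed_mb, uncompressed_mb=None):
    """Return True or False for file types covered by the table, and None for
    types it does not cover. uncompressed_mb is optional because it is often
    unknown when the file list is built."""
    ext = PurePath(path).suffix.lower().lstrip(".")
    limits = TESTED_LIMITS_MB.get(ext)
    if limits is None:
        return None
    max_compressed, max_uncompressed = limits
    if compressed_mb > max_compressed:
        return False
    return uncompressed_mb is None or uncompressed_mb <= max_uncompressed


def fetch_paused(hour, pause_start=7, pause_end=19):
    """ASSUMPTION: the pause window is the half-open hour range
    [pause_start, pause_end) in local time; windows may wrap past midnight."""
    if pause_start <= pause_end:
        return pause_start <= hour < pause_end
    return hour >= pause_start or hour < pause_end


if __name__ == "__main__":
    print(within_tested_limits("budget.xlsx", 40, 120))  # False: over 100 MB uncompressed
    print(within_tested_limits("scan.tiff", 5))          # None: not in the table
    fetch_hours = sum(not fetch_paused(h) for h in range(24))
    print(fetch_hours)  # 12 with the default 7am-7pm pause window
```

With the default window, content fetch runs for only 12 of every 24 hours, which is why letting it run around the clock can keep the classifier supplied with files.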