Article: 100017458
Last Published: 2020-06-04
Product(s): Enterprise Vault
Problem
When using Enterprise Vault (EV), it is not possible to search for some messages or files.
Solution
If the same files are viewed through Archive Explorer or Search Applications, then the HTML presentation of the data appears as unrecognizable text. Restoring the item and viewing it in its original format works correctly.
In some instances, Exchange messages or documents do not specify a character set (codepage) that defines the data they contain. The data then falls back to the system codepage for any interpretation or presentation. On English systems, the standard default codepage is 1252. When this happens, EV uses a standard Microsoft Internet Explorer method to determine which codepage to use to interpret the data for indexing and for viewing items in HTML format. This method is not always accurate and can produce an incorrect codepage assessment. The assessment is based on two figures per candidate language: the 'Confidence Level' and the 'Percent of document in that codepage'.
For example, a Hebrew message could be detected and represented as follows:
Language | Percent of document in that codepage | Confidence Level |
---|---|---|
Hebrew | 62 | 87 |
Vietnamese | 37 | 107 |
Turkish | 37 | 102 |
By default, the codepage with the highest confidence level is used, so in this example the Vietnamese codepage is used to interpret Hebrew text.
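The effect of such a misdetection can be reproduced outside EV. The following is a minimal sketch, not EV's own conversion code: the sample word is an assumption, and Python's `cp1255` (Hebrew) and `cp1258` (Vietnamese) codecs stand in for the Windows codepages named above.

```python
# Illustrative only: the sample word and the codecs are assumptions chosen
# to mirror the Hebrew/Vietnamese example above.
original = "שלום"                  # Hebrew sample text
raw = original.encode("cp1255")    # the bytes as stored, with no charset label

# Decoding with the correct Hebrew codepage recovers the text.
as_hebrew = raw.decode("cp1255")

# Decoding the same bytes as Vietnamese (1258) yields unrelated characters.
as_vietnamese = raw.decode("cp1258", errors="replace")

print(as_hebrew == original)      # True
print(as_vietnamese == original)  # False
```

The bytes are identical in both cases. Only the codepage used to interpret them differs, which is why the restored original item still displays correctly.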
To refine this determination and avoid these issues, several registry values under a new registry key can be used to adjust how EV decides which codepage to use when indexing items and when presenting them as HTML. These are outlined below.
The new key must be created at the location specified below, with the following registry values:
BaseLocation:
HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\KVS\Enterprise Vault\Storage\CodepageDetection
DWORD - HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\KVS\Enterprise Vault\Storage\CodepageDetection\FallbackCodepage
This is the codepage that should be used if the method is unable to make a good assessment of which codepage to use. If this regkey is not set, it defaults to 1252 - Western European, or the system default.
Note: If a valid codepage is not detected in the content provided for conversion, the default codepage (1252) is used.
This default may be customized with the FallbackCodepage registry value (e.g. 65001 = UTF-8).
For a reference of code page identifiers, see Microsoft's Code Page Identifiers documentation.
DWORD - HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\KVS\Enterprise Vault\Storage\CodepageDetection\DecisionType
This value controls how the closest codepage to the text is chosen, and there are three options. With 0, which is the default when the value is not set, EV chooses the codepage to which the detection method gives the highest confidence value. Setting this value to 1 makes EV use the codepage in which the highest percent of the document is written. Setting it to 2 makes EV use both figures and choose the codepage with the highest (document percent) x (confidence) product.
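The three options can be compared against the Hebrew example from the table above. This is a minimal sketch of the selection rule as described, not EV's implementation. `Detection` and `choose` are hypothetical names, and how EV breaks ties is not documented here (this sketch keeps the first candidate).

```python
from typing import NamedTuple


class Detection(NamedTuple):
    codepage: int
    confidence: int
    percent: int


def choose(detections, decision_type=0):
    """Pick a codepage following the DecisionType description (hypothetical helper)."""
    if decision_type == 0:      # highest confidence
        key = lambda d: d.confidence
    elif decision_type == 1:    # highest document percent
        key = lambda d: d.percent
    elif decision_type == 2:    # highest percent x confidence
        key = lambda d: d.percent * d.confidence
    else:
        raise ValueError(f"unsupported DecisionType: {decision_type}")
    return max(detections, key=key).codepage


# Figures from the Hebrew example: 1255 Hebrew, 1258 Vietnamese, 1254 Turkish.
detections = [
    Detection(1255, confidence=87, percent=62),
    Detection(1258, confidence=107, percent=37),
    Detection(1254, confidence=102, percent=37),
]

for decision_type in (0, 1, 2):
    print(decision_type, choose(detections, decision_type))
# 0 1258
# 1 1255
# 2 1255
```

For this sample, either non-default setting selects the Hebrew codepage: 62 x 87 = 5394 exceeds 37 x 107 = 3959 and 37 x 102 = 3774.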
REG_SZ - HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\KVS\Enterprise Vault\Storage\CodepageDetection\PriorityCodepages
A comma-separated list of up to 20 codepages that should be given priority. The earlier a codepage appears in the list, the higher its priority. If one of the detected candidate codepages is in this list, that codepage is used regardless of confidence level or document percent. If this value is not set, there are no priority codepages.
DWORD - HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\KVS\Enterprise Vault\Storage\CodepageDetection\MinimumDocumentPercent
A codepage will not be used if its document percent is below this value. If this value is not set, it defaults to 10 percent.
DWORD - HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\KVS\Enterprise Vault\Storage\CodepageDetection\MinimumConfidenceLevel
A codepage will not be used if it is below this confidence level. If this value is not set, it defaults to 30.
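How the priority list, the two minimums and the fallback might interact can be sketched as follows. This is an illustration under stated assumptions, not EV's code: `parse_priority` and `select_codepage` are hypothetical names, the order of the checks (priority list first, then thresholds, then fallback) is an assumption based on the wording above, and the final choice uses the default highest-confidence rule.

```python
def parse_priority(value):
    """Parse a PriorityCodepages-style string such as "1255,1252" (at most 20 entries)."""
    return [int(p) for p in value.split(",") if p.strip()][:20]


def select_codepage(detections, priority="", min_percent=10,
                    min_confidence=30, fallback=1252):
    """detections: list of (codepage, confidence, percent) tuples."""
    detected = {cp for cp, _, _ in detections}
    # Assumed order: a priority codepage wins outright if it was detected.
    for cp in parse_priority(priority):
        if cp in detected:
            return cp
    # Drop candidates that fall below either minimum.
    eligible = [d for d in detections
                if d[1] >= min_confidence and d[2] >= min_percent]
    if not eligible:
        return fallback
    # Default rule (DecisionType 0): highest confidence wins.
    return max(eligible, key=lambda d: d[1])[0]


sample = [(1255, 87, 62), (1258, 107, 37), (1254, 102, 37)]
print(select_codepage(sample))                        # 1258
print(select_codepage(sample, priority="1255,1252"))  # 1255
print(select_codepage(sample, min_percent=50))        # 1255
print(select_codepage(sample, min_confidence=200))    # 1252
```

With the Hebrew sample, either listing 1255 as a priority codepage or raising the minimum document percent to 50 selects the Hebrew codepage without changing DecisionType.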
DWORD - HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\KVS\Enterprise Vault\Storage\CodepageDetection\LogConversions
Set this to 1 to write a record to the 'Enterprise Vault Converters' event log for each message where a codepage decision has to be made.
After setting or modifying any of these registry values, it is necessary to restart the EV Storage Service.
With the 'LogConversions' value set, every document retrieved from a Saveset in EV will be logged. Leaving this setting enabled for long periods is not recommended. It should only be used while adjusting the above registry values.
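The registry values described above could be prepared as a .reg file for review before import. The sketch below only builds the file text and does not touch the registry. `render_reg` is a hypothetical helper and the chosen values are examples, not recommendations. It assumes the standard .reg syntax, in which DWORD values are written as eight hexadecimal digits.

```python
BASE_KEY = (r"HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\KVS"
            r"\Enterprise Vault\Storage\CodepageDetection")


def render_reg(values):
    """Render a dict of names to int (DWORD) or str (REG_SZ) as .reg file text."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{BASE_KEY}]"]
    for name, value in values.items():
        if isinstance(value, int):
            lines.append(f'"{name}"=dword:{value:08x}')
        else:
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'"{name}"="{escaped}"')
    return "\n".join(lines) + "\n"


# Example values only; adjust to the environment before use.
reg_text = render_reg({
    "FallbackCodepage": 1252,
    "DecisionType": 2,
    "PriorityCodepages": "1255,1252",
    "MinimumDocumentPercent": 10,
    "MinimumConfidenceLevel": 30,
    "LogConversions": 1,
})
print(reg_text)
```

The registry warning in this article applies to importing any such file, and the EV Storage Service still needs to be restarted afterwards.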
Example of the event that would be returned with the 'DecisionType = 1' registry value:
Event Type: Information
Event Source: Enterprise Vault Converters
Event Category: Storage Online
Event ID: 7301
User: N/A
Computer: EV
Description:
Convert to unicode information.
Decision Type: percent
Min Confidence Level: 30
Min Document Percent: 10
Fallback Codepage: 1252
Codepage Detect 0: Codepage-1255, conf-87, percent-62
Codepage Detect 1: Codepage-1258, conf-107, percent-37
Codepage Detect 2: Codepage-1254, conf-102, percent-37
Using Codepage With Highest Percent: 1255
Document Data: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META NAME="Generator" CONTENT="MS Exchange Server version 6.5.7226.0">
<TITLE>My Message</TITLE>
</HEAD>
<BODY>
<!-- Converted from text/rtf format -->
<P DIR=RTL ALIGN=CENTER><SPAN LANG="en-us"></SPAN><SPAN LANG
V-437-7301
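When tuning the values, it can help to pull the candidate codepages out of logged events like the one above. The following is a rough sketch that assumes the description text has already been exported from the event log and that the 'Codepage Detect' lines always follow the format shown in the example. `parse_detections` is a hypothetical helper.

```python
import re

# Matches lines such as: Codepage Detect 0: Codepage-1255, conf-87, percent-62
DETECT_RE = re.compile(
    r"Codepage Detect (\d+): Codepage-(\d+), conf-(\d+), percent-(\d+)"
)


def parse_detections(description):
    """Return one dict per 'Codepage Detect' line found in an event description."""
    return [
        {"index": int(i), "codepage": int(cp),
         "confidence": int(conf), "percent": int(pct)}
        for i, cp, conf, pct in DETECT_RE.findall(description)
    ]


sample_event = """\
Fallback Codepage: 1252
Codepage Detect 0: Codepage-1255, conf-87, percent-62
Codepage Detect 1: Codepage-1258, conf-107, percent-37
Codepage Detect 2: Codepage-1254, conf-102, percent-37
Using Codepage With Highest Percent: 1255
"""

for row in parse_detections(sample_event):
    print(row["codepage"], row["confidence"], row["percent"])
# 1255 87 62
# 1258 107 37
# 1254 102 37
```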
Warning:
Incorrect use of the Windows registry editor may prevent the operating system from functioning properly. Great care should be taken when making changes to a Windows registry. Registry modifications should only be carried out by persons experienced in the use of the registry editor application. It is also recommended that a complete backup of the registry and the workstation or server be made before making any registry changes.
Note: This functionality was introduced in EV 6.0 SP2 and applies to EV 6.0 SP2 and later versions. Configuring these registry values ensures that newly archived items matching these scenarios are viewed correctly. For items archived previously, the archives to which those items belong must be re-indexed before the items can be searched correctly.