When performing a simple search for items or browsing the catalog in Web Channel, some wildcard and special characters are not recognized, so the expected search results are not returned.
For example, searching for a string such as 7" plate displays a "Fatal Error" message.
The out-of-the-box analyzer used for Catalog Search is the Lucene StandardAnalyzer.
The Standard analyzer removes stopwords and indexes words, numbers, and some special characters.
The Standard analyzer processes text characters in the following ways:
- Does not index stopwords.
- Converts alphabetical characters to lower case.
- Ignores colons, #, %, $, parentheses, and slashes.
- Indexes underscores, hyphens, @, and & symbols when they are part of words or numbers.
- Indexes numbers and words separately if the number appears at the beginning of a word.
- Indexes numbers as part of the word if they are within or at the end of the word.
- Indexes apostrophes if they are in the middle of a word, but removes them if they are at the beginning or end of a word.
- Ignores an apostrophe followed by the letter s at the end of a word.
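The effect of these rules on a query such as 7" plate can be illustrated with a short plain-Java sketch. It does not use Lucene and is not Lucene's tokenizer: the class and method names are hypothetical, and simplifiedStandardTokens only approximates the behavior by keeping runs of letters and digits, ignoring the finer rules above (stopwords, apostrophes, hyphens, and so on).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Plain-Java illustration only; this is NOT Lucene's tokenizer implementation. */
class TokenizationContrast {

    /** Splits on whitespace only, so punctuation such as the quote in 7" stays in the token. */
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    /** Rough approximation (assumption): lower-cases and keeps only runs of letters and digits. */
    static List<String> simplifiedStandardTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String query = "7\" Plate";
        System.out.println(whitespaceTokens(query));         // [7", Plate]
        System.out.println(simplifiedStandardTokens(query)); // [7, plate]
    }
}
```

In the second result the double quote is gone, so a search term that depends on it has nothing to match against.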
Resolving the problem
To handle special characters (XML special characters and Lucene special characters), use WhitespaceAnalyzer instead of StandardAnalyzer for Lucene. StandardAnalyzer discards most special characters during analysis, so it cannot match them in a search, whereas WhitespaceAnalyzer splits text only on whitespace and keeps them in the indexed terms.
The changes to be made are listed below (Reference: Extending_the_Database.pdf, Section 3.2 Extending a Catalog Search):
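Separately from the choice of analyzer, a character such as " is also syntax in the Lucene query language, and prefixing it with a backslash keeps a query parser from interpreting it. The following is a minimal stand-alone sketch of that idea in plain Java. The class name is hypothetical, the character set is an assumption based on the Lucene 3.x query parser syntax and should be checked against the Lucene version in your installation, and whether Web Channel escapes user input this way is not stated here.

```java
/** Stand-alone sketch: backslash-escapes characters that are query syntax in Lucene. */
class QueryEscaper {

    // Assumed set of single characters treated as query syntax (Lucene 3.x style).
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    static String escape(String input) {
        StringBuilder sb = new StringBuilder(input.length() + 8);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\'); // prefix the special character with a backslash
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("7\" plate")); // 7\" plate
    }
}
```

Escaping only affects how the query text is parsed. Whether the character can be found still depends on the analyzer keeping it in the index.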
1. Copy the <INSTALL_DIR>/repository/xapi/template/merged/resource/extn/ExtnCatalogSearchConfigProperties.xml.sample file as
2. Make the following entry in ExtnCatalogSearchConfigProperties.xml under the Locales section:
   <SearchConfigurations>
     <IndexSets>
       <IndexSet Name="CatalogIndex">
         <Locales>
           <Locale LocaleCode="en_US" AnalyzerClass="org.apache.lucene.analysis.WhitespaceAnalyzer"/>
         </Locales>
       </IndexSet>
     </IndexSets>
   </SearchConfigurations>
3. Build the search index.
4. Check that the search functionality works as desired for different search queries, such as 7, 7", and other search strings.
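Before building the index, it can help to confirm that the edited file is well-formed XML and that each Locale entry names the intended analyzer class. The sketch below does this with the JDK's built-in XML parser. The class and method names are hypothetical, the element and attribute names are the ones shown in step 2, and it is not a product utility. It also reports values with stray leading or trailing spaces, since it is an assumption whether the product tolerates them.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: reads LocaleCode -> AnalyzerClass from the search configuration XML. */
class AnalyzerConfigCheck {

    /** Returns the raw AnalyzerClass value per locale; fails if the XML is not well-formed. */
    static Map<String, String> analyzersByLocale(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> result = new LinkedHashMap<>();
            NodeList locales = doc.getElementsByTagName("Locale");
            for (int i = 0; i < locales.getLength(); i++) {
                Element locale = (Element) locales.item(i);
                result.put(locale.getAttribute("LocaleCode"), locale.getAttribute("AnalyzerClass"));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Configuration is not well-formed XML", e);
        }
    }

    /** True if the configured class name carries leading or trailing whitespace. */
    static boolean hasStrayWhitespace(String analyzerClass) {
        return !analyzerClass.equals(analyzerClass.trim());
    }

    public static void main(String[] args) {
        String xml = "<SearchConfigurations><IndexSets><IndexSet Name=\"CatalogIndex\"><Locales>"
                + "<Locale LocaleCode=\"en_US\" AnalyzerClass=\"org.apache.lucene.analysis.WhitespaceAnalyzer\"/>"
                + "</Locales></IndexSet></IndexSets></SearchConfigurations>";
        for (Map.Entry<String, String> e : analyzersByLocale(xml).entrySet()) {
            String note = hasStrayWhitespace(e.getValue()) ? " (check stray whitespace)" : "";
            System.out.println(e.getKey() + " -> " + e.getValue() + note);
        }
    }
}
```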
Other analyzers that could be considered include SimpleAnalyzer, SnowballAnalyzer, and StopAnalyzer, in addition to StandardAnalyzer and WhitespaceAnalyzer.
[Ref: http://lucene.apache.org/core/old_versioned_docs/versions/3_0_1/api/all/org/apache/lucene/analysis/Analyzer.html ]
Choosing the right analyzer is a crucial development decision with Lucene, and a single analyzer may not fit all solutions. Language is one factor, because each language has its own unique features. Another factor to consider is the domain of the text being analyzed; different industries have different terminology, acronyms, and abbreviations that may deserve attention. When none of the built-in analysis options is adequate, businesses have to consider a custom analyzer solution.
Also, many industries consider a custom analyzer chain that combines the behavior of two or more analyzers. Lucene provides developers with the building blocks to achieve this.
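The idea of a chain can be sketched without Lucene as a tokenizer followed by a sequence of token filters. The example below is a plain-Java analogy with hypothetical names and an arbitrary stopword list. In Lucene itself the corresponding building blocks are a Tokenizer and TokenFilter classes such as WhitespaceTokenizer, LowerCaseFilter, and StopFilter, whose constructors differ between Lucene versions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Plain-Java analogy of an analyzer chain: tokenizer first, then filters applied in order. */
class AnalyzerChainSketch {

    /** Tokenizer stage: split on whitespace so special characters stay in the tokens. */
    static List<String> tokenizeOnWhitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    /** Filter stage: lower-case every token. */
    static final Function<List<String>, List<String>> LOWER_CASE =
            tokens -> tokens.stream()
                    .map(t -> t.toLowerCase(Locale.ROOT))
                    .collect(Collectors.toList());

    /** Filter stage: drop tokens that appear in the given stopword set. */
    static Function<List<String>, List<String>> removeStopwords(Set<String> stopwords) {
        return tokens -> tokens.stream()
                .filter(t -> !stopwords.contains(t))
                .collect(Collectors.toList());
    }

    /** Whitespace tokenization combined with lower-casing and stopword removal. */
    static List<String> analyze(String text) {
        Set<String> stopwords = new HashSet<>(Arrays.asList("the", "a", "of")); // illustrative list
        return LOWER_CASE.andThen(removeStopwords(stopwords)).apply(tokenizeOnWhitespace(text));
    }

    public static void main(String[] args) {
        System.out.println(analyze("The 7\" Plate")); // [7", plate]
    }
}
```

This combination keeps the double quote, as whitespace tokenization does, while still lower-casing terms and dropping stopwords as the Standard analyzer does.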