Understanding FastPhrase and soLazyPhrase
Rubicon 2.04 and 2.05 introduced the FastPhrase properties and the
soLazyPhrase SearchOption which, when enabled, can significantly improve the performance
of phrase searches. Since these improvements come at the cost of a much larger
Words table, it is important to understand when to use these new properties and what the
tradeoffs are.

How Rubicon Performs a Phrase Search

To understand how FastPhrase and soLazyPhrase work, one must first understand how Rubicon conducts a "normal" phrase search. Rubicon indexes the words that appear in the text (typically a record in a table), but it does not index the position of each word in the text, which significantly reduces the size of the indexes. When searching for the phrase "full text search", Rubicon first identifies which records contain the words "full", "text", and "search", then reads those records and determines whether the three words appear sequentially. This approach works very well as long as the words in the phrase are fairly uncommon, phrase searches are relatively infrequent, or the number of records being indexed is not huge.

If the phrase contains words that do not meet the MinWordLen criterion, are in the OmitList, or are rejected by the OnAcceptWord event, these ignored words are still respected in the phrase search.

FastPhrase

When the Rubicon indexes are built with FastPhrase enabled, Rubicon creates "new" words by indexing adjacent words. So when indexing the text "Rubicon full text search", the individual words are indexed as well as the combined words "Rubiconfull", "fulltext", and "textsearch". When the phrase search "full text search" is performed, Rubicon looks for the individual words as described in the previous section, and it also searches for the words "fulltext" and "textsearch". Rubicon then checks the matching records to make sure the phrase is present. When combining adjacent words, Rubicon combines all adjacent words, even if a word does not meet the MinWordLen criterion, is in the OmitList, or is rejected by the OnAcceptWord event.

FastPhrase significantly improves phrase search performance, but it also results in a much larger Words table because many more "words" are indexed. When indexed with FastPhrase enabled, the Rubicon index on the sample help.db database is five times larger.
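The two search strategies described so far can be illustrated with a small sketch. It is written in Python purely for illustration and is not Rubicon's API or implementation: the function names, the tokenizer, and the sample records are all hypothetical, and settings such as MinWordLen, OmitList, OnAcceptWord, and WordFieldSize are ignored. It shows that indexing adjacent word pairs shrinks the set of records that must be read and verified, and that the verification step is still what rejects a record containing the right pairs in the wrong order.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase word tokens; a simplified stand-in for a real tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(records, fast_phrase=False):
    """Map each word (and, with fast_phrase, each adjacent pair) to record ids."""
    index = defaultdict(set)
    for rec_id, text in records.items():
        words = tokenize(text)
        for word in words:
            index[word].add(rec_id)
        if fast_phrase:
            # Adjacent words are concatenated into one extra index entry.
            for left, right in zip(words, words[1:]):
                index[left + right].add(rec_id)
    return index

def contains_phrase(text, phrase_words):
    """Read the record itself and check that the words appear in sequence."""
    words = tokenize(text)
    n = len(phrase_words)
    return any(words[i:i + n] == phrase_words
               for i in range(len(words) - n + 1))

def phrase_search(records, index, phrase, fast_phrase=False):
    """Return (candidate ids from the index, ids verified against the text)."""
    words = tokenize(phrase)
    if not words:
        return set(), set()
    terms = list(words)
    if fast_phrase:
        terms += [left + right for left, right in zip(words, words[1:])]
    candidates = set.intersection(*(index.get(t, set()) for t in terms))
    matches = {r for r in candidates if contains_phrase(records[r], words)}
    return candidates, matches

# Hypothetical sample records.
records = {
    1: "Rubicon full text search",
    2: "perform a text search on the full text",
    3: "search the full archive of text",
}

plain_index = build_index(records)
fast_index = build_index(records, fast_phrase=True)

plain_candidates, plain_matches = phrase_search(
    records, plain_index, "full text search")
fast_candidates, fast_matches = phrase_search(
    records, fast_index, "full text search", fast_phrase=True)

print(sorted(plain_candidates), sorted(plain_matches))  # [1, 2, 3] [1]
print(sorted(fast_candidates), sorted(fast_matches))    # [1, 2] [1]
print(len(plain_index), len(fast_index))                # index entry counts
```

In this toy data set the word-only index yields three candidate records to read, while the pair index yields two. Record 2 contains both "fulltext" and "textsearch" but not the phrase itself, so it is removed only by the final check against the record text. The last line prints the number of index entries in each case, which gives a feel for why the Words table grows when adjacent pairs are indexed.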
Because the Words table grows this much, indexing with FastPhrase enabled should be performed only on a high-end system with ample hard disk space and memory.

When FastPhrase is enabled, searches containing leading wildcards are not supported unless ReverseField is enabled.

soLazyPhrase

When searches are performed with the soLazyPhrase SearchOption enabled, Rubicon does not check the matching records to see whether the phrase is actually present in them, so it is possible for Rubicon to return false matches. The benefit is that phrase searches are performed just as fast as AND and OR searches. For example, a soLazyPhrase search for "full text search" would return a false match on a record containing "perform a text search on the full text".

Note that even if soLazyPhrase is disabled, soLazyPhrase evaluation is automatically performed when the phrase consists of two words, the two words do not contain the AnyChar wildcard character, and the combined length of the two words is less than the WordFieldSize. Under these conditions, there cannot be a false match. If most phrase searches meet these criteria, there is no need to enable soLazyPhrase.

The soLazyPhrase SearchOption is ignored if FastPhrase has not been enabled.

Summary

Use FastPhrase when
Do not use FastPhrase when
Enable soLazyPhrase when
It is highly recommended that you test with and without FastPhrase enabled and evaluate the tradeoffs described above.

See Also

Addendum to the Rubicon 2 Manual
Copyright 2003 © Tamarack Associates |
www.TamarackA.com | Last updated 03/28/00 | www.FullTextSearch.com |