
Understanding FastPhrase and soLazyPhrase

Rubicon 2.04 and 2.05 introduced the FastPhrase properties and the soLazyPhrase SearchOption which, when enabled, can significantly improve the performance of phrase searches. Because these improvements come at the cost of a much larger Words table, it is important to understand when to use them and what the tradeoffs are.

How Rubicon Performs a Phrase Search

To understand how FastPhrase and soLazyPhrase work, one must first understand how Rubicon conducts a "normal" phrase search. Rubicon indexes the words that appear in the text (typically a record in a table) but does not index the position of each word in the text, which significantly reduces the size of the indexes. When searching for the phrase "full text search", Rubicon first identifies which records contain the words "full", "text", and "search"; it then reads those records and determines whether the three words appear sequentially. This approach works very well when the words in the phrase are fairly distinctive, when phrase searches are relatively infrequent, or when the number of records being indexed is not huge.
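The two-step search described above can be sketched in Python. This is only an illustration of the idea, not Rubicon's implementation: the tokenizer, the in-memory dictionary index, and the names `build_index` and `phrase_search` are all assumptions made for the example.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Hypothetical tokenizer: lowercase runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(records):
    # Map each word to the ids of the records containing it.
    # Word positions are deliberately not stored.
    index = defaultdict(set)
    for rec_id, text in records.items():
        for word in tokenize(text):
            index[word].add(rec_id)
    return index

def contains_phrase(text, phrase_words):
    # Read the record and check that the words appear sequentially.
    words = tokenize(text)
    n = len(phrase_words)
    return any(words[i:i + n] == phrase_words
               for i in range(len(words) - n + 1))

def phrase_search(records, index, phrase):
    phrase_words = tokenize(phrase)
    if not phrase_words:
        return []
    # Step 1: records that contain every word of the phrase.
    candidates = set.intersection(
        *(index.get(w, set()) for w in phrase_words))
    # Step 2: read each candidate record to confirm the phrase.
    return sorted(r for r in candidates
                  if contains_phrase(records[r], phrase_words))

records = {
    1: "Rubicon full text search",
    2: "perform a text search on the full text",
    3: "search the full archive",
}
index = build_index(records)
print(phrase_search(records, index, "full text search"))
```

Records 1 and 2 both pass step 1, so both must be read in step 2. That read is the cost that grows when the phrase words are common or the table is large.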

If the phrase contains words that do not meet the MinWordLen criteria, are in the OmitList, or are rejected by the OnAcceptWord event, these ignored words are still respected in the phrase search.

FastPhrase

When the Rubicon indexes are built with FastPhrase enabled, Rubicon creates "new" words by combining adjacent words. So when indexing the text "Rubicon full text search", the individual words are indexed along with the combined words "Rubiconfull", "fulltext", and "textsearch". When the phrase search "full text search" is performed, Rubicon looks for the individual words as described in the previous section, and it also searches for the words "fulltext" and "textsearch". Rubicon then checks the matching records to make sure the phrase is present.

When combining adjacent words, Rubicon combines all adjacent words even if the word does not meet the MinWordLen criteria, is in the OmitList, or is rejected by the OnAcceptWord event.
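A minimal Python sketch of this indexing scheme follows. It is an illustration under stated assumptions rather than Rubicon's actual code: `MIN_WORD_LEN` is a hypothetical stand-in for MinWordLen, the tokenizer lowercases everything for simplicity, and the OmitList, OnAcceptWord, and WordFieldSize truncation are not modeled.

```python
import re
from collections import defaultdict

MIN_WORD_LEN = 3  # hypothetical stand-in for the MinWordLen property

def tokenize(text):
    # Hypothetical tokenizer: lowercase runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_fastphrase_index(records):
    index = defaultdict(set)
    for rec_id, text in records.items():
        words = tokenize(text)
        # Individual words are indexed only if they are long enough.
        for word in words:
            if len(word) >= MIN_WORD_LEN:
                index[word].add(rec_id)
        # Every adjacent pair is combined, even when one of the
        # words would be ignored on its own.
        for left, right in zip(words, words[1:]):
            index[left + right].add(rec_id)
    return index

def fastphrase_terms(phrase):
    # Terms looked up for a phrase: accepted single words plus
    # each pair of adjacent words joined together.
    words = tokenize(phrase)
    singles = [w for w in words if len(w) >= MIN_WORD_LEN]
    pairs = [a + b for a, b in zip(words, words[1:])]
    return singles + pairs

def candidates(index, phrase):
    terms = fastphrase_terms(phrase)
    if not terms:
        return set()
    return set.intersection(*(index.get(t, set()) for t in terms))

records = {
    1: "Rubicon full text search",
    2: "full of text to search",
    3: "search full text",
}
index = build_fastphrase_index(records)
print(fastphrase_terms("full text search"))
print(candidates(index, "full text search"))
```

All three records contain "full", "text", and "search", but only record 1 also contains both "fulltext" and "textsearch", so only one record would remain to be read and verified.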

FastPhrase significantly improves phrase search performance, but it also results in a much larger Words table because many more "words" are indexed. When indexed with FastPhrase enabled, the Rubicon index on the sample help.db database is five times larger than without it. Therefore, with FastPhrase enabled, indexing should be performed only on a high-end system with ample hard disk space and memory.

When FastPhrase is enabled, searches containing leading wildcards are not supported unless ReverseField is enabled.

soLazyPhrase

When searches are performed with the soLazyPhrase SearchOption enabled, Rubicon does not check the matching records to see whether the phrase is actually present, so it is possible for Rubicon to return false matches. The benefit is that phrase searches are performed just as fast as AND and OR searches.

For example, a soLazyPhrase search for "full text search" would return a false match on a record containing "perform a text search on the full text".
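The false match in this example can be reproduced with a short Python sketch. The functions below are hypothetical illustrations of the lazy and verified checks, not Rubicon APIs, and the tokenizer is an assumption.

```python
import re

def tokenize(text):
    # Hypothetical tokenizer: lowercase runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())

def with_pairs(words):
    # Single words plus every adjacent pair joined together.
    return set(words) | {a + b for a, b in zip(words, words[1:])}

def lazy_match(text, phrase):
    # Lazy evaluation: every term of the phrase is present
    # somewhere in the record; the record itself is not re-read.
    return with_pairs(tokenize(phrase)) <= with_pairs(tokenize(text))

def exact_match(text, phrase):
    # Verified evaluation: the phrase words appear sequentially.
    words, phrase_words = tokenize(text), tokenize(phrase)
    n = len(phrase_words)
    return any(words[i:i + n] == phrase_words
               for i in range(len(words) - n + 1))

record = "perform a text search on the full text"
print(lazy_match(record, "full text search"))   # True
print(exact_match(record, "full text search"))  # False
```

The record supplies "textsearch" near its start and "fulltext" at its end, so every term the lazy search looks for is present even though the three words never appear in sequence.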

Note that even if soLazyPhrase is disabled, soLazyPhrase evaluation is automatically performed when the phrase consists of two words, the two words do not contain the AnyChar wildcard character, and the combined length of the two words is less than the WordFieldSize. Under these conditions, there cannot be a false match. If most phrase searches meet these criteria, there is no need to enable soLazyPhrase.
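The three conditions can be written as a simple predicate. In this Python sketch the wildcard character and the field size are placeholder values chosen for illustration, since the actual AnyChar and WordFieldSize settings depend on how the Rubicon components are configured.

```python
ANY_CHAR = "?"        # hypothetical value for the AnyChar wildcard
WORD_FIELD_SIZE = 20  # hypothetical value for WordFieldSize

def lazy_is_exact(phrase):
    # True when lazy evaluation is applied automatically because,
    # per the conditions above, it cannot return a false match.
    words = phrase.split()
    return (len(words) == 2
            and all(ANY_CHAR not in w for w in words)
            and len(words[0]) + len(words[1]) < WORD_FIELD_SIZE)

print(lazy_is_exact("full text"))         # True
print(lazy_is_exact("full text search"))  # False: three words
print(lazy_is_exact("ful? text"))         # False: contains a wildcard
```

Presumably the conditions work because a two-word phrase maps to exactly one combined word that fits in the word field in full, so finding that combined word in the index already implies the two words are adjacent.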

The soLazyPhrase SearchOption property is ignored if FastPhrase has not been enabled.

Summary

Use FastPhrase when

  • Phrase searches that contain common, non-distinctive words are performed frequently
  • Fast phrase search performance is required
  • Indexing is performed on a high-end system

Do not use FastPhrase when

  • Phrase searches are infrequent
  • Indexing performance is important
  • The database is not huge

Enable soLazyPhrase when

  • FastPhrase has been enabled
  • The fastest phrase performance is required
  • It is acceptable that some false matches are returned

It is highly recommended that you test with and without FastPhrase enabled and evaluate the tradeoffs described above.

See Also

Addendum to the Rubicon 2 Manual

 

Copyright 2003 © Tamarack Associates 

www.TamarackA.com Last updated 03/28/00 www.FullTextSearch.com