Boosting Search

Adjusting the query to alter search results...

Introduction

A client was recently moving off Google Search Appliance (GSA) and onto Liferay and Elasticsearch. One key GSA feature they relied on, though, was KeyMatch.

What is KeyMatch? Well, in GSA an administrator can define a list of specific keywords and assign content to them. When a user performs a search that includes one of the specific keywords, the associated content is boosted to the top of the search results.

This way an admin can ensure that a specific piece of content can be promoted as a top result.

For example, say you run a bakery. During holidays, you have specially decorated cakes and cupcakes. You might define a KeyMatch that ties "cupcake" to your specialty cupcakes, so when a user searches for "cupcake", they see the specialty cupcakes ahead of your normal cupcakes.
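To make the idea concrete, here is a minimal sketch of KeyMatch-style promotion. It is a toy model I made up for illustration, not GSA or Liferay code: the class name KeyMatchSketch and its methods are hypothetical, and it simply moves the promoted title to the front of whatever the normal search returned.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical toy model of KeyMatch-style promotion; not GSA or Liferay code.
class KeyMatchSketch {

  // keyword -> title of the content an admin wants promoted
  private final Map<String, String> promotions = new LinkedHashMap<>();

  void promote(String keyword, String title) {
    promotions.put(keyword.toLowerCase(Locale.ROOT), title);
  }

  // Returns the organic results with any promoted titles moved to the front.
  List<String> apply(String userQuery, List<String> organicResults) {
    List<String> promoted = new ArrayList<>();

    // Check every word of the user's query against the keyword list.
    for (String word : userQuery.toLowerCase(Locale.ROOT).split("\\s+")) {
      String title = promotions.get(word);

      if ((title != null) && !promoted.contains(title)) {
        promoted.add(title);
      }
    }

    List<String> results = new ArrayList<>(promoted);

    // Append the remaining organic results in their original order.
    for (String title : organicResults) {
      if (!results.contains(title)) {
        results.add(title);
      }
    }

    return results;
  }
}
```

With "cupcake" promoted to "Holiday Cupcakes", a search for "cupcake recipes" puts "Holiday Cupcakes" first even if the organic results listed it second.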

Elasticsearch Tuning

So Elasticsearch, the heart of the Liferay search facilities, does not have KeyMatch support. In fact, it may often seem that there are few search result tuning capabilities at all. That is not the case.

There are tuning opportunities for Elasticsearch, but it does take some effort to get the outcomes you're hoping for.

Tag Boosting

So one way to get a result similar to KeyMatch would be to boost the match for tags.

In our bakery example above, all of our content related to cupcakes will, of course, appear as search results for "cupcake", if only because the keyword is part of the content. Tagging content with "cupcake" would also get it to come up as a search result, but may not make it score high enough to stand out from the other results.

We could, however, use tag boosting so that a keyword match on a tag would push a match to the top of the search results.
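Here is a rough sketch of why a boost changes the ordering. It uses a deliberately simplified scoring model that I am assuming for illustration: each occurrence of the keyword in the body adds 1.0, and a matching tag adds 1.0 multiplied by the tag boost. Real Elasticsearch relevance scoring is more involved, and the class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Simplified, hypothetical scoring model; real Elasticsearch scoring differs.
class BoostSketch {

  static final class Doc {
    final String title;
    final String body;
    final List<String> tags;

    Doc(String title, String body, List<String> tags) {
      this.title = title;
      this.body = body;
      this.tags = tags;
    }
  }

  // Counts non-overlapping occurrences of keyword in text.
  static int occurrences(String text, String keyword) {
    if (keyword.isEmpty()) {
      return 0;
    }

    int count = 0;
    int index = text.indexOf(keyword);

    while (index >= 0) {
      count++;
      index = text.indexOf(keyword, index + keyword.length());
    }

    return count;
  }

  // Body matches add 1.0 each; a tag match adds 1.0 scaled by tagBoost.
  static double score(Doc doc, String keyword, double tagBoost) {
    double score = occurrences(doc.body, keyword);

    if (doc.tags.contains(keyword)) {
      score += 1.0 * tagBoost;
    }

    return score;
  }

  // Returns the titles of matching documents, highest score first.
  static List<String> rank(List<Doc> docs, String keyword, double tagBoost) {
    List<Doc> hits = new ArrayList<>();

    for (Doc doc : docs) {
      if (score(doc, keyword, tagBoost) > 0) {
        hits.add(doc);
      }
    }

    hits.sort(
      Comparator.comparingDouble(
        (Doc doc) -> score(doc, keyword, tagBoost)).reversed());

    List<String> titles = new ArrayList<>();

    for (Doc doc : hits) {
      titles.add(doc.title);
    }

    return titles;
  }
}
```

In this model, an untagged article that mentions "cupcake" three times outranks a tagged article that mentions it once when the tag boost is 1.0 (3.0 versus 2.0). Raise the tag boost to 100.0 and the tagged article wins (101.0 versus 3.0).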

So how do you implement a tag boost? Through a custom IndexerPostProcessor implementation.

Here's one that I whipped up that will boost tag matches by 100.0:

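import com.liferay.portal.kernel.search.BaseIndexerPostProcessor;
import com.liferay.portal.kernel.search.BooleanClause;
import com.liferay.portal.kernel.search.BooleanQuery;
import com.liferay.portal.kernel.search.Field;
import com.liferay.portal.kernel.search.IndexerPostProcessor;
import com.liferay.portal.kernel.search.Query;
import com.liferay.portal.kernel.search.SearchContext;
import com.liferay.portal.kernel.search.generic.BooleanClauseImpl;
import com.liferay.portal.kernel.search.generic.BooleanQueryImpl;
import com.liferay.portal.kernel.search.generic.MatchQuery;
import com.liferay.portal.kernel.search.generic.WildcardQueryImpl;

import java.util.List;

import org.osgi.service.component.annotations.Component;

@Component(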
  immediate = true,
  property = {
    "indexer.class.name=com.liferay.journal.model.JournalArticle",
    "indexer.class.name=com.liferay.document.library.kernel.model.DLFileEntry"
  },
  service = IndexerPostProcessor.class
)
public class TagBoostIndexerPostProcessor extends BaseIndexerPostProcessor implements 
    IndexerPostProcessor {

  @Override
  public void postProcessFullQuery(BooleanQuery fullQuery, SearchContext searchContext) 
      throws Exception {
    List<BooleanClause<Query>> clauses = fullQuery.clauses();

    if ((clauses == null) || (clauses.isEmpty())) {
      return;
    }

    // Walk each top-level clause; updateBoost() recurses into nested queries.
    for (BooleanClause<Query> clause : clauses) {
      updateBoost(clause.getClause());
    }
  }

  // Recursively visits the query tree and boosts any clause on a tag name field.
  protected void updateBoost(final Query query) {
    if (query instanceof BooleanClauseImpl) {
      BooleanClauseImpl<Query> booleanClause = (BooleanClauseImpl<Query>) query;

      updateBoost(booleanClause.getClause());
    } else if (query instanceof BooleanQueryImpl) {
      BooleanQueryImpl booleanQuery = (BooleanQueryImpl) query;

      for (BooleanClause<Query> clause : booleanQuery.clauses()) {
        updateBoost(clause.getClause());
      }
    } else if (query instanceof WildcardQueryImpl) {
      WildcardQueryImpl wildcardQuery = (WildcardQueryImpl) query;

      if (wildcardQuery.getQueryTerm().getField().startsWith(Field.ASSET_TAG_NAMES)) {
        query.setBoost(100.0f);
      }
    } else if (query instanceof MatchQuery) {
      MatchQuery matchQuery = (MatchQuery) query;

      if (matchQuery.getField().startsWith(Field.ASSET_TAG_NAMES)) {
        query.setBoost(100.0f);
      }
    }
  }
}

So this is an IndexerPostProcessor implementation that is bound to all JournalArticles and DLFileEntries. When a search is performed, the postProcessFullQuery() method will be invoked with the full query to be processed and the search context. The above code identifies all of the tag clauses in the query and increases the boost for them.

This implementation uses recursion because the passed-in query is actually a tree; recursion is an easy way to visit each node in the tree looking for clauses on tag names.
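If the recursion is hard to picture, here is a stripped-down sketch of the same traversal using stand-in types instead of the Liferay query classes. Group, Term, and boostField() are hypothetical names I chose for illustration; Group plays the role of a boolean query and Term the role of a match or wildcard query on a single field.

```java
import java.util.Arrays;
import java.util.List;

// Stand-in types for illustration only; these are not Liferay classes.
class QueryTreeSketch {

  interface Node {
  }

  // Plays the role of a boolean query: a node that only holds children.
  static final class Group implements Node {
    final List<Node> children;

    Group(Node... children) {
      this.children = Arrays.asList(children);
    }
  }

  // Plays the role of a leaf query against a single field.
  static final class Term implements Node {
    final String field;
    float boost = 1.0f;

    Term(String field) {
      this.field = field;
    }
  }

  // Visits every node and boosts leaves whose field starts with fieldPrefix.
  // Returns the number of leaves that were updated.
  static int boostField(Node node, String fieldPrefix, float boost) {
    if (node instanceof Group) {
      int updated = 0;

      // Recurse into each child; groups can nest to any depth.
      for (Node child : ((Group) node).children) {
        updated += boostField(child, fieldPrefix, boost);
      }

      return updated;
    }

    if (node instanceof Term) {
      Term term = (Term) node;

      if (term.field.startsWith(fieldPrefix)) {
        term.boost = boost;

        return 1;
      }
    }

    return 0;
  }
}
```

The prefix check mirrors the startsWith() test in the real code, so variations of the tag field name are picked up as well as the plain field.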

When a match is found, the boost on the query is set to 100.0.

Using this implementation, if an article is tagged with "cupcake", a search for "cupcake" will cause that article to jump to the top of the search results.

Other Modification Ideas

This is an example of how you can modify the search before it is handed off to Elasticsearch for processing.

It can be used to remove query items, change query items, add query items, etc.

It can also be used to adjust the query filters to exclude items from search results.
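As a rough illustration of the exclusion idea, here is a toy "must not" filter over documents modeled as plain field/value maps. Everything in it is hypothetical and none of it is Liferay API. In Liferay 7, I believe the matching hook on IndexerPostProcessor is postProcessContextBooleanFilter(), but treat that as an assumption and check the interface in your version.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of an exclusion filter; not Liferay code.
class ExcludeFilterSketch {

  // field -> value pairs that must NOT appear on a result
  private final Map<String, String> mustNot = new HashMap<>();

  void exclude(String field, String value) {
    mustNot.put(field, value);
  }

  // Returns only the documents that match none of the exclusion rules.
  List<Map<String, String>> apply(List<Map<String, String>> docs) {
    List<Map<String, String>> kept = new ArrayList<>();

    for (Map<String, String> doc : docs) {
      boolean excluded = false;

      for (Map.Entry<String, String> rule : mustNot.entrySet()) {
        if (rule.getValue().equals(doc.get(rule.getKey()))) {
          excluded = true;

          break;
        }
      }

      if (!excluded) {
        kept.add(doc);
      }
    }

    return kept;
  }
}
```

Unlike a boost, which only reorders results, a filter like this removes matching items from the results entirely.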

Conclusion

So the internals of the postProcessFullQuery() method and its arguments are not really documented, at least not in enough detail that I could find to explain how to adjust the query results.

Rather than reading through the code for how the query is built, when I was creating this override, I actually used a debugger to check the nodes of the tree to determine types, fields, etc.

I hope this gives you some ideas about how you, too, might adjust your search queries to get the result ordering you're looking for.

Blogs

I have created two articles and added the tag "cupcake" to one of them, expecting that article to appear at the top of the search results, but it does not. Also, when I search with a keyword that returns zero results, the debugger still hits query.setBoost(100.0f) for the MatchQuery. Is there any way to apply the boost only when the value of Field.ASSET_TAG_NAMES matches the searched keyword?