Enhancing Lucene Search

October 23, 2012

So I'm attending the "Enhancing search for Lucene" session because I've been using Lucene lightly for years but haven't had the time to fully vet it for more mainstream use. I love it for its speed and flexibility, but I have lingering questions about the memory and CPU costs of rebuilding large indexes on production servers. Overall I just want to get a feel for how other people are using it and any pitfalls they've run into.

The talk starts with a use case, which as a developer always makes sense because there's a certain level of nuance that is hard to replicate with hypotheticals. The challenges they focused on were indexing new information quickly and being able to filter useful information out of large document counts with little or no metadata.

To help users filter their large pool of documents, they're using additive filters in the right column on information like date ranges and authors, which can be used in combination to further refine the result list. They're also lazy loading the more detailed result information to decrease the HTML footprint, and with it the page load time, as well as caching those individual requests to speed up repeated use.
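
The presenters didn't show code for the lazy-loading piece, so here's only a minimal sketch of the idea as I understood it: the result list renders lightweight entries, and the details for a single result are fetched on demand and cached. Everything in it (`DETAIL_STORE`, `load_details`, `render_result_list`) is a name I made up for illustration, and it's plain Python rather than anything Sitecore or Lucene specific.

```python
from functools import lru_cache

# Hypothetical stand-in for the slow lookup behind a "show details" request.
DETAIL_STORE = {
    "doc-1": {"title": "Annual report", "summary": "Long summary text..."},
    "doc-2": {"title": "Press release", "summary": "Another long summary..."},
}


@lru_cache(maxsize=256)
def load_details(doc_id: str) -> tuple:
    """Fetch the detail fields for one result; repeat calls are served from the cache."""
    record = DETAIL_STORE[doc_id]
    return (record["title"], record["summary"])


def render_result_list(doc_ids):
    """Initial page render: only ids go out, no detail lookups happen here."""
    return [{"id": doc_id} for doc_id in doc_ids]


if __name__ == "__main__":
    page = render_result_list(["doc-1", "doc-2"])  # cheap, nothing loaded yet
    load_details("doc-1")                          # first request does the lookup
    load_details("doc-1")                          # second request hits the cache
    print(page, load_details.cache_info())
```

The trade-off is the usual one with caching: cached details can go stale after a publish, so a real version would need some way to evict entries when content changes.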

When the presenters dove into their breakdown, they revealed that they were using their own custom search indexer, which is a great example of the flexibility of Lucene. They also did a great job of making their filter code more modular by creating an interface that all filters should implement. The sublayout data source is how they're able to differentiate pages that reuse the search functionality with different result sets. One of the most interesting pieces is the user interface within Sitecore that actually allows an editor to configure a new search page. They get a lot of points for accommodating that level of user control.
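
I didn't copy down their actual interface, so treat this as a rough sketch of the pattern rather than their code: every filter implements one small interface, and the search page combines whichever filters are active so that each one narrows the results further. The names (`SearchFilter`, `AuthorFilter`, `DateRangeFilter`, `apply_filters`) are my own, and it filters an in-memory list instead of a Lucene index to keep the example self-contained.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Document:
    title: str
    author: str
    published: date


class SearchFilter(ABC):
    """The one contract every filter has to implement (hypothetical)."""

    @abstractmethod
    def matches(self, doc: Document) -> bool:
        ...


class AuthorFilter(SearchFilter):
    def __init__(self, author: str):
        self.author = author

    def matches(self, doc: Document) -> bool:
        return doc.author == self.author


class DateRangeFilter(SearchFilter):
    def __init__(self, start: date, end: date):
        self.start = start
        self.end = end

    def matches(self, doc: Document) -> bool:
        # Inclusive on both ends.
        return self.start <= doc.published <= self.end


def apply_filters(docs, filters):
    """Additive filtering: a document must pass every active filter."""
    return [d for d in docs if all(f.matches(d) for f in filters)]


if __name__ == "__main__":
    docs = [
        Document("Indexing basics", "ann", date(2012, 3, 1)),
        Document("Faceted search", "ann", date(2011, 6, 15)),
        Document("Caching results", "bob", date(2012, 5, 20)),
    ]
    active = [AuthorFilter("ann"), DateRangeFilter(date(2012, 1, 1), date(2012, 12, 31))]
    print([d.title for d in apply_filters(docs, active)])
```

The nice part of this shape is that a new filter type is just another class, and the page code that combines them never has to change.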

Overall it was a nice solution, and they broke it down in a very MVC way. I also got my question answered: they never needed to run a full index rebuild and relied only on the incremental updates from the publishing events, which was all I needed to hear. I plan to integrate Lucene further in a number of ways, and this talk assuaged my fears about continual maintenance in a production environment.
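
To make the incremental-update idea concrete for myself, here's a toy sketch of what "no full rebuilds, only publish events" means in practice. It's a tiny in-memory inverted index, not Lucene, and the event shape (`action`, `id`, `text`) and the names `TinyIndex` and `on_publish` are assumptions of mine rather than anything from the talk or the Sitecore API.

```python
from collections import defaultdict


class TinyIndex:
    """Toy inverted index that is only ever updated one document at a time."""

    def __init__(self):
        self._postings = defaultdict(set)  # term -> ids of documents containing it
        self._terms_by_doc = {}            # document id -> terms currently indexed

    def remove(self, doc_id):
        # Drop only this document's postings; everything else is untouched.
        for term in self._terms_by_doc.pop(doc_id, set()):
            self._postings[term].discard(doc_id)
            if not self._postings[term]:
                del self._postings[term]

    def upsert(self, doc_id, text):
        # Updating is delete-then-add for that single document.
        self.remove(doc_id)
        terms = set(text.lower().split())
        self._terms_by_doc[doc_id] = terms
        for term in terms:
            self._postings[term].add(doc_id)

    def search(self, term):
        return set(self._postings.get(term.lower(), set()))


def on_publish(index, event):
    """Apply one (hypothetical) publish event to the index."""
    if event["action"] == "delete":
        index.remove(event["id"])
    else:
        index.upsert(event["id"], event["text"])


if __name__ == "__main__":
    index = TinyIndex()
    on_publish(index, {"action": "save", "id": "a", "text": "Lucene search tips"})
    on_publish(index, {"action": "save", "id": "b", "text": "Search filters"})
    on_publish(index, {"action": "save", "id": "a", "text": "Indexing tips"})
    on_publish(index, {"action": "delete", "id": "b"})
    print(index.search("tips"), index.search("search"))
```

The cost of each event is proportional to the size of the one document that changed, not the size of the whole index, which is exactly why this approach eases my worry about rebuilds on production servers.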