Linq to Sitecore

April 05, 2013

I wanted to write up an article that elaborates on a presentation I gave on some, but not all, of the new features coming in Sitecore 7. I'll focus mostly on the updates around Lucene search, but I'll also touch on some of the other features. This will be a longer article, so be warned.

I'd also like to preface this article by saying that one of the driving forces behind much of the change in Sitecore 7 is an increasing need to support large data sets. Don't dismiss this release as low-value just because your system isn't that large. You don't need to be very big to benefit from faster search or the new features that come with it; it doesn't take much more than a few thousand items for Lucene to pay off.

What's Changing

The new system is highly geared toward improved search, and a shining example of that is how Item Buckets are now baked into the Content Editor. Buckets are a great way to manage large sets of content, considering the suggested limit of around 100 child items per item in Sitecore. But aside from massive quantities of data, consider that not all content needs to be stored structurally in a tree. There are plenty of use cases where you just need to store information and will select and display it based on field values rather than on where it's stored. To assist with this, there are new editor tabs, attached to what seems like every item in the tree, that let you search for content in the context of the selected item. They've also included new field types that let users search for items, which is great because item lists can grow long fast inside those tiny [treelist, droplink etc.] windows.

Hot Swappable Indexes

One of the major changes to Lucene is that you can now configure it to build the index in a separate folder so that it doesn't interfere with the existing index. If you've ever found yourself with no results or a file lock exception because Lucene deleted the index before rebuilding, you'll be interested in this. It's not set up by default, but you can configure it by changing the configuration setting for the Lucene provider. Here's how it will look:

<indexes hint="list:AddIndex">
	<index id="content_index" type="Sitecore.ContentSearch.LuceneProvider.SwitchOnRebuildLuceneIndex, Sitecore.ContentSearch.LuceneProvider">
		<!-- the existing settings for this index stay as they are -->
	</index>
</indexes>

The way this works is that the two folders for the index will be named like so: index_folder, index_folder_sec. The "_sec" stands for secondary. The provider will use the most up to date folder and will rebuild into the other. It should be noted that after you make the change to use the SwitchOnRebuildLuceneIndex, you will need to perform a rebuild twice on the index as per the provided documentation.

Interchangeable Indexes

The next set of updates concerns the indexing source itself. Lucene can now be replaced with an implementation of SOLR that can be set up and configured remotely and queried, in all its vastness, very quickly. I still have questions about whether the Lucene indexer can be configured to run on a separate machine in a similar vein, but I don't currently know. This will of course require some training time to understand how to set it up, but if you're building something that needs that kind of scale I'd guess you'll just be glad it's possible.

The new API

There's also a new search API, "Sitecore.ContentSearch", that's unofficially being called "Linq to Sitecore". It includes an "ORM" feature that allows you to cast your search results to your existing class-model structures (assuming you have one) such as Custom Item Generator or Glass Mapper. The casting is actually similar to Glass Mapper's in how you decorate properties to identify which ones should be populated from corresponding fields in the search result. You may now have questions, as I do, about where Alex Shyba's amazing AdvancedDatabaseSearch/Sitecore.SearchContrib stands. I think this may end up replacing it, but again I don't know. It seems like this new search has in many ways learned from what he did and taken it a bit further with its syntax. I did find this bit in the documentation explaining that Sitecore does want to begin phasing out the older search methods:

"The Sitecore.Data.Indexing API was deprecated in Sitecore CMS 6.5 and in 7.0 it has been completely removed. The Sitecore.Search API works with Sitecore CMS 6 and 7, but is not recommended for new development in version 7.0. Developers should use the Sitecore.ContentSearch API when using Sitecore Search or Lucene search indexes in Sitecore 7.0."

Configuration and Code Samples

Okay, so now that I've covered a bit about what's happening, I'll get to the "how" side of things. I wanted to see how much effort was involved in using the new search features, to decide whether I would rely on them completely for all my data querying needs or just supplement the existing methods. I started by setting up a local instance and creating some templates, content items and a sublayout. The next piece is to change the search configuration file located at /App_Config/Includes/Sitecore.ContentSearch.Lucene.DefaultIndexConfiguration to tell it to store my field values in the index, by changing storageType="NO" to storageType="YES":

<fieldType fieldTypeName="single-line text" storageType="YES" indexType="TOKENIZED" vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider" />

With that set, you'll then need to rebuild the index. This can now be done in several ways, but I used the one available through the Control Panel in the Content Editor. Then I found an example of how to query in the "Developer's Guide To Item Buckets and Search" document provided to me by Sitecore. I've altered it a bit to show how to make the query more generic, so it can return results of any given type, and also to see what its limits were:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Web;
using System.Web.UI;
using System.Web.UI.WebControls;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.Security;
using Sitecore.Data;
using Sitecore.Data.Items;

namespace TestsiteSC7.Web.layouts {
	public partial class PeoplePage : System.Web.UI.Page {

		#region Utility Methods

		protected StringBuilder log = new StringBuilder();
		protected void Log(string title, string message) {
			log.AppendFormat("{0}:{1}<br/>", title, message);
		}

		#endregion Utility Methods

		#region Page Events

		protected void Page_Load(object sender, EventArgs e) {

			//show indexes
			//ShowIndexes();

			//demo hydration
			//GetPeople();

			//show all fields
			GetResult();
			
			//write log
			ltlOut.Text = log.ToString();
		}

		#endregion Page Events

		#region Search Methods

		protected void ShowIndexes() {
			Dictionary<string, ISearchIndex> indexes = ContentSearchManager.SearchConfiguration.Indexes;
			Log("Number of Indexes", indexes.Count.ToString());
			foreach (KeyValuePair<string, ISearchIndex> p in indexes)
				Log("Index", p.Key);
		}

		protected void GetPeople() {
			List<Person> people = PerformSearch<Person>();
			Log("Number of People", people.Count().ToString());
			foreach (Person i in people) {
				Log("name", i.Name);
				Log("address", i.Address);
				Log("occupation", i.Job);
				Log("_fullpath", i.Path);
				Log("parsedlanguage", i.ParsedLanguage);
				Log("_template", i.template);
				Log("_name", i.OtherName);
				Log("", "");
			}
		}

		protected void GetResult() {
			//filters down to a single result
			IEnumerable<MySearchResultItem> results = PerformSearch<MySearchResultItem>()
				.Where(a => a.Name.Equals("Jim"));
			Log("Number of results", results.Count().ToString());
			foreach (MySearchResultItem m in results) {
				Log(m.Name, string.Empty);
				foreach (KeyValuePair<string, string> f in m.fields)
					Log(f.Key, f.Value);
			}
		}

		private List<T> PerformSearch<T>() where T : AbstractResult, new() {
			//the fields are managed in:
			//			/App_Config/includes/Sitecore.ContentSearch.Lucene.DefaultIndexConfiguration
			//this index is managed in:
			//			/App_Config/includes/Sitecore.ContentSearch.Lucene.Index.Master.config
			//the settings for indexing are managed in:
			//			/App_Config/includes/Sitecore.ContentSearch.config
			var index = ContentSearchManager.GetIndex("sitecore_master_index");
			using (var context = index.CreateSearchContext(SearchSecurityOptions.EnableSecurityCheck)) {
				var queryable = context.GetQueryable<T>()
					.Where(a => a.template.Equals(AbstractResult.TemplateID));
				
				//work with items inside the context
				T instance = new T();
				//display count
				instance.HandleResults(queryable);

				//need to convert to a list before leaving the using context
				return queryable.ToList();
			}
			//exception thrown when working with IQueryable object outside of context
			//Accessing IndexSearchContext after Commit() called
		}

		#endregion Search Methods
	}

	public interface IResult {
		string template { get; set; }
		string Name { get; set; }
	}

	public abstract class AbstractResult : IResult {
		
		public string Name { get; set; }
		[IndexField("_template")]
		public string template { get; set; }

		public static readonly string TemplateID = "fc110b3df82c4b0eabc580a4f185ea1d";

		public abstract void HandleResults(IQueryable<AbstractResult> results); 
	}

	public class Person : AbstractResult {

		#region properties
		
		[IndexField("_name")]
		public string OtherName { get; set; }

		public string ParsedLanguage { get; set; }
		
		[IndexField("_fullpath")]
		public string Path { get; set; }
		
		public string Address { get; set; }
		
		[IndexField("occupation")]
		public string Job { get; set; }

		#endregion properties

		public override void HandleResults(IQueryable<AbstractResult> results) {
			IEnumerable<Person> people = (IEnumerable<Person>)results;
			HttpContext.Current.Response.Write("People count:" + people.Count().ToString());
		}
	}

	public class MySearchResultItem : AbstractResult {

		#region properties

		// Fields
		public readonly Dictionary<string, string> fields = new Dictionary<string, string>();
		// Will match the myid field in the index
		public Guid MyId { get; set; }
		public int MyNumber { get; set; }
		public float MyFloatingPointNumber { get; set; }
		public double MyOtherFloatingPointNumber { get; set; }
		public DateTime MyDate { get; set; }
		public ShortID MyShortID { get; set; }
		public ID SitecoreID { get; set; }
		
		// Will be set with key and value for each field in the index document
		public string this[string key] {
			get {
				return this.fields[key.ToLowerInvariant()];
			}
			set {
				this.fields[key.ToLowerInvariant()] = value;
			}
		}

		#endregion properties

		public override void HandleResults(IQueryable<AbstractResult> results) {
			IEnumerable<MySearchResultItem> people = (IEnumerable<MySearchResultItem>)results;
			HttpContext.Current.Response.Write("Results count:" + people.Count().ToString());
		}		
	}
}

What to Expect

I will state for the record that it doesn't take much effort to get results. In fact, you may get more than you bargained for. By default you'll get every result in the index, which includes everything and the kitchen sink. I ended up with standard values items as results even after filtering by template id. The key thing to note here is that Lucene returns the results and Sitecore then uses the field values to fill properties on the class you provide. It will fill your object with data whether or not it makes sense to. I guess this is no different from other ORMs, but keep in mind that you will be best served by filtering the results thoughtfully. What I recommend is that you create several small indexes focused on just the template/content types you're trying to retrieve; otherwise you may have other data commingling with your results. Smaller indexes will also keep memory and CPU usage down when rebuilding and will limit the scope of what is affected.
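To make "filtering thoughtfully" a little more concrete, here's a rough sketch of dropping standard values items in addition to filtering by template. Treat it as an illustration under assumptions rather than tested Sitecore code: IndexedPerson, ResultFilters and the "0000" template id are made up for the example, and an in-memory list stands in for context.GetQueryable<T>() so the snippet runs on its own. The one thing I'm relying on is that standard values items are named "__Standard Values". I haven't confirmed how the search provider translates the not-equals comparison.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical result class; in the article's code this role is played by Person.
public class IndexedPerson {
	public string Name { get; set; }
	public string Template { get; set; }
}

public static class ResultFilters {
	// Sitecore names standard values items "__Standard Values".
	public const string StandardValuesName = "__Standard Values";

	// Keeps items of one template and drops that template's standard values item.
	public static IQueryable<IndexedPerson> OnlyContent(IQueryable<IndexedPerson> source, string templateId) {
		return source
			.Where(p => p.Template == templateId)
			.Where(p => p.Name != StandardValuesName);
	}

	// In-memory stand-in for context.GetQueryable<IndexedPerson>(); the data is invented.
	public static IQueryable<IndexedPerson> SampleData() {
		return new List<IndexedPerson> {
			new IndexedPerson { Name = "Jim", Template = "fc110b3df82c4b0eabc580a4f185ea1d" },
			new IndexedPerson { Name = "__Standard Values", Template = "fc110b3df82c4b0eabc580a4f185ea1d" },
			new IndexedPerson { Name = "Home", Template = "0000" }
		}.AsQueryable();
	}
}
```

Calling ResultFilters.OnlyContent(ResultFilters.SampleData(), "fc110b3df82c4b0eabc580a4f185ea1d") leaves only the "Jim" entry. Against a real index the same two Where calls would sit on the queryable inside the using context.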

The Code Walkthrough

Now I'll delve a bit into the code itself. I've created an interface IResult, an abstract class AbstractResult and two classes: Person and MySearchResultItem (MySearchResultItem is actually pulled from the documentation). The use of the interface was to require that all results have certain fields. The abstract class was to give enough substance to a generic result class to allow the search method to call a method with the result set.

Sitecore is going to use the search result values to populate your classes after a search is run, and you'll notice that on the Person class there are properties decorated to identify which field value should go into which property. You won't have to add the decoration if the property name matches the name of a field in the result. I haven't yet tested how field names with spaces will be handled, but I imagine that the spaces will just be removed or replaced. Here are some different examples:

//setting Sitecore provided fields
[IndexField("_fullpath")]
public string Path { get; set; }

//fills itself by matching names
public string Address { get; set; }

//populating a template field into an unrelated property name
[IndexField("occupation")]
public string Job { get; set; }
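If it helps to picture what the decoration is doing, here's a small sketch of attribute-driven mapping using plain reflection. This is not Sitecore's implementation, just my guess at the general idea: FieldNameAttribute, SamplePerson and Hydrator are hypothetical names, only string properties are handled, and the fallback to a lowercased property name is an assumption based on the lowercase keys I saw in the index.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical stand-in for Sitecore's IndexField attribute.
[AttributeUsage(AttributeTargets.Property)]
public class FieldNameAttribute : Attribute {
	public string Name { get; private set; }
	public FieldNameAttribute(string name) { Name = name; }
}

// Hypothetical class mirroring the three cases shown above.
public class SamplePerson {
	[FieldName("_fullpath")]
	public string Path { get; set; }
	public string Address { get; set; }
	[FieldName("occupation")]
	public string Job { get; set; }
}

public static class Hydrator {
	// Copies values from an index-style document (field name -> value) onto a new T.
	public static T Hydrate<T>(IDictionary<string, string> document) where T : new() {
		T target = new T();
		foreach (PropertyInfo property in typeof(T).GetProperties()) {
			// Skip indexers, read-only properties and anything that isn't a string.
			if (property.GetIndexParameters().Length > 0 || !property.CanWrite || property.PropertyType != typeof(string))
				continue;
			// Use the decorated field name if present, else the lowercased property name.
			var attribute = (FieldNameAttribute)Attribute.GetCustomAttribute(property, typeof(FieldNameAttribute));
			string fieldName = attribute != null ? attribute.Name : property.Name.ToLowerInvariant();
			string value;
			if (document.TryGetValue(fieldName, out value))
				property.SetValue(target, value, null);
		}
		return target;
	}
}
```

Passing a dictionary with the keys "_fullpath", "address" and "occupation" to Hydrator.Hydrate<SamplePerson>() fills Path, Address and Job respectively. Fields with no matching property are simply ignored.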

It will help to become acquainted with the fields you have available to you. MySearchResultItem has an example of a property called fields that Sitecore will populate with all the field names and values. You can use this as a guide to see what's available. By default, it contains some Sitecore information such as template id and item path, among others:

public readonly Dictionary<string, string> fields = new Dictionary<string, string>();
// Will be set with key and value for each field in the index document
public string this[string key] {
	get {
		return this.fields[key.ToLowerInvariant()];
	}
	set {
		this.fields[key.ToLowerInvariant()] = value;
	}
}

When I got to actually running the search, my goal was to push as much generic search functionality as possible into a single method and keep the calls for each type extremely simple. Here's how the search is initiated in my examples:

List<Person> people = PerformSearch<Person>();
IEnumerable<MySearchResultItem> results = PerformSearch<MySearchResultItem>();

The PerformSearch method is simple and reusable. It takes a generic type parameter that must derive from AbstractResult and have a parameterless constructor. Here's the signature for that method:

private List<T> PerformSearch<T>() where T : AbstractResult, new() {}

It starts by getting the index and setting up a using context. In this case the security is enabled so Sitecore will filter results with permissions in mind:

var index = ContentSearchManager.GetIndex("sitecore_master_index");
using (var context = index.CreateSearchContext(SearchSecurityOptions.EnableSecurityCheck)) {

I'm using the built-in master database index, and I've filtered by the template id provided by the abstract class, which works well. The search and filter largely occur on a single line, which is really quite impressive.

var queryable = context.GetQueryable<T>().Where(a => a.template.Equals(AbstractResult.TemplateID));

The GetQueryable<T> is the part where Sitecore will fill your class type with the result values and the .Where() is the Linq to Sitecore which does all the work in the Lucene/SOLR index. You will be able to use the alternate syntax as well:

var queryable = from T a in context.GetQueryable<T>()
		where a.template.Equals(AbstractResult.TemplateID)
		select a;

The beauty here is that you're offloading a lot of the filtering directly to Lucene instead of having to sort/filter/trim a humongous list of items in .NET and then cache it. You should know that not all queryable methods are implemented, but what is available certainly isn't a short list. The supported methods are: Where, Select, Take, Skip, Page, OrderBy, OrderByDescending, ThenBy, ThenByDescending, First, FirstOrDefault, ElementAt, Last, LastOrDefault, Single, SingleOrDefault, Any and Count. It's definitely got some teeth.
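As a quick illustration of combining a few of those methods, here's a sketch of simple paging with OrderBy, Skip and Take. NamedResult and Paging are hypothetical names, and the method accepts any IQueryable so it can be tried against an in-memory list (via AsQueryable()) without an index. I'm assuming the same chain is translated by the search provider when the source is context.GetQueryable<T>(), but I haven't measured that.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical result class with a single sortable property.
public class NamedResult {
	public string Name { get; set; }
}

public static class Paging {
	// Returns one page of results sorted by name; pageIndex is zero-based.
	public static List<NamedResult> GetPage(IQueryable<NamedResult> source, int pageIndex, int pageSize) {
		return source
			.OrderBy(r => r.Name)
			.Skip(pageIndex * pageSize)
			.Take(pageSize)
			.ToList();
	}
}
```

The ordering comes first so that each page is cut from a stable sequence. Without it, Skip and Take have nothing consistent to count against from one request to the next.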

I also wanted to point out that you're really only able to manipulate the search result in its current form while you're inside the using context. If you return an IEnumerable result and then try to loop through it outside the context, you'll get an exception like:

Accessing IndexSearchContext after Commit() called

This is likely because of deferred execution: the items are only queried when GetEnumerator is called, and by then the objects used to crunch the data have been disposed of along with the IDisposable using context. You have two options to deal with this. You can either pass in a handler method (or have one available on your type T, as I did), or you can convert the result set to a List<T>. Either way the data is crunched while the context is still available. Here's the example of the handler being called on the AbstractResult class, which is implemented separately in the subclasses.

T instance = new T();
instance.HandleResults(queryable);

I also return the result set as a List and do work on the results in the calling method.

return queryable.ToList();
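The same behaviour can be reproduced without Sitecore, which may make it easier to see why ToList() matters. In this sketch FakeSearchContext and DeferredDemo are hypothetical stand-ins (not Sitecore types), and the fake context throws ObjectDisposedException rather than the Sitecore message shown earlier.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a search context; not a Sitecore type.
public class FakeSearchContext : IDisposable {
	private readonly string[] names = { "Jim", "Pam", "Dwight" };
	private bool disposed;

	// Lazily yields results and refuses to run once the context is disposed.
	public IEnumerable<string> GetResults() {
		foreach (string name in names) {
			if (disposed)
				throw new ObjectDisposedException("FakeSearchContext");
			yield return name;
		}
	}

	public void Dispose() { disposed = true; }
}

public static class DeferredDemo {
	// Safe: ToList() runs the query while the context is still open.
	public static List<string> Materialized() {
		using (var context = new FakeSearchContext()) {
			return context.GetResults().Where(n => n.Length > 2).ToList();
		}
	}

	// Unsafe: the query is returned unevaluated and only runs after Dispose().
	public static IEnumerable<string> Deferred() {
		using (var context = new FakeSearchContext()) {
			return context.GetResults().Where(n => n.Length > 2);
		}
	}
}
```

DeferredDemo.Materialized() returns all three names, while looping over DeferredDemo.Deferred() throws on the first item because the enumeration starts only after the using block has disposed the context.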

My results are then iterated through and have some of the field data printed to a log for display:

Log("Number of People", people.Count().ToString());
foreach (Person i in people) {
	Log("name", i.Name);
	Log("address", i.Address);
	Log("occupation", i.Job);
	Log("_fullpath", i.Path);
	Log("parsedlanguage", i.ParsedLanguage);
	Log("_template", i.template);
	Log("_name", i.OtherName);
	Log("", "");
}

Hopefully this new search, with its simplified API and multi-folder index, will be a big enough improvement to gain the trust of the development community and get rapid adoption. I think you'll be able to rely on it whether you've got a small or a large system, as it's quite fast and powerful.

One More Thing

There is also a nice update to Sitecore regarding the caching properties on sublayouts. You will be able to configure the cache to clear when the index is updated. This may need some customization for multi-site instances, since you (and I) may not want all HTML cache removed when you publish a single item.

Synopsis

Sitecore has now brought Lucene out of the shadows and integrated it into the system as a much more visible and reliable utility to build your solutions on. The new API is definitely a breath of fresh air and will hopefully lower the barrier to entry for developers configuring and using it, allowing us to build much more efficient applications.