Thursday, May 23, 2013

Setting up Solr with Sitecore 7

This is a brief walkthrough for people who have heard the buzz (can you say "scales to billions of documents"?) about Solr and Sitecore 7 and want to get it working on their desktop. It is based on the instructions in Sitecore's Search Scaling Guide, with a few details about the initial Solr install filled in.
Steps:
  1. Install Sitecore 7 initial release.
  2. Download the Solr Support Package from the SDN Sitecore 7 download page.
  3. Download SOLR 4.x from http://lucene.apache.org/solr/. I did this walkthrough with SOLR 4.2.0, but at the time of this writing the current version is 4.3.0. The download link will take you to a list of mirror sites, where you will be given the option of downloading SOLR in ZIP format. Extract it to a location of your choice (I used "Program Files (x86)").


    UPDATE (June 11, 2014): This walkthrough does not work with SOLR 4.8 and later. Sitecore's schema generator, used in step 8 below, makes assumptions about the structure of the Solr schema file which are no longer true as of Solr 4.8. This issue was found by Sen Gupta and has been reported to Sitecore. I recommend using Solr 4.7 until this issue is corrected. Alternatively, you can update the schema.xml file as described in step 8 below.

  4. In the SOLR-4.3.0 directory, find the directory  /example/solr/collection1 and rename it to "itembuckets".  (Note: The scaling guide gives instructions for building a Solr collection from scratch, but I was not able to get this to load. Since the pre-installed "collection1" worked, I decided to go with that.)
  5. Update for Solr 4.4.x and later:
    Rename the core by opening /example/solr/itembuckets/core.properties and changing its contents to:
    name=itembuckets
    No change is required in the solr.xml file for recent versions of Solr. I discuss this change in the post: Solr Core Discovery
    For Solr 4.3.x and earlier:
    In the same example/solr directory, open "solr.xml" and replace "collection1" with "itembuckets". You will make three changes (defaultCoreName, the core name, and instanceDir), to end up with this:
    <cores adminPath="/admin/cores" defaultCoreName="itembuckets" host="${host:}" hostPort="${jetty.port:8983}" hostContext="${hostContext:solr}" zkClientTimeout="${zkClientTimeout:15000}">
        <core name="itembuckets" instanceDir="itembuckets" />
    </cores>
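
If you would rather script the rename than edit by hand, here is a minimal sketch in Python (standard library only). The function names are my own, not part of Solr or Sitecore, and note that ElementTree drops XML comments when it rewrites the file, so keep a backup of the original.

```python
import xml.etree.ElementTree as ET


def rename_core_in_solr_xml(xml_text: str, old: str, new: str) -> str:
    """Return legacy (Solr 4.3.x and earlier) solr.xml text with the core renamed."""
    root = ET.fromstring(xml_text)
    for cores in root.iter("cores"):
        # Change 1: the default core name on the <cores> element.
        if cores.get("defaultCoreName") == old:
            cores.set("defaultCoreName", new)
        for core in cores.iter("core"):
            # Changes 2 and 3: the core's name and its instance directory.
            if core.get("name") == old:
                core.set("name", new)
            if core.get("instanceDir") == old:
                core.set("instanceDir", new)
    return ET.tostring(root, encoding="unicode")


def rename_core_in_properties(properties_text: str, new: str) -> str:
    """Return core.properties text (Solr 4.4.x and later) with the name= line replaced."""
    lines = []
    found = False
    for line in properties_text.splitlines():
        if line.strip().startswith("name="):
            lines.append(f"name={new}")
            found = True
        else:
            lines.append(line)
    if not found:
        # No name= line was present, so add one.
        lines.append(f"name={new}")
    return "\n".join(lines) + "\n"
```

To use it, read solr.xml (or core.properties) into a string, pass it through the matching function with "collection1" and "itembuckets", and write the result back to the same file.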
    
  6. Now let's see if we can fire this up! Open a command prompt, go to the SOLR-4.3.0\example directory, and type "java -jar start.jar". A bunch of text should fly by, ending with "Started SocketConnector@0.0.0.0:8983".
  7. Open a browser, and go to localhost:8983/solr.  If all has gone well, you should see this, with "itembuckets" available under the "Core Selector" dropdown:
    [Screenshot: SOLR Console]
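
The same check can be done without a browser by asking Solr's CoreAdmin handler for its status. This is a rough sketch that assumes Solr is running at http://localhost:8983/solr and that the JSON response lists cores as keys under "status"; the function names are mine.

```python
import json
from urllib.request import urlopen


def core_names(status_json: str) -> list:
    """Extract core names from a CoreAdmin STATUS response in JSON format."""
    status = json.loads(status_json).get("status", {})
    return sorted(status.keys())


def fetch_core_names(base_url: str = "http://localhost:8983/solr") -> list:
    """Ask a running Solr instance which cores it has loaded."""
    url = base_url + "/admin/cores?action=STATUS&wt=json"
    with urlopen(url, timeout=10) as response:
        return core_names(response.read().decode("utf-8"))
```

Calling fetch_core_names() while Solr is running should return a list that includes "itembuckets" if the rename in step 5 took effect.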
  8. Now that we have Solr running, let's wire it up to Sitecore.

    Update for Solr 4.8 and later: You will need to update the schema.xml file manually as described here before using the "Generate the Solr Schema.xml" option.  This is a temporary workaround, as the Sitecore control panel tool does not take into account schema changes that occurred with Solr 4.8.  Thanks to Stephen Pope for providing this information.

    First, we need to define the fields to index. These are defined in the "schema.xml" file at Solr-4.3.0\example\solr\itembuckets\conf. Let's rename this file to "schema_orig.xml". We'll use Sitecore to create a new version:
    • Go to the Sitecore desktop control panel, select "Indexing", then "Generate the Solr Schema.xml file"
    • Under source, put the full path to "schema_orig.xml".
    • Under destination, put the full path to the new "schema.xml".
    • After running the tool, verify that schema.xml contains Sitecore fields like "_id" and "_datasource":
      <fields>
          <field name="_id" type="string" indexed="true" stored="true" required="true" />
          <field name="_content" type="text_general" indexed="true" stored="true" />
          <field name="_database" type="string" indexed="true" stored="true" />
          <field name="_path" type="string" indexed="true" stored="false" multiValued="true" />
          <field name="_uniqueid" type="string" indexed="true" stored="true" required="true" />
          <field name="_datasource" type="string" indexed="true" stored="true" required="true" />
      
      
    • Now go to the SOLR console, select "Core Admin" on the left, then "Reload" on the top, to load the new schema.
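
If you want to verify the generated schema without reading through it, a small script can do the check. This sketch only looks for the six field names shown in the excerpt above (the real schema contains more), and the function name is my own.

```python
import xml.etree.ElementTree as ET

# The Sitecore fields shown in the excerpt above; not an exhaustive list.
EXPECTED_FIELDS = ["_id", "_content", "_database", "_path", "_uniqueid", "_datasource"]


def missing_sitecore_fields(schema_text: str, expected=EXPECTED_FIELDS) -> list:
    """Return the expected field names that are absent from the schema."""
    root = ET.fromstring(schema_text)
    # iter() finds <field> elements at any depth, with or without a <fields> wrapper.
    present = {field.get("name") for field in root.iter("field")}
    return [name for name in expected if name not in present]
```

Read schema.xml into a string and pass it in. An empty list means all six fields are there; anything else names the fields the generator did not write.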
  9. Now it's time to change Sitecore's configuration to use the new index. First, add the new Sitecore.ContentSearch.Solr.Indexes.config file from the Solr Support Package to the App_Config/Include directory. Then rename the extension of all seven config files in that directory with "Lucene" in the name (e.g. to .example), since we don't want Sitecore loading them.
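
Renaming seven files by hand is easy to get wrong, so here is a sketch that does it in one pass. It assumes "renaming the extension" means appending ".example" to the file name (so Foo.Lucene.config becomes Foo.Lucene.config.example); the function name is hypothetical.

```python
from pathlib import Path


def disable_lucene_configs(include_dir, suffix=".example"):
    """Append a suffix to every *.config file with 'Lucene' in its name.

    Returns the new file names so you can confirm how many were renamed.
    """
    renamed = []
    for path in sorted(Path(include_dir).glob("*.config")):
        if "lucene" in path.name.lower():
            target = path.with_name(path.name + suffix)
            path.rename(target)
            renamed.append(target.name)
    return renamed
```

Point it at your site's App_Config/Include directory. If the returned list does not have seven entries, check whether some files were already disabled.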
  10. Now it's time to move over Sitecore's Solr DLLs. This is trickier than it sounds, since Sitecore 7 uses Inversion of Control to wire this into the application, and the administrator is allowed to choose which IoC container to use. (This walkthrough uses Castle Windsor, but AutoFac, Ninject, StructureMap and Unity are also supported, each with its own DLLs.) For Castle Windsor, copy the following files from the Solr Support Package to the website bin directory:
    • Castle.Facilities.SolrNetIntegration.dll
    • Microsoft.Practices.ServiceLocation.dll
    • Sitecore.ContentSearch.Linq.Solr.dll
    • Sitecore.ContentSearch.SolrProvider.CastleWindsorIntegration.dll
    • Sitecore.ContentSearch.SolrProvider.dll
    • SolrNet.dll
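
A quick way to confirm nothing was missed is to compare the bin directory against the list above. This is just a sketch; the function name is mine and the list covers only the six Castle Windsor files named in this step.

```python
from pathlib import Path

# The six files listed above for the Castle Windsor container.
REQUIRED_DLLS = [
    "Castle.Facilities.SolrNetIntegration.dll",
    "Microsoft.Practices.ServiceLocation.dll",
    "Sitecore.ContentSearch.Linq.Solr.dll",
    "Sitecore.ContentSearch.SolrProvider.CastleWindsorIntegration.dll",
    "Sitecore.ContentSearch.SolrProvider.dll",
    "SolrNet.dll",
]


def missing_dlls(bin_dir, required=REQUIRED_DLLS):
    """Return the required DLL names not found in bin_dir (case-insensitive)."""
    present = {p.name.lower() for p in Path(bin_dir).iterdir() if p.is_file()}
    return [name for name in required if name.lower() not in present]
```

Run it against the website bin directory; an empty list means all six files are in place.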
  11. It is also necessary to add Castle.Core and Castle.Windsor. I used version 3.1.0 for each. Getting these is tricky. You can create a solution and use NuGet, or you can pull them directly from the NuGet site, using https://www.nuget.org/api/v2/package/castle.windsor/3.1.0 and https://www.nuget.org/api/v2/package/castle.core/3.1.0. Opening these URLs in Chrome automatically downloads a .nupkg file, which you can rename to a .zip archive. Both archives contain a "lib\net40-client" path. Copy Castle.Windsor.dll and Castle.Core.dll from lib\net40-client of each package to the website bin directory.
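
Since a .nupkg file is just a zip archive, the rename-and-copy dance can be skipped with a few lines of Python. This sketch assumes you have already downloaded the two packages; the function name and the default folder argument are my own choices.

```python
import zipfile
from pathlib import Path, PurePosixPath


def extract_dlls(nupkg_path, bin_dir, lib_folder="lib/net40-client"):
    """Copy the DLLs under lib_folder in a .nupkg (zip) archive into bin_dir."""
    bin_dir = Path(bin_dir)
    bin_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    with zipfile.ZipFile(nupkg_path) as archive:
        for entry in archive.namelist():
            # Normalise separators in case an archive stores backslashes.
            entry_path = PurePosixPath(entry.replace("\\", "/"))
            in_folder = str(entry_path.parent).lower() == lib_folder.lower()
            if in_folder and entry_path.suffix.lower() == ".dll":
                (bin_dir / entry_path.name).write_bytes(archive.read(entry))
                copied.append(entry_path.name)
    return sorted(copied)
```

Call it once per package, passing the path to the downloaded .nupkg and the website bin directory. It returns the names of the DLLs it copied.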
  12. Finally, wire in the Inversion of Control logic by editing the Global.asax "Application" directive to read:
    <%@Application Language='C#' Inherits="Sitecore.ContentSearch.SolrProvider.CastleWindsorIntegration.WindsorApplication" %>
  13. Go to the Sitecore desktop.  Try a bucket-style search (using the magnifying glass icon) on the Sitecore root node. It should return no results.
  14. Go to Control Panel\Indexing\Indexing Manager, and rebuild the indexes for the Core, Master, and Web databases.
  15. Retry the search. It should work now!
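
If the search still comes back empty, it helps to ask Solr directly how many documents it holds. The sketch below uses Solr's standard select handler with q=*:* and rows=0; it assumes Solr is at http://localhost:8983/solr and that the Sitecore indexes write to the "itembuckets" core. The function names are mine.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def count_url(core="itembuckets", base_url="http://localhost:8983/solr", query="*:*"):
    """Build a select URL that matches documents but returns no rows."""
    params = urlencode({"q": query, "rows": 0, "wt": "json"})
    return f"{base_url}/{core}/select?{params}"


def num_found(select_json: str) -> int:
    """Read the total hit count out of a Solr JSON select response."""
    return json.loads(select_json)["response"]["numFound"]


def count_documents(core="itembuckets", base_url="http://localhost:8983/solr") -> int:
    """Query a running Solr instance for the number of indexed documents."""
    with urlopen(count_url(core, base_url), timeout=10) as response:
        return num_found(response.read().decode("utf-8"))
```

A count of zero after the rebuild suggests Sitecore is not reaching Solr at all, which points back to the configuration in steps 9 through 12 rather than to the search UI.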
Please let me know on Twitter (@DanSolovay) or the comments if this walkthrough works for you, or if you run into any issues.

A few things to note:
  • Because Sitecore talks to Solr via a URL, SOLR can be moved to a separate server, or the cloud, with only a one-line configuration change in Sitecore.
  • SOLR provides some impressive scalability features (distribution over multiple servers and sharding), which are discussed here: http://lucene.apache.org/solr/
  • The SOLR console provides a lot of the functionality that would require Luke with Lucene.

9 comments:

  1. These directions worked great for me.

    You can use binding redirects if you want to use Castle 3.2 (seems to work ok)

  2. This is a fantastic guide to integrating Solr with Sitecore 7. Worked great for me and now I'm writing up a quick guide for my coworkers, highlighting the issues and speedbumps I encountered along the way. Thanks!

  3. Hi
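    I have opened "solr.xml", but the XML in that file is different from the XML you have posted in this example. I have the following XML:

    <solr>
      <solrcloud>
        <str name="host">${host:}</str>
        <int name="hostPort">${jetty.port:8983}</int>
        <str name="hostContext">${hostContext:solr}</str>
        <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
        <bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>
      </solrcloud>
      <shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
        <int name="socketTimeout">${socketTimeout:0}</int>
        <int name="connTimeout">${connTimeout:0}</int>
      </shardHandlerFactory>
    </solr>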

    1. I've updated the post. Thanks for raising this.

  4. After running the tool, I can't see Sitecore fields like "_id" and "_datasource".

  5. My schema looks different too, and Sitecore 7.2 is now giving me a strange .NET error.

  6. Schema looks different with Solr 4.8?

    So here is what is causing the issue!

    The SOLR schema has changed in versions higher than 4.6.1.

    I looked into the field configurations in the SOLR schema generated by the Sitecore 7.2 SOLR schema wizard and found that it is not generating the schema as expected.

    The reason it does not generate the correct schema is this change, listed in the change log of SOLR 4.8+:

    "<fields> and <types> tags have been deprecated from schema.xml. There is no longer any reason to keep them in the schema file, they may be safely removed. This allows intermixing of <fieldType>, <field> and <copyField> definitions if desired. Currently, these tags are supported so either style may be implemented. They may be deprecated formally in 5.0. See SOLR-5228 for more details."

    Apparently, Sitecore used the <fields> tag to inject Sitecore fields into the newly generated schema, and fails here!

    Solutions: stick to SOLR 4.6.1, as this version has the schema the Sitecore schema builder wizard expects, or hand-copy the fields into the SOLR schema file if you need a feature of SOLR 4.8.

  7. Followed this, and it works. Thanks :-)
