Thursday, May 23, 2013

Setting up Solr with Sitecore 7

This is a brief walkthrough for people who have heard the buzz (can you say "scales to billions of documents"?) about Solr and Sitecore 7 and want to get it working on their desktop. It is based on the instructions in Sitecore's Search Scaling Guide, with a few details about the initial Solr install filled in.
Steps:
  1. Install Sitecore 7 initial release.
  2. Download the Solr Support Package from the SDN Sitecore 7 download page.
  3. Download SOLR 4.x from http://lucene.apache.org/solr/ (I did this walkthrough with SOLR 4.2.0, but at the time of this writing the current version is 4.3.0). The download link will take you to a list of mirror sites, where you will be given the option of downloading SOLR in ZIP format. Extract to a location of your choice (I used "Program Files (x86)").


    UPDATE (June 11, 2014): This walkthrough does not work with SOLR 4.8 and later. Sitecore's schema generator, used in step 8 below, makes assumptions about the structure of the Solr schema file that are no longer true as of Solr 4.8. This issue was found by Sen Gupta and has been reported to Sitecore. I recommend using Solr 4.7 until this issue is corrected. Alternatively, you can update the schema.xml file as described in step 8 below.

  4. In the SOLR-4.3.0 directory, find the directory  /example/solr/collection1 and rename it to "itembuckets".  (Note: The scaling guide gives instructions for building a Solr collection from scratch, but I was not able to get this to load. Since the pre-installed "collection1" worked, I decided to go with that.)
  5. Update for Solr 4.4.x and later:
    Rename collection1 by going to /example/solr/itembuckets/core.properties, and changing the contents to:
    name=itembuckets
    No change is required in the solr.xml file for recent versions of Solr. I discuss this change in the post: Solr Core Discovery
    For Solr 4.3.x and earlier:
    In the same example/solr directory, open "solr.xml" and replace "collection1" with "itembuckets". You will make three changes, to end up with this:
    <cores adminPath="/admin/cores" defaultCoreName="itembuckets" host="${host:}" hostPort="${jetty.port:8983}" hostContext="${hostContext:solr}" zkClientTimeout="${zkClientTimeout:15000}">
        <core name="itembuckets" instanceDir="itembuckets" />
    </cores>
    
  6. Now let's see if we can fire this up! Open a command prompt, go to the SOLR-4.3.0\example directory, and type "java -jar start.jar". A bunch of text should fly by, ending with: "Started SocketConnector@0.0.0.0:8983".
  7. Open a browser, and go to localhost:8983/solr.  If all has gone well, you should see this, with "itembuckets" available under the "Core Selector" dropdown:
    [Screenshot: SOLR console]
  8. Now that we have Solr running, let's wire it up to Sitecore.

    Update for Solr 4.8 and later: You will need to update the schema.xml file manually as described here before using the "Generate the Solr Schema.xml" option.  This is a temporary workaround, as the Sitecore control panel tool does not take into account schema changes that occurred with Solr 4.8.  Thanks to Stephen Pope for providing this information.

    First, we need to define the fields to index. These are defined in the "schema.xml" file at Solr-4.3.0\example\solr\itembuckets\conf. Let's rename this file to "schema_orig.xml". We'll use Sitecore to create a new version:
    • Go to the Sitecore desktop control panel, select "Indexing", then "Generate the Solr Schema.xml file"
    • Under source, put the full path to "schema_orig.xml".
    • Under destination, put the full path to the new "schema.xml".   It should look like this:
    • After running the tool, verify that schema.xml contains Sitecore fields like "_id" and "_datasource":
      <fields>
          <field name="_id" type="string" indexed="true" stored="true" required="true" />
          <field name="_content" type="text_general" indexed="true" stored="true" />
          <field name="_database" type="string" indexed="true" stored="true" />
          <field name="_path" type="string" indexed="true" stored="false" multiValued="true" />
          <field name="_uniqueid" type="string" indexed="true" stored="true" required="true" />
          <field name="_datasource" type="string" indexed="true" stored="true" required="true" />
      
      
    • Now go to the SOLR console, select "Core Admin" on the left, then "Reload" on the top, to load the new schema.
  9. Now it's time to change Sitecore's configuration to use the new index.  First, let's add the new Sitecore.ContentSearch.Solr.Indexes.config file from the Solr Support Package to the App_Config/Include directory.  Rename the extension of all seven files with "Lucene" in the name (e.g. to .example), since we don't want Sitecore using these.
  10. Now it's time to move over Sitecore's Solr DLLs. This is trickier than it sounds, since Sitecore 7 uses Inversion of Control to wire this into the application, and the administrator is allowed to choose which IoC container to use. (This walkthrough uses Castle Windsor, but AutoFac, Ninject, StructureMap and Unity are also supported, each with its own DLLs.) For Castle Windsor, copy the following files over to the project bin directory:
    • Castle.Facilities.SolrNetIntegration.dll
    • Microsoft.Practices.ServiceLocation.dll
    • Sitecore.ContentSearch.Linq.Solr.dll
    • Sitecore.ContentSearch.SolrProvider.CastleWindsorIntegration.dll
    • Sitecore.ContentSearch.SolrProvider.dll
    • SolrNet.dll
  11. It is also necessary to add Castle.Core and Castle.Windsor. I used version 3.1.0 of each. Getting these is tricky. You can create a solution and use NuGet, or you can pull them directly from the NuGet site, using https://www.nuget.org/api/v2/package/castle.windsor/3.1.0 and https://www.nuget.org/api/v2/package/castle.core/3.1.0 (opening these URLs in Chrome automatically downloads a .nupkg file, which you can rename to a .zip archive). Both archives contain a "lib\net40-client" path. Copy Castle.Windsor.dll and Castle.Core.dll from lib\net40-client of each package to the website bin directory.
  12. Finally, wire in the Inversion of Control logic by editing the Global.asax "Application" directive to read:
    <%@Application Language='C#' Inherits="Sitecore.ContentSearch.SolrProvider.CastleWindsorIntegration.WindsorApplication" %>
  13. Go to the Sitecore desktop.  Try a bucket-style search (using the magnifying glass icon) on the Sitecore root node. It should return no results.
  14. Go to Control Panel\Indexing\Indexing Manager, and rebuild the indexes for the Core, Master, and Web databases.
  15. Retry the search. It should work now!
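
If the search in step 15 still returns nothing, one thing worth ruling out is the schema generated in step 8. The following Python sketch parses a schema.xml and reports which expected fields are missing. The function names and the field list are assumptions for illustration (the list is simply copied from the excerpt in step 8, not taken from Sitecore documentation). It looks for field elements anywhere in the file, so it should work whether or not the schema still wraps them in a fields element.

```python
import xml.etree.ElementTree as ET

# Field names copied from the schema excerpt in step 8. This list is an
# assumption for illustration; adjust it to match your own generated schema.
EXPECTED_FIELDS = ["_id", "_content", "_database", "_path", "_uniqueid", "_datasource"]


def missing_sitecore_fields(schema_xml, expected=EXPECTED_FIELDS):
    """Return the expected field names that have no <field> element in the schema.

    schema_xml may be a str or bytes containing the whole schema.xml document.
    """
    root = ET.fromstring(schema_xml)
    # iter() walks the whole tree, so <field> elements are found with or
    # without an enclosing <fields> wrapper.
    defined = {field.get("name") for field in root.iter("field")}
    return [name for name in expected if name not in defined]


def check_schema_file(path):
    """Read schema.xml from disk and print which expected fields are missing."""
    with open(path, "rb") as handle:
        missing = missing_sitecore_fields(handle.read())
    if missing:
        print("Missing fields:", ", ".join(missing))
    else:
        print("All expected Sitecore fields are present.")
    return missing
```

Calling check_schema_file with the full path to your new schema.xml prints the result. An empty list means every field in EXPECTED_FIELDS was found.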
Please let me know on Twitter (@DanSolovay) or in the comments if this walkthrough works for you, or if you run into any issues.
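
For step 11, the rename-and-unzip routine can also be scripted. A .nupkg file is an ordinary zip archive, so the Python sketch below opens it directly and copies the DLLs from the "lib\net40-client" folder into a destination directory. The function name `extract_dlls` and its parameters are hypothetical, chosen for illustration. It assumes you have already downloaded the two packages.

```python
import zipfile
from pathlib import Path


def extract_dlls(nupkg_path, dest_dir, lib_folder="lib/net40-client"):
    """Copy every .dll under lib_folder in the package into dest_dir.

    Returns the sorted list of file names that were copied.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    prefix = lib_folder.strip("/").lower() + "/"
    copied = []
    # zipfile does not care about the file extension, so no rename is needed.
    with zipfile.ZipFile(nupkg_path) as package:
        for entry in package.namelist():
            # Normalize separators and case before comparing paths.
            normalized = entry.replace("\\", "/")
            lowered = normalized.lower()
            if lowered.startswith(prefix) and lowered.endswith(".dll"):
                target = dest / Path(normalized).name
                target.write_bytes(package.read(entry))
                copied.append(target.name)
    return sorted(copied)
```

Run it once per package, pointing dest_dir at the website bin directory. Files outside the chosen lib folder (other framework builds, XML docs) are left alone.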

A few things to note:
  • Because Sitecore talks to Solr via a URL, SOLR can be moved to a separate server, or the cloud, with only a one-line configuration change in Sitecore.
  • SOLR provides some impressive scalability features (distribution over multiple servers and sharding), which are discussed here: http://lucene.apache.org/solr/
  • The SOLR console provides a lot of the functionality that would require Luke with Lucene.
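
Because the connection is just a URL, it is easy to check from outside Sitecore that Solr is answering and that the core exists. The Python sketch below asks Solr's core admin handler (the "/admin/cores" path that appears as adminPath in the solr.xml from step 5) for its status and lists the core names. The function names and the default base URL are assumptions for illustration, and the parsing assumes the JSON response has a top-level "status" object keyed by core name.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def core_status_url(base_url="http://localhost:8983/solr"):
    """Build the core admin STATUS URL, tolerating a trailing slash on base_url."""
    query = urlencode({"action": "STATUS", "wt": "json"})
    return base_url.rstrip("/") + "/admin/cores?" + query


def core_names(status_response):
    """Extract core names from a parsed STATUS response (a dict)."""
    return sorted(status_response.get("status", {}))


def list_cores(base_url="http://localhost:8983/solr", timeout=5):
    """Fetch the STATUS response from a running Solr instance and return its core names."""
    with urlopen(core_status_url(base_url), timeout=timeout) as response:
        return core_names(json.load(response))
```

With Solr running, list_cores() should return a list that includes "itembuckets". If the call raises a connection error instead, that suggests the problem lies with Solr or the network rather than with Sitecore's configuration.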

28 comments:

  1. These directions worked great for me.

    You can use binding redirects if you want to use Castle 3.2 (seems to work ok)

  2. This is a fantastic guide to integrating Solr with Sitecore 7. Worked great for me and now I'm writing up a quick guide for my coworkers, highlighting the issues and speedbumps I encountered along the way. Thanks!

  3. Hi
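    I have opened "solr.xml", but the XML in that file is different from the XML you have posted in this example. I have the following XML:

    <solr>
      <solrcloud>
        <str name="host">${host:}</str>
        <int name="hostPort">${jetty.port:8983}</int>
        <str name="hostContext">${hostContext:solr}</str>
        <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
        <bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>
      </solrcloud>

      <shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
        <int name="socketTimeout">${socketTimeout:0}</int>
        <int name="connTimeout">${connTimeout:0}</int>
      </shardHandlerFactory>
    </solr>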

    Replies
    1. I've updated the post. Thanks for raising this.

  4. after running the tool I can't see Sitecore fields like "_id" and "_datasource":

  5. My schema looks different too, and Sitecore 7.2 is now giving me a strange .NET error

  6. Schema looks different with Solr 4.8?

    So here is what is causing issue!

    SOLR schema has changed in the versions higher than 4.6.1.

    I looked into the field configurations in the SOLR schema generated by the Sitecore 7.2 SOLR schema wizard and found that it is not generating the schema as expected.

    The reason it does not generate the correct schema is this change that is listed in the change log of SOLR 4.8+

    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    <fields> and <types> tags have been deprecated from schema.xml. There is no longer any reason to keep them in the schema file,
    they may be safely removed. This allows intermixing of <fieldType>, <field> and <copyField> definitions if desired. Currently,
    these tags are supported so either style may be implemented. They may be deprecated formally in 5.0. See SOLR-5228 for more details

    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


    Apparently, Sitecore used the <fields> tag to inject Sitecore fields into the newly generated schema, and fails here!

    Solutions: Stick to SOLR 4.6.1, as this version has the schema that the Sitecore Schema Builder Wizard expects, or copy the fields into the SOLR schema file by hand if you need a feature of SOLR 4.8.

  7. Followed this, and it works. Thanks :-)

  8. Dan, thank you so much. This helps a lot. I recently installed Solr 4.9 and I had to uncomment the following from schema.xml. Apparently these types are going to be deprecated as of Solr 5.0






  9. I'm working with Sitecore 7.2 and using Solr 4.7.2, I've followed all the steps but I've got this error:
    Connection error to search provider [Solr] : Unable to connect to [http://localhost:8983/solr]
    While Solr is up and running and I can navigate to http://localhost:8983/solr and use Solr UI !
    FYI: I use Castle.windsor 3.3 and I've made the change to my global.asax file:
    Inherits="Sitecore.ContentSearch.SolrProvider.CastleWindsorIntegration.WindsorApplication"
    Any advice?

    Replies
    1. I am trying to setup Solr 4.10.1 with Sitecore 7.2 but I am getting the same error (Connection error to search provider [Solr] : Unable to connect to [http://localhost:8983/solr/]). I am using Unity for my IoC. Were you able to figure out the issue on your end? I would appreciate any help. Thank you

  10. missing a slash? http://localhost:8983/solr/

  11. @Morteza, File permissions issue?

  12. I am trying to setup Solr 4.10.1 with Sitecore 7.2 but I am getting the following server error (Connection error to search provider [Solr] : Unable to connect to [http://localhost:8983/solr/]). I am using Unity for my IoC. I would appreciate any help. Thank you

    Replies
    1. Please check whether you are initializing Unity in Global.asax.

      Regards,
      Pavan Toshniwal.

  13. Hi Dan,

    I'm doing a Solr setup with Sitecore 7.2 and Solr 4.9.1. There's another change needed in the schema.xml. The "pint" field type needs to be enabled, it is commented out in the example schema.

    <fieldType name="pint" class="solr.IntField"/>

    This field type will be deprecated in Solr 5, but it is still available in 4.9+.

  14. Exception: SolrNet.Exceptions.SolrConnectionException
    Message:

    40031ERROR: [doc=sitecore://master/{174eb539-3a1f-4098-a50d-573663198d2e}?lang=en&ver=1] Error adding field 'version_im'='<link text="2014.1" linktype="media" target="_blank" id="{CE5808DF-AFFE-45EC-AA24-20D6795A42FB}" />' msg=For input string: "<link text="2014.1" linktype="media" target="_blank" id="{CE5808DF-AFFE-45EC-AA24-20D6795A42FB}" />"400



    please help me to resolve this issue

    Replies
    1. this is related to this issue https://github.com/SitecorePowerShell/Console/issues/370
      one template has field named 'version' which is expected to be integer

  15. So I got this all working fine and I have Solr working with Sitecore 7. Now (for other reasons in my app) I need to be able to add custom code for the Application_BeginRequest event in my Global.ASAX. However my Global.ASAX now inherits from Sitecore.ContentSearch.SolrProvider.CastleWindsorIntegration.WindsorApplication. So how can I add custom code to my Application_BeginRequest event?

  16. Hi All,

    Do you know if this configuration works with sitecore 8 and Solr 4.2.0?

  17. I am configuring a Solr 5.0 search implementation in a Sitecore 8 Update 3 instance, based on the steps in these links: http://sitecore-community.github.io/docs/search/solr/installing-solr-using-the-bitnami-apache-solr-stack/ and
    http://sitecore-community.github.io/docs/search/solr/Configuring-Solr-for-use-with-Sitecore-8/. While rebuilding sitecore_master_index from the Control Panel's Indexing Manager, I am getting a System.Reflection.TargetInvocationException. Please find attached a screenshot with the error details. Could anyone please suggest a resolution for this error?

  18. While rebuilding master (or) web (or) core index, I am getting below exception. Below error is for core index. Could any one help please.

    Job started: Index_Update_IndexName=sitecore_core_index|#Exception: System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. ---> SolrNet.Exceptions.SolrConnectionException:

    4001ERROR: [doc=sitecore://core/{d5024336-5016-4192-b6ae-6905b427b14a}?lang=da&ver=1&ndx=sitecore_core_index] unknown field '__display_name_t_da'400

    ---> System.Net.WebException: The remote server returned an error: (400) Bad Request.
    at System.Net.HttpWebRequest.GetResponse()
    at HttpWebAdapters.Adapters.HttpWebRequestAdapter.GetResponse()
    at SolrNet.Impl.SolrConnection.GetResponse(IHttpWebRequest request)
    at SolrNet.Impl.SolrConnection.PostStream(String relativeUrl, String contentType, Stream content, IEnumerable`1 parameters)
    --- End of inner exception stack trace ---
    at SolrNet.Impl.SolrConnection.PostStream(String relativeUrl, String contentType, Stream content, IEnumerable`1 parameters)
    at SolrNet.Impl.SolrConnection.Post(String relativeUrl, String s)
    at SolrNet.Impl.SolrBasicServer`1.SendAndParseHeader(ISolrCommand cmd)
    at Sitecore.ContentSearch.SolrProvider.SolrBatchUpdateContext.AddRange(IEnumerable`1 group, Int32 groupSize)
    at Sitecore.ContentSearch.SolrProvider.SolrBatchUpdateContext.Commit()
    at Sitecore.ContentSearch.SolrProvider.SolrSearchIndex.PerformRebuild(Boolean resetIndex, Boolean optimizeOnComplete, IndexingOptions indexingOptions, CancellationToken cancellationToken)
    at Sitecore.ContentSearch.SolrProvider.SolrSearchIndex.Rebuild(Boolean resetIndex, Boolean optimizeOnComplete)
    --- End of inner exception stack trace ---
    at System.RuntimeMethodHandle.InvokeMethod(Object target, Object[] arguments, Signature sig, Boolean constructor)
    at System.Reflection.RuntimeMethodInfo.UnsafeInvokeInternal(Object obj, Object[] parameters, Object[] arguments)
    at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
    at System.Reflection.MethodBase.Invoke(Object obj, Object[] parameters)
    at (Object , Object[] )
    at Sitecore.Pipelines.CorePipeline.Run(PipelineArgs args)
    at Sitecore.Jobs.Job.ThreadEntry(Object state)

    Replies
    1. Hi Gurunadh,

      Have you managed to solve above issue? If not why don't you try to change to this in solr schema.xml

    2. field name="_displayname" type="text_general" indexed="true" stored="true"

  19. and change
    dynamicField name="*_t_da" type="text_da" indexed="true" stored="true"

    to

    dynamicField name="*_t_da" type="text_general" indexed="true" stored="true"
