Sunday, February 9, 2014

Solr Core Discovery

I did a walk-through post a while back on setting up Solr as a Sitecore 7 index provider.  One of the steps involved defining a "core" (also referred to as a "collection") to hold the indexes.  The way I described is no longer required as of Solr 4.4, and will no longer work with Solr 5.  (Solr is currently on release 4.6.1.)

First, a little background.  Unlike Lucene, which lives on the filesystem and is accessed through API calls, Solr is accessed through HTTP requests. The Solr runtime is capable of managing many cores (also called "collections"), each of which has its own configuration, and each of which is powered by its own Lucene index.  The core name is present in the URL when making web requests: http://localhost:8983/solr/<corename>
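To make the URL structure concrete, here is a minimal Python sketch that builds a request URL for a given core. The core name "sitecore_master_index" and the helper name core_url are placeholders of mine, and the sketch assumes the standard "select" request handler is enabled for the core.

```python
from urllib.parse import urlencode

def core_url(core_name, handler="select", base="http://localhost:8983/solr", **params):
    """Build a request URL of the form <base>/<core_name>/<handler>?<params>."""
    url = f"{base}/{core_name}/{handler}"
    # urlencode percent-encodes the values, so "*:*" is safe to pass as-is
    return f"{url}?{urlencode(params)}" if params else url

# Hypothetical core name; match-all query returned as JSON
print(core_url("sitecore_master_index", q="*:*", wt="json"))
```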

Prior to Solr 4.4, it was necessary to define each core in the main solr.xml configuration. Solr 4.4 introduced the concept of auto-discovery, where any folder that is a child or descendant of the Solr home directory and contains a core.properties file is automatically identified and loaded as a core (see "Solr Cores and solr.xml" in the Solr 4.6 Reference Guide).  What if you want to locate your cores elsewhere, such as in the Data folder of your Sitecore installation? You can do this too, either from the Solr dashboard under Core Admin:
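The discovery rule can be approximated in a few lines of Python. This is only a sketch of the behaviour described above, not Solr's actual implementation: the function names are mine, the properties parsing is deliberately simplistic, and it assumes Solr does not look for further cores inside a directory that is already a core.

```python
import os
from pathlib import Path

def read_core_name(properties_file):
    """Return the value of a 'name=' line in core.properties, or None if absent."""
    for line in Path(properties_file).read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "name":
            return value.strip()
    return None

def discover_cores(solr_home):
    """Map core name -> core directory for every core.properties under solr_home."""
    cores = {}
    for root, dirs, files in os.walk(solr_home):
        if "core.properties" in files:
            core_dir = Path(root)
            # An empty file means the core takes the name of its directory
            name = read_core_name(core_dir / "core.properties") or core_dir.name
            cores[name] = core_dir
            dirs[:] = []  # assumption: do not search for cores nested inside a core
    return cores
```

Running discover_cores against your Solr home directory should list the same cores that appear in the dashboard after a restart, which makes it a handy sanity check before starting Solr.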


Or by making a direct API call through HTTP:

In the above HTTP call, the "instanceDir" parameter is not needed if the core is a direct child of the Solr home directory (the directory that contains the "solr.xml" file).
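For reference, a CoreAdmin CREATE request can be assembled like this in Python. It is a sketch under assumptions: the core name and the helper name core_create_url are hypothetical, and only the "action", "name" and "instanceDir" parameters are shown.

```python
from urllib.parse import urlencode

def core_create_url(name, instance_dir=None, base="http://localhost:8983/solr"):
    """Build a CoreAdmin CREATE URL; instanceDir is included only when given."""
    params = {"action": "CREATE", "name": name}
    if instance_dir is not None:
        # Needed when the core lives outside the Solr home directory
        params["instanceDir"] = instance_dir
    return f"{base}/admin/cores?{urlencode(params)}"

# Core directly under Solr home: no instanceDir required
print(core_create_url("sitecore_master_index"))
```

Passing instance_dir lets the same helper cover externally located cores, with urlencode taking care of escaping backslashes and colons in Windows paths.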

It is also worth noting that the command will fail with a "Core already exists" error if the directory already contains a file named "core.properties".  If you want to build a script to load cores automatically, you will need to first delete or rename the "core.properties" file and then issue the CREATE command.  This is one good reason to locate your cores in the Solr directory: when Solr is started, these will load automatically provided they have a core.properties file, whereas externally located cores need to be loaded manually.
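A loader script along those lines might look like the following Python sketch. The function names and the ".bak" suffix are my own choices, and it assumes Solr is listening on the default port; treat it as a starting point rather than a tested deployment script.

```python
from pathlib import Path
from urllib.parse import urlencode
from urllib.request import urlopen

def set_aside_core_properties(core_dir):
    """Rename core.properties to core.properties.bak; return True if a file was renamed."""
    props = Path(core_dir) / "core.properties"
    if props.exists():
        # replace() overwrites any backup left behind by an earlier run
        props.replace(props.with_name("core.properties.bak"))
        return True
    return False

def load_external_core(name, core_dir, base="http://localhost:8983/solr"):
    """Clear the way for CREATE, then ask Solr to load the core at core_dir."""
    set_aside_core_properties(core_dir)
    query = urlencode({"action": "CREATE", "name": name, "instanceDir": str(core_dir)})
    with urlopen(f"{base}/admin/cores?{query}") as response:
        return response.status
```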

So what needs to be in the core directory?

  • A "conf" directory.  This can be copied from the "collection1" core included on a fresh install, as long as the schema.xml document has been modified by the "Generate Solr Schema.xml file" command on the Sitecore desktop Indexing control panel application (see step #8 of my earlier post). 
  • A file named core.properties.  This can be empty, in which case the core will be given the name of its directory. 
  • It is not necessary to create a "data" folder to store the Lucene index; Solr will do this itself.
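The checklist above can be scripted. Here is a minimal Python sketch that creates a core directory from an existing "conf" folder; the function name and both paths are placeholders, and it assumes the schema.xml in the source "conf" has already been processed by Sitecore.

```python
import shutil
from pathlib import Path

def scaffold_core(core_dir, conf_source):
    """Create core_dir with a copied conf directory and an empty core.properties."""
    core_dir = Path(core_dir)
    core_dir.mkdir(parents=True, exist_ok=True)
    # Copy the whole conf tree (schema.xml, solrconfig.xml, and so on)
    shutil.copytree(conf_source, core_dir / "conf")
    # Empty file: the core is named after its directory
    (core_dir / "core.properties").touch()
    # No "data" directory is created here; Solr creates it on first load
    return core_dir
```

Note that copytree raises an error if the target "conf" directory already exists, which is a reasonable safeguard against overwriting a core that is already configured.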
To my mind, a logical Solr setup for a developer workstation would have Solr running on the standard 8983 port, running with Jetty from a startup script or a Windows service (I have had good experience with this one), with separate collections for each Sitecore environment as children of the Solr home directory, or optionally stored in a "Sitecore Cores" directory.  Since Solr will recursively search for folders with "core.properties" at startup, these will be automatically loaded.
