Set up a Solr schema.xml for AEM
Now that we have successfully convinced AEM to use Solr as its indexer, the next step is to create a schema that Solr uses for index and query processing.
Why do we need a schema?
Solr does not know anything about your data structure, but you want it to perform complex operations like fulltext searches, faceting, etc. To allow Solr to create a fast index, you need to define which fields you want to index and which operations should be performed at index or query time [1].
There is an excellent book by Trey Grainger [2] and Timothy Potter which gives a good overview of the capabilities of Solr [3]. Although it is written for Solr 5, most of the concepts are the same for Solr 6 and need only minimal adjustments.
By default, Solr 6 uses a managed schema (the managed-schema file) [4], which allows you to use the Schema API [5] to modify the schema. You can change this behavior in solrconfig.xml per core and enable the classic schema.xml, which we'll use in this example.
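As a minimal sketch of that switch, the schema factory is declared inside the `<config>` element of the core's solrconfig.xml. The class name below is the one shipped with Solr; where exactly it sits in your file depends on your core's configuration.

```xml
<!-- solrconfig.xml of the core: read the schema from a hand-edited schema.xml
     instead of the managed schema. With this factory the Schema API can no
     longer modify the schema. -->
<schemaFactory class="ClassicIndexSchemaFactory"/>
```

The core then expects a schema.xml in its conf directory next to solrconfig.xml.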
The Jackrabbit project provides a basic configuration for a core that you can use with Solr 4.x [6] and as a base for a custom configuration. I recommend that you have a look at the schema.xml, which is the base for the following definitions.
Schema.xml for AEM
The uniqueKey field is the identity of an indexed document. If a new document with an already existing uniqueKey is indexed, it replaces the existing entry. For structured content like JCR content, the path is a great identifier and is therefore used.
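A sketch of what this looks like in schema.xml, assuming the path is kept in the path_exact field mentioned in the note further down (check the Jackrabbit schema.xml for the authoritative definition):

```xml
<!-- Assumption: the JCR path is stored in path_exact and doubles as the document identity -->
<field name="path_exact" type="string" indexed="true" stored="true"/>

<!-- Indexing a document whose path_exact already exists replaces the old entry -->
<uniqueKey>path_exact</uniqueKey>
```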
Since you most likely do not only want to query the complete index but also restrict your queries to certain paths, some adjustments are required here. The Jackrabbit Oak Solr indexer supports multiple fields out of the box that should be added to your schema [9]. The documentation also provides some examples where those fields are used.
Note: Only the field path_exact is stored in our index and is therefore retrievable. All other fields are only used for indexing.
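To illustrate, the path restriction fields might be declared roughly as follows. The field and type names here are written from memory of the Oak Solr configuration and should be treated as assumptions; verify them against the Jackrabbit schema.xml before use.

```xml
<!-- Assumed names: one field per kind of path restriction.
     stored="false" means they can be searched but not returned in results. -->
<field name="path_child" type="children_path" indexed="true" stored="false"/>
<field name="path_anc" type="parent_path" indexed="true" stored="false"/>
<field name="path_des" type="descendent_path" indexed="true" stored="false"/>
```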
JCR/Sling and DAM attributes
The schema.xml contains some interesting JCR attributes like jcr_lastModified that can be queried as a string or as a date (e.g. everything before a given date). To allow queries for DAM assets, it also contains the MIME type attributes of DAM.
For this example I'll use three different JCR properties that should be indexed:
|headline||Simple String, no fulltext search|
|title||Simple String, no fulltext search|
|text||English text, indexed for fulltext search, suggestions etc|
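The three properties could be declared along these lines. This is only a sketch: the type name text_en is an assumption taken from Solr's example schemas and has to exist as a fieldType in your schema.xml, and stored="false" follows the note above that only path_exact is stored.

```xml
<!-- Plain strings: matched as a whole, no tokenizing or stemming -->
<field name="headline" type="string" indexed="true" stored="false"/>
<field name="title" type="string" indexed="true" stored="false"/>

<!-- Analyzed English text (assumed type name text_en) for fulltext search -->
<field name="text" type="text_en" indexed="true" stored="false"/>
```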
All field types you can find in the schema.xml are quite simple and by the book. There are primitive field types like string, but also types that support fulltext searches.
For the two *_path field types, some rules are defined that replace or group the result by slashes.
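A path field type of the grouping kind can be sketched with Solr's PathHierarchyTokenizerFactory. The type name descendent_path is an assumption, and the definition below is a generic Solr pattern rather than a copy of the Jackrabbit schema:

```xml
<fieldType name="descendent_path" class="solr.TextField">
  <!-- Index time: /content/site/en becomes the tokens
       /content, /content/site and /content/site/en -->
  <analyzer type="index">
    <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/"/>
  </analyzer>
  <!-- Query time: the given path is kept as a single token -->
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
```

With this analysis chain, a query for /content/site matches every document located at or below that path.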
For a simple AEM application where you want to perform fulltext searches on predefined fields (like text), the provided schema is a good starting point. You can extend it by adding additional fields or by using the copyField [10] mechanism to index more fields into the already defined ones.
If your application uses a property named richText which you want to index, the following definition would copy it into the text field and merge the results:
<copyField source="richText" dest="text"/>
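Two details are easy to miss here, so a slightly fuller sketch may help. The copyField source has to match a declared field or dynamicField, and a destination that receives values from more than one source has to be multiValued. The type name text_en is again an assumption.

```xml
<!-- The source must match a declared field (or a dynamicField pattern);
     it does not need to be indexed itself -->
<field name="richText" type="string" indexed="false" stored="false"/>

<!-- The destination receives its own value plus the copied one,
     so it has to accept multiple values -->
<field name="text" type="text_en" indexed="true" stored="false" multiValued="true"/>

<copyField source="richText" dest="text"/>
```

If your schema already contains a catch-all dynamicField, the explicit richText declaration may not be necessary.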
The next post will deal with a sample application you can set up to get better insight into the steps completed so far.