Index AEM Content with Elasticsearch
Reading time: 3 minutes
With the built-in Lucene index [1] in AEM, most requirements for a flexible and fast search can be implemented with minimal effort and without additional dependencies.
Query functionality is wrapped in the QueryBuilder API [2], and the index configuration [3] is done in CRX. A query can search multiple fields for values or full text and provide scoring, excerpts, etc. Depending on the languages you need to support, you can also configure Analyzers [4] for stemming, stop words, etc.
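To make the QueryBuilder part more concrete, here is a minimal sketch that runs a full-text query through the QueryBuilder JSON servlet with curl. The host, the credentials and the function name `aem_fulltext_search` are assumptions for a local author instance, not something taken from the integration itself.

```shell
#!/usr/bin/env bash
# Assumed defaults for a local author instance; adjust to your setup.
AEM_HOST="${AEM_HOST:-http://localhost:4502}"
AEM_CREDENTIALS="${AEM_CREDENTIALS:-admin:admin}"

# Full-text search for pages below a root path via the QueryBuilder JSON servlet.
# $1 = search term, $2 = root path (defaults to /content)
aem_fulltext_search() {
  local term="$1" root="${2:-/content}"
  # -G sends the URL-encoded parameters as a GET query string.
  curl -sS -G -u "$AEM_CREDENTIALS" "$AEM_HOST/bin/querybuilder.json" \
    --data-urlencode "path=$root" \
    --data-urlencode "type=cq:Page" \
    --data-urlencode "fulltext=$term" \
    --data-urlencode "p.limit=10"
}
```

Calling `aem_fulltext_search "elasticsearch" /content` should return the matching pages as JSON, served by the built-in Lucene index.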
In one of our projects, the customer came up with the requirement to include external content (documents from a DMS) in the search results. This is when I started to implement this Elasticsearch [5] integration for AEM.
This blog post shows how to set up an integration of Elasticsearch into AEM and index Pages and DAM Assets upon replication. Please refer to this post for an introduction to the Search functionality.
There is currently no release of the integration available; therefore, you should check out the GitHub repository [6] and build the project as described in the README [7].
As soon as you have the bundle running in your AEM instance, you can start to configure the integration.
Configure Elasticsearch Host
Most of the configuration is done in System Configuration. The first step is to set up an Elasticsearch Search Provider.
Create Replication Agent
The next step is to set up a Replication Agent in /etc/replication/agents.author.
To enable the Agent, open the Edit mode and check the Enabled box. You can also configure the desired log-level.
Now you are ready to test the connection. If everything works as expected, you should see the default response from your Elasticsearch installation.
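If the connection test fails, it can help to check Elasticsearch directly, outside of AEM. The following sketch requests the root endpoint, which answers with a small JSON document (node name, cluster name, version and tagline). The host in `ES_HOST` and the function name `es_ping` are assumptions; use the host you configured in the Search Provider.

```shell
#!/usr/bin/env bash
# Assumed default for a local Elasticsearch; adjust to your setup.
ES_HOST="${ES_HOST:-http://localhost:9200}"

# Request the root endpoint of Elasticsearch.
# -f makes curl exit with an error on HTTP failures (4xx/5xx).
es_ping() {
  curl -fsS "$ES_HOST/"
}
```

Running `es_ping && echo "Elasticsearch is reachable"` gives a quick indication whether the problem is in the agent configuration or in Elasticsearch itself.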
Add default mappings to Elasticsearch
Elasticsearch uses mappings [8] to define how a document is stored and processed. You can find a mapping file that is based on the default configuration in /misc/template_aem.json.
You should install this mapping file using either Kibana or curl:
curl -X POST -H "Content-Type: application/json" -d @misc/template_aem.json http://localhost:9200/_template/aem
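To confirm that the template was stored, you can read it back from the same endpoint. This is only a sketch: the host is assumed to be a local Elasticsearch, `es_show_template` is a hypothetical helper name, and the `_template` endpoint is the legacy template API, so newer Elasticsearch versions may prefer a different endpoint.

```shell
#!/usr/bin/env bash
# Assumed default for a local Elasticsearch; adjust to your setup.
ES_HOST="${ES_HOST:-http://localhost:9200}"

# Fetch an index template by name (defaults to "aem", the name used above).
# ?pretty asks Elasticsearch to format the JSON for reading.
es_show_template() {
  local name="${1:-aem}"
  curl -fsS "$ES_HOST/_template/$name?pretty"
}
```

`es_show_template` should print the mappings from the template file. Because of `-f`, curl exits with an error if the template does not exist.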
Index custom fields (optional)
By default, only basic fields are indexed (see Default Fields). You can add additional fields using the ElasticSearch Index Configuration in System Configuration. Each entry contains a primary type (e.g. cq:Page) and multiple index rules.
Please note that you should also add fields to the mapping if they require special treatment.
As soon as a Page or DAM Asset is activated, you will see an entry in the ElasticSearch Replication Agent queue and, shortly after, an indexed document in your Elasticsearch.
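One way to check that the document arrived is to search for it by its path. The sketch below queries all indices, because the index name depends on your setup. The host and the function name `es_find_by_path` are assumptions, and whether an exact match on the path field works this way depends on how the mapping treats that field.

```shell
#!/usr/bin/env bash
# Assumed default for a local Elasticsearch; adjust to your setup.
ES_HOST="${ES_HOST:-http://localhost:9200}"

# Search all indices for documents whose "path" field matches the given path.
# $1 = repository path of the activated Page or Asset (must not contain double quotes)
es_find_by_path() {
  local path="$1"
  # The path is quoted inside the query because "/" has a special meaning
  # in the query string syntax; --data-urlencode takes care of URL encoding.
  curl -fsS -G "$ES_HOST/_search" \
    --data-urlencode "q=path:\"$path\"" \
    --data-urlencode "pretty=true"
}
```

For example, `es_find_by_path "/content/my-site/en"` (a made-up path) should list the indexed document once the replication queue is empty.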
Default Fields
In addition to the path, the following fields are indexed by default and are included in the default mapping. There is no option to disable the indexing of those fields.
| cq:lastModified | jcr:lastModified | Last Modified Date |