Index AEM Content with Elasticsearch

With the built-in Lucene index[1] in AEM, most requirements for a flexible and fast search can be implemented with minimal effort and without additional dependencies.

Queries are built with the QueryBuilder API[2], and the index configuration[3] is done in CRX. A query can search multiple fields for specific values or run a full-text search, and it can provide scoring, excerpts, etc. Depending on the supported languages, you can also configure analyzers[4] for stemming, stop words, etc.
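To get a feeling for the QueryBuilder API without writing Java, you can call its JSON servlet at /bin/querybuilder.json. The following is a minimal sketch, assuming a local author instance at http://localhost:4502 with admin:admin credentials; the helper name aem_fulltext_search is hypothetical. It runs a full-text search (fulltext predicate) over cq:Page nodes below /content.

```shell
#!/bin/sh
# Sketch: query AEM's QueryBuilder JSON servlet from the command line.
# Assumptions: local author instance and default credentials; override via env vars.
AEM_HOST="${AEM_HOST:-http://localhost:4502}"
AEM_AUTH="${AEM_AUTH:-admin:admin}"

# Hypothetical helper: full-text search for pages below /content.
# $1 = search term, $2 = maximum number of hits (default 10)
aem_fulltext_search() {
  # -G turns the --data-urlencode pairs into URL-encoded query parameters
  curl -s -u "$AEM_AUTH" -G "$AEM_HOST/bin/querybuilder.json" \
    --data-urlencode "path=/content" \
    --data-urlencode "type=cq:Page" \
    --data-urlencode "fulltext=$1" \
    --data-urlencode "p.limit=${2:-10}"
}
```

For example, aem_fulltext_search "elasticsearch" 5 prints the first five matching pages as JSON.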

In one of our projects, the customer required external content (documents from a DMS) to be included in the search results. This is when I started to implement this Elasticsearch[5] integration for AEM.

This blog post shows how to set up an integration of Elasticsearch into AEM and index pages and DAM assets upon replication. Please refer to this post for an introduction to the search functionality.

Setup

There is currently no release of the integration available; therefore, you should check out the GitHub repository[6] and build the project as described in the README[7].

As soon as you have the bundle running in your AEM instance, you can start to configure the integration.

Configure Elasticsearch Host

Most of the configuration is done in System Configuration. The first step is to set up an Elasticsearch Search Provider.

[Screenshot: Elasticsearch Search Provider]

Create Replication Agent

The next step is to set up a Replication Agent in /etc/replication/agents.author.

[Screenshot: Add Replication Agent]

To enable the agent, open it in edit mode and select the Enabled check box. You can also configure the desired log level.
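If you prefer scripting over the edit dialog, the same settings can be written with the Sling POST servlet. This is a sketch under assumptions: a local author instance at http://localhost:4502 with admin:admin credentials, an agent node name you supply (elasticsearch in the example is hypothetical), and the standard replication agent properties enabled and logLevel on the agent's jcr:content node. Depending on your referrer and CSRF filter configuration, AEM may reject such requests.

```shell
#!/bin/sh
# Sketch: enable a replication agent and set its log level via the Sling POST servlet.
# Assumptions: local author instance, default credentials, agent below agents.author.
AEM_HOST="${AEM_HOST:-http://localhost:4502}"
AEM_AUTH="${AEM_AUTH:-admin:admin}"
AGENTS_PATH="/etc/replication/agents.author"

# Hypothetical helper.
# $1 = agent node name, $2 = log level: error, info or debug (default info)
aem_enable_agent() {
  # Each -F field is stored as a property on the agent's jcr:content node
  curl -s -u "$AEM_AUTH" \
    -F "enabled=true" \
    -F "logLevel=${2:-info}" \
    "$AEM_HOST$AGENTS_PATH/$1/jcr:content"
}
```

For example, aem_enable_agent elasticsearch debug enables the (hypothetical) agent named elasticsearch with debug logging.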

[Screenshot: Configure Replication Agent]

Now you are ready to test the connection. If everything works as expected, you should see the default response from your Elasticsearch installation.
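If the connection test fails, it helps to check from the AEM host whether Elasticsearch is reachable at all. A minimal sketch, assuming Elasticsearch at http://localhost:9200 without authentication; es_ping is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: check that Elasticsearch answers on its root endpoint.
# Assumption: no authentication; override the host via ES_HOST.
ES_HOST="${ES_HOST:-http://localhost:9200}"

# Prints the root response; -f makes curl exit non-zero on HTTP errors.
es_ping() {
  curl -sf "$ES_HOST/?pretty"
}
```

A healthy node answers with its name, cluster name, version and the tagline "You Know, for Search".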

[Screenshot: ElasticSearch Search Provider]

Add default mappings to Elasticsearch

Elasticsearch uses mappings[8] to define how a document is stored and processed. You can find a mapping file based on the default configuration at /misc/template_aem.json in the repository.

You should install this mapping file as an index template using either Kibana or curl:

curl -X POST -H "Content-Type: application/json" -d @misc/template_aem.json http://localhost:9200/_template/aem
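To verify that the template was stored, you can read it back. A minimal sketch, assuming Elasticsearch at http://localhost:9200 without authentication and that you run it from the repository root; the helper names es_install_template and es_show_template are hypothetical. The install function sends the file with an explicit JSON content type, which Elasticsearch 6.0 and later require.

```shell
#!/bin/sh
# Sketch: install the AEM index template and read it back.
# Assumptions: run from the repository root; no authentication.
ES_HOST="${ES_HOST:-http://localhost:9200}"

# $1 = template file (default misc/template_aem.json), $2 = template name (default aem)
es_install_template() {
  # --data-binary sends the file unchanged; the header is required by ES 6.0+
  curl -sf -X PUT -H "Content-Type: application/json" \
    --data-binary "@${1:-misc/template_aem.json}" \
    "$ES_HOST/_template/${2:-aem}"
}

# $1 = template name (default aem); prints the stored template and its mappings
es_show_template() {
  curl -sf "$ES_HOST/_template/${1:-aem}?pretty"
}
```

Run es_install_template once; afterwards es_show_template should print the template including its mappings.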

Index custom fields (optional)

By default, only basic fields are indexed (see Default Fields). You can index additional fields using the ElasticSearch Index Configuration in System Configuration. Each entry consists of a primary type (e.g. cq:Page) and can contain multiple index rules.

Please note that any field that requires special treatment (e.g. a specific analyzer) should also be added to the mapping.
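After adding a field, you can check how Elasticsearch actually mapped it with the get field mapping API. This sketch assumes Elasticsearch at http://localhost:9200; es_field_mapping is a hypothetical helper, and the index and field names are whatever your setup uses.

```shell
#!/bin/sh
# Sketch: show the mapping Elasticsearch applied to a single field.
# Assumptions: no authentication; index and field names are supplied by you.
ES_HOST="${ES_HOST:-http://localhost:9200}"

# $1 = index name, $2 = field name
es_field_mapping() {
  curl -sf "$ES_HOST/$1/_mapping/field/$2?pretty"
}
```

If the returned type is not what you expected (for example text instead of keyword), add the field to the template. Keep in mind that an index template is only applied when an index is created, so an existing index has to be recreated and its content indexed again, presumably by re-activating it, before template changes take effect.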

[Screenshot: ElasticSearch Index Configuration]

Usage

As soon as a page or DAM asset is activated, you will see an entry in the ElasticSearch Replication Agent queue and, shortly after, the corresponding document in your Elasticsearch index.
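To confirm that a document actually arrived, you can search the index directly. A minimal sketch, assuming Elasticsearch at http://localhost:9200; the index name is a placeholder you have to supply (it depends on your configuration and template), and es_find is a hypothetical helper that uses a simple q= query string search.

```shell
#!/bin/sh
# Sketch: look up recently activated content in Elasticsearch.
# Assumptions: no authentication; the index name is supplied by you.
ES_HOST="${ES_HOST:-http://localhost:9200}"

# $1 = index name, $2 = search term (e.g. a word from the page title)
es_find() {
  # -G appends the URL-encoded parameters to the _search URL
  curl -sf -G "$ES_HOST/$1/_search" \
    --data-urlencode "q=$2" \
    --data-urlencode "size=5" \
    --data-urlencode "pretty=true"
}
```

For example, es_find your-index "word from the title" should return a hit shortly after activation.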

Default Fields

In addition to the path, the following fields are indexed by default and are included in the default mapping. There is no option to disable the indexing of those fields.

cq:Page           dam:Asset          Usage
jcr:title         dc:title           Page/Asset Title
jcr:description   dc:description     Optional Description
cq:lastModified   jcr:lastModified   Last Modified Date
cq:template       -                  Template

Footnotes
