Index AEM Content with Elasticsearch

With the built-in Lucene index [2] in AEM, most requirements for a flexible and fast search can be implemented with minimal effort and without additional dependencies.

Queries are issued through the QueryBuilder API [3], and the index configuration [4] is maintained in CRX. A query can search multiple fields for specific values or full text and can return scoring, excerpts, and so on. Depending on the languages you need to support, you can also configure analyzers [5] for stemming, stop words, and similar processing.
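To make the QueryBuilder side concrete, here is a minimal sketch of a full-text query sent to the QueryBuilder JSON servlet. The host, port, credentials, and the /content search root are assumptions for a local author instance, and querybuilder_url is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: host, port and search root are assumptions for a local author instance.
AEM_URL="http://localhost:4502"

# Build a QueryBuilder URL that runs a full-text search over pages below /content.
# The search term must already be URL-encoded (a single plain word works as-is).
querybuilder_url() {
  printf '%s/bin/querybuilder.json?path=/content&type=cq:Page&fulltext=%s&p.limit=10\n' \
    "$AEM_URL" "$1"
}

# Requires a running AEM instance and valid credentials:
# curl -s -u admin:admin "$(querybuilder_url elasticsearch)"
```

The predicates path, type, fulltext, and p.limit are passed as plain request parameters, so the same query can be tried in a browser before it is moved into code.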

In one of our projects, the customer asked us to include external content (documents from a DMS) in the search results. This is when I started to implement this Elasticsearch [1] integration for AEM.

This blog post shows how to set up an integration of Elasticsearch into AEM and index pages and DAM assets upon replication. Please refer to this post for an introduction to the search functionality.

Setup

There is currently no release of the integration available, so you should check out the GitHub repository [6] and build the project as described in the README [7].

As soon as you have the bundle running in your AEM instance, you can start to configure the integration.

Configure Elasticsearch Host

Most of the configuration is done in System Configuration. The first step is to set up an Elasticsearch Search Provider.

Create Replication Agent

The next step is to set up a Replication Agent in /etc/replication/agents.author.

To enable the agent, open the Edit mode and check the Enabled box. You can also configure the desired log level.

Now you are ready to test the connection. If everything works as expected, you should see the default response from your Elasticsearch installation.
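If the connection test fails, it can help to check the Elasticsearch node directly, outside of AEM. The following is a minimal sketch that assumes a local, unsecured node on port 9200; looks_like_elasticsearch is a hypothetical helper that only checks for the "tagline" field found in the root endpoint's default response.

```shell
#!/bin/sh
# Sketch only: the URL assumes a local, unsecured Elasticsearch node.
ES_URL="http://localhost:9200"

# Succeeds if the text on stdin contains the "tagline" field that the
# Elasticsearch root endpoint includes in its default response.
looks_like_elasticsearch() {
  grep -q '"tagline"'
}

# Requires a running Elasticsearch node:
# curl -s "$ES_URL/" | looks_like_elasticsearch && echo "Elasticsearch is reachable"
```

If this check succeeds but the test in AEM does not, the host configured in the search provider is the first thing to compare.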

Add default mappings to Elasticsearch

Elasticsearch uses mappings [8] to define how a document is stored and processed. You can find a mapping file that is based on the default configuration in /misc/template_aem.json.

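You should install this mapping file using either Kibana or curl. Elasticsearch 6.0 and later reject request bodies without a Content-Type header, so it is set explicitly here:

curl -X POST -H 'Content-Type: application/json' -d @misc/template_aem.json http://localhost:9200/_template/aem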
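To confirm that the template was stored, it can be read back from the same endpoint. This is a small sketch that reuses the host and the template name aem from the command above; template_url is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: same local node as in the install command.
ES_URL="http://localhost:9200"

# URL of a named index template in the _template API.
template_url() {
  printf '%s/_template/%s\n' "$ES_URL" "$1"
}

# Requires a running node; prints the stored template if it exists:
# curl -s "$(template_url aem)"
```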

Index custom fields (optional)

By default, only basic fields are indexed (see Default Fields). You can add additional fields using the ElasticSearch Index Configuration in System Configuration. Each entry contains a primary type (e.g. cq:Page) and multiple index rules.

Please note that you should add all fields to the mapping if they require special treatment.
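As an illustration of "special treatment", a custom date property would need an explicit date type in the mapping. The sketch below only prints such an entry. The field name eventDate and the helper custom_field_mapping are hypothetical, and the exact place to paste the entry (the properties object of the mappings) depends on the structure of template_aem.json, which is not shown here.

```shell
#!/bin/sh
# Print a mapping entry for one additional field.
# $1 = field name (hypothetical), $2 = Elasticsearch field type such as date, keyword or text.
custom_field_mapping() {
  cat <<EOF
"$1": { "type": "$2" }
EOF
}

# Example: an entry to add to the "properties" object of the template's mappings.
custom_field_mapping eventDate date
```

After editing the mapping file, the template has to be installed again so that newly created indices pick up the change.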

Usage

As soon as a page or DAM asset is activated, you will see an entry in the ElasticSearch Replication Agent queue and, shortly after, an index document in your Elasticsearch instance.
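One way to check that a document actually arrived is a simple URI search against Elasticsearch. This sketch searches across all indices because the index name used by the integration is not stated here. The host and the search term are assumptions, and search_url is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: assumes a local, unsecured Elasticsearch node.
ES_URL="http://localhost:9200"

# Build a URI search over all indices that returns at most one hit.
# The search term must already be URL-encoded (a single plain word works as-is).
search_url() {
  printf '%s/_search?q=%s&size=1&pretty\n' "$ES_URL" "$1"
}

# Requires a running node; use a word from the title of the activated page:
# curl -s "$(search_url welcome)"
```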

Default Fields

In addition to the path, the following fields are indexed by default and are included in the default mapping. There is no option to disable the indexing of those fields.

cq:Page          dam:Asset         Usage
jcr:title        dc:title          Page/Asset Title
jcr:description  dc:description    Optional Description
cq:lastModified  jcr:lastModified  Last Modified Date
cq:template      -                 Template

Footnotes


  1. Elasticsearch

  2. Lucene

  3. AEM QueryBuilder API

  4. Lucene Index Configuration

  5. Lucene Analyzers

  6. Elasticsearch-AEM GitHub

  7. Elasticsearch-AEM README

  8. Elasticsearch Mappings
