In my day job I work with multiple clients whose IT departments follow different strategies for deploying a new application release to their AEM instances.
Such a deployment usually involves the client, their IT department (or hosting partner) and the developers of the application. All parties have to work together to make sure that deployments run flawlessly and with only minor downtime (if any).
This blog post describes a near best practice for deployments, based on my experience. It might work for you or it might not, but I think it provides a good baseline for most scenarios. Please feel free to contact me if you have any further suggestions or experiences!
You can apply this to your staging environment as well as to live, but I strongly suggest using the same process across all stages.
I assume that you are running a setup similar to the one below.
We have made it a habit to create checklists for every regular task. During the task we check off every single item and make sure that we don’t miss anything. This might be annoying, but whoever has to perform the task after you will thank you.
So I’d suggest creating a Confluence page or something similar where you list all required steps in as much detail as possible (or with links to further documentation).
I don’t think there are really any AEM instances left running on bare metal; everything is virtualized. Therefore it should be a no-brainer to take a snapshot backup shortly before the deployment, which gives you the chance to roll back if anything goes wrong.
Another important thing is the communication between the teams. Usually you are not in the same office as the client or hosting partner, so make sure everybody is available on a predefined channel. We favor real-time communication (like Slack) over email as it’s faster and clearer when multiple discussions happen in parallel.
Make sure that the editors don’t modify any content during the deployment or, even better, after the last backup. Otherwise you might have people complaining about timeouts, lost content or similar problems at a time when you want to concentrate on your main task.
During the deployment you don’t want to search for URLs or changes that need to be tested. Prepare a list of URLs and functionality in advance and add it to the release notes or some other place that is available to everybody involved.
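If you want to go one step further, such a list can also be kept in a form that a script can run. The following is only a minimal sketch in Python under my own assumptions: `SmokeCheck`, `fetch_body` and `run_smoke_checks` are made-up names, and checking for a piece of expected text is just one simple way to decide whether a page looks right.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List
from urllib.request import urlopen


@dataclass
class SmokeCheck:
    url: str            # page to request (hypothetical, taken from your own list)
    expected_text: str  # text that must appear in the response body


def fetch_body(url: str) -> str:
    """Fetch a URL and return the decoded body (raises on HTTP errors)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def run_smoke_checks(checks: Iterable[SmokeCheck],
                     fetch: Callable[[str], str] = fetch_body) -> List[str]:
    """Return a list of readable failures; an empty list means all checks passed."""
    failures = []
    for check in checks:
        try:
            body = fetch(check.url)
        except Exception as exc:  # network error, HTTP error, timeout, ...
            failures.append(f"{check.url}: request failed ({exc})")
            continue
        if check.expected_text not in body:
            failures.append(f"{check.url}: missing '{check.expected_text}'")
    return failures
```

The `fetch` parameter is there so that the same list can be run against different systems (or against a fake during a dry run). A script like this does not replace a human looking at the pages, it only catches the obvious breakage faster.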
You should also always look at the log files of the systems to spot backend or other non-visual errors.
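Reading log files by hand gets tedious quickly, so a small filter can help. This sketch assumes that each log line carries its level between asterisks (for example `*ERROR*`), as I am used to seeing in AEM's error.log. If your log format differs, the pattern needs to be adjusted; `find_problem_lines` is a made-up helper name.

```python
import re
from typing import Iterable, List, Sequence

# Assumption: the level appears as "*LEVEL*", e.g.
# "01.01.2024 12:00:01.000 *ERROR* [main] com.example.Foo something broke"
LEVEL_PATTERN = re.compile(r"\*(ERROR|WARN)\*")


def find_problem_lines(lines: Iterable[str],
                       levels: Sequence[str] = ("ERROR",)) -> List[str]:
    """Return all lines whose log level is one of the given levels."""
    problems = []
    for line in lines:
        match = LEVEL_PATTERN.search(line)
        if match and match.group(1) in levels:
            problems.append(line.rstrip("\n"))
    return problems
```

Since a file object yields its lines when iterated, you can pass an opened log file directly, for example `find_problem_lines(open("error.log"))` (the path is a placeholder). Comparing the result from before and after the deployment makes it easier to see which errors are actually new.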
The deployment process should make use of the advantages of your system architecture. Therefore we first deploy to the author and perform the smoke test there. Once you are confident that the system is running as expected, remove one of the publish systems from the load balancer and upgrade it. You can then repeat the smoke tests there and move on to the next publish system. As soon as the majority of your publish systems have been removed from the load balancer and upgraded, swap the systems: put the finished systems into the load balancer and take the old ones out.
During the whole process the website should not be affected, as it is still served from the CDN and the load balancer, which always contains at least one tested system.
If you find any issue during the deployment, you can always roll back to the previous state.
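The order of the steps described above can be written down as a short Python sketch. Everything here is an assumption for illustration: `rolling_deploy` is a made-up function, and `deploy`, `smoke_test`, `lb_remove` and `lb_add` are placeholders for whatever your hosting setup actually offers (scripts, API calls or a person clicking buttons).

```python
from typing import Callable, Sequence


def rolling_deploy(author: str,
                   publishers: Sequence[str],
                   deploy: Callable[[str], None],
                   smoke_test: Callable[[str], bool],
                   lb_remove: Callable[[str], None],
                   lb_add: Callable[[str], None]) -> None:
    """Upgrade the author first, then the publish systems in two batches."""
    if len(publishers) < 2:
        raise ValueError("need at least two publish systems to avoid downtime")

    def upgrade(host: str) -> None:
        deploy(host)
        if not smoke_test(host):
            # Stop immediately; the caller decides how to roll back.
            raise RuntimeError(f"smoke test failed on {host}")

    upgrade(author)

    # First batch: a majority of the publish systems, capped so that at
    # least one old system keeps serving traffic in the meantime.
    batch_size = min(len(publishers) // 2 + 1, len(publishers) - 1)
    first_batch = list(publishers[:batch_size])
    second_batch = list(publishers[batch_size:])

    for host in first_batch:
        lb_remove(host)
        upgrade(host)

    # Swap: the upgraded systems go in, the old ones come out.
    for host in first_batch:
        lb_add(host)
    for host in second_batch:
        lb_remove(host)

    for host in second_batch:
        upgrade(host)
        lb_add(host)
```

With three publish systems this upgrades two of them while the third keeps serving, swaps, and then upgrades the last one. A failed smoke test raises an error on purpose, because at that point a human should decide whether to fix forward or roll back.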
After each deployed system you should delete the dispatcher cache. Otherwise you might end up with a mix of old clientlibs/assets and new markup or something similar. It’s always very annoying to find out that the whole trouble was caused by a missing purge.
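How you delete the cache depends on your setup. As a minimal sketch, assuming the dispatcher cache is a plain directory on the web server that your script is allowed to write to, emptying it could look like this in Python. The function name `clear_cache_dir` is made up, and the actual cache path is something you have to look up in your own dispatcher configuration.

```python
import shutil
from pathlib import Path


def clear_cache_dir(docroot: Path) -> int:
    """Delete everything below docroot but keep the directory itself.

    Returns the number of top-level entries that were removed.
    """
    removed = 0
    for entry in docroot.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)   # real subdirectory: remove recursively
        else:
            entry.unlink()         # file or symlink: remove just the entry
        removed += 1
    return removed
```

Keeping the directory itself avoids trouble with ownership and permissions that a freshly created directory might not have. Double-check the path before running anything like this, since it deletes without asking.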
If you are using a CDN with caching, you also need to make sure that you bypass the cache during your smoke tests and run them only against the targeted system. This can be done through direct access via the system’s IP or through special configurations in the CDN.
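For the direct-access variant, one common trick is to send the request to the system's IP while still naming the real site in the `Host` header, so the web server picks the right virtual host. Here is a minimal Python sketch of that idea. `direct_request` is a made-up helper, and the IP and hostname in the usage note are placeholders. It assumes plain HTTP; with HTTPS you would additionally have to deal with certificates and SNI, which this sketch does not cover.

```python
from urllib.request import Request


def direct_request(system_ip: str, site_host: str, path: str) -> Request:
    """Build a request that goes to one system by IP, bypassing CDN and DNS."""
    return Request(
        f"http://{system_ip}{path}",
        # The Host header tells the web server which site is meant.
        headers={"Host": site_host},
    )
```

You would then pass the result to `urllib.request.urlopen`, for example `urlopen(direct_request("10.0.0.11", "www.example.com", "/en.html"))`. Whether the system is reachable by IP at all depends on your network and firewall setup.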
You should not purge the CDN cache until the systems in the load balancer are running the new software version.
First of all, perform the deployment to the author instance and run the smoke tests there. This is usually the easiest step as it only involves the dispatcher cache and no CDN cache.
Next, remove the first publish instance from your load balancer and install the update there. Then perform the smoke tests and verify that all features work as expected. As soon as you (and the client) are happy, repeat those steps for all other publish systems.
After the deployment to all instances, document whether there were any issues and how you resolved them. Also don’t forget to notify everybody involved that the deployment is finished, that the content freeze is lifted, and that you are thankful to have such a great team ;)