Convert Jekyll to Hugo Permalinks

tl;dr: While Jekyll is a great static site generator, I had many bad moments with Ruby and am now migrating to Hugo. Only minor changes to the posts were required, but it took some time to rewrite all old incoming links to the new pages.

My legacy blog was created with Jekyll and will now be migrated to a brand new Hugo blog. I don't want to lose any content and therefore need to convert my posts to the newer format (luckily, both site generators use Markdown as their base).

Jekyll extracts the date from a post's filename and uses the remaining part as the permalink. Although this is configurable, by default 2019-03-11-foo-bar.md will eventually become /2019/03/11/foo-bar.html.
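
For reference, this default corresponds to Jekyll's built-in date permalink style; written out explicitly in _config.yml it looks roughly like this (the :categories placeholder is what puts my posts under /blog/):

permalink: /:categories/:year/:month/:day/:title:output_ext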

Hugo does not use any file extensions in its URLs but instead uses folders containing index.html files. Permalinks are also configurable in config.yaml (or .toml), and the closest match is /:year/:month/:day/:slug/:

permalinks:
  posts: /:year/:month/:day/:slug/
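
One thing to keep in mind: in Hugo, the :year/:month/:day placeholders are filled from the page's date, not from the filename. If your Jekyll posts carry the date only in the filename, recent Hugo versions can be told to derive it from there; a sketch of that configuration, assuming Hugo's frontmatter date settings are available in your version:

frontmatter:
  date:
    - ":filename"
    - ":default"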

With these facts in mind, I need the following mapping (/blog/ is my default Jekyll category, which will be dropped):

https://www.dev-eth0.de/blog/2019/03/11/docker-anonymize-logs.html
=>
https://www.dev-eth0.de/2019/03/11/docker-anonymize-logs/

Configure redirects in the webserver

All incoming requests are processed by an Nginx web server, which can do the rewrite. I just need to add the following rule to the site configuration:

location ~* /blog/(.*)\.html$ {
  # permanently redirect /blog/<path>.html to /<path>/
  return 301 /$1/;
}
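
Once the rule is deployed, a quick sanity check with curl (using the example post from above) should confirm the redirect:

curl -sI https://www.dev-eth0.de/blog/2019/03/11/docker-anonymize-logs.html

The response should report a 301 status and a Location header pointing at /2019/03/11/docker-anonymize-logs/.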

Add slug to frontmatter

If no slug is configured in a post's frontmatter, the title is used by default. Therefore none of the posts will have the correct permalink until the filename is set as the slug.

You can go through all your posts manually or use the following bash script to add the filename (without the date prefix) as the slug to each post's frontmatter:

for f in *.md;
do
  # strip the date prefix (the first three dash-separated fields) from the filename
  base=$(basename "$f" '.md' | cut -f 4- -d '-')
  # insert a slug line directly above the existing title line
  sed -i "s/title:/slug: $base\ntitle:/" "$f"
done

(!) Important: if you are using macOS/BSD, the stock sed does not support newlines (\n) in the replacement. You need to install GNU sed (brew install gnu-sed) and use gsed instead of sed. If you are unsure, run the following and check whether the output contains an actual line break (GNU sed) or not (BSD sed):

echo "1234" | sed 's/3/\nnewline/g'

Verify result

Adam Jarret documented a simple way to crawl a sitemap.xml with curl. I made some minor changes to his command which allow me to verify that all URLs of my current blog (www.dev-eth0.de) are also valid on my test system (dev.dev-eth0.de).

The command first loads the sitemap, then extracts all loc elements for posts in /blog. For each result the domain is changed to the test system and the URL is queried. Only links which do not return a 200 are printed (you should get an empty list).

curl https://www.dev-eth0.de/sitemap.xml | grep -e loc | grep blog | sed 's|<loc>\(.*\)<\/loc>$|\1|g' | sed 's/www/dev/g' | xargs -I {} curl -L -s -o /dev/null -w "%{http_code} %{url_effective}\n" {} | grep -v 200
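
If anything is broken, the output contains one line per failing URL with its status code; a hypothetical failure would look like this:

404 https://dev.dev-eth0.de/2019/03/11/docker-anonymize-logs/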
