
Changelog: Improving Discovery and Readability

Posted on Thu 19 March 2015 in Dispatches • 4 min read

Iteration is the thing. Or at least that’s what I tell myself. After all, tinkering is easy to do, especially when you are working with a platform as flexible as Jekyll. So as you would expect, I’ve made several changes to the site since the initial relaunch.

Ditching LSI

Jekyll has built-in support for latent semantic indexing, which should result in an accurate list of related posts for every entry on the site. The results are based upon indexing the language of each post and comparing it to the others. It’s an appealing feature, and should provide results similar to the Xapian/Haystack-powered reading suggestions that were part of my old Django-based version of this site. The performance gains for LSI were actually one of the reasons I initially put in the effort to upgrade to the latest Jekyll version from my aging Octopress installation.

Here’s the thing though: LSI sucks.

It’s true that LSI no longer takes hours to run, but it still adds anywhere between 11 and 15 minutes to site generation. That would be acceptable to me in exchange for accurate results that just work, so I happily began deploying with it in place. What I found was that performance was not the only issue with LSI; the results were bad too.

After running a site build with LSI enabled, I found that every post was still linked to virtually the same articles, which also happened to be some of the most recent. This meant that no matter where in the site you landed, it would recommend the same five or six articles over and over again. The point of this feature is to enable reader discovery of other content, and this was a ridiculously poor result considering the performance tax for using it.

The solution to this problem was to cut out LSI altogether in favor of Lawrence Woodman’s related posts plugin, which uses tag indexing to relate posts together by topic.[1] Of course, this means you need to ensure you are assigning tags to each post, which is a best practice anyway. I’m pleased to note that not only are the results more consistently accurate, but this method is also radically faster. My entire site now generates in less than 27 seconds.
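To give a feel for why tag indexing is so much cheaper than LSI, here is a minimal sketch of the general idea in plain Ruby. It is not Woodman’s plugin and does not use Jekyll’s API: the `Post` struct, the `related_posts` helper, and the sample titles and tags are all made up for illustration. It simply ranks the other posts by how many tags they share with the current one.

```ruby
# Hypothetical post record; a real Jekyll post object carries far more than this.
Post = Struct.new(:title, :tags)

# Rank the other posts by how many tags they share with `post`.
# Posts with no tags in common are left out entirely.
def related_posts(post, all_posts, limit = 5)
  scored = all_posts.reject { |other| other.equal?(post) }
                    .map { |other| [other, (other.tags & post.tags).size] }
                    .select { |_, shared| shared > 0 }

  # Highest overlap first; ties fall back to title so the order is stable.
  scored.sort_by { |other, shared| [-shared, other.title] }
        .first(limit)
        .map(&:first)
end

# Made-up sample data, purely for illustration.
posts = [
  Post.new("Ditching Octopress", %w[jekyll ruby blogging]),
  Post.new("Nginx Redirects", %w[nginx ops]),
  Post.new("Tagging Everything", %w[jekyll blogging]),
  Post.new("Ruby Tips", %w[ruby])
]

puts related_posts(posts[0], posts).map(&:title)
```

Comparing small tag arrays is trivial work next to indexing the full text of every post, which is presumably where most of the build time savings come from.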

I recently read Matt Gemmell’s article on permalinks and found his argument compelling. Essentially, his point is that date-based permalinks are a vestigial feature of early journal-style blogging and add little value for the reader. He also asserts, perhaps a little presumptuously, that these URL designs may in fact detract from the authority of the piece itself.[2]

I encourage you to read his piece on the matter to see if you agree. I must admit that I quickly found myself nodding at his reasoning.

There is some concern about duplicate titles causing issues in this scenario, but searching through the 1000+ posts on this site, I found only two duplicate titles, and those two were link-blog entries, which were easy to correct. After adding a simple rewrite rule to my nginx config to redirect legacy inbound links, I deployed a new version of the site using the new permalink model.
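The two checks described above, finding colliding titles and mapping old links to new ones, can be sketched in a few lines of Ruby. This is an illustration rather than the rule I deployed: it assumes the legacy URLs looked like /YYYY/MM/DD/slug/, and the `dateless_path` and `duplicate_slugs` helpers are hypothetical names.

```ruby
# Assumed legacy shape: /YYYY/MM/DD/slug/ (adjust the pattern to the real old format).
LEGACY_PERMALINK = %r{\A/\d{4}/\d{2}/\d{2}/(?<slug>[^/]+)/?\z}

# Return the date-free path for a legacy URL, or nil if the path is not a legacy one.
def dateless_path(path)
  match = LEGACY_PERMALINK.match(path)
  match && "/#{match[:slug]}/"
end

# New paths that more than one legacy URL would claim once the date is dropped.
def duplicate_slugs(paths)
  paths.map { |path| dateless_path(path) }
       .compact
       .group_by(&:itself)
       .select { |_, group| group.size > 1 }
       .keys
end

# Made-up paths, purely for illustration.
legacy = ["/2014/01/02/links/", "/2015/03/19/links/", "/2015/03/19/some-post/"]
puts dateless_path(legacy.last)
puts duplicate_slugs(legacy).inspect
```

Running something like `duplicate_slugs` over the full list of old URLs before switching makes any collisions visible up front, so they can be renamed before the redirect goes live.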

I’m happy with the change and agree with Gemmell that the URLs look cleaner without all the date-based cruft. I think they are friendlier to a reader as well.

The new site theme includes support for featured images in posts, which I quite enjoy. However, those images were not being displayed in the Atom feed, which meant that those reading in a feed reader would never see them. This was a relatively easy thing to add, and now the featured image, when present, is automatically supplied using the media:thumbnail element.

I’ve tested the results with several popular feed readers and they look good.
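For anyone curious what the feed change amounts to, here is a rough sketch in Ruby. The real change lives in the feed template, so treat this as an illustration only: the `image` front matter key and the `media_thumbnail` helper are assumptions, not names taken from the theme. The one fixed detail is that media:thumbnail comes from the Media RSS namespace, so the feed’s root element needs to declare xmlns:media="http://search.yahoo.com/mrss/".

```ruby
# Characters that must be escaped inside an XML attribute value.
XML_ESCAPES = {
  "&" => "&amp;", "<" => "&lt;", ">" => "&gt;",
  '"' => "&quot;", "'" => "&apos;"
}.freeze

def escape_xml_attr(value)
  value.gsub(/[&<>"']/, XML_ESCAPES)
end

# Build the thumbnail element for a post, or an empty string when the
# post has no featured image. The "image" key is an assumed name.
def media_thumbnail(front_matter)
  image = front_matter["image"]
  return "" if image.nil? || image.empty?

  %(<media:thumbnail url="#{escape_xml_attr(image)}" />)
end

# Made-up front matter, purely for illustration.
puts media_thumbnail("image" => "https://example.com/photo.jpg?w=600&h=400")
puts media_thumbnail("title" => "A post without a featured image").inspect
```

Returning an empty string when no image is set keeps the entry valid for posts that never had a featured image in the first place.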

There Is Always More

Of course, the project is never really done. I’m sure I’ll continue to make changes to the site in the coming months and years, but that’s half the joy. As with any creative outlet, the day we stop playing is the day we start dying. With that in mind, I look forward to many more years of tinkering here.


  1. Please note that if you want to use it with Jekyll 2.1+, you will need to use this fork of the library until its pull request is merged.

  2. This is clearly an area of importance to Gemmell, as demonstrated in his recent post suggesting that the terminology inherited from blogging does a disservice to online writers.