Better scheduled posts for Pelican

Posted on Fri 12 January 2018 in Dispatches • 4 min read

Pop quiz: you’ve set up a static web site like all the cool kids these days, but you still want to be able to do scheduled posts. There are plenty of good reasons why you might. After all, if you go on a writing binge you don’t want your RSS feed to become a firehose, and if you go away on vacation you don’t want your site to grow moss. In both cases, it helps to be able to write ahead and have the posts appear according to a schedule.

I’ve briefly written before about how I’ve accomplished this with both Jekyll and Pelican in the past. Typically, I would set the dates for the posts in the future, enable the future dates filter in either system, and then run a cronjob on the server to pull updates from the git repo and deploy every thirty minutes or so. That works fine when you’re running it on a VPS, but what about when you’re hosting the site on something like S3, where your PUT requests affect your billing? If you’re using CloudFront as well, then you also need to deal with invalidating the cache. It doesn’t make sense to publish twice an hour if you don’t have any changes to deploy (and your wallet won’t like it either), so you need to be a bit more selective.

Having recently moved this site to S3, I found myself pondering this very dilemma. Ultimately, I only want there to be a deployment if there is new content, or a scheduled post has come due. There are quite a few recommended solutions for this around the web, but I’m not particularly fond of them. I’ve seen quite a few proposing the at command to schedule deployments, or even suggesting that you hold the content somewhere else on the machine and have a cronjob move it into your repo at the appointed time. 🤮

My requirements for an acceptable solution are as follows:

  • Draft management needs to stay in the Pelican repo. I want to use Pelican’s existing ability to filter out future dates rather than moving content out of the repo, and somehow mangling it back into place.
  • It needs to be portable.
  • It needs to function well as a cronjob, and provide plenty of troubleshooting data to the log.

With all those requirements in mind, I came up with this script. It checks the repository state of both your theme repo and your content repo [1] prior to any changes. Then it executes a git pull on each and compares the new state to the old state in order to see if you’ve made any commits since the last time it ran. If you have, it plans a deployment.
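To make the before-and-after comparison concrete, here is a minimal sketch in Python. It is not the script itself, only an illustration of the idea; the function names and the dictionary of repo paths are my own assumptions, and it assumes git is on your PATH and each repo tracks an upstream branch.

```python
import subprocess
from pathlib import Path


def head_commit(repo: Path) -> str:
    """Return the commit hash that HEAD currently points to in `repo`."""
    result = subprocess.run(
        ["git", "-C", str(repo), "rev-parse", "HEAD"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def pull(repo: Path) -> None:
    """Fast-forward `repo` to match its upstream branch."""
    subprocess.run(["git", "-C", str(repo), "pull", "--ff-only"], check=True)


def changed_repos(before: dict, after: dict) -> list:
    """Names of repos whose HEAD differs between the two snapshots."""
    return sorted(name for name in before if before[name] != after.get(name))


def repos_with_new_commits(repos: dict) -> list:
    """Snapshot each repo, pull it, snapshot again, and report what moved.

    `repos` maps a label (hypothetical, e.g. "content" or "theme") to a Path.
    """
    before = {name: head_commit(path) for name, path in repos.items()}
    for path in repos.values():
        pull(path)
    after = {name: head_commit(path) for name, path in repos.items()}
    return changed_repos(before, after)
```

If `repos_with_new_commits` returns a non-empty list, something was committed since the last run and a deployment is warranted. Printing that list is also an easy way to get useful troubleshooting data into the cron log.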

Of course, that only helps with spotting new commits; what about the scheduled posts?

Each time the script runs, it checks for any Markdown [2] files (i.e. those ending in “.md”) in your content folder. It then grabs all the dates from the YAML metadata at the top of your posts, sorts them, and finds any posts scheduled for the future. It identifies the one that’s coming up next in your queue and stores that date in your repo, inside a file named .lastdateeval [3]. On subsequent runs, it checks to see if that date has since come due, and if it has, plans a deployment. It then updates the .lastdateeval file with the date of the next future post in the queue, if any more exist.
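The date bookkeeping can be sketched roughly like this, again in Python rather than as the script itself. The helper names are hypothetical, and I’m assuming each post carries a `Date:` line written as `YYYY-MM-DD` or `YYYY-MM-DD HH:MM`; other date formats would need a more forgiving parser. The sketch also takes the first matching `date:` line anywhere in the file, which is good enough when the metadata sits at the top.

```python
import re
from datetime import datetime
from pathlib import Path

# Assumed metadata format: "Date: 2018-01-12" or "Date: 2018-01-12 09:30".
DATE_LINE = re.compile(
    r"^date:\s*(\d{4}-\d{2}-\d{2}(?: \d{2}:\d{2})?)\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def post_date(path: Path):
    """Return the post's Date metadata as a datetime, or None if absent."""
    match = DATE_LINE.search(path.read_text(encoding="utf-8"))
    if match is None:
        return None
    text = match.group(1)
    fmt = "%Y-%m-%d %H:%M" if " " in text else "%Y-%m-%d"
    return datetime.strptime(text, fmt)


def next_scheduled(content_dir: Path, now: datetime):
    """Earliest post date that is still in the future, or None."""
    dates = (post_date(p) for p in content_dir.rglob("*.md"))
    upcoming = sorted(d for d in dates if d is not None and d > now)
    return upcoming[0] if upcoming else None


def check_schedule(content_dir: Path, state_file: Path, now: datetime) -> bool:
    """Return True if the date stored by the previous run has come due.

    Afterwards, record the next future date in `state_file`, or empty the
    file when nothing else is queued.
    """
    due = False
    if state_file.exists():
        stored = state_file.read_text(encoding="utf-8").strip()
        due = bool(stored) and datetime.fromisoformat(stored) <= now
    upcoming = next_scheduled(content_dir, now)
    state_file.write_text(upcoming.isoformat() if upcoming else "", encoding="utf-8")
    return due
```

Calling `check_schedule(Path("content"), Path(".lastdateeval"), datetime.now())` on every cron run would report a due post exactly once, on the first run after its date passes, because the stored date is replaced immediately afterwards.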

If a deployment is needed, it will execute your configured make target for your Pelican repo in order to publish your content. If you are using S3/CloudFront, the script can also optionally issue an invalidation request to AWS in order to clear the cache on the edge servers. This ensures that your site updates appear immediately with minimal stale cache issues. Since the script will only execute a deployment or invalidation request in response to new content or design changes, you can run this script as often as you like via a cronjob, and it shouldn’t negatively impact your AWS billing.
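The deploy-only-when-needed step might look something like the following sketch. The function names, the make target, and the distribution ID are all placeholders of mine, not values from the script; the one real external command is `aws cloudfront create-invalidation` from the AWS CLI, which I’m assuming is installed and configured. Invalidating `/*` is the bluntest option and simply clears everything.

```python
import subprocess
from pathlib import Path


def deploy_commands(site_dir: Path, make_target: str, distribution_id=None) -> list:
    """Build the commands for one deployment, in the order they should run."""
    commands = [["make", "-C", str(site_dir), make_target]]
    if distribution_id:
        # Optional: ask CloudFront to drop its cached copies of the site.
        commands.append([
            "aws", "cloudfront", "create-invalidation",
            "--distribution-id", distribution_id,
            "--paths", "/*",
        ])
    return commands


def run_if_needed(new_commits: bool, post_due: bool, commands: list) -> bool:
    """Run the commands only if there is something to publish.

    Returns True if a deployment ran, False if it was skipped.
    """
    if not (new_commits or post_due):
        print("Nothing new to publish; skipping deployment.")
        return False
    for command in commands:
        print("Running:", " ".join(command))
        subprocess.run(command, check=True)
    return True
```

Because the skip path touches neither S3 nor CloudFront, a run with nothing to publish generates no PUT or invalidation requests, which is the whole point of the exercise.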

Ok, you know what it does. Now for the important bit. Here is the script:

This seems to be a common dilemma for users of static site generators, so I hope this tool is helpful. It most likely can also be adapted to work with other generators such as Jekyll, but I haven’t attempted that yet. And as always with these things, the script can almost certainly be refactored and improved. Suggestions and feedback are welcome.

  1. You are keeping those separate, right? I hope? 

  2. You will need to modify the script if you prefer to roll with another syntax, such as RST.

  3. I recommend adding that filename to your .gitignore.