
Bleach: HTML Sanitization

Articles Development Python Html Security
Daniel Andrlik
Author
Daniel Andrlik lives in the suburbs of Philadelphia. By day he manages product teams. The rest of the time he is a podcast host and producer, writer of speculative fiction, a rabid reader, and a programmer.

Bleach is a rather clever Python module for sanitizing HTML input and auto-linking URLs. It uses a whitelist of allowed elements and attributes (thank God), and it avoids trying to “linkify” URLs that are already inside an anchor element. It pulls this off by building an HTML5 document from the input and walking the DOM rather than relying on pure regular expressions. In theory, that means it should just do the right thing.
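To make the parse-then-walk idea concrete, here is a minimal sketch using only Python's standard library. It is not Bleach's code or its API: the `AnchorAwareLinkifier` class, the `linkify` function, and the deliberately naive `URL_RE` pattern are all hypothetical names made up for illustration, and the streaming `html.parser` stands in for a real HTML5 DOM. It only shows why tracking document structure lets you skip URLs that already sit inside an anchor, and it does no sanitizing at all.

```python
import re
from html import escape
from html.parser import HTMLParser

# Deliberately naive URL pattern, for illustration only (an assumption,
# not the pattern Bleach uses).
URL_RE = re.compile(r"https?://[^\s<>\"']+")


class AnchorAwareLinkifier(HTMLParser):
    """Hypothetical helper: wrap bare URLs in <a> tags, but leave text
    that is already inside an anchor element untouched."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.anchor_depth = 0  # greater than 0 while inside an <a> element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.anchor_depth += 1
        # Re-emit the tag exactly as it appeared in the input.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are passed through unchanged.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "a" and self.anchor_depth:
            self.anchor_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.anchor_depth:
            # Already linked: re-escape the text and move on.
            self.out.append(escape(data, quote=False))
            return
        pos = 0
        for match in URL_RE.finditer(data):
            url = match.group(0)
            self.out.append(escape(data[pos:match.start()], quote=False))
            self.out.append(
                f'<a href="{escape(url)}">{escape(url, quote=False)}</a>'
            )
            pos = match.end()
        self.out.append(escape(data[pos:], quote=False))


def linkify(html_fragment):
    """Return html_fragment with bare URLs linked (comments and doctypes
    are dropped in this sketch)."""
    parser = AnchorAwareLinkifier()
    parser.feed(html_fragment)
    parser.close()
    return "".join(parser.out)


source = 'See http://example.com and <a href="http://example.org">http://example.org</a>'
print(linkify(source))
# See <a href="http://example.com">http://example.com</a> and <a href="http://example.org">http://example.org</a>
```

Running the same `URL_RE` over the raw string instead would also match the URL inside the existing `href` attribute and the existing link text, which is exactly the mess that walking the parsed document avoids.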

I usually achieve this result by requiring Markdown input in safe mode, but this is certainly a handy library to have in your toolbox, and it doesn’t require your users to learn a new syntax.

It was released by Mozilla for use on their production sites, so it should be stable enough.
