Bleach: HTML Sanitization

Posted on Mon 03 January 2011 in Dispatches • 1 min read

Bleach is a rather clever Python module for sanitizing HTML input and auto-linking URLs. It uses a whitelist for the allowed elements and attributes (thank God), and it avoids “linkifying” URLs that are already inside an anchor element. It pulls this off by building an HTML5 document from the input and walking the DOM, rather than relying on pure regular expressions. In theory, that means it should just do the right thing.
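To make the idea concrete, here is a rough sketch of the parse-then-walk approach using only the standard library. This is not Bleach's code and not its API: the names `WhitelistSanitizer`, `sanitize`, `ALLOWED_TAGS`, `ALLOWED_ATTRS` and `SAFE_SCHEMES` are made up for illustration, the whitelist is arbitrary, and `html.parser` is a much simpler event-based parser than the HTML5 parser Bleach builds on. It only shows why working from parsed tags makes "don't linkify inside an anchor" easy to get right.

```python
import re
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist for this sketch; these are NOT Bleach's defaults.
ALLOWED_TAGS = {"a", "b", "em", "i", "strong"}
ALLOWED_ATTRS = {"a": {"href", "title"}}
SAFE_SCHEMES = ("http://", "https://", "mailto:")
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def linkify_text(text):
    """Escape plain text and wrap bare http(s) URLs in anchors."""
    parts, pos = [], 0
    for match in URL_RE.finditer(text):
        parts.append(escape(text[pos:match.start()]))
        url = escape(match.group(0))
        parts.append(f'<a href="{url}">{url}</a>')
        pos = match.end()
    parts.append(escape(text[pos:]))
    return "".join(parts)


class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = []  # whitelisted tags currently open

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # drop the tag itself; its text content is still kept
        rendered = ""
        for name, value in attrs:
            if name not in ALLOWED_ATTRS.get(tag, ()) or value is None:
                continue
            if name == "href" and not value.lower().startswith(SAFE_SCHEMES):
                continue  # e.g. javascript: URLs
            rendered += f' {name}="{escape(value)}"'
        self.out.append(f"<{tag}{rendered}>")
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray or non-whitelisted closing tag
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        if "a" in self.open_tags:
            self.out.append(escape(data))  # already linked: leave it alone
        else:
            self.out.append(linkify_text(data))


def sanitize(html):
    parser = WhitelistSanitizer()
    parser.feed(html)
    parser.close()  # flushes any buffered text
    while parser.open_tags:  # close whatever the input left open
        parser.out.append(f"</{parser.open_tags.pop()}>")
    return "".join(parser.out)


print(sanitize('see http://example.com and <a href="http://example.org">http://example.org</a>'))
```

With Bleach itself you would not write any of this; the library's own entry points are `bleach.clean()` and `bleach.linkify()`.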

I usually achieve the same result by requiring Markdown input in safe mode, but this is certainly a handy library to have in your toolbox, and it doesn’t require your users to learn a new syntax.

It was released by Mozilla for use on their production sites, so it should be stable enough.