SSL, Apache, VirtualHosts, Django, and SuspiciousOperations

I recently upgraded to Django 1.4.5, which includes fixes for security issues involving malicious HTTP “Host” headers. Since my Django site does use the Host header occasionally, I took the recommended step of adding an ALLOWED_HOSTS setting, which whitelists the host names the site is allowed to serve.
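
For reference, the setting itself is a short list in settings.py. Below it is a simplified, hypothetical model of how a Host header is checked against that list. `host_is_allowed` is my own illustration of the documented matching rules (exact names compared case-insensitively and without the port, a leading dot for a domain plus its subdomains, and `"*"` for anything), not Django's actual code. In Django itself the check happens when code calls `request.get_host()`, which raises SuspiciousOperation for a host that isn't allowed. `example.com` is a placeholder domain.

```python
# settings.py (sketch): only the names this site should answer to.
# "example.com" is a placeholder domain.
ALLOWED_HOSTS = ["example.com", "www.example.com"]


def host_is_allowed(host, allowed_hosts):
    """Hypothetical, simplified model of ALLOWED_HOSTS matching.

    Not Django's code: it ignores IPv6 literals and other edge cases.
    """
    # Drop any ":port" suffix and compare case-insensitively.
    name = host.split(":", 1)[0].lower()
    for pattern in allowed_hosts:
        pattern = pattern.lower()
        if pattern == "*":
            return True  # "*" accepts any host
        if pattern.startswith("."):
            # ".example.com" matches example.com and every subdomain of it.
            if name == pattern[1:] or name.endswith(pattern):
                return True
        elif name == pattern:
            return True
    return False


# A host that fails this check is what Django reports as a SuspiciousOperation.
print(host_is_allowed("www.example.com", ALLOWED_HOSTS))       # True
print(host_is_allowed("blog.example.org:443", ALLOWED_HOSTS))  # False
```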

My server logs then started filling up with SuspiciousOperations, triggered by none other than GoogleBot: hundreds of hits a day. I checked Google Webmaster Tools, but it listed no crawl errors for my domain, despite the hundreds of 500s in my logs.

The first breakthrough came when I added a field for the Host header to Apache’s “combined” LogFormat directive:

Original:

    LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined

With the Host header added:

    LogFormat "%h %l %u %t %{Host}i \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined
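
Once the Host field is in the log, tallying which names are behind the errors takes only a few lines of Python. This is a minimal sketch, assuming the modified format above, a Host value without spaces, and quoted fields without escaped quotes. `hosts_by_status` is a hypothetical helper, and the sample lines use documentation IP addresses and made-up host names rather than real log entries.

```python
import re
from collections import Counter

# Matches the "combined" format with %{Host}i inserted after the timestamp:
# %h %l %u %t %{Host}i "%r" %>s %O "%{Referer}i" "%{User-Agent}i"
# Assumes the Host value has no spaces and quoted fields contain no \" escapes.
LOG_RE = re.compile(
    r'(?P<remote>\S+) (?P<logname>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'(?P<host>\S+) "(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def hosts_by_status(lines, status="500"):
    """Count Host header values among log lines with the given status code."""
    counts = Counter()
    for line in lines:
        match = LOG_RE.match(line)
        if match and match.group("status") == status:
            counts[match.group("host")] += 1
    return counts


# Placeholder lines in the new format (documentation IPs, made-up hosts).
sample = [
    '192.0.2.10 - - [10/Mar/2013:12:00:00 +0000] blog.example.org '
    '"GET / HTTP/1.1" 500 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '192.0.2.10 - - [10/Mar/2013:12:00:05 +0000] example.com '
    '"GET / HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
]
print(hosts_by_status(sample))  # Counter({'blog.example.org': 1})
```

Against a real log you would pass an open file instead of `sample`, for example `hosts_by_status(open(path))` with `path` pointing at your access log.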

Once I did this, I saw that the Host headers on the requests raising SuspiciousOperations were all valid names for other virtual hosts on this server.

With name-based virtual hosts, Apache looks for a VirtualHost whose ServerName (or ServerAlias) matches the request’s Host header. If it doesn’t find one, it uses the first VirtualHost defined for the address and port the request arrived on. This can lead to weird results if your first-listed VirtualHost is a production site that usually goes by a different name. It seems the “best practice” is to define a catch-all virtual host before all the others, so that it collects the bad host names. For example, this one returns a 404 for any URL:

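    # /etc/apache2/sites-enabled/000-default
    <VirtualHost *:80>
        # It doesn't matter what the ServerName is. Apache only treats "#" as a
        # comment at the start of a line, so the comment can't follow the directive.
        ServerName bogus
        RewriteEngine on
        # "-" means "no substitution"; R=404 answers every URL with a 404.
        RewriteRule ^/.*$ - [R=404]
    </VirtualHost>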
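
The selection rule can be modeled in a few lines of Python. This is a simplified, hypothetical sketch rather than Apache’s actual code: it assumes every VirtualHost is declared as `*:<port>`, compares names exactly (no wildcard ServerAlias), and uses `blog.example.org` as a stand-in for one of the other sites on the server.

```python
from dataclasses import dataclass, field


@dataclass
class VHost:
    """One <VirtualHost *:port> section (simplified)."""
    port: int
    server_name: str
    aliases: list = field(default_factory=list)

    def answers_to(self, host):
        # Host names are compared case-insensitively.
        names = [self.server_name, *self.aliases]
        return host.lower() in [n.lower() for n in names]


def select_vhost(vhosts, port, host):
    """Return the VHost Apache would pick; None means no VirtualHost on port."""
    name = host.split(":", 1)[0]  # drop any ":port" from the Host header
    # Only VirtualHosts on the port the request arrived on are candidates.
    candidates = [v for v in vhosts if v.port == port]
    if not candidates:
        return None  # the main server configuration would handle it
    for vhost in candidates:
        if vhost.answers_to(name):
            return vhost
    # No name matched: fall back to the first-listed VirtualHost for this port.
    return candidates[0]


vhosts = [
    VHost(80, "bogus"),  # the catch-all from 000-default, listed first
    VHost(80, "example.com", ["www.example.com"]),
    VHost(80, "blog.example.org"),  # hypothetical second site
]
print(select_vhost(vhosts, 80, "www.example.com").server_name)      # example.com
print(select_vhost(vhosts, 80, "unknown.example.net").server_name)  # bogus
```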

However, this didn’t fix my problem. I manually checked all of these virtual hosts, and they all resolved correctly, as expected. Yet the SuspiciousOperations kept pouring in, with Host headers matching several of my other virtual hosts. Somehow Django was receiving requests meant for other virtual hosts.

Finally, I found the answer: SSL. My Django site is served over both HTTP and HTTPS, and to Apache these are separate worlds. My catch-all virtual host, and the definitions for all the other virtual hosts, matched only port 80, but HTTPS requests arrive on port 443. So GoogleBot was requesting https:// URLs for the other virtual host names, and Apache was shunting these to the Django app, because it was the only virtual host configured for port 443 and therefore the default there. It’s interesting, but perhaps unsurprising, that GoogleBot ignores the certificate errors it would have gotten had it tried to validate the certificate for those host names.
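
This kind of port mismatch can be checked mechanically. Below is a hypothetical audit sketch under a simplified model in which the first-listed VirtualHost on a port is that port’s default: given the (port, ServerName) pairs from a configuration, `fallthrough_names` reports, for each port, the names that have a VirtualHost on some other port but none on this one, along with the default they would land on. The configuration in the example is made up to mirror the situation described: several sites on port 80, and only the Django site on 443.

```python
def fallthrough_names(vhosts, ignore=frozenset()):
    """Find host names that would land on another site's VirtualHost.

    vhosts: (port, server_name) pairs in the order Apache reads them, so the
    first pair for each port is that port's default. Names in `ignore`
    (e.g. a catch-all's dummy ServerName) are not reported.
    Returns {port: (default_name, [names with no VirtualHost on that port])}.
    """
    all_names = {name for _, name in vhosts} - set(ignore)
    report = {}
    for port in sorted({p for p, _ in vhosts}):
        on_port = [name for p, name in vhosts if p == port]
        missing = sorted(all_names - set(on_port))
        if missing:
            report[port] = (on_port[0], missing)
    return report


vhosts = [
    (80, "bogus"),             # catch-all, listed first
    (80, "example.com"),
    (80, "blog.example.org"),  # hypothetical other sites
    (80, "wiki.example.net"),
    (443, "example.com"),      # the Django site: the only SSL virtual host
]
print(fallthrough_names(vhosts, ignore={"bogus"}))
# {443: ('example.com', ['blog.example.org', 'wiki.example.net'])}
```

ServerAlias names and wildcards are ignored here; a real audit would need to read them as well.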

The trouble with Apache virtual hosts and SSL is that you can’t really run several name-based VirtualHost sections on one SSL port the way you can without SSL: the SSL handshake, including the choice of certificate, happens before Apache sees the Host header. (The TLS Server Name Indication extension, SNI, relaxes this when both the server and the client support it.) So you can’t just put a default 404 VirtualHost in front of the others as on port 80. Instead, we need a way, within a single VirtualHost, to limit the hosts that reach Django to the ones it should actually get.

The solution is to use mod_rewrite to check the Host header (HTTP_HOST, in mod_rewrite’s terms) and explicitly send a 404, a redirect, or whatever else is appropriate when a request names a host this site shouldn’t serve. Here’s what I ended up doing:

    # /etc/apache2/sites-enabled/my-ssl-django-site
    <VirtualHost *:443>
        RewriteEngine On

        # Send a 404 for anything that isn't www.example.com or example.com.
        # "-" means "no substitution"; a non-3xx R= code also stops rewriting.
        RewriteCond %{HTTP_HOST} !^(www\.)?example\.com$
        RewriteRule ^/.*$ - [R=404]

        # Optional: canonicalize the URL by redirecting the "www" variant to
        # non-www, keeping the requested path. You could also do the reverse.
        # If you skip the 404 rule above, this would also redirect other
        # virtual hosts' names to your working SSL virtual host, if you wanted.
        RewriteCond %{HTTP_HOST} !^example\.com$
        RewriteRule ^/(.*)$ https://example.com/$1 [R,L]

        SSLEngine on
        ...
    </VirtualHost>
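
The per-request decision those rules make can be mirrored in Python, which is a convenient way to test the host patterns against names pulled from the logs. A minimal, hypothetical sketch: `route` reuses the two RewriteCond regular expressions unchanged (Python’s `re` treats these simple patterns the same way Apache’s PCRE does) and returns a label for the outcome instead of performing it.

```python
import re

# The same patterns as the two RewriteCond lines, without the leading "!".
KNOWN_HOST = re.compile(r"^(www\.)?example\.com$")
CANONICAL_HOST = re.compile(r"^example\.com$")


def route(http_host):
    """Label what the SSL VirtualHost's rewrite rules do for a Host header.

    Hypothetical model: "404" for unknown hosts, "redirect" for the
    non-canonical www name, and "django" for requests that get through.
    """
    if not KNOWN_HOST.search(http_host):
        return "404"       # first RewriteCond/RewriteRule pair
    if not CANONICAL_HOST.search(http_host):
        return "redirect"  # second pair: www.example.com -> example.com
    return "django"


for host in ["example.com", "www.example.com", "blog.example.org"]:
    print(host, route(host))
# example.com django
# www.example.com redirect
# blog.example.org 404
```

It also exposes an edge case: a Host header that carries an explicit port, such as `example.com:443`, matches neither pattern and gets the 404. Browsers normally omit the default port, but a client that includes it would need the patterns extended.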

This worked great. The SuspiciousOperations went away, and if I boldly visit the other (non-SSL) virtual hosts with https, ignoring the certificate errors, I get a reasonable error response.

This entry was posted in apache, django.