Tackling Apache zombies

Myths and legends laid to rest

Choosing a cloud hosting partner with confidence

The net is full of Apache knowledge. Tips and tricks, discussion fora, experts of all kinds, and innumerable "how-tos" and tutorials on a range of subjects. Some of these are worth reading; others may be otherwise.

Among all this wisdom are some myths and half-truths that are so well known as to be "common knowledge". They regularly need to be "un-explained" in the support fora. They typically have origins in the very early days of Apache and its predecessor the NCSA server, and were either true in the distant past, or appeared in tutorial examples which gathered momentum to become, in cargo cult terms, the One True Way. Now, zombie-like, they refuse to die.

So, let's tackle some of them here. I've selected four that can lead you into trouble if misunderstood. But before launching into them, here's a public service announcement, particularly for those who missed ApacheCon in Dublin for reasons of geography. Two more ApacheCon conferences are coming up shortly:

  1. Asia: Colombo, Sri Lanka, 14-17 August
  2. USA: Dallas, Texas, 9-13 October

Password Protection and .htaccess

One of the most pervasive myths around Apache is that password protection means .htaccess, and vice versa. While not entirely false, this belief is misleading, and often leads to configurations that are unnecessarily complex and slow. Apache itself may be guilty of confusing users by virtue of the naming scheme: the similarly-named htpasswd and htdigest programs are indeed concerned with password protection.

To understand the role of .htaccess, we need to take a look at Apache configuration. The majority of Apache's configuration can be set per-directory, so that http://www.example.com/example/ may behave entirely differently to http://www.example.com/, etc. This is based on the <Directory> container in httpd.conf, and its cousins <Location>, <Files>, etc. Password protection is just one of many aspects of configuration that can be determined per-directory.

Now, the true role of .htaccess files is to enable some aspects of Apache configuration to be determined outside httpd.conf, so it can be delegated to users who have no privilege to mess with the core of the system. The AllowOverride directive determines what aspects of configuration can be set in .htaccess files, and AllowOverride is itself something that can be set per-directory.

Now, you may sometimes encounter instructions to create a .htaccess file, containing something like:

  AuthType Basic   AuthName "My Application"   AuthUserFile /path/to/my/userfile   Require valid-user

together with setting

  AllowOverride AuthConfig

in httpd.conf to enable it.

This is harmful for two reasons. Firstly, it's complex: it introduces new and unnecessary issues concerning management of, permissions on, and (not least) security of, the .htaccess files themselves. Secondly, it's slow: as soon as you set AllowOverride to anything other than its default setting of None, Apache has to look for and read the .htaccess files (possibly more than one, where subdirectories are involved) for every hit!

Just say no to .htaccess! Whatever goes in a .htaccess file can always go directly into a corresponding <Directory> stanza. The only people to use .htaccess should be end-users who want control without having to bug the server admin.

AddType Abuse

The AddType directive serves to associate a filename "extension" with a MIME type, so the server sets the correct Content-Type header for the Client to handle the contents.

In the NCSA server and in Apache 1.0, this was extended in a rather dirty hack to create "magic" types that would invoke special processing on the server. Common instances were CGI scripts and server-side includes, and later PHP.

This hack became redundant with the introduction of the AddHandler directive in Apache 1.1 (1996). Now we have a clean distinction between handlers (serverside processing) and MIME Types (which describe contents so that browsers and other Clients can handle them correctly). But ten years on, the abuse of AddType to determine serverside processing continues to muddy the waters. This, perhaps more than any other Apache zombie, has advocates who really should know better.

Don't limit the limits!

Another password-protection myth that is mercifully near-dead concerns the use of <Limit ...> sections. These appeared in some early example documentation, from where they were widely copied out of context and grew into a myth.

The <Limit> container serves to say "this limit applies only to the HTTP methods listed". So you're granting unlimited access using other methods. This is rarely what you want, although it does have valid uses. The most common valid case is where you use Limit alongside a matching LimitExcept to determine different access policies for different operations. WebDAV and subversion commonly use this form, and provide good examples. For other password protection policies, you should normally just ignore it.

The case-insensitive URL

The case-insensitivity myth goes something like "Apache on Windows is case-insensitive; on *X it's case sensitive". And so it appears, though Apache's mod_speling(sic) - or an AliasMatch or RewriteRule - can change this behaviour.

Underlying this is a half-truth: that URLs correspond to the filesystem. They may indeed do that, but this is no more than a convenient convention. URLs are in fact defined unambiguously by RFC2396 to be case-sensitive (in the "file" part). A server that treats them otherwise is in fact serving the same contents at multiple different URLs.

Servers that give the illusion of being case-insensitive are actively harmful to the 'net infrastructure. That's because agents such as caches and search engines MUST treat each URL as different. So creating multiple versions is in effect spamming them (and may get a site penalised in search engines that watch out for spamming).

To see the potential extent of this spam, note that each "case-insensitive" letter doubles the number of spammed URLs. So for example, an eight letter "case-insensitive" URL is in fact 256 URLs, and at 20 letters, it rises to over a million!

And now the light is failing and I'm pulling the covers over my head – before the zombies get me... ®

Secure remote control for conventional and virtual desktops

More from The Register

next story
ONE MILLION people already running Windows 10
A third of them are doing it in VMs, but early feedback focuses on frippery
Netscape Navigator - the browser that started it all - turns 20
It was 20 years ago today, Marc Andreeesen taught the band to play
Sign off my IT project or I’ll PHONE your MUM
Honestly, it’s a piece of piss
Sway: Microsoft's new Office app doesn't have an Undo function
Content aggregation, meet the workplace ... oh
Do Moan! MONSTER 6-day EMAIL OUTAGE hits Domain Monster
Customers freaked out by frightful service
Ploppr: The #VultureTRENDING App of the Now
This organic crowd sourced viro- social fertiliser just got REAL
Return of the Jedi – Apache reclaims web server crown
.london, .hamburg and .公司 - that's .com in Chinese - storm the web server charts
NetWare sales revive in China thanks to that man Snowden
If it ain't Microsoft, it's in fashion behind the Great Firewall
prev story


Forging a new future with identity relationship management
Learn about ForgeRock's next generation IRM platform and how it is designed to empower CEOS's and enterprises to engage with consumers.
Win a year’s supply of chocolate
There is no techie angle to this competition so we're not going to pretend there is, but everyone loves chocolate so who cares.
Why cloud backup?
Combining the latest advancements in disk-based backup with secure, integrated, cloud technologies offer organizations fast and assured recovery of their critical enterprise data.
High Performance for All
While HPC is not new, it has traditionally been seen as a specialist area – is it now geared up to meet more mainstream requirements?
Saudi Petroleum chooses Tegile storage solution
A storage solution that addresses company growth and performance for business-critical applications of caseware archive and search along with other key operational systems.