Tackling Apache zombies

Myths and legends laid to rest

Providing a secure and efficient Helpdesk

The net is full of Apache knowledge. Tips and tricks, discussion fora, experts of all kinds, and innumerable "how-tos" and tutorials on a range of subjects. Some of these are worth reading; others may be otherwise.

Among all this wisdom are some myths and half-truths that are so well known as to be "common knowledge". They regularly need to be "un-explained" in the support fora. They typically have origins in the very early days of Apache and its predecessor the NCSA server, and were either true in the distant past, or appeared in tutorial examples which gathered momentum to become, in cargo cult terms, the One True Way. Now, zombie-like, they refuse to die.

So, let's tackle some of them here. I've selected four that can lead you into trouble if misunderstood. But before launching into them, here's a public service announcement, particularly for those who missed ApacheCon in Dublin for reasons of geography. Two more ApacheCon conferences are coming up shortly:

  1. Asia: Colombo, Sri Lanka, 14-17 August
  2. USA: Dallas, Texas, 9-13 October

Password Protection and .htaccess

One of the most pervasive myths around Apache is that password protection means .htaccess, and vice versa. While not entirely false, this belief is misleading, and often leads to configurations that are unnecessarily complex and slow. Apache itself may be guilty of confusing users by virtue of the naming scheme: the similarly-named htpasswd and htdigest programs are indeed concerned with password protection.

To understand the role of .htaccess, we need to take a look at Apache configuration. The majority of Apache's configuration can be set per-directory, so that http://www.example.com/example/ may behave entirely differently to http://www.example.com/, etc. This is based on the <Directory> container in httpd.conf, and its cousins <Location>, <Files>, etc. Password protection is just one of many aspects of configuration that can be determined per-directory.

Now, the true role of .htaccess files is to enable some aspects of Apache configuration to be determined outside httpd.conf, so it can be delegated to users who have no privilege to mess with the core of the system. The AllowOverride directive determines what aspects of configuration can be set in .htaccess files, and AllowOverride is itself something that can be set per-directory.

Now, you may sometimes encounter instructions to create a .htaccess file, containing something like:

  AuthType Basic   AuthName "My Application"   AuthUserFile /path/to/my/userfile   Require valid-user

together with setting

  AllowOverride AuthConfig

in httpd.conf to enable it.

This is harmful for two reasons. Firstly, it's complex: it introduces new and unnecessary issues concerning management of, permissions on, and (not least) security of, the .htaccess files themselves. Secondly, it's slow: as soon as you set AllowOverride to anything other than its default setting of None, Apache has to look for and read the .htaccess files (possibly more than one, where subdirectories are involved) for every hit!

Just say no to .htaccess! Whatever goes in a .htaccess file can always go directly into a corresponding <Directory> stanza. The only people to use .htaccess should be end-users who want control without having to bug the server admin.

AddType Abuse

The AddType directive serves to associate a filename "extension" with a MIME type, so the server sets the correct Content-Type header for the Client to handle the contents.

In the NCSA server and in Apache 1.0, this was extended in a rather dirty hack to create "magic" types that would invoke special processing on the server. Common instances were CGI scripts and server-side includes, and later PHP.

This hack became redundant with the introduction of the AddHandler directive in Apache 1.1 (1996). Now we have a clean distinction between handlers (serverside processing) and MIME Types (which describe contents so that browsers and other Clients can handle them correctly). But ten years on, the abuse of AddType to determine serverside processing continues to muddy the waters. This, perhaps more than any other Apache zombie, has advocates who really should know better.

Don't limit the limits!

Another password-protection myth that is mercifully near-dead concerns the use of <Limit ...> sections. These appeared in some early example documentation, from where they were widely copied out of context and grew into a myth.

The <Limit> container serves to say "this limit applies only to the HTTP methods listed". So you're granting unlimited access using other methods. This is rarely what you want, although it does have valid uses. The most common valid case is where you use Limit alongside a matching LimitExcept to determine different access policies for different operations. WebDAV and subversion commonly use this form, and provide good examples. For other password protection policies, you should normally just ignore it.

The case-insensitive URL

The case-insensitivity myth goes something like "Apache on Windows is case-insensitive; on *X it's case sensitive". And so it appears, though Apache's mod_speling(sic) - or an AliasMatch or RewriteRule - can change this behaviour.

Underlying this is a half-truth: that URLs correspond to the filesystem. They may indeed do that, but this is no more than a convenient convention. URLs are in fact defined unambiguously by RFC2396 to be case-sensitive (in the "file" part). A server that treats them otherwise is in fact serving the same contents at multiple different URLs.

Servers that give the illusion of being case-insensitive are actively harmful to the 'net infrastructure. That's because agents such as caches and search engines MUST treat each URL as different. So creating multiple versions is in effect spamming them (and may get a site penalised in search engines that watch out for spamming).

To see the potential extent of this spam, note that each "case-insensitive" letter doubles the number of spammed URLs. So for example, an eight letter "case-insensitive" URL is in fact 256 URLs, and at 20 letters, it rises to over a million!

And now the light is failing and I'm pulling the covers over my head – before the zombies get me... ®

Secure remote control for conventional and virtual desktops

More from The Register

next story
'Windows 9' LEAK: Microsoft's playing catchup with Linux
Multiple desktops and live tiles in restored Start button star in new vids
Not appy with your Chromebook? Well now it can run Android apps
Google offers beta of tricky OS-inside-OS tech
New 'Cosmos' browser surfs the net by TXT alone
No data plan? No WiFi? No worries ... except sluggish download speed
Greater dev access to iOS 8 will put us AT RISK from HACKERS
Knocking holes in Apple's walled garden could backfire, says securo-chap
NHS grows a NoSQL backbone and rips out its Oracle Spine
Open source? In the government? Ha ha! What, wait ...?
Google extends app refund window to two hours
You now have 120 minutes to finish that game instead of 15
Intel: Hey, enterprises, drop everything and DO HADOOP
Big Data analytics projected to run on more servers than any other app
prev story


Secure remote control for conventional and virtual desktops
Balancing user privacy and privileged access, in accordance with compliance frameworks and legislation. Evaluating any potential remote control choice.
Saudi Petroleum chooses Tegile storage solution
A storage solution that addresses company growth and performance for business-critical applications of caseware archive and search along with other key operational systems.
High Performance for All
While HPC is not new, it has traditionally been seen as a specialist area – is it now geared up to meet more mainstream requirements?
Security for virtualized datacentres
Legacy security solutions are inefficient due to the architectural differences between physical and virtual environments.
Providing a secure and efficient Helpdesk
A single remote control platform for user support is be key to providing an efficient helpdesk. Retain full control over the way in which screen and keystroke data is transmitted.