Feeds

Taking wing with Apache 2

A new and flexible framework

Beginner's guide to SSL certificates

In October, we developed a simple HelloWorld module. Last week, my book finally appeared in print. To celebrate the happy event, let's take a look at the more advanced topic of how Apache 1's fixed request processing morphed into a new and more flexible framework in Apache 2.

Any competent CGI or PHP scripter knows the basics of writing a web application. Your program accepts inputs and generates outputs in a prescribed format. The server is responsible for translating between that format and HTTP, the protocol of the web. The application is responsible for doing whatever needs to be done beyond that. Since the release of Apache 2.0 in 2002, there's been quite a lot of application support built into Apache itself, but the basic principles are the same whether your application uses modules, scripts, separate application servers, or some mixture, and whether you use Apache or some other server.

A slightly more advanced developer knows Apache's request processing cycle, which divides the job of handling HTTP requests into a number of phases. The phases defined in the module structure in Apache 1 are reasonably well-understood. Apache 2 defines a similar request processing sequence, but the underlying mechanism is very different, and altogether more powerful. But before looking at that, let's take a brief look at what happens when Apache receives an HTTP request:

At the heart of the request is the content generator. That's where your basic CGI and PHP scripts, as well as HelloWorld, live. Before that are a number of phases, which can broadly be characterised as:

  • First, Apache has to analyse the URL path to determine what configuration applies. That means matching the URL path first to <Location> sections in the configuration, then to the filesystem, whereupon <Directory>, <Files> and .htaccess configuration can be applied. A module may wish to get in early: for example, mod_alias or mod_rewrite may change how the request is mapped to the filesystem, or divert it completely.
  • Once all the configuration is known, Apache determines whether the client is authorised for the attempted operation. It can apply a range of different criteria - for example, host-based restrictions or password protection.
  • Once we have determined that the user is authorised, we apply any remaining configuration and determine exactly how the request will be processed. This commonly involves checking filename "extensions" to determine the handler and the response headers, and includes more advanced processing such as mod_negotiation selecting the variant of a document that best matches a particular client's request.
  • Now we run the core application and generate the response.
  • Finally, we log the request.

Of course there's more to it: Apache 1.3 defined nine phases of request processing, and 2.x defines 10 (including request creation), but that's the overview. Chapter 6 of my book gives details of the request processing cycle.

Now, whereas Apache 1 had a fixed sequence of request processing steps hardwired into its module structure, Apache 2 instead uses hooks. A hook offers an opportunity for modules to insert their own functions at server start-up, as we saw with the HelloWorld example:


static void helloworld_hooks(apr_pool_t *pool) {
  /* hook helloworld_handler into Apache */
  ap_hook_handler(helloworld_handler, NULL, NULL, APR_HOOK_MIDDLE);
}

The handler hook takes the place of the handler phase in Apache 1, and runs our function at that point in processing a request.

There are several advantages to this approach. In terms of the request processing sequence, it enables modules to define their order of processing very precisely, rather than depending on the order the modules are loaded. But, more interestingly, it generalises. Any module can define a new hook, thereby enabling other modules to insert a function into its own processing.

A new hook is unlikely to provide yet another phase in request processing: that's the business of the core. Rather, it will support hooking into something altogether different. For example:

  • mod_proxy provides hooks into the proxied request.
  • mod_authz_dbd (SQL based user authorization) supports "login"/"logout" operations, in which the internal state of the database is changed - typically by setting a "logged in" flag. It exports a hook so that another module managing client state - e.g. by setting a cookie - can tie its operations to serverside events.</li

In keeping with Apache's modular design, we prefer to avoid creating complex dependency chains where they can be avoided. Our HelloWorld module used the function ap_hook_handler to register a handler with the core. But what about calling ap_hook_foo to register a handler with some third-party mod_foo that exports its own hook? That creates a dependency on mod_foo. Sometimes we prefer to avoid that:

  • mod_include (server side includes) supports an <!--#exec cgi...--> directive that requires a CGI module to process. But most SSI directives work fine without CGI, and so should work even if CGI is disabled altogether on the server.
  • A cookie session management module may benefit from tying itself to serverside login/logout, but should work even when that is not available.

To support this kind of modular flexibility, Apache (actually APR) supports Optional Functions and Optional Hooks. Put succinctly, this gives us "Module X works whether or not Module Y is available, but has additional capabilities when it can call on optional parts of Module Y". Our login/logout example uses an optional hook, whereas our SSI example relies on optional functions. The latter adds an additional layer of flexibility: another module can define its own SSI events of the form <!--#myaction this="that" foo="bar"--> by registering its own functions with mod_include. A similar example is mod_rewrite's RewriteMap feature.

New and optional functions and hooks are one of three important mechanisms for modular extensibility introduced in Chapter 10 of my book. Others are:

  • The Provider mechanism, in which a module exports an interface somewhat like a C++ virtual base class or Java Interface for other modules to implement. For example, the XML Namespace API.
  • Exporting a service to other modules: for example, mod_dbd managing a database connection pool for other modules.

The extensible API isn't (for most of us) the most compelling reason to use Apache 2 - the filtering framework wins hands down there. But when you're building complex applications, it can make all the difference between a clean, well-engineered architecture and an ugly, unmaintainable hack.

Nick Kew is the author of The Apache Modules Book, available for discounted UK order (it was published on 2 February in the USA and will arrive in UK about a week later) here. ®

Beginner's guide to SSL certificates

More from The Register

next story
That dreaded syncing feeling: Will Microsoft EVER fix OneDrive?
Microsoft's long history of broken Windows sync
Mozilla, EFF, Cisco back free-as-in-FREE-BEER SSL cert authority
Let’s Encrypt to give HTTPS-everywhere a boost in 2015
SLURP! Flick your TONGUE around our LOLLIPOP – Google
Android 5 is coming – IF you're lucky enough to have the right gadget
Nokia's N1 fondleslab's HIDDEN BRILLIANCE: The 'Z Launcher'
Sugarcoating Android's Lollipop makes tab easier to swallow
Bug fixes! Get your APPLE BUG FIXES! iOS and OS X updates right here!
Yosemite fixes Wi-Fi hiccup, older iOS devices get performance boost
Facebook, working on Facebook at Work, works on Facebook. At Work
You don't want your cat or drunk pics at the office
Soz, web devs: Google snatches its Wallet off the table
Killing off web service in 3 months... but app-happy bonkers are fine
Meet Windows 10's new UI for OneDrive – also known as File Explorer
New preview build continues Redmond's retreat to the desktop
Microsoft: Your Linux Docker containers are now OURS to command
New tool lets admins wrangle Linux apps from Windows
prev story

Whitepapers

Choosing cloud Backup services
Demystify how you can address your data protection needs in your small- to medium-sized business and select the best online backup service to meet your needs.
Getting started with customer-focused identity management
Learn why identity is a fundamental requirement to digital growth, and how without it there is no way to identify and engage customers in a meaningful way.
Reg Reader Research: SaaS based Email and Office Productivity Tools
Read this Reg reader report which provides advice and guidance for SMBs towards the use of SaaS based email and Office productivity tools.
Simplify SSL certificate management across the enterprise
Simple steps to take control of SSL across the enterprise, and recommendations for a management platform for full visibility and single-point of control for these Certificates.
Saudi Petroleum chooses Tegile storage solution
A storage solution that addresses company growth and performance for business-critical applications of caseware archive and search along with other key operational systems.