A look at Apache modules

Now for some real Apache code

As I write this, I'm on the Eurostar train, just returning from O'Reilly's OSCon (Open Source Conference) in Brussels. There were some fascinating insights there, and even my own talk generated some interesting discussion. Some of the delegates, including O'Reilly himself, are promoting open source ideas going beyond software and into society more generally. I've touched on that in this very column before now, but a related argument that's new to me is that the open/closed debate in software could become largely sidelined, as the industry focuses on software as a service (such as Google's offerings) more than as a product.

Anyway, I've been meaning for some time to give you an article on writing Apache modules. But first things first. If you're going to write modules, you'll need a proper Apache installation. A safe option is to download the latest release version (currently 2.2.3) from httpd.apache.org, and install from that. If you're feeling adventurous, you could install the current development version from svn.apache.org. If you use a different package such as an rpm or deb, you'll probably need an "apache-dev" package as well as Apache itself.

What is a module?

As I'm sure I've said, Apache has a diverse developer and user profile, and we all have very different uses for it. Its approach to serving such a wide range of needs is a small core, together with a large number of modules. Most of what you get when you install Apache as a package, from apache.org or elsewhere, comprises the modules that perform Apache's standard functions. Even such a simple task as serving an "index.html" file involves various modules: for example, mod_dir to resolve the URL http://www.example.com/ to the file index.html, and mod_mime to determine its MIME type as "text/html" so the browser knows how to render it.
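To make the mod_mime step a little more concrete, here is a toy sketch of an extension-to-type lookup. It is not mod_mime's code: the table, the function name guess_mime_type and the fallback value are all assumptions made for illustration, and the real module takes its mappings from configuration rather than a hard-coded list.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>  /* strcasecmp (POSIX) */

/* Hypothetical lookup table, for illustration only. The real mod_mime
 * reads its mappings from configuration instead of hard-coding them. */
static const struct {
  const char *ext;
  const char *type;
} toy_types[] = {
  { "html", "text/html"  },
  { "txt",  "text/plain" },
  { "png",  "image/png"  },
};

/* Return a MIME type based on the final extension of a filename.
 * Unknown or missing extensions get a generic fallback, which is
 * an assumption of this sketch. */
static const char *guess_mime_type(const char *filename) {
  const char *dot = strrchr(filename, '.');
  size_t i;

  if (dot != NULL) {
    for (i = 0; i < sizeof(toy_types) / sizeof(toy_types[0]); i++) {
      /* Compare case-insensitively, so "LOGO.PNG" matches "png". */
      if (strcasecmp(dot + 1, toy_types[i].ext) == 0) {
        return toy_types[i].type;
      }
    }
  }
  return "application/octet-stream";
}
```

Calling guess_mime_type("index.html") returns "text/html", the value that ends up telling the browser how to render the page.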

Just as Apache's standard functions are driven by modules, so we can write new modules to change its behaviour, or introduce entirely new capabilities. Some examples of the kind of things we can do with modules include:

  • A content generator module takes an HTTP request and generates a response, in the manner of, for example, a CGI or PHP script. The default handler simply serves up a static file from local disc, while others may implement a service such as XML-RPC, do custom processing, or (like mod_cgi) delegate the work to an external script.
  • A mapper module runs before content generation, and determines how a request will be processed. For example, mod_negotiation selects amongst different versions of a document (e.g. different languages) according to browser preferences, while mod_alias and mod_rewrite perform rule-based URL manipulation.
  • An authentication module ascertains the identity of a user. When used, it is usually accompanied by an authorization module, which determines whether the user is permitted the attempted operation.
  • A filter module transforms incoming and/or outgoing data. Filters may be chained arbitrarily, and are the building blocks for sophisticated processing and aggregation applications. They range from simple content manipulation such as server side includes, through compression, to SSL encryption, and include many of the most exciting third-party applications.
  • A service module may export an entirely new API and/or service for other modules. For example, mod_dbd manages SQL database connections, and mod_xmlns exports an API for namespace-based processing of XML.

A HelloWorld Module

Conceptually, the simplest type of module is the content generator or handler, whose role in Apache is directly equivalent to a CGI or PHP script. That is to say, it processes a request in whatever manner is required, and generates a response to return to the client. It is not required to deal with the details of the HTTP protocol, though (as with a script) it may do that, or any number of other things. Usually it's good to keep the content generator simple, and use other types of module for different tasks.

So in the spirit of simplicity, let's take a look at a minimal HelloWorld module. Note that, unlike a script, this doesn't live amongst our web documents, so we can't run it straight from the filesystem. We'll need to configure it using a directive such as SetHandler instead:

LoadModule helloworld_module modules/mod_helloworld.so
<Location /helloworld>
        SetHandler helloworld
</Location>

Here's a function to return a HelloWorld page to the client. The prototype is typical: it takes the request_rec (HTTP Request) object as a single argument, and returns an integer status code. The request_rec provides access to everything a handler might need (such as the variables available to a script) and also serves as an I/O descriptor, among other things:

#include <httpd.h>
#include <http_protocol.h>
#include <http_config.h>

static int helloworld_handler(request_rec *r) {
  /* First, some housekeeping. */
  if (!r->handler || strcasecmp(r->handler, "helloworld") != 0) {
    /* r->handler wasn't "helloworld", so it's none of our business */
    return DECLINED;
  }

  if (r->method_number != M_GET) {
    /* We only accept GET and HEAD requests. Apache gives both the
     * method number M_GET, and they are identical for the purposes
     * of a content generator.
     * Returning an HTTP error code causes Apache to return an
     * error page (ErrorDocument) to the client.
     */
    return HTTP_METHOD_NOT_ALLOWED;
  }

  /* OK, we're happy with this request, so we'll return the response. */

  ap_set_content_type(r, "text/html");
  ap_rputs("<title>Hello World!</title> .... etc", r);

  /* we return OK to indicate that we have successfully processed
   * the request.  No further processing is required.
   */
  return OK;
}

So, that's our handler function. Now we need to hook it in to Apache's processing, so it will be run when we get a request for /helloworld. We use a special function that runs at server startup to register our handler with Apache:

static void helloworld_hooks(apr_pool_t *pool) {
  /* hook helloworld_handler in to Apache */
  ap_hook_handler(helloworld_handler, NULL, NULL, APR_HOOK_MIDDLE);
}
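It may help to picture what registering a handler means. The following is a toy model, not Apache's implementation: every name in it (toy_request, toy_hook_handler, toy_run_handler and the two sample handlers) is hypothetical, and it ignores hook ordering such as APR_HOOK_MIDDLE. It only illustrates the convention our handler relies on: registered handlers are tried in turn, and the first one that returns something other than DECLINED settles the request.

```c
#include <stddef.h>
#include <string.h>

/* Local stand-ins for Apache's OK and DECLINED (hypothetical names). */
#define TOY_OK         0
#define TOY_DECLINED (-1)
#define TOY_MAX_HOOKS  8

/* A drastically reduced stand-in for request_rec. */
typedef struct {
  const char *handler;    /* name set by configuration, e.g. SetHandler */
  const char *served_by;  /* filled in by whichever handler accepts */
} toy_request;

typedef int (*toy_handler_fn)(toy_request *r);

static toy_handler_fn toy_hooks[TOY_MAX_HOOKS];
static size_t toy_nhooks = 0;

/* Register a handler. Returns 0 on success, -1 if the table is full. */
static int toy_hook_handler(toy_handler_fn fn) {
  if (toy_nhooks >= TOY_MAX_HOOKS) {
    return -1;
  }
  toy_hooks[toy_nhooks++] = fn;
  return 0;
}

/* Try each registered handler in registration order and stop at the
 * first one that does not decline. */
static int toy_run_handler(toy_request *r) {
  size_t i;
  for (i = 0; i < toy_nhooks; i++) {
    int rv = toy_hooks[i](r);
    if (rv != TOY_DECLINED) {
      return rv;
    }
  }
  return TOY_DECLINED;  /* nobody wanted this request */
}

/* Two sample handlers that each accept only their own handler name. */
static int toy_hello(toy_request *r) {
  if (!r->handler || strcmp(r->handler, "helloworld") != 0) {
    return TOY_DECLINED;
  }
  r->served_by = "toy_hello";
  return TOY_OK;
}

static int toy_other(toy_request *r) {
  if (!r->handler || strcmp(r->handler, "other") != 0) {
    return TOY_DECLINED;
  }
  r->served_by = "toy_other";
  return TOY_OK;
}
```

With both sample handlers registered, a request whose handler field is "helloworld" passes straight through toy_other, which declines it, and is then served by toy_hello. This is why the housekeeping check at the top of a handler matters: in this model every registered handler is offered each request until one accepts it.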

The helloworld_hooks function is itself part of the module object. For most modules, the module object is the only symbol exported and visible to other modules or the core:

module AP_MODULE_DECLARE_DATA helloworld_module = {
  STANDARD20_MODULE_STUFF,
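  NULL,              /* create per-directory config */
  NULL,              /* merge per-directory config */
  NULL,              /* create per-server config */
  NULL,              /* merge per-server config */
  NULL,              /* configuration directives (command table) */
  helloworld_hooks   /* register our hooks */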
};

And that's all! We have a complete HelloWorld module. We can use apxs, a compiler-wrapper that is part of the Apache installation, to compile and (as root) install it:

$ apxs -c mod_helloworld.c
# apxs -ie mod_helloworld.la

Well, as usual I'm over the 1000 words, so I'll bring this first look at modules to a close. For further information, stay tuned. If you're seriously interested, my book is now in production with the publisher, so for the first time you have more than just the source code and a handful of ad-hoc materials to help upgrade your LAMP and application server skills!
