God's gift to C
Crossing platforms with Apache Portable Runtime
OK, here it is: a proper developer article. After all, this is Reg Developer, so it's about time. Today's subject is an Apache spinoff project: the Apache Portable Runtime (APR) lies at the heart of the webserver, but is also a standalone library used extensively in separate projects, most famously Subversion.
But before diving in, here are a couple of things for your diary:
- ApacheCon, the main Apache conference, will take place in Dublin, 26-30 June (two days tutorials + three days conference). Earlybird registration is available until 29 May. So, start working on the boss!
- If you are a student, Google's Summer of Code may offer an opportunity to undertake a really interesting open source project, and get paid for it. Apache is one of many open source organisations participating.
The need for APR
Despite its age and patchy heritage, C has a stronger claim than any other programming language to be the industry standard. Yet, it has many shortcomings, and typically requires quite a lot more programming effort to work with than higher-level and more modern languages. For example, C lacks dynamically resizing strings and arrays, and data types such as hash tables that are standard in scripting languages. More fundamentally, C leaves all resource management in the hands of the programmer, and avoiding memory (or other resource) leaks can be a major chore. Finally, it is harder to write portable, cross-platform code in C than in scripting languages.
APR serves to deal with these issues, bridging the gap between C and a scripting language in terms of programmer productivity. The basic areas dealt with by APR include resource management, cross-platform programming, and a range of utility classes.
Perhaps the most fundamental barrier to productivity in C is the problem of resource management. C programmers are responsible for all resource allocation, and (generally much harder) ensuring that resources are always cleaned up after use, but never used after cleanup! Dealing with resource management can consume an utterly disproportionate amount of programmer effort, and generate very difficult bugs.
APR's solution to this is "pools", which lie at the heart of APR and Apache. Pools serve to allocate memory (faster than malloc on most platforms), and to ensure it is cleaned up at the appropriate time. They can also register cleanups for other resources, for example, to close a filehandle or socket, or release a lock. Typical usage of a pool is to tie it to an object with a well-defined lifetime (such as, in the webserver, a TCP connection or HTTP request), so objects can be allocated with that same lifetime and then just left.
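To make the idea concrete, here is a toy pool in plain C. It is a sketch of the concept only, and emphatically not how APR implements pools: every toy_ name is invented for illustration, and each allocation simply goes through malloc. The real API's corresponding calls are apr_pool_create, apr_palloc, apr_pool_cleanup_register and apr_pool_destroy.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy pool: an illustration of the idea only, NOT APR's implementation. */
typedef void (*toy_cleanup_fn)(void *data);

typedef struct toy_cleanup {
    toy_cleanup_fn fn;
    void *data;
    struct toy_cleanup *next;
} toy_cleanup;

typedef struct toy_pool {
    toy_cleanup *cleanups;  /* most recently registered first */
} toy_pool;

toy_pool *toy_pool_create(void)
{
    return calloc(1, sizeof(toy_pool));
}

/* Register an arbitrary cleanup: close a file, release a lock, ... */
int toy_pool_cleanup_register(toy_pool *p, void *data, toy_cleanup_fn fn)
{
    toy_cleanup *c;

    assert(p != NULL && fn != NULL);
    c = malloc(sizeof *c);
    if (c == NULL)
        return -1;
    c->fn = fn;
    c->data = data;
    c->next = p->cleanups;
    p->cleanups = c;
    return 0;
}

/* Allocate memory whose lifetime is tied to the pool. */
void *toy_palloc(toy_pool *p, size_t size)
{
    void *mem;

    assert(p != NULL);
    mem = malloc(size);
    if (mem != NULL && toy_pool_cleanup_register(p, mem, free) != 0) {
        free(mem);
        mem = NULL;
    }
    return mem;
}

/* Run every cleanup, newest first, then release the pool itself. */
void toy_pool_destroy(toy_pool *p)
{
    assert(p != NULL);
    while (p->cleanups != NULL) {
        toy_cleanup *c = p->cleanups;
        p->cleanups = c->next;
        c->fn(c->data);
        free(c);
    }
    free(p);
}

/* Hypothetical cleanup standing in for "close a filehandle". */
void toy_mark_closed(void *data)
{
    *(int *)data = 1;
}

/* One "request": allocate freely, never free by hand.
   Returns 1 if the registered cleanup ran, -1 on allocation failure. */
int toy_pool_demo(void)
{
    int closed = 0;
    toy_pool *request = toy_pool_create();
    char *buf;

    if (request == NULL)
        return -1;
    buf = toy_palloc(request, 128);
    if (buf != NULL)
        buf[0] = '\0';
    toy_pool_cleanup_register(request, &closed, toy_mark_closed);
    toy_pool_destroy(request);  /* runs the cleanup and frees buf */
    return closed;
}
```

The point to notice is toy_pool_demo: nothing allocated during the "request" is freed individually, yet nothing leaks, because everything shares the lifetime of the pool.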
A secondary shortcoming in C is the lack of built-in support for dynamically-resizing resources such as strings and arrays. APR provides this too, built on top of dynamic memory allocation with pools. For example, whereas in C's stdio, we have:
sprintf(buf, fmt, varargs);
APR's strings module gives us instead:
buf = apr_psprintf(pool, fmt, varargs);
freeing the programmer from the need to compute the size of buf and allocate in advance, or make a guess and leave the code vulnerable to buffer overflows.
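As a rough illustration of how a function of this shape can size its own buffer, here is a sketch in standard C99 that formats twice: once to measure, once to write. The name toy_asprintf is hypothetical and this is not APR's implementation; in particular, the caller must free() the result here, whereas with apr_psprintf the pool owns the memory.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of APR: format into a buffer sized to fit.
   Returns a malloc'ed string the caller must free, or NULL on failure. */
char *toy_asprintf(const char *fmt, ...)
{
    va_list ap, ap2;
    int len;
    char *buf;

    assert(fmt != NULL);
    va_start(ap, fmt);
    va_copy(ap2, ap);
    len = vsnprintf(NULL, 0, fmt, ap);  /* first pass: measure only */
    va_end(ap);
    if (len < 0) {
        va_end(ap2);
        return NULL;
    }

    buf = malloc((size_t)len + 1);
    if (buf != NULL)
        vsnprintf(buf, (size_t)len + 1, fmt, ap2);  /* second pass: write */
    va_end(ap2);
    return buf;
}

/* Returns 1 if the formatted string is what we expect. */
int toy_asprintf_demo(void)
{
    char *s = toy_asprintf("%s %d", "status", 200);
    int ok = (s != NULL && strcmp(s, "status 200") == 0);

    free(s);
    return ok;
}
```

Either way, the caller never has to guess a buffer size, which is where the overflow bugs come from.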
Basic Class Library
Higher-level and scripting languages have at least dynamic arrays and hashes as native datatypes. APR provides these for C programmers:
- Array is implemented as a stack, and can be used as an automatically-resizing array or queue.
- Hash is a hash table, in which keys and entries are (pointers to) arbitrary data types.
- Table is a table indexed by character strings. As such, it is less general than the hash, but it supports a number of additional operations, such as merging multiple values for a key into a comma-separated list. It is used extensively in Apache to represent tables such as the HTTP headers in a request/response, where these operations are required.
- Ring is a doubly-linked list. It is implemented in macros, and resembles a C++ template.
- Queue is a thread-safe FIFO queue.
- Bucket is an arbitrary data container or source that lies at the heart of Apache's I/O. Buckets are contained in bucket brigades, which are an instantiation of the ring.
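The merge operation described for the table is perhaps the least familiar item in that list, so here is a toy version in plain C. This is a sketch under stated assumptions, not APR's table: the toy_ names, the fixed capacity of 16 entries, the exact-match strcmp lookup and the ", " separator are all choices made for illustration, and memory is managed by hand rather than with a pool.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TOY_TABLE_MAX 16  /* arbitrary capacity for the sketch */

typedef struct { char *key; char *val; } toy_entry;
typedef struct { toy_entry e[TOY_TABLE_MAX]; int n; } toy_table;

static char *toy_strdup(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);

    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

void toy_table_init(toy_table *t)
{
    assert(t != NULL);
    t->n = 0;
}

const char *toy_table_get(const toy_table *t, const char *key)
{
    int i;

    assert(t != NULL && key != NULL);
    for (i = 0; i < t->n; i++)
        if (strcmp(t->e[i].key, key) == 0)
            return t->e[i].val;
    return NULL;
}

/* If key exists, turn its value into "old, val"; otherwise add a new entry. */
int toy_table_merge(toy_table *t, const char *key, const char *val)
{
    char *k, *v;
    int i;

    assert(t != NULL && key != NULL && val != NULL);
    for (i = 0; i < t->n; i++) {
        if (strcmp(t->e[i].key, key) == 0) {
            size_t len = strlen(t->e[i].val) + 2 + strlen(val) + 1;
            char *merged = malloc(len);

            if (merged == NULL)
                return -1;
            snprintf(merged, len, "%s, %s", t->e[i].val, val);
            free(t->e[i].val);
            t->e[i].val = merged;
            return 0;
        }
    }
    if (t->n == TOY_TABLE_MAX)
        return -1;
    k = toy_strdup(key);
    v = toy_strdup(val);
    if (k == NULL || v == NULL) {
        free(k);
        free(v);
        return -1;
    }
    t->e[t->n].key = k;
    t->e[t->n].val = v;
    t->n++;
    return 0;
}

void toy_table_clear(toy_table *t)
{
    assert(t != NULL);
    while (t->n > 0) {
        t->n--;
        free(t->e[t->n].key);
        free(t->e[t->n].val);
    }
}

/* Two Accept values end up as one comma-separated header. Returns 1 on success. */
int toy_table_demo(void)
{
    toy_table headers;
    const char *accept;
    int ok;

    toy_table_init(&headers);
    toy_table_merge(&headers, "Accept", "text/html");
    toy_table_merge(&headers, "Accept", "text/plain");
    accept = toy_table_get(&headers, "Accept");
    ok = (accept != NULL && strcmp(accept, "text/html, text/plain") == 0);
    toy_table_clear(&headers);
    return ok;
}
```

Note how much of the sketch is bookkeeping for malloc and free. That is exactly the chore that allocating from a pool removes.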
Dynamic Resource Pools
Another level of resource management is the apr_reslist, which manages a dynamically-resizable pool of typically-complex resources. An example is Apache's DBD architecture, which uses a reslist to implement a pool of connections to a backend SQL database.
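The general shape of such a resource list can be sketched in a few lines of plain C. What follows is a single-threaded toy, assuming a hard limit of four resources and user-supplied constructor and destructor callbacks; the toy_ names and the fake "connection" are invented for illustration and say nothing about how apr_reslist itself is written.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_RESLIST_MAX 4  /* hard limit on live resources (arbitrary) */

typedef void *(*toy_ctor)(void *params);
typedef void (*toy_dtor)(void *resource, void *params);

typedef struct {
    void *idle[TOY_RESLIST_MAX];  /* resources waiting to be reused */
    int n_idle;
    int n_total;                  /* created and not yet destroyed */
    toy_ctor ctor;
    toy_dtor dtor;
    void *params;
} toy_reslist;

void toy_reslist_init(toy_reslist *rl, toy_ctor ctor, toy_dtor dtor,
                      void *params)
{
    assert(rl != NULL && ctor != NULL && dtor != NULL);
    rl->n_idle = 0;
    rl->n_total = 0;
    rl->ctor = ctor;
    rl->dtor = dtor;
    rl->params = params;
}

/* Reuse an idle resource if there is one, else create one if allowed. */
void *toy_reslist_acquire(toy_reslist *rl)
{
    void *r;

    assert(rl != NULL);
    if (rl->n_idle > 0)
        return rl->idle[--rl->n_idle];
    if (rl->n_total == TOY_RESLIST_MAX)
        return NULL;  /* limit reached; a real list might block instead */
    r = rl->ctor(rl->params);
    if (r != NULL)
        rl->n_total++;
    return r;
}

/* Hand a resource back so that a later acquire can reuse it. */
void toy_reslist_release(toy_reslist *rl, void *resource)
{
    assert(rl != NULL && resource != NULL && rl->n_idle < rl->n_total);
    rl->idle[rl->n_idle++] = resource;
}

/* Destroy all idle resources. */
void toy_reslist_destroy(toy_reslist *rl)
{
    assert(rl != NULL);
    while (rl->n_idle > 0) {
        rl->dtor(rl->idle[--rl->n_idle], rl->params);
        rl->n_total--;
    }
}

/* Hypothetical "database connection" that only counts opens and closes. */
typedef struct { int opened; int closed; } toy_stats;

void *toy_fake_connect(void *params)
{
    toy_stats *s = params;

    s->opened++;
    return malloc(1);
}

void toy_fake_disconnect(void *resource, void *params)
{
    toy_stats *s = params;

    s->closed++;
    free(resource);
}
```

Acquiring, releasing and acquiring again returns the same "connection" without a second call to the constructor, which is the whole point when the constructor is an expensive database login.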
Aside from resource management, the other fundamental purpose of the APR is to provide a common cross-platform API for operations falling outside standard C and involving a platform-specific library. This aims to encompass the platform-dependent operations likely to be used by Apache and its modules, such as:
- I/O, including network and filesystem operations.
- Process and thread management, conditions, mutexes.
- Dynamic loading of code.
- Identity management and system security.
- Shared memory and memory mapping.
- Signals, Events, Environment.
In addition to specific modules, there are general portability aids, such as pre-processor macros that expand to platform-specific declarations where necessary (Windows' dllimport/dllexport being prime examples where vendor-defined information that should belong to the build flags has to go in the source code).
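A declaration macro of this kind might look something like the sketch below. TOY_DECLARE and TOY_BUILDING_DLL are hypothetical names chosen for the example (APR's own counterpart is APR_DECLARE), and the real thing handles more cases than this, so read it as the general pattern rather than a copy of APR's headers.

```c
#include <assert.h>

/* TOY_DECLARE and TOY_BUILDING_DLL are hypothetical names for this sketch. */
#if defined(_WIN32) && defined(TOY_BUILDING_DLL)
#  define TOY_DECLARE(type) __declspec(dllexport) type
#elif defined(_WIN32)
#  define TOY_DECLARE(type) __declspec(dllimport) type
#else
#  define TOY_DECLARE(type) type  /* elsewhere the macro adds nothing */
#endif

/* What would go in the public header: */
TOY_DECLARE(int) toy_clamp(int value, int lo, int hi);

/* What would go in the library source, which on Windows would be
   compiled with TOY_BUILDING_DLL defined: */
TOY_DECLARE(int) toy_clamp(int value, int lo, int hi)
{
    assert(lo <= hi);
    if (value < lo)
        return lo;
    if (value > hi)
        return hi;
    return value;
}
```

The same header then serves the library and its users on every platform, with the vendor-specific noise confined to one macro.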
Your humble scribe has worked mostly in environments where portability is not an issue. Before getting seriously involved with Apache, I'd ported Perl and Java reasonably painlessly, but found C and C++ more trouble than they were worth for nontrivial jobs. Writing Apache modules I've found I can develop on Linux or BSD and later compile on Solaris, Mac OS X, or even Windows with no extra effort, or at worst half an hour’s worth of simple hacks. Of course, the APR portability layer is the key to this. As proof of the pudding, Site Valet (for which I am responsible) is moving from C++ STL-based classes to APR-based classes to make it maintainable across platforms.
In addition to these core functions, APR provides a range of utilities. These run from basic but important things such as time/date and cryptographic APIs to high-level abstractions such as apr_dbm (DBM databases), apr_dbd (SQL databases), apr_ldap, and apr_memcache, in the tradition of scripting language abstractions such as Perl's Tie/AnyDBM and DBI/DBD. Last but not least, APR makes use of the pre-processor to implement the infrastructure for hooks that form the basis for the Apache module API and related constructs.
As usual, I've passed the thousand words while still having a lot to say. So rather than show usage examples and the shape of an APR-based program here, I'll give you some URLs for further reading. If this article has aroused your interest in using the APR, INOUE Seiichiro's tutorial is a great introduction. Tutorials on selected APR topics in the context of Apache exist at Apache Tutor. My forthcoming book Applications Development with Apache devotes a complete chapter to a more extensive introduction to the APR. Finally, of course, the APR project site includes general information, downloads and API documentation. ®