God's gift to C

Crossing platforms with Apache Portable Runtime

Top 5 reasons to deploy VMware with Tegile

OK, here it is: a proper developer article. After all, this is Reg Developer, so it's about time. Today's subject is an Apache spinoff project: the APR lies at the heart of the webserver, but is also a standalone library, and is extensively used in separate projects: most famously Subversion.

But before diving in, here are a couple of things for your diary:

  • ApacheCon, the main Apache conference, will take place in Dublin, 26-30 June (two days tutorials + three days conference). Earlybird registration is available until 29 May. So, start working on the boss!
  • If you are a student, Google's Summer of Code may offer an opportunity to undertake a really interesting open source project, and get paid for it. Apache is one of many open source organisations participating.

The need for APR

Despite its age and patchy heritage, C has a stronger claim than any other programming language to be the industry standard. Yet, it has many shortcomings, and typically requires quite a lot more programming effort to work with than higher-level and more modern languages. For example, C lacks dynamically resizing strings and arrays, and data types such as hash tables that are standard in scripting languages. More fundamentally, C leaves all resource management in the hands of the programmer, and avoiding memory (or other resource) leaks can be a major chore. Finally, it is harder to write portable, cross-platform code in C than in scripting languages.

APR serves to deal with these issues, bridging the gap between C and a scripting language in terms of programmer productivity. The basic areas dealt with by APR include resource management, cross-platform programming, and a range of utility classes.

Resource Management

Perhaps the most fundamental barrier to productivity in C is the problem of resource management. C programmers are responsible for all resource allocation, and (generally much harder) ensuring that resources are always cleaned up after use, but never used after cleanup! Dealing with resource management can consume an utterly disproportionate amount of programmer effort, and generate very difficult bugs.

APR's solution to this is "pools", which lie at the heart of APR and Apache. Pools serve to allocate memory (faster than malloc on most platforms), and to ensure it is cleaned up at the appropriate time. They can also register cleanups for other resources, for example, to close a filehandle or socket, or release a lock. Typical usage of a pool is to tie it to an object with a well-defined lifetime (such as, in the webserver, a TCP connection or HTTP request), so objects can be allocated with that same lifetime and then just left.

Dynamic Resources

A secondary shortcoming in C is the lack of built-in support for dynamically-resizing resources such as strings and arrays. APR provides this too, built on top of dynamic memory allocation with pools. For example, whereas in C's stdio, we have:

sprintf(buf, fmt, varargs);

APR's strings module gives us instead:

buf = apr_psprintf(pool, fmt, varargs);

freeing the programmer from the need to compute the size of buf and allocate in advance, or make a guess and leave the code vulnerable to buffer overflows.

Basic Class Library

Higher-level and scripting languages have at least dynamic arrays and hashes as native datatypes. APR provides these for C programmers:

  • Array is implemented as a stack, and can be used as an automatically-resizing array or queue.
  • Hash is a hash table, in which keys and entries are (pointers to) arbitrary data types.
  • Table is a table indexed by character strings. As such, it is less general than the hash, but it supports a number of additional operations, such as merging multiple values for a key into a comma-separated list. It is used extensively in Apache to represent tables such as the HTTP headers in a request/response, where these operations are required.
  • Ring is a doubly-linked list. It is implemented in macros, and resembles a C++ Template.
  • Queue is a thread-safe FIFO queue.
  • Bucket is an arbitrary data container or source, that lies at the heart of Apache's I/O. Buckets are contained in bucket brigades, which are an instantiation of the ring.

Dynamic Resource Pools

Another level of resource management is the apr_reslist, which manages a dynamically-resizable pool of typically-complex resources. An example is Apache's DBD architecture, that uses a reslist to implement a pool of connections to a backend SQL database.

Portability Layer

Aside from resource management, the other fundamental purpose of the APR is to provide a common cross-platform API for operations falling outside standard C and involving a platform-specific library. This aims to encompass the platform-dependent operations likely to be used by Apache and its modules, such as:

  • I/O, including network and filesystem operations.
  • Process and thread management, conditions, mutexs.
  • Dynamic loading of code.
  • Identity management and system security.
  • Shared memory and memory mapping.
  • Signals, Events, Environment.

In addition to specific modules, there are general portability aids, such as pre-processor macros that expand to platform-specific declarations where necessary (Windows' dllimport/dllexport being prime examples where vendor-defined information that should belong to the build flags has to go in the source code).

Your humble scribe has worked mostly in environments where portability is not an issue. Before getting seriously involved with Apache, I'd ported Perl and Java reasonably painlessly, but found C and C++ more trouble than they were worth for nontrivial jobs. Writing Apache modules I've found I can develop on Linux or BSD and later compile on Solaris, MacOSX, or even Windows with no extra effort, or at worst half an hour’s worth of simple hacks. Of course, the APR portability layer is the key to this. As proof of the pudding, Site Valet (for which I am responsible) is moving from C++ STL-based classes to APR-based classes to make it maintainable across platforms.

More ..

In addition to these core functions, APR provides a range of utilities. On the one hand, basic but important things such as time/date and cryptographic APIs. On the other hand, high-level abstractions such as apr_dbm (DBM databases), apr_dbd (SQL databases), apr_ldap, and apr_memcache, in the tradition of scripting language abstractions such as Perl's Tie/AnyDBM and DBI/DBD. Last but not least, APR makes use of the pre-processor to implement the infrastructure for hooks that form the basis for the Apache module API and related constructs.

Using APR

As usual, I've passed the thousand words while still having a lot to say. So rather than show usage examples and the shape of an APR-based program here, I'll give you some URLs for further reading. If this article has aroused your interest in using the APR, INOUE Seiichiro's tutorial is a great introduction. Tutorials on selected APR topics in the context of Apache exist at Apache Tutor. My forthcoming book Applications Development with Apache devotes a complete chapter to a more extensive introduction to the APR. Finally, of course, the APR project site includes general information, downloads and API documentation. ®

Providing a secure and efficient Helpdesk

More from The Register

next story
Google+ goes TITSUP. But WHO knew? How long? Anyone ... Hello ...
Wobbly Gmail, Contacts, Calendar on the other hand ...
Preview redux: Microsoft ships new Windows 10 build with 7,000 changes
Latest bleeding-edge bits borrow Action Center from Windows Phone
Microsoft promises Windows 10 will mean two-factor auth for all
Sneak peek at security features Redmond's baking into new OS
UNIX greybeards threaten Debian fork over systemd plan
'Veteran Unix Admins' fear desktop emphasis is betraying open source
Google opens Inbox – email for people too stupid to use email
Print this article out and give it to someone techy if you get stuck
DEATH by PowerPoint: Microsoft warns of 0-day attack hidden in slides
Might put out patch in update, might chuck it out sooner
Redmond top man Satya Nadella: 'Microsoft LOVES Linux'
Open-source 'love' fairly runneth over at cloud event
prev story


Cloud and hybrid-cloud data protection for VMware
Learn how quick and easy it is to configure backups and perform restores for VMware environments.
A strategic approach to identity relationship management
ForgeRock commissioned Forrester to evaluate companies’ IAM practices and requirements when it comes to customer-facing scenarios versus employee-facing ones.
High Performance for All
While HPC is not new, it has traditionally been seen as a specialist area – is it now geared up to meet more mainstream requirements?
Three 1TB solid state scorchers up for grabs
Big SSDs can be expensive but think big and think free because you could be the lucky winner of one of three 1TB Samsung SSD 840 EVO drives that we’re giving away worth over £300 apiece.
Security for virtualized datacentres
Legacy security solutions are inefficient due to the architectural differences between physical and virtual environments.