Original URL: http://www.theregister.co.uk/2006/09/16/human_iterate/

To iterate is human

Why it’s three, not four – or five

By Kevlin Henney

Posted in Developer, 16th September 2006 10:02 GMT

In the previous article, I made the following observation:

A collection that holds its elements but doesn't allow you to traverse them is unlikely to prove popular. There are many ways to offer traversal, but if the caller needs to be able to know the position of elements in some way there are essentially only three general designs that keep the collection's internal representation hidden from the caller.

And a reader made the following comment:

A fourth option, which is in the general case superior to all three of your options, is the internal iterator. It delegates management of the iteration to the collection, rather than adding repetitive boilerplate to your functional code.

In this particular case, the design pattern mentioned is also known as the Enumeration Method pattern. The use of this pattern was mentioned in the article, along with a link to a corresponding article, although it appears that it was overlooked. However, the posted comment touches on a whole topic area that deserves more attention than there was either space or topic focus for in the previous article. In fact, it deserves at least a whole article!

In essence, where Iterator is normally characterised by the introduction of an additional kind of object that performs iteration over some kind of collection, Enumeration Method introduces a method on the collection that calls back on a piece of supplied code for each element in the collection. The key benefit here is that you are only dealing with a single unit of design responsibility. All the mechanics of iteration are contained within the collection. Very compact, very cohesive. Nice — just what we want.
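To make the contrast concrete, here is a minimal sketch of one collection offering both styles. The Playlist and PlaylistIterator classes and the each method are hypothetical names chosen for illustration, and Python is used only because it keeps the example short; nothing here is taken from a particular library.

```python
class Playlist:
    """Hypothetical collection used only to contrast the two patterns."""

    def __init__(self, *tracks):
        self._tracks = list(tracks)

    def __iter__(self):
        # Iterator: hand the caller a separate object that tracks position.
        return PlaylistIterator(self._tracks)

    def each(self, action):
        # Enumeration Method: the collection runs the loop itself and
        # calls back on the supplied code for every element.
        for track in self._tracks:
            action(track)


class PlaylistIterator:
    """The additional kind of object that Iterator introduces."""

    def __init__(self, tracks):
        self._tracks = tracks
        self._position = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._position >= len(self._tracks):
            raise StopIteration
        track = self._tracks[self._position]
        self._position += 1
        return track


playlist = Playlist("Intro", "Theme", "Outro")

# Iterator style: the caller owns the loop.
titles_via_iterator = [track for track in playlist]

# Enumeration Method style: the caller supplies only the loop body.
titles_via_method = []
playlist.each(titles_via_method.append)
```

Both routes visit the same elements in the same order. The difference lies in who owns the loop and where the notion of position lives.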

However, there are some other considerations at play: some are related to matters of perception; others are related to questions of practice; the rest are related to requirements and design goals; all are related to understanding the context that frames, and the forces that drive, a particular pattern's application.

The Gang of Four tried to shoehorn this additional iteration pattern, in the guise of Internal Iterator, into their write-up of Iterator but, it must be said, with limited success. Its presence is fairly low key, with most of the detail tucked away towards the end of the pattern write-up, the last item in the Sample Code section. The unification of these two different design approaches was something of a compromise to try to accommodate a single iteration pattern for all OO-related languages. Specifically, an attempt to square a C++ view of the world with a Smalltalk view of the world... which is always a challenge.

However, the fundamentally different design structures, philosophies and trade-offs of the two approaches undermine any claim that they can be considered the same pattern. A pattern represents a recurring design solution, with an associated set of consequences, to a recurring problem whose forces are understood and arise within a specific context. It turns out that although both can be characterised in the general sense as iteration patterns, the similarity ends there: the two approaches have almost nothing in common; the consequences of applying each one have almost nothing in common; indeed, even the problem forces that they resolve differ in the detail.

As a distinct pattern, Enumeration Method was first properly documented by Kent Beck in Smalltalk Best Practice Patterns. However, it is not a pattern that is restricted to Smalltalk: it can be applied in C, using function pointers, such as EnumChildWindows in the Windows API; it can be used in Java, based on the common Command pattern and the specific use of inner classes to achieve a sense of closure; it is the common form of iteration in Ruby, which supports blocks as objects directly, and where these are commonly (but confusingly, for our purposes) also known as iterators; for the functional programmers amongst you, it is in essence the map function from an object-centred perspective.

Of course, the ease with which Enumeration Method can be implemented and used, and therefore its applicability, is an important consideration. It would be too simple to claim that Iterators are necessarily more verbose than Enumeration Methods and that Enumeration Methods are generally superior, for the simple reason that such a claim needs to be made and measured against a specific context. Understanding the role context plays in design is perhaps one of the most important, but most overlooked, aspects of successful pattern application.

In a language where blocks are supported natively as objects, such as Smalltalk and Ruby, implementing Iterators without appropriate cause might be considered quite curious and more than a little gratuitous. However, although it is said that "what's good for the goose is good for the gander", it doesn't follow that it's always such good source for the mongoose.

In a language that doesn't support closures, it turns out that even though implementing the Enumeration Method itself is normally easy, using it can be something of a pain, shifting the complexity from the collection writer to the collection user. This applies to a greater or lesser degree depending on what other features a language supports and what its native library style is. For example, assuming that closures are adopted in Java 6, Enumeration Method will become easier to implement and use in Java. For the moment, however, although anonymous inner classes make a block-like approach possible, the resulting syntactic overhead is somewhat cumbersome if you aren't getting any obvious additional benefit. So unless there is a specific reason to do otherwise, such as recursive traversal or synchronized traversal, it is far wiser to favour Iterator as the default approach in Java: both the language and the library are geared up to support it, and writing an Iterator correctly is not a significant challenge.

We could go on to talk about Python's approach to iteration, or the diversity of styles that can be conveniently supported in C# 2.0, or the style of iteration used in C++ that supports the concept of generic programming, or understand how simple and effective map is in Scheme, or the relationship between iteration in Ruby and CLU, and so on. But, for the sake of brevity, I'll stop the language listing there. Hopefully you get the general idea: there is no single option that is best across all languages.

So, whatever its merits elsewhere, if a particular pattern cuts across the grain of a language's features and its received idioms, it is normally easier to go with the flow than against it. You don't want to be writing code that is unnecessarily complex by virtue of an unquestioned idiom import from elsewhere; an idiom that fails to add any noticeable advantage over the more native idiom for a typical case of application.

But let's be clear: that is "normally easier", not "always easier". It pays to be a polyglot: you want your design gestalt to contain more than just a nice set of unquestioned defaults informed only by a single language. To be able to select the mot juste, whatever the situation, you want your design vocabulary to be able to draw on multiple sources. You need more than one idea.

I've already mentioned that for recursive data structures, Enumeration Method is much easier to implement than Iterator. Likewise, where you have a collection shared between threads and you want to support uninterrupted traversals, it is much easier to use. These situations sound quite specific, but having explained why Enumeration Method is a less appropriate approach than Iterator for most uses of iteration in certain languages, I would like to ensure that not too much design territory is ceded.
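Both situations can be sketched in a few lines. The SharedTree class below is a hypothetical binary search tree, assumed to be shared between threads, whose each method holds a lock for the whole of a recursive in-order walk. The names and the choice of a plain threading.Lock are assumptions made for illustration.

```python
import threading


class _Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


class SharedTree:
    """Hypothetical binary search tree shared between threads."""

    def __init__(self):
        self._root = None
        self._lock = threading.Lock()

    def add(self, value):
        with self._lock:
            if self._root is None:
                self._root = _Node(value)
                return
            node = self._root
            while True:
                if value < node.value:
                    if node.left is None:
                        node.left = _Node(value)
                        return
                    node = node.left
                else:
                    if node.right is None:
                        node.right = _Node(value)
                        return
                    node = node.right

    def each(self, action):
        # Enumeration Method: the lock is held for the entire traversal,
        # so no other thread can add elements part-way through.
        with self._lock:
            self._walk(self._root, action)

    def _walk(self, node, action):
        # The recursion keeps track of position on the call stack; an
        # external Iterator would have to manage an explicit stack instead.
        if node is not None:
            self._walk(node.left, action)
            action(node.value)
            self._walk(node.right, action)


tree = SharedTree()
for value in [5, 2, 8, 1, 9]:
    tree.add(value)

seen = []
tree.each(seen.append)  # visits the values in ascending order
```

One caveat with this sketch: because the lock is not re-entrant, the supplied action must not call back into the tree, or the traversal would deadlock.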

Enumeration Method is a general design pattern, not just a language-specific idiom. In the right situation, it offers a number of significant benefits. For example, if you have a relatively stable set of actions you want to perform during an iteration, the fact that a language does not support closures becomes less of an issue. Passing blocks as objects is particularly effective when iteration is ad hoc but, when the kinds of loop action you have are well characterised and bounded, common actions can be wrapped up and provided as predefined Command classes.
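As a rough illustration of predefined Command classes, the sketch below pairs a hypothetical Readings collection with two ready-made commands, Total and Largest. The execute method name is an assumption made for the example, not a fixed convention.

```python
class Total:
    """Predefined command: accumulates the sum of the elements it is given."""

    def __init__(self):
        self.value = 0

    def execute(self, element):
        self.value += element


class Largest:
    """Predefined command: remembers the largest element it is given."""

    def __init__(self):
        self.value = None

    def execute(self, element):
        if self.value is None or element > self.value:
            self.value = element


class Readings:
    """Hypothetical collection exposing an Enumeration Method."""

    def __init__(self, *values):
        self._values = list(values)

    def each(self, command):
        # The collection owns the loop; the command owns the loop action.
        for value in self._values:
            command.execute(value)


readings = Readings(3, 9, 4)

total = Total()
readings.each(total)

largest = Largest()
readings.each(largest)
```

With the common actions packaged in this way, the caller writes neither a loop nor a block. It picks a command, passes it in and reads off the result.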

Alternatively, when traversing aggregate data structures holding different kinds of elements, such as objects that represent the syntax structure of source code or a document, Enumeration Method with a Visitor interface offers a far simpler programming model than working multiple nested loops, multiple Iterator types and explicit runtime type checking. It is also much easier to write unit tests against such structures by passing Mock Objects to the Enumeration Method.
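The following sketch shows the shape of this combination for a deliberately tiny document model. Heading, Paragraph, Document and WordCounter are hypothetical classes invented for the example; the mock comes from Python's standard unittest.mock module.

```python
from unittest.mock import Mock, call


class Heading:
    def __init__(self, text):
        self.text = text

    def accept(self, visitor):
        visitor.visit_heading(self)


class Paragraph:
    def __init__(self, text):
        self.text = text

    def accept(self, visitor):
        visitor.visit_paragraph(self)


class Document:
    """Hypothetical aggregate holding different kinds of elements."""

    def __init__(self, *elements):
        self._elements = list(elements)

    def each(self, visitor):
        # Enumeration Method: each element dispatches to the matching
        # visit method, so the caller never checks types at runtime.
        for element in self._elements:
            element.accept(visitor)


class WordCounter:
    """A visitor that counts words in headings and paragraphs."""

    def __init__(self):
        self.words = 0

    def visit_heading(self, heading):
        self.words += len(heading.text.split())

    def visit_paragraph(self, paragraph):
        self.words += len(paragraph.text.split())


heading = Heading("To iterate is human")
paragraph = Paragraph("Why it is three, not four")
document = Document(heading, paragraph)

counter = WordCounter()
document.each(counter)

# Unit testing the traversal itself: pass a mock in place of a visitor
# and check which callbacks were made, and in what order.
mock_visitor = Mock()
document.each(mock_visitor)
expected_calls = [call.visit_heading(heading), call.visit_paragraph(paragraph)]
```

The test needs no knowledge of how the document stores its elements. It only compares mock_visitor.mock_calls with the expected sequence of callbacks.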

But what of the original observation that kicked off this article? If you recall:

There are many ways to offer traversal, but if the caller needs to be able to know the position of elements in some way there are essentially only three general designs that keep the collection's internal representation hidden from the caller.

Whenever you mention magic numbers — three in this case — you implicitly invite others to check your working (in a discipline where we don't do nearly as much reviewing and checking as we should, this is no bad thing). The claim is that I was off by one and that, including Enumeration Method (or Internal Iterator), there are four. Given that I seem to be quite a fan of Enumeration Method, have written about it going back many years, and even mention it in the very same article, is this correction correct? Not quite, and it is worth understanding why.

Had the problem to be solved just been how to provide some mechanism for traversal that kept the collection's internal representation hidden from the caller, three would have been right out. In fact, so would four. Here's a summary of the three approaches from the previous article:

First, it is possible to use an index into the collection. [...] The second approach is to internalise a cursor within the collection. [...] And the third option is to introduce an Iterator object.

And then add Enumeration Method. And then add Batch Method. And then you can choose whether or not to count separately variations of these or combinations that draw on more than one approach, such as Batch Iterator (also known as Chunky Iterator).

However, the phrasing of the design issue was more constrained:

... if the caller needs to be able to know the position of elements in some way...

This constraint is important because not only does it subset the available solutions, it also highlights a significant distinction between Iterator and Enumeration Method. While many choices are driven by idiom and context, keep in mind that the detail of the problem being solved casts an important vote (or veto) in choosing an appropriate solution. In this particular case, one of Enumeration Method's strengths — that of fully encapsulating any concept of position and iteration mechanism from the caller — is a mismatch for the problem we want to solve. What are needed are solutions that have the property of both traversal and persistent position, hence the inclusion of Iterator and the exclusion of Enumeration Method.
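The distinction can be seen in a small sketch. The hypothetical Inbox below offers both an Iterator and an Enumeration Method, but only the Iterator gives the caller a position that it can hold on to, set aside and return to later. All of the names are assumptions made for illustration.

```python
class Inbox:
    """Hypothetical collection offering both styles of traversal."""

    def __init__(self, *messages):
        self._messages = list(messages)

    def iterator(self):
        return InboxIterator(self._messages)

    def each(self, action):
        # The whole loop runs inside this call; no position survives it.
        for message in self._messages:
            action(message)


class InboxIterator:
    """Encapsulates a position that persists between calls."""

    def __init__(self, messages):
        self._messages = messages
        self._position = 0

    def has_next(self):
        return self._position < len(self._messages)

    def next(self):
        message = self._messages[self._position]
        self._position += 1
        return message


inbox = Inbox("a", "b", "c", "d")

# The caller reads two messages and then goes off to do something else...
bookmark = inbox.iterator()
first_page = [bookmark.next(), bookmark.next()]

# ...and later resumes from exactly where it left off.
second_page = []
while bookmark.has_next():
    second_page.append(bookmark.next())

# The Enumeration Method always runs from start to finish in one call.
everything = []
inbox.each(everything.append)
```

The bookmark object is what satisfies the constraint: it stands for a position without revealing how the collection is represented.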

So, to summarise, Iterator supports traversal by encapsulating position and offering traversal control whereas Enumeration Method supports traversal by encapsulating the whole loop and all of its associated mechanisms. Specific requirements and general context help to determine which is the more appropriate solution in a given situation.