Don't Forget The 'C' in Objective-C
Part 2: Runtime efficiency issues in Mac Cocoa programming
Last time round, we looked at the way an unnamed developer had used Cocoa routines to chop up a simple C-string in order to determine whether or not it contained a particular OpenGL extension name.
A few folks pointed out that if, hypothetically speaking, some new extension gets devised whose name happens to be a superstring of the extension name we're after, my simple strstr solution would break. Absolutely right, but this rather misses the point of the discussion, which was about writing efficient code in Cocoa and using the right tools for the job.
This time, I want to dig a little deeper by looking at the code generated by gcc in response to your carefully crafted Objective-C code. I'm hoping to demonstrate that you will often get more than you bargained for.
To begin with, consider a simple method invocation such as:
NSCursor * cursor = [NSCursor crosshairCursor];
This returns an instance of the cross-hair cursor. On an Intel machine, this will generate the code shown below.
Listing One

00001e60  movl   0x0000501c,%eax        crosshairCursor
00001e65  movl   %eax,0x04(%esp,1)
00001e69  movl   0x0000502c,%eax        NSCursor
00001e6e  movl   %eax,(%esp,1)
00001e71  calll  _objc_msgSend

The objc_msgSend routine lies at the heart of Objective-C message dispatching. The first parameter it takes is effectively a pointer to the
NSCursor class since we're calling a class method rather than an instance method. The second parameter is the selector for the
crosshairCursor method. Contrary to what some folks think, there is nothing remotely magical about selectors. A selector is simply a pointer to a good old-fashioned C string containing the selector name. After the call to
objc_msgSend, the EAX register will contain the wanted cursor object.
The Objective-C message dispatcher is reasonably efficient, especially second time around. This is because once it's seen a particular message selector for a particular class, it caches the address of the corresponding method, making subsequent lookups more efficient.
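To make the caching idea concrete, here is a toy model in plain C. It is a sketch under loud assumptions: the real objc_msgSend is hand-tuned assembly with its own cache layout, and every name below (toy_class, toy_lookup, toy_demo and friends) is made up purely for illustration. All it shows is the general shape: a slow walk of the method list the first time a selector is seen, and a cheap pointer compare thereafter.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model only; none of these names belong to the real runtime. */
typedef int (*toy_imp)(void);

typedef struct {
    const char *name;   /* selector name */
    toy_imp     imp;    /* method implementation */
} toy_method;

enum { TOY_CACHE_SIZE = 8 };

typedef struct {
    const toy_method *methods;                    /* full method list */
    size_t            method_count;
    const char       *cached_sel[TOY_CACHE_SIZE]; /* filled lazily */
    toy_imp           cached_imp[TOY_CACHE_SIZE];
    unsigned          slow_lookups;               /* walks of the method list */
} toy_class;

toy_imp toy_lookup(toy_class *cls, const char *sel)
{
    /* Assumption: selectors are unique pointers, so the address can be hashed. */
    size_t slot = ((uintptr_t)sel >> 2) % TOY_CACHE_SIZE;

    /* Fast path: a single pointer compare against the cached selector. */
    if (cls->cached_sel[slot] == sel)
        return cls->cached_imp[slot];

    /* Slow path: compare names one by one, then remember the hit. */
    cls->slow_lookups++;
    for (size_t i = 0; i < cls->method_count; i++) {
        if (strcmp(cls->methods[i].name, sel) == 0) {
            cls->cached_sel[slot] = sel;
            cls->cached_imp[slot] = cls->methods[i].imp;
            return cls->methods[i].imp;
        }
    }
    return NULL;
}

int toy_answer(void) { return 42; }

/* Sends the same message twice; returns how many slow lookups were needed. */
unsigned toy_demo(void)
{
    static const toy_method methods[] = { { "answer", toy_answer } };
    static const char sel[] = "answer";   /* one pointer, reused for both sends */
    toy_class cls = { methods, 1, { NULL }, { NULL }, 0 };

    toy_imp first  = toy_lookup(&cls, sel);
    toy_imp second = toy_lookup(&cls, sel);
    if (first == NULL || first != second || first() != 42)
        return (unsigned)-1;              /* signal an unexpected failure */
    return cls.slow_lookups;
}
```

In this model, two sends of the same selector cost one walk of the method list; the second send is satisfied from the cache.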
Note: If you want to get a better idea of
objc_msgSend's workings, and don't mind wading knee-deep through PPC code, see here where ways of speeding up Objective-C dispatching are discussed. The "real" source code is also available as part of OpenDarwin.
Rather than focusing on execution speed, I want to look here at the size of generated code. In Listing One, we've eaten up 22 bytes in making that method call. Not a big deal you might say, and you'd be right. But it amazes me how often I see code that looks like this:
Str1 = [[NSUserDefaults standardUserDefaults] stringForKey: @"myStr"];
Int1 = [[NSUserDefaults standardUserDefaults] integerForKey: @"myInt"];
Float1 = [[NSUserDefaults standardUserDefaults] floatForKey: @"myFloat"];
..etc..
This sort of code pattern is so ubiquitous in Cocoa programming that I expect to be universally criticised for criticising it. But I ask you: what's the point in spending another 22 bytes of code to retrieve the same
NSUserDefaults object that was being used on the previous line? Do Cocoa developers believe that there's something inherently volatile about these shared globals? After all, beneath the covers,
[NSUserDefaults standardUserDefaults] is doing nothing more than providing thread-safe access to a shared object inside the Foundation framework.
If you've got a complex application that needs to initialise a whole slew of user preferences, you can easily chew up several hundred bytes with repeated calls to retrieve the shared defaults object. Personally, I'd make the call once, store the
NSUserDefaults object in a local variable, and then use that inside my initialisation routine.
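The "fetch once, reuse the local" idea can be sketched in plain C. Everything here is a hypothetical stand-in: toy_shared_defaults plays the part of [NSUserDefaults standardUserDefaults], and the counter exists only so the difference between the two styles can be observed. It is a sketch of the pattern, not of Cocoa itself.

```c
/* Hypothetical stand-in for a shared defaults object. */
typedef struct {
    const char *my_str;
    int         my_int;
    float       my_float;
} toy_defaults;

/* Counts calls to the accessor so the two styles can be compared. */
unsigned toy_accessor_calls = 0;

/* Stand-in for a shared-object accessor; each call costs a dispatch. */
toy_defaults *toy_shared_defaults(void)
{
    static toy_defaults shared = { "hello", 7, 2.5f };
    toy_accessor_calls++;
    return &shared;
}

typedef struct {
    const char *str1;
    int         int1;
    float       float1;
} toy_prefs;

/* The ubiquitous pattern: fetch the shared object again on every line. */
toy_prefs toy_load_refetching(void)
{
    toy_prefs p;
    p.str1   = toy_shared_defaults()->my_str;
    p.int1   = toy_shared_defaults()->my_int;
    p.float1 = toy_shared_defaults()->my_float;
    return p;
}

/* The alternative: one call, then reuse the local variable. */
toy_prefs toy_load_cached(void)
{
    toy_defaults *defaults = toy_shared_defaults();
    toy_prefs p;
    p.str1   = defaults->my_str;
    p.int1   = defaults->my_int;
    p.float1 = defaults->my_float;
    return p;
}
```

Both routines produce the same preferences; the first calls the accessor three times, the second once. In Cocoa terms, the local would simply hold the result of a single [NSUserDefaults standardUserDefaults] call.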
Of course, the nay-sayers will complain, "But we've always done it that way", "I've got a quad-core development machine", "Who cares about a few hundred extra bytes?", yada, yada, yada. That's fine. Presumably, those same folks sprinkle their code liberally with gratuitous calls to the
sleep library routine just to make sure things don't execute too quickly, and lots of extra
malloc calls just to use up that surplus memory, right?
Moving swiftly on, take a look at the seemingly innocent line of code below:
NSRect insetRect = NSInsetRect ([self frame], 2.0, 2.0);
The gcc compiler absolutely loves rectangles, or any data structure for that matter. Once you give it an
NSRect to play with, it just doesn't want to let go. Listing Two shows how that one-liner balloons out into 147 bytes of code.
Listing Two

00002902  movl   0x08(%ebp),%ecx
00002905  leal   0x00002754(%ebx),%eax  frame
0000290b  movl   (%eax),%eax
0000290d  leal   0xb8(%ebp),%edx
00002910  movl   %eax,0x08(%esp,1)
00002914  movl   %ecx,0x04(%esp,1)
00002918  movl   %edx,(%esp,1)
0000291b  calll  _objc_msgSend_stret
00002920  subl   $0x04,%esp
00002923  movl   0xb8(%ebp),%eax
00002926  movl   %eax,0xe8(%ebp)
00002929  movl   0xbc(%ebp),%eax
0000292c  movl   %eax,0xec(%ebp)
0000292f  movl   0xc0(%ebp),%eax
00002932  movl   %eax,0xf0(%ebp)
00002935  movl   0xc4(%ebp),%eax
00002938  movl   %eax,0xf4(%ebp)
0000293b  leal   0xa8(%ebp),%edx
0000293e  leal   0x00000728(%ebx),%eax
00002944  movl   (%eax),%eax
00002946  movl   %eax,0x18(%esp,1)
0000294a  leal   0x00000728(%ebx),%eax
00002950  movl   (%eax),%eax
00002952  movl   %eax,0x14(%esp,1)
00002956  movl   0xe8(%ebp),%eax
00002959  movl   %eax,0x04(%esp,1)
0000295d  movl   0xec(%ebp),%eax
00002960  movl   %eax,0x08(%esp,1)
00002964  movl   0xf0(%ebp),%eax
00002967  movl   %eax,0x0c(%esp,1)
0000296b  movl   0xf4(%ebp),%eax
0000296e  movl   %eax,0x10(%esp,1)
00002972  movl   %edx,(%esp,1)
00002975  calll  _NSInsetRect
0000297a  subl   $0x04,%esp
0000297d  movl   0xa8(%ebp),%eax
00002980  movl   %eax,0xd0(%ebp)
00002983  movl   0xac(%ebp),%eax
00002986  movl   %eax,0xd4(%ebp)
00002989  movl   0xb0(%ebp),%eax
0000298c  movl   %eax,0xd8(%ebp)
0000298f  movl   0xb4(%ebp),%eax
00002992  movl   %eax,0xdc(%ebp)
There are a few things to explain in the code listing above, which I'll highlight by referring to a hex address. To begin with, Cocoa methods such as
[NSView frame] and the non-OOP
NSInsetRect routine appear to return a structure as the function result. In fact, they don't. Instead, under the hood, the caller passes a hidden pointer argument ($2918 and $2972) which the called routine uses to store the returned structure, an
NSRect in this case.
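Written out by hand in C, the hidden-pointer convention looks roughly like the sketch below. The toy_rect type and both function names are hypothetical; the real NSRect and NSInsetRect live in Foundation, and the exact calling convention depends on the ABI. The point is only that "returning a struct" and "filling in a struct whose address the caller supplied" amount to the same thing.

```c
/* Hypothetical four-float rectangle, standing in for a 32-bit NSRect. */
typedef struct {
    float x, y, width, height;
} toy_rect;

/* Explicit form: the caller supplies the address where the result goes.
   This mirrors the hidden pointer pushed at $2918 and $2972. */
void toy_inset_sret(toy_rect *result, toy_rect r, float dx, float dy)
{
    result->x      = r.x + dx;
    result->y      = r.y + dy;
    result->width  = r.width  - 2.0f * dx;
    result->height = r.height - 2.0f * dy;
}

/* Source-level form: appears to return the structure by value. */
toy_rect toy_inset(toy_rect r, float dx, float dy)
{
    toy_rect out;
    toy_inset_sret(&out, r, dx, dy);
    return out;
}
```

Called with a 100 by 50 rectangle at the origin and insets of 2.0, both forms yield the same rectangle: origin (2, 2), size 96 by 46.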
This would be ok if gcc behaved itself, but – like I said – it loves fooling around with data structures. When the call to
[self frame] is made, you may notice ($290D) that the target routine is instructed to place the resulting
NSRect into the "quartet" of implicit local variables at offsets $B8/$BC/$C0/$C4. However, once the call is made, the rectangle is immediately copied out of these four memory locations ($2923) into another quartet at $E8/$EC/$F0/$F4. Why? Why not just pass the offset of the second quartet to
[self frame] in the first place?
The rectangle then gets copied again (!) when it's moved onto the stack prior to the call to
NSInsetRect ($2956). Finally, after the call to
NSInsetRect, the resulting inset rectangle is moved from the local variables at $A8/$AC/$B0/$B4 to $D0/$D4/$D8/$DC.
You're probably thinking that I'm cheating here; surely gcc doesn't really generate that much code, does it? Ok, I am cheating, a little. Listing Two was created with compiler optimisation turned off, which is the setting you'd get when using the Debug configuration under Xcode. If you switch to the Release configuration and rebuild, the above code snippet will shrink to 118 bytes, which is better, but definitely not awesome. The compiler still insists on having NSRect-returning routines place the returned function result in one place, and then pointlessly copies it somewhere else. But at least with optimisation turned on, the compiler is smart enough to use the esi, ebx, ecx and edx registers as an intermediate store for the four rectangle fields, thus saving on some memory reads.
Note: Incidentally, you'd be surprised just how many commercial applications are released using the Debug configuration. The majority of apps that I've peeked inside show the same sloppy code generation as is typified in Listing Two.
What's the bottom line here? The biggest obstacle to runtime efficiency is that we're passing around the rectangle by value rather than by reference. As I've mentioned, the routines that return an
NSRect do actually use a hidden pointer implementation, but when passing an
NSRect to a routine, all 16 bytes are passed on the stack. There are obviously sound reasons for doing things this way, the principal one being that if (for example) you call
[NSScreen frame] to retrieve the dimensions of a screen, you really don't want the application getting access to the real data; you must always give a copy of your data to the calling code.
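The trade-off can be sketched in C. The types and functions below are hypothetical stand-ins, not the AppKit API: toy_screen_frame hands out a 16-byte copy, toy_screen_frame_ref hands out the real storage, and toy_rect_area shows the usual middle ground for passing a rectangle in, namely a pointer to const.

```c
/* Hypothetical four-float rectangle, standing in for a 32-bit NSRect. */
typedef struct {
    float x, y, width, height;
} screen_rect;

/* Hypothetical owner of a rectangle it would rather keep intact. */
typedef struct {
    screen_rect frame;
} toy_screen;

/* By value: the caller receives its own 16-byte copy. */
screen_rect toy_screen_frame(const toy_screen *s)
{
    return s->frame;
}

/* By reference: cheap, but the caller can now scribble on the real data. */
screen_rect *toy_screen_frame_ref(toy_screen *s)
{
    return &s->frame;
}

/* Passing in by pointer-to-const: 4 bytes on the stack instead of 16
   on a 32-bit machine, and the callee cannot modify the rectangle. */
float toy_rect_area(const screen_rect *r)
{
    return r->width * r->height;
}

/* Returns 1 if the copy is isolated from the owner and the reference is not. */
int toy_copy_is_isolated(void)
{
    toy_screen screen = { { 0.0f, 0.0f, 1440.0f, 900.0f } };

    screen_rect copy = toy_screen_frame(&screen);
    copy.width = 1.0f;                              /* changes only the copy */
    int isolated = (screen.frame.width == 1440.0f);

    toy_screen_frame_ref(&screen)->width = 1.0f;    /* changes the real data */
    return isolated && screen.frame.width == 1.0f;
}
```

The by-value accessor is the safe choice for an owner that must protect its data, which is why the extra copying is the price of admission rather than an oversight.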
Finally, here's a little teaser for you experienced Cocoa-heads. Going back to Listing One, you may notice that the "self" pointer (either a pointer to an object instance, or to a class if we're dealing with a class method such as
[NSCursor crosshairCursor]) is always passed last in the EAX register before the call. Can you figure out why? ®