Original URL: https://www.theregister.com/2007/05/25/c_objective-c_part2/

Don't Forget The 'C' in Objective-C

Part 2: Runtime efficiency issues in Mac Cocoa programming

By Dave Jewell

Posted in Software, 25th May 2007 16:07 GMT

Last time round, we looked at the way an unnamed developer had used Cocoa routines to chop up a simple C-string in order to determine whether or not it contained a particular, named OpenGL extension name.

A few folks pointed out that if, hypothetically speaking, some new extension name gets devised that happens to be a superstring of the extension name we're after, my simple strstr solution would break. Absolutely right, but this rather misses the point of the discussion, which was about writing efficient code in Cocoa and using the right tools for the job.
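For what it's worth, guarding against the superstring case needn't cost much. The following is only a minimal sketch in plain C, not the code from the previous article: the function name has_extension and the sample extension strings in the usage note are illustrative assumptions, and it assumes the extension list is separated by single spaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true only when `name` appears in the space-separated `list`
 * as a whole token, not merely as part of a longer extension name. */
bool has_extension(const char *list, const char *name)
{
    assert(list != NULL && name != NULL);

    size_t len = strlen(name);
    if (len == 0)
        return false;

    const char *p = list;
    while ((p = strstr(p, name)) != NULL) {
        /* A hit counts only if it is bounded on both sides by a space
         * or by the start/end of the string. */
        bool starts_token = (p == list) || (p[-1] == ' ');
        bool ends_token = (p[len] == ' ') || (p[len] == '\0');
        if (starts_token && ends_token)
            return true;

        /* Extension names contain no spaces, so no whole-token match
         * can begin inside this hit; resume scanning after it. */
        p += len;
    }
    return false;
}
```

Called as has_extension(list, "GL_EXT_texture3D"), this should reject a list that only contains a longer name such as a hypothetical "GL_EXT_texture3D_extra", while still needing nothing beyond strstr and a couple of character comparisons.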

This time, I want to dig a little deeper by looking at the code generated by gcc in response to your carefully crafted Objective-C code. I'm hoping to demonstrate that you will often get more than you bargained for.

To begin with, consider a simple method invocation such as:

NSCursor * cursor = [NSCursor crosshairCursor];

This returns an instance of the cross-hair cursor. On an Intel machine, this will generate the code shown below.

Listing One

00001e60 movl        0x0000501c,%eax              crosshairCursor
00001e65 movl        %eax,0x04(%esp,1)
00001e69 movl        0x0000502c,%eax              NSCursor
00001e6e movl        %eax,(%esp,1)
00001e71 calll       _objc_msgSend

The objc_msgSend routine lies at the heart of Objective-C message dispatching. The first parameter it takes is effectively a pointer to the NSCursor class, since we're calling a class method rather than an instance method. The second parameter is the selector for the crosshairCursor method. Contrary to what some folks think, there is nothing remotely magical about selectors. A selector is simply a pointer to a good old-fashioned C string containing the selector name. After the call to objc_msgSend, the EAX register will contain the wanted NSCursor instance.

The Objective-C message dispatcher is reasonably efficient, especially second time around. This is because once it's seen a particular message selector for a particular class, it caches the address of the corresponding method, making subsequent lookups more efficient.
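To make the caching idea concrete, here is a deliberately tiny model in plain C. It is a sketch under stated assumptions, not the real runtime, which is considerably more elaborate: ToyClass, toy_lookup, the one-entry cache and the two sample methods are all hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*Method)(void);

typedef struct {
    const char *selector;   /* selector name as a plain C string */
    Method      imp;        /* address of the function implementing it */
} MethodEntry;

typedef struct {
    const MethodEntry *methods;      /* full method list (slow path) */
    size_t             count;
    const char        *cached_sel;   /* one-entry cache (fast path) */
    Method             cached_imp;
    unsigned           slow_lookups; /* how often the list was searched */
} ToyClass;

/* Two hypothetical method implementations for the toy class. */
int arrow_imp(void)     { return 1; }
int crosshair_imp(void) { return 2; }

const MethodEntry cursor_methods[] = {
    { "arrowCursor",     arrow_imp },
    { "crosshairCursor", crosshair_imp },
};

/* Returns the implementation for `selector`, or NULL if there is none. */
Method toy_lookup(ToyClass *cls, const char *selector)
{
    assert(cls != NULL && selector != NULL);

    /* Fast path: same selector pointer as last time, reuse the address. */
    if (cls->cached_sel == selector)
        return cls->cached_imp;

    /* Slow path: search the method list by name, then remember the hit. */
    cls->slow_lookups++;
    for (size_t i = 0; i < cls->count; i++) {
        if (strcmp(cls->methods[i].selector, selector) == 0) {
            cls->cached_sel = selector;
            cls->cached_imp = cls->methods[i].imp;
            return cls->cached_imp;
        }
    }
    return NULL;
}
```

Looking up the same selector pointer twice searches the list only once; the second call is a single pointer comparison, which is the essence of why repeat dispatches are cheaper.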

Note: If you want to get a better idea of objc_msgSend's workings, and don't mind wading knee-deep through PPC code, see here where ways of speeding up Objective-C dispatching are discussed. The "real" source code is also available as part of OpenDarwin.

Rather than focusing on execution speed, I want to look here at the size of generated code. In Listing One, we've eaten up 22 bytes in making that method call. Not a big deal you might say, and you'd be right. But it amazes me how often I see code that looks like this:

Str1 = [[NSUserDefaults standardUserDefaults] stringForKey: @"myStr"];
Int1 = [[NSUserDefaults standardUserDefaults] integerForKey: @"myInt"];
Float1 = [[NSUserDefaults standardUserDefaults] floatForKey: @"myFloat"];
..etc..

This sort of code pattern is so ubiquitous in Cocoa programming that I expect to be universally criticised for criticising it. But I ask you: what's the point in spending another 22 bytes of code to retrieve the same NSUserDefaults object that was being used on the previous line? Do Cocoa developers believe that there's something inherently volatile about these shared globals? After all, beneath the covers, [NSUserDefaults standardUserDefaults] is doing nothing more than providing thread-safe access to a shared object held within the Foundation framework.

If you've got a complex application that needs to initialise a whole slew of user preferences, you can easily chew up several hundred bytes with repeated calls to retrieve the shared defaults object. Personally, I'd make the call once, store the NSUserDefaults object in a local variable, and then use that inside my initialisation routine.
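The same idea can be shown in plain C. This is an analogue only, not Cocoa code: shared_defaults, integer_for_key, the accessor_calls counter and the three preference keys are hypothetical stand-ins, with the counter there purely to make the saving visible.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct { const char *name; int value; } Pref;
typedef struct { const Pref *prefs; size_t count; } Defaults;

unsigned accessor_calls = 0;   /* counts calls to shared_defaults() */

/* Hypothetical stand-in for a shared-object accessor. */
const Defaults *shared_defaults(void)
{
    static const Pref prefs[] = {
        { "width", 640 }, { "height", 480 }, { "depth", 32 },
    };
    static const Defaults defaults = { prefs, sizeof prefs / sizeof prefs[0] };

    accessor_calls++;
    return &defaults;
}

/* Returns the stored value for `key`, or 0 when the key is missing. */
int integer_for_key(const Defaults *d, const char *key)
{
    assert(d != NULL && key != NULL);
    for (size_t i = 0; i < d->count; i++) {
        if (strcmp(d->prefs[i].name, key) == 0)
            return d->prefs[i].value;
    }
    return 0;
}

/* The ubiquitous pattern: fetch the shared object again on every line. */
int load_repeated(void)
{
    int w = integer_for_key(shared_defaults(), "width");
    int h = integer_for_key(shared_defaults(), "height");
    int d = integer_for_key(shared_defaults(), "depth");
    return w + h + d;
}

/* The suggested pattern: fetch it once and reuse the local variable. */
int load_hoisted(void)
{
    const Defaults *defaults = shared_defaults();
    int w = integer_for_key(defaults, "width");
    int h = integer_for_key(defaults, "height");
    int d = integer_for_key(defaults, "depth");
    return w + h + d;
}
```

Both routines read the same three values, but the hoisted version makes one accessor call where the repeated version makes three, and the gap widens with every extra preference.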

Of course, the nay-sayers will complain, "But we've always done it that way", "I've got a quad-core development machine", "Who cares about a few hundred extra bytes?", yada, yada, yada. That's fine. Presumably, those same folks sprinkle their code liberally with gratuitous calls to the sleep library routine just to make sure things don't execute too quickly, and lots of extra malloc calls just to use up that surplus memory, right?

Moving swiftly on, take a look at the seemingly innocent line of code below:

NSRect insetRect = NSInsetRect ([self frame], 2.0, 2.0);

The gcc compiler absolutely loves rectangles, or any data structure for that matter. Once you give it an NSRect to play with, it just doesn't want to let go. Listing Two shows how that one-liner balloons out into 147 bytes of code.

Listing Two

00002902  movl        0x08(%ebp),%ecx
00002905  leal        0x00002754(%ebx),%eax        frame
0000290b  movl        (%eax),%eax
0000290d  leal        0xb8(%ebp),%edx
00002910  movl        %eax,0x08(%esp,1)
00002914  movl        %ecx,0x04(%esp,1)
00002918  movl        %edx,(%esp,1)              
0000291b  calll       _objc_msgSend_stret
00002920  subl        $0x04,%esp
00002923  movl        0xb8(%ebp),%eax
00002926  movl        %eax,0xe8(%ebp)
00002929  movl        0xbc(%ebp),%eax
0000292c  movl        %eax,0xec(%ebp)
0000292f  movl        0xc0(%ebp),%eax
00002932  movl        %eax,0xf0(%ebp)
00002935  movl        0xc4(%ebp),%eax
00002938  movl        %eax,0xf4(%ebp)
0000293b  leal        0xa8(%ebp),%edx
0000293e  leal        0x00000728(%ebx),%eax
00002944  movl        (%eax),%eax
00002946  movl        %eax,0x18(%esp,1)
0000294a  leal        0x00000728(%ebx),%eax
00002950  movl        (%eax),%eax
00002952  movl        %eax,0x14(%esp,1)
00002956  movl        0xe8(%ebp),%eax
00002959  movl        %eax,0x04(%esp,1)
0000295d  movl        0xec(%ebp),%eax
00002960  movl        %eax,0x08(%esp,1)
00002964  movl        0xf0(%ebp),%eax
00002967  movl        %eax,0x0c(%esp,1)
0000296b  movl        0xf4(%ebp),%eax
0000296e  movl        %eax,0x10(%esp,1)
00002972  movl        %edx,(%esp,1)
00002975  calll       _NSInsetRect
0000297a  subl        $0x04,%esp
0000297d  movl        0xa8(%ebp),%eax
00002980  movl        %eax,0xd0(%ebp)
00002983  movl        0xac(%ebp),%eax
00002986  movl        %eax,0xd4(%ebp)
00002989  movl        0xb0(%ebp),%eax
0000298c  movl        %eax,0xd8(%ebp)
0000298f  movl        0xb4(%ebp),%eax
00002992  movl        %eax,0xdc(%ebp)

There are a few things to explain in the code listing above, which I'll highlight by referring to a hex address. To begin with, Cocoa methods such as [NSView frame] and the non-OOP NSInsetRect routine appear to return a structure as the function result. In fact, they don't. Instead, under the hood, the caller passes a hidden pointer argument ($2918 and $2972) which is used by the called routine to store the returned structure, an NSRect in this case.
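In C terms, the hidden pointer amounts to something like the sketch below. Rect, inset_rect and inset_rect_lowered are hypothetical stand-ins for NSRect and NSInsetRect, and the second function is only an approximation of what the compiler arranges behind the scenes, written out by hand.

```c
#include <assert.h>
#include <stddef.h>

/* Four floats, 16 bytes on the platforms discussed here. */
typedef struct { float x, y, width, height; } Rect;

/* What the source code says: the structure is returned by value. */
Rect inset_rect(Rect r, float dx, float dy)
{
    Rect out = { r.x + dx, r.y + dy,
                 r.width - 2.0f * dx, r.height - 2.0f * dy };
    return out;
}

/* Roughly what actually happens: the caller reserves storage for the
 * result and passes its address as an extra, hidden first argument. */
void inset_rect_lowered(Rect *result, Rect r, float dx, float dy)
{
    assert(result != NULL);
    result->x = r.x + dx;
    result->y = r.y + dy;
    result->width = r.width - 2.0f * dx;
    result->height = r.height - 2.0f * dy;
}
```

Both forms compute the same rectangle; the difference is simply that in the second one the destination of the result is explicit, which is what makes the extra copying after each call in Listing Two look so unnecessary.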

This would be ok if gcc behaved itself, but, like I said, it loves fooling around with data structures. When the call to [self frame] is made, you may notice ($290D) that the target routine is instructed to place the resulting NSRect into the "quartet" of implicit local variables at offsets $B8/$BC/$C0/$C4. However, once the call is made, the rectangle is immediately copied out of these four memory locations ($2923) into another quartet at $E8/$EC/$F0/$F4. Why? Why not just pass the offset of the second quartet to [self frame] in the first place?

The rectangle then gets copied again (!) when it's moved onto the stack prior to the call to NSInsetRect ($2956). Finally, after the call to NSInsetRect, the resulting inset rectangle is moved from the local variables at $A8/$AC/$B0/$B4 to $D0/$D4/$D8/$DC.

You're probably thinking that I'm cheating here; surely gcc doesn't really generate that much code, does it? Ok, I am cheating – a little. Listing Two was created with compiler optimisation turned off – it's the setting you'd get when using the Debug configuration under Xcode. If you switch to the Release configuration and rebuild, the above code snippet will shrink to 118 bytes, which is better, but definitely not awesome. The compiler still insists on having NSRect-returning routines place the returned function result in one place, and then pointlessly copy it some place else. But at least with optimisation turned on, the compiler is smart enough to use the esi, ebx, ecx and edx registers as an intermediate store for the four rectangle fields, thus saving on some memory reads.

Note: Incidentally, you'd be surprised just how many commercial applications are released using the Debug configuration. The majority of apps that I've peeked inside show the same sloppy code generation as is typified in Listing Two.

What's the bottom line here? The biggest obstacle to runtime efficiency is that we're passing around the rectangle by value rather than by reference. As I've mentioned, the routines that return an NSRect do actually use a hidden pointer implementation, but when passing an NSRect to a routine, all 16 bytes are passed on the stack. There are obviously sound reasons for doing things this way, the principal one being that if (for example) you call [NSScreen frame] to retrieve the dimensions of a screen, you really don't want the application getting access to the real data; you must always give a copy of your data to the calling code.
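The trade-off can be sketched in plain C. Again, this is an illustration under assumptions rather than anything taken from Cocoa: Rect, Screen, screen_frame and screen_frame_ref are hypothetical names, with screen_frame_ref included only to show what handing out the real data would allow.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, width, height; } Rect;
typedef struct { Rect frame; } Screen;

/* By value: the caller receives its own copy of the rectangle. */
Rect screen_frame(const Screen *s)
{
    assert(s != NULL);
    return s->frame;
}

/* By reference: cheaper, but the caller can now alter the real data. */
Rect *screen_frame_ref(Screen *s)
{
    assert(s != NULL);
    return &s->frame;
}
```

Scribbling on the result of screen_frame leaves the Screen untouched, whereas writing through the pointer from screen_frame_ref changes the owner's rectangle directly. That protection is what the extra stack traffic is paying for.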

Finally, here's a little teaser for you experienced Cocoa-heads. Going back to Listing One, you may notice that the "self" pointer (either a pointer to an object instance, or to a class if we're dealing with a class method such as [NSCursor crosshairCursor]) is always passed last in the EAX register before the call. Can you figure out why? ®