The Register® — Biting the hand that feeds IT

An introduction to static code analysis

What, why and how

So, by some misfortune, usually instigated by "management" or by "tradition", you are stuck with a C/C++ program to maintain. Along with the benefit of speed, you also get the drawback of having no automatic memory management.

How do you know what is lurking underneath the templates and calls to new and malloc? How do you know if that program is leaking memory or doing other dangerous things? In the past, you didn't. You could only read through the code by hand looking for errors, or use tools such as clint or Valgrind, which only report problems on the paths a particular run happens to exercise.

Now, however, there are software tools that can help you catch these kinds of errors before running the program. These tools are based on a technology called static code analysis. In this series, we will explore what this technology is, what tools are available, and what its latest and future trends are.

To illustrate the remarkable power of static code analysis, we will use examples involving memory management and pointers in C/C++, since these are arguably among the most error-prone and hardest-to-debug features of widely used programming languages. At this point, some might suggest that "no one writes in C/C++ anymore".

While garbage-collected languages have become a large part of the programming marketplace, C/C++ is, interestingly, still widely used in many critical domains: banking, embedded systems (automobiles), avionics, networking, operating systems and so on. Even garbage-collected languages often run on a virtual machine written in C/C++, which may itself have memory leaks or other errors.

Thanks to static analysis, you can now automatically find those pesky mistakes (you know that your code is always perfect, right?). How does it work? First, a static analysis tool is a program that parses and then analyses your source code. In effect, it is an additional compiler in your toolchain: rather than producing binaries, it produces an intermediate representation that is better suited to analysis than the source code itself.

There are a variety of methods for analysing your program: denotational semantics, axiomatic semantics, operational semantics, abstract interpretation, and separation logic. These rely on formal methods, that is, rigorous mathematical and logical reasoning about your code. We will cover these different methods in the next instalment. For now, suffice it to say that one or more of these methods can be used to interrogate your code for different kinds of errors.

Depending on the methods used, static analysis can take anywhere from a few minutes to much longer. You must therefore balance the time the analysis takes against the time available in your project. However, it is always a good idea to do one last run before sending code into production, no matter how long that might take.

An example will illustrate this process. Suppose you have a program like so:


#include <stdlib.h>

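int *example1(void);
void alloc0(int **, int);

int main(void) {
  free(example1()); // the caller owns, and must free, the returned block

  return EXIT_SUCCESS;
}

int *example1(void) {
  int *i;

  alloc0(&i, 0); // ok: i is set to NULL
  alloc0(&i, 1); // ok: i points to a new block from malloc
  alloc0(&i, 1); // leak: i overwritten, previous block never freed
  return i;
}

// Allocate one int if b is non-zero; otherwise set *i to NULL.
void alloc0(int **i, int b) {
  if (b) *i = malloc(sizeof (int));
  else *i = NULL;
}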

As you can see in the foregoing contrived example, there is an error in the example1() function involving pointer manipulation: the third call to alloc0() overwrites i without first freeing the memory allocated by the second call, so that block can never be freed.
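For comparison, here is one way to remove the leak: a minimal sketch that frees the block from the second call before the third call overwrites i. The name example1_fixed() is our own, hypothetical choice, and alloc0() is repeated from the program above so that the fragment compiles on its own.

```c
#include <stdlib.h>

/* Same helper as in the example: allocate one int if b is non-zero,
   otherwise set *i to NULL. */
void alloc0(int **i, int b) {
  if (b) *i = malloc(sizeof (int));
  else *i = NULL;
}

/* Hypothetical corrected version of example1(): the block obtained by
   the second call is released before i is reassigned. Ownership of the
   returned block passes to the caller, who must free() it. */
int *example1_fixed(void) {
  int *i;

  alloc0(&i, 0); /* i is NULL */
  alloc0(&i, 1); /* i points to a fresh block */
  free(i);       /* release it before i is overwritten */
  alloc0(&i, 1); /* i points to a new block; nothing is lost */
  return i;
}
```

A caller would write int *p = example1_fixed(); and later free(p), so every block allocated along the way is released exactly once.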

This kind of bug is easy to miss and can cause massive problems in long-running programs or in programs that are critical to your infrastructure. A static analyser examines your code and tries to find errors like the one above before the code is ever run (bug catching), so that the bug is dealt with before your client is using your software.
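To give a flavour of how an analyser can catch this without running the program, here is a deliberately tiny sketch. It is our own toy model, not how any particular tool works: each call alloc0(&i, b) in example1() is reduced to its constant b argument, and the pointer i is tracked only as uninitialised, NULL, or owning a heap block. The hypothetical function find_overwrite_leak() reports the first call that overwrites i while it still owns a block.

```c
/* Toy abstraction of the pointer i in example1(): we track only which
   kind of value it holds, never actual addresses. */
typedef enum { PTR_UNINIT, PTR_NULL, PTR_HEAP } PtrState;

/* b_args[k] is the constant second argument of the k-th call
   alloc0(&i, b). Returns the zero-based index of the first call that
   overwrites i while it still owns an unfreed heap block, or -1 if no
   call does. The model ignores malloc() failure. */
int find_overwrite_leak(const int *b_args, int n) {
  PtrState i = PTR_UNINIT;
  for (int k = 0; k < n; k++) {
    if (i == PTR_HEAP)
      return k; /* the old block becomes unreachable: a leak */
    i = b_args[k] ? PTR_HEAP : PTR_NULL;
  }
  return -1;
}
```

For example1(), the calls pass 0, 1 and 1, so find_overwrite_leak() on {0, 1, 1} returns 2, pointing at the third call, which matches the comment in the program.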

However, this is not the end of the story. Catching some of the bugs is typical of analysers based on static code analysis 1.0. Researchers in the field have recently introduced a new capability which they believe will revolutionise static code analysis: the ability to prove the absence of bugs.

This represents a dramatic leap forward in the technology, and some experts in the field have already named it static analysis 2.0 (by analogy with Web 2.0 versus Web 1.0). To appreciate the difference: an analyser in the static analysis 2.0 category is effectively a "bug eradicator", able to eliminate all bugs of the kinds it checks for and to provide evidence that the code is free of them. Tools in the static analysis 1.0 category, by contrast, can only find some of the existing bugs.
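The difference between finding some bugs and proving their absence can be sketched by allowing each b to be unknown. In this toy abstract interpretation (again our own illustration, not any vendor's algorithm), the state of i is the set of values it may hold, and an alloc0() call whose b is unknown joins both possible outcomes. The hypothetical proves_no_overwrite_leak() returns true only if no possible execution overwrites a block that may still be owned, so a true result covers every input. In general, tools that prove the absence of errors pay for that guarantee with occasional false alarms.

```c
#include <stdbool.h>

/* Abstract value of the pointer i: a bit set of the concrete states it
   MAY be in at a given program point (a tiny powerset domain). */
enum { MAY_UNINIT = 1, MAY_NULL = 2, MAY_HEAP = 4, MAY_FREED = 8 };

/* Hypothetical step kinds for a straight-line function using one pointer. */
typedef enum {
  STEP_SET_NULL,   /* alloc0(&i, 0) */
  STEP_ALLOC,      /* alloc0(&i, 1), ignoring malloc() failure */
  STEP_ALLOC0_ANY, /* alloc0(&i, b) with b unknown at analysis time */
  STEP_FREE        /* free(i) */
} Step;

/* Returns true only if, for every possible value of every unknown b, no
   step overwrites i while it may still own an unfreed heap block. This
   toy checks only that one kind of leak; it ignores double or invalid
   frees and what happens to i after the last step. */
bool proves_no_overwrite_leak(const Step *steps, int n) {
  int i = MAY_UNINIT;
  for (int k = 0; k < n; k++) {
    if (steps[k] != STEP_FREE && (i & MAY_HEAP))
      return false; /* some execution loses a heap block here */
    switch (steps[k]) {
    case STEP_SET_NULL:   i = MAY_NULL; break;
    case STEP_ALLOC:      i = MAY_HEAP; break;
    case STEP_ALLOC0_ANY: i = MAY_NULL | MAY_HEAP; break; /* join both branches */
    case STEP_FREE:
      if (i & MAY_HEAP) i = (i & ~MAY_HEAP) | MAY_FREED;
      break;
    }
  }
  return true;
}
```

The step list {STEP_SET_NULL, STEP_ALLOC, STEP_ALLOC} for the original example1() is rejected, while {STEP_SET_NULL, STEP_ALLOC, STEP_FREE, STEP_ALLOC}, which frees before reassigning, is accepted.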

As you can see, knowing where subtle bugs such as memory leaks and other pointer errors lie is extremely valuable. Static analysis of your C/C++ code helps remove some of the most troublesome errors imaginable, and it spares you the tedium of hunting for subtle problems such as memory allocation errors. In the next part of our series, we will show some of the maths behind the scenes.

Copyright © 2011 Christopher Yocum / monoidics

Chris is a Static Code Wrangler at monoidics.com

Anonymous Coward

Amazing piece!

Especially since the programming language referenced doesn't actually exist.

Yes, pedantic. But honestly, even if this is lazy shorthand for "C and C++", the code is essentially C and completely ignores C++, as does the rest of the text apart from being lazy. C++ brings powerful new tools but with them come equally dangerous snags and gotchas, that in turn can be mitigated by careful use of those powerful tools. But not a peep from the author. And, of course, it pretends no other tools than static code analysis existed before now, which isn't true.

Even a decent libc will provide decent reporting and debugging tools for the memory management this author claims doesn't exist in this fictional language. Well, maybe it doesn't. But I write code in both C and C++, and yes I do know the difference, thank you, and I do have plenty of tools and tricks that help a lot here.

This is not to say that the buzzword-to-be touted here isn't useful. It merely is not, as in cannot possibly be, nearly as dramatic as portrayed, and the portrayal smacks of betrayal of the tools that came before. That means I now feel, having read the article, I wasted my time. Which is a pity, for a codesmith too needs to keep his toolchest well-filled.

Advertisement, and multi-threading

Perhaps you should include a disclaimer that this is a thinly-disguised advertisement for the author's products? (The small print at the bottom, "Chris is a Static Code Wrangler at monoidics.com", is not clear and is very hidden).

The Monoidics tool doesn't work on multithreaded apps, according to their website. Unfortunately, this is a really common limitation in static analysis tools. This is especially annoying as the rest of the world is adding threads to take advantage of multiple cores.

Can anyone recommend a good static analysis tool for a multithreaded application? I'm mainly interested in finding the deadlocks and race conditions. So the Monoidics tool is useless. I've seen the excellent work Coverity has done on the Linux kernel, but when they demo'd it on our code they didn't have any of the multithreading checks. Any suggestions?

C++ memory management

As a fellow commentard duly noted, there is no such language as C/C++. But this is only part of the point.

I just can't bring myself to believe that the author of this blatant infomercial is that ignorant about the C++ memory management techniques. Granted, he has to sell his product, but deliberately implying that C++ memory management has to be done the old C way won't buy any customers. Any half decent C++ programmer knows about how RAII can (and must) be used for precise garbage collection, and how it fits in with exception safety (which is the basics of C++ programming).

FFS, how is it possible that in 2011 some people still confuse C++ and "C-with-classes"? Shame on you, author.
