Build and manage large-scale C++ on Windows

Original URL: https://www.theregister.com/2008/03/31/large_scale_c_plus_plus_windows/

DLLs versus shared objects

Posted in OSes, 31st March 2008 06:02 GMT

John Lakos wrote the book on Large-Scale C++ Software Design more than 10 years ago, but it remains a must-read for any serious C++ developer today.

It doesn't go much into the language. For instance, there is nothing in it about dynamic casts or virtual inheritance. Nor will it tell you how to calculate a factorial at compile time using recursive templates.

What it does is talk about the very real issues faced when a project gets big and complicated, a reality in today's world of cloud computing, online services, distributed applications and data centers. These complexities of scale lie between the abstraction presented by the language and the underlying hardware, and they lead us to start subdividing the software into static libraries and shared objects.

This book's standing in the industry shows that the issues are taken seriously, are largely platform independent and should be understood. But what about the Windows-specific issues - DLLs versus shared objects?

DLLs on Windows are not 100 per cent analogous to shared objects in the Unix world. In a shared object, all symbols are exported unless steps are taken to prevent this. And while a shared object knows its dependencies in terms of external symbols and libraries, the linker doesn't give an error if a needed externally defined symbol is not present at link time.

Dynamic loaders and real code

If the same symbol occurs in different libraries, as can happen with templates, then warnings result at link time and the duplicate symbols are merged by the dynamic loader. Data is also managed differently. Symbols naming static data are exported like any other and multiple definitions are merged to unambiguously name a single piece of memory at runtime.

DLLs on the other hand have a different model. Symbols that are intended for external use must be marked so that the linker knows to put them in the export table. This is what the declaration specifier __declspec(dllexport) is for and there are some divergences from Unix shared-object behavior as a result of this difference.
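As a minimal sketch of how this marking usually looks in source, the fragment below wraps the specifier in a macro so the same declaration exports when the DLL is built and imports when a client includes it. The names WIDGET_API, WIDGET_BUILDING_DLL, widget_area and widget_internal_helper are hypothetical and chosen only for illustration; in a real project the header and the definitions would live in separate files and the build system would define WIDGET_BUILDING_DLL.

```cpp
// Normally defined by the build system only while compiling the DLL's own
// sources; defined here so this single-file sketch takes the export branch.
#define WIDGET_BUILDING_DLL 1

#if defined(_WIN32)
#  if defined(WIDGET_BUILDING_DLL)
#    define WIDGET_API __declspec(dllexport)  // put the symbol in the export table
#  else
#    define WIDGET_API __declspec(dllimport)  // client side: resolve via the import library
#  endif
#else
#  define WIDGET_API  // Unix-like platforms export symbols by default
#endif

// Exported: visible to clients of the DLL.
WIDGET_API int widget_area(int width, int height);

// Not marked: on Windows this stays internal to the DLL.
int widget_internal_helper(int value);

int widget_internal_helper(int value) {
    return value < 0 ? 0 : value;  // clamp negative sizes to zero
}

WIDGET_API int widget_area(int width, int height) {
    return widget_internal_helper(width) * widget_internal_helper(height);
}
```

On a Unix toolchain the macro expands to nothing and both functions are exported, which is exactly the difference in defaults described above.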

External code finds the symbol via a corresponding import declaration, __declspec(dllimport), and by linking with the import library of the DLL. The import library is linked into the client portable executable like any other static library, but its code is only a stub: when called, it reaches the real code through pointers to functions that the dynamic loader fills in, the same addresses GetProcAddress would return. We'll see why this is important in a moment; for now just remember that code inside the DLL does not go through this import library.

Internal code accesses its siblings, both exported and otherwise, by offset addresses relative to the DLL base address. This implies the first important difference: linking a DLL on Windows is a full-blown link operation. All symbols are resolved and replaced with offset addresses, and errors are given if something is not found. In fact, when linking a DLL you are creating a real image in portable executable format, and this takes considerably more time than linking a static library, which is essentially just an archive.

This becomes a real disadvantage when there are many DLLs in a large project in a configuration used for ongoing and active development. A non-optimized build system may initiate a large-scale relink of dependencies and transitive dependencies when a single import library is updated. Time spent compiling and linking is time lost cumulatively across all developers in code, build, test cycles, so an improvement in link time is multiplied by the number of developers and by the number of compiles and links performed. It's possible that a static library build configuration could save significant time during development.

DLLs and duplicate symbols

Another big difference is how a DLL is loaded into the address space versus a shared object. As we said before, a shared object contains all of its symbols and all of the code associated with these symbols. When the dynamic loader kicks in, it loads all needed libraries into memory, and if several contain the same symbol then all are merged into a single symbol and all references are updated to refer to the merged address in memory.

This means that even if a shared object contains a reference to a symbol defined internally and this symbol is later merged out, everything will still be OK because the dynamic loader has enough information to resolve everything. In the Windows model, when several DLLs contain the same symbol, the duplicates are never merged. Which copy is called depends very much on the location of the code doing the calling.

Of course this isn't a problem if the symbol represents a re-entrant function, but what if the symbol represents a function containing static data? This static data could have several values depending on which instance is called, and any assumptions regarding value transitions between function calls are broken. Even worse, when the imported symbol is itself data then all bets are really off.
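The effect can be imitated inside a single program. In the sketch below, the two namespaces dll_a and dll_b are hypothetical stand-ins for two DLLs that each linked their own copy of the same static-library function; next_id is likewise an invented name. It is a simulation of the outcome, not a real two-DLL build.

```cpp
// Each namespace plays the role of one DLL carrying its own copy of the
// same function, and therefore its own copy of the function's static data.
namespace dll_a {
    int next_id() {
        static int counter = 0;  // copy of the static owned by "DLL A"
        return ++counter;
    }
}

namespace dll_b {
    int next_id() {
        static int counter = 0;  // separate copy owned by "DLL B"
        return ++counter;
    }
}
```

A caller that ends up in dll_a sees the sequence 1, 2, 3 while a caller that ends up in dll_b starts again from 1, so any code assuming a single process-wide counter gets duplicate IDs.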

There isn't an easy way to work around this because, on Windows, code inside the DLL will always access internal symbols with base address offset addressing as opposed to relocations. So if we need to use libraries containing static data that are eventually linked into more than one DLL, the best approach may be to move all such data out of the static libraries and into a dedicated DLL, whose functions are engineered to have a single definition in a single DLL just to manage that data.
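One possible shape for such a data-owning DLL is sketched here. The names STATE_API, g_request_count, state_increment_requests and state_request_count are hypothetical; the assumption is that this file is compiled into exactly one DLL and that every other library reaches the data only through the exported accessors.

```cpp
#if defined(_WIN32)
#  define STATE_API __declspec(dllexport)  // this file is only ever built into the owning DLL
#else
#  define STATE_API
#endif

namespace {
    // The single copy of the data. The unnamed namespace keeps it from being
    // referenced directly, so nobody can link a second definition by accident.
    int g_request_count = 0;
}

// Exported accessors: the only way other modules touch the data.
extern "C" STATE_API int state_increment_requests() {
    return ++g_request_count;
}

extern "C" STATE_API int state_request_count() {
    return g_request_count;
}
```

Because the functions exist in one DLL only, every caller in the process is routed to the same definition and therefore to the same piece of memory.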

So, next time think carefully before switching on the /FORCE option in that linker as it could be opening a can of worms.

DLLs and static libraries

What about mixing DLLs and static libraries? In a big project there will be code that makes sense to be packaged in a static library and other code that makes more sense as a DLL.

The choice could be driven by build times or ease of testing. As long as data of static duration is both minimized and carefully managed, there isn't really any reason to say a project should be all static library or all DLL. An alternative is to have a levelized system as espoused by our friend and hero Lakos. In levelized systems, dependencies are more predictable because of the hierarchical organization.

As a result, once we identify which libraries we want as DLLs and which as static libraries, we can identify the point in the dependency graph where a static library is first used by a DLL. We can add the "exports" of the static library to the DLL's exports list by using the /EXPORT option when linking the DLL. Any other users of that static library can now use it via this DLL to avoid duplicate definitions in the configuration.
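With the Microsoft toolchain, the /EXPORT option can also be embedded in a source file of the DLL through a linker pragma, which keeps the re-export next to the code that needs it. The sketch below assumes a hypothetical static-library function named mathlib_version; it is defined in the same file only so the fragment stands alone, and it is declared extern "C" to keep the exported name simple. The exact name to pass depends on the calling convention and name decoration in use.

```cpp
// Stand-in for a function that, in a real build, would be defined in the
// static library and merely linked into this DLL.
extern "C" int mathlib_version() {
    return 3;
}

#if defined(_MSC_VER)
// Same effect as passing /EXPORT:mathlib_version on the DLL's link line:
// the static library's function is added to this DLL's export table.
#pragma comment(linker, "/EXPORT:mathlib_version")
#endif
```

Other compilers skip the guarded pragma, so the fragment still builds elsewhere, but the re-export itself is specific to the Microsoft linker.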

What to do when it goes wrong

In a large project there will of course be many areas of responsibility producing many different deliverables. Often, such deliverables are provided in both static and dynamic configurations, so those who want to link statically can and those who want the dynamic version are equally well served. Inevitably, there comes the day when you need to link with one essential component that is only available as a DLL and, worse, that uses some other component in its dynamic form. Naturally, this other component is the very same one that everything else in your build uses in its static form, and it's not necessarily easy to change that.

A consequence of the name decoration caused by exporting a symbol in Windows is that the binary name of a symbol coming from a DLL will be different from the name of the same symbol in a static library. OK, no big deal: one component uses the DLL version and the rest of the build configuration uses the static version. This should work, provided there is no data of static duration in the picture, for the reasons we've already seen.

So what if there is some static data, what can be done to initialize it? Well if the data is initialized by a C function this isn't that hard. We can add code to our application to call LoadLibrary with the name of the DLL containing the static data. As the DLL is already loaded into the process we'll get a handle to this already loaded DLL. We then call GetProcAddress using this handle and the name of our function, casting the resulting function pointer appropriately and all is good again.
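The LoadLibrary and GetProcAddress steps described above can be sketched as follows. The helper name call_init_function, the InitFn signature (no arguments, no return value) and any DLL or function names passed to it are assumptions made for illustration. On non-Windows platforms the helper simply reports failure so the fragment still compiles.

```cpp
#if defined(_WIN32)
#  include <windows.h>
#endif

// Assumed signature of the C initialization function inside the DLL.
using InitFn = void (*)();

// Looks up function_name in dll_name and calls it.
// Returns true only if the function was found and invoked.
bool call_init_function(const char* dll_name, const char* function_name) {
#if defined(_WIN32)
    // If the DLL is already mapped into the process, this returns a handle
    // to the existing module and bumps its reference count.
    HMODULE module = LoadLibraryA(dll_name);
    if (module == nullptr) {
        return false;
    }
    FARPROC address = GetProcAddress(module, function_name);
    const bool found = (address != nullptr);
    if (found) {
        // Cast the generic pointer to the signature we assume the function has.
        InitFn init = reinterpret_cast<InitFn>(address);
        init();
    }
    FreeLibrary(module);  // balance the reference taken by LoadLibraryA
    return found;
#else
    (void)dll_name;
    (void)function_name;
    return false;  // no DLLs to load on this platform
#endif
}
```

The function name passed to GetProcAddress must match the exported name exactly, which is why this approach is straightforward only for functions with C linkage.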

It's not always so easy though, as sometimes people will use constructors to initialize static data. It is possible to call a constructor in a DLL transitive dependency. It involves allocating memory with malloc, writing assembler to prepare the stack frame manually and calling the constructor as if it was a C function. This is unlikely to win awards for portable code though.

Navigate the differences

There are complexities specific to Windows DLLs, especially when they are used with static libraries that contain data. As a consequence, managing a large-scale project on Windows brings some additional considerations, which should be thought about if supporting Windows in addition to Unix platforms is an issue. Even if the eventual product isn't cross-platform, these issues can cause headaches if development and testing are supported on Windows because some developers prefer the tools on that platform.

In general, though, even with the additional complexity there's always a way to navigate the differences in the dynamic-linking model and I've yet to see a case of "we can't get that to run on Windows" that wasn't solvable.®