Central Iowa Railroad Herald

CIRR.COM

Compile time

Writing Portable Code


Everyone professes to write portable code, but few actually manage to do it. In most cases, so-called portable code comes out littered with #ifs or #ifdefs (or worse, nested #ifs and #ifdefs), rendering the code illegible and obfuscated. Sadly, the lion's share of many porting efforts is spent trying to figure out which lines of code are actually being compiled and executed. This wastes time and energy, and can be downright frustrating.

To avoid porting hassles, many developers have switched to scripting languages like Perl or Python that tend to hide system specifics very well. However, if you're writing the next Perl or Python interpreter, chances are you're using a language like C or C++ and don't have the luxury of scripting's layers of abstraction. To be sure, writing portable code is hard. Even if you're just targeting Unix variants, there are a multitude of differences -- in hardware and operating system interfaces and features -- that you have to contend with.

Over the next few months, we're going to delve into techniques and tools that can help you write and distribute extremely portable code. We'll look at portable build tools, automatically generated test suites, cross-platform programming, and packaging systems like RPM, used to distribute source and binaries in a uniform way. Ultimately, we'll see how to create a code release worthy of being called "Open Source".

This month, we start the series with coding.

What's in an OS?

If you've been developing Unix code for even a little while, you've probably seen code something like this:

#if defined(__SOLARIS__) || (defined(__LINUX__) && __LINUX__ > 0204013)
    [...]
#elif defined(__NetBSD__) || defined(__FreeBSD__)
    [...]
#else
    [...]
#endif

Code snippets like this are compile-time multiplexers: they choose the appropriate flavor of application programming interface (API) based on the type and version of the local operating system. This (somewhat contrived) snippet reflects that Solaris and a specific version of Linux have one kind of API, while FreeBSD and NetBSD have another.

While it seems extremely appealing and natural to test OS type to determine if a specific API is available, that technique is probably the least portable and the most confusing to read. In particular, testing OS type tells you little about the purpose of the code or its system dependencies. For example, if you're trying to port the code snippet above to a new platform, can you tell which of the three code fragments, if any, is the most appropriate for your machine? Probably not.

A better technique is to test for the API itself. For example:

#if defined(HAVE_GETHOSTNAME_R)
    [...]
#elif defined(HAVE_GETHOSTNAME)
    [...]
#else
#  error "this file needs either gethostname_r or gethostname"
#endif

Here, you can easily tell that the code depends on gethostname(). Moreover, the code prefers the re-entrant call gethostname_r() if it's available. If neither call is supported, the compiler emits an error. (System-specific #defines like #define HAVE_GETHOSTNAME 1 can usually be found in a .h file generated by something like metaconfig or autoconf. In an upcoming column, we'll see how to use these utilities in your projects.) Of course, if you're industrious, you can also replace the #error with an implementation of your own (see the sidebar "Fake It To Make It").
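One way to keep such feature tests out of the main line of code is to confine them to a single wrapper function. The following is only a minimal sketch: the name portable_hostname() is hypothetical, HAVE_GETHOSTNAME is assumed to come from a generated configuration header, and the fallback branch assumes that POSIX uname() is available in place of the #error.

```c
#include <stddef.h>
#include <string.h>

#if defined(HAVE_GETHOSTNAME)
#  include <unistd.h>
#else
#  include <sys/utsname.h>   /* fallback path assumes POSIX uname() */
#endif

/* Hypothetical wrapper: copy the host name into buf (len bytes).
 * Returns 0 on success, -1 on failure. */
int portable_hostname(char *buf, size_t len)
{
    if (buf == NULL || len == 0)
        return -1;
#if defined(HAVE_GETHOSTNAME)
    if (gethostname(buf, len) != 0)
        return -1;
#else
    {
        struct utsname u;

        if (uname(&u) < 0)
            return -1;
        strncpy(buf, u.nodename, len);
    }
#endif
    buf[len - 1] = '\0';   /* neither path guarantees termination on truncation */
    return 0;
}
```

Callers simply use portable_hostname(name, sizeof name) and never see a preprocessor conditional.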

This technique is superior because it specifically calls out the system dependency, rather than having the reader infer it. At a glance you can see the feature required and the intent of the code. To ease porting, you can even generate a list of dependencies with a simple command such as:

% grep '^#if[[:space:]]defined(.*)$' *.[ch] | \
	sed -e 's/^\(.*\):.*defined(\(.*\))$/\1: \2/' | sort | uniq

This command creates a list of #defines used in #if preprocessor statements, along with source file names. (Note that this grep expression does not handle compound #if conditions such as #if defined(HAVE_TERM_H) && defined(HAVE_CURSES_H), but you can easily extend it.) Running the command on some source code from MySQL generates:

log_event.h: MYSQL_CLIENT
mysql.cc: HAVE_TERM_H
mysql.cc: HAVE_TERMIOS_H
mysql.cc: HAVE_TERMCAP_H
mysql.cc: OS2

Given a list like this, writing a porting guide should be straightforward: just document each dependency and what it does.
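Compound conditions, which the grep expression above skips, could be handled by a small scanner instead. This is a rough sketch rather than a finished tool: extract_defined() and DEF_NAME_LEN are hypothetical names, and the function only recognizes the defined(NAME) spelling on #if and #elif lines.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define DEF_NAME_LEN 64   /* assumed upper bound on macro-name length */

/* Hypothetical helper: collect every NAME that appears as defined(NAME)
 * on an #if or #elif line. Returns the number of names stored (at most max). */
size_t extract_defined(const char *line, char names[][DEF_NAME_LEN], size_t max)
{
    static const char key[] = "defined(";
    const char *p = line;
    size_t count = 0;

    /* Skip anything that is not an #if or #elif directive. */
    while (isspace((unsigned char)*p))
        p++;
    if (*p != '#')
        return 0;
    p++;
    while (isspace((unsigned char)*p))
        p++;
    if (strncmp(p, "if", 2) != 0 && strncmp(p, "elif", 4) != 0)
        return 0;

    /* Copy out the identifier that follows each "defined(". */
    while (count < max && (p = strstr(p, key)) != NULL) {
        size_t n = 0;

        p += sizeof key - 1;
        while (isspace((unsigned char)*p))
            p++;
        while ((isalnum((unsigned char)*p) || *p == '_') && n < DEF_NAME_LEN - 1)
            names[count][n++] = *p++;
        names[count][n] = '\0';
        if (n > 0)
            count++;
    }
    return count;
}
```

Feeding each source line through this function and printing the file name next to each result would produce the same kind of list as the grep pipeline, including names from compound conditions.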

Hoist the Standard

Speaking of documentation, you'd think that the man pages installed on your local development system would be your best source of information. That's not necessarily the case. Indeed, if you're trying to write portable software, relying on your local man pages is an exceptionally bad choice. The man pages might describe your system's APIs, but the goal is to write code that works on all systems equally well.

A much better source of information on widely available interfaces is the set of standards documents, such as POSIX.1 (http://www.pasc.org). Developed under the IEEE, POSIX standardizes and documents the API available on Unix(-like) systems. Admittedly, POSIX provides a restricted set of functions, but those functions are guaranteed to be implemented on any system claiming POSIX conformance.

Beyond POSIX, consult the ISO standards (which are based upon the POSIX standards), the System V Interface Definition (SVID, http://www.caldera.com/developers/devspecs), or its successor, the Single UNIX Specification (SUS, http://www.unix-systems.org/version3/online.html), published by the Open Group, successor to the Open Software Foundation.

Another source of API specifications is the various web sites that provide online editions of system manual pages. The FreeBSD (http://www.freebsd.org) and NetBSD (http://www.netbsd.org) projects provide access to the manual pages of a number of different operating systems.

Bits, Nibbles, and Bytes? Oh my!

In this day and age, very few applications should care about natural bit and byte orders of a machine. Technology like XML provides a platform-independent storage medium, and you can generally use the features of a higher-level language to avoid minutiae such as the number of bits in an int or a long.

However, it's important to remember that the world is not just Intel x86: off_t is not necessarily a C long, and a C long is not always 32 bits. (For example, the Alpha architecture uses 64-bit longs and pointers. At one time, there was a Data General architecture that had 32-bit words but 48-bit pointers. Even x86 processors have been capable of using 36-bit addresses for a couple of generations now.) When using APIs, use the typedefs provided for you by the operating system. If you must have a data type of a known bit size, define it with typedef, and use that typedef consistently throughout your code. (Using the C99 typedef naming convention, such as int32_t and uint32_t, might even ease porting in the future.)
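As an illustration of both points, the sketch below obtains a 32-bit type under the C99 names and then reads a value from a byte buffer without depending on the host's byte order. HAVE_STDINT_H is an assumed configuration macro, and read_be32() and the compile-time check are hypothetical names chosen for this example.

```c
#include <limits.h>

/* Prefer the C99 header; otherwise pick a 32-bit type by hand. */
#if defined(HAVE_STDINT_H) || \
    (defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L)
#  include <stdint.h>
#elif UINT_MAX == 0xFFFFFFFFUL
typedef unsigned int uint32_t;
typedef int int32_t;
#elif ULONG_MAX == 0xFFFFFFFFUL
typedef unsigned long uint32_t;
typedef long int32_t;
#else
#  error "no 32-bit integer type found"
#endif

/* Compile-time check: the array size is -1 (an error) if uint32_t
 * is not exactly 32 bits wide. */
typedef char check_uint32_is_32_bits[(sizeof(uint32_t) * CHAR_BIT == 32) ? 1 : -1];

/* Hypothetical helper: read a 32-bit big-endian value from a byte
 * buffer, independent of the host's byte order. */
uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

Because the value is assembled with shifts rather than by casting the buffer to an integer pointer, the same code gives the same answer on big-endian and little-endian machines.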

Here are some other guidelines to keep in mind:

Finally, one not so obvious practice for writing portable, easily maintained code is to limit each source directory to one executable. It makes the build task much easier, and segments the sources into coherent chunks. Sources common to multiple executables should be placed in a separate subdirectory, built into a library, and linked to all executables.

Have Code, Will Travel

Using the techniques and guidelines described above, your code will be legible, well-organized, and portable.

To make your code even more readable, make sure to document your coding style, and include the description at the top level of your source tree. Be clear and concise, and show examples. In fact, in the spirit of portability, consider using a standard style -- the one documented in the Linux kernel source tree seems to be quite popular. What's good for Linux (and Linus) is probably good for you.


Fake It To Make It

If you're developing some code and find a feature of your local OS that you've just got to have, but discover it isn't specified in the standards documents (or is unlikely to exist on other platforms), include your own implementation.

An excellent example of this is in C News, written over 10 years ago by Henry Spencer and Geoff Collyer. symlink() is very useful on systems running USENET News software. However, in 1989, only BSD-derived systems regularly implemented it. Strict System V (release 2 and 3) systems didn't implement symlink() at all.

Spencer and Collyer really wanted symlink(), so they implemented the symlink() calling interface around a routine that would copy the file if it couldn't make a hard link (using link()). A rather elegant solution: modular, encapsulated, and simple. And rather than reinvent the wheel entirely, the two programmers leveraged existing system calls on operating systems whenever possible.
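The general shape of such a replacement might look like the sketch below. It is not the C News code: compat_symlink(), copy_file(), and the HAVE_SYMLINK macro are hypothetical names, and a hard link or a copy does not reproduce every property of a real symbolic link (relative targets, for instance, are resolved differently).

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Copy src to dst using stdio; returns 0 on success, -1 on failure. */
static int copy_file(const char *src, const char *dst)
{
    char buf[4096];
    size_t n;
    int ok = 1;
    FILE *in, *out;

    if ((in = fopen(src, "rb")) == NULL)
        return -1;
    if ((out = fopen(dst, "wb")) == NULL) {
        fclose(in);
        return -1;
    }
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            ok = 0;
            break;
        }
    }
    if (ferror(in))
        ok = 0;
    fclose(in);
    if (fclose(out) != 0)
        ok = 0;
    if (!ok)
        remove(dst);   /* do not leave a partial copy behind */
    return ok ? 0 : -1;
}

/* Hypothetical stand-in with symlink()'s calling interface. */
int compat_symlink(const char *target, const char *linkpath)
{
#if defined(HAVE_SYMLINK)
    return symlink(target, linkpath);
#else
    if (link(target, linkpath) == 0)
        return 0;               /* a hard link was good enough */
    if (errno == EEXIST)
        return -1;              /* like symlink(), never overwrite */
    return copy_file(target, linkpath);
#endif
}
```

The rest of the program calls compat_symlink() everywhere and never needs to know which of the three mechanisms was used.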

If there are standards-based interfaces that provide partial or complete implementations of a feature, use them (with appropriate #if feature tests). If you write your own implementation of a feature, comply with the standard interface. And, if you cannot comply or if no standard exists, write a new interface that makes coding simpler and hides the underlying implementation.

If you'd like to see an example of many of the techniques described here, download a copy of C News from ftp://ftp.cs.toronto.edu/pub/c-news/c-news.tar.Z.


RESOURCES

Henry Spencer's "#ifdef considered harmful" http://www.literateprogramming.com/ifdefs.pdf

Single UNIX Specification http://www.unix-systems.org/version3/online.html


$Id: 2002-Sept.html,v 1.2 2002/11/26 22:47:10 cirr Exp $