A C Style Sheet

Martin Minow
Digital Equipment Corp.
146 Main St. MLO 3-3/U8
Maynard MA 01754


June 15, 1983


Introduction

This document presents a common set of coding standards, as well as a series of hints to aid in producing maintainable and transportable C software. It is abstracted from a number of sources. It is also based heavily on my own experience in developing a number of large, transportable applications in C, using Decus C, Vax-11 C, and several varieties of Unix C. (Unix is a trademark of Bell Laboratories.)

As could be expected, the suggested style does not totally agree with any of the referenced documents.

Motivation

The reason you should maintain a consistent coding style is that good programs will evolve. When writing a new program, you will often take routines and data structures from old -- working -- software. This is much easier to do if the old software is understandable. Unreadable software is unusable, no matter how well it works.

The single most important thing about a typographical style is sticking to it consistently. There are many good styles, but the differences among them are totally drowned out by the difficulty of reading a program with pieces in different styles. If you modify a program, stick to its original style. If you must change things -- you really cannot live with the old style -- change the whole program, or at least all the parts logically related to what you are changing.

In recommending against automatic beautifiers (prettyprinters), the Indian Hill standards committee noted:
These comments are relevant to any rigid application of a programming or typographical style. There will always be cases where the automatic rule is unsatisfactory and you, as the person responsible, must be able to understand that your primary goal is to achieve clarity and understandability.

File Organization

A file consists of several sections separated by blank lines or a form-feed (<FF>). If you use a form-feed, it should be the only character on the line.

In general, source files should not be much longer than 1000 lines. Larger files are often difficult to edit and -- if too large -- cannot be processed by the diff (differences) program. 1000 lines translates to about 12-15 pages of text. Source lines should not be longer than 78 characters.

A source file should be organized as follows:
  1. A prologue comment gives the file name and a few sentences telling what is in the file. This is followed, if necessary, by copyright and license "boilerplate." The prologue tells the reader the purpose of the text of the file, whether it contains functions, data definitions, tables, or support code. It should not generally be a list of function names.

    In some programs, C source files may be created by program generators. For example, a dictionary may be compiled into a keyword vector (one file) and a definition vector (one file). In this case, the program generating the files should write the date of generation (as a comment) into the C file source. If the generated program file may require editing, consider including the source of the generated information, either as comments or as text bracketed by #ifdefs, as an aid to the debugger.
  2. Usage and operating instructions follow next. Decus C programs should use the format accepted by the "getrno" utility program. This allows the program source file to contain the source of its documentation and lessens the burden of keeping the documentation in synchronization with the program itself.

    If a program is composed from separate modules, one of the modules (generally the one with the main() function) should contain instructions on how to build the program.

    Decus C programs should use the build utility to maintain compilation instructions:
        /*)BUILD
            $(PROGRAM) = program
            $(INCLUDE) = header.h
            $(FILES)   = { file1 file2 }
        */
    
    Unix programs should include the program's makefile within a comment with some defined format, allowing extraction by a simple program or shell script:
        /*)MAKEFILE
            program: file1.o file2.o
                cc file1.o file2.o -o program
            file1.o file2.o: header.h
        */
    
    This centralizes everything relevant to maintenance of a program in one place.
  3. Header files are specified using the #include preprocessor directive. The suggested header file order is
        #include <stdio.h>
        #include <other system headers>
        #include "user header files"
    
    Note that header files should be given the ".h" filetype, while all C source files should be given the ".c" filetype.

    In allocating header files for large packages of programs, you should avoid absolute pathnames for header files. Use the <name> construction for system files, relative directories for Unix and VMS systems, and externally defined logical devices for RSTS/E and RT11. If the sub-system is reasonably small, put all source files in one directory.
  4. Typedefs, #defines and structure definitions that apply to the file as a whole are next. Structures should be defined as
        typedef struct NAME {
            ...
        } NAME;
    
    Definitions should be grouped functionally.

    It is recommended that the struct and typedef name be the same to simplify forward references (in linked lists):
        typedef struct LISP_LIST {
            struct LISP_LIST *car;
            struct LISP_LIST *cdr;
        } LISP_LIST;
    
    Other references to this structure may use the typedef name:
        register LISP_LIST *head;
    
  5. Global data definitions are next. The suggested order is
    1. Global variables defined in this file.
    2. Static (file global) variables.
    3. External variables and functions that are used throughout the file.
    If a program is large enough to require multiple source files, all global data should be defined by a "data.c" file which only contains data definitions, while all other source files contain extern references. Alternatively, the file containing the main program, documentation, and build instructions should contain data allocation.

    A very large program would contain global references in an "extern.h" #include file in addition to a "data.c" definition file.
  6. Functions come last. If the file is a main program, the main() function is first.
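
As a rough illustration of the ordering above, a small (non-main) source file might be laid out as follows. The file name, the COUNTER type, and all function and variable names are invented for this sketch, and the functions use ANSI prototypes rather than the older parameter style shown elsewhere in this document.

```c
/*
 * counter.c -- hypothetical example module.
 *
 * Maintains a single named event counter.  Shown only to
 * illustrate the order of sections within a source file.
 */

#include <stdio.h>              /* System headers first         */
#include <string.h>

#define NAME_MAX_LEN    16      /* Longest counter name + EOS   */

typedef struct COUNTER {
    char co_name[NAME_MAX_LEN]; /* Counter name                 */
    long co_value;              /* Current count                */
} COUNTER;

int             co_errors = 0;  /* Global: bad requests seen    */
static COUNTER  co_store;       /* File-local: the one counter  */

/*
 * co_init(name)
 *
 * Reset the counter and give it a name.  Names that do not
 * fit are truncated.
 */
void
co_init(const char *name)
{
    strncpy(co_store.co_name, name, NAME_MAX_LEN - 1);
    co_store.co_name[NAME_MAX_LEN - 1] = '\0';
    co_store.co_value = 0;
}

/*
 * co_bump(amount)
 *
 * Add amount to the counter and return the new value.
 * Negative amounts are rejected and counted as errors.
 */
long
co_bump(long amount)
{
    if (amount < 0) {
        co_errors++;
    }
    else {
        co_store.co_value += amount;
    }
    return (co_store.co_value);
}
```

A main program would follow the same order, with main() placed ahead of the other functions.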

Header Files

Header files are included in other files during compilation. Some, such as "stdio.h" are defined system-wide, and must be included by any C programs that use the standard I/O library. Others are used within a single program or group of programs.

Header files should be functionally organized. Declarations for separate sub-systems should be in separate header files.

Header files should not be nested. This is not permitted by Decus C and some data objects, such as typedefs and initialized data definitions, must not be seen twice by the compiler in one compilation.

Header files should contain all #define, typedef, and extern declarations necessary for a given program and shared among two or more of its files.

Header files should not declare (i.e., allocate) variables or contain code. This is frequently a symptom of poor partitioning of code between files. One header organization that has worked well for several medium-sized projects is:
  1. Definitions, including common data structures, are placed in one header file. Things defined in this file are common to the entire package.
  2. All external (global) data is defined in a second header file.
  3. A separate data.c file contains all global data allocations.
  4. Definitions required by bounded sub-systems are in separate header and data allocation files.
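
The split between items 2 and 3 can be sketched in a single listing, with comments marking where each part would live. The file names extern.h and data.c follow the text; the variable and function names are made up for illustration.

```c
/* ---- would live in extern.h: references only, no storage ---- */
extern int  pk_verbose;         /* Nonzero for chatty output    */
extern long pk_records;         /* Records processed so far     */

/* ---- would live in data.c: the one place storage is allocated */
int     pk_verbose = 0;
long    pk_records = 0L;

/* ---- would live in any other file that includes extern.h ---- */
long
pk_count_record(void)
{
    pk_records++;
    return (pk_records);
}
```

Because the header holds only extern references, it can be included by every file in the package without allocating the data more than once.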

Declarations and Definitions

The use of the #define preprocessor command is especially recommended. In general, numerical constants and array boundaries should never be coded directly. They should be assigned a meaningful name and assigned their permanent value by the #define. This will make it much easier to administer large and evolving programs as the constant value can be changed uniformly by changing the #define and recompiling.

The enumeration data type (not in Decus C) offers an improved manner of managing constant definitions as additional type checking is then available.

In general, all constant values which are not strictly numeric should be specified by #defines. Exceptions to this rule are the values 0 and 1 when used as the lower boundary of an array; relative indices (if p is a pointer to an array element, p[1] is the next element, while p[-1] is the previous element); and strictly numeric quantities. #defines may even be useful in the latter situation as well:
    #define SPEED_LIMIT	55
Note that defined quantities should generally be in upper-case.

Directly-coded numerical constants must have a comment explaining the derivation of the value.

It is generally poor practice to use #defines to modify C syntax. For example, the following definitions are not recommended:
    #define reg		register
    #define begin	{
    #define end		}
In certain circumstances, however, this may be necessary for proper compilation or fastest possible execution:
    #define DIV_2	>> 1
    #define DIV_4	>> 2
Replacing divides by right-shifts cannot be done by the compiler as it would yield incorrect results if the dividend were negative. If the programmer knows that the dividend cannot be negative (and documents that fact), this optimization becomes possible.
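
A small sketch of where the shift and the divide agree. For unsigned (or known non-negative) values the two give the same result. For a negative signed value, division truncates toward zero in C99 and later, while the result of a right shift is implementation-defined, so the divide must stay. The function names are illustrative only.

```c
#define DIV_2   >> 1            /* Valid only for values >= 0   */

/*
 * Halve a value that is known never to be negative.
 */
unsigned int
half_by_shift(unsigned int value)
{
    return (value DIV_2);
}

/*
 * Halve a value that may be negative; the divide must stay.
 */
int
half_by_divide(int value)
{
    return (value / 2);
}
```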

Also, the programmer may need to conceal non-portable quirks by means of centrally-placed definitions:
    #ifdef decus
    #define UNSIGNED_LONG	long
    #else
    #define UNSIGNED_LONG	unsigned long
    #endif
As will be noted under portability, most C compilers predefine a small number of variables that may be used to conditionally compile machine or operating-system specific code. This allows one program to run on multiple systems without hand-editing.

It is highly recommended that you use the following definitions freely and consistently:
    #define	EOS	'\0'
    #define	FALSE	0
    #define	TRUE	1
NULL is defined by <stdio.h> and should not be explicitly specified by your program as some compilers require NULL to be type-cast.

EOS marks the end of a C string, while FALSE and TRUE are used for Boolean testing. You will probably get in the habit of only referring to FALSE in your if statements:
    if (test != FALSE) {
This generates the best possible code. TRUE is usually used to return a "success" value from a function. Don't use both TRUE and YES in the same program to mean the same thing.
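
A minimal sketch of the three definitions in use. The function all_blank() is hypothetical and is written with an ANSI prototype, which is an assumption about the reader's compiler rather than part of this style sheet.

```c
#define EOS     '\0'
#define FALSE   0
#define TRUE    1

/*
 * Return TRUE if the string contains only blanks and tabs
 * (an empty string counts as blank), FALSE otherwise.
 */
int
all_blank(const char *text)
{
    while (*text != EOS) {
        if (*text != ' ' && *text != '\t') {
            return (FALSE);
        }
        text++;
    }
    return (TRUE);
}
```

A caller would then test the result against FALSE only, as in "if (all_blank(line) != FALSE) {".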

If a structure contains a data element that can take on one of several values, it may be useful to put the #define's for that element within the structure definition. For example, here is a fragment (slightly reorganized to fit on the documentation page) from a Vax-11 C header file that defines a VMS system structure:
    /*
     * XABSUM -- Summary Extended Attribute Block
     */

    struct XABSUM {
	char	xab$b_cod;
    #define XAB$C_SUM	22	/* type code		*/
	char	xab$b_bln;
    #define XAB$C_SUMLEN	0x0C	/* block length		*/
    #define XAB$K_SUMLEN	0x0C
	....
Note that the information that would be placed in each field is #define'd following that field. The definitions and structure fields follow standard VMS syntax conventions.

The empty initializer "{}" should never be used. Initialized structures should be fully delimited with braces. Constants used to initialize longs should be explicitly long.

In any file which is part of a larger context, all local information should be identified by use of the static keyword. Variables, in particular, should not be accessible outside the file unless there is an overriding need for global access. If these variables are shared by only one or two other files, you should name these files in a comment.

The readonly specifier, available in Vax-11 C, should be used to signal data that does not change during execution. On other compilers, it may be "hidden" by
    #define readonly

Comments

The importance of comments cannot be overemphasized. In any professional environment, many people will have to read your code, trying to understand what you have done. Sometimes, they wish to modify it to do other things; sometimes they need to modify it to do what you originally intended to do. Consider the Golden Rule: "if you make life easy for others, maybe someone will be nice to you someday."

The purpose of a comment is to describe your intention. If properly written, the code itself will adequately tell what you actually did. There are two general types of comments:

Block comments are narratives describing the purpose of a portion of the program text. They are written in the following format:
    /*
     * The comment text is written
     * here in complete sentences.
     */
The comment text should be at the same level of indentation as the source code it discusses. You should never write a comment that could be interpreted as a C statement (unless the comment is blocking out temporary debugging code). A block comment should always be included at the beginning of a major segment of the program.

Very short comments may appear on the same line as the code they describe. They should be tabbed over far enough to separate them from the statements. If more than one short comment appears in a block of code, they should all start at the same tab position:
    while (!finish()) {		/* Main sequence:	*/
        inquire();		/* Get user request	*/
        process();		/* And carry it out	*/
    }				/* As long as possible.	*/
Note that all single-line comments start at some specific column and end with the closing "*/" tabbed to column 72 on the line. Closing the comment at the right-hand margin makes it more readable than if the "*/" were next to the comment text itself:
    while (!finish()) {		/* Main sequence: */
        inquire();		/* Get user request */
        process();		/* And carry it out */
    }				/* As long as possible. */
In general, you should use one-line comments to document variable definitions and block comments to describe the computation processes. The above comments should actually have been written as a block comment:
    /*
     * Main sequence: get and process
     * all user requests.
     */
    while (!finish()) {
        inquire();
        process();
    }

Function Declarations

Each function should be defined beginning in column 1 (to simplify searches for the function's definition). If the function returns a value or is static, that should be alone on the preceding line.

Each formal parameter should be declared, with a comment, on a separate line. If the function uses any external variables or functions (that do not return integers), these should be declared with other local variables. This is particularly beneficial to someone reading code written by another.

The format for the function declaration may be illustrated as follows:
    char *
    savest(string)
    char	*string;	/* String to save	*/
    /*
     * Savest saves its argument string in free storage,
     * returning a pointer to the allocated datum.
     * It returns NULL if the allocation fails.
     */
    {
	register char	*ptr;
	extern char	*malloc();
	extern char	*strcpy();

	ptr = malloc(strlen(string) + sizeof (char));
	if (ptr != NULL)
	    strcpy(ptr, string);
	return (ptr);
    }
Note that, in the example above, the function description followed the formal definition itself. Another acceptable style precedes the function by a block comment.
    /*
     * match(string, pattern)
     *
     * If the pattern is an initial substring of string,
     * return a pointer to the first character of the
     * string beyond those matching the pattern.
     * Otherwise, return NULL.  Thus:
     *     match("abcde", "abc")
     * returns a pointer to the 'd' in the first string;
     *     match("abcde", "bc")
     * returns NULL.
     */

    char *
    match(string, pattern)
    register char	*string;	/* Source	*/
    register char	*pattern;	/* for match	*/
    {
	while (*string == *pattern && *string != EOS) {
	    pattern++;
	    string++;
	}
	return ((*pattern == EOS) ? string : NULL);
    }
In this format, the block comment is separated from the function definition by a blank line.

The above program fragments illustrate several transportability and maintainability issues:
  1. Although the value returned by strcpy() is not used by the savest() function, it is declared so the compiler knows how to allocate and deallocate space for the value it does return. This is important for compilers running on stack machines with 16-bit integers and 32-bit character pointers.
  2. On some machines, sizeof (char) is NOT 1. If you want to allocate space to hold the EOS at the end of a string, you should use the transportable format, not the absolute value.
  3. In the match() function, note that the end of string test is written explicitly. You should not assume that strings are terminated by a zero-valued byte.

Structure and Variable Declarations

Structures are one of the most important features of C. They enhance the logical organization of your code, offer consistent addressing, and will generally increase the efficiency and performance of your programs by a significant amount.

In general, if there are two or more "things" in your program that are addressed by the same index, they should be defined by a common structure. This gives you great freedom to allow the program to evolve (by adding another "thing" to the structure, for example), or to modify storage allocation (from pre-compiled to dynamic allocation).

For example, if your program processes symbols, where each symbol has a name, type, flags, and an associated value, you shouldn't define separate vectors:
    char *name[NSYMB];
    int  type[NSYMB];
    int  flags[NSYMB];
    int  value[NSYMB];
but, rather,
    typedef struct SYMBOL {
        char *sy_name;
        int  sy_type;
        int  sy_flags;
        int  sy_value;
    } SYMBOL;

    SYMBOL symboltable[NSYMB];
All structures should be defined by typedefs. Note, also, the use of a header ("sy_") to identify members of the SYMBOL structure.
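
One benefit of the single structure is that a search can walk one pointer through the table instead of indexing four vectors. The following is a sketch only: the table contents, the small NSYMB value, and the sy_find() function are invented for illustration, and the function uses an ANSI prototype.

```c
#include <stdio.h>              /* NULL                         */
#include <string.h>             /* strcmp()                     */

#define NSYMB   4               /* Size of the symbol table     */

typedef struct SYMBOL {
    char *sy_name;
    int  sy_type;
    int  sy_flags;
    int  sy_value;
} SYMBOL;

SYMBOL symboltable[NSYMB] = {
    { "alpha", 1, 0, 10 },
    { "beta",  1, 0, 20 },
    { "gamma", 2, 0, 30 },
    { "delta", 2, 0, 40 },
};

/*
 * Return a pointer to the symbol with the given name,
 * or NULL if there is no such symbol.
 */
SYMBOL *
sy_find(const char *name)
{
    register SYMBOL *sy;

    for (sy = &symboltable[0]; sy < &symboltable[NSYMB]; sy++) {
        if (strcmp(sy->sy_name, name) == 0) {
            return (sy);
        }
    }
    return (NULL);
}
```

Adding another field to SYMBOL later does not change sy_find() at all, which is the kind of evolution the text has in mind.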

There is one important exception to the rule that conforming data areas are declared by a single data structure: the case where some data is read-only and some read-write. In this case, you may wish to allocate separate areas to permit use of the readonly specifier or to allocate read-only data in an overlay segment.

The local variables used by a function should have names that do not duplicate global names.

Compound Statements

Compound statements carry out the calculations required by the C program. They are lists of statements enclosed in braces. They should be tabbed over one more than the tab position of the compound statement introducer itself. (Four space indentation is recommended, although it is certainly more convenient to use the hardware-provided eight position tab stops. If you change your mind in the middle of a program, you should have the courtesy to re-edit the rest of the file so it is consistent.)

The opening left brace should be at the end of the line beginning the compound statement and the closing right brace should be alone on a line, tabbed under the beginning of the compound statement. Note that the left brace beginning a function body is the only occurrence of a left brace which is alone on a line. This is the "Indian Hill" style, also used in Kernighan and Ritchie's book. (Other style sheets recommend placing the opening left brace alone on the line following the statement opener. Choose one style; be consistent. This subject will be discussed further in a subsequent section.)

The right brace before the while of a do-while statement is the only place where a closing right brace is not alone on a line:
    do {
	stuff();
    } while (cond != FALSE);
It is good practice always to provide braces, even when they are not required by the language:
    if (abc < def) {
	lesser();
    }
    else if (abc == def) {
	equal();
    }
    else {
	greater();
    }
This prevents surprises when you add debugging statements.

Never, never, write nested conditionals or loops without braces:
    for (dp = &values[0]; dp < top_value; dp++)
	if (dp->d_value == arg_value
	 && (dp->d_flag & arg_flag) != 0)
	    return (dp);
    return (NULL);
While the above is correct C, it is unmaintainable. It should always be written as
    for (dp = &values[0]; dp < top_value; dp++) {
	if (dp->d_value == arg_value
	 && (dp->d_flag & arg_flag) != 0) {
	    return (dp);
	}
    }
    return (NULL);
If the span of a block is large (more than about 40 lines) or there are several nested blocks, closing braces should be commented to indicate what part of the process they delimit:
    for (sy = sytable; sy != NULL; sy = sy->sy_link) {
        if (sy->sy_flag == DEFINED) {
            ...
        }		/* if defined			*/
        else {
	    ...
	}		/* if undefined			*/
    }			/* for all symbols		*/
Each line should contain one and only one statement. The only exception to this is the else if construction as shown above. In a sequence of "if ... else ..." statements, there should always be a terminating else even if it is merely a dummy statement. Note especially that an if statement and its associated conditionally executed statement appear on separate lines.

If a for, if, or while statement has a dummy body, the ';' must go on the next line:
    /*
     * Locate end of string
     */
    for (charp = string; *charp != EOS; charp++)
	;
There are few more insidious bugs than an extra ';' tacked on the end of a for or if statement. Everything will compile normally and the code might even work for some cases, but -- because of the invisibility of the ';' -- the bug will be very difficult to track down.

There should always be a blank between reserved words and their opening parentheses, e.g., "if (condition)" rather than "if(condition)". There should also be parentheses around the objects of sizeof and return.

If the conditional test in an if statement is so complex that it requires more than one line, break it at an && or ||, and line up the expressions so the tests line up as well:
    if (a == b
     && b == c) {
	printf("a == c");
    }
If the conditional test extends over more than one line, always enclose the conditionally-executed statement in braces.

The above is a special case of a more general recommendation that you break statements across lines at meaningful boundaries, and attempt to align the components to make the meaning clear. For example, the following sequence computes the length of an RMS logical record.
    r->lrecl = r->rab.rab$w_rsz	/* Record size from RAB	*/
     + ((hbyte != EOS) ? 1 : 0)	/* If header byte	*/
     + ((tbyte != EOS) ? 1 : 0)	/* If trailer byte	*/
     - offset			/* For Fortran hacking	*/
     + hnewline			/* For VFC hacking	*/
     + tnewline;		/* For VFC hacking	*/
Switch statements offer a good alternative to multiple if...else sequences. Each case appears by itself on a line, tabbed under the switch itself. The break that terminates a case should be followed by a blank line. The "fall through" feature of C's switch statement should rarely, if ever, be used. If it is needed, it must be commented for further reference:
    eow = 0;
    while ((c = getchar()) != EOF) {
	switch (c) {
	case '\n':	/* Newline,			*/
	    lines++;	/* count lines			*/
	    /*
	     * Fall through to "end of word" case
	     */
	case '\t':	/* Tabs, newlines, and blanks	*/
	case ' ':	/* Form words.			*/
	    words += eow;
	    eow = 0;	/* Don't count multiple runs	*/
	    letters++;	/* But count all "whitespace"	*/
	    break;

	default:	/* All the rest form a word	*/
	    letters++;
	    eow = 1;
	    break;
	}
    }
    words += eow;	/* Fix count of last word	*/
The above implements the central algorithm of a "word count" routine where a newline, blank, or tab terminates a word, but multiple blanks do not increase the number of words.

Note that the break following the last case is redundant, but should be provided to make the programmer's intent clear. In general, the default case should be last.

All switch statements should have a default case, which may merely be a "fatal error" exit.
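
To make the word-count fragment easy to try, here is a sketch that applies the same switch to a string in memory rather than to standard input. The function name tally(), the use of global counters, and the ANSI prototype are choices made for this example, not part of the original routine.

```c
#define EOS     '\0'

long    lines;                  /* Newlines seen                */
long    words;                  /* Words seen                   */
long    letters;                /* All characters seen          */

/*
 * Count the lines, words, and characters in text, using
 * the same switch as the fragment above.
 */
void
tally(const char *text)
{
    int eow;                    /* 1 while inside a word        */
    int c;

    lines = 0;
    words = 0;
    letters = 0;
    eow = 0;
    while ((c = *text++) != EOS) {
        switch (c) {
        case '\n':              /* Newline,                     */
            lines++;            /* count lines                  */
            /* Fall through - a newline also ends a word */
        case '\t':              /* Tabs, newlines, and blanks   */
        case ' ':               /* form words.                  */
            words += eow;
            eow = 0;            /* Don't count multiple runs    */
            letters++;          /* But count all "whitespace"   */
            break;

        default:                /* All the rest form a word     */
            letters++;
            eow = 1;
            break;
        }
    }
    words += eow;               /* Fix count of last word       */
}
```

For the text "one two\nthree" this gives 1 line, 3 words, and 13 characters; runs of blanks, as in "a  b", still count as a single word separator.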

Expressions and Operators

C is an expression language. This means -- in essence -- that the assignment statement "a = b" itself has a value which can be embedded in a larger context. This should be used very sparingly. For example,
    while ((value = *pointer++) != 0) {
	process(value);	
    }
shows a standard C idiom which all programmers should recognize. It is essential, however, that you do not carry this to extremes by embedding multiple assignments (or other side-effects) in a statement.

Blanks should surround all binary operators except those which compose primaries, (".", "->"). No blanks should separate a unary operator (such as '-', '&', '[]', '!') from its operand. Sizeof and return are exceptions to this rule.

Some judgement is called for here as there are a few situations when complex expressions become clearer when inner constructions don't have spaces. For example,
    x = (a*b) + (c*d);
Blanks should appear after commas in argument lists to help separate the arguments visually. On the other hand, macros with arguments and function calls should not have a blank between the name and the left parenthesis.

Side effects within expressions should be used sparingly. No more than one operator with a side-effect ("=", "op=", "++", "--") should appear within an expression. It is very easy to misunderstand the rules for C compilation and get side-effects compiled in the wrong order. For example,
    func(*ptr++, *ptr++);
    *ptr = *ptr++;
    *ptr++ = *ptr;
are not necessarily going to do what you expect, and are going to do different things on different implementations of C.
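
One way to avoid the problem is to give each side-effect its own statement, so the order is fixed by the program rather than by the compiler. In this sketch, func() is a hypothetical stand-in for whatever two-argument function is being called, and next_pair() is an invented wrapper.

```c
/*
 * Hypothetical two-argument function standing in for func().
 */
int
func(int first, int second)
{
    return ((first * 10) + second);
}

/*
 * Pass the two values starting at ptr to func(), reading
 * them in a defined order.
 */
int
next_pair(int *ptr)
{
    int first;
    int second;

    first = *ptr;               /* One side-effect-free read    */
    ptr++;                      /* One increment per statement  */
    second = *ptr;
    return (func(first, second));
}
```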

The old versions of the assigned operators ("=+", etc.) must not be used. Always surround assigned operators by spaces. "x=*foo" is interpreted as "x = x * foo" (even if foo is a pointer) by some compilers.

The comma operator should be used exceedingly sparingly. One of the few appropriate places is in a for statement:
    for (sum = 0, ptr = &array[0]; ptr < &array[A_MAX];) {
	sum += *ptr++;
    }
Since C has some unexpected operator precedence rules, all expressions involving mixed operators should be fully parenthesized. This is especially true when comparison or mask operators (&, |, and ^) are combined with shifts. Always write
    if (value > (1 << 12))
        ...
with parentheses around the shift operation.
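
The same care applies to the mask operators, which bind less tightly than the comparison operators. A brief sketch, with hypothetical constant and function names:

```c
#define PAGE_BITS   12          /* Hypothetical page size: 4096 */
#define FLAG_MASK   0x06        /* Hypothetical flag bits       */

/*
 * Return 1 if value is larger than one page, else 0.
 */
int
over_page(long value)
{
    return (value > (1L << PAGE_BITS));
}

/*
 * Return 1 if none of the FLAG_MASK bits are set, else 0.
 * Without the inner parentheses, "flags & FLAG_MASK == 0"
 * would be parsed as "flags & (FLAG_MASK == 0)", which is
 * always zero here.
 */
int
flags_clear(unsigned int flags)
{
    return ((flags & FLAG_MASK) == 0);
}
```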

Naming Things

When a program must be used as part of a larger context, whether it be a subroutine library, or an independent program within an application package, the programmer's creativity in defining mnemonic names must be subservient to the needs of the group as a whole.

Naming Rules

  1. Application program names should follow a standard format, such as:
        The 1st 2 or 3 characters = sub-system code
        The rest = unique meaningful identifier
    
  2. Names (variables, structs, unions, and procedures) are lower-case, unique in the first eight characters. (Some C systems require names to be unique in the first six characters.)

    External names must be unique in the first six characters.

    If the first letter in an external name is an underscore '_' it indicates a Unix system-internal name, (such as a routine within a file-management I/O system). Application programs should not use this as it implies system-level programming. Trailing underscores should also be avoided.

    (Note that any use of underscore may conflict with variables defined by your operating system. For example, on RSX-11M, the operating system file management routines use -- in effect -- a leading underscore.)

    Longer names and underscore should be used freely to improve readability and understandability.

    Upper-case and lower-case should not be mixed in a name.

    Names more than four characters long should differ by at least two characters:
        int systst, sysstst;	/* are easily confused	*/
    
    Constants (things named by a #define) should be in all upper-case.

    All names must be unique, ignoring case. In other words, even though C knows that "this" is different from "THIS", do not do it.

    Although this guide recommends keeping variables in lowercase only, and #define'd constants in uppercase only, there are a few times when breaking this rule results in greater clarity. It is better to be compatible with an externally-defined standard, even if it is in mixed case. For example, if the hardware documentation for a chip refers to "TxRdy", your device driver should use the same format to refer to this entity.

Choosing names

Names should be meaningful. Abbreviations should also be meaningful, and should be chosen by some uniform, rational scheme.
  1. Each variable and name must have an invariant usage and meaning throughout the program.
  2. Names should not be re-defined in inner blocks. Nor should global names be redeclared within a function.
  3. Standard meaningful names for local (temporary) variables include:
        i, j, k     indexes
        c, ch       character
        n, m        counters
        p, q, a, b  pointers
        s           strings

    Never use the letter 'l' as a variable or in any context where it could be confused with the digit '1'.

Names for structs, unions, and defines

Consider using typedef for structs and unions. This helps both the reader of the code and type checking programs such as LINT.

In a large system, global names should be composed of two parts, a one or two letter prefix, relating to the sub-system, and a longer name defining the item itself.

If several symbols are needed to refer to an entity, they should have some consistent relation. For example:
    #define MAX_ITEM  123

    typedef struct ITEM {
        struct ITEM *it_next;
        int         it_value;
    } ITEM;

    ITEM item_store[MAX_ITEM];
    ITEM *itfirst = &item_store[0];
    #define ITEM_LAST (&item_store[MAX_ITEM])
Note that, for a member of a struct, the prefix should be related to the body of the struct name.

If structures are declared in #include files and there is a risk that the file might be included twice, you should block multiple compilation to prevent compiler error messages. For example, if the ITEM structure declaration were stored in a header file, item.h, you should write it as follows:
    #ifndef item_h
    typedef struct ITEM {
        ...
    } ITEM;
    #define item_h 1	/* Structure has been declared	*/
    #endif

Pointers

Pointers should be declared and used as "pointer to a thing of type X". Do not, for example, use a variable which is declared as "pointer to int" to point to a char, even though the particular compiler and/or machine will let you do it. On most compilers, unions may be used to allow pointers to different objects. At other times, explicit type casts are the simplest solution.

For example, a print formatter may need both types of pointer:
    #define     INT  0		/* Storage classes for	*/
    #define    LONG  1		/* Formatter		*/
    #define     NEG  2		/* Negative flag	*/

    typedef struct FORMAT {
	char	f_type;		/* Format type		*/
	int	f_width;	/* Item storage width	*/
	int	f_radix;	/* Conversion radix	*/
    } FORMAT;

    static FORMAT formatinfo[] = {
	{ 'd', INT,  10 },
	{ 'o', INT,   8 },
	{ 'u', INT,  10 },
	{ 'D', LONG, 10 },
	{ EOS, 0,     0 },
    };

    c__doprnt(format, argp, fildes)
    char	*format;
    int		*argp;
    FILE	*fildes;
    {
	register union {
	    FORMAT	*fmt;	/* -> format codes	*/
	    char	*out;	/* -> result		*/
	} p;			/* General pointer	*/
	int	radix;		/* Conversion radix	*/
	int	temp;		/* General temp value	*/
	char	c;		/* Current format char	*/
	long	value;		/* Value to convert	*/
	char	work[WORKSIZE];	/* Number buffer	*/

	...
	/*
	 * Search for a matching format.
	 */
	p.fmt = formatinfo;
	while (p.fmt->f_type != c && p.fmt->f_type != EOS)
	    p.fmt++;
	if (p.fmt->f_type != EOS) {
	    /*
	     * A numeric conversion was found.  Get the
	     * value and expand it into the work area.
	     */
	    radix = p.fmt->f_radix;
	    temp = p.fmt->f_width;
	    p.out = &work[WORKSIZE - 1];
	    *p.out = EOS;	/* Terminate result	*/
	    if (temp == INT) {
		if (c == 'd' && *argp < 0) {
		    value = -(*argp++);
		    temp = NEG;	/* Remember signal	*/
		}
		else		/* 'u', 'x', or 'o'	*/
		    value = (unsigned) *argp++;
	    }
	    else {		/* Get long from caller	*/
		value = *((long *) argp)++;
	    }
	    if (value == 0)
		*--p.out = '0';
	    else {
		do {	/* Convert unsigned number != 0	*/
		    *--p.out = "0123456789abcdef"[value % radix];
		} while ((value /= radix) != 0);
		if (temp == NEG)
		    *--p.out = '-';
	    }
The program will then output the EOS-terminated string starting at p.out.

Note that a union was used when the same (register) variable was used to point to two separate objects (at different points of the program), while casts were used when a pointer refers to different objects, depending on the particular data being processed.

Standard Defined-names

There are a number of #define'd names whose meaning is standardized by convention across C programs:

TRUE - Boolean true
FALSE - Boolean false

NULL - For comparison or assignment of pointers
EOS - The end of string marker
EOF - End-of-file

In writing a large program, the following standards proved to be useful:

DEBUG - Switch for compiling debugging code.
DEBUG_X - Debug sub-part X only
TESTING - Compile a built-in test program. See below.

INT_16 - A storage integer that must hold 16 bits.
INT_32 - A storage integer that must hold 32 bits.
INT - Whatever is fastest for this compiler.
FLAG - A TRUE/FALSE (or small range of values) flag.
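
As a minimal sketch, a project header might define several of these names as follows. The particular values are the conventional ones and are assumptions here, as is the hypothetical predicate is_blank_line(), which is included only to exercise the names. ANSI-style prototypes are used so that the sketch compiles on current compilers.

```c
#include <stdio.h>              /* Supplies NULL and EOF        */

#ifndef TRUE
#define TRUE    1               /* Boolean true                 */
#define FALSE   0               /* Boolean false                */
#endif
#define EOS     '\0'            /* End of string marker         */
#define FLAG    char            /* TRUE/FALSE or small range    */

/*
 * is_blank_line(text)
 *
 * Predicate: return TRUE if text holds only blanks and tabs.
 * (Hypothetical routine, shown only to exercise the names.)
 */
FLAG
is_blank_line(const char *text)
{
    for (; *text != EOS; text++) {
        if (*text != ' ' && *text != '\t') {
            return (FALSE);
        }
    }
    return (TRUE);
}
```

A caller would then test the result explicitly, as in "if (is_blank_line(line) != FALSE)".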

In developing a large program, many subroutines included a small main program for testing. This program was conditionally compiled by #define'ing the TESTING compile-time variable. Once a module had been debugged, TESTING was left undefined and the module was integrated with the rest of the package.
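
A minimal sketch of this arrangement follows, assuming a hypothetical module that exports a single routine, count_blanks(). The routine and its test are illustrative only and do not come from the original program. Compiling with -DTESTING builds the stand-alone test program; without it, only the routine itself is compiled.

```c
#include <stdio.h>

#define EOS '\0'                /* End of string marker         */

/*
 * count_blanks(text)
 *
 * Return the number of blank characters in text.
 * (Hypothetical routine, used only to illustrate TESTING.)
 */
int
count_blanks(const char *text)
{
    register const char *tp;
    int                 count;

    count = 0;
    for (tp = text; *tp != EOS; tp++) {
        if (*tp == ' ') {
            count++;
        }
    }
    return (count);
}

#ifdef TESTING
/*
 * Built-in test program, compiled only when TESTING is defined.
 */
int
main(void)
{
    if (count_blanks("a b c") != 2) {
        printf("? count_blanks failed\n");
        return (1);
    }
    printf("count_blanks ok\n");
    return (0);
}
#endif
```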

INT_16, INT_32, and INT were used in the same large program to eliminate certain compiler and machine dependencies. For example, on the Motorola 68000, 16-bit integers are computationally faster than the default (32-bit) int, whereas on the Vax-11, 32-bit integers are more efficient. The program's header file thus contained:
    #ifdef vax
    #define INT_16  short
    #define INT_32  int
    #define INT     int
    #endif
    #ifdef M68000
    #define INT_16  short
    #define INT_32  long
    #define INT     short
    #endif
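
The fragment below sketches how such definitions might be used: INT_16 and INT_32 for stored data whose size matters, and INT for counters where only speed matters. The fallback branch for other machines, the RECORD structure, and sum_counts() are assumptions added for illustration and are not part of the original header.

```c
#ifdef vax
#define INT_16  short
#define INT_32  int
#define INT     int
#endif
#ifdef M68000
#define INT_16  short
#define INT_32  long
#define INT     short
#endif
#ifndef INT_16                  /* Assumed fallback: ANSI C     */
#define INT_16  short           /* requires short >= 16 bits    */
#define INT_32  long            /* and long >= 32 bits          */
#define INT     int
#endif

typedef struct record {
    INT_16      r_count;        /* Stored value, 16 bits enough */
    INT_32      r_total;        /* Stored value, needs 32 bits  */
} RECORD;

/*
 * sum_counts(rp, n)
 *
 * Return the sum of the r_count fields of the first n records.
 */
INT_32
sum_counts(RECORD *rp, INT n)
{
    INT         i;              /* Loop counter, fastest type   */
    INT_32      sum;            /* May exceed 16 bits           */

    sum = 0;
    for (i = 0; i < n; i++) {
        sum += rp[i].r_count;
    }
    return (sum);
}
```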

Portability

Portability means that a source file can be compiled and executed on different machines, operating systems, and/or compilers with either no source file changes or, at most, changes to system-specific header files. In writing portable software, the following should be understood:
  1. Most C compilers predefine symbols that may be used to isolate machine-dependent code. The following list may be helpful:
    1. Decus C defines "pdp11", "decus", "rsx" (or "rt11").
    2. Vax-11 C defines "vax", "vms", and "vax11c".
    3. Venix defines "pdp11" and "unix".
    4. A compiler for the Dec-20 defines "TOPS20" and "PDP10"
    When running on Unix, the compiler option -Dxxx may be used to pre-define a symbol without modifying the source code.
  2. Some things are inherently non-portable. For example, a hardware device handler can, in general, not be transported between operating systems.
  3. Different machines have different word sizes. While the language standard guarantees that "long int" is at least as long as "int" and that "short int" is never longer than "int", it does not guarantee any specific word length. Note also that pointers and integers are not necessarily the same size; nor are all pointers the same size.
  4. Word size and constants can interact in unpleasant ways. For example,
        int x;
        x &= 0177770;
    
    clears the low-order 3 bits of an integer on a PDP-11. However, on a Vax, it will also clear the upper half-word. Instead, you should use:
        x &= ~07;
    
    which is portable.
  5. Beware of code that takes advantage of two's complement arithmetic. In particular, optimizations that replace division or multiplication with shifts should be avoided.
  6. Watch out for the PDP-11 signed character, which becomes unsigned on other machines.
  7. Do not presuppose any specific byte ordering within words.
  8. Do not default Boolean tests. Use
        if (func() != FALSE) {
    
    instead of
        if (func()) {
    
    A particularly insidious example of incorrect code is:
        if (strcmp(s1, s2)) {
            /* different */
        }
    
    Always write
        if (strcmp(s1, s2) != 0) {
            /* different */
        }
    
    Decus C provides streq() for this purpose. On other systems, you can easily write a macro:
        #define STREQ(a, b) (strcmp((a), (b)) == 0)
    
    One exception to this is generally made for predicates: functions which have no other purpose than to return TRUE or FALSE, and which are named so that the meaning of a TRUE return is absolutely obvious. For example, a routine should be named "isvalid()", not "checkvalid()".
  9. Be very suspicious of numeric values appearing in the code. Almost all constants would be better expressed as #defined quantities.
  10. Any unsigned type other than unsigned int should be identified by a typedef, as these are highly compiler dependent. As noted above, large programs should have a central header file which encapsulates machine-dependent information.
  11. Become familiar with the standard library and use it for string and character manipulation. Do not reimplement standard routines as the person reading your code must then figure out whether you're doing something special in the reimplemented stuff. Home-brew "standard" routines are a fruitful source of bugs as your routines might be called by other parts of the library. Also, the standard library hides non-portable details that you might not (and generally should not) be aware of.
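
Hints 6 and 7 above can be illustrated with a short sketch, offered as one possible approach rather than a prescribed one. The hypothetical routines put16() and get16() store and fetch a 16-bit value in a fixed, low-byte-first order using shifts and masks, so the result depends neither on the byte ordering of the machine nor on whether char is signed. ANSI-style prototypes are used so that the sketch compiles on current compilers.

```c
#define BYTE_MASK   0377        /* Low-order 8 bits             */
#define BYTE_BITS   8           /* Bits stored per buffer byte  */

/*
 * put16(buf, value)
 *
 * Store the low 16 bits of value in buf[0..1], low byte first.
 */
void
put16(char *buf, unsigned int value)
{
    buf[0] = (char) (value & BYTE_MASK);
    buf[1] = (char) ((value >> BYTE_BITS) & BYTE_MASK);
}

/*
 * get16(buf)
 *
 * Fetch a 16-bit value stored by put16().  The mask removes
 * any sign extension on machines where char is signed.
 */
unsigned int
get16(const char *buf)
{
    return ((unsigned int) (buf[0] & BYTE_MASK)
          | ((unsigned int) (buf[1] & BYTE_MASK) << BYTE_BITS));
}
```

A file written with put16() on one machine can then be read with get16() on another, regardless of how either machine orders the bytes within a word.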

Miscellaneous

This section contains a fairly disorganized list of hints, some of which appear in other sections of this style sheet. They are not in any specific order.
  1. Don't change syntax via macro substitution. It makes the program unintelligible to all who come after.
  2. There is a time and place for embedded assignment statements. In some cases, this is the best way to specify the algorithm. However, it is not your responsibility to second-guess the compiler by packing code as tightly as possible. For example:
        a = b + c;
        d = a + r;
    
    should not be rewritten as:
        d = (a = b + c) + r;
    
    even though the latter may save one instruction.

    Note also that a C compiler may freely modify the order of execution of an expression. Thus,
        a = (b + c) + d;
    
    will not necessarily add b to c, then add the result to d. If the order of evaluation is important (for accuracy or overflow prevention), you must write separate statements with temporary variables:
        temp = b + c;
        a = temp + d;
    
  3. Don't overuse the ternary "(cond) ? a : b" operator. The condition should always be enclosed in parentheses. Nested ternary operators should be avoided if possible.

    The ternary operator evaluates its condition first and then only one of the other two operands, so a guarded division such as the following is safe on a conforming compiler:
        a = (b == 0) ? 0 : d / b;
    
  4. Goto statements should be used sparingly. The main place where they are useful is in breaking out of several levels of switch/for/while nesting. If a goto is needed, the accompanying label should be at the left margin with a comment explaining who jumps here. The continue statement is also a source of bugs.

    But don't be afraid that evil spirits will haunt you if you write the dreaded goto. It is often much clearer to use gotos to escape from an inner loop than to use seemingly random combinations of break, continue, return and default exits from switch statements. To some extent, the lack of a rich set of exit operations is a failure of C, requiring discipline and a commitment to clarity on the part of the programmer.

    Often, the need for gotos and complicated exit conditions is an indication that the inner constructions ought to be redone as a separate function with a success/failure return code.

    Never goto into an else clause or into the body of a for or while loop.
  5. In declarations (#defines, structure definitions, or variable definitions), various components should line up. Thus:
        #define TESTING    1
        #define PRODUCTION 2
    
  6. When the storage class or type of a variable is important, always state it explicitly. In particular, use auto if you are going to take the address of a local variable with '&'. Declare integer parameters as int, rather than letting them default.
  7. Sometimes it is impossible to avoid doing something tricky. (And sometimes you just can't resist the temptation.) At the very least, put enough documentation in the code to warn the poor soul who comes after you.
  8. Try to write code that is clear and safe, rather than something that "seems" easier to compile. Make sure local variables are local (or static) so things won't blow up when you compile with other modules.
  9. Try to keep the flow of control through your program apparent. Where this is governed by separately-compiled tables (such as a finite-state parser), embed comments in the parser table to aid the maintainer.
  10. Use register variables wherever possible. They are especially efficient when used as structure or array pointers. Since offsets within a structure are known at compile time, the compiler can generate extremely efficient code.

    For example, suppose a program is processing a collection of elements which have a value and a set of flag bits. The "simple" solution would be:
        int    value[MAX];
        long   flags[MAX];
        int    array_max;

        int
        lookfor(arg_val, arg_flag)
        int    arg_val;
        long   arg_flag;
        /*
         * Return index to the element with the same
         * value and at least one matching flag bit.
         * Return -1 on failure.
         */
        {
            int    i;

            for (i = 0; i < array_max; i++) {
                if (value[i] == arg_val
                 && (flags[i] & arg_flag) != 0) {
                    return (i);
                }
            }
            return (-1);
        }
    The inner loop of the above requires turning the index "i" into a pointer twice. The above should generally be rewritten as:
        typedef struct data {
            int     d_value;
            long    d_flag;
        } DATA;
        DATA        values[MAX];
        DATA        *top_value;

        DATA *
        lookfor(arg_value, arg_flag)
        int         arg_value;
        long        arg_flag;
        /*
         * Return a pointer to the element with the same
         * value and at least one matching flag bit.
         * Return NULL on failure.
         */
        {
            register DATA   *dp;

            for (dp = &values[0]; dp < top_value; dp++) {
                if (dp->d_value == arg_value
                 && (dp->d_flag & arg_flag) != 0) {
                    return (dp);
                }
            }
            return (NULL);
        }
    Note the use of redundant braces in the above programs.
  11. If a function manipulates a database stored in a separate file, the routines that manipulate (generate and access) this database should be isolated from other routines. The internal structure of the database should also be defined. If the database format is likely to change, a release date or version should be buried in the database and precompiled into the manager software. The program should check the validity of the release date when the package opens the database.
  12. If a file contains the main routine of a program, that should be the first function in the file. On Unix and VMS, where programs may be called as sub-processes, it is important that all programs exit by calling exit(). On Unix, use exit(0) for success and exit(1) for failure. The following construction may be useful:
        #ifdef     vms
        #include   <ssdef.h>
        #endif
        ...
        #ifdef	vms
            exit(SS$_NORMAL);
        #else
            exit(0);
        #endif
    
  13. In the condition portion of an if, for, while, etc., side effects whose effect extends beyond the extent of the guarded statement block should be avoided. For example, consider:
        if ((c = getchar()) != EOF) {
            guarded-statements
        }
        other-statements
    
    It is natural to think of variable "c" being "bound" to a value only within "guarded-statements." Its value should not be presumed upon entrance to "other-statements." Using a variable set or modified inside a condition outside the range of statements guarded by the condition is in general quite distracting.
  14. You should not use || and && with right-hand operands having side-effects. For example,
        if ((fildes = fopen("file.nam", "r")) == NULL
         || readin(fildes) != SUCCESS) {
            bug("something's wrong somewhere.");
        }
    
    A better approach would be
        if ((fildes = fopen("file.nam", "r")) == NULL) {
            perror("file.nam");
            bug("Can't open input file");
        }
        else if (readin(fildes) != SUCCESS)
            bug("couldn't read file");
    
    Whenever conditional sequences contain both || and &&, parentheses should be used for clarity.
  15. Routines should be kept reasonably short. It is important for the maintainer to be able to read and comprehend all of the routine at one glance. In general, a routine processes one or more inputs and generates one or more outputs, where each of the inputs and outputs can be concisely described.

    Signs that a routine is too long, and ought to be split up, are: length greater than 100 lines (two pages), heavy use of localized variables (whose active scope is less than the entire routine), or conditional or loop statements nested more than four levels.

    Even when processing is linear (do first part, do second part, etc.), it is often helpful to the maintainer to break the routine into separate pieces:
        main(argc, argv)
        int         argc;
        char        *argv[];
        {
                setup(argc, argv);
                process();
                finish();
        }
    
    On many operating systems, the setup() and finish() modules can be compiled into overlay structures, leaving more room for in-memory data.
  16. Use of globals should be minimized by judicious use of parameters.
  17. In general, a routine should be designed with a "natural", easily-remembered calling sequence. Routines with more than five arguments are not recommended. Routines with "op-code" arguments, where one argument determines the interpretation, type, and functions of the others, are also not recommended (though they often prove useful as internal routines to a package, they should not be part of a package's documented interface.)
  18. Datatype compatibility should be practiced where possible. This can be facilitated by use of C's typedef facility, by explicit type casting, or by the use of the union datatype.

    A package which returns a pointer to a structure whose format need not be known outside of that package may return a "generic pointer" (char *). The C language specifically guarantees that any pointer may be converted to a char * and back again without harm.
  19. Use #defines to eliminate magic numbers. Use compile-time computation to combine magic numbers into others:
        #define ARRAY_A_SIZE 123
        #define ARRAY_B_SIZE 456
        #define BOTH_SIZE (ARRAY_A_SIZE + ARRAY_B_SIZE)
    
    If you change ARRAY_A_SIZE, the compiler will change BOTH_SIZE without your further intervention.
  20. Some experience is needed to decide what to put in a for statement and what to put in the loop body. In general, you should put what is needed to control the loop in the for, and the process itself in the body. Also, you should be disciplined about using break, continue, and goto to control "unusual" break-out cases. For example, the following code searches a symbol table for an unused element:
        for (sp = &sym[0]; sp < &sym[MAXSYM]; sp++) {
            if ((sp->sy_flag & UNUSED) != 0)
                goto found;
        }
        error_message("No room in symbol table");
        return (FALSE);
    

        found:
            /* ... here to process symbol       */
            return (TRUE);
    In this case, the most natural way to write the code is to use a goto for the "normal" case. While the above could be handled by a flag (or auxiliary test), the solution seems less intuitive:
        for (sp = &sym[0]; sp < &sym[MAXSYM]; sp++) {
            if ((sp->sy_flag & UNUSED) != 0)
                break;
        }
        if (sp >= &sym[MAXSYM]) {
            error_message("No room in symbol table");
            return (FALSE);
        }
        else {
            /* ... here to process symbol		*/
            return (TRUE);
        }
    
  21. The first three register variables, in order of declaration, should be the ones that gain the most from being held in registers.
  22. While C distinguishes between upper- and lower-case in variables and keywords, the programmer should maintain discipline. Global symbols should never require case distinction as they will not work properly on many operating systems. You should also avoid using the same name for different quantities.

    Never require the reader to see differences between "1" (digit), "l" (letter), and "I"; or between "O", "Q", and "0". The C language "long constant" identifier ("1l" is a long integer if the second character is the letter 'L') offers a good example of a practice to avoid (use "1L" instead).
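
Hint 18 above can be sketched as follows. The package and its routine names (ctr_open, ctr_bump, ctr_value, ctr_close) are hypothetical. Callers hold only a generic pointer (char *); the structure layout stays private to the package, which casts the pointer back on entry. The sketch uses ANSI-style prototypes so that it compiles on current compilers, where void * would be the more usual generic pointer.

```c
#include <stdlib.h>

typedef struct counter {        /* Private to the package       */
    long        c_value;        /* Current count                */
} COUNTER;

/*
 * ctr_open()
 *
 * Allocate a counter and set it to zero.  Return a generic
 * pointer to it, or NULL if no memory is available.
 */
char *
ctr_open(void)
{
    register COUNTER *cp;

    cp = (COUNTER *) malloc(sizeof (COUNTER));
    if (cp != NULL) {
        cp->c_value = 0;
    }
    return ((char *) cp);
}

/*
 * ctr_bump(handle)
 *
 * Add one to the counter and return the new value.
 */
long
ctr_bump(char *handle)
{
    register COUNTER *cp;

    cp = (COUNTER *) handle;    /* Back to the private type     */
    return (++cp->c_value);
}

/*
 * ctr_value(handle)
 *
 * Return the current value without changing it.
 */
long
ctr_value(char *handle)
{
    return (((COUNTER *) handle)->c_value);
}

/*
 * ctr_close(handle)
 *
 * Release a counter obtained from ctr_open().
 */
void
ctr_close(char *handle)
{
    free(handle);
}
```

Because the caller never sees COUNTER, the package is free to change the structure without forcing its users to recompile against a new layout.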

Re-examining Braces and Indentation

Several other style sheets recommend the brace syntax:
    if (cond)
    {
        statements;
    }
Another recommendation is similar to the above except that the braces are aligned with the conditionally-executed statements:
    if (cond)
        {
        statements;
        }
This follows the structured programming methodology that "begin" and "end" are at the same indentation level.

The syntax recommended in this manual (with the left brace on the same line as the conditional) seems, in the author's eyes, to bind the left brace closer to the conditional than does the "left brace on a new line" format. Also, left braces don't appear in the same column as right braces and are, hence, more visually distinctive. Finally, the right brace is aligned vertically with the clause introducer (if/while/etc.) with no intruding text. This seems to make things more visible.

When an early draft of this style sheet was reviewed, a colleague, Jeff Lomicka, took exception to the recommendations for indentation.

Here is an alternative indentation style presented with its own rationale. You may choose your style accordingly, but be prepared to understand and defend your choice.

A program is a sequential execution of simpler functions, each of which is broken up into more primitive functions until the primitives become directly executable. A compound statement is the same kind of entity as is a single statement or a function call, and should therefore be treated equally.

The goal of proper indentation is to separate visually the level of detail at which the program is viewed, and to permit the reader easily to associate related elements of the program with each other. For example, we need to associate an "if" with its "else", and to be able to determine what are the contents of the if-clause and else-clause.

The general formatting rules are:
  1. Statements executed sequentially are all at the same indentation level.
  2. If a statement includes other statements, such as the "while" loop body or the "then" and "else" clauses of a conditional, these statements are indented to the next block level.
  3. Braces are part of the statement, and are always displayed at the same indentation level as the code they contain.
  4. This improves the readability of the program, since each compound statement is easily identified as a primitive function, separate from the control structure that controls its execution. In traditional top-down fashion,
        if (conditional)
            statement;
        else
            statement;
    
    is seen when reading a passage of code at one level of detail, and a close look can reveal the details of the statements:
        if (conditional)
            { /* when executed and what is done here	*/
            statements;
            }
        else
            { /* when executed and what is done here	*/
            statements;
            }
    
    A reader is therefore not forced to see the inner block details when trying to understand only the outer block. Note that when reading the code at the outer block's level of detail, only the introducing comment needs to be read to discern the purpose of a compound statement.
These rules are modified according to the same considerations as listed earlier, as seen in the else-if. For example:
    while (conditional)
        { /* when executed and what is done here	*/
        statements;
        }

    for (s1; s2; s3)
        { /* when executed and what is done here */
        statements;
        }

    if (conditional)
        { /* when executed and what is done here */
        statements;
        }
    else if (conditional)
        { /* when executed and what is done here */
        statements;
        }
    else
        { /* when executed and what is done here */
        statements;
        }
Note how these rules affect switch statements:
    switch (c)
        {
    case 1: /* when executed and what is done here	*/
        statements;
        break;
    case 2: /* when executed and what is done here	*/
        statements;
        break;
        }
The purpose of a typographical style is to present the semantic elements of your program in a way that is understandable by your readers.

The C programming language can be very deceptive. Although it has every characteristic of other block structured languages, because of the way it "looks", it must be treated differently. Many programmers started their careers using Algol derivatives, such as Simula: languages with BEGINs and ENDs. In such languages, BEGIN and END must be prominent, as any declaration -- even a function -- could follow any BEGIN. (Later versions of C, though not Decus C, permit variables to be declared following a '{'.) There was thus little difference between single statements and whole programs. In these languages, keywords were always in upper case, library routines would have their first letter capitalized, and user defined variables and functions were in lower case. Everybody did things that way.

While, superficially, C doesn't look very different, it is so in some deeper sense. Those curly braces look like they want to disappear. The blocking appears to want to be done with indentation alone. Since you can't see the braces anyway, it probably doesn't make that much difference where they are, so long as the contents of the blocks are properly indented. There doesn't seem to be any real difference in readability.

Note also that C has at least four separate "flavors" of braces: structure definition delimiters, function delimiters, if/for/while/do delimiters, and switch block delimiters. Since there is only one construct terminator, '}', it becomes more important for the reader to be able to scan up and immediately locate the construct initiator. (In some other languages, such as Bliss, each construct, such as IF, has a unique terminator, such as FI. While this helps prevent runaway syntax errors, it also requires the programmer to remember more information.)

Responding to the difference in language syntax, programmers develop different programming habits. For example, an Algol programmer might think of an IF statement, in general, as:
    IF condition THEN statement ELSE statement;
(with one statement in each clause), while a C programmer might think of an IF statement as:
    IF condition THEN statements ELSE statements END-IF;
where, in C, the THEN is implied by the end of the condition, the braces around the THEN clause are a syntactic nuisance, and the END-IF is represented by the closing brace on the ELSE clause.

We can do the same with loops.
    Algol:  WHILE condition DO statement;
    C:      WHILE condition DO statements END-WHILE;
Here too, the patterns we look for when reading the code are different. The END-IF and END-WHILE are represented, in C, by '}', which requires typographical prominence and must be kept visually distinct from the visually similar '{'. The C style:
    if (condition) {
        statements;
    }
is thus more understandable.

But, of course, programmers are different in their needs, backgrounds, and motivations. Essential, however, is the need to define a style, understand it, use it, and know when to violate it to attain the overriding goal of clarity and communication.

Summary

The following extended -- and artificial -- example shows most of the recommended decisions.
/*
 * A C Style Summary Sheet		Block comment
 * abstracted from one			describes a file
 * by Henry Spencer,			or section of
 * University of Toronto,		code.
 * Department of Zoology
 */

#include <stdio.h>                      Header files
#include "local.h"                      don't nest

typedef int SYTYPE;                     Global definitions
typedef struct symtab {                 structs use typedefs
    struct symtab *s_next;  /* Link entries */
    char        *s_name;    /* Symbol name  */
    SYTYPE      s_type;     /* Symbol type  */
#define TY_UNK  0           /* unknown      */
#define TY_INT  1           /* integer      */
#define TY_STR  2           /* char *       */
    union {
        int     i;          /* Integer      */
        char    *s;         /* String       */
    } s_value;
} SYMBOL;                               Typedefs capitalized
SYMBOL  *sy_head = NULL;                Explicit initialization

/*
 * sylookup(text)
 *
 * Look for a word in the symbol table,
 * return a pointer to the symbol if found.
 * return NULL if not found.
 */

static SYMBOL *                         What is returned
sylookup(text)                          Name at first column
char    *text;              /* Symbol name  */
{
    register SYMBOL *syp;

    for (syp = sy_head; syp != NULL; syp = syp->s_next) {
        if (strcmp(text, syp->s_name) == 0)
            return (syp);
    }
    return (NULL);
}

/*
 * syprint(text)
 *
 * If the argument is in the symbol table, print
 * the associated value, else print "not found".
 */

syprint(text)                           Doesn't return a value
char    *text;              /* Symbol name  */
{
    register SYMBOL *syp;

    printf("%s: ", text);               The following shows an
                                        acceptable embedded
                                        assignment, but don't
                                        default the NULL test.
                                        Use braces even for a
                                        single statement.
    if ((syp = sylookup(text)) == NULL) {
        printf("not found");
    }
    else {                              Braces here, too.
        switch (syp->s_type) {
        case TY_UNK:
            printf("unknown");
            break;
                                        Blank line after break
        case TY_INT:
            printf("%d", syp->s_value.i);
            break;

        case TY_STR:
            printf("%s", syp->s_value.s);
            break;

        default:                        Always have a default
                                        Message before abort
            printf("? unexpected type %d\n", syp->s_type);
            abort();
        }
    }
    printf("\n");
}