A C Style Sheet
Martin Minow
Digital Equipment Corp.
146 Main St. MLO 3-3/U8
Maynard MA 01754
June 15, 1983
Introduction
This document presents a common set of coding standards, as well as a
series of hints to aid in producing maintainable and transportable
C software. It is abstracted from a number of sources:
- Brian Kernighan and Dennis Ritchie.
The C Programming Language.
- Indian Hill C Style and Coding Standards (Bell Telephone Labs
unpublished Technical Memorandum 78-5221, March 29, 1978)
with annotations by Henry Spencer (utzoo!henry), University of Toronto.
- Joe Kalish. Ingres Coding Conventions for C/Unix Programming
(INGVAX.kalish @ Berkeley)
- Dan Franklin. BBN Programming Standards for C. (Dan @ BBN-UNIX)
- Andrew Shore, et al. Network Graphics C Style Sheet.
Stanford University, (from CSL.LANTZ @ SU-SCORE).
- Ray Van Tassle. C Language Programming Standards for Motorola
DCS. Motorola, Inc. 1301 E. Algonquin Road, Room 4135, Schaumburg, IL
60196.
It is also based heavily on my own experience in developing a number
of large, transportable applications in C, using Decus C, Vax-11 C,
and several varieties of Unix C. (Unix is a trademark of Bell
Laboratories.)
As could be expected, the suggested style does not totally
agree with any of the referenced documents.
Motivation
The reason you should maintain a consistent coding style is
that good programs will evolve. When writing a new program, you will often
take routines and data structures from old -- working -- software.
This is much easier to do if the old software is understandable.
Unreadable software is unusable, no matter how
well it works.
The single most important thing about a typographical style is
sticking to it consistently. There are many good styles, but the differences
among them are totally drowned out by the difficulty of reading a program
with pieces in different styles. If you modify a program, stick to its
original style. If you must change things -- you really cannot live
with the old style --
change the whole program, or at least all the parts logically related to what
you are changing.
In recommending against automatic beautifiers (prettyprinters),
the Indian Hill standards committee noted:
"First, the main person who benefits from good program style is the programmer
himself. This is especially true in the early design of handwritten
algorithms or pseudo-code. Automatic beautifiers ... are not available
when the need for attention to white space and indentation is greatest.
It is also felt that programmers can do a better job of making clear the
complete visual layout of a function or a file with the normal attention
to detail of a careful programmer."
These comments are relevant to any rigid application of a programming
or typographical style. There will always be cases where the
automatic rule is unsatisfactory and you, as the person responsible,
must be able to understand that your primary goal is to achieve
clarity and understandability.
File Organization
A file consists of several sections separated by blank lines or
a form-feed (<FF>). If you use a form-feed, it should be the only
character on the line.
In general, source files should not be much longer than 1000 lines.
Larger files are often difficult to edit and -- if too large --
cannot be processed by the diff (differences) program. 1000 lines
translates to about 12-15 pages of text. Source lines should not
be longer than 78 characters.
A source file should be organized as follows:
-
A prologue comment gives the
file name and a few sentences telling what is in the file.
This is followed, if necessary, by copyright and license "boilerplate."
The prologue tells the reader the purpose of the text of the file, whether
it contains
functions, data definitions, tables, or support code. It should not generally
be a list of function names.
In some programs, C source files may be created by program generators.
For example, a dictionary may be compiled into a keyword vector (one file)
and a definition vector (one file). In this case, the program generating
the files should write the date of generation (as a comment) into the
C file source. If the generated program file may require editing, consider
including the source of the generated information, either as comments or
as text bracketed by #ifdefs, as an aid to the debugger.
-
Usage and operating instructions
follow next. Decus C programs
should use the format accepted by the "getrno" utility program. This allows
the program source file to contain the source of its documentation
and lessens the burden of keeping the documentation in synchronization
with the program itself.
If a program is composed from separate modules, one of the modules
(generally the one with the main() function) should contain
instructions on how to build the program.
Decus C programs should
use the build utility to maintain compilation instructions:
/*)BUILD
$(PROGRAM) = program
$(INCLUDE) = header.h
$(FILES) = { file1 file2 }
*/
Unix programs should include the program's makefile within a
comment with some defined format, allowing extraction by a
simple program or shell script:
/*)MAKEFILE
program: file1.o file2.o
	cc file1.o file2.o -o program
file1.o file2.o: header.h
*/
This centralizes everything
relevant to maintenance of a program in one place.
-
Header files
are specified using the #include preprocessor directive.
The suggested header file order is
#include <stdio.h>
#include <other system headers>
#include "user header files"
Note that header files should be given the ".h" filetype, while
all C source files should be given the ".c" filetype.
In allocating header files for large packages of programs,
you should avoid absolute pathnames
for header files. Use the <name> construction for system files,
relative directories for Unix and VMS systems, and externally defined
logical devices for RSTS/E and RT11. If the sub-system is reasonably
small, put all source files in one directory.
-
Typedefs, #defines and structure definitions
that apply to the file as a whole are next. Structures should
be defined as
typedef struct NAME {
...
} NAME;
Definitions should
be grouped functionally.
It is recommended that the struct and typedef name be the
same to simplify forward references (in linked lists):
typedef struct LISP_LIST {
    struct LISP_LIST *car;
    struct LISP_LIST *cdr;
} LISP_LIST;
Other references to this structure may use the typedef name:
register LISP_LIST *head;
-
Global data definitions are next. The suggested order is
-
Global variables defined in this file.
-
Static (file global) variables.
-
External variables and functions that are used throughout the file.
If a program is large enough to require multiple source files,
all global data should be defined by a "data.c" file which only
contains data definitions, while all other source files contain
extern references. Alternatively, the file containing the main
program, documentation, and build instructions should contain
data allocation.
A very large program would contain global references in an "extern.h"
#include file in addition to a "data.c" definition file.
-
Functions come last. If the file is a main program, the main()
function is first.
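The split between a "data.c" definition file and an "extern.h"
reference file, described under global data above, might look like
the following sketch. The three parts are shown in one listing so
that it compiles on its own. The names debug_level, input_name,
NAME_MAX_LEN, and set_input are hypothetical, and ANSI prototypes are
used (an assumption beyond the compilers discussed here) so that
current compilers accept the code.

```c
#include <string.h>

#define FALSE   0
#define TRUE    1

/* ---- extern.h: references only, no storage is allocated here ---- */
#define NAME_MAX_LEN    64              /* Longest input file name */

extern int  debug_level;                /* 0 = quiet, larger = chattier */
extern char input_name[NAME_MAX_LEN];   /* Name of the current input */

/* ---- data.c: the single place where storage is allocated -------- */
int     debug_level = 0;
char    input_name[NAME_MAX_LEN] = "stdin";

/* ---- any other source file: sees only the extern references ----- */
/*
 * Remember a new input name.  Returns FALSE if it does not fit.
 */
int
set_input(const char *name)
{
    if (strlen(name) >= NAME_MAX_LEN) {
        return (FALSE);
    }
    strcpy(input_name, name);
    return (TRUE);
}
```

Because every file sees the same extern references, a change to the
type of a global is made in exactly two places.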
Header Files
Header files are included in other files during compilation. Some,
such as "stdio.h" are defined system-wide, and must be included by
any C programs that use the standard I/O library. Others are used
within a single program or group of programs.
Header files should be functionally organized. Declarations for
separate sub-systems should be in separate header files.
Header files should not be nested. Nesting is not permitted by Decus
C, and some data objects, such as typedefs and initialized data definitions,
must not be seen twice by the compiler in one compilation.
Header files should contain all #define, typedef, and extern
declarations necessary for a given program and shared among two or more
of its files.
Header files should not define (i.e., allocate) variables or contain
code. Doing so is frequently a symptom
of poor partitioning of code between files. One header organization
that has worked well for several medium-sized projects is:
-
Definitions, including common data structures, are
placed in one header file.
Things defined in this file are common to the entire package.
-
All external (global) data is declared (as extern) in a second header file.
-
A separate data.c file contains all global data
allocations.
-
Definitions required by bounded sub-systems are in separate
header and data allocation files.
Declarations and Definitions
The use of the #define preprocessor command is especially recommended.
In general, numerical constants and array boundaries should never be
coded directly. They should be assigned a meaningful name and assigned
their permanent value by the #define. This will make it much easier
to administer large and evolving programs as the constant value can
be changed uniformly by changing the #define and recompiling.
The enumeration data type (not in Decus C) offers an improved manner of
managing constant definitions as additional type checking is then available.
In general, all constant values which are not strictly numeric should
be specified by #defines. Exceptions to this rule are the values 0 and 1
when used as the lower boundary of an array; relative indices (if p is
a pointer to an array element, p[1] is the next element, while p[-1] is
the previous element); and strictly numeric quantities. #defines may
even be useful in the latter situation as well:
#define SPEED_LIMIT 55
Note that defined quantities should generally be in upper-case.
Directly-coded numerical constants must have a comment explaining the
derivation of the value.
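As a minimal sketch of the advice above, the following fragment names
an array boundary with a #define and a set of related constants with
an enumeration. The names MAX_FIELDS, FIELD_KIND, and reset_fields
are hypothetical, and the ANSI function syntax is an assumption made
so that the fragment compiles on current compilers.

```c
#define MAX_FIELDS  8                   /* Fields allowed per record */

typedef enum FIELD_KIND {               /* Not available in Decus C */
    FK_TEXT,
    FK_NUMBER,
    FK_DATE
} FIELD_KIND;

static FIELD_KIND field_kind[MAX_FIELDS];

/*
 * Mark every field as text and return the number of fields reset.
 * The loop bound is the #define'd name, never a bare 8.
 */
int
reset_fields(void)
{
    int     i;

    for (i = 0; i < MAX_FIELDS; i++) {
        field_kind[i] = FK_TEXT;
    }
    return (i);
}
```

Changing MAX_FIELDS and recompiling resizes the array and the loop
together.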
It is generally poor practice to use #defines to modify C syntax.
For example, the following definitions are not recommended:
#define reg register
#define begin {
#define end }
In certain circumstances, however, this may be necessary for proper compilation
or fastest possible execution:
#define DIV_2 >> 1
#define DIV_4 >> 2
Replacing divides by right-shifts cannot be done by the compiler as it
would yield incorrect results if the dividend were negative. If the
programmer knows that the dividend cannot be negative (and documents
that fact), this optimization becomes possible.
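The difference between dividing and shifting can be seen in a small
sketch. The function names are hypothetical and the ANSI syntax is an
assumption; DIV_2 is the definition shown above.

```c
#define DIV_2   >> 1                    /* Only valid for values >= 0 */

/*
 * Halve by true division.  C truncates toward zero (guaranteed since
 * C99; older compilers could round either way for negative values).
 */
int
half_by_division(int value)
{
    return (value / 2);
}

/*
 * Halve by shifting.  The caller must guarantee value >= 0, since
 * shifting a negative value right is implementation-defined.
 */
int
half_by_shift(int value)
{
    return (value DIV_2);
}
```

For non-negative arguments the two functions agree. On most
two's-complement machines half_by_shift(-7) would give -4 where
half_by_division(-7) gives -3, which is why the restriction must be
documented wherever DIV_2 is used.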
Also, the programmer may need to conceal non-portable quirks by means
of centrally-placed definitions:
#ifdef decus
#define UNSIGNED_LONG long
#else
#define UNSIGNED_LONG unsigned long
#endif
As will be noted under portability, most C compilers predefine a small
number of preprocessor symbols that may be used to conditionally compile
machine or operating-system specific code. This allows one program to run on
multiple systems without hand-editing.
It is highly recommended that you use the following definitions freely
and consistently:
#define EOS '\0'
#define FALSE 0
#define TRUE 1
NULL is defined by <stdio.h> and should not be explicitly
specified by your program as some compilers require NULL to
be type-cast.
EOS marks the end of a C string, while
FALSE and TRUE are used for Boolean testing. You will probably get
in the habit of only referring to FALSE in your if statements:
if (test != FALSE) {
This generates the best possible code. TRUE is usually used to return
a "success" value from a function. Don't use both TRUE and YES in
the same program to mean the same thing.
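A short sketch of these definitions in use follows. The function
is_blank is hypothetical and is written with an ANSI prototype (an
assumption, so that it compiles today).

```c
#define EOS     '\0'
#define FALSE   0
#define TRUE    1

/*
 * Return TRUE if the string holds only blanks and tabs (or nothing).
 */
int
is_blank(const char *string)
{
    while (*string != EOS) {
        if (*string != ' ' && *string != '\t') {
            return (FALSE);
        }
        string++;
    }
    return (TRUE);
}
```

A caller would test the result against FALSE only, as in
"if (is_blank(line) != FALSE)".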
If a structure contains a data element that can take on one of several
values, it may be useful
to put the #define's for that element
within the structure definition. For example, here is a fragment
(slightly reorganized to fit on the documentation page)
from a Vax-11 C header file that defines a VMS system structure:
/*
 * XABSUM -- Summary Extended Attribute Block
 */
struct XABSUM {
    char xab$b_cod;
#define XAB$C_SUM 22 /* type code */
    char xab$b_bln;
#define XAB$C_SUMLEN 0x0C /* block length */
#define XAB$K_SUMLEN 0x0C
    ....
Note that the information that would be placed in each field
is #define'd following that field. The definitions and structure
fields follow standard VMS syntax conventions.
The empty initializer "{}" should never be used. Initialized structures
should be fully delimited with braces. Constants used to initialize
longs should be explicitly long.
In any file which is part of a larger context, all local information
should be identified by use of the static keyword. Variables,
in particular, should not be accessible outside the file unless there
is an overriding need for global access. If these variables are
shared by only one or two other files, you should name these files
in a comment.
The readonly specifier, available in Vax-11 C, should be used
to signal data that does not change during execution. On other
compilers, it may be "hidden" by
#define readonly
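The following sketch shows file-local data in a module. All names are
hypothetical. As an assumption, readonly is mapped to the ANSI const
qualifier here instead of being defined as nothing, since const is
the nearest equivalent on current compilers.

```c
#define readonly    const               /* Assumed nearest ANSI equivalent */
#define N_LEVELS    3                   /* Number of message levels */

static readonly char *level_name[N_LEVELS] = {  /* Never modified */
    "info", "warning", "error"
};
static int  message_count = 0;          /* Private to this file */

/*
 * Return the name of a message level and count the request.
 */
const char *
log_level(int level)
{
    message_count++;
    if (level < 0 || level >= N_LEVELS) {
        return ("unknown");
    }
    return (level_name[level]);
}

/*
 * The only access other files have to the private counter.
 */
int
messages_logged(void)
{
    return (message_count);
}
```

Other files can read the count through messages_logged() but cannot
change it, because message_count itself is static.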
Comments
The importance of comments cannot be overemphasized. In any professional
environment, many people will have to read your code, trying to understand
what you have done. Sometimes, they wish to modify it to do other
things; sometimes they need to modify it to do what you originally
intended to do. Consider the Golden Rule:
"if you make life easy for others, maybe someone
will be nice to you someday."
The purpose of a comment is to describe your intention. If
properly written, the code itself will adequately tell what you actually did.
There are two general types of comments:
Block comments are narratives describing the purpose of a portion of
the program text. They are written in the following format:
/*
 * The comment text is written
 * here in complete sentences.
 */
The comment text should be at the same level of indentation as the
source code it discusses. You should never write a comment that
could be interpreted as a C statement (unless the comment is blocking out
temporary debugging code). A block comment should always be included
at the beginning of a major segment of the program.
Very short comments may appear on the same line as the code they
describe. They should be tabbed over far enough to separate them from
the statements. If more than one short comment appears in a block of
code, they should all start at the same tab position:
while (!finish()) {             /* Main sequence:                    */
    inquire();                  /* Get user request                  */
    process();                  /* And carry it out                  */
}                               /* As long as possible.              */
Note that all single-line comments start at some specific column and
end with the closing "*/" tabbed
to column 72 on the line. Closing the comment at the right-hand
margin makes it more readable than if the "*/" were next to the
comment text itself:
while (!finish()) {             /* Main sequence: */
    inquire();                  /* Get user request */
    process();                  /* And carry it out */
}                               /* As long as possible. */
In general, you should use one-line comments to document variable definitions
and block comments to describe the computation processes.
The above comments should actually have been written as a block comment:
/*
 * Main sequence: get and process
 * all user requests.
 */
while (!finish()) {
    inquire();
    process();
}
Function Declarations
Each function should be defined beginning in column 1 (to simplify
searches for the function's definition). If the function returns a
value or is static, the type or storage class should be alone on the
preceding line.
Each formal parameter should be declared, with a comment, on a separate
line. If the function uses any external variables or functions (that
do not return integers), these should be declared with other local
variables. This is particularly beneficial to someone reading code
written by another.
The format for the function declaration may be illustrated as follows:
char *
savest(string)
char *string; /* String to save */
/*
 * Savest saves its argument string in free storage,
 * returning a pointer to the allocated datum.
 * It returns NULL if the allocation fails.
 */
{
    register char *ptr;
    extern char *malloc();
    extern char *strcpy();

    ptr = malloc(strlen(string) + sizeof (char));
    if (ptr != NULL)
        strcpy(ptr, string);
    return (ptr);
}
Note that, in the example above, the function description followed
the formal definition itself. Another acceptable style precedes
the function by a block comment.
/*
 * match(string, pattern)
 *
 * If the pattern is an initial substring of string,
 * return a pointer to the first character of the
 * string beyond those matching the pattern.
 * Otherwise, return NULL. Thus:
 *     match("abcde", "abc")
 * returns a pointer to the 'd' in the first string;
 *     match("abcde", "bc")
 * returns NULL.
 */
char *
match(string, pattern)
register char *string; /* Source */
register char *pattern; /* for match */
{
    while (*string == *pattern && *string != EOS) {
        pattern++;
        string++;
    }
    return ((*pattern == EOS) ? string : NULL);
}
In this format, the block comment is separated from the
function definition by a blank line.
The above program fragments illustrate several transportability
and maintainability issues:
-
Although the value returned by
strcpy() is not used by the savest() function,
it is declared so the compiler knows how to allocate and deallocate
space for the value it does return. This is important for compilers
running on stack machines with 16-bit integers and 32-bit character
pointers.
-
On some machines, sizeof (char) is NOT 1. If you want to allocate
space to hold the EOS at the end of a string, you should use the
transportable format, not the absolute value.
-
In the match() function, note that the end of string test is
written explicitly. You should not assume that strings are
terminated by a zero-valued byte.
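For readers using a current compiler, the two functions above might
be written as in the following sketch. It assumes ANSI C, where
<stdlib.h> and <string.h> supply the declarations that savest()
declares by hand, and where sizeof (char) is 1 by definition. The
const qualifiers are an addition of this sketch.

```c
#include <stdlib.h>
#include <string.h>

#define EOS     '\0'

/*
 * Save a copy of string in free storage.
 * Returns NULL if the allocation fails.
 */
char *
savest(const char *string)
{
    char    *ptr;

    ptr = malloc(strlen(string) + sizeof (char));
    if (ptr != NULL) {
        strcpy(ptr, string);
    }
    return (ptr);
}

/*
 * If pattern is an initial substring of string, return a pointer
 * just past the matched text.  Otherwise, return NULL.
 */
const char *
match(const char *string, const char *pattern)
{
    while (*string == *pattern && *string != EOS) {
        pattern++;
        string++;
    }
    return ((*pattern == EOS) ? string : NULL);
}
```

The caller owns the storage returned by savest() and should release
it with free() when it is no longer needed.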
Structure and Variable Declarations
Structures are one of the most important features of C. They enhance
the logical organization of your code, offer consistent addressing, and
will generally increase the efficiency and performance of your programs
by a significant amount.
In general, if there are two or more "things" in your program that are
addressed by the same index, they should be defined by a common
structure. This gives you great freedom to allow the program to
evolve (by adding another "thing" to the structure, for example), or to
modify storage allocation (from pre-compiled to dynamic allocation).
For example, if your program processes symbols -- where each symbol
has a name, type, flags, and an associated value, you shouldn't
define separate vectors:
char *name[NSYMB];
int type[NSYMB];
int flags[NSYMB];
int value[NSYMB];
but, rather,
typedef struct SYMBOL {
    char *sy_name;
    int sy_type;
    int sy_flags;
    int sy_value;
} SYMBOL;
SYMBOL symboltable[NSYMB];
All structures should be defined by typedefs. Note, also, the
use of a prefix ("sy_") to identify members of the SYMBOL structure.
There is one important exception to the rule that conforming data
areas are declared by a single data structure: the case where
some data is read-only and some read-write. In this case, you
may wish to allocate separate areas to permit use of the readonly
specifier or to allocate read-only data in an overlay segment.
The local variables used by a function should have names that do not
duplicate global names.
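One benefit of the single SYMBOL structure is that a search needs
only one pointer. The following is a minimal sketch. The table
contents, the small NSYMB, and the lookup function are hypothetical,
and ANSI syntax is assumed.

```c
#include <stddef.h>
#include <string.h>

#define NSYMB   4                       /* Table size for this sketch */

typedef struct SYMBOL {
    const char  *sy_name;
    int         sy_type;
    int         sy_flags;
    int         sy_value;
} SYMBOL;

static SYMBOL symboltable[NSYMB] = {
    { "alpha", 1, 0, 10 },
    { "beta",  1, 0, 20 },
    { "gamma", 2, 0, 30 },
    { NULL,    0, 0, 0  }               /* Unused slot */
};

/*
 * Return the entry called name, or NULL if there is none.
 */
SYMBOL *
lookup(const char *name)
{
    SYMBOL  *sy;

    for (sy = &symboltable[0]; sy < &symboltable[NSYMB]; sy++) {
        if (sy->sy_name != NULL && strcmp(sy->sy_name, name) == 0) {
            return (sy);
        }
    }
    return (NULL);
}
```

With four separate vectors, the same search would have to carry an
index and apply it to each vector in turn.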
Compound Statements
Compound statements carry out the calculations required by the C program.
They are lists of statements enclosed in braces. They should be tabbed
over one more than the tab position of the compound statement
introducer itself. (Four space indentation is recommended, although
it is certainly more convenient to use the hardware-provided eight
position tab stops. If you change your mind in the middle of a program,
you should have the courtesy to re-edit the rest of the file so it is
consistent.)
The opening left brace should be at the end of the line beginning the compound
statement and the closing right brace should be alone on a line, tabbed under
the beginning of the compound statement. Note that the left brace
beginning a function body is the only occurrence of a left brace which
is alone on a line. This is the "Indian Hill" style, also used in
Kernighan and Ritchie's book. (Other style sheets recommend
placing the opening left brace alone on the line following the statement
opener. Choose one style; be consistent. This subject will be discussed
further in a subsequent section.)
The right brace before the while of a do-while statement is
the only place where a closing right brace is not alone on a line:
do {
    stuff();
} while (cond != FALSE);
It is good practice always to provide braces, even when they
are not required by the language:
if (abc < def) {
    lesser();
}
else if (abc == def) {
    equal();
}
else {
    greater();
}
This prevents surprises when you add debugging statements.
Never, never, write nested conditionals or loops without braces:
for (dp = &values[0]; dp < top_value; dp++)
    if (dp->d_value == arg_value
     && (dp->d_flag & arg_flag) != 0)
        return (dp);
return (NULL);
While the above is correct C, it is
unmaintainable. It should always be written as
for (dp = &values[0]; dp < top_value; dp++) {
    if (dp->d_value == arg_value
     && (dp->d_flag & arg_flag) != 0) {
        return (dp);
    }
}
return (NULL);
If the span of a block is large (more than about 40 lines)
or there are several nested blocks,
closing braces should be commented to indicate what part of
the process they delimit:
for (sy = sytable; sy != NULL; sy = sy->sy_link) {
    if (sy->sy_flag == DEFINED) {
        ...
    } /* if defined */
    else {
        ...
    } /* if undefined */
} /* for all symbols */
Each line should contain one and only one statement. The only
exception to this is the else if construction as shown above.
In a sequence of "if ... else ..." statements, there should
always be a terminating else even if it is merely a dummy statement.
Note especially that an if statement and its associated conditionally
executed statement appear on separate lines.
If a for, if, or while statement has a dummy body,
the ';' must go on the next line:
/*
 * Locate end of string
 */
for (charp = string; *charp != EOS; charp++)
    ;
There are few more insidious bugs than an extra ';' tacked on the end of
a for or if statement. Everything will compile normally and the code
might even work for some cases, but -- because of the invisibility
of the ';' -- the bug will be very difficult to track down.
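The dummy-body rule can be seen in a complete function. This is a
sketch only. The name string_length is hypothetical and the ANSI
prototype is an assumption.

```c
#define EOS     '\0'

/*
 * Count the characters in string.  The loop does all its work in the
 * for clauses, so the body is a lone ';' on a line of its own.
 */
int
string_length(const char *string)
{
    const char  *charp;

    for (charp = string; *charp != EOS; charp++)
        ;
    return ((int) (charp - string));
}
```

Had the ';' been left at the end of the for line by accident, with a
real statement intended as the body, that statement would run once
after the loop instead of once per character.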
There should always be a blank between reserved words and their opening
parentheses, e.g., "if (condition)" rather than "if(condition)".
There should also be parentheses around the objects of sizeof and
return.
If the conditional test in an if statement is so complex that it requires
more than one line, break it at an && or ||, and line up the
expressions so the tests line up as well:
if (a == b
 && b == c) {
    printf("a == c");
}
If the conditional test extends over more than one line, always enclose the
conditionally-executed statement in braces.
The above is a special case of a more general recommendation that
you break statements across lines at meaningful boundaries, and
attempt to align the components to make the meaning clear. For
example, the following sequence computes the length of an RMS
logical record.
r->lrecl = r->rab.rab$w_rsz             /* Record size from RAB */
         + ((hbyte != EOS) ? 1 : 0)     /* If header byte */
         + ((tbyte != EOS) ? 1 : 0)     /* If trailer byte */
         - offset                       /* For Fortran hacking */
         + hnewline                     /* For VFC hacking */
         + tnewline;                    /* For VFC hacking */
Switch statements offer a good alternative to multiple if...else
sequences. Each case appears by itself on a line, tabbed under
the switch itself. The break that terminates a case
should be followed by a blank line.
The "fall through" feature of C's switch statement should rarely,
if ever, be used. If it is needed, it must be commented for further
reference:
eow = 0;
while ((c = getchar()) != EOF) {
    switch (c) {
    case '\n':                  /* Newline, */
        lines++;                /* count lines */
        /*
         * Fall through to "end of word" case
         */
    case '\t':                  /* Tabs, newlines, and blanks */
    case ' ':                   /* Form words. */
        words += eow;
        eow = 0;                /* Don't count multiple runs */
        letters++;              /* But count all "whitespace" */
        break;

    default:                    /* All the rest form a word */
        letters++;
        eow = 1;
        break;
    }
}
words += eow;                   /* Fix count of last word */
The above implements the central algorithm of a "word count" routine
where a newline, blank, or tab terminates a word, but
multiple blanks do not increase the number of words.
Note that the break following the last case is redundant, but should
be provided to make the programmer's intent clear. In general, the
default case should be last.
All switch statements should have a default case, which may
merely be a "fatal error" exit.
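A default case that serves only as a "fatal error" exit might look
like the following sketch. The SY_TYPE enumeration and the type_name
function are hypothetical, and ANSI syntax is assumed.

```c
#include <stdio.h>
#include <stdlib.h>

typedef enum SY_TYPE {                  /* Hypothetical symbol types */
    SY_INT,
    SY_CHAR,
    SY_POINTER
} SY_TYPE;

/*
 * Return a printable name for a symbol type.
 */
const char *
type_name(SY_TYPE type)
{
    const char  *name;

    switch (type) {
    case SY_INT:
        name = "int";
        break;

    case SY_CHAR:
        name = "char";
        break;

    case SY_POINTER:
        name = "pointer";
        break;

    default:                            /* "Can't happen" */
        fprintf(stderr, "type_name: bad type %d\n", (int) type);
        abort();
    }
    return (name);
}
```

If a new type is later added to the enumeration but not to the
switch, the first call with that type stops the program with a
message instead of returning garbage.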
Expressions and Operators
C is an expression language. This means -- in essence -- that the
assignment statement "a = b" itself has a value which can be embedded
in a larger context. This should be used very sparingly. For
example,
while ((value = *pointer++) != 0) {
    process(value);
}
shows a standard C idiom which all programmers should recognize. It
is essential, however, that you do not carry this to extremes by embedding
multiple assignments (or other side-effects) in a statement.
Blanks should surround all binary operators except those which compose
primaries, (".", "->"). No blanks should separate a unary operator
(such as '-', '&', '[]', '!')
from its operand. Sizeof and return are exceptions to this rule.
Some judgement is called for here as
there are a few situations when complex expressions become clearer
when inner constructions don't have spaces. For example,
x = (a*b) + (c*d);
Blanks should appear after commas in argument
lists to help separate the arguments visually. On the other hand,
macros with arguments and function calls should not have a blank
between the name and the left parenthesis.
Side effects within expressions should be used sparingly. No more
than one operator with a side-effect ("=", "op=", "++", "--") should
appear within an expression. It is very easy to misunderstand the
rules for C compilation and get side-effects compiled in the wrong
order. For example,
func(*ptr++, *ptr++);
*ptr = *ptr++;
*ptr++ = *ptr;
are not necessarily going to do what you expect, and are going
to do different things on different implementations of C.
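A call such as the first one above can be made predictable by giving
each side-effect a statement of its own. The sketch below assumes the
intent was to pass two successive values in order. The function names
are hypothetical and ANSI syntax is assumed.

```c
/*
 * Stand-in for the function being called.
 */
static int
difference(int first, int second)
{
    return (first - second);
}

/*
 * Consume two values through *cursor and return first - second.
 * Each increment is a statement of its own, so the order is fixed.
 */
int
next_difference(const int **cursor)
{
    int     first;
    int     second;

    first = **cursor;
    (*cursor)++;
    second = **cursor;
    (*cursor)++;
    return (difference(first, second));
}
```

The extra temporaries cost nothing on most compilers and remove any
dependence on the order in which arguments are evaluated.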
The old versions of the assignment operators ("=+", etc.) must not be
used. Always surround assignment operators by spaces. "x=*foo" is interpreted
as "x = x * foo" (even if foo is a pointer) by some compilers.
The comma operator should be used exceedingly sparingly. One of the
few appropriate places is in a for statement:
for (sum = 0, ptr = &array[0]; ptr < &array[A_MAX];) {
    sum += *ptr++;
}
Since C has some unexpected operator precedence rules, all expressions
involving mixed operators should be fully parenthesized. This is
especially true when comparison or
mask operators (&, |, and ^) are combined
with shifts. Always write
if (value > (1 << 12))
    ...
with parentheses around the shift operation.
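Mask operators are the classic trap, because "!=" and "==" bind
more tightly than "&". The sketch below spells out both readings of
a flag test. READY_BIT and the function names are hypothetical, and
ANSI syntax is assumed.

```c
#define READY_BIT   0x04                /* Hypothetical status bit */

/*
 * Intended test: is READY_BIT set in status?
 */
int
is_ready(int status)
{
    return ((status & READY_BIT) != 0);
}

/*
 * What "status & READY_BIT != 0" really means: the comparison is
 * done first, and the mask is applied to its 0-or-1 result.
 */
int
is_ready_misparsed(int status)
{
    return (status & (READY_BIT != 0));
}
```

With status equal to 0x04 the first function reports ready and the
second does not. With status equal to 0x03 the results are reversed.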
Naming Things
When a program must be used as part of a larger context, whether it
be a subroutine library, or an independent program within an
application package, the programmer's creativity in defining
mnemonic names must be subservient to the needs of the group as a whole.
Naming Rules
-
Application program names should follow a standard format,
such as:
The 1st 2 or 3 characters = sub-system code
The rest = unique meaningful identifier
-
Names (variables, structs, unions,
and procedures) are lower-case,
unique in the first eight characters. (Some C systems require names to
be unique in the first six characters.)
External names must be unique in the first six characters.
If the first letter in an external name is an underscore '_',
it indicates a Unix system-internal name
(such as a routine within a file-management I/O system).
Application programs should not use this
as it implies system-level programming. Trailing underscores
should also be avoided.
(Note that any use of underscore
may conflict with variables defined by your operating
system. For example, on RSX-11M, the operating system file management
routines use -- in effect -- a leading underscore.)
Longer names and underscores should be used freely to improve readability
and understandability.
Upper-case and lower-case should not be mixed in a name.
Names more than four characters long should differ by at least two characters:
int systst, sysstst; /* are easily confused */
Constants (things named by a #define) should be in all upper-case.
All names must be unique, ignoring case. In other words, even though C
knows that "this" is different from "THIS", do not do it.
Although this guide recommends keeping variables in lowercase only,
and #define'd constants in uppercase only, there are a few times
when breaking this rule results in greater clarity.
It is better to be compatible with an externally-defined standard,
even if it is in mixed case. For example, if the hardware documentation
for a chip refers to "TxRdy", your device driver should use the same
format to refer to this entity.
Choosing names
Names should be meaningful. Abbreviations should also be meaningful, and
should be chosen by some uniform, rational scheme.
-
Each variable and name must have an invariant usage and
meaning throughout the program.
-
Names should not be re-defined in inner blocks. Nor should global
names be redeclared within a function.
-
Standard meaningful names for local (temporary) variables include:
i, j, k         indexes
c, ch           character
n, m            counters
p, q, a, b      pointers
s               strings
Never use the letter 'l' as a variable or in any context where it
could be confused with the digit '1'.
Names for structs, unions, and defines
Consider using typedef for structs and unions. This helps both
the reader of the code and type checking programs such as LINT.
In a large system, global names should be composed of two parts,
a one or two letter prefix, relating to the sub-system, and
a longer name defining the item itself.
If several symbols are needed to refer to an entity, they should
have some consistent relation. For example:
#define MAX_ITEM 123
typedef struct ITEM {
    struct ITEM *it_next;
    int it_value;
} ITEM;
ITEM item_store[MAX_ITEM];
ITEM *itfirst = &item_store[0];
#define ITEM_LAST (&item_store[MAX_ITEM])
Note that, for a member of a struct, the prefix should be related to the
body of the struct name.
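Related names pay off when the pieces are used together. The
following sketch walks the item store with the names defined above.
MAX_ITEM is reduced to 4 to keep the example small, item_chain is a
hypothetical function, and ANSI syntax is assumed.

```c
#include <stddef.h>

#define MAX_ITEM    4                   /* Small, for this sketch */

typedef struct ITEM {
    struct ITEM *it_next;
    int         it_value;
} ITEM;

ITEM    item_store[MAX_ITEM];
ITEM    *itfirst = &item_store[0];
#define ITEM_LAST   (&item_store[MAX_ITEM])

/*
 * Chain the items together in storage order, give each one the value
 * of its index, and return the sum found by walking the chain.
 */
int
item_chain(void)
{
    ITEM    *it;
    int     sum;

    for (it = itfirst; it < ITEM_LAST; it++) {
        it->it_value = (int) (it - itfirst);
        it->it_next = (it + 1 < ITEM_LAST) ? it + 1 : NULL;
    }
    sum = 0;
    for (it = itfirst; it != NULL; it = it->it_next) {
        sum += it->it_value;
    }
    return (sum);
}
```

Every name in the loops can be traced back to ITEM at a glance: the
store, its limits, and the "it_" members.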
If structures are declared in #include files and there is a risk
that the file might be included twice, you should block multiple
compilation to prevent compiler error messages. For example, if
the ITEM structure declaration were stored in a header file, item.h, you
should write it as follows:
#ifndef item_h
typedef struct ITEM {
    ...
} ITEM;
#define item_h 1 /* Structure has been declared */
#endif
Pointers
Pointers should be declared and used as "pointer to a thing of type X".
Do not, for example, use a variable which is declared as "pointer to int"
to point to a char, even though the particular compiler and/or
machine will let you do it. On most compilers, unions may be
used to allow pointers to different objects. At other times,
explicit type casts are the simplest solution.
For example, a print formatter may need both types of pointer:
#define INT 0 /* Storage classes for */
#define LONG 1 /* Formatter */
#define NEG 2 /* Negative flag */
typedef struct FORMAT {
char f_type; /* Format type */
int f_width; /* Item storage width */
int f_radix; /* Conversion radix */
} FORMAT;
static FORMAT formatinfo[] {
{ 'd', INT, 10 },
{ 'o', INT, 8 },
{ 'u', INT, 10 },
{ 'D', LONG, 10 },
{ EOS, 0, 0 },
};
c__doprnt(format, argp, fildes)
char *format;
int *argp;
FILE *fildes;
{
register union {
FORMAT *fmt; /* -> format codes */
char *out; /* -> result */
} p; /* General pointer */
int radix; /* Conversion radix */
int temp; /* General temp value */
char c; /* Current format char */
long value; /* Value to convert */
char work[WORKSIZE]; /* Number buffer */
...
/*
* Search for a matching format.
*/
p.fmt = formatinfo;
while (p.fmt->f_type != c && p.fmt->f_type != EOS))
p.fmt++;
if (p.fmt->f_type != EOS) {
/*
* A numeric conversion was found. Get the
* value and expand it into the work area.
*/
radix = p.fmt->f_radix;
temp = p.fmt->f_width;
p.out = &work[WORKSIZE - 1];
*p.out = EOS; /* Terminate result */
if (temp == INT) {
if (c == 'd' && *argp < 0) {
value = -(*argp++);
temp = NEG; /* Remember sign */
}
else /* 'u', 'x', or 'o' */
value = (unsigned) *argp++;
}
else { /* Get long from caller */
value = *((long *) argp);
argp = (int *) (((long *) argp) + 1); /* Step past the long */
}
if (value == 0)
*--p.out = '0';
else {
do { /* Convert unsigned number != 0 */
*--p.out = "0123456789abcdef"[value % radix];
} while ((value /= radix) != 0);
if (temp == NEG)
*--p.out = '-';
}
The program will then output the EOS-terminated string starting at
p.out.
Note that a union was used when the same (register) variable was
used to point to two separate objects (at different points of
the program), while casts were used when a pointer refers to
different objects, depending on the particular data being processed.
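The fragment above can be reduced to a small function that compiles on its own. What follows is only a sketch under stated assumptions: the name cvt_value(), its argument list, the shortened format table, and the value of WORKSIZE are hypothetical and are not taken from the formatter described above, and ANSI-style prototypes are used so that a current compiler accepts it. It keeps the two techniques: one union variable that points first at the format table and later at the output text, and casts chosen according to the data being processed.

```c
#include <string.h>

#define EOS      '\0'           /* End-of-string marker         */
#define INT      0              /* Storage classes for          */
#define LONG     1              /* the formatter                */
#define WORKSIZE 72             /* Assumed work buffer size     */

typedef struct FORMAT {
    char f_type;                /* Format type                  */
    int  f_width;               /* Item storage width           */
    int  f_radix;               /* Conversion radix             */
} FORMAT;

static FORMAT formatinfo[] = {
    { 'd', INT,  10 },
    { 'o', INT,   8 },
    { 'D', LONG, 10 },
    { EOS, 0,     0 },
};

/*
 * Hypothetical helper: convert the object at argp according to
 * format character c.  The text is copied to result, which must
 * hold WORKSIZE bytes.  Return 0 on success, -1 if c is unknown.
 */
int
cvt_value(int c, const void *argp, char *result)
{
    union {
        FORMAT *fmt;            /* -> format codes              */
        char   *out;            /* -> result                    */
    } p;                        /* General pointer              */
    char work[WORKSIZE];        /* Number buffer                */
    unsigned long u;            /* Magnitude to convert         */
    int width;                  /* INT or LONG                  */
    int radix;                  /* Conversion radix             */
    int negative;               /* Non-zero for a minus sign    */

    p.fmt = formatinfo;
    while (p.fmt->f_type != c && p.fmt->f_type != EOS) {
        p.fmt++;
    }
    if (p.fmt->f_type == EOS) {
        return (-1);
    }
    width = p.fmt->f_width;     /* Save before p is reused      */
    radix = p.fmt->f_radix;
    negative = 0;
    if (width == INT) {         /* The data selects the cast    */
        int iv = *((const int *) argp);
        if (c == 'd' && iv < 0) {
            negative = 1;
            u = 0UL - (unsigned long) iv;
        }
        else {
            u = (unsigned int) iv;
        }
    }
    else {
        long lv = *((const long *) argp);
        if (lv < 0) {
            negative = 1;
            u = 0UL - (unsigned long) lv;
        }
        else {
            u = (unsigned long) lv;
        }
    }
    p.out = &work[WORKSIZE - 1];
    *p.out = EOS;               /* Terminate result             */
    do {                        /* Digits are stored backwards  */
        *--p.out = "0123456789abcdef"[u % (unsigned long) radix];
    } while ((u /= (unsigned long) radix) != 0);
    if (negative != 0) {
        *--p.out = '-';
    }
    strcpy(result, p.out);
    return (0);
}
```

The width and radix are copied out of the table entry before p.out is assigned, because storing through one member of the union makes the other member meaningless.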
Standard Defined-names
There are a number of #define'd names whose meaning is standardized
by convention among C programs:
TRUE - Boolean true
FALSE - Boolean false
NULL - For comparison or assignment of pointers
EOS - The end of string marker
EOF - End-of-file
In writing a large program, the following standards proved to be useful:
DEBUG - Switch for compiling debugging code.
DEBUG_X - Debug sub-part X only
TESTING - Compile a built-in test program. See below.
INT_16 - A storage integer that must hold 16 bits.
INT_32 - A storage integer that must hold 32 bits.
INT - Whatever is fastest for this compiler.
FLAG - A TRUE/FALSE (or small range of values) flag.
In developing a large program, many subroutines included a small
main program for testing. This program was conditionally compiled
by #define'ing the TESTING compile-time variable. Once a
module had been debugged, TESTING was left undefined and the module was
integrated with the rest of the package.
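A minimal sketch of such a module follows. The function count_blanks() is hypothetical and stands in for whatever the module really does; the definitions use ANSI-style prototypes so that a current compiler accepts them. The test program is compiled only when TESTING is defined, for example with the -DTESTING option mentioned under Portability.

```c
#include <stdio.h>

/*
 * Hypothetical module function: count the blanks in a string.
 */
int
count_blanks(const char *text)
{
    int count;                  /* Blanks seen so far           */

    count = 0;
    while (*text != '\0') {
        if (*text == ' ') {
            count++;
        }
        text++;
    }
    return (count);
}

#ifdef TESTING
/*
 * Built-in test program.  It is not compiled into the
 * finished package, where TESTING is left undefined.
 */
int
main(void)
{
    printf("%d\n", count_blanks("a b c"));
    return (0);
}
#endif
```

Because the test program travels with the module, it is still there when the module has to be changed later.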
INT_16, INT_32, and INT were used in the same large program to eliminate
certain compiler/machine dependencies. For example, on
the Motorola 68000, 16-bit integers are computationally faster than
the default (32-bit) int, whereas on the Vax-11, 32-bit integers
are more efficient. The program's header file thus contained:
#ifdef vax
#define INT_16 short
#define INT_32 int
#define INT int
#endif
#ifdef M68000
#define INT_16 short
#define INT_32 long
#define INT short
#endif
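Where the compiler supplies the ANSI header <limits.h> (an assumption; several of the compilers discussed here predate it), the same choice can be made from the stated limits rather than from the machine's name. This is only a sketch: it uses typedef instead of #define, the choice of int for INT is a placeholder that still has to be tuned by hand, and int32_bits() is a hypothetical helper added to show a check.

```c
#include <limits.h>

typedef short INT_16;           /* short holds at least 16 bits */

#if INT_MAX >= 2147483647
typedef int INT_32;             /* int is wide enough here      */
#else
typedef long INT_32;            /* long holds at least 32 bits  */
#endif

typedef int INT;                /* Placeholder: tune per machine */

/*
 * Hypothetical helper: number of bits of storage
 * occupied by an INT_32 on this machine.
 */
int
int32_bits(void)
{
    return ((int) (sizeof(INT_32) * CHAR_BIT));
}
```

A header of this kind needs no new #ifdef section when the program moves to a machine that was not on the original list.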
Portability
Portability means that a source file can be compiled and executed
on different machines, operating systems, and/or compilers with
either no source file changes or, at most, changes to system-specific
header files. In writing portable software, the following should be
understood:
-
Most C compilers predefine symbols that may be used to isolate
machine-dependent code. The following list may be helpful:
-
Decus C defines "pdp11", "decus", "rsx" (or "rt11").
-
Vax-11 C defines "vax", "vms", and "vax11c"
-
Venix defines "pdp11", and "unix"
-
A compiler for the Dec-20 defines "TOPS20" and "PDP10"
When running on Unix, the compiler option -Dxxx may be used
to pre-define a symbol without modifying the source code.
-
Some things are inherently non-portable. For example, a hardware
device handler can, in general, not be transported between operating
systems.
-
Different machines have different word sizes. While the language standard
guarantees that "long int" is at least as long as "int"
and that "short int" is never longer than "int", it does not guarantee
any specific word length. Note also that pointers and integers are
not necessarily the same size; nor are all pointers the same size.
-
Word size and constants can interact in unpleasant ways. For example,
int x;
x &= 0177770;
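clears the low-order 3 bits of an integer on a PDP-11. However, on
a Vax, it will also clear the upper half-word, because the constant
has only 16 significant bits. Instead, you should use:
x &= ~07;
which is portable, since the complement is taken at the full width of
an int on each machine.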
-
Beware of code that takes advantage of two's complement arithmetic.
In particular, optimizations that replace division or multiplication with
shifts should be avoided.
-
Watch out for the PDP-11 signed character, which may be unsigned on
other machines.
-
Do not presuppose any specific byte ordering
within words.
-
Do not default Boolean tests. Use
if (func() != FALSE) {
Instead of
if (func()) {
A particularly insidious example of incorrect code is:
if (strcmp(s1, s2)) {
/* different */
}
Always write
if (strcmp(s1, s2) != 0) {
/* different */
}
Decus C provides streq() for this purpose. On other systems, you
can easily write a macro:
#define STREQ(a, b) (strcmp((a), (b)) == 0)
One exception to this is generally made for predicates: functions
which have no other purpose than to return TRUE or FALSE, and which
are named so that the meaning of a TRUE return is absolutely obvious.
For example, a routine should be named "isvalid()", not "checkvalid()".
-
Be very suspicious of numeric values appearing in the code. Almost all
constants would be better expressed as #defined quantities.
-
Any unsigned type other than unsigned int should be identified by a
typedef, as these are highly compiler dependent. As noted above,
large programs should have a central header file which encapsulates
machine-dependent information.
-
Become familiar with the standard library and use it for string
and character manipulation. Do not reimplement standard routines
as the person reading your code must then figure out whether you're
doing something special in the reimplemented stuff. Home-brew
"standard" routines are a fruitful source of bugs as your routines might be
called by other parts of the library. Also, the standard library hides
non-portable details that you might not (and generally should not) be
aware of.
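Two of the points in the list above, word-size-independent masks and explicit string comparison, can be sketched together. The names LOW3 and clear_low3() are hypothetical; STREQ is the macro given above. The function uses an ANSI-style prototype so that a current compiler accepts it.

```c
#include <string.h>

#define LOW3    07              /* Hypothetical: low-order 3 bits */
#define STREQ(a, b) (strcmp((a), (b)) == 0)

/*
 * Clear the low-order three bits.  The mask is built by
 * complementing LOW3 at the full width of unsigned int,
 * so no word size is assumed.
 */
unsigned int
clear_low3(unsigned int x)
{
    return (x & ~(unsigned int) LOW3);
}
```

A caller then writes "if (STREQ(s1, s2))" where equality is meant, rather than the easily misread "if (!strcmp(s1, s2))".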
Miscellaneous
This section contains a fairly disorganized list of hints, some of which
appear in other sections of this style sheet. They are not in any
specific order.
-
Don't change syntax via macro substitution. It makes the program
unintelligible to all who come after.
-
There is a time and place for embedded assignment statements. In some
cases, this is the best way to specify the algorithm. However, it
is not your responsibility to second-guess the compiler by packing
code as tightly as possible. For example:
a = b + c;
d = a + r;
should not be rewritten as:
d = (a = b + c) + r;
even though the latter may save one instruction.
Note also that a C compiler may freely modify the order of execution
of an expression. Thus,
a = (b + c) + d;
will not necessarily add b to c, then add the result to d.
If the order of evaluation is important (for accuracy or overflow
prevention), you must write separate statements with temporary
variables:
temp = b + c;
a = temp + d;
-
Don't overuse the ternary "(cond) ? a : b" operator.
The condition should always be enclosed in parentheses.
Nested ternary operators should be avoided if possible.
The language does define the order of evaluation here: the condition
is evaluated first, and only the selected alternative is then evaluated.
The following is therefore safe, although an if statement may state
the intent more plainly:
a = (b == 0) ? 0 : d / b;
-
Goto statements should be used sparingly. The main place where they
are useful is in breaking out of several levels of switch/for/while
nesting. If a goto is needed, the accompanying label should be
at the left margin with a comment explaining who jumps here.
The continue statement is also a source of bugs.
But don't be afraid that evil spirits will haunt you if you write the
dreaded goto. It is often much clearer to use gotos to escape
from an inner loop than to use seemingly random combinations of break,
continue, return, and default exits from switch statements. To some
extent, the lack of a rich set of exit operations is a failure of C,
requiring discipline and a commitment to clarity on the part of the
programmer.
Often, the need for gotos and complicated exit conditions is an indication
that the inner constructions ought to be redone as a separate function with
a success/failure return code.
Never goto into an else clause or into the body of a for
or while loop.
-
In declarations (#defines, structure definitions, or variable definitions),
various components should line up. Thus:
#define TESTING         1
#define PRODUCTION      2
-
When the storage class or type of a variable is important, always
state it explicitly. In particular, use auto if you are going to
take the address of a local variable using '&'. Declare integer parameters
as int, rather than letting them default.
-
Sometimes it is impossible to avoid doing something tricky. (And sometimes
you just can't resist the temptation.)
At the very least, put enough documentation
in the code to warn the poor soul who comes after you.
-
Try to write code that is clear and safe, rather than something that
"seems" easier to compile. Make sure local variables are local
(or static) so things won't blow up when you compile with other modules.
-
Try to keep the flow of control through your program apparent. Where this
is governed by separately-compiled tables (such as a finite-state parser),
embed comments in the parser table to aid the maintainer.
-
Use register variables wherever possible. They are especially efficient
when used as structure or array pointers. Since offsets within a
structure are known at compile time, the compiler can generate extremely
efficient code.
For example, suppose a program is processing a collection of elements
which have a value and a set of flag bits. The "simple" solution
would be:
int value[MAX];
long flags[MAX];
int array_max;
int
lookfor(arg_val, arg_flag)
int arg_val;
long arg_flag;
/*
* Return index to the element with the same
* value and at least one matching flag bit.
* Return -1 on failure.
*/
{
int i;
for (i = 0; i < array_max; i++) {
if (value[i] == arg_val
&& (flags[i] & arg_flag) != 0) {
return (i);
}
}
return (-1);
}
The inner loop of the above requires turning the index "i" into a
pointer twice. The above should generally be rewritten as:
typedef struct data {
int d_value;
long d_flag;
} DATA;
DATA values[MAX];
DATA *top_value;
DATA *
lookfor(arg_value, arg_flag)
int arg_value;
long arg_flag;
/*
* Return a pointer to the element with the same
* value and at least one matching flag bit.
* Return NULL on failure.
*/
{
register DATA *dp;
for (dp = &values[0]; dp < top_value; dp++) {
if (dp->d_value == arg_value
&& (dp->d_flag & arg_flag) != 0) {
return (dp);
}
}
return (NULL);
}
Note the use of redundant braces in the above programs.
-
If a function manipulates a database stored in a separate file,
the routines that manipulate (generate and access) this database
should be isolated from other routines. The internal structure
of the database should also be defined. If the database format is likely
to change, a release date or version should be buried in the database
and precompiled into the manager software. The program should check
the validity of the release date when the package opens the database.
-
If a file contains the main routine of a program, that should be the
first function in the file. On Unix and VMS, where programs may be
called as sub-processes, it is important that all programs exit by
calling exit(). On Unix, use "exit(0)" for success and "exit(1)"
for failure. The following construction may be useful:
#ifdef vms
#include <ssdef.h>
#endif
...
#ifdef vms
exit(SS$_NORMAL);
#else
exit(0);
#endif
-
In the condition portion of an if, for, while, etc., side effects
whose effect extends beyond the extent of the guarded statement
block should be avoided. For example, consider:
if ((c = getchar()) != EOF) {
guarded-statements
}
other-statements
It is natural to think of variable "c" being "bound" to a value only
within "guarded-statements." Its value should not be presumed upon
entrance to "other-statements." Using a variable set or modified inside
a condition outside the range of statements guarded by the condition is
in general quite distracting.
-
You should not use || and && with right-hand operands having
side-effects. For example,
if ((fildes = fopen("file.nam", "r")) == NULL
|| readin(fildes) != SUCCESS) {
bug("something's wrong somewhere.");
}
A better approach would be
if ((fildes = fopen("file.nam", "r")) == NULL) {
perror("file.nam");
bug("Can't open input file");
}
else if (readin(fildes) != SUCCESS) {
bug("couldn't read file");
}
Whenever conditional sequences contain both || and &&, parentheses
should be used for clarity.
-
Routines should be kept reasonably short. It is important for the
maintainer to be able to read and comprehend all of the routine at
one glance. In general, a routine processes one or more inputs
and generates one or more outputs, where each of the inputs and outputs
can be concisely described.
Signs that a routine is too long, and ought to be split up, are:
length greater than 100 lines (two pages), heavy use of localized
variables (whose active scope is less than the entire routine), or
conditional or loop statements nested more than four levels.
Even when processing is linear (do first part, do second part, etc.),
it is often helpful to the maintainer to break the routine into separate
pieces:
main(argc, argv)
int argc;
char *argv[];
{
setup(argc, argv);
process();
finish();
}
On many operating systems, the setup() and finish() modules can
be compiled into overlay structures, leaving more room for in-memory
data.
-
Use of globals should be minimized by judicious use of parameters.
-
In general, a routine should be designed with a "natural", easily-remembered
calling sequence. Routines with more than five arguments are not recommended.
Routines with "op-code" arguments, where one argument determines the
interpretation, type, and functions of the others, are also not recommended
(though they often prove useful as internal routines to a package, they
should not be part of a package's documented interface.)
-
Datatype compatibility should be practiced where possible. This can
be facilitated by use of C's typedef facility, by explicit
type casting,
or by the use of the union datatype.
A package which returns a pointer to a structure whose format need
not be known outside of that package may return a "generic pointer"
(char *). The C language specifically guarantees that any pointer
may be converted to a char * and back again without harm.
-
Use #defines to eliminate magic numbers. Use compile-time computation
to combine magic numbers into others:
#define ARRAY_A_SIZE 123
#define ARRAY_B_SIZE 456
#define BOTH_SIZE (ARRAY_A_SIZE + ARRAY_B_SIZE)
If you change ARRAY_A_SIZE, the compiler will change BOTH_SIZE
without your further intervention.
-
Some experience is needed to decide what to put in a for statement
and what to put in the loop body. In general, you should put
what is needed to control the loop in the for, and the process itself
in the body. Also, you should be disciplined about using break,
continue, and goto to control "unusual" break-out cases. For
example, the following code searches a symbol table for an
unused element:
for (sp = &sym[0]; sp < &sym[MAXSYM]; sp++) {
if ((sp->sy_flag & UNUSED) != 0)
goto found;
}
error_message("No room in symbol table");
return (FALSE);
found:
/* ... here to process symbol */
return (TRUE);
In this case, the most natural way to write the code is to
use a goto for the "normal" case. While the above could be
handled by a flag (or auxiliary test), the solution seems
less intuitive:
for (sp = &sym[0]; sp < &sym[MAXSYM]; sp++) {
if ((sp->sy_flag & UNUSED) != 0)
break;
}
if (sp >= &sym[MAXSYM]) {
error_message("No room in symbol table");
return (FALSE);
}
else {
/* ... here to process symbol */
return (TRUE);
}
-
The first three register variables, in lexicographic order, should
be ones for which the most gain can be gotten.
-
While C distinguishes between upper- and lower-case in variables
and keywords, the programmer should maintain discipline. Global
symbols should never require case distinction as they will not
work properly on many operating systems. You should also avoid
using the same name for different quantities.
Never require the reader to see differences
between "1" (digit), "l" (letter), and "I"; or between "O", "Q", and "0".
The C language "long constant" suffix offers a good example of a practice
to avoid: "1l" (the digit one followed by a lower-case letter L) is a long
integer, but it reads like the number eleven. Use "1L" instead.
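The advice above on gotos, that a complicated exit condition often means the inner construction should become a separate function with a success/failure return, can be applied to the symbol-table search shown earlier. In this sketch the search returns a pointer (or NULL), so neither a goto nor a second test of the loop limit is needed. The function name findfree(), the layout of SYMBOL, and the values of MAXSYM and UNUSED are assumptions made for illustration, and an ANSI-style prototype is used so that a current compiler accepts it.

```c
#include <stddef.h>

#define MAXSYM  4               /* Assumed table size           */
#define UNUSED  01              /* Assumed "free entry" flag    */

typedef struct SYMBOL {
    int sy_flag;                /* Flag bits                    */
} SYMBOL;

static SYMBOL sym[MAXSYM];      /* The symbol table             */

/*
 * Return a pointer to the first unused element.
 * Return NULL if the table is full.
 */
SYMBOL *
findfree(void)
{
    register SYMBOL *sp;

    for (sp = &sym[0]; sp < &sym[MAXSYM]; sp++) {
        if ((sp->sy_flag & UNUSED) != 0) {
            return (sp);
        }
    }
    return (NULL);
}
```

The caller tests the result against NULL, issues the "No room in symbol table" message itself, and otherwise processes the symbol.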
Re-examining Braces and Indentation
Several other style sheets recommend the brace syntax:
if (cond)
{
    statements;
}
Another recommendation is similar to the above except that the
braces are aligned with the conditionally-executed statements:
if (cond)
    {
    statements;
    }
This follows the structured programming methodology that "begin" and
"end" are at the same indentation level.
The syntax recommended in this manual (with the left brace on the same line
as the conditional)
seems, in the author's eyes, to bind the
left brace closer to the conditional than does the "left brace on
a new line" format. Also, left braces don't
appear in the same column as right braces and are, hence, more
visually distinctive. Finally, the right brace is aligned vertically with
the clause introducer (if/while/etc.) with no intruding text. This
seems to make things more visible.
When an early draft of this style sheet was reviewed, a colleague,
Jeff Lomicka, took exception to the recommendations for indentation.
Here is an alternative indentation style
presented with its own rationale. You
may choose your style accordingly, but be prepared to understand
and defend your choice.
A program is a sequential execution of simpler functions, each of
which is broken up into more primitive functions until the
primitives become directly executable. A compound statement is
the same kind of entity as is a single statement or a function
call, and should therefore be treated equally.
The goal of proper indentation is to separate visually the level
of detail at which the program is viewed, and to permit the reader
easily to associate
related elements of the program with each other. For example, we
need to associate an "if" with its "else", and to be able to
determine what are the contents of the if-clause and else-clause.
The general formatting rules are:
-
Statements executed sequentially are all at the same indentation
level.
-
If a statement includes other statements, such as the "while" loop
body or the "then" and "else" clauses of a conditional, these
statements are indented to the next block level.
-
Braces are part of the statement, and are always displayed
at the same indentation level as the code they contain.
This improves the readability of the program, since each compound
statement is easily identified as a primitive function, separate from
the control structure that controls its execution. In
traditional top-down fashion
if (conditional)
    statement;
else
    statement;
is seen when reading a passage of code at one level of detail, and
a close look can reveal the details of the statements:
if (conditional)
    {   /* when executed and what is done here */
    statements;
    }
else
    {   /* when executed and what is done here */
    statements;
    }
A reader is therefore not forced to see the inner block details
when trying to understand only the outer block. Note that when
reading the code at the outer block's level of detail, only the
introducing comment needs to be read to discern the purpose of a
compound statement.
These rules are modified according to the same considerations as
listed earlier, as seen in the else-if. For example:
while (conditional)
    {   /* when executed and what is done here */
    statements;
    }
for (s1; s2; s3)
    {   /* when executed and what is done here */
    statements;
    }
if (conditional)
    {   /* when executed and what is done here */
    statements;
    }
else if (conditional)
    {   /* when executed and what is done here */
    statements;
    }
else
    {   /* when executed and what is done here */
    statements;
    }
Note how these rules affect switch statements:
switch (c)
    {
    case 1:     /* when executed and what is done here */
        statements;
        break;
    case 2:     /* when executed and what is done here */
        statements;
        break;
    }
The purpose of a typographical style is to present the semantic
elements of your program in a way that is understandable by your
readers.
The C programming language can be very deceptive. Although it has every
characteristic of other block structured languages, because of the way it
"looks", it must be treated differently. Many programmers started their
careers using Algol derivatives, such as Simula: languages with BEGINs and
ENDs. In such languages, BEGIN and END must be prominent as any
declaration -- even a function -- could follow any BEGIN. (Later versions
of C, though not Decus C, permit variables to be declared following a '{'.)
There was thus little difference between single statements and whole
programs. In these languages, keywords were always in upper case, library
routines would have their first letter capitalized, and user defined
variables and functions were in lower case. Everybody did things that way.
While, superficially, C doesn't look very different, it is so in some
deeper sense.
Those curly braces look like they want to disappear. The
blocking appears to want to be done with indentation alone. Since you
can't see the braces anyway, it probably doesn't make that much
difference where they are, so long as the contents of the blocks are
properly indented. There doesn't seem to be any real difference in
readability.
Note also that C has at least four separate "flavors" of braces:
structure definition delimiters, function delimiters, if/for/while/do
delimiters, and switch block delimiters. Since there is only one
construct terminator, '}', it becomes more important for the reader
to be able to scan up and immediately locate the construct initiator.
(In some other languages, such as Bliss, each construct, such as IF,
has a unique terminator, such as FI. While this helps prevent runaway
syntax errors, it also requires the programmer to remember more information.)
Responding to the difference in language syntax, programmers develop
different programming habits. For example, an Algol programmer might
think of an IF statement, in general, as:
IF condition THEN statement ELSE statement;
(with one statement in each clause), while a C programmer might
think of an IF statement as:
IF condition THEN statements ELSE statements END-IF;
where, in C, the THEN is implied by the end of the condition, the
braces around the THEN clause are a syntactic nuisance, and the END-IF is
represented by the closing brace on the ELSE clause.
We can do the same with loops.
Algol: WHILE condition DO statement;
C: WHILE condition DO statements END-WHILE;
Here too, the patterns we look for when reading the code are different.
The END-IF and END-WHILE are represented, in C, by '}' which requires
typographical prominence and must be kept visually distinct from
the visually similar '{'. The C style:
if (condition) {
    statements;
}
is thus more understandable.
But, of course, programmers are different in their needs, backgrounds,
and motivations. Essential, however, is the need to define a style,
understand it, use it, and know when to violate it to attain the
overriding goal of clarity and communication.
Summary
The following extended -- and artificial -- example shows most of
the recommended decisions.
/*
 * A C Style Summary Sheet                  Block comment
 * abstracted from one                      describes a file
 * by Henry Spencer,                        or section of
 * University of Toronto,                   code.
 * Department of Zoology
 */

#include <stdio.h>                          Header files
#include "local.h"                          don't nest

typedef int SYTYPE;                         Global definitions

typedef struct symtab {                     structs use typedefs
    struct symtab *s_next;  /* Link entries */
    char *s_name;           /* Symbol name  */
    SYTYPE s_type;          /* Symbol type  */
#define TY_UNK 0            /* unknown      */
#define TY_INT 1            /* integer      */
#define TY_STR 2            /* char *       */
    union {
        int i;              /* Integer      */
        char *s;            /* String       */
    } s_value;
} SYMBOL;                                   Typedefs capitalized

SYMBOL *sy_head = NULL;                     Explicit initialization

/*
 * sylookup(text)
 *
 * Look for a word in the symbol table,
 * return a pointer to the symbol if found.
 * return NULL if not found.
 */
static SYMBOL *                             What is returned
sylookup(text)                              Name at first column
char *text;                 /* Symbol name  */
{
    register SYMBOL *syp;

    for (syp = sy_head; syp != NULL; syp = syp->s_next) {
        if (strcmp(text, syp->s_name) == 0)
            return (syp);
    }
    return (NULL);
}

/*
 * syprint(text)
 *
 * If the argument is in the symbol table, print
 * the associated value, else print "not found".
 */
syprint(text)                               Doesn't return a value
char *text;                 /* Symbol name  */
{
    register SYMBOL *syp;

    printf("%s: ", text);
                                            The following shows
                                            an acceptable embedded
                                            assignment, but don't
                                            default the NULL test.
                                            Use braces even for a
                                            single statement.
    if ((syp = sylookup(text)) == NULL) {
        printf("not found");
    }
    else {                                  Braces here, too.
        switch (syp->s_type) {
        case TY_UNK:
            printf("unknown");
            break;
                                            Blank line after break
        case TY_INT:
            printf("%d", syp->s_value.i);
            break;

        case TY_STR:
            printf("%s", syp->s_value.s);
            break;

        default:                            Always have a default
                                            Message before abort
            printf("? unexpected type %d\n", syp->s_type);
            abort();
        }
    }
    printf("\n");
}