Guidelines for Writing Empire Code compiled and edited by Markus Armbruster Preface ------- Empire is an old program, and the age shows in places. It took a lot of people a lot of effort to write it. Chances are that some of them were smarter than you and me. Treat the code with the respect it deserves. These guidelines are just that: guidelines. Not law. There's no Empire Coding Police. People hacked on Empire code long before I knew it existed, and I hope they still will when I fade into Empire history. But there's substantial experience behind these guidelines, so don't dismiss them cavalierly. Here's our goal: algorithmic clarity. All the rules and guidelines are subservient to it. If you'd have to obfuscate your code to satisfy the letter of some guideline, don't! Mind, I said `clarity', not `self-realization' or `expression of your individuality'. Dave Pare wrote: Be invisible. When you make a change, think not about marking your place in history, or about showing off how much nicer your two-space tabs are than those old, icky eight-space tabs that the stupid empire hackers of old used, but think instead about the aesthetics of the whole. The resulting lines of code should flow smoothly from the beginning of the procedure to the end. Empire is 60,000 lines of code. If you're the general case, you're only changing fifty lines, so be aware of that. Some guidelines are more serious than others. When I use words like `is', `shall' or `must', I mean it. When I write `should', it's a recommendation. Many guidelines concern style, i.e. they involve matters of taste. That's how it has to be. Uniformity of style is important for clarity. These guidelines don't attempt to be exhaustive. More complete guidelines that are mostly compatible with Empire can be found at http://www.jetcafe.org/~jim/c-style.html Source tree organization ------------------------ FIXME Code formatting --------------- In My Egotistical Opinion, most people's C programs should be indented six feet downward and covered with dirt." -- Blair P. Houghton Over the years, enough Empire coders lacked the good taste to preserve the style of the original Empire code in their changes, and thus turned the source code into an unreadable mess. In 2003, we fed it to GNU indent. We tried to restore things to the original style, mostly. There is one noteable change: basic indentation is now four spaces. Restoring the original eight spaces would have resulted in many more long lines, which would have to be broken by indent. Since indent isn't good at breaking lines tastefully, we reluctantly chose four instead. FIXME mention src/scripts/intend-emp, even better: convert it into an indent profile and mention that. If you use Emacs, `stroustrup' indentation style produces satisfactory results. The command `c-set-style', normally bound to `C-c .', lets you set the style for the current buffer. Set variable `c-default-style' to "stroustrup" to switch to this style for all buffers. Tab character use Whether you use tab characters or not doesn't really matter that much, but tab stops are every eight characters, period. Indentation, placement of braces, function name Basic indentation is four spaces. The opening brace goes on the same line as `if', `struct', etc. Put `else' and do loops' `while' one the same line as the closing brace. You are encouraged to leave out syntactically optional braces. Don't indent case labels. Example: if (is_fubar()) despair(); else { do { char ch = getchar(); switch (ch) { case EOF: return; case '\n': break; default: grind_along(ch); } } while (happy()); } In a function definition, the return type, function name and the braces must all start on a new line, unindented, like this: type name(arguments) { body } This does not apply to function declarations. Breaking long lines Line length should not exceed 75 characters. Break such lines at a logical place, preferably before an operator with low precedence. Line up the continuation line with the innermost unclosed parenthesis, if there is one, else indent it four spaces. Example: FIXME Bad example: FIXME Blank lines Use blank lines to separate different things. Functions must be separated by a blank line. You are encouraged to insert a blank line between a block's declarations and statements. Spaces There is a space after `for', `if', and `while'. If `for' and `while' are followed by `;' (empty loop body) on the same line, there is a space before the `;'. There is no space between the function name and the `(', but there is a space after each `,'. There should be no space between type cast operator and operand. There is no extra space after '(' and before ')'. Example: for (p = s; *p; ++p) ; printf("%ld\n", (long)(p-s)); Comments The function comment describing what a function does goes directly above the definition. Comments to the right of code should start in column 32 if possible (counting from zero). Comment lines should be indented exactly like the code the belong to. You are encouraged to format multi-line comments like this: /* * Please use complete sentences, with proper grammer, * capitalization and punctuation. Use two spaces between * sentences. */ But please avoid elaborate boxes like this: /*********************************************************** * Such decorations are a waste of time, hard to edit, and * * ugly, too. * ***********************************************************/ Do not use // C++/C99 comments because they are not portable C89. Conditional compilation Unless the conditional code is very short, please comment it like this: #ifdef FOO ... #endif /* FOO */ #ifdef BAR ... #else /* !BAR */ ... #endif /* !BAR */ Style ----- Names DoNotUseStudlyCaps! Stick to lower_case and use underscores. Upper case is for macros and enumeration constants. File names should not differ in case only, since not all file systems distinguish case. Best stick to lower case. Avoid funny characters in file names. This includes space. Preprocessor Like many good things, the preprocessor is best used sparingly. Especially conditional compilation. Do not use the preprocessor to `improve' the syntax of C! Macro names should be ALL_CAPS to make them stand out. Otherwise, they can be mistaken for objects or functions. This is bad, because `interesting' behavior can hide inside macros, like not evaluating arguments, or changing them. Exception: if a function-like macro behaves exactly like a function, then you may use a lower case name. Parameters that can take expression arguments must be parenthesized in the expansion. If the expansion is an expression, it should be parenthesized as well. You are encouraged to use enumeration constants instead of macros when possible. The existing code doesn't, but it makes sense all the same. Comments Every file should have a file comment FIXME Every function should have a function comment that describes what it does. FIXME elaborate. Writing such comments helps both future maintainers and yourself: if it's too hard to write a concise function comment, then your function is likely too complicated and could use a redesign. The existing code has very little useful comments, and it'll likely take us years to fix it. Please don't make it harder than it already is. Declarations Do not declare system functions yourself; include the appropriate system header. Use prototypes, not old-style function declarations. To get the best use of C's type checking, each function or variable with external linkage should have exactly one declaration, in some header file, and that declaration must be in scope at the definition. No other declarations should exist. In particular, please include the appropriate header instead of just declaring stuff again. The code used to be full of this kind of junk, and it was quite some work to clean it up. Forward declarations of static functions should all go in one place near the beginning of the file. If you want a typedef name in addition to a structure or union tag, give it the same name, like this: typedef struct foo foo; Yes, this is incompatble with C++. Reducing the number of names for the same thing is more useful than compatibility to a programming language we don't use. Please don't hide away pointers with typedefs, like this: typedef struct foo *foop; When I see `foo *', I *know* what it is. When I see `foop', I have to look it up. Booleans Do not use increment operators to set a variable to logical true! It obfuscates the purpose, and narrow variables could even overflow. Just assign 1. A lot of cleanup remains to be done there. Type casts Casting pointers to and from `void *' clutters the code and serves no purpose (repeat after me: C is not C++). It is also unsafe, because the cast suppresses type warnings. Casting function values to void clutters the code and serves little purpose (repeat after me: C is not Pascal). Casting an argument to the type of the parameter clutters the code and serves no purpose (repeat after me: ISO C is not K&R C). Factor out common code Do not gratuitiously duplicate code! Ken Stevens said it well, and it's as relevant as ever: Cut-and-Paste coding is by far the biggest problem that the current Empire suffers from. This is how cut-and-paste coding happens. Joe Shmuck decides that he wants to add a new function to the server. So he goes hunting through the server to find some already existing code which does something similar to what he wants to do. This is good. You should always write new code to imitate old code. What is bad is when Joe Shmuck decides to simply "copy" 200 lines of code from the old function into his new function, and then just change a couple of little things here and there to suit his needs. This method, known as Cut-and-Paste coding is the fastest and easiest way to code. However, it results in a server that is impossible to maintain. What Joe _should_ have done, is "move" the 200 lines of code into a new _third_ function which the first two both call. This is called writing a "general solution" to handle both cases. Most of my work in the Empire2 project consisted in cleaning up after a bunch of Joe Shmucks. Portability ----------- FIXME C89, POSIX FIXME sizes, printf formats FIXME reserved names FIXME conditional compilation is a last resort Robustness ---------- Error checking Check error conditions meticulously. The existing code is bad enough at this, please don't make it worse. FIXME what to do on error Sanity checking In many places, the code checks for conditions that should not happen, and then tries to recover. This is sound defensive programming. Please use CANT_HAPPEN() and CANT_REACH() for this purpose, because they log the error condition, and optionally abort the program, or write out a core dump. This is called an "oops". Buffer overruns FIXME Thread safety Thread stacks cannot grow. If you use too much automatic storage, you can overrun the stack with disastrous consequences. FIXME elaborate Empire uses non-preemptive threads: a thread runs until it yields the processor. This simplifies programming, but there are still traps to avoid. Yielding the processor is an explicit thread operation, and whether a thread operation yields is documented in empthread.h. However, the operation may be buried in library code. In particular, player input may yield. Player output may yield only if it goes to the current player, and his last command doesn't have the C_MOD flag. You can use this to simplify command code: set C_MOD if you don't want to worry about yield on output. This is commonly done for commands that modify game state. Be careful with static storage. Read-only data is fine. More than one thread writing static data is problematic, but there are safe uses. For instance, a static scratch buffer that is never used across `yields' is safe. Yielding the processor invalidates *all* the game state you copied into variables. You have to reread and possibly recheck. See below. Accessing game state Game state consists of several sets of objects, e.g. sectors, ships, bmaps, ... Each set is fully held in memory and backed by a file in the data directory. There are several ways to access an object in set FOO: * You can get a pointer to the object (not a copy!) with getFOOp(). This is inherently dangerous, because it allows you to update the object in memory without updating the disk file. It's used mostly in update code, where it pays major performance dividends, but there are uses in other places. Whether they are wise is debatable. Obviously, the object can change when you let another thread run. This is not an issue in update code, because it never yields. * You can get a copy with getFOO() and write it back with putFOO(). Putting updates both the object in memory and the disk file. Any change to the object invalidates the copy. Putting such an invalid copy will clobber the change(s) that invalidated it, possibly resulting in corrupted game state. The code oopses on such puts, but it can't repair the damage. There are two common ways to invalidate a copy: calling a function that updates the object you copied (unless it does that through your copy), and yielding the processor, which lets other threads update the object you copied. Therefore, you have to reget after a possible invalidation, and deal with changes. In particular, if you checked whether the object is suitable for a task, you need to check again after regetting it. If you can afford to bail out when something changed, use check_FOO_ok(). Function comments should state what objects the function can update. Unfortunately, they generally don't. It's best to keep puts close to gets, both in runtime and in the source code. * Bmaps have special access functions. Each country has two bmaps: the working bmap and the true bmap. Unfortunately, the bmap code calls the former `bmap' and the latter `map'. You update bmaps with map_set(). This doesn't write through to the file; instead, it returns non-zero when the update changed anything. In that case, you must write bmaps to disk with writemap() before you yield. If you only updated the working bmap, then you can call writebmap() instead. Reading player input Reading player input can fail, and you must check for that. Neglecting it can break the interrupt feature for players (normally Ctrl-C), and produce extra prompts that could confuse clients. Even worse is neglecting it in a loop that terminates only when input is read successfully. When reading input fails, you should normally abort the command with status RET_SYN. There are exceptions, e.g. when aborting a pinpoint bombing run over the target. Some functions to read player input return a special error value you can check, e.g. recvclient(), prmptrd() and uprmptrd() return a negative value, getstring() and getstarg() return NULL. Others return the same error value for failed read and for successful read of input that is invalid. Then you need to check player->aborted; if it is set after a read, the read failed. git --- Commit related changes together, unrelated changes separately. Write meaningfull commit messages. Start with a single short line (ideally less than 50 characters) summarizing the change, followed by a blank line and then a more thorough description. The purpose of the change log is not to explain how the code works; that should be done in the source code itself. It's to explain *why* you made the change, and what is affected by it. Don't change whitespace gratuitiously, say just because your editor screws up tabs. Such changes make it much harder to figure out who changed what and when. Historical guidelines, superseded by the above Remarks from Dave Pare: And now, a few editorial remarks on my views on Good Techniques for Modifying Existing Software: My safari through the hot and bug-ridden jungle that Empire has become has given me some new insights on the modification of large software packages. My adventure has prompted me to propose some simple coding guidelines for those dreaming of hacking on the code: 1) Be invisible. When you make a change, think not about marking your place in history, or about showing off how much nicer your two-space tabs are than those old, icky eight-space tabs that the stupid empire hackers of old used, but think instead about the asethetics of the whole. The resulting lines of code should flow smoothly from the beginning of the procedure to the end. Empire is 60,000 lines of code. If you're the general case, you're only changing fifty lines, so be aware of that. 2) Procedurize. If you find yourself in a triple-nested loop, or a five hundred line procedure, perhaps it's because you need to split some of that code off into a procedure. In general, if you've got more than two levels of tabs (or indentations), or more than one or two major loops in a procedure, something may well be amiss. Sasha Mikheev on indentation: The empire indentation style can be achived by using indent -orig -i8 foo.c or in old c-mode emacs (versions before 19.29): ;add this to .emacs (setq c-indent-level 8) (setq c-continued-statement-offset 8) (setq c-argdecl-indent 8) (setq c-brace-offset -8) (setq c-label-offset -8) Further comments by Ken Stevens: 1) Global variables The only variables which should be global are constants. If you write a routine which changes a global variable, then you will corrupt the data when two different players run that routine at the same time. 2) Portability. The following code: char a; #ifdef FOO unsigned char b; #endif FOO Should be rewritten as: s_char a; #ifdef FOO u_char b; #endif /* FOO */ AIX has different conventions for signed chars, and IRIX requires the /* comments */ after #endif. 3) Cut-and-Paste Cut-and-Paste coding is by far the biggest problem that the current Empire suffers from. This is how cut-and-paste coding happens. Joe Shmuck decides that he wants to add a new function to the server. So he goes hunting through the server to find some already existing code which does something similar to what he wants to do. This is good. You should always write new code to imitate old code. What is bad is when Joe Shmuck decides to simply "copy" 200 lines of code from the old function into his new function, and then just change a couple of little things here and there to suit his needs. This method, known as Cut-and-Paste coding is the fastest and easiest way to code. However, it results in a server that is impossible to maintain. What Joe _should_ have done, is "move" the 200 lines of code into a new _third_ function which the first two both call. This is called writing a "general solution" to handle both cases. Most of my work in the Empire2 project consisted in cleaning up after a bonch of Joe Shmucks. I took repeated code and "consolidated" it into general function libraries. 4) Good style. Just to add to Dave's "Be Invisible" motto, I'd like to give a little example to illustrate some basic do's and don'ts for coding style: The following function has bad style: double att_combat_eff(com,own) struct combat *com; natid own;{ double str; double eff=1.0; if(com->type==EF_SECTOR) { eff=com->eff/100.0; if(com->own==own){ str=com->sct_dcp->d_ostr; }else{ str=com->sct_dcp->d_dstr;eff=2.0+(str-2.0)*eff; }else if(com->type==EF_SHIP&&com->own!=own) eff=(1.0+com->shp_mcp->m_armor/100.0); return eff;} Here is the same function written with "good" style: double att_combat_eff(com, own) struct combat *com; natid own; { double eff = 1.0; double str; if (com->type == EF_SECTOR) { eff = com->eff / 100.0; if (com->own == own) str = com->sct_dcp->d_ostr; else str = com->sct_dcp->d_dstr; eff = 2.0 + (str - 2.0) * eff; } else if (com->type == EF_SHIP && com->own != own) eff = (1.0 + com->shp_mcp->m_armor / 100.0); return eff; } These are all the things I fixed when changing the bad to the good: - Function names should always start a new line (so you can search for them) - There should always be a space after a "," - Function arguments should be indented 8 spaces - There should always be a tab after a type declaration - Opening function bracket should be on a line by itself - Indentation should be 8 spaces - There should always be a space on both sides of every operator - There should always be a space after if, for, while, switch - The opening bracket should be on the same line as the if - There should always be a space on either side of a { - There should always be a new line after a ; - The closing function bracket should be on a line by itself