The tech center doesn't have enough workers to use all materials in
some updates. How much gets made then depends on a die roll. Tech
variations are inconvenient because they ripple through the rest of
the smoke test.
Make news item merging deterministic and safe for year 2038
News reporting merges news items into recent items with the same contents.
For that purpose, we keep a small cache of recent items. When a new
item can't be merged into an item in the cache, the oldest item gets
evicted to make space for the new one.
ncache() evicts the first item with the smallest timestamp (struct
nwsstr member nws_when). Timestamps are in seconds, therefore clashes
are common, and eviction depends on exact timing. Such indeterminism
can make the smoke test fail.
Moreover, ncache() assumes timestamps cannot exceed 0x7fffffff. If
they do, it always evicts slot 0. They will in 2038.
Fix by evicting round robin. This always evicts the oldest item.
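A minimal sketch of round-robin eviction. The struct and function names (news_item, news_cache, ncache_add) and the cache size are hypothetical simplifications, not Empire's actual struct nwsstr or ncache(). The victim slot comes from a cursor, not from comparing timestamps, so eviction depends neither on timing nor on time_t values fitting into 31 bits.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical, simplified news item (the real struct nwsstr differs). */
struct news_item {
    int actor, verb, victim;    /* contents compared when merging */
    int times;                  /* number of events merged into this item */
    time_t when;                /* still recorded, but not used for eviction */
};

#define NCACHE_SZ 4             /* hypothetical cache size */

struct news_cache {
    struct news_item slot[NCACHE_SZ];
    int used;                   /* slots filled so far */
    int next;                   /* round-robin cursor: next slot to evict */
};

/*
 * Merge the event into a cached item with the same contents, or store it
 * in a free slot, or evict the slot under the cursor.  Returns the slot.
 */
static int ncache_add(struct news_cache *nc, int actor, int verb,
                      int victim, time_t now)
{
    int i;

    for (i = 0; i < nc->used; i++) {
        struct news_item *np = &nc->slot[i];

        if (np->actor == actor && np->verb == verb && np->victim == victim) {
            np->times++;
            np->when = now;
            return i;
        }
    }
    if (nc->used < NCACHE_SZ) {
        i = nc->used++;
    } else {
        /* Oldest item by insertion order; no timestamp comparison needed. */
        i = nc->next;
        nc->next = (nc->next + 1) % NCACHE_SZ;
    }
    nc->slot[i] = (struct news_item){ actor, verb, victim, 1, now };
    return i;
}
```

Filling the cache uses slots 0 to 3 in order; the fifth distinct item then evicts slot 0, the sixth slot 1, and so on, no matter how many items share a timestamp.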
The airfield is a sector taken from player 8. How many updates it
takes to convert is highly variable. If it converts late, the
airfield may not be constructed in time. This is currently the case
for me.
Move the airfield to a more dependable sector.
For me, the smoke test now fails frequently, because of differences in
news. To be fixed next.
Just a smoke test so far, extracted from src/scripts/nightly/. This
makes the existing smoke test more easily accessible. Noteworthy
differences:
* Instead of patching the code to make output more stable, postprocess
the output to normalize it.
* Compare actual results to expected results instead of the previous
test run's results.
* Much faster. The old test harness used sleep liberally to "ensure"
things always happen in the same order.
Known shortcomings:
* The smoke test hangs when the server fails to complete startup, or
fails to terminate.
* Normalization of xdump hardcodes columns instead of getting them
from xdump meta.
* Normalization of time values in xdump is an ugly hack.
* xdump meta column type isn't normalized. Actual values can vary
between systems, because the width of enumeration types is
implementation-defined. The smoke test works only when they're
represented as int, which is the case on common systems.
* Currently expected to work only with thread package LWP and a
random() that behaves exactly like the one on my development system,
because:
- Thread scheduling is reliably deterministic only with LWP
- The PRN sequence produced by random() isn't portable
- Shell builtin kill appears not to do the job in MinGW
- The Windows server tries to run as service when -d isn't
specified
Further work is needed to address these shortcomings.
Getting C programs to behave exactly the same on all systems is hard.
We'll likely run into system-dependent differences that upset the
smoke test. Floating-point computation seems particularly vulnerable.
Instead of updating src/scripts/nightly/ to use "make check", retire
it. It hasn't been used in quite a while. Investing more into our
homegrown auto-builder doesn't make sense, as canned auto-builders
such as Travis CI and Jenkins are readily available.
The shell scripts src/scripts/nightly/tests/?? become Empire batch
files tests/smoke/. The shell scripts are actually shell boilerplate
around Empire batch files. To make sure git recognizes the move, this
commit moves them unchanged. tests/smoke-test strips the boilerplate
before it feeds the batch files to the client. The next commit will
get rid of that.
Really fix accepting connections from "long" IPv6 address
Commit ee01ac19 (v4.3.23) enlarged player member hostaddr from 32 to
46 characters, but missed natstr member nat_hostaddr. player_main()
copies hostaddr to nat_hostaddr. Can overrun the destination, but
fortunately just into nat_hostname.
Impact:
* Can make praddr() print only a suffix of the address. Used by play
command, for player messages during login and logout, and for
logging.
* Can make player_main()'s test for "same address as last time" fail,
causing extra "Last connection" messages.
* Matching against econfig key privip is not affected.
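A sketch of how such a copy can be guarded, using hypothetical stand-in structs that contain only the relevant members. Both buffers are sized from one macro (46 matches INET6_ADDRSTRLEN, the longest textual IPv6 address plus the terminating null), a compile-time assertion catches the destination shrinking below the source, and the copy is bounded by the destination size.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define HOSTADDR_LEN 46     /* assumption: same as INET6_ADDRSTRLEN */

/* Hypothetical stand-ins for the relevant members only. */
struct player_sketch {
    char hostaddr[HOSTADDR_LEN];
};
struct nat_sketch {
    char nat_hostaddr[HOSTADDR_LEN];   /* sized from the same macro */
    char nat_hostname[512];            /* neighbor an overrun would hit */
};

/* Fail the build if the destination ever becomes smaller than the source. */
_Static_assert(sizeof(((struct nat_sketch *)0)->nat_hostaddr)
               >= sizeof(((struct player_sketch *)0)->hostaddr),
               "nat_hostaddr smaller than hostaddr");

static void copy_hostaddr(struct nat_sketch *np,
                          const struct player_sketch *pp)
{
    /* Bounded by the destination size, always null-terminated. */
    snprintf(np->nat_hostaddr, sizeof(np->nat_hostaddr), "%s", pp->hostaddr);
}
```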
Fix setsector not to disclose number of landmines to occupier
When the deity sets the number of mines with setsector, the sector
owner (if any) is told the resulting number of mines. Even for
occupied sectors, where mines belong to the old owner, and thus
shouldn't be disclosed. Oops.
Fix setsector not to tell the sector owner anything then.
Change capital to take a single sector as argument
Capital takes a <SECTS> argument, and picks the first suitable sector
it finds there. It fails if none can be found, or if the first one
found already is the capital (even when more suitable sectors follow).
Has always worked that way, but was never documented.
I don't think the search feature is really useful, and documenting it
isn't worth my while. Change the command to take a <SECT> argument
instead, as documented.
Change GODNEWS reports not to affect headlines and relations
Option GODNEWS controls the news reports N_GIFT and N_TAKE from give,
and N_AIDS and N_HURTS from edit and setsector.
They affect news headlines because of their non-zero r_good_will.
N_TAKE and N_HURTS can downgrade relations because of their negative
r_good_will. All tolerable, except N_TAKE has actor and victim
reversed: the deity running the give command is the victim, and the
sector owner is the actor. Because of that, give with a negative
amount downgrades the deity's relations towards the sector owner.
Inappropriate.
Has always been that way. Chainsaw disabled these news at
compile-time; to enable you had to define GODNEWS (not documented
anywhere). Empire 4.2.0 made GODNEWS a proper option, enabled by
default.
Fix flying commands for destination equal to assembly point
bomb, drop, fly, paradrop, recon and sweep fail when given a
destination sector equal to the assembly point. Broken in commit
404a76f7, v4.3.27. Reported by Tom Johnson.
Before that commit, getpath() returned NULL on error, "" when input is
an empty path, "h" when it's the coordinates of the assembly point, and
a non-empty path otherwise.
The commit accidentally changed it to return "" instead of "h".
Instead of changing it back, make it return NULL when input is an
empty path, and change bomb() & friends to accept empty flight paths.
This also affects sail: it now fails when you give it an empty path,
just like bomb & friends. Path "h" still works.
Fix portability bug in configure test for Windows API
The test uses an erroneous non-directive within #ifdef _WIN32 to
signal that _WIN32 is defined. Some compilers choke on this even when
_WIN32 isn't defined. Observed with FreeBSD 4.10's gcc 2.95.4.
Broken in commit c02468fd, v4.3.22. Standalone client build already
broken in commit 774b590f, v4.3.17.
Forbid selling units with unsalable cargo, permit selling military
Deities can customize which commodities can be sold in table item.
Default is to allow anything but civilians and military. However,
this applies only to the commodity market, not to the unit market:
cargo of ships and land units is not restricted.
Make the two markets consistent: permit selling military by default,
forbid selling units carrying unsalable commodities. This outlaws
selling units carrying civilians by default.
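The consistency rule can be sketched as follows. The commodity subset, the salable[] table and unit_salable() are hypothetical stand-ins for table item and the market code. A unit is refused if it carries any commodity that could not be sold on the commodity market.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical commodity subset; the real item table is larger. */
enum { I_CIVIL, I_MILIT, I_FOOD, I_GUN, I_MAX };

/* Defaults after the change: everything but civilians is salable. */
static const bool salable[I_MAX] = {
    [I_CIVIL] = false,
    [I_MILIT] = true,
    [I_FOOD] = true,
    [I_GUN] = true,
};

/* A unit may go on the market only if all of its cargo is salable. */
static bool unit_salable(const short cargo[I_MAX])
{
    for (int i = 0; i < I_MAX; i++) {
        if (cargo[i] > 0 && !salable[i])
            return false;
    }
    return true;
}
```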
Assigning to tp->trd_owner is unclean. Can be dropped safely, because
it has no effect: prior check_trade() drops all trades where the
assignment would change anything.
Planes flying one-way with crew or cargo spread plague from their old
base to their new base. Planes dropping cargo spread plague from
their base to the drop's target sector.
msl_equip(), find_escorts() and perform_mission() memset() the plist,
then assign to all members but load. Just zero load instead, like
getilists(), msl_sel() and pln_sel() do.
Flying them to a foreign destination magically changes their
allegiance. Prohibit that.
Equivalent change was already in commit 35887222 (v4.2.17) but got
reverted immediately (commit 20199b22), because fly and drop should
stay consistent with load, which let you give away civilians then. No
more since commit 92a366ce (v4.3.20). This change makes fly and drop
consistent with load again.
Replace pln_onewaymission() by pln_where_to_land()
New function reads and returns target sector/ship. Avoids reading the
target sector unnecessarily. Callers receive the target ship, not
just its number. Next commit will put it to use.
Don't let planes fly to a carrier without sufficient space
We test whether the carrier has space for each plane individually
instead of whether it has space for all of them. The planes that fit
land, the others abort and get teleported home. Abusable.
pln_oneway_to_carrier_ok() was created in commit 1127762c (v4.2.17) to
fix almost the same bug. It worked fine then, because
fit_plane_on_ship() worked with load counters, and incremented them.
Broken in commit 3e370da5 (v4.3.17), which made fit_plane_on_ship()
count the loaded planes, to permit the removal of load counters. But
unlike load counters, loaded planes don't change during
pln_oneway_to_carrier_ok(). Thus, each plane is checked individually.
Fix by tallying all the planes before checking for space.
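A simplified sketch of checking each plane individually versus tallying first. It assumes a single slot count per carrier, which is a simplification: real carriers distinguish kinds of planes. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified records. */
struct plane_sketch { int id; int size; };        /* slots it needs */
struct carrier_sketch { int capacity; int loaded; };

/* Buggy pattern: each plane is checked against the carrier alone. */
static bool fits_each(const struct carrier_sketch *cp,
                      const struct plane_sketch *pl, int n)
{
    for (int i = 0; i < n; i++) {
        if (cp->loaded + pl[i].size > cp->capacity)
            return false;
    }
    return true;
}

/* Fixed pattern: tally all planes first, then check once. */
static bool fits_all(const struct carrier_sketch *cp,
                     const struct plane_sketch *pl, int n)
{
    int need = 0;

    for (int i = 0; i < n; i++)
        need += pl[i].size;
    return cp->loaded + need <= cp->capacity;
}
```

With two free slots and three one-slot planes, fits_each() approves every plane, while fits_all() correctly reports that the group doesn't fit.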
Fix tend not to leak which commodities are loaded on friendlies
Tending a negative number of commodities takes from the target ships.
The target ships must be owned. Tend complains when the target
doesn't have the commodity loaded. It does that even for friendly
foreign ships. Don't.
Fix tend from target not to stop on foreign target
Tending a negative number of commodities takes from the target ships.
When a target ship is foreign, tend silently stops. This is wrong.
Fix it to skip foreign target ships instead.
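Sketched with a hypothetical, simplified ship record and helper: the loop skips foreign targets with continue instead of leaving the loop at the first one.

```c
#include <assert.h>

/* Hypothetical, simplified ship record. */
struct ship_sketch {
    int own;    /* owning country */
    int food;   /* amount of one commodity aboard */
};

/*
 * Take up to amt from each target ship owned by cnum and return the total.
 * Foreign targets are skipped; stopping at the first one was the bug.
 */
static int take_from_targets(struct ship_sketch *tgt, int n, int cnum, int amt)
{
    int total = 0;

    for (int i = 0; i < n; i++) {
        if (tgt[i].own != cnum)
            continue;
        int take = tgt[i].food < amt ? tgt[i].food : amt;
        tgt[i].food -= take;
        total += take;
    }
    return total;
}
```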
Disable damage to base when missile explodes on launch
When a missile explodes on launch, it has a 33% chance to damage its
base.
Unfortunately, damaging the base breaks callers that call msl_launch()
for each member of a list of missiles created by msl_sel() or
perform_mission(). Damage to the base can damage other missiles
there. Any copies of them in the list become stale. When
msl_launch() modifies and writes back such a stale copy, the damage
gets wiped out, triggering a seqno oops.
Affects missile interdiction and interception using missiles with
non-zero load. Stock game's ABMs have zero load, so interception is
safe there. Relatively harmless in practice. Broken in Empire 2.
Instead of fixing the bug, simply disable damage to the base for now.
March code reads the land units into a land unit list, and writes them
back when it changes them, e.g. when a land unit stops. If a land
unit changes in the land unit file while it is in such a land unit
list, the copy in the land unit list becomes stale, and must not be
used.
To that end, do_unit_move() calls lnd_mar() after prompting for path
or destination. lnd_mar() re-reads all the land units.
Unfortunately, it still writes back stale copies in certain
circumstances. Known ways to trigger such writes:
* Deity loads land unit onto a ship or land unit
* Land unit's crew killed just right, e.g. by collateral damage from
interdiction, followed by additional updates, such as shell fire
damage
* Sector no longer owned or allied, e.g. allied sector captured by an
enemy (own sector would kill or retreat the land unit)
Writing a stale copy wipes out the updates that made the copy stale,
and triggers a seqno mismatch oops. For instance, damage that follows
killing of all crew by collateral damage from interdiction is wiped
out. If no damage follows, we still get a generation oops.
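The stale-copy problem, and why re-reading helps, can be sketched with a per-record sequence number. Everything here (struct unit_sketch, unit_file[], get_unit(), put_unit()) is a hypothetical simplification. This sketch refuses the stale write, whereas the text above says the real code writes it and reports a seqno mismatch oops.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical record with a sequence number bumped on every write. */
struct unit_sketch {
    int uid;
    int seqno;
    int effic;
};

#define NUNITS 2
static struct unit_sketch unit_file[NUNITS];   /* stands in for the file */

/* Read a fresh copy, as re-reading the whole list does before changes. */
static void get_unit(int uid, struct unit_sketch *copy)
{
    *copy = unit_file[uid];
}

/*
 * Write a copy back.  A copy whose seqno no longer matches the file is
 * stale: writing it would wipe out the newer update, so refuse instead.
 */
static bool put_unit(struct unit_sketch *copy)
{
    struct unit_sketch *cur = &unit_file[copy->uid];

    if (copy->seqno != cur->seqno)
        return false;
    copy->seqno++;
    *cur = *copy;
    return true;
}
```

The same reasoning applies to ships in navigation code.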
Navigation code reads the ships into a ship list, and writes them back
when it changes them, e.g. when a ship stops. If a ship changes in
the ship file while it is in such a ship list, the copy in the ship
list becomes stale, and must not be used.
To that end, do_unit_move() calls shp_nav() after prompting for path
or destination. shp_nav() re-reads all the ships. Unfortunately, it
still writes back stale copies in certain circumstances. Known ways
to trigger such writes:
* Deity sets a sail path
* Ship's crew gone, e.g. killed by shell fire
* Sector no longer navigable, e.g. harbor shelled down, or bridge
built
Writing a stale copy wipes out the updates that made the copy stale,
and triggers a seqno mismatch oops. For instance, ship damage that
kills all crew while the ship is being navigated gets wiped out.
Fix buffer overruns in fairland for island size zero
Fairland creates islands with size 1 + random() % (2 * is - 1), where
"is" is either chosen by the user (fourth command line argument) or
defaults to half the continent size (second command line argument).
Negative values are silently replaced by zero.
Not only does value zero make no sense, it also breaks the code: the
island size is always one then (because random() % -1 is zero), but
allocate_memory() provides only space for zero sectors in sectx[],
secty[] and sectc[]. This leads to buffer overruns in try_to_grow(),
find_coast(), elevate_land(), set_coastal_flags(). Can smash the heap.
Fix by changing the lower bound from zero to one. Diagnosed with
valgrind. Has always been broken.
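The arithmetic behind the fix, as a sketch. island_size() takes r in place of a value from random() (assumed non-negative) so that the result is deterministic; both helpers are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: clamp the user's island size, lower bound one. */
static int clamp_island_size(int is)
{
    return is < 1 ? 1 : is;
}

/*
 * Size of one island, 1 + r % (2 * is - 1), with r standing in for a
 * non-negative value from random().  For is >= 1 the result lies in
 * 1 .. 2 * is - 1.  For is == 0 the divisor is -1, r % -1 is 0, and the
 * size is 1, even though no space was allocated for it.
 */
static long island_size(int is, long r)
{
    return 1 + r % (2 * is - 1);
}
```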
elevate_land() tests for capital sector in three places. The third
one is broken: half of the test is done even for islands, subscripting
capx[] and possibly capy[] out of bounds. This could screw up
elevation (unlikely) or crash (even less likely). Diagnosed with
valgrind.
Broken since the test was added in Chainsaw 3.12. Parentheses were
added blindly in 4.0.11 to shut up the compiler. Reindentation (commits
9b7adfbe and ef383c06, v4.2.13) made the bug stand out more, but it
still managed to hide in the general ugliness of fairland's code.
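Why parentheses added only to silence the compiler don't help: && binds tighter than ||, so making the existing grouping explicit keeps its meaning. The condition shape and the data below are hypothetical, not fairland's actual test; the sketch shows the guarded form, where capx[] and capy[] are subscripted only after c is known to be a continent.

```c
#include <assert.h>
#include <stdbool.h>

#define NCONT 2                 /* hypothetical: continents are 0 .. NCONT-1 */
static const int capx[NCONT] = { 5, 20 };
static const int capy[NCONT] = { 7, 3 };

/*
 * Buggy shape (hypothetical):  c < NCONT && a || x == capx[c] && y == capy[c]
 * parses as (c < NCONT && a) || (x == capx[c] && y == capy[c]), so for an
 * island (c >= NCONT) the right operand still reads capx[c], and capy[c]
 * when x matches, out of bounds.  Making that grouping explicit with
 * parentheses silences the compiler but keeps the bug.
 *
 * Guarded shape: subscript only once c is known to be a continent.
 */
static bool is_capital(int c, int x, int y)
{
    return c < NCONT && x == capx[c] && y == capy[c];
}
```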
Journal login before changing the player thread's name
The journal logs a thread name for each event. The player thread name
changes on entry to the playing phase. Connecting old and new name
isn't as easy as it should be:
Sun Apr 29 12:13:39 2012 Conn29 input coun POGO
Sun Apr 29 12:13:39 2012 Conn29 input pass peter
Sun Apr 29 12:13:39 2012 Conn29 input play
Sun Apr 29 12:13:39 2012 Play#0 login 0 127.0.0.1 armbru
Sun Apr 29 12:15:39 2012 Play#0 logout 0
To connect Conn29 with Play#0, you have to know that country#0 is
named POGO.
Fix that by logging login before the thread name change:
Sun Apr 29 12:17:41 2012 Conn29 input coun POGO
Sun Apr 29 12:17:41 2012 Conn29 input pass peter
Sun Apr 29 12:17:41 2012 Conn29 input play
Sun Apr 29 12:17:41 2012 Conn29 login 0 127.0.0.1 armbru
Sun Apr 29 12:19:41 2012 Play#0 logout 0
Now "Conn29 login 0" makes the connection obvious.
This involves moving journal_login() from player_main() before
empth_set_name() in its caller play_cmd(). Move journal_logout() as
well, for symmetry.
If player_main() fails, we now log login/logout instead of nothing in
the journal. That's okay. Note that before commit c9f21c0e (v4.3.8),
we logged just login then.
Fix arm to require nuke and plane to be in the same sector
It happily arms a plane with a remote nuke. The nuke gets teleported
to the plane when the plane moves (a two-way sortie doesn't count as
move). Broken in 4.3.3. Reported by Harald Katzer.
It was renamed to play_lock because it synchronized not just updates
but also shutdown. Since the previous commit, it again only
synchronizes updates. Rename it back.
Also move its initialization next to shutdown_lock's.
Fix synchronization between shutdown and player threads
shutdwn() sets the EOF indicator, aborts the running command, if any,
forbids sleeping on I/O and wakes up the player thread, for all player
threads in state PS_PLAYING. It takes play_lock to prevent new
commands from running. It then waits up to 3s for player threads to
terminate, by polling player_next(), to let output buffers drain.
Issues:
1. Polling is lame.
2. New player threads can still enter state PS_PLAYING. They'll block
as soon as they try to run a command. Somewhat unclean.
3. We can exit before all player threads have left state PS_PLAYING, losing
a treasury update, play time update, and log entries. Could happen
when player threads blocked on output until commit 90b3abc5 fixed
that; its commit message describes the bug's impact in more detail.
Since then, the bug shouldn't bite in practice, because player
threads should leave state PS_PLAYING quickly.
Fix by introducing shutdown_lock: player threads in state PS_PLAYING
hold it shared, shutdwn() takes it exclusive, instead of play_lock.
Takes care of the issues as follows:
3. shutdwn() waits until all player threads have left state PS_PLAYING, no
matter how long it takes them.
2. New player threads block before entering state PS_PLAYING.
1. shutdwn() still polls up to 3s for player threads to terminate.
Still lame. Left for another day.
The victim's connection closes without any explanation. Output may be
lost. This is because kill_cmd() kills by calling io_shutdown(),
which shuts down the socket and drains the I/O queues.
How this makes the victim's thread terminate is a bit subtle: shutting
down the socket makes it ready. If the victim's thread is waiting for
I/O, it wakes up. Since all further reads return EOF, and all further
writes fail, the command terminates quickly (barring infinite loop
bugs), then the command loop, and finally the thread.
To make kill behave more nicely, change kill_cmd() to work exactly
like server shutdown: send a flash message to the victim, set his EOF
indicator, abort the command, forbid sleeping on I/O, wake up the
victim's thread. Just as reliable, but doesn't lose output.
If the victim's client fails to close his connection, the victim's
thread may still linger in state PS_SHUTDOWN for up to
login_grace_time (default 120s). An attacker could try to use that to
make the server run out of file descriptors or memory, but simply
connecting achieves the same effect more cheaply.
max_idle applies in state PS_PLAYING, login_grace_time before (login,
state PS_INIT) and after (logout, state PS_SHUTDOWN).
Cut login_grace_time to two minutes, down from max_idle's 15 minutes.
Two minutes is plenty to complete login and logout. Makes swamping the server
with connections slightly harder, as they get dropped faster. While
that makes sense all by itself, the real aim is making increasing
max_idle safe. The next commit will complete that job.
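A sketch of which limit applies in which state. The enum subset and the variables are hypothetical stand-ins for the real player states and econfig keys; the default values are the ones given above, expressed in seconds.

```c
#include <assert.h>

/* Hypothetical subset of player states. */
enum player_state { PS_INIT, PS_PLAYING, PS_SHUTDOWN };

/* Limits in seconds, using the defaults stated in the text. */
static int max_idle = 15 * 60;
static int login_grace_time = 120;

/* max_idle while playing, login_grace_time during login and logout. */
static int idle_limit(enum player_state state)
{
    return state == PS_PLAYING ? max_idle : login_grace_time;
}
```

Raising max_idle then no longer lets a connection linger longer in login or logout.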
Fix unwanted player thread blocking on output during shutdown
shutdwn() disables blocking on I/O for all player threads in state
PS_PLAYING, by setting struct player member may_sleep to
PLAYER_SLEEP_NEVER. This ensures the player threads complete logout
quickly and reliably. A thread may still block on I/O in io_close()
called from player_delete(), since commit 904822e3, but that's okay,
because it happens after all game state updates.
Bug: if shutdwn() aborts a command, the player thread returns through
dispatch(), which resets may_sleep back to PLAYER_SLEEP_FREELY. Input
can't block regardless, because the EOF indicator is set, but output
can. When it happens, the player thread may not complete logout
before shutdwn() terminates the process.
This can make us lose a treasury update (similar to the bug fixed by
commit bdc1c40f; the relevant bug description is in commit note
6f8ca87f), play time update, and log entries.
How? There are two paths from dispatch() to player_delete(). Here's
the first one:
1. command()
   Doesn't print since dispatch() returns 0 when it resets may_sleep
2. player_main()
   Loop and call status()
3. status()
   If the command set dolcost to a non-trivial amount, print it
   Charge dolcost
   If player went broke or became solvent, notify him
   Charge time used
   Return 0, because shutdwn() set the EOF indicator
4. player_main()
   Break the loop
   Charge time used
   print Bye-bye
   journal.log the logout
5. play_cmd()
   server.log the logout
6. player_login()
   Loop
      Try to flush output
      get EOF, break loop
   print so long
   call player_delete()
Ways the bug can bite:
A. When we block in 4. print Bye-bye, we can fail to log.
B. When we block in 3. print broke/solvent notification, we can
additionally fail to charge time used.
C. When we block in 3. print dolcost, we can additionally fail to
charge dolcost.
Note: B. and C. couldn't happen before commit bdc1c40f. Instead,
something just like C happened always, whether player thread blocked
or not.
The second path:
1. execute()
   Loop and call status()
2. status()
   As above
3. execute()
   Break the loop
4. dispatch()
   Continue with the first path
No additional ways to bite.
Fix by avoiding the may_sleep reset when the player thread is on its
way to terminate: may not sleep and has its EOF indicator set.
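The fix's condition, sketched with a hypothetical stand-in for struct player (only the two members that matter) and a two-value subset of the may_sleep values named above.

```c
#include <assert.h>
#include <stdbool.h>

/* Subset of the may_sleep values named in the text. */
enum player_sleep { PLAYER_SLEEP_NEVER, PLAYER_SLEEP_FREELY };

/* Hypothetical stand-in for the relevant members of struct player. */
struct player_sketch {
    enum player_sleep may_sleep;
    bool eof;                   /* EOF indicator, set by shutdwn() */
};

/* After a command, let the thread block on I/O again, except when it is
 * on its way to terminate: may not sleep and has its EOF indicator set. */
static void reset_may_sleep(struct player_sketch *p)
{
    if (p->may_sleep == PLAYER_SLEEP_NEVER && p->eof)
        return;                 /* keep output non-blocking until logout */
    p->may_sleep = PLAYER_SLEEP_FREELY;
}
```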
The output queue flush can block indefinitely. Permits a client to
hog the thread indefinitely by not reading output.
Broken in commit 08b94556 (v4.3.20) "Reimplement max_idle without a
separate thread". Until then, the idle thread aborted a stuck attempt
to flush output.
recvclient() flushes the output queue before receiving input. The
receive obeys max_idle, the flush doesn't.
Broken in commit 08b94556 (v4.3.20) "Reimplement max_idle without a
separate thread". Until then, the idle thread aborted a stuck attempt
to flush output.
Clean up how recvclient() deals with command abortion
We must not block in io_input() after command abortion unblocked
io_output(). Instead of checking player->aborted, compute the
deadline according to player->may_sleep, like we do for io_output().
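A sketch of deriving the I/O deadline from may_sleep, so that an output flush and an input receive obey the same rule. The function name and the two-value enum are assumptions for illustration, and max_idle is taken in seconds here.

```c
#include <assert.h>
#include <time.h>

/* Subset of the may_sleep values named in the text. */
enum player_sleep { PLAYER_SLEEP_NEVER, PLAYER_SLEEP_FREELY };

/*
 * Deadline for one I/O operation: a thread that may not sleep gets a
 * deadline of "now", i.e. it must not block at all; otherwise it may wait
 * up to max_idle seconds.
 */
static time_t io_deadline(enum player_sleep may_sleep, time_t now,
                          int max_idle)
{
    return may_sleep == PLAYER_SLEEP_NEVER ? now : now + max_idle;
}
```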