Fix buffer overruns in fairland for island size zero
Fairland creates islands with size 1 + random() % (2 * is - 1), where
"is" is either chosen by the user (fourth command line argument) or
defaults to half the continent size (second command line argument).
Negative values are silently replaced by zero.
Not only does value zero make no sense, it also breaks the code: the
island size is always one then (because random() % -1 is zero), but
allocate_memory() provides only space for zero sectors in sectx[],
secty[] and sectc[]. This leads to buffer overruns in try_to_grow(),
find_coast(), elevate_land() and set_coastal_flags(). Can smash the heap.
Fix by changing the lower bound from zero to one. Diagnosed with
valgrind. Has always been broken.
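The arithmetic can be checked in isolation. A minimal sketch, not
fairland's actual code: island_size() and clamp_island_bound() are
hypothetical helpers, and the random number is passed in as a parameter
so the result is reproducible.

```c
#include <assert.h>

/* Hypothetical helper: island size for bound "is" and random number r. */
int
island_size(int is, long r)
{
    assert(r >= 0);		/* random() never returns a negative value */
    return 1 + (int)(r % (2 * is - 1));
}

/* Hypothetical helper: the fix, raising the lower bound from zero to one. */
int
clamp_island_bound(int is)
{
    return is < 1 ? 1 : is;
}
```

With is == 0 the divisor is -1, the remainder is zero, and the size
comes out as one, while the arrays were sized for zero sectors. With
the bound clamped to one, the divisor is 1 and the size is still one,
but now the allocation matches.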
elevate_land() tests for capital sector in three places. The third
one is broken: half of the test is done even for islands, subscripting
capx[] and possibly capy[] out of bounds. This could screw up
elevation (unlikely) or crash (even less likely). Diagnosed with
valgrind.
Broken since the test was added in Chainsaw 3.12. Parentheses were
added blindly in 4.0.11 to shut up the compiler. Reindentation
(commits 9b7adfbe and ef383c06, v4.2.13) made the bug stand out more,
but it still managed to hide in the general ugliness of fairland's
code.
Journal login before changing the player thread's name
The journal logs a thread name for each event. The player thread name
changes on entry to the playing phase. Connecting old and new name
isn't as easy as it should be:
Sun Apr 29 12:13:39 2012 Conn29 input coun POGO
Sun Apr 29 12:13:39 2012 Conn29 input pass peter
Sun Apr 29 12:13:39 2012 Conn29 input play
Sun Apr 29 12:13:39 2012 Play#0 login 0 127.0.0.1 armbru
Sun Apr 29 12:15:39 2012 Play#0 logout 0
To connect Conn29 with Play#0, you have to know that country#0 is
named POGO.
Fix that by logging login before the thread name change:
Sun Apr 29 12:17:41 2012 Conn29 input coun POGO
Sun Apr 29 12:17:41 2012 Conn29 input pass peter
Sun Apr 29 12:17:41 2012 Conn29 input play
Sun Apr 29 12:17:41 2012 Conn29 login 0 127.0.0.1 armbru
Sun Apr 29 12:19:41 2012 Play#0 logout 0
Now "Conn29 login 0" makes the connection obvious.
This involves moving journal_login() from player_main() before
empth_set_name() in its caller play_cmd(). Move journal_logout() as
well, for symmetry.
If player_main() fails, we now log login/logout instead of nothing in
the journal. That's okay. Note that before commit c9f21c0e (v4.3.8),
we logged just login then.
Fix arm to require nuke and plane to be in the same sector
It happily arms a plane with a remote nuke. The nuke gets teleported
to the plane when the plane moves (a two-way sortie doesn't count as
move). Broken in 4.3.3. Reported by Harald Katzer.
It was renamed to play_lock because it synchronized not just updates
but also shutdown. Since the previous commit, it again only
synchronizes updates. Rename it back.
Also move its initialization next to shutdown_lock's.
Fix synchronization between shutdown and player threads
shutdwn() sets the EOF indicator, aborts the running command, if any,
forbids sleeping on I/O and wakes up the player thread, for all player
threads in state PS_PLAYING. It takes play_lock to prevent new
commands from running. It then waits up to 3s for player threads to
terminate, by polling player_next(), to let output buffers drain.
Issues:
1. Polling is lame.
2. New player threads can still enter state PS_PLAYING. They'll block
   as soon as they try to run a command. Somewhat unclean.
3. We can exit before all player threads left state PS_PLAYING, losing
   a treasury update, play time update, and log entries. Could happen
   when player threads blocked on output until commit 90b3abc5 fixed
   that; its commit message describes the bug's impact in more detail.
   Since then, the bug shouldn't bite in practice, because player
   threads should leave state PS_PLAYING quickly.
Fix by introducing shutdown_lock: player threads in state PS_PLAYING
hold it shared, shutdwn() takes it exclusive, instead of play_lock.
Takes care of the issues as follows:
3. shutdwn() waits until all player threads left state PS_PLAYING, no
   matter how long it takes them.
2. New player threads block before entering state PS_PLAYING.
1. shutdwn() still polls up to 3s for player threads to terminate.
   Still lame. Left for another day.
The victim's connection closes without any explanation. Output may be
lost. This is because kill_cmd() kills by calling io_shutdown(),
which shuts down the socket and drains the I/O queues.
How this makes the victim's thread terminate is a bit subtle: shutting
down the socket makes it ready. If the victim's thread is waiting for
I/O, it wakes up. Since all further reads return EOF, and all further
writes fail, the command terminates quickly (short of infinite loop
bugs), then the command loop, and finally the thread.
To make kill behave more nicely, change kill_cmd() to work exactly
like server shutdown: send a flash message to the victim, set his EOF
indicator, abort the command, forbid sleeping on I/O, wake up the
victim's thread. Just as reliable, but doesn't lose output.
If the victim's client fails to close his connection, the victim's
thread may still linger in state PS_SHUTDOWN for up to
login_grace_time (default 120s). An attacker could try to use that to
make the server run out of file descriptors or memory, but simply
connecting achieves the same effect more cheaply.
max_idle applies in state PS_PLAYING, login_grace_time before (login,
state PS_INIT) and after (logout, state PS_SHUTDOWN).
Cut login_grace_time to two minutes, from max_idle's 15. Two minutes
is plenty to complete login and logout. Makes swamping the server
with connections slightly harder, as they get dropped faster. While
that makes sense all by itself, the real aim is making increasing
max_idle safe. The next commit will complete that job.
Fix unwanted player thread blocking on output during shutdown
shutdwn() disables blocking on I/O for all player threads in state
PS_PLAYING, by setting struct player member may_sleep to
PLAYER_SLEEP_NEVER. This ensures the player threads complete logout
quickly and reliably. A thread may still block on I/O in io_close()
called from player_delete(), since commit 904822e3, but that's okay,
because it happens after all game state updates.
Bug: if shutdwn() aborts a command, the player thread returns through
dispatch(), which resets may_sleep back to PLAYER_SLEEP_FREELY. Input
can't block regardless, because the EOF indicator is set, but output
can. When it happens, the player thread may not complete logout
before shutdwn() terminates the process.
This can make us lose a treasury update (similar to the bug fixed by
commit bdc1c40f; the relevant bug description is in commit note
6f8ca87f), play time update, and log entries.
How? There are two paths from dispatch() to player_delete(). Here's
the first one:
1. command()
   Doesn't print since dispatch() returns 0 when it resets may_sleep
2. player_main()
   Loop and call status()
3. status()
   If the command set dolcost to a non-trivial amount, print it
   Charge dolcost
   If player went broke or became solvent, notify him
   Charge time used
   Return 0, because shutdwn() set the EOF indicator
4. player_main()
   Break the loop
   Charge time used
   print Bye-bye
   journal.log the logout
5. play_cmd()
   server.log the logout
6. player_login()
   Loop
   Try to flush output
   get EOF, break loop
   print so long
   call player_delete()
Ways the bug can bite:
A. When we block in 4. print Bye-bye, we can fail to log.
B. When we block in 3. print broke/solvent notification, we can
   additionally fail to charge time used.
C. When we block in 3. print dolcost, we can additionally fail to
   charge dolcost.
Note: B. and C. couldn't happen before commit bdc1c40f. Instead,
something just like C happened always, whether player thread blocked
or not.
The second path:
1. execute()
   Loop and call status()
2. status()
   As above
3. execute()
   break the loop
4. dispatch()
   Continue with the first path
No additional ways to bite.
Fix by avoiding the may_sleep reset when the player thread is on its
way to terminate: may not sleep and has its EOF indicator set.
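The condition can be sketched as follows. This is a reduced model, not
the server's code: struct player_sketch and reset_may_sleep() are
hypothetical, the eof member stands in for the connection's EOF
indicator, and the real enumeration may have further values.

```c
#include <assert.h>

enum player_sleep { PLAYER_SLEEP_NEVER, PLAYER_SLEEP_FREELY };

/* Hypothetical stand-in for struct player, reduced to what matters here. */
struct player_sketch {
    enum player_sleep may_sleep;
    int eof;			/* stand-in for the EOF indicator */
};

/* Reset may_sleep after a command, unless the thread is terminating. */
void
reset_may_sleep(struct player_sketch *pl)
{
    assert(pl);
    if (pl->may_sleep == PLAYER_SLEEP_NEVER && pl->eof)
	return;			/* on its way out: must not block */
    pl->may_sleep = PLAYER_SLEEP_FREELY;
}
```

An ordinary command abort (may not sleep, but EOF indicator clear)
still gets the reset, so the next command can block on I/O as usual.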
The output queue flush can block indefinitely. Permits a client to
hog the thread indefinitely by not reading output.
Broken in commit 08b94556 (v4.3.20) "Reimplement max_idle without a
separate thread". Until then, the idle thread aborted a stuck attempt
to flush output.
recvclient() flushes the output queue before receiving input. The
receive obeys max_idle, the flush doesn't.
Broken in commit 08b94556 (v4.3.20) "Reimplement max_idle without a
separate thread". Until then, the idle thread aborted a stuck attempt
to flush output.
Clean up how recvclient() deals with command abortion
We must not block in io_input() after command abortion unblocked
io_output(). Instead of checking player->aborted, compute the
deadline according to player->may_sleep, like we do for io_output().
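A rough sketch of such a deadline computation, under assumptions:
io_deadline() is a hypothetical name, a deadline of zero is taken to
mean "do not block at all", and the idle limit is counted in minutes
from the last player->curup update. The real function may distinguish
more cases.

```c
#include <assert.h>
#include <time.h>

enum player_sleep { PLAYER_SLEEP_NEVER, PLAYER_SLEEP_FREELY };

/*
 * Hypothetical helper: deadline for blocking on I/O.
 * Zero means "do not block at all" (assumed convention).
 */
time_t
io_deadline(enum player_sleep may_sleep, time_t curup, int max_idle)
{
    assert(max_idle >= 0);
    if (may_sleep == PLAYER_SLEEP_NEVER)
	return 0;
    return curup + 60 * (time_t)max_idle;	/* max_idle is in minutes */
}
```

Because the deadline depends only on may_sleep, input and output are
treated alike, and no separate check of player->aborted is needed.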
Fix player_login() to obey max_idle for output, too
player_login() flushes the output queue before receiving input. The
receive obeys max_idle, the flush doesn't. Which means a client could
hog the thread indefinitely.
Broken in commit 08b94556 (v4.3.20) "Reimplement max_idle without a
separate thread". Until then, the idle thread aborted a stuck attempt
to flush output.
A client can delay thread exit indefinitely by not reading output.
Broken in commit 08b94556 (v4.3.20) "Reimplement max_idle without a
separate thread". Until then, the idle thread aborted a stuck attempt
to flush output.
Denial of service seems possible.
Note that commit 904822e3 moved flushing the output queue from
player_login() to io_close(). It also made io_close() wait for the
client to close the connection. That wait obeys the deadline.
The nightly build uses a gross hack to keep timestamps stable: it
replaces the system's time() by emp_time(). It doesn't replace other
time-related functions such as gettimeofday(). Works as long as we
don't mix hacked time with unhacked time.
The previous commit compares time gotten from gettimeofday() with time
gotten from time(). The nightly build's I/O timeouts become zero,
which makes login impossible.
Base idle timeout on player->curup again, not current time
Idle timeout used to expire max_idle minutes after the last
player->curup update. When we got rid of the idle thread in commit
08b94556 (v4.3.20), this got changed to "wait no more than max_idle
minutes for input". Time spent computing and time spent blocked on
output no longer counts. In particular, a connection can block
indefinitely on output since then. Let's fix that.
Start with basing the input timeout on player->curup again. The
missing output timeout will be added shortly.
Aside: since status() updates player->curup, the idle timer gets reset
when the update aborts a command. Left for another day.
Replace the per-iop input_timeout by per-function timeouts
Commit 08b94556 (v4.3.20) added io_open() parameter input_timeout. It
applies to io_input() and, since commit 904822e3, to io_close(). Add
timeout parameters to these functions instead.
Clean up io_input() for the "no input available" case
Return zero when no input is available, regardless of parameter
waitforinput. Before, it returned -1 with errno set to EAGAIN or
EWOULDBLOCK when not waiting for input. Current callers all wait.
Drop errno from the function's contract, for consistency with
io_output().
Remote hole, can smash the stack. Additionally, the confirmation
prompt is misleading when the player supplies conditionals. Redesign
the flawed prompt.
Broken when Chainsaw added the confirmation prompt. Reported by Scott
C. Zielinski.
If execute's argument was read interactively, it prints the argument
instead of the command. Else, it prints uninitialized garbage. The
latter can theoretically crash the server, or leak information.
Fix uninitialized variable use in defending land unit retreat
lnd_take_casualty() uses uninitialized rsect to compute the mobility
cost of retreating a defending land unit. This can charge incorrect
mobility, prevent retreat, or, if the stars align just right, crash
the server when sector_mcost() subscripts dchr[] with it.
Broken in commit 4e7c993a, v4.3.6. Reported by Scott C. Zielinski.
play_cmd() needs to return in a state that makes player_login() break
the login command loop.
play_cmd() assumes player_main() always returns in such a state:
connection's EOF indicator set. Unfortunately, the assumption is
wrong. Fortunately, play_cmd() checks it, oopses and recovers.
player_main() can return with neither error nor EOF indicator set,
e.g. when the game is down.
player_main() can return with just the error indicator set. For
instance, when the client dies, io_input() detects ECONNRESET, sets
the error indicator and fails.
Broken in commit 8549efbc. Fix by setting the EOF indicator silently
(without oopsing) when necessary.
Drop C_MOD flag from commands that "obviously" don't need it
C_MOD prevents print functions from blocking. The common reason for
wanting non-blocking prints is "read, print, write back" patterns.
There might be other reasons lurking in the code, and that's why
messing with C_MOD is scary.
Nevertheless, drop it from cutoff, headlines, land, lstat, motd,
neweff, payoff, qorder, sorder, and wall. These commands have had
C_MOD "forever", even though they clearly don't modify game state.
They're all pretty simple. Similar commands such as census, news,
ship, plane, nuke, sstat, pstat, flash never had C_MOD.
Create new command capability NONVIS. Give it to players in any state
except visitors (and STAT_UNUSED, but those must not exist). This
makes it possible to have commands available to anyone but visitors.
Command change fails when the player is a visitor. Simply make it
unavailable instead, by requiring NONVIS.
Make read unavailable to visitors, because it's useless: visitors
can't receive telegrams (typed_wu() fails).
Make census, commodity and sinfra unavailable to visitors. Visitors
don't normally have sectors.
Make map and nmap unavailable to visitors. Visitors don't have sectors,
so their maps are always empty.
Make them unavailable to new players (between add and newcap) and
players in sanctuary, too. This is consistent with all the other
commands to examine the environment. It also prevents people from
trying multiple unbroken countries in a blitz to find the one with the
nicest vicinity.
Make resource available to new players, for consistency with census
and commodity.
Make country, echo and financial available to anyone.
A player may execute a command when his player->nstat has all the bits
in the command's c_permit.
Normal player commands require bit(2). Command break requires bit(1),
and execute requires bit(5). Deity commands require both bit(2) and
bit(3). Works, because deities always have both bits set in nstat, as
they may execute normal player commands, too. But it's a bit
confusing. Change them to only require their own bit(3).
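The permission rule is a plain subset test on bit masks. A minimal
sketch, assuming bit(x) expands to 1 << (x); may_execute() is a
hypothetical name, not the server's.

```c
#include <assert.h>

#define bit(x) (1 << (x))	/* assumed definition */

/* A command may run when nstat has all the bits set in c_permit. */
int
may_execute(int nstat, int c_permit)
{
    assert(nstat >= 0 && c_permit >= 0);	/* flag words, never negative */
    return (nstat & c_permit) == c_permit;
}
```

A deity with both bit(2) and bit(3) set passes either way, which is
why requiring just bit(3) for deity commands changes nothing in
practice: may_execute(bit(2) | bit(3), bit(3)) is true, while
may_execute(bit(2), bit(3)) is false.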
Don't say selected sectors "show up in reverse video", that depends on
the client. Just say "will be highlighted", like we do for the other
commands that highlight, e.g. info survey.
read can print telegram without header after deletion prompt
rea() loops if more telegrams arrive while we wait for the player to
confirm deletion. If the first new one is a continuation of the last
old one, its header is suppressed. Don't do that.
Fix 'm' in path argument of explore, move, transport
Two related bugs:
* It moans about deprecated argument syntax ('m' without a space
  before its argument) even when there's no argument.
* It uses the third instead of second argument for map flags (second
  argument is ignored): "m# s" doesn't show ships, and "m# s p" shows
  planes instead of ships.
Make market command unavailable to visitors, and set C_MOD flag
The market command executes all trades that have become ready, by
calling check_market() and check_trade(). This modifies game state,
but market lacks C_MOD. That's wrong. But can it do harm?
Turns out yes. check_trade() looks safe, but check_market() telexes
seller and buyer while holding a copy of the commodity struct. If
this telexes the player who sent the market command, and he has
NF_INFORM on, the telegram notification may yield the processor. It
then writes back its copy, triggering a generation oops. Any updates
made by other threads meanwhile are wiped out, triggering a seqno
mismatch oops.
This can cause commodity trades to be executed multiple times,
multiplying the sold commodities. Abuse seems tricky, but possible:
conspiring trade partners trade commodities back and forth to multiply
them. One of them needs to get the output backlog just right to make
the telegram notification yield, and the timing right so that the
MarketUpdate thread or another player's buy, market, reset, sell, set
or trade command runs check_market() before his thread resumes.
Closely related bug: visitors can trigger execution of trades by means
of command market. That's clearly inappropriate.
Broken in Empire 3. Reported by Scott C. Zielinski.
When a ship, plane, land unit or nuke is sold, the seller is replaced
by POGO: POGO gets the money, the telegrams and makes the news.
Likewise when a sale fails because the buyer can't pay.
Broken in commit 94a3108b, v4.3.17. Reported by Scott C. Zielinski.
Fix server shutdown to let player output drain properly
Commit 1e1dfc86 (v4.3.23) attempted to do this, but it's flawed.
Server shutdown makes the player command loops terminate. Each player
thread then flushes buffered output and closes the server's end of the
connection. The client eventually gets EOF, and closes its end of the
connection. All is well.
However, if the client sends more input after the server closed its
end of the connection, but before it completed receiving server
output, it gets ECONNRESET, and the remaining output is lost.
Instead of closing the server's end of the connection, we need to shut
down its transmission direction, then wait for the client to close its
end, by receiving client input until EOF. Do that in io_close().
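The sequence can be sketched with plain POSIX calls. This is an
illustration, not io_close() itself: close_gracefully() is a
hypothetical name, and the sketch has no timeout, so a client that
never closes its end would hold it forever.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Shut down our transmission direction, then discard client input
 * until EOF, then close.  Return the number of bytes discarded, or
 * -1 on error.  The socket is closed in either case.
 */
long
close_gracefully(int fd)
{
    char buf[512];
    long discarded = 0;
    ssize_t n;

    assert(fd >= 0);
    if (shutdown(fd, SHUT_WR) < 0) {
	close(fd);
	return -1;
    }
    for (;;) {
	n = read(fd, buf, sizeof(buf));
	if (n == 0)
	    break;		/* client closed its end */
	if (n < 0) {
	    if (errno == EINTR)
		continue;
	    close(fd);
	    return -1;
	}
	discarded += n;
    }
    return close(fd) < 0 ? -1 : discarded;
}
```

Draining the input before close() matters: it is closing with unread
input pending that makes the client see ECONNRESET and lose output
still in flight.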
The output flushing in player_login() is now superfluous. Remove it.
Make shutdwn() wait for the io_close() to complete instead of output
queues to drain. Without that, we could still close the server's end
of the connection prematurely, through program termination. Change
player_delete() to keep the player in Players until after io_close()
completes, so that shutdwn() can detect completion.
quit_cmd() calls io_shutdown() to make player_login()'s next command
read detect EOF. io_shutdown() drains the input queue and shuts down
the socket with shutdown(). player_login()'s next io_gets() fails all
right, but then io_input() *can* read more from the socket on my Linux
box, at least when I send plenty of input fast. Thus, we ignore
whatever input after quit was already queued, then resume reading
commands, not necessarily at the beginning of a line.
Timeout during execute gets handled just like an EOF cookie: end the
batch file, resume reading normal commands. That's wrong, we need to
close the connection.
A real EOF is recorded in the player's connection's EOF indicator.
Let's use that for all "connection needs to be closed" conditions, so
they all work the same. Create io_set_eof() to provide access.
Make recvclient() set the player connection's EOF indicator on
timeout. This makes the timeout "stick". Record receipt of an EOF
cookie in new struct player member got_ctld. Also abort the command,
as before. This leaves further interpretation of the EOF cookie to
the command loops.
Make player_main() set the player connection's EOF indicator on
got_ctld. Player connection gets closed on EOF cookie, as before.
Change execute() to break the batch command loop when got_ctld is set,
then reset it. Ends the batch file on EOF cookie, as before.
Change status() back to checking EOF and error indicators (partial
revert of commit 9c5854c8, v4.3.16), and drop struct player member
eof.
Make execute yield the processor after every command
A player sending execute script contents quickly could theoretically
get an unfair share of the server. Missed in commit db6fd8da
(v4.2.22), which made only the normal command loop yield.
Don't lose output when client shuts down input transmission
player_login() skips sending C_EXIT and flushing server output when
io_eof() is true. That's the case after a read from the socket
returned zero, which means the client has shut down transmission on
his socket, or closed it. If it's the former, then dropping output
like that is bad. Our client never does that, but others might.
Condition was introduced in Empire 2, don't know why.
status() informs the player of non-trivial command costs and earnings,
and when he goes broke or becomes solvent. However, this is skipped
when the command gets aborted by the player signalling EOF, or by
server shutdown, and after a quit command.
Fix by moving the check for EOF or shutdown down to the may_play_now()
check.
This looks a bit like it would also fix charging of play time. But
that's not broken, because player_main() charges, too.
Fix client's command abort at beginning of first input line
Commit 3cceb59b (v4.3.26) fixed the client to abort commands reliably
on ^C, even when it arrives at the beginning of an input line. Except
it didn't work at the beginning of the first input line, because
input_eol was initialized to zero.
Easily fixed, but "end of line" isn't quite right there. Revert sense
and rename to partial_line_sent.
Simplify journal_entry_pr(), rename to journal_entry_write()
journal_entry_pr(S, N) writes up to N characters from zero-terminated
string S. journal_input() passes -1 for N to write all characters.
Unclean. SIZE_MAX would do, but it's C99, and MSC doesn't provide it.
Simplify journal_entry_pr() to write exactly N characters. This makes
it more similar to write() than to pr(), therefore rename.
drop and fly from carrier can fail to load last civ or mil
pln_equip() refuses to abandon its base sector. Unfortunately, it
checks even when flying off carriers, and refuses to load the last
civilian or military depending on what happens to be in uninitialized
variable sect.
Change read and wire to never delete a corrupt mailbox
Before, rea() deleted the mailbox regardless of errors. Acceptable
only when the user gets a chance to avoid that after the problem is
reported. Not the case for "read y".
Not an issue for announcements, but treat them the same for
simplicity.
Clean up how read deals with new telegram arrivals
Fooling around with the file size is silly. It works only because
read has flag C_MOD set, so they can only arrive while we're sitting
at the delete confirmation prompt, not during reading.
Change wire not to read more after confirming deletion
Telegram deletion deletes the mailbox. If more telegrams arrive while
we wait for the player to confirm deletion, the mailbox again contains
unread telegrams, so we can't just delete it. Instead, rea() loops to
read the new telegrams.
Announcements worked the same until Empire 3 put them in a single file
shared by all. Since then, deleting announcements merely updates
nat_annotim, and there's no need to read new announcements after
getting the player's confirmation. So don't.
Don't lie "You have a new telegram" after read with inform off
Here's how telegram notification works with NF_INFORM off: typed_wu()
increments the telegram recipient's nat_tgms. status(), running right
before command prompts, notifies the player when nat_tgms > 0, and
resets it. Thus, we tell the player how many telegrams arrived since
the previous command prompt.
However, what we really want is something else, namely the number of
"new telegrams waiting". That's what the notification message says,
after all. Telegrams already printed by read shouldn't count, even
when they arrived since the previous command prompt.
Make them not count by clearing pending telegrams on read regardless
of toggle inform.
Fix telegram notifications during read (toggle inform on)
Reset number of pending telegrams before delete prompt instead of
after.
Before, the client claimed pending telegrams at that prompt, because
it wasn't C_INFORMed of the read, yet. Worse, if more telegrams
arrived while sitting at the prompt, the reset clobbered their number
and sent a bogus clear C_INFORM message, effectively hiding the new
arrivals from the player.
Fix pending anno count for annos squashed together
Adjacent announcements are squashed together when the sender is the
same and the timestamp is "close enough". Except typed_wu()
increments natstr member nat_ann regardless. Fix that to work exactly
like nat_tgms.
Fix read not to split production report when update is slow
Adjacent telegrams are squashed together if type and sender are the
same, and the timestamp is "close enough". This is done in two
places: rea() and typed_wu(). They're inconsistent: typed_wu()
ignores the timestamp for production reports since Empire 2, but rea()
doesn't.
Record typed_wu()'s decision in new telstr member tel_cont. Use it in
rea().
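The decision could look roughly like this. Everything here is a sketch
under assumptions: struct tel_sketch and tel_continues() are
hypothetical, the value of TEL_SECONDS is made up, and the real
telegram header has more members. The return value is what would be
stored in tel_cont.

```c
#include <assert.h>
#include <stddef.h>

#define TEL_SECONDS 5		/* assumed value, for illustration only */
enum { TEL_NORM, TEL_UPDATE };	/* reduced set of telegram types */

/* Hypothetical stand-in for the telegram header. */
struct tel_sketch {
    int type;
    int from;			/* sender */
    long date;			/* timestamp in seconds */
};

/* Does cur continue prev?  prev is NULL for the first telegram. */
int
tel_continues(const struct tel_sketch *prev, const struct tel_sketch *cur)
{
    assert(cur);
    if (!prev || prev->type != cur->type || prev->from != cur->from)
	return 0;
    if (cur->type == TEL_UPDATE)
	return 1;		/* production report: ignore timestamp */
    return cur->date - prev->date <= TEL_SECONDS;
}
```

Making the decision once, when the telegram is sent, and recording it
means the reader no longer has to repeat the test and cannot disagree
with the sender.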
typed_wu() counts telegrams to update nat_tgms and, since Empire 2,
send C_INFORM messages. Adjacent telegrams are squashed together if
type and sender are the same, and the timestamp is within TEL_SECONDS.
typed_wu() increments nat_tgms when it sends a telegram that read
doesn't squash into the previous one.
Since Empire 2, it also sends a C_INFORM message then. Inexplicably,
it fails to use the same condition: it tests just new_tele, not
new_tele || np->nat_tgms == 0. C_INFORM messages got missed, until
4.0.18 made rea() call clear_telegram_is_new(). Convoluted.
Send C_INFORM exactly when incrementing nat_tgms, and back out
4.0.18's fix.
Move clear_telegram_is_new() call to beginning of update
The call was added in 4.2.5 "so that the next telegram is flagged as
new and not part of the update". Since the update sends only
TEL_UPDATE telegrams (the previous commit restored that property), and
nothing else does, the next telegram is flagged as new automatically,
except when it's from the next update. Document that, and move the
call to a more natural place.
Ensure the update's production report isn't split by bulletins
During the update, wu() sends TEL_UPDATE telegrams ("Production
Report") instead of TEL_BULLETIN telegrams, but typed_wu() has no such
logic. It's used by tele(), which doesn't run during the update, and
mpr(), which may, e.g. called from ship or mission code used by
autonav or sail. This inserts bulletins in the middle of the
production report, splitting it apart.
Happens since mpr() was added in Empire 2. Before, only tele() used
typed_wu() directly, and everything else wu().
Leave nat_ann, nat_tgms alone and return -1 on all errors. Before,
only failed open was handled that way. Failed write and close were
logged and ignored. While there, improve the log messages a bit.
Note: the return value fix has little effect. It makes tele() log the
failure, which is redundant. Everything else goes through wu() and
ignores the value.
Make add require confirmation for unadvisable actions
Deleting a country in state STAT_SANCT, STAT_ACTIVE or STAT_GOD is
risky, because any references to this country become dangling, which
makes ef_verify() unhappy. For a reason: we may well have code that
isn't prepared for dangling references, and breaks.
Replacing a country that is being used is risky, because it can get us
into weird states. For instance, replacing a player by a visitor can
result in a visitor that owns stuff.
The add command sets nation flags NF_FLASH, NF_BEEP, NF_COASTWATCH,
NF_SONAR, and since 4.2.6 NF_TECHLISTS. POGO is created by the files
utility, which sets only NF_FLASH (since 4.2.2). Change files to
match add.