Idle timeout used to expire max_idle minutes after the last
player->curup update. When we got rid of the idle thread in commit
08b94556 (v4.3.20), this got changed to "wait no more than max_idle
minutes for input". Time spent computing and time spent blocked on
output no longer counts. In particular, a connection can block
indefinitely on output since then. Let's fix that.
Start with basing the input timeout on player->curup again. The
missing output timeout will be added shortly.
Aside: since status() updates player->curup, the idle timer gets reset
when the update aborts a command. Left for another day.
Commit 08b94556 (v4.3.20) added io_open() parameter input_timeout. It
applies to io_input() and, since commit 904822e3, to io_close(). Add
timeout parameters to these functions instead.
Return zero when no input is available, regardless of parameter
waitforinput. Before, it returned -1 with errno set to EAGAIN or
EWOULDBLOCK when not waiting for input. Current callers all wait.
Drop errno from the function's contract, for consistency with
io_output().
Commit 1e1dfc86 (v4.3.23) attempted to do this, but it's flawed.
Server shutdown makes the player command loops terminate. Each player
thread then flushes buffered output and closes the server's end of the
connection. The client eventually gets EOF, and closes its end of the
connection. All is well.
However, if the client sends more input after the server closed its
end of the connection, but before it completed receiving server
output, it gets ECONNRESET, and the remaining output is lost.
Instead of closing the server's end of the connection, we need to shut
down its transmission direction, then wait for the client to close its
end, by receiving client input until EOF. Do that in io_close().
The output flushing in player_login() is now superfluous. Remove it.
Make shutdwn() wait for the io_close() to complete instead of output
queues to drain. Without that, we could still close the server's end
of the connection prematurely, through program termination. Change
player_delete() to keep the player in Players until after io_close()
completes, so that shutdwn() can detect completion.
Timeout during execute gets handled just like an EOF cookie: end the
batch file, resume reading normal commands. That's wrong, we need to
close the connection.
A real EOF is recorded in the player's connection's EOF indicator.
Let's use that for all "connection needs to be closed" conditions, so
they all work the same. Create io_set_eof() to provide access.
Make recvclient() set the player connection's EOF indicator on
timeout. This makes the timeout "stick". Record receipt of an EOF
cookie in new struct player member got_ctld. Also abort the command,
as before. This leaves further interpretation of the EOF cookie to
the command loops.
Make player_main() set the player connection's EOF indicator on
got_ctld. Player connection gets closed on on EOF cookie, as before.
Change execute() to break the batch command loop when got_ctld is set,
then reset it. Ends the batch file on EOF cookie, as before.
Change status() back to checking EOF and error indicators (partial
revert of commit 9c5854c8, v4.3.16), and drop struct player member
eof.
Why upgrade? I'm not a lawyer, but here's my take on the differences
to version 2:
* Software patents: better protection against abuse of patents to
prevent users from exercising the rights under the GPL. I doubt
we'll get hit with a patent suit, but it's a good move just on
general principles.
* License compatibility: compatible with more free licenses, i.e. can
"steal" more free software for use in Empire. I don't expect to steal
much, but it's nice to have the option.
* Definition of "source code": modernization of some details for today's
networked world, to make it easier to distribute the software. Not
really relevant to us now, as we normally distribute full source code.
* Tivoization: this is about putting GPL-licensed software in hardware,
then make the hardware refuse to run modified software. "Neat" trick
to effectively deny its users their rights under the GPL. Abuse was
"pioneered" by TiVo (popular digital video recorders). GPLv3 forbids
it. Unlikely to become a problem for us.
* Internationalization: more careful wording, to harden the license
outside the US. The lawyers tell us it better be done that way.
* License violations: friendlier way to deal with license violations.
This has come out of past experience enforcing the GPL.
* Additional permissions: Probably not relevant to us.
Also include myself in the list of principal authors.
Move call of ioq_makeiov() to its use, because calling it before
empth_select() is racy, as follows.
Player thread flushes output by calling io_output(player->iop, 1).
io_output() sets up iov[] to point to queued output. empth_select()
blocks on output.
Another thread sends a C_FLASH or C_INFORM message to this player.
This calls io_output(p->iop, 0). The output file descriptor has
become writable since the player thread blocked on it, so some output
gets written and dequeued.
The player thread resumes, writes out iov[] and dequeues. Any output
already written by the other thread gets duplicated. If the other
thread's dequeue operation freed struct io buffers, there's use after
free followed by double-free.
Oops when a stale copy is written back, i.e. the processor was yielded
since the copy was made. Such bugs are difficult to spot. Sequence
numbers catch them when they do actual harm (they also catch different
bugs). Generation numbers catch them even when they don't.
New ef_generation to count generations. Call new ef_make_stale() to
increment it whenever the processor may be yielded.
New struct emptypedstr member generation. To conserve space, make it
a bit-field of twelve bits, i.e. generations are only recorded modulo
2^12. Make sure all members of unit empobj_storage share it. It is
only used in copies; its value on disk and in the cache is
meaningless. Copies with generation other than ef_generation are
stale. Stale copies that are a multiple of 2^12 generations old can't
be detected, but that is sufficiently improbable.
Set generation to ef_generation by calling new ef_mark_fresh() when
making copies in ef_read() and ef_blank(). nav_ship() and
fltp_to_list() make copies without going through ef_read(), and
therefore need to call ef_mark_fresh() as well. Also call it in
obj_changed() to make check_sect_ok() & friends freshen their argument
when it is unchanged.
New must_be_fresh() oopses when its argument is stale. Call it in
ef_write() to catch write back of stale copies.
Player threads may only sleep under certain conditions. In
particular, they must not sleep while a command is being aborted by
the update or shutdown.
io.c should not know about that. Yet io_output_all() does, because it
needs to give up when update or shutdown interrupt it. The function
was introduced in Empire 2, but it didn't give up then. Fixed in
commit a7fa7dee, v4.2.22. The fix dragged unwanted knowledge of
command abortion into io.c.
To clean up this mess, io_output_all() has to go.
First user is io_write(). io_write() automatically flushes the queue.
In wait-mode, it calls io_output_all() when the queue is longer than
the bufsize, to attempt flushing the queue completely. In
no-wait-mode, it calls io_output() every bufsize bytes. Except the
test for that is screwy, so it actually misses some of the flush
conditions.
The automatic flush makes io_write() differ from io_gets(), which is
ugly. It wasn't present in BSD Empire 1.1. Remove it again, dropping
io_write()'s last argument.
Flush the queue in its callers pr_player() and upr_player() instead.
Provide new io_output_if_queue_long() for them. Requires new struct
iop member last_out to keep track of queue growth. pr_player() and
upr_player() call repeatedly until it makes no more progress. This
flushes a bit less eagerly in wait-mode, and a bit more eagerly in
non-wait mode.
Second user is recvclient(). It needs to flush the queue before
potentially sleeping in io_input(). Do that with a simple loop around
io_output(). No functional change there.
Return number of bytes written on success, -1 on error. In
particular, return zero when nothing was written because the queue was
empty, or because the write slept and got woken up, or because the
write refused to sleep.
Before, it instead returned the number of bytes remaining to be
written when empth_select() failed, when woken up from sleep, or
refusing to sleep. You couldn't tell from the return value whether
the call made progress writing out the queue.
The current callers don't actually notice the change.
Don't set IO_EOF when writev() returns zero. I don't think this could
happen, but it's wrong anyway, because a short write should not stop
future reads.
The blocking I/O option makes no sense in the server, because it
blocks the server process instead of the thread. In fact, it's been
unused since Empire 2, except for one place, where it was used
incorrectly, and got removed in the previous commit.
Make I/O non-blocking in io_open() unconditionally. Remove IO_NBLOCK
and io_noblocking().
Chainsaw used this together with the notify callback to make the iop
data type usable for sockets it listened on, so that io_select() could
multiplex them along with the sockets used for actual I/O.
io_select() became unused in Empire 2, and finally got removed in
commit 875d72a0, v4.2.13. That made the IO_NEWSOCK and the notify
callback defunct. The latter got removed in commit 7d5a6b81, v4.3.1.
pthread.c's empth_select() returned -1 when empth_wakeup() interrupted
select(). The failure then got propagated all the way up, and the
player got logged out. Fix by returning 0 in that case. While there,
retry on EINTR, to match LWP. Also clarify comments.
Commit 08b94556 introduced the timeout parameter. The empthread
implementation could change it, at least on some systems, and its user
worked around a possible change. However, that behavior was not
documented, and it's inconvenient. Fix the pthread implementation,
and remove the workaround.
Remove the KillIdle thread. Add timeout to struct iop, initialized in
io_open(). Obey it in io_input() by passing it to empth_select(). If
empth_select() times out, report that back through io_input() to
recvclient() and player_login(). If player_login() receives a timeout
indication, print a message and terminate the session. If
recvclient() receives a timeout indication, flash a message to the
player and initiate a shut down the player's session.
Create WIN32 sys/time.h to define struct timeval. This creates some
conflicts with WIN32 windows.h definitions. Including windows.h in
show.c and info.c creates conflicts, so remove that. Modify service.c
to include sys/socket.h instead of windows.h to remove the conflict
with sys/time.h.
Move stuff to untangle the ugly cyclic dependencies between the
archives built for selected subdirectories of src/lib/:
* Move common/io.c to empthread/ because it requires empthread stuff
* Move parts of subs/nstr.c to common/nstreval.c to satisfy
common/ef_verify.o
* Move getstarg.c getstring.c onearg.c from gen/ to subs/ because they
require stuff from there
* Move bridgefall.c check.c damage.c empobj.c journal.c maps.c
sectdamage.c from common/ to subs/ because they require stuff from
there
* Move cnumb.c from subs/ to common/ to satisfy common/type.o
* Move log.c fsize.c from common/ to gen/ because they really belong
there
* Move emp_config.c mapdist.c from gen/ to common/ because they really
belong there, and require stuff from libglobal.a
Also package as/ as libas.a to satisfy common/path.o.
Remaining dependencies:
lib needs
--------------------------------------------
libas.a libglobal.a
libcommon.a libas.a libglobal.a libgen.a
libgen.a
libglobal.a
liblwp.a libgen.a
libw32.a[*] libgen.a
[*] Except for service.o, which can only be linked into the server
Link order now: liblwp.a libcommon.a libas.a libgen.a libglobal.a
libw32.a. The position of libw32.a is not quite right, but works
anyway.