TREATIES has issues:
* Treaties can cover attack, assault, paradrop, board, lboard, fire,
build (s|p|l|n) and enlist, but not bomb, launch, torpedo and
enlistment centers.
* Usability is very poor. While a treaty is in effect, every player
action that violates a treaty condition triggers a prompt like this:
This action is in contravention of treaty #0 (with Curmudgeon)
Do you wish to go ahead anyway? [yn]
If you decline, the action is not executed. If you accept, it is.
In both cases, your decision is reported in the news.
You cannot get rid of these prompts until the treaty expires.
* Virtually nobody uses them.
* Virtually unused code is buggy code. There is at least one race
condition: multifire() reads the firing sector, ship or land unit
before the treaty prompt, and writes it back after, triggering a
generation oops. Any updates made by other threads while trechk()
waits for input are wiped out, triggering a seqno mismatch oops.
* The treaty prompts could confuse smart clients that aren't prepared
for them. WinACE isn't, but is reported to work anyway at least
common usage. Ron Koenderink (the WinACE maintainer) suspects there
could be a few situations where it will fail.
This feature is not earning its keep. Remove it. Drop command
treaty, consider treaty, offer treaty, xdump treaty, reject treaties.
Output of accept changed, obviously.
Signed-off-by: Markus Armbruster <armbru@pond.sub.org>
Adding or removing a command to/from a test has unfortunate effects:
* Before the previous commit: if the command consumes pseudorandom
numbers, all subsequent users of pseudorandom numbers get different
ones. This has always been a major headache.
* Since the previous commit: all subsequent users of pseudorandom
numbers get different ones whether the command consumes any or not.
That's even worse.
* If the command uses BTUs, subsequent prompts are changed. Not
nearly as bad as the above, but still annoying.
Create a new command __cmd to allow compensating for adding/removing
commands for tests. Throw in the ability to compensate treasury
changes for good measure. Three arguments: command count, BTU use,
money use.
Usage example: say you add a convert command to a test, and it uses 3
BTUs and $15. Then you compensate by adding "__cmd added 1 3 15"
right after it.
The command must not be available unless running_test_suite is on, of
course. Make it require the new player command capability TESTING,
and give that to all players when running_test_suite is on.
The command is intentionally not documented in info. Switch
running_test_suite off for info-test, to hide it (and any future
TESTING commands) from info-test.
Suppress the command counter increment for TESTING commands, so they
can be used without upsetting pseudorandom numbers
Signed-off-by: Markus Armbruster <armbru@pond.sub.org>
Tests need repeatable pseudorandom numbers to yield repeatable
results. We seed the pseudorandom number generator with a fixed value
(emp_server -R) to make it produce the same sequence of numbers every
time. But whenever code exercised by a test is changed to consume
fewer or more of them, all subsequent users get different numbers
regardless. The ensuing test result changes are extremely tedious to
review.
To address this problem, reseed the PRNG with the count of commands
right before executing a command when running_test_suite is on. This
way, the effect of perturbing the PRN sequence lasts only until the
next command.
Note that the next command could be another player's. Doesn't matter.
Adding or removing commands now upsets the PRN sequence even for
commands that don't consume PRNs. The next commit will take care of
that.
Perturbs test results across the board. Hopefully, that'll happen
much less frequently now.
Signed-off-by: Markus Armbruster <armbru@pond.sub.org>
Command prompts show nat_timeused rounded down to minutes. They need
to be normalized, or else tests can fail when they take too long, or
cross midnight. Formatted prompts are normalized correctly (not
actually used since commit 9ca3fa9), but the journal contains raw
prompts. Normalize them, too.
Smoke test's journal.log is affected, because the server charges at
least 15s per login, which adds up into minutes in this test.
Signed-off-by: Markus Armbruster <armbru@pond.sub.org>
The other test output files have fixed names, and having just one with
a name that varies with the test name complicates things with no gain.
Signed-off-by: Markus Armbruster <armbru@pond.sub.org>
Only actofgod-test wants it. Enable it there. The others either
don't want it (fire-test), or don't care.
Signed-off-by: Markus Armbruster <armbru@pond.sub.org>
Commit 37ff377 stripped trailing white space in test output, except
for journal.log, where it stripped it only from player output. This
misses the space preceeding player empty output, and doesn't cover
equally annoying trailing white space elsewhere, such as the space
preceeding empty input and trailing white space in prompts. Testing
the latter could be marginally useful, but let's keep things simple,
and strip all trailing white space for now.
Signed-off-by: Markus Armbruster <armbru@pond.sub.org>
Capturing the client's output tests both client and server, which is
nice. However, player input isn't visible in the resulting file,
which makes it more difficult to understand.
Route player output to journal (econfig key "keep_journal 2"), and
ignore client output.
Separate tests for the client would be useful.
Signed-off-by: Markus Armbruster <armbru@pond.sub.org>
Coordinates run into army when the y coordinate is wider than three
characters. Has always been broken. Insert a separating space.
Signed-off-by: Markus Armbruster <armbru@pond.sub.org>
So you don't have to micromanage workers to maximize useful work.
The previous commit made the problem a bit worse. If you had a few
workers too many before, you perhaps produced an extra unit. Now, you
get to keep the extra work instead. Useless, unless it rolls over.
produce() limits production to how many units the workers can produce,
rounding randomly. It charges work for the units actually produced,
discarding fractions.
If you get lucky with the random rounding, you may get a bit of extra
work done for free. Else, you get to keep the unused work, and may
even be undercharged a tiny bit of work. Has always been that way.
The production command assumes the random rounding rounds up if and
only if the probability to do so is at least 50%. Thus, it's
frequently off by one for sectors producing at their worker limit.
The budget command runs the update code, and is therefore also off by
one, only differently.
Rather annoying for tech and research centers, where a single unit
matters. A tech center with full civilian population can produce 37.5
units in 60 etus. Given enough materials, it'll fluctuate between 37
and 38. Production consistently predicts 38, and budget randomly
predicts either 37 or 38. Both are off by one half the time.
Fix this as follows: limit production to the amount the workers can
produce (no rounding). Work becomes a hard limit, not subject to
random fluctuations. Randomly round the work charged for actual
production. On average, this charges exactly the work that's used.
More importantly, production and budget now predict how much gets
produced more accurately. They're still not exact, as the amount of
work available for production remains slightly random.
This also "fixes" the smoke test on a i686 Debian 6 box for me. The
root problem is that floating-point subexpressions may either be
computed in double precision or extended precision. Different
machines (or different compilers, or even different compiler flags)
may use different precision, and get different results.
Example: producing 108 units at one work per unit, sector p.e. 0.4
needs to charge 108 / 0.4 work. Computed in double precision, this
gets rounded to 270.0, then truncated to 270. In 80 bit extended
precision, it gets rounded to 269.999999999, then truncated to 269.
With random rounding instead of truncation, the probability for a
different result is vanishingly small. However, this commit
introduces truncation in another place. It just happens not to mess
up the smoke test there. I doubt this is the last time this kind of
problem upsets the smoke test.
Broken in commit 3a7d7fa, which enlarged struct natstr member
nat_hostaddr[] from 32 to 46 characters, but neglected to update the
ca_len in nat_ca[]. Consequently, the address is truncated in xdump.
Can also break country * ?ip=... and such, but that's exotic.
emp_server and empdump refuse to start on most big endian hosts,
because ef_verify_config() chokes on mdchr_ca[]:
Config meta uid 0 field type: value 0 is not in symbol table meta-type
Config meta uid 1 field type: value 0 is not in symbol table meta-type
Config meta uid 2 field type: value 0 is not in symbol table meta-type
Config meta uid 3 field type: value 0 is not in symbol table meta-type
Config meta uid 4 field type: value 0 is not in symbol table meta-type
Broken in commit 06a0036 (v4.3.12), which changed struct castr member
ca_type from packed_nsc_type (typedef'ed to char) to enum nsc_type,
but neglected to update the ca_type in mdchr_ca[].
On little endian hosts, the selector reads the least significant byte,
with sign extension. Happens to work, because the type values are all
sufficiently small integers.
On big endian hosts, the selector reads the most signiciant byte.
which is always zero (NSC_NOTYPE). Makes ef_verify_config() fail.
Except when sizeof(enum nsc_notype) == 1. Then selector type works
fine, and ef_verify_config() succeeds, but we run into the next
problem: the same commit also changed member ca_flags from nsc_flags
(typedef'ed to unsigned char) to int without updating the ca_type in
mdchr_ca[]. This breaks "only" xdump meta column flags.
v4.3.12 was released in April 2008. Either nobody has tried to run a
game on a big endian host since, or all who did gave up quietly,
without reporting the problem.
We clearly need to test on a wider range of machines.
Broken in commit 14ea670 (v4.3.8), which changed struct trdstr member
trd_type from char to short, but neglected to update the ca_type in
trade_ca[].
On little endian hosts, the selector reads the least significant byte,
with sign extension. Happens to work, because the type values are all
sufficiently small integers.
On big endian hosts, the selector reads the most signiciant byte,
which is always zero (EF_SECTOR). Messes up xdump trade badly.
Broken in commit 09248d0 (v4.3.8), which changed struct loststr member
lost_type from char to short, but neglected to update the ca_type in
lost_ca[].
On little endian hosts, the selector reads the least significant byte,
with sign extension. Happens to work, because the type values are all
sufficiently small integers.
On big endian hosts, the selector reads the most signiciant byte,
which is always zero (EF_SECTOR). Messes up xdump lost badly. Also
breaks lost * ?type=..., but that's exotic.
Code dealing with money mixes int and long pretty haphazardly.
Harmless, because practical amounts of money fit into int on any
machine capable of running the server. Clean up anyway.
Code dealing with reserves mixes int and long pretty haphazardly.
Harmless, because practical reserves fit easily on any machine capable
of running the server. Clean up anyway.
As long as symbol_by_value(), show_capab() and togg() support only
int, flags need to fit into int.
Not a problem in practice, because no machine capable of running
Empire has int narrower than 32 bits, and 32 bits suffice.
Some flags members are long instead of int: struct lchrstr member
l_flags, struct natstr member nat_flags, struct mchrstr member m_flags
are long. Waste of space on machines with long wider than int.
Change them to int.
Rearrange struct lchrstr and struct natstr to avoid holes.
random() may yield different pseudo-random number sequences for the
same seed on another system. For instance, at least some versions of
MinGW provide a random() in -liberty that differs from traditional BSD
(see commit c8231b12). Rather inconvenient for regression testing.
MT19937 Mersenne Twister is a proven, high-quality PRNG. Actual code
is reference code provided by the inventors[*]. Quick tests show
performance comparable to random().
Like random(), MT is not cryptographically secure: observing enough of
its output permits guessing its state, and thus its future output. I
don't think players can do that.
Drop the copy of BSD random() we added for Windows.
Like the previous commit, this changes the server's die rolls, and
makes fairland create a different random map for the same seed. Update
expected smoke test results accordingly.
[*] mt19937ar.sep.tgz downloaded from
http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/emt19937ar.html
"random() % n" is sound only when n is a power of two. The error is
hardly relevant in Empire, because random() yields 31 bits, and our n
are always much smaller than 2^31. Fix it anyway.
Use smallest the 2^m >= n instead of n, and discard numbers exceeding
n.
Bonus: faster for me even in the worst case n = 2^m+1.
Like the recent change to damage(), this changes some of the server's
die rolls, only this time the effect is pretty pervasive. Worse,
fairland now creates a completely different random map for the same
seed. Update expected smoke test results accordingly.
Turns damage() into a one-liner.
damage() now uses random() % 32768 in chance() instead of random() %
100 inline, therefore can round differently for the same pseudo-random
number. Update expected smoke test results accordingly.
Aside: "random() % n" distributes evenly only when n is a power of
two. 100 isn't. However, because random() yields at least 31 bits,
and 100 is so much smaller than 2^31, the error is vanishingly small.
The tech center doesn't have enough workers to use all materials in
some updates. How much get made depends on a die roll then. Tech
variations are inconvenient because they ripple through the rest of
the smoke test.
Too easily upset by random variations. Kill them off with anti after
two updates, and occupy with a few more military.
While there, enlist the military in a highway rather than an lcm
plant.
The airfield is a sector taken from player 8. How many updates it
takes to convert is highly variable. If it converts late, the
airfield may not be constructed in time. This is currently the case
for me.
Move the airfield to a more dependable sector.
For me, the smoke test now fails frequently, because of differences in
news. To be fixed next.
Just a smoke test so far, extracted from src/scripts/nightly/. This
makes the existing smoke test more easily accessible. Noteworthy
differences:
* Instead of patching the code to make output more stable, postprocess
the output to normalize it.
* Compare actual results to expected results instead of the previous
test run's results.
* Much faster. The old test harness used sleep liberally to "ensure"
things always happen in the same order.
Known shortcomings:
* The smoke test hangs when the server fails to complete startup, or
fails to terminate.
* Normalization of xdump hardcodes columns instead of getting them
from xdump meta.
* Normalization of time values in xdump is an ugly hack.
* xdump meta column type isn't normalized. Actual values can vary
between systems, because the width of enumeration types is
implementation-defined. The smoke test works only when they're
represented as int, which is the case on common systems.
* Currently expected to work only with thread package LWP and a
random() that behaves exactly like the one on my development system,
because:
- Thread scheduling is reliably deterministic only with LWP
- The PRN sequence produced by random() isn't portable
- Shell builtin kill appears not to do the job in MinGW
- The Windows server tries to run as service when -d isn't
specified
Further work is needed to address these shortcomings.
Getting C programs behave exactly the same on all systems is hard.
We'll likely run into system-dependent differences that upset the
smoke test. Floating-point computation seems particularly vulnerable.
Instead of updating src/scripts/nightly/ to use "make check", retire
it. It hasn't been used in quite a while. Investing more into our
homegrown auto-builder doesn't make sense, as canned auto-builders
such as Travis CI and Jenkins are readily available.
The shell scripts src/scripts/nightly/tests/?? become Empire batch
files tests/smoke/. The shell scripts are actually shell boilerplate
around Empire batch files. To make sure git recognizes the move, this
commit moves them unchanged. tests/smoke-test strips the boilerplate
before it feeds the batch files to the client. The next commit will
get rid fo that.