empserver/tests/normalize.pl
Markus Armbruster 89a06ec2d4 xdump: Make meta-type numbers portable, get rid of meta-type "c"
The xdump field data types are abstract symbols "d", "f", "s" and "c".
However, the abstraction leaks: we dump the enum nsc_type ca_type
values verbatim in meta table field "type", and have symbol table
meta-type map all integer types to "d", and both floating-point types
to "f".  Not a problem for well-behaved clients, since all they do
with the dumped value is referencing table meta-type.  It is a problem
for version-test: since the integer type compatible with an
enumeration type is implementation-defined, the type value of
selectors of enumeration type can vary between compilers.  It also
makes table meta-type a somewhat ugly exception to the rule that a
symbol table maps integers to names 1:1.

Virtual selectors let us seal the abstraction: dump the promoted
ca_type value.

The integer types get all promoted to NSC_LONG.  This takes care of
version-test.

The floating-point types get all promoted to NSC_DOUBLE.  Makes sense.

NSC_STRINGY gets promoted to NSC_STRING.  This changes all field data
types "c" to "s".  Getting rid of "c" is a welcome simplification,
because now the meaning of meta type field "len" no longer depends on
"type", but always means that the array is dumped as that many fields.
We lose string length limit information, though.

Signed-off-by: Markus Armbruster <armbru@pond.sub.org>
2017-08-06 20:09:17 +02:00

141 lines
4.1 KiB
Perl
Executable file

#!/usr/bin/perl
# TODO Don't hardcode xdump columns, get them from xdump meta
use warnings;
use strict;
use Getopt::Std;
$Getopt::Std::STANDARD_HELP_VERSION = 1;
our ($opt_j, $opt_s);
getopts('js')
or die "$0: invalid options\n";
die "$0: either -j or -s, not both\n"
if $opt_j && $opt_s;
my $dow_re = qr/(Sun|Mon|Tue|Wed|Thu|Fri|Sat)/;
my $mon_re = qr/(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)/;
my $tod_re = qr/[0-2][0-9]:[0-5][0-9]:[0-6][0-9]/;
my $year_re = qr/[0-9][0-9][0-9][0-9]/;
my $ctime19_re = qr/$dow_re $mon_re [ 123][0-9] $tod_re/;
my $ctime_re = qr/$dow_re $mon_re [ 123][0-9] $tod_re $year_re/;
my $fmttime2822_re = qr/$dow_re, [0123][0-9] $mon_re $year_re $tod_re [-+][0-9][0-9][0-9][0-9]/;
my $xdfld_re = qr/\([^)]*\)|[^ (][^ ]*/;
# Current dump, if any
# Either zero or the name of the dump we're currently processing.
# Legacy dump names start with an uppercase letter, and xdump names
# start with a lowercase letter.
my $dump = "";
sub norm_ctime {
my ($s) = @_;
$s =~ s/$ctime_re/Thu Jan 1 00:00:00 1970/g;
$s =~ s/$ctime19_re/Thu Jan 1 00:00:00/g;
$s =~ s/$fmttime2822_re/Thu, 01 Jan 1970 00:00:00 +0000/g;
return $s;
}
while (<>) {
chomp;
# Strip log timestamp
if ($opt_j || $opt_s) {
die "$0: malformed line" unless /^$ctime_re /;
$_ = substr($_, 25);
}
# Strip trailing white space
# We don't really care for it in test output, and git's pre-commit
# hook flags it, which is annoying.
s/\s+$//;
# Split off prefix that is not to be normalized
my $pfx = '';
if ($opt_j) {
die "$0: malformed line" unless substr($_, 10, 1) eq ' ';
$pfx .= substr($_, 0, 11);
# Normalize only player output
$_ = substr($_, 11);
if (/(^output [^ ]* 1 )(.*)/) {
$pfx .= $1;
$_ = $2;
} else {
$pfx .= $_;
$_ = '';
}
}
if ($opt_s) {
$_ = norm_ctime($_);
### Host environment in logs
# getrusage() results in server.log
s/(End update|done assembling paths) .* user .* system/$1 0.0 user 0.0 system/;
# PID in server.log
s/(Empire server \(pid) [0-9]+(\) started)/$1 42$2/;
### Harmless races
# shutdown wins race with logout
next if /Waiting for player threads to terminate/;
print "$pfx$_\n";
next;
}
$dump = ""
if ($dump =~ /^[a-z]/
and /^\//)
or ($dump =~ /^[A-Z]/
and (/\: No (sector|ship|plane|unit|nuke)\(s\)|\: Nothing lost/
or /^[0-9]+ (sector|ship|plane|unit|nuke|lost item)/));
### Formatted time
# nat_timeused in prompt
s/^\[[0-9]+(:[0-9]+\] Command \:)/[0$1/;
$pfx =~ s/( output [^ ]* 6) [0-9]+ ([0-9]+$)/$1 0 $2/
if $opt_j;
# TODO command play column time
# result of ctime() in many commands
$_ = norm_ctime($_)
unless $dump;
### Time values in legacy dumps
s/(DUMP (SECTOR|SHIPS|PLANES|LAND UNITS|NUKES|LOST ITEMS)) [0-9]+$/$1 0/;
s/ [0-9]+$/ 0/
if $dump eq 'LOST ITEMS';
### Time values in xdump
s/(XDUMP (meta )?[-a-z0-9]+) [0-9]+$/$1 0/
unless $dump;
# HACK: assume any integer with more than 10 digits is time
# TODO don't do that, use xdump meta instead
s/(^| )[0-9]{10,}/${1}0/g
if $dump =~ /^[a-z]/;
# timeused in xdump country timeused (column 10)
s/^(($xdfld_re ){10})([0-9]+) /${1}255 /
if $dump eq 'country';
# timeused in xdump nat (column 15)
s/^(($xdfld_re ){15})([0-9]+) /${1}255 /
if $dump eq 'nat';
# duration in xdump news (column 4)
s/^(($xdfld_re ){4})([0-9]+) /${1}0 /
if $dump eq 'news';
### nrndx values in xdump
# Encoding depends on the host, see resources[]. Too clever by half;
# perhaps we should change it.
# nrndx in xdump product (column 12)
s/^(($xdfld_re ){12})([0-9]+) /${1}0 /
if $dump eq 'product';
# value in xdump resources (column 0)
s/^[0-9]+ /0 /
if $dump eq 'resources';
### Floating-point zero in xdump
# Windows %#g prints it with seven significant digits instead of six
s/ 0\.000000/ 0.00000/g
if $dump =~ /^[a-z]/;
print "$pfx$_\n";
if (/(XDUMP|^config) (meta )?([-a-z0-9]+)/) {
$dump = $3;
die unless $dump =~ /^[a-z]/;
} elsif (/DUMP (SECTOR|SHIPS|PLANES|LAND UNITS|NUKES|LOST ITEMS) /) {
$dump = $1;
}
}