empserver/src/lib/as
2006-03-23 20:48:49 +00:00
..
as.h Normalize inclusion guards: use NAME_H for name.h. Some headers 2005-12-29 10:16:01 +00:00
as_cache.c Use NULL instead of (FOO *)0, it's easier to read. 2006-03-23 20:48:49 +00:00
as_costcomp.c Include config.h. 2005-12-27 18:04:19 +00:00
as_delete.c Include config.h. 2005-12-27 18:04:19 +00:00
as_extend.c Include config.h. 2005-12-27 18:04:19 +00:00
as_hash.c Include config.h. 2005-12-27 18:04:19 +00:00
as_init.c Include config.h. 2005-12-27 18:04:19 +00:00
as_merge.c Include config.h. 2005-12-27 18:04:19 +00:00
as_search.c Include config.h. 2005-12-27 18:04:19 +00:00
as_stats.c Include config.h. 2005-12-27 18:04:19 +00:00
as_winnow.c Include config.h. 2005-12-27 18:04:19 +00:00
COPYRIGHT Import of Empire 4.2.12 2003-08-23 12:23:04 +00:00
README Import of Empire 4.2.12 2003-08-23 12:23:04 +00:00

Tue Nov 13 11:50:24 PST 1990	Phil Lapsley	phil@Berkeley.EDU

This library implements a reasonably general version of the A* algorithm.
Basically, A* is like an ordered search, but with a heuristic that allows
it to make a better choices as to which path to take.  The subdirectory
"test" has example code for using A* to search a weighted cartesian matrix.
The file "XXX//bestpath.c" has code to interface Empire to the A*
algorithms.

This library is copyrighted; see the file COPYRIGHT for details.

COMPILATION

Do a "make" in this directory to make the library.  Cd into "test" and
do a "make" there to make a test program, called (ta da) "test".

LIBRARY USAGE

Pretty much all the data that the user needs to communicate to the library
is stored in an "as_data" structure, which is created by a call to "as_init":

	struct as_data	*adp;

	adp = as_init( ... );

The arguments to as_init specify a number of key things the algorithm will
use, and these are discussed below.

Once you have an "as_data" structure, you can fill in its "from" and "to"
members, which specify the coordinates of the points between which you want
to go.  The basic coordinate data structure is the "as_coord", which is
just an "x" integer and a "y" integer.  So:

	adp->from.x = ...;
	adp->from.y = ...;
	adp->to.x = ...;
	adp->to.y = ...;

and then call as_search:

	as_search(adp);

If the return value of as_search is 0, the algorithm found a path from
"from" to "to".  The path is stored in the "path" member of the "as_data"
structure, and is just a linked list of coordinates ("as_coord" structs)
from "from" to "to".

If the return value of as_search is -1, the algorithm couldn't find a
path from "from" to "to".  If the return is -2, a system error occurred
(most probably malloc failed).

After a call to as_search, lots of malloced data 'n' stuff will be
floating around, all pointed to by items in the "as_data" data structure.
If you call "as_search" again (presumably with new "from" and "to"
coordinates), the system will free these data structures and reallocate
new ones.  If you no longer want the data structures at all (i.e., you
never intend to call "as_search" again), you can call:

	as_delete(adp);

and this will free up all data associated with the as_data structure pointed
to by adp.

ARGUMENTS TO AS_INIT

"as_init" takes eight arguments, in the following order:

	maxneighbors		The maximum number of neighboring sectors a
				coordinate can have.  On a cartesian grid,
				this would be 8.  On a hex map, it would be 6.

	hashsize		The size of the hash table used to determine
				if we've visited a particular coordinate.

	hashfunc		A pointer to a function that takes an
				"as_coord" as an argument and returns an
				integer that will be used as a hash value.
				If this is a NULL pointer, the as library
				will assign its own hash function that just
				adds the x and y coordinates together.

	neighborfunc		A pointer to a function that takes a coordinate
				(call it "c"), a pointer to an array of
				coordinates of length "maxneighbors", and
				a user data pointer (see below).  This
				function should figure out the coordinates
				of all the neighbors of "c" and put them in
				the array cp[0] ... cp[maxneighbors-1].
				It should then return the number of neighbors
				found.

	lbcostfunc		A pointer to a function that takes two
				coordinates, call them "from" and "to",
				and a user data pointer (see below).  It
				returns a double precision *LOWER BOUND*
				on the cost to get from "from" to "to".
				"from" and "to" may be separated by an
				arbitrary distance.  More on this below.

	realcostfunc		A pointer to a function that takes two
				coordinates, call them "from" and "to",
				and a user data pointer (see below).  It
				returns a double precision value of
				the *actual cost* to move from "from" to
				"to".  Note that "from" will never be more
				than one sector away from "to".

	seccostfunc		A pointer to a function that takes two
				coordinates, call them "from" and "to",
				and a user data pointer (see below).  It
				returns a double precision value that will
				be used to break ties when two nodes have
				the same lower bound to the target.  An
				example will be discussed below.

	userdata		A (char *) that can be a pointer to
				any kind of data you want.  This will
				be passed as the third argument to the
				neighborfunc, lbcostfunc, realcostfunc,
				and seccostfunc.

Look in test/cart.c to see examples of these functions for cartesian
coordinates.

NOTES ON THE LOWER BOUND FUNCTION

"lbcostfunc" is *CRUCIAL* to the algorithm's performance.  The entire
idea behind the A* algorithm is that, when considering whether to move
to a new coordinate, you know two things:

	(1) how much it's cost you to get to that coordinate so far,

	(2) a LOWER BOUND on how much it will cost to get to the
	    destination from that coordinate.

If these two conditions are met, the algorithm will return the optimal
path to the destination.  The closer the lower bound is to the actual
cost to get from one point to another, the quicker the algorithm will
find this path.  HOWEVER, if the lower bound is ever violated, i.e.,
if the so-called lower bound function returns a value that is greater
than the actual cost to get to the destination, then the algorithm will
not necessarily find the optimal path.

Example:

Assume that we're on a cartesian matrix, and the cost to move from one point
to another is just the distance between the two points, and that no
other costs are involved.  In this case, the lower bound function could
be the same as the actual cost function, i.e.,

	real cost = lower bound cost = sqrt(dx^2 + dy^2);

In this case, the algorithm will find the destination as quickly as possible.

Another example:

Again assume we're on a cartesian matrix, and the cost to move from
one point to another is two things: (1) the distance between them, (2) some
arbitrary cost function we get off of a map.  E.g.,

		X

	   0  1  2  3
	0  0  0  0  0
	1  0  0  0  0 
   Y	2  0  0  2  0
	3  0  0  0  0

The real cost to move from (x,y) 0,0 to 1,0 is 1.  That is, it's the distance
(1) plus the value of the map at (1,0), which is 0.  The real cost to move
from (1,2) to (2,2) is 3: the distance (1) plus the map value at (2,2), which
is 2, totaling 3.

In this case, the lower bound function could still be the distance between
the two points, since this WILL NEVER BE MORE than the actual cost, and
hence is a lower bound.  I.e.,

	real cost = sqrt(dx^2 + dy^2) + map costs

	lower bound cost = sqrt(dx^2 + dy^2)

	lower bound cost <= real cost for all coordinates

This is what the the example in the "test" directory uses.

A third example:

You could make the lower bound function always return 0.  This would be
a valid lower bound of the cost to move between any two points.  In this
case, the A* algorithm will behave like a breadth-first search, and isn't
very much fun.

SECONDARY COST FUNCTION

The algorithm tries new coordinates in order of lowest lower-bound.
There can be cases where the lower-bound function for two (or more)
coordinates is the same, i.e., there is a tie.  The algorithm uses
the secondary cost function (which does NOT have to be a lower bound)
to choose which coordinate to pick.  A typical heuristic might be
to use Euclidian distance for this secondary cost function, on the
assumption that it's always better to move closer the destination.
The Empire code does just this.

If you don't need a secondary cost function, just specify a NULL pointer
to the seccost argument to as_init, and the routines will use 0.0 for you.

EMPIRE INTERFACE

The interface code in "XXX/bestpath.c" is a rather complicated example
of A* interface code.  What it does is based on some features of Empire
that are explained here.

First, movement cost in Empire is based on a quantity called "mobility
cost", which is a property of every sector.  As we trace a path from
source to destination, we add up mobility cost as we go.  Once we're
there, we have a total mobility cost.  This is what we'd like to
minimize.

Second, Empire has highways, which are zero cost movement paths.  This
hurts the A* algorithm, because it means that the lower bound cost
function is very weak.  For example, think what happens if we move from
one side of the country to another: if the two sectors are attached via
a highway, the cost will be very small -- in fact, it will be the cost
to move out of the source sector, and the cost to move into the
destination sector.  If, on the other hand, the two sectors aren't
connected by a highway, the cost could be quite large.  Thus, the lower
bound is just the cost to move out of a sector plus the cost to move
into a sector.  This is a pretty weak lower bound, and means that we
have to explore a lot of sectors.

Third, the mobility costs tend to tie a lot, as explained in the
section above on secondary cost functions.  Thus, we use the Empire
"mapdist" function as a secondary sort function to break ties.  For
example, consider the case where a sector borders two highway sectors,
one on each side.  The lower bound function will say that the two have
equal lower bound costs, since they're both highways and cost nothing
to move on.  The secondary sort function is then used to break the tie --
it says, "Take the one that moves you closer to the destination".

Fourth, all of the information we need about a sector (its mobility
cost, who owns it, etc.) is stored in the sector file on disk.  This 
means that the getsect() function to get it off disk will do a read(),
which is VERY expensive.  Because of the weak lower bound, A* ends up
checking lots of sectors, including sectors that it's seen before.
To avoid doing thousands of system calls per second, the bestpath.c file
has sector caching routines that store in memory copies of every
sector we read.  (The function bp_getsect handles this).  This means
we never read a sector from disk more than once, although at the expense
of using lots of memory.

THEORY OF OPERATION

The basic idea is as follows:

	1. Add the start node to the head of a master queue.
	2. Is the head of the queue where we want to be?  If so, stop.
	3. Make a list of all the neighbors coordinates around the
	   coordinate of the head.

	4. For each neighbor coordinate,

		Compute the lower bound cost to the destination from here.

		If this coordinate is already on the either the
		master queue or the "tried" queue:

		4a.	If it was on either the "master" queue or the
			"tried" queue and the one on the queue has a
			smaller lower-bound than the neighbor, ignore
			this neighbor and go on to the next neighbor.

		4b.	If it was on the "master" queue and the new
			neighbor has a smaller lower bound,
				Move the one on the queue to a different
				queue called "subsumed".

		4c.	If it was on the "tried" queue and the new
			neighbor has a smaller lower bound,
				Move the one on the "tried" queue to the
				master queue, and update its lower bound
				value to be as the new neighbor, and
				update its backpointer appropriately.

	   We now have a list of all neighbor coordinates that are either
	   not on the queue already, or are cheaper than the ones on the
	   queue.

	5. Move the node at the head of the queue (the one whose neighbors
	   we now have in a list) onto a different queue called "tried".

	6. Sort this list of neighbors, and merge it into the master queue,
	   keeping the master queue ordered by lower bound cost to destination.

	7. Goto 2.

My algorithm does all of this, plus a little more (the above doesn't really
mention backpointers except in passing), EXCEPT: I don't do step 4c.
I'm not convinced that this can ever occur if the lower bound rule isn't
broken, and I have yet to see it occur.  However, if as_winnow returns -1,
this error has occurred.