git.pond.sub.org Git - empserver/blob - src/lib/as/README

   1 Tue Nov 13 11:50:24 PST 1990    Phil Lapsley    phil@Berkeley.EDU
   2
   3 This library implements a reasonably general version of the A* algorithm.
   4 Basically, A* is like an ordered search, but with a heuristic that allows
   5 it to make a better choices as to which path to take.  The subdirectory
   6 "test" has example code for using A* to search a weighted cartesian matrix.
   7 The file "XXX//bestpath.c" has code to interface Empire to the A*
   8 algorithms.
   9
  10 This library is copyrighted; see the file COPYRIGHT for details.
  11
  12 COMPILATION
  13
  14 Do a "make" in this directory to make the library.  Cd into "test" and
  15 do a "make" there to make a test program, called (ta da) "test".
  16
  17 LIBRARY USAGE
  18
  19 Pretty much all the data that the user needs to communicate to the library
  20 is stored in an "as_data" structure, which is created by a call to "as_init":
  21
  22         struct as_data  *adp;
  23
  24         adp = as_init( ... );
  25
  26 The arguments to as_init specify a number of key things the algorithm will
  27 use, and these are discussed below.
  28
  29 Once you have an "as_data" structure, you can fill in its "from" and "to"
  30 members, which specify the coordinates of the points between which you want
  31 to go.  The basic coordinate data structure is the "as_coord", which is
  32 just an "x" integer and a "y" integer.  So:
  33
  34         adp->from.x = ...;
  35         adp->from.y = ...;
  36         adp->to.x = ...;
  37         adp->to.y = ...;
  38
  39 and then call as_search:
  40
  41         as_search(adp);
  42
  43 If the return value of as_search is 0, the algorithm found a path from
  44 "from" to "to".  The path is stored in the "path" member of the "as_data"
  45 structure, and is just a linked list of coordinates ("as_coord" structs)
  46 from "from" to "to".
  47
  48 If the return value of as_search is -1, the algorithm couldn't find a
  49 path from "from" to "to".  If the return is -2, a system error occurred
  50 (most probably malloc failed).
  51
  52 After a call to as_search, lots of malloced data 'n' stuff will be
  53 floating around, all pointed to by items in the "as_data" data structure.
  54 If you call "as_search" again (presumably with new "from" and "to"
  55 coordinates), the system will free these data structures and reallocate
  56 new ones.  If you no longer want the data structures at all (i.e., you
  57 never intend to call "as_search" again), you can call:
  58
  59         as_delete(adp);
  60
  61 and this will free up all data associated with the as_data structure pointed
  62 to by adp.
  63
  64 ARGUMENTS TO AS_INIT
  65
  66 "as_init" takes eight arguments, in the following order:
  67
  68         maxneighbors            The maximum number of neighboring sectors a
  69                                 coordinate can have.  On a cartesian grid,
  70                                 this would be 8.  On a hex map, it would be 6.
  71
  72         hashsize                The size of the hash table used to determine
  73                                 if we've visited a particular coordinate.
  74
  75         hashfunc                A pointer to a function that takes an
  76                                 "as_coord" as an argument and returns an
  77                                 integer that will be used as a hash value.
  78                                 If this is a NULL pointer, the as library
  79                                 will assign its own hash function that just
  80                                 adds the x and y coordinates together.
  81
  82         neighborfunc            A pointer to a function that takes a coordinate
  83                                 (call it "c"), a pointer to an array of
  84                                 coordinates of length "maxneighbors", and
  85                                 a user data pointer (see below).  This
  86                                 function should figure out the coordinates
  87                                 of all the neighbors of "c" and put them in
  88                                 the array cp[0] ... cp[maxneighbors-1].
  89                                 It should then return the number of neighbors
  90                                 found.
  91
  92         lbcostfunc              A pointer to a function that takes two
  93                                 coordinates, call them "from" and "to",
  94                                 and a user data pointer (see below).  It
  95                                 returns a double precision *LOWER BOUND*
  96                                 on the cost to get from "from" to "to".
  97                                 "from" and "to" may be separated by an
  98                                 arbitrary distance.  More on this below.
  99
 100         realcostfunc            A pointer to a function that takes two
 101                                 coordinates, call them "from" and "to",
 102                                 and a user data pointer (see below).  It
 103                                 returns a double precision value of
 104                                 the *actual cost* to move from "from" to
 105                                 "to".  Note that "from" will never be more
 106                                 than one sector away from "to".
 107
 108         seccostfunc             A pointer to a function that takes two
 109                                 coordinates, call them "from" and "to",
 110                                 and a user data pointer (see below).  It
 111                                 returns a double precision value that will
 112                                 be used to break ties when two nodes have
 113                                 the same lower bound to the target.  An
 114                                 example will be discussed below.
 115
 116         userdata                A (char *) that can be a pointer to
 117                                 any kind of data you want.  This will
 118                                 be passed as the third argument to the
 119                                 neighborfunc, lbcostfunc, realcostfunc,
 120                                 and seccostfunc.
 121
 122 Look in test/cart.c to see examples of these functions for cartesian
 123 coordinates.
 124
 125 NOTES ON THE LOWER BOUND FUNCTION
 126
 127 "lbcostfunc" is *CRUCIAL* to the algorithm's performance.  The entire
 128 idea behind the A* algorithm is that, when considering whether to move
 129 to a new coordinate, you know two things:
 130
 131         (1) how much it's cost you to get to that coordinate so far,
 132
 133         (2) a LOWER BOUND on how much it will cost to get to the
 134             destination from that coordinate.
 135
 136 If these two conditions are met, the algorithm will return the optimal
 137 path to the destination.  The closer the lower bound is to the actual
 138 cost to get from one point to another, the quicker the algorithm will
 139 find this path.  HOWEVER, if the lower bound is ever violated, i.e.,
 140 if the so-called lower bound function returns a value that is greater
 141 than the actual cost to get to the destination, then the algorithm will
 142 not necessarily find the optimal path.
 143
 144 Example:
 145
 146 Assume that we're on a cartesian matrix, and the cost to move from one point
 147 to another is just the distance between the two points, and that no
 148 other costs are involved.  In this case, the lower bound function could
 149 be the same as the actual cost function, i.e.,
 150
 151         real cost = lower bound cost = sqrt(dx^2 + dy^2);
 152
 153 In this case, the algorithm will find the destination as quickly as possible.
 154
 155 Another example:
 156
 157 Again assume we're on a cartesian matrix, and the cost to move from
 158 one point to another is two things: (1) the distance between them, (2) some
 159 arbitrary cost function we get off of a map.  E.g.,
 160
 161                 X
 162
 163            0  1  2  3
 164         0  0  0  0  0
 165         1  0  0  0  0
 166    Y    2  0  0  2  0
 167         3  0  0  0  0
 168
 169 The real cost to move from (x,y) 0,0 to 1,0 is 1.  That is, it's the distance
 170 (1) plus the value of the map at (1,0), which is 0.  The real cost to move
 171 from (1,2) to (2,2) is 3: the distance (1) plus the map value at (2,2), which
 172 is 2, totaling 3.
 173
 174 In this case, the lower bound function could still be the distance between
 175 the two points, since this WILL NEVER BE MORE than the actual cost, and
 176 hence is a lower bound.  I.e.,
 177
 178         real cost = sqrt(dx^2 + dy^2) + map costs
 179
 180         lower bound cost = sqrt(dx^2 + dy^2)
 181
 182         lower bound cost <= real cost for all coordinates
 183
 184 This is what the the example in the "test" directory uses.
 185
 186 A third example:
 187
 188 You could make the lower bound function always return 0.  This would be
 189 a valid lower bound of the cost to move between any two points.  In this
 190 case, the A* algorithm will behave like a breadth-first search, and isn't
 191 very much fun.
 192
 193 SECONDARY COST FUNCTION
 194
 195 The algorithm tries new coordinates in order of lowest lower-bound.
 196 There can be cases where the lower-bound function for two (or more)
 197 coordinates is the same, i.e., there is a tie.  The algorithm uses
 198 the secondary cost function (which does NOT have to be a lower bound)
 199 to choose which coordinate to pick.  A typical heuristic might be
 200 to use Euclidian distance for this secondary cost function, on the
 201 assumption that it's always better to move closer the destination.
 202 The Empire code does just this.
 203
 204 If you don't need a secondary cost function, just specify a NULL pointer
 205 to the seccost argument to as_init, and the routines will use 0.0 for you.
 206
 207 EMPIRE INTERFACE
 208
 209 The interface code in "XXX/bestpath.c" is a rather complicated example
 210 of A* interface code.  What it does is based on some features of Empire
 211 that are explained here.
 212
 213 First, movement cost in Empire is based on a quantity called "mobility
 214 cost", which is a property of every sector.  As we trace a path from
 215 source to destination, we add up mobility cost as we go.  Once we're
 216 there, we have a total mobility cost.  This is what we'd like to
 217 minimize.
 218
 219 Second, Empire has highways, which are zero cost movement paths.  This
 220 hurts the A* algorithm, because it means that the lower bound cost
 221 function is very weak.  For example, think what happens if we move from
 222 one side of the country to another: if the two sectors are attached via
 223 a highway, the cost will be very small -- in fact, it will be the cost
 224 to move out of the source sector, and the cost to move into the
 225 destination sector.  If, on the other hand, the two sectors aren't
 226 connected by a highway, the cost could be quite large.  Thus, the lower
 227 bound is just the cost to move out of a sector plus the cost to move
 228 into a sector.  This is a pretty weak lower bound, and means that we
 229 have to explore a lot of sectors.
 230
 231 Third, the mobility costs tend to tie a lot, as explained in the
 232 section above on secondary cost functions.  Thus, we use the Empire
 233 "mapdist" function as a secondary sort function to break ties.  For
 234 example, consider the case where a sector borders two highway sectors,
 235 one on each side.  The lower bound function will say that the two have
 236 equal lower bound costs, since they're both highways and cost nothing
 237 to move on.  The secondary sort function is then used to break the tie --
 238 it says, "Take the one that moves you closer to the destination".
 239
 240 Fourth, all of the information we need about a sector (its mobility
 241 cost, who owns it, etc.) is stored in the sector file on disk.  This
 242 means that the getsect() function to get it off disk will do a read(),
 243 which is VERY expensive.  Because of the weak lower bound, A* ends up
 244 checking lots of sectors, including sectors that it's seen before.
 245 To avoid doing thousands of system calls per second, the bestpath.c file
 246 has sector caching routines that store in memory copies of every
 247 sector we read.  (The function bp_getsect handles this).  This means
 248 we never read a sector from disk more than once, although at the expense
 249 of using lots of memory.
 250
 251 THEORY OF OPERATION
 252
 253 The basic idea is as follows:
 254
 255         1. Add the start node to the head of a master queue.
 256         2. Is the head of the queue where we want to be?  If so, stop.
 257         3. Make a list of all the neighbors coordinates around the
 258            coordinate of the head.
 259
 260         4. For each neighbor coordinate,
 261
 262                 Compute the lower bound cost to the destination from here.
 263
 264                 If this coordinate is already on the either the
 265                 master queue or the "tried" queue:
 266
 267                 4a.     If it was on either the "master" queue or the
 268                         "tried" queue and the one on the queue has a
 269                         smaller lower-bound than the neighbor, ignore
 270                         this neighbor and go on to the next neighbor.
 271
 272                 4b.     If it was on the "master" queue and the new
 273                         neighbor has a smaller lower bound,
 274                                 Move the one on the queue to a different
 275                                 queue called "subsumed".
 276
 277                 4c.     If it was on the "tried" queue and the new
 278                         neighbor has a smaller lower bound,
 279                                 Move the one on the "tried" queue to the
 280                                 master queue, and update its lower bound
 281                                 value to be as the new neighbor, and
 282                                 update its backpointer appropriately.
 283
 284            We now have a list of all neighbor coordinates that are either
 285            not on the queue already, or are cheaper than the ones on the
 286            queue.
 287
 288         5. Move the node at the head of the queue (the one whose neighbors
 289            we now have in a list) onto a different queue called "tried".
 290
 291         6. Sort this list of neighbors, and merge it into the master queue,
 292            keeping the master queue ordered by lower bound cost to destination.
 293
 294         7. Goto 2.
 295
 296 My algorithm does all of this, plus a little more (the above doesn't really
 297 mention backpointers except in passing), EXCEPT: I don't do step 4c.
 298 I'm not convinced that this can ever occur if the lower bound rule isn't
 299 broken, and I have yet to see it occur.  However, if as_winnow returns -1,
 300 this error has occurred.