1 Tue Nov 13 11:50:24 PST 1990 Phil Lapsley phil@Berkeley.EDU
3 This library implements a reasonably general version of the A* algorithm.
4 Basically, A* is like an ordered search, but with a heuristic that allows
5 it to make a better choices as to which path to take. The subdirectory
6 "test" has example code for using A* to search a weighted cartesian matrix.
7 The file "XXX//bestpath.c" has code to interface Empire to the A*
10 This library is copyrighted; see the file COPYRIGHT for details.
14 Do a "make" in this directory to make the library. Cd into "test" and
15 do a "make" there to make a test program, called (ta da) "test".
19 Pretty much all the data that the user needs to communicate to the library
20 is stored in an "as_data" structure, which is created by a call to "as_init":
26 The arguments to as_init specify a number of key things the algorithm will
27 use, and these are discussed below.
29 Once you have an "as_data" structure, you can fill in its "from" and "to"
30 members, which specify the coordinates of the points between which you want
31 to go. The basic coordinate data structure is the "as_coord", which is
32 just an "x" integer and a "y" integer. So:
39 and then call as_search:
43 If the return value of as_search is 0, the algorithm found a path from
44 "from" to "to". The path is stored in the "path" member of the "as_data"
45 structure, and is just a linked list of coordinates ("as_coord" structs)
48 If the return value of as_search is -1, the algorithm couldn't find a
49 path from "from" to "to". If the return is -2, a system error occurred
50 (most probably malloc failed).
52 After a call to as_search, lots of malloced data 'n' stuff will be
53 floating around, all pointed to by items in the "as_data" data structure.
54 If you call "as_search" again (presumably with new "from" and "to"
55 coordinates), the system will free these data structures and reallocate
56 new ones. If you no longer want the data structures at all (i.e., you
57 never intend to call "as_search" again), you can call:
61 and this will free up all data associated with the as_data structure pointed
66 "as_init" takes eight arguments, in the following order:
68 maxneighbors The maximum number of neighboring sectors a
69 coordinate can have. On a cartesian grid,
70 this would be 8. On a hex map, it would be 6.
72 hashsize The size of the hash table used to determine
73 if we've visited a particular coordinate.
75 hashfunc A pointer to a function that takes an
76 "as_coord" as an argument and returns an
77 integer that will be used as a hash value.
78 If this is a NULL pointer, the as library
79 will assign its own hash function that just
80 adds the x and y coordinates together.
82 neighborfunc A pointer to a function that takes a coordinate
83 (call it "c"), a pointer to an array of
84 coordinates of length "maxneighbors", and
85 a user data pointer (see below). This
86 function should figure out the coordinates
87 of all the neighbors of "c" and put them in
88 the array cp[0] ... cp[maxneighbors-1].
89 It should then return the number of neighbors
92 lbcostfunc A pointer to a function that takes two
93 coordinates, call them "from" and "to",
94 and a user data pointer (see below). It
95 returns a double precision *LOWER BOUND*
96 on the cost to get from "from" to "to".
97 "from" and "to" may be separated by an
98 arbitrary distance. More on this below.
100 realcostfunc A pointer to a function that takes two
101 coordinates, call them "from" and "to",
102 and a user data pointer (see below). It
103 returns a double precision value of
104 the *actual cost* to move from "from" to
105 "to". Note that "from" will never be more
106 than one sector away from "to".
108 seccostfunc A pointer to a function that takes two
109 coordinates, call them "from" and "to",
110 and a user data pointer (see below). It
111 returns a double precision value that will
112 be used to break ties when two nodes have
113 the same lower bound to the target. An
114 example will be discussed below.
116 userdata A (char *) that can be a pointer to
117 any kind of data you want. This will
118 be passed as the third argument to the
119 neighborfunc, lbcostfunc, realcostfunc,
122 Look in test/cart.c to see examples of these functions for cartesian
125 NOTES ON THE LOWER BOUND FUNCTION
127 "lbcostfunc" is *CRUCIAL* to the algorithm's performance. The entire
128 idea behind the A* algorithm is that, when considering whether to move
129 to a new coordinate, you know two things:
131 (1) how much it's cost you to get to that coordinate so far,
133 (2) a LOWER BOUND on how much it will cost to get to the
134 destination from that coordinate.
136 If these two conditions are met, the algorithm will return the optimal
137 path to the destination. The closer the lower bound is to the actual
138 cost to get from one point to another, the quicker the algorithm will
139 find this path. HOWEVER, if the lower bound is ever violated, i.e.,
140 if the so-called lower bound function returns a value that is greater
141 than the actual cost to get to the destination, then the algorithm will
142 not necessarily find the optimal path.
146 Assume that we're on a cartesian matrix, and the cost to move from one point
147 to another is just the distance between the two points, and that no
148 other costs are involved. In this case, the lower bound function could
149 be the same as the actual cost function, i.e.,
151 real cost = lower bound cost = sqrt(dx^2 + dy^2);
153 In this case, the algorithm will find the destination as quickly as possible.
157 Again assume we're on a cartesian matrix, and the cost to move from
158 one point to another is two things: (1) the distance between them, (2) some
159 arbitrary cost function we get off of a map. E.g.,
169 The real cost to move from (x,y) 0,0 to 1,0 is 1. That is, it's the distance
170 (1) plus the value of the map at (1,0), which is 0. The real cost to move
171 from (1,2) to (2,2) is 3: the distance (1) plus the map value at (2,2), which
174 In this case, the lower bound function could still be the distance between
175 the two points, since this WILL NEVER BE MORE than the actual cost, and
176 hence is a lower bound. I.e.,
178 real cost = sqrt(dx^2 + dy^2) + map costs
180 lower bound cost = sqrt(dx^2 + dy^2)
182 lower bound cost <= real cost for all coordinates
184 This is what the the example in the "test" directory uses.
188 You could make the lower bound function always return 0. This would be
189 a valid lower bound of the cost to move between any two points. In this
190 case, the A* algorithm will behave like a breadth-first search, and isn't
193 SECONDARY COST FUNCTION
195 The algorithm tries new coordinates in order of lowest lower-bound.
196 There can be cases where the lower-bound function for two (or more)
197 coordinates is the same, i.e., there is a tie. The algorithm uses
198 the secondary cost function (which does NOT have to be a lower bound)
199 to choose which coordinate to pick. A typical heuristic might be
200 to use Euclidian distance for this secondary cost function, on the
201 assumption that it's always better to move closer the destination.
202 The Empire code does just this.
204 If you don't need a secondary cost function, just specify a NULL pointer
205 to the seccost argument to as_init, and the routines will use 0.0 for you.
209 The interface code in "XXX/bestpath.c" is a rather complicated example
210 of A* interface code. What it does is based on some features of Empire
211 that are explained here.
213 First, movement cost in Empire is based on a quantity called "mobility
214 cost", which is a property of every sector. As we trace a path from
215 source to destination, we add up mobility cost as we go. Once we're
216 there, we have a total mobility cost. This is what we'd like to
219 Second, Empire has highways, which are zero cost movement paths. This
220 hurts the A* algorithm, because it means that the lower bound cost
221 function is very weak. For example, think what happens if we move from
222 one side of the country to another: if the two sectors are attached via
223 a highway, the cost will be very small -- in fact, it will be the cost
224 to move out of the source sector, and the cost to move into the
225 destination sector. If, on the other hand, the two sectors aren't
226 connected by a highway, the cost could be quite large. Thus, the lower
227 bound is just the cost to move out of a sector plus the cost to move
228 into a sector. This is a pretty weak lower bound, and means that we
229 have to explore a lot of sectors.
231 Third, the mobility costs tend to tie a lot, as explained in the
232 section above on secondary cost functions. Thus, we use the Empire
233 "mapdist" function as a secondary sort function to break ties. For
234 example, consider the case where a sector borders two highway sectors,
235 one on each side. The lower bound function will say that the two have
236 equal lower bound costs, since they're both highways and cost nothing
237 to move on. The secondary sort function is then used to break the tie --
238 it says, "Take the one that moves you closer to the destination".
240 Fourth, all of the information we need about a sector (its mobility
241 cost, who owns it, etc.) is stored in the sector file on disk. This
242 means that the getsect() function to get it off disk will do a read(),
243 which is VERY expensive. Because of the weak lower bound, A* ends up
244 checking lots of sectors, including sectors that it's seen before.
245 To avoid doing thousands of system calls per second, the bestpath.c file
246 has sector caching routines that store in memory copies of every
247 sector we read. (The function bp_getsect handles this). This means
248 we never read a sector from disk more than once, although at the expense
249 of using lots of memory.
253 The basic idea is as follows:
255 1. Add the start node to the head of a master queue.
256 2. Is the head of the queue where we want to be? If so, stop.
257 3. Make a list of all the neighbors coordinates around the
258 coordinate of the head.
260 4. For each neighbor coordinate,
262 Compute the lower bound cost to the destination from here.
264 If this coordinate is already on the either the
265 master queue or the "tried" queue:
267 4a. If it was on either the "master" queue or the
268 "tried" queue and the one on the queue has a
269 smaller lower-bound than the neighbor, ignore
270 this neighbor and go on to the next neighbor.
272 4b. If it was on the "master" queue and the new
273 neighbor has a smaller lower bound,
274 Move the one on the queue to a different
275 queue called "subsumed".
277 4c. If it was on the "tried" queue and the new
278 neighbor has a smaller lower bound,
279 Move the one on the "tried" queue to the
280 master queue, and update its lower bound
281 value to be as the new neighbor, and
282 update its backpointer appropriately.
284 We now have a list of all neighbor coordinates that are either
285 not on the queue already, or are cheaper than the ones on the
288 5. Move the node at the head of the queue (the one whose neighbors
289 we now have in a list) onto a different queue called "tried".
291 6. Sort this list of neighbors, and merge it into the master queue,
292 keeping the master queue ordered by lower bound cost to destination.
296 My algorithm does all of this, plus a little more (the above doesn't really
297 mention backpointers except in passing), EXCEPT: I don't do step 4c.
298 I'm not convinced that this can ever occur if the lower bound rule isn't
299 broken, and I have yet to see it occur. However, if as_winnow returns -1,
300 this error has occurred.