Iliès Alouini, Peter Van Roy
In any wide-area distributed system such as the Internet, fault
tolerance is crucial for real-world applications. We describe a new
practical fault-tolerant mobile agent platform. The agent platform is built
on the top of the Mozart system using a “global store” abstraction
that provides a globally coherent and fault tolerant memory. The global
store looks like a set of objects which are accessed using a transactional
interface. Instances of the global store are used for two purposes:
to store the agent state and to communicate with the agent. The store is
lightweight, requires no persistence, and is independent of the file system.
Processes can be added to or removed from the store dynamically. With $n$ processes, the store tolerates up to $n-1$ fail-stop process failures. This is adequate for fault tolerance on a LAN; we are working on extending the store for network partitioning, so that it will be adequate also on the complete Internet. The store can migrate without dependencies, i.e., the migration depends on no fixed process. Mozart is a general-purpose development platform for open, robust distributed applications that is based on the Oz language. A beta version of the global store is implemented completely within Oz using Mozart’s reflective fault model. The beta version has been publicly released in the Mozart Global User Library.