15 Resolving URLs: Resolve

The Resolve module generalizes the idea of a search path and simplifies read-oriented operations on arbitrary files and urls. The reader should be warned that this module has not yet reached full maturity.

15.1 From search paths to search methods

A search path is a list of directories sequentially searched to resolve a relative pathname. On Unix a search path is traditionally specified by an environment variable whose value is of the form:

Dir1:Dir2:...:DirN

On Windows, the colons : would be replaced by semi-colons ;. In the age of the World Wide Web, the classical notion of a search path is too limited: we want to search for arbitrary urls in arbitrary networked locations, and not simply for relative pathnames in local directories. For this reason, the notion of a directory to be searched is replaced by that of a method to be applied. A sequence of search methods can be specified by an environment variable whose value is of the form:

Meth1:Meth2:...:MethN

where each MethK is of the form KIND=ARG. KIND selects the method to be applied and ARG is its parameter. On Windows, the colons may be replaced by semi-colons; both notations are supported on all platforms. The methods are tried in order until one of them succeeds in locating the desired resource.

15.2 Syntax of methods

We now describe the syntax of KIND=ARG for the supported methods. For each one, we use a concrete example. ARG can normally be either a directory or a url.

Any character of ARG can be escaped by preceding it with a backslash \: this is useful e.g. to prevent a colon in a url from being interpreted as a method separator. However, it also means that, if you insist on using \ as a path component separator (à la Windows) instead of / (à la Unix), then you will have to escape each occurrence in ARG. Furthermore, \ is also an escape character for the shell, which means that you will normally have to double each escape character.
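
The splitting and escaping rules above can be modelled with a short Python sketch. This is not the Mozart implementation: the function names are hypothetical, the mirror url in the usage note is invented for illustration, and treating the first unescaped = as the KIND/ARG boundary is an assumption.

```python
def split_unescaped(text, separators):
    """Split text at unescaped separator characters, keeping escape pairs intact."""
    parts, current, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            current.append(text[i:i + 2])  # keep backslash and escaped character
            i += 2
        elif ch in separators:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append("".join(current))
    return parts


def unescape(text):
    """Drop each escaping backslash and keep the character it protects."""
    out, i = [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            out.append(text[i + 1])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def parse_methods(spec):
    """Return a list of (kind, arg) pairs; kind is None when no '=' is present."""
    methods = []
    for item in split_unescaped(spec, ":;"):  # both separators are accepted
        pieces = split_unescaped(item, "=")
        if len(pieces) == 1:
            methods.append((None, unescape(pieces[0])))
        else:
            # Assumption: only the first unescaped '=' separates KIND from ARG.
            methods.append((pieces[0], unescape("=".join(pieces[1:]))))
    return methods
```

For instance, parse_methods(r"all=/usr/local/oz/share:cache=http\://mirror.example.org/cache") yields two methods, and the escaped colon stays part of the second ARG.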

all=/usr/local/oz/share

The last component in the input url is extracted and looked up in the location supplied as the method's argument. If the input url is http://www.mozart-oz.org/home/share/Foo.ozf, then we try to look up /usr/local/oz/share/Foo.ozf instead.
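
As a rough illustration, the candidate location computed by the all method can be modelled in Python. The function name all_candidate is hypothetical and the sketch only computes the name to look up; it does not perform the lookup.

```python
import posixpath
from urllib.parse import urlsplit


def all_candidate(location, url):
    """Model of all=LOCATION: keep only the final component of the input url."""
    last = posixpath.basename(urlsplit(url).path)
    return posixpath.join(location, last)
```

With the example above, all_candidate("/usr/local/oz/share", "http://www.mozart-oz.org/home/share/Foo.ozf") gives /usr/local/oz/share/Foo.ozf.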

root=~/lib/oz

This applies only to a relative pathname: it is resolved relative to the base url or directory supplied as argument to the method. If the input url is share/Foo.ozf then ~/lib/oz/share/Foo.ozf is looked up instead. For convenience, and to be compatible with search path notation, you can omit root= and simply write this method as ~/lib/oz
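
A minimal Python model of the root method follows. The name root_candidate is hypothetical, and the test for "relative pathname" (no url scheme, no host, no leading /) is an assumption about what the text means.

```python
import posixpath
from urllib.parse import urlsplit


def root_candidate(base, url):
    """Model of root=BASE: resolve a relative pathname against BASE, else skip."""
    parts = urlsplit(url)
    if parts.scheme or parts.netloc or url.startswith("/"):
        return None  # not a relative pathname, so this method does not apply
    return posixpath.join(base, url)
```

Here root_candidate("~/lib/oz", "share/Foo.ozf") gives ~/lib/oz/share/Foo.ozf, while a full url or an absolute path yields None.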

cache=/usr/local/oz/cache

Applies only to a full url: it is transformed into a relative pathname and looked up in the specified location. If the input url is http://www.mozart-oz.org/home/share/Foo.ozf, then /usr/local/oz/cache/http/www.mozart-oz.org/home/share/Foo.ozf is looked up instead. This method is typically used to permit local caching of frequently used functors. The cache location could also be the url of some sort of mirroring server.
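
The transformation from a full url to a cache pathname can be sketched in Python as follows. The name cache_candidate is hypothetical, and "full url" is assumed here to mean a url with both a scheme and a host.

```python
import posixpath
from urllib.parse import urlsplit


def cache_candidate(location, url):
    """Model of cache=LOCATION: map scheme://host/path to LOCATION/scheme/host/path."""
    parts = urlsplit(url)
    if not parts.scheme or not parts.netloc:
        return None  # assumption: only urls with scheme and host qualify
    relative = parts.scheme + "/" + parts.netloc + parts.path
    return posixpath.join(location, relative)
```

For the example above, the result is /usr/local/oz/cache/http/www.mozart-oz.org/home/share/Foo.ozf.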

prefix=http://www.mozart-oz.org/home/=~/oz/

This method has the form prefix=LOC1=LOC2. Whenever the input url begins with the string LOC1, this prefix is replaced by LOC2 and the result is looked for instead. Thus, if the input url is http://www.mozart-oz.org/home/share/Foo.ozf, we would look for ~/oz/share/Foo.ozf.
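
The prefix replacement amounts to a plain string operation. A Python model, with the hypothetical name prefix_candidate:

```python
def prefix_candidate(loc1, loc2, url):
    """Model of prefix=LOC1=LOC2: replace a leading LOC1 by LOC2, else skip."""
    if not url.startswith(loc1):
        return None  # the method does not apply to this url
    return loc2 + url[len(loc1):]
```

With LOC1 = http://www.mozart-oz.org/home/ and LOC2 = ~/oz/, the example url maps to ~/oz/share/Foo.ozf.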

pattern=http://www.?{x}/home/?{y}=ftp://ftp.?{x}/oz/?{y}

The pattern method is more general than the prefix method. LOC1 can contain match variables, such as ?{x} and ?{y} that should match arbitrary sequences of characters, and LOC2 can also mention these variables to denote the value they have been assigned by the match. Thus, if the input url is http://www.mozart-oz.org/home/share/Foo.ozf, we would look for ftp://ftp.mozart-oz.org/oz/share/Foo.ozf.
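
One way to model the pattern method is to translate LOC1 into a regular expression with one named group per match variable. The Python sketch below uses hypothetical names. It assumes that each variable occurs only once in LOC1, that the whole url must match, and that variables match as little as possible; the source does not specify the actual matching strategy.

```python
import re

# Matches a variable reference such as ?{x} and captures its name.
VAR = re.compile(r"\?\{(\w+)\}")


def compile_pattern(pattern):
    """Translate LOC1 into a regex with one named group per ?{name} variable."""
    regex, pos = "", 0
    for match in VAR.finditer(pattern):
        regex += re.escape(pattern[pos:match.start()])  # literal text
        regex += "(?P<%s>.*?)" % match.group(1)         # variable: any characters
        pos = match.end()
    regex += re.escape(pattern[pos:])
    return re.compile(regex)


def pattern_candidate(pattern, subst, url):
    """Model of pattern=LOC1=LOC2: instantiate LOC2 with the values bound by LOC1."""
    match = compile_pattern(pattern).fullmatch(url)
    if match is None:
        return None  # the method does not apply to this url
    return VAR.sub(lambda var: match.group(var.group(1)), subst)
```

Applied to the example, x is bound to mozart-oz.org and y to share/Foo.ozf, which gives ftp://ftp.mozart-oz.org/oz/share/Foo.ozf.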

=

Normally, the default handler is implicitly appended to your search methods. This is the handler that simply looks up the input url itself, when all previous methods have failed. Sometimes it is desirable to disallow this default: for example, this is the case when building the Mozart distribution; the build process should be self-contained and should not attempt to access resources over the network. You can disallow the default by appending = as the very last of your search methods. Thus:

.:all=~/oz/bazar:=

would first try to resolve relative pathnames with respect to the current directory, then all urls by looking up their final component in directory ~/oz/bazar, and that's it. If neither of these two methods succeeds, the resolution would simply raise an exception, but it would not attempt to retrieve the input url from the net.
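
The overall strategy (try each method in turn, then fall back to the default unless it has been disallowed) can be modelled as below. This is a sketch under assumptions, not Mozart's code: handlers are modelled as Python functions that return a candidate location or None, exists is a hypothetical predicate standing in for the actual lookup, and LookupError stands in for the exception raised by the resolver.

```python
def resolve(handlers, url, exists, use_default=True):
    """Return the first candidate that exists; optionally fall back to url itself."""
    for handler in handlers:
        candidate = handler(url)
        if candidate is not None and exists(candidate):
            return candidate
    if use_default and exists(url):
        return url  # default handler: look up the input url as is
    raise LookupError("cannot resolve " + url)


def current_dir(url):
    """Model of the '.' method: applies to relative pathnames only."""
    if "://" in url or url.startswith("/"):
        return None
    return "./" + url


def bazar(url):
    """Model of all=~/oz/bazar: look up the final component in ~/oz/bazar."""
    return "~/oz/bazar/" + url.rsplit("/", 1)[-1]
```

Calling resolve([current_dir, bazar], url, exists, use_default=False) corresponds to the specification .:all=~/oz/bazar:= shown above: if neither method finds the resource, an exception is raised and the input url itself is never tried.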

15.3 Interface of Resolve Module

A resolver is a module that encapsulates and exports the resolving services of a sequence of search methods. For different purposes, you may need to apply different resolution strategies. For this reason, you may create arbitrarily many resolvers, each implementing an arbitrary resolution strategy.

Resolve.make

{Resolve.make +VS +Spec ?R}

Creates a new resolver R, identified by virtual string VS in trace messages, and whose strategy is initialized according to Spec which is one of:

init(L)

where L is a list of handlers (see later).

env(V)

where V names an environment variable whose value provides the search methods. If it is not set, the initial strategy simply looks up the input url itself.

env(V S)

same as above, but, if the environment variable is not set, then use S as its value.

vs(S)

simply get the search methods from virtual string S.

Resolve.trace.get
Resolve.trace.set

{Resolve.trace.get ?Bool}

{Resolve.trace.set +Bool}

Obtain or set the trace flag. When tracing is enabled, every resolve method that is attempted prints an informative message. Furthermore, all messages are prefixed by the virtual string identifying the resolver in which these methods are being invoked.

Resolve.expand

{Resolve.expand +Url1 ?Url2}

Takes a Url or virtual string as input and returns a Url with "~" expanded to the full path of the user's home directory, "~john" expanded to john's home directory, and "." and ".." expanded to the current directory and its parent directory respectively. This functionality really belongs in the URL module, but is put here instead to keep module URL stateless.

Resolve.handler

You don't have to specify your methods as virtual strings; instead, you can construct them directly using the following procedures:

Resolve.handler.default

This is the default handler that simply looks up the given url as is.

Resolve.handler.all

{Resolve.handler.all +LOC ?Handler}

This creates a handler that implements the all method for location LOC. The final component in the input url is looked up in LOC.

Resolve.handler.root

{Resolve.handler.root +LOC ?Handler}

This creates a handler that implements the root method for location LOC. A relative pathname is resolved relative to LOC.

Resolve.handler.cache

{Resolve.handler.cache +LOC ?Handler}

This creates a handler that implements the cache method for location LOC. A full url is transformed into a relative pathname and resolved relative to LOC.

Resolve.handler.prefix

{Resolve.handler.prefix +Prefix +Subst ?Handler}

This creates a handler that implements the prefix method. If the input url begins with string Prefix, then this is replaced by Subst and looked up instead.

Resolve.handler.pattern

{Resolve.handler.pattern +Pattern +Subst ?Handler}

This creates a handler that implements the pattern method. If the input url matches the string pattern Pattern, then this is replaced by the corresponding instantiation of Subst and looked up instead.

15.4 Interface of a Resolver

Each resolver R exports the following methods:

R.getHandlers

{R.getHandlers ?L}

obtains R's current list of handlers.

R.setHandlers

{R.setHandlers +L}

installs L as R's current list of handlers.

R.addHandler

{R.addHandler front(H)}
{R.addHandler back(H)}

adds H at the front (resp. at the back) of R's list of handlers.
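
The three handler operations can be pictured with a small Python model. The class ResolverModel and its method names are hypothetical stand-ins for R.getHandlers, R.setHandlers and R.addHandler; the strings "front" and "back" stand in for the front(H) and back(H) records.

```python
class ResolverModel:
    """Toy model of a resolver's handler list; not the Mozart implementation."""

    def __init__(self, handlers=()):
        self._handlers = list(handlers)

    def get_handlers(self):
        return list(self._handlers)  # return a copy of the current list

    def set_handlers(self, handlers):
        self._handlers = list(handlers)

    def add_handler(self, position, handler):
        if position == "front":
            self._handlers.insert(0, handler)  # tried before all others
        elif position == "back":
            self._handlers.append(handler)     # tried after all others
        else:
            raise ValueError("position must be 'front' or 'back'")
```

Since handlers are tried in list order, a handler added at the front takes precedence over the existing ones, and one added at the back acts as a fallback.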

R.localize

{R.localize +Url ?Rec}

the return value Rec is old(Filename) if Url resolves to local file Filename, else it is new(Filename) where Filename is a new local file created by retrieving the data at Url.
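
The old/new distinction can be illustrated with a Python model. The function localize below is a sketch, not the Mozart procedure: fetch is a hypothetical callable that returns the data at a location as bytes, and the tuples ("old", path) and ("new", path) stand in for the records old(Filename) and new(Filename).

```python
import os
import tempfile


def localize(location, fetch):
    """Return ("old", path) for an existing local file, else ("new", path) of a copy."""
    if os.path.isfile(location):
        return ("old", location)
    data = fetch(location)          # retrieve the data (hypothetical callable)
    fd, path = tempfile.mkstemp()   # create a new local file
    with os.fdopen(fd, "wb") as out:
        out.write(data)
    return ("new", path)
```

Presumably the label lets the caller tell a pre-existing file apart from a freshly created copy, for instance to delete the latter once it is no longer needed.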

R.open

{R.open +Url ?FdI}

returns FdI, which is an integer file descriptor open for read on the data available from Url.

R.load

{R.load +Url ?V}

returns the value V obtained from the pickle available at Url.

R.native

{R.native +Url ?M}

returns the native module M obtained by dynamically linking the native functor available in file Url.


Denys Duchier, Leif Kornstaedt, Martin Homik, Tobias Müller, Christian Schulte and Peter Van Roy
Version 1.4.0 (20080702)