
Creating your data hierarchy


The Index file

In each directory of your data hierarchy you create a file called index with information about each file you want to serve. This file might look like the following:

     # This is a comment
     Owner=mailto:wnperson@host.edu

     File=file.txt
     Title=This is a descriptive title for file.txt 

     File=file2.html

     File=soundfile
     Title=This plays some sounds
     Content-type=audio/basic


The line starting with Owner= is optional. It should contain a reference to the maintainer of this directory like the one above (technically any URL is permissible).

The remainder of this index file describes three files in the directory which we wish to serve: file.txt, file2.html, and soundfile. The index file is processed with the utility WNdex (pronounced "windex") to produce another file called index.cache. Detailed information on the wndex utility is given below, but simply running it with no arguments in a directory containing an index file will produce the index.cache file for that directory. This file contains all the information in the index file plus additional information gathered automatically about the files to be served. In particular, the index.cache file lists the names of the files given in the File= lines of the index file. Any file on the server whose name is not listed in an index.cache file will not be served. This is the basis of WN security. For security reasons the server will also refuse to use any index.cache file which is in reality a symbolic link to another file.
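
The two checks described above can be illustrated with a minimal Python sketch. It is not the server's actual code: the function names are hypothetical, and the set of listed names is passed in directly because the layout of index.cache is not described here.

```python
import os


def cache_is_usable(cache_path):
    """Return True only if cache_path is a regular file and not a symbolic link."""
    # os.path.isfile follows symlinks, so the islink test is done separately.
    return os.path.isfile(cache_path) and not os.path.islink(cache_path)


def may_serve(requested_name, listed_names, cache_path):
    """Allow a file only if the cache is usable and the name is listed in it.

    listed_names stands in for the names recorded in index.cache
    (hypothetical representation; the real cache format is not assumed).
    """
    return cache_is_usable(cache_path) and requested_name in listed_names
```

With this sketch, a request for a file that exists on disk but was never named in a File= line is refused, and so is every request in a directory whose index.cache is a symbolic link.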

The index.cache file has a number of other functions beyond its security role. Attributes of the files listed in the index file which can be computed before they are served and which don't often change are stored in the index.cache file. For example, the MIME content type of soundfile is read from the Content-type= line. The other files do not need such a line since wndex can deduce from the file name extensions that file.txt has type text/plain and file2.html has type text/html. This is done once at the time index.cache is created and need not be done every time the file is served. By the way, if the sound file were named soundfile.au it wouldn't need a Content-type line either.
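
As an illustration of this kind of deduction, here is a small Python sketch. The extension table contains only the three types mentioned above and is a stand-in, not the table wndex actually uses; the function name is hypothetical.

```python
import os

# Illustrative table covering only the types named in the text above.
# This is an assumption for the example, not wndex's real table.
EXTENSION_TYPES = {
    ".txt": "text/plain",
    ".html": "text/html",
    ".au": "audio/basic",
}


def content_type(name, declared=None):
    """Return the declared type if one was given, else deduce it from the extension.

    Returns None when there is no declaration and the extension is unknown,
    which is the case where a Content-type= line would be needed.
    """
    if declared:
        return declared
    extension = os.path.splitext(name)[1].lower()
    return EXTENSION_TYPES.get(extension)
```

Here content_type("soundfile") returns None, which mirrors why the example index file supplies an explicit Content-type= line for that file.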

The title of a file is another example of information stored in the index.cache file. With the WN server every file served has a title (even binaries) and optionally has a list of keywords associated with it. For an HTML document the title and the keywords are automatically extracted by wndex from the header of the document and stored in fields of that file's line in index.cache. These are used for the built-in keyword and title searches which the server supports.
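
Title extraction of this kind can be sketched in Python with the standard html.parser module. This shows the general idea only and is not how wndex itself is implemented; keyword extraction is left out because the way keywords are marked in the header is not described here. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collect the text found inside the <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title_parts = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lower case.
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)


def extract_title(html_text):
    """Return the document title with whitespace collapsed, or None if absent."""
    parser = TitleExtractor()
    parser.feed(html_text)
    parser.close()
    title = " ".join("".join(parser.title_parts).split())
    return title or None
```

A file for which no title can be found this way would need an explicit Title= line in the index file.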

Using the WNdex utility

Before describing the index file in greater detail we briefly explain the use of the program which reads this file and produces the index.cache database file. Simply running wndex with no arguments in a directory containing a file named index causes that file to be read and a file called index.cache to be created in that directory. If the index file contains an Indexfile directive (described below) then a file called index.html will also be created.

There are several command line arguments for wndex. The -r option causes wndex to recursively descend your data hierarchy using all subdirectories listed in the Subdirs= line of the directory record in the index file (see the section on the directory record below).

The -i and -c options specify an alternate name for the index file and the index.cache file respectively. For example the command wndex -i foo -c bar will attempt to use foo as the index file and produce the file bar instead of index.cache.

The -d option specifies a directory other than the current directory in which to find the index file and in which to create the index.cache and index.html files.

Finally, the -q option (for quiet) suppresses the printing of any warning or informational messages by wndex.
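
For scripted use, the options above can be assembled into an argument list. The Python helper below is a hypothetical convenience, not part of WN. It assumes the options may be combined freely and that -d takes its directory as a separate argument in the same way -i and -c take their file names.

```python
def wndex_args(recursive=False, index_name=None, cache_name=None,
               directory=None, quiet=False):
    """Build an argument list for wndex from the options described above."""
    args = ["wndex"]
    if recursive:
        args.append("-r")            # descend into the Subdirs= directories
    if index_name is not None:
        args += ["-i", index_name]   # alternate name for the index file
    if cache_name is not None:
        args += ["-c", cache_name]   # alternate name for index.cache
    if directory is not None:
        args += ["-d", directory]    # directory holding the index file
    if quiet:
        args.append("-q")            # suppress warnings and messages
    return args
```

For instance, wndex_args(index_name="foo", cache_name="bar") yields the argument list for the command wndex -i foo -c bar shown above. On a machine where wndex is installed the list could be handed to subprocess.run.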

The Directory Record

The first group of lines in an index file provides information about the directory itself and the collection of files it contains rather than about any single file in the directory. It is called the directory record. This beginning collection of lines might look like:
     Owner=mailto:you@host.edu
     Wrapper=dir_search_wrap
     Accessfile=/dir/access
     Subdirs=dir1,dir2,directory3

This specifies the owner of items in the directory (which is used in the HTTP headers sent by the server). It also specifies a "wrapper" for the various searches of the directory, that is, an HTML document which provides a customized response listing the matching items in one of the various searches of the directory (for more details see the section on wrappers and includes). The Accessfile= line specifies the name of the file which controls access (by IP address) to this directory. If this item is omitted then items in the directory may be served to anyone. For more information on using the access mechanism see the section of this document on access. Finally, the line starting with Subdirs= specifies the subdirectories of this directory which you wish to have recursively searched when a title or keyword search is done on this directory. More information about searching can be found in the chapter on searches.

After the directory record line group an index file will typically have groups of lines called file records, each describing a particular file. A file record can be as simple as a single line like the line "File=file2.html" in the example above or it can contain several lines describing the file. For a complete list of the possible lines (called "directives") which a file record can have see Appendix B.
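
To make the record structure concrete, here is a rough Python sketch of a reader for this layout. It is not wndex's parser. It assumes that records are separated by blank lines, that lines beginning with # are comments, that a first group not starting with File=, Indexfile=, Link= or URL= is the directory record, and it does not handle Text= blocks. The function name is hypothetical.

```python
# Directives that, in this sketch, mark the start of a file record.
RECORD_STARTERS = ("File", "Indexfile", "Link", "URL")


def parse_index(text):
    """Split index-file text into (directory_record, file_records).

    Each record is a dict mapping a directive name to its value.
    """
    groups, current = [], {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("#"):
            continue                      # comment line
        if not line:
            if current:                   # a blank line closes the current group
                groups.append(current)
                current = {}
            continue
        key, separator, value = line.partition("=")
        if separator:                     # skip lines that are not directives
            current[key.strip()] = value.strip()
    if current:
        groups.append(current)

    directory_record = {}
    if groups and next(iter(groups[0])) not in RECORD_STARTERS:
        directory_record = groups.pop(0)
    return directory_record, groups
```

Applied to the example at the top of this section, the sketch returns a directory record holding only the Owner= value and three file records, the last of which carries the Content-type= value for soundfile.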

The Indexfile directive

One special kind of file record is one created with the Indexfile= directive. This directive must (if it is used) be the start of the first file record in the index file. The lines
     Indexfile=index.html
     Title=Here is a collection of interesting documents

have much the same effect as the entry "File=index.html" would, but additionally cause the wndex program to create the file index.html containing (at least) an unordered list of anchors with links to all the documents listed in the index file with a File= directive, a Link= directive or a URL= directive. There should always be a Title= line associated with an Indexfile item since wndex cannot both create index.html and read the title from it. The presence of the Indexfile= line causes an entry for this file to be written in the index.cache file created by wndex. When the file index.html is created any previous file by that name is overwritten. It is not necessary to name the file "index.html"; any name may be given in the Indexfile= line.
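
The general shape of such a generated page can be sketched in Python. The markup wndex really emits is not specified here, so the page layout below is an assumption and render_index_html is a hypothetical name. Only the unordered list of anchors comes from the description above.

```python
from html import escape


def render_index_html(title, items):
    """Return an HTML page containing an unordered list of anchors.

    items is a list of (href, link_text) pairs, one per document listed
    in the index file. The surrounding page layout is illustrative only.
    """
    lines = [
        "<html>",
        "<head><title>%s</title></head>" % escape(title),
        "<body>",
        "<h1>%s</h1>" % escape(title),
        "<ul>",
    ]
    for href, link_text in items:
        lines.append('<li><a href="%s">%s</a></li>'
                     % (escape(href, quote=True), escape(link_text)))
    lines += ["</ul>", "</body>", "</html>"]
    return "\n".join(lines) + "\n"
```

Writing the returned string to the file named in the Indexfile= line would replace any earlier file of that name, matching the overwrite behaviour described above.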

The Link and URL directives

Two other types of records in an index file are Link= and URL=. The Link= directive is used to create a reference in the current index.html to a document somewhere else on your server. The line
     Link=/dir/foo

has no effect unless there is a previous Indexfile= line causing the creation of an index.html file. In this case this line causes the creation of a link to /dir/foo in the list of links in index.html. This directive has no effect on the index.cache file.

The URL= directive is intended for links to items on other servers, even other types of servers. The lines

     URL=http://host/dir/foo.html
     Title=Here is the title

have the effect of putting an anchor to this item in the index.html file if the Indexfile= directive has been used and also putting the line

     url=http://host/dir/foo.html&title=Here is the title

in the current index.cache file. This is then used for title searches, so any query with a match will result in an anchor to the remote item in the list returned by the search. A URL= directive can have any valid URL as its value.
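
The cache line shown above and its use in a title search can be sketched as follows. Only the url=...&title=... layout is taken from the text. The function names are hypothetical, and plain case-insensitive substring matching is an assumption made for the example, not a description of how the server matches queries.

```python
def url_cache_line(url, title):
    """Format a remote item the way the example cache line above is laid out."""
    return "url=%s&title=%s" % (url, title)


def title_search(entries, query):
    """Return the (url, title) pairs whose title contains the query.

    Matching here is a case-insensitive substring test (an assumption).
    """
    wanted = query.lower()
    return [(url, title) for url, title in entries if wanted in title.lower()]
```

Each pair returned by title_search would become one anchor to the remote item in the list sent back to the client.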

The Text and EndText directives

As mentioned above, if the index file contains an Indexfile directive, then wndex will produce an index.html file containing a list of anchors to all items specified with a File=, Link= or URL= directive. It is sometimes convenient to insert some text (in HTML format) into this file. This can be done with the Text= and EndText= directives. For example the lines
     Text=
     Here is a list of the best stuff on this server and some of
     my choices for this year's best of the web contest:
     Endtext=

in an index file will cause the text between "Text=" and "Endtext=" to be inserted literally into the index.html file at the place where this directive occurs in the index file. Like a "File=" directive, a "Text=" directive must come after the "Indexfile=" directive.


WN -- for those who think the Web should be more than a user friendly version of ftp

John Franks <john@math.nwu.edu>