Source Description Structures

Brewster Kahle
brewster@Think.com
Harry Morris
morris@think.com
2/20/91

A "source description structure" is used by a client of the WAIS system
to contact a database on a server. This document describes a text
encoding of the source description structure. This specification is
compatible with the clients that are given out with the WAIS release
(version 8 and higher) and with the Directory of Servers maintained by
Thinking Machines. Example C code to read and write these structures is
available free of charge from the authors.

Note that WAIS end users should never see this structure, since a
typical user will use the source structures retrieved from the directory
of servers or those created by waisindex, and interact with them through
a client user interface.

Philosophy:
-----------

Sources form the conceptual link between an information user and a
database vendor. As such, they serve two functions. First, they provide
a means for the database vendor to inform the user of database specific
information, in particular, how to contact the database, and how much it
costs to do so. Second, they serve as repositories for the user's
customizations.

The fields which describe how to contact a server, what database name to
use, and how much the service costs are required. The other fields are
strictly optional. Servers may choose to use them to make suggestions
about how their service might be used, or clients may choose to let the
user modify certain options. While not all clients may want or need all
the optional fields, it is in everyone's best interest to have any
particular function expressed in a single format that all can share. The
optional fields presented here are ones we have found useful. If you
have suggestions for fields that should be added to the specification,
please contact the authors. Clients should always be prepared to ignore
the non-required fields.

An example source structure is:

(:source
   :version 3
   :ip-name "quake.think.com"
   :ip-address "192.31.181.1"
   :tcp-port 210
   :maintainer "brewster@think.com"
   :database-name "directory-of-servers"
   :cost 0.00
   :cost-unit :free
   :description "The directory of servers is a white pages of servers
maintained by many others. This one is maintained by Thinking Machines
Corporation on the internet. To submit new entries to the directory of
servers, mail to wais-directory-of-servers@quake.think.com.
-brewster"
   :update-time (:time-interval :interval :daily :day 0 :hour 1 :min 30)
)

Syntax:
-------

The syntax is based on a subset of the Common Lisp printer format (see
Common Lisp: The Language, by Guy Steele). We chose this format because
it is easy to parse on many different platforms, and from many different
languages. The format is summarized below.

A source is a structure, but it is written as a list. Every structure is
written in the form
   (<type> <keyword> <value> <keyword> <value> ...)
Structure types and keywords are always symbols. A value may be a
symbol, decimal, integer, string, array, ANY, or another structure. The
Common Lisp structure format was not used because many Lisp
implementations do not handle unknown keywords well.

Symbols are preceded with a colon (in the keyword package), and consist
of all lower case letters with embedded hyphens. Strictly speaking,
symbols are not case sensitive, but some clients seem to be ignoring
this. Sticking with lower case is safe. Example symbols: :2foo and
:bar-baz3.

Strings start and end with '"'; if a '"' is to appear in the string,
then a '\' is inserted before it. If a '\' is to appear, then '\\' is
used.

Integers are written without a decimal point. Floating point numbers are
written with a decimal point.

Arrays are written as #(value value value ...). For example, a 1
dimensional array of numbers looks like #(115 101 120).

Lists are formed by simply enclosing enumerated items in parentheses.
The items are separated by white space.
For example, a list of numbers looks like (3 4 57).

Z39.50 byte arrays (ANYs) are written as structs with length and byte
keys or as a string. For example, (:any :length 3 :bytes #(115 101 120))
or "foo". The string form may only be used when every character is a
printable character; this is used to reduce the structure size and
increase readability, but is optional.

Required Fields (set by vendor):
--------------------------------

:version
    The number is the major version number of the source structure
    format. The current version is 3. This must be the first slot.

:database-name
    These are the names of the databases on the server. It is a list of
    databases, separated by characters.

At least one of :ip-name and :ip-address is required. Both may be
specified. XXX what about modems

:ip-address
    A legal internet address such as "192.31.181.1".

:ip-name
    A legal internet name such as "quake.think.com".

:cost
    Cost of using this server. See :cost-unit for units.

:cost-unit
    One of the following values:
    :free, :dollars-per-session, :dollars-per-minute,
    :dollars-per-query, :dollars-per-retrieval, :other

Optional Fields set by vendor:
------------------------------

:tcp-port
    The TCP port to connect to. The default is 210, the official Z39.50
    port number.

:configuration
    (XXX weird stuff, plus what contact method to use! XXX)

:script
    A string of commands to be interpreted after establishing the
    connection to the server. This may involve logging in,
    authentication, configuring the connection, etc. The commands are
    those used by White Knight (yuk). They are currently only used in
    making modem connections. (XXX it is possible that tcp might want to
    use this too, and that we may want to run scripts just before
    starting the connection, just after starting the connection, and
    just before closing the connection).

:maintainer
    An English description of who to contact for information.

:description
    An English description of the database.

:update-time
    The interval at which the server's data is updated.
    This is a structure of the form
      (:time-interval :interval <interval-type>
                      :day <day> :hour <hour> :min <min>)
    where interval-type is one of
      :continuous   - day, hour, min are ignored
      :hourly       - min is used
      :daily        - hour and min are used
      :weekly       - day is day of week, hour and min are used
      :monthly      - day is day of month, hour and min are used
      :unschedualed - day, hour, min are ignored
    (XXX what about yearly)

Optional Fields set by user:
----------------------------

:font
    Font for displaying this source's documents. This is machine
    specific; clients should use their default if they don't understand
    the value. Note that a vendor may wish to suggest a font for use
    with its database.

:font-size
    The size to use for displaying the font. The units are machine
    dependent.

:window-geometry
    The positioning of the source editing window (if there is one) in
    machine dependent units. It is of the form
      (:rect :left <left> :top <top> :right <right> :bottom <bottom>)

:confidence
    A subjective evaluation of how "good" the server is. This may be
    used by clients to help score documents or allocate funds.

:num-docs-to-request
    How many documents to ask for from this source.

:contact-at
    Clients which can automatically contact a server on behalf of their
    user use this field to determine when. Note, multitasking clients
    may have system level utilities for this purpose.

:last-contacted
    A record of the last time the source was contacted. The format is
      (:absolute-time :year <year> :month <month> :mday <mday>
                      :hour <hour> :minute <minute> :second <second>)
    (second = 61 means the server has not been contacted)

:timeout
    The number of seconds to give this source's server to respond to a
    query before disconnecting the communications circuit.
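
Reading a source structure (illustrative sketch):
-------------------------------------------------

The following is a minimal sketch, in Python, of how a client might read
the text encoding described above. It is not the C code distributed by
the authors. All names in it (Symbol, tokenize, atom, read, struct,
parse_source) are hypothetical and chosen for this example only. It
assumes that any parenthesized list whose first item is a symbol is a
structure, stores the structure type under the key 'type', folds symbols
to lower case, and does not check that :version is the first slot.

```python
import re

# Tokens: "#(" (array opener), parentheses, double-quoted strings with
# backslash escapes, and bare atoms (symbols and numbers).
TOKEN = re.compile(r'\s*(#\(|[()]|"(?:\\.|[^"\\])*"|[^\s()"]+)', re.DOTALL)

class Symbol(str):
    """A keyword symbol such as :free, kept distinct from plain strings."""

def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError("bad input at offset %d" % pos)
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def atom(tok):
    if tok.startswith('"'):
        # Drop the quotes and undo the \" and \\ escapes.
        return re.sub(r'\\(.)', r'\1', tok[1:-1], flags=re.DOTALL)
    if tok.startswith(':'):
        return Symbol(tok.lower())   # symbols are not case sensitive
    try:
        return int(tok)              # no decimal point: integer
    except ValueError:
        return float(tok)            # raises ValueError if not a number

def struct(items):
    """Turn (:type :key value :key value ...) into a dict."""
    body = items[1:]
    if len(body) % 2:
        raise ValueError("keyword without a value in %s" % items[0])
    result = {'type': items[0]}
    for key, value in zip(body[0::2], body[1::2]):
        result[key] = value
    return result

def read(tokens, i=0):
    """Read one value starting at tokens[i]; return (value, next_index)."""
    tok = tokens[i]
    if tok in ('(', '#('):
        items, i = [], i + 1
        while i < len(tokens) and tokens[i] != ')':
            value, i = read(tokens, i)
            items.append(value)
        if i == len(tokens):
            raise ValueError("missing ')'")
        i += 1                       # skip the closing ')'
        if tok == '#(':
            return tuple(items), i   # arrays become tuples
        if items and isinstance(items[0], Symbol):
            return struct(items), i  # assumption: symbol first => structure
        return items, i              # otherwise a plain list
    if tok == ')':
        raise ValueError("unexpected ')'")
    return atom(tok), i + 1

REQUIRED = (':version', ':database-name', ':cost', ':cost-unit')

def parse_source(text):
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("empty input")
    value, _ = read(tokens)
    if not isinstance(value, dict) or value['type'] != ':source':
        raise ValueError("not a :source structure")
    missing = [k for k in REQUIRED if k not in value]
    if ':ip-name' not in value and ':ip-address' not in value:
        missing.append(':ip-name or :ip-address')
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    value.setdefault(':tcp-port', 210)   # default port given in the text
    return value

example = '''(:source :version 3 :ip-name "quake.think.com"
  :database-name "directory-of-servers"
  :cost 0.00 :cost-unit :free
  :update-time (:time-interval :interval :daily :day 0 :hour 1 :min 30))'''
src = parse_source(example)
print(src[':ip-name'], src[':tcp-port'], src[':update-time'][':interval'])
```

Because unknown keywords simply become extra dictionary entries, a
client built this way can ignore the non-required fields it does not
understand, as the specification asks.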