SRL Technical Report Number 92-01

The Tennis Data Formatting Standard

Thomas Garrard
Space Radiation Laboratory
220-47 Downs
California Institute of Technology
Pasadena, California 91125
11 March 1992


1.  Introduction
 1.1  Purpose of the Tennis Standard
 1.2  Description
 1.3  History and Context of the Standard

2.  How to Use the Standard: the User Interface
 2.1  Reading Data
 2.2  Writing Data
 2.3  Describing New Data Formats
 2.4  Keeping Track of Pedigree
 2.5  Time Organization
 2.6  Examples
 2.7  Function Lists
  2.7.1  User Functions
  2.7.2  Internal and System Functions
  2.7.3  Multiple Unit Functions

3.  Utilities

4.  How the Standard Works: Implementation & Layout
 4.1  User Sets
 4.2  Metasets
  4.2.1  Sets  0[  and  0]  :   Beginning and End of Tourney
  4.2.2  Set  0!  :   Set Descriptor
  4.2.3  Sets  [[  and  ]]  :   Match Markers
 4.3  Pedigree Sets
        0$ ,  @@ , and  1?

5.  Why to Use the Standard: Justification and Tradeoffs
 5.1  Short Tourneys
 5.2  Long User Sets
 5.3  Platform and Medium Independence
 5.4  Alternative Possibilities

Appendix:  Various Lists

1. Introduction

1.1. Purpose of the Tennis Standard

This proposed data formatting standard is intended to facilitate documentation of data formats and allow the casual programmer to do input and output of spacecraft data and other similar data by use of an existing library of functions, as opposed to having each user create his own i/o functions. Use of the library avoids duplication of effort and makes programs more transferable from user to user. The tennis standard includes this library of C functions (or FORTRAN equivalents) for getting data from and putting data on the appropriate medium (the "court") and the blocking protocol for the data. An additional benefit of the standard is its lack of dependence on hardware, operating system, and distribution medium. Thus the user not only avoids re-invention of the i/o system every time a new person starts analyzing data, the user also avoids re-writing applications whenever hardware or operating system is "improved" or changed. The only program changes needed are in the tennis library. We should be able to support both Unix and VMS platforms on SAMPEX.

Still another part of the standard will be a library of utility programs, described in section 3. These utilities will perform functions such as ascii dumps without any need for user programming.

The tennis standard originated as a blocking protocol, which allowed collection of many short logical blocks (then called chapters) into physical record blocks for mag tape. Our experience demonstrated the virtues of standards extolled above, but, in addition to these virtues, which apply to almost any standard, we found the standard to be extremely useful for that sort of task -- time sequential blocking of short logical records from spacecraft telemetry streams.

1.2. Description

The tennis library will get or put data from the user program to the court (the selected storage medium) in units of various sizes. These units are nested, thus: The smallest unit, clearly, is the bit. Next larger is a "point". A point may consist of 1, 2, 4, or 8 bytes, i.e., it is a character or integer or float or .... A collection of points is a "game" and a collection of games is a "set". A set might, for example, be the data associated with a cosmic ray pulse height event -- something like eight 16-bit matrix detector addresses, six 16-bit pulse heights, and 16 single bit discriminator indicators. In this example, each address is a point, the eight addresses collectively are one game, the six pulse heights are another game, and the 16 indicators could be a third game. How each of these units is assembled to make the next larger is detailed in the Implementation and Layout section, section 4. The novice does not really need these details -- use of the library is independent of them.
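
As an illustration only, the pulse height event described above could be pictured as the following C structure. The type and field names are hypothetical and are not part of the tennis library; the library locates games from the set description rather than from a compiled-in structure, and the bit numbering (bit 0 = least significant) is an assumption.

```c
#include <stdint.h>

/* Hypothetical picture of the example event set; names are illustrative. */
typedef struct {
    uint16_t address[8];   /* game 1: eight 16-bit matrix detector addresses */
    uint16_t pulse[6];     /* game 2: six 16-bit pulse heights */
    uint16_t indicators;   /* game 3: sixteen single-bit discriminator indicators */
} event_set;

/* Return discriminator indicator k (0..15).
   Bit 0 is taken to be the least significant bit (an assumption). */
static int indicator(const event_set *e, int k)
{
    return (e->indicators >> k) & 1;
}
```

Each array element is a point, each member is a game, and the structure as a whole plays the role of the set.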

The library is described in the User Interface section, section 2. Previewing briefly: The most heavily used function will typically be get_set, which brings the next available set from the court to the user program. If the set is of interest, the user will ask for a pointer to the data with get_game. The user may then dig into the set directly using pointers, or may ask the library to get_float, get_bit, etc. Use of these "additional" library functions is encouraged as good style, but is not required if the user is sufficiently familiar with the data to know where it is within the set. A similar set of functions allows the user to put points, games, etc. from his/her program to the court.

A collection of sets is called a "match" and typically corresponds to a physical record on a tape or a fixed size group of records on a disk, but also has meaning in non-record-oriented media like a Unix pipe. A collection of matches is a "tourney" and would typically correspond to a tape or a disk file. Tourneys also have meaning in non-record-oriented media. The Justification and Tradeoffs section (5) includes a discussion of workarounds used when sets have lengths longer than convenient match sizes. The Implementation and Layout section (4) explains how the library deals with non-record-oriented media, in a fashion intended to be invisible to the user.

Tourneys are the data units which will normally be shipped from one user or data center to another. Each tourney begins with a description of the meaning and format of the various data units in the tourney, i.e., the description specifies what types of games and sets are in the tourney, what points are in the games, and, if interesting, what bits are in the points. These descriptions, in addition to documenting the data for the user, tell the library how to find the bits, points, etc. requested by the user. If necessary the library can use this information to transform, for example, floating-point points from VAX to Sun format, freeing the user from such hassles.

A special feature of the tennis standard is its emphasis on maintaining a creation history or pedigree of the data. The library, with some assistance from the user, records such details as what court served as input to the current court, what program created the current court, and what were the similar details for all ancestral courts. The user's responsibilities in assisting this function are detailed in sections 2.3 and 2.4. Generally pedigree is of interest only when something goes wrong.

Thus there are three types of sets: user data sets contain the data which the user normally wants to get or put; metasets include the sets containing descriptions of the other data sets, and various marker sets (such as end-of-tourney) which allow use of non-record-oriented media; and pedigree sets describe the history of the tourney and its data. Details of these sets are given in sections 2.3 and 2.4 and in section 4.

1.3. History and Context of the Standard

This section is only interesting to those who are familiar with previous versions of this document and wonder how we got here. New users should skip it.

The tennis standard is a successor to the chapter-verse (set-game) standard used first on the HEAO C3 experiment by Caltech, Washington University, and the University of Minnesota. Chapter-verse was also used at Caltech for Voyager, IMP, ISEE, and Galileo data. Experience showed that the chapter-verse standard provided some immunity to computer upgrades (PDP-11 to VAX to Sun) in addition to its benefits regarding documentation and blocking. The new standard has system and medium independence as an explicit goal.

The unusual nomenclature is intended to avoid frequently confusing associations with previous usage of words. Credit goes to the SRL ad hoc lunchtime committee on free advice, led by Dr. A. C. Cummings.

Alternatives to writing our own standard are discussed in section 5.4.

2. How to Use the Standard: the User Interface

Each set is labeled with a key, which consists of two ascii bytes. For example, the Galileo set which specifies the time of a data group is a tG set. An eG set contains a Galileo event. The convention is that the second character specifies the project; keys for special or pedigree sets are non-alphabetic -- for example, end-of-tourney is marked by a key of 0] . The user knows the format of a particular set (or game, point, etc.) from paper documentation or can get the format information from the tourney itself using a standard utility program.

These sets (games, points, etc.) are moved into and out of the user's program by use of the library routines: get_set, put_set, get_game, .... Call conventions for these routines form the interface between the user and the library. See the figure in section 2.7 for an illustration of how this interface isolates the user from the operating system.

A typical program is a filter -- for example, it might read a tourney containing, amongst other things (like time indicators and rate readouts), sets which correspond to pulse height events. It might apply a PHA-channel to energy-signal calibration to these events and write out new sets which now have floating point signals instead of integer channel numbers. Typically all the "other things" on the input tourney would be copied to the output tourney along with the new signal sets. The other typical pattern for a program is "read data; output plot".

Given a set or a game by the get_set or get_game routines, the user will normally access individual variables (points or bits) by use of pointers or indices, operate on them according to his/her needs, and plot them or output them using the put_set, put_game, ... routines.

In a little more detail, one typical program might look like:

 define the format of the new "sG" set;
 do other initialization;
 begin a loop -- get a set;
  if it is not an "eG" set {copy it to the output tourney;}
  if it is an "eG" set,
   {get the game containing the channel numbers;
   calculate the signals from the channel numbers;
   output a "sG" set with a signal game to replace
    the channel number game (other games unchanged);}
 repeat the loop until reaching end-of-tourney;
 polish things off and quit;

The calls for getting sets, etc. are documented in Reading Data (2.1) below. The calls for putting sets, etc. are documented in Writing Data (2.2) below. Once those two sections are understood, the more cumbersome details of the "initialize" part of the example above are described in the two sections after them (2.3 and 2.4). These details are needed mainly for creating new sets, games, etc. which have not previously been described to the library.
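
The "calculate the signals from the channel numbers" step of the outline above might be sketched as follows. The function name and the linear form of the calibration (signal = gain * channel + offset) are assumptions made purely for illustration; a real PHA-channel to energy-signal calibration is instrument specific.

```c
#include <stddef.h>

/* Convert integer PHA channel numbers to floating point signals.
   A linear calibration is assumed here only as an example. */
static void channels_to_signals(const unsigned short *chan, float *signal,
                                size_t n, float gain, float offset)
{
    size_t i;

    for (i = 0; i < n; i++)
        signal[i] = gain * (float)chan[i] + offset;
}
```

In the filter pattern, chan would point into the channel number game of the input "eG" set and signal into the signal game of the output "sG" set.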

2.1. Reading Data

No initialization is needed for reading data. Assuming a Unix C environment, to read a tourney or collection of tourneys in tennis format:

1) Include the tennis library in your program,
   and use  -ltennis.clib  in the compile command.
2) Call get_set(). It returns the key string of the next set.
3) Call get_game(N). It returns a pointer to game N, the sequence number of the game
   within the set. Note that the pointer for the first game can be used as a pointer 
   for the set.
4) A key of   0]   means end of tourney, i.e., no more data. Before telling the user
   program that end of tourney has been reached, the library will ask the operator 
   for additional tourneys.

When a program using tennis format requires input or output, the library prompts on the terminal for the input or output tourney. If a disk file is being used, enter the name of the file (followed by a carriage return). Otherwise, enter the device name, such as /dev/nrst0 for the 8-mm tape drive number 0. To use standard input or output (the terminal) instead, enter a backslash and carriage return in place of the device name. Terminate standard input with a ^D. Procedures for other courts are TBD.

When the end of an input tourney is reached, the library asks for the next input. Thus it is easy to add files or tapes together. When the last tourney is reached, enter "-1" for input and the program will quit. When the end of an output tape is reached (disk files should never have this problem) it prompts for the next output tape, so both input and output can be continued over many different tapes and files.
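
A minimal sketch of the reading loop follows. Because the exact prototype of get_set() is not given here, the loop takes a function pointer that stands in for it; get_set_fn, count_sets, and the in-memory demo court are all hypothetical names used only to exercise the loop.

```c
#include <string.h>

#define EOT "0]"   /* end-of-tourney key */

/* Stand-in for the library's get_set(): returns the key of the next set.
   The real prototype is an assumption. */
typedef const char *(*get_set_fn)(void);

/* Count the sets whose two-byte key equals `key`, reading until end of tourney. */
static long count_sets(get_set_fn next_set, const char *key)
{
    long n = 0;
    const char *k;

    while ((k = next_set()) != NULL && strncmp(k, EOT, 2) != 0) {
        if (strncmp(k, key, 2) == 0)
            n++;
    }
    return n;
}

/* A tiny in-memory stand-in for a court, used only to try the loop. */
static const char *demo_keys[] = { "tG", "eG", "eG", "rG", "eG", "0]" };
static size_t demo_pos = 0;

static const char *demo_get_set(void)
{
    size_t n = sizeof demo_keys / sizeof demo_keys[0];

    return demo_pos < n ? demo_keys[demo_pos++] : EOT;
}
```

With the real library, the body of the loop would call get_game(N) for the sets of interest instead of merely counting them.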

A FORTRAN interface will be much cleaner if we also have routines for getting points and bits. For consistency, we will therefore include C versions -- get_point and get_bit.

2.2. Writing Data

Assuming, again, a Unix C environment and assuming initialization is done, the following steps are used to write data:

  1. Call put_set(key). It returns a pointer to a space large enough to hold data for a set of type key.
  2. Fill the buffer space by assigning and incrementing the pointer. See the C manual for the "++" operator and pointer usage. If the data are already in an array, see move_set() below.
  3. Repeat 1) and 2) until done. The routines will take care of output names and whether the output medium is filled. No bookkeeping needs to be done in the user program.
  4. When done, call put_set("0]"), then call tourneyfin(), which will do such things as rewind tapes.

As above, we will also need put_game, put_point, and put_bit to clean up the FORTRAN interface.
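
Filling the buffer by assigning and incrementing a pointer (step 2) can be sketched as below. Here buf stands for the pointer that put_set(key) is described as returning; whether that pointer addresses the key or the first game is not specified here, so the sketch simply assumes it addresses the first point to be written. The function name is hypothetical and the five-point event follows the eG example in section 4.1.

```c
#include <stddef.h>

/* Write one five-point event through the pointer, incrementing as we go.
   Returns the number of bytes written. */
static size_t fill_event(unsigned short *buf, unsigned short cntoff,
                         unsigned short tag, const unsigned short pha[3])
{
    unsigned short *p = buf;

    *p++ = cntoff;   /* clock counts since the last time set */
    *p++ = tag;      /* discriminator tag bits */
    *p++ = pha[0];   /* pulse height from pha3 */
    *p++ = pha[1];   /* pulse height from pha2 */
    *p++ = pha[2];   /* pulse height from pha1 */

    return (size_t)(p - buf) * sizeof *p;
}
```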

2.3. Describing New Data Formats

To create a new tennis format data set, assuming a Unix C environment, one follows the procedure below:

  1. Establish and document a set of set keys in coordination with TLG or a local project coordinator. This practice allows us to avoid conflicting use of set keys, which could cause confusion, especially if the tourneys containing the conflicts needed to be merged. It also facilitates maintenance of paper documentation, for the many users who prefer that.
  2. Include the tennis library (tennis.h and tennis.clib) in your program.
  3. For each new set, i.e., any sets which are not on the input tourney, give a pointer to the information needed for the set description in the metaset 0! . This info should be in one of the two files ("main" and "other") preserved in the metaset 0$ , see below. These data are passed to the library by calling function set_init with one argument, a character pointer to a string which is the name of a file containing all the info needed for the set descriptions. For example, the tG set would be initialized by:
 char *newpnt = "/home/thor/tlg/tG.set.dscr";
 set_init(newpnt);
 where the file tG.set.dscr contains ascii along the following lines:
 BEGIN_GROUP = setdscr;
  setkey = "tG ";  /* two trailing blanks*/
  END_GROUP = gamedescr;
 END_GROUP = setdscr;

Section 4.1 contains a more complete description of the format of this info.

2.4. Keeping Track of Pedigree

In addition to the information needed for the metaset, as described above, the user must provide information needed to keep track of pedigree when writing a new tourney. In particular, the user must provide the self-documentation info needed in set 0[ , i.e., provide the name of the main routine, the name of an "other" routine, and, if appropriate (see section 2.5), the variable skiptm (tpnm is requested from the operator). The user will also likely provide names of files to be preserved in set 0$ -- optional but highly recommended. These data are provided to the library in the following fashion:

1) For the two most interesting files of source language in your program (main and 
 one other), save the file names in a set  0[  by calling setnmsav as in the 
 following example:


 This statement should be in either "main" or "other". These two files will be 
 copied into pedigree sets  0$  by the library.

2) For all other interesting source language files, the user should save the 
 source in a set 0$ by passing a pointer to a string with the complete filename.
 The pointer is passed with a call to function setsrcsav(pointer)  . For example,

 pointer = setmnpt;
 pointer = "/home/thor/galileo/gensrc/flux.h";

 These files should be short compared to buffer length, or broken into short pieces 
 with ^L characters (formfeeds); else they will be broken at arbitrary locations
 by the library functions. Only the last 256 characters of the filename are saved 
 on the tourney.

2.5. Time Organization

Typically the sets within a tourney will be time-ordered with sets specifying time bracketing sets which contain events or rate readouts or field measurements. Individual events, etc. will not usually be individually time labeled because of the overhead cost of storing so many time specifications. They are frequently labeled with a one or two byte integer time offset from the most recent time set.

One of the services to be provided by the library is the insertion of warning messages into the output tourney when the data being processed come from a time period that is listed in the warning database. For example, if the user is processing an interval from January 28 through June 11, the library might insert, near the March 11 data, the message (in tennis set form) that the front detector was noisy around noon. This message would also be copied to the user's terminal.

Some users have expressed an interest in direct access to data. It is certainly conceivable that one could create a collection of pointers that allowed direct access to individual sets or time sets within a tourney. I judge that to have excessive overhead. This specification does, however, allow for access to individual time periods which are assumed to be relatively lengthy, perhaps a day or so.

If the user wishes to use this feature, the time period must be specified with a variable named skiptm in the 0[ set, and the tourney must contain recognizable time sets. In order to be recognizable, a time set must have a key starting with upper-case T, the two bytes following the key must contain two more T's, and the time is specified in ISO standard format in the first game of the set. An example is given in section 4.1, User Sets.

The ISO standard time format is an ascii string: YYYY-MM-DDThh:mm:ss[.fff] or YYYY-DDDThh:mm:ss[.fff], where T is a delimiter and [ ] implies optional. See examples in section 4.2.1.
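
The calendar form of the ISO string, without the optional fraction, can be produced with a sketch such as the following. The function name is hypothetical; the 24-character, blank-padded field width is taken from the TG example in section 4.1, and the extra terminating NUL is added only for convenience in C.

```c
#include <stdio.h>
#include <string.h>

#define ISO_FIELD 24   /* width of the blank-padded time field in the TG example */

/* Write YYYY-MM-DDThh:mm:ss into out[0..23], blank padded, with a
   terminating NUL in out[24]. Returns 0 on success, -1 if it does not fit. */
static int format_iso_time(char out[ISO_FIELD + 1], int year, int month,
                           int day, int hour, int min, int sec)
{
    char tmp[64];
    int len = snprintf(tmp, sizeof tmp, "%04d-%02d-%02dT%02d:%02d:%02d",
                       year, month, day, hour, min, sec);

    if (len < 0 || len > ISO_FIELD)
        return -1;
    memset(out, ' ', ISO_FIELD);     /* blank pad the whole field */
    memcpy(out, tmp, (size_t)len);   /* then overlay the time string */
    out[ISO_FIELD] = '\0';
    return 0;
}
```

For noon on 11 March 1992 this yields the 19 characters 1992-03-11T12:00:00 followed by five blanks.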

Alternative systems, with much better direct access capabilities, are described in section 5.4.

2.6. Examples

2.7. Function Lists

The tennis library is included by "-ltennis.clib" when compiling, and by #include <tennis.h> in the program. The following figure illustrates the relationship of the user program, the operating system, and the tennis library.

  | tennis.h |  user   _______      |
  |          |  user functions|_____|
  |          |  internal functions  |
  |          |  system functions    |
  |     Operating System            |

The tennis library consists of four pieces: the user functions, the internal functions, the system functions, and the tennis.h include file. The user functions, listed below, are those functions which are normally used by the user. Internal functions are available to the user, but are not expected to be needed. The user is expected to avoid using system routines, which may be dependent on the operating system. The names of internal functions start with the two characters t_ and the names of system functions start with the three characters ts_ . All O.S. dependent functions are to be isolated into the system routines of the tennis library. The tennis.h file consists of various definitions which might make the user's program more readable. For example, one might define:

#define EOT "0]" /* end-of-tourney metaset */

As the figure illustrates, normally the user calls only user functions, but the user can call internal functions. System functions should be called only by the internal functions. All modules have access to the definitions in tennis.h.

2.7.1. User Functions

The list of functions includes:

2.7.2. Internal and System Functions

2.7.3. Multiple Unit Functions

There are also routines for handling multiple (two) input and output units. Generally speaking, they are the same as the above, with the letter "m" prepended and an extra argument for the unit number. Unit numbers can be 0 or 1. Dialog with the user/operator will assign physical devices to the units as above (section 2.1). If there are two input tourneys, any sets that are on both tourneys and have the same key must have the same format if they are going to be used.

3. Utilities

Utilities to be furnished along with the subroutine library would include:

Verify: Reads a tourney and prints a summary of its contents -- start time, end time, and gaps; total length in time units, bytes, matches, etc.; numbers of sets of various sorts; any warning sets on the tourney, etc.

Browse: Similar to verify, but interactive and capable of looking at the documentation in the metasets and of dumping sets.

Enrecord & Derecord: Read a tourney which does not observe the convention requiring alignment between physical records and matches and write one which does, or vice versa.

Emailsend & Emailrecv: Possibly redundant. Send a tourney out over the network or receive from the network and write a tourney which observes record/match alignment conventions.

Getsrc: Reads all the source language sets off a tourney into files in the current directory. Filenames are prepended to the actual text, inside the file. The new files are named according to which sets they came from -- 0$ , 1$ , ....

Merge: Reads two tourneys and outputs one, in time order. Assumes that each of the two input tourneys is in time order.

Split: Reads one tourney and outputs one or two, with specified sets going to the specified output unit.

File Database: Finds new tourneys on a disk and prepares an index of the information in pedigree sets 0[ and 0] .

Index Database: For a particular tourney, prepares a separate file of pointers to records with time sets separated by interval skiptm as specified in the set 0[ .

4. How the Standard Works: Implementation & Layout

As previously stated, each tourney has a metaset containing information describing the format of the other sets. The library uses this information to perform the functions specified in the Interface description.

We will describe the sets in three groups, user sets, for which hypothetical Galileo data sets will serve as examples; metasets of which the 0! set which describes the format of other sets is the most important; and pedigree sets which specify the source of the data, such as set 0[ .

Sets with characters chosen from the NASA SFDU PVL non-alphanumeric, non-reserved list (see appendix) in the second byte of the key (the project) are "internal" sets (metasets or pedigree sets) and are not normally of interest to the user. They are generated automatically by the library. Note that these "internal" sets are pure ascii and will (presumably) never need translation due to computer type change. A sequence of 0! sets near the beginning of a tape defines all the user sets and most of the metasets and pedigree sets. This sequence will contain [[ and ]] sets to mark off matches and is preceded by 0[ sets (also marked off by [[ and ]] sets). Thus, these sets -- 0[ , [[ , ]] -- must be defined a priori. The 1[ , 2[ , etc., which have the same format as the 0[ , are also defined a priori. In addition to the self-documentation provided by the sets 0! , documentation for all sets should be in a file maintained by TLG.

The following is intended to be a complete list of internal sets, with comments on where additional description is found:

0[   This metaset marks beginning of tourney. It also contains pedigree of this 
     tourney and specifies skiptm.
1[   Information from  0[  sets on input tourney. Pedigree set. 
n[   Information from  (n-1)[  sets on input. Same format as  0[ . 
0!   A metaset, specifying format (set, game, point, bit lengths and offsets) for a 
     particular set. Several will be required for a tourney.
0]   A metaset which specifies end of tourney. 
1]   Output when a set  0]  is found in the input tourney. Pedigree set. 
n]   Output when a set  (n-1)]  is found in the input. 
[[   Beginning of match metaset. Marks beginning of physical record or synch flag
     for electronic transmission/storage media (or other non-record oriented media).
]]   End of match metaset. Specifies end-of-physical-record or end of data within a
     physical record of fixed size. Also used for synch detection with electronic
     transmission/storage media. On standard tape allows variable length records.
1?   Output if there is an unrecoverable read error on the input tourney. Pedigree 
     set; format TBD.
2?   Output if there is a set  1?  on the input tourney. Pedigree set.
n?   Output if there is a set  (n-1)?  on the input. Same format as  1? .
0$   Pedigree set containing source listing of programs that created this output
     tourney and description of all variables in sets. User must furnish file names 
     to library functions. An example of a long set.
1$   Copy input  0$  onto output tourney with new key. Pedigree. 
n$   Copy  (n-1)$  onto output tourney with new key. Same format. 
@@   This pedigree set specifies change of computer type. Computer changes are 
     needed if mixed, untranslated computer data formats occur on same tape. Not
     recommended. Format TBD.

4.1. User Sets

Sets whose "keys" are ascii alphabetic characters are used for user data. By convention, the second character identifies the project and the first specifies a type of data within that project. Thus each project is limited to 52 (upper and lower case) types of data. Likewise, we are limited to 52 projects, if we avoid duplication of labels. Numerical characters and some non-alphanumeric characters can be used (but only after consultation with TLG, please) in cases where that limit presents a problem. We also have maintained the option of expansion to 4-character keys.

Sets are blocked together in the buffer and on the court, forming a sequential list. When putting sets, if a set threatens to overflow the buffer, the buffer is written out to the court and the set goes near the beginning of a new buffer. Note that some computers may require 8-byte variables (double precision floating point) to be aligned on 8-byte boundaries. In order to avoid alignment problems, it is conventional to make set lengths a multiple of 8 bytes. While one can frequently disregard this convention safely, it will almost certainly cause trouble if set lengths are not multiples of 4.
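
The 8-byte convention amounts to rounding each set length up to the next multiple of 8. A minimal sketch, with a hypothetical function name:

```c
#include <stddef.h>

/* Round a set length in bytes up to the next multiple of 8. */
static size_t pad_to_8(size_t len)
{
    return (len + 7u) & ~(size_t)7u;
}
```

For example, a 2-byte key followed by a single 10-byte game occupies 12 bytes, so 4 spare bytes are needed to reach 16.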

Our first example is a short set with four games. The tG set contains a time label, which specifies the time at the beginning of the instrument subcom cycle, and the rate, status, and spin angle data associated with that cycle. Note that we used a lower case t in the key; this set does not have ISO format time in it and is not compatible with the skiptm "direct access" indexing scheme. The set 0! which describes this set tG is shown in section 4.2.2.

  Set tG                        | Galileo time (example only) 
 Game#   Name   Length   Index  |      Comments 
         key       2        0   | key = tG 
  1      TIME     12        2   | Time at beginning of instrument cycle. 
  2      STAT      8       14   | Status read out during cycle. 
  3      RATE    256       22   | Rates read out during cycle. 
  4      ANGL     10      278   | Angle parameters during cycle. 

  tG:  Game 1            TIME 
 Point   Length   Index  Comments 
 timtyp   1*A        0   S for SCET OR E for ERT 
 errflg   1*A        1   G for good or B 
 msec     1*S        2   0 to 999, millisecond of second 
 sec      1*i        4   seconds since start of 1989 
 sc_clk   1*i        8   see jpl doc'n of spacecraft clock 

  tG:  Game 2            STAT 
 Point   Length   Index  Comments 
 swa      1*b       0    status word a bit pattern 
 swb      1*b       1    status word b bit pattern 
 swc      1*b       2    ditto 
 swd      1*b       3  
 swe      1*b       4  
 swf      1*b       5  
 dqfl     2*b       6    16-bits of data quality flags, see jpl doc 

  tG:  Game 3            RATE 
 Point   Length   Index  Comments 
 ratea1    1*S       0   first readout of a scaler, negative flags prob 
 ratea2    1*S       2   second readout of a scaler 
 rateh16   1*S     254   16th readout of h scaler 

  tG:  Game 4            ANGL 
 Point   Length   Index  Comments 
 aqflg     1*A       0   quality flag, G or B 
 spare     1*A       1  
 offset    1*F       2   spin angle(time) = offset + arate*time 
 arate     1*F       6      " 
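
Given a pointer to game 1 (TIME) of the tG set, individual points can be picked out by their byte index. The sketch below assumes that type code S denotes a 16-bit integer and i a 32-bit integer (consistent with the indices in the table), and that the data are already in the host's byte order; the helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Byte indices of the points within game 1 (TIME) of the tG example. */
enum { TG_TIMTYP = 0, TG_ERRFLG = 1, TG_MSEC = 2, TG_SEC = 4, TG_SC_CLK = 8 };

/* Copy a 16-bit point out of a game without assuming alignment. */
static int16_t point16(const unsigned char *game, size_t index)
{
    int16_t v;

    memcpy(&v, game + index, sizeof v);
    return v;
}

/* Copy a 32-bit point out of a game without assuming alignment. */
static int32_t point32(const unsigned char *game, size_t index)
{
    int32_t v;

    memcpy(&v, game + index, sizeof v);
    return v;
}
```

Game 1 starts at byte 2 of the set, so sec lies at byte 6 of the set and is not 4-byte aligned there; copying with memcpy avoids depending on alignment.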

The next example is another time set, now a TG set, which does have the ISO format time that can be used to create an index. It has two games: the ISO time is in the first, and the second is a duplicate of game 1 of set tG .

  Set TG                         Galileo ISO Time (example only) 
  Game #  Name   Length   Index  Comments 
          key       2        0   key = TG 
          spare     2        2   two blanks 
  1       ISOTM    24        4   Same time in ISO format. 
  2       TIME     12       28   Time at beginning of instrument cycle. 

  TG:  Game 1            ISOTM 
 Point   Length   Index  Comments 
 timiso   24*A      0    Same time as game 2, ISO format, blank pad. 

  TG:  Game 2            TIME 
 Point   Length   Index  Comments 
 timtyp     A       0    S for SCET OR E for ERT 
 errflg     A       1    G for good or B 
 msec       S       2    0 to 999, millisecond of second 
 sec        i       4    seconds since start of 1989 
 sc_clk     i       8    see jpl doc'n of spacecraft clock 

The next example is an eG set, a short set containing a single cosmic ray event composed of 12 bits of "tag" (discriminator) information and three 12-bit pulse height channel numbers. These data are labeled with spacecraft clock offset relative to the time in the tG set. The set is made up of one game, containing the five points just mentioned. Each of these points is stored in a 16-bit integer, an "unsigned short integer" in VAX C usage. Note that the storage overhead of adding a key to this short set is not trivial and observing the eight-byte boundary convention costs even more, but the alternatives (see example set EG below) are noticeably more complex. On most computers it would be ok to omit the 4 spare bytes. This omission is also safe if there are no 8-byte floating point numbers in any of the sets (frequently true).

  Set eG                         Galileo Cosmic Ray Event 
  Game #  Name   Length   Index  Comments 
          key       2       0    key = eG 
          spare     4       2    four ascii blanks 
    1     EVENT    10       6    Pulse height event. 

  eG:  Game 1            EVENT 
 Point   Length   Index  Comments 
 cntoff    S        0    Number of clock counts since time in tG (0-90)
 tag       s        2    Tag bits -- which discriminators fired 
 pha3      s        4    Pulse height from pha3 
 pha2      s        6    Pulse height 
 pha1      s        8    Pulse height 

The EG set contains a variable number (n) of events, all from a single instrument cycle. Each event is in a game, containing the same five points as above. There are up to 48 events in an instrument cycle, hence, up to 48 games in the set, plus a "control" game at the beginning and a "control" game at the end of the set. Conglomeration of very short sets in this fashion to avoid storage overhead adds one more level of looping to any program which reads the data -- inside of the loop which reads all sets is another loop which reads all events within that set.

  Set EG                           Galileo Events (example only) 
  Game #  Name   Length   Index    Comments 
          key      2        0      key = EG 
          spare    2        2   
    1     BCNTL   12        4      Specifies beginning, length of this set
    2     EVENT   10       16      First non-null event in instrument cycle
    3     EVENT   10       26      Second non-null event in instrument cycle
  n+1     EVENT   10   16+10*(n-1) Last (nth) non-null event in the instrument cycle 
				   which began at the time specified in tG
  n+2     ECNTL   16     16+10*n   Specifies that the set is ending. 

  EG:  Game 1            BCNTL 
 Point   Length   Index  Comments 
 bsynstr   4*A      0    synch string is ]V[B 
 n          s       4    number of events in this particular set 
 setlen    6*A      6    number of bytes (32+10*n+padlen) in this particular 
			 set, ascii integer, blank pad

  EG:  Game 2            EVENT 
 Point   Length   Index  Comments 
 cntoff     S       0    Number of clock counts since time in tG (0-90)
 tag        s       2    Tag bits 
 pha3       s       4    Pulse height from pha3 
 pha2       s       6    Pulse height 
 pha1       s       8    Pulse height 

  EG:  Game n+2            ECNTL 
 Point   Length   Index    Comments 
 flag       S       0      -1 where cntoff might be expected 
 pad    padlen*A    2      padlen blanks, for the 8 byte set length convention.
			   See below.
 setlen   10*A   2+padlen  as above 
 esynstr   4*A  12+padlen  synch string is ]V[E 
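
The extra level of looping mentioned above might look like the following sketch, which visits every event inside one EG set that is already in memory. The function names are hypothetical, the offsets come from the EG tables (n at byte 8 of the set, first EVENT game at byte 16, 10 bytes per event), and the data are assumed to be in native byte order.

```c
#include <stdint.h>
#include <string.h>

enum { EGV_N_OFF = 8, EGV_FIRST_EVENT = 16, EGV_EVENT_LEN = 10 };

/* Number of events, read from point n of the BCNTL game. */
uint16_t egv_count(const unsigned char *set)
{
    uint16_t n;

    memcpy(&n, set + EGV_N_OFF, sizeof n);
    return n;
}

/*
 * Copy the five 16-bit points of event i (0-based) into out[0..4]:
 * cntoff, tag, pha3, pha2, pha1.  cntoff is really a signed short;
 * it is carried here as raw bits.
 */
void egv_event(const unsigned char *set, unsigned i, uint16_t out[5])
{
    memcpy(out, set + EGV_FIRST_EVENT + EGV_EVENT_LEN * i, EGV_EVENT_LEN);
}

/* Inner loop over the events of one set; here it just sums pha1. */
unsigned long egv_sum_pha1(const unsigned char *set)
{
    unsigned long sum = 0;
    uint16_t n = egv_count(set);
    uint16_t pts[5];
    unsigned i;

    for (i = 0; i < n; i++) {
        egv_event(set, i, pts);
        sum += pts[4];
    }
    return sum;
}
```
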

The user should arrange a unique flag value for the ECNTL game, although this arrangement is redundant since n is available from the BCNTL game. Note that n must be known and output at the beginning of the set. Buffering can be done in the put_set area since sets are not split across buffers or matches.

The length of blank padding, padlen, is given by:

   padlen = 8 - ((10*n) mod 8)

where 10 should be understood as the length of the EVENT game, found by taking the difference between the gamepnt's of the EVENT and ECNTL games. Zero is not allowed: when 10*n is already a multiple of 8, padlen is 8. Once padlen is calculated, the set length, setlen, can be obtained from:

   setlen = 10*n + 32 + padlen

Note that bsynstr is 4 bytes after the beginning of the set, as is the case for all special sets that have synch strings. Note also the placement and use of ascii for setlen. These factors should be considered part of the protocol for short, variable-length sets. The user need not use the setlen variable and hence need not translate it to binary. The user might well not need n either; the -1 in game ECNTL serves a similar purpose.
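
The padlen and setlen formulas can be checked with a few lines of C. This is a minimal sketch; the function names are hypothetical, and the constants 10 and 32 are the EVENT game length and the fixed overhead taken from the text above.

```c
#include <assert.h>

enum { EG_EVENT_LEN = 10, EG_FIXED_LEN = 32 };

/* Blank padding for an EG set holding n events; always in 1..8, never 0. */
int eg_padlen(int n)
{
    assert(n >= 0);
    return 8 - (EG_EVENT_LEN * n) % 8;
}

/* Total set length in bytes; always a multiple of 8. */
int eg_setlen(int n)
{
    return EG_EVENT_LEN * n + EG_FIXED_LEN + eg_padlen(n);
}
```

For example, n = 1 gives padlen = 6 and setlen = 48, while n = 4 gives padlen = 8 and setlen = 80.
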

The set 0! for the EG includes:

 setkey = "EG ";
 setname = galileo_events;
 setlen = -1; /* flag for variable */
 setyp = svl;
 gamecnt = -1; /* flag for variable */

 gamename = BCNTL;
 gamepnt = 4;
 bsynstr = ]V[B;

 gamename = EVENT;
 gamepnt = 16;

 gamename = ECNTL;
 gamepnt = 26;   /*   +(n-1)*10   */
 esynstr = ]V[E;
Finally, consider an example of a long, fixed-length set (setyp=lfl). Postulate a Galileo image file with a length of 639,100 bytes whose internal details are unspecified (so no automatic translation is possible). Keep in mind that this image might be better handled as a file in SFDU, HDF, or CDF format, in addition to the possibility of logical subdivision already mentioned.

  Set IG                             Galileo Images 
  Game #  Name   Length   Index      Comments 
          key       2       0        key is ascii string IG 
          spare     6       2        six blanks 
    1     CNTRL   288       8        locates the data 
    2     DATA   nbyte    296        contains the data (32000) 
                        nbyte+296    (32296) 

  IG:  Game 1            CNTRL 
 Point   Length   Index  Comments 
 seqno    4*A        0   Sequence no. of this set in the ncpf sets. 1, ... ncpf
 ncpf     4*A        4   Number of sets required to contain this file
 nbyte    8*A        8   Number of bytes in DATA. 
 fillen   8*A       16   Number of bytes in the image file. 
 filpad   8*A       24   Number of bytes pad needed to produce integer multiple 
			 of nbyte.
 flnm   256*A       32   Full name of the file 

  IG:  Game 2            DATA 
 Point   Length   Index  Comments 
 txt    nbyte*b     0    A section of length nbyte from a data file of length 
			 640000 (setlen). In this example, the data are all 
			 one-byte unsigned integers.  nbyte   
The set 0! for this set will include statements like:
 setkey = "IG ";
 setname = galileo_image;
 setlen = -32296;  /*   note the minus sign   */
 setyp = lfl;
 nbyte = 32000;
 fillen = 639100;
 filpad = 900;
 ncpf = 20;
 gamecnt = 2;

 gamename = CNTRL;
 gamepnt = 8;

 gamename = DATA;
 gamepnt = 296;

 pointyp = 8000*b;
It is assumed that the pattern specified (8000*b here) is repeated throughout the data file. If possible, make nbyte a multiple of the length of that pattern. Automatic translation of representations may well not be possible for complicated patterns. Note again: if the pattern is of reasonable length, it is probably better for the user to make multiple short fixed-length sets of that length, rather than these huge sets.
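
The ncpf and filpad values in the set 0! statements above follow from fillen and nbyte. A minimal sketch of that arithmetic, with hypothetical names that are not part of the tennis library:

```c
#include <assert.h>

/* How a file of fillen bytes is divided into DATA games of nbyte bytes. */
struct lfl_split {
    long ncpf;    /* number of sets required to contain the file */
    long filpad;  /* pad bytes needed to reach a multiple of nbyte */
};

struct lfl_split lfl_plan(long fillen, long nbyte)
{
    struct lfl_split s;

    assert(fillen > 0 && nbyte > 0);
    s.ncpf = (fillen + nbyte - 1) / nbyte;   /* round up */
    s.filpad = s.ncpf * nbyte - fillen;
    return s;
}
```

With fillen = 639100 and nbyte = 32000 this gives ncpf = 20 and filpad = 900, matching the example.
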

4.2. Metasets

The 0! metaset specifies format info for most other sets, including all user sets. The 0[ serves both metaset (specifies beginning of tourney and skiptm) and pedigree set functions and is described here. Other marker metasets include 0] (end of tourney), and [[ and ]] (beginning and end of match).

4.2.1. Sets 0[ and 0] : Beginning and End of Tourney

Set 0[ is the first data on the tourney. It is in SFDU Parameter Value Language after the synch characters. Note that white space (blanks, tabs, carriage returns, line feeds, vertical tabs, and form feeds) is ignored outside variable or value strings, and can be used to improve readability. Within variable names, white space is not allowed. Within values, white space is allowed only if quoted. In general, values must be specified with a restricted ASCII subset. Comments are specified inside /* */ pairs as in C and are also ignored. The general format is variable = value with semicolon separators. The variables are as listed in the example below. The set has a fixed length of 4000 bytes. It is padded with white space, preferably blanks, after the last semicolon. After the 4000 bytes comes a set ]] . No more data is put in this record, since the next set (after the [[ ) will be a 0! , which is usually too long to fit in the same match.

 0[  ]S[syBOT;
 BEGIN_GROUP = trnydscr;  /* PVL from here down */
 bfsz  = 32768;
 cmptyp = VAXII;  /* names should be chosen from a list for consistency */
 cmpos = LTRX4.1; /* ditto */
 cmpnm =;
 trnm  = galileo/data/lib/1989-294.297; /* tourney name */
 trdt  = 1991-05-02T05:14:23;
 lbnm  = /home/odin/usr/lib/tennis.clib;
 lbdt  = 1991-03-29T14:32:00;
 mnnm = /home/odin/galileo/gensrc/flux/main.c;
 mndt  = 1991-05-01T13:17:02;  /* fixed oxygen limits */
 othnm = /home/odin/galileo/gensrc/flux/zcalc.c;
 othdt  = 1991-05-01T13:33:56;
 skiptm = 0000-00-00T12:00:00; /* 12 hours */
 END_GROUP = trnydscr;
Note that the time is in ISO standard time format, i.e., YYYY-MM-DDThh:mm:ss[.fff] or YYYY-DDDThh:mm:ss[.fff], where T is a delimiter and [ ] implies optional.
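
The "variable = value;" syntax described above can be scanned with a small routine such as the one sketched here. It is not a full PVL parser and is not the tennis library's reader: the names pvl_skip and pvl_next are hypothetical, the text is assumed to be a null-terminated string that starts after the synch characters, and only white space, /* */ comments, quoted values and semicolon separators are handled.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Skip white space and C-style comments. */
static const char *pvl_skip(const char *p)
{
    for (;;) {
        while (*p && isspace((unsigned char)*p))
            p++;
        if (p[0] == '/' && p[1] == '*') {
            const char *end = strstr(p + 2, "*/");
            if (end == NULL)
                return p + strlen(p);  /* unterminated comment */
            p = end + 2;
        } else {
            return p;
        }
    }
}

/*
 * Read one statement starting at *cursor into name and value
 * (buffers of nlen and vlen bytes, each at least 1).
 * Returns 1 and advances *cursor past the semicolon on success,
 * 0 at end of text or on a malformed statement.
 */
int pvl_next(const char **cursor, char *name, size_t nlen,
             char *value, size_t vlen)
{
    const char *p = pvl_skip(*cursor);
    size_t i = 0;

    /* Variable name: no white space allowed inside it. */
    while (*p && *p != '=' && *p != ';' && !isspace((unsigned char)*p)) {
        if (i + 1 < nlen)
            name[i++] = *p;
        p++;
    }
    name[i] = '\0';
    p = pvl_skip(p);
    if (i == 0 || *p != '=')
        return 0;
    p = pvl_skip(p + 1);

    i = 0;
    if (*p == '"') {                   /* quoted value may hold white space */
        p++;
        while (*p && *p != '"') {
            if (i + 1 < vlen)
                value[i++] = *p;
            p++;
        }
        if (*p != '"')
            return 0;
        p++;
    } else {                           /* bare value ends at space, ; or comment */
        while (*p && *p != ';' && !isspace((unsigned char)*p)
               && !(p[0] == '/' && p[1] == '*')) {
            if (i + 1 < vlen)
                value[i++] = *p;
            p++;
        }
    }
    value[i] = '\0';
    p = pvl_skip(p);
    if (*p != ';')
        return 0;
    *cursor = p + 1;
    return 1;
}
```

Calling pvl_next repeatedly on the body of a 0[ set yields the pairs bfsz/32768, cmptyp/VAXII, and so on; an empty value such as the one for cmpnm comes back as an empty string.
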

The set 0] contains match and set counts for the tourney which it terminates and has the following format. Provision is made for "only" 64 different sets on the tourney. Additional sets will be ignored.

  Set 0]                          End of Tourney 
   Game #  Name   Length   Index  Comments 
           key       2       0    key is ascii string 0] 
          spare      2       2    2 blanks 
     1    MSTCNT   916       4    Match and set counts. 

  0]  Game 1            MSTCNT 
 Point  Length   Index  Comments 
 synstr   8*A       0   synch string is ]S[syEOT 
 mtcnt   12*A       8   Count of matches on tourney, ascii integer, padded with white 
			 space
 stky     2*A      20   key of set whose count follows.
 kycnt   12*A      22   Number of sets with key stky on this tourney.
  ...                   (up to 64 stky/kycnt pairs, 14 bytes each)
 stky     2*A     902   key of set whose count follows.
 kycnt   12*A     904   Number of sets with key stky on this tourney.
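
A reader of the 0] set might look up the count for a given key as sketched below. The function names are hypothetical; the offsets (mtcnt at 8, first stky at 20, 14 bytes per stky/kycnt pair, 64 pairs) are taken from the table above, and the counts are assumed to be ascii integers padded with blanks on either side.

```c
#include <stdlib.h>
#include <string.h>

enum {
    MST_MTCNT_OFF = 8,    /* mtcnt within the game */
    MST_FIRST_PAIR = 20,  /* first stky within the game */
    MST_PAIR_LEN = 14,    /* 2-byte key plus 12-byte count */
    MST_MAX_KEYS = 64,
    MST_CNT_LEN = 12
};

/* Convert a blank-padded ascii integer field to a long. */
long ascii_field(const unsigned char *field, size_t len)
{
    char tmp[32];

    if (len >= sizeof tmp)
        len = sizeof tmp - 1;
    memcpy(tmp, field, len);
    tmp[len] = '\0';
    return strtol(tmp, NULL, 10);  /* skips leading blanks, stops at trailing */
}

/* Count of sets with the given 2-character key, or -1 if not listed. */
long mstcnt_lookup(const unsigned char *game, const char *key)
{
    int i;

    for (i = 0; i < MST_MAX_KEYS; i++) {
        const unsigned char *pair = game + MST_FIRST_PAIR + i * MST_PAIR_LEN;

        if (memcmp(pair, key, 2) == 0)
            return ascii_field(pair + 2, MST_CNT_LEN);
    }
    return -1;
}
```
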
4.2.2. Set 0! : Set Descriptor

The set 0! follows the 0[ set on the tape (ignoring match markers); it is also in Parameter Value Language. Its format is very much like that of the 0$ , i.e., it is a text set with variable length. Maximum set length is 30,000 bytes. An example of a set 0! which describes a set tG follows. The set was introduced in section 4.1.

Note that the pointyp, cmptyp, and setyp variables are chosen from a list to ensure consistency. Those lists are in the appendix. The point types include, for example, A for ascii character, S for signed short integer, s for unsigned short, ....

 BEGIN_GROUP = setdscr;
  setkey = "tG "; /* two trailing blanks */
  setname = galileo_time;
  setlen = 288;
  setyp = sfl;  /* choose from sfl, lfl, lvl, ... */
  setext = "time, status, rates, angle for one rate/status subcom cycle";
  gamecnt = 4;

  BEGIN_GROUP = gamedscr;
  gamename = TIME;
  gamepnt = 2;
  gametext = "time for beginning of rate subcom cycle";

  BEGIN_GROUP = pointdscr;
   pointnm = timtyp;
   pointpnt = 0;
   pointyp = A;  /* ascii */
   pointext = "ASCII encoded logical variable; S for SCET,
     E for ERT.";
  END_GROUP = pointdscr;

  BEGIN_GROUP = pointdscr;
   pointnm = errflg;
   pointpnt = 1;
   pointyp = A;
   pointext = "ASCII encoded logical variable; G for Good,
      B for Bad.";
  END_GROUP = pointdscr;

  BEGIN_GROUP = pointdscr;
   pointnm = msec;
   pointpnt = 2;
   pointyp = S;  /* signed short integer */
   pointext = "0 to 999, millisecond of second.";
  END_GROUP = pointdscr;

  BEGIN_GROUP = pointdscr;
   pointnm = sec;
   pointpnt = 4;
  END_GROUP = pointdscr;
  END_GROUP = gamedscr;

  BEGIN_GROUP = gamedscr;
  gamename = STAT;
  gamepnt = 14;
  BEGIN_GROUP = pointdscr;
   pointnm = swa;
  END_GROUP = pointdscr;
  END_GROUP = gamedscr;

  BEGIN_GROUP = gamedscr;
  gamename = RATE;
  gamepnt = 22;

  BEGIN_GROUP = pointdscr;
   pointnm = rate_sclr;
   pointpnt = 0;
   pointyp = 128*S;
   pointext = "An array of 128 short, signed integers specifying rate
     scaler readouts. The sequence is a1, a2, a3,
     ..., h16. The letter refers to which scaler,
     the number tells which subcom state.";
  END_GROUP = pointdscr;
  END_GROUP = gamedscr;
 END_GROUP = setdscr;
Sets 0! may have a length exceeding the 30,000 bytes specified above. The library will split them across match boundaries just as for the 0$ . Split locations can be specified by the user by inserting form-feed characters (^L).

4.2.3. Sets [[ and ]] : Match Markers

These sets are used to mark off the beginning and end of matches, which will normally be aligned with the beginning and end of a record.

  Set [[  Beginning of match 
    Game #  Name   Length   Index  Comments 
            key       2       0    key is ascii string [[ 
            spare     2       2    two blanks 
      1     RECSQ    20       4    match sequence number 

  [[   Game 1            RECSQ 
 Point   Length   Index  Comments 
 synstr   8*A        0   synch string is ]S[syBOM 
 rcsq    12*A        8   Sequence number of this match on this tourney. An ascii 
			 integer padded with blanks.

  Set ]]                           End of match 
    Game #  Name   Length   Index  Comments 
            key       2       0    key is ascii string ]] 
            spare     2       2    two blanks 
      1     RECLN    20       4    Match length 

  ]]   Game 1            RECLN 
 Point   Length   Index  Comments 
 synstr   8*A        0   synch string is ]S[syEOM 
 rcln    12*A        8   Number of bytes of data in this match, including the match 
			 markers themselves. An ascii integer padded with blanks.
There is a set [[ preceding the set 0[ at the beginning of the tourney and there is a set ]] after the 0] at the end of the tourney.
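
Both markers share one 24-byte layout, so a single routine can build either. The sketch below is an assumption-laden illustration: make_marker is a hypothetical name, and the ascii integer is left-justified with trailing blanks, which the tables above allow but do not require.

```c
#include <stdio.h>
#include <string.h>

enum { MARKER_LEN = 24 };

/*
 * Build a match marker in out[0..23].
 * key is "[[" or "]]"; synstr is the 8-character synch string;
 * value is the match sequence number ([[) or match length (]]).
 */
void make_marker(unsigned char *out, const char *key,
                 const char *synstr, long value)
{
    char num[13];

    /* 12 characters, left-justified, blank padded (assumed convention) */
    snprintf(num, sizeof num, "%-12ld", value);
    memcpy(out, key, 2);        /* key */
    memset(out + 2, ' ', 2);    /* two blanks */
    memcpy(out + 4, synstr, 8); /* synch string */
    memcpy(out + 12, num, 12);  /* ascii integer */
}
```

For instance, make_marker(buf, "[[", "]S[syBOM", 7) produces the beginning-of-match marker for the seventh match.
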

4.3. Pedigree Sets

The pedigree sets include the 0[ described above, and the 0$ set, which contains source language from the program that created the tourney.

  Set 0$                           Source Language 
    Game #  Name   Length   Index  Comments 
            key       2        0   key is ascii string 0$ 
            spare     2        2   two blanks 
      1     BCNTL    20        4   specifies length of text game 
      2     TEXT   texlng     24   contains actual text 

  0$   Game 1            BCNTL 
 Point   Length   Index  Comments 
 bsynstr   4*A       0   synch string is ]$[B 
 seqno     4*A       4   sequence no. of this set in string of sets containing the 
			 text file
 ncpf      4*A       8   total no. of sets used to hold the text file
 texlng    8*A      12   No. of characters in the TEXT game. Need not be a multiple 
			 of 8.

  0$   Game 2            TEXT 
 Point   Length   Index  Comments 
 txt    texlng*A    0    actual text 
The 0$ set occupies a match of its own due to its uncertain, presumably large length. Thus the 8-byte convention and padding can be ignored.

When a pedigree set is encountered on an input tourney, it is copied by the library onto the output tourney. For example, a 0[ is copied; its key is changed to 1[ to indicate generation level. In general, an n[ set is written to the output tourney whenever an (n-1)[ is encountered in the input tourney. The number in the set key is incremented by one each time to indicate generation level. When 9 is reached, incrementing stops. Thus, pedigree sets also include the sets 1[ , 2[ , ... 9[ . Note that the synch string in the 0[ must be replaced by blanks when writing a 1[ .

Similarly, sets 1] , ... 9] record input of 0] sets and 1$ , ... 9$ sets record input of 0$ sets. A set 1? is output if there is an unrecoverable read error on the input tourney; later generations show 2? , ... 9? . Except for the 1? set, these sets are all identical in format to their obvious progenitors. Formats for 1? and @@ are TBD.
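
The generation rule for pedigree keys can be sketched as follows. The function name pedigree_bump is hypothetical and is not taken from the tennis library; the routine only rewrites the key and leaves the blanking of the synch string to the caller.

```c
#include <string.h>

/*
 * Advance the generation digit of a pedigree key in place,
 * e.g. "0[" becomes "1[".  The digit saturates at '9'.
 * Returns 1 if key is a pedigree key (digit followed by one of
 * [ ] $ ?), otherwise 0 and the key is left unchanged.
 */
int pedigree_bump(char *key)
{
    if (key[0] < '0' || key[0] > '9')
        return 0;
    if (key[1] == '\0' || strchr("[]$?", key[1]) == NULL)
        return 0;           /* e.g. the 0! metaset is not a pedigree set */
    if (key[0] < '9')
        key[0]++;
    return 1;
}
```
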

5. Why to Use the Standard: Justification and Tradeoffs

5.1. Short Tourneys

5.2. Long User Sets

The tennis standard was invented for short, fixed-length sets, and clearly excels for that type of data. However, some sets will have to contain long alphanumeric strings of text for documentation purposes. Set 0$ contains source listing for the programs which create the tape, and might be 50,000 to 100,000 bytes long or more. Since these sets don't fit neatly into a single physical record or buffer, it is necessary to break them into pieces. The size of the pieces should reflect the data properties rather than be fixed, i.e., breaks in sets 0$ might be put where page breaks (formfeeds or ^L) occur in listings.

Sets with long variable length are broken into pieces shorter than one match. Set length (setlen) is specified as -1 in the set description in the 0! set. The game containing the actual data has its length specified in the first game and is also terminated by an end of match marker, a special set.

Consider, on the other hand, an 800 x 800 x 8 bit image. The logical block length is about 640 Kbytes, much longer than any reasonable physical record length, but fixed. This logical block would have to be spanned across about 20 to 80 records or matches. These longer data sets could, of course, always be broken down into smaller logical blocks and this procedure is likely the best available. Thus the image mentioned above could be represented by 800 sets, each containing one line.

Long fixed-length sets (setyp = lfl) are broken into pieces which will normally occupy a full physical record or match, i.e., the pieces are the size of the buffer or slightly smaller. Information is provided so that the library function get_set can reassemble the logical record (file). See the example sets IG in section 4.1.

Also, consider Galileo or Voyager cosmic-ray events -- they are only 48 bits long, so adding a 16- or 32-bit key to each one seems inefficient. One might prefer to gather them up in groups, with each group corresponding to some particular time interval, such as an instrument cycle. In that group we may well find that most events are null and should be omitted for efficient storage. In that case, we end up with groups of variable length, but still short compared to expected match lengths. (Galileo, for instance, has up to 48 events telemetered per instrument cycle.)

Short variable-length sets (setyp = svl) will be marked off by synch strings and will contain a specification of their length following the opening synch string. They may contain either one data game of variable length or a variable number of identical data games. See the example EG.

Note the alternatives mentioned in section 5.4 for dealing with large sets.

5.3. Platform and Medium Independence

The tennis metasets are pure ascii and should be readable on any system. The data they describe, when binary, is generally in the native format of the system which wrote it, and that native format is known, since the computer system is identified in the set 0[ . Thus the library can translate without user assistance.

The library can be programmed to read data from any reasonable medium of storage; in particular, it should have the ability to read tape, disk, Unix pipes, and probably network connections. Direct access, of course, is only possible on disk and is not generally considered a feature relevant to tennis. The get_time function can do a fast-forward sort of operation. Direct access of data on disk would appear to be straightforward, requiring "only" creation of an index and addition of the appropriate library functions. Since the index could be separate from the tennis data itself, it does not require modification of the tennis format standard.

5.4. Alternative Possibilities

Alternatives to writing our own standard include the three "popular" standards of national standing: NASA's SFDU and CDF standards, and NCSA's HDF.

The Standard Formatted Data Unit is a very "loose" standard, basically specifying a means for documentation of packet-like data units. The typical unit is megabytes long, with kilobytes of documentation, and dozens of bytes of overhead. That is too much overhead for a typical tennis set, which might be as short as, for example, 16 bytes. However, it is possible to make a tennis tourney look very much like an SFDU, so that translation to SFDU format is trivial.

The NASA Common Data Format is intended for storage of large arrays, and is not appropriate for lists of structured sets.

The National Center for Supercomputing Applications' Hierarchical Data Format looks similar to tennis in that it can support structures and short data sets. It has significantly more overhead (20 bytes per equivalent of a set) and is normally used for large sets like images. All of their free application support is image/array oriented. It is clear that HDF could replace tennis, but at significant cost. The two are sufficiently similar that translation is easy.

6. Appendix: Various Lists

Format specification of points in each user game is necessary to allow automatic translation. This is done with the variable pointyp using the abbreviations listed below:

  A  ascii character 
  B  one byte integer 
  b  one byte unsigned integer or bit pattern 
  S  short integer -- nominally 16 bits 
  s  unsigned short integer 
  I  integer -- nominally 32 bits 
  i  unsigned integer 
  E  extended integer -- nominally 64 bits 
  F  native floating point -- 32 bits 
  D  native double precision floating point -- 64 bits 
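
A pointyp value such as 128*S can be turned into a byte count as sketched below. The function names are hypothetical, the sizes are the nominal ones from the list above (a real library would take them from the cmptyp of the writing machine), and symbolic counts such as nbyte*b are not handled.

```c
#include <stdlib.h>

/* Nominal size in bytes of one element, or 0 for an unknown code. */
int pointyp_size(char code)
{
    switch (code) {
    case 'A': case 'B': case 'b': return 1;
    case 'S': case 's':           return 2;
    case 'I': case 'i': case 'F': return 4;
    case 'E': case 'D':           return 8;
    default:                      return 0;
    }
}

/* Total bytes for a spec like "S" or "128*S"; -1 if not understood. */
long pointyp_bytes(const char *spec)
{
    long count = 1;
    char *end;

    if (*spec >= '0' && *spec <= '9') {
        count = strtol(spec, &end, 10);
        if (*end != '*')
            return -1;
        spec = end + 1;
    }
    if (pointyp_size(spec[0]) == 0 || spec[1] != '\0')
        return -1;
    return count * pointyp_size(spec[0]);
}
```

For the rate_sclr point of the tG example, pointyp_bytes("128*S") gives 256.
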
The cmptyp variable is chosen from a list that currently includes: PDP11 , VAXII , SUN3 , SSPARC . Obviously more format types can be added as needed. In particular, I think there are non-proprietary "standard" data representations including IEEE and Sun XDR, which are used on most RISC machines.

 The setyp list includes:

  sfl  short, fixed-length 
  lfl  long, fixed-length 
  lvl  long, variable-length 
  svl  short, variable-length 
The SFDU PVL non-alphanumeric, non-reserved list includes all the characters below:
&  *  ^  :  @  $  !  /  %  +  ?  [  ]
There has been talk of reserving the [ and ] characters; this does not appear to affect our usage, since we always quote these characters.