========== FORMAT.TXT ==========

UNITRET datafile structure (unified old/new system format)
Igor Kagan, July 2002
[part of this file is based on DATAM.PHD: chas 21march95 - see it for
description of old system]

The Max Snodderly lab's monkey data are stored in binary files, using
the format described below.

The experiments involve a series of ~5 s trials, run in batches of 10
or so. A batch is called a "trial-set". Most of the experimental
parameters remain the same throughout a trial-set; therefore, we store
the data of an entire trial-set in one file. The name of a trial-set
is the base name of the file which contains it. Trials in a trial-set
are numbered, starting from 1.

&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
FILE STRUCTURE: DATA STRUCTURE ASPECTS
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&

A tabular listing of the file structure is found further in this doc
file; here I give a description of what is going on. The meaning of
the fields of scientific interest -- those that store actual
parameters and data -- is found in the next section.

A data file is composed of several blocks. There are 3 blocks that
start the file, followed by the several trials, each of which is
composed of several blocks in turn. These first three blocks are the
file header, the file specification block, and the comment block. The
file header holds information about the blocks of the file; the file
specification holds information about the experiment that can never
change between trials; the comment block holds text typed by the
experimenter.

To help re-sync things if something goes wrong, all blocks are
separated by a 4-byte code (value: hexadecimal 77777777, equivalent to
ASCII "wwww").

The first block is the FILE HEADER. It starts with a version number,
which was 1 through Summer 93. It is currently 2, and has that value
in the converted files from Summer 93.
Remember, this is a binary file, so this is not the character '2', but
binary 2. The version number is followed by the length of the file in
bytes (a 4-byte integer), and then a number of short (2-byte) integers
giving the number and lengths of the various blocks, including the
number of trials. Some values are included even though they are never
expected to change, just in case. For example, I'm sure that we will
never have 2 file specification blocks, but I still store the number
of them (1).

Notice that the actual number of values in the file header depends on
these block counts; for example, the length of the list of offsets to
the trials is equal to the number of trials.

All block lengths and offsets are in bytes; thus, the stored length of
a block of short integers has to be divided by 2 to get the actual
number of values.

The FILE SPECIFICATION BLOCK contains values shared by all trials in
the trial-set, such as the date the experiment was run and the data
acquisition rates.

The COMMENT BLOCK contains text typed by the experimenter. If it has
length 0, there will be two separators next to each other. In any
case, this is followed by ...

THE INDIVIDUAL TRIALS

Each trial starts with a TRIAL HEADER, which contains the number of
the trial in the file, and the usual list of block counts and lengths.

After this comes the TRIAL PARAMETER BLOCK, which contains most of the
information about the stimulus given to the subject and any other
information that might vary from trial to trial.

After the parameter block come the several data blocks. Originally
there were three of these: two blocks of eye position data and a block
of spike times. The block listing further on gives the current count
as 5; the two additional blocks hold shape arrival times and shape
values. The eye position and spike blocks are described in the next
section.

As much as possible, the specifications and parameters are stored in
"real-world" values, e.g. minutes of arc. The acquired data are stored
in raw form -- the actual values received from the data acquisition
devices.
The conversion factors are stored in the file specification block.

&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
FILE STRUCTURE: SUBSTANTIVE CONTENTS (Experimental params, data blocks)
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&

The contents of the file specification and trial parameter blocks are
listed in detail below.

TIME IN THE TRIAL.

These trials run for about 5 seconds. There are many events during the
trial, and a consistent method of referring to their timing is needed.
The stimulus is produced on a video monitor. The trigger pulse that
the stimulus generator emits when it shows the first frame marks the
zero time of the trial. All time values are measured with respect to
this moment.

DATA BLOCKS: Eye position data.

There are (currently) two blocks of raw A2D analog data. These are the
values received from the eye tracker, one block for horizontal values,
one for vertical. Currently, the values range from 0 to 4095, but
there is no requirement that this be the case; the meaning of the
values is indicated by fields in the file specification block. These
fields are described in detail below. The length of each block of
analog data is given in the trial header (note this is in BYTES, and
must be divided by 2 to get the number of samples).

DATA BLOCKS: Spike data.

The spike data are stored as a list of the times when spikes were
detected, using 4-byte ("long") integers. A zero value indicates a
spike at the same time as the first video frame's sync signal. A value
of 1 indicates a spike 1 "spike timing unit" after that. The length of
these units is stored in the file specification block (the clock
period of spike data acquisition; see Note 10). Currently, for
"Control" system files, the units are 10 microsec; for "Anal" system
files, they are 200 microsec.
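
To make the byte-count and timing-unit rules above concrete, here is a
minimal sketch in Python. The function names are hypothetical (they do
not come from the lab's software), and the clock period passed in is
assumed to have been read already from the "Clock period of spike data
acquisition, ms" field of the file specification block.

```python
def values_in_block(block_length_bytes, bytes_per_value):
    """Number of stored values in a data block.

    Block lengths in the headers are in BYTES, so divide by 2 for a
    block of SHORTs (eye position) or by 4 for a block of LONGs
    (spike times).
    """
    if block_length_bytes % bytes_per_value != 0:
        raise ValueError("block length is not a whole number of values")
    return block_length_bytes // bytes_per_value


def spike_times_ms(spike_ticks, spike_clock_period_ms):
    """Convert raw spike times (clock ticks) to milliseconds.

    A tick value of 0 is the sync signal of the first video frame,
    which is the zero time of the trial.
    """
    return [tick * spike_clock_period_ms for tick in spike_ticks]


# Example with made-up spike ticks and a 200 microsec (0.2 ms) clock,
# the value quoted above for "Anal" system files.
ticks = [0, 5, 10]
print(spike_times_ms(ticks, 0.2))
print(values_in_block(8, 4))  # an 8-byte block of LONGs holds 2 values
```

The period is passed in rather than hard-coded because the text asks
readers to take it from the file, not to assume 10 or 200 microsec.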
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
FILE STRUCTURE: LIST OF BLOCKS & THEIR CONTENTS
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&

There are fields in the headers which indicate the number of
repetitions of certain blocks further on in the file. The names of
these fields start with the character '#', followed by a labeling
number in parentheses. Further on in the file are the fields or blocks
that get repeated. The correspondence is indicated by the labeling
number in the Data Type column.

Fields whose values change so rarely that we pretty much know what
they are have the presumed value shown in square brackets on the
right.

BLOCK
  FIELD                                        DATA TYPE
  |                                            |
  V                                            V

**** File starts with ****

FILE HEADER
  Version                                      SHORT  [value: 2]
  File Length in bytes                         LONG
  File Header length in bytes                  SHORT
  # (1) Global specification blocks            SHORT  [value: 1]
  # (2) Trials                                 SHORT
  Comment length in bytes                      SHORT
  List of lengths of global specs blocks       SHORTs (# 1 repetitions)
  List of trial beginnings (from file start)   LONGs  (# 2 repetitions)
Separator                                      LONG
FILE SPECIFICATION BLOCK                       ** DESCRIBED BELOW **
Separator                                      LONG
COMMENT
Separator                                      LONG

**** Then for each trial ****

TRIAL HEADER
  Serial number                                SHORT
  Length of trial header                       SHORT
  # (3) of Parameter blocks                    SHORT  [value: 1]
  # (4) of Data blocks                         SHORT  [value: was 3, now 5]
  Lengths (5a ...) of parameter blocks         SHORTs (# 3 of them)
  Lengths (6a, 6b, 6c, ...)
      of data blocks                           SHORTs (# 4 of them)
Separator                                      LONG
TRIAL PARAMETERS                               ** DESCRIBED BELOW **
Separator                                      LONG
HORIZONTAL EYE POSITION:                       SHORTs (# 6a bytes of them)
Separator                                      LONG
VERTICAL EYE POSITION:                         SHORTs (# 6b bytes of them)
Separator                                      LONG
SPIKE ARRIVAL TIMES                            LONGs  (# 6c bytes of them)
Separator                                      LONG
SHAPE ARRIVAL TIMES                            LONGs  (# 6d bytes of them)
Separator                                      LONG
SHAPE VALUES                                   SHORTs (# 6e bytes of them)
Separator                                      LONG

**** End of a trial (repeat # 2 times) ****

The separators have the value hexadecimal 77777777. This is the binary
ASCII equivalent of "wwww".

File Specification Block
  Name of file                                 14 CHARS
  Date                                         10 CHARS
  Name of run module that produced it          10 CHARS
  Video frame period (ms)                      FLOAT
  Viewing distance (cm)                        FLOAT
  Time of first sample in frame (ms)           FLOAT  [ SEE NOTE 1 ]
  Number of analog samples / frame             SHORT  [ SEE NOTE 2 ]
  Visual field location, Horizontal (deg)      FLOAT  [ SEE NOTE 3 ]
    same, Vertical                             FLOAT
  Position of fixation LED, horizontal (min)   FLOAT  [ SEE NOTE 4 ]
    same, Vertical (minutes)                   FLOAT
  Eye-position gain, Horizontal                FLOAT  [ SEE NOTE 5 ]
    same, Vertical                             FLOAT
  Eyetracker arb definition                    FLOAT  [ SEE NOTE 6 ]
  Eyetracker arb value for zero voltage        SHORT  [ SEE NOTE 7 ]
  EMPTY -- 2 bytes available                   SHORT
  Stabilization flag -- 0 for no locking       SHORT  [ SEE NOTE 7A ]
  EMPTY temporal type: flag                    SHORT  [ SEE NOTE 8 ]
  EMPTY spatial type: flag                     SHORT  [ SEE NOTE 8 ]
  Computer flag                                SHORT  [ SEE NOTE 9 ]
  Date&Time of run file creation               18 CHARS
  Period of eye data acquisition, msec         FLOAT  [ SEE NOTE 10 ]
  Clock period of spike data acquisition, ms   FLOAT  [ SEE NOTE 10 ]
  Clock period of shape data values, ms        FLOAT  [ SEE NOTE 10 ]

Trial Parameter block
  Time of trial                                10 CHARS
  Trial Duration (ms)                          SHORT
  Duration of one stimulus action (msec)       SHORT
  Time between stimulus actions (msec)         SHORT
  Tilt -- box angle (deg)                      SHORT
  Box size, radial: horizontal if tilt 0 (min) SHORT
  Box size, perpendicular: vertical if tilt 0  SHORT
  X start position, in arc minutes             SHORT  [ SEE NOTE 11 ]
  Y start
      position                                 SHORT
  Extent stim motion in 1 stim period (min)    SHORT
  Velocity of stimulus, if defined (min/sec)   SHORT
  Color code for standard values (flag)        SHORT  [ SEE NOTE 12 ]
  Foreground red intensity (candelas / m^2)    FLOAT
  Foreground green intensity                   FLOAT
  Foreground blue intensity                    FLOAT
  Background red intensity                     FLOAT
  Background green intensity                   FLOAT
  Background blue intensity                    FLOAT
  Element red intensity                        FLOAT
  Element green intensity                      FLOAT
  Element blue intensity                       FLOAT

  ** Applicable to Sinusoid & Gabor: **
  Spatial freq: Cycles / Degree                FLOAT
  Phase of red: Degrees of phase               SHORT  [ SEE NOTE 13 ]
  Phase of green                               SHORT
  Phase of blue                                SHORT
  Standard deviation (degrees of arc)          FLOAT  (GABOR & D6 only)
  Contrast                                     FLOAT  [ SEE NOTE 14 ]
  Temporal frequency: Cycles / sec             FLOAT

  ** Applicable to REGULAR: **
  Length of element                            FLOAT
  Width of element                             FLOAT
  Spacing in length direction                  FLOAT  [ SEE NOTE 15 ]
  Spacing in width direction                   FLOAT

  ** Additional timing values **
  Start of eye data acq, msec                  FLOAT  [ SEE NOTE 16 ]
  Start of spike data acq, msec                FLOAT
  End of spike data acq, msec                  FLOAT
  Timing Code: indicates validity              INT    [ SEE NOTE 17 ]

  ** Stimulus description flags **
  Stimulus temporal type: flag                 SHORT  [ SEE NOTE 18 ]
  Stimulus spatial type: flag                  SHORT  [ SEE NOTE 19 ]
  Eye choice: flag                             SHORT  [ SEE NOTE 20 ]

  ** Additional positioning info **
  Sweep fraction                               FLOAT  [ SEE NOTE 21 ]

  ** Shape data acquisition values **
  Spike trigger method                         SHORT  [ SEE NOTE 22 ]
  Spike value trigger voltage                  FLOAT
  Shape value trigger voltage                  FLOAT
  Shape value hysteresis voltage               FLOAT
  Number of shape values per spike             SHORT
  Shape value at trigger time                  SHORT

Notes on parameter specifications:

1. Time of stabilization eye sample in frame.
   This field is the time into the previous video frame of the
   particular eye position reading that was used to provide stimulus
   position stabilization.
   It was once used to indicate the timing of the eye readings that
   are stored in the trial, but that is better obtained from the trial
   parameter "eye start".

2. Number of analog samples / frame
   This is nearly obsolete; the filespec field "eye_period" should be
   used to determine eye data timing. In CONTROL system trials, the
   value is 2. In ANAL trials, it is set to 0, as it has no meaning.

3. Visual field location.
   This is the position the user puts in the menu, as opposed to the
   start position. The start position differs by a user-settable
   fraction ("sweep_frac") of "extent". The visual field location is
   presumably what the user thinks is the retinal area to be
   investigated. This value is the displacement from the fixation LED
   position, not from the lower-left corner of the screen.

   This spec was placed in the file spec block because I did not
   understand its purpose exactly -- it actually should have been put
   in the trial param block, because the user can change the position
   between trials. The value is entered at the start of the first
   trial of the trial-set, but it is probably wiser (after Summer 93)
   to make use of "sweep_frac" and "extent".

   Positive horizontal values mean the stimulus is to the right of the
   fixation LED; positive vertical values mean the stimulus is above
   the fixation LED.

4. Position of fixation LED.
   The horizontal value is the distance the LED is to the left of the
   left side of the video monitor; the vertical value is the distance
   the LED is below the bottom of the video monitor.

5. Eyetracker gain.
   These values and those in the next two fields are used to decipher
   the raw eye-position values stored in the analog data blocks. The
   gains are the actual analog voltages produced by the eyetracker (in
   mV) per minute of eye motion. These values are obtained empirically
   by training the subject to fixate at known targets. A more complete
   explanation of how to use these values is found in note 7.

6.
   Eyetracker digital "arb" definition
   This is the number of analog-to-digital converter arb values that
   correspond to 1 mV of analog input. This value is not obtained
   empirically, but rather from the specification of the
   analog-to-digital devices. It is entered in a menuline by the user.

7. Eyetracker digital "arb" zero
   This is the arb value that corresponds to a zero voltage input in
   either the horizontal or the vertical direction. We are currently
   using 12-bit A2Ds, whose output values range from 0 to 4095, with
   the center value of 2048 corresponding to the zero voltage.
   However, readers of these files should not assume that 2048 is
   correct, but use the value found in this field.

   Thus, the translation from the arb values we store to minutes of
   arc is:

     Horz angle = (raw horz value - arb zero) / (arb definition * horz gain)
     Vert angle = (raw vert value - arb zero) / (arb definition * vert gain)

   Here "arb zero" is this field, "arb definition" is the field of
   note 6 (arbs per mV), and the gains are the fields of note 5 (mV
   per minute of arc).

   ( These fields were once used to represent the gain values of an
   amplifier attached to the eye-tracker. That amplifier has never
   been used with the PC system, and never will be. But there were
   values of 4.22 in these variables before summer 93. )

7A. Stabilization flag:
      0   No stabilization
      1   Stabilization on every frame
      2   On even numbered frames (using eye position taken in odd
          frames)

8. Old stim type flags.
   These have been moved to the individual trial specs. However, since
   they did have important values in them, I have not recycled them --
   at least not yet.

9. Computer flag:
      0   Control PC
      1   Anal PC

10. Period of data acquisition clocks
    Note that these are periods, in msec. A value of .5 means 2000 Hz;
    a value of 50 means 20 Hz.

11. Start position.
    Unfortunately, the position stored here is not exactly consistent
    with the value stored under "visual field location" -- this one is
    the position of the center of the stimulus with respect to the
    lower-left corner (LLC) of the screen, rather than with respect to
    the fixation LED. Again, the positive direction is right and up.

12. Color code.
    Not in use right now.

13. Color phases.
    With sinusoidal-like stimuli, the default is for the center of the
    stimulus to be at a maximum. This is indicated by a phase value of
    0 for that R,G,B gun. A phase value of 180 means that the center
    is at a minimum for that R,G,B gun.

14. Contrast.
    The specification of patterned stimuli is based on the combination
    of the element & foreground colors, but what the user actually
    enters is a single CONTRAST value (applied to all 3 color guns).
    The element color parameters are then set automatically to

        (1 + contrast) * foreground_color

    The amplitude of the variation is

        contrast * foreground_color

    Contrast must be between 0 and 1, since a value greater than 1
    would yield a trough value of less than 0.

15. Spacing of Regular Stim in length direction
    This is the length plus the gap between.

16. Start of eye data acquisition.
    The trial has an official start time, corresponding to the
    beginning of the stimulus, or the time of its first change. Eye
    and spike data can start at some other time, however. Since the
    eye data are taken at equal intervals, the time of their start,
    their period (see Note 10), and their number provide all the
    timing information we need. Spike data, however, require both a
    start and an end time, as a period of time with no spikes leaves
    no record in the spike train, but constitutes significant
    information.

17. Timing Code: indicates validity of start & stop times.
    A bit is set to 1 if:
      Bit 0   Trial start signal was received
      Bit 1   Trial length was determined by Mstar sample count
      Bit 2   Trial end signal was received
      Bit 3   Spikes overflowed space allocated

    Note that if bit 0 is not set (if its value is 0), it is
    impossible to determine the relationship between the stimulus and
    the acquired data for the trial in question. We will probably
    discard such trials before they are stored; but this field allows
    the file spec to describe such trials, without making them look
    more valid than they are.

18. Stimulus motion type.
    Values:
      0   No motion
      1   Alternating, back and forth, continuous motion.
      2   Flashing, jumping by "extent" each time it turns on.
      3   Repeating -- continuous, but going back to the start each
          time

19. Stimulus pattern type.
    Values:
      0   Solid, tilted rectangle
      1   Sinusoidal pattern
      2   Gabor
      3   Sixth derivative of gaussian.
      4   Regular textured pattern
      5   Random (not yet implemented)

20. Eyes that were stimulated (not covered).
    Values:
      0   No eyes: i.e. complete darkness
      1   Left only
      2   Right only
      3   Both eyes
      4   Not recorded which one(s) were used. Presumably both.

21. Sweep fraction
    This field is defined for any moving stimulus. It indicates the
    location of the stimulus at the place that we are interested in.
    The stimulus usually starts elsewhere and sweeps through this
    location. The value is the fraction of the "extent" that separates
    the start location from the location of interest.

    Actually, the "Visual field location" ought to indicate this
    location, but it is too hard to compute this value from it when
    graphing, etc. Besides, the "visual field location" is perhaps
    unreliably recorded.

    This field was not stored before February 94. Trials recorded Sept
    93 have this field, but don't depend on it.

22. Flag for shape triggering method:
      -1 = no shape values recorded
       0 = use level detector
       1 = use level detector's reference voltage
       2 = use menu determined reference voltage

&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
DATA FILE FILENAMES
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&

The filenames used by data files contain information about the trials
whose data are included in the files. This is done by dividing the
file names into several fields. Some of these fields contain
information about the trial; others provide a unique name for each
file and the trials it includes.
Here is how the filename base is divided up:

      1       2       3       4        5        6     7     8
  |-------'-------'-------'-------|----------|-----'-----'-----|
  | Year    Month   Day     Day   | Stimulus |  Serial number  |
  |-------,-------,-------,-------|----------|-----,-----,-----|

Here is the file extension:

        1           2          3
  |-----------|----------'----------|
  | File-type |  Number of trials   |
  |-----------|----------,----------|

The first 4 characters of the base name give the day the trials were
run. Through Summer '93, this discipline was not maintained by the
computer; these characters were entered by the user. To the program,
they are just a 4 character string.

The first character is the last digit of the year. The second
character is '1' through '9' for January through September, 'A'
through 'C' for October, November, December. The third & fourth
characters are the day of the month.

Character # 5 indicates the type of stimulus that was used for the
trials in the file. The encoding is:

    Unknown:        _
    Steady:         S
    Flashing:       F
    Back & forth:   A   (for Alternating)
    Repeating:      R   (back to the start each time)

Characters 6 - 8 are a file serial number for the day. This number can
be set by the user, but it is better to rely on the auto-incrementation
that takes place each time a data file is written.

The first character of the extension indicates the computer that made
the file: 'A' for the Anal computer; 'C' for the Control computer. The
letter 'R' (for raw-data) was used before Sept 93. Other letters can
be used to indicate files produced by analysis software which might
still want to use the same base name. The only such defined letter is
'H' for "human-readable" dump file. The encoding is:

    Data file, control computer:   C
    Data file, anal computer:      A
    Pre-Sept-93 raw-data file:     R
    Human-readable dump:           H

The last 2 extension characters give a decimal representation of the
number of trials stored in the file.
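
The filename fields described above can be decoded mechanically. The
following Python sketch is only an illustration: the function name,
the dictionary keys, and the example filename "4B07A012.C10" are all
made up for this purpose. It assumes the date discipline was followed,
which (as noted above) was not guaranteed through Summer '93.

```python
MONTH_CODES = "123456789ABC"  # '1'..'9' = Jan..Sep, 'A'..'C' = Oct..Dec

STIMULUS_CODES = {
    "_": "unknown",
    "S": "steady",
    "F": "flashing",
    "A": "alternating",
    "R": "repeating",
}

FILE_TYPE_CODES = {
    "C": "data file, Control computer",
    "A": "data file, Anal computer",
    "R": "raw-data file (before Sept 93)",
    "H": "human-readable dump",
}


def parse_data_filename(filename):
    """Split an 8.3 data filename into the fields described above."""
    base, _, ext = filename.upper().partition(".")
    if len(base) != 8 or len(ext) != 3:
        raise ValueError("expected an 8-character base and 3-character extension")
    return {
        "year_last_digit": int(base[0]),
        "month": MONTH_CODES.index(base[1]) + 1,  # ValueError if not a month code
        "day": int(base[2:4]),
        "stimulus": STIMULUS_CODES.get(base[4], "unrecognized"),
        "serial": base[5:8],  # kept as text; the user may set it by hand
        "file_type": FILE_TYPE_CODES.get(ext[0], "other"),
        "trial_count": int(ext[1:3]),
    }


# Hypothetical example: 7 November of a year ending in 4, alternating
# stimulus, serial 012, Control computer, 10 trials.
print(parse_data_filename("4B07A012.C10"))
```

Only the last digit of the year is stored, so the decade has to come
from elsewhere, for example the Date field of the file specification
block.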
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
RUN-TIME DATA STRUCTURE
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&

THE GLOBAL TRIAL-SET

At run-time, there is one trial-set data structure in the program.
This corresponds to a trial-set data file; the data and parameter
blocks are identical, only some of the header blocks are different.
This internal format is a C structure; it contains an image of the
file specification block, and pointers to the comment block and to the
separate trials. The individual trial data structures each contain a
pointer to this trial-set specification block, so that access to a
trial provides access to the trial-set specification.

The trials are numbered from 1 on, to the last trial (= COUNT). No
gaps exist; all trials from 1 to COUNT-1 must have valid data; the
last trial may be empty while it is obtaining data. (C array indexes
start at number 0, so trial # 4 is tr[3] inside the array.) Trials are
obtained from the trial manager module (in TRM.C) by a function call;
only the manager should know where the trials actually reside.

TRIAL-SET STATUS

The trial-set and each trial it contains hold a flag indicating their
current status. The particular status of the trial-set constrains the
possible status values of the contained trials:

  Trial-set   Significance:                 Possible trial flag values:
  flag:

  EMPTY:      No trials in memory           (Not Applicable)

  FRESH:      Data-acquisition going on.    All trials < count are FRESH.
                                            Last trial is FRESH or EMPTY.

  FILLED:     Old data, a copy of which     All trials < count are FILLED.
              exists on the disk.           Last trial is EMPTY at certain
                                            moments during file loading.

  INVALID:    Does not contain real data.   Always INVALID

TRIAL DATA STRUCTURE

Each trial contains its own parameter block, containing information
about experimental stimuli, etc. There is also a header, containing
the size of the various data blocks, etc.
The acquired data are in dynamically allocated data blocks; the trial
data structure holds pointers to these blocks.

THE CURRENT TRIAL

When there are trials loaded into the trial-set, one of them is the
"current trial". During experiments, while a trial is being run, the
current trial is the last trial in the trial-set. As stated above, all
the lower-numbered (earlier) trials must be FRESH; the current trial
is EMPTY until after the trial is run and the trial structure is
complete.

When the trial-set is FILLED, the choice of current trial can be
adjusted, selecting various trials for graphing or analysis.

The trials are indicated by their own serial number inside the file.
The count starts at 1. The technique of producing trial files forces
the order of trial data in the file to be the same as the temporal
order in which the trials were run.

The trial flag value INVALID is used for any other trial structures
that might be in the program, for example, one holding stimulus values
to produce a human-readable output of a particular set of them.
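
For readers who work from the disk files rather than the run-time
structures, the FILE HEADER layout listed earlier in this document can
be read as in the following Python sketch. Several things here are
assumptions, not facts stated in this file: that the values are
little-endian (the files were written on PCs), that SHORT and LONG are
signed 2-byte and 4-byte integers, and that the fields are packed with
no padding. The function name and dictionary keys are hypothetical.

```python
import struct

SEPARATOR = 0x77777777  # ASCII "wwww"
FIXED_PART = "<hlhhhh"  # ASSUMPTION: little-endian, packed, signed values


def read_file_header(buf):
    """Parse the FILE HEADER from the first bytes of a data file."""
    (version, file_length, header_length,
     n_spec_blocks, n_trials, comment_length) = struct.unpack_from(FIXED_PART, buf, 0)
    offset = struct.calcsize(FIXED_PART)

    # One SHORT per global specification block (count # 1).
    spec_lengths = struct.unpack_from("<%dh" % n_spec_blocks, buf, offset)
    offset += 2 * n_spec_blocks

    # One LONG per trial: byte offset of the trial from the file start (count # 2).
    trial_offsets = struct.unpack_from("<%dl" % n_trials, buf, offset)
    offset += 4 * n_trials

    # Every block is followed by a separator; use it as a sanity check.
    (separator,) = struct.unpack_from("<L", buf, offset)
    if separator != SEPARATOR:
        raise ValueError("separator not found after file header")

    return {
        "version": version,
        "file_length": file_length,
        "header_length": header_length,
        "comment_length": comment_length,
        "spec_block_lengths": spec_lengths,
        "trial_offsets": trial_offsets,
    }


# Synthetic header for illustration only (the numbers are made up).
demo = (struct.pack(FIXED_PART, 2, 1000, 24, 1, 2, 0)
        + struct.pack("<h", 100)
        + struct.pack("<2l", 200, 600)
        + struct.pack("<L", SEPARATOR))
print(read_file_header(demo))
```

The separator check is the cheapest way to catch a wrong assumption
about byte order or packing: if either is wrong, the separator will
almost certainly not be found where expected.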