The uow.json File
Introduction
A uow.json file must be created in the root of your uow directory.
The file provides the instructions needed to make the correct API calls to accomplish the commit.
It is more explicit than having files “in the right place” or having a program guess the data type by reading the file extension.
At the time of writing, the uow.json file is not generated automatically.
This chapter describes the uow.json file and the reasoning behind its contents.
This is a blank uow.json:
 1 {
 2   "files": [
 3   ],
 4   "processing_note": {
 5     "date": "",
 6     "data_type": "",
 7     "action": "",
 8     "summary": "",
 9     "name": "",
10     "notes": ""
11   }
12 }
It has two required elements: an array of files (lines 2-3) and a processing note object (lines 4-11).
These are stored under the files and processing_note keys, respectively.
No other elements are allowed at the root level.
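The root-level rule is simple enough to check mechanically before anything else happens. The following is a minimal Python sketch, not part of the uow tool; the name check_root is hypothetical. It only checks that exactly the two required keys are present with the expected JSON types.

```python
import json

# The only two keys allowed (and both required) at the root of uow.json.
REQUIRED_ROOT_KEYS = {"files", "processing_note"}


def check_root(uow: dict) -> None:
    """Raise ValueError unless the root has exactly the required keys and types."""
    keys = set(uow)
    missing = REQUIRED_ROOT_KEYS - keys
    extra = keys - REQUIRED_ROOT_KEYS
    if missing:
        raise ValueError(f"missing root keys: {sorted(missing)}")
    if extra:
        raise ValueError(f"unexpected root keys: {sorted(extra)}")
    if not isinstance(uow["files"], list):
        raise ValueError("'files' must be an array")
    if not isinstance(uow["processing_note"], dict):
        raise ValueError("'processing_note' must be an object")


# The blank uow.json from this section, inlined as a string.
BLANK = """{
  "files": [],
  "processing_note": {"date": "", "data_type": "", "action": "",
                      "summary": "", "name": "", "notes": ""}
}"""
check_root(json.loads(BLANK))  # passes silently
```

For a real uow directory you would load the file instead, e.g. `with open("uow.json", encoding="utf-8") as fh: check_root(json.load(fh))`.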
Since the files array is more complicated, let’s discuss the processing_note first.
The Processing Note
Each commit includes a processing note, which gets attached to the cruise.
This note is described by the object under the processing_note root-level key.
The processing note object has the following required keys: date, data_type, action, summary, name, and notes.
No other keys are allowed.
They are as follows:
date
The date key contains a string with an ISO-8601 date in it. The format is YYYY-MM-DD, with zero-padded month and day. It can be set to any valid date. The recommended value is the commit date.
data_type
The data_type key contains a string, which may contain any valid Unicode characters. It is displayed under the “Data Type” field on the website. Recommended values are the parameters that were merged in, or “CrsRpt” in the case of documentation updates.
action
The action key contains a string, which may contain any valid Unicode characters. It is displayed under the “Action” field on the website. It is almost always set to “Website Update”.
summary
The summary key contains a string, which may contain any valid Unicode characters. It is displayed under the “Summary” field on the website. It should be a short description of what was done, for example, “Updated DOC, TDN, NUTS, bottle data online in all formats”.
name
The name key contains a string, which may contain any valid Unicode characters. It is displayed under the “Name” field on the website. It should be set to the name of the person doing the commit (or however they want to be represented on the website).
notes
The notes key contains a string, which may contain any valid Unicode characters. It is displayed under the “Note” field on the website in a <pre> tag (this means it will appear exactly as is). The notes field has special behavior if it starts with an @ character.
When the notes field starts with an @ character, the uow command will interpret the rest of the string as a path to a file. The file path is relative to the root of your uow directory. For example, if your processing notes are in a file called notes.txt, the notes key would contain "@notes.txt". The uow would then look for the notes.txt file and include its contents as the note. It is recommended that each line of the notes be less than 80 characters wide. This behavior was inspired by how the curl command works.
Warning
If not using a separate file for the notes, do not start the notes string with an @. Additionally, when not using a separate file for notes, do not manually write newline characters (\n).
Note
When designing the cruise JSON object, we faced the following limitations and tradeoffs when it came to actually storing notes.
JSON does not support multi-line strings, so how should multi-line history notes actually be stored? There were two options: store the notes as single lines with escaped newlines (\n) in them, or store the notes as an array of strings where each line of the note is a separate string in the array. Both had downsides, but the array representation was chosen for human readability.
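The processing note rules (exactly six string keys, a zero-padded YYYY-MM-DD date, and curl-style @path notes relative to the uow root) can be sketched in Python as below. This is an illustration, not the uow tool’s code: load_processing_note is a hypothetical name, and returning the notes as a list of lines is an assumption modelled on the array-of-strings storage described in the Note above.

```python
import re
import tempfile
from datetime import datetime
from pathlib import Path

# The six required processing_note keys; no others are allowed.
NOTE_KEYS = {"date", "data_type", "action", "summary", "name", "notes"}
# ISO-8601 calendar date with zero-padded month and day.
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def load_processing_note(note: dict, uow_root: Path) -> dict:
    """Validate a processing_note and resolve an "@path" notes value.

    Returns a copy whose "notes" value is a list of lines (an ASSUMPTION
    modelled on the array-of-strings storage of the cruise JSON object).
    """
    if set(note) != NOTE_KEYS:
        raise ValueError(f"processing_note keys must be exactly {sorted(NOTE_KEYS)}")
    for key, value in note.items():
        if not isinstance(value, str):
            raise ValueError(f"{key!r} must be a string")
    # The regex rejects "2015-5-14"; strptime rejects impossible dates like "2015-02-30".
    if not ISO_DATE.fullmatch(note["date"]):
        raise ValueError("date must be zero-padded YYYY-MM-DD")
    datetime.strptime(note["date"], "%Y-%m-%d")

    notes = note["notes"]
    if notes.startswith("@"):
        # As with curl, "@" means: read the rest as a file path,
        # here relative to the root of the uow directory.
        notes = (uow_root / notes[1:]).read_text(encoding="utf-8")
    return {**note, "notes": notes.splitlines()}


# Usage: a throwaway uow directory with a notes file (illustrative values).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "notes.txt").write_text("Merged bottle data.\nUpdated formats.\n", encoding="utf-8")
    note = {"date": "2015-05-14", "data_type": "Bottle", "action": "Website Update",
            "summary": "Updated bottle data", "name": "Jane Doe", "notes": "@notes.txt"}
    print(load_processing_note(note, root)["notes"])
    # ['Merged bottle data.', 'Updated formats.']
```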
The Files Array
The architectural changes to the CCHDO website allow for new functionality. One major new feature is the ability to have multiple files of the same “kind” in a cruise’s dataset. For example, there can now be two exchange bottle files online. This means certain actions which were previously implicit must now be stated explicitly. The files array contains objects with the information needed to construct the actions (API calls) of the commit.
File Array Objects
Each object in the files array represents a single file on which an action will be performed.
All file objects must contain file and action keys with string values.
The file value is the path to the file, relative to the uow directory root.
The action must be either the string new or merge.
Let’s start with the merge action.
The merge action
Here is a complete file object with the merge action:
1 {
2 "file":"0.existing_files/4126_BerPolarforsch2002433do.pdf",
3 "action":"merge"
4 }
The file path is specified under the file key on line 2.
The action, “merge”, is specified on line 3.
No other keys are needed or allowed.
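The per-object rules stated so far (string file and action values, an action of new or merge, and nothing but those two keys on a merge object) fit in a small check. This is a hedged Python sketch; check_file_object is a hypothetical helper, not a uow function.

```python
def check_file_object(obj: dict) -> str:
    """Check the keys every file object needs and return its action."""
    for key in ("file", "action"):
        if not isinstance(obj.get(key), str):
            raise ValueError(f"file object needs a string {key!r} value")
    action = obj["action"]
    if action not in ("new", "merge"):
        raise ValueError(f"action must be 'new' or 'merge', not {action!r}")
    # A merge object carries nothing beyond its path and action.
    if action == "merge" and set(obj) != {"file", "action"}:
        raise ValueError("merge file objects allow only 'file' and 'action'")
    return action


merge_obj = {"file": "0.existing_files/4126_BerPolarforsch2002433do.pdf", "action": "merge"}
print(check_file_object(merge_obj))  # merge
```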
Note
What will happen at commit time?
When the uow is committed, several actions occur.
The path listed in file will be checked for existence.
If the file exists, it will be hashed with sha256.
This hash will be searched for in the fetch log.
If a fetch event for this file is found, the id and other needed information are extracted to construct the PATCH request that will be emitted.
The API itself is then asked to ensure that the file already exists on the server.
If any of the above actions fail, the commit is aborted before any state-changing API calls are made.
Finally, for all the files with the merge action, an HTTP PATCH request is made which changes the file’s “role” to merged.
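The local part of these merge steps (existence check, sha256 hash, fetch log lookup, all done before any request is sent) can be sketched as follows. This is an illustration under loud assumptions: the fetch log is modelled as a HYPOTHETICAL dict from sha256 digests to records with an "id", the returned request description is hypothetical, and the server-side existence check is left out because its API is not described here.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex sha256 digest of a file, read in chunks so large archives are fine."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def plan_merge(obj: dict, uow_root: Path, fetch_log: dict) -> dict:
    """Describe the PATCH for one merge object without sending anything.

    fetch_log is a HYPOTHETICAL stand-in for the real fetch log: a dict that
    maps sha256 hex digests to records holding at least an "id".
    """
    path = uow_root / obj["file"]
    if not path.is_file():
        raise FileNotFoundError(path)
    record = fetch_log.get(sha256_of(path))
    if record is None:
        raise LookupError(f"{obj['file']} has no fetch event in the fetch log")
    # Hypothetical request description; the real endpoint is the uow tool's concern.
    return {"method": "PATCH", "file_id": record["id"], "body": {"role": "merged"}}


def plan_all_merges(files: list, uow_root: Path, fetch_log: dict) -> list:
    """Plan every merge first, so any failure aborts before a request is sent."""
    return [plan_merge(o, uow_root, fetch_log) for o in files if o["action"] == "merge"]
```

Planning everything up front mirrors the rule that a failure must abort the commit before any state-changing call is made.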
The new action
Committing files which do not currently exist in the system requires the new action to be specified.
There are two types of new files: one that replaces a file currently in the dataset, and one that does not replace anything (a completely new file).
To understand what the replaces key does, let’s first look at a completely new file.
Here is a complete file object with the new action:
1 {
2 "file":"1.new_files/ARK-XVII-1_06AQ20010619.txt",
3 "action":"new",
4 "data_format":"text",
5 "data_type":"documentation",
6 "role": "dataset"
7 }
As with a “merged” file, the path is specified by the file key on line 2.
The action, “new”, is specified on line 3.
A file object which does not have the replaces key must have these keys present: data_format, data_type, and role.
The data_format key
The data_format key is a string describing the format the data is actually in. Allowed values are:
exchange
This is data in exchange format. Both plain CSV files and zip archives containing exchange-formatted data should have this as the data format.
cf_netcdf
This is data in the “new” CCHDO CF netCDF format. These files should never be in a zip archive (on the site).
whp_netcdf
This is data in the default netCDF format CCHDO uses. The whp_ prefix distinguishes these files from netCDF files which may conform to some other standard, such as OceanSITES or CF. These files will almost always be zip archives.
woce
This is data in the legacy WOCE formats for bottle, CTD, and summary files. These can be either zip archives or plain (ASCII) text.
text
This is data which is simply plain (UTF-8) text. It is typically only used for the cruise report or other documentation.
pdf
Used exclusively for any PDF documentation.
The data_type key
The data_type key is a string which describes the kind of data this file contains. Allowed values are:
bottle
This file represents discrete bottle data.
ctd
This file represents the in situ continuous CTD data.
documentation
This file contains human-readable documentation.
summary
This file is a legacy WOCE sum file.
large_volume
This is a “large volume sample” file. It is usually in the woce data format.
trace_metals
This is a file containing (only) trace metal data. It is usually in the exchange data format. Trace metals typically occur on separate casts and tend to be kept separate from the bottle data.
The role key
The role key is a string which describes how the site should display the file on a cruise page. Allowed values are:
dataset
This file should be part of the main dataset. A file with the dataset role will appear in the “Dataset” section of the website AND be included in any bulk download actions.
unprocessed
An unprocessed file appears in the “Data as Received” section of the website; it will only be publicly available by going to the cruise page. This is the role given to user-submitted files to make them available as received.
merged
This file should be marked as merged; it will appear in the “Data as Received” section of the website. It can only be downloaded by going to the cruise page. This is the role given to user-submitted files which have been merged into the main dataset. It should also be given to files which were in the main dataset but were merged with another file.
hidden
Hidden is just that: the file will be hidden from everyone but staff, and it will only be accessible through the API.
residual
A residual file contains pressure levels that were removed from a file because they were missing values needed for CF conversion. Usual examples are lat, lon, time, or pressure having a fill value.
archive
Archive is the role that was given to the tar files which contain the legacy “data directory”. It will also be given to the archive containing extra files associated with a commit. Generally, this should not be set by users.
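The three enumerations above translate directly into lookup sets. This Python sketch (check_new_metadata and ALLOWED are hypothetical names) verifies a new file object that has no replaces key, using only the allowed values listed in this section.

```python
# Allowed values for the three metadata keys, as listed in this section.
ALLOWED = {
    "data_format": {"exchange", "cf_netcdf", "whp_netcdf", "woce", "text", "pdf"},
    "data_type": {"bottle", "ctd", "documentation", "summary",
                  "large_volume", "trace_metals"},
    "role": {"dataset", "unprocessed", "merged", "hidden", "residual", "archive"},
}


def check_new_metadata(obj: dict) -> None:
    """Verify data_format, data_type and role on a new object without replaces."""
    for key, allowed in ALLOWED.items():
        value = obj.get(key)  # None when the key is missing, which also fails
        if value not in allowed:
            raise ValueError(f"{key}={value!r} is not one of {sorted(allowed)}")


# The completely new file object from earlier in this section.
check_new_metadata({
    "file": "1.new_files/ARK-XVII-1_06AQ20010619.txt",
    "action": "new",
    "data_format": "text",
    "data_type": "documentation",
    "role": "dataset",
})
```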
Let’s now look at a file object which has the replaces key. Here is a complete file object:
{
"file":"1.new_files/06AQ20010619_do.pdf",
"action":"new",
"replaces":"0.existing_files/4126_BerPolarforsch2002433do.pdf"
}
This object still has new as the action, but lacks the data_format, data_type, and role keys.
The replaces key contains a string with a file path.
This path must also appear in the files array as a separate file object with the merge action.
When the replaces key is specified, the uow copies the data_format, data_type, and role values from the existing file and uses them for the new one.
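The choice between copying metadata from a replaced file and reading it from the object itself can be sketched as below. In this hedged Python illustration, resolve_metadata is a hypothetical helper and existing_metadata is a HYPOTHETICAL dict standing in for whatever metadata the system already holds for each existing file.

```python
METADATA_KEYS = ("data_format", "data_type", "role")


def resolve_metadata(obj: dict, files: list, existing_metadata: dict) -> dict:
    """Return the data_format, data_type and role to use for a new file object.

    existing_metadata is HYPOTHETICAL: a dict mapping each existing file's
    path to the metadata the system already holds for it.
    """
    replaced = obj.get("replaces")
    if replaced is None:
        # Completely new file: the object must carry the keys itself.
        return {key: obj[key] for key in METADATA_KEYS}
    # The replaced path must also be listed as its own merge object.
    if not any(f["file"] == replaced and f["action"] == "merge" for f in files):
        raise ValueError(f"{replaced} must also appear with the merge action")
    return {key: existing_metadata[replaced][key] for key in METADATA_KEYS}
```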
Note
What will happen at commit time?
All the file objects with the new action specified are verified to exist at the path specified by file.
These files are then hashed with sha256.
The replaces key is looked for; if it is present, the uow looks for a file object with the same path as the one in replaces.
If found, the data_format, data_type, and role values are copied from the file being replaced.
If the replaces key is not present, the data_format, data_type, and role keys are searched for.
Their values are verified to be one of the allowed values.
A new file JSON is constructed containing the needed metadata and the file itself, base64 encoded.
The API is asked to ensure the file DOES NOT already exist in the system.
If any of the above fail, the commit is aborted before any state-modifying API calls are made.
As the new files are POSTed to the API, new file IDs are returned; these are then used to attach the files to the cruise.
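The “new file JSON” step combines a sha256 hash, the base64-encoded file contents, and the verified metadata. The Python sketch below shows only that assembly. The function name build_new_file_json and the field names file_name, file_hash and file_body are HYPOTHETICAL; the actual JSON layout used by the uow tool and the API is not described here.

```python
import base64
import hashlib
from pathlib import Path


def build_new_file_json(obj: dict, uow_root: Path, metadata: dict) -> dict:
    """Assemble an illustrative JSON body for POSTing one new file.

    The field names are HYPOTHETICAL; only the ingredients (sha256 hash,
    base64-encoded contents, verified metadata) come from the steps above.
    """
    path = uow_root / obj["file"]
    if not path.is_file():
        raise FileNotFoundError(path)
    raw = path.read_bytes()
    return {
        "file_name": path.name,
        "file_hash": hashlib.sha256(raw).hexdigest(),
        # base64 turns arbitrary bytes (zip, PDF, netCDF) into JSON-safe text.
        "file_body": base64.b64encode(raw).decode("ascii"),
        **metadata,  # data_format, data_type and role
    }
```

Base64 is what lets binary formats such as zip archives travel inside a JSON document, at the cost of roughly a one-third size increase.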
The optional from key
Any file object which has the new action may also have an array of file path strings under the from key.
This key is intended to keep a record of which files were involved in creating the new file.
Examples include two or more files merged to create a new one, or even a zip archive which was simply split apart.
Here is an example of a file object containing a from key:
{
  "file": "1.new_files/33RR20050106_hy1.csv",
  "action": "new",
  "from": [
    "0.existing_files/2099_33RR20050106.exc.csv",
    "0.existing_files/271_33RR20050106_hy1.csv"
  ],
  "replaces": "0.existing_files/271_33RR20050106_hy1.csv"
}
The paths listed in the from key must also exist as separate file objects in the files array.
At commit time, the sha256 hashes of those files are simply added, under the file_sources key, to the file JSON being committed.
The paths in the from array can refer to both merged files and new files.
For example, a netCDF file created from a newly merged exchange file would have that exchange file as its from source.
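Turning a from array into the list of hashes stored under file_sources can be sketched in Python as follows. The helper name file_sources is hypothetical (it simply echoes the key name from the text); it checks that each source path is listed in the files array and hashes it with sha256.

```python
import hashlib
from pathlib import Path


def file_sources(obj: dict, files: list, uow_root: Path) -> list:
    """Return sha256 hex digests for the paths in an object's optional "from" array.

    Every listed path must also be a file object of its own (merge or new).
    """
    listed = {f["file"] for f in files}
    hashes = []
    for src in obj.get("from", []):
        if src not in listed:
            raise ValueError(f"{src} is in 'from' but not in the files array")
        hashes.append(hashlib.sha256((uow_root / src).read_bytes()).hexdigest())
    return hashes
```

An object without a from key yields an empty list, so the helper can be applied to every new file object unconditionally.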
Here is a complete uow.json example:
 1 {
 2   "files": [
 3     {"file": "0.existing_files/2099_33RR20050106.exc.csv",
 4      "action": "merge"
 5     },
 6     {"file": "0.existing_files/2671_33RR20050106_nc_hyd.zip",
 7      "action": "merge"
 8     },
 9     {"file": "0.existing_files/271_33RR20050106_hy1.csv",
10      "action": "merge"
11     },
12     {"file": "0.existing_files/528_LDEO_NGL_CliVarTritium4CCHDO_P16S.xlsx",
13      "action": "merge"
14     },
15     {"file": "0.existing_files/8297_33RR20050106hy.txt",
16      "action": "merge"
17     },
18     {"file": "1.new_files/33RR20050106_hy1.csv",
19      "action": "new",
20      "from": [
21        "0.existing_files/2099_33RR20050106.exc.csv",
22        "0.existing_files/271_33RR20050106_hy1.csv"
23      ],
24      "replaces": "0.existing_files/271_33RR20050106_hy1.csv"
25     },
26     {"file": "1.new_files/33RR20050106_nc_hyd.zip",
27      "action": "new",
28      "from": [
29        "1.new_files/33RR20050106_hy1.csv"
30      ],
31      "replaces": "0.existing_files/2671_33RR20050106_nc_hyd.zip"
32     },
33     {"file": "1.new_files/33RR20050106hy.txt",
34      "action": "new",
35      "from": [
36        "1.new_files/33RR20050106_hy1.csv"
37      ],
38      "replaces": "0.existing_files/8297_33RR20050106hy.txt"
39     }
40   ],
41   "processing_note": {
42     "date": "2015-05-14",
43     "data_type": "Bottle",
44     "action": "Merge",
45     "summary": "Tr Merged",
46     "name": "Andrew Barna",
47     "notes": "@00README.txt"
48   }
49 }
In the above example, the following has occurred:
Two submitted files were marked as merged (lines 3-4, 12-13).
Three files already in the dataset were replaced, so they were also marked as merged (lines 6-11, 15-16).
A new exchange bottle file (lines 18-25) is to be placed online; it was merged from the existing dataset file and a submitted file (lines 21-22). It replaces a file, so the metadata is taken from the old file (line 24).
A new netCDF bottle file (lines 26-32) was created from the new exchange file (lines 28-30). It replaces a file online, so the metadata is taken from the old file (line 31).
A new WOCE bottle file (lines 33-39) was created from the new exchange file (lines 35-37). It replaces a file online, so the metadata is taken from the old file (line 38).
The contents of the processing note (lines 41-48) are in a separate file, so the @path syntax is used (line 47).
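Because a uow.json like the one above ties objects together by path, a quick cross-reference pass can catch a replaces target that is not a merge object or a from path that is missing from the files array. This is a hedged Python sketch; cross_reference_problems is a hypothetical name, and it checks only those two rules from this section.

```python
import json


def cross_reference_problems(uow: dict) -> list:
    """List replaces/from paths that do not point at a suitable file object."""
    actions = {obj["file"]: obj["action"] for obj in uow["files"]}
    problems = []
    for obj in uow["files"]:
        if obj["action"] != "new":
            continue
        replaced = obj.get("replaces")
        if replaced is not None and actions.get(replaced) != "merge":
            problems.append(f"{obj['file']}: replaces {replaced}, which is not a merge object")
        for src in obj.get("from", []):
            if src not in actions:
                problems.append(f"{obj['file']}: from path {src} is not in files")
    return problems


# Trimmed from the complete example: one merge plus the netCDF file built from it.
example = json.loads("""{"files": [
  {"file": "0.existing_files/2671_33RR20050106_nc_hyd.zip", "action": "merge"},
  {"file": "1.new_files/33RR20050106_nc_hyd.zip", "action": "new",
   "from": ["1.new_files/33RR20050106_hy1.csv"],
   "replaces": "0.existing_files/2671_33RR20050106_nc_hyd.zip"}
]}""")
print(cross_reference_problems(example))
# One problem: the trimmed copy omits the hy1.csv object named in "from".
```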
Blank File Object Snippets
Here are some useful blank file objects for constructing the files array of a uow.json.
Blank Merge File
{
"file":"",
"action":"merge"
}
Blank New File Replacing
Without “from” array:
{
"file":"",
"action":"new",
"replaces":""
}
With “from” array:
{
"file":"",
"action":"new",
"from":[
""
],
"replaces":""
}
Blank New File
Without “from” array:
{
"file":"",
"action":"new",
"role":"",
"data_format":"",
"data_type":""
}
With “from” array:
{
"file":"",
"action":"new",
"from":[
""
],
"role":"",
"data_format":"",
"data_type":""
}