Running QA Routines¶
Important
Make sure r2r-ctd is talking to Docker; see Docker in the Installing instructions.
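One quick way to verify this (assuming the docker CLI is on your PATH) is to ask the daemon for its status:
docker info
If that prints server details rather than an error, r2r-ctd should be able to reach docker as well.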
Basic Usage¶
Given an R2R CTD breakout, run the QA routines by executing:
uvx r2r-ctd qa <path_to_breakout>
Multiple breakouts can be specified and they will be processed in sequence:
uvx r2r-ctd qa <path_to_breakout1> <path_to_breakout2>
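If the breakouts share a common parent directory, a shell glob can expand them all (the path below is illustrative; every matched entry must itself be a breakout directory):
uvx r2r-ctd qa /path/to/breakouts/*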
Important
Almost all crashes are considered bugs and should be reported/fixed.
With the exception of an invalid breakout structure, where the xmlt and manifest files are missing or malformed, the QA processing should not throw or crash on invalid input files; the invalidness should be reported in the QA xml report itself.
If the xmlt and manifest files are malformed or missing, something has gone wrong on the R2R side that needs to be investigated.
Tip
It is always safe to interrupt/kill the Python process with control + c and restart the QA process. Intermediate results are cached extensively, so the QA process should quickly catch up to where it left off.
Switches¶
Quiet -q¶
The verbosity of the logging can be controlled by adding one or more -q flags after r2r-ctd but before the qa subcommand.
uvx r2r-ctd -q qa <path_to_breakout>
Only prints log messages of level INFO or greater.
Each q reduces the log verbosity by one level from the default, which is DEBUG.
uvx r2r-ctd -qq qa <path_to_breakout>
Only prints log messages of level WARNING or greater.
Skip CNV generation --no-gen-cnvs¶
Generating the cnv products is not necessary for the QA routines, and it is also computationally expensive.
Adding --no-gen-cnvs will skip generating these files:
uvx r2r-ctd qa --no-gen-cnvs <path_to_breakout>
Warning
In testing and development, the underlying Seabird software programs would occasionally fail to exit while producing the cnv products. There would be no open GUI windows, and I have been unable to find logs or debug information about what might be causing this.
It is safe to kill (control + c) and restart the QA process when this occurs. Kill the Python program, not the docker container; the container should clean itself up when Python exits.
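If you want to confirm the stuck container was cleaned up after killing the Python process, listing the running containers is a quick check (container names will vary):
docker ps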
Breakout Structure¶
When R2R receives data from a cruise, it is split up into separate collections called “breakouts”.
To be processed, the breakout is expected to be a directory with contents, not an archive such as a zip file.
r2r-ctd does not interact with remote systems and makes no assumptions about how to obtain the breakouts or where to put the QA results.
The R2R CTD breakout must have the following structure, which almost follows the BagIt standard[1].
This section will follow the nomenclature in the BagIt terminology section.
The starting / will refer here to the root of the breakout.
- A /manifest-md5.txt payload manifest, containing a list of md5 file hashes and the relative paths to the files corresponding to those hashes. Only md5 is supported by r2r-ctd at this time.
- A /data payload directory containing the data files that will be checked.
- A /qa tag directory containing at a minimum a *_qa.2.0.xmlt tag file that conforms to the R2R QA 2.0 Schema. The prefix of this xml file is probably some combination of cruise name and breakout id; however, this is not too important, only that exactly one file matches this pattern.
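For illustration only, a minimal breakout might look something like this (the file name prefixes, station file names, and hash are all hypothetical):
<breakout_root>/
    manifest-md5.txt
    data/
        ar12345_001.hex
        ... (the other per-station files described in the API documentation)
    qa/
        ar12345_qa.2.0.xmlt
Each line of manifest-md5.txt pairs an md5 hash with a path relative to the breakout root, for example:
0123456789abcdef0123456789abcdef  data/ar12345_001.hex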
While the BagIt spec requires all the actual content to be in the /data directory, r2r-ctd just uses the paths inside the manifest-md5.txt file and does not validate that the breakout conforms to the BagIt specification.
The details of which cruise-specific files are looked for within the /data directory are in the API documentation.
Specifically, see r2r_ctd.breakout.Breakout.stations_hex_paths for what is considered a station[2], and r2r_ctd.checks.check_three_files for what each station is expected to have.
QA Template File: *_qa.2.0.xmlt¶
This xml file is the “template” that both gets updated with the results of the QA routines and contains some of the metadata that the breakout files are tested against, specifically the cruise start/end dates and the bounding box.
QA Results¶
Several result files are produced along with some processing state files.
Everything r2r-ctd generates will be placed into a /proc directory[3].
Inside this /proc directory are several other directories:
- /proc/nc has netCDF files containing all the “state” of the QA routines; this includes test results and derived files. These netCDF files are an implementation detail and their contents can be ignored unless things are going really wrong. These files can be safely deleted, but that removes the “cache” of the QA results for each cast. Do not modify these files.
- /proc/qa will have the qa results:
  - If the QA routines finished, a *_qa.2.0.xml will be present (note the lack of t in the file extension), updated with results.
  - A *_ctd_metdata.geoCSV file should be present.
  - A /proc/qa/config directory containing the instrument configuration report text files.
- /proc/products/r2rctd will have all the generated cnv files (2 per cast) if the --no-gen-cnvs switch was not provided.
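Putting that together, the output for a processed breakout might look roughly like this (the file name prefixes are hypothetical and the per-cast file counts will vary):
proc/
    nc/
        ... (internal QA state, roughly one netCDF file per cast; safe to delete, do not edit)
    qa/
        ar12345_qa.2.0.xml
        ar12345_ctd_metdata.geoCSV
        config/
            ... (instrument configuration report text files)
    products/
        r2rctd/
            ... (generated cnv files, 2 per cast, unless --no-gen-cnvs was used)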
Presumably, the contents of /proc, excluding the nc sub-directory, can be rsync-ed back to the R2R server (without the --delete switch).
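A sketch of what that transfer could look like, assuming a hypothetical remote host and destination path:
rsync -av --exclude 'nc/' <path_to_breakout>/proc/ user@r2r-server:/path/to/breakout/proc/
The trailing slashes copy the contents of proc/ rather than the directory itself, and --exclude keeps the nc cache local.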
Parallel Processing¶
Since docker provides reasonable process isolation for the Windows based conversion tools, it is possible to have multiple container instances running the Seabird software in parallel.
This is most simply done by having multiple terminal sessions open and running the basic usage commands above on a single breakout in each session.
In the same session you could also use something like xargs to parallelize, but the emitted log messages will be interleaved, making it difficult to follow what is going on.
In general, you’ll want to limit the number of parallel processes to the number of physical cores in your CPU; on Apple ARM hardware, further limit this to the number of performance cores your machine has.
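For example, a sketch using xargs (the breakout paths are illustrative; match -P to your performance core count):
printf '%s\n' /path/to/breakouts/* | xargs -n 1 -P 4 uvx r2r-ctd qa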
To see how many performance cores are present on an M-family mac, you can use the system_profiler
command:
system_profiler SPHardwareDataType
Look for the line that says: Total Number of Cores:
In parentheses it should have the breakdown between performance and efficiency cores.
For example, the baseline M4 MacBook Air has 10 cores but only 4 are performance cores, so the number of parallel processes should reasonably be kept to 4:
Total Number of Cores: 10 (4 performance and 6 efficiency)
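Alternatively, on Apple Silicon the performance core count should be readable directly with sysctl (assuming a reasonably recent version of macOS):
sysctl -n hw.perflevel0.physicalcpu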