r2r_ctd.breakout¶
Convenience classes for interacting with a single breakout directory.
Has one main class: Breakout.
The other two classes, BBox and Interval, are in here because they are properties of the cruise of that breakout.
Attributes¶
Classes¶
Module Contents¶
- r2r_ctd.breakout.logger¶
- class r2r_ctd.breakout.BBox¶
Bases: NamedTuple
namedtuple to represent a geo bounding box
The coordinates are in westernmost, southernmost, easternmost, northernmost order (see the GeoJSON spec). The zonal coordinates are assumed to be in [-180, 180], i.e. there is a discontinuity that causes a bounding box crossing the antimeridian to have a westernmost coordinate that is larger than its easternmost coordinate.
No actual bounds checks are done when instantiating.
- w: float¶
- s: float¶
- e: float¶
- n: float¶
- contains(lon: float, lat: float) bool ¶
given a lon/lat pair, determine if it is inside the bounding box represented by this instance
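A minimal sketch of how such a bounding box check could look. The field names and order come from the description above; the inclusive edges and the handling of a box with w > e as one that wraps the antimeridian are assumptions, not confirmed behavior of the real contains() method. The example coordinates are made up.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    """Geo bounding box in GeoJSON order: west, south, east, north."""

    w: float
    s: float
    e: float
    n: float

    def contains(self, lon: float, lat: float) -> bool:
        # Assumption: all four edges are inclusive.
        if not (self.s <= lat <= self.n):
            return False
        if self.w <= self.e:
            # Ordinary box that does not cross the antimeridian.
            return self.w <= lon <= self.e
        # Assumption: w > e means the box wraps across +/-180 degrees.
        return lon >= self.w or lon <= self.e


# Illustrative box straddling the antimeridian (made-up coordinates).
pacific = BBox(w=170.0, s=-10.0, e=-170.0, n=10.0)
inside = pacific.contains(179.5, 0.0)
outside = pacific.contains(0.0, 0.0)
```

Because no bounds checks are done on instantiation, a box like the one above is accepted as-is; only the comparison logic decides how it is interpreted.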
- class r2r_ctd.breakout.Interval¶
Bases: NamedTuple
namedtuple to represent a temporal interval
- dtstart: datetime.datetime¶
- dtend: datetime.datetime¶
- contains(dt: datetime.datetime) bool ¶
Given a datetime object, determine if it is inside the interval represented by this instance
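A minimal sketch of a temporal interval with a containment check. The field names match the ones listed above; treating the interval as closed on both ends is an assumption, and the dates are illustrative only.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class Interval(NamedTuple):
    """Temporal interval with a start and an end datetime."""

    dtstart: datetime
    dtend: datetime

    def contains(self, dt: datetime) -> bool:
        # Assumption: both endpoints are included in the interval.
        return self.dtstart <= dt <= self.dtend


# Illustrative dates, not taken from any real cruise.
cruise = Interval(
    dtstart=datetime(2018, 6, 1, tzinfo=timezone.utc),
    dtend=datetime(2018, 6, 20, tzinfo=timezone.utc),
)
during = cruise.contains(datetime(2018, 6, 10, 12, tzinfo=timezone.utc))
after = cruise.contains(datetime(2018, 7, 1, tzinfo=timezone.utc))
```

Note that Python refuses to compare timezone-aware and naive datetime objects, so all three values in a comparison need to agree on awareness.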
- class r2r_ctd.breakout.Breakout¶
Convenience wrapper for manipulating the various Paths of the r2r breakout
This class is also responsible for some of the more basic checks/functions:
Is the manifest-md5.txt ok
Filtering out the “deck test” looking paths
Getting the qa xml template
Extracting some information from said template
This class also keeps track of the various state netCDF files and the open Dataset objects. Access to files in the breakout (not proc dir) should always go through an instance of this class.
- path: pathlib.Path¶
Path to the breakout itself; this is set when instantiating a Breakout
- property manifest_path: pathlib.Path¶
The Path of the manifest-md5.txt file in this breakout
- property manifest: str¶
Reads the manifest file and returns its contents as a string
- property manifest_dict: dict[pathlib.Path, str]¶
Transforms the manifest file into a dict containing file path to file hash mappings
- property manifest_ok: bool¶
Iterate over the manifest and check all the file hashes against the files in the breakout
In an actual bag-it bag, it would be an error for extra stuff to be in the data directory. For example, a .DS_Store file if you looked at the breakout data directory on a mac. This ignores anything not in the manifest file.
This returns True only if all the files in the manifest are present and their md5 hashes match.
This is one of the checks that goes into the stoplight report.
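The manifest check described above can be sketched roughly as follows. This is not the library's implementation: the helper names parse_manifest and check_manifest are hypothetical, and the manifest line layout ("<md5 hex> <relative path>", as in BagIt manifests) is an assumption. Files that are not listed in the manifest, such as a stray .DS_Store, are simply never looked at.

```python
import hashlib
import tempfile
from pathlib import Path


def parse_manifest(text: str) -> dict[Path, str]:
    """Map each relative file path to its expected md5 hex digest."""
    entries: dict[Path, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Assumption: each line is "<md5 hex> <relative path>".
        digest, name = line.split(maxsplit=1)
        entries[Path(name.strip())] = digest.lower()
    return entries


def check_manifest(root: Path, manifest: dict[Path, str]) -> bool:
    """True only if every listed file exists and its md5 matches."""
    for rel_path, expected in manifest.items():
        file_path = root / rel_path
        if not file_path.is_file():
            return False
        if hashlib.md5(file_path.read_bytes()).hexdigest() != expected:
            return False
    return True


# Build a throwaway directory to exercise the check.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "data").mkdir()
    (root / "data" / "cast001.hex").write_bytes(b"example payload")
    (root / "data" / ".DS_Store").write_bytes(b"not in the manifest")
    digest = hashlib.md5(b"example payload").hexdigest()
    manifest = parse_manifest(f"{digest}  data/cast001.hex\n")
    ok_result = check_manifest(root, manifest)
    # Changing a listed file makes the check fail.
    (root / "data" / "cast001.hex").write_bytes(b"corrupted")
    corrupted_result = check_manifest(root, manifest)
```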
- property hex_paths: list[pathlib.Path]¶
Get all the paths that look like raw hex files
This is roughly equivalent to the create_stations_from_raw in the orig processing scripts. Instead of walking the dir, we will just check the paths generated by the manifest.
WHOI Divergence
The original would also try to load/open .hex.gz and .dat.gz files; this is not yet supported by the underlying odf.sbe reader. The underlying odf.sbe reader also probably cannot read .dat files, but I’ve never seen one.
- property deck_test_paths: list[pathlib.Path]¶
Returns a list of paths that match the is_deck_test() check
- property stations_hex_paths: list[pathlib.Path]¶
Return a list of hex paths that are not deck tests, i.e. is_deck_test() is False for these paths. For the purposes of QC, these are the set of stations to operate on.
- property qa_template_path: pathlib.Path¶
Get the file named <cruise_id>_<fileset_id>_qc.2.0.xmlt from the breakout and return its path
- property qa_template_xml: lxml.etree._ElementTree¶
Parse the XML document located at Breakout.qa_template_path
This template is where we will get the temporal and spatial bounds for the cruise. It is also the template that gets modified with the results of the QA routines.
- property namespaces: dict[str, str]¶
Get the XML namespaces from the XML document located at Breakout.qa_template_path
These namespaces are then filtered to omit the default (None) namespace. R2R uses an r2r namespace in its XML documents. To make working with it easier, the r2r namespace is extracted from the template. This namespace mapping needs to be passed to the various xpath or find methods of the lxml.etree._ElementTree.
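A rough sketch of the filtering and lookup pattern. To stay runnable without third-party packages it uses the standard library's xml.etree.ElementTree instead of lxml, and the nsmap dict is a hand-written stand-in for the kind of mapping lxml exposes on an element. The namespace URIs, the document, and the date value are all hypothetical; only the node names come from the text in this section.

```python
import xml.etree.ElementTree as ET

# Hypothetical template fragment; the URIs are placeholders.
XML = """\
<r2r:qareport xmlns="https://example.org/default"
              xmlns:r2r="https://example.org/r2r">
  <r2r:filesetinfo>
    <r2r:cruise>
      <r2r:depart_date>2018-06-01</r2r:depart_date>
    </r2r:cruise>
  </r2r:filesetinfo>
</r2r:qareport>
"""

# Stand-in for an lxml-style nsmap, where the default namespace has the key None.
nsmap = {
    None: "https://example.org/default",
    "r2r": "https://example.org/r2r",
}

# Drop the default (None) entry so only named prefixes remain.
namespaces = {prefix: uri for prefix, uri in nsmap.items() if prefix is not None}

root = ET.fromstring(XML)
# The prefix mapping has to be passed to every prefixed lookup.
depart = root.find("r2r:filesetinfo/r2r:cruise/r2r:depart_date", namespaces)
depart_text = depart.text if depart is not None else None
```

Filtering out the None key matters because a prefix-to-URI mapping used in XPath expressions cannot carry an unnamed prefix.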
- property cruise_id: str | None¶
Extracts the cruise id from the XML document located at Breakout.qa_template_path
The cruise id is the string that looks like RR1806 or TN336
- property fileset_id: str | None¶
Extracts the fileset id from the XML document located at Breakout.qa_template_path
The fileset id appears to only be numeric and appears as part of the doi for this breakout.
- property bbox: BBox | None¶
The bounding box of the cruise in GeoJSON bbox format/order: w, s, e, n
This extracts the bounding box from the QA template's qareport/filesetinfo/cruise/extent nodes.
WHOI Divergence
The original code expanded the breakout bounding box by 0.0002 in each direction to “avoid a rounding problem”
- Returns:
a BBox instance if a bounding box could be extracted from the XML, else None
- property temporal_bounds: Interval | None¶
The temporal bounds of the cruise in start, stop order.
This extracts the temporal bounds from the QA template's qareport/filesetinfo/cruise depart_date and arrive_date nodes. Because this uses a time-aware datetime object, a day is added to the end date to ensure any bounds check includes the entire end day.
WHOI Divergence
The original WHOI code would also pad the start with a day.
- Returns:
an Interval instance if temporal bounds could be extracted from the XML, else None
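The end-of-day padding can be illustrated with a short sketch. The helper name bounds_from_dates is hypothetical, and both the ISO 8601 date strings and the choice of UTC are assumptions about the template contents; the dates themselves are made up.

```python
from datetime import datetime, timedelta, timezone


def bounds_from_dates(depart: str, arrive: str) -> tuple[datetime, datetime]:
    """Turn two date strings into (start, end) with the end padded by a day."""
    # Assumption: dates are ISO 8601 (YYYY-MM-DD) and interpreted as UTC.
    start = datetime.fromisoformat(depart).replace(tzinfo=timezone.utc)
    end = datetime.fromisoformat(arrive).replace(tzinfo=timezone.utc)
    # A bare date parses to midnight, so without padding the end day
    # itself would fall outside the interval.
    return start, end + timedelta(days=1)


start, end = bounds_from_dates("2018-06-01", "2018-06-20")
# A cast late on the arrival day is still inside the padded bounds.
last_cast = datetime(2018, 6, 20, 23, 30, tzinfo=timezone.utc)
covered = start <= last_cast <= end
```

Unlike the original WHOI behavior noted above, the start is left unpadded here.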
- __getitem__(key)¶
- __iter__()¶