r2r_ctd.breakout
================

.. py:module:: r2r_ctd.breakout

.. autoapi-nested-parse::

   Convenience classes for interacting with a single breakout directory.

   Has one main class: :py:class:`Breakout`. The other two classes, :py:class:`BBox` and :py:class:`Interval`, are in here because
   they are properties of the cruise of that breakout.


Attributes
----------

.. autoapisummary::

   r2r_ctd.breakout.logger


Classes
-------

.. autoapisummary::

   r2r_ctd.breakout.BBox
   r2r_ctd.breakout.Interval
   r2r_ctd.breakout.Breakout


Module Contents
---------------

.. py:data:: logger

.. py:class:: BBox

   Bases: :py:obj:`NamedTuple`

   namedtuple to represent a geo bounding box

   The coordinates are in Westernmost, Southernmost, Easternmost, Northernmost order (see the GeoJSON spec).

   The zonal coordinates are assumed to be in [-180, 180], i.e. there is a discontinuity that would cause a bounding box
   crossing the antimeridian to have a westernmost coordinate that is larger than the easternmost.
   No actual bounds checks are done when instantiating.


   .. py:attribute:: w
      :type:  float


   .. py:attribute:: s
      :type:  float


   .. py:attribute:: e
      :type:  float


   .. py:attribute:: n
      :type:  float


   .. py:property:: __geo_interface__


   .. py:method:: contains(lon: float, lat: float) -> bool

      Given a lon/lat pair, determine if it is inside the bounding box represented by this instance



.. py:class:: Interval

   Bases: :py:obj:`NamedTuple`

   namedtuple to represent a temporal interval


   .. py:attribute:: dtstart
      :type:  datetime.datetime


   .. py:attribute:: dtend
      :type:  datetime.datetime


   .. py:method:: contains(dt: datetime.datetime) -> bool

      Given a datetime object, determine if it is inside the interval represented by this instance



.. py:class:: Breakout

   Convenience wrapper for manipulating the various Paths of the r2r breakout

   This class is also responsible for some of the more basic checks/functions:

   * Is the manifest-md5.txt ok
   * Filtering out the "deck test" looking paths
   * Getting the qa xml template
   * Extracting some information from said template

   This class also keeps track of the various state netCDF files and the open Dataset objects.
   Access to files in the breakout (not proc dir) should always go through an instance of this class.


   .. py:attribute:: path
      :type:  pathlib.Path

      Path to the breakout itself; this is set when instantiating a Breakout


   .. py:property:: manifest_path
      :type: pathlib.Path

      The Path of the manifest-md5.txt file in this breakout


   .. py:property:: manifest
      :type: str

      Reads the manifest file and returns its contents as a string


   .. py:property:: manifest_dict
      :type: dict[pathlib.Path, str]

      Transforms the manifest file into a dict containing file path to file hash mappings


   .. py:property:: manifest_ok
      :type: bool

      Iterate over the manifest and check all the file hashes against the files in the breakout

      In an actual bag-it bag, it would be an error for extra stuff to be in the data directory,
      for example a .DS_Store file left behind after looking at the breakout data directory on a mac.
      This check ignores anything not in the manifest file.

      This returns True only if all the files in the manifest are present and their md5 hashes match.
      This is one of the checks that goes into the stoplight report.


   .. py:property:: hex_paths
      :type: list[pathlib.Path]

      Get all the paths that look like raw hex files

      This is roughly equivalent to the create_stations_from_raw in the original processing scripts.
      Instead of walking the dir, we will just check the paths generated by the manifest.

      .. admonition:: WHOI Divergence
         :class: warning

         The original would also try to load/open .hex.gz and .dat.gz files; this is not supported
         by the underlying odf.sbe reader yet.
         The underlying odf.sbe reader also probably cannot read .dat files, but I've never seen one.


   .. py:property:: deck_test_paths
      :type: list[pathlib.Path]

      Returns a list of paths that match the :py:func:`.is_deck_test` check


   .. py:property:: stations_hex_paths
      :type: list[pathlib.Path]

      Return a list of hex paths that are not deck tests, i.e. :py:func:`.is_deck_test` is False for these paths.

      For the purposes of QC, these are the set of stations to operate on.


   .. py:property:: qa_template_path
      :type: pathlib.Path

      Get the file named __qc.2.0.xmlt from the breakout and return its path


   .. py:property:: qa_template_xml
      :type: lxml.etree._ElementTree

      Parse the XML document located at :py:obj:`Breakout.qa_template_path`

      This template is where we will get the temporal and spatial bounds for the cruise.
      It is also the template that gets modified with the results of the QA routines.


   .. py:property:: namespaces
      :type: dict[str, str]

      Get the XML namespaces from the XML document located at :py:obj:`Breakout.qa_template_path`

      These namespaces are then filtered to omit the default (None) namespace.

      R2R uses an r2r namespace in its XML documents. To make working with this easier, the r2r namespace is extracted from the template.
      This namespace needs to be passed to the various xpath or find methods of the lxml.etree._ElementTree.


   .. py:property:: cruise_id
      :type: str | None

      Extracts the cruise id from the XML document located at :py:obj:`Breakout.qa_template_path`

      The cruise id is the string that looks like RR1806 or TN336


   .. py:property:: fileset_id
      :type: str | None

      Extracts the fileset id from the XML document located at :py:obj:`Breakout.qa_template_path`

      The fileset id appears to be purely numeric and appears as part of the doi for this breakout.


   .. py:property:: bbox
      :type: BBox | None

      The bounding box of the cruise in GeoJSON bbox format/order: w, s, e, n

      This extracts the bounding box from the QA template's qareport/filesetinfo/cruise/extent nodes.

      .. admonition:: WHOI Divergence
         :class: warning

         The original code expanded the breakout bounding box by 0.0002 in each direction to "avoid a rounding problem"

      :returns: a BBox instance if a bounding box could be extracted from the XML, else None


   .. py:property:: temporal_bounds
      :type: Interval | None

      The temporal bounds of the cruise in start, stop order.

      This extracts the temporal bounds from the QA template's qareport/filesetinfo/cruise depart_date and arrive_date nodes.
      For the end date, because this uses a time-aware datetime object, a day is added to ensure any bounds check includes the entire end day.

      .. admonition:: WHOI Divergence
         :class: warning

         The original WHOI code would also pad the start with a day.

      :returns: an Interval instance if the temporal bounds could be extracted from the XML, else None


   .. py:method:: __getitem__(key)


   .. py:method:: __iter__()
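
The containment semantics described for :py:class:`BBox`, :py:class:`Interval` and the end-day padding of ``temporal_bounds`` can be illustrated with a small standalone sketch. This is not the code in ``r2r_ctd.breakout``: ``SketchBBox``, ``SketchInterval`` and ``padded_interval`` are hypothetical names, and both the antimeridian handling and the inclusive end points are assumptions made for illustration only.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class SketchBBox(NamedTuple):
    """Hypothetical stand-in for BBox: w, s, e, n in GeoJSON bbox order."""

    w: float
    s: float
    e: float
    n: float

    def contains(self, lon: float, lat: float) -> bool:
        # Latitude has no discontinuity, so a plain range check is enough.
        if not (self.s <= lat <= self.n):
            return False
        if self.w <= self.e:
            # Ordinary box: longitude lies between the west and east edges.
            return self.w <= lon <= self.e
        # Assumption: w > e means the box crosses the antimeridian, so a
        # longitude is inside if it is east of w or west of e.
        return lon >= self.w or lon <= self.e


class SketchInterval(NamedTuple):
    """Hypothetical stand-in for Interval."""

    dtstart: datetime
    dtend: datetime

    def contains(self, dt: datetime) -> bool:
        # Assumption: both end points are treated as inclusive.
        return self.dtstart <= dt <= self.dtend


def padded_interval(depart: datetime, arrive: datetime) -> SketchInterval:
    """Add one day to the end so the whole arrival day falls inside."""
    return SketchInterval(depart, arrive + timedelta(days=1))
```

With these definitions, a cast taken at noon on the arrival date falls inside ``padded_interval(depart, arrive)``, which is the effect the one-day padding is meant to achieve.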