Published

OGC Engineering Report

Logical data model for hydrographic data based on HY_Features concepts
David Blodgett Editor

Document number: 25-045
Document type: OGC Engineering Report
Document subtype:
Document stage: Published
Document language: English

License Agreement

Use of this document is subject to the license agreement at https://www.ogc.org/license



I.  Keywords

The following are keywords to be used by search engines and document catalogues.

hydrology, hydrofabric, WaterML

1.  Normative references

There are no normative references in this document.


2.  Subject

Hydrologic geospatial data are built through integrated analysis of a wide variety of geospatial datasets. The resulting datasets contain multi-scale networked features with numerous internal and external linkages to form what has come to be known as a “hydrofabric.” The “hydrofabric data model” (introduced and discussed here) defines logic for encoding hydrographic features, the cross-scale networked relationships between them, and a minimal but sufficient approach to external dataset linkages. With the contents of this report as context, the OGC Hydrology Domain Working Group could consider initiating a Standards Working Group activity to establish a logical and physical data model for concepts defined in WaterML 2: Part 3 — Surface Hydrology Features (HY_Features) — Conceptual Model.

3.  Executive Summary

This report describes the background and design of the “hydrofabric data model,” which defines logic for implementation of data schemas and software that deal with hydrologic geospatial data. As a “logical” data model, the hydrofabric data model specifies details necessary to support compatibility of data and software that satisfy diverse needs without unnecessarily restricting implementation details. The logic presented in this report is based on concepts defined in WaterML2 Part 3 Surface Hydrology Features Concepts and is designed to serve the needs of a range of hydroscience use cases.

Development of international community standards applicable to hydrofabrics began, prompted by the World Meteorological Organization Commission for Hydrology, in 2012 [5]. More than 10 years later, this report documents one aspect of a long-term research and development activity that traces its roots back that far.

This report describes terminology, use cases, and background as context preceding presentation of the logical model and discussion of its design. Three appendices document related data models, an example encoding of the hydrofabric data model, and an artificial schematic and tabular data example. The sections of the report are outlined in Clause 5.

Figure 1 — "Simplified Schematic of Hydrofabric Data."

Figure 1 illustrates most components of the hydrofabric data model. Annex C contains complete example attribute tables and additional details.

  • Feature identifiers prefixed with “fl-”: flowline features are linear representations of where water may flow and may or may not have an associated catchment. Dashed flowlines represent features such as headwater drainage pathways or side channels and do not have a defined catchment area. Solid flowlines represent channelized flow pathways or linear waterbodies and are (part of) the flowpath of a specific catchment.

  • Feature identifiers prefixed with “hl-”: hydrologic location features are points that lie along the network of flowlines.

  • Feature identifiers prefixed with “c-”: catchment features are polygons that encompass a unit of hydrology that performs both land surface (catchment area) and stream (flowpath) functions.

  • Feature identifiers prefixed with “wb-”: waterbody features are polygons that represent the extent of a 2D waterbody. They can relate to one or more flowlines that connect through them.

  • The six thick colored lines are mainstem flowpaths [1] which aggregate flowlines from a flow initiation location to a basin outlet. Mainstem features are composed of flowlines (every flowline is part of one and only one mainstem) and provide a minimal yet sufficient set of linear feature identifiers for dataset cross referencing. Said another way, a mainstem is the set of flowlines that connect where flow initiates in a basin to the basin’s outlet.

  • Flowpath features, not shown explicitly, connect the inflow to the outflow of a catchment and are aggregates of flowlines. A flowpath is the set of flowlines that connect the inlet of a catchment to the outlet of a catchment.

  • Dotted flowlines are within a catchment but not along its flowpath and solid blue flowlines are along a catchment’s flowpath.

  • Each flowpath in the hydrofabric data model can have one or more “type” attributes although no type list is provided in the model.
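The relationships in the list above can be sketched as plain records. This is a minimal illustration, not the encoding defined in this report: the `Flowline` class, its field names, the "ms-" mainstem identifier prefix, and the example values are all assumptions made for the example. Only the "fl-" and "c-" prefixes and the rules (each flowline belongs to exactly one mainstem; a flowpath is the set of flowlines along a catchment's path from inlet to outlet) come from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Flowline:
    id: str                            # "fl-" prefixed identifier
    mainstem_id: str                   # single required value: one and only one mainstem
    flowpath_of: Optional[str] = None  # "c-" catchment id, or None if not on a flowpath


# Hypothetical example data (not taken from Figure 1 or Annex C).
flowlines = [
    Flowline("fl-1", "ms-1", "c-1"),
    Flowline("fl-2", "ms-1", "c-1"),
    Flowline("fl-3", "ms-2"),          # e.g. a headwater drainage pathway
    Flowline("fl-4", "ms-1", "c-2"),
]


def group_by(records, key):
    """Group flowline ids by the value of attribute `key`, skipping None."""
    groups = {}
    for rec in records:
        value = getattr(rec, key)
        if value is not None:
            groups.setdefault(value, []).append(rec.id)
    return groups


flowpaths = group_by(flowlines, "flowpath_of")   # catchment id -> flowline ids
mainstems = group_by(flowlines, "mainstem_id")   # mainstem id -> flowline ids
```

Because `mainstem_id` is a single required field, the "one and only one mainstem" rule holds by construction, while flowpath membership stays optional.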

Using the outcomes of this report as a point of departure, a hydrographic features logical and physical model implementing HY_Features concepts may be an achievable goal for the OGC Hydrology Domain Working Group to pursue.

3.1.  Disclaimer

Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.

3.2.  Document Contributor Contact Points

All questions regarding this document should be directed to the editor or the contributors:

Contacts

Table 1
| Name | Organization | Role |
| --- | --- | --- |
| David Blodgett | USGS | Editor |

4.  Terms and definitions

This document uses the terms defined in OGC Policy Directive 49, which is based on the ISO/IEC Directives, Part 2, Rules for the structure and drafting of International Standards. In particular, the word “shall” (not “must”) is the verb form used to indicate a requirement to be strictly followed to conform to this document, and OGC documents do not use the equivalent phrases in the ISO/IEC Directives, Part 2.

This document also uses terms defined in the OGC Standard for Modular specifications (OGC 08-131r3), also known as the ‘ModSpec’. The definitions of terms such as standard, specification, requirement, and conformance test are provided in the ModSpec.

For the purposes of this report, the definitions specified in Clause 4 of the HY_Features conceptual model standard (OGC 14-111r6) shall apply. In addition, the following terms and definitions apply.

drainage basin

Like the catchment feature type, a drainage basin is a holistic feature defined as the total upstream area draining to an outlet. It is comparable to a catchment with no inflows and a single outlet. Drainage basins can be thought of as a total accumulated or total upstream catchment and can be described with a pair of locations defining: 1) the headwater area with no discernible flowpaths where flow initiates and 2) the outlet where flow enters a larger river, waterbody (including oceans), or inland sink. A single mainstem flowpath connects a drainage basin’s headwater to its outlet. See [1] for additional discussion.

flowline

One-dimensional (linear) feature that represents a hydrologic pathway that is sometimes or always a flowing body of water and may be dry or stagnant at other times. A flowline is a type of flowpath that, as with a levee or underground conduit, may not have a catchment area but forms all or part of a hydrologic flowpath. A flowline should be thought of as a hydrographic connector with an inlet and an outlet that may not receive lateral flow from a catchment area.

headwater

A headwater is scale-dependent and represents the most upstream location where flow initiates in a drainage basin. Such a point is typically somewhere along a drainage basin boundary. Given that the definition of a flowpath does not necessitate the existence of water at all times, a headwater can be imagined as a point where an extended flowpath touches a drainage basin boundary. See [1] for additional discussion.

hydrologic geospatial fabric | hydrofabric

A hydrologic geospatial fabric (hydrofabric) discretizes the landscape according to hydrologic processes and the hydrologic network that conveys water and its constituents downstream. This integration of spatially extensive catchments and an expansive network of flowpaths enables integration of landscape and river data for hydroscience applications. A hydrofabric intended to capture surface water-groundwater interactions includes hydrogeologic features and their association to surface hydrologic features.

hydrologic unit

A hydrologic unit is an incremental drainage polygon that encompasses an area that drains to a single primary outlet. Conceptually, a hydrologic unit is an aggregate of one or more catchments. Unlike a catchment, a hydrologic unit can have more than one inflow. A hydrologic unit may have more than one outflow, but must have one and only one dominant outflow such that the network of hydrologic units has a dendritic topology that can be used to estimate total drainage area.

mainstem

The mainstem concept extends and constrains the concept of a flowpath by designating a single path from a headwater to an outlet through a drainage basin. In other words, a mainstem is a linear realization or backbone of a drainage basin. See [1] for additional discussion.

flow network

Describes the connectivity between flowline features. Each entry in the flow network should be thought of as a connection between two (linear representations of) waterbodies.

elevation derived hydrography

Elevation derived hydrography is a geographic representation of water features that is created using algorithms that identify channels and depressions in elevation data that may carry surface flow and/or contain water.

  • API — Application Programming Interface

  • HY_Features — WaterML2 Part 3 Surface Hydrology Features

  • OGC — Open Geospatial Consortium

  • NHD — National Hydrography Dataset

  • NHM — National Hydrologic Model

  • NWM — National Water Model

  • UML — Unified Modeling Language

  • USGS — U.S. Geological Survey

  • WaterML — Water Markup Language

The UML class diagrams in this document follow the conventions described in the following two figures.

Figure 2 — Class hierarchy describing relationship between conceptual and logical classes.
Figure 3 — Logical model showing attributes and structure of relationships between logical classes.

5.  Overview

Clause 6 introduces the use cases considered for this engineering report.

Clause 7 provides a summary of key advances and points of reference.

Clause 8 documents the logical model classes individually.

Clause 9 describes key aspects of the overall logical model system.

Clause 10 summarizes the logical model in terms of use cases and next steps.

Annex A describes the data models evaluated for this report.

Annex B shows how to create simple features that follow hydrofabric logic.

Annex C illustrates the logical data model with an idealized schematic and data.

6.  Use Cases

The logical data model documented in this report is designed to satisfy three principal use cases: elevation derived hydrography core data model, hydrographic data services, and hydrologic modeling. All three require a common data model that provides a target for data ingestion and a source for data use. While designed to meet the needs of the three principal use cases described below, the logical data model is intended to form the core of a common data model for hydrologic feature data generally.

6.1.  Elevation Derived Hydrography Core Data Model

As the agency that leads [hydrographic data compilation for the United States](https://www.usgs.gov/national-hydrography), the U.S. Geological Survey (USGS) needs a single data model that encompasses data created through a large collection of regional elevation derived hydrography projects, so that it can compile and improve a single national dataset over time to serve cartographic and diverse hydrologic uses [2].

Elevation derived hydrography is the practice of interpreting high-resolution elevation data to delineate geographic representations of hydrologic and hydrodynamic features. Specifications for delineating hydrography from elevation data have been developed by the USGS and provide guidance for creation of compiled hydrographic feature datasets. See [15] for more information about elevation derived hydrography.

Elevation derived hydrography data follow a consistent specification but do not include important attributes, such as geographic feature names, and are not integrated with pre-existing data outside their spatial domain. Validation, ingestion, conflation, integration, and derivative attribute creation must take place before a given elevation derived hydrography collection is ready to load into a broader national dataset.

Specific details of the validation, ingestion, conflation, integration, and derivative attribute creation steps of this use case are beyond the scope of this report. However, the steps are helpful to frame the scope of the use case.

  • Validation: Testing elevation derived hydrography datasets to ensure they meet acquisition requirements and specifications.

  • Ingestion: Extract, transform, and load source data into the operational data store format, create feature identifiers, and prepare data for later processing steps.

  • Conflation: Transfer name and persistent identifier attributes using spatial proximity and feature topology.

  • Hydrologic Integration: Establish broad (external) network connectivity and ensure conflated persistent identifiers are valid.

  • Derivative Attribute Creation: Assignment and calculation of attributes that must be derived from elevation, more basic attributes of features, and the flow network.

With these steps complete, the dataset is ready to be loaded into a national database, at which point it replaces data in the region it covers. The following simplified class diagram shows the tables in an elevation derived hydrography collection.

Figure 4 — Rudimentary class diagram of elevation derived hydrography data.

The following subsections describe key requirements that a logical data model must satisfy to support this use case.

6.1.1.  Identifier Permanence

The most important aspect of this use case is management of temporary feature identifiers within a system of persistent identifiers. Identifiers for individual flowlines are defined by a given (regional) elevation derived hydrography dataset, but the national dataset has persistent identifiers for naming and hydrologic integration that must be maintained [4]. It is important that new geometry and new identifiers can be introduced to the dataset while pre-existing identifiers are maintained in the national context.

6.1.2.  Hydrologic Integration

Elevation derived hydrography includes a (hydrologic) network only in that its geometry forms a node topology that must be converted into a tabular (attribute-based) topology for use in hydrologic applications. It does not include attribute information that relates upstream features to downstream features or linear (flowline) features to polygonal (waterbody) features. This attribute-based topology needs to be created such that basin-level hydrology is accurate after each regional elevation derived hydrography dataset is incorporated.
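The conversion from a geometry-based node topology to an attribute-based topology can be sketched as follows. This is a minimal illustration under stated assumptions: the start and end node identifiers are assumed to be available already (in practice they would be derived from shared flowline endpoints), and the function name, the `(id, toid)` row layout, and the example values are hypothetical rather than part of the model described in this report.

```python
def node_to_tabular(flowlines):
    """Convert (flowline_id, start_node, end_node) records into (id, toid) rows.

    A flowline's downstream neighbors are the flowlines that start at the
    node where it ends. `toid` is None where nothing continues downstream.
    """
    starts = {}
    for fid, start, _end in flowlines:
        starts.setdefault(start, []).append(fid)

    rows = []
    for fid, _start, end in flowlines:
        downstream = starts.get(end, [])
        if not downstream:
            rows.append((fid, None))      # network outlet or dataset boundary
        for toid in downstream:           # more than one row at a diversion
            rows.append((fid, toid))
    return rows


# Hypothetical example: two flowlines meet at node "n3" and continue as fl-3.
geometry_nodes = [
    ("fl-1", "n1", "n3"),
    ("fl-2", "n2", "n3"),
    ("fl-3", "n3", "n4"),
]
rows = node_to_tabular(geometry_nodes)
```

Once the rows exist, upstream and downstream relationships can be queried by attribute alone, without reference to geometry.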

6.1.3.  Domain Decomposition

Derived network attributes such as total drainage area and topological sort order (commonly referred to as hydrosequence order) require boundary initialization values at upstream and downstream boundaries. The ability to insert new data into an existing network is a key requirement which allows creation of continuous derived attributes and facilitates hydrologic integration.
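As a rough sketch of why boundary initialization values matter, the following example computes an upstream-to-downstream sort order and total drainage area for a regional network that receives flow from outside its domain. The function name, the dictionary-based inputs, and the numbers are assumptions made for illustration; the sort order here is a generic topological order and is not claimed to match any particular hydrosequence convention.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def accumulate_area(toid, local_area, upstream_boundary=None):
    """Return (order, total_area) for a dendritic network.

    toid:              {flowline id: downstream id, or None at an outlet}
    local_area:        {flowline id: incremental drainage area}
    upstream_boundary: {flowline id: area entering from outside the domain}
    """
    upstream_boundary = upstream_boundary or {}
    # Map each flowline to the set of flowlines directly upstream of it.
    upstream = {fid: set() for fid in toid}
    for fid, down in toid.items():
        if down in upstream:              # ignore links that leave the domain
            upstream[down].add(fid)

    # static_order() yields every flowline after all of its upstream flowlines.
    order = list(TopologicalSorter(upstream).static_order())
    total = {}
    for fid in order:
        total[fid] = local_area[fid] + upstream_boundary.get(fid, 0.0)
        total[fid] += sum(total[up] for up in upstream[fid])
    return order, total


# Hypothetical regional dataset: fl-1 is fed by 100 units of area upstream
# of the region, supplied as a boundary initialization value.
toid = {"fl-1": "fl-3", "fl-2": "fl-3", "fl-3": None}
local_area = {"fl-1": 10.0, "fl-2": 5.0, "fl-3": 2.0}
order, total = accumulate_area(toid, local_area, {"fl-1": 100.0})
```

Without the boundary value, the totals for fl-1 and fl-3 would understate the contributing area, which is the discontinuity that this requirement is meant to avoid.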

6.2.  Hydrographic Data Services Database

As a provider of nation-wide hydrographic data in service of cartography and diverse hydrologic applications, the USGS can create a system of data services that meet the needs of many uses and applications so that the general hydrographic data the USGS produces are available in ways that people find useful for specific applications.

As described in Clause 6.1, the USGS assembles a single hydrographic dataset that is designed to work for many applications. Some applications may use the full dataset directly, but most will use a small subset or a transformed version of it. Given this, the ability to present (partial) application specific versions of the general data model can increase its usefulness.

In this context, “data services” takes on several forms. Classically, a data service is a web service intended to provide a web application programming interface for general applications, e.g., OGC API - Features or the Esri REST API. In other cases, a data service could be a web service intended to meet a more specific application need, e.g., a custom API that returns a web page or JSON document with application-specific content. A data service could also be capable of preparing custom data subset packages or cartography according to a set of flexible input parameters passed via a web interface.

Having a single core data system that uses a relatively simple data model for data services to draw from helps avoid unnecessary duplication and complexity. The data model for such a core data system needs to be capable of satisfying the needs of this diverse set of data services, either directly or indirectly through integration and augmentation.

6.2.1.  Subset Capabilities

When generating a data subset, it is important that the resulting subset forms a complete dataset that can be used independent of the parent dataset. For this hydrographic data use case, a subset can be taken in at least five ways.

  • Spatially-defined subsets, by bounding box or polygon, include a specific spatial region of a dataset.

  • Attribute subsets include only selected attributes from a larger set of attributes. For example, retrieve only attributes relevant to a specific study.

  • Featuretype subsets remove unneeded feature types from a larger set of feature types. For example, only retrieve features of type “river” and not “creek.”

  • Scale/resolution subsets remove unneeded features based on their size or relative prominence. For example, only retrieve features with estimated drainage area larger than 20 square kilometers.

  • Network connectivity subsets are based on a network navigation from a starting location. For example, only retrieve features found upstream of a starting location.

Each of these has nuances which go beyond the scope of this report but the general use case is important to understanding and interpreting the logical data model.
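To make the network connectivity case concrete, the sketch below navigates upstream from a starting flowline and then trims the topology table so the subset stands on its own. The `toid` table layout, the function names, and the choice to set dangling downstream references to `None` are assumptions made for illustration, not behavior specified by this report.

```python
from collections import deque


def upstream_subset(toid, start):
    """Return the ids of `start` and every flowline that drains to it."""
    fromids = {}
    for fid, down in toid.items():
        fromids.setdefault(down, []).append(fid)

    found, queue = {start}, deque([start])
    while queue:                           # breadth-first walk upstream
        for up in fromids.get(queue.popleft(), []):
            if up not in found:
                found.add(up)
                queue.append(up)
    return found


def subset_table(toid, keep):
    """Keep only rows in `keep`; references leaving the subset become None."""
    return {
        fid: (down if down in keep else None)
        for fid, down in toid.items()
        if fid in keep
    }


# Hypothetical network: fl-1 and fl-2 join into fl-3; fl-3 and fl-4 join into fl-5.
toid = {"fl-1": "fl-3", "fl-2": "fl-3", "fl-3": "fl-5", "fl-4": "fl-5", "fl-5": None}
keep = upstream_subset(toid, "fl-3")
subset = subset_table(toid, keep)
```

In the result, fl-3 becomes the outlet of the subset, so the table can be used without the parent dataset.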

6.2.2.  Extensibility

Given that water integrates so many landscape processes, hydrography data are used as a central component of numerous integrated environmental data applications. As a result, the ability to extend hydrography by integrating it with other data sets and systems is required. This, coupled with the desire to avoid complexity and include only generally needed content in a core data model, further enforces the requirement for extensibility.

In hydrography, extensibility can be thought of in several ways, including the following.

  • Association through spatial relationships to transfer data from one spatial basis to another. For example, apply a spatial join to place soil observation locations into the catchment they were observed in.

  • Association via relationship to a common reference identifier or shared index. For example, linking streamgage monitoring locations to hydrography based on the mainstem river to which both datasets are indexed.

As with subsetting, these modes of extending hydrography have many nuances that are beyond the scope of this report but the high-level use case is important to understand and interpret the logical data model.
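Association through a shared reference identifier can be illustrated with a small join. In this sketch the record layouts, the `mainstem_id` field name, the "ms-" and "gage-" identifiers, and the function name are hypothetical; the only idea taken from the text is that two independently maintained datasets can be linked because both are indexed to the same mainstem.

```python
def link_by_mainstem(gages, flowlines):
    """Map each gage id to the flowline ids that share its mainstem id."""
    by_mainstem = {}
    for fl in flowlines:
        by_mainstem.setdefault(fl["mainstem_id"], []).append(fl["id"])
    return {g["id"]: by_mainstem.get(g["mainstem_id"], []) for g in gages}


# Hypothetical records from two separate datasets.
flowlines = [
    {"id": "fl-1", "mainstem_id": "ms-1"},
    {"id": "fl-2", "mainstem_id": "ms-2"},
    {"id": "fl-3", "mainstem_id": "ms-1"},
]
gages = [
    {"id": "gage-A", "mainstem_id": "ms-1"},
    {"id": "gage-B", "mainstem_id": "ms-9"},   # mainstem absent from this hydrography
]
links = link_by_mainstem(gages, flowlines)
```

No geometry is involved in the join, so the link survives when flowline geometry is replaced, provided the mainstem identifiers are maintained.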

6.3.  Hydrologic Modeling

6.3.1.  Introduction and Overview of Purpose

Hydrologic models represent the storage and movement of water in and between catchments and bodies of water. Not all hydrologic models represent features explicitly, but they all represent hydrologic processes and resolve them in a way that can be thought of as “an abstraction of real-world phenomena,” which is the definition of a feature in ISO 19101 and OGC Abstract Specification Topic 5: Features. The hydrofabric logical data model is intended to support integration of the features that various conceptualizations of hydrologic models represent for the purpose of interoperability, integration, sharing, and intercomparison.

As Federal agencies with a role in continental scale water availability and hazards, the USGS and National Weather Service (among many other organizations) require a common geospatial reference system with which to catalog and construct hydrologic models and related data [13] [14] [18]. The diversity of model applications required to accomplish the needs of modern society and the environment is large and ever expanding as science and technology advance. However, the abstraction used to describe hydrologic features and the data collected and simulated about them can be seen as stable in the context of hydrologic science. This report relies on (and attempts to define) this stability using precise yet sufficiently abstract terminology to provide the needed “hydrologic science building blocks” for new and novel science and applications going forward. For a thorough background of the concepts described in this use case, refer to [16] and [14].

6.3.2.  Catchments and Hydrologic Units

Many hydrologic model formulations are based around control volumes (modeling units) that tie hydrologic processes of the land surface and near surface to a predominant waterbody that drains a unit of land. This physiographic unit is referred to as a “catchment” and is the result of geomorphic evolution of a unit of hydrology from some upstream location to some downstream location. The HY_Features data model describes this “catchment” data model in great detail [7]. However, not all hydrologic models use an abstraction that follows the strict constraints of the catchment data model.

In some cases, hydrologic processes of the land surface are represented as a continuous or regularly discretized surface with no pre-defined surficial features. The applicability of the logical model presented here is limited in these cases. However, such models do represent the emergence of concentrated flow in rivers and lakes with identifiable confluences and anthropogenic features such as bridges or streamgages. The need to maintain a degree of identifiability for such features is the key use case that requires such continuum models to recognize and associate with some real-world features.

In other cases, catchments are lumped such that a given unit of land receives flow from potentially many upstream hydrologic units, delivers flow to one and only one primary downstream hydrologic unit, and is bounded by a drainage divide. This representation does not adhere to the single-inflow nexus constraint of the catchment data model and so does not relate to river networks with the 1:1 cardinality that catchments have. The U.S. “Watershed Boundary Dataset” [3] follows this logic and is one of many such datasets that attempt to represent hydrologic units of uniform size for statistical reporting and jurisdictional needs. This form of hydrologic model control volume is useful for rudimentary water budgeting and for cataloging purposes. The logical model presented here is compatible with this “hydrologic unit” paradigm, but network and waterbody integration support is limited because the catchment data model does not apply directly.

6.3.3.  Flowpath Network and Hydrolocations

Hydrologic models typically include routing of flow volume through a network of flowpaths but do not attempt to capture precise details of hydrodynamics, sediment transport, or constituent fate and transport. The data requirements for the relatively simple “hydrologic routing” are typically limited to uniform or segmented parameters for the flowpath of a given catchment. Hydrodynamic models require more detail, such as cross sections or a three-dimensional mesh representing channel bathymetry. The logical model described here supports the former: linear segments (flowlines), which may have 3D coordinates, make up the flowpath of a given catchment. It does not attempt to represent cross sections or other aspects of channel geometry.

A hydrologic model will typically represent regional connectivity as a network of connected catchment flowpaths. Although some continuously discretized models represent regional connectivity as an emergent phenomenon, even these models must be integrated and/or evaluated on the basis of a pre-existing network. For traditional hydrologic models, the network is a key model framework component that, if not correct, renders a model incapable of accurate prediction. It provides the pathways along which a model can route water and the implied relationship between upstream sources of water and downstream modeling elements, such as calibration/validation locations or locations where the model is intended to make predictions.

A wide array of locations lies along the network of hydrologic flowpaths. These include confluences, waterbody inlets and outlets, infrastructure like bridges, streamgages, dams, diversions and returns, recreational sites, jurisdictional boundaries, environmental compliance zones, protected environments, etc. All these features and more may be critical to the function or purpose of a given hydrologic model. Robust and durable linkages between these hydrologic locations and the network of flowpaths and catchments are a key requirement implied by the hydrologic modeling use case.

6.3.4.  Multiscale Comparability

Until the emergence of the HY_Features data model, the only hydrologic modeling framework that could cross scale while adhering to the catchment data model was the Pfafstetter system [17] which requires a strict approach to aggregation that may not be general enough to suit all applications. The data model presented here provides a generalized approach to cross scale hydrologic model intercomparison and catchment aggregation. In practice, if a model is defined at a given resolution or with a given conceptualization, comparison of its predictions to a model using a different resolution or conceptualization would only be possible if the two models used a shared geospatial basis or some common set of prediction locations (such as streamgages). Rigorous adherence to the catchment data model has led to a natural ability to identify hydrologically identical locations in models at any scale.

7.  Background

This engineering report documents progress of an effort to establish a set of standard data models in support of hydrologic data integration and sharing. This section describes how key references that preceded this work provide foundational concepts and context for the work.

7.1.  NHDPlus

NHDPlus, first released as NHDPlusV1 in 2006 [9], combined feature data from the U.S. National Hydrography Dataset with a snapshot of the U.S. National Elevation Dataset [10]. Key enhancements that the NHDPlus introduced were:

  • A set of value-added attributes to enhance stream network navigation, analysis and display;

  • An elevation-based catchment for each flowline in the stream network;

  • Catchment characteristics;

  • Headwater node areas;

  • Cumulative drainage basin characteristics;

  • Flow direction and flow accumulation grids;

  • Flowline min/max elevations and slopes; and

  • Flow volume and velocity estimates for each flowline in the stream network.

(Bullets reproduced from https://nhdplus.com/NHDPlus/NHDPlusV1_home.php [9])

In 2012, an enhanced NHDPlusV2 was released [8]. A further enhancement, NHDPlusHR [11], which builds on higher resolution hydrographic features and elevation data was produced between 2015 and 2023.

Specific advances introduced in NHDPlus that are of relevance here are:

  • indication of primary upstream and downstream connections at tributaries and diversions, respectively,

  • network attributes built from network topology that facilitate network navigation functionality across scales, and

  • construction of catchment polygons for flowlines that receive lateral inflow from the surrounding landscape.

See [4] for additional background on the NHDPlusV2 data model and its application to hydrofabrics.

7.2.  WaterML2

WaterML2 is a suite of OGC conceptual, logical, and physical data models intended to support exchange of observational water data and related spatial features and metadata. WaterML2 was initiated as an international standardization effort inspired by WaterML1. Framed by the “Observations and Measurements” concept of an observation process, WaterML2 Part 1 primarily concerns observation results that are time-series data. Part 2 concerns “ratings and gagings” data that relate easily observed quantities like depth and velocity to (relatively difficult to observe) discharge. Parts 3 and 4 concern surface and subsurface hydrology features, respectively. These are so-called “features of interest” in observations and measurements.

7.3.  HY_Features

WaterML2 Part 3: Surface Hydrology Features Concepts (HY_Features) [6] was initially conceptualized in service of the World Meteorological Organization’s goals for an international hydrologic data exchange [5]. HY_Features introduces an abstract concept of “catchment” as a holistic hydrologic unit which has several “realizations”, each a partial expression of the holistic catchment concept. This HY_Features catchment concept formalizes the dual role of hydrologic systems in conveying flow both from the land surface to waterbodies and through networks of waterbodies to a common outlet. HY_Features also formalizes concepts for hydrologic datasets that cross scales and fit together in self-similar and tightly integrated networks.

HY_Features is strictly a conceptual data model. That is, it does not specify logic for a particular application or the details of a particular physical data model encoding. As a result, HY_Features terms and definitions can be used for general documentation of hydrologic features and to underpin more specific logical and physical data models as discussed here.

7.4.  Geospatial Fabric

The USGS National Hydrologic Model (NHM) infrastructure [13] was developed over a period of years as a key outcome of a broader research agenda on continental domain hydrologic modeling [14]. The “Geospatial Fabric” [12], an aggregated set of modeling units based on NHDPlusV1 and a curated set of “points of interest”, was developed to support the NHM. The creation of an aggregated version of a base hydrographic dataset was novel and was a proof of concept for development of capabilities to delineate custom modeling units in future work [13]. The “Geospatial Fabric” had two key weaknesses: 1) it was built using largely manual methods and could not easily be regenerated given new points of interest or fixes to source datasets and 2) it was not based on a recognized or pre-specified data model, making development of software to work with it difficult [4].

7.5.  Mainstems and Drainage Basins

HY_Features provides baseline concepts for creation of multi-scale datasets and the geospatial fabric proved that creation of aggregate modeling units is possible. However, a systematic and cross-scale integration of the network of waterbodies and catchment area that drains to the network was still needed. Such a system was described in [1] as an interpretation of familiar, yet informal, concepts of mainstems and drainage basins. A “mainstem” is the predominant pathway of a given drainage basin and, as a system, mainstems form a tree whose branches are the mainstems of nested drainage basins. As discussed in [1], the system is useful in that it supports cross-scale applications and can be used to uniquely identify rivers and the basins that they drain.
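The idea that mainstems form a tree of nested drainage basins can be sketched with a simple labeling pass over a flowline network. This example rests on assumptions that are not taken from [1]: at each confluence, the upstream flowline with the largest total drainage area is treated as the continuation of the mainstem, and mainstem identifiers are hypothetical labels built from the most downstream flowline of each mainstem.

```python
def assign_mainstems(toid, total_area):
    """Label each flowline with a mainstem id by walking upstream from outlets.

    Assumption: the upstream flowline with the largest total drainage area
    continues the current mainstem; every other upstream flowline starts the
    mainstem of a nested (tributary) drainage basin.
    """
    fromids = {}
    for fid, down in toid.items():
        if down is not None:
            fromids.setdefault(down, []).append(fid)

    outlets = [fid for fid, down in toid.items() if down is None]
    mainstem = {fid: f"ms-{fid}" for fid in outlets}
    stack = list(outlets)
    while stack:
        fid = stack.pop()
        ups = sorted(fromids.get(fid, []), key=lambda u: total_area[u], reverse=True)
        for rank, up in enumerate(ups):
            mainstem[up] = mainstem[fid] if rank == 0 else f"ms-{up}"
            stack.append(up)
    return mainstem


# Hypothetical network and total drainage areas.
toid = {"fl-1": "fl-3", "fl-2": "fl-3", "fl-3": "fl-5", "fl-4": "fl-5", "fl-5": None}
total_area = {"fl-1": 110.0, "fl-2": 5.0, "fl-3": 117.0, "fl-4": 50.0, "fl-5": 170.0}
mainstem = assign_mainstems(toid, total_area)
```

In this example, fl-5, fl-3, and fl-1 share one mainstem from the outlet to a headwater, while fl-2 and fl-4 each begin the mainstem of a smaller basin nested within it.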

7.6.  National Water Model

The National Weather Service’s “National Water Model” (NWM) predicts runoff and streamflow for the entire NHDPlusV2 in an ongoing operational modelling system. The formulation of the NWM predicts runoff on a uniform rectilinear grid, apportions that runoff to catchment area polygons and routes flow through a network with parameters and topology based on the vector flowline network of NHDPlusV2. [18] Rather than use pre-specified “points of interest” as was done in the “geospatial fabric”, the NWM’s “Hydrofabric” uses the highly granular catchment outlets as forecast locations and tie points for observational streamgage data. By adopting the full NHDPlusV2 network, the NWM can predict streamflow at practically any location, but also adopts many exceedingly small / short and exceedingly large / long flowlines. Additionally, preparation of the NWM “Hydrofabric” was a largely manual process, which makes efforts to modify or improve it costly and cumbersome. [19] [20]

7.7.  National Hydrologic Model

The “National Hydrologic Model” (NHM) is a moniker used by the USGS to describe a collection of modelling activities that share a common spatial framework that aims to provide national consistency and local relevance. Models associated with the NHM predict water budget components at daily and monthly time steps using a spatial conceptualization that minimizes the number of landscape parameters required by aggregating modeling units to the greatest extent possible given the requirement to resolve certain “points of interest” on the landscape. The process of preparing the “hydrologic geospatial fabric” for the NHM was, in its first major version, a manually orchestrated script-based workflow. [12] In its second major version, the manual orchestration was converted to full automation such that the aggregated modeling units could be generated based entirely on declarative workflow code. The workflow is available as an open-source software project.

7.8.  Hydrologic Geospatial Fabric

A “hydrologic geospatial fabric” (hydrogeofabric) is a combination of concepts from WaterML2, HY_Features, Mainstems, the NHM Geospatial Fabric, and the NWM Hydrofabric. A hydrogeofabric is composed of four components:

  1. a network of connected mainstem river features which are composed of incremental flowlines;

  2. geospatial representations of the flowlines as waterbodies;

  3. geospatial representations of the catchment areas which drain to the incremental flowlines; and

  4. a library of “points of interest” which are linked to the network features.

The components form a complete “fabric” of connected parts that capture the broad connectivity of the network, the relationship between local landscape units and network features, and the variety of societally and environmentally relevant locations that are meaningfully located along the network. Additional discussion of the design and utility of a hydrogeofabric is presented in [6].

7.9.  Reference Flow Network

If mainstem identifiers in a hydrogeofabric are created to be persistent, the “flow network” component of a hydrogeofabric can be used as a long term reference network to curate the relationship between multiple datasets which depict catchments, flowlines, and locations in a variety of ways [4]. In this context, persistent mainstem identifiers are composed of a collection of flowline features which may change over time. By maintaining a primary persistent identifier for a dominant river in a given drainage basin, additional (smaller) drainage basins and rivers can be added or related to the flow network over time without the need to change or create new identifiers for pre-existing features. The key characteristic of a “reference flow network” is that it is the most resolved and highly validated network available which all others can be related to with confidence.

7.10.  Environmental Linked Features Interoperability Experiment (ELFIE)

The ELFIE and Second ELFIE (SELFIE) established a Web architecture and proposed best practices for exposure of hydrogeofabric data as a hydrologic index for water resources and related information. The first ELFIE [21] established a strategy for web resource content linking environmental features, such as hydrogeofabric features, to each other and related data, especially focused on monitoring and modeling data. The Second ELFIE [22] focused on the web resource access based on geospatial and linked data best practices and standards. The overall ELFIE content and resource access architecture provides data and web development teams guidance for establishment of a rich and cohesive system of data that spans multiple organizations while using a single hydrogeofabric as a shared index.

7.11.  3D Hydrography Program

The USGS 3D Hydrography Program (3DHP) [2] introduced a new data model in 2023 that combines aspects of the hydrologic geospatial fabric and reference flow network to support the evolution from the cartographic “National Hydrography Dataset” to a new generation of hydrography derived primarily from interpretation of elevation data. The data model uses a persistent mainstem identifier for all on-network features and is designed to facilitate improvement, densification and general evolution of the representation of rivers, lakes, and catchments while maintaining stable identifiers and overall functionality.

Figure 5 — simplified classes / schema of 3DHP core data model v2023. [_DHP-datamodel]

NOTE:  Updates may have occurred in newer versions of 3DHP. Dotted arrows indicate feature identifier relationships. Mainstemid is an aggregate feature identifier; no standalone mainstem table is specified, so its relationships are not shown.

The data model is built around flowline geometries, which are highly granular line features that represent waterbodies that may or may not always exist as standing or flowing water. In the case that flowlines exist within waterbodies of appreciable width, an association is made between a flowline and the waterbody it is within. Flowlines which receive flow from upstream or contribute flow downstream are said to be part of the flownetwork. Flowlines are aggregated into small flowpath collections that connect a headwater to a confluence, one confluence to another, or a confluence to a network terminus. Catchments are created for the flowpath collections. Flowlines that exist upstream of the onset of channelized flow are categorized as “drainage ways” and are not included in flowpaths. To accommodate drainage ways, two “catchment level” grouping attributes are included: one for all flowlines within a catchment and one for the flowlines that lie along the flowpath of a catchment. Hydrolocations can be linked to any mainstemid and play several roles in the overall dataset.

8.  Logical Model

The hydrofabric logical model is specified using terms and definitions from HY_Features and the geopackage conceptual data models. The following sections describe key aspects of each “feature type” and “related table” in the hydrofabric logical data model. For an explanation of conventions used in the model class diagrams see the [_uml_reference].

Figure 6 — Class diagram showing only hydrofabric logical model classes.
Figure 7 — Class diagram showing relationship between HY_Features conceptual feature types (prefix of 'HY_') and hydrofabric logical feature types.

8.1.  flowline [feature type]

The flowline feature type is a HY_Flowpath represented as a single linear geometry. The area covered by the catchment associated with a single flowline is not included in the hydrofabric logical data model. This is an accommodation for flowlines that are leveed or flow below ground and have negligible catchment area but do make up part of the hydrologic network. A flowline is based on the “waterbody-flowpath” constraint of the HY_HydrographicNetwork feature type from HY_Features and, as such, is both a flowpath and a linear representation of a flowing waterbody.

Relationships:

  • a flowline may be within a single waterbody polygon.

  • a flowline must be part of one and only one mainstem.

  • a flowline may be along the flowpath of a catchment.

  • a flowline may be within one catchment.

Attributes:

  • a flowline may have a cartographic name attribute

  • a flowline may have one or more type attributes

  • a flowline may have a flow direction attribute

  • a flowline may have a length attribute

Numerous additional attributes such as a date, stream order, stream level, topological sort, etc. may be desired in a given implementation but are not specified in the hydrofabric logical data model.

8.1.1.  flow network [related attribute table]

The flow network is a related attribute table extending the flowline feature type. It contains potentially many-to-many upstream to downstream relationships between flowlines and an indication of primary upstream and downstream connections.

Extended Relations:

  • “upstream flowline” indicates the flowline upstream of the flowline in the downstream flowline attribute.

  • “downstream flowline” indicates the flowline downstream of the flowline in the upstream flowline attribute.

Attributes:

  • “upstream main” indicates that the connection between downstream and upstream flowlines is primary. There can be one and only one upstream main connection from a given downstream flowline.

  • “downstream main” indicates that the connection between upstream and downstream flowlines is primary. There can be one and only one downstream main connection from a given upstream flowline.
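The two “one and only one” rules above lend themselves to a simple validation check. The following Python sketch is illustrative only: the field names and the `check_main_connections` helper are assumptions made for this example and are not specified by the hydrofabric logical model.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowConnection:
    """One row of a many-to-many flow network table (hypothetical field names)."""
    upstream_flowline: str
    downstream_flowline: str
    upstream_main: bool    # primary upstream connection of the downstream flowline
    downstream_main: bool  # primary downstream connection of the upstream flowline


def check_main_connections(rows):
    """Return flowline ids that break the 'one and only one main' rules."""
    up_main = Counter()
    down_main = Counter()
    for r in rows:
        # Count main flags per downstream flowline and per upstream flowline.
        up_main[r.downstream_flowline] += r.upstream_main
        down_main[r.upstream_flowline] += r.downstream_main
    bad_up = sorted(f for f, n in up_main.items() if n != 1)
    bad_down = sorted(f for f, n in down_main.items() if n != 1)
    return bad_up, bad_down


# Flowlines "a" and "b" join at "c", which then diverges into "d" and "e".
rows = [
    FlowConnection("a", "c", upstream_main=True, downstream_main=True),
    FlowConnection("b", "c", upstream_main=False, downstream_main=True),
    FlowConnection("c", "d", upstream_main=True, downstream_main=True),
    FlowConnection("c", "e", upstream_main=True, downstream_main=False),
]
bad_up, bad_down = check_main_connections(rows)
```

In this example both returned lists are empty: "c" has exactly one upstream main connection (from "a") and exactly one downstream main connection (to "d").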

8.2.  flowpath [aggregate feature type]

The flowpath feature type is a HY_Flowpath represented by an aggregation of flowlines extending from an inflow hydrolocation to an outflow hydrolocation. Given that flowpaths are aggregations of flowlines, a table of flowpath features would duplicate flowline geometry and is not required but may be useful in some circumstances. The flowpath feature type shares its identifier with the catchment it is associated with.

8.3.  catchment [feature type]

The catchment feature type is a HY_CatchmentDivide represented as a single polygon geometry. A catchment may have a flowpath connecting its inflow hydrolocation to its outflow hydrolocation. Catchments with no flowpath may have an outflow to a headwater hydrolocation.

Relationships:

  • a catchment may be associated with a flowpath which it shares an identifier with.

  • a catchment may contain any number of flowlines.

  • a catchment may be associated with one mainstem.

  • a catchment may have zero or one inflow hydrolocations.

  • a catchment may have zero or one outflow hydrolocations.

Attributes:

  • a catchment may have an area attribute

8.4.  mainstem [composite feature type]

The mainstem feature type is a HY_Flowpath represented as a composition of flowline features. The HY_CatchmentDivide of a mainstem is not included in the hydrofabric data model but could be derived as the union of all catchments which contribute to the outlet of a given mainstem. Similar to the flowpath feature type, a table of mainstem features would duplicate flowline geometry and is not required but may be useful in some circumstances. A table of derived mainstem summary attributes may also be useful in some applications.

In some cases, such as isolated drainage ditches, a mainstem may be composed of features with unknown or ambiguous flow direction. In this case, the mainstem’s headwater and outlet hydrolocation can be NULL to indicate that they are unknown to the dataset in question.

Relationships:

  • a mainstem has one and only one headwater hydrologic location which may be NULL if unknown.

  • a mainstem has one and only one outlet hydrologic location which may be NULL if unknown.

  • a mainstem has zero or one downstream mainstem.

Attributes:

  • a mainstem may have an estimated total drainage area to be interpreted as the drainage area at its outlet.

  • a mainstem may have an estimated total length.
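As a sketch of how the drainage basin of a mainstem could be derived from these relationships, the following Python example collects the identifiers of all catchments associated with a mainstem or with any mainstem upstream of it. The function name, the dictionary-based inputs, and the identifiers are assumptions made for illustration. The geometric union of the catchment polygons is not shown.

```python
from collections import defaultdict


def basin_catchments(mainstem_id, downstream_mainstem, catchment_mainstem):
    """Collect catchments associated with a mainstem and all mainstems upstream of it.

    downstream_mainstem: mainstem id -> downstream mainstem id (or None)
    catchment_mainstem: catchment id -> associated mainstem id (or None)
    """
    # Invert the "zero or one downstream mainstem" relationship.
    tributaries = defaultdict(list)
    for ms, down in downstream_mainstem.items():
        if down is not None:
            tributaries[down].append(ms)

    # Walk the mainstem tree upstream from the requested mainstem.
    in_basin, stack = set(), [mainstem_id]
    while stack:
        ms = stack.pop()
        if ms not in in_basin:
            in_basin.add(ms)
            stack.extend(tributaries[ms])

    return sorted(c for c, ms in catchment_mainstem.items() if ms in in_basin)


# Made-up identifiers: ms-3 flows to ms-2, which flows to ms-1; ms-4 is separate.
downstream_mainstem = {"ms-1": None, "ms-2": "ms-1", "ms-3": "ms-2", "ms-4": None}
catchment_mainstem = {
    "cat-1": "ms-1", "cat-2": "ms-2", "cat-3": "ms-3",
    "cat-4": "ms-4", "cat-5": None,
}
basin = basin_catchments("ms-2", downstream_mainstem, catchment_mainstem)
```

Here `basin` contains "cat-2" and "cat-3", the catchments of ms-2 and its tributary ms-3.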

8.5.  hydrolocation [feature type]

The hydrolocation feature type is a HY_HydroLocation represented as a point geometry. Hydrolocations are used as structural catchment inflow and outflow locations and may be placed anywhere along the network of mainstems to serve as links to and from other datasets and data systems.

Relationships:

  • A hydrolocation must be associated with one and only one mainstem feature.

  • A hydrolocation may have zero or more contributing and/or receiving catchments.

Attributes:

  • A hydrolocation may have a cartographic name

  • A hydrolocation may have one or more type attributes.

  • A hydrolocation may have one or more associated identifiers linking to other datasets or systems.
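One possible way to use these attributes as a link between datasets is sketched below in Python. The class layout, the `index_by_external_id` helper, and all identifiers are hypothetical and serve only to illustrate how an external identifier can be resolved to a mainstem.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Hydrolocation:
    """Point feature tied to exactly one mainstem (hypothetical field names)."""
    id: str
    mainstem_id: str
    name: Optional[str] = None
    types: List[str] = field(default_factory=list)
    # external system name -> identifier in that system
    linked_ids: Dict[str, str] = field(default_factory=dict)


def index_by_external_id(hydrolocations):
    """Map (system, identifier) to the mainstem id of the linked hydrolocation."""
    return {
        (system, ext_id): h.mainstem_id
        for h in hydrolocations
        for system, ext_id in h.linked_ids.items()
    }


# Made-up identifiers for illustration only.
locations = [
    Hydrolocation("hl-1", "ms-10", types=["streamgage"],
                  linked_ids={"gage-network": "G-001"}),
    Hydrolocation("hl-2", "ms-10", types=["dam"],
                  linked_ids={"dam-inventory": "D-042"}),
]
index = index_by_external_id(locations)
```

Because both hydrolocations reference the same mainstem, records from the two external systems can be related to each other without either system knowing about the other.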

8.6.  hydrologic unit [feature type]

A hydrologic unit is a composition of catchment features represented by a single or multipolygon geometry. Hydrologic units may be catchment aggregates, yet in many cases, they will not adhere to the constraint that a catchment has no more than one inflow and no more than one outflow.
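Whether a given composition of catchments adheres to the one-inflow, one-outflow constraint can be checked from the inflow and outflow hydrolocations of its members. The Python sketch below is one way to do this, assuming each catchment's inflow and outflow hydrolocation identifiers are available as dictionaries. The function names and identifiers are illustrative and not part of the logical model.

```python
def external_hydrolocations(member_ids, inflow, outflow):
    """Return (inflows, outflows) that cross the boundary of a catchment aggregate.

    inflow / outflow: catchment id -> hydrolocation id, or None when absent.
    """
    inflows = {inflow[c] for c in member_ids if inflow.get(c) is not None}
    outflows = {outflow[c] for c in member_ids if outflow.get(c) is not None}
    # A hydrolocation that is an outflow of one member and an inflow of
    # another member is internal to the aggregate.
    return sorted(inflows - outflows), sorted(outflows - inflows)


def adheres_to_catchment_constraint(member_ids, inflow, outflow):
    """True when the aggregate has at most one inflow and at most one outflow."""
    ext_in, ext_out = external_hydrolocations(member_ids, inflow, outflow)
    return len(ext_in) <= 1 and len(ext_out) <= 1


# Made-up example: c1 and c2 are headwater catchments draining to h1,
# c3 runs from h1 to h2, and c4 lies in a separate basin draining to h9.
inflow = {"c1": None, "c2": None, "c3": "h1", "c4": None}
outflow = {"c1": "h1", "c2": "h1", "c3": "h2", "c4": "h9"}

nested = adheres_to_catchment_constraint(["c1", "c2", "c3"], inflow, outflow)
mixed = adheres_to_catchment_constraint(["c1", "c4"], inflow, outflow)
```

In this example `nested` is true, because the aggregate has a single external outflow at h2, while `mixed` is false, because the aggregate has two external outflows and would need to be treated as a hydrologic unit.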

8.7.  waterbody [feature type]

The waterbody feature type is a HY_WaterBody represented as a single or multipolygon geometry. If a waterbody polygon represents a portion or all of a flowing body of water, one or more flowlines will be associated to it and the waterbody will be associated to the most downstream mainstem that exits the waterbody. If a single waterbody has more than one mainstem flowing out of it, it should be split into multiple waterbodies or associated with the most prominent mainstem flowing from it.

9.  Discussion

9.1.  Three Classes of HY_Flowpath

The hydrofabric logical model contains three separate feature types that are labeled as HY_Flowpath, each representing a different logical implementation of the concept.

At the most resolved, the “flowline” feature type is a HY_Flowpath for which the catchment divide may be the same as the flowline itself (a canal). Although catchment areas could be identified for flowlines, some would be insignificant and/or so small that representing them would not be useful. In contrast, no matter how small or separated from hydrologic process a flowline is, it must be represented in the overall network to connect upstream to downstream flowlines.

Slightly less resolved, the “flowpath” feature type is a HY_Flowpath that spans from one confluence or hydrolocation to another confluence or hydrolocation and does have a catchment associated with it. Flowlines that do not take part in flowpaths include but are not limited to small drainages upstream of a flowpath that do not warrant catchment delineation and flowlines which emerge from a diversion and are leveed or otherwise disconnected from surface hydrology. The “flowpath” feature type is intended to be 1:1 with the catchment feature type, noting that headwater catchments will not have a recognized flowpath.

At the least resolved, the “mainstem” feature type is a HY_Flowpath that connects a headwater hydrolocation to an outlet hydrolocation. In contrast to a flowpath which is an aggregate of flowlines, a mainstem is a composite of flowlines because every flowline is part of one and only one mainstem. For this reason, some mainstem features (those composed of flowlines that are not flowpaths) will have no identifiable drainage basin. However, in reality, the drainage basin for such mainstems, though insignificant in comparison to others, could be represented if more refined data than linear waterbody / channel representations were used.

Figure 8 — Class diagram showing three classes of HY_Flowpath

9.2.  Role of Mainstem Identifiers

Persistent identification of features in a hydrofabric will support cross references between features in various datasets and models. However, persistent identifiers must be maintained over time if a dataset is improved or evolves. Costs associated with maintenance and conflation of persistent identifiers increase as the number of identifiers increases. By introducing too many identifiers it can become impossible to treat them as persistent because it is too costly to conflate and validate them over time. The mainstem feature type aims to balance these competing factors with a minimum number of features (which have persistent identifiers) to uniquely identify the fully resolved network of flowlines. As such, in the hydrofabric logical model, all “on-network” features are linked to a mainstem identifier and all other identifiers are treated as internal dataset identifiers that will not necessarily persist between uses of the data.

Hydrolocations are 1D features conceptually thought to be along a HY_Flowpath (mainstem). Given that every flowline has a mainstem identifier, this also means that every hydrolocation is along a flowline. To reduce maintenance cost and improve durability of relationships between datasets, the relative position along a mainstem or flowline is intentionally not captured. However, in a given application, the flowline a hydrolocation is along and / or the measure along a flowpath may be desirable in order to establish precise upstream / downstream relationships between hydrolocations.

Waterbodies often overlay flowlines and have a one-to-many relationship with them. Given this, a waterbody could be associated with many mainstem identifiers. For instance, for wide rivers where many tributary flowlines intersect the waterbody, a single, dominant mainstem that flows through the polygon that represents the wide river is associated with it. This is usually the most downstream mainstem exiting the waterbody.

A catchment boundary may or may not have an associated flowpath or an outlet hydrolocation. In such cases, the catchment would not have an associated mainstem. However, an isolated catchment may have an internally drained (endorheic) network. If an endorheic network terminates at a sink within a catchment, it would be considered a mainstem but not be considered to be a flowpath of the catchment and the catchment would not be associated with the mainstem. In cases where a catchment has an outlet with a flowpath downstream or has a flowpath flowing from its inlet to its outlet, a catchment does have an association to a mainstem. Similar to a waterbody, the most dominant mainstem that flows through a catchment is the one associated with the catchment boundary. For a headwater catchment that has no flowlines within its boundary but does have an outlet hydrolocation, the mainstem that the catchment contributes to is the one associated with the catchment.

In most cases, a mainstem is a feature that is referenced by other features rather than having many references to features along it. That is, a mainstem does not contain a list of flowlines that compose it or a list of hydrolocations along it but a hydrolocation or flowline has a link to a mainstem. The exception to this is for headwater and outlet hydrolocations. A mainstem is defined as the HY_Flowpath connecting a headwater to an outlet where the headwater and outlet can be thought of as hydrolocations, one at the outlet of a headwater catchment, the other at the outlet of the mainstem. A mainstem feature is then merely a pair of hydrolocations and an identifier to be used as a reference system between features of a given dataset and among datasets which need to cross reference the same network.

Figure 9 — Class diagram showing role of mainstems

9.3.  Catchment and Nexus Representation

The hydrofabric logical model represents the HY_CatchmentDivide conceptual feature type as a polygonal “catchment” logical feature type. Although the HY_Features concept of HY_Catchment is an abstract holistic feature with several potential conceptual “HY_CatchmentRealization” representations, the hydrofabric logical model follows the practice of calling the polygonal representation of the HY_Catchment concept a “catchment.”

The HY_Features concept of HY_HydroNexus has only one “realization”, the HY_HydroLocation. The hydrofabric logical model represents this concept as a hydrolocation that is associated with a catchment via inflow (hydrolocation) and outflow (hydrolocation) associations. These associations are inherited from HY_HydroNexus and HY_Catchment through partial inheritance of the “realizedCatchment” and “realizedNexus” conceptual class associations.

Figure 10 — Class diagram showing catchment and nexus representation

9.4.  Flow Network, Catchment Network, Mainstem Network

Topology expressed in the hydrofabric logical model can be thought of in three related ways. A network of flowlines is a highly granular network of interconnected HY_Flowpaths (flowlines), some of which do not have discernable catchment boundaries. A network of catchments is the network of HY_Flowpaths (flowpaths) for which discernable catchments exist and inlet and outlet hydrolocations (HY_HydroNexus representations) can be identified. A network of mainstems is the network of HY_Flowpaths (mainstems) that create a cross-scale tree of features from the most granular flowline to the largest rivers. These three networks co-exist and “pin together” at inlet and outlet hydrolocations.

NOTE:  The hydrofabric logical model flow network does not require use of “node” identifiers between flowlines but nodes can be constructed from a many-to-many flow network representation. Node identifiers can be constructed as follows:

  1. start with a many-to-many table with fromid and toid;

  2. ensure headwater flowlines do not appear in toid and outlet flowlines do not appear in fromid;

  3. group the table by fromid such that groups of toids from a given fromid are apparent; and

  4. create ids based on unique sets of toid groupings — sorting the ids in each group accomplishes this.

The result is one node id for junctions with any number of inflowing and outflowing flowlines.

Sample R code for the above operation looks like:

```r
library(dplyr)

# x is a many-to-many flow network table with fromid and toid columns
select(x, fromid, toid) |>
  filter(!is.na(.data$fromid) & !is.na(.data$toid)) |>
  group_by(.data$fromid) |>
  mutate(node_id = paste(sort(toid), collapse = "-")) |>
  ungroup()
```

Derived from: Blodgett, D., 2023, hydroloom: Utilities to Weave Hydrologic Fabrics, https://doi.org/10.5066/P9AQCUY0

With this, the unique character-encoded node_id can be converted to whatever identifier scheme is desired.

Headwater and outlet nodes must also be constructed for fromids that do not appear in toid and toids that do not appear in fromids respectively.
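For readers who do not work in R, the same steps, including the headwater and outlet nodes, can be sketched in Python using only the standard library. This is an illustrative sketch and is not taken from hydroloom. The function name `build_nodes` and the "headwater-" and "outlet-" prefixes are assumptions. The sketch also assumes string flowline identifiers and that all flowlines entering a junction share the same set of downstream flowlines.

```python
from collections import defaultdict


def build_nodes(edges):
    """Assign node ids to a many-to-many flow network.

    edges: iterable of (fromid, toid) pairs between flowlines.
    Returns (fromnode, tonode) dicts keyed by flowline id.
    """
    downstream = defaultdict(set)
    for fromid, toid in edges:
        downstream[fromid].add(toid)

    # Steps 3 and 4: the sorted set of toids reached from a fromid names the
    # node at the downstream end of that flowline.
    tonode = {f: "-".join(sorted(t)) for f, t in downstream.items()}

    # Each toid starts at the node shared by the flowlines that flow into it.
    fromnode = {}
    for fromid, toids in downstream.items():
        for toid in toids:
            fromnode[toid] = tonode[fromid]

    for f in set(downstream) | set(fromnode):
        # Headwaters never appear as a toid; outlets never appear as a fromid.
        fromnode.setdefault(f, f"headwater-{f}")
        tonode.setdefault(f, f"outlet-{f}")
    return fromnode, tonode


# Flowlines "a" and "b" join at "c", which then diverges into "d" and "e".
edges = [("a", "c"), ("b", "c"), ("c", "d"), ("c", "e")]
fromnode, tonode = build_nodes(edges)
```

The two dictionaries together give the “fromnode” to “flowline” to “tonode” form of the network: in the example, flowlines "a" and "b" both end at node "c", where flowline "c" begins, and flowline "c" ends at node "d-e", where both "d" and "e" begin.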

10.  Summary and Next Steps

The hydrofabric logical model is the result of a variety of use-case driven applications that have sought to use existing data sources with concepts of the HY_Features conceptual data model. It provides specific logic for implementation of key use cases including:

  1. durable links to hydrologic locations representing network locations and other points of interest;

  2. cross-scale integration and multi-scale representation of hydrographic and hydrologic data and models;

  3. support for non-dendritic connectivity in the context of a predominant dendritic network of mainstems and drainage basins;

  4. support for linear waterbodies that do not take part in local hydrology but form part of a broader hydrologic network;

  5. direct integration of hydrologic summary units that do not adhere to all aspects of the HY_Features catchment data model; and

  6. support for on- and off-network waterbodies represented as polygonal geometries.

Examples of data models that were used in applications that led to the creation of the hydrofabric logical model are presented in Annex A. This logical model has not been implemented holistically in any of these cases. A sample implementation of the logical model as a set of Simple Features tables compatible with GeoPackage and GeoJSON is included in Annex B. Future work could seek to implement an iteration of this data model for future application(s) and describe the encoding as a formal specification.

10.1.  Suitability of Logical Model for Use Cases

The use cases the hydrofabric logical model is intended to support (elevation integrated hydrography, hydrography data services, and hydrologic modeling) have all been satisfied using data structures that adopt the same core logic.

The following features of the data model are designed to support the range of use cases in the context of one general set of structures.

  1. The separation of flowlines, flowpaths, and mainstems provides flexibility and cross scale functionality without forcing complexity and large data volumes on use cases that do not require them.

  2. The flow network construct is compatible with a many-to-many representation of the network that does not involve “node” identifiers as well as a one-to-one-to-one (“fromnode” to “flowline” to “tonode”) data structure that leverages node identifiers within a table of flowlines.

  3. Recognition of upstream and downstream “main” paths is supported within the flownetwork in a way that is compatible with the NHDPlus “streamlevel” and “divergence” attribute approach.

  4. A special form of catchment aggregate, “hydrologic unit” is included to accommodate jurisdictional and cataloging units that do not adhere strictly to the one inflow one outflow catchment data model constraints.

For hydrologic modeling use cases that do not use control volumes or modeling units that are idealized as catchments, the hydrofabric data model is of limited direct utility but can still provide a hydrologic data framework for model input and output.

For hydrography data services, some of the aggregate features may need to be unioned into cached use-case specific instances and a wide range of use-case specific value-added attributes may be important, but the core flowline, catchment, waterbody abstraction is flexible and well suited to all use cases the data model has been tested with.

Finally, as a core data model for elevation derived hydrography, the data model has yet to be tested in operations, but has proved suitable for development of elevation derived hydrography to replace legacy cartographic hydrography. The complexity inherent in the relations between feature types, aggregate features, and networked features has proven difficult to document and to develop operational strategies for, but previous efforts did not account for some key complexities at all, and no blockers to progress have been experienced as of summer 2024.

10.2.  Outstanding Issues

Large waterbodies present a unique challenge for general implementation of the catchment data model. In some cases, waterbodies can be thought of as overlaying the catchment network. For example, in the case of rivers that may nearly dry up at some time in the year, this is an obvious approach because at some point, the river bed is part of the land surface. In other cases, the boundary between land and water is more persistent and the bathymetric contours never take part in surface hydrology. For example, a 500 square kilometer inland lake surrounded by developed and agricultural lands needs to be represented as if it were an ocean shoreline. A convention that has been used successfully in NHDPlus is to recognize “frontal” catchments with an outlet nexus represented as the nominal shoreline along a large body of water. [8] While functional for a given implementation of catchments, this solution fails to support cross-scale applications because what counts as a “large” waterbody at one catchment discretization differs at another.

Regions of the landscape that do not have established fluvial geomorphology to provide regionally connected surface conveyances are not well supported by the hydrofabric logical model. For example, landscapes dominated by wetlands, karst geology, deep sand, or glacial outwash (among others) do not have a natural hierarchical drainage network from which catchment boundaries and flowpaths can be defined. In such cases, if catchments are needed to partition the landscape, additional data model constructs are needed to express the connectivity between catchments. Compatibility of a non-surficial network with the surficial “flow network” as presented in this report is an open question.

Rivers with complex systems of flowlines or channels that surround islands are not accounted for uniquely in the hydrofabric logical data model. No explicit distinction is drawn between a diverted flowline that stays within a river’s typical watercourse and a diverted flowline that exits the river valley to form an altogether separate river valley. A distinction could be drawn implicitly or by introduction of a convention, but clear and consistent methods for describing the distinction have not been identified in work on the hydrofabric logical model to date.

Hydrofabric data are never done. Updates may be needed to improve some aspect of the data or to reflect changes that have occurred due to natural or human influences. Such updates are often executed by or are for people who do not have direct access to make updates to a centrally managed hydrofabric dataset. Similarly, a hydrofabric needs to integrate with data systems that are not known to the hydrofabric system or its stewards. These loose associations between updates and uses of a hydrofabric represent a large number of use cases with complex push and pull information flows. The reference fabric data model is designed to accommodate critical use cases in this area, yet significant testing and experimentation will be required in early operations of systems that leverage the hydrofabric model.

10.3.  Next Steps

The core hydrofabric logical data model as presented here can be implemented as shown in Annex B. It is offered for community consideration and as a guide for formalization of a logical and physical model encoding.


Bibliography

[1]  Blodgett, David; Johnson, J. Michael; Sondheim, Mark; Wieczorek, Michael; Frazier, Nels. Mainstems: A logical data model implementing mainstem and drainage basin feature types based on WaterML2 Part 3: HY Features concepts. Environmental Modelling & Software, vol. 135, pp. 104927, 2021. DOI: https://doi.org/10.1016/j.envsoft.2020.104927.

[2]  Anderson, Rebecca; Lukas, Vicki; Aichele, Stephen S. The 3D national topography model call for action—part 1. The 3D hydrography program. U.S. Geological Survey Circular 1519, U.S. Geological Survey, 2024. DOI: https://doi.org/10.3133/cir1519.

[3]  Jones, Kim; Niknami, Lily; Buto, Sue; Decker, Drew. Federal standards and procedures for the national Watershed Boundary Dataset (WBD) (5th ed.). U.S. Geological Survey Techniques and Methods 11-A3, US Geological Survey, 2022. DOI: https://doi.org/10.3133/tm11A3.

[4]  Blodgett, David; Johnson, J. Michael; Bock, Andy. Generating a reference flow network with improved connectivity to support durable data integration and reproducibility in the coterminous US. Environmental Modelling & Software, vol. 165, pp. 105726, 2023. DOI: https://doi.org/10.1016/j.envsoft.2023.105726.

[5]  Atkinson, Robert; Dornblut, Irina; Smith, Darren. An international standard conceptual model for sharing references to hydrologic features. Journal of Hydrology, vol. 424, pp. 24–36, 2012. DOI: https://doi.org/10.1016/j.jhydrol.2011.12.002.

[6]  Blodgett, David; Johnson, J. Michael. Hydrologic Modeling and River Corridor Applications of HY_Features Concepts. Open Geospatial Consortium, 2022. URL: http://www.opengis.net/doc/PER/Hydrofabric-er.

[7]  Blodgett, D.; Dornblut, I. OGC WaterML 2: Part 3—Surface Hydrology Features (HY_Features)—Conceptual Model. Technical Report 14‑111r7, Open Geospatial Consortium, 2018. URL: http://www.opengis.net/doc/IS/hy-features/1.0.

[8]  McKay, Lucinda; Bondelid, Timothy; Dewald, Tommy; Johnston, J.; Moore, Richard; Rea, Alan. NHDPlus version 2: User guide. US Environmental Protection Agency, vol. 745, 2012. URL: https://www.epa.gov/system/files/documents/2023-04/NHDPlusV2_User_Guide.pdf.

[9]  U.S. Environmental Protection Agency. NHDPlusV1. 2008. URL: https://www.epa.gov/waterdata/nhdplusv1-data. (Accessed 2024‑02‑27).

[10]  US Geological Survey. National Elevation Dataset. Fact Sheet, 1999. DOI: http://dx.doi.org/10.3133/fs14899.

[11]  Moore, Richard B.; McKay, Lucinda D.; Rea, Alan H.; Bondelid, Timothy R.; Price, Curtis V.; Dewald, Thomas G.; Hayes, Laura. User’s guide for the National Hydrography Dataset Plus High Resolution (NHDPlus HR). Scientific Investigations Report, US Geological Survey, 2025. DOI: http://dx.doi.org/10.3133/sir20255031.

[12]  Viger, Roland; Bock, Andy. GIS Features of the Geospatial Fabric for National Hydrologic Modeling. 2014. DOI: https://doi.org/10.5066/F7542KMD.

[13]  Regan, R.S.; Juracek, K.E.; Hay, L.E.; Markstrom, S.L.; Viger, R.J.; Driscoll, J.M.; LaFontaine, J.H.; Norton, P.A. The U.S. Geological Survey National Hydrologic Model infrastructure…​. Environmental Modelling & Software, vol. 111, pp. 192–203, 2019. DOI: https://doi.org/10.1016/j.envsoft.2018.09.023.

[14]  Archfield, Stacey A.; Clark, Martyn; Arheimer, Berit; Hay, Lauren E.; McMillan, Hilary; Kiang, Julie E.; …​ Over, Thomas. Accelerating advances in continental domain hydrologic modeling. Water Resources Research, vol. 51, no. 12, pp. 10078–10091, 2015. DOI: https://doi.org/10.1002/2015WR017498.

[15]  Archuleta, Christy-Ann; Terziotti, Silvia. Elevation-derived hydrography—Representation, extraction, attribution, and delineation rules. Techniques and Methods, U.S. Geological Survey, Versions 1.0 (2020) & 1.1 (2023), 74 pp. DOI: https://doi.org/10.3133/tm11B12.

[16]  Beven, K.J. Rainfall-Runoff Modelling: The Primer. Wiley, 2012. ISBN: 9780470714591.

[17]  Pfafstetter, Otto. Classification of hydrographic basins: coding methodology. Departamento Nacional de Obras de Saneamento, vol. 18, pp. 1–2, 1989.

[18]  Cosgrove, Brian et al. NOAA’s National Water Model: Advancing operational hydrology through continental-scale modeling. JAWRA Journal of the American Water Resources Association, vol. 60, no. 2, pp. 247–272, 2024. DOI: https://doi.org/10.1111/1752-1688.13184.

[19]  Ogden, Fred et al. The Next Generation Water Resources Modeling Framework…​. Presented at AGU Fall Meeting 2021. URL: https://ui.adsabs.harvard.edu/abs/2021AGUFM.H43D..01O.

[20]  NOAA Office of Water Prediction; Johnson, Mike. Hydrofabric for Next Generation Water Resource Modeling. NOAA OWP, 2024. URL: https://noaa-owp.github.io/hydrofabric/. (Accessed 2025‑12‑03).

[21]  Blodgett, David; Cochrane, Byron; Atkinson, Rob; Grellet, Sylvain; Feliachi, Abdelfettah; Ritchie, Alistair. OGC Environmental Linked Features Interoperability Experiment Engineering Report. Open Geospatial Consortium, 2018. URL: http://www.opengis.net/doc/PER/elfie-er.

[22]  Blodgett, David. Second Environmental Linked Features Experiment. Open Geospatial Consortium, 2020. URL: http://www.opengis.net/doc/PER/SELFIE-ER.

[23]  Horn, C. Robert. Appendix A to Metadata for RF1 USEPA Reach File 1 converted to Arc/INFO. USEPA, 1994. URL: https://water.usgs.gov/GIS/browse/rf1_appA.HTML.

[24]  Nolan, J.V.; Brakebill, J.W.; Alexander, R.B.; Schwarz, G.E. ERF1_2 — Enhanced River Reach File 2.0. U.S. Geological Survey, 2002. DOI: https://doi.org/10.5066/P9JVHVND.


Annex A
(informative)
Summary of Data Models

The following sections describe the existing data models that were used in applications that led to the hydrofabric data model documented in this report. Important similarities and differences between these data sources and the hydrofabric data model are highlighted to assist in understanding how these data are adapted into the hydrofabric.

This summary is by no means exhaustive and omits details that are not relevant to the hydrofabric logical model.

A.1.  Enhanced River Reach File

Nolan, J.V., Brakebill, J.W., Alexander, R.B., and Schwarz, G.E., 2002, ERF1_2 — Enhanced River Reach File 2.0: U.S. Geological Survey data release, https://doi.org/10.5066/P9JVHVND.

The earliest digital national-scope hydrographic dataset in the United States was digitized from aviation charts in the 1970s [23]. Known as the “Reach File 1” (RF1), it was implemented on early computing infrastructure at the U.S. Environmental Protection Agency.

With about 68,000 “reach segments”, the dataset was spatially sparse but hydrologically rich. It was designed to support network operations and hydrologic indexing of water quality and other observational data as well as routing of water quantity and quality constituents through an explicitly non-dendritic network. The concept of primary upstream and primary downstream path was present in the form of stream “level”, which distinguishes tributary from primary upstream, and divergence codes, which distinguish primary and secondary paths below a divergence. Drainage basins and hydrologic units were also present in the dataset in the form of an identifier for all features draining to a given outlet and a “cataloging unit” that was identified as the first eight characters of a segment identifier.

Although many concepts were represented to some degree, RF1 was only published in a single ArcInfo exchange file with very limited population of its attributes. After its original release, numerous projects worked with iterations and offshoots of RF1 [24]; however, the conceptual underpinnings laid out in Horn, 1994 [23] were maintained.

A.1.1.  flowline

The “reach segment” discretization of RF1 is equivalent to the flowline logic of the hydrofabric logical model. The RF1 data model is contained entirely within the flowline table with numerous descriptive attributes and various grouping attributes that represent virtual features.

A.1.1.1.  flow network

The flow network in RF1 is encoded using a “fromnode” and “tonode” structure where each reach segment is thought to flow from and to a “node”. Nodes do not have a physical representation other than the (usually identical) geometry node at the upstream or downstream end of related flowlines. The hydrofabric logical model does not depend on identified nodes but the node representation used by RF1 is compatible.

RF1 represents primary upstream connections implicitly via a “level” attribute. At a confluence, the tributary has a higher level than the main upstream flowline. RF1 represents primary downstream connections explicitly with a divergence code attribute (and explicit flow split fractions in [24]).
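The node-based encoding described above can be converted to flowline-to-flowline connections with a simple join. The following is a minimal sketch in base R, not RF1 code: the table and its column names (id, fromnode, tonode, level) are hypothetical stand-ins rather than actual RF1 field names.

```r
# Hypothetical reach segments; column names are illustrative only.
reach <- data.frame(
  id       = c(1, 2, 3, 4),
  fromnode = c(10, 11, 12, 13),
  tonode   = c(12, 12, 13, 14),
  level    = c(1, 2, 1, 1)
)

# A segment flows to every segment whose fromnode equals its tonode.
edges <- merge(
  data.frame(from_id = reach$id, node = reach$tonode, from_level = reach$level),
  data.frame(to_id = reach$id, node = reach$fromnode),
  by = "node"
)

# Among the inflows to a downstream segment, the one with the lowest level
# is the primary upstream connection; higher-level inflows are tributaries.
edges$upstream_main <- edges$from_level ==
  ave(edges$from_level, edges$to_id, FUN = min)

edges <- edges[order(edges$from_id), c("from_id", "to_id", "upstream_main")]
print(edges)
```

In this toy network, segments 1 and 2 meet at node 12; segment 2 carries the higher level and is therefore flagged as the tributary.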

A.1.2.  flowpath

RF1 did not represent catchments and, as a result, does not include the concept of a flowpath.

A.1.3.  catchment

RF1 did not represent catchments.

A.1.4.  mainstem

RF1 used an implicit definition of mainstems and did not explicitly identify them. However, the concept of mainstem and drainage basin was present in that a “level” attribute was used to identify a headwater to outlet pathway for basins across scales.

A.1.5.  hydrolocation

Given that RF1 only included flowline features, hydrolocations were not included explicitly. However, the data model was used to support hydrologic indexing of data along rivers. Implementation details of hydrolocation indexing in RF1 are spotty, but the core concept that a hydrolocation is along a feature that can be conceptualized as a flowpath was present from the earliest ideas for the data model.

A.1.6.  hydrologic unit

Given that RF1 did not represent catchments, the hydrologic unit concept is also not represented. However, “cataloging units”, a type of hydrologic unit, were used to encapsulate large portions of the dataset and were included as a sort of namespace in identifiers.

A.1.7.  waterbody

RF1 represents waterbodies in a way that is incompatible with the hydrofabric logical model. Waterbodies that are not along the river network are not represented at all, and waterbodies along the river network are represented as a special type of reach segment that represents the shoreline. In this arrangement, the main flow into a waterbody appears to be a diversion, but it is actually a transition from a “transport reach” to a pair of “shoreline” reaches.

A.2.  Watershed Boundary Dataset

https://www.usgs.gov/national-hydrography/watershed-boundary-dataset?qt-science_support_page_related_con=4#qt-science_support_page_related_con

The Watershed Boundary Dataset (WBD) is a multi-scale hydrologic cataloging unit dataset defined at six levels. The levels are two, four, six, eight, ten, and twelve digit numeric codes (hydrologic unit code or HUC) which define increasingly resolved hydrologic units that exist in a spatially nested hierarchy that corresponds to the identifier hierarchy. Each level is intended to contain units of a relatively consistent area with a single hydrologic outlet or a coastline or major waterbody shoreline defining the outlet.
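Because the identifier hierarchy mirrors the spatial nesting, the parent units of any hydrologic unit can be read directly from its code. A minimal sketch in base R follows; the twelve-digit code used here is only an illustrative value.

```r
# Illustrative 12-digit hydrologic unit code, kept as a character string so
# that leading zeros are preserved.
huc12 <- "070900020604"

# Each coarser level is a prefix of the finer code.
digits <- c(2, 4, 6, 8, 10, 12)
huc_levels <- vapply(digits, function(n) substr(huc12, 1, n), character(1))
names(huc_levels) <- paste0("huc", digits)
print(huc_levels)

# Two units nest within the same eight-digit unit when they share its prefix.
same_huc8 <- function(a, b) substr(a, 1, 8) == substr(b, 1, 8)
same_huc8("070900020604", "070900020701")
```

Storing codes as text rather than numbers is a deliberate choice in this sketch: a numeric type would drop the leading zero and break the prefix relationship.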

To accommodate the consistent hydrologic unit goal, a WBD hydrologic unit can receive inflow from many upstream units and may include many small coastal outlets. This arrangement corresponds well to the logic of the hydrologic unit feature type in the hydrofabric logical model.

A.2.1.  flowline

The WBD does not contain flowlines.

A.2.1.1.  flow network

Though the WBD does not contain flowlines, it does include an indication of the primary downstream HUC. Abstractly, the connectivity between HUCs can be interpreted as a rudimentary flow network. The network is dendritic (no divergent connections) except in the case of qualitative attributes indicating secondary outlets or ambiguous connectivity. The WBD does not identify a primary upstream HUC.
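The rudimentary network formed by primary downstream pointers can be navigated with a simple loop. The sketch below uses base R and assumes a hypothetical table in which a `tohuc` column names the primary downstream unit and NA marks a terminal outlet; the unit codes are placeholders.

```r
# Hypothetical units; codes and column names are illustrative only.
wbd <- data.frame(
  huc   = c("A", "B", "C", "D"),
  tohuc = c("C", "C", "D", NA),
  stringsAsFactors = FALSE
)

# Follow the primary downstream pointer until a terminal unit is reached.
downstream_path <- function(start, wbd) {
  path <- start
  current <- start
  repeat {
    nxt <- wbd$tohuc[match(current, wbd$huc)]
    # Stop at an outlet, at an unknown unit, or if the pointers form a loop.
    if (is.na(nxt) || nxt %in% path) break
    path <- c(path, nxt)
    current <- nxt
  }
  path
}

downstream_path("A", wbd)
```

Upstream navigation would require scanning for every unit whose pointer names the current unit, which returns a set rather than a single path because no primary upstream unit is identified.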

A.2.2.  flowpath

The WBD does not contain flowpaths.

A.2.3.  catchment

The WBD does not represent catchments.

A.2.4.  mainstem

The WBD does not represent mainstems.

A.2.5.  hydrolocation

Given that the WBD does not represent flowpaths, flowlines, or mainstems, it does not support hydrolocations.

A.2.6.  hydrologic unit

The WBD is predominantly a hydrologic unit dataset. However, given that it was developed independently of the rest of the network, it does not have the explicit relationships that are expected by the hydrofabric logical model. In particular, the catchments that a given hydrologic unit is composed of are not defined explicitly and, as a result, the WBD is not easy to integrate with datasets that include catchments, flowlines, etc.

A.2.7.  waterbody

The WBD does not represent waterbodies directly. However, inland waterbodies that are larger than the target size of a given HUC level are represented as polygons with a type that indicates that they are waterbodies. These special units are different from all other units in that they have a historical shoreline rather than a topographic ridge as their boundary.

A.3.  National Hydrography Dataset

https://www.usgs.gov/national-hydrography/national-hydrography-dataset?qt-science_support_page_related_con=0#qt-science_support_page_related_con

The National Hydrography Dataset (NHD) is a digital hydrographic dataset that began as a 1:100k map scale product and was densified to 1:24k and finer map scale over time. As the detail of the dataset increased, the need for local knowledge in improvement of the dataset led to a “stewardship” approach in which regional contributors (stewards) facilitated edits to the dataset from local contributors. With increasing numbers of users and a stronger emphasis on cartographic detail came increased data type diversity and a decrease in the hydrologic functionality.

The NHD includes point, line, and polygon features and ancillary attribute tables in a relational database structure packaged within a geodatabase. The primary linear feature in NHD is referred to as a “flowline”, which is identified with a “permanent identifier”. Point features can be indexed to flowlines using a relatively persistent “reachcode” identifier, which can group many flowlines into one composite feature. Both on- and off-network waterbodies are represented as polygon features. Network connectivity through waterbodies is represented as special “artificial path” flowlines. Although the flowlines form a geometric network in which flowlines are digitized from upstream to downstream, attributes for upstream-downstream connections reflect only geometric connectivity.

A.3.1.  flowline

The most resolved linear feature in NHD is referred to as a flowline and is directly compatible with the hydrofabric logical model flowline.

A.3.1.1.  flow network

The NHD “flow table” is compatible with the hydrofabric logical model flow network but does not identify primary upstream or primary downstream connections at confluences or divergences. The NHD flow table is a many-to-many table describing the flowline-to-flowline connections that are implied by the geometric network formed by flowline geometry.
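A many-to-many flow table of this kind is enough to locate confluences and divergences, even though it cannot say which connection is primary. The following base R sketch uses a hypothetical table with `from_id` and `to_id` columns; these are not the actual NHD field names.

```r
# Hypothetical flowline-to-flowline connections.
flow <- data.frame(
  from_id = c(1, 2, 3, 3, 4, 5),
  to_id   = c(3, 3, 4, 5, 6, 6)
)

# Count how many connections enter and leave each flowline.
n_in  <- table(flow$to_id)
n_out <- table(flow$from_id)

# A confluence receives more than one inflow; a divergence has more than
# one outflow.
confluences <- as.numeric(names(n_in)[as.vector(n_in) > 1])
divergences <- as.numeric(names(n_out)[as.vector(n_out) > 1])

confluences
divergences
```

Flagging which inflow or outflow is primary requires additional attributes, such as the upstream main and downstream main indicators of the hydrofabric logical model flow network.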

A.3.2.  flowpath

The NHD does include a persistent grouping attribute for flowlines, the “reachcode”, but reaches are not flowpaths of catchments and are not compatible with the hydrofabric logical model. Reachcode’s purpose, to provide a persistent ID for network features that may change over time, is different from the purpose of the “flowpath” feature, to identify the collection of flowlines that connect the inlet of a catchment to its outlet.

A.3.3.  catchment

The NHD does not represent catchment features.

A.3.4.  mainstem

The NHD does not represent mainstems.

A.3.5.  hydrolocation

The NHD includes a table of “hydrologic events” — a name that draws from the linear referencing terminology of an “event” that lies at a position along a linear feature. The key attributes of an NHD event are the “reachcode” and “measure” which represent a specific location (percent upstream) along the specified reach. In some ways, the NHD event data model is compatible with the hydrofabric logical model, but it includes much more specific and less durable information about the relationship between the location and the hydrologic network than the hydrofabric logical model. In contrast, a hydrolocation in the hydrofabric logical model has a nominal location and a known mainstem.
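To make the “measure” idea concrete, the sketch below converts a measure into a coordinate along a single line. It is a simplified illustration in base R, assuming planar coordinates, a reach made of one line whose vertices are ordered from upstream to downstream, and a measure expressed as percent of length counted from the downstream end; the function name `locate_measure` is hypothetical.

```r
# Hypothetical flowline vertices (x, y), ordered upstream to downstream.
line <- rbind(c(0, 0), c(3, 4), c(3, 10))

# Return the coordinate at `measure` percent of the line length, where
# measure 0 is the downstream end and measure 100 is the upstream end.
locate_measure <- function(line, measure) {
  seg_len <- sqrt(rowSums(diff(line)^2))          # length of each segment
  cum_len <- c(0, cumsum(seg_len))                # distance to each vertex
  target <- (1 - measure / 100) * sum(seg_len)    # distance from upstream end
  i <- findInterval(target, cum_len, rightmost.closed = TRUE)
  frac <- (target - cum_len[i]) / seg_len[i]      # position within segment i
  line[i, ] + frac * (line[i + 1, ] - line[i, ])
}

locate_measure(line, 50)
```

The result depends on the exact geometry of the reach, which illustrates why a measure is less durable than a nominal location tied to a mainstem: any edit to the line changes where a given measure falls.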

A.3.6.  hydrologic unit

The NHD does not include hydrologic unit features. However, the reachcode attribute carried by flowlines begins with the eight digit WBD hydrologic unit code such that hydrologic units can be used to extract subsets of the data through interpretation of identifiers.

A.3.7.  waterbody

Waterbodies in the NHD are represented as polygon features and include on- and off-network waterbodies. A special class of waterbody, commonly referred to as a “double line stream”, is a long narrow polygon used to describe a wide river. In early versions of the NHD, these “double line streams” were in an “NHDarea” table separate from waterbodies, but in later versions all water features were consolidated into the “NHDWaterbody” table.

A.4.  National Hydrography Dataset Plus

https://www.epa.gov/waterdata/nhdplus-national-hydrography-dataset-plus

The NHDPlus is a data model that incorporates the hydrologic attributes of RF1 and more with the 1:100k map scale NHD data. Two versions of NHDPlus were created between 2006 and 2019, each with much the same data model. The core data model of flowlines, reachcodes, waterbodies, and events from NHD is preserved, and aspects of RF1 are adopted and augmented for more advanced network navigation operations.

The most substantial addition introduced in NHDPlus is the inclusion of elevation-derived catchment polygons, which describe the area of land with an implied surface drainage to flowlines. To create the catchment polygons, flowline and elevation preprocessing was performed to ensure the flowlines did not disagree with elevation and that the elevation surface would agree with the location of flowlines. This preprocessing involved trimming many headwater flowlines away from drainage divides and burning (artificially depressing elevation under flowline features) the entire network into the elevation.
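The burning step can be pictured as a cell-by-cell adjustment of an elevation grid. The following is a toy sketch in base R under stated assumptions: the grid values, the flowline mask, and the uniform `burn_depth` are all invented for illustration and do not reflect the actual NHDPlus processing parameters.

```r
# Toy elevation grid (arbitrary units).
dem <- matrix(c(10, 9, 8,
                 9, 9, 7,
                 8, 7, 6), nrow = 3, byrow = TRUE)

# Cells that a flowline passes through (hypothetical rasterized flowline).
on_flowline <- matrix(c(FALSE, TRUE,  FALSE,
                        FALSE, TRUE,  FALSE,
                        FALSE, FALSE, TRUE), nrow = 3, byrow = TRUE)

# Assumed uniform depth by which flowline cells are lowered.
burn_depth <- 5

# Lower only the flowline cells so that derived drainage follows the network.
burned <- dem
burned[on_flowline] <- dem[on_flowline] - burn_depth
burned
```

After burning, every flowline cell sits below its neighbors, so flow directions derived from the grid converge on the mapped network.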

A.4.1.  flowline

In addition to the core NHD flowlines, NHDPlus includes numerous flowline attributes that support various hydrologic network based applications.

A.4.1.1.  flow network

The NHDPlus represents the flow network using a “fromnode” and “tonode” structure carried by the flowline table, much the same as in RF1. Nodes do not have a physical representation other than the geometry node at the upstream or downstream end of related flowlines. Although the hydrofabric logical model does not include “nodes”, it is compatible with the NHDPlus representation of network topology.

NHDPlus represents primary upstream connections implicitly via a “levelpath” attribute. At a confluence, the tributary has a different (higher) levelpath than the main upstream flowline. NHDPlus represents primary downstream connections explicitly with a divergence attribute as well as a “diverted fraction” for many divergences.

A.4.2.  flowpath

While NHDPlus doesn’t include flowpath features explicitly, it does include a distinction between flowlines that do not have an associated catchment and those that do. In most cases, these flowlines, which are thought to have no local contributing area, are so short that their drainage area is negligible or smaller than one elevation grid cell. However, in cases like man-made canals, the features may take part in the broader network of flowlines and form part of the mainstem of a drainage basin, but not have a strong relationship with hydrology within the landscape they flow through.

The vast majority of flowlines in the NHDPlus are 1:1 with catchments and are, by extension, also flowpath features. Unlike the hydrofabric logical model, however, the NHDPlus does not group flowlines into multi-flowline flowpaths of catchments. The result is a proliferation of catchments in areas with many small tributaries or diversions.

A.4.3.  catchment

As discussed in flowpath, the NHDPlus data model includes catchment polygons for flowlines that are large enough to have a recognizable local contributing area and that are of a feature type that takes part in the local hydrology. These catchments are broadly compatible with the hydrofabric logical model.

A.4.4.  mainstem

NHDPlus represents mainstems with a “levelpath” attribute carried by flowline features. The levelpath identifier is tied to a “hydrologic sequence” attribute which numbers flowlines in downstream to upstream order according to a “topological sort” (graph theory terminology) of the network. The levelpath algorithm extends the hydrosequence of the outlet upstream along the main path at every confluence such that tributaries have a different (and higher) levelpath than the mainstem that they flow into.
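The extension of a hydrosequence value upstream along the main path can be sketched in a few lines of base R. This is a simplified illustration rather than the NHDPlus algorithm itself: the network is hypothetical and dendritic, and the `weight` column is an assumed stand-in for whatever criterion selects the main upstream flowline at a confluence.

```r
# Hypothetical network. `hydroseq` increases from the outlet upstream and
# `toid` is NA at the outlet.
net <- data.frame(
  id       = c(1, 2, 3, 4, 5),
  toid     = c(NA, 1, 1, 2, 2),
  hydroseq = c(1, 2, 5, 3, 4),
  weight   = c(10, 7, 2, 4, 3)
)

net$levelpath <- NA_real_

# Visit flowlines from downstream to upstream so that each flowline has been
# reached from its downstream neighbor, if at all, before it is visited.
for (i in order(net$hydroseq)) {
  # A flowline not claimed by a downstream main path starts a new level path
  # that takes the flowline's own hydroseq as its value.
  if (is.na(net$levelpath[i])) net$levelpath[i] <- net$hydroseq[i]

  # Pass the level path to the main upstream flowline only.
  up <- which(net$toid == net$id[i])
  if (length(up) > 0) {
    main <- up[which.max(net$weight[up])]
    net$levelpath[main] <- net$levelpath[i]
  }
}

net[, c("id", "toid", "hydroseq", "levelpath")]
```

Flowlines 1, 2, and 4 share the outlet's value, while tributaries 3 and 5 start their own level paths with higher values. Because the values are taken from `hydroseq`, renumbering the network changes every levelpath value, which is why a levelpath is unsuitable as a persistent identifier.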

In contrast with the hydrofabric data model, the NHDPlus levelpath attribute is not intended to be an identifier and, therefore, will change when the “hydrologic sequence” identifier of the network changes. Additionally, the “primary upstream” pathway determination that decides levelpath in the NHDPlus is sensitive to changes in flowline name and drainage network density. The hydrofabric logical model encourages stability in the “upstream main” determination through assignment of identifiers that link a headwater to an outlet without strict adherence to naming or physical characteristics.

A.4.5.  hydrolocation

The NHDPlus adopts the linear referencing data model of the NHD.

A.4.6.  hydrologic unit

As with the NHD, the NHDPlus does not include hydrologic unit features. However, in the second version of the NHDPlus, a snapshot of the WBD was used to “wall” (add artificially raised areas along hydrologic unit boundaries) the elevation data used to generate catchments. The resulting catchments agree with WBD unit boundaries in most cases. This leads to good spatial agreement between the two, but no specific network integration between the WBD and NHDPlus was established in the process.

A.4.7.  waterbody

The NHDPlus adopts the waterbody data model of the NHD.

A.5.  Reference and Derived Hydrofabrics

Bock, A.R., Blodgett, D.L., Johnson, J.M., Santiago, M., Wieczorek, M.E., 2024, PROVISIONAL: National Hydrologic Geospatial Fabric Reference and Derived Hydrofabrics: U.S. Geological Survey data release, https://doi.org/10.5066/P9NFPB5S — https://www.sciencebase.gov/catalog/item/60be0e53d34e86b93891012b

The NHDPlus defines about 2.6 million catchments with an average size of about 2.5 square kilometers. For many hydrologic modeling applications, it is desirable to have fewer and larger units [16]. The “Reference and Derived Hydrofabrics” is a series of datasets that seek to provide a generalized pattern to derive purpose-built hydrofabrics.

The “reference” term refers both to the idea of a “reference system” used to link multiple sources of information together and to a “reference instance” used as a shared resource. In this context, the “reference fabric” is both the system used to cross-reference and integrate many data sources and the set of “reference features” used as inputs to workflows that derive purpose-built fabrics.

A.5.1.  flowline

The reference fabric adopts flowlines from the best available source (the second version of NHDPlus [8] at the time of writing).

A.5.1.1.  flow network

Although features are not modified, best available network routing modifications are incorporated and network attributes are improved from various data sources if possible. Both a dendritic one:one id:toid network and a non-dendritic fromnode:tonode network are included in the reference hydrofabric.
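The relationship between the two network encodings can be illustrated by reducing a non-dendritic edge list to a dendritic one. The base R sketch below assumes a hypothetical edge table with a `downstream_main` flag; it is not the procedure used to build the reference hydrofabric.

```r
# Hypothetical non-dendritic connections; flowline 2 diverges to 3 and 4.
edges <- data.frame(
  id              = c(1, 2, 2, 3, 4),
  toid            = c(2, 3, 4, 5, 5),
  downstream_main = c(TRUE, TRUE, FALSE, TRUE, TRUE)
)

# Keeping only primary downstream connections leaves at most one toid per
# id, which is the one:one id:toid form.
dendritic <- edges[edges$downstream_main, c("id", "toid")]

# Confirm that no flowline has more than one downstream connection.
stopifnot(anyDuplicated(dendritic$id) == 0)
dendritic
```

The secondary connection from flowline 2 to flowline 4 is dropped, so flowline 4 becomes a headwater in the dendritic view while remaining connected in the non-dendritic one.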

A.5.2.  flowpath

The reference hydrofabric adopts the flowpath scheme of NHDPlus where most flowlines have a defined catchment area polygon and others do not. Derived hydrofabrics (aggregated for modeling purposes) eliminate flowlines without catchments as much as possible but, at the time of writing, any that remain are handled the same as in NHDPlus.

A.5.3.  catchment

The reference hydrofabric adopts the catchment scheme of NHDPlus. Derived hydrofabrics eliminate very small catchments and seek to aggregate catchment polygons such that modeling requirements are met while upholding the hydrofabric catchment data model.

A.5.4.  mainstem

The reference hydrofabric adopts the NHDPlus levelpath and adds a persistent mainstem attribute to provide cross-dataset interoperability and support intercomparison of model applications built on it.

A.5.5.  hydrolocation

The reference hydrofabric includes a wide array of hydrolocations (monitoring locations, dams, water use infrastructure, etc.) that can be selected as inflow/outflow nexuses of derived fabric catchments according to model requirements.

A.5.6.  hydrologic unit

The reference and derived hydrofabric model does not include hydrologic unit polygons.

A.5.7.  waterbody

The reference and derived hydrofabric adopts the waterbody model of NHDPlus.

A.6.  Next Generation National Water Model Hydrofabric

https://noaa-owp.github.io/hydrofabric/articles/04-data-model-deep-dive.html#data-model

One of the derived hydrofabrics supports the Next Generation (NextGen) National Water Model. NextGen requires a hydrofabric data model that can encapsulate models of hydrologic process such that model formulation and operation can be decomposed into spatial subsets but also unified under a consistent high-resolution spatial framework. Regional decomposition of a network into processing units has been used in the past, but not to the degree required by the coupled hydrologic and hydrodynamic modeling being pursued in NextGen. Where to break up the network must be flexible, and the data model for regional model units must be rigorously yet flexibly defined.

In addition to the need for a regional domain decomposition, NextGen requires a consistent and high-resolution spatial framework that every model adheres to. That is, forecast, calibration, data assimilation, and other key model locations need to be handled across every region of the model in the same way and data from and for external systems must route into and out of the operational model at pre-determined framework locations.


Annex B
(informative)
Example Hydrofabric Encoding

Although aspects of the hydrofabric logical model have been implemented in various applications, no complete implementation of the model has been assembled. The following R code implements a nearly complete example of the logical model using the simple features (sf) R package which provides bindings to the GDAL GPKG driver. Example data are displayed in the sf R data table format.

This example begins from a subset of NHDPlusV2 data. Code and comments illustrate the steps needed to transform the NHDPlus data model into the hydrofabric logical model.

Listing B.1
library(nhdplusTools)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union

library(sf)
#> Linking to GEOS 3.12.1, GDAL 3.8.4, PROJ 9.3.1; sf_use_s2() is TRUE


source(system.file("extdata/sample_data.R", package = "nhdplusTools"))

st_layers(sample_data)
#> Driver: GPKG
#> Available layers:
#>               layer_name                 geometry_type features fields
#> 1    NHDFlowline_Network 3D Measured Multi Line String      267    136
#> 2            CatchmentSP                 Multi Polygon      266      6
#> 3                   Gage                         Point       46     19
#> 4                NHDArea                 Multi Polygon        3     14
#> 5           NHDWaterbody                 Multi Polygon      128     21
#> 6 NHDFlowline_NonNetwork             Multi Line String       49     12
#> 7                   Sink                         Point       10      9
#>               crs_name
#> 1 GRS 1980(IUGG, 1980)
#> 2 GRS 1980(IUGG, 1980)
#> 3 GRS 1980(IUGG, 1980)
#> 4 GRS 1980(IUGG, 1980)
#> 5 GRS 1980(IUGG, 1980)
#> 6 GRS 1980(IUGG, 1980)
#> 7 GRS 1980(IUGG, 1980)


wbd_data <- "wbd_demo.gpkg"

st_layers(wbd_data)
#> Driver: GPKG
#> Available layers:
#>         layer_name geometry_type features fields crs_name
#> 1    drainage_area Multi Polygon       11     28    NAD83
#> 2 da_hydrolocation         Point       11     16   WGS 84


flowline_in <- read_sf(sample_data, "NHDFlowline_Network") |>
  sf::st_zm()
hydrolocation_in <- read_sf(sample_data, "Gage")
catchment_in <- read_sf(sample_data, "CatchmentSP")
waterbody_in <- read_sf(sample_data, "NHDWaterbody")

drainage_area_in <- read_sf(wbd_data, "drainage_area")

da_hydrolocation_in <- read_sf(wbd_data, "da_hydrolocation")

catchment <- catchment_in |>
  select(catchment_id = FEATUREID,
         area_sqkm = AreaSqKM)

flowline <- flowline_in |>
  select(flowline_id = COMID,
         name = GNIS_NAME,
         name_id = GNIS_ID,
         mainstem_id = LevelPathI,
         waterbody_id = WBAREACOMI,
         flowpath_id = COMID,
         catchment_id = COMID,
         date = FDATE,
         type = FTYPE,
         type_id = FCODE,
         flow_direction = FLOWDIR,
         length_km = LENGTHKM,
         stream_order = StreamOrde,
         stream_level = StreamLeve,
         topo_sort = Hydroseq) |>
  # flowlines that do not have catchments do not get a
  # flowpath_id or a catchment_id in this way.
  mutate(flowpath_id =
           ifelse(flowpath_id %in% catchment$catchment_id,
                  yes = flowpath_id, no = NA),
         catchment_id =
           ifelse(catchment_id %in% catchment$catchment_id,
                  yes = catchment_id, no = NA))

# for flowlines that do not have catchments, place in a catchment
# with a spatial join.
no_cat_flowline <- filter(flowline, is.na(catchment_id)) |>
  select(flowline_id) |>
  st_join(select(catchment, catchment_id))

flowline$catchment_id[match(no_cat_flowline$flowline_id,
                            flowline$flowline_id)] <-
  no_cat_flowline$catchment_id

flow_network <- left_join(
  select(st_drop_geometry(flowline_in), from_flowline_id = COMID, node = ToNode),
  select(st_drop_geometry(flowline_in), to_flowline_id = COMID, node = FromNode),
  by = "node", relationship = "many-to-many") |>
  select(-node) |>
  left_join(select(st_drop_geometry(flowline),
                   flowline_id, up_mainstem_id = mainstem_id),
            by = c("from_flowline_id" = "flowline_id")) |>
  left_join(select(st_drop_geometry(flowline),
                   flowline_id, dn_mainstem_id = mainstem_id),
            by = c("to_flowline_id" = "flowline_id")) |>
  mutate(upstream_main = up_mainstem_id == dn_mainstem_id) |>
  left_join(select(st_drop_geometry(flowline_in),
                   flowline_id = COMID, Divergence),
            by = c("to_flowline_id" = "flowline_id")) |>
  mutate(downstream_main = Divergence < 2) |>
  select(from_flowline_id, to_flowline_id, upstream_main, downstream_main)


mainstem <- sf::st_drop_geometry(flowline_in) |>
  select(mainstem_id = LevelPathI,
         downstream_mainstemid = DnLevelPat) |>
  filter(downstream_mainstemid != mainstem_id) |>
  right_join(select(flowline_in,
                    mainstem_id = LevelPathI,
                    total_estimated_area_sqkm = TotDASqKM,
                    length_km = LENGTHKM),
             by = "mainstem_id") |>
  group_by(mainstem_id) |>
  summarise(downstream_mainstemid = downstream_mainstemid[1],
            total_estimated_area_sqkm = max(total_estimated_area_sqkm),
            total_length_km = sum(length_km))

stopifnot(nrow(waterbody_in) == length(unique(waterbody_in$COMID)))

waterbody <- waterbody_in |>
  select(id = COMID) |>
  left_join(select(st_drop_geometry(flowline),
                   waterbody_id, mainstem_id, topo_sort),
            by = c("id" = "waterbody_id")) |>
  group_by(id) |>
  filter(is.na(mainstem_id) | topo_sort == min(topo_sort)) |>
  ungroup() |>
  select(id, mainstem_id)

stopifnot(nrow(waterbody) == length(unique(waterbody_in$COMID)))

drainage_area <- drainage_area_in |>
  select(drainage_area_id = HUC_12,
         name = "HU_12_NAME")

da_hydrolocation <- da_hydrolocation_in |>
  select(flowline_id = COMID, drainage_area_id = HUC12) |>
  left_join(select(st_drop_geometry(flowline),
                   flowline_id, mainstem_id),
            by = "flowline_id") |>
  left_join(st_drop_geometry(drainage_area),
            by = "drainage_area_id") |>
  mutate(type = "hydrolocation_outlet",
         link = sprintf("%s%s",
                        "https://geoconnex.us/nhdplusv2/huc12/",
                        drainage_area_id)) |>
  select(hydrolocation_id = drainage_area_id,
         name, type, link, mainstem_id) |>
  sf::st_transform(st_crs(hydrolocation_in))

hydrolocation <- hydrolocation_in |>
  select(hydrolocation_id = SOURCE_FEA,
         name = STATION_NM,
         type = EventType,
         link = FEATUREDET,
         flowline_id = FLComID) |>
  filter(flowline_id %in% flowline$flowline_id) |>
  left_join(select(st_drop_geometry(flowline),
                   flowline_id, mainstem_id),
            by = "flowline_id") |>
  select(hydrolocation_id, name, type, link, mainstem_id) |>
  bind_rows(da_hydrolocation)

flowline
#> Simple feature collection with 267 features and 15 fields
#> Geometry type: MULTILINESTRING
#> Dimension:     XY
#> Bounding box:  xmin: -89.58537 ymin: 42.95163 xmax: -89.19935 ymax: 43.30179
#> Geodetic CRS:  GRS 1980(IUGG, 1980)
#> # A tibble: 267 × 16
#>    flowline_id name    name_id mainstem_id waterbody_id flowpath_id catchment_id
#>  *       <int> <chr>   <chr>         <dbl>        <int>       <int>        <int>
#>  1    13296606 "Yahar… "15770…   510014902            0    13296606     13296606
#>  2    13297170 "Yahar… "15770…   510014902     13638191    13297170     13297170
#>  3    13297160 "Yahar… "15770…   510014902     13638191    13297160     13297160
#>  4    13293970 "Yahar… "15770…   510014902            0    13293970     13293970
#>  5    13293750 "Yahar… "15770…   510014902            0    13293750     13293750
#>  6    13296614 " "     " "       510103046            0    13296614     13296614
#>  7    13297104 "Murph… "15606…   510083790            0    13297104     13297104
#>  8    13297106 "Murph… "15606…   510083790            0    13297106     13297106
#>  9    13294002 "Swan … "15751…   510044655            0    13294002     13294002
#> 10    13297098 "Swan … "15751…   510044655            0    13297098     13297098
#> # ℹ 257 more rows
#> # ℹ 9 more variables: date <dttm>, type <chr>, type_id <int>,
#> #   flow_direction <chr>, length_km <dbl>, stream_order <int>,
#> #   stream_level <int>, topo_sort <dbl>, geom <MULTILINESTRING [°]>


flow_network
#> # A tibble: 279 × 4
#>    from_flowline_id to_flowline_id upstream_main downstream_main
#>               <int>          <int> <lgl>         <lgl>
#>  1         13296606             NA NA            NA
#>  2         13297170       13297174 TRUE          TRUE
#>  3         13297160       13297170 TRUE          TRUE
#>  4         13293970       13294362 TRUE          TRUE
#>  5         13293750       13294318 TRUE          TRUE
#>  6         13296614       13297176 TRUE          TRUE
#>  7         13297104       13297166 TRUE          TRUE
#>  8         13297106       13297104 TRUE          TRUE
#>  9         13294002       13297098 TRUE          TRUE
#> 10         13297098       13297164 TRUE          TRUE
#> # ℹ 269 more rows


catchment
#> Simple feature collection with 266 features and 2 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -89.60479 ymin: 42.92054 xmax: -89.17447 ymax: 43.36607
#> Geodetic CRS:  GRS 1980(IUGG, 1980)
#> # A tibble: 266 × 3
#>    catchment_id area_sqkm                                                   geom
#>           <int>     <dbl>                                     <MULTIPOLYGON [°]>
#>  1     13293858    0.236  (((-89.40161 43.05263, -89.40235 43.05267, -89.40238 …
#>  2     13296566    1.07   (((-89.42555 42.99245, -89.42605 42.99283, -89.42417 …
#>  3     13297160    0.0027 (((-89.27801 42.99195, -89.27803 42.99168, -89.2784 4…
#>  4     13293454    0.328  (((-89.34821 43.20583, -89.34895 43.20586, -89.34933 …
#>  5     13293750    6.69   (((-89.3827 43.07563, -89.38211 43.07674, -89.3816 43…
#>  6     13294134    0.261  (((-89.36091 43.186, -89.35872 43.18765, -89.35851 43…
#>  7     13296606    0.0072 (((-89.22245 42.96862, -89.22319 42.96866, -89.22308 …
#>  8     13293970    0.0054 (((-89.30496 43.00841, -89.30533 43.00843, -89.30525 …
#>  9     13297170    4.60   (((-89.2934 42.97093, -89.29335 42.97323, -89.29385 4…
#> 10     13293570    0.0514 (((-89.4159 43.15162, -89.41637 43.15182, -89.41698 4…
#> # ℹ 256 more rows


mainstem
#> # A tibble: 91 × 4
#>    mainstem_id downstream_mainstemid total_estimated_area_sqkm total_length_km
#>          <dbl>                 <dbl>                     <dbl>           <dbl>
#>  1   510014902                    NA                     910.           65.8
#>  2   510022216             510014902                     721.            1.79
#>  3   510022558             510022216                     719.            0.774
#>  4   510032155             510014902                     161.           27.4
#>  5   510040397             510014902                      68.4          15.4
#>  6   510043635             510014902                      31.7          13.0
#>  7   510044655             510014902                      47.0          10.4
#>  8   510047249             510014902                      11.4           6.28
#>  9   510050328             510014902                      92.8          19.2
#> 10   510054029             510043635                      24.6           0.718
#> # ℹ 81 more rows


hydrolocation
#> Simple feature collection with 42 features and 5 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -89.5361 ymin: 42.96278 xmax: -89.23768 ymax: 43.26737
#> Geodetic CRS:  GRS 1980(IUGG, 1980)
#> # A tibble: 42 × 6
#>    hydrolocation_id name       type  link  mainstem_id                 geom
#>  * <chr>            <chr>      <chr> <chr>       <dbl>          <POINT [°]>
#>  1 05428668         STARKWEAT… Stre… http…   510064405 (-89.33325 43.09201)
#>  2 05427767         TOKEN CRE… Stre… http…   510040397 (-89.29231 43.19794)
#>  3 05429485         LAKE WAUB… Stre… http…   510014902  (-89.30557 43.0089)
#>  4 054279465        S FORK PH… Stre… http…   510050328  (-89.5204 43.10306)
#>  5 05428500         YAHARA RI… Stre… http…   510014902 (-89.36087 43.08946)
#>  6 05428000         LAKE MEND… Stre… http…   510014902  (-89.3705 43.09503)
#>  7 05427948         PHEASANT … Stre… http…   510050328 (-89.51185 43.10335)
#>  8 425715089164700  LAKE KEGO… Stre… http…   510014902 (-89.23768 42.96278)
#>  9 05427950         PHEASANT … Stre… http…   510050328 (-89.49328 43.10453)
#> 10 05427905         SIXMILE C… Stre… http…   510032155 (-89.43161 43.14154)
#> # ℹ 32 more rows


waterbody
#> Simple feature collection with 128 features and 2 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -89.72879 ymin: 42.92041 xmax: -89.17013 ymax: 43.40395
#> Geodetic CRS:  GRS 1980(IUGG, 1980)
#> # A tibble: 128 × 3
#>          id mainstem_id                                                     geom
#>       <int>       <dbl>                                       <MULTIPOLYGON [°]>
#>  1 13631659          NA (((-89.60118 43.27022, -89.6014 43.27027, -89.60187 43.…
#>  2 13631679          NA (((-89.58624 43.233, -89.58668 43.23325, -89.58687 43.2…
#>  3 13631711          NA (((-89.58908 43.17055, -89.58908 43.17094, -89.58886 43…
#>  4 13631715          NA (((-89.59018 43.16022, -89.59056 43.16036, -89.59093 43…
#>  5 13631719          NA (((-89.59322 43.13091, -89.59363 43.13091, -89.59453 43…
#>  6 13631721          NA (((-89.60271 43.12017, -89.60271 43.12003, -89.60234 43…
#>  7 14711398          NA (((-89.5408 43.37117, -89.54042 43.37135, -89.53961 43.…
#>  8 14711406          NA (((-89.53951 43.30373, -89.53901 43.30384, -89.5376 43.…
#>  9 13631579          NA (((-89.51859 43.36459, -89.51887 43.36473, -89.51887 43…
#> 10 13631537          NA (((-89.51263 43.40379, -89.51273 43.40349, -89.51288 43…
#> # ℹ 118 more rows


drainage_area
#> Simple feature collection with 11 features and 2 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -89.60503 ymin: 42.95023 xmax: -89.20406 ymax: 43.36615
#> Geodetic CRS:  NAD83
#> # A tibble: 11 × 3
#>    drainage_area_id name                                                    geom
#>    <chr>            <chr>                                     <MULTIPOLYGON [°]>
#>  1 070900020701     Starkweather Creek           (((-89.3183 43.17158, -89.3177…
#>  2 070900020702     Lake Monona-Yahara River     (((-89.35882 43.11769, -89.358…
#>  3 070900020703     Lake Waubesa-Yahara River    (((-89.43115 42.99114, -89.430…
#>  4 070900020501     Goose Lake-Yahara River      (((-89.29434 43.363, -89.29447…
#>  5 070900020502     100 Mile Grove Cemetary      (((-89.39129 43.2703, -89.3912…
#>  6 070900020504     Cherokee Lake-Yahara River   (((-89.29126 43.24864, -89.292…
#>  7 070900020503     Token Creek                  (((-89.3183 43.17158, -89.3183…
#>  8 070900020601     Waunakee Marsh-Sixmile Creek (((-89.42489 43.2166, -89.4253…
#>  9 070900020602     Sixmile Creek                (((-89.39986 43.19733, -89.399…
#> 10 070900020604     Lake Mendota-Yahara River    (((-89.39251 43.174, -89.39251…
#> 11 070900020603     Pheasant Branch              (((-89.55417 43.16457, -89.554…


sf::write_sf(flowline, "hydrofabric_sample.gpkg", "flowline")
sf::write_sf(flow_network, "hydrofabric_sample.gpkg", "flow_network")
sf::write_sf(catchment, "hydrofabric_sample.gpkg", "catchment")
sf::write_sf(mainstem, "hydrofabric_sample.gpkg", "mainstem")
sf::write_sf(hydrolocation, "hydrofabric_sample.gpkg", "hydrolocation")
sf::write_sf(waterbody, "hydrofabric_sample.gpkg", "waterbody")
sf::write_sf(drainage_area, "hydrofabric_sample.gpkg", "drainage_area")

Annex C
(informative)
Hydrofabric Data Model Schematic

Figure C.1 — "Simplified Schematic of Hydrofabric Data"
  • “fl-*” flowline features are linear representations of where water may flow and may or may not have an associated catchment. Dashed flowlines represent features such as headwater drainage pathways or side channels and do not have a defined catchment area. Solid flowlines represent channelized flow pathways or linear waterbodies and are (part of) the flowpath of a specific catchment.

  • “hl-*” hydrologic location features are points that lie along the network of flowlines.

  • “c-*” catchment features are polygons that encompass a unit of hydrology that performs both land surface (catchment area) and stream (flowpath) functions.

  • “wb-*” waterbody features are polygons that represent the extent of a 2D waterbody. They can relate to one or more flowlines that connect through them.

  • Thick colored lines are mainstem flowpaths that aggregate flowlines from a flow initiation location to a basin outlet. Mainstem features are composed of flowlines and provide a minimal yet sufficient set of linear feature identifiers for dataset cross referencing.

  • Flowpath features, not shown explicitly, connect the inflow to the outflow of a catchment and are aggregates of flowlines.

  • Dotted flowlines are within a catchment but not along its flowpath; solid blue flowlines are along a catchment’s flowpath.

  • Each flowpath in the hydrofabric data model can have one or more “type” attributes, although the model does not provide a type list.

The tables below present a minimal set of attributes and associations for the figure above. Note that these are intended only to illustrate key concepts of the logical model and leave out many details that would be necessary for any implementation.

C.1.  flowline

Table C.1
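| flowline id | name | type | flow dir. | length | mainstem | flowpath | catchment | waterbody |
|---|---|---|---|---|---|---|---|---|
| fl-1 | lower grey | 1 | 1 | 1 | grey | c-1 | c-1 | |
| fl-2 | lower grey | 1 | 1 | 2 | grey | c-1 | c-1 | |
| fl-3 | white canal | 2 | 1 | 2 | white | | c-1 | |
| fl-4 | lower grey | 1 | 1 | 1 | grey | c-1 | c-1 | |
| fl-5 | middle grey | 1 | 1 | 1 | grey | c-2 | c-2 | |
| fl-6 | black creek | 2 | 1 | 5 | black | | c-2 | |
| fl-7 | middle grey | 1 | 1 | 1 | grey | c-2 | c-2 | |
| fl-8 | green channel | 1 | 1 | 2 | green | c-3 | c-3 | |
| fl-9 | middle grey | 1 | 1 | 2 | grey | c-4 | c-4 | |
| fl-10 | crystal pond | 3 | 1 | 1 | grey | c-4 | c-4 | wb-1 |
| fl-11 | upper grey | 1 | 1 | 1 | grey | c-4 | c-4 | |
| fl-12 | little orange | 1 | 1 | 2 | orange | c-5 | c-5 | |
| fl-13 | yellow run | 2 | 1 | 4 | yellow | | c-5 | |
| fl-14 | little orange | 1 | 1 | 2 | orange | c-5 | c-5 | |
| fl-15 | upper grey | 2 | 1 | 2 | grey | | c-7 | |
| fl-16 | little orange | 2 | 1 | 2 | orange | | c-7 | |
| fl-17 | lemon ditch | 2 | 1 | 2 | light-yellow | | c-8 | |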

NOTE: 

  • Types distinguish flowlines that are part of a flowpath, flowlines that are not part of a flowpath, and flowlines that flow through a waterbody. Type code lists are not specified in this report.

  • Flow direction type codes are not specified in this report.

  • Length values are illustrative and were estimated visually from the figure.

  • Mainstem, flowpath, catchment, and waterbody ids correspond to the figure and to the tables below.
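As a rough illustration of how flowlines aggregate into flowpaths, the following base R sketch groups a subset of the Table C.1 rows by flowpath id. It assumes that type 2 rows carry no flowpath id (their single catchment-style id is read here as the catchment). The column names and the `on_flowpath`, `flowpath_members`, and `flowpath_length` objects are chosen for illustration only and are not part of the model.

```r
# Subset of Table C.1. flowpath is NA where the flowline is assumed
# not to lie along a catchment's flowpath (type 2 rows).
flowline <- data.frame(
  flowline_id = c("fl-1", "fl-2", "fl-3", "fl-4", "fl-5", "fl-6", "fl-7"),
  type        = c(1, 1, 2, 1, 1, 2, 1),
  length      = c(1, 2, 2, 1, 1, 5, 1),
  flowpath    = c("c-1", "c-1", NA, "c-1", "c-2", NA, "c-2"),
  catchment   = c("c-1", "c-1", "c-1", "c-1", "c-2", "c-2", "c-2"),
  stringsAsFactors = FALSE
)

# Keep only the flowlines that belong to a flowpath.
on_flowpath <- flowline[!is.na(flowline$flowpath), ]

# Member flowlines and total length per flowpath.
flowpath_members <- split(on_flowpath$flowline_id, on_flowpath$flowpath)
flowpath_length  <- vapply(split(on_flowpath$length, on_flowpath$flowpath),
                           sum, numeric(1))

print(flowpath_members)
print(flowpath_length)
```

Under these assumptions, fl-3 and fl-6 remain associated with catchments c-1 and c-2 but contribute nothing to the flowpath lengths.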

C.2.  flownetwork

Table C.2
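| from flowline id | to flowline id | upmain | downmain |
|---|---|---|---|
| fl-15 | fl-11 | TRUE | TRUE |
| fl-11 | fl-10 | TRUE | TRUE |
| fl-10 | fl-9 | TRUE | TRUE |
| fl-16 | fl-14 | TRUE | TRUE |
| fl-14 | fl-12 | TRUE | TRUE |
| fl-13 | fl-12 | FALSE | TRUE |
| fl-12 | fl-8 | FALSE | FALSE |
| fl-9 | fl-8 | TRUE | FALSE |
| fl-9 | fl-7 | TRUE | TRUE |
| fl-7 | fl-5 | TRUE | TRUE |
| fl-6 | fl-5 | FALSE | TRUE |
| fl-8 | fl-4 | FALSE | TRUE |
| fl-5 | fl-4 | TRUE | TRUE |
| fl-4 | fl-3 | TRUE | FALSE |
| fl-4 | fl-2 | TRUE | TRUE |
| fl-3 | fl-1 | FALSE | TRUE |
| fl-2 | fl-1 | TRUE | TRUE |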
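One way to read the flow network table is as an edge list that can be walked. The base R sketch below traces the main upstream path from fl-1 by repeatedly following the single edge whose `upmain` flag is TRUE. The function name `trace_upstream_main` and the short column names are hypothetical, and treating `upmain` as "the from-flowline is the main upstream neighbor of the to-flowline" is an assumption made for this example.

```r
# Edge list transcribed from Table C.2.
flow_network <- data.frame(
  from = c("fl-15", "fl-11", "fl-10", "fl-16", "fl-14", "fl-13", "fl-12",
           "fl-9", "fl-9", "fl-7", "fl-6", "fl-8", "fl-5", "fl-4", "fl-4",
           "fl-3", "fl-2"),
  to   = c("fl-11", "fl-10", "fl-9", "fl-14", "fl-12", "fl-12", "fl-8",
           "fl-8", "fl-7", "fl-5", "fl-5", "fl-4", "fl-4", "fl-3", "fl-2",
           "fl-1", "fl-1"),
  upmain   = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE,
               FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE),
  downmain = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE,
               TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Walk upstream from `start`, following the one edge flagged upmain.
trace_upstream_main <- function(network, start) {
  path <- start
  current <- start
  repeat {
    nxt <- network$from[network$to == current & network$upmain]
    # Stop at a headwater, an ambiguous main, or a repeated flowline.
    if (length(nxt) != 1 || nxt %in% path) break
    current <- nxt
    path <- c(path, current)
  }
  path
}

grey_path <- trace_upstream_main(flow_network, "fl-1")
print(grey_path)
```

Starting from fl-1, the walk visits fl-1, fl-2, fl-4, fl-5, fl-7, fl-9, fl-10, fl-11, and fl-15, which are the flowlines assigned to the grey mainstem in Table C.1.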

C.3.  catchment

Table C.3
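| catchment id | area | inflow | outflow |
|---|---|---|---|
| c-1 | 3 | hl-5 | hl-1 |
| c-2 | 3 | hl-7 | hl-5 |
| c-3 | 4 | hl-7 | hl-5 |
| c-4 | 5 | hl-11 | hl-7 |
| c-5 | 4 | hl-12 | hl-7 |
| c-6 | 3 | | hl-12 |
| c-7 | 3 | | hl-11 |
| c-8 | 1 | | |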
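The inflow and outflow hydrolocations are enough to recover catchment topology: a catchment contributes to another when its outflow is the other's inflow. The base R sketch below collects every catchment upstream of c-1 and sums their areas. It assumes that c-6 and c-7 have an outflow but no inflow and that c-8 has neither. The function name `upstream_catchments` and the column names are hypothetical.

```r
# Catchment table as read from Table C.3 (NA marks an empty cell).
catchment <- data.frame(
  catchment_id = paste0("c-", 1:8),
  area    = c(3, 3, 4, 5, 4, 3, 3, 1),
  inflow  = c("hl-5", "hl-7", "hl-7", "hl-11", "hl-12", NA, NA, NA),
  outflow = c("hl-1", "hl-5", "hl-5", "hl-7", "hl-7", "hl-12", "hl-11", NA),
  stringsAsFactors = FALSE
)

# Breadth-first search upstream through shared hydrolocations.
# Each catchment is recorded once even when two paths reach it.
upstream_catchments <- function(catchment, id) {
  found <- character(0)
  queue <- id
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    if (current %in% found) next
    found <- c(found, current)
    inflow <- catchment$inflow[catchment$catchment_id == current]
    if (length(inflow) == 0 || is.na(inflow)) next
    contributors <- catchment$catchment_id[
      !is.na(catchment$outflow) & catchment$outflow == inflow
    ]
    queue <- c(queue, contributors)
  }
  found
}

basin <- upstream_catchments(catchment, "c-1")
total_area <- sum(catchment$area[catchment$catchment_id %in% basin])
print(sort(basin))
print(total_area)
```

Because c-2 and c-3 share both an inflow and an outflow, c-4 and c-5 are reached twice. The `found` check keeps their areas from being counted twice, and the total under these assumptions is 25.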

C.4.  mainstem

Table C.4
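| mainstem id | headwater | outlet | downstream | drainage area | length |
|---|---|---|---|---|---|
| grey | hl-14 | hl-1 | | 25 | 13 |
| white | hl-3 | hl-2 | grey | | 2 |
| black | hl-16 | hl-6 | grey | | 5 |
| green | hl-7 | hl-5 | grey | | 2 |
| orange | hl-13 | hl-7 | grey | 7 | 6 |
| yellow | hl-15 | hl-10 | orange | | 4 |
| light-yellow | hl-17 | hl-18 | | 1 | 2 |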

C.5.  hydrolocation

Table C.5
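| hydrolocation id | mainstem | type | link |
|---|---|---|---|
| hl-1 | grey | nexus | |
| hl-2 | grey | nexus | |
| hl-3 | grey | diversion | |
| hl-4 | grey | monitoring | monitoring-link |
| hl-5 | grey | nexus | |
| hl-6 | grey | confluence | |
| hl-7 | grey | nexus | |
| hl-8 | grey | wb-outlet | |
| hl-9 | grey | wb-inlet | |
| hl-10 | orange | confluence | |
| hl-11 | grey | headwater | |
| hl-12 | orange | headwater | |
| hl-13 | orange | initiation | |
| hl-14 | grey | initiation | |
| hl-15 | yellow | initiation | |
| hl-16 | black | initiation | |
| hl-17 | light-yellow | initiation | |
| hl-18 | light-yellow | terminus | |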

Annex D
(informative)
Revision History

Table D.1 — Revision History
| Date | Editor | Release | Primary clauses modified | Descriptions |
|---|---|---|---|---|
| October 12, 2023 | D. Blodgett | .1 | all | initialize repository |
| November 5, 2023 | D. Blodgett | .2 | all | draft use cases |
| November 21, 2023 | D. Blodgett | .3 | all | outline and images |
| February 29, 2024 | D. Blodgett | .4 | background | drafted background section |
| April 1, 2024 | D. Blodgett | .5 | various | logical model and discussion draft |
| July 30, 2024 | D. Blodgett | .6 | various | definitions and additional use cases |
| August 3, 2024 | D. Blodgett | .7 | appendix a | add data model summaries appendix |
| August 8, 2024 | D. Blodgett | .8 | appendix b and summary | add example encoding and summary content |
| October 22, 2024 | D. Blodgett | .9 | appendix c and summary | add schematic and more summary content |
| June 23, 2025 | D. Blodgett | .10 | all | respond to colleague review |
| December 4, 2025 | D. Blodgett | 1 | all | update per bureau approval |