Getting started with edr4r

edr4r is a small, tidy client for any service that speaks OGC API - Environmental Data Retrieval. Most of the real-world use to date has been against in-situ monitoring networks – stream gauges, weather stations, snow telemetry, reservoir telemetry – but the package itself is generic.

Two example endpoints you can point it at right now:

This vignette runs end-to-end against the Western Water Datahub’s rise-edr collection – a pygeoapi wrap of USBR’s reservoir telemetry. The USGS example lives in its own article, vignette("usgs-streamgages").

1. Create a client

library(edr4r)
library(ggplot2)

client <- edr_client("https://api.wwdh.internetofwater.app")
client
#> <edr_client>
#>   base_url:   <https://api.wwdh.internetofwater.app>
#>   user_agent: edr4r/0.1.0 (+https://github.com/ksonda/edr4r)
#>   timeout:    60s
#>   max_tries:  3

The client just stores connection settings (base URL, user agent, timeout, retry policy). Pass verbose = TRUE to echo every request URL, or headers = to attach auth tokens.

2. Discover collections and parameters

edr_collections() lists every EDR collection the service serves.

collections <- edr_collections(client)
collections[, c("id", "title", "data_queries")]
#> # A tibble: 35 × 3
#>    id                 title                                         data_queries
#>    <chr>              <chr>                                         <list>
#>  1 rise-edr           USBR Reclamation Information Sharing Environ… <chr [4]>
#>  2 snotel-edr         USDA Snowpack Telemetry Network (SNOTEL)      <chr [4]>
#>  3 awdb-forecasts-edr USDA Air and Water Database (AWDB) Forecasts  <chr [4]>
#>  4 snotel-huc06-means USDA Snotel Snow Water Equivalent Aggregated… <NULL>
#>  5 usace-edr          USACE Access2Water API                        <chr [4]>
#>  6 noaa-qpf-day-1     NOAA Weather Prediction Center Quantitative … <NULL>
#>  7 noaa-qpf-day-2     NOAA Weather Prediction Center Quantitative … <NULL>
#>  8 noaa-qpf-day-3     NOAA Weather Prediction Center Quantitative … <NULL>
#>  9 noaa-qpf-day-4-5   NOAA Weather Prediction Center Quantitative … <NULL>
#> 10 noaa-qpf-day-6-7   NOAA Weather Prediction Center Quantitative … <NULL>
#> # ℹ 25 more rows

The data_queries column tells you which EDR query types each collection supports (locations, cube, area, …). Hit a verb the server doesn’t implement and you get an HTTP error.

To see the data parameters a collection exposes (the values you can pass to parameter_name = on the query verbs), use edr_parameters():

params <- edr_parameters(client, "rise-edr")
nrow(params)
#> [1] 782
head(params[, c("id", "name", "unit_symbol")])
#> # A tibble: 6 × 3
#>   id    name                                             unit_symbol
#>   <chr> <chr>                                            <chr>
#> 1 1835  Secondary Canal Stage                            ft
#> 2 1834  Lake/Reservoir Elevation                         ft
#> 3 1830  Lake/Reservoir Release - Total                   cfs
#> 4 1818  Total Dissolved Gas (TDG)                        %
#> 5 1817  Growing Degree Days (50 Degree Base Temperature) GDD
#> 6 1816  Calculated Unregulated Flow                      taf

edr_queryables() is something different – it returns the OGC queryables JSON Schema (filter properties for CQL2 / OGC API Features). For discovering parameter names, edr_parameters() is what you want.

# Pick out the daily reservoir storage parameter (id "3"):
params[params$id == "3", c("id", "name", "unit_symbol", "unit_label")]
#> # A tibble: 1 × 4
#>   id    name                              unit_symbol unit_label
#>   <chr> <chr>                             <chr>       <chr>
#> 1 3     Daily Lake/Reservoir Storage (af) acre·ft     Acre Foot

3. Find locations

With no filters, edr_locations() returns the station index as a GeoJSON FeatureCollection, promoted to an sf object when sf is installed:

stations <- edr_locations(client, "rise-edr")
nrow(stations)
#> [1] 926
head(stations[, c("_id", "locationName")])
#> Simple feature collection with 6 features and 2 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -122.474 ymin: 32.8834 xmax: -101.2656 ymax: 48.2667
#> Geodetic CRS:  WGS 84
#> # A tibble: 6 × 3
#>   `_id` locationName                                      geometry
#>   <int> <chr>                                          <POINT [°]>
#> 1     1 Marys Lake                            (-105.5343 40.34408)
#> 2     2 Audubon Lake North Dakota              (-101.2656 47.6114)
#> 3     3 Gray Reef Reservoir and Dam            (-106.6989 42.5656)
#> 4     5 Lake Frances Reservoir                    (-112.2 48.2667)
#> 5     6 Spring Creek Reservoir and Debris Dam    (-122.474 40.629)
#> 6     7 Imperial Reservoir at Imperial Dam     (-114.4676 32.8834)

The WWDH rise-edr locations index uses _id as the identifier column, not id. That’s fine – the query verbs accept either, and edr_map() will auto-detect.

4. Retrieve data for one station

Pick a known station (Lake Mead, _id 3514) and a parameter (3, Daily Reservoir Storage) and you get CoverageJSON back. Flatten it with covjson_to_tibble():

resp <- edr_location(
  client, "rise-edr",
  location_id    = 3514,
  datetime       = "2023-01-01/2023-06-30",
  parameter_name = "3"
)

df <- covjson_to_tibble(resp)
head(df)
#> # A tibble: 6 × 9
#>   coverage_id parameter parameter_label    unit  datetime                x     y
#>   <chr>       <chr>     <chr>              <chr> <dttm>              <dbl> <dbl>
#> 1 1           3         Lake/Reservoir St… af    2023-01-01 07:00:00 -115.  36.0
#> 2 1           3         Lake/Reservoir St… af    2023-01-02 07:00:00 -115.  36.0
#> 3 1           3         Lake/Reservoir St… af    2023-01-03 07:00:00 -115.  36.0
#> 4 1           3         Lake/Reservoir St… af    2023-01-04 07:00:00 -115.  36.0
#> 5 1           3         Lake/Reservoir St… af    2023-01-05 07:00:00 -115.  36.0
#> 6 1           3         Lake/Reservoir St… af    2023-01-06 07:00:00 -115.  36.0
#> # ℹ 2 more variables: z <dbl>, value <dbl>

5. Plot the time series

edr_plot() is a small ggplot2 wrapper for the tidy tibble:

edr_plot(resp) +
  ggtitle("Lake Mead — daily reservoir storage")
plot of chunk plot-lake-mead

plot of chunk plot-lake-mead

Faceted by parameter (so different units don’t share a y-axis) and coloured by station. Add layers or themes like any other ggplot.

6. Map stations with per-station popups

For collections that support cube, edr_explore() fetches every station’s data in one bulk request and renders the map with per-station popups – much faster than fetching one station at a time. rise-edr advertises cube, so the default method = "auto" takes that path.

m <- edr_explore(
  client, "rise-edr",
  bbox           = c(-114, 32, -111, 38),         # SW US
  datetime       = "2023-01-01/2023-06-30",
  parameter_name = "3",
  popup          = "plot+csv",
  label_col      = "locationName",
  quiet          = TRUE
)

The object prints as an interactive leaflet map in an R session; each blue marker opens the larger plot/CSV popup for the station.

Markers with data are blue (clickable popup with plot + CSV); stations the bbox covers that returned no data for the requested parameter/window are dimmed grey. A legend in the bottom-right explains the distinction.

Save the same map to standalone HTML (selfcontained, no sidecar directory):

edr_save_html(m, "wwdh-southwest.html")

7. Map a gridded PRISM coverage

Station maps are only one side of EDR. The Western Water Datahub also publishes gridded PRISM climate data as the usgs-prism collection. That collection advertises cube, so you can request a small bounding box and map the returned grid directly.

This example pulls two monthly PRISM variables over a small northern Arizona window. edr_map() detects the grid and puts the parameter and time selectors inside the leaflet map, so the slice choice lives with the interactive map rather than in a separate plotting call.

prism <- edr_cube(
  client, "usgs-prism",
  bbox           = c(-112.6, 36.7, -112.2, 37.1),
  datetime       = "2023-01-01/2023-02-28",
  parameter_name = c("ppt", "tmx")
)

prism_grid <- covjson_to_tibble(prism)
prism_grid[, c("parameter", "datetime", "x", "y", "value")]
#> # A tibble: 324 × 5
#>    parameter datetime                x     y value
#>    <chr>     <dttm>              <dbl> <dbl> <dbl>
#>  1 ppt       2023-01-01 00:00:00 -113.  37.1 116.
#>  2 ppt       2023-01-01 00:00:00 -113.  37.1 120.
#>  3 ppt       2023-01-01 00:00:00 -112.  37.1 131.
#>  4 ppt       2023-01-01 00:00:00 -112.  37.1 126.
#>  5 ppt       2023-01-01 00:00:00 -112.  37.1 115.
#>  6 ppt       2023-01-01 00:00:00 -112.  37.1 115.
#>  7 ppt       2023-01-01 00:00:00 -112.  37.1 116.
#>  8 ppt       2023-01-01 00:00:00 -112.  37.1 111.
#>  9 ppt       2023-01-01 00:00:00 -112.  37.1  93.6
#> 10 ppt       2023-01-01 00:00:00 -113.  37.0 100.
#> # ℹ 314 more rows
PRISM precipitation and maximum monthly temperature over the sample bounding box.

PRISM precipitation and maximum monthly temperature over the sample bounding box.

Use the selectors in the top-right of the map to switch between precipitation (ppt) and maximum monthly temperature (tmx), and between January and February 2023. The map colors stay focused on the selected slice; click a grid cell to open the active value and a time-series chart for that pixel across the available dates.

The same one-shot helper works for gridded collections too:

edr_explore(
  client, "usgs-prism",
  bbox           = c(-112.6, 36.7, -112.2, 37.1),
  datetime       = "2023-01-01/2023-02-28",
  parameter_name = c("ppt", "tmx"),
  method         = "cube"
)

A few things worth knowing

See also