Geographic Data Transformation, Storage, and Publication

Introduction

Handling geographic data efficiently involves multiple steps: transforming it to match analysis requirements, storing it in optimized spatial databases, and publishing it for sharing or integration with other systems. This package offers a comprehensive solution to streamline the transformation, storage, and publication of raster and vector geographic data.

Built upon the powerful sf and terra R packages, this toolkit bridges the gap between local geographic data processing and server-based geographic information platforms like PostGIS and GeoServer. The primary aim is to simplify workflows for users dealing with diverse geographic data needs, allowing seamless transformation and integration of data processing, storage, and publication in R.

In this document, the datasets included in the package, as well as other datasets used within it, are presented first. Next, a section is dedicated to each group of functions: transformation, storage, and publication; each section illustrates the functions with examples. The document concludes by summarizing the functionalities and highlighting their advantages.

Datasets

The package includes and uses the following datasets to support the workflows.

sigugr.gpkg

library(sigugr)

sigugr_gpkg <- system.file("extdata", "sigugr.gpkg", package = "sigugr")

sf::st_layers(sigugr_gpkg)
#> Driver: GPKG 
#> Available layers:
#>   layer_name     geometry_type features fields              crs_name
#> 1      block           Polygon      104      1 ETRS89 / UTM zone 30N
#> 2        poi             Point      121      4 ETRS89 / UTM zone 30N
#> 3      hydro Multi Line String       11      4 ETRS89 / UTM zone 30N
#> 4   lanjaron     Multi Polygon        1      4 ETRS89 / UTM zone 30N

sat.tif

sat_tif <- system.file("extdata", "sat.tif", package = "sigugr")

sat <- terra::rast(sat_tif)

cat(paste("-", names(sat), collapse = "\n"))
#> - B2
#> - B3
#> - B4
#> - B5
#> - B6
#> - B7

Satellite bands for the Lanjarón area, downloaded from GloVis (USGS Global Visualization Viewer) for Landsat-8. These bands were integrated and initially processed using the satres package. They have been resampled to a coarser resolution to be included in this package. These are the bands displayed as the result of the previous code snippet.

mdt

mdt_dir <- system.file("extdata", "mdt", package = "sigugr")

cat(paste("-", list.files(mdt_dir), collapse = "\n"))
#> - MDT25-ETRS89-H30-1026-4-COB2.TIF
#> - MDT25-ETRS89-H30-1027-3-COB2.TIF
#> - MDT25-ETRS89-H30-1041-2-COB2.TIF
#> - MDT25-ETRS89-H30-1041-4-COB2.TIF
#> - MDT25-ETRS89-H30-1042-1-COB2.TIF
#> - MDT25-ETRS89-H30-1042-3-COB2.TIF

A set of Digital Terrain Model (DTM) files corresponding to the sheets of the National Topographic Map of Spain for the Lanjarón area, obtained from the CNIG. These files have also been resampled to a coarser resolution to reduce storage requirements.

clc.gpkg

clc_gpkg <- system.file("extdata", "clc.gpkg", package = "clc")

sf::st_layers(clc_gpkg)
#> Driver: GPKG 
#> Available layers:
#>     layer_name geometry_type features fields              crs_name
#> 1          clc Multi Polygon      136      2 ETRS89 / UTM zone 30N
#> 2     lanjaron Multi Polygon        1      4 ETRS89 / UTM zone 30N
#> 3 layer_styles            NA        1     12                  <NA>

This GeoPackage is included in the clc package. We will use the clc layer: a fragment of CLC data for the Lanjarón area, stored in vector format. This layer includes the associated style definitions, which are embedded within the same GeoPackage (layer_styles table.). The data was sourced from the CNIG.

Data Transformation

This module provides tools for preparing and manipulating raster and vector data to meet specific requirements. It includes the following functions, that can be grouped into two categories: clipping and other transformations.

Other transformations

We have the set of files that comprise the DTM of the area. The resolution has already been modified once, and we will change it again as shown below.

temp_dir <- tempdir()

result_files <- aggregate_rasters(mdt_dir, temp_dir, factor = 6)

The resulting files will be merged into a raster, which is also displayed below.

r_mdt <- compose_raster(temp_dir)

terra::plot(r_mdt)
The aggregated and composed raster.

The aggregated and composed raster.

We will continue working with the DTM stored in the package, once it has been composed. When displayed in the following section, the difference in resolution compared to the previous image will be noticeable.

mdt <- compose_raster(mdt_dir)

Clipping

One common operation we frequently perform is clipping various layers using a polygon. However, in some cases, we clip them using the minimum enclosing bounding box instead. The function generate_bbox() can be used to obtain this bounding box.

polygon <- sf::st_read(sigugr_gpkg, layer = "lanjaron", quiet = TRUE)

bbox <- generate_bbox(polygon)

plot(sf::st_geometry(bbox))
plot(sf::st_geometry(polygon), add = TRUE)
Polygon and bounding box.

Polygon and bounding box.

We will crop the DTM using the bounding box and then display the result.

mdt_bbox <- clip_raster(mdt, bbox, keep_crs = FALSE)

terra::plot(mdt_bbox)
Clipped DTM.

Clipped DTM.

In this case, both layers share the same CRS, but the keep_crs parameter allows us to specify whether the resulting raster retains the CRS of the original raster or is reprojected to the CRS of the clipping polygon.

If it becomes necessary to change the raster’s CRS, an algorithm is applied to minimize the area being reprojected and to prevent distortions in the result.

To clip a vector layer, we will use the clc layer as shown below. The clip_layer() function produces a layer with the CRS of the clipping polygon.

clc <- sf::st_read(clc_gpkg, layer = "clc", quiet = TRUE)

clc_polygon <- clip_layer(clc, polygon)

plot(sf::st_geometry(clc_polygon))
Clipped CLC layer.

Clipped CLC layer.

Clipping functions sometimes encounter issues with multipolygon geometries; these have been addressed with the clip_multipolygon() function, which performs the same operation as clip_layer() but uses more computationally expensive operations to mitigate the aforementioned issues.

To store in databases or publish the layers, they must be saved as files. Therefore, we will save the generated layers for later use.

mdt_tif <- tempfile(fileext = ".tif")
terra::writeRaster(mdt_bbox, mdt_tif, overwrite = TRUE)

clc2_gpkg <- tempfile(fileext = ".gpkg")
sf::st_write(clc_polygon, clc2_gpkg, layer = "clc_polygon", delete_dsn = TRUE, quiet = TRUE)

Data Storage in PostGIS

This module facilitates the storage of geographic data in a PostGIS database, offering optimized tools for handling both raster and vector data. The functions can be divided into two categories: layer storage and style management. The functions included are as follows:


Important Note: The code from the examples in this section and the next was executed during the development of this document. However, it has been disabled afterward to avoid dependencies on the PostGIS database or the GeoServer, which are installed locally on my computer.

evaluate <- FALSE

Layer storage

Below, we store the following data into a PostGIS database:

conn <- DBI::dbConnect(
  RPostgres::Postgres(),
  dbname = "sigugr",
  host = "localhost",
  user = "postgres",
  password = "postgres"
)

tables1 <- store_bands(sat_tif, conn, prefix = 'original_')

tables2 <- store_raster(mdt_tif, conn, table_name = 'mdt')

tables3 <- store_layers(sigugr_gpkg, conn)

tables4 <- store_layers(clc2_gpkg, conn)

Style management

The clc_polygon layer stored in the clc2_gpkg file has been copied to the database. However, the style of the original layer has not been transferred. We can copy the style using the copy_styles() function, as demonstrated below.

copy_styles(from = clc_gpkg, to = conn, database = "sigugr", to_layers = "clc_polygon")

The result, including the layers available in the database and the clc_polygon layer with its styles, can be viewed in QGIS, as shown in the following figure.

Accessing PostGIS from QGIS.

Accessing PostGIS from QGIS.

Additionally, the get_layer_categories() function can be used to retrieve the definitions for each style code. This result can be applied to display raster bands derived from vector layers using the same style as the original layers.

Data Publication on GeoServer

This module focuses on publishing geographic data to GeoServer. It is built around the geoserver class defined in the package, with its functions implemented as methods of this class.

The following demonstrates how to publish all vector layers from the database. First, the geoserver() function is used to create an object with the connection parameters and workspace. Next, the PostGIS database is registered as a datastore using the register_datastore_postgis() function. Finally, vector layers can be published individually with the publish_layer() function or in bulk using the publish_layer_set() function, which connects to the database to retrieve the available vector layer names.

gso <- geoserver(
  url = "http://localhost:8080/geoserver",
  user = "admin",
  password = "geoserver",
  workspace = "sigugr"
)

gso <- gso |>
  register_datastore_postgis(
    "sigugr-db",
    db_name = 'sigugr',
    host = 'localhost',
    port = 5432,
    db_user = 'postgres',
    db_password = 'postgres',
    schema = "public"
  )

gso |>
  publish_layer_set(conn)

On the other hand, rasters stored in files can also be published, as GeoServer does not support using PostGIS as a datastore for rasters. For rasters, it is possible to differentiate between publishing the entire raster with all bands together using the publish_raster() function or publishing each band individually using publish_bands(). In the latter case, a prefix is also added to distinguish these bands from those already published in the README document example.

gso |>
  publish_raster(mdt_tif, layer = 'mdt')

gso |>
  publish_bands(sat_tif, prefix = 'original_')

As with PostGIS, GeoServer can be accessed via QGIS using WMS. The following figure shows the layers available on the server within the workspace defined in the connection, along with a display of one of the layers -specifically, the DTM generated earlier in this example.

Accessing GeoServer from QGIS.

Accessing GeoServer from QGIS.

Conclusions

This package provides a framework for geographic data management, covering the lifecycle of data from processing to publication. The package enables users to:

The package is suitable for researchers, data analysts, and GIS professionals, facilitating robust workflows that bridge local data processing and server-based geospatial systems.