Spatial Input
This page covers polygon-based spatial input for the phenology and netCDF datacube
pipelines. Both accept --shapefile to clip processing to one or more polygon regions.
Pixel phenology pipeline: Does not use
--shapefileor--netcdf-dir. It reads pre-clipped datacubes produced by the netCDF datacube pipeline — the spatial boundary is already embedded in those files. See Pixel Phenology Pipeline.
--shapefile accepts any vector format readable by GeoPandas/Fiona:
Format |
Extension(s) |
|---|---|
ESRI Shapefile |
|
GeoPackage |
|
GeoJSON |
|
KML / KMZ |
|
FlatGeobuf |
|
File Geodatabase |
|
Omit --shapefile entirely to process the full spatial extent of the NetCDF files.
The same --shapefile and --shapefile-field arguments work identically in both
the phenology pipeline and the netCDF datacube pipeline.
Multiple Shapefiles
Pass multiple paths to produce an independent output set per shapefile:
python src/vi_phenology.py \
--shapefile /path/to/region1.gpkg /path/to/region2.geojson \
...
Per-Feature Splitting — --shapefile-field
To produce one independent output per feature value in a shapefile, provide the name of an attribute column:
python src/vi_phenology.py \
--shapefile biomes.gpkg \
--shapefile-field Name \
...
Each unique non-null value in the Name column becomes its own region. Field values are
sanitized for filesystem use: spaces and special characters are replaced with underscores.
For example, "Cape Fynbos" becomes the region label (and subdirectory) Cape_Fynbos.
When --shapefile-field is omitted, all features in a file are dissolved into a single
geometry and the shapefile’s filename stem is used as the region label.
Label collision error: If two distinct field values sanitize to the same string (e.g.
"My Park"and"My_Park"both becomeMy_Park), the pipeline exits with an error identifying the conflicting values. Rename one of them in the attribute table to make the labels unique before re-running.
Multiple Shapefiles with Per-Field Splitting
When passing multiple shapefiles, provide one field name per shapefile in the same positional
order. Use none to dissolve a specific shapefile rather than splitting it:
# Split first shapefile by 'box_nr', dissolve the second
python src/vi_phenology.py \
--shapefile flights.shp tiles.geojson \
--shapefile-field box_nr none \
...
# Split both shapefiles by their respective fields
python src/vi_phenology.py \
--shapefile flights.shp tiles.geojson \
--shapefile-field box_nr tile_id \
...
The number of --shapefile-field values must match the number of --shapefile paths exactly.
A mismatch is a hard error.
In config.local.env, set SHAPEFILE_FIELD to activate field splitting:
SHAPEFILE="flights.shp tiles.geojson"
SHAPEFILE_FIELD="box_nr none" # comment out entirely to dissolve all
Region Label Derivation
The region label controls both the output subdirectory name and all output file stems:
Scenario |
Region label |
|---|---|
No shapefile |
|
Shapefile, no |
Shapefile filename stem (e.g. |
Shapefile + |
Sanitized field value (e.g. |
For output directory structure and file naming, see Output.
Multi-Tile Polygon Handling
When a polygon spans multiple HLS MGRS tiles, both pipelines detect overlapping tiles automatically and pool data from all of them. The merge behavior differs between pipelines:
Phenology Pipeline
All valid pixels from all overlapping tiles are pooled per observation date before computing the spatial mean for that region. No spatial resampling occurs — the ROI-mean aggregation naturally handles tiles with different acquisition dates by combining the pixel pools per date.
netCDF Datacube Pipeline
The datacube pipeline preserves the full spatial dimension and merges tiles based on
their CRS relationships: same-UTM-zone tiles are mosaiced without resampling; tiles
spanning a UTM zone boundary are bilinearly reprojected to the dominant CRS before
merging. Merge behavior is controlled by MERGE_SAME_CRS and MERGE_CROSS_CRS in
config.local.env.
For the full merge algorithm, CRS detection logic, and merge options, see netCDF Datacube Pipeline — Multi-Tile and Multi-CRS Handling.