VI Phenology Guide


A four-pipeline vegetation index analysis toolkit built around per-pixel CF-1.8 datacubes. The netcdf_datacube pipeline is the recommended starting point — it clips source NetCDF tiles to your polygon regions once and produces the per-pixel datacubes that power all downstream analysis. From those datacubes, run phenology for ROI-mean time series, smoothing, and plots; pixel_phenology for spatially explicit per-pixel metric maps; datacube_to_geotiff for model-ready raster statistics; or any combination.

Designed to work natively with output from HLS_VI_Pipeline, but accepts any CF-1.8 NetCDF with time, y, x dimensions and a VI data variable.
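
As a rough pre-flight check, the input requirement above can be written as a small function. This is a sketch and not part of the toolkit: `missing_requirements` is a hypothetical helper, and it assumes the VI data variable is named after the index (for example `NDVI`). It takes a plain mapping of variable name to dimension names so it runs without a NetCDF library; with xarray, such a mapping could be built as `{name: var.dims for name, var in ds.data_vars.items()}`.

```python
REQUIRED_DIMS = ("time", "y", "x")  # dimension names stated in this README


def missing_requirements(var_dims, vi):
    """Return a list of problems; an empty list means the file looks usable.

    var_dims: mapping of data-variable name -> tuple of dimension names.
    vi: name of the VI data variable to look for (assumed naming, e.g. "NDVI").
    """
    if vi not in var_dims:
        return [f"no data variable named {vi!r}"]
    dims = var_dims[vi]
    return [f"{vi} lacks dimension {d!r}" for d in REQUIRED_DIMS if d not in dims]


ok = missing_requirements({"NDVI": ("time", "y", "x")}, "NDVI")
bad = missing_requirements({"NDVI": ("time", "lat", "lon")}, "NDVI")
print(ok)
print(bad)
```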


Four Pipelines

| Pipeline | Role | Set in config.local.env |
|---|---|---|
| netcdf_datacube | Foundation: clip source tiles to polygon regions; produce per-pixel CF-1.8 datacubes for downstream use | PIPELINE="netcdf_datacube" |
| phenology | ROI-mean time series, smoothing, phenological metrics, and plots; reads datacubes or raw tiles | PIPELINE="phenology" |
| pixel_phenology | 19 per-pixel phenological metric maps; reads datacubes produced by netcdf_datacube | PIPELINE="pixel_phenology" |
| datacube_to_geotiff | Per-year / per-month / per-DOY summary statistics as multi-band GeoTiffs; reads datacubes produced by netcdf_datacube | PIPELINE="datacube_to_geotiff" |

netcdf_datacube and phenology share the same tile-based input configuration: NETCDF_DIR, VI, SHAPEFILE, SHAPEFILE_FIELD, VALID_RANGE_*, WORKERS, START_DATE, END_DATE. The three datacube-reading pipelines (phenology in datacube mode, pixel_phenology, and datacube_to_geotiff) take --input-datacubes (datacubes produced by netcdf_datacube) and do not use --netcdf-dir or shapefiles — the spatial clipping is already embedded in the datacube files.
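
The split between tile-based and datacube-based inputs can be summarised as a lookup table. The sketch below is illustrative only: `input_mode_ok` is a hypothetical helper, and treating a mix of both input styles as an error is an assumption, not documented behaviour of the CLI.

```python
TILE_FLAGS = {"--netcdf-dir", "--shapefile"}
DATACUBE_FLAGS = {"--input-datacubes"}

# Input styles each pipeline accepts, per the paragraph above.
ACCEPTED_INPUTS = {
    "netcdf_datacube": [TILE_FLAGS],
    "phenology": [TILE_FLAGS, DATACUBE_FLAGS],  # standard or datacube mode
    "pixel_phenology": [DATACUBE_FLAGS],
    "datacube_to_geotiff": [DATACUBE_FLAGS],
}


def input_mode_ok(pipeline, flags):
    """True if `flags` supplies exactly one complete input style for `pipeline`.

    Flags unrelated to input selection (e.g. --vi) are ignored.
    """
    given = set(flags) & (TILE_FLAGS | DATACUBE_FLAGS)
    return any(given == style for style in ACCEPTED_INPUTS[pipeline])


print(input_mode_ok("phenology", ["--input-datacubes"]))
print(input_mode_ok("pixel_phenology", ["--netcdf-dir", "--shapefile"]))
```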


Typical Workflows

Step 1 - Build the datacubes

Run netcdf_datacube once to clip the source tiles to your polygon regions and write the per-pixel datacubes:

PIPELINE="netcdf_datacube"    (set NETCDF_DIR + SHAPEFILE)


Step 2 — Choose your analysis

From the same datacubes, run any combination of the downstream pipelines. The two phenology options are:

ROI-mean phenology — aggregate pixels to a regional mean, smooth, compute metrics, and generate plots:

PIPELINE="phenology"    (set PHENOLOGY_INPUT_DATACUBES to your datacube directory)

Per-pixel metric maps — 19 spatially explicit metric bands mapped across every pixel:

PIPELINE="pixel_phenology"    (set PIXEL_INPUT_DATACUBES to the same directory)

Every datacube-reading pipeline, including datacube_to_geotiff, can be pointed at the same datacube output directory, so running netcdf_datacube once gives you access to all downstream analyses.
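
To make the "aggregate pixels to a regional mean" step concrete, here is a minimal sketch. It assumes the aggregate is a plain arithmetic mean that ignores masked (NaN) pixels; the toolkit's handling of masked pixels is not described here, and `roi_mean` is a hypothetical name.

```python
import math


def roi_mean(pixels):
    """Mean of the valid (non-NaN) pixel values for one date, or NaN if none."""
    valid = [v for v in pixels if not math.isnan(v)]
    return sum(valid) / len(valid) if valid else math.nan


# One list of pixel values per acquisition date; NaN marks masked pixels.
cube = {
    "2021-06-01": [0.25, 0.75, math.nan],
    "2021-06-17": [math.nan, math.nan, math.nan],
}
series = {date: roi_mean(values) for date, values in cube.items()}
print(series)
```

The per-pixel pipeline skips this aggregation and keeps every pixel's series separate, which is why it produces maps rather than a single curve.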


Single-pass phenology (no intermediate datacubes)

If you need a one-off phenology run and don’t plan to iterate or compute pixel metrics, you can skip the datacube step entirely:

PIPELINE="phenology"    (set NETCDF_DIR + SHAPEFILE)

Discovers tiles, clips, aggregates, smooths, and plots in one pass. Tile clipping runs every time, so this is slower for iterative work but requires no intermediate storage.


Supported Vegetation Indices

| VI | Name |
|---|---|
| NDVI | Normalized Difference Vegetation Index |
| EVI2 | Two-band Enhanced Vegetation Index |
| NIRv | Near-Infrared Reflectance of Vegetation |

Multiple VIs can be processed in a single run (--vi NDVI EVI2 NIRv).
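
For reference, the three indices are commonly defined from red and near-infrared surface reflectance as sketched below. The toolkit reads precomputed VI values from NetCDF and does not need these functions; the upstream product may also use a variant (for example an offset in NIRv), so treat this as background rather than a description of the input data.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return (nir - red) / (nir + red)


def evi2(nir, red):
    """Two-band Enhanced Vegetation Index (common 2.5 / 2.4 / 1 coefficients)."""
    return 2.5 * (nir - red) / (nir + 2.4 * red + 1.0)


def nirv(nir, red):
    """Near-Infrared Reflectance of Vegetation, taken here as NDVI * NIR."""
    return ndvi(nir, red) * nir


nir, red = 0.5, 0.1  # illustrative reflectance values
print(ndvi(nir, red), evi2(nir, red), nirv(nir, red))
```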


Features

netCDF Datacube Pipeline

  • Per-pixel CF-1.8 compliant datacubes clipped to polygon boundaries

  • Same-CRS multi-tile merging: pixel-perfect, memory-bounded mosaic — no resampling

  • Cross-CRS multi-tile merging: bilinear reprojection of minority tiles to dominant CRS before merge

  • Configurable per-tile or merged output via MERGE_SAME_CRS / MERGE_CROSS_CRS

  • Full CF-1.8 metadata: Conventions, history, tiles, region, vi, target_crs, resampling_method

  • Output feeds phenology (datacube input mode), pixel_phenology, and datacube_to_geotiff directly
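
The cross-CRS rule above ("minority tiles to dominant CRS") can be sketched as a simple vote. This assumes "dominant" means the CRS shared by the most tiles; how the toolkit breaks ties is not stated here. The tile IDs and the helper name `split_by_dominant_crs` are illustrative.

```python
from collections import Counter


def split_by_dominant_crs(tile_crs):
    """tile_crs: mapping of tile id -> CRS string.

    Returns (dominant CRS, tiles already in it, tiles needing reprojection).
    """
    counts = Counter(tile_crs.values())
    dominant, _ = counts.most_common(1)[0]
    keep = sorted(t for t, crs in tile_crs.items() if crs == dominant)
    reproject = sorted(t for t, crs in tile_crs.items() if crs != dominant)
    return dominant, keep, reproject


tiles = {"T34HBH": "EPSG:32734", "T34HCH": "EPSG:32734", "T35HKC": "EPSG:32735"}
dominant, keep, reproject = split_by_dominant_crs(tiles)
print(dominant, keep, reproject)
```

Only the tiles in `reproject` would be resampled, which is why same-CRS merges can stay pixel-perfect.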

Phenology Pipeline

  • Two input modes: standard (--netcdf-dir + --shapefile) or datacube (--input-datacubes)

  • Layered processing: raw observations → daily time axis → smoothed gap-filled series → phenological metrics

  • Multiple smoothing methods: Savitzky-Golay, LOESS, linear interpolation, harmonic fit, Whittaker (--smooth-lambda)

  • Core phenological metrics: SOS, POS, EOS, LOS, IVI, greening rate, senescence rate

  • Extended metrics: floor_ndvi, ceiling_ndvi, season_length_days, greenup_rate, n_peaks, peak_separation_days, relative_peak_amplitude, valley_depth, cv

  • Observation count thresholds: --min-valid-obs, --min-valid-obs-per-year

  • Pixel sampling: --sample-pixels, --random-seed, --min-ndvi-mean, --min-quality-frac

  • Annual DOY overlay plot, full time-series plot, anomaly plot, multi-VI comparison panel

  • Granular output toggles — disable any combination of outputs in config.local.env

  • Output formats: CSV (observations and metrics), PNG + interactive HTML plots

  • Combined per-shapefile observations CSV and metrics CSV when splitting by attribute field
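
As an illustration of the core metrics, the sketch below derives SOS, POS, EOS, and LOS from one smoothed daily series using an amplitude threshold. This is one common convention, not necessarily the definition this toolkit uses; the 50% threshold and the function name `season_metrics` are assumptions.

```python
def season_metrics(daily, threshold=0.5):
    """daily: smoothed VI values for one season, index 0 = day of year 1."""
    floor, ceiling = min(daily), max(daily)
    pos = daily.index(ceiling)  # first day on which the maximum occurs
    level = floor + threshold * (ceiling - floor)
    # SOS: first day at or above the level on the rising limb.
    sos = next(d for d in range(pos + 1) if daily[d] >= level)
    # EOS: last day at or above the level on the falling limb.
    eos = next(d for d in range(len(daily) - 1, pos - 1, -1) if daily[d] >= level)
    return {"SOS": sos + 1, "POS": pos + 1, "EOS": eos + 1, "LOS": eos - sos}


# A tiny synthetic "season" of nine days, for illustration only.
daily = [0.2, 0.2, 0.4, 0.6, 0.8, 0.6, 0.4, 0.2, 0.2]
metrics = season_metrics(daily)
print(metrics)
```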

Pixel Phenology Pipeline

  • Reads datacubes produced by netcdf_datacube — accepts a directory or individual file paths

  • Whittaker smoothing applied per pixel — handles HLS’s irregular revisit cadence natively

  • 19 metric bands: peak NDVI/DOY, integrated NDVI, green-up rate (mean + std), floor/ceiling NDVI, season length, CV, interannual peak range/std, bimodality metrics (n_peaks, separation, amplitude, valley depth)

  • Output per (VI, region): CF-1.8 NetCDF metric map + summary CSV + print-quality 4×5 overview PNG + interactive Plotly HTML (hover shows pixel coordinates and values)

  • Parallelised via ThreadPoolExecutor; scipy sparse solver releases GIL for true multi-core throughput

  • Overview outputs generated by default; disable with --no-overview-figure / --no-overview-html
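
For readers unfamiliar with Whittaker smoothing, here is a minimal dense version with second-order differences. It solves (W + λ DᵀD) z = W y, where gaps carry weight 0 and `lam` plays the role of --smooth-lambda. This is a pure-Python sketch for small series of at least three points; it is not the toolkit's implementation, which the list above describes as using a scipy sparse solver.

```python
def whittaker_smooth(y, weights, lam=100.0):
    """Whittaker smoother, second-order differences, dense Gaussian elimination."""
    n = len(y)
    # Start from the diagonal weight matrix W.
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = float(weights[i])
    # Add lam * D^T D, where each row of D is the stencil [1, -2, 1].
    stencil = (1.0, -2.0, 1.0)
    for r in range(n - 2):
        for p in range(3):
            for q in range(3):
                a[r + p][r + q] += lam * stencil[p] * stencil[q]
    # Right-hand side W y; values at zero-weight gaps are ignored.
    b = [0.0 if weights[i] == 0 else float(weights[i]) * float(y[i]) for i in range(n)]
    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            if factor:
                for c in range(col, n):
                    a[r][c] -= factor * a[col][c]
                b[r] -= factor * b[col]
    # Back substitution.
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(a[i][c] * z[c] for c in range(i + 1, n))
        z[i] = (b[i] - tail) / a[i][i]
    return z


y = [0.0, 1.0, None, 3.0, 4.0, 5.0]  # None marks a missing observation
w = [1, 1, 0, 1, 1, 1]
z = whittaker_smooth(y, w, lam=10.0)
print(z)
```

A straight line has zero second differences, so it passes through the smoother unchanged and the gap is filled on the line. Because gaps are handled through the weights, no separate interpolation step is needed for an irregular revisit cadence.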

datacube_to_geotiff Pipeline

  • Reads datacubes produced by netcdf_datacube — accepts a directory or individual file paths

  • Three GeoTiff products per (VI, region): per-year (N_years × 3 bands), per-month (36 bands), per-DOY (1095 bands)

  • Statistics: median, 5th percentile, 95th percentile at each temporal resolution

  • Per-month uses a per-year-then-average method to prevent observation-density bias across years

  • LZW-compressed, 256×256 tiled, BigTIFF when > 4 GB; NoData = CF float32 fill value

  • Band descriptions readable via gdalinfo -mdd all or rasterio.open().descriptions

  • Streaming band-by-band write — constant peak memory regardless of output size

  • Skip any product individually with --skip-per-year, --skip-per-month, --skip-per-doy
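
The band counts and the per-year-then-average rule can be checked with a few lines. The sketch assumes the "average" is an arithmetic mean of per-year medians and that the per-DOY product covers 365 days (1095 / 3); both function names are hypothetical.

```python
from statistics import median


def monthly_median_per_year_then_average(obs, month):
    """obs: list of (year, month, value). Median within each year, then mean across years."""
    by_year = {}
    for year, m, value in obs:
        if m == month:
            by_year.setdefault(year, []).append(value)
    yearly = [median(values) for values in by_year.values()]
    return sum(yearly) / len(yearly)


def band_count(product, n_years=None):
    """Bands per product: 3 statistics (median, p5, p95) per time step."""
    steps = {"per-year": n_years, "per-month": 12, "per-doy": 365}[product]
    return steps * 3


# June observations: one in a sparsely observed year, five in a dense year.
obs = [(2020, 6, 0.25)] + [(2021, 6, 0.75)] * 5
pooled = median(v for _, _, v in obs)
balanced = monthly_median_per_year_then_average(obs, month=6)
print(pooled, balanced)
```

Pooling all observations lets the densely observed year dominate, while the per-year-then-average result weights both years equally.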

Shared Features

  • Spatial subsetting via any GeoPandas-readable vector format (.shp, .gpkg, .geojson, etc.)

  • Per-feature splitting: one independent output per attribute value in a shapefile

  • Multiple shapefiles in a single run, each with its own optional field splitting

  • Date range filtering applied at the NetCDF level before any aggregation

  • Valid-range filtering consistent with HLS_VI_Pipeline configuration

  • Parallel tile extraction via concurrent.futures.ProcessPoolExecutor

  • Automatic timestamped log file written to --output-dir
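
Date-range and valid-range filtering amount to two inclusive comparisons per observation. The sketch below assumes inclusive bounds on both filters; the range values shown are illustrative and `filter_observations` is a hypothetical helper.

```python
from datetime import date


def filter_observations(obs, start, end, valid_min, valid_max):
    """Keep (date, value) pairs inside the date window and the valid range."""
    return [
        (d, v)
        for d, v in obs
        if start <= d <= end and valid_min <= v <= valid_max
    ]


obs = [
    (date(2020, 12, 31), 0.4),  # before the window
    (date(2021, 1, 1), 0.5),
    (date(2021, 6, 1), 1.7),    # outside the valid range
    (date(2021, 7, 1), 0.6),
]
kept = filter_observations(obs, date(2021, 1, 1), date(2021, 12, 31), -1.0, 1.0)
print(kept)
```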


Performance

Tile-level extraction is parallelized using concurrent.futures.ProcessPoolExecutor. Each NetCDF tile is processed in a dedicated worker process.

Control the worker count with --workers N (default: 8). Set to 1 for fully sequential processing — useful for debugging or on memory-constrained machines.

| Workers | Approx. time (23 tiles) |
|---|---|
| 1 (sequential) | ~10 min |
| 4 | ~2.5 min |
| 8 | ~1.5 min |
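
The worker model can be sketched as follows. `extract_tile` is a stand-in for the real per-tile work and `run_tiles` is a hypothetical wrapper; only the use of `ProcessPoolExecutor` and the `--workers` semantics (1 means sequential) come from this README.

```python
from concurrent.futures import ProcessPoolExecutor


def extract_tile(path):
    """Stand-in for real per-tile work; here it only returns the path length."""
    return path, len(path)


def run_tiles(paths, workers=8):
    """Process tiles sequentially when workers == 1, else in a process pool."""
    if workers == 1:
        return [extract_tile(p) for p in paths]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so results line up with `paths`.
        return list(pool.map(extract_tile, paths))


results = run_tiles(["tile_a.nc", "tile_b.nc"], workers=1)
print(results)
```

When calling `run_tiles` with more than one worker from a script, place the call under an `if __name__ == "__main__":` guard so worker processes can import the module safely.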


Setup

1. Clone the Repository

git clone https://github.com/stephenconklin/VI_Phenology.git
cd VI_Phenology

2. Create the Conda Environment

conda env create -f environment.yml
conda activate vi_phenology

3. Create Your Local Configuration

Configuration is split across two files, which a single shell script reads at run time:

| File | Purpose | Committed to git? |
|---|---|---|
| config.env | Base template: all variables with defaults and inline documentation | Yes |
| config.local.env | Your project-specific overrides (actual paths, active pipeline) | No (gitignored) |
| run_phenology.sh | Execution engine: sources both files, dispatches the selected pipeline | Yes |

Copy config.env to config.local.env and set your paths and active pipeline:

cp config.env config.local.env
# then edit config.local.env in your editor

config.local.env only needs to contain the variables you are overriding — everything else falls back to config.env. A minimal config.local.env looks like:

# config.local.env — my BioSCape project
PIPELINE="netcdf_datacube"

OUTPUT_DIR="/path/to/my/outputs"
NETCDF_DIR="/path/to/my/netcdfs"
VI="NDVI"
SHAPEFILE="/path/to/roi.gpkg"
SHAPEFILE_FIELD="box_nr"
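
The fallback behaviour can be pictured as a dictionary merge in which the later file wins. The real mechanism is run_phenology.sh sourcing both files in the shell; the Python below is only an illustration, `parse_env` is a hypothetical helper that handles simple KEY="value" lines, and the base values shown are made up for the example.

```python
def parse_env(text):
    """Parse simple KEY="value" lines; skips blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        values[key.strip()] = raw.strip().strip('"')
    return values


# Hypothetical base defaults and local overrides.
base = parse_env('PIPELINE="phenology"\nWORKERS=8\n')
local = parse_env(
    '# my project\nPIPELINE="netcdf_datacube"\nOUTPUT_DIR="/path/to/my/outputs"\n'
)
effective = {**base, **local}  # later mapping wins, like sourcing both in order
print(effective)
```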

To maintain multiple project configurations, keep named copies (e.g. config.local.BioSCape.env, config.local.Durango.env) alongside config.local.env. Copy or symlink the active one before each run.


Quickstart

Direct CLI — netCDF Datacube Pipeline

Run this first. It clips source tiles to your polygon regions and produces *_datacube.nc files that feed the downstream pipelines:

python src/netcdf_datacube_extract.py \
  --netcdf-dir /path/to/netcdfs \
  --vi NDVI EVI2 \
  --shapefile /path/to/roi.gpkg \
  --shapefile-field Name \
  --output-dir ./outputs \
  --workers 8

# Show all available options
python src/netcdf_datacube_extract.py --help

For full details on the datacube pipeline, see netCDF Datacube Pipeline.

Direct CLI — Phenology Pipeline

Datacube input mode — reads pre-clipped datacubes produced by netcdf_datacube (recommended; skips tile discovery on every re-run):

python src/vi_phenology.py \
  --input-datacubes /path/to/outputs/my_shapefile_stem \
  --vi NDVI \
  --output-dir ./outputs \
  --smooth-method whittaker \
  --smooth-lambda 100 \
  --plot-style combined \
  --plot-format png html \
  --metrics

Standard mode — discovers and clips source tiles on each run (single-pass, no intermediate datacubes needed):

python src/vi_phenology.py \
  --netcdf-dir /path/to/netcdfs \
  --vi NDVI EVI2 \
  --shapefile /path/to/roi.gpkg \
  --shapefile-field Name \
  --output-dir ./outputs \
  --smooth-method whittaker \
  --smooth-lambda 100 \
  --plot-style combined \
  --plot-format png html \
  --metrics \
  --workers 8

# Show all available options
python src/vi_phenology.py --help

For the full argument reference, see the Phenology Pipeline CLI Reference.

Direct CLI — Pixel Phenology Pipeline

Reads datacubes produced by netcdf_datacube and computes 19 per-pixel metric maps:

python src/pixel_phenology_extract.py \
  --input-datacubes /path/to/NDVI_MyRegion_datacube.nc \
  --output-dir ./pixel_metrics \
  --smooth-lambda 100 \
  --min-valid-obs 20 \
  --min-valid-obs-per-year 5 \
  --workers 8

# Show all available options
python src/pixel_phenology_extract.py --help
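
One plausible reading of the two observation thresholds is sketched below: a pixel is kept only if it has enough valid observations overall and in every year it was observed. The exact semantics in the toolkit may differ (for example, dropping individual years instead of whole pixels), so `pixel_passes` is a hypothetical illustration using the values from the command above.

```python
from collections import Counter


def pixel_passes(obs_years, min_valid_obs=20, min_valid_obs_per_year=5):
    """obs_years: the year of each valid observation for one pixel."""
    if len(obs_years) < min_valid_obs:
        return False
    per_year = Counter(obs_years)
    return all(n >= min_valid_obs_per_year for n in per_year.values())


dense = [2020] * 12 + [2021] * 10   # 22 observations, both years well covered
patchy = [2020] * 18 + [2021] * 3   # 21 observations, but 2021 has only 3
print(pixel_passes(dense), pixel_passes(patchy))
```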

Direct CLI — datacube_to_geotiff Pipeline

Reads datacubes produced by netcdf_datacube and writes multi-band GeoTiffs:

python src/datacube_to_geotiff.py \
  --input-datacubes /path/to/NDVI_MyRegion_datacube.nc \
  --output-dir ./geotiff_stats \
  --workers 4

# Show all available options
python src/datacube_to_geotiff.py --help

For full details, see datacube_to_geotiff Pipeline.


Authors

Stephen Conklin, Geospatial Analyst — Pipeline architecture, orchestration, and all original code. https://github.com/stephenconklin

G. Burch Fisher, PhD, Research Scientist — Conceptual guidance and original code adapted for:

  • src/pixel_phenology_extract.py (Per-pixel phenological metric extraction from CF-1.8 datacubes)

AI Assistance: This tool was developed with the assistance of Anthropic Claude / Claude Code. These tools assisted with code generation and refinement under the direction and review of the author.


License

MIT