VI Phenology Guide

A four-pipeline vegetation index analysis toolkit built around per-pixel CF-1.8 datacubes. The netcdf_datacube pipeline is the recommended starting point — it clips source NetCDF tiles to your polygon regions once and produces the per-pixel datacubes that power all downstream analysis. From those datacubes, run phenology for ROI-mean time series, smoothing, and plots; pixel_phenology for spatially explicit per-pixel metric maps; datacube_to_geotiff for model-ready raster statistics; or any combination.

Designed to work natively with output from HLS_VI_Pipeline, but accepts any CF-1.8 NetCDF with time, y, x dimensions and a VI data variable.
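As a rough illustration of that input requirement, the sketch below checks which VI variables in a file carry the time, y, and x dimensions. It assumes the data variable is named after the VI (for example NDVI), which the text above does not state. The helper name find_vi_variables is hypothetical and not part of this toolkit.

```python
REQUIRED_DIMS = ("time", "y", "x")


def find_vi_variables(var_dims, candidates=("NDVI", "EVI2", "NIRv")):
    """Return candidate VI names whose dimensions include time, y and x.

    var_dims maps variable name -> tuple of dimension names. With xarray
    (an assumption, not a stated dependency) it could be built as
    {name: da.dims for name, da in ds.data_vars.items()}.
    """
    usable = []
    for name in candidates:
        dims = var_dims.get(name)
        if dims is not None and all(d in dims for d in REQUIRED_DIMS):
            usable.append(name)
    return usable


# Hypothetical file layout used only for illustration.
example = {
    "NDVI": ("time", "y", "x"),
    "EVI2": ("time", "y", "x"),
    "spatial_ref": (),
}
print(find_vi_variables(example))
```

A file for which this returns an empty list would likely need its variables or dimensions renamed before use.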

Four Pipelines

Pipeline	Role	Set in `config.local.env`
netcdf_datacube	Foundation — clip source tiles to polygon regions; produce per-pixel CF-1.8 datacubes for downstream use	`PIPELINE="netcdf_datacube"`
phenology	ROI-mean time series, smoothing, phenological metrics, and plots — reads datacubes or raw tiles	`PIPELINE="phenology"`
pixel_phenology	19 per-pixel phenological metric maps — reads datacubes produced by `netcdf_datacube`	`PIPELINE="pixel_phenology"`
datacube_to_geotiff	Per-year / per-month / per-DOY summary statistics as multi-band GeoTiffs — reads datacubes produced by `netcdf_datacube`	`PIPELINE="datacube_to_geotiff"`

netcdf_datacube and phenology share the same tile-based input configuration: NETCDF_DIR, VI, SHAPEFILE, SHAPEFILE_FIELD, VALID_RANGE_*, WORKERS, START_DATE, END_DATE. The three datacube-reading pipelines (phenology in datacube mode, pixel_phenology, and datacube_to_geotiff) take --input-datacubes (datacubes produced by netcdf_datacube) and do not use --netcdf-dir or shapefiles — the spatial clipping is already embedded in the datacube files.

Typical Workflows

Step 1 — Produce datacubes (recommended for all workflows)

PIPELINE="netcdf_datacube"

Clips source tiles to your polygon boundaries. Produces one *_datacube.nc file per (VI, region). Run this once. All subsequent analysis reads from these files — no re-clipping of source tiles required.
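To see which (VI, region) pairs a previous run produced, the output directory can be indexed by file name. The sketch below assumes names follow the `<VI>_<region>_datacube.nc` pattern seen in the CLI examples in this README (NDVI_MyRegion_datacube.nc). Both helper functions are hypothetical and not part of the toolkit.

```python
from pathlib import Path

SUFFIX = "_datacube.nc"


def parse_datacube_name(path):
    """Split '<VI>_<region>_datacube.nc' into (vi, region); None if no match."""
    name = Path(path).name
    if not name.endswith(SUFFIX):
        return None
    stem = name[: -len(SUFFIX)]
    # Assumption: the VI name has no underscore, so the first one separates it
    # from the region; the region itself may contain underscores.
    vi, sep, region = stem.partition("_")
    if not sep or not vi or not region:
        return None
    return vi, region


def index_datacubes(directory):
    """Map (vi, region) -> path for every datacube found under directory."""
    found = {}
    for path in sorted(Path(directory).rglob("*" + SUFFIX)):
        key = parse_datacube_name(path)
        if key is not None:
            found[key] = path
    return found
```

Calling index_datacubes on the datacube output directory gives a quick inventory before pointing a downstream pipeline at it.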

Step 2 — Choose your analysis

From the same datacubes, run either or both downstream pipelines:


ROI-mean phenology — aggregate pixels to a regional mean, smooth, compute metrics, and generate plots:

PIPELINE="phenology"    (set PHENOLOGY_INPUT_DATACUBES to your datacube directory)

Per-pixel metric maps — 19 spatially explicit metric bands mapped across every pixel:

PIPELINE="pixel_phenology"    (set PIXEL_INPUT_DATACUBES to the same directory)

Both pipelines can be pointed at the same datacube output directory, so running netcdf_datacube once serves both. datacube_to_geotiff reads the same datacubes.

Single-pass phenology (no intermediate datacubes)

If you need a one-off phenology run and don’t plan to iterate or compute pixel metrics, you can skip the datacube step entirely:

PIPELINE="phenology"    (set NETCDF_DIR + SHAPEFILE)

Discovers tiles, clips, aggregates, smooths, and plots in one pass. Tile clipping runs every time, so this is slower for iterative work but requires no intermediate storage.

Supported Vegetation Indices

VI	Name
NDVI	Normalized Difference Vegetation Index
EVI2	Two-band Enhanced Vegetation Index
NIRv	Near-Infrared Reflectance of Vegetation

Multiple VIs can be processed in a single run (--vi NDVI EVI2 NIRv).
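For reference, the standard published definitions of these three indices are sketched below as functions of near-infrared and red surface reflectance. This toolkit reads precomputed VI values and is not described as calculating them, so the code is background only. The NIRv form shown is the simple NDVI times NIR product; variants with an offset exist.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    # Callers should mask pixels where nir + red == 0 before calling.
    return (nir - red) / (nir + red)


def evi2(nir, red):
    """Two-band Enhanced Vegetation Index."""
    return 2.5 * (nir - red) / (nir + 2.4 * red + 1.0)


def nirv(nir, red):
    """Near-Infrared Reflectance of Vegetation (NDVI scaled by NIR)."""
    return ndvi(nir, red) * nir


# Reflectances are expressed as fractions (0 to 1), not scaled integers.
print(ndvi(0.5, 0.1), evi2(0.5, 0.1), nirv(0.5, 0.1))
```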

Features

netCDF Datacube Pipeline

Per-pixel CF-1.8 compliant datacubes clipped to polygon boundaries
Same-CRS multi-tile merging: pixel-perfect, memory-bounded mosaic — no resampling
Cross-CRS multi-tile merging: bilinear reprojection of minority tiles to dominant CRS before merge
Configurable per-tile or merged output via MERGE_SAME_CRS / MERGE_CROSS_CRS
Full CF-1.8 metadata: Conventions, history, tiles, region, vi, target_crs, resampling_method
Output feeds phenology (datacube input mode), pixel_phenology, and datacube_to_geotiff directly
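The same-CRS and cross-CRS merge cases above can be pictured as a planning step that picks a target CRS and separates the tiles that need reprojection. The sketch assumes "dominant" means the CRS shared by the most tiles, which the feature list does not define. The function name and tile IDs are hypothetical.

```python
from collections import Counter


def plan_merge(tile_crs):
    """Split tiles into those already in the dominant CRS and the rest.

    tile_crs maps tile id -> CRS string (for example 'EPSG:32734').
    Returns (dominant_crs, tiles_to_mosaic_directly, tiles_to_reproject).
    """
    counts = Counter(tile_crs.values())
    # Assumption: the dominant CRS is the one used by the most tiles.
    dominant, _ = counts.most_common(1)[0]
    same = sorted(t for t, crs in tile_crs.items() if crs == dominant)
    minority = sorted(t for t, crs in tile_crs.items() if crs != dominant)
    return dominant, same, minority


tiles = {
    "T34HBH": "EPSG:32734",
    "T34HCH": "EPSG:32734",
    "T35HKC": "EPSG:32735",
}
print(plan_merge(tiles))
```

Tiles in the second list can be mosaicked without resampling; those in the third would be reprojected first.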

Phenology Pipeline

Two input modes: standard (--netcdf-dir + --shapefile) or datacube (--input-datacubes)
Layered processing: raw observations → daily time axis → smoothed gap-filled series → phenological metrics
Multiple smoothing methods: Savitzky-Golay, LOESS, linear interpolation, harmonic fit, Whittaker (--smooth-lambda)
Core phenological metrics: SOS, POS, EOS, LOS, IVI, greening rate, senescence rate
Extended metrics: floor_ndvi, ceiling_ndvi, season_length_days, greenup_rate, n_peaks, peak_separation_days, relative_peak_amplitude, valley_depth, cv
Observation count thresholds: --min-valid-obs, --min-valid-obs-per-year
Pixel sampling: --sample-pixels, --random-seed, --min-ndvi-mean, --min-quality-frac
Annual DOY overlay plot, full time-series plot, anomaly plot, multi-VI comparison panel
Granular output toggles — disable any combination of outputs in config.local.env
Output formats: CSV (observations and metrics), PNG + interactive HTML plots
Combined per-shapefile observations CSV and metrics CSV when splitting by attribute field
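The last stage of the layered processing, turning a smoothed daily series into SOS, POS, EOS, and LOS, can be sketched as follows. The 50% amplitude threshold used here is an assumption chosen for illustration; this README does not say how the pipeline defines season start and end. The function name is hypothetical.

```python
import math


def season_metrics(daily, threshold_frac=0.5):
    """Compute simple season metrics from one year of smoothed daily values.

    daily: list of VI values, index 0 = day of year 1.
    threshold_frac: fraction of the seasonal amplitude that marks the
    start and end of season (an assumed definition).
    """
    peak_idx = max(range(len(daily)), key=daily.__getitem__)
    floor = min(daily)
    level = floor + threshold_frac * (daily[peak_idx] - floor)
    # SOS: first day at or above the threshold on the rising limb.
    sos_idx = next(i for i in range(peak_idx + 1) if daily[i] >= level)
    # EOS: last day at or above the threshold on the falling limb.
    eos_idx = next(
        i for i in range(len(daily) - 1, peak_idx - 1, -1) if daily[i] >= level
    )
    return {
        "SOS": sos_idx + 1,
        "POS": peak_idx + 1,
        "EOS": eos_idx + 1,
        "LOS": eos_idx - sos_idx,
    }


# Synthetic single-peak season centered on day 180, for illustration only.
daily = [0.2 + 0.6 * math.exp(-(((doy - 180) / 40.0) ** 2)) for doy in range(1, 366)]
print(season_metrics(daily))
```

A series with more than one peak per year would need the peak-detection logic behind metrics such as n_peaks, which this sketch does not attempt.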

Pixel Phenology Pipeline

Reads datacubes produced by netcdf_datacube — accepts a directory or individual file paths
Whittaker smoothing applied per pixel — handles HLS’s irregular revisit cadence natively
19 metric bands: peak NDVI/DOY, integrated NDVI, green-up rate (mean + std), floor/ceiling NDVI, season length, CV, interannual peak range/std, bimodality metrics (n_peaks, separation, amplitude, valley depth)
Output per (VI, region): CF-1.8 NetCDF metric map + summary CSV + print-quality 4×5 overview PNG + interactive Plotly HTML (hover shows pixel coordinates and values)
Parallelized via ThreadPoolExecutor; the SciPy sparse solver releases the GIL, so threads can run on multiple cores concurrently
Overview outputs generated by default; disable with --no-overview-figure / --no-overview-html
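The idea behind weighted Whittaker smoothing on an irregular time series can be sketched in pure Python. The smoother solves (W + lambda * D'D) z = W y, where D is the second-difference operator and W holds per-day weights, with weight 0 on days without an observation. This is a stand-in under stated assumptions: the pipeline itself is described as using a SciPy sparse solver, and the penalty order and weighting scheme it uses are not given in this README.

```python
def whittaker_smooth(y, w, lam=100.0):
    """Weighted Whittaker smoother with a second-order difference penalty.

    y: values on a regular (daily) axis; entries with weight 0 are ignored.
    w: weights, 1.0 for observed days and 0.0 for gaps.
    Assumes len(y) >= 3 and at least two nonzero weights.
    """
    n = len(y)
    # Build A = W + lam * D'D (pentadiagonal) as a dense list of lists.
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = float(w[i])
    coef = (1.0, -2.0, 1.0)  # one row of D covers columns r, r+1, r+2
    for r in range(n - 2):
        for p in range(3):
            for q in range(3):
                a[r + p][r + q] += lam * coef[p] * coef[q]
    b = [float(w[i]) * float(y[i]) if w[i] else 0.0 for i in range(n)]
    # Banded Gaussian elimination; no pivoting needed because A is
    # symmetric positive definite under the assumptions above.
    for k in range(n):
        for i in range(k + 1, min(k + 3, n)):
            f = a[i][k] / a[k][k]
            for j in range(k, min(k + 3, n)):
                a[i][j] -= f * a[k][j]
            b[i] -= f * b[k]
    # Back substitution over the two upper bands.
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = b[i] - sum(a[i][j] * z[j] for j in range(i + 1, min(i + 3, n)))
        z[i] = s / a[i][i]
    return z


# A straight line with one missing day: the gap is filled on the line.
print(whittaker_smooth([1, 3, 5, 0, 9, 11], [1, 1, 1, 0, 1, 1]))
```

Because gaps only set a weight to zero, no separate interpolation step is needed before smoothing, which is why the method suits irregular revisit intervals.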

datacube_to_geotiff Pipeline

Reads datacubes produced by netcdf_datacube — accepts a directory or individual file paths
Three GeoTiff products per (VI, region): per-year (N_years × 3 bands), per-month (36 bands), per-DOY (1095 bands)
Statistics: median, 5th percentile, 95th percentile at each temporal resolution
Per-month uses a per-year-then-average method to prevent observation-density bias across years
LZW-compressed, 256×256 tiled, BigTIFF when > 4 GB; NoData = CF float32 fill value
Band descriptions readable via gdalinfo -mdd all or rasterio.open().descriptions
Streaming band-by-band write — constant peak memory regardless of output size
Skip any product individually with --skip-per-year, --skip-per-month, --skip-per-doy
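The band counts above follow from three statistics per time step: 12 months × 3 = 36 and 365 days × 3 = 1095. The sketch below reproduces that arithmetic and illustrates the per-year-then-average idea for the monthly median. It assumes the per-year statistic is computed first and the yearly values are then averaged with a plain mean; the exact aggregation in the pipeline is not spelled out here. All names are hypothetical.

```python
from collections import defaultdict
from statistics import median

STATS = ("median", "p05", "p95")


def band_counts(n_years):
    """Number of bands in each GeoTIFF product for a given number of years."""
    return {
        "per_year": n_years * len(STATS),
        "per_month": 12 * len(STATS),
        "per_doy": 365 * len(STATS),
    }


def monthly_median(observations):
    """Per-year-then-average monthly median for one pixel.

    observations: iterable of (year, month, value).
    Returns {month: mean over years of that year's median}.
    """
    by_year_month = defaultdict(list)
    for year, month, value in observations:
        by_year_month[(year, month)].append(value)
    per_month = defaultdict(list)
    for (_year, month), values in by_year_month.items():
        per_month[month].append(median(values))
    return {m: sum(v) / len(v) for m, v in per_month.items()}


# One sparse year and one dense year: pooling all values would give 0.6,
# while per-year-then-average weights both years equally.
obs = [(2020, 1, 0.2)] + [(2021, 1, 0.6)] * 4
print(band_counts(5), monthly_median(obs))
```

Without the per-year step, the year with four observations would dominate the January statistic.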

Shared Features

Spatial subsetting via any GeoPandas-readable vector format (.shp, .gpkg, .geojson, etc.)
Per-feature splitting: one independent output per attribute value in a shapefile
Multiple shapefiles in a single run, each with its own optional field splitting
Date range filtering applied at the NetCDF level before any aggregation
Valid-range filtering consistent with HLS_VI_Pipeline configuration
Parallel tile extraction via concurrent.futures.ProcessPoolExecutor
Automatic timestamped log file written to --output-dir
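Date-range and valid-range filtering can be pictured as two checks applied to each observation before any aggregation. The sketch assumes both ranges are inclusive and that out-of-range values are masked rather than dropped; neither detail is stated in this README. The function name is hypothetical.

```python
from datetime import date


def filter_observations(obs, start, end, valid_min, valid_max):
    """Keep observations inside the date window; mask out-of-range values.

    obs: iterable of (date, value) pairs; value may be None for no data.
    Returns a list of (date, value_or_None).
    """
    kept = []
    for day, value in obs:
        if not (start <= day <= end):
            continue  # outside the START_DATE..END_DATE window
        in_range = value is not None and valid_min <= value <= valid_max
        kept.append((day, value if in_range else None))
    return kept


obs = [
    (date(2019, 12, 30), 0.41),  # before the window
    (date(2020, 1, 5), 0.52),    # kept
    (date(2020, 1, 21), 1.70),   # in the window but outside the valid range
]
print(filter_observations(obs, date(2020, 1, 1), date(2020, 12, 31), -1.0, 1.0))
```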

Performance

Tile-level extraction is parallelized using concurrent.futures.ProcessPoolExecutor. Each NetCDF tile is processed in a dedicated worker process.

Control the worker count with --workers N (default: 8). Set to 1 for fully sequential processing — useful for debugging or on memory-constrained machines.

Workers	Approx. time (23 tiles)
1 (sequential)	~10 min
4	~2.5 min
8	~1.5 min
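The worker pattern described above can be sketched with the standard library. extract_tile is a placeholder for the real per-tile work and run_tiles is a hypothetical wrapper; only ProcessPoolExecutor and the meaning of a worker count of 1 come from this README.

```python
from concurrent.futures import ProcessPoolExecutor


def extract_tile(path):
    """Placeholder for per-tile extraction; returns the path and its length."""
    return path, len(path)


def run_tiles(paths, workers=8):
    """Process tiles sequentially when workers <= 1, else in a process pool."""
    if workers <= 1:
        # Sequential path: easier to debug and lighter on memory.
        return [extract_tile(p) for p in paths]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order in its results.
        return list(pool.map(extract_tile, paths))


if __name__ == "__main__":
    print(run_tiles(["a.nc", "bb.nc"], workers=1))
```

Going by the timings in the table, 4 workers give roughly a 4× speedup over sequential processing and 8 workers roughly 6.7×, so returns diminish beyond a handful of workers.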

Setup

1. Clone the Repository

git clone https://github.com/stephenconklin/VI_Phenology.git
cd VI_Phenology

2. Create the Conda Environment

conda env create -f environment.yml
conda activate vi_phenology

3. Create Your Local Configuration

Configuration is split across two files, which run_phenology.sh sources at run time:

File	Purpose	Committed to git?
`config.env`	Base template — all variables with defaults and inline documentation	Yes
`config.local.env`	Your project-specific overrides (actual paths, active pipeline)	No (gitignored)
`run_phenology.sh`	Execution engine — sources both files, dispatches the selected pipeline	Yes

Copy config.env to config.local.env and set your paths and active pipeline:

cp config.env config.local.env
# then edit config.local.env in your editor

config.local.env only needs to contain the variables you are overriding — everything else falls back to config.env. A minimal config.local.env looks like:

# config.local.env — my BioSCape project
PIPELINE="netcdf_datacube"

OUTPUT_DIR="/path/to/my/outputs"
NETCDF_DIR="/path/to/my/netcdfs"
VI="NDVI"
SHAPEFILE="/path/to/roi.gpkg"
SHAPEFILE_FIELD="box_nr"
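The fallback behavior can be illustrated with a small parser that layers the local file over the base file. run_phenology.sh is described as sourcing both files in the shell, so this Python version is only a model of the resulting precedence. It handles plain KEY="value" lines and does not strip inline comments after a value. The function names and the sample base values are hypothetical.

```python
def parse_env(text):
    """Parse simple KEY="value" lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        values[key.strip()] = raw.strip().strip('"').strip("'")
    return values


def effective_config(base_text, local_text):
    """Return base settings with any local overrides applied on top."""
    merged = parse_env(base_text)
    merged.update(parse_env(local_text))
    return merged


# Illustrative contents; the real config.env defines many more variables.
base = 'PIPELINE="phenology"\nWORKERS=8\n'
local = '# my project\nPIPELINE="netcdf_datacube"\nVI="NDVI"\n'
print(effective_config(base, local))
```

Printing the merged result before a long run is a cheap way to confirm which pipeline and paths are actually active.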

To maintain multiple project configurations, keep named copies (e.g. config.local.BioSCape.env, config.local.Durango.env) alongside config.local.env. Before each run, copy or symlink the one you want to config.local.env.

Quickstart

Recommended — `run_phenology.sh`

After creating config.local.env (see Setup above):

./run_phenology.sh

All variables are documented with inline comments in config.env.

Direct CLI — netCDF Datacube Pipeline

Run this first. Clips source tiles to your polygon regions and produces *_datacube.nc files that feed the downstream pipelines:

python src/netcdf_datacube_extract.py \
  --netcdf-dir /path/to/netcdfs \
  --vi NDVI EVI2 \
  --shapefile /path/to/roi.gpkg \
  --shapefile-field Name \
  --output-dir ./outputs \
  --workers 8

python src/netcdf_datacube_extract.py --help

For full details on the datacube pipeline, see netCDF Datacube Pipeline.

Direct CLI — Phenology Pipeline

Datacube input mode — reads pre-clipped datacubes produced by netcdf_datacube (recommended; skips tile discovery on every re-run):

python src/vi_phenology.py \
  --input-datacubes /path/to/outputs/my_shapefile_stem \
  --vi NDVI \
  --output-dir ./outputs \
  --smooth-method whittaker \
  --smooth-lambda 100 \
  --plot-style combined \
  --plot-format png html \
  --metrics

Standard mode — discovers and clips source tiles on each run (single-pass, no intermediate datacubes needed):

python src/vi_phenology.py \
  --netcdf-dir /path/to/netcdfs \
  --vi NDVI EVI2 \
  --shapefile /path/to/roi.gpkg \
  --shapefile-field Name \
  --output-dir ./outputs \
  --smooth-method whittaker \
  --smooth-lambda 100 \
  --plot-style combined \
  --plot-format png html \
  --metrics \
  --workers 8

python src/vi_phenology.py --help

For the full argument reference, see the Phenology Pipeline CLI Reference.

Direct CLI — Pixel Phenology Pipeline

Reads datacubes produced by netcdf_datacube and computes 19 per-pixel metric maps:

python src/pixel_phenology_extract.py \
  --input-datacubes /path/to/NDVI_MyRegion_datacube.nc \
  --output-dir ./pixel_metrics \
  --smooth-lambda 100 \
  --min-valid-obs 20 \
  --min-valid-obs-per-year 5 \
  --workers 8

python src/pixel_phenology_extract.py --help
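One way to picture the two observation-count thresholds is as a per-pixel screen applied before smoothing. The sketch assumes a pixel is skipped when its total count is below --min-valid-obs or when any year with data has fewer than --min-valid-obs-per-year observations. The actual rule may differ (for example, sparse years might be dropped individually), so treat this as a reading aid rather than the pipeline's logic. The function name is hypothetical.

```python
from collections import Counter


def pixel_passes(obs_years, min_valid_obs=20, min_valid_obs_per_year=5):
    """Decide whether one pixel has enough valid observations.

    obs_years: the calendar year of each valid observation for the pixel.
    """
    if len(obs_years) < min_valid_obs:
        return False
    per_year = Counter(obs_years)
    # Assumed rule: every year that has data must meet the per-year minimum.
    return all(n >= min_valid_obs_per_year for n in per_year.values())


dense = [2020] * 12 + [2021] * 10
sparse_year = [2020] * 18 + [2021] * 3
print(pixel_passes(dense), pixel_passes(sparse_year))
```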

Direct CLI — datacube_to_geotiff Pipeline

Reads datacubes produced by netcdf_datacube and writes multi-band GeoTiffs:

python src/datacube_to_geotiff.py \
  --input-datacubes /path/to/NDVI_MyRegion_datacube.nc \
  --output-dir ./geotiff_stats \
  --workers 4

python src/datacube_to_geotiff.py --help

For full details, see datacube_to_geotiff Pipeline.

Authors

Stephen Conklin, Geospatial Analyst — Pipeline architecture, orchestration, and all original code. https://github.com/stephenconklin

G. Burch Fisher, PhD, Research Scientist — Conceptual guidance and original code adapted for:

src/pixel_phenology_extract.py (Per-pixel phenological metric extraction from CF-1.8 datacubes)

AI Assistance: This tool was developed with the assistance of Anthropic Claude / Claude Code. These tools assisted with code generation and refinement under the direction and review of the author.

License

MIT