VI Phenology Guide
A four-pipeline vegetation index analysis toolkit built around per-pixel CF-1.8 datacubes.
The netcdf_datacube pipeline is the recommended starting point — it clips source NetCDF
tiles to your polygon regions once and produces the per-pixel datacubes that power all
downstream analysis. From those datacubes, run phenology for ROI-mean time series,
smoothing, and plots; pixel_phenology for spatially explicit per-pixel metric maps;
datacube_to_geotiff for model-ready raster statistics; or any combination.
Designed to work natively with output from HLS_VI_Pipeline,
but accepts any CF-1.8 NetCDF with time, y, x dimensions and a VI data variable.
Four Pipelines
| Pipeline | Role | Set in |
|---|---|---|
| netcdf_datacube | Foundation: clip source tiles to polygon regions; produce per-pixel CF-1.8 datacubes for downstream use | |
| phenology | ROI-mean time series, smoothing, phenological metrics, and plots; reads datacubes or raw tiles | |
| pixel_phenology | 19 per-pixel phenological metric maps; reads datacubes produced by netcdf_datacube | |
| datacube_to_geotiff | Per-year / per-month / per-DOY summary statistics as multi-band GeoTiffs; reads datacubes produced by netcdf_datacube | |
netcdf_datacube and phenology share the same tile-based input configuration:
NETCDF_DIR, VI, SHAPEFILE, SHAPEFILE_FIELD, VALID_RANGE_*, WORKERS,
START_DATE, END_DATE. The three datacube-reading pipelines (phenology in datacube
mode, pixel_phenology, and datacube_to_geotiff) take --input-datacubes (datacubes
produced by netcdf_datacube) and do not use --netcdf-dir or shapefiles — the spatial
clipping is already embedded in the datacube files.
Typical Workflows
Step 1 — Produce datacubes (recommended for all workflows)
PIPELINE="netcdf_datacube"
Clips source tiles to your polygon boundaries. Produces one *_datacube.nc file
per (VI, region). Run this once. All subsequent analysis reads from these files —
no re-clipping of source tiles required.
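As a rough illustration of the "one file per (VI, region)" layout, the sketch below enumerates the files a run would be expected to leave behind. The `{VI}_{region}_datacube.nc` pattern is an assumption inferred from the example path `NDVI_MyRegion_datacube.nc` used later in this guide, the region names are made up, and `expected_datacubes` is a hypothetical helper, not part of the toolkit.

```python
from itertools import product
from pathlib import Path


def expected_datacubes(output_dir, vis, regions):
    """List one datacube path per (VI, region) pair.

    The file-name pattern is an assumption based on the example
    NDVI_MyRegion_datacube.nc; check your own output directory.
    """
    out = Path(output_dir)
    return [out / f"{vi}_{region}_datacube.nc" for vi, region in product(vis, regions)]


# Two VIs and two (made-up) regions give four datacube files.
paths = expected_datacubes("outputs", ["NDVI", "EVI2"], ["BoxA", "BoxB"])
for p in paths:
    print(p)
```

Comparing a listing like this against the actual output directory is a quick way to spot a region or VI that failed to produce a datacube.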
Step 2 — Choose your analysis
From the same datacubes, run any combination of the downstream pipelines, for example:
ROI-mean phenology — aggregate pixels to a regional mean, smooth, compute metrics, and generate plots:
PIPELINE="phenology" (set PHENOLOGY_INPUT_DATACUBES to your datacube directory)
Per-pixel metric maps — 19 spatially explicit metric bands mapped across every pixel:
PIPELINE="pixel_phenology" (set PIXEL_INPUT_DATACUBES to the same directory)
Both pipelines can be pointed at the same datacube output directory. Running
netcdf_datacube once gives you access to both.
Single-pass phenology (no intermediate datacubes)
If you need a one-off phenology run and don’t plan to iterate or compute pixel metrics, you can skip the datacube step entirely:
PIPELINE="phenology" (set NETCDF_DIR + SHAPEFILE)
Discovers tiles, clips, aggregates, smooths, and plots in one pass. Tile clipping runs every time, so this is slower for iterative work but requires no intermediate storage.
Supported Vegetation Indices
| VI | Name |
|---|---|
| NDVI | Normalized Difference Vegetation Index |
| EVI2 | Two-band Enhanced Vegetation Index |
| NIRv | Near-Infrared Reflectance of Vegetation |
Multiple VIs can be processed in a single run (--vi NDVI EVI2 NIRv).
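For orientation, the three indices can be sketched from red and near-infrared reflectance using their commonly published definitions. This toolkit reads precomputed VI values from NetCDF rather than computing them, so the functions below are illustrative only; the NIRv form shown is the simplest one (NDVI multiplied by NIR reflectance), and the upstream product may use a variant.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return (nir - red) / (nir + red)


def evi2(nir, red):
    """Two-band Enhanced Vegetation Index (commonly published coefficients)."""
    return 2.5 * (nir - red) / (nir + 2.4 * red + 1.0)


def nirv(nir, red):
    """Near-Infrared Reflectance of Vegetation: NDVI scaled by NIR reflectance."""
    return ndvi(nir, red) * nir


# Reflectances are unitless fractions between 0 and 1 (hypothetical values).
nir, red = 0.5, 0.1
print(round(ndvi(nir, red), 3), round(evi2(nir, red), 3), round(nirv(nir, red), 3))
```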
Features
netCDF Datacube Pipeline
Per-pixel CF-1.8 compliant datacubes clipped to polygon boundaries
Same-CRS multi-tile merging: pixel-perfect, memory-bounded mosaic — no resampling
Cross-CRS multi-tile merging: bilinear reprojection of minority tiles to dominant CRS before merge
Configurable per-tile or merged output via MERGE_SAME_CRS / MERGE_CROSS_CRS
Full CF-1.8 metadata: Conventions, history, tiles, region, vi, target_crs, resampling_method
Output feeds both phenology (datacube input mode) and pixel_phenology directly
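The same-CRS and cross-CRS merge paths listed above depend on knowing which CRS is "dominant". A minimal sketch of that decision follows, assuming the dominant CRS is simply the one shared by the most tiles. The tile names, EPSG codes, and the `plan_merge` helper are hypothetical; the actual reprojection step is not shown.

```python
from collections import Counter


def plan_merge(tile_crs):
    """Split tiles into the dominant-CRS group and the minority group.

    tile_crs maps a tile name to its CRS string (for example an EPSG code).
    Returns (dominant_crs, tiles_already_in_it, tiles_to_reproject).
    """
    counts = Counter(tile_crs.values())
    dominant, _ = counts.most_common(1)[0]
    keep = sorted(t for t, crs in tile_crs.items() if crs == dominant)
    reproject = sorted(t for t, crs in tile_crs.items() if crs != dominant)
    return dominant, keep, reproject


# Made-up tiles: two share a CRS, one sits in a neighbouring zone.
tiles = {"tile_a": "EPSG:32734", "tile_b": "EPSG:32734", "tile_c": "EPSG:32735"}
dominant, keep, reproject = plan_merge(tiles)
print(dominant, keep, reproject)
```

Tiles in `keep` can be mosaicked without resampling; only those in `reproject` would need bilinear reprojection before the merge.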
Phenology Pipeline
Two input modes: standard (--netcdf-dir + --shapefile) or datacube (--input-datacubes)
Layered processing: raw observations → daily time axis → smoothed gap-filled series → phenological metrics
Multiple smoothing methods: Savitzky-Golay, LOESS, linear interpolation, harmonic fit, Whittaker (--smooth-lambda)
Core phenological metrics: SOS, POS, EOS, LOS, IVI, greening rate, senescence rate
Extended metrics: floor_ndvi, ceiling_ndvi, season_length_days, greenup_rate, n_peaks, peak_separation_days, relative_peak_amplitude, valley_depth, cv
Observation count thresholds: --min-valid-obs, --min-valid-obs-per-year
Pixel sampling: --sample-pixels, --random-seed, --min-ndvi-mean, --min-quality-frac
Annual DOY overlay plot, full time-series plot, anomaly plot, multi-VI comparison panel
Granular output toggles: disable any combination of outputs in config.local.env
Output formats: CSV (observations and metrics), PNG + interactive HTML plots
Combined per-shapefile observations CSV and metrics CSV when splitting by attribute field
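The layered processing chain (raw observations, daily axis, gap-filled series, metrics) can be sketched in plain Python. This is an illustration under stated assumptions, not the pipeline's implementation: it uses linear interpolation (one of the listed smoothing methods), and it defines SOS and EOS with a 50 % amplitude threshold, which is an assumption because this guide does not state the rule the toolkit uses. All function names are hypothetical.

```python
def to_daily_axis(obs, n_days):
    """Place sparse observations {day_index: value} on a daily axis; gaps are None."""
    return [obs.get(d) for d in range(n_days)]


def fill_linear(daily):
    """Gap-fill by linear interpolation between neighbouring observations.

    Days before the first or after the last observation take the nearest value.
    """
    known = [(i, v) for i, v in enumerate(daily) if v is not None]
    if not known:
        raise ValueError("no valid observations")
    filled = []
    for d in range(len(daily)):
        if d <= known[0][0]:
            filled.append(known[0][1])
        elif d >= known[-1][0]:
            filled.append(known[-1][1])
        else:
            # Find the bracketing observations and interpolate between them.
            for (i0, v0), (i1, v1) in zip(known, known[1:]):
                if i0 <= d <= i1:
                    filled.append(v0 + (v1 - v0) * (d - i0) / (i1 - i0))
                    break
    return filled


def season_metrics(series, threshold_frac=0.5):
    """Threshold-based SOS/POS/EOS/LOS (the amplitude threshold is an assumption)."""
    lo, hi = min(series), max(series)
    threshold = lo + threshold_frac * (hi - lo)
    above = [d for d, v in enumerate(series) if v >= threshold]
    sos, eos = above[0], above[-1]
    return {"SOS": sos, "POS": series.index(hi), "EOS": eos, "LOS": eos - sos}


# Five made-up observations over 41 days with a single peak on day 20.
obs = {0: 0.25, 10: 0.25, 20: 0.75, 30: 0.25, 40: 0.25}
series = fill_linear(to_daily_axis(obs, 41))
print(season_metrics(series))  # {'SOS': 15, 'POS': 20, 'EOS': 25, 'LOS': 10}
```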
Pixel Phenology Pipeline
Reads datacubes produced by netcdf_datacube; accepts a directory or individual file paths
Whittaker smoothing applied per pixel; handles HLS’s irregular revisit cadence natively
19 metric bands: peak NDVI/DOY, integrated NDVI, green-up rate (mean + std), floor/ceiling NDVI, season length, CV, interannual peak range/std, bimodality metrics (n_peaks, separation, amplitude, valley depth)
Output per (VI, region): CF-1.8 NetCDF metric map + summary CSV + print-quality 4×5 overview PNG + interactive Plotly HTML (hover shows pixel coordinates and values)
Parallelised via ThreadPoolExecutor; scipy sparse solver releases GIL for true multi-core throughput
Overview outputs generated by default; disable with --no-overview-figure / --no-overview-html
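To show why Whittaker smoothing copes with irregular revisit dates, here is a small pure-Python sketch of a weighted smoother on a regular time axis, where missing days simply get weight 0. It is a stand-in, not the pipeline's code: it swaps the scipy sparse solver mentioned above for dense Gaussian elimination, and the second-order difference penalty and 0/1 weight convention are assumptions. `whittaker_smooth` is a hypothetical name.

```python
def whittaker_smooth(y, weights, lam=100.0):
    """Second-order Whittaker smoother solved with dense Gaussian elimination.

    y        : values on a regular time axis (any placeholder where weight is 0)
    weights  : 1.0 for a valid observation, 0.0 for a gap
    lam      : smoothness penalty (larger means smoother)
    Solves (W + lam * D'D) z = W y, where D is the second-difference operator.
    """
    n = len(y)
    if n < 3:
        raise ValueError("need at least 3 time steps")
    # Start from the diagonal weight matrix W.
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = float(weights[i])
    # Add lam * D'D one second-difference row (1, -2, 1) at a time.
    stencil = (1.0, -2.0, 1.0)
    for k in range(n - 2):
        for p, cp in enumerate(stencil):
            for q, cq in enumerate(stencil):
                a[k + p][k + q] += lam * cp * cq
    b = [float(weights[i]) * float(y[i]) for i in range(n)]

    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular system: too few valid observations")
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            if f:
                for c in range(col, n):
                    a[r][c] -= f * a[col][c]
                b[r] -= f * b[col]
    # Back substitution.
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][c] * z[c] for c in range(i + 1, n))
        z[i] = (b[i] - s) / a[i][i]
    return z


# A straight line (2*i + 1) with a gap at index 2; the gap is filled from its neighbours.
values = [1.0, 3.0, 0.0, 7.0, 9.0]
valid = [1.0, 1.0, 0.0, 1.0, 1.0]
print([round(v, 3) for v in whittaker_smooth(values, valid, lam=100.0)])
```

A dense solve costs far more than a banded sparse solve as the series grows, which is presumably why a sparse solver is used when this runs once per pixel.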
datacube_to_geotiff Pipeline
Reads datacubes produced by netcdf_datacube; accepts a directory or individual file paths
Three GeoTiff products per (VI, region): per-year (N_years × 3 bands), per-month (36 bands), per-DOY (1095 bands)
Statistics: median, 5th percentile, 95th percentile at each temporal resolution
Per-month uses a per-year-then-average method to prevent observation-density bias across years
LZW-compressed, 256×256 tiled, BigTIFF when > 4 GB; NoData = CF float32 fill value
Band descriptions readable via gdalinfo -mdd all or rasterio.open().descriptions
Streaming band-by-band write: constant peak memory regardless of output size
Skip any product individually with --skip-per-year, --skip-per-month, --skip-per-doy
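The per-year-then-average idea can be illustrated for a single pixel and a single month. The sketch below assumes it means "compute the statistic within each year, then average those yearly values"; that reading, the function names, and the sample data are assumptions, not the pipeline's code.

```python
from statistics import mean, median


def monthly_median_per_year_then_average(obs, month):
    """obs: iterable of (year, month, value). Median per year, then mean across years."""
    by_year = {}
    for year, m, value in obs:
        if m == month:
            by_year.setdefault(year, []).append(value)
    return mean(median(vals) for vals in by_year.values())


def monthly_median_pooled(obs, month):
    """Naive alternative: one median over all years pooled together."""
    return median(v for _, m, v in obs if m == month)


# Hypothetical July observations: 2020 is densely observed, 2021 has one value.
obs = [(2020, 7, 0.25)] * 5 + [(2021, 7, 0.75)]
print(monthly_median_per_year_then_average(obs, 7))  # 0.5, each year counts once
print(monthly_median_pooled(obs, 7))                 # 0.25, dominated by 2020
```

Pooling lets the densely observed year dominate the result, while averaging per-year statistics gives every year equal weight.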
Performance
Tile-level extraction is parallelized using concurrent.futures.ProcessPoolExecutor. Each
NetCDF tile is processed in a dedicated worker process.
Control the worker count with --workers N (default: 8). Set to 1 for fully sequential
processing — useful for debugging or on memory-constrained machines.
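The worker model described above can be sketched as follows. `process_tile` is a placeholder for the real per-tile work, and `run_tiles` is a hypothetical wrapper; only `concurrent.futures.ProcessPoolExecutor` and the "one worker means sequential" behaviour come from this guide.

```python
from concurrent.futures import ProcessPoolExecutor


def process_tile(path):
    """Stand-in for per-tile work; the real pipeline clips and extracts here."""
    return path, len(path)


def run_tiles(paths, workers=8):
    """Process tiles sequentially when workers == 1, otherwise in a process pool."""
    if workers <= 1:
        return [process_tile(p) for p in paths]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order even though tiles finish out of order.
        return list(pool.map(process_tile, paths))


if __name__ == "__main__":
    tiles = ["tile_a.nc", "tile_b.nc", "tile_c.nc"]
    print(run_tiles(tiles, workers=1))
```

The sequential branch keeps tracebacks in the main process, which is why a single worker is the easier setting for debugging.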
| Workers | 23 tiles | Approx. time |
|---|---|---|
| 1 (sequential) | - | ~10 min |
| 4 | - | ~2.5 min |
| 8 | - | ~1.5 min |
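The approximate timings above translate into speedup and parallel efficiency as follows. This is plain arithmetic on the table's rounded figures, so treat the results as rough; `scaling` is a hypothetical helper.

```python
def scaling(timings_min):
    """Map worker count to (speedup, efficiency) relative to the 1-worker run."""
    baseline = timings_min[1]
    result = {}
    for workers, minutes in timings_min.items():
        speedup = baseline / minutes
        result[workers] = (speedup, speedup / workers)
    return result


# Approximate wall-clock minutes for 23 tiles, keyed by worker count.
timings_min = {1: 10.0, 4: 2.5, 8: 1.5}
for workers, (speedup, efficiency) in scaling(timings_min).items():
    print(f"{workers} workers: {speedup:.1f}x speedup, {efficiency:.0%} efficiency")
```

On these figures, four workers scale almost linearly, while eight workers reach roughly 6.7x, so returns diminish somewhat beyond four.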
Setup
1. Clone the Repository
git clone https://github.com/stephenconklin/VI_Phenology.git
cd VI_Phenology
2. Create the Conda Environment
conda env create -f environment.yml
conda activate vi_phenology
3. Create Your Local Configuration
Configuration is split across two files:
| File | Purpose | Committed to git? |
|---|---|---|
| config.env | Base template: all variables with defaults and inline documentation | Yes |
| config.local.env | Your project-specific overrides (actual paths, active pipeline) | No (gitignored) |
| run_phenology.sh | Execution engine: sources both files, dispatches the selected pipeline | Yes |
Copy config.env to config.local.env and set your paths and active pipeline:
cp config.env config.local.env
# then edit config.local.env in your editor
config.local.env only needs to contain the variables you are overriding — everything
else falls back to config.env. A minimal config.local.env looks like:
# config.local.env — my BioSCape project
PIPELINE="netcdf_datacube"
OUTPUT_DIR="/path/to/my/outputs"
NETCDF_DIR="/path/to/my/netcdfs"
VI="NDVI"
SHAPEFILE="/path/to/roi.gpkg"
SHAPEFILE_FIELD="box_nr"
To maintain multiple project configurations, keep named copies (e.g.
config.local.BioSCape.env, config.local.Durango.env) alongside config.local.env.
Copy or symlink the active one before each run.
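The fallback behaviour (anything not set in config.local.env comes from config.env) can be previewed with a small Python sketch. The real work is done by run_phenology.sh sourcing both files in the shell; this parser only handles simple KEY="value" lines, the base values shown are made up for the example, and both function names are hypothetical.

```python
import shlex


def parse_env(text):
    """Parse simple KEY="value" lines; blank lines and # comments are skipped."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        # shlex.split strips shell-style quotes for simple values.
        parts = shlex.split(raw, comments=True)
        values[key.strip()] = parts[0] if parts else ""
    return values


def effective_config(base_text, local_text):
    """Later file wins: config.local.env overrides config.env key by key."""
    merged = parse_env(base_text)
    merged.update(parse_env(local_text))
    return merged


# Hypothetical file contents; real defaults live in config.env.
base = 'PIPELINE="phenology"\nVI="NDVI"\nWORKERS=8\n'
local = '# my project\nPIPELINE="netcdf_datacube"\nOUTPUT_DIR="/path/to/my/outputs"\n'
print(effective_config(base, local))
```

Running something like this against a named copy (for example config.local.BioSCape.env) before activating it shows which settings the run would actually use.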
Quickstart
Recommended — run_phenology.sh
After creating config.local.env (see Setup above):
./run_phenology.sh
All variables are documented with inline comments in config.env.
Direct CLI — netCDF Datacube Pipeline
Run this first. Clips source tiles to your polygon regions and produces
*_datacube.nc files that feed all downstream pipelines:
python src/netcdf_datacube_extract.py \
--netcdf-dir /path/to/netcdfs \
--vi NDVI EVI2 \
--shapefile /path/to/roi.gpkg \
--shapefile-field Name \
--output-dir ./outputs \
--workers 8
python src/netcdf_datacube_extract.py --help
For full details on the datacube pipeline, see netCDF Datacube Pipeline.
Direct CLI — Phenology Pipeline
Datacube input mode — reads pre-clipped datacubes produced by netcdf_datacube
(recommended; skips tile discovery on every re-run):
python src/vi_phenology.py \
--input-datacubes /path/to/outputs/my_shapefile_stem \
--vi NDVI \
--output-dir ./outputs \
--smooth-method whittaker \
--smooth-lambda 100 \
--plot-style combined \
--plot-format png html \
--metrics
Standard mode — discovers and clips source tiles on each run (single-pass, no intermediate datacubes needed):
python src/vi_phenology.py \
--netcdf-dir /path/to/netcdfs \
--vi NDVI EVI2 \
--shapefile /path/to/roi.gpkg \
--shapefile-field Name \
--output-dir ./outputs \
--smooth-method whittaker \
--smooth-lambda 100 \
--plot-style combined \
--plot-format png html \
--metrics \
--workers 8
python src/vi_phenology.py --help
For the full argument reference, see the Phenology Pipeline CLI Reference.
Direct CLI — Pixel Phenology Pipeline
Reads datacubes produced by netcdf_datacube and computes 19 per-pixel metric maps:
python src/pixel_phenology_extract.py \
--input-datacubes /path/to/NDVI_MyRegion_datacube.nc \
--output-dir ./pixel_metrics \
--smooth-lambda 100 \
--min-valid-obs 20 \
--min-valid-obs-per-year 5 \
--workers 8
python src/pixel_phenology_extract.py --help
Direct CLI — datacube_to_geotiff Pipeline
Reads datacubes produced by netcdf_datacube and writes multi-band GeoTiffs:
python src/datacube_to_geotiff.py \
--input-datacubes /path/to/NDVI_MyRegion_datacube.nc \
--output-dir ./geotiff_stats \
--workers 4
python src/datacube_to_geotiff.py --help
For full details, see datacube_to_geotiff Pipeline.
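As a quick sanity check on the outputs, the band counts quoted in the feature list follow from three statistics (median, 5th percentile, 95th percentile) per time step. The sketch below assumes 12 months and 365 days of year, which is what the quoted 36 and 1095 bands imply; `expected_band_counts` is a hypothetical helper.

```python
def expected_band_counts(n_years, n_stats=3):
    """Bands per GeoTiff product, with n_stats statistics per time step."""
    return {
        "per_year": n_years * n_stats,
        "per_month": 12 * n_stats,   # 36 bands
        "per_doy": 365 * n_stats,    # 1095 bands
    }


# A datacube covering five years (hypothetical) gives 15 per-year bands.
print(expected_band_counts(5))
```

These numbers can be compared with the band count reported by gdalinfo for each output file.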
License
MIT