# VI Phenology Guide

[![Python](https://img.shields.io/badge/python-3.10--3.12-blue.svg)](https://www.python.org/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
[![Platform](https://img.shields.io/badge/platform-linux%20%7C%20macOS-lightgrey.svg)]()

A four-pipeline vegetation index analysis toolkit built around per-pixel CF-1.8 datacubes.
The `netcdf_datacube` pipeline is the recommended starting point — it clips source NetCDF
tiles to your polygon regions once and produces the per-pixel datacubes that power all
downstream analysis. From those datacubes, run `phenology` for ROI-mean time series,
smoothing, and plots; `pixel_phenology` for spatially explicit per-pixel metric maps;
`datacube_to_geotiff` for model-ready raster statistics; or any combination.

Designed to work natively with output from [HLS_VI_Pipeline](https://github.com/stephenconklin/HLS_VI_Pipeline),
but accepts any CF-1.8 NetCDF with `time`, `y`, `x` dimensions and a VI data variable.

---

## Four Pipelines

| Pipeline | Role | Set in `config.local.env` |
|---|---|---|
| **netcdf_datacube** | **Foundation** — clip source tiles to polygon regions; produce per-pixel CF-1.8 datacubes for downstream use | `PIPELINE="netcdf_datacube"` |
| **phenology** | ROI-mean time series, smoothing, phenological metrics, and plots — reads datacubes or raw tiles | `PIPELINE="phenology"` |
| **pixel_phenology** | 19 per-pixel phenological metric maps — reads datacubes produced by `netcdf_datacube` | `PIPELINE="pixel_phenology"` |
| **datacube_to_geotiff** | Per-year / per-month / per-DOY summary statistics as multi-band GeoTIFFs; reads datacubes produced by `netcdf_datacube` | `PIPELINE="datacube_to_geotiff"` |

`netcdf_datacube` and `phenology` (in standard mode) share the same tile-based input
configuration: `NETCDF_DIR`, `VI`, `SHAPEFILE`, `SHAPEFILE_FIELD`, `VALID_RANGE_*`,
`WORKERS`, `START_DATE`, `END_DATE`. The three datacube-reading pipelines (`phenology` in
datacube mode, `pixel_phenology`, and `datacube_to_geotiff`) instead take
`--input-datacubes` (datacubes produced by `netcdf_datacube`) and do not use
`--netcdf-dir` or `--shapefile`, because the spatial clipping is already embedded in the
datacube files.

---

## Typical Workflows

### Step 1 — Produce datacubes (recommended for all workflows)

```bash
PIPELINE="netcdf_datacube"
```

Clips source tiles to your polygon boundaries. Produces one `*_datacube.nc` file
per (VI, region). **Run this once.** All subsequent analysis reads from these files —
no re-clipping of source tiles required.
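
As a rough illustration of how a downstream script might enumerate these files, the sketch below scans a directory for `*_datacube.nc` and splits each name into a (VI, region) pair. The `<VI>_<region>_datacube.nc` pattern is an assumption inferred from the `NDVI_MyRegion_datacube.nc` example used later in this README, and `find_datacubes` is a hypothetical helper, not part of the toolkit.

```python
from pathlib import Path

SUFFIX = "_datacube.nc"
KNOWN_VIS = ("NDVI", "EVI2", "NIRv")  # the three VIs listed in this README


def find_datacubes(directory):
    """Return {(vi, region): path} for every *_datacube.nc in `directory`.

    Assumes a `<VI>_<region>_datacube.nc` naming pattern (not confirmed by
    the toolkit's documentation); files that do not match are skipped.
    """
    found = {}
    for path in sorted(Path(directory).glob(f"*{SUFFIX}")):
        stem = path.name[: -len(SUFFIX)]
        # Split on the first underscore only, so region names may contain "_".
        vi, sep, region = stem.partition("_")
        if sep and vi in KNOWN_VIS and region:
            found[(vi, region)] = path
    return found
```

If the real file names follow a different pattern, only the parsing step inside the loop would need to change.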

---

### Step 2 — Choose your analysis

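From the same datacubes, run any combination of the downstream pipelines. The two
phenology pipelines are configured as shown here; `datacube_to_geotiff` reads the same files.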

**ROI-mean phenology** — aggregate pixels to a regional mean, smooth, compute
metrics, and generate plots:

```bash
PIPELINE="phenology"    # also set PHENOLOGY_INPUT_DATACUBES to your datacube directory
```

**Per-pixel metric maps** — 19 spatially explicit metric bands mapped across every pixel:

```bash
PIPELINE="pixel_phenology"    # also set PIXEL_INPUT_DATACUBES to the same directory
```

Both pipelines, as well as `datacube_to_geotiff`, can be pointed at the same datacube
output directory. Running `netcdf_datacube` once gives you access to all of them.

---

### Single-pass phenology (no intermediate datacubes)

If you need a one-off phenology run and don't plan to iterate or compute pixel
metrics, you can skip the datacube step entirely:

```bash
PIPELINE="phenology"    # also set NETCDF_DIR and SHAPEFILE
```

Discovers tiles, clips, aggregates, smooths, and plots in one pass. Tile clipping
runs every time, so this is slower for iterative work but requires no intermediate
storage.

---

## Supported Vegetation Indices

| VI | Name |
|----|------|
| NDVI | Normalized Difference Vegetation Index |
| EVI2 | Two-band Enhanced Vegetation Index |
| NIRv | Near-Infrared Reflectance of Vegetation |

Multiple VIs can be processed in a single run (`--vi NDVI EVI2 NIRv`).

---

## Features

### netCDF Datacube Pipeline
- Per-pixel CF-1.8 compliant datacubes clipped to polygon boundaries
- Same-CRS multi-tile merging: pixel-perfect, memory-bounded mosaic — no resampling
- Cross-CRS multi-tile merging: bilinear reprojection of minority tiles to dominant CRS before merge
- Configurable per-tile or merged output via `MERGE_SAME_CRS` / `MERGE_CROSS_CRS`
- Full CF-1.8 metadata: `Conventions`, `history`, `tiles`, `region`, `vi`, `target_crs`, `resampling_method`
- Output feeds `phenology` (datacube input mode), `pixel_phenology`, and `datacube_to_geotiff` directly
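
The "minority tiles to dominant CRS" rule can be pictured with a small sketch. It counts how many tiles use each CRS, treats the most common one as the merge target, and lists the remaining tiles as candidates for reprojection. `plan_merge`, the tile names, and the tie-breaking rule are all hypothetical; the pipeline's actual selection logic may differ.

```python
from collections import Counter


def plan_merge(tile_crs):
    """Split tiles into the dominant-CRS group and the minority group.

    `tile_crs` maps a tile name to its CRS string. Ties are broken by first
    appearance, which is an assumption of this sketch.
    """
    counts = Counter(tile_crs.values())
    dominant = counts.most_common(1)[0][0]
    keep = [tile for tile, crs in tile_crs.items() if crs == dominant]
    reproject = [tile for tile, crs in tile_crs.items() if crs != dominant]
    return dominant, keep, reproject


# Hypothetical tile names: two tiles in one UTM zone, one in the neighbouring zone.
tiles = {"tile_a": "EPSG:32634", "tile_b": "EPSG:32634", "tile_c": "EPSG:32635"}
dominant, keep, reproject = plan_merge(tiles)
```

In this example `tile_a` and `tile_b` would be mosaicked without resampling, and only `tile_c` would need reprojection before the merge.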

### Phenology Pipeline
- Two input modes: standard (`--netcdf-dir` + `--shapefile`) or datacube (`--input-datacubes`)
- Layered processing: raw observations → daily time axis → smoothed gap-filled series → phenological metrics
- Multiple smoothing methods: Savitzky-Golay, LOESS, linear interpolation, harmonic fit, Whittaker (`--smooth-lambda`)
- Core phenological metrics: SOS, POS, EOS, LOS, IVI, greening rate, senescence rate
- Extended metrics: `floor_ndvi`, `ceiling_ndvi`, `season_length_days`, `greenup_rate`, `n_peaks`, `peak_separation_days`, `relative_peak_amplitude`, `valley_depth`, `cv`
- Observation count thresholds: `--min-valid-obs`, `--min-valid-obs-per-year`
- Pixel sampling: `--sample-pixels`, `--random-seed`, `--min-ndvi-mean`, `--min-quality-frac`
- Annual DOY overlay plot, full time-series plot, anomaly plot, multi-VI comparison panel
- Granular output toggles — disable any combination of outputs in `config.local.env`
- Output formats: CSV (observations and metrics), PNG + interactive HTML plots
- Combined per-shapefile observations CSV and metrics CSV when splitting by attribute field
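
To make the core metrics concrete, here is a minimal sketch that derives SOS, POS, EOS, and LOS from one smoothed annual curve. It assumes a threshold definition (the first and last day at or above 50% of the seasonal amplitude), which is one common convention but is not confirmed as the method this pipeline uses. `seasonal_metrics` and `threshold_frac` are illustrative names.

```python
import numpy as np


def seasonal_metrics(doy, values, threshold_frac=0.5):
    """Threshold-based SOS/POS/EOS/LOS from one smoothed annual curve."""
    doy = np.asarray(doy)
    values = np.asarray(values, dtype=float)
    floor, ceiling = values.min(), values.max()
    # Amplitude threshold: an assumed definition, not taken from the pipeline.
    threshold = floor + threshold_frac * (ceiling - floor)
    above = np.flatnonzero(values >= threshold)
    sos, eos = int(doy[above[0]]), int(doy[above[-1]])
    pos = int(doy[values.argmax()])
    return {"SOS": sos, "POS": pos, "EOS": eos, "LOS": eos - sos}


# Synthetic single-season curve peaking on day 200.
days = np.arange(1, 366)
curve = 0.2 + 0.6 * np.exp(-(((days - 200) / 40.0) ** 2))
metrics = seasonal_metrics(days, curve)
```

Taking the first and last above-threshold days only makes sense for a single-season curve; bimodal seasons would need per-peak handling.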

### Pixel Phenology Pipeline
- Reads datacubes produced by `netcdf_datacube` — accepts a directory or individual file paths
- Whittaker smoothing applied per pixel — handles HLS's irregular revisit cadence natively
- 19 metric bands: peak NDVI/DOY, integrated NDVI, green-up rate (mean + std), floor/ceiling NDVI,
  season length, CV, interannual peak range/std, bimodality metrics (n_peaks, separation, amplitude, valley depth)
- Output per (VI, region): CF-1.8 NetCDF metric map + summary CSV + print-quality 4×5 overview PNG + interactive Plotly HTML (hover shows pixel coordinates and values)
- Parallelized via `ThreadPoolExecutor`; SciPy's sparse solver releases the GIL for true multi-core throughput
- Overview outputs generated by default; disable with `--no-overview-figure` / `--no-overview-html`
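
For readers unfamiliar with Whittaker smoothing, the sketch below shows the basic idea on a daily axis: missing days get weight 0, observed days get weight 1, and a second-difference penalty scaled by lambda controls smoothness. It uses dense NumPy matrices for brevity, whereas the pipeline is described as using a SciPy sparse solver, so treat it as an illustration of the technique rather than the pipeline's implementation. `whittaker_smooth` is a hypothetical name.

```python
import numpy as np


def whittaker_smooth(y, lam=100.0):
    """Whittaker smoother with a second-order difference penalty.

    NaN entries are treated as missing (weight 0), so the fit bridges gaps
    between irregularly spaced observations. Needs at least two valid points.
    """
    y = np.asarray(y, dtype=float)
    weights = np.isfinite(y).astype(float)
    n = y.size
    diff = np.diff(np.eye(n), n=2, axis=0)      # (n-2, n) second differences
    system = np.diag(weights) + lam * diff.T @ diff
    rhs = weights * np.nan_to_num(y)            # missing values contribute 0
    return np.linalg.solve(system, rhs)


# A linear trend with gaps is recovered exactly, because straight lines
# incur no second-difference penalty.
series = np.linspace(0.2, 0.8, 30)
gappy = series.copy()
gappy[[3, 4, 10, 11, 12, 20]] = np.nan
smoothed = whittaker_smooth(gappy, lam=100.0)
```

A dense solve scales poorly with series length, which is one reason a sparse solver is the more practical choice for multi-year daily series across many pixels.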

### datacube_to_geotiff Pipeline
- Reads datacubes produced by `netcdf_datacube` — accepts a directory or individual file paths
- Three GeoTIFF products per (VI, region): per-year (N_years × 3 bands), per-month (36 bands), per-DOY (1095 bands)
- Statistics: median, 5th percentile, 95th percentile at each temporal resolution
- Per-month uses a per-year-then-average method to prevent observation-density bias across years
- LZW-compressed, 256×256 tiled, BigTIFF when > 4 GB; NoData = CF float32 fill value
- Band descriptions readable via `gdalinfo -mdd all` or `rasterio.open().descriptions`
- Streaming band-by-band write — constant peak memory regardless of output size
- Skip any product individually with `--skip-per-year`, `--skip-per-month`, `--skip-per-doy`
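
The per-year-then-average idea can be shown with a small example for one pixel and one statistic (the median). A year with many observations would dominate a pooled median, whereas computing the median within each year first gives every year equal weight. The function name and inputs are illustrative, and the pipeline's exact procedure may differ.

```python
import numpy as np


def monthly_median_per_year_then_average(years, months, values, month):
    """Median within each year for `month`, then the mean of those medians."""
    years, months = np.asarray(years), np.asarray(months)
    values = np.asarray(values, dtype=float)
    in_month = (months == month) & np.isfinite(values)
    yearly = [np.median(values[in_month & (years == year)])
              for year in np.unique(years[in_month])]
    return float(np.mean(yearly)) if yearly else float("nan")


# Ten January observations in 2020, but only two in 2021.
years = [2020] * 10 + [2021] * 2
months = [1] * 12
values = [0.2] * 10 + [0.8] * 2

pooled = float(np.median(values))
balanced = monthly_median_per_year_then_average(years, months, values, month=1)
```

Here the pooled median is 0.2 because 2020 contributes most of the observations, while the per-year-then-average value is 0.5. With three statistics per time step, the band counts in the list above follow as 12 × 3 = 36 and 365 × 3 = 1095.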

### Shared Features
- Spatial subsetting via any GeoPandas-readable vector format (`.shp`, `.gpkg`, `.geojson`, etc.)
- Per-feature splitting: one independent output per attribute value in a shapefile
- Multiple shapefiles in a single run, each with its own optional field splitting
- Date range filtering applied at the NetCDF level before any aggregation
- Valid-range filtering consistent with HLS_VI_Pipeline configuration
- Parallel tile extraction via `concurrent.futures.ProcessPoolExecutor`
- Automatic timestamped log file written to `--output-dir`
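
A minimal sketch of how date-range and valid-range filtering might combine is shown below. It assumes an inclusive date window and that out-of-range values are masked as NaN rather than dropped; neither detail is confirmed by this README, and `filter_observations` is a hypothetical helper.

```python
import numpy as np


def filter_observations(dates, values, start, end, valid_min, valid_max):
    """Keep observations inside [start, end]; mask out-of-range values as NaN."""
    dates = np.asarray(dates, dtype="datetime64[D]")
    values = np.asarray(values, dtype=float)
    # Inclusive window on both ends (an assumption of this sketch).
    in_window = (dates >= np.datetime64(start)) & (dates <= np.datetime64(end))
    kept = values[in_window]
    kept[(kept < valid_min) | (kept > valid_max)] = np.nan
    return dates[in_window], kept


obs_dates = ["2019-12-31", "2020-01-05", "2020-06-01", "2021-01-01"]
obs_values = [0.5, 1.7, 0.4, 0.3]
kept_dates, kept_values = filter_observations(
    obs_dates, obs_values, "2020-01-01", "2020-12-31", -1.0, 1.0
)
```

Only the two 2020 observations survive the date window, and the 1.7 value is masked because it falls outside the valid range.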

---

## Performance

Tile-level extraction is parallelized using `concurrent.futures.ProcessPoolExecutor`. Each
NetCDF tile is processed in a dedicated worker process.

Control the worker count with `--workers N` (default: 8). Set to 1 for fully sequential
processing — useful for debugging or on memory-constrained machines.

| Workers | Approx. time (23 tiles) |
|---------|-------------------------|
| 1 (sequential) | ~10 min |
| 4 | ~2.5 min |
| 8 | ~1.5 min |
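
The worker pattern described above can be sketched as follows. `process_tile` is a hypothetical stand-in for the real per-tile extraction, and `run_tiles` is an illustrative name; the point is only to show one process per tile and the sequential fallback when the worker count is 1.

```python
from concurrent.futures import ProcessPoolExecutor


def process_tile(tile_path):
    """Hypothetical stand-in for per-tile extraction; returns a small summary."""
    return tile_path, len(tile_path)


def run_tiles(tile_paths, workers=8):
    """Process tiles sequentially when workers <= 1, else in a process pool."""
    if workers <= 1:
        # Sequential path: easier to debug and lighter on memory.
        return [process_tile(path) for path in tile_paths]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so results line up with tile_paths.
        return list(pool.map(process_tile, tile_paths))


sequential_results = run_tiles(["tile_a.nc", "tile_b.nc"], workers=1)
```

On platforms that spawn rather than fork worker processes, the pooled call should be made from under an `if __name__ == "__main__":` guard.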

---

## Setup

### 1. Clone the Repository

```bash
git clone https://github.com/stephenconklin/VI_Phenology.git
cd VI_Phenology
```

### 2. Create the Conda Environment

```bash
conda env create -f environment.yml
conda activate vi_phenology
```

### 3. Create Your Local Configuration

Configuration is split across two files, with a shell script that sources both:

| File | Purpose | Committed to git? |
|---|---|---|
| `config.env` | Base template — all variables with defaults and inline documentation | Yes |
| `config.local.env` | Your project-specific overrides (actual paths, active pipeline) | **No** (gitignored) |
| `run_phenology.sh` | Execution engine — sources both files, dispatches the selected pipeline | Yes |

Copy `config.env` to `config.local.env` and set your paths and active pipeline:

```bash
cp config.env config.local.env
# then edit config.local.env in your editor
```

`config.local.env` only needs to contain the variables you are overriding — everything
else falls back to `config.env`. A minimal `config.local.env` looks like:

```bash
# config.local.env — my BioSCape project
PIPELINE="netcdf_datacube"

OUTPUT_DIR="/path/to/my/outputs"
NETCDF_DIR="/path/to/my/netcdfs"
VI="NDVI"
SHAPEFILE="/path/to/roi.gpkg"
SHAPEFILE_FIELD="box_nr"
```

To maintain multiple project configurations, keep named copies (e.g.
`config.local.BioSCape.env`, `config.local.Durango.env`) alongside `config.local.env`.
Copy or symlink the active one to `config.local.env` before each run.
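
The override behaviour can be pictured with a short sketch. The real mechanism is `run_phenology.sh` sourcing both files in the shell; the Python below only illustrates the precedence, assuming the base file is read first and the local file second. The parser handles plain `KEY="value"` lines only (no inline comments or shell expansion), and the base values shown are made up for the example.

```python
def parse_env(text):
    """Parse simple KEY="value" lines; blank lines and # comments are ignored."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


def effective_config(base_text, local_text):
    """Local values win, mirroring an assumed base-then-local sourcing order."""
    return {**parse_env(base_text), **parse_env(local_text)}


# Hypothetical file contents, for illustration only.
base = 'PIPELINE="phenology"\nWORKERS="8"\n'
local = '# my project\nPIPELINE="netcdf_datacube"\n'
config = effective_config(base, local)
```

In this example `PIPELINE` comes from the local file, while `WORKERS` falls back to the base file.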

---

## Quickstart

### Recommended — `run_phenology.sh`

After creating `config.local.env` (see Setup above):

```bash
./run_phenology.sh
```

All variables are documented with inline comments in `config.env`.

### Direct CLI — netCDF Datacube Pipeline

Run this first. Clips source tiles to your polygon regions and produces
`*_datacube.nc` files that feed all downstream pipelines:

```bash
python src/netcdf_datacube_extract.py \
  --netcdf-dir /path/to/netcdfs \
  --vi NDVI EVI2 \
  --shapefile /path/to/roi.gpkg \
  --shapefile-field Name \
  --output-dir ./outputs \
  --workers 8
```

```bash
python src/netcdf_datacube_extract.py --help
```

For full details on the datacube pipeline, see [netCDF Datacube Pipeline](datacube.md).

### Direct CLI — Phenology Pipeline

**Datacube input mode** — reads pre-clipped datacubes produced by `netcdf_datacube`
(recommended; skips tile discovery and clipping on re-runs):

```bash
python src/vi_phenology.py \
  --input-datacubes /path/to/outputs/my_shapefile_stem \
  --vi NDVI \
  --output-dir ./outputs \
  --smooth-method whittaker \
  --smooth-lambda 100 \
  --plot-style combined \
  --plot-format png html \
  --metrics
```

**Standard mode** — discovers and clips source tiles on each run (single-pass, no
intermediate datacubes needed):

```bash
python src/vi_phenology.py \
  --netcdf-dir /path/to/netcdfs \
  --vi NDVI EVI2 \
  --shapefile /path/to/roi.gpkg \
  --shapefile-field Name \
  --output-dir ./outputs \
  --smooth-method whittaker \
  --smooth-lambda 100 \
  --plot-style combined \
  --plot-format png html \
  --metrics \
  --workers 8
```

```bash
python src/vi_phenology.py --help
```

For the full argument reference, see the [Phenology Pipeline CLI Reference](cli_reference.md).

### Direct CLI — Pixel Phenology Pipeline

Reads datacubes produced by `netcdf_datacube` and computes 19 per-pixel metric maps:

```bash
python src/pixel_phenology_extract.py \
  --input-datacubes /path/to/NDVI_MyRegion_datacube.nc \
  --output-dir ./pixel_metrics \
  --smooth-lambda 100 \
  --min-valid-obs 20 \
  --min-valid-obs-per-year 5 \
  --workers 8
```

```bash
python src/pixel_phenology_extract.py --help
```

### Direct CLI — datacube_to_geotiff Pipeline

Reads datacubes produced by `netcdf_datacube` and writes multi-band GeoTIFFs:

```bash
python src/datacube_to_geotiff.py \
  --input-datacubes /path/to/NDVI_MyRegion_datacube.nc \
  --output-dir ./geotiff_stats \
  --workers 4
```

```bash
python src/datacube_to_geotiff.py --help
```

For full details, see [datacube_to_geotiff Pipeline](datacube_to_geotiff.md).

---

## Authors

**Stephen Conklin**, Geospatial Analyst — Pipeline architecture, orchestration, and all original code.
[https://github.com/stephenconklin](https://github.com/stephenconklin)

**G. Burch Fisher, PhD**, Research Scientist — Conceptual guidance and original code adapted for:
- `src/pixel_phenology_extract.py` (Per-pixel phenological metric extraction from CF-1.8 datacubes)

**AI Assistance:** This tool was developed with the assistance of Anthropic Claude / Claude Code. These tools assisted
with code generation and refinement under the direction and review of the author.

---

## License

Released under the MIT License. See [LICENSE](LICENSE).