datacube_to_geotiff Pipeline

Prerequisite: This pipeline reads datacubes produced by the netCDF Datacube Pipeline. Run that pipeline first to produce *_datacube.nc files, then point GEOTIFF_INPUT_DATACUBES at those files or their parent directory.

The datacube_to_geotiff pipeline reads per-pixel datacubes and writes three multi-band GeoTiffs per (VI, region) — one summarising the record year by year, one month by month, and one day-of-year by day-of-year. These products are designed for delivery to downstream spatial models that expect standard raster inputs rather than CF-1.8 netCDF files.

When to Use This Pipeline

Use the datacube_to_geotiff pipeline when you need:

Multi-band GeoTiff inputs for machine-learning or statistical models
Per-year, per-month, or per-DOY statistics as analysis-ready rasters
A format compatible with GDAL-based tools, ArcGIS, QGIS, or Google Earth Engine

Use the pixel phenology pipeline when you need spatially explicit phenological metric maps (SOS, POS, green-up rate, etc.) derived from the Whittaker-smoothed time series.

Selecting the Pipeline

Set PIPELINE in config.local.env:

PIPELINE="datacube_to_geotiff"

And point it at the datacubes:

GEOTIFF_INPUT_DATACUBES="${OUTPUT_DIR}/my_shapefile_stem"   # directory
# or
GEOTIFF_INPUT_DATACUBES="/path/to/NDVI_MyRegion_datacube.nc"   # single file

See Overview — Setup for the full config file model.

Output Products

Three GeoTiffs are written per (VI, region), each summarising the input time series at a different temporal resolution.

Per-Year (`*_per_year.tif`)

One group of 3 bands per calendar year present in the datacube:

Band	Statistic
`year{YYYY}_median`	Median of all valid observations in that calendar year
`year{YYYY}_p05`	5th percentile
`year{YYYY}_p95`	95th percentile

Band count: N_years × 3 (e.g. 15 bands for a 5-year record).

Per-Month (`*_per_month.tif`)

36 bands: 12 calendar months × 3 statistics. Uses a per-year-then-average method: for each calendar year, the median/p05/p95 across all valid observations in that month are computed; these annual values are then averaged across all years. This prevents years with more observations from dominating the result.

Bands	Description
`month01_median` … `month12_median`	Mean-of-annual-medians per calendar month
`month01_p05` … `month12_p05`	Mean of annual 5th percentiles
`month01_p95` … `month12_p95`	Mean of annual 95th percentiles

Band count: 36 (always, regardless of date range).

Per-DOY (`*_per_doy.tif`)

1095 bands: 365 day-of-year values × 3 statistics. Raw observations from all years are pooled at each DOY before computing statistics. At HLS’s ~5-day revisit cadence, approximately 80% of DOY bands will be all-NoData for a given pixel — only DOYs with actual acquisitions carry values.

Bands	Description
`doy001_median` … `doy365_median`	Median across all years at each DOY
`doy001_p05` … `doy365_p05`	5th percentile
`doy001_p95` … `doy365_p95`	95th percentile

Band count: 1095 (always). For large regions, use --skip-per-doy (or GEOTIFF_PER_DOY=false) — this product can exceed 4 GB.

GeoTiff Format

Property	Value
Format	GeoTiff (BigTIFF when > 4 GB)
Compression	LZW
Tiling	256 × 256 pixels
Data type	float32
NoData	`9.96920996838687e+36` (CF/NetCDF4 float32 fill value, `NC_FILL_FLOAT`)
CRS	Native CRS of the input datacube (UTM, meters)
Band descriptions	Set via GDAL standard field; readable with `gdalinfo -mdd all` or `rasterio.open().descriptions`

Band descriptions are accessible in Python:

import rasterio
with rasterio.open("NDVI_MyRegion_per_year.tif") as src:
    print(src.descriptions)   # ('year2020_median', 'year2020_p05', 'year2020_p95', ...)

File Structure

outputs/                                              ← GEOTIFF_OUTPUT_DIR
├── datacube_to_geotiff_20260320_153100.log           ← log file at output-dir root
└── Mesa_Verde/                                       ← one subfolder per region
    ├── NDVI_Mesa_Verde_per_year.tif
    ├── NDVI_Mesa_Verde_per_month.tif
    └── NDVI_Mesa_Verde_per_doy.tif

VI and region_label are parsed from the input datacube filename: {VI}_{region_label}_datacube.nc → first underscore-separated token = VI, remainder = region_label.

File Naming

File	Location
`{VI}_{region_label}_per_year.tif`	`{GEOTIFF_OUTPUT_DIR}/{region_label}/`
`{VI}_{region_label}_per_month.tif`	`{GEOTIFF_OUTPUT_DIR}/{region_label}/`
`{VI}_{region_label}_per_doy.tif`	`{GEOTIFF_OUTPUT_DIR}/{region_label}/`
`datacube_to_geotiff_{YYYYMMDD_HHMMSS}.log`	`{GEOTIFF_OUTPUT_DIR}/`

Processing Model

For each input datacube ({VI}_{region_label}_datacube.nc):

  1. Open with xarray (lazy); apply optional --start-date / --end-date filter
     Warn if uncompressed array size > 8 GB
     Warn if per-DOY output size estimate > 4 GB

  2. Apply valid-range mask (vi_min, vi_max → NaN)

  3. Write per_year.tif  (unless --skip-per-year)
     For each calendar year: compute median, p05, p95 → stream one band at a time

  4. Write per_month.tif  (unless --skip-per-month)
     Per-year-then-average: per-year percentiles first, then average across years

  5. Write per_doy.tif  (unless --skip-per-doy)
     Pool all years at each DOY: compute median, p05, p95

  Parallelised via ThreadPoolExecutor across input datacubes.
  Each GeoTiff is written one band at a time — peak memory = one spatial band.

CLI Reference

python src/datacube_to_geotiff.py --help

Argument	`config.local.env` variable	Default	Description
`--input-datacubes PATH [PATH ...]`	`GEOTIFF_INPUT_DATACUBES`	(required)	Path(s) to `_datacube.nc` files, or a directory (all `_datacube.nc` files found recursively)
`--output-dir PATH`	`GEOTIFF_OUTPUT_DIR`	`${OUTPUT_DIR}/geotiff_stats`	Root output directory
`--valid-range-ndvi MIN,MAX`	`VALID_RANGE_NDVI`	`-0.1,1.0`	Valid range for NDVI
`--valid-range-evi2 MIN,MAX`	`VALID_RANGE_EVI2`	`-1,2`	Valid range for EVI2
`--valid-range-nirv MIN,MAX`	`VALID_RANGE_NIRV`	`-0.5,1`	Valid range for NIRv
`--start-date YYYY-MM-DD`	`START_DATE`	—	Include only time steps on or after this date
`--end-date YYYY-MM-DD`	`END_DATE`	—	Include only time steps on or before this date
`--skip-per-year`	`GEOTIFF_PER_YEAR=false`	(per-year written)	Skip the per-year GeoTiff
`--skip-per-month`	`GEOTIFF_PER_MONTH=false`	(per-month written)	Skip the per-month GeoTiff
`--skip-per-doy`	`GEOTIFF_PER_DOY=false`	(per-DOY written)	Skip the per-DOY GeoTiff (recommended for large regions)
`--workers N`	`WORKERS`	`4`	Parallel threads for processing multiple datacubes concurrently
`--log-level LEVEL`	—	`INFO`	`DEBUG` `INFO` `WARNING` `ERROR`