datacube_to_geotiff Pipeline

Prerequisite: This pipeline reads datacubes produced by the netCDF Datacube Pipeline. Run that pipeline first to produce *_datacube.nc files, then point GEOTIFF_INPUT_DATACUBES at those files or their parent directory.

The datacube_to_geotiff pipeline reads per-pixel datacubes and writes three multi-band GeoTiffs per (VI, region) — one summarising the record year by year, one month by month, and one day-of-year by day-of-year. These products are designed for delivery to downstream spatial models that expect standard raster inputs rather than CF-1.8 netCDF files.


When to Use This Pipeline

Use the datacube_to_geotiff pipeline when you need:

  • Multi-band GeoTiff inputs for machine-learning or statistical models

  • Per-year, per-month, or per-DOY statistics as analysis-ready rasters

  • A format compatible with GDAL-based tools, ArcGIS, QGIS, or Google Earth Engine

Use the pixel phenology pipeline when you need spatially explicit phenological metric maps (SOS, POS, green-up rate, etc.) derived from the Whittaker-smoothed time series.


Selecting the Pipeline

Set PIPELINE in config.local.env:

PIPELINE="datacube_to_geotiff"

And point it at the datacubes:

GEOTIFF_INPUT_DATACUBES="${OUTPUT_DIR}/my_shapefile_stem"   # directory
# or
GEOTIFF_INPUT_DATACUBES="/path/to/NDVI_MyRegion_datacube.nc"   # single file

See Overview — Setup for the full config file model.


Output Products

Three GeoTiffs are written per (VI, region), each summarising the input time series at a different temporal resolution.

Per-Year (*_per_year.tif)

One group of 3 bands per calendar year present in the datacube:

Band

Statistic

year{YYYY}_median

Median of all valid observations in that calendar year

year{YYYY}_p05

5th percentile

year{YYYY}_p95

95th percentile

Band count: N_years × 3 (e.g. 15 bands for a 5-year record).

Per-Month (*_per_month.tif)

36 bands: 12 calendar months × 3 statistics. Uses a per-year-then-average method: for each calendar year, the median/p05/p95 across all valid observations in that month are computed; these annual values are then averaged across all years. This prevents years with more observations from dominating the result.

Bands

Description

month01_medianmonth12_median

Mean-of-annual-medians per calendar month

month01_p05month12_p05

Mean of annual 5th percentiles

month01_p95month12_p95

Mean of annual 95th percentiles

Band count: 36 (always, regardless of date range).

Per-DOY (*_per_doy.tif)

1095 bands: 365 day-of-year values × 3 statistics. Raw observations from all years are pooled at each DOY before computing statistics. At HLS’s ~5-day revisit cadence, approximately 80% of DOY bands will be all-NoData for a given pixel — only DOYs with actual acquisitions carry values.

Bands

Description

doy001_mediandoy365_median

Median across all years at each DOY

doy001_p05doy365_p05

5th percentile

doy001_p95doy365_p95

95th percentile

Band count: 1095 (always). For large regions, use --skip-per-doy (or GEOTIFF_PER_DOY=false) — this product can exceed 4 GB.


GeoTiff Format

Property

Value

Format

GeoTiff (BigTIFF when > 4 GB)

Compression

LZW

Tiling

256 × 256 pixels

Data type

float32

NoData

9.96920996838687e+36 (CF/NetCDF4 float32 fill value, NC_FILL_FLOAT)

CRS

Native CRS of the input datacube (UTM, meters)

Band descriptions

Set via GDAL standard field; readable with gdalinfo -mdd all or rasterio.open().descriptions

Band descriptions are accessible in Python:

import rasterio
with rasterio.open("NDVI_MyRegion_per_year.tif") as src:
    print(src.descriptions)   # ('year2020_median', 'year2020_p05', 'year2020_p95', ...)

File Structure

outputs/                                              ← GEOTIFF_OUTPUT_DIR
├── datacube_to_geotiff_20260320_153100.log           ← log file at output-dir root
└── Mesa_Verde/                                       ← one subfolder per region
    ├── NDVI_Mesa_Verde_per_year.tif
    ├── NDVI_Mesa_Verde_per_month.tif
    └── NDVI_Mesa_Verde_per_doy.tif

VI and region_label are parsed from the input datacube filename: {VI}_{region_label}_datacube.nc → first underscore-separated token = VI, remainder = region_label.

File Naming

File

Location

{VI}_{region_label}_per_year.tif

{GEOTIFF_OUTPUT_DIR}/{region_label}/

{VI}_{region_label}_per_month.tif

{GEOTIFF_OUTPUT_DIR}/{region_label}/

{VI}_{region_label}_per_doy.tif

{GEOTIFF_OUTPUT_DIR}/{region_label}/

datacube_to_geotiff_{YYYYMMDD_HHMMSS}.log

{GEOTIFF_OUTPUT_DIR}/


Processing Model

For each input datacube ({VI}_{region_label}_datacube.nc):

  1. Open with xarray (lazy); apply optional --start-date / --end-date filter
     Warn if uncompressed array size > 8 GB
     Warn if per-DOY output size estimate > 4 GB

  2. Apply valid-range mask (vi_min, vi_max → NaN)

  3. Write per_year.tif  (unless --skip-per-year)
     For each calendar year: compute median, p05, p95 → stream one band at a time

  4. Write per_month.tif  (unless --skip-per-month)
     Per-year-then-average: per-year percentiles first, then average across years

  5. Write per_doy.tif  (unless --skip-per-doy)
     Pool all years at each DOY: compute median, p05, p95

  Parallelised via ThreadPoolExecutor across input datacubes.
  Each GeoTiff is written one band at a time — peak memory = one spatial band.

CLI Reference

python src/datacube_to_geotiff.py --help

Argument

config.local.env variable

Default

Description

--input-datacubes PATH [PATH ...]

GEOTIFF_INPUT_DATACUBES

(required)

Path(s) to *_datacube.nc files, or a directory (all *_datacube.nc files found recursively)

--output-dir PATH

GEOTIFF_OUTPUT_DIR

${OUTPUT_DIR}/geotiff_stats

Root output directory

--valid-range-ndvi MIN,MAX

VALID_RANGE_NDVI

-0.1,1.0

Valid range for NDVI

--valid-range-evi2 MIN,MAX

VALID_RANGE_EVI2

-1,2

Valid range for EVI2

--valid-range-nirv MIN,MAX

VALID_RANGE_NIRV

-0.5,1

Valid range for NIRv

--start-date YYYY-MM-DD

START_DATE

Include only time steps on or after this date

--end-date YYYY-MM-DD

END_DATE

Include only time steps on or before this date

--skip-per-year

GEOTIFF_PER_YEAR=false

(per-year written)

Skip the per-year GeoTiff

--skip-per-month

GEOTIFF_PER_MONTH=false

(per-month written)

Skip the per-month GeoTiff

--skip-per-doy

GEOTIFF_PER_DOY=false

(per-DOY written)

Skip the per-DOY GeoTiff (recommended for large regions)

--workers N

WORKERS

4

Parallel threads for processing multiple datacubes concurrently

--log-level LEVEL

INFO

DEBUG INFO WARNING ERROR