datacube_to_geotiff Pipeline
Prerequisite: This pipeline reads datacubes produced by the netCDF Datacube Pipeline. Run that pipeline first to produce
*_datacube.nc files, then point GEOTIFF_INPUT_DATACUBES at those files or their parent directory.
The datacube_to_geotiff pipeline reads per-pixel datacubes and writes three
multi-band GeoTiffs per (VI, region) — one summarising the record year by year,
one month by month, and one day-of-year by day-of-year. These products are
designed for delivery to downstream spatial models that expect standard raster
inputs rather than CF-1.8 netCDF files.
When to Use This Pipeline
Use the datacube_to_geotiff pipeline when you need:
Multi-band GeoTiff inputs for machine-learning or statistical models
Per-year, per-month, or per-DOY statistics as analysis-ready rasters
A format compatible with GDAL-based tools, ArcGIS, QGIS, or Google Earth Engine
Use the pixel phenology pipeline instead when you need spatially explicit phenological metric maps (SOS, POS, green-up rate, etc.) derived from the Whittaker-smoothed time series.
Selecting the Pipeline
Set PIPELINE in config.local.env:
PIPELINE="datacube_to_geotiff"
And point it at the datacubes:
GEOTIFF_INPUT_DATACUBES="${OUTPUT_DIR}/my_shapefile_stem" # directory
# or
GEOTIFF_INPUT_DATACUBES="/path/to/NDVI_MyRegion_datacube.nc" # single file
See Overview — Setup for the full config file model.
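Because GEOTIFF_INPUT_DATACUBES accepts either a directory or a single file, the expansion into a list of datacubes can be sketched as follows. This is a minimal illustration, not the pipeline's own code: `resolve_datacubes` is a hypothetical helper, and the assumption that only files directly inside the directory are matched (no recursion) is mine.

```python
from pathlib import Path


def resolve_datacubes(value: str) -> list[Path]:
    """Expand a directory or single-file setting into a sorted list of datacube paths."""
    path = Path(value)
    if path.is_dir():
        # Assumption: only *_datacube.nc files directly inside the directory count.
        return sorted(path.glob("*_datacube.nc"))
    # Anything else is treated as one explicit datacube file.
    return [path]
```

Calling `resolve_datacubes("/path/to/NDVI_MyRegion_datacube.nc")` returns a one-element list, while a directory returns every matching datacube it contains.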
Output Products
Three GeoTiffs are written per (VI, region), each summarising the input time series at a different temporal resolution.
Per-Year (*_per_year.tif)
One group of 3 bands per calendar year present in the datacube:
| Band | Statistic |
|---|---|
| | Median of all valid observations in that calendar year |
| | 5th percentile |
| | 95th percentile |
Band count: N_years × 3 (e.g. 15 bands for a 5-year record).
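The per-year grouping can be sketched with NumPy as below. This is an illustrative sketch rather than the pipeline's implementation: `per_year_bands` is a hypothetical function, the input is assumed to be an in-memory `(time, y, x)` array with NaN marking invalid observations, and the label pattern is inferred from the band descriptions shown later on this page.

```python
import numpy as np


def per_year_bands(values, years):
    """Compute median, p05, p95 per calendar year.

    values: (time, y, x) float array, NaN where invalid
    years:  (time,) integer array giving the calendar year of each time step
    Returns (labels, bands) where bands has shape (n_years * 3, y, x).
    """
    labels, bands = [], []
    for year in np.unique(years):
        subset = values[years == year]  # all time steps in this year
        median = np.nanmedian(subset, axis=0)
        p05, p95 = np.nanpercentile(subset, [5, 95], axis=0)
        for name, band in (("median", median), ("p05", p05), ("p95", p95)):
            labels.append(f"year{year}_{name}")
            bands.append(band.astype("float32"))
    return labels, np.stack(bands)
```

With a 5-year record this returns 15 bands, matching the band count stated above.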
Per-Month (*_per_month.tif)
36 bands: 12 calendar months × 3 statistics. Uses a per-year-then-average method: for each calendar year, the median/p05/p95 across all valid observations in that month are computed; these annual values are then averaged across all years. This prevents years with more observations from dominating the result.
| Bands | Description |
|---|---|
| | Mean-of-annual-medians per calendar month |
| | Mean of annual 5th percentiles |
| | Mean of annual 95th percentiles |
Band count: 36 (always, regardless of date range).
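The per-year-then-average method can be illustrated with a short NumPy sketch. It is a sketch under assumptions, not the pipeline's code: `per_month_mean_of_annual` and its `reducer` parameter are hypothetical names, and the input is assumed to be an in-memory `(time, y, x)` array with NaN for invalid observations.

```python
import numpy as np


def per_month_mean_of_annual(values, years, months, reducer=np.nanmedian):
    """Reduce each (year, month) group first, then average the annual results.

    values: (time, y, x) float array, NaN where invalid
    years, months: (time,) integer arrays
    Returns a (12, y, x) float32 array; months without data stay NaN.
    """
    out = np.full((12,) + values.shape[1:], np.nan, dtype="float32")
    for month in range(1, 13):
        annual = []
        for year in np.unique(years):
            subset = values[(years == year) & (months == month)]
            # Skip year/month groups with no valid observation at all.
            if subset.size and np.isfinite(subset).any():
                annual.append(reducer(subset, axis=0))
        if annual:
            # Each year contributes one value, however many observations it had.
            out[month - 1] = np.nanmean(np.stack(annual), axis=0)
    return out
```

For example, if January has five observations of 1.0 in one year and a single observation of 9.0 in another, this method yields 5.0 (the mean of the annual medians 1.0 and 9.0), whereas a pooled median over all six observations would yield 1.0. Passing `reducer=lambda a, axis: np.nanpercentile(a, 5, axis=axis)` gives the p05 variant.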
Per-DOY (*_per_doy.tif)
1095 bands: 365 day-of-year values × 3 statistics. Raw observations from all years are pooled at each DOY before computing statistics. At HLS’s ~5-day revisit cadence, approximately 80% of DOY bands will be all-NoData for a given pixel — only DOYs with actual acquisitions carry values.
| Bands | Description |
|---|---|
| | Median across all years at each DOY |
| | 5th percentile |
| | 95th percentile |
Band count: 1095 (always). For large regions, use --skip-per-doy
(or GEOTIFF_PER_DOY=false) — this product can exceed 4 GB.
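The arithmetic behind the size warning and the sparsity figure can be checked with a few lines of Python. Both helpers are hypothetical and only approximate the real behaviour: the size is an uncompressed upper bound (LZW compression and all-NoData bands will shrink the file on disk), and the fill fraction assumes a single year of acquisitions at a fixed cadence.

```python
BYTES_PER_FLOAT32 = 4


def uncompressed_bytes(n_bands, height, width):
    """Upper bound on pixel data: every band stored as float32, no compression."""
    return n_bands * height * width * BYTES_PER_FLOAT32


def doy_fill_fraction(revisit_days, days_per_year=365):
    """Fraction of DOY slots that get an acquisition in one year at a fixed cadence."""
    observed = len(range(1, days_per_year + 1, revisit_days))
    return observed / days_per_year


size = uncompressed_bytes(1095, 1000, 1000)
print(f"{size / 1e9:.2f} GB")                              # 4.38 GB
print(f"{doy_fill_fraction(5):.0%} of DOY bands filled")   # 20% of DOY bands filled
```

A 1000 × 1000 pixel region already reaches about 4.38 GB of uncompressed per-DOY data, which is why skipping this product is suggested for large regions.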
GeoTiff Format
| Property | Value |
|---|---|
| Format | GeoTiff (BigTIFF when > 4 GB) |
| Compression | LZW |
| Tiling | 256 × 256 pixels |
| Data type | float32 |
| NoData | |
| CRS | Native CRS of the input datacube (UTM, meters) |
| Band descriptions | Set via GDAL standard field; readable with |
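For readers writing compatible rasters themselves, the properties above map onto rasterio profile keys roughly as follows. This is a sketch, not the pipeline's actual writer: `geotiff_profile` is a hypothetical helper, the NoData value is deliberately left out because it is not stated here, and `BIGTIFF="IF_SAFER"` is my choice of GDAL creation option for getting BigTIFF output when a compressed file might exceed 4 GB.

```python
def geotiff_profile(n_bands, height, width, crs, transform):
    """Build a rasterio-style profile matching the format table (NoData omitted)."""
    return {
        "driver": "GTiff",
        "dtype": "float32",
        "count": n_bands,
        "height": height,
        "width": width,
        "crs": crs,              # native CRS of the input datacube
        "transform": transform,  # affine geotransform of the input grid
        "compress": "lzw",
        "tiled": True,
        "blockxsize": 256,
        "blockysize": 256,
        "BIGTIFF": "IF_SAFER",   # assumption: let GDAL switch to BigTIFF when needed
    }
```

The resulting dictionary can be passed as `rasterio.open(path, "w", **profile)`.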
Band descriptions are accessible in Python:
import rasterio
with rasterio.open("NDVI_MyRegion_per_year.tif") as src:
    print(src.descriptions) # ('year2020_median', 'year2020_p05', 'year2020_p95', ...)
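Since band descriptions identify each band, a small lookup makes it easy to read a band by name instead of counting positions. The helper below is a hypothetical convenience, not part of the pipeline; it relies only on rasterio's 1-based band numbering and skips bands whose description is unset.

```python
def band_index(descriptions):
    """Map each band description to its 1-based band number (rasterio convention)."""
    return {name: i for i, name in enumerate(descriptions, start=1) if name}


# Descriptions as shown in the example above.
lookup = band_index(("year2020_median", "year2020_p05", "year2020_p95"))
print(lookup["year2020_p95"])  # 3
```

With an open dataset, `src.read(band_index(src.descriptions)["year2020_p95"])` would then return that band as a 2-D array.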
File Structure
outputs/ ← GEOTIFF_OUTPUT_DIR
├── datacube_to_geotiff_20260320_153100.log ← log file at output-dir root
└── Mesa_Verde/ ← one subfolder per region
├── NDVI_Mesa_Verde_per_year.tif
├── NDVI_Mesa_Verde_per_month.tif
└── NDVI_Mesa_Verde_per_doy.tif
VI and region_label are parsed from the input datacube filename:
{VI}_{region_label}_datacube.nc → first underscore-separated token = VI, remainder = region_label.
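The parsing rule above can be expressed in a few lines. This is a minimal sketch of the stated rule, with `parse_datacube_name` as a hypothetical function name; how the pipeline handles malformed names is not documented here, so raising `ValueError` is an assumption.

```python
from pathlib import Path

SUFFIX = "_datacube.nc"


def parse_datacube_name(path):
    """Split '{VI}_{region_label}_datacube.nc' into (VI, region_label)."""
    name = Path(path).name
    if not name.endswith(SUFFIX):
        raise ValueError(f"not a datacube filename: {name}")
    stem = name[: -len(SUFFIX)]
    # First underscore-separated token is the VI; the remainder is the region label.
    vi, sep, region_label = stem.partition("_")
    if not sep or not vi or not region_label:
        raise ValueError(f"cannot parse VI and region from: {name}")
    return vi, region_label


print(parse_datacube_name("NDVI_Mesa_Verde_datacube.nc"))  # ('NDVI', 'Mesa_Verde')
```

Because only the first underscore splits the name, region labels that themselves contain underscores (such as Mesa_Verde) are preserved intact.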
File Naming
| File | Location |
|---|---|
| | |
| | |
| | |
| | |
Processing Model
For each input datacube ({VI}_{region_label}_datacube.nc):
  1. Open with xarray (lazy); apply optional --start-date / --end-date filter
       Warn if uncompressed array size > 8 GB
       Warn if per-DOY output size estimate > 4 GB
  2. Apply valid-range mask (values outside [vi_min, vi_max] → NaN)
  3. Write per_year.tif (unless --skip-per-year)
       For each calendar year: compute median, p05, p95 → stream one band at a time
  4. Write per_month.tif (unless --skip-per-month)
       Per-year-then-average: per-year percentiles first, then average across years
  5. Write per_doy.tif (unless --skip-per-doy)
       Pool all years at each DOY: compute median, p05, p95
Parallelised via ThreadPoolExecutor across input datacubes.
Each GeoTiff is written one band at a time — peak memory = one spatial band.
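The parallelisation across input datacubes can be sketched with the standard library as below. This is an illustration of the pattern only: `process_datacube` is a placeholder standing in for steps 1 to 5, and `run_all`, its `max_workers` default, and the error handling are assumptions rather than the pipeline's actual behaviour.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_datacube(path):
    """Placeholder worker (hypothetical): the real one would run steps 1 to 5."""
    return f"{path}: done"


def run_all(paths, max_workers=2):
    """Process each datacube in its own thread and collect per-file results."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(process_datacube, p): p for p in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:
                # Assumption: one failing datacube should not stop the others.
                results[path] = f"failed: {exc}"
    return results
```

Each worker handles one datacube, so total peak memory scales with the number of threads times the per-datacube footprint.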
CLI Reference
python src/datacube_to_geotiff.py --help
| Argument | | Default | Description |
|---|---|---|---|
| | | (required) | Path(s) to |
| | | | Root output directory |
| | | | Valid range for NDVI |
| | | | Valid range for EVI2 |
| | | | Valid range for NIRv |
| --start-date | | — | Include only time steps on or after this date |
| --end-date | | — | Include only time steps on or before this date |
| --skip-per-year | | (per-year written) | Skip the per-year GeoTiff |
| --skip-per-month | | (per-month written) | Skip the per-month GeoTiff |
| --skip-per-doy | | (per-DOY written) | Skip the per-DOY GeoTiff (recommended for large regions) |
| | | | Parallel threads for processing multiple datacubes concurrently |
| | — | | |
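The start and end date descriptions imply a filter that is inclusive on both ends. A minimal sketch of that behaviour, using plain `datetime.date` values and a hypothetical `filter_dates` helper (the pipeline itself works on xarray time coordinates, which this sketch does not reproduce):

```python
from datetime import date


def filter_dates(dates, start=None, end=None):
    """Keep dates within [start, end]; both bounds inclusive, None means unbounded."""
    return [
        d for d in dates
        if (start is None or d >= start) and (end is None or d <= end)
    ]


steps = [date(2020, 1, 1), date(2020, 6, 1), date(2021, 1, 1)]
print(filter_dates(steps, start=date(2020, 6, 1)))
# [datetime.date(2020, 6, 1), datetime.date(2021, 1, 1)]
```

A time step falling exactly on either bound is kept, which matches the "on or after" and "on or before" wording in the table.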