Files
lidar_rendu/CLAUDE.md
Jacquin Antoine 1cf8e1752f Remove PMF, fix NaN in gradient visualizations, fix pos_open/neg_open shared param
- Remove PMF from ground classification options (PDAL recommends SMRF over PMF)
- Auto-detection now uses CSF for urban/complex terrain instead of PMF
- Add z_std > 30m heuristic to auto-select CSF for complex terrain
- Fix pos_open/neg_open lambda missing 'shared' parameter (NameError in workers)
- Fix NaN mask not restored in hillshade, slope, aspect, curvature
  (gradient-based products computed on filled DEM lost NaN transparency)
- Add nan_mask parameter to _save_tif for centralized NaN restoration
- DTM TIF kept by default (no longer deleted after WebP conversion)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-14 00:50:45 +02:00

87 lines
5.0 KiB
Markdown

# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
LiDAR archaeological processing pipeline that generates 17 terrain visualizations from LAZ/LAS point clouds. Runs in Docker with optional NVIDIA GPU acceleration (CuPy). Designed for French LiDAR HD data in Lambert 93 (EPSG:2154).
## Commands
All commands run inside Docker. Use `./run.sh` as the primary interface.
```bash
./run.sh -g # Standard run with GPU
./run.sh -g -w 4 # GPU + 4 parallel workers
./run.sh -g -r 0.2 # High resolution (0.2m/px)
./run.sh --test # Run unit tests
./run.sh -g --file LHD_FXX_1000_6882_PTS_LAMB93_IGN69.copc # Single file
./run.sh --ground-classification csf # Force CSF ground classification (complex terrain)
./run.sh -g --keep-tif # Keep TIFF files (allows WebP regeneration without recalculating DTM)
./run.sh # Print help (no args)
```
Direct Docker:
```bash
docker build -t lidar-lidar .
docker run --rm --gpus all -v $(pwd)/input:/data/input:ro -v $(pwd)/output:/data/output lidar-lidar
```
## Architecture
### Module responsibilities
- **`cli.py`** — argparse + logging setup. Entry point via `python -m lidar_pipeline`.
- **`pipeline.py`** — `LidarArchaeoPipeline` orchestrator. `VIZ_STEPS` registry maps names to generate functions. `FilePrefixFilter` for parallel logging. Creates `SharedDEM` once per file and passes it to all visualizations.
- **`dtm.py`** — PDAL ground classification (SMRF/CSF + auto-detection) and DTM generation via scipy `binned_statistic_2d`.
- **`visualizations.py`** — 15 `generate_*` functions + 2 IGN overlay lambdas. All take `(dem_file, basename, vis_dir, resolution, shared=None)` and return a TIF path or None. `SharedDEM` class pre-computes gradient, NaN mask, LRM to avoid redundant I/O and computation.
- **`gpu.py`** — CuPy/numpy abstraction: `HAS_GPU`, `to_gpu()`, `to_cpu()`, `xp_gaussian_filter()`, `xp_uniform_filter()`, `xp_minimum_filter()`, `gpu_cleanup()`. Falls back to CPU gracefully.
- **`ign.py`** — IGN WMTS tile download + overlay generation for orthophoto and topographic maps.
- **`rendering.py`** — `COLORMAPS` dict maps filename keywords to (cmap, title, legend, description). `tif_to_png()` converts TIF→WebP with legend/scale/north arrow. `generate_pdf_report()` creates A3 PDF.
### SharedDEM optimization
`SharedDEM` pre-computes once per file:
- DEM data (single I/O read)
- NaN mask + filled DEM (single `_fill_nans` call, avoiding ~20 redundant calls)
- Gradient components (dy, dx, slope, aspect) shared by hillshade, slope, aspect, curvature
- LRM at 15m kernel (shared by lrm + anomalies)
`_filter_nanaware_from_filled()` applies filters on the pre-filled DEM, skipping the expensive `_fill_nans` interpolation.
### Adding a visualization
Three places must be updated:
1. `visualizations.py` — add `generate_X(dem_file, basename, vis_dir, resolution, shared=None)` function
2. `pipeline.py` `VIZ_STEPS` — add `('name', generate_X)` entry
3. `rendering.py` `COLORMAPS` — add entry keyed by the output filename keyword
### Ground classification
Auto-detection in `dtm.py` `detect_ground_method()`:
- Single-return ratio > 0.6 → CSF (urban terrain, cloth simulation)
- Height std > 30m → CSF (complex/mountainous terrain)
- Default → SMRF (natural terrain)
Override with `--ground-classification {auto,smrf,csf}`.
### NaN handling
DTM small gaps (< 1m from existing data) are filled using `rasterio.fill.fillnodata`. Large gaps remain as NaN. `SharedDEM` fills NaN once; `_filter_nanaware_from_filled()` applies filters on the pre-filled array and restores the NaN mask.
### Flow accumulation
Uses priority-flood algorithm (Wang & Liu 2006) for sink filling, which is O(n log n) instead of iterative minimum_filter. D8 accumulation uses numba JIT; falls back to pure Python if numba unavailable.
### Parallel processing
Uses `ProcessPoolExecutor` with `'spawn'` start method (required for CUDA). Each worker gets its own temp directory (`temp_{basename}`). `_process_file_standalone()` configures its own logger with `_file_filter` for per-file log prefixes.
## Key conventions
- **Language**: UI messages and comments in French. Code identifiers in English.
- **Logging**: Use `logger = logging.getLogger("lidar")`. Prefix per-file logs via `_file_filter.basename`.
- **GPU pattern**: `arr_gpu = to_gpu(arr)` compute `result = to_cpu(arr_gpu)` `gpu_cleanup()` between visualizations.
- **Output format**: Visualizations saved as WebP. TIFF intermediates deleted by default. Use `--keep-tif` to keep DTM+TIF for WebP regeneration with `--force`. No COGs or viewer.
- **Compression**: TIF intermediates use `deflate` compression (faster than LZW for float32 data).
- **Tests**: Run only inside Docker via `./run.sh --test`. Synthetic DEM fixture in `tests/conftest.py`.