- Remove PMF from ground classification options (PDAL recommends SMRF over PMF)
- Auto-detection now uses CSF for urban/complex terrain instead of PMF
- Add z_std > 30m heuristic to auto-select CSF for complex terrain
- Fix pos_open/neg_open lambda missing 'shared' parameter (NameError in workers)
- Fix NaN mask not restored in hillshade, slope, aspect, curvature (gradient-based products computed on filled DEM lost NaN transparency)
- Add nan_mask parameter to _save_tif for centralized NaN restoration
- DTM TIF kept by default (no longer deleted after WebP conversion)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Project Overview
LiDAR archaeological processing pipeline that generates 17 terrain visualizations from LAZ/LAS point clouds. Runs in Docker with optional NVIDIA GPU acceleration (CuPy). Designed for French LiDAR HD data in Lambert 93 (EPSG:2154).
Commands
All commands run inside Docker. Use ./run.sh as the primary interface.
./run.sh -g # Standard run with GPU
./run.sh -g -w 4 # GPU + 4 parallel workers
./run.sh -g -r 0.2 # High resolution (0.2m/px)
./run.sh --test # Run unit tests
./run.sh -g --file LHD_FXX_1000_6882_PTS_LAMB93_IGN69.copc # Single file
./run.sh --ground-classification csf # Force CSF ground classification (complex terrain)
./run.sh -g --keep-tif # Keep TIFF files (allows WebP regeneration without recalculating DTM)
./run.sh # Print help (no args)
Direct Docker:
docker build -t lidar-lidar .
docker run --rm --gpus all -v $(pwd)/input:/data/input:ro -v $(pwd)/output:/data/output lidar-lidar
Architecture
Module responsibilities
- cli.py: argparse + logging setup. Entry point via python -m lidar_pipeline.
- pipeline.py: LidarArchaeoPipeline orchestrator. VIZ_STEPS registry maps names to generate functions. FilePrefixFilter for parallel logging. Creates SharedDEM once per file and passes it to all visualizations.
- dtm.py: PDAL ground classification (SMRF/CSF + auto-detection) and DTM generation via scipy binned_statistic_2d.
- visualizations.py: 15 generate_* functions + 2 IGN overlay lambdas. All take (dem_file, basename, vis_dir, resolution, shared=None) and return a TIF path or None. SharedDEM class pre-computes gradient, NaN mask, LRM to avoid redundant I/O and computation.
- gpu.py: CuPy/numpy abstraction (HAS_GPU, to_gpu(), to_cpu(), xp_gaussian_filter(), xp_uniform_filter(), xp_minimum_filter(), gpu_cleanup()). Falls back to CPU gracefully.
- ign.py: IGN WMTS tile download + overlay generation for orthophoto and topographic maps.
- rendering.py: COLORMAPS dict maps filename keywords to (cmap, title, legend, description). tif_to_png() converts TIF→WebP with legend/scale/north arrow. generate_pdf_report() creates A3 PDF.
SharedDEM optimization
SharedDEM pre-computes once per file:
- DEM data (single I/O read)
- NaN mask + filled DEM (single _fill_nans call, avoiding ~20 redundant calls)
- Gradient components (dy, dx, slope, aspect) shared by hillshade, slope, aspect, curvature
- LRM at 15m kernel (shared by lrm + anomalies)
_filter_nanaware_from_filled() applies filters on the pre-filled DEM, skipping the expensive _fill_nans interpolation.
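The caching idea can be sketched roughly as follows. This is a minimal illustration, not the repository's SharedDEM: the class name SharedDEMSketch, its attribute names, and the mean-value fill are assumptions, and the DEM is assumed to be already loaded as a NumPy array.

```python
import numpy as np


class SharedDEMSketch:
    """Hypothetical stand-in for SharedDEM: compute shared products once."""

    def __init__(self, dem, resolution):
        self.resolution = resolution
        # NaN mask is computed once and reused by every visualization.
        self.nan_mask = np.isnan(dem)
        # Simplified fill (mean of valid cells); the real pipeline is
        # described as using a _fill_nans interpolation instead.
        fill_value = np.nanmean(dem) if (~self.nan_mask).any() else 0.0
        self.filled = np.where(self.nan_mask, fill_value, dem)
        # Gradient components shared by hillshade, slope, aspect, curvature.
        self.dy, self.dx = np.gradient(self.filled, resolution)
        # Slope in degrees, derived once from the shared gradient.
        self.slope = np.degrees(np.arctan(np.hypot(self.dx, self.dy)))


# Usage: a small plane z = x with one missing cell.
dem = np.array([[0.0, 1.0, 2.0],
                [0.0, np.nan, 2.0],
                [0.0, 1.0, 2.0]])
shared = SharedDEMSketch(dem, resolution=1.0)
```

Each visualization would then read shared.slope or shared.dx instead of recomputing them from the file.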
Adding a visualization
Three places must be updated:
- visualizations.py: add generate_X(dem_file, basename, vis_dir, resolution, shared=None) function
- pipeline.py VIZ_STEPS: add ('name', generate_X) entry
- rendering.py COLORMAPS: add entry keyed by the output filename keyword
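The three registration points can be sketched in a single file for illustration. Everything here is hypothetical: generate_example, the list-of-tuples shape of VIZ_STEPS, the tuple values in COLORMAPS, and lookup_colormap are assumptions about how such a registry might look, not the repository's code.

```python
from pathlib import Path


def generate_example(dem_file, basename, vis_dir, resolution, shared=None):
    """Hypothetical visualization: return the TIF path it would write."""
    # A real function would compute a raster (using `shared` when given)
    # and save it; this sketch only builds the output path.
    return str(Path(vis_dir) / f"{basename}_example.tif")


# Stand-in for the VIZ_STEPS registry in pipeline.py.
VIZ_STEPS = [("example", generate_example)]

# Stand-in for COLORMAPS in rendering.py:
# keyword -> (cmap, title, legend, description).
COLORMAPS = {"example": ("viridis", "Example", "Relative value", "Demo layer")}


def lookup_colormap(tif_path):
    """Return the first COLORMAPS entry whose keyword appears in the filename."""
    name = Path(tif_path).name
    for keyword, entry in COLORMAPS.items():
        if keyword in name:
            return entry
    return None
```

The filename keyword is what ties the output of the generate function to its rendering entry, so the two must agree.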
Ground classification
Auto-detection in dtm.py detect_ground_method():
- Single-return ratio > 0.6 → CSF (urban terrain, cloth simulation)
- Height std > 30m → CSF (complex/mountainous terrain)
- Default → SMRF (natural terrain)
Override with --ground-classification {auto,smrf,csf}.
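The decision rule above can be written as a small function. This is a sketch only: choose_ground_method and its parameters are hypothetical, and how detect_ground_method() actually derives the single-return ratio and height standard deviation from the point cloud is not shown here.

```python
def choose_ground_method(single_return_ratio, z_std, override="auto"):
    """Hypothetical helper mirroring the documented thresholds."""
    if override != "auto":
        # An explicit --ground-classification value takes precedence.
        return override
    if single_return_ratio > 0.6:
        return "csf"   # urban terrain
    if z_std > 30.0:
        return "csf"   # complex or mountainous terrain
    return "smrf"      # natural terrain (default)
```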
NaN handling
Small DTM gaps (less than 1 m from existing data) are filled using rasterio.fill.fillnodata. Large gaps remain as NaN. SharedDEM fills NaN once; _filter_nanaware_from_filled() applies filters on the pre-filled array and restores the NaN mask.
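The filter-then-restore step can be illustrated with NumPy alone. The helper names box_mean_3x3 and filter_from_filled are hypothetical, and the 3x3 mean is only a stand-in for whichever filter a visualization needs.

```python
import numpy as np


def box_mean_3x3(arr):
    """3x3 mean filter with edge replication (stand-in for a real filter)."""
    padded = np.pad(arr, 1, mode="edge")
    rows, cols = arr.shape
    total = np.zeros(arr.shape, dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            total += padded[dy:dy + rows, dx:dx + cols]
    return total / 9.0


def filter_from_filled(filled, nan_mask, filter_func):
    """Filter the pre-filled array, then restore NaN where data was missing."""
    out = np.array(filter_func(filled), dtype=np.float64)
    out[nan_mask] = np.nan
    return out


# Usage: one missing cell in an otherwise flat DEM.
dem = np.full((4, 4), 10.0)
dem[1, 2] = np.nan
mask = np.isnan(dem)
filled = np.where(mask, 10.0, dem)
result = filter_from_filled(filled, mask, box_mean_3x3)
```

Filtering the raw array directly would spread NaN into every window that touches the gap, which is why the filter runs on the filled array and the mask is reapplied afterwards.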
Flow accumulation
Sink filling uses the priority-flood algorithm (Wang & Liu 2006), which runs in O(n log n), rather than an iterative minimum_filter approach. D8 accumulation uses numba JIT and falls back to pure Python if numba is unavailable.
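For reference, a minimal pure-Python sketch of priority-flood sink filling is shown below. It assumes a small 2D list of elevations with no NaN cells and 8-connected neighbours; the function name priority_flood_fill is hypothetical and the repository's implementation may differ in data layout and NaN handling.

```python
import heapq


def priority_flood_fill(dem):
    """Fill depressions in a 2D list of elevations (no NaN handling here)."""
    rows, cols = len(dem), len(dem[0])
    filled = [row[:] for row in dem]
    visited = [[False] * cols for _ in range(rows)]
    heap = []

    # Seed the queue with every border cell: water can always leave there.
    for r in range(rows):
        for c in range(cols):
            if r in (0, rows - 1) or c in (0, cols - 1):
                heapq.heappush(heap, (filled[r][c], r, c))
                visited[r][c] = True

    neighbours = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                  (0, 1), (1, -1), (1, 0), (1, 1)]
    while heap:
        # Always expand from the lowest cell reached so far.
        level, r, c = heapq.heappop(heap)
        for dr, dc in neighbours:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and not visited[nr][nc]:
                visited[nr][nc] = True
                # A lower neighbour is raised to the current spill level.
                filled[nr][nc] = max(filled[nr][nc], level)
                heapq.heappush(heap, (filled[nr][nc], nr, nc))
    return filled
```

Each cell is pushed and popped once, so the heap operations give the O(n log n) bound mentioned above.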
Parallel processing
Uses ProcessPoolExecutor with 'spawn' start method (required for CUDA). Each worker gets its own temp directory (temp_{basename}). _process_file_standalone() configures its own logger with _file_filter for per-file log prefixes.
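The pattern can be sketched with the standard library only. The names PrefixFilter, process_one, and run_parallel are hypothetical, and the module-level _file_filter here is an assumed shape for the object the text refers to; the worker body is a placeholder for the actual per-file processing.

```python
import logging
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path


class PrefixFilter(logging.Filter):
    """Hypothetical filter that prepends the current file's basename."""

    def __init__(self, basename=""):
        super().__init__()
        self.basename = basename

    def filter(self, record):
        if self.basename:
            record.msg = f"[{self.basename}] {record.msg}"
        return True


# One filter per process; only its basename changes between files.
_file_filter = PrefixFilter()


def process_one(path):
    """Worker: configure its own logger and temp directory name."""
    basename = Path(path).stem
    logger = logging.getLogger("lidar")
    if _file_filter not in logger.filters:
        logger.addFilter(_file_filter)
    _file_filter.basename = basename
    temp_dir = f"temp_{basename}"
    logger.info("processing in %s", temp_dir)
    return basename, temp_dir


def run_parallel(paths, workers=4):
    """Run workers in spawned processes so no CUDA state is inherited."""
    ctx = mp.get_context("spawn")
    with ProcessPoolExecutor(max_workers=workers, mp_context=ctx) as pool:
        return list(pool.map(process_one, paths))
```

With the spawn start method each child re-imports the main module, so a call such as run_parallel(["a.laz", "b.laz"]) belongs under an `if __name__ == "__main__":` guard.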
Key conventions
- Language: UI messages and comments in French. Code identifiers in English.
- Logging: Use logger = logging.getLogger("lidar"). Prefix per-file logs via _file_filter.basename.
- GPU pattern: arr_gpu = to_gpu(arr) → compute → result = to_cpu(arr_gpu) → gpu_cleanup() between visualizations.
- Output format: Visualizations saved as WebP. TIFF intermediates deleted by default. Use --keep-tif to keep DTM+TIF for WebP regeneration with --force. No COGs or viewer.
- Compression: TIF intermediates use deflate compression (faster than LZW for float32 data).
- Tests: Run only inside Docker via ./run.sh --test. Synthetic DEM fixture in tests/conftest.py.
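The GPU pattern from the conventions above could look roughly like this. The function names match those listed for gpu.py, but the bodies are assumptions about one plausible implementation, and scaled_relief is a purely hypothetical computation used to show the call order.

```python
import numpy as np

try:
    import cupy as cp
    HAS_GPU = cp.cuda.runtime.getDeviceCount() > 0
except Exception:  # CuPy missing, or no usable CUDA device
    cp = None
    HAS_GPU = False


def to_gpu(arr):
    """Move an array to the GPU when available, else return a NumPy array."""
    return cp.asarray(arr) if HAS_GPU else np.asarray(arr)


def to_cpu(arr):
    """Bring a result back to host memory as a NumPy array."""
    return cp.asnumpy(arr) if HAS_GPU else np.asarray(arr)


def gpu_cleanup():
    """Release cached GPU memory blocks; a no-op on CPU."""
    if HAS_GPU:
        cp.get_default_memory_pool().free_all_blocks()


def scaled_relief(dem, factor=2.0):
    """Hypothetical computation: to_gpu -> compute -> to_cpu -> cleanup."""
    arr_gpu = to_gpu(dem)
    result = to_cpu((arr_gpu - arr_gpu.mean()) * factor)
    gpu_cleanup()
    return result
```

Because both branches expose the same array methods, the compute step in the middle is written once and runs on whichever backend was selected at import time.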