From 00418d35bce5d49c190e376b944c6cf238966318 Mon Sep 17 00:00:00 2001
From: Jacquin Antoine
Date: Thu, 21 May 2026 17:36:33 +0200
Subject: [PATCH] Docs: update CLAUDE.md with CsI correction, detector physics, and inference pipeline

Co-Authored-By: Claude Opus 4.7
---
 CLAUDE.md | 66 +++++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 54 insertions(+), 12 deletions(-)

diff --git a/CLAUDE.md b/CLAUDE.md
index 8347a5a..d64f607 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -19,15 +19,17 @@ Data flow: `detect` writes `monitor_state.json` + `cps_log.jsonl` + daily report
 ### Web API Routes
 
 - `/api/status` — monitor status (connected, CPS, staleness)
-- `/api/spectrum/current` — accumulated spectrum (1023 channels, overflow channel excluded)
-- `/api/spectrum/difference` — background-subtracted spectrum
+- `/api/spectrum/current` — accumulated spectrum (CsI-corrected, 1023 channels)
+- `/api/spectrum/difference` — background-subtracted spectrum (CsI-corrected)
 - `/api/background`, `/api/background/spectrum`, `/api/background/reference`, `/api/background/theoretical` — background data (live, 24h reference, theoretical CsI(Tl) model)
 - `/api/cps/timeline` — CPS time series
 - `/api/history`, `/api/history/{date}` — daily detection reports
 
 ### Key Physics Constants
 
-Energy calibration: `E(keV) = 0.33 + 2.97 * channel_index` (env vars `ENERGY_CALIBRATION_OFFSET` and `ENERGY_CALIBRATION_SLOPE`). The detector has 1024 raw channels but channel 1023 is an overflow bin — only the first 1023 channels (20–3036 keV) are used for display and inference. CsI(Tl) crystal with 8.4% FWHM at 662 keV.
+Energy calibration: `E(keV) = 0.33 + 2.97 * channel_index` (env vars `ENERGY_CALIBRATION_OFFSET` and `ENERGY_CALIBRATION_SLOPE`). The detector has 1024 raw channels but channel 1023 is an overflow bin — only the first 1023 channels (0.33–3036 keV) are used for display and inference. CsI(Tl) crystal with 8.4% FWHM at 662 keV.
+
+**CsI(Tl) non-linear response correction**: CsI(Tl) has non-proportional scintillation response at low energies, causing peaks to appear at higher energies than their true gamma energy. The correction `E_apparent = E_true * (1 + alpha * exp(-E_true/beta))` with `alpha=0.37, beta=100` shifts the Am-241 peak from 71.6 keV (apparent) back to 59.5 keV (true). This correction is applied in the inference pipeline (`radiacode_monitor.py`) and web display, NOT in training data (which uses theoretical energies). Parameters are configurable via `CSI_NONLINEAR_ALPHA` and `CSI_NONLINEAR_BETA` env vars.
 
 ## Commands
 
@@ -35,7 +37,7 @@ Energy calibration: `E(keV) = 0.33 + 2.97 * channel_index` (env vars `ENERGY_CAL
 # Build all images
 docker compose build
 
-# Train model (GPU required, ~45 min on RTX 5060 Ti)
+# Train model (GPU required, ~30 min on RTX 5060 Ti)
 docker compose run --rm train
 
 # Capture 24h background (leave running, no radioactive source nearby)
@@ -49,30 +51,70 @@ docker compose up web
 
 # Run both detect and web
 docker compose up detect web
+
+# Test detection manually (inside detect container)
+docker compose run --rm -v $(pwd)/test_detection.py:/app/test_detection.py detect python /app/test_detection.py
 ```
 
 No test suite exists in this project. No linter is configured.
 
 ## VegaModel
 
-Defined in `train/vega_ml/training/vega/model.py`. Input: 1D spectrum (1023 channels, normalized to max). Output: classification logits (82 isotopes, apply sigmoid for probabilities) + activity predictions (Bq, scaled by max_activity_bq=1000). Loss: `VegaLoss = BCE(logits) + 0.1 * Huber(activities * mask)` — regression only penalizes present isotopes.
+Defined in `train/vega_ml/training/vega/model.py`. Input: 1D spectrum (1023 channels). Output: classification logits (82 isotopes, apply sigmoid for probabilities) + activity predictions (Bq, scaled by max_activity_bq=1000). Loss: `VegaLoss = BCE(logits) + 0.1 * Huber(activities * mask)` — regression only penalizes present isotopes.
+
+**Inference pipeline** (in `radiacode_monitor.py::run_inference`):
+1. Subtract background from accumulated spectrum → net_rate
+2. Apply CsI(Tl) non-linear correction: `correct_csi_nonlinear(net_rate)` — remaps channels so peaks appear at theoretical energies
+3. Normalize with log1p: `log1p(corrected) / max(log1p(corrected))`
+4. Feed to VegaModel → sigmoid → filter by threshold
 
 The model checkpoint (`models/vega_best.pt`) stores `model_config` and `model_state_dict`. At inference, the detect container dynamically imports `VegaModel` and `IsotopeIndex` from the mounted `vega_ml` volume.
 
-## Synthetic Background Model
+## Synthetic Spectrum Generation
 
-The training background uses a realistic CsI(Tl) continuum shape (not a simple exponential):
+### Detector Physics Model
 
-- **Continuum**: Asymmetric hump at ~110 keV (sigma_left=55, sigma_right=50 keV) + Compton tail (`0.45*exp(-E/240) + 0.04*exp(-E/700)`) + noise floor. Calibrated against real Radiacode 103 measurements. Implemented in `spectrum_physics.py::generate_realistic_continuum()`.
-- **Isotope peaks**: K-40 (1460 keV), Pb-214 (295, 352 keV), Bi-214 (609, 1120, 1764 keV), Ac-228 (911 keV), Pb-212 (239 keV), Tl-208 (583, 2614 keV) — with stochastic activity variation per sample.
-- **Hybrid training**: If `MEASURED_BACKGROUND_PATH` points to a valid `.npy` file, 70% measured + 30% synthetic continuum is used. This is controlled by `SpectrumConfig.measured_background_path` and the `--measured_background` CLI argument.
+Training spectra include realistic CsI(Tl) detector effects:
+
+- **Energy calibration**: `E = 0.33 + 2.97 * ch` with 1023 channels (matching real detector)
+- **K-escape peaks**: Iodine K-shell X-ray escape at `E - 28.5 keV` with energy-dependent escape fraction (up to 35% at low energies). Implemented in `spectrum_physics.py::_k_escape_fraction()`
+- **Asymmetric peaks**: Low-energy tail for peaks below 200 keV (15% tail fraction at 0 keV, 0% above 200 keV). Implemented in `spectrum_physics.py::_asymmetric_peak()`
+- **FWHM**: Energy-dependent resolution `FWHM(E) = 0.084 * 662 * sqrt(E/662)` keV (8.4% at 662 keV)
+
+### Background Model
+
+The training background uses a realistic CsI(Tl) continuum shape:
+
+- **Continuum**: Asymmetric hump at ~110 keV (sigma_left=55, sigma_right=50 keV) + Compton tail + noise floor. Calibrated against real Radiacode 103 measurements.
+- **Isotope peaks**: K-40, Pb-214, Bi-214, Ac-228, Pb-212, Tl-208 — with stochastic activity variation per sample.
+- **Hybrid training**: If `MEASURED_BACKGROUND_PATH` points to a valid `.npy` file, 70% measured + 30% synthetic continuum is used.
+- **Background subtraction mode**: 10% of training samples are background-subtracted (simulate the inference pipeline)
+
+### Training Data Augmentation
+
+- **Normalization**: log1p (replaces max normalization for better weak-signal detection)
+- **Low-signal samples**: 15% of samples use 0.01–5 Bq activities with 30–300s durations
+- **Duration range**: 30–300 seconds (covers short accumulations to long measurements)
+- **Activity range**: 0.01–100 Bq (covers weak to strong sources)
 
 ## Configuration
 
 All config is via environment variables in `docker-compose.yml`. Key variables:
-- `MODEL_PATH`, `ISOTOPE_INDEX_PATH`, `BACKGROUND_PATH` — file paths (container-mounted volumes)
+
+**Train container:**
+- `NUM_SAMPLES` — number of synthetic spectra (default 50000)
+- `BATCH_SIZE` — training batch size (default 32)
+- `MIN_DURATION`/`MAX_DURATION` — spectrum duration range in seconds (default 30–300)
+- `MEASURED_BACKGROUND_PATH` — path to measured background `.npy` for hybrid training
+
+**Detect container:**
+- `MODEL_PATH`, `ISOTOPE_INDEX_PATH`, `BACKGROUND_PATH` — file paths
 - `VEGA_DEVICE` — `cpu` or `cuda`
 - `THRESHOLD` — detection probability threshold (default 0.5)
 - `SAMPLE_INTERVAL` — seconds between samples (default 60)
 - `ENERGY_CALIBRATION_OFFSET/SLOPE` — energy calibration constants
-- `MEASURED_BACKGROUND_PATH` — path to measured background `.npy` for hybrid training (default: `/data/background_24h.npy`)
\ No newline at end of file
+- `CSI_NONLINEAR_ALPHA/BETA` — CsI(Tl) non-linear response correction (default 0.37/100.0)
+
+**Web container:**
+- `ENERGY_CALIBRATION_OFFSET/SLOPE` — energy calibration constants
+- `CSI_NONLINEAR_ALPHA/BETA` — CsI(Tl) correction parameters (must match detect)
\ No newline at end of file
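
Review note: the figures quoted in the patch (Am-241 at 71.6 keV apparent versus 59.5 keV true, the 3036 keV upper edge, 8.4% FWHM at 662 keV) can be sanity-checked with a few lines of Python. This is a minimal sketch using only the formulas and default parameters stated above. The function names `channel_to_energy`, `fwhm_kev`, `apparent_energy` and `true_energy` are hypothetical and are not taken from the repository. The bisection inverse is an assumption made for illustration; the project's `correct_csi_nonlinear` remaps whole spectra and may work differently.

```python
import math

# Defaults quoted in the patch; the project reads them from environment variables.
OFFSET_KEV = 0.33   # ENERGY_CALIBRATION_OFFSET
SLOPE_KEV = 2.97    # ENERGY_CALIBRATION_SLOPE
ALPHA = 0.37        # CSI_NONLINEAR_ALPHA
BETA_KEV = 100.0    # CSI_NONLINEAR_BETA


def channel_to_energy(channel: int) -> float:
    """Linear calibration: E(keV) = offset + slope * channel_index."""
    return OFFSET_KEV + SLOPE_KEV * channel


def fwhm_kev(energy_kev: float) -> float:
    """FWHM(E) = 0.084 * 662 * sqrt(E / 662), i.e. 8.4% at 662 keV."""
    return 0.084 * 662.0 * math.sqrt(energy_kev / 662.0)


def apparent_energy(e_true: float) -> float:
    """Forward model: E_apparent = E_true * (1 + alpha * exp(-E_true / beta))."""
    return e_true * (1.0 + ALPHA * math.exp(-e_true / BETA_KEV))


def true_energy(e_apparent: float, tol: float = 1e-9) -> float:
    """Invert the forward model by bisection (hypothetical helper).

    With alpha=0.37 and beta=100 the forward model is strictly increasing
    and never returns less than E_true, so the root lies in [0, e_apparent].
    """
    lo, hi = 0.0, e_apparent
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if apparent_energy(mid) < e_apparent:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print(f"Am-241 apparent energy: {apparent_energy(59.5):.1f} keV")
    print(f"Recovered true energy:  {true_energy(apparent_energy(59.5)):.1f} keV")
    print(f"Energy of channel 1022: {channel_to_energy(1022):.0f} keV")
    print(f"FWHM at 662 keV:        {fwhm_kev(662.0):.1f} keV")
```

Bisection is used here only because the forward formula has no closed-form inverse and is monotonic for these parameters; other `alpha`/`beta` values would need the monotonicity assumption rechecked.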