Realistic CsI(Tl) background + measured/synthetic hybridization + continuum dashboard

- Replace the exponential continuum with a realistic CsI(Tl) model in
  training (asymmetric hump at ~110 keV + Compton tail)
- Add measured-background injection (70% measured / 30% synthetic)
  via --measured_background and MEASURED_BACKGROUND_PATH
- Add the /api/background/continuum endpoint and the "Continuum CsI" toggle
  on the background dashboard
- Exclude channel 1023 (overflow bin) from the web display (NUM_CHANNELS=1023)
- Fix Gaussian smoothing of the background (local normalization at the edges)
- Update README.md, CLAUDE.md, TUTORIEL.md, TOTO.md, vega_ml/README.md

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Jacquin Antoine
2026-05-19 18:14:00 +02:00
parent 1e0c1a5ea5
commit 75d271c696
17 changed files with 917 additions and 224 deletions

@ -8,22 +8,25 @@ A machine learning system for identifying radioactive isotopes from gamma-ray sp
**Completed:** Vega ML model architecture (CNN-FCNN hybrid)
**Completed:** Training pipeline with GPU support
**Completed:** Inference engine
🔲 **Next:** Generate large training dataset (10,000-100,000 samples)
🔲 **Future:** Real-time inference on Radiacode devices
**Completed:** Realistic CsI(Tl) background model
**Completed:** Hybrid training (measured + synthetic background)
**Completed:** Web dashboard (FastAPI + Chart.js)
🔲 **Next:** Retrain model with realistic background
🔲 **Future:** Real-time inference on Radiacode devices
---
## Overview
This project aims to build a neural network that can identify radioactive isotopes from gamma spectra. Since collecting real gamma spectra requires radioactive sources and is expensive/regulated, we generate **synthetic training data** based on realistic physics models.
This project builds a neural network that identifies radioactive isotopes from gamma spectra. Since collecting real spectra requires radioactive sources and is expensive/regulated, we generate **synthetic training data** based on realistic physics models.
### Target Hardware
- **Training:** NVIDIA RTX 5090 GPU (requires PyTorch nightly with CUDA 12.8)
- **Training:** NVIDIA RTX 5060 Ti GPU (Blackwell, requires PyTorch 2.7+ with CUDA 12.8)
- **Inference:** Radiacode 101, 102, 103, 103G, 110 scintillation detectors
### Data Format
- **Input:** 2D spectrograms (time intervals × 1023 energy channels)
- **Output:** Multi-label isotope classification with activity estimation
- **Input:** 1D spectrum (1023 energy channels, 20-3000 keV, normalized to max)
- **Output:** Multi-label isotope classification (82 isotopes) with activity estimation (Bq)
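As a rough illustration of the input format above, the sketch below trims a raw 1024-channel readout to the 1023 channels that are used and normalizes it to its maximum. The function name `prepare_spectrum` is hypothetical and not taken from the repository; only the channel count and the max-normalization come from this README and the commit message.

```python
import numpy as np

NUM_CHANNELS = 1023  # channel 1023 (the 1024th) is the overflow bin and is dropped


def prepare_spectrum(raw_counts):
    """Hypothetical helper: drop the overflow bin and normalize to the maximum."""
    raw = np.asarray(raw_counts, dtype=np.float32)
    spectrum = raw[:NUM_CHANNELS]
    peak = spectrum.max()
    # An all-zero spectrum is returned unchanged to avoid dividing by zero.
    return spectrum / peak if peak > 0 else spectrum


# Example: a fake readout whose overflow bin holds a large count.
raw = np.arange(1024)
raw[1023] = 1_000_000
model_input = prepare_spectrum(raw)
```

Dropping the overflow bin before normalizing matters here: otherwise a large overflow count would set the scale and flatten the rest of the spectrum.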
---
@ -34,8 +37,7 @@ This project aims to build a neural network that can identify radioactive isotop
```bash
# Create virtual environment
python -m venv .venv
.venv\Scripts\activate # Windows
# or: source .venv/bin/activate # Linux/Mac
source .venv/bin/activate # Linux/Mac
# Install dependencies
pip install numpy scipy pillow
@ -47,25 +49,34 @@ pip install --pre torch torchvision --index-url https://download.pytorch.org/whl
### Generate Synthetic Data
```bash
# Generate 10 test samples
python -m synthetic_spectra.generate_spectra
# Generate 10 test samples (default)
python -m vega_ml.synthetic_spectra.generate_spectra --num_samples 10 --output_dir data/synthetic
# With measured background for hybrid training (recommended)
python -m vega_ml.synthetic_spectra.generate_spectra \
--num_samples 50000 \
--output_dir data/synthetic \
--measured_background /path/to/background_24h.npy
```
### Train the Model
```bash
# Quick test run (5 epochs, small dataset)
python training/vega/run_training.py --test
python -m vega_ml.training.vega.run_training --test
# Full training
python training/vega/run_training.py --epochs 100 --batch-size 32
python -m vega_ml.training.vega.run_training \
--data-dir data/synthetic \
--model-dir models \
--epochs 100 --batch-size 64
```
### Run Inference
```bash
# Run inference on synthetic data
python inference/run_inference.py --model models/vega_best.pt --data data/synthetic
python -m vega_ml.inference.run_inference --model models/vega_best.pt --data data/synthetic
```
---
@ -95,56 +106,74 @@ python inference/run_inference.py --model models/vega_best.pt --data data/synthe
## Synthetic Spectra Generation
### Realistic Background Model
The background continuum uses a realistic CsI(Tl) shape calibrated against real Radiacode 103 measurements, not a simple exponential:
- **Asymmetric hump** at ~110 keV (sigma_left=55 keV, sigma_right=50 keV) — the dominant low-energy scatter peak characteristic of CsI(Tl) detectors
- **Compton tail**: 0.45*exp(-E/240) + 0.04*exp(-E/700) — realistic high-energy falloff
- **Noise floor** at 0.8% of peak — prevents zero-count channels
This replaces the previous simple exponential `A*exp(-0.002*E)` which failed to reproduce the characteristic CsI(Tl) response.
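The shape described above can be sketched as follows. This is a minimal illustration, not the code in `spectrum_physics.py`: the function name `csi_continuum`, the unit amplitude of the hump, the simple sum of hump and tail, and the linear 20-3000 keV energy axis are all assumptions. Only the hump position, the two sigmas, the tail coefficients, and the 0.8% floor come from the text.

```python
import numpy as np


def csi_continuum(energy_kev):
    """Hypothetical sketch of the CsI(Tl) continuum shape (arbitrary units)."""
    energy_kev = np.asarray(energy_kev, dtype=float)
    # Asymmetric Gaussian hump: wider on the low-energy side of 110 keV.
    sigma = np.where(energy_kev < 110.0, 55.0, 50.0)
    hump = np.exp(-0.5 * ((energy_kev - 110.0) / sigma) ** 2)  # amplitude 1.0 is assumed
    # Two-component exponential Compton tail.
    tail = 0.45 * np.exp(-energy_kev / 240.0) + 0.04 * np.exp(-energy_kev / 700.0)
    shape = hump + tail  # summing the two terms is an assumption
    # Noise floor at 0.8% of the peak so that no channel is empty.
    return np.maximum(shape, 0.008 * shape.max())


energies = np.linspace(20.0, 3000.0, 1023)  # assumed linear calibration
continuum = csi_continuum(energies)
```

With these numbers the floor only takes over in the high-energy channels, where the exponential tail has decayed below 0.8% of the peak.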
### Hybrid Training with Measured Background
When a measured background file (`background_24h.npy`) is available, the generator blends it with the synthetic model:
- **70% measured** background shape (scaled to target CPS)
- **30% synthetic** continuum (for robustness against measurement artifacts)
- Stochastic isotope peaks (K-40, radon, thorium) are still added on top with random activity levels
This is controlled by the `--measured_background` CLI argument or the `MEASURED_BACKGROUND_PATH` environment variable.
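One way to picture the 70/30 blend is the sketch below. It assumes that both spectra are first normalized to unit area, so that the weights act on shape only, and that the target CPS is the total count rate of the blended background. The function name `blend_background` and the toy input spectra are hypothetical; the generator may combine the two differently.

```python
import numpy as np


def blend_background(measured, synthetic, target_cps, measured_weight=0.7):
    """Hypothetical sketch: mix two background shapes and scale to a total rate."""
    measured = np.asarray(measured, dtype=float)
    synthetic = np.asarray(synthetic, dtype=float)
    if measured.shape != synthetic.shape:
        raise ValueError("measured and synthetic spectra must have the same length")
    if measured.sum() <= 0 or synthetic.sum() <= 0:
        raise ValueError("both spectra need a positive total")
    # Normalize each spectrum to unit area so the weights compare shapes only.
    measured_shape = measured / measured.sum()
    synthetic_shape = synthetic / synthetic.sum()
    mixed = measured_weight * measured_shape + (1.0 - measured_weight) * synthetic_shape
    return target_cps * mixed  # counts per second in each channel


# Toy inputs standing in for background_24h.npy and the synthetic continuum.
rng = np.random.default_rng(0)
energies = np.linspace(20.0, 3000.0, 1023)
measured = rng.poisson(500.0 * np.exp(-energies / 300.0) + 5.0)
synthetic = np.exp(-energies / 240.0)
blended = blend_background(measured, synthetic, target_cps=25.0)
```

Keeping a synthetic share in the mix means that a glitch in the measured file, such as a dead channel, does not reach the training data at full strength.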
### Features
- **82 isotopes** with accurate gamma emission lines
- **Realistic physics:** Gaussian peaks, Poisson noise, Compton continuum, environmental background
- **Realistic physics:** Gaussian peaks, Poisson noise, Compton continuum, CsI(Tl) background shape
- **Multiple detector models:** Radiacode 101, 102, 103, 103G, 110 with correct FWHM and energy ranges
- **Configurable variation:** Activity levels, measurement durations, isotope combinations
- **Decay chains:** Uranium-238, Thorium-232 chains with secular equilibrium
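To make the peak physics concrete, here is a minimal sketch of one photopeak with Poisson noise. It uses the relations given in the Physics Model section of this README: FWHM scaling as sqrt(E/662) and expected counts lambda = A * t * I * epsilon * T. The 8.4% resolution at 662 keV is the Radiacode 110 figure from the detector table. The function names, the efficiency value, the reading of T as a dimensionless transmission factor, and the linear energy axis are illustrative assumptions.

```python
import numpy as np

FWHM_TO_SIGMA = 1.0 / (2.0 * np.sqrt(2.0 * np.log(2.0)))


def fwhm_kev(energy_kev, resolution_662=0.084):
    """Absolute FWHM in keV, scaling as sqrt(E/662) from the 662 keV resolution."""
    return resolution_662 * 662.0 * np.sqrt(energy_kev / 662.0)


def expected_counts(activity_bq, live_time_s, intensity, efficiency, transmission):
    """lambda = A * t * I * epsilon * T."""
    return activity_bq * live_time_s * intensity * efficiency * transmission


def gaussian_peak(energies, line_kev, counts):
    """Spread `counts` over the channels with a Gaussian profile."""
    sigma = fwhm_kev(line_kev) * FWHM_TO_SIGMA
    profile = np.exp(-0.5 * ((energies - line_kev) / sigma) ** 2)
    return counts * profile / profile.sum()


rng = np.random.default_rng(0)
energies = np.linspace(20.0, 3000.0, 1023)  # assumed linear calibration
# Cs-137 line at 661.7 keV (85.1% intensity); the efficiency of 0.01 is illustrative.
lam = expected_counts(1000.0, 60.0, 0.851, 0.01, 1.0)
ideal = gaussian_peak(energies, 661.7, lam)
noisy = rng.poisson(ideal)  # Poisson counting statistics applied per channel
```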
### Sample Distribution
### Sample Distribution (v3)
| Type | Proportion | Description |
|------|------------|-------------|
| Single isotope | 40% | One source + background |
| Dual isotope | 30% | Two sources blended |
| Multi isotope | 20% | 3-5 sources combined |
| Background only | 10% | Environmental only |
### Scaling Up
Edit `synthetic_spectra/generate_spectra.py` to generate larger datasets:
```python
generate_training_batch(
n_samples=100000, # Generate 100k samples
output_dir=Path("data/synthetic/spectra"),
detector_type="radiacode_103"
)
```
| Background only | 15% | Environmental background only |
| Single calibration | 20% | One check source + background |
| Single medical | 8% | Medical isotope + background |
| Single industrial | 5% | Industrial source + background |
| Uranium chain | 10% | U-238 + daughters in equilibrium |
| Thorium chain | 10% | Th-232 + daughters in equilibrium |
| NORM | 7% | Naturally occurring radioactive material |
| Fallout | 5% | Cs-137 + Cs-134 signature |
| Mixed | 10% | Random 2-3 isotope mixes |
| Complex mix | 5% | 4-6 isotopes from various categories |
| Weak source | 5% | Near-detection-limit sources |
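The v3 proportions can be turned into a per-sample scenario draw along these lines. The dictionary keys and the function name `sample_scenarios` are hypothetical labels for the table rows, not identifiers from `generate_spectra_v3.py`.

```python
import numpy as np

# Proportions copied from the v3 rows of the table; the key names are assumptions.
SCENARIO_WEIGHTS = {
    "background_only": 0.15,
    "single_calibration": 0.20,
    "single_medical": 0.08,
    "single_industrial": 0.05,
    "uranium_chain": 0.10,
    "thorium_chain": 0.10,
    "norm": 0.07,
    "fallout": 0.05,
    "mixed": 0.10,
    "complex_mix": 0.05,
    "weak_source": 0.05,
}


def sample_scenarios(n_samples, rng):
    """Draw one scenario label per sample according to the table proportions."""
    names = list(SCENARIO_WEIGHTS)
    probs = np.array([SCENARIO_WEIGHTS[name] for name in names])
    probs = probs / probs.sum()  # guard against floating-point rounding
    return rng.choice(names, size=n_samples, p=probs)


rng = np.random.default_rng(0)
labels = sample_scenarios(10_000, rng)
```

Drawing labels independently per sample keeps the mix close to the table on large datasets while still letting small test batches vary.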
---
## Project Structure
```
ml-for-isotope-identification/
train/vega_ml/
├── README.md # This file
├── agents.md # AI agent context documentation
├── .gitignore # Git ignore rules
├── synthetic_spectra/ # Spectrum generation package
│ ├── __init__.py
│ ├── config.py # Detector configurations
│ ├── generator.py # Main generation logic
│ ├── generate_spectra.py # CLI batch generation
│ ├── config.py # Detector configurations (Radiacode 101-110)
│ ├── generator.py # Main generation logic (SpectrumConfig)
│ ├── generate_spectra.py # CLI batch generation (v1)
│ ├── generate_spectra_v3.py # CLI batch generation (v3, parallel)
│ ├── ground_truth/
│ │ ├── isotope_data.py # 82 isotopes database
│ │ └── decay_chains.py # Decay chain definitions
│ └── physics/
│ └── spectrum_physics.py # Physics calculations
│ └── spectrum_physics.py # Physics calculations + realistic CsI(Tl) background
├── training/ # Training infrastructure
│ └── vega/ # Vega model package
│ ├── __init__.py
│ ├── isotope_index.py # Isotope ↔ index mapping
│ ├── model.py # VegaModel architecture
│ ├── model.py # VegaModel architecture + VegaLoss
│ ├── dataset.py # PyTorch Dataset/DataLoader
│ ├── train.py # Training loop & utilities
│ └── run_training.py # CLI training script
@ -176,11 +205,14 @@ ml-for-isotope-identification/
| Radiacode 103G | GAGG(Ce) | 7.4% | 20-3000 keV | 1024 |
| Radiacode 110 | CsI(Tl) | 8.4% | 20-3000 keV | 1024 |
Note: Only the first 1023 channels are used (channel 1023 is an overflow bin).
### Physics Model
- **Peak shape:** Gaussian with FWHM scaling as (E/662)
- **Expected counts:** λ = A × t × I × ε × T
- **Peak shape:** Gaussian with FWHM scaling as sqrt(E/662) for scintillators
- **Expected counts:** lambda = A * t * I * epsilon * T
- **Noise:** Poisson counting statistics
- **Background:** Exponential continuum + environmental isotopes (K-40, Pb-214, Bi-214, etc.)
- **Background:** Realistic CsI(Tl) continuum (asymmetric hump + Compton tail) + environmental isotope peaks (K-40, radon daughters, thorium daughters)
- **Hybrid mode:** Measured background can be blended with synthetic (70/30 ratio) for maximum realism
### Isotope Categories
- Natural background (K-40, Ra-226, Rn-222)
@ -199,21 +231,18 @@ ml-for-isotope-identification/
numpy>=1.24.0
scipy>=1.10.0
pillow>=9.0.0
torch>=2.11.0 (nightly with CUDA 12.8 for RTX 5090)
scikit-learn>=1.3.0
torch>=2.0.0
```
### GPU Support
The RTX 5090 (Blackwell architecture, sm_120) requires PyTorch nightly builds with CUDA 12.8:
For Blackwell GPUs (RTX 50-series, sm_120), use PyTorch 2.7+ with CUDA 12.8:
```bash
pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu128
```
### For AI Agents
See [agents.md](agents.md) for comprehensive documentation on:
- System architecture and design decisions
- Physics model implementation details
- Vega model architecture and training
- Configuration options and variation strategies
See [agents.md](agents.md) for comprehensive documentation on system architecture, physics model details, and configuration options.
---
@ -224,12 +253,11 @@ See [agents.md](agents.md) for comprehensive documentation on:
- [x] ~~Implement CNN-FCNN model architecture (Vega)~~
- [x] ~~Create training script with logging~~
- [x] ~~Implement inference module~~
- [ ] Generate large training dataset (100k samples)
- [ ] Train model to convergence
- [ ] Add data augmentation pipeline
- [x] ~~Realistic CsI(Tl) background model~~
- [x] ~~Hybrid training with measured background~~
- [ ] Retrain model with realistic background
- [ ] Add model evaluation metrics & confusion matrix
- [ ] Implement real-time inference module
- [ ] Create Radiacode device integration
- [ ] Implement real-time inference on Radiacode devices
---