Realistic CsI(Tl) background + measured/synthetic hybridization + continuum dashboard

- Replace the exponential continuum with a realistic CsI(Tl) model in
  training (asymmetric hump at ~110 keV + Compton tail)
- Add measured-background injection (70% measured / 30% synthetic)
  via --measured_background and MEASURED_BACKGROUND_PATH
- Add the /api/background/continuum endpoint and the "Continuum CsI" toggle
  on the background dashboard
- Exclude channel 1023 (overflow bin) from the web display (NUM_CHANNELS=1023)
- Fix the Gaussian smoothing of the background (local normalization at the edges)
- Update README.md, CLAUDE.md, TUTORIEL.md, TOTO.md, vega_ml/README.md

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Jacquin Antoine
2026-05-19 18:14:00 +02:00
parent 1e0c1a5ea5
commit 75d271c696
17 changed files with 917 additions and 224 deletions


@ -8,22 +8,25 @@ A machine learning system for identifying radioactive isotopes from gamma-ray sp
**Completed:** Vega ML model architecture (CNN-FCNN hybrid)
**Completed:** Training pipeline with GPU support
**Completed:** Inference engine
🔲 **Next:** Generate large training dataset (10,000-100,000 samples)
🔲 **Future:** Real-time inference on Radiacode devices
**Completed:** Realistic CsI(Tl) background model
**Completed:** Hybrid training (measured + synthetic background)
**Completed:** Web dashboard (FastAPI + Chart.js)
🔲 **Next:** Retrain model with realistic background
🔲 **Future:** Real-time inference on Radiacode devices
---
## Overview
This project aims to build a neural network that can identify radioactive isotopes from gamma spectra. Since collecting real gamma spectra requires radioactive sources and is expensive/regulated, we generate **synthetic training data** based on realistic physics models.
This project builds a neural network that identifies radioactive isotopes from gamma spectra. Since collecting real spectra requires radioactive sources and is expensive/regulated, we generate **synthetic training data** based on realistic physics models.
### Target Hardware
- **Training:** NVIDIA RTX 5090 GPU (requires PyTorch nightly with CUDA 12.8)
- **Training:** NVIDIA RTX 5060 Ti GPU (Blackwell, requires PyTorch 2.7+ with CUDA 12.8)
- **Inference:** Radiacode 101, 102, 103, 103G, 110 scintillation detectors
### Data Format
- **Input:** 2D spectrograms (time intervals × 1023 energy channels)
- **Output:** Multi-label isotope classification with activity estimation
- **Input:** 1D spectrum (1023 energy channels, 20-3000 keV, normalized to max)
- **Output:** Multi-label isotope classification (82 isotopes) with activity estimation (Bq)
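As a rough illustration of the input format described above, the sketch below drops the overflow channel from a raw 1024-channel spectrum and scales the result to its maximum. The helper name `prepare_spectrum` and the handling of an all-zero spectrum are assumptions made for this example; they are not taken from the repository.

```python
import numpy as np

NUM_CHANNELS = 1023  # channel index 1023 is an overflow bin and is dropped


def prepare_spectrum(raw_counts: np.ndarray) -> np.ndarray:
    """Hypothetical helper: keep the first 1023 channels and normalize to max."""
    spectrum = np.asarray(raw_counts, dtype=np.float64)[:NUM_CHANNELS]
    peak = spectrum.max()
    # Assumption: an all-zero spectrum is returned unchanged to avoid dividing by zero.
    return spectrum / peak if peak > 0 else spectrum


# Illustrative input: Poisson-distributed counts in 1024 channels
raw = np.random.default_rng(0).poisson(5.0, size=1024)
x = prepare_spectrum(raw)
print(x.shape, x.max())
```

Normalizing to the maximum removes the dependence on acquisition time and source strength from the input shape, which is presumably why activity is estimated as a separate output.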
---
@ -34,8 +37,7 @@ This project aims to build a neural network that can identify radioactive isotop
```bash
# Create virtual environment
python -m venv .venv
.venv\Scripts\activate # Windows
# or: source .venv/bin/activate # Linux/Mac
source .venv/bin/activate # Linux/Mac
# Install dependencies
pip install numpy scipy pillow
@ -47,25 +49,34 @@ pip install --pre torch torchvision --index-url https://download.pytorch.org/whl
### Generate Synthetic Data
```bash
# Generate 10 test samples
python -m synthetic_spectra.generate_spectra
# Generate 10 test samples (default)
python -m vega_ml.synthetic_spectra.generate_spectra --num_samples 10 --output_dir data/synthetic
# With measured background for hybrid training (recommended)
python -m vega_ml.synthetic_spectra.generate_spectra \
--num_samples 50000 \
--output_dir data/synthetic \
--measured_background /path/to/background_24h.npy
```
### Train the Model
```bash
# Quick test run (5 epochs, small dataset)
python training/vega/run_training.py --test
python -m vega_ml.training.vega.run_training --test
# Full training
python training/vega/run_training.py --epochs 100 --batch-size 32
python -m vega_ml.training.vega.run_training \
--data-dir data/synthetic \
--model-dir models \
--epochs 100 --batch-size 64
```
### Run Inference
```bash
# Run inference on synthetic data
python inference/run_inference.py --model models/vega_best.pt --data data/synthetic
python -m vega_ml.inference.run_inference --model models/vega_best.pt --data data/synthetic
```
---
@ -95,56 +106,74 @@ python inference/run_inference.py --model models/vega_best.pt --data data/synthe
## Synthetic Spectra Generation
### Realistic Background Model
The background continuum uses a realistic CsI(Tl) shape calibrated against real Radiacode 103 measurements, not a simple exponential:
- **Asymmetric hump** at ~110 keV (sigma_left=55 keV, sigma_right=50 keV) — the dominant low-energy scatter peak characteristic of CsI(Tl) detectors
- **Compton tail**: 0.45*exp(-E/240) + 0.04*exp(-E/700) — realistic high-energy falloff
- **Noise floor** at 0.8% of peak — prevents zero-count channels
This replaces the previous simple exponential `A*exp(-0.002*E)` which failed to reproduce the characteristic CsI(Tl) response.
### Hybrid Training with Measured Background
When a measured background file (`background_24h.npy`) is available, the generator blends it with the synthetic model:
- **70% measured** background shape (scaled to target CPS)
- **30% synthetic** continuum (for robustness against measurement artifacts)
- Stochastic isotope peaks (K-40, radon, thorium) are still added on top with random activity levels
This is controlled by the `--measured_background` CLI argument or the `MEASURED_BACKGROUND_PATH` environment variable.
### Features
- **82 isotopes** with accurate gamma emission lines
- **Realistic physics:** Gaussian peaks, Poisson noise, Compton continuum, environmental background
- **Realistic physics:** Gaussian peaks, Poisson noise, Compton continuum, CsI(Tl) background shape
- **Multiple detector models:** Radiacode 101, 102, 103, 103G, 110 with correct FWHM and energy ranges
- **Configurable variation:** Activity levels, measurement durations, isotope combinations
- **Decay chains:** Uranium-238, Thorium-232 chains with secular equilibrium
### Sample Distribution
### Sample Distribution (v3)
| Type | Proportion | Description |
|------|------------|-------------|
| Single isotope | 40% | One source + background |
| Dual isotope | 30% | Two sources blended |
| Multi isotope | 20% | 3-5 sources combined |
| Background only | 10% | Environmental only |
### Scaling Up
Edit `synthetic_spectra/generate_spectra.py` to generate larger datasets:
```python
generate_training_batch(
n_samples=100000, # Generate 100k samples
output_dir=Path("data/synthetic/spectra"),
detector_type="radiacode_103"
)
```
| Background only | 15% | Environmental background only |
| Single calibration | 20% | One check source + background |
| Single medical | 8% | Medical isotope + background |
| Single industrial | 5% | Industrial source + background |
| Uranium chain | 10% | U-238 + daughters in equilibrium |
| Thorium chain | 10% | Th-232 + daughters in equilibrium |
| NORM | 7% | Naturally occurring radioactive material |
| Fallout | 5% | Cs-137 + Cs-134 signature |
| Mixed | 10% | Random 2-3 isotope mixes |
| Complex mix | 5% | 4-6 isotopes from various categories |
| Weak source | 5% | Near-detection-limit sources |
---
## Project Structure
```
ml-for-isotope-identification/
train/vega_ml/
├── README.md # This file
├── agents.md # AI agent context documentation
├── .gitignore # Git ignore rules
├── synthetic_spectra/ # Spectrum generation package
│ ├── __init__.py
│ ├── config.py # Detector configurations
│ ├── generator.py # Main generation logic
│ ├── generate_spectra.py # CLI batch generation
│ ├── config.py # Detector configurations (Radiacode 101-110)
│ ├── generator.py # Main generation logic (SpectrumConfig)
│ ├── generate_spectra.py # CLI batch generation (v1)
│ ├── generate_spectra_v3.py # CLI batch generation (v3, parallel)
│ ├── ground_truth/
│ │ ├── isotope_data.py # 82 isotopes database
│ │ └── decay_chains.py # Decay chain definitions
│ └── physics/
│ └── spectrum_physics.py # Physics calculations
│ └── spectrum_physics.py # Physics calculations + realistic CsI(Tl) background
├── training/ # Training infrastructure
│ └── vega/ # Vega model package
│ ├── __init__.py
│ ├── isotope_index.py # Isotope ↔ index mapping
│ ├── model.py # VegaModel architecture
│ ├── model.py # VegaModel architecture + VegaLoss
│ ├── dataset.py # PyTorch Dataset/DataLoader
│ ├── train.py # Training loop & utilities
│ └── run_training.py # CLI training script
@ -176,11 +205,14 @@ ml-for-isotope-identification/
| Radiacode 103G | GAGG(Ce) | 7.4% | 20-3000 keV | 1024 |
| Radiacode 110 | CsI(Tl) | 8.4% | 20-3000 keV | 1024 |
Note: Only the first 1023 channels are used (channel 1023 is an overflow bin).
### Physics Model
- **Peak shape:** Gaussian with FWHM scaling as (E/662)
- **Expected counts:** λ = A × t × I × ε × T
- **Peak shape:** Gaussian with FWHM scaling as sqrt(E/662) for scintillators
- **Expected counts:** lambda = A * t * I * epsilon * T
- **Noise:** Poisson counting statistics
- **Background:** Exponential continuum + environmental isotopes (K-40, Pb-214, Bi-214, etc.)
- **Background:** Realistic CsI(Tl) continuum (asymmetric hump + Compton tail) + environmental isotope peaks (K-40, radon daughters, thorium daughters)
- **Hybrid mode:** Measured background can be blended with synthetic (70/30 ratio) for maximum realism
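The peak-shape, expected-count, and noise bullets can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's `generate_peak_spectrum`: the function names are hypothetical, the 8.4% value is read from the Radiacode 110 row and assumed to be the relative FWHM at 662 keV, `epsilon` and `T` are assumed to be a detection efficiency and a transmission factor, and the efficiency used in the example is an arbitrary illustrative number.

```python
import numpy as np


def fwhm_kev(energy_kev: float, resolution_662: float = 0.084) -> float:
    """FWHM in keV, scaling as sqrt(E/662) from the relative resolution at 662 keV."""
    return resolution_662 * 662.0 * np.sqrt(energy_kev / 662.0)


def expected_counts(activity_bq, duration_s, intensity, efficiency, transmission):
    """lambda = A * t * I * epsilon * T (symbols as in the bullet above)."""
    return activity_bq * duration_s * intensity * efficiency * transmission


def gaussian_peak(energy_bins, center_kev, total_counts, resolution_662=0.084):
    """Gaussian peak whose bins sum to total_counts."""
    sigma = fwhm_kev(center_kev, resolution_662) / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    shape = np.exp(-0.5 * ((energy_bins - center_kev) / sigma) ** 2)
    return total_counts * shape / shape.sum()


rng = np.random.default_rng(42)
energy_bins = np.linspace(20.0, 3000.0, 1023)  # assumed linear energy calibration

# Illustrative values: 100 Bq, 300 s, 85.1% emission intensity (Cs-137, 661.66 keV),
# 5% efficiency (arbitrary), no attenuation.
lam = expected_counts(100.0, 300.0, 0.851, 0.05, 1.0)
peak = gaussian_peak(energy_bins, 661.66, lam)
noisy = rng.poisson(peak)  # Poisson counting statistics per channel
```

With the square-root scaling, the absolute FWHM doubles when the energy quadruples, so the relative resolution improves at higher energies.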
### Isotope Categories
- Natural background (K-40, Ra-226, Rn-222)
@ -199,21 +231,18 @@ ml-for-isotope-identification/
numpy>=1.24.0
scipy>=1.10.0
pillow>=9.0.0
torch>=2.11.0 (nightly with CUDA 12.8 for RTX 5090)
scikit-learn>=1.3.0
torch>=2.0.0
```
### GPU Support
The RTX 5090 (Blackwell architecture, sm_120) requires PyTorch nightly builds with CUDA 12.8:
For Blackwell GPUs (RTX 50-series, sm_120), use PyTorch 2.7+ with CUDA 12.8:
```bash
pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu128
```
### For AI Agents
See [agents.md](agents.md) for comprehensive documentation on:
- System architecture and design decisions
- Physics model implementation details
- Vega model architecture and training
- Configuration options and variation strategies
See [agents.md](agents.md) for comprehensive documentation on system architecture, physics model details, and configuration options.
---
@ -224,12 +253,11 @@ See [agents.md](agents.md) for comprehensive documentation on:
- [x] ~~Implement CNN-FCNN model architecture (Vega)~~
- [x] ~~Create training script with logging~~
- [x] ~~Implement inference module~~
- [ ] Generate large training dataset (100k samples)
- [ ] Train model to convergence
- [ ] Add data augmentation pipeline
- [x] ~~Realistic CsI(Tl) background model~~
- [x] ~~Hybrid training with measured background~~
- [ ] Retrain model with realistic background
- [ ] Add model evaluation metrics & confusion matrix
- [ ] Implement real-time inference module
- [ ] Create Radiacode device integration
- [ ] Implement real-time inference on Radiacode devices
---


@ -136,6 +136,7 @@ def generate_training_batch(
background_only_fraction: float = 0.1,
save_png: bool = False,
random_seed: int = None,
measured_background_path: str = None,
) -> list:
"""
Generate a batch of training samples with various configurations.
@ -210,6 +211,7 @@ def generate_training_batch(
duration,
detector_name=detector_name,
include_background=True,
measured_background_path=measured_background_path,
)
# Save spectrum (don't accumulate in memory)
@ -240,6 +242,7 @@ def generate_training_batch(
duration,
detector_name=detector_name,
include_background=True,
measured_background_path=measured_background_path,
)
save_spectrum(
@ -270,6 +273,7 @@ def generate_training_batch(
duration,
detector_name=detector_name,
include_background=True,
measured_background_path=measured_background_path,
)
save_spectrum(
@ -295,6 +299,7 @@ def generate_training_batch(
sources=[], # No additional sources
include_background=True,
detector_name=detector_name,
measured_background_path=measured_background_path,
)
spectrum = generator.generate_spectrum(config)
@ -367,6 +372,13 @@ def main():
default=100.0,
help="Maximum source activity in Bq (default: 100.0)"
)
parser.add_argument(
"--measured_background",
type=str,
default=None,
help="Path to measured background .npy file for hybrid training"
)
parser.add_argument(
"--save_png",
@ -402,6 +414,7 @@ def main():
activity_range=(args.min_activity, args.max_activity),
save_png=args.save_png,
random_seed=args.seed,
measured_background_path=args.measured_background,
)
print("\n" + "=" * 60)


@ -405,6 +405,7 @@ def generate_single_sample(args: Tuple[int, dict]) -> Optional[str]:
include_radon=bg_params['include_radon'],
include_thorium=bg_params['include_thorium'],
detector_name=config['detector_name'],
measured_background_path=config.get('measured_background_path'),
)
# Generate spectrum
@ -437,6 +438,7 @@ def generate_training_data_v3(
scenarios: Optional[List[SampleScenario]] = None,
num_workers: int = None,
random_seed: int = None,
measured_background_path: Optional[str] = None,
) -> int:
"""
Generate training samples in parallel.
@ -498,6 +500,7 @@ def generate_training_data_v3(
'bg_intensity_max': bg_intensity_range[1],
'base_seed': random_seed,
'scenarios': scenarios,
'measured_background_path': measured_background_path,
}
# Create work items
@ -560,9 +563,11 @@ def main():
help='Minimum activity in Bq')
parser.add_argument('--activity_max', type=float, default=100.0,
help='Maximum activity in Bq')
parser.add_argument('--measured_background', type=str, default=None,
help='Path to measured background .npy file for hybrid training')
args = parser.parse_args()
generate_training_data_v3(
num_samples=args.num_samples,
output_dir=Path(args.output_dir),
@ -570,6 +575,7 @@ def main():
activity_range=(args.activity_min, args.activity_max),
num_workers=args.workers,
random_seed=args.seed,
measured_background_path=args.measured_background,
)


@ -63,6 +63,7 @@ class SpectrumConfig:
include_k40: bool = True
include_radon: bool = True
include_thorium: bool = True
measured_background_path: Optional[str] = None
# Detector configuration
detector_name: str = "radiacode_103"
@ -166,7 +167,8 @@ class SpectrumGenerator:
include_k40=background_config.get('include_k40', True),
include_radon=background_config.get('include_radon', True),
include_thorium=background_config.get('include_thorium', True),
detector_config=self.detector_config
detector_config=self.detector_config,
measured_background_path=background_config.get('measured_background_path')
)
spectrum += bg_spectrum
background_isotopes = bg_isotopes
@ -264,6 +266,7 @@ class SpectrumGenerator:
'include_k40': config.include_k40,
'include_radon': config.include_radon,
'include_thorium': config.include_thorium,
'measured_background_path': config.measured_background_path,
}
)
all_source_isotopes.extend(src_iso)


@ -9,6 +9,7 @@ Implements the physics of gamma spectrum generation including:
"""
import numpy as np
from pathlib import Path
from scipy import special
from typing import Optional, Tuple, List
from dataclasses import dataclass
@ -274,14 +275,14 @@ def generate_exponential_background(
) -> np.ndarray:
"""
Generate exponential background continuum.
B(E) = A * exp(-b * E)
Args:
energy_bins: Array of energy bin centers (keV)
amplitude: Background amplitude at E=0
decay_constant: Exponential decay constant (1/keV)
Returns:
Array of background counts
"""
@ -294,26 +295,123 @@ def generate_polynomial_background(
) -> np.ndarray:
"""
Generate polynomial background.
B(E) = Σ c_m * E^m
Args:
energy_bins: Array of energy bin centers (keV)
coefficients: Polynomial coefficients [c0, c1, c2, ...]
Returns:
Array of background counts
"""
if coefficients is None:
coefficients = [10.0, -0.005, 1e-6] # Default quadratic
background = np.zeros_like(energy_bins)
for m, c in enumerate(coefficients):
background += c * (energy_bins ** m)
return np.maximum(0, background)
def generate_realistic_continuum(
    energy_bins: np.ndarray,
    total_counts: float,
    detector_config: Optional[DetectorConfig] = None
) -> np.ndarray:
    """
    Generate realistic CsI(Tl) background continuum shape.

    Calibrated against real Radiacode 103 background measurements.
    Produces the characteristic asymmetric hump at ~110 keV and
    Compton-like tail that simple exponentials miss.

    Shape components:
    - Asymmetric hump centered at ~110 keV (sigma_left=55, sigma_right=50 keV)
    - Compton continuum: 0.45*exp(-E/240) + 0.04*exp(-E/700)
    - Noise floor at 0.8% of peak

    Args:
        energy_bins: Array of energy bin centers (keV)
        total_counts: Target total counts in the continuum
        detector_config: Detector configuration (unused, kept for API consistency)

    Returns:
        Array of background counts matching real CsI(Tl) continuum shape
    """
    E = energy_bins

    # Asymmetric hump at ~110 keV (low-energy scatter peak in CsI(Tl))
    hump_center = 110.0
    sigma_left = 55.0   # Broader on the low-energy side
    sigma_right = 50.0  # Narrower on the high-energy side
    hump = np.where(
        E <= hump_center,
        np.exp(-0.5 * ((E - hump_center) / sigma_left) ** 2),
        np.exp(-0.5 * ((E - hump_center) / sigma_right) ** 2),
    )

    # Compton continuum tail
    tail = 0.45 * np.exp(-E / 240.0) + 0.04 * np.exp(-E / 700.0)

    # Noise floor (low-level baseline)
    noise_floor = 0.008

    # Combine shape components
    continuum = hump + tail + noise_floor

    # Normalize to target total counts
    if continuum.sum() > 0 and total_counts > 0:
        continuum *= total_counts / continuum.sum()

    return continuum
def load_measured_background(
    path: str,
    energy_bins: np.ndarray,
    duration_seconds: float
) -> Optional[np.ndarray]:
    """
    Load a measured background spectrum from a .npy file and rescale it
    to match the target duration.

    The .npy file should contain a dict with keys 'counts' and 'duration'.

    Args:
        path: Path to the .npy background file
        energy_bins: Array of energy bin centers (keV) for alignment
        duration_seconds: Target duration to scale the spectrum to

    Returns:
        Background spectrum scaled to target duration, or None if file not found
    """
    bg_path = Path(path)
    if not bg_path.exists():
        return None

    try:
        bg_data = np.load(str(bg_path), allow_pickle=True).item()
        bg_counts = bg_data["counts"].astype(np.float64)
        bg_duration = float(bg_data["duration"])

        # Truncate or pad to match energy_bins length
        num_channels = len(energy_bins)
        if len(bg_counts) > num_channels:
            bg_counts = bg_counts[:num_channels]
        elif len(bg_counts) < num_channels:
            bg_counts = np.pad(bg_counts, (0, num_channels - len(bg_counts)))

        # Scale to target duration (cps * target_duration)
        if bg_duration > 0:
            scale = duration_seconds / bg_duration
            return bg_counts * scale
        return None
    except Exception:
        return None
def generate_environmental_background(
energy_bins: np.ndarray,
duration_seconds: float,
@ -321,17 +419,19 @@ def generate_environmental_background(
include_k40: bool = True,
include_radon: bool = True,
include_thorium: bool = True,
detector_config: Optional[DetectorConfig] = None
detector_config: Optional[DetectorConfig] = None,
measured_background_path: Optional[str] = None
) -> Tuple[np.ndarray, List[str]]:
"""
Generate realistic environmental background spectrum.
Includes:
- Exponential continuum (cosmic rays, scattered gammas)
- Realistic CsI(Tl) continuum shape (asymmetric hump + Compton tail)
- Or measured background if path provided and file exists
- K-40 peak (1460 keV) - ubiquitous in environment
- Radon daughters (Pb-214, Bi-214) - indoor air
- Thorium daughters (Pb-212, Tl-208) - building materials
Args:
energy_bins: Array of energy bin centers (keV)
duration_seconds: Acquisition time
@ -340,27 +440,47 @@ def generate_environmental_background(
include_radon: Include radon daughter peaks
include_thorium: Include thorium daughter peaks
detector_config: Detector configuration
measured_background_path: Path to .npy file with measured background.
If provided and file exists, used as the continuum base instead
of the synthetic continuum. Isotope peaks are still added on top
with stochastic variation for training diversity.
Returns:
Tuple of (background_spectrum, list_of_background_isotopes)
"""
if detector_config is None:
detector_config = get_default_config()
background_isotopes = []
# Start with exponential continuum
# Use measured background if available, otherwise synthetic continuum
total_continuum_counts = background_cps * duration_seconds * 0.7
background = generate_exponential_background(
energy_bins,
amplitude=total_continuum_counts / 500,
decay_constant=0.002
)
# Normalize continuum to target count rate
if background.sum() > 0:
background *= (total_continuum_counts / background.sum())
    measured = None
    if measured_background_path:
        measured = load_measured_background(
            measured_background_path, energy_bins, duration_seconds
        )

    if measured is not None:
        # Scale measured background to match target CPS
        measured_total = measured.sum()
        if measured_total > 0 and total_continuum_counts > 0:
            # Blend: 70% measured shape, 30% synthetic for robustness
            synthetic = generate_realistic_continuum(
                energy_bins, total_counts=total_continuum_counts * 0.3,
                detector_config=detector_config
            )
            measured_scaled = measured * (total_continuum_counts * 0.7 / measured_total)
            background = measured_scaled + synthetic
        else:
            background = measured
    else:
        background = generate_realistic_continuum(
            energy_bins, total_counts=total_continuum_counts,
            detector_config=detector_config
        )
# Add K-40 peak (very common)
if include_k40:
k40_activity = np.random.uniform(0.5, 5.0) # Bq
@ -376,11 +496,11 @@ def generate_environmental_background(
)
background += peak
background_isotopes.append("K-40")
# Add radon daughters
if include_radon:
radon_activity = np.random.uniform(0.1, 2.0) # Bq
# Pb-214 lines
for energy, intensity in [(295.22, 0.1842), (351.93, 0.356)]:
peak = generate_peak_spectrum(
@ -394,7 +514,7 @@ def generate_environmental_background(
detector_config
)
background += peak
# Bi-214 lines
for energy, intensity in [(609.31, 0.4549), (1120.29, 0.1492), (1764.49, 0.1531)]:
peak = generate_peak_spectrum(
@ -408,13 +528,13 @@ def generate_environmental_background(
detector_config
)
background += peak
background_isotopes.extend(["Pb-214", "Bi-214"])
# Add thorium daughters
if include_thorium:
thorium_activity = np.random.uniform(0.05, 1.0) # Bq
# Ac-228 line
peak = generate_peak_spectrum(
energy_bins,
@ -427,7 +547,7 @@ def generate_environmental_background(
detector_config
)
background += peak
# Pb-212 line
peak = generate_peak_spectrum(
energy_bins,
@ -440,7 +560,7 @@ def generate_environmental_background(
detector_config
)
background += peak
# Tl-208 lines
for energy, intensity in [(583.19, 0.845 * 0.36), (2614.51, 0.998 * 0.36)]:
# Branching ratio of 36% for Tl-208 path
@ -455,9 +575,9 @@ def generate_environmental_background(
detector_config
)
background += peak
background_isotopes.extend(["Ac-228", "Pb-212", "Tl-208"])
return background, background_isotopes