# ML for Isotope Identification

A machine learning system for identifying radioactive isotopes from gamma-ray spectra captured by Radiacode scintillation detectors.

## Project Status

- ✅ **Completed:** Synthetic gamma spectra generation system
- ✅ **Completed:** Vega ML model architecture (CNN-FCNN hybrid)
- ✅ **Completed:** Training pipeline with GPU support
- ✅ **Completed:** Inference engine
- ✅ **Completed:** Realistic CsI(Tl) background model
- ✅ **Completed:** Hybrid training (measured + synthetic background)
- ✅ **Completed:** Web dashboard (FastAPI + Chart.js)
- 🔲 **Next:** Retrain model with realistic background
- 🔲 **Future:** Real-time inference on Radiacode devices

---

## Overview

This project builds a neural network that identifies radioactive isotopes from gamma spectra. Because collecting real spectra requires radioactive sources and is both expensive and regulated, we generate **synthetic training data** from realistic physics models.

### Target Hardware
- **Training:** NVIDIA RTX 5060 Ti GPU (Blackwell, requires PyTorch 2.7+ with CUDA 12.8)
- **Inference:** Radiacode 101, 102, 103, 103G, 110 scintillation detectors

### Data Format
- **Input:** 1D spectrum (1023 energy channels, 20-3000 keV, normalized to max)
- **Output:** Multi-label isotope classification (82 isotopes) with activity estimation (Bq)
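The input format above can be illustrated with a minimal preprocessing sketch. It assumes a raw 1024-channel readout whose last bin is the overflow channel (see Detector Specifications) and scales the remaining 1023 channels so the largest value is 1.0. The function name `preprocess_spectrum` is hypothetical and not taken from the `vega_ml` package, and plain Python lists stand in for the NumPy arrays the project would normally use.

```python
from typing import List, Sequence

NUM_CHANNELS = 1023  # channels fed to the model; the overflow bin is excluded


def preprocess_spectrum(counts: Sequence[float]) -> List[float]:
    """Drop the overflow bin and scale the spectrum so its maximum is 1.0."""
    if len(counts) < NUM_CHANNELS:
        raise ValueError(f"expected at least {NUM_CHANNELS} channels, got {len(counts)}")
    kept = [float(c) for c in counts[:NUM_CHANNELS]]
    peak = max(kept)
    if peak <= 0.0:
        return kept  # empty spectrum: nothing to normalize
    return [c / peak for c in kept]


# Example: a raw 1024-channel readout with a large overflow count in the last bin.
raw = [0.0] * 1024
raw[100], raw[300], raw[1023] = 50.0, 200.0, 9999.0
spectrum = preprocess_spectrum(raw)
print(len(spectrum), max(spectrum), spectrum[100])  # 1023 1.0 0.25
```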

---

## Quick Start

### Installation

```bash
# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # Linux/Mac

# Install dependencies
pip install numpy scipy pillow

# Install PyTorch (nightly for RTX 50-series/Blackwell support)
pip install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cu128
```

### Generate Synthetic Data

```bash
# Generate 10 test samples (default)
python -m vega_ml.synthetic_spectra.generate_spectra --num_samples 10 --output_dir data/synthetic

# With measured background for hybrid training (recommended)
python -m vega_ml.synthetic_spectra.generate_spectra \
    --num_samples 50000 \
    --output_dir data/synthetic \
    --measured_background /path/to/background_24h.npy
```

### Train the Model

```bash
# Quick test run (5 epochs, small dataset)
python -m vega_ml.training.vega.run_training --test

# Full training
python -m vega_ml.training.vega.run_training \
    --data-dir data/synthetic \
    --model-dir models \
    --epochs 100 --batch-size 64
```

### Run Inference

```bash
# Run inference on synthetic data
python -m vega_ml.inference.run_inference --model models/vega_best.pt --data data/synthetic
```

---

## Vega Model Architecture

**Vega** is a CNN-FCNN hybrid model optimized for gamma spectrum isotope identification, based on research showing 99%+ accuracy on similar tasks.

### Architecture Details
| Component | Configuration |
|-----------|---------------|
| Input | 1023 energy channels |
| CNN Backbone | 3 ConvBlocks [64, 128, 256 channels] |
| Kernel Size | 7 (captures spectral features) |
| FC Layers | [512, 256] with dropout |
| Output Heads | Dual: Classification (82 isotopes) + Regression (activity) |
| Total Parameters | 34.5M |
| Activation | LeakyReLU + BatchNorm |

### Training Features
- **Mixed Precision (AMP):** Faster training on modern GPUs
- **Multi-task Learning:** Simultaneous isotope ID + activity estimation
- **Loss Function:** BCE (classification) + Huber (regression)
- **LR Scheduling:** ReduceLROnPlateau with early stopping
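To make the multi-task loss concrete, here is a minimal pure-Python sketch of a BCE term over isotope logits combined with a Huber term over activity predictions. It is an illustration only, not the `VegaLoss` implementation: the function names, the `delta` threshold, and the `reg_weight` used to sum the two terms are assumptions, and the real training code would operate on PyTorch tensors.

```python
import math
from typing import Sequence


def bce_with_logits(logits: Sequence[float], targets: Sequence[float]) -> float:
    """Mean binary cross-entropy over isotope labels, computed from raw logits."""
    total = 0.0
    for z, y in zip(logits, targets):
        # Numerically stable form: max(z, 0) - z*y + log(1 + exp(-|z|))
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def huber(pred: Sequence[float], target: Sequence[float], delta: float = 1.0) -> float:
    """Mean Huber loss: quadratic for small errors, linear for large ones."""
    total = 0.0
    for p, t in zip(pred, target):
        err = abs(p - t)
        total += 0.5 * err * err if err <= delta else delta * (err - 0.5 * delta)
    return total / len(pred)


def multitask_loss(logits, labels, act_pred, act_true, reg_weight: float = 1.0) -> float:
    """Classification loss plus weighted regression loss (weighting is assumed)."""
    return bce_with_logits(logits, labels) + reg_weight * huber(act_pred, act_true)


# Example with three isotope outputs and two activity outputs (made-up values).
logits = [2.0, -1.0, 0.0]
labels = [1.0, 0.0, 1.0]
print(multitask_loss(logits, labels, act_pred=[0.5, 3.0], act_true=[0.0, 0.0]))
```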

---

## Synthetic Spectra Generation

### Realistic Background Model

The background continuum uses a realistic CsI(Tl) shape calibrated against real Radiacode 103 measurements, not a simple exponential:

- **Asymmetric hump** at ~110 keV (`sigma_left` = 55 keV, `sigma_right` = 50 keV): the dominant low-energy scatter peak characteristic of CsI(Tl) detectors
- **Compton tail:** `0.45*exp(-E/240) + 0.04*exp(-E/700)`, with E in keV, giving a realistic high-energy falloff
- **Noise floor** at 0.8% of peak: prevents zero-count channels

This replaces the previous simple exponential `A*exp(-0.002*E)`, which failed to reproduce the characteristic CsI(Tl) response.
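The three ingredients above can be combined in a short sketch. The hump position, the two sigmas, the tail coefficients, and the 0.8% floor come from the list; everything else is an assumption made for illustration: the hump is given unit amplitude and simply summed with the tail, the floor is applied after normalizing to the peak, and the energy axis is a linear 20-3000 keV grid over 1023 channels rather than a real detector calibration. The names `raw_continuum` and `continuum_shape` are hypothetical and do not refer to functions in `spectrum_physics.py`.

```python
import math

HUMP_CENTER_KEV = 110.0
SIGMA_LEFT_KEV = 55.0    # width below the hump center
SIGMA_RIGHT_KEV = 50.0   # width above the hump center
NOISE_FLOOR = 0.008      # 0.8% of the peak value


def raw_continuum(energy_kev: float) -> float:
    """Asymmetric Gaussian hump plus two-exponential Compton tail (unnormalized)."""
    sigma = SIGMA_LEFT_KEV if energy_kev < HUMP_CENTER_KEV else SIGMA_RIGHT_KEV
    hump = math.exp(-0.5 * ((energy_kev - HUMP_CENTER_KEV) / sigma) ** 2)
    tail = 0.45 * math.exp(-energy_kev / 240.0) + 0.04 * math.exp(-energy_kev / 700.0)
    return hump + tail  # assumed combination: unit-amplitude hump + tail


def continuum_shape(energies_kev):
    """Normalize to a peak of 1.0, then clamp every channel to the noise floor."""
    raw = [raw_continuum(e) for e in energies_kev]
    peak = max(raw)
    return [max(v / peak, NOISE_FLOOR) for v in raw]


# Assumed linear energy axis: 1023 channels from 20 to 3000 keV.
energies = [20.0 + i * (3000.0 - 20.0) / 1022 for i in range(1023)]
shape = continuum_shape(energies)
peak_energy = energies[shape.index(max(shape))]
print(f"peak near {peak_energy:.0f} keV, value at 3000 keV = {shape[-1]:.3f}")
```

Because the tail decreases with energy, the maximum of the summed shape sits a few keV below the 110 keV hump center in this sketch.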

### Hybrid Training with Measured Background

When a measured background file (`background_24h.npy`) is available, the generator blends it with the synthetic model:
- **70% measured** background shape (scaled to target CPS)
- **30% synthetic** continuum (for robustness against measurement artifacts)
- Stochastic isotope peaks (K-40, radon, thorium) are still added on top with random activity levels

This is controlled by the `--measured_background` CLI argument or the `MEASURED_BACKGROUND_PATH` environment variable.
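A minimal sketch of the 70/30 blend and of the path lookup might look like the following. It is not the generator's actual code: the helper names are hypothetical, each shape is normalized to unit area before mixing so that the result sums to the target CPS, and the CLI argument is assumed to take precedence over the environment variable.

```python
import os
from typing import List, Mapping, Optional, Sequence

ENV_VAR = "MEASURED_BACKGROUND_PATH"


def resolve_background_path(cli_value: Optional[str],
                            env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the measured-background path, or None for purely synthetic mode.

    Assumption: the CLI argument wins over the environment variable.
    """
    env = os.environ if env is None else env
    return cli_value or env.get(ENV_VAR) or None


def _unit_area(shape: Sequence[float]) -> List[float]:
    total = float(sum(shape))
    if total <= 0.0:
        raise ValueError("background shape must have positive total counts")
    return [v / total for v in shape]


def blend_background(measured: Sequence[float], synthetic: Sequence[float],
                     target_cps: float, measured_weight: float = 0.7) -> List[float]:
    """Mix two background shapes and scale the result to target_cps counts per second."""
    if len(measured) != len(synthetic):
        raise ValueError("measured and synthetic backgrounds must have the same length")
    m, s = _unit_area(measured), _unit_area(synthetic)
    w = measured_weight
    return [target_cps * (w * a + (1.0 - w) * b) for a, b in zip(m, s)]


# Tiny two-channel example (made-up numbers) to show the arithmetic.
blended = blend_background([1.0, 3.0], [1.0, 1.0], target_cps=10.0)
print(blended, sum(blended))
```

With these inputs the measured shape is `[0.25, 0.75]` and the synthetic shape is `[0.5, 0.5]`, so the blend is approximately `[3.25, 6.75]` counts per second.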

### Features
- **82 isotopes** with accurate gamma emission lines
- **Realistic physics:** Gaussian peaks, Poisson noise, Compton continuum, CsI(Tl) background shape
- **Multiple detector models:** Radiacode 101, 102, 103, 103G, 110 with correct FWHM and energy ranges
- **Configurable variation:** Activity levels, measurement durations, isotope combinations
- **Decay chains:** Uranium-238, Thorium-232 chains with secular equilibrium

### Sample Distribution (v3)
| Type | Proportion | Description |
|------|------------|-------------|
| Background only | 15% | Environmental background only |
| Single calibration | 20% | One check source + background |
| Single medical | 8% | Medical isotope + background |
| Single industrial | 5% | Industrial source + background |
| Uranium chain | 10% | U-238 + daughters in equilibrium |
| Thorium chain | 10% | Th-232 + daughters in equilibrium |
| NORM | 7% | Naturally occurring radioactive material |
| Fallout | 5% | Cs-137 + Cs-134 signature |
| Mixed | 10% | Random 2-3 isotope mixes |
| Complex mix | 5% | 4-6 isotopes from various categories |
| Weak source | 5% | Near-detection-limit sources |
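As a sketch of how the proportions in this table could drive generation, the snippet below draws sample types with the standard library's weighted sampling. The dictionary keys and the `draw_sample_types` helper are hypothetical names chosen for this example; only the percentages come from the table.

```python
import random
from collections import Counter

# Percentages from the v3 sample distribution table (they sum to 100).
SAMPLE_TYPE_WEIGHTS = {
    "background_only": 15,
    "single_calibration": 20,
    "single_medical": 8,
    "single_industrial": 5,
    "uranium_chain": 10,
    "thorium_chain": 10,
    "norm": 7,
    "fallout": 5,
    "mixed": 10,
    "complex_mix": 5,
    "weak_source": 5,
}


def draw_sample_types(n, seed=None):
    """Draw n sample-type labels with probabilities proportional to the weights."""
    rng = random.Random(seed)  # seeded generator keeps runs reproducible
    names = list(SAMPLE_TYPE_WEIGHTS)
    weights = list(SAMPLE_TYPE_WEIGHTS.values())
    return rng.choices(names, weights=weights, k=n)


# Example: inspect the empirical mix of a 10,000-sample draw.
types = draw_sample_types(10_000, seed=0)
for name, count in Counter(types).most_common():
    print(f"{name:20s} {count / len(types):.1%}")
```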

---

## Project Structure

```
train/vega_ml/
├── README.md                    # This file
├── agents.md                    # AI agent context documentation
├── .gitignore                   # Git ignore rules
│
├── synthetic_spectra/           # Spectrum generation package
│   ├── __init__.py
│   ├── config.py                # Detector configurations (Radiacode 101-110)
│   ├── generator.py             # Main generation logic (SpectrumConfig)
│   ├── generate_spectra.py      # CLI batch generation (v1)
│   ├── generate_spectra_v3.py   # CLI batch generation (v3, parallel)
│   ├── ground_truth/
│   │   ├── isotope_data.py      # 82 isotopes database
│   │   └── decay_chains.py      # Decay chain definitions
│   └── physics/
│       └── spectrum_physics.py  # Physics calculations + realistic CsI(Tl) background
│
├── training/                    # Training infrastructure
│   └── vega/                    # Vega model package
│       ├── __init__.py
│       ├── isotope_index.py     # Isotope ↔ index mapping
│       ├── model.py             # VegaModel architecture + VegaLoss
│       ├── dataset.py           # PyTorch Dataset/DataLoader
│       ├── train.py             # Training loop & utilities
│       └── run_training.py      # CLI training script
│
├── inference/                   # Inference engine
│   ├── vega_inference.py        # VegaInference class
│   └── run_inference.py         # CLI inference script
│
├── models/                      # Saved model checkpoints
│   ├── vega_best.pt             # Best validation loss
│   ├── vega_final.pt            # Final epoch
│   └── vega_history.json        # Training metrics
│
└── data/                        # Generated data (git-ignored)
    └── synthetic/
        └── spectra/
```

---

## Technical Details

### Detector Specifications
| Model | Crystal | FWHM @ 662 keV | Energy Range | Channels |
|-------|---------|----------------|--------------|----------|
| Radiacode 101 | CsI(Tl) | 9.0% | 20-3000 keV | 1024 |
| Radiacode 102 | CsI(Tl) | 9.5% | 20-3000 keV | 1024 |
| Radiacode 103 | CsI(Tl) | 8.4% | 20-3000 keV | 1024 |
| Radiacode 103G | GAGG(Ce) | 7.4% | 20-3000 keV | 1024 |
| Radiacode 110 | CsI(Tl) | 8.4% | 20-3000 keV | 1024 |

Note: Only the first 1023 channels (indices 0-1022) are used; the last channel, index 1023, is an overflow bin.

### Physics Model
- **Peak shape:** Gaussian, with FWHM scaling as `sqrt(E/662)` relative to its value at 662 keV (typical for scintillators)
- **Expected counts:** `lambda = A * t * I * epsilon * T`
- **Noise:** Poisson counting statistics
- **Background:** Realistic CsI(Tl) continuum (asymmetric hump + Compton tail) + environmental isotope peaks (K-40, radon daughters, thorium daughters)
- **Hybrid mode:** Measured background can be blended with synthetic (70/30 ratio) for maximum realism
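The peak model can be sketched in a few lines of plain Python. The FWHM scaling and the product form of the expected counts follow the list above; the rest is assumed for illustration. In particular, `A` is read as activity in Bq, `t` as measurement time in seconds, `I` as line intensity, `epsilon` as detection efficiency, and `T` as a transmission factor (the source does not define `T`, so this reading is a guess). The function names are hypothetical, and Poisson sampling of the resulting expectation is left out.

```python
import math

# Convert FWHM to Gaussian sigma: FWHM = 2*sqrt(2*ln 2) * sigma.
FWHM_TO_SIGMA = 1.0 / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def fwhm_kev(energy_kev: float, fwhm_fraction_at_662: float) -> float:
    """Absolute FWHM in keV, scaled from the 662 keV value by sqrt(E/662)."""
    fwhm_662 = fwhm_fraction_at_662 * 662.0
    return fwhm_662 * math.sqrt(energy_kev / 662.0)


def expected_counts(activity_bq, time_s, intensity, efficiency, transmission=1.0):
    """lambda = A * t * I * epsilon * T (T assumed to be a transmission factor)."""
    return activity_bq * time_s * intensity * efficiency * transmission


def gaussian_peak(energies_kev, center_kev, total_counts, fwhm_fraction_at_662):
    """Expected counts per bin for one photopeak on a uniform energy grid."""
    bin_width = energies_kev[1] - energies_kev[0]
    sigma = fwhm_kev(center_kev, fwhm_fraction_at_662) * FWHM_TO_SIGMA
    norm = total_counts * bin_width / (sigma * math.sqrt(2.0 * math.pi))
    return [norm * math.exp(-0.5 * ((e - center_kev) / sigma) ** 2) for e in energies_kev]


# Example: Cs-137 (662 keV line, 85.1% intensity) on a Radiacode 103 (8.4% FWHM).
# The 1 kBq activity, 60 s duration, and 1% efficiency are made-up inputs.
counts = expected_counts(1000.0, 60.0, 0.851, 0.01)
grid = [float(e) for e in range(0, 1500)]  # 1 keV bins
peak = gaussian_peak(grid, 662.0, counts, 0.084)
print(round(fwhm_kev(662.0, 0.084), 1), round(counts, 1), round(sum(peak), 1))
```

Summing the per-bin expectations recovers the total expected counts, which is a quick sanity check that the Gaussian is normalized correctly before any noise is applied.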

### Isotope Categories
- Natural background (K-40, Ra-226, Rn-222)
- Decay chains (U-238, Th-232, U-235)
- Calibration sources (Am-241, Cs-137, Co-60, Ba-133, Eu-152)
- Medical isotopes (Tc-99m, F-18, I-131, Ga-68)
- Industrial sources (Ir-192, Se-75)
- Reactor fallout (Cs-134, Cs-137, Sr-90)

---

## Development

### Dependencies
```
numpy>=1.24.0
scipy>=1.10.0
pillow>=9.0.0
scikit-learn>=1.3.0
torch>=2.0.0
```

### GPU Support
For Blackwell GPUs (RTX 50-series, sm_120), use PyTorch 2.7+ with CUDA 12.8:
```bash
pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu128
```

### For AI Agents
See [agents.md](agents.md) for comprehensive documentation on system architecture, physics model details, and configuration options.

---

## TODO

- [x] ~~Push to repository~~ - Initial commit with generation system
- [x] ~~Create PyTorch DataLoader for training~~
- [x] ~~Implement CNN-FCNN model architecture (Vega)~~
- [x] ~~Create training script with logging~~
- [x] ~~Implement inference module~~
- [x] ~~Realistic CsI(Tl) background model~~
- [x] ~~Hybrid training with measured background~~
- [ ] Retrain model with realistic background
- [ ] Add model evaluation metrics & confusion matrix
- [ ] Implement real-time inference on Radiacode devices

---

## License

[TBD]

---

## Acknowledgments

- Radiacode for device specifications
- IAEA Nuclear Data Services for isotope data
- NNDC at Brookhaven National Laboratory
- Wang et al. research on CNN-FCNN for gamma spectroscopy