radiacode/train/vega_ml/README.md
Jacquin Antoine 745a64b342 Complete Radiacode 103 pipeline - automatic isotope identification
- VegaModel CNN-FCNN, 34.5M params, 82 isotopes, val acc 99.89%
- Generation of 50k synthetic 1D spectra (12-24 h durations)
- Training for 100 epochs on an RTX 5060 Ti (CUDA 12.8, Blackwell)
- Continuous detection with background subtraction
- 24 h background capture with disconnection handling
- Docker Compose: train container (GPU) + detect container (CPU/USB)
- Trained model included (vega_best.pt, 395 MB)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-19 12:29:56 +02:00


# ML for Isotope Identification
A machine learning system for identifying radioactive isotopes from gamma-ray spectra captured by Radiacode scintillation detectors.
## Project Status
- ✅ **Completed:** Synthetic gamma spectra generation system
- ✅ **Completed:** Vega ML model architecture (CNN-FCNN hybrid)
- ✅ **Completed:** Training pipeline with GPU support
- ✅ **Completed:** Inference engine
- 🔲 **Next:** Generate large training dataset (10,000-100,000 samples)
- 🔲 **Future:** Real-time inference on Radiacode devices
---
## Overview
This project aims to build a neural network that identifies radioactive isotopes from gamma spectra. Collecting real gamma spectra requires radioactive sources, which are expensive and tightly regulated, so the training data is **synthetic**, generated from realistic physics models.
### Target Hardware
- **Training:** NVIDIA RTX 5090 GPU (requires PyTorch nightly with CUDA 12.8)
- **Inference:** Radiacode 101, 102, 103, 103G, 110 scintillation detectors
### Data Format
- **Input:** 2D spectrograms (time intervals × 1023 energy channels)
- **Output:** Multi-label isotope classification with activity estimation
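
To make the multi-label output concrete, here is a minimal sketch of how a ground-truth target might be encoded: one presence flag and one activity value per isotope. The five-isotope list, the `encode_target` helper, and the activity units are illustrative assumptions; the real mapping lives in `training/vega/isotope_index.py` and covers 82 isotopes.

```python
from typing import Dict, List, Tuple

# Hypothetical subset of the isotope list, for illustration only.
ISOTOPES: List[str] = ["Am-241", "Ba-133", "Co-60", "Cs-137", "K-40"]
ISOTOPE_TO_INDEX: Dict[str, int] = {name: i for i, name in enumerate(ISOTOPES)}


def encode_target(sources: Dict[str, float]) -> Tuple[List[float], List[float]]:
    """Turn {isotope: activity} into (multi-hot presence, activity) vectors."""
    presence = [0.0] * len(ISOTOPES)
    activity = [0.0] * len(ISOTOPES)
    for name, value in sources.items():
        idx = ISOTOPE_TO_INDEX[name]  # raises KeyError for unknown isotopes
        presence[idx] = 1.0
        activity[idx] = value
    return presence, activity


# A sample containing two sources (activities in arbitrary units).
presence, activity = encode_target({"Cs-137": 50.0, "K-40": 12.5})
print(presence)  # [0.0, 0.0, 0.0, 1.0, 1.0]
print(activity)  # [0.0, 0.0, 0.0, 50.0, 12.5]
```
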
---
## Quick Start
### Installation
```bash
# Create virtual environment
python -m venv .venv
.venv\Scripts\activate # Windows
# or: source .venv/bin/activate # Linux/Mac
# Install dependencies
pip install numpy scipy pillow
# Install PyTorch (nightly for RTX 5090/Blackwell support)
pip install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cu128
```
### Generate Synthetic Data
```bash
# Generate 10 test samples
python -m synthetic_spectra.generate_spectra
```
### Train the Model
```bash
# Quick test run (5 epochs, small dataset)
python training/vega/run_training.py --test
# Full training
python training/vega/run_training.py --epochs 100 --batch-size 32
```
### Run Inference
```bash
# Run inference on synthetic data
python inference/run_inference.py --model models/vega_best.pt --data data/synthetic
```
---
## Vega Model Architecture
**Vega** is a CNN-FCNN hybrid model optimized for gamma spectrum isotope identification, based on research showing 99%+ accuracy on similar tasks.
### Architecture Details
| Component | Configuration |
|-----------|---------------|
| Input | 1023 energy channels |
| CNN Backbone | 3 ConvBlocks [64, 128, 256 channels] |
| Kernel Size | 7 (captures spectral features) |
| FC Layers | [512, 256] with dropout |
| Output Heads | Dual: Classification (82 isotopes) + Regression (activity) |
| Total Parameters | 34.5M |
| Activation | LeakyReLU + BatchNorm |
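
The dual output heads can be read together at inference time. The sketch below assumes the classification head emits one raw logit per isotope and the regression head one activity value per isotope; `decode_outputs`, the 0.5 threshold, and the clamping of negative activities are hypothetical choices, not the behavior of `VegaInference`.

```python
import math
from typing import Dict, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode_outputs(
    logits: Sequence[float],
    activities: Sequence[float],
    names: Sequence[str],
    threshold: float = 0.5,
) -> Dict[str, Dict[str, float]]:
    """Keep isotopes whose probability reaches the threshold."""
    detected = {}
    for name, logit, act in zip(names, logits, activities):
        prob = sigmoid(logit)
        if prob >= threshold:
            # Clamp the regression output: a negative activity is not physical.
            detected[name] = {"probability": prob, "activity": max(act, 0.0)}
    return detected


names = ["Co-60", "Cs-137", "K-40"]
result = decode_outputs([-4.0, 3.0, 0.2], [0.1, 42.0, 5.0], names)
print(sorted(result))  # ['Cs-137', 'K-40']
```

Each isotope is thresholded independently, which is what makes the task multi-label rather than single-class: several sources can be reported for one spectrum.
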
### Training Features
- **Mixed Precision (AMP):** Faster training on modern GPUs
- **Multi-task Learning:** Simultaneous isotope ID + activity estimation
- **Loss Function:** BCE (classification) + Huber (regression)
- **LR Scheduling:** ReduceLROnPlateau with early stopping
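
As an illustration of the multi-task objective, the following dependency-free sketch combines binary cross-entropy on the presence logits with a Huber loss on the activities. The function names, the `reg_weight` balancing factor, and `delta = 1.0` are assumptions; the actual training code in `training/vega/train.py` may weight or reduce the terms differently.

```python
import math
from typing import Sequence


def bce_with_logits(logits: Sequence[float], targets: Sequence[float]) -> float:
    """Mean binary cross-entropy computed stably from raw logits."""
    total = 0.0
    for z, y in zip(logits, targets):
        # Standard stable form: max(z, 0) - z*y + log(1 + exp(-|z|))
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def huber(preds: Sequence[float], targets: Sequence[float], delta: float = 1.0) -> float:
    """Mean Huber loss: quadratic near zero, linear for large errors."""
    total = 0.0
    for p, t in zip(preds, targets):
        err = abs(p - t)
        if err <= delta:
            total += 0.5 * err * err
        else:
            total += delta * (err - 0.5 * delta)
    return total / len(preds)


def multitask_loss(logits, presence, act_pred, act_true, reg_weight: float = 1.0) -> float:
    """Classification loss plus weighted regression loss."""
    return bce_with_logits(logits, presence) + reg_weight * huber(act_pred, act_true)


loss = multitask_loss([2.0, -1.0], [1.0, 0.0], [0.8, 0.1], [1.0, 0.0])
print(round(loss, 4))
```
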
---
## Synthetic Spectra Generation
### Features
- **82 isotopes** with accurate gamma emission lines
- **Realistic physics:** Gaussian peaks, Poisson noise, Compton continuum, environmental background
- **Multiple detector models:** Radiacode 101, 102, 103, 103G, 110 with correct FWHM and energy ranges
- **Configurable variation:** Activity levels, measurement durations, isotope combinations
### Sample Distribution
| Type | Proportion | Description |
|------|------------|-------------|
| Single isotope | 40% | One source + background |
| Dual isotope | 30% | Two sources blended |
| Multi isotope | 20% | 3-5 sources combined |
| Background only | 10% | Environmental only |
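
One way to realize this distribution is a weighted draw per sample. This is a sketch only: `draw_isotope_count` is a hypothetical helper, not the logic in `synthetic_spectra/generator.py`, and the uniform choice of 3 to 5 sources for the multi-isotope case is an assumption.

```python
import random
from collections import Counter

SAMPLE_TYPES = ["single", "dual", "multi", "background"]
WEIGHTS = [0.40, 0.30, 0.20, 0.10]  # proportions from the table above


def draw_isotope_count(rng: random.Random) -> int:
    """Pick a sample type by weight and return how many sources it contains."""
    kind = rng.choices(SAMPLE_TYPES, weights=WEIGHTS, k=1)[0]
    if kind == "single":
        return 1
    if kind == "dual":
        return 2
    if kind == "multi":
        return rng.randint(3, 5)  # inclusive on both ends
    return 0  # background only


rng = random.Random(0)  # seeded for reproducible datasets
counts = Counter(draw_isotope_count(rng) for _ in range(10_000))
for n_sources in sorted(counts):
    print(n_sources, counts[n_sources] / 10_000)
```
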
### Scaling Up
Edit `synthetic_spectra/generate_spectra.py` to generate larger datasets:
```python
generate_training_batch(
n_samples=100000, # Generate 100k samples
output_dir=Path("data/synthetic/spectra"),
detector_type="radiacode_103"
)
```
---
## Project Structure
```
ml-for-isotope-identification/
├── README.md # This file
├── agents.md # AI agent context documentation
├── .gitignore # Git ignore rules
├── synthetic_spectra/ # Spectrum generation package
│ ├── __init__.py
│ ├── config.py # Detector configurations
│ ├── generator.py # Main generation logic
│ ├── generate_spectra.py # CLI batch generation
│ ├── ground_truth/
│ │ ├── isotope_data.py # 82 isotopes database
│ │ └── decay_chains.py # Decay chain definitions
│ └── physics/
│ └── spectrum_physics.py # Physics calculations
├── training/ # Training infrastructure
│ └── vega/ # Vega model package
│ ├── __init__.py
│ ├── isotope_index.py # Isotope ↔ index mapping
│ ├── model.py # VegaModel architecture
│ ├── dataset.py # PyTorch Dataset/DataLoader
│ ├── train.py # Training loop & utilities
│ └── run_training.py # CLI training script
├── inference/ # Inference engine
│ ├── vega_inference.py # VegaInference class
│ └── run_inference.py # CLI inference script
├── models/ # Saved model checkpoints
│ ├── vega_best.pt # Best validation loss
│ ├── vega_final.pt # Final epoch
│ └── vega_history.json # Training metrics
└── data/ # Generated data (git-ignored)
└── synthetic/
└── spectra/
```
---
## Technical Details
### Detector Specifications
| Model | Crystal | FWHM @ 662 keV | Energy Range | Channels |
|-------|---------|----------------|--------------|----------|
| Radiacode 101 | CsI(Tl) | 9.0% | 20-3000 keV | 1024 |
| Radiacode 102 | CsI(Tl) | 9.5% | 20-3000 keV | 1024 |
| Radiacode 103 | CsI(Tl) | 8.4% | 20-3000 keV | 1024 |
| Radiacode 103G | GAGG(Ce) | 7.4% | 20-3000 keV | 1024 |
| Radiacode 110 | CsI(Tl) | 8.4% | 20-3000 keV | 1024 |
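
The table maps naturally onto a small configuration structure. The sketch below is not the contents of `synthetic_spectra/config.py`: the `DetectorSpec` class and its field names are hypothetical, every key except `radiacode_103` is guessed by analogy, and the keV-per-channel figure assumes a linear energy calibration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectorSpec:
    crystal: str
    fwhm_662_percent: float  # relative resolution at 662 keV
    e_min_kev: float = 20.0
    e_max_kev: float = 3000.0
    n_channels: int = 1024

    @property
    def fwhm_662_kev(self) -> float:
        """Absolute FWHM at 662 keV, in keV."""
        return 662.0 * self.fwhm_662_percent / 100.0

    @property
    def kev_per_channel(self) -> float:
        """Channel width, assuming a linear energy calibration."""
        return (self.e_max_kev - self.e_min_kev) / self.n_channels


DETECTORS = {
    "radiacode_101": DetectorSpec("CsI(Tl)", 9.0),
    "radiacode_102": DetectorSpec("CsI(Tl)", 9.5),
    "radiacode_103": DetectorSpec("CsI(Tl)", 8.4),
    "radiacode_103g": DetectorSpec("GAGG(Ce)", 7.4),
    "radiacode_110": DetectorSpec("CsI(Tl)", 8.4),
}

spec = DETECTORS["radiacode_103"]
print(f"{spec.fwhm_662_kev:.1f} keV FWHM, {spec.kev_per_channel:.2f} keV/channel")
```
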
### Physics Model
- **Peak shape:** Gaussian, with the absolute FWHM (in keV) scaling as √(E/662) relative to its value at 662 keV
- **Expected counts:** λ = A × t × I × ε × T
- **Noise:** Poisson counting statistics
- **Background:** Exponential continuum + environmental isotopes (K-40, Pb-214, Bi-214, etc.)
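
The peak model above can be sketched in a few lines. Several things here are assumptions rather than the code in `physics/spectrum_physics.py`: activity in Bq and time in seconds, `T` read as a transmission factor, a linear 20-3000 keV calibration over 1024 channels, and a made-up efficiency of 0.01. The result is the expected counts per channel; Poisson noise would then be drawn around these values.

```python
import math
from typing import List

# Convert FWHM to Gaussian sigma: FWHM = 2*sqrt(2*ln 2) * sigma (about 2.355).
FWHM_TO_SIGMA = 1.0 / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def fwhm_kev(energy_kev: float, fwhm_662_percent: float) -> float:
    """Absolute FWHM at a given energy, scaled by sqrt(E/662)."""
    return 662.0 * fwhm_662_percent / 100.0 * math.sqrt(energy_kev / 662.0)


def expected_counts(activity_bq, time_s, intensity, efficiency, transmission=1.0) -> float:
    """lambda = A * t * I * epsilon * T"""
    return activity_bq * time_s * intensity * efficiency * transmission


def gaussian_peak(energy_kev, total_counts, fwhm_662_percent, edges) -> List[float]:
    """Spread total_counts over channels by integrating a Gaussian per channel."""
    sigma = fwhm_kev(energy_kev, fwhm_662_percent) * FWHM_TO_SIGMA

    def cdf(x: float) -> float:
        return 0.5 * (1.0 + math.erf((x - energy_kev) / (sigma * math.sqrt(2.0))))

    return [total_counts * (cdf(hi) - cdf(lo)) for lo, hi in zip(edges, edges[1:])]


# Cs-137 line at 661.7 keV (85.1% intensity) on a detector with 8.4% FWHM.
edges = [20.0 + i * (3000.0 - 20.0) / 1024 for i in range(1025)]
lam = expected_counts(activity_bq=1000.0, time_s=60.0, intensity=0.851, efficiency=0.01)
peak = gaussian_peak(661.7, lam, 8.4, edges)
print(round(lam, 1), peak.index(max(peak)))
```

Integrating the Gaussian over each channel, rather than sampling it at channel centers, keeps the summed counts equal to λ regardless of channel width.
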
### Isotope Categories
- Natural background (K-40, Ra-226, Rn-222)
- Decay chains (U-238, Th-232, U-235)
- Calibration sources (Am-241, Cs-137, Co-60, Ba-133, Eu-152)
- Medical isotopes (Tc-99m, F-18, I-131, Ga-68)
- Industrial sources (Ir-192, Se-75)
- Reactor fallout (Cs-134, Cs-137, Sr-90)
---
## Development
### Dependencies
```
numpy>=1.24.0
scipy>=1.10.0
pillow>=9.0.0
torch>=2.11.0  # nightly with CUDA 12.8 for RTX 5090
```
### GPU Support
The RTX 5090 (Blackwell architecture, sm_120) requires PyTorch nightly builds with CUDA 12.8:
```bash
pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu128
```
### For AI Agents
See [agents.md](agents.md) for comprehensive documentation on:
- System architecture and design decisions
- Physics model implementation details
- Vega model architecture and training
- Configuration options and variation strategies
---
## TODO
- [x] ~~Push to repository~~ - Initial commit with generation system
- [x] ~~Create PyTorch DataLoader for training~~
- [x] ~~Implement CNN-FCNN model architecture (Vega)~~
- [x] ~~Create training script with logging~~
- [x] ~~Implement inference module~~
- [ ] Generate large training dataset (100k samples)
- [ ] Train model to convergence
- [ ] Add data augmentation pipeline
- [ ] Add model evaluation metrics & confusion matrix
- [ ] Implement real-time inference module
- [ ] Create Radiacode device integration
---
## License
[TBD]
---
## Acknowledgments
- Radiacode for device specifications
- IAEA Nuclear Data Services for isotope data
- NNDC at Brookhaven National Laboratory
- Wang et al. research on CNN-FCNN for gamma spectroscopy