- VegaModel CNN-FCNN 34.5M params, 82 isotopes, val acc 99.89% - Generation 50k spectres synthetiques 1D (12-24h durees) - Entrainement 100 epochs sur RTX 5060 Ti (CUDA 12.8, Blackwell) - Detection continue avec soustraction du background - Capture background 24h avec gestion deconnexion - Docker Compose : conteneur train (GPU) + detect (CPU/USB) - Modele entraite inclus (vega_best.pt, 395 Mo) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
247 lines
7.9 KiB
Markdown
247 lines
7.9 KiB
Markdown
# ML for Isotope Identification
|
||
|
||
A machine learning system for identifying radioactive isotopes from gamma-ray spectra captured by Radiacode scintillation detectors.
|
||
|
||
## Project Status
|
||
|
||
✅ **Completed:** Synthetic gamma spectra generation system
|
||
✅ **Completed:** Vega ML model architecture (CNN-FCNN hybrid)
|
||
✅ **Completed:** Training pipeline with GPU support
|
||
✅ **Completed:** Inference engine
|
||
🔲 **Next:** Generate large training dataset (10,000-100,000 samples)
|
||
🔲 **Future:** Real-time inference on Radiacode devices
|
||
|
||
---
|
||
|
||
## Overview
|
||
|
||
This project aims to build a neural network that can identify radioactive isotopes from gamma spectra. Since collecting real gamma spectra requires radioactive sources and is expensive/regulated, we generate **synthetic training data** based on realistic physics models.
|
||
|
||
### Target Hardware
|
||
- **Training:** NVIDIA RTX 5090 GPU (requires PyTorch nightly with CUDA 12.8)
|
||
- **Inference:** Radiacode 101, 102, 103, 103G, 110 scintillation detectors
|
||
|
||
### Data Format
|
||
- **Input:** 2D spectrograms (time intervals × 1023 energy channels)
|
||
- **Output:** Multi-label isotope classification with activity estimation
|
||
|
||
---
|
||
|
||
## Quick Start
|
||
|
||
### Installation
|
||
|
||
```bash
|
||
# Create virtual environment
|
||
python -m venv .venv
|
||
.venv\Scripts\activate # Windows
|
||
# or: source .venv/bin/activate # Linux/Mac
|
||
|
||
# Install dependencies
|
||
pip install numpy scipy pillow
|
||
|
||
# Install PyTorch (nightly for RTX 5090/Blackwell support)
|
||
pip install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cu128
|
||
```
|
||
|
||
### Generate Synthetic Data
|
||
|
||
```bash
|
||
# Generate 10 test samples
|
||
python -m synthetic_spectra.generate_spectra
|
||
```
|
||
|
||
### Train the Model
|
||
|
||
```bash
|
||
# Quick test run (5 epochs, small dataset)
|
||
python training/vega/run_training.py --test
|
||
|
||
# Full training
|
||
python training/vega/run_training.py --epochs 100 --batch-size 32
|
||
```
|
||
|
||
### Run Inference
|
||
|
||
```bash
|
||
# Run inference on synthetic data
|
||
python inference/run_inference.py --model models/vega_best.pt --data data/synthetic
|
||
```
|
||
|
||
---
|
||
|
||
## Vega Model Architecture
|
||
|
||
**Vega** is a CNN-FCNN hybrid model optimized for gamma spectrum isotope identification, based on research showing 99%+ accuracy on similar tasks.
|
||
|
||
### Architecture Details
|
||
| Component | Configuration |
|
||
|-----------|---------------|
|
||
| Input | 1023 energy channels |
|
||
| CNN Backbone | 3 ConvBlocks [64, 128, 256 channels] |
|
||
| Kernel Size | 7 (captures spectral features) |
|
||
| FC Layers | [512, 256] with dropout |
|
||
| Output Heads | Dual: Classification (82 isotopes) + Regression (activity) |
|
||
| Total Parameters | 34.5M |
|
||
| Activation | LeakyReLU + BatchNorm |
|
||
|
||
### Training Features
|
||
- **Mixed Precision (AMP):** Faster training on modern GPUs
|
||
- **Multi-task Learning:** Simultaneous isotope ID + activity estimation
|
||
- **Loss Function:** BCE (classification) + Huber (regression)
|
||
- **LR Scheduling:** ReduceLROnPlateau with early stopping
|
||
|
||
---
|
||
|
||
## Synthetic Spectra Generation
|
||
|
||
### Features
|
||
- **82 isotopes** with accurate gamma emission lines
|
||
- **Realistic physics:** Gaussian peaks, Poisson noise, Compton continuum, environmental background
|
||
- **Multiple detector models:** Radiacode 101, 102, 103, 103G, 110 with correct FWHM and energy ranges
|
||
- **Configurable variation:** Activity levels, measurement durations, isotope combinations
|
||
|
||
### Sample Distribution
|
||
| Type | Proportion | Description |
|
||
|------|------------|-------------|
|
||
| Single isotope | 40% | One source + background |
|
||
| Dual isotope | 30% | Two sources blended |
|
||
| Multi isotope | 20% | 3-5 sources combined |
|
||
| Background only | 10% | Environmental only |
|
||
|
||
### Scaling Up
|
||
Edit `synthetic_spectra/generate_spectra.py` to generate larger datasets:
|
||
```python
|
||
generate_training_batch(
|
||
n_samples=100000, # Generate 100k samples
|
||
output_dir=Path("data/synthetic/spectra"),
|
||
detector_type="radiacode_103"
|
||
)
|
||
```
|
||
|
||
---
|
||
|
||
## Project Structure
|
||
|
||
```
|
||
ml-for-isotope-identification/
|
||
├── README.md # This file
|
||
├── agents.md # AI agent context documentation
|
||
├── .gitignore # Git ignore rules
|
||
│
|
||
├── synthetic_spectra/ # Spectrum generation package
|
||
│ ├── __init__.py
|
||
│ ├── config.py # Detector configurations
|
||
│ ├── generator.py # Main generation logic
|
||
│ ├── generate_spectra.py # CLI batch generation
|
||
│ ├── ground_truth/
|
||
│ │ ├── isotope_data.py # 82 isotopes database
|
||
│ │ └── decay_chains.py # Decay chain definitions
|
||
│ └── physics/
|
||
│ └── spectrum_physics.py # Physics calculations
|
||
│
|
||
├── training/ # Training infrastructure
|
||
│ └── vega/ # Vega model package
|
||
│ ├── __init__.py
|
||
│ ├── isotope_index.py # Isotope ↔ index mapping
|
||
│ ├── model.py # VegaModel architecture
|
||
│ ├── dataset.py # PyTorch Dataset/DataLoader
|
||
│ ├── train.py # Training loop & utilities
|
||
│ └── run_training.py # CLI training script
|
||
│
|
||
├── inference/ # Inference engine
|
||
│ ├── vega_inference.py # VegaInference class
|
||
│ └── run_inference.py # CLI inference script
|
||
│
|
||
├── models/ # Saved model checkpoints
|
||
│ ├── vega_best.pt # Best validation loss
|
||
│ ├── vega_final.pt # Final epoch
|
||
│ └── vega_history.json # Training metrics
|
||
│
|
||
└── data/ # Generated data (git-ignored)
|
||
└── synthetic/
|
||
└── spectra/
|
||
```
|
||
|
||
---
|
||
|
||
## Technical Details
|
||
|
||
### Detector Specifications
|
||
| Model | Crystal | FWHM @ 662 keV | Energy Range | Channels |
|
||
|-------|---------|----------------|--------------|----------|
|
||
| Radiacode 101 | CsI(Tl) | 9.0% | 20-3000 keV | 1024 |
|
||
| Radiacode 102 | CsI(Tl) | 9.5% | 20-3000 keV | 1024 |
|
||
| Radiacode 103 | CsI(Tl) | 8.4% | 20-3000 keV | 1024 |
|
||
| Radiacode 103G | GAGG(Ce) | 7.4% | 20-3000 keV | 1024 |
|
||
| Radiacode 110 | CsI(Tl) | 8.4% | 20-3000 keV | 1024 |
|
||
|
||
### Physics Model
|
||
- **Peak shape:** Gaussian with FWHM scaling as √(E/662)
|
||
- **Expected counts:** λ = A × t × I × ε × T
|
||
- **Noise:** Poisson counting statistics
|
||
- **Background:** Exponential continuum + environmental isotopes (K-40, Pb-214, Bi-214, etc.)
|
||
|
||
### Isotope Categories
|
||
- Natural background (K-40, Ra-226, Rn-222)
|
||
- Decay chains (U-238, Th-232, U-235)
|
||
- Calibration sources (Am-241, Cs-137, Co-60, Ba-133, Eu-152)
|
||
- Medical isotopes (Tc-99m, F-18, I-131, Ga-68)
|
||
- Industrial sources (Ir-192, Se-75)
|
||
- Reactor fallout (Cs-134, Cs-137, Sr-90)
|
||
|
||
---
|
||
|
||
## Development
|
||
|
||
### Dependencies
|
||
```
|
||
numpy>=1.24.0
|
||
scipy>=1.10.0
|
||
pillow>=9.0.0
|
||
torch>=2.11.0 (nightly with CUDA 12.8 for RTX 5090)
|
||
```
|
||
|
||
### GPU Support
|
||
The RTX 5090 (Blackwell architecture, sm_120) requires PyTorch nightly builds with CUDA 12.8:
|
||
```bash
|
||
pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu128
|
||
```
|
||
|
||
### For AI Agents
|
||
See [agents.md](agents.md) for comprehensive documentation on:
|
||
- System architecture and design decisions
|
||
- Physics model implementation details
|
||
- Vega model architecture and training
|
||
- Configuration options and variation strategies
|
||
|
||
---
|
||
|
||
## TODO
|
||
|
||
- [x] ~~Push to repository~~ - Initial commit with generation system
|
||
- [x] ~~Create PyTorch DataLoader for training~~
|
||
- [x] ~~Implement CNN-FCNN model architecture (Vega)~~
|
||
- [x] ~~Create training script with logging~~
|
||
- [x] ~~Implement inference module~~
|
||
- [ ] Generate large training dataset (100k samples)
|
||
- [ ] Train model to convergence
|
||
- [ ] Add data augmentation pipeline
|
||
- [ ] Add model evaluation metrics & confusion matrix
|
||
- [ ] Implement real-time inference module
|
||
- [ ] Create Radiacode device integration
|
||
|
||
---
|
||
|
||
## License
|
||
|
||
[TBD]
|
||
|
||
---
|
||
|
||
## Acknowledgments
|
||
|
||
- Radiacode for device specifications
|
||
- IAEA Nuclear Data Services for isotope data
|
||
- NNDC at Brookhaven National Laboratory
|
||
- Wang et al. research on CNN-FCNN for gamma spectroscopy |