Complete Radiacode 103 pipeline - automatic isotope identification

- VegaModel CNN-FCNN, 34.5M params, 82 isotopes, val acc 99.89%
- Generation of 50k synthetic 1D spectra (12-24h durations)
- Training for 100 epochs on RTX 5060 Ti (CUDA 12.8, Blackwell)
- Continuous detection with background subtraction
- 24h background capture with disconnection handling
- Docker Compose: train container (GPU) + detect container (CPU/USB)
- Trained model included (vega_best.pt, 395 MB)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-19 12:29:56 +02:00
commit 745a64b342
52 changed files with 17558 additions and 0 deletions
--- a/train/vega_ml/README.md
+++ b/train/vega_ml/README.md
@@ -0,0 +1,247 @@
+# ML for Isotope Identification
+
+A machine learning system for identifying radioactive isotopes from gamma-ray spectra captured by Radiacode scintillation detectors.
+
+## Project Status
+
+✅ **Completed:** Synthetic gamma spectra generation system  
+✅ **Completed:** Vega ML model architecture (CNN-FCNN hybrid)  
+✅ **Completed:** Training pipeline with GPU support  
+✅ **Completed:** Inference engine  
+🔲 **Next:** Generate large training dataset (10,000-100,000 samples)  
+🔲 **Future:** Real-time inference on Radiacode devices
+
+---
+
+## Overview
+
+This project aims to build a neural network that can identify radioactive isotopes from gamma spectra. Since collecting real gamma spectra requires radioactive sources and is both expensive and regulated, we generate **synthetic training data** based on realistic physics models.
+
+### Target Hardware
+- **Training:** NVIDIA RTX 5090 GPU (requires PyTorch nightly with CUDA 12.8)
+- **Inference:** Radiacode 101, 102, 103, 103G, 110 scintillation detectors
+
+### Data Format
+- **Input:** 2D spectrograms (time intervals × 1023 energy channels)
+- **Output:** Multi-label isotope classification with activity estimation
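+
+A minimal sketch of turning the dual-head output into a readable result. It assumes the classification head returns one raw logit per isotope, turned into an independent probability with a sigmoid (the usual pairing with BCE training), and that the regression head returns one activity estimate per isotope. The `decode_prediction` helper, the isotope names and the 0.5 threshold are illustrative assumptions, not the Vega code.
+
+```python
+import math
+
+
+def sigmoid(x: float) -> float:
+    """Numerically stable logistic function (no overflow for large |x|)."""
+    if x >= 0:
+        return 1.0 / (1.0 + math.exp(-x))
+    z = math.exp(x)
+    return z / (1.0 + z)
+
+
+def decode_prediction(logits, activities, isotope_names, threshold=0.5):
+    """Return (isotope, probability, activity) for every isotope above threshold.
+
+    Each isotope is scored independently (multi-label), so several can be
+    reported at once, or none for a background-only spectrum.
+    """
+    detected = []
+    for name, logit, activity in zip(isotope_names, logits, activities):
+        prob = sigmoid(logit)
+        if prob >= threshold:
+            detected.append((name, prob, activity))
+    # Most confident isotopes first
+    return sorted(detected, key=lambda item: item[1], reverse=True)
+
+
+if __name__ == "__main__":
+    names = ["Am-241", "Co-60", "Cs-137"]  # hypothetical 3-isotope subset of the 82
+    result = decode_prediction([0.5, -3.0, 2.0], [3.0, 0.0, 10.0], names)
+    for name, prob, activity in result:
+        print(f"{name}: p={prob:.2f}, activity={activity}")
+```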
+
+---
+
+## Quick Start
+
+### Installation
+
+```bash
+# Create virtual environment
+python -m venv .venv
+.venv\Scripts\activate  # Windows
+# or: source .venv/bin/activate  # Linux/Mac
+
+# Install dependencies
+pip install numpy scipy pillow
+
+# Install PyTorch (nightly for RTX 5090/Blackwell support)
+pip install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cu128
+```
+
+### Generate Synthetic Data
+
+```bash
+# Generate 10 test samples
+python -m synthetic_spectra.generate_spectra
+```
+
+### Train the Model
+
+```bash
+# Quick test run (5 epochs, small dataset)
+python training/vega/run_training.py --test
+
+# Full training
+python training/vega/run_training.py --epochs 100 --batch-size 32
+```
+
+### Run Inference
+
+```bash
+# Run inference on synthetic data
+python inference/run_inference.py --model models/vega_best.pt --data data/synthetic
+```
+
+---
+
+## Vega Model Architecture
+
+**Vega** is a CNN-FCNN hybrid model optimized for gamma spectrum isotope identification. It follows the CNN-FCNN approach of Wang et al. (see Acknowledgments), which reported 99%+ accuracy on similar tasks.
+
+### Architecture Details
+| Component | Configuration |
+|-----------|---------------|
+| Input | 1023 energy channels |
+| CNN Backbone | 3 ConvBlocks [64, 128, 256 channels] |
+| Kernel Size | 7 (captures spectral features) |
+| FC Layers | [512, 256] with dropout |
+| Output Heads | Dual: Classification (82 isotopes) + Regression (activity) |
+| Total Parameters | 34.5M |
+| Activation | LeakyReLU + BatchNorm |
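+
+The table does not say how the 34.5M parameters are distributed. As a back-of-the-envelope check, the sketch below counts the weights of the convolutional backbone. It assumes each ConvBlock holds a single `Conv1d` (with bias) followed by a `BatchNorm1d`, with one input channel; both are assumptions about the layout, and the helper names are hypothetical. Under these assumptions the backbone stays well under 1M parameters. Most of the 34.5M would therefore sit in the fully connected layers, whose input size depends on pooling and flattening details not given here.
+
+```python
+def conv1d_params(c_in: int, c_out: int, kernel: int, bias: bool = True) -> int:
+    """Weights (c_out x c_in x kernel) plus one bias per output channel."""
+    return c_in * c_out * kernel + (c_out if bias else 0)
+
+
+def batchnorm1d_params(channels: int) -> int:
+    """Learnable scale (gamma) and shift (beta) per channel."""
+    return 2 * channels
+
+
+def linear_params(n_in: int, n_out: int) -> int:
+    """Dense layer: weight matrix plus bias vector."""
+    return n_in * n_out + n_out
+
+
+def backbone_params(channels=(64, 128, 256), kernel=7, in_channels=1) -> int:
+    """Assumes one Conv1d + BatchNorm1d per ConvBlock (hypothetical layout)."""
+    total, c_in = 0, in_channels
+    for c_out in channels:
+        total += conv1d_params(c_in, c_out, kernel) + batchnorm1d_params(c_out)
+        c_in = c_out
+    return total
+
+
+if __name__ == "__main__":
+    print("CNN backbone:", backbone_params())  # 288512 under these assumptions
+    print("Linear 512 -> 256:", linear_params(512, 256))
+```
+
+With PyTorch available, `sum(p.numel() for p in model.parameters())` on an instantiated `VegaModel` gives the exact figure to compare against 34.5M.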
+
+### Training Features
+- **Mixed Precision (AMP):** Faster training on modern GPUs
+- **Multi-task Learning:** Simultaneous isotope ID + activity estimation
+- **Loss Function:** BCE (classification) + Huber (regression)
+- **LR Scheduling:** ReduceLROnPlateau with early stopping
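+
+A minimal, framework-free sketch of the multi-task objective and the early-stopping rule described above. In PyTorch the same pieces would normally come from `torch.nn.BCEWithLogitsLoss`, `torch.nn.HuberLoss` and `torch.optim.lr_scheduler.ReduceLROnPlateau`. The `reg_weight` balance factor, the `patience` value and the helper names are assumptions, not settings from the Vega training code.
+
+```python
+import math
+
+
+def bce(probs, targets, eps=1e-7):
+    """Mean binary cross-entropy over per-isotope probabilities (multi-label)."""
+    total = 0.0
+    for p, y in zip(probs, targets):
+        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
+        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
+    return total / len(probs)
+
+
+def huber(preds, targets, delta=1.0):
+    """Mean Huber loss: quadratic for small errors, linear beyond delta."""
+    total = 0.0
+    for p, y in zip(preds, targets):
+        err = abs(p - y)
+        total += 0.5 * err**2 if err <= delta else delta * (err - 0.5 * delta)
+    return total / len(preds)
+
+
+def multitask_loss(probs, labels, act_pred, act_true, reg_weight=1.0):
+    """Classification (BCE) plus weighted activity regression (Huber)."""
+    return bce(probs, labels) + reg_weight * huber(act_pred, act_true)
+
+
+class EarlyStopping:
+    """Stop when validation loss has not improved for `patience` epochs."""
+
+    def __init__(self, patience=10, min_delta=0.0):
+        self.patience = patience
+        self.min_delta = min_delta
+        self.best = math.inf
+        self.bad_epochs = 0
+
+    def step(self, val_loss):
+        """Record one epoch; return True when training should stop."""
+        if val_loss < self.best - self.min_delta:
+            self.best = val_loss
+            self.bad_epochs = 0
+        else:
+            self.bad_epochs += 1
+        return self.bad_epochs >= self.patience
+```
+
+Summing both terms into one scalar lets a single backward pass update the shared CNN backbone for both tasks. `reg_weight` controls how strongly the activity error pulls on those shared weights.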
+
+---
+
+## Synthetic Spectra Generation
+
+### Features
+- **82 isotopes** with accurate gamma emission lines
+- **Realistic physics:** Gaussian peaks, Poisson noise, Compton continuum, environmental background
+- **Multiple detector models:** Radiacode 101, 102, 103, 103G, 110 with correct FWHM and energy ranges
+- **Configurable variation:** Activity levels, measurement durations, isotope combinations
+
+### Sample Distribution
+| Type | Proportion | Description |
+|------|------------|-------------|
+| Single isotope | 40% | One source + background |
+| Dual isotope | 30% | Two sources blended |
+| Multi isotope | 20% | 3-5 sources combined |
+| Background only | 10% | Environmental only |
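+
+The mix in the table can be reproduced with one weighted draw per sample. This sketch makes three assumptions: the number of sources in a multi-isotope sample is uniform over 3-5, isotopes are picked uniformly without replacement, and environmental background is added separately to every sample. `draw_sample_spec` is a hypothetical helper, not the function in `generator.py`.
+
+```python
+import random
+
+# Proportions from the sample-distribution table
+SAMPLE_TYPES = {"single": 0.40, "dual": 0.30, "multi": 0.20, "background": 0.10}
+# (min, max) number of sources per type; 3-5 for "multi"
+N_SOURCES = {"single": (1, 1), "dual": (2, 2), "multi": (3, 5), "background": (0, 0)}
+
+
+def draw_sample_spec(isotopes, rng):
+    """Pick a sample type with the table's weights, then the sources to blend."""
+    kind = rng.choices(list(SAMPLE_TYPES), weights=list(SAMPLE_TYPES.values()), k=1)[0]
+    low, high = N_SOURCES[kind]
+    n_sources = rng.randint(low, high)  # inclusive bounds
+    return kind, rng.sample(isotopes, n_sources)
+
+
+if __name__ == "__main__":
+    rng = random.Random(42)  # seeded for a reproducible dataset
+    pool = ["Am-241", "Ba-133", "Co-60", "Cs-137", "Eu-152", "I-131"]
+    for _ in range(3):
+        print(draw_sample_spec(pool, rng))
+```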
+
+### Scaling Up
+Edit `synthetic_spectra/generate_spectra.py` to generate larger datasets:
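+```python
+from pathlib import Path
+
+# In synthetic_spectra/generate_spectra.py: raise n_samples for a larger dataset
+generate_training_batch(
+    n_samples=100000,  # Generate 100k samples
+    output_dir=Path("data/synthetic/spectra"),
+    detector_type="radiacode_103",
+)
+```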
+
+---
+
+## Project Structure
+
+```
+ml-for-isotope-identification/
+├── README.md                    # This file
+├── agents.md                    # AI agent context documentation
+├── .gitignore                   # Git ignore rules
+│
+├── synthetic_spectra/           # Spectrum generation package
+│   ├── __init__.py
+│   ├── config.py                # Detector configurations
+│   ├── generator.py             # Main generation logic
+│   ├── generate_spectra.py      # CLI batch generation
+│   ├── ground_truth/
+│   │   ├── isotope_data.py      # 82 isotopes database
+│   │   └── decay_chains.py      # Decay chain definitions
+│   └── physics/
+│       └── spectrum_physics.py  # Physics calculations
+│
+├── training/                    # Training infrastructure
+│   └── vega/                    # Vega model package
+│       ├── __init__.py
+│       ├── isotope_index.py     # Isotope ↔ index mapping
+│       ├── model.py             # VegaModel architecture
+│       ├── dataset.py           # PyTorch Dataset/DataLoader
+│       ├── train.py             # Training loop & utilities
+│       └── run_training.py      # CLI training script
+│
+├── inference/                   # Inference engine
+│   ├── vega_inference.py        # VegaInference class
+│   └── run_inference.py         # CLI inference script
+│
+├── models/                      # Saved model checkpoints
+│   ├── vega_best.pt             # Best validation loss
+│   ├── vega_final.pt            # Final epoch
+│   └── vega_history.json        # Training metrics
+│
+└── data/                        # Generated data (git-ignored)
+    └── synthetic/
+        └── spectra/
+```
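+
+`isotope_index.py` maps isotope names to output positions. Below is a minimal sketch of such a mapping and of the multi-hot target vector it produces for multi-label training. It uses a hypothetical 5-isotope list; the real module covers all 82.
+
+```python
+# Hypothetical subset; the real list in isotope_index.py has 82 entries
+ISOTOPES = ["Am-241", "Ba-133", "Co-60", "Cs-137", "K-40"]
+ISOTOPE_TO_INDEX = {name: idx for idx, name in enumerate(ISOTOPES)}
+
+
+def encode_labels(present):
+    """Multi-hot target: 1.0 at the index of every isotope in the sample."""
+    target = [0.0] * len(ISOTOPES)
+    for name in present:
+        target[ISOTOPE_TO_INDEX[name]] = 1.0  # KeyError flags unknown isotopes
+    return target
+
+
+def decode_labels(scores, threshold=0.5):
+    """Inverse mapping: names whose score reaches the threshold."""
+    return [ISOTOPES[idx] for idx, score in enumerate(scores) if score >= threshold]
+
+
+if __name__ == "__main__":
+    target = encode_labels(["Cs-137", "Am-241"])
+    print(target)                 # [1.0, 0.0, 0.0, 1.0, 0.0]
+    print(decode_labels(target))  # ['Am-241', 'Cs-137']
+```
+
+The list order must stay fixed, because a checkpoint's output units are only meaningful with the exact mapping it was trained with.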
+
+---
+
+## Technical Details
+
+### Detector Specifications
+| Model | Crystal | FWHM @ 662 keV | Energy Range | Channels |
+|-------|---------|----------------|--------------|----------|
+| Radiacode 101 | CsI(Tl) | 9.0% | 20-3000 keV | 1024 |
+| Radiacode 102 | CsI(Tl) | 9.5% | 20-3000 keV | 1024 |
+| Radiacode 103 | CsI(Tl) | 8.4% | 20-3000 keV | 1024 |
+| Radiacode 103G | GAGG(Ce) | 7.4% | 20-3000 keV | 1024 |
+| Radiacode 110 | CsI(Tl) | 8.4% | 20-3000 keV | 1024 |
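+
+The table converts directly into per-detector numbers. The sketch below first converts the relative FWHM at 662 keV into keV. It then estimates the width of one channel, assuming a linear energy calibration over the nominal 20-3000 keV range. Real devices store their own calibration coefficients, so this is only an approximation. The dictionary keys follow the `detector_type="radiacode_103"` naming used above; the other keys are assumed to follow the same pattern.
+
+```python
+# From the detector table: (crystal, FWHM fraction at 662 keV, E_min keV, E_max keV, channels)
+DETECTORS = {
+    "radiacode_101": ("CsI(Tl)", 0.090, 20.0, 3000.0, 1024),
+    "radiacode_102": ("CsI(Tl)", 0.095, 20.0, 3000.0, 1024),
+    "radiacode_103": ("CsI(Tl)", 0.084, 20.0, 3000.0, 1024),
+    "radiacode_103g": ("GAGG(Ce)", 0.074, 20.0, 3000.0, 1024),
+    "radiacode_110": ("CsI(Tl)", 0.084, 20.0, 3000.0, 1024),
+}
+
+
+def fwhm_kev_at_662(model: str) -> float:
+    """Absolute resolution at the Cs-137 line: fraction x 662 keV."""
+    return DETECTORS[model][1] * 662.0
+
+
+def kev_per_channel(model: str) -> float:
+    """Channel width under an assumed linear calibration over the nominal range."""
+    _, _, e_min, e_max, channels = DETECTORS[model]
+    return (e_max - e_min) / channels
+
+
+if __name__ == "__main__":
+    for model in DETECTORS:
+        fwhm = fwhm_kev_at_662(model)
+        width = kev_per_channel(model)
+        print(f"{model}: FWHM {fwhm:.1f} keV ~ {fwhm / width:.1f} channels")
+```
+
+Under the linear-calibration assumption, one channel is about 2.9 keV wide. The Radiacode 103's 662 keV peak therefore spans roughly 19 channels at half maximum.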
+
+### Physics Model
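+- **Peak shape:** Gaussian, with FWHM(E) = FWHM(662 keV) × √(E / 662 keV)
+- **Expected counts:** λ = A × t × I × ε × T (A: activity in Bq, t: measurement time in s, I: gamma emission intensity per decay, ε: detection efficiency, T: transmission factor)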
+- **Noise:** Poisson counting statistics
+- **Background:** Exponential continuum + environmental isotopes (K-40, Pb-214, Bi-214, etc.)
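+
+The physics bullets above can be sketched in a few lines of standard-library Python. This illustrates the model under stated assumptions and is not the code in `spectrum_physics.py`:
+
+- The FWHM is scaled from its 662 keV value, and σ = FWHM / (2√(2 ln 2)).
+- `T` in λ is treated as a transmission factor.
+- The continuum is a single decaying exponential.
+- Poisson noise uses Knuth's method for small means and a normal approximation above 30 counts.
+
+The 0.851 intensity is the Cs-137 662 keV line; the activity, efficiency and continuum parameters are arbitrary.
+
+```python
+import math
+import random
+
+SIGMA_PER_FWHM = 1.0 / (2.0 * math.sqrt(2.0 * math.log(2.0)))  # ~0.4247
+
+
+def fwhm_at(energy_kev, fwhm_frac_662):
+    """Absolute FWHM in keV, scaled from the 662 keV value as sqrt(E / 662)."""
+    return fwhm_frac_662 * 662.0 * math.sqrt(energy_kev / 662.0)
+
+
+def expected_counts(activity_bq, time_s, intensity, efficiency, transmission):
+    """lambda = A x t x I x eps x T: mean full-energy counts for one gamma line."""
+    return activity_bq * time_s * intensity * efficiency * transmission
+
+
+def add_peak(spectrum, energies, centre_kev, fwhm_kev, counts):
+    """Spread `counts` over the channels with a Gaussian normalised on the grid."""
+    sigma = fwhm_kev * SIGMA_PER_FWHM
+    weights = [math.exp(-0.5 * ((e - centre_kev) / sigma) ** 2) for e in energies]
+    norm = sum(weights)
+    for i, w in enumerate(weights):
+        spectrum[i] += counts * w / norm
+
+
+def add_continuum(spectrum, energies, amplitude, scale_kev):
+    """Toy environmental continuum: amplitude * exp(-E / scale)."""
+    for i, e in enumerate(energies):
+        spectrum[i] += amplitude * math.exp(-e / scale_kev)
+
+
+def poisson(lam, rng):
+    """Poisson draw: Knuth's method for small means, normal approximation above 30."""
+    if lam > 30:
+        return max(0, round(rng.gauss(lam, math.sqrt(lam))))
+    limit, k, p = math.exp(-lam), 0, 1.0
+    while True:
+        p *= rng.random()
+        if p <= limit:
+            return k
+        k += 1
+
+
+if __name__ == "__main__":
+    rng = random.Random(1)
+    # Assumed linear calibration: channel centres over 20-3000 keV
+    energies = [20.0 + (i + 0.5) * 2980.0 / 1024 for i in range(1024)]
+    mean = [0.0] * len(energies)
+    add_continuum(mean, energies, amplitude=50.0, scale_kev=300.0)
+    lam = expected_counts(1000.0, 3600.0, 0.851, 0.02, 1.0)  # Cs-137, 1 h, assumed 2% efficiency
+    add_peak(mean, energies, 661.7, fwhm_at(661.7, 0.084), lam)
+    observed = [poisson(m, rng) for m in mean]  # Poisson counting noise per channel
+    print(f"expected peak counts: {lam:.0f}, total observed: {sum(observed)}")
+```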
+
+### Isotope Categories
+- Natural background (K-40, Ra-226, Rn-222)
+- Decay chains (U-238, Th-232, U-235)
+- Calibration sources (Am-241, Cs-137, Co-60, Ba-133, Eu-152)
+- Medical isotopes (Tc-99m, F-18, I-131, Ga-68)
+- Industrial sources (Ir-192, Se-75)
+- Reactor fallout (Cs-134, Cs-137, Sr-90)
+
+---
+
+## Development
+
+### Dependencies
+```
+numpy>=1.24.0
+scipy>=1.10.0
+pillow>=9.0.0
+torch>=2.11.0  # nightly build with CUDA 12.8, needed for RTX 5090 (see GPU Support)
+```
+
+### GPU Support
+The RTX 5090 (Blackwell architecture, sm_120) requires PyTorch nightly builds with CUDA 12.8:
+```bash
+pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu128
+```
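+
+A quick way to confirm that the installed PyTorch build can target the GPU is to compare the device's compute capability with the architectures compiled into the build. The sketch uses `torch.cuda.get_device_capability` and `torch.cuda.get_arch_list`, both part of PyTorch's public `torch.cuda` API. The pure helper `arch_supported` is hypothetical. On Blackwell consumer cards such as the RTX 5090 or RTX 5060 Ti, the capability should be (12, 0), i.e. `sm_120`.
+
+```python
+def arch_supported(capability, arch_list):
+    """True if the build ships native kernels (sm_XY) for this compute capability.
+
+    Ignores PTX (compute_XY) entries that could be JIT-compiled as a fallback.
+    """
+    major, minor = capability
+    return f"sm_{major}{minor}" in arch_list
+
+
+if __name__ == "__main__":
+    try:
+        import torch
+    except ImportError:
+        print("PyTorch is not installed")
+    else:
+        if not torch.cuda.is_available():
+            print("CUDA not available; check the driver and the cu128 wheel")
+        else:
+            cap = torch.cuda.get_device_capability(0)
+            print(torch.cuda.get_device_name(0), "capability", cap, "CUDA", torch.version.cuda)
+            print("native kernels for this GPU:", arch_supported(cap, torch.cuda.get_arch_list()))
+```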
+
+### For AI Agents
+See [agents.md](agents.md) for comprehensive documentation on:
+- System architecture and design decisions
+- Physics model implementation details
+- Vega model architecture and training
+- Configuration options and variation strategies
+
+---
+
+## TODO
+
+- [x] ~~Push to repository~~ - Initial commit with generation system
+- [x] ~~Create PyTorch DataLoader for training~~
+- [x] ~~Implement CNN-FCNN model architecture (Vega)~~
+- [x] ~~Create training script with logging~~
+- [x] ~~Implement inference module~~
+- [ ] Generate large training dataset (100k samples)
+- [ ] Train model to convergence
+- [ ] Add data augmentation pipeline
+- [ ] Add model evaluation metrics & confusion matrix
+- [ ] Implement real-time inference module
+- [ ] Create Radiacode device integration
+
+---
+
+## License
+
+[TBD]
+
+---
+
+## Acknowledgments
+
+- Radiacode for device specifications
+- IAEA Nuclear Data Services for isotope data
+- NNDC at Brookhaven National Laboratory
+- Wang et al. research on CNN-FCNN for gamma spectroscopy