# ML for Isotope Identification

A machine learning system for identifying radioactive isotopes from gamma-ray spectra captured by Radiacode scintillation detectors.

## Project Status

- ✅ **Completed:** Synthetic gamma spectra generation system
- ✅ **Completed:** Vega ML model architecture (CNN-FCNN hybrid)
- ✅ **Completed:** Training pipeline with GPU support
- ✅ **Completed:** Inference engine
- ✅ **Completed:** Realistic CsI(Tl) background model
- ✅ **Completed:** Hybrid training (measured + synthetic background)
- ✅ **Completed:** Web dashboard (FastAPI + Chart.js)
- 🔲 **Next:** Retrain model with realistic background
- 🔲 **Future:** Real-time inference on Radiacode devices

---

## Overview

This project builds a neural network that identifies radioactive isotopes from gamma spectra. Collecting real spectra requires radioactive sources, which is expensive and regulated, so we generate **synthetic training data** based on realistic physics models.

### Target Hardware

- **Training:** NVIDIA RTX 5060 Ti GPU (Blackwell, requires PyTorch 2.7+ with CUDA 12.8)
- **Inference:** Radiacode 101, 102, 103, 103G, 110 scintillation detectors

### Data Format

- **Input:** 1D spectrum (1023 energy channels, 20-3000 keV, normalized to max)
- **Output:** Multi-label isotope classification (82 isotopes) with activity estimation (Bq)

---

## Quick Start

### Installation

```bash
# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # Linux/Mac

# Install dependencies
pip install numpy scipy pillow

# Install PyTorch (nightly build for RTX 50-series/Blackwell support)
pip install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cu128
```

### Generate Synthetic Data

```bash
# Generate 10 test samples (default)
python -m vega_ml.synthetic_spectra.generate_spectra --num_samples 10 --output_dir data/synthetic

# With measured background for hybrid training (recommended)
python -m vega_ml.synthetic_spectra.generate_spectra \
    --num_samples 50000 \
    --output_dir data/synthetic \
    --measured_background /path/to/background_24h.npy
```

### Train the Model

```bash
# Quick test run (5 epochs, small dataset)
python -m vega_ml.training.vega.run_training --test

# Full training
python -m vega_ml.training.vega.run_training \
    --data-dir data/synthetic \
    --model-dir models \
    --epochs 100 --batch-size 64
```

### Run Inference

```bash
# Run inference on synthetic data
python -m vega_ml.inference.run_inference --model models/vega_best.pt --data data/synthetic
```

---

## Vega Model Architecture

**Vega** is a CNN-FCNN hybrid model optimized for gamma spectrum isotope identification, based on published CNN-FCNN research reporting 99%+ accuracy on similar tasks.

### Architecture Details

| Component | Configuration |
|-----------|---------------|
| Input | 1023 energy channels |
| CNN Backbone | 3 ConvBlocks [64, 128, 256 channels] |
| Kernel Size | 7 (captures spectral features) |
| FC Layers | [512, 256] with dropout |
| Output Heads | Dual: Classification (82 isotopes) + Regression (activity) |
| Total Parameters | 34.5M |
| Activation | LeakyReLU + BatchNorm |

### Training Features

- **Mixed Precision (AMP):** Faster training on modern GPUs
- **Multi-task Learning:** Simultaneous isotope ID + activity estimation
- **Loss Function:** BCE (classification) + Huber (regression)
- **LR Scheduling:** ReduceLROnPlateau with early stopping

---

## Synthetic Spectra Generation

### Realistic Background Model

The background continuum uses a realistic CsI(Tl) shape calibrated against real Radiacode 103 measurements, not a simple exponential:

- **Asymmetric hump** at ~110 keV (sigma_left=55 keV, sigma_right=50 keV): the dominant low-energy scatter peak characteristic of CsI(Tl) detectors
- **Compton tail**: 0.45*exp(-E/240) + 0.04*exp(-E/700), giving a realistic high-energy falloff
- **Noise floor** at 0.8% of peak: prevents zero-count channels

This replaces the previous simple exponential `A*exp(-0.002*E)`, which failed to reproduce the characteristic CsI(Tl) response.
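The three background components listed above can be sketched in a few lines of NumPy. This is only an illustration, not the code in `spectrum_physics.py`: the function name `csi_background_shape`, the unit hump amplitude, the additive combination of hump and tail, and the linear 20-3000 keV energy grid are all assumptions made for the example.

```python
import numpy as np


def csi_background_shape(energy_kev, hump_center=110.0, sigma_left=55.0,
                         sigma_right=50.0, hump_amplitude=1.0,
                         noise_floor=0.008):
    """Hypothetical sketch of the CsI(Tl) continuum shape, normalized to peak 1."""
    energy_kev = np.asarray(energy_kev, dtype=float)

    # Asymmetric Gaussian hump: a different width on each side of the center.
    sigma = np.where(energy_kev < hump_center, sigma_left, sigma_right)
    hump = hump_amplitude * np.exp(-0.5 * ((energy_kev - hump_center) / sigma) ** 2)

    # Two-term exponential Compton tail with the coefficients quoted above.
    tail = 0.45 * np.exp(-energy_kev / 240.0) + 0.04 * np.exp(-energy_kev / 700.0)

    # Assumption: hump and tail are simply summed, then scaled so the peak is 1.
    shape = hump + tail
    shape = shape / shape.max()

    # Noise floor at 0.8% of peak keeps every channel strictly positive.
    return np.maximum(shape, noise_floor)


# Assumed linear energy grid over the 1023 usable channels.
energies = np.linspace(20.0, 3000.0, 1023)
background = csi_background_shape(energies)
```

Scaling the returned shape by a target count rate and measurement time, then drawing Poisson samples, would turn it into a simulated background spectrum.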
### Hybrid Training with Measured Background

When a measured background file (`background_24h.npy`) is available, the generator blends it with the synthetic model:

- **70% measured** background shape (scaled to target CPS)
- **30% synthetic** continuum (for robustness against measurement artifacts)
- Stochastic isotope peaks (K-40, radon, thorium) are still added on top with random activity levels

This is controlled by the `--measured_background` CLI argument or the `MEASURED_BACKGROUND_PATH` environment variable.

### Features

- **82 isotopes** with accurate gamma emission lines
- **Realistic physics:** Gaussian peaks, Poisson noise, Compton continuum, CsI(Tl) background shape
- **Multiple detector models:** Radiacode 101, 102, 103, 103G, 110 with correct FWHM and energy ranges
- **Configurable variation:** Activity levels, measurement durations, isotope combinations
- **Decay chains:** Uranium-238, Thorium-232 chains with secular equilibrium

### Sample Distribution (v3)

| Type | Proportion | Description |
|------|------------|-------------|
| Background only | 15% | Environmental background only |
| Single calibration | 20% | One check source + background |
| Single medical | 8% | Medical isotope + background |
| Single industrial | 5% | Industrial source + background |
| Uranium chain | 10% | U-238 + daughters in equilibrium |
| Thorium chain | 10% | Th-232 + daughters in equilibrium |
| NORM | 7% | Naturally occurring radioactive material |
| Fallout | 5% | Cs-137 + Cs-134 signature |
| Mixed | 10% | Random 2-3 isotope mixes |
| Complex mix | 5% | 4-6 isotopes from various categories |
| Weak source | 5% | Near-detection-limit sources |

---

## Project Structure

```
train/vega_ml/
├── README.md                    # This file
├── agents.md                    # AI agent context documentation
├── .gitignore                   # Git ignore rules
│
├── synthetic_spectra/           # Spectrum generation package
│   ├── __init__.py
│   ├── config.py                # Detector configurations (Radiacode 101-110)
│   ├── generator.py             # Main generation logic (SpectrumConfig)
│   ├── generate_spectra.py      # CLI batch generation (v1)
│   ├── generate_spectra_v3.py   # CLI batch generation (v3, parallel)
│   ├── ground_truth/
│   │   ├── isotope_data.py      # 82 isotopes database
│   │   └── decay_chains.py      # Decay chain definitions
│   └── physics/
│       └── spectrum_physics.py  # Physics calculations + realistic CsI(Tl) background
│
├── training/                    # Training infrastructure
│   └── vega/                    # Vega model package
│       ├── __init__.py
│       ├── isotope_index.py     # Isotope ↔ index mapping
│       ├── model.py             # VegaModel architecture + VegaLoss
│       ├── dataset.py           # PyTorch Dataset/DataLoader
│       ├── train.py             # Training loop & utilities
│       └── run_training.py      # CLI training script
│
├── inference/                   # Inference engine
│   ├── vega_inference.py        # VegaInference class
│   └── run_inference.py         # CLI inference script
│
├── models/                      # Saved model checkpoints
│   ├── vega_best.pt             # Best validation loss
│   ├── vega_final.pt            # Final epoch
│   └── vega_history.json        # Training metrics
│
└── data/                        # Generated data (git-ignored)
    └── synthetic/
        └── spectra/
```

---

## Technical Details

### Detector Specifications

| Model | Crystal | FWHM @ 662 keV | Energy Range | Channels |
|-------|---------|----------------|--------------|----------|
| Radiacode 101 | CsI(Tl) | 9.0% | 20-3000 keV | 1024 |
| Radiacode 102 | CsI(Tl) | 9.5% | 20-3000 keV | 1024 |
| Radiacode 103 | CsI(Tl) | 8.4% | 20-3000 keV | 1024 |
| Radiacode 103G | GAGG(Ce) | 7.4% | 20-3000 keV | 1024 |
| Radiacode 110 | CsI(Tl) | 8.4% | 20-3000 keV | 1024 |

Note: Only the first 1023 channels are used. The last channel (zero-indexed channel 1023) is an overflow bin and is dropped.
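As a rough illustration of how the table and the note translate into code, the sketch below trims a raw 1024-channel readout to the 1023 channels the model expects, normalizes it to its maximum, and converts a relative FWHM from the table into a Gaussian sigma in keV. The helper names `prepare_spectrum` and `fwhm_percent_to_sigma_kev` are hypothetical and are not taken from the `vega_ml` package.

```python
import math

import numpy as np

RAW_CHANNELS = 1024    # channels reported by the detector
MODEL_CHANNELS = 1023  # channels kept after dropping the overflow bin


def prepare_spectrum(raw_counts):
    """Drop the overflow bin and normalize to max (hypothetical helper)."""
    counts = np.asarray(raw_counts, dtype=np.float64)
    if counts.shape != (RAW_CHANNELS,):
        raise ValueError(f"expected {RAW_CHANNELS} channels, got {counts.shape}")

    trimmed = counts[:MODEL_CHANNELS]  # the last channel is the overflow bin
    peak = trimmed.max()
    if peak <= 0:
        # An empty spectrum stays all zeros instead of dividing by zero.
        return np.zeros(MODEL_CHANNELS)
    return trimmed / peak


def fwhm_percent_to_sigma_kev(fwhm_percent, energy_kev=662.0):
    """Convert a relative FWHM (in %) at a given energy to a Gaussian sigma in keV."""
    fwhm_kev = fwhm_percent / 100.0 * energy_kev
    # For a Gaussian, FWHM = 2 * sqrt(2 * ln 2) * sigma.
    return fwhm_kev / (2.0 * math.sqrt(2.0 * math.log(2.0)))


# Example: a fake readout with Poisson counts and a large overflow value.
rng = np.random.default_rng(0)
raw = rng.poisson(5.0, size=RAW_CHANNELS).astype(float)
raw[-1] = 1e6  # overflow bin, must not influence normalization
model_input = prepare_spectrum(raw)

sigma_103 = fwhm_percent_to_sigma_kev(8.4)  # Radiacode 103 at 662 keV
```

Dropping the overflow bin before normalizing matters: otherwise a single saturated channel would set the maximum and flatten the rest of the spectrum.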
### Physics Model

- **Peak shape:** Gaussian with FWHM scaling as sqrt(E/662) for scintillators
- **Expected counts:** lambda = A * t * I * epsilon * T
- **Noise:** Poisson counting statistics
- **Background:** Realistic CsI(Tl) continuum (asymmetric hump + Compton tail) + environmental isotope peaks (K-40, radon daughters, thorium daughters)
- **Hybrid mode:** Measured background can be blended with synthetic (70/30 ratio) for maximum realism

### Isotope Categories

- Natural background (K-40, Ra-226, Rn-222)
- Decay chains (U-238, Th-232, U-235)
- Calibration sources (Am-241, Cs-137, Co-60, Ba-133, Eu-152)
- Medical isotopes (Tc-99m, F-18, I-131, Ga-68)
- Industrial sources (Ir-192, Se-75)
- Reactor fallout (Cs-134, Cs-137, Sr-90)

---

## Development

### Dependencies

```
numpy>=1.24.0
scipy>=1.10.0
pillow>=9.0.0
scikit-learn>=1.3.0
torch>=2.0.0
```

### GPU Support

For Blackwell GPUs (RTX 50-series, sm_120), use PyTorch 2.7+ with CUDA 12.8:

```bash
pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu128
```

### For AI Agents

See [agents.md](agents.md) for comprehensive documentation on system architecture, physics model details, and configuration options.

---

## TODO

- [x] ~~Push to repository~~ - Initial commit with generation system
- [x] ~~Create PyTorch DataLoader for training~~
- [x] ~~Implement CNN-FCNN model architecture (Vega)~~
- [x] ~~Create training script with logging~~
- [x] ~~Implement inference module~~
- [x] ~~Realistic CsI(Tl) background model~~
- [x] ~~Hybrid training with measured background~~
- [ ] Retrain model with realistic background
- [ ] Add model evaluation metrics & confusion matrix
- [ ] Implement real-time inference on Radiacode devices

---

## License

[TBD]

---

## Acknowledgments

- Radiacode for device specifications
- IAEA Nuclear Data Services for isotope data
- NNDC at Brookhaven National Laboratory
- Wang et al. research on CNN-FCNN for gamma spectroscopy