Fix: CsI(Tl) non-linear response correction + detector calibration overhaul

Root cause of Am-241 misidentification: the Radiacode 103's CsI(Tl) crystal
shifts low-energy peaks upward (59.5 keV → 71.6 keV for Am-241) due to
non-proportional scintillation response. The model was trained on theoretical
peak positions and couldn't match the shifted real peaks.

Changes:
- Add inverse CsI(Tl) non-linear correction to inference pipeline
  (radiacode_monitor.py, web/config.py, test_detection.py)
    E_apparent = E_true * (1 + 0.37 * exp(-E_true/100))
  Corrects channel mapping so peaks appear at theoretical energies
- Fix energy calibration: DetectorConfig now uses E = 0.33 + 2.97*ch with
  1023 channels, matching the real detector (was energy_min=20,
  skip_first_channel=True, different channel width)
- Add K-escape peaks for CsI(Tl) iodine X-ray escape (E - 28.5 keV)
- Add asymmetric peak shapes for low-energy tails (< 200 keV)
- Add log1p normalization in dataset and inference (replaces max-norm)
- Add background-subtracted training mode (subtract_background flag)
- Add low-signal augmentation (0.01-5 Bq activities, 30-300s durations)
- Update docker-compose.yml: batch_size=32, duration=30-300s,
  CSI_NONLINEAR_ALPHA/BETA env vars for detect and web
- Web dashboard: apply CsI correction to displayed spectra
- Various UI fixes (Chart.js width, zoom/pan, isotope lines)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
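
For reviewers, the forward model and its inverse described in the message can be sketched as follows. This is a minimal illustration, not the code in radiacode_monitor.py: the function names are hypothetical, the inverse uses Newton's method as one possible approach, energies are assumed to be in keV, channel indices are assumed to start at 0, and the constants 0.37 and 100 are assumed to correspond to CSI_NONLINEAR_ALPHA and CSI_NONLINEAR_BETA.

```python
import numpy as np

# Constants quoted in the commit message. Mapping them onto the
# CSI_NONLINEAR_ALPHA / CSI_NONLINEAR_BETA env vars is an assumption.
ALPHA = 0.37
BETA_KEV = 100.0


def apparent_energy(e_true, alpha=ALPHA, beta=BETA_KEV):
    """Forward model: energy (keV) at which a peak at e_true appears."""
    e_true = np.asarray(e_true, dtype=float)
    return e_true * (1.0 + alpha * np.exp(-e_true / beta))


def true_energy(e_apparent, alpha=ALPHA, beta=BETA_KEV, iterations=20):
    """Invert the forward model numerically (hypothetical helper).

    The forward model has no closed-form inverse, so this uses a fixed
    number of Newton steps starting from the apparent energy.
    """
    e_apparent = np.asarray(e_apparent, dtype=float)
    e = e_apparent.copy()  # initial guess: no correction
    for _ in range(iterations):
        decay = alpha * np.exp(-e / beta)
        residual = e * (1.0 + decay) - e_apparent
        slope = 1.0 + decay * (1.0 - e / beta)  # d(apparent)/d(true)
        e = e - residual / slope
    return e


def channel_energies(n_channels=1023, offset=0.33, gain=2.97):
    """Linear calibration E = 0.33 + 2.97*ch, assuming ch starts at 0."""
    return offset + gain * np.arange(n_channels)


if __name__ == "__main__":
    shifted = apparent_energy(59.5)          # Am-241 line
    recovered = true_energy(shifted)
    print(f"apparent: {float(shifted):.1f} keV, "
          f"recovered: {float(recovered):.1f} keV")
```

With these constants the forward model moves the 59.5 keV Am-241 line to roughly 71.6 keV, which agrees with the shift reported above, and the inverse maps it back.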
2026-05-21 17:35:22 +02:00
parent 3b4446b181
commit 0847a3fc80
21 changed files with 913 additions and 278 deletions
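
The effect of replacing max-norm with log1p normalization can be seen on a toy spectrum. The sketch below copies normalize_log1p from the dataset.py change that follows; the input values are made up for illustration and include a negative bin, as could occur after background subtraction.

```python
import numpy as np


def normalize_log1p(spectrum: np.ndarray) -> np.ndarray:
    """Same logic as the function added in dataset.py."""
    log_spec = np.log1p(np.maximum(spectrum, 0))
    max_val = log_spec.max()
    if max_val > 0:
        return log_spec / max_val
    return log_spec


# Illustrative counts only: one dominant peak, one weak peak,
# and a negative bin left over from background subtraction.
spectrum = np.array([-2.0, 0.0, 3.0, 1000.0])

max_normed = spectrum / spectrum.max()
log_normed = normalize_log1p(spectrum)

print("max-norm:", max_normed)
print("log1p   :", log_normed)
```

Under max-norm the weak peak is scaled to 0.003 and the negative bin stays negative. Under log1p the negative bin is clipped to zero and the weak peak keeps a much larger share of the [0, 1] range.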
--- a/train/vega_ml/training/vega/dataset.py
+++ b/train/vega_ml/training/vega/dataset.py
@@ -31,24 +31,38 @@ class SpectrumSample:
    detector: str


+def normalize_log1p(spectrum: np.ndarray) -> np.ndarray:
+    """Log1p normalization: log(1 + x) / max(log(1 + x)).
+
+    Preserves relative signal levels across channels, works well when
+    many channels are zero (e.g. after background subtraction).
+    """
+    log_spec = np.log1p(np.maximum(spectrum, 0))
+    max_val = log_spec.max()
+    if max_val > 0:
+        return log_spec / max_val
+    return log_spec
+
+
 class SpectrumDataset(Dataset):
    """
    PyTorch Dataset for synthetic gamma spectra.
-    
+
    Loads spectra from numpy files and their labels from JSON files.
    Supports both individual JSON files per sample (efficient for large datasets)
    and combined labels.json (legacy format).
-    
+
    Converts to tensors suitable for the Vega model.
    """
-    
+
    def __init__(
        self,
        data_dir: Path,
        isotope_index: Optional[IsotopeIndex] = None,
        max_activity_bq: float = 1000.0,
        collapse_time: bool = True,
-        transform=None
+        transform=None,
+        normalization: str = "log1p"
    ):
        """
        Initialize the dataset.
@@ -66,6 +80,7 @@ class SpectrumDataset(Dataset):
        self.max_activity_bq = max_activity_bq
        self.collapse_time = collapse_time
        self.transform = transform
+        self.normalization = normalization
        
        # Detect label format and load sample list
        self.use_individual_labels = self._detect_label_format()
@@ -156,7 +171,15 @@ class SpectrumDataset(Dataset):
        if self.collapse_time and spectrum.ndim == 2:
            # Average across time intervals to get single spectrum
            spectrum = spectrum.mean(axis=0)
-        
+
+        # Normalize spectrum
+        if self.normalization == "log1p":
+            spectrum = normalize_log1p(spectrum)
+        elif self.normalization == "max":
+            max_val = spectrum.max()
+            if max_val > 0:
+                spectrum = spectrum / max_val
+
        # Convert to tensor
        spectrum_tensor = torch.tensor(spectrum, dtype=torch.float32)