feat(bot-detector): add parallel Autoencoder scorer (#9)

- TrafficAutoEncoder class: symmetric AE (n→64→32→16→32→64→n) with BatchNorm+ReLU
- Trained alongside EIF on human_baseline; saved/loaded with model versioning
- Score = per-sample MSE reconstruction error, combined with EIF via AE_WEIGHT (α=0.30)
- AE latent space (16-dim) used for HDBSCAN clustering instead of raw features
- Configurable: AE_WEIGHT, AE_EPOCHS, AE_LATENT_DIM, AE_LEARNING_RATE
- Graceful fallback: if torch is unavailable or the AE fails, EIF-only scoring continues
- ClickHouse: ae_recon_error column added to ml_all_scores
- Tests: 5 new tests (AE train/score, encode latent, state dict save/load, weight combination)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
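The scoring bullets (per-sample MSE, α-weighted blend with EIF, EIF-only fallback) can be illustrated with a minimal, dependency-free sketch. This sketch makes several assumptions that this commit does not confirm. It assumes a linear blend `(1 - α) * eif + α * ae`. It assumes min-max rescaling of the unbounded MSE to [0, 1]. It assumes EIF scores already lie in [0, 1], as the standard isolation-forest anomaly score does. The helper names `recon_mse`, `minmax` and `combine_scores` are hypothetical and do not come from the actual module.

```python
from typing import List, Optional, Sequence

AE_WEIGHT = 0.30  # α, the default quoted in the commit message


def recon_mse(x: Sequence[Sequence[float]],
              x_hat: Sequence[Sequence[float]]) -> List[float]:
    """Per-sample mean squared error between inputs and AE reconstructions."""
    errors = []
    for row, row_hat in zip(x, x_hat):
        if len(row) != len(row_hat):
            raise ValueError("feature dimension mismatch")
        errors.append(sum((a - b) ** 2 for a, b in zip(row, row_hat)) / len(row))
    return errors


def minmax(values: Sequence[float]) -> List[float]:
    """Rescale to [0, 1] (assumed normalization; MSE itself is unbounded)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant errors carry no signal
    return [(v - lo) / (hi - lo) for v in values]


def combine_scores(eif_scores: Sequence[float],
                   ae_errors: Optional[Sequence[float]],
                   alpha: float = AE_WEIGHT) -> List[float]:
    """Blend EIF and AE scores; fall back to EIF-only when the AE is absent."""
    if ae_errors is None:  # e.g. torch not installed or AE training failed
        return list(eif_scores)
    if len(ae_errors) != len(eif_scores):
        raise ValueError("score length mismatch")
    ae_norm = minmax(ae_errors)
    return [(1 - alpha) * e + alpha * a for e, a in zip(eif_scores, ae_norm)]
```

Rescaling matters because raw reconstruction error has no fixed upper bound. Without it, a single badly reconstructed sample would swamp the EIF term regardless of α. The real implementation may use a different normalization, for example a percentile or a baseline-derived scale.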
2026-04-08 02:40:39 +02:00
parent f6e2d3c0ca
commit 57cf6c3828
4 changed files with 378 additions and 12 deletions
--- a/shared/clickhouse/06_ml_tables.sql
+++ b/shared/clickhouse/06_ml_tables.sql
@@ -94,6 +94,8 @@ CREATE TABLE IF NOT EXISTS ja4_processing.ml_all_scores
    fuzzing_index     Float32,
    post_ratio        Float32,
    campaign_id       Int32,
+    -- Autoencoder reconstruction error (parallel scorer)
+    ae_recon_error    Float32 DEFAULT 0,
    -- Anubis enrichment (deploy_schema.sql item 12)
    anubis_bot_name     LowCardinality(String) DEFAULT '',
    anubis_bot_action   LowCardinality(String) DEFAULT '',