feat: LEGITIMATE_BROWSER classification from JA4 + behavioral consistency

Add browser legitimacy classification (A9) to the bot detection pipeline:

- New features: is_known_browser (binary) and browser_consistency_score
  [0..5] combining 5 signals: JA4 browser match, modern_browser_score,
  Accept-Language, cookies, Sec-Fetch-* presence
- Post-scoring: sessions with known browser JA4 + consistency >= 4/5 +
  NORMAL/LOW threat level are reclassified as LEGITIMATE_BROWSER
- Spoofing detection: inconsistent behavior (known JA4 but low consistency)
  stays in normal anomaly scoring; this prevents evasion via JA4 spoofing
- XGBoost treats LEGITIMATE_BROWSER as non-threat (negative label)
- ClickHouse: browser_family column added to ml_detected_anomalies and
  ml_all_scores
- Dashboard: browser_family filter/sort on detections and scores endpoints,
  legitimate_browsers count and browser_stats in overview
- 6 new unit tests covering classification threshold, spoofing, exclusion
  logic

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
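
The post-scoring rule described in the commit message can be sketched roughly as below. This is a minimal illustration, not the code from this commit: the `Session` dataclass, its field names, and the helper functions are hypothetical, and it assumes that `modern_browser_score` is reduced to a single pass/fail signal before being counted. Only the 4/5 threshold, the NORMAL/LOW condition, and the LEGITIMATE_BROWSER label come from the message itself.

```python
from dataclasses import dataclass

# Values taken from the commit message.
CONSISTENCY_THRESHOLD = 4          # "consistency >= 4/5"
BENIGN_LEVELS = {"NORMAL", "LOW"}  # threat levels eligible for reclassification


@dataclass
class Session:
    """Hypothetical per-session signals; field names are illustrative only."""
    ja4_matches_browser: bool      # JA4 fingerprint matches a known browser
    modern_browser_score_ok: bool  # assumption: score already thresholded
    has_accept_language: bool
    has_cookies: bool
    has_sec_fetch: bool            # Sec-Fetch-* headers present
    threat_level: str              # level assigned by the anomaly scoring


def browser_consistency_score(s: Session) -> int:
    """Count how many of the five signals are present (0..5)."""
    return sum([
        s.ja4_matches_browser,
        s.modern_browser_score_ok,
        s.has_accept_language,
        s.has_cookies,
        s.has_sec_fetch,
    ])


def classify(s: Session) -> str:
    """Reclassify consistent, low-threat browser sessions; keep others as scored."""
    is_known_browser = s.ja4_matches_browser
    if (
        is_known_browser
        and browser_consistency_score(s) >= CONSISTENCY_THRESHOLD
        and s.threat_level in BENIGN_LEVELS
    ):
        return "LEGITIMATE_BROWSER"
    # Known JA4 with low consistency (possible spoofing) or an elevated
    # threat level stays in the normal anomaly scoring path.
    return s.threat_level
```

Under these assumptions, a session that presents a known browser JA4 but none of the other four signals scores 1/5 and keeps the threat level assigned by the anomaly scoring, which is the spoofing case the message mentions.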
2026-04-08 15:46:22 +02:00
parent 7d09c614c3
commit 9a48fb9d29
4 changed files with 215 additions and 7 deletions
--- a/shared/clickhouse/06_ml_tables.sql
+++ b/shared/clickhouse/06_ml_tables.sql
@@ -24,6 +24,7 @@
 CREATE TABLE IF NOT EXISTS ja4_processing.ml_detected_anomalies
 (
    detected_at DateTime, src_ip IPv6, ja4 String, host String, bot_name String,
+    browser_family LowCardinality(String) DEFAULT '',
    anomaly_score Float32, threat_level String, model_name String, recurrence UInt32,
    asn_number String, asn_org String, asn_detail String, asn_domain String,
    country_code String, asn_label String,
@@ -80,6 +81,7 @@ CREATE TABLE IF NOT EXISTS ja4_processing.ml_all_scores
    ja4               String,
    host              String,
    bot_name          String,
+    browser_family    LowCardinality(String) DEFAULT '',
    anomaly_score     Float32,
    raw_anomaly_score Float32,
    threat_level      String,