feat(bot-detector): add XGBoost supervised third voice (#10)

Triple-voice ensemble architecture:
- EIF (unsupervised, zero-day anomalies)
- Autoencoder (unsupervised, non-linear correlations)
- XGBoost (supervised, known patterns + SOC feedback)

XGBoost implementation:
- Trained on historical ml_all_scores labels (NORMAL=0, HIGH/CRITICAL/DENY/KNOWN=1)
- Weekly retraining (XGB_RETRAIN_INTERVAL_H=168), min 100 labels required
- Score = predict_proba, combined via meta-learner: (1-β)*(EIF+AE) + β*xgb_prob
- Configurable: XGB_WEIGHT (β=0.20), XGB_MIN_LABELS, XGB_RETRAIN_INTERVAL_HOURS
- Graceful fallback: if xgboost is unavailable or labels are insufficient, EIF+AE only
- ClickHouse: xgb_prob column added to ml_all_scores
- Tests: 4 new tests (availability, train/predict, meta-learner, save/load)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
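
The meta-learner formula and the fallback rule from the message can be sketched as follows. This is a minimal illustration, not the code in this commit: the function names `combine_scores` and `to_binary_label` are hypothetical, `base_score` stands for the already-combined EIF+AE score, and skipping labels outside the listed set is an assumption.

```python
from typing import Optional

XGB_WEIGHT = 0.20  # beta; default value taken from the commit message

# Labels treated as positive (1) per the commit message; NORMAL is 0.
POSITIVE_LABELS = {"HIGH", "CRITICAL", "DENY", "KNOWN"}


def to_binary_label(label: str) -> Optional[int]:
    """Map a historical ml_all_scores label to a binary training target.

    Assumption: labels outside the listed set are skipped (None).
    """
    if label == "NORMAL":
        return 0
    if label in POSITIVE_LABELS:
        return 1
    return None


def combine_scores(
    base_score: float,
    xgb_prob: Optional[float],
    beta: float = XGB_WEIGHT,
) -> float:
    """Blend the EIF+AE score with the XGBoost probability.

    Implements (1 - beta) * (EIF+AE) + beta * xgb_prob. When xgb_prob is
    None (xgboost unavailable or too few labels), fall back to EIF+AE only.
    """
    if xgb_prob is None:
        return base_score
    return (1.0 - beta) * base_score + beta * xgb_prob
```

With β=0.20, an EIF+AE score of 0.5 and an XGBoost probability of 1.0 blend to 0.6, while a missing XGBoost probability leaves the score at 0.5.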
@@ -96,6 +96,8 @@ CREATE TABLE IF NOT EXISTS ja4_processing.ml_all_scores
     campaign_id Int32,
     -- Autoencoder reconstruction error (parallel scorer)
     ae_recon_error Float32 DEFAULT 0,
+    -- XGBoost supervised probability (third voice)
+    xgb_prob Float32 DEFAULT 0,
     -- Anubis enrichment (deploy_schema.sql item 12)
     anubis_bot_name LowCardinality(String) DEFAULT '',
     anubis_bot_action LowCardinality(String) DEFAULT '',