feat(bot-detector): upgrade to state-of-the-art detection pipeline

- Fix UnboundLocalError on global _consecutive_failures/_service_healthy
- Add SQL identifier validation for DB names at startup
- Replace Z-score drift detection with KS test (scipy.stats.ks_2samp)
- Replace DBSCAN with HDBSCAN (adaptive clustering, no epsilon needed)
- Replace blanket NaN→0 imputation with per-feature median/sentinel strategy
- Add 80/20 temporal train/validation split with offline metrics logging
- Integrate thesis §5 features from view_thesis_features_1h:
  path_transition_entropy, cadence_cv, burst/pause ratios,
  host_diversity, host_sweep_speed, host_coverage_uniformity,
  ja4_drift_ratio (Complet model only)
- Add SOC feedback loop: read classifications from audit_logs,
  reclassify FP IPs as human, exclude TP IPs from baseline
- Update dependencies: clickhouse-connect 0.8.12, scikit-learn 1.6.1,
  pandas 2.2.3, shap 0.47.2, add scipy>=1.14, hdbscan>=0.8.38
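
For reviewers comparing this with the old Z-score check, the KS-based drift
test can be sketched roughly as below. This is a minimal illustration, not
the code in this commit: the function name feature_drifted, the alpha
threshold and the NaN handling are assumptions made for the example. Only
scipy.stats.ks_2samp is taken from the change list above.

```python
import numpy as np
from scipy.stats import ks_2samp


def feature_drifted(baseline, current, alpha=0.01):
    """Hypothetical helper: flag drift between two samples of one feature.

    Uses the two-sample Kolmogorov-Smirnov test, which compares the whole
    empirical distributions instead of only mean and standard deviation.
    """
    baseline = np.asarray(baseline, dtype=float)
    current = np.asarray(current, dtype=float)

    # Assumption: missing values are dropped before testing.
    baseline = baseline[~np.isnan(baseline)]
    current = current[~np.isnan(current)]

    # Without data on both sides there is nothing to compare.
    if baseline.size == 0 or current.size == 0:
        return False

    result = ks_2samp(baseline, current)
    # A small p-value means the samples are unlikely to share a distribution.
    return bool(result.pvalue < alpha)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.normal(0.0, 1.0, 2000)
    shifted = rng.normal(3.0, 1.0, 2000)
    print(feature_drifted(reference, shifted))
    print(feature_drifted(reference, reference))
```

Unlike a Z-score on the mean, this also reacts to changes in shape or
spread, at the cost of needing a reasonably sized sample per window.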

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
toto
2026-04-08 02:09:18 +02:00
parent 6d02f21c1e
commit 3ae8c7d9c9
2 changed files with 205 additions and 20 deletions

@@ -1,6 +1,8 @@
-clickhouse-connect==0.8.0
-pandas==2.2.0
-scikit-learn==1.4.0
-shap==0.44.1
+clickhouse-connect==0.8.12
+pandas==2.2.3
+scikit-learn==1.6.1
+shap==0.47.2
+scipy>=1.14
+hdbscan>=0.8.38
 pyyaml>=6.0
 ja4-common @ file:///app/shared/ja4_common