feat(bot-detector): implement 8 state-of-the-art improvements

- EIF: Extended Isolation Forest via isotree (fallback to sklearn IF)
- Benford's Law deviation feature on inter-request timing
- Lag-1 autocorrelation feature for cadence analysis
- Validation gate: reject model if val_anomaly_rate > 20%
- Feature pruning: remove variance < 1e-6 features before training
- Quantile drift: replace N(μ,σ) synthetic with quantile interpolation
- Thread safety: Lock for _service_healthy/_consecutive_failures
- Score normalization: inverted to [0,1] where 1=most anomalous
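
As an illustration of the two cadence features listed above, here is a minimal pure-Python sketch. The function names, the choice of mean absolute deviation for the Benford feature, and the 0.0 fallbacks are assumptions for illustration; the commit's actual implementation (and its SQL counterpart in view_thesis_features_1h) may differ.

```python
import math
from collections import Counter

# Expected leading-digit frequencies under Benford's Law: P(d) = log10(1 + 1/d)
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(x):
    """First significant digit of a positive number, via scientific notation."""
    return int(f"{x:e}"[0])


def benford_deviation(intervals):
    """Mean absolute gap between observed and Benford leading-digit frequencies.

    Assumption: the deviation metric is a mean absolute difference; the
    commit message does not say which metric is used.
    """
    positive = [x for x in intervals if x > 0]
    if not positive:
        return 0.0  # hypothetical fallback when there is no usable timing data
    counts = Counter(leading_digit(x) for x in positive)
    total = len(positive)
    return sum(abs(counts.get(d, 0) / total - p) for d, p in BENFORD.items()) / 9


def lag1_autocorrelation(intervals):
    """Lag-1 autocorrelation of inter-request intervals, bounded in [-1, 1]."""
    n = len(intervals)
    if n < 3:
        return 0.0  # hypothetical fallback for very short sessions
    mean = sum(intervals) / n
    denom = sum((x - mean) ** 2 for x in intervals)
    if denom == 0.0:
        return 0.0  # perfectly constant cadence: correlation is undefined
    num = sum((intervals[i] - mean) * (intervals[i + 1] - mean) for i in range(n - 1))
    return num / denom
```

Under these assumptions, the lag-1 value always falls in [-1, 1] and the Benford deviation is non-negative, which matches the ranges checked by verify_mvs.py.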

SQL: add lag1_autocorrelation + benford_deviation to view_thesis_features_1h
Tests: 10 new test functions covering all improvements
Integration: verify_mvs.py checks new thesis feature columns
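
The validation gate, feature pruning, and score normalization bullets above could look roughly like the following sketch. Only the two thresholds come from this commit message; the helper names, the min-max scaling, and the convention that lower raw scores mean "more anomalous" are assumptions made for illustration.

```python
from statistics import pvariance

MAX_VAL_ANOMALY_RATE = 0.20   # threshold stated in the commit message
MIN_FEATURE_VARIANCE = 1e-6   # threshold stated in the commit message


def prune_low_variance(rows, names):
    """Drop feature columns whose variance is below MIN_FEATURE_VARIANCE."""
    keep = [
        j for j in range(len(names))
        if pvariance([row[j] for row in rows]) >= MIN_FEATURE_VARIANCE
    ]
    pruned_rows = [[row[j] for j in keep] for row in rows]
    return pruned_rows, [names[j] for j in keep]


def passes_validation_gate(val_is_anomaly):
    """Accept the model only if at most 20% of validation rows are flagged."""
    if not val_is_anomaly:
        return False  # hypothetical choice: no validation data means no acceptance
    rate = sum(val_is_anomaly) / len(val_is_anomaly)
    return rate <= MAX_VAL_ANOMALY_RATE


def normalize_scores(raw):
    """Map raw scores (assumed: lower = more anomalous) to [0, 1], 1 = most anomalous."""
    lo, hi = min(raw), max(raw)
    if hi == lo:
        return [0.0] * len(raw)  # all scores identical: nothing stands out
    return [(hi - x) / (hi - lo) for x in raw]
```

A quick usage example: `normalize_scores([-0.8, -0.5, -0.2])` maps the lowest raw score to 1.0 and the highest to 0.0, so downstream thresholds read "higher is worse".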

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
toto
2026-04-08 02:31:26 +02:00
parent 0d1a6a81e0
commit f6e2d3c0ca
5 changed files with 318 additions and 33 deletions

@@ -230,6 +230,24 @@ def main() -> None:
    else:
        print(f" \033[93m?\033[0m §5.3 cadence features pas de données")
    # Check the new §5.3 columns (lag1_autocorrelation, benford_deviation)
    result = client.query(
        f"SELECT avg(lag1_autocorrelation) AS avg_lag1, avg(benford_deviation) AS avg_benford "
        f"FROM {CLICKHOUSE_DB}.view_thesis_features_1h "
        f"WHERE lag1_autocorrelation IS NOT NULL"
    )
    if result.result_rows:
        avg_lag1 = float(result.result_rows[0][0])
        avg_benford = float(result.result_rows[0][1])
        ok_lag1 = -1.0 <= avg_lag1 <= 1.0
        ok_benford = avg_benford >= 0
        print(f" {'' if ok_lag1 else ''} §5.3 lag1_autocorrelation avg {avg_lag1:.4f} (attendu [-1, 1])")
        print(f" {'' if ok_benford else ''} §5.3 benford_deviation avg {avg_benford:.4f} (attendu >= 0)")
        if not ok_lag1: failures += 1
        if not ok_benford: failures += 1
    else:
        print(f" \033[93m?\033[0m §5.3 lag1/benford features pas de données")
    # Check the §5.5 columns
    result = client.query(
        f"SELECT avg(ja4_drift_ratio) AS avg_drift, "