feat(bot-detector): Browser Signature Detection engine (parallel mode)
Step A: browser_signatures.py
Data-only module: BROWSER_SIGNATURES (Chrome/Firefox/Safari), NON_BROWSER_SIGNATURES
(curl/httpx/go), BROWSER_THRESHOLDS, DIMENSION_WEIGHTS. H2 values extracted
from real captures (Akamai format, comma-separated rather than semicolon-separated).
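A minimal sketch of reading such a capture, assuming it follows the four pipe-separated fields of the Akamai HTTP/2 fingerprint (SETTINGS|WINDOW_UPDATE|PRIORITY|pseudo-header order). The parser accepts SETTINGS pairs separated by commas, as in these captures, or by semicolons. `H2Fingerprint` and `parse_h2_fingerprint` are hypothetical names, and the sample strings are illustrative rather than the committed reference values.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class H2Fingerprint:
    """Parsed Akamai-style HTTP/2 fingerprint (hypothetical container)."""
    settings: tuple[tuple[int, int], ...]  # (setting id, value) in wire order
    window_update: int                     # 0 when no WINDOW_UPDATE was sent
    priority: str                          # raw PRIORITY field, "0" when absent
    pseudo_header_order: str               # e.g. "m,a,s,p"


def parse_h2_fingerprint(raw: str) -> H2Fingerprint:
    """Split SETTINGS|WINDOW_UPDATE|PRIORITY|PSEUDO_HEADER_ORDER.

    SETTINGS pairs may be separated by commas or by semicolons.
    """
    fields = raw.strip().split("|")
    if len(fields) != 4:
        raise ValueError(f"expected 4 '|'-separated fields, got {len(fields)}")
    settings_raw, wu_raw, priority, pseudo = fields
    # Accept either separator between "id:value" pairs; skip empty pieces.
    settings = tuple(
        (int(key), int(value))
        for key, value in (
            pair.split(":", 1) for pair in re.split(r"[;,]", settings_raw) if pair
        )
    )
    return H2Fingerprint(
        settings=settings,
        window_update=int(wu_raw) if wu_raw else 0,
        priority=priority,
        pseudo_header_order=pseudo,
    )
```

For example, `parse_h2_fingerprint("1:65536,2:0|15663105|0|m,a,s,p").window_update` is `15663105`.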
Step B: browser_matcher.py
Vectorized 7-dimension engine (H2 SETTINGS 0.30, WINDOW_UPDATE 0.15,
pseudo-header order 0.15, H2 PRIORITY 0.10, HTTP headers 0.15, TLS 0.10,
JA4 dict 0.05). run_browser_matcher(df) adds the bm_family/bm_score/bm_decision columns.
CDN edge case: the H2 dimension is neutralized (set to 0.5) when has_xff=1.
BROWSER_MATCHER_REPLACE=false by default (DUAL_MODE: logging only).
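The weighted scoring above can be sketched as a single matrix-vector product over per-dimension similarities in [0, 1]. This is a sketch under stated assumptions, not the actual browser_matcher.py. The similarity column names, the 0.80/0.50 cut-offs and the `NON_BROWSER` label are hypothetical. The commit only says "the H2 dimension" is neutralized behind a CDN, so treating all four HTTP/2-derived dimensions that way is also an assumption. The weights are the ones listed in the commit and sum to 1.0.

```python
import numpy as np
import pandas as pd

# Weights as listed in the commit message (they sum to 1.0).
DIMENSION_WEIGHTS = {
    "h2_settings": 0.30,
    "window_update": 0.15,
    "pseudo_order": 0.15,
    "h2_priority": 0.10,
    "http_headers": 0.15,
    "tls": 0.10,
    "ja4_dict": 0.05,
}
# Assumption: these four HTTP/2-derived dimensions are the ones neutralized
# when has_xff=1.
H2_DIMENSIONS = ("h2_settings", "window_update", "pseudo_order", "h2_priority")
NEUTRAL = 0.5
# Hypothetical cut-offs; the real values live in BROWSER_THRESHOLDS.
LEGIT_THRESHOLD = 0.80
PARTIAL_THRESHOLD = 0.50


def score_sessions(sims: pd.DataFrame, has_xff: pd.Series) -> pd.DataFrame:
    """Vectorized weighted score over per-dimension similarities in [0, 1].

    `sims` has one (hypothetical) column per key of DIMENSION_WEIGHTS;
    `has_xff` is 1 for sessions seen through a proxy/CDN.
    """
    sims = sims[list(DIMENSION_WEIGHTS)].astype(float).copy()
    behind_cdn = has_xff.to_numpy() == 1
    # CDN edge case: replace H2 similarities with a neutral 0.5.
    sims.loc[behind_cdn, list(H2_DIMENSIONS)] = NEUTRAL
    weights = np.array(list(DIMENSION_WEIGHTS.values()))
    score = sims.to_numpy() @ weights / weights.sum()
    # First matching condition wins, so LEGITIMATE_BROWSER takes precedence.
    decision = np.select(
        [score >= LEGIT_THRESHOLD, score >= PARTIAL_THRESHOLD],
        ["LEGITIMATE_BROWSER", "PARTIAL"],
        default="NON_BROWSER",  # hypothetical label for rejected sessions
    )
    return pd.DataFrame({"bm_score": score, "bm_decision": decision}, index=sims.index)
```

For a session whose four H2 similarities are 0 but whose HTTP-header, TLS and JA4 similarities are 1, the score is 0.30 without X-Forwarded-For and 0.65 with it (0.5 × 0.70 + 0.30). Neutralization therefore moves the session toward the middle instead of rejecting it outright.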
Step C: 06_browser_signature_detection.sql (migration)
Creates browser_h2_signatures (MergeTree table holding 12 reference fingerprints).
Recreates dict_browser_h2 from that table with a confidence field (replaces the CSV source).
Step D: 07_ai_features_view.sql
+h2_wu_val in the http_logs JOIN, +h2_window_update_value, +h2_dict_family,
+h2_dict_confidence, +h2_window_{chrome,firefox,safari,absent},
+h2_order_{chromesafari,firefox}, +h2_priority_present, +h2_pseudo_ord_raw,
+tls_h2_family_mismatch (flags a JA4 family that disagrees with the H2 family).
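The view computes these columns in SQL. For illustration, the same kind of derivation for the window one-hot flags and `tls_h2_family_mismatch` can be sketched in pandas. This is a sketch, not a port of the view. `derive_h2_flags` and the `ja4_family` column are hypothetical. The reference WINDOW_UPDATE values are passed in by the caller rather than hardcoded, since the real ones come from the Step A captures. Treating a missing or zero value as "absent", and flagging a mismatch only when both families are known, are also assumptions.

```python
import pandas as pd


def derive_h2_flags(df: pd.DataFrame, window_refs: dict[str, int]) -> pd.DataFrame:
    """Binary H2 flags in the spirit of the 07_ai_features_view.sql columns.

    `window_refs` maps a browser family to its reference WINDOW_UPDATE value
    (supplied by the caller; no real values are hardcoded here).
    """
    wu = df["h2_window_update_value"]
    flags = pd.DataFrame(index=df.index)
    for family, reference in window_refs.items():
        # NaN never equals the reference, so missing values yield 0 here.
        flags[f"h2_window_{family}"] = (wu == reference).astype("int8")
    # Assumption: a missing or zero value means no WINDOW_UPDATE frame was seen.
    flags["h2_window_absent"] = (wu.isna() | (wu == 0)).astype("int8")

    # Flag a mismatch only when both families are known and they differ.
    ja4_family = df["ja4_family"]  # hypothetical column: family inferred from JA4
    h2_family = df["h2_dict_family"]
    both_known = ja4_family.notna() & h2_family.notna()
    flags["tls_h2_family_mismatch"] = (both_known & (ja4_family != h2_family)).astype("int8")
    return flags
```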
Step E: preprocessing.py + pipeline.py
preprocessing.py: calls run_browser_matcher() after compute_browser_axes() and
adds 7 new binary H2 features to FEATURES and binary_features.
pipeline.py: calls log_dual_mode_comparison() after the A9 classification.
Setting BROWSER_MATCHER_REPLACE=true lets browser_matcher replace the A9 bypass decision.
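The commit does not say where BROWSER_MATCHER_REPLACE is loaded from. Assuming it is an environment variable, here is a minimal sketch of reading it with a `false` default. `env_flag` is a hypothetical helper, and the accepted truthy spellings are an assumption.

```python
import os


def env_flag(name: str, default: bool = False) -> bool:
    """Read a boolean feature flag from the environment (hypothetical helper)."""
    raw = os.environ.get(name)
    if raw is None or raw.strip() == "":
        return default  # unset or empty: keep the default
    # Assumption: these spellings count as "on"; anything else is "off".
    return raw.strip().lower() in {"1", "true", "yes", "on"}


# Defaults to False, i.e. DUAL_MODE logging only.
BROWSER_MATCHER_REPLACE = env_flag("BROWSER_MATCHER_REPLACE", default=False)
```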
Step F: test_browser_matcher.py
8 tests: Chrome/Firefox/Safari full match, curl rejected, httpcloak partial,
TLS↔H2 mismatch, CDN proxy neutralization, go net/http rejected.
All 8 PASSED (+ 36 existing tests, unchanged).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@@ -22,6 +22,7 @@ from .scoring import (
     compute_exiffi_importance, compute_ae_feature_errors, get_meta_learner,
     FINGERPRINT_COHERENCE_THRESHOLD,
 )
+from .browser_matcher import log_dual_mode_comparison, BROWSER_MATCHER_ENABLED, BROWSER_MATCHER_REPLACE
 
 
 # ═══════════════════════════════════════════════════════════════════════════════
@@ -273,6 +274,33 @@ def run_semi_supervised_logic(df, features, name, cycle_id, recurrence_map):
         'axis_means': ax_means,
     })
 
+    # ── A9b: DUAL_MODE, log browser_matcher vs. browser_confidence decisions ──
+    # When BROWSER_MATCHER_REPLACE=true, browser_matcher drives the bypass instead.
+    if BROWSER_MATCHER_ENABLED and 'bm_decision' in unknown_traffic.columns:
+        log_dual_mode_comparison(unknown_traffic, cycle_id, name)
+        if BROWSER_MATCHER_REPLACE:
+            # Apply the matcher's decision (overrides the result of the A9 block above)
+            bm_legit = unknown_traffic['bm_decision'] == 'LEGITIMATE_BROWSER'
+            if bm_legit.any():
+                unknown_traffic.loc[bm_legit, 'threat_level'] = 'LEGITIMATE_BROWSER'
+                unknown_traffic.loc[bm_legit, 'reason'] = (
+                    '[BrowserMatcher] '
+                    + unknown_traffic.loc[bm_legit, 'bm_family'].fillna('Unknown')
+                    + ' (score=' + unknown_traffic.loc[bm_legit, 'bm_score'].round(2).astype(str) + ')'
+                )
+                log_info(
+                    f"[{name}][BrowserMatcher] {bm_legit.sum()} bypass(es) applied "
+                    "(BROWSER_MATCHER_REPLACE=true)"
+                )
+            # Partial-score attenuation for grey-zone sessions
+            bm_partial = unknown_traffic['bm_decision'] == 'PARTIAL'
+            if bm_partial.any():
+                partial_scores = unknown_traffic.loc[bm_partial, 'bm_score'].fillna(0.0)
+                unknown_traffic.loc[bm_partial, 'raw_anomaly_score'] = (
+                    unknown_traffic.loc[bm_partial, 'raw_anomaly_score']
+                    * (1 - 0.5 * partial_scores.values)
+                )
+
     # Capture all scored sessions (before threshold filtering), for ml_all_scores
     all_scored = unknown_traffic.copy()
 
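To see the A9b override in isolation, here is a standalone sketch that restates the REPLACE branch of the hunk on its own DataFrame. `apply_matcher_override` is a hypothetical name. Unlike the pipeline code, it returns a copy instead of mutating `unknown_traffic` in place, and it omits the logging calls. The attenuation multiplies `raw_anomaly_score` by 1 - 0.5 × bm_score, so for bm_score in [0, 1] a PARTIAL session keeps between 50% and 100% of its anomaly score.

```python
import pandas as pd


def apply_matcher_override(traffic: pd.DataFrame) -> pd.DataFrame:
    """Standalone restatement of the A9b REPLACE branch (hypothetical helper)."""
    out = traffic.copy()
    # Full matches: force the bypass and record why.
    legit = out["bm_decision"] == "LEGITIMATE_BROWSER"
    out.loc[legit, "threat_level"] = "LEGITIMATE_BROWSER"
    out.loc[legit, "reason"] = (
        "[BrowserMatcher] "
        + out.loc[legit, "bm_family"].fillna("Unknown")
        + " (score=" + out.loc[legit, "bm_score"].round(2).astype(str) + ")"
    )
    # Grey zone: scale the anomaly score by a factor between 0.5 and 1.0.
    partial = out["bm_decision"] == "PARTIAL"
    factor = 1 - 0.5 * out.loc[partial, "bm_score"].fillna(0.0)
    out.loc[partial, "raw_anomaly_score"] = out.loc[partial, "raw_anomaly_score"] * factor
    return out
```

For example, a PARTIAL session with bm_score = 0.6 and raw_anomaly_score = 10 ends up at 10 × (1 - 0.3) = 7, while sessions with any other decision keep their A9 result.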