feat(bot-detector): Browser Signature Detection engine (parallel mode)
Step A: browser_signatures.py
Pure data: BROWSER_SIGNATURES (Chrome/Firefox/Safari), NON_BROWSER_SIGNATURES
(curl/httpx/go), BROWSER_THRESHOLDS, DIMENSION_WEIGHTS. H2 values are taken
from real captures (Akamai format, comma-separated rather than semicolon-separated).
Step B: browser_matcher.py
Vectorized 7-dimension engine (H2 SETTINGS 0.30, WINDOW_UPDATE 0.15,
pseudo-header order 0.15, H2 PRIORITY 0.10, HTTP headers 0.15, TLS 0.10,
JA4 dict 0.05; the weights sum to 1.00). run_browser_matcher(df) adds
bm_family/bm_score/bm_decision.
CDN edge case: the H2 dimension is neutralized (set to 0.5) when has_xff=1.
BROWSER_MATCHER_REPLACE=false by default (DUAL_MODE: logging only).
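
The weighted scoring described in Step B could look roughly like the sketch below. It is illustrative only, not the code in browser_matcher.py: the weights come from this message, but the column names, the H2_DIMENSIONS grouping, and the weighted_match_score helper are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Weights as listed in the commit message (they sum to 1.0).
# The dictionary keys are hypothetical column names.
DIMENSION_WEIGHTS = {
    "h2_settings": 0.30,
    "h2_window_update": 0.15,
    "h2_pseudo_order": 0.15,
    "h2_priority": 0.10,
    "http_headers": 0.15,
    "tls": 0.10,
    "ja4_dict": 0.05,
}

# Assumption: these four columns make up the "H2 dimension" that is
# neutralized when the request arrives through a CDN or proxy.
H2_DIMENSIONS = ["h2_settings", "h2_window_update", "h2_pseudo_order", "h2_priority"]


def weighted_match_score(dim_scores: pd.DataFrame) -> pd.Series:
    """Combine per-dimension match scores (each in [0, 1]) into one score per row.

    dim_scores needs one column per key of DIMENSION_WEIGHTS plus 'has_xff'.
    """
    scores = dim_scores[list(DIMENSION_WEIGHTS)].astype(float).copy()
    behind_proxy = dim_scores["has_xff"].to_numpy() == 1
    # CDN edge case: H2 evidence is replaced by a neutral 0.5 for proxied rows.
    scores.loc[behind_proxy, H2_DIMENSIONS] = 0.5
    weights = np.array(list(DIMENSION_WEIGHTS.values()))
    # One matrix-vector product scores every row at once (no Python loop).
    return pd.Series(scores.to_numpy() @ weights, index=dim_scores.index, name="bm_score")
```

With this grouping, a proxied row whose other dimensions all match perfectly tops out at 0.70 * 0.5 + 0.30 = 0.65, so thresholds would need to account for the neutralized H2 share.
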
Step C: 06_browser_signature_detection.sql (migration)
Creates browser_h2_signatures (MergeTree table with 12 reference fingerprints).
Recreates dict_browser_h2 from that table with a confidence field (replaces the CSV).
Step D: 07_ai_features_view.sql
+h2_wu_val in the http_logs JOIN, +h2_window_update_value, +h2_dict_family,
+h2_dict_confidence, +h2_window_{chrome,firefox,safari,absent},
+h2_order_{chromesafari,firefox}, +h2_priority_present, +h2_pseudo_ord_raw,
+tls_h2_family_mismatch (detects a mismatch between the JA4 family and the H2 family).
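
As an illustration of the tls_h2_family_mismatch signal from Step D, here is a pandas sketch of one plausible definition. The view itself is SQL and is not shown in this commit page, so the rule used here (flag only when both families are known and differ) and the function and argument names are assumptions.

```python
import pandas as pd


def tls_h2_family_mismatch(ja4_family: pd.Series, h2_family: pd.Series) -> pd.Series:
    """Return 1 where the TLS (JA4) family and the H2 family are both known but differ."""
    # Assumption: a missing or empty family means "unknown" and is never flagged.
    ja4_known = ja4_family.notna() & (ja4_family != "")
    h2_known = h2_family.notna() & (h2_family != "")
    differs = ja4_family != h2_family
    return (ja4_known & h2_known & differs).astype(int)
```

Under this definition a client presenting a Firefox-like JA4 with Chrome-like H2 SETTINGS is flagged, while a client with no H2 fingerprint at all is left to the other features.
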
Step E: preprocessing.py + pipeline.py
preprocessing.py: calls run_browser_matcher() after _compute_browser_axes() when
BROWSER_MATCHER_ENABLED is set, and adds 7 new binary H2 features to FEATURES and
binary_features (binary_features additionally lists h2_priority_present).
pipeline.py: calls log_dual_mode_comparison() after the A9 classification.
BROWSER_MATCHER_REPLACE=true enables replacement of the browser_confidence-driven bypass.
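
The dual-mode switch from Steps B and E can be pictured with the sketch below. The helpers env_flag and bypass_decision are hypothetical names for this example, and the accepted truthy strings are an assumption; the actual parsing in browser_matcher.py is not shown in this commit page.

```python
import os
from typing import Mapping, Optional


def env_flag(name: str, default: bool = False,
             environ: Optional[Mapping[str, str]] = None) -> bool:
    """Read a boolean flag from the environment (or from a supplied mapping)."""
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None:
        return default
    # Assumption: these spellings count as "true"; anything else is false.
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def bypass_decision(confidence_bypass: bool, matcher_bypass: bool, replace: bool) -> bool:
    """Pick which detector drives the bypass.

    replace=False (dual mode): the legacy browser_confidence result wins and
    the matcher result is only available for logging.
    replace=True: the matcher result takes over.
    """
    return matcher_bypass if replace else confidence_bypass
```

For example, `bypass_decision(True, False, replace=env_flag("BROWSER_MATCHER_REPLACE"))` keeps the legacy result unless the flag is set to a truthy value.
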
Step F: test_browser_matcher.py
8 tests: Chrome/Firefox/Safari full match, curl rejected, httpcloak partial match,
TLS↔H2 mismatch, CDN proxy neutralization, go net/http rejected.
All 8 PASSED (+ 36 existing tests unchanged).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@@ -9,6 +9,7 @@ import numpy as np
 from .config import BROWSER_CONFIDENCE_THRESHOLD
 from .log import log_info
 from .browser import _compute_browser_axes, _parse_ja4_columns, _infer_browser_family
+from .browser_matcher import run_browser_matcher, log_dual_mode_comparison, BROWSER_MATCHER_ENABLED
 
 
 # ═══════════════════════════════════════════════════════════════════════════════
@@ -52,6 +53,10 @@ FEATURES = [
     # §2: HTTP/2 features (SETTINGS fingerprint, H2↔JA4 coherence)
     'h2_settings_known', 'h2_pseudo_order_match',
     'h2_ja4_coherence', 'h2_settings_rare',
+    # §4: Atomic H2 signals for the browser_matcher (Family 4: cross-layer coherence)
+    'tls_h2_family_mismatch',
+    'h2_window_chrome', 'h2_window_firefox', 'h2_window_safari', 'h2_window_absent',
+    'h2_order_chromesafari', 'h2_order_firefox',
     # §3: Cross-layer fingerprint coherence score
     'fingerprint_coherence_score',
 ]
@@ -92,6 +97,13 @@ def preprocess_df(df: pd.DataFrame) -> pd.DataFrame:
                'axis_nav_behavior', 'axis_tls_coherence', 'axis_h2_coherence']:
         df[ax] = browser_axes[ax]
 
+    # ── A9b: Browser Signature Matcher (runs in parallel with browser_confidence) ─────
+    # In DUAL_MODE (BROWSER_MATCHER_REPLACE=false), the bm_* columns are
+    # added for logging only; the bypass stays driven by
+    # browser_confidence until validation is complete.
+    if BROWSER_MATCHER_ENABLED:
+        df = run_browser_matcher(df)
+
     # Backward compatibility
     df['is_known_browser'] = browser_axes['axis_ja4_known'].astype(int)
     df['browser_consistency_score'] = (
@@ -117,6 +129,10 @@ def preprocess_df(df: pd.DataFrame) -> pd.DataFrame:
         'is_fake_navigation', 'has_xff', 'sec_ch_mobile_mismatch',
         # §2: Binary HTTP/2 features
         'h2_settings_known', 'h2_pseudo_order_match', 'h2_ja4_coherence', 'h2_settings_rare',
+        # §4: Binary atomic H2 signals
+        'tls_h2_family_mismatch',
+        'h2_window_chrome', 'h2_window_firefox', 'h2_window_safari', 'h2_window_absent',
+        'h2_order_chromesafari', 'h2_order_firefox', 'h2_priority_present',
     }
     for col in df.columns:
         if col in binary_features: