refactor: remove User-Agent dependency from browser detection

SQL changes:
- modern_browser_score: sec-ch-ua→100, Sec-Fetch→70 (no more UA fallback)
- Add has_sec_ch_ua (UInt8) to agg_header_fingerprint_1h and ml_all_scores
- mss_mobile_mismatch uses has_sec_ch_ua instead of modern_browser_score
- header_order_confidence: PARTITION BY ja4 instead of first_ua
- sec_ch_mobile_mismatch: internal Client Hints comparison (no UA)
- Migration 03_remove_ua_browser_detection.sql
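
For illustration, the new modern_browser_score rule can be mirrored in Python as a minimal sketch. The function signature, the precedence of sec-ch-ua over Sec-Fetch, and the default of 0 when neither signal is present are assumptions; the commit only states the two values 100 and 70, and the actual logic lives in the SQL.

```python
def modern_browser_score(has_sec_ch_ua: bool, has_sec_fetch: bool) -> int:
    """Hypothetical Python mirror of the SQL rule; takes no User-Agent input."""
    if has_sec_ch_ua:
        # Client Hints (sec-ch-ua) present: highest score stated in the commit.
        return 100
    if has_sec_fetch:
        # Only Sec-Fetch-* metadata present: second value stated in the commit.
        return 70
    # Assumed default: the commit message does not say what happens here.
    return 0


if __name__ == "__main__":
    for flags in [(True, True), (False, True), (False, False)]:
        print(flags, modern_browser_score(*flags))
```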

Python changes:
- browser.py Axis 3: Client Hints + Sec-Fetch + is_fake_navigation (NO UA)
- Axis weighting: ja4_known 0.30, tls_coherence 0.20 (TLS signals strengthened)
- preprocessing.py: has_sec_ch_ua added to features and binary_features

Files changed: 8 SQL/Python + 1 migration; 36/36 tests pass.
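
The updated test_browser_consistency_score_range sums five binary signals, with has_sec_ch_ua taking the place of modern_browser_score. A minimal per-row sketch of that sum follows; the function name and the dict-based input are hypothetical (the test itself works on a pandas DataFrame), while the keys and thresholds are copied from the test.

```python
def browser_consistency_score(row: dict) -> int:
    """Sum five binary signals into a score in [0, 5] (hypothetical helper)."""
    signals = [
        row["browser_family"] != "",          # browser family is known
        row["has_sec_ch_ua"] > 0,             # Client Hints present (replaces modern_browser_score >= 50)
        row["has_accept_language"] > 0,       # Accept-Language header present
        row["has_cookie"] > 0,                # Cookie header present
        row["sec_fetch_absence_rate"] < 0.5,  # Sec-Fetch headers present on most requests
    ]
    # Each bool counts as 0 or 1 when summed.
    return sum(signals)


if __name__ == "__main__":
    # Same values as the Safari row in the test: no Client Hints, no cookie,
    # and a high Sec-Fetch absence rate.
    safari = {
        "browser_family": "Safari",
        "has_sec_ch_ua": 0,
        "has_accept_language": 1,
        "has_cookie": 0,
        "sec_fetch_absence_rate": 0.6,
    }
    print(browser_consistency_score(safari))
```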

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
toto
2026-04-09 23:06:01 +02:00
parent 00e99e5464
commit 14db3d9040
9 changed files with 101 additions and 38 deletions


@@ -606,23 +606,23 @@ def test_browser_consistency_score_range():
     """browser_consistency_score is in [0, 5] and sums 5 binary signals."""
     df = pd.DataFrame({
         'browser_family': ['Chromium', '', 'Firefox', 'Safari'],
-        'modern_browser_score': [100, 0, 80, 50],
+        'has_sec_ch_ua': [1, 0, 1, 0],
         'has_accept_language': [1, 0, 1, 1],
         'has_cookie': [1, 0, 1, 0],
         'sec_fetch_absence_rate': [0.0, 1.0, 0.1, 0.6],
     })
     is_known = (df['browser_family'] != '').astype(int)
-    mbs_ok = (df['modern_browser_score'] >= 50).astype(int)
+    scu_ok = (df['has_sec_ch_ua'] > 0).astype(int)
     hal_ok = (df['has_accept_language'] > 0).astype(int)
     hck_ok = (df['has_cookie'] > 0).astype(int)
     sfa_ok = (df['sec_fetch_absence_rate'] < 0.5).astype(int)
-    bcs = is_known + mbs_ok + hal_ok + hck_ok + sfa_ok
+    bcs = is_known + scu_ok + hal_ok + hck_ok + sfa_ok
     assert bcs.min() >= 0 and bcs.max() <= 5
     assert bcs.iloc[0] == 5, "Chromium with all signals should score 5"
     assert bcs.iloc[1] == 0, "Empty browser with no signals should score 0"
     assert bcs.iloc[2] == 5, "Firefox with all signals should score 5"
-    assert bcs.iloc[3] == 3, "Safari without cookie and high sec_fetch_absence should score 3"
+    assert bcs.iloc[3] == 2, "Safari without CH/cookie and high sec_fetch_absence should score 2"
 def test_legitimate_browser_classification_threshold():