refactor: suppression dépendance User-Agent de la détection navigateur

Changements SQL :
- modern_browser_score : sec-ch-ua→100, Sec-Fetch→70 (plus de UA fallback)
- Ajout has_sec_ch_ua (UInt8) dans agg_header_fingerprint_1h et ml_all_scores
- mss_mobile_mismatch utilise has_sec_ch_ua au lieu de modern_browser_score
- header_order_confidence : PARTITION BY ja4 au lieu de first_ua
- sec_ch_mobile_mismatch : comparaison Client Hints interne (sans UA)
- Migration 03_remove_ua_browser_detection.sql

Changements Python :
- browser.py Axe 3 : Client Hints + Sec-Fetch + is_fake_navigation (PAS de UA)
- Pondération axes : ja4_known 0.30, tls_coherence 0.20 (signaux TLS renforcés)
- preprocessing.py : has_sec_ch_ua ajouté aux features et binary_features

Fichiers modifiés : 8 SQL/Python + 1 migration, 36/36 tests passent.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This commit is contained in:
toto
2026-04-09 23:06:01 +02:00
parent 00e99e5464
commit 14db3d9040
9 changed files with 101 additions and 38 deletions

View File

@ -38,6 +38,7 @@ WITH base_data AS (
h.header_order_hash AS header_order_hash, h.header_count AS header_count,
h.has_accept_language AS has_accept_language, h.has_cookie AS has_cookie,
h.has_referer AS has_referer, h.modern_browser_score AS modern_browser_score,
h.has_sec_ch_ua AS has_sec_ch_ua,
h.ua_ch_mismatch AS ua_ch_mismatch,
(a.count_post / (a.hits + 1)) AS post_ratio,
(a.uniq_query_params / (a.uniq_paths + 1)) AS fuzzing_index,
@ -46,7 +47,7 @@ WITH base_data AS (
(a.orphan_count / (a.hits + 1)) AS orphan_ratio,
(a.ip_id_zero_count / (a.hits + 1)) AS ip_id_zero_ratio,
(a.hits / (a.unique_conn_id + 1)) AS multiplexing_efficiency,
IF(a.mss_1460_count > (a.hits * 0.8) AND h.modern_browser_score > 70, 1, 0) AS mss_mobile_mismatch,
IF(a.mss_1460_count > (a.hits * 0.8) AND h.has_sec_ch_ua > 0, 1, 0) AS mss_mobile_mismatch,
a.request_size_variance AS request_size_variance,
IF(a.tls_alpn = 'h2' AND a.http_version != '2', 1, 0) AS alpn_http_mismatch,
IF(length(a.tls_alpn) = 0 OR a.tls_alpn = '00', 1, 0) AS is_alpn_missing,
@ -62,7 +63,7 @@ WITH base_data AS (
(sum(a.hits) OVER (PARTITION BY a.ja4, a.src_asn) / (sum(a.hits) OVER (PARTITION BY a.ja4) + 1)) AS ja4_asn_concentration,
(sum(a.hits) OVER (PARTITION BY a.ja4, a.src_country_code) / (sum(a.hits) OVER (PARTITION BY a.ja4) + 1)) AS ja4_country_concentration,
IF(sum(a.hits) OVER (PARTITION BY a.ja4) < 100, 1, 0) AS is_rare_ja4,
(count() OVER (PARTITION BY h.header_order_hash, a.first_ua) / (count() OVER (PARTITION BY a.first_ua) + 1)) AS header_order_confidence,
(count() OVER (PARTITION BY h.header_order_hash, a.ja4) / (count() OVER (PARTITION BY a.ja4) + 1)) AS header_order_confidence,
uniqExact(h.header_order_hash) OVER (PARTITION BY a.src_ip) AS distinct_header_orders,
(a.uniq_paths / (a.hits + 1)) AS path_diversity_ratio,
a.url_depth_variance AS url_depth_variance,
@ -127,7 +128,8 @@ WITH base_data AS (
window_start, src_ip, any(header_order_hash) AS header_order_hash,
max(header_count) AS header_count, max(has_accept_language) AS has_accept_language,
max(has_cookie) AS has_cookie, max(has_referer) AS has_referer,
max(modern_browser_score) AS modern_browser_score, max(ua_ch_mismatch) AS ua_ch_mismatch,
max(modern_browser_score) AS modern_browser_score, max(has_sec_ch_ua) AS has_sec_ch_ua,
max(ua_ch_mismatch) AS ua_ch_mismatch,
any(sec_fetch_mode) AS sec_fetch_mode, any(sec_fetch_dest) AS sec_fetch_dest
FROM ja4_processing.agg_header_fingerprint_1h
WHERE window_start >= now() - INTERVAL 24 HOUR