feat(clustering): integrate HTTP header fingerprint (agg_header_fingerprint_1h)

Sources of the new features:
- agg_header_fingerprint_1h: Cookie, Referer per src_ip (JOIN on IPv6)
- ml_detected_anomalies: header_order_shared_count, distinct_header_orders (already joined)

New features (indices 27-30):
  [27] FP Popularity  : popularity of the header fingerprint (log1p(x)/log1p(500k))
       rare fingerprint (hand-crafted bot) → 0.0; very popular (browser) → 1.0
  [28] FP Rotation    : distinct_header_orders (log1p(x)/log1p(10))
       fingerprint rotation across requests = bot behavior
  [29] Cookie Present : Cookie header present (real user engagement)
  [30] Referer Present: Referer header present (normal HTTP navigation)
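
The normalization of features 27-30 can be sketched as follows. This is a minimal illustration, not the committed code: the helper names (`log_scale`, `clip01`, `header_fingerprint_features`), the clipping to [0, 1], and the treatment of NULLs from the LEFT JOINs as 0 are assumptions.

```python
import math


def clip01(x):
    """Clamp a ratio to [0, 1]; None (no joined row) is treated as 0 (assumption)."""
    return min(1.0, max(0.0, x or 0.0))


def log_scale(value, cap):
    """Scale a non-negative count as log1p(value) / log1p(cap), clipped to [0, 1]."""
    value = value or 0.0
    if value <= 0:
        return 0.0
    return min(1.0, math.log1p(value) / math.log1p(cap))


def header_fingerprint_features(shared_count, distinct_orders, cookie, referer):
    """Return the four values for feature indices 27 to 30, in order."""
    return [
        log_scale(shared_count, 500_000),  # [27] FP Popularity
        log_scale(distinct_orders, 10),    # [28] FP Rotation
        clip01(cookie),                    # [29] Cookie Present
        clip01(referer),                   # [30] Referer Present
    ]
```

With this scaling, a fingerprint seen on a handful of IPs lands near 0.0 and one shared by roughly 500k IPs reaches 1.0, which matches the intent described for feature 27.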

risk_score_from_centroid(): 14 terms, sum=1.0
  + hfp_rare (1-popularity) × 0.06 + hfp_rotating × 0.06
  ML × 0.25 remains dominant
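
As a rough sketch of how the two new terms enter the weighted sum: only the three weights quoted above come from this message. The function name `hfp_risk_contribution` and the constant names are hypothetical, and the 11 other terms are not listed here, so they appear only as the remaining weight.

```python
# Weights stated in the commit message.
W_ML = 0.25
W_HFP_RARE = 0.06
W_HFP_ROTATING = 0.06

# Weight left for the 11 other terms (not detailed in this message),
# given that all 14 weights sum to 1.0.
W_OTHER_TERMS = 1.0 - (W_ML + W_HFP_RARE + W_HFP_ROTATING)


def hfp_risk_contribution(hfp_popularity, hfp_rotation):
    """Share of the risk score contributed by the two header-fingerprint terms.

    Both inputs are expected in [0, 1]; rarity is taken as 1 - popularity.
    """
    hfp_rare = 1.0 - hfp_popularity
    return W_HFP_RARE * hfp_rare + W_HFP_ROTATING * hfp_rotation
```

Under these weights the fingerprint terms add at most 0.12 to the score, less than half of the ML term.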

name_cluster(): 2 new labels
  '🔄 Bot fingerprint tournant' (rotating-fingerprint bot): hfp_rotating>0.6 + anomaly>0.15
  '🕵️ Fingerprint rare suspect' (suspicious rare fingerprint): hfp_popular<0.15 + anomaly>0.20
  '🌐 Navigateur légitime' (legitimate browser): popular fingerprint confirmed
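
The two threshold rules above could look like this in isolation. This is a sketch under assumptions: the function name `header_fingerprint_label`, the order in which the rules are checked, and the `None` fallback are illustrative. The '🌐 Navigateur légitime' rule is left out because its thresholds are not given in this message.

```python
def header_fingerprint_label(hfp_popular, hfp_rotating, anomaly):
    """Return a header-fingerprint cluster label, or None if no rule fires.

    All inputs are centroid values in [0, 1].
    """
    # Rotating fingerprints combined with a moderate anomaly score.
    if hfp_rotating > 0.6 and anomaly > 0.15:
        return "🔄 Bot fingerprint tournant"
    # Rare fingerprint combined with a slightly higher anomaly score.
    if hfp_popular < 0.15 and anomaly > 0.20:
        return "🕵️ Fingerprint rare suspect"
    return None
```

Checking rotation first means a centroid that is both rare and rotating is reported as a rotating bot; the real precedence inside name_cluster() may differ.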

N_FEATURES: 27 → 31

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
SOC Analyst
2026-03-19 11:13:37 +01:00
parent 8fb054c8b7
commit 6ff59a36d7
2 changed files with 114 additions and 66 deletions

@@ -95,7 +95,18 @@ SELECT
     avg(ml.has_accept_language) AS hdr_accept_lang,
     any(vh.hdr_enc) AS hdr_has_encoding,
     any(vh.hdr_sec_fetch) AS hdr_has_sec_fetch,
-    any(vh.hdr_count) AS hdr_count_raw
+    any(vh.hdr_count) AS hdr_count_raw,
+    -- HTTP header fingerprint (from agg_header_fingerprint_1h + ml_detected_anomalies)
+    -- header_order_shared_count: number of IPs sharing the same fingerprint
+    --   → low = rare fingerprint = suspicious behavior
+    avg(ml.header_order_shared_count) AS hfp_shared_count,
+    -- distinct_header_orders: number of distinct fingerprints emitted by this IP
+    --   → high = fingerprint rotation = bot behavior
+    avg(ml.distinct_header_orders) AS hfp_distinct_orders,
+    -- Cookie and Referer come from the dedicated fingerprint table
+    any(hfp.hfp_cookie) AS hfp_cookie,
+    any(hfp.hfp_referer) AS hfp_referer
 FROM mabase_prod.agg_host_ip_ja4_1h t
 LEFT JOIN mabase_prod.ml_detected_anomalies ml
     ON t.src_ip = ml.src_ip AND t.ja4 = ml.ja4
@@ -112,6 +123,15 @@ LEFT JOIN (
         AND log_date >= today() - 2
     GROUP BY src_ip_v6, ja4
 ) vh ON t.src_ip = vh.src_ip_v6 AND t.ja4 = vh.ja4
+LEFT JOIN (
+    SELECT
+        src_ip,
+        avg(has_cookie) AS hfp_cookie,
+        avg(has_referer) AS hfp_referer
+    FROM mabase_prod.agg_header_fingerprint_1h
+    WHERE window_start >= now() - INTERVAL %(hours)s HOUR
+    GROUP BY src_ip
+) hfp ON t.src_ip = hfp.src_ip
 WHERE t.window_start >= now() - INTERVAL %(hours)s HOUR
   AND t.tcp_ttl_raw > 0
 GROUP BY t.src_ip, t.ja4
@@ -124,6 +144,7 @@ _SQL_COLS = [
     "h2_eff", "hdr_conf", "ua_ch_mismatch", "asset_ratio", "direct_ratio",
     "ja4_count", "ua_rotating", "threat", "country", "asn_org",
     "hdr_accept_lang", "hdr_has_encoding", "hdr_has_sec_fetch", "hdr_count_raw",
+    "hfp_shared_count", "hfp_distinct_orders", "hfp_cookie", "hfp_referer",
 ]