feat(clustering): integrate HTTP header fingerprints (agg_header_fingerprint_1h)

Sources of the new features:
- agg_header_fingerprint_1h: Cookie and Referer per src_ip (JOIN on IPv6)
- ml_detected_anomalies: header_order_shared_count, distinct_header_orders (table already joined)

New features (indices 27-30):
[27] FP Popularity: popularity of the header fingerprint, from header_order_shared_count, normalized as log1p(x)/log1p(500k)
  rare fingerprint (custom-built bot) → 0.0; very popular (browser) → 1.0
[28] FP Rotation: distinct_header_orders, normalized as log1p(x)/log1p(10)
  fingerprint rotation across requests = bot behavior
[29] Cookie Present: presence of the Cookie header, averaged per src_ip (real user engagement)
[30] Referer Present: presence of the Referer header, averaged per src_ip (normal HTTP navigation)
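A minimal Python sketch of how the four new values might be derived from one row of the clustering query. The column names come from `_SQL_COLS` in this diff and the caps come from the list above; the helper names, the clipping to [0, 1] and the mapping of missing values to 0.0 are assumptions of this sketch, not the project's code.

```python
import math
from typing import Mapping, Optional

POPULARITY_CAP = 500_000  # [27] normalized as log1p(x) / log1p(500k)
ROTATION_CAP = 10         # [28] normalized as log1p(x) / log1p(10)


def log_norm(value: Optional[float], cap: float) -> float:
    """Scale a non-negative count with log1p so that `cap` maps to 1.0.

    Missing or non-positive values map to 0.0 and anything above `cap`
    is clipped to 1.0 (both choices are assumptions of this sketch).
    """
    if value is None or value <= 0:
        return 0.0
    return min(math.log1p(value) / math.log1p(cap), 1.0)


def header_fingerprint_features(row: Mapping[str, Optional[float]]) -> list:
    """Hypothetical helper returning features [27]..[30] for one (src_ip, ja4) row."""
    return [
        log_norm(row["hfp_shared_count"], POPULARITY_CAP),   # [27] FP Popularity
        log_norm(row["hfp_distinct_orders"], ROTATION_CAP),  # [28] FP Rotation
        float(row["hfp_cookie"] or 0.0),   # [29] averaged Cookie presence
        float(row["hfp_referer"] or 0.0),  # [30] averaged Referer presence
    ]


features = header_fingerprint_features(
    {"hfp_shared_count": 1, "hfp_distinct_orders": 10, "hfp_cookie": 1.0, "hfp_referer": 0.5}
)
print([round(f, 3) for f in features])  # [0.053, 1.0, 1.0, 0.5]
```

A fingerprint seen on a single IP therefore lands close to 0 (about 0.05), while one shared by 500k IPs or more saturates at 1.0.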

risk_score_from_centroid(): 14 weighted terms, weights summing to 1.0
  + hfp_rare (1 - popularity) × 0.06 + hfp_rotating × 0.06
  ML × 0.25 remains the dominant term
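The weighting can be illustrated with a small Python sketch. Only the three weights quoted above are known; the function names are hypothetical, and the remaining 11 terms are lumped into a single hypothetical "other" bucket (0.63) purely so that the weights sum to 1.0 as the commit states.

```python
import math
from typing import Mapping

# Weights quoted in the commit message; the other 11 terms of
# risk_score_from_centroid() are not part of this diff.
KNOWN_WEIGHTS = {"ml": 0.25, "hfp_rare": 0.06, "hfp_rotating": 0.06}


def hfp_terms(popularity: float, rotation: float) -> dict:
    """The two new terms: rarity is the complement of fingerprint popularity."""
    return {"hfp_rare": 1.0 - popularity, "hfp_rotating": rotation}


def weighted_risk(terms: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted sum of [0, 1] terms; refuses weights that do not sum to 1.0."""
    total = sum(weights.values())
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"weights sum to {total:.2f}, expected 1.0")
    return sum(w * terms.get(name, 0.0) for name, w in weights.items())


# Hypothetical: lump the 11 undisclosed terms into one "other" bucket.
weights = {**KNOWN_WEIGHTS, "other": 0.63}
terms = {"ml": 0.0, "other": 0.0, **hfp_terms(popularity=0.0, rotation=1.0)}
print(round(weighted_risk(terms, weights), 2))  # 0.12
```

Even a maximally rare, maximally rotating fingerprint adds at most 0.12 to the score through these two terms, less than half of the ML weight.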

name_cluster(): 2 new labels
  '🔄 Bot fingerprint tournant' (rotating-fingerprint bot): hfp_rotating > 0.6 and anomaly > 0.15
  '🕵️ Fingerprint rare suspect' (suspicious rare fingerprint): hfp_popular < 0.15 and anomaly > 0.20
  '🌐 Navigateur légitime' (legitimate browser): confirmed by a popular fingerprint
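A minimal sketch of the two threshold rules above, in Python. The function and constant names are hypothetical, and checking the rotation rule before the rarity rule is an assumption; the legitimate-browser rule is left out because its threshold is not stated.

```python
from typing import Optional

ROTATING_LABEL = "🔄 Bot fingerprint tournant"
RARE_LABEL = "🕵️ Fingerprint rare suspect"


def header_fingerprint_label(hfp_rotating: float, hfp_popular: float, anomaly: float) -> Optional[str]:
    """Return one of the two new labels, or None to fall through to other rules."""
    # Checked first (assumption): fingerprint rotation plus a modest anomaly score.
    if hfp_rotating > 0.6 and anomaly > 0.15:
        return ROTATING_LABEL
    # A rare fingerprint requires a slightly higher anomaly score.
    if hfp_popular < 0.15 and anomaly > 0.20:
        return RARE_LABEL
    return None


print(header_fingerprint_label(hfp_rotating=0.7, hfp_popular=0.9, anomaly=0.2))
```

Assuming hfp_popular and hfp_rotating are features [27] and [28], the thresholds translate back to raw counts: hfp_popular < 0.15 means a fingerprint shared by at most 6 IPs (log1p(6)/log1p(500k) ≈ 0.148, log1p(7)/log1p(500k) ≈ 0.158), and hfp_rotating > 0.6 means an average of more than about 3.2 distinct header orders (log1p(3)/log1p(10) ≈ 0.58, log1p(4)/log1p(10) ≈ 0.67).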

N_FEATURES: 27 → 31

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@@ -95,7 +95,18 @@ SELECT
 avg(ml.has_accept_language) AS hdr_accept_lang,
 any(vh.hdr_enc) AS hdr_has_encoding,
 any(vh.hdr_sec_fetch) AS hdr_has_sec_fetch,
-any(vh.hdr_count) AS hdr_count_raw
+any(vh.hdr_count) AS hdr_count_raw,
+
+-- HTTP header fingerprint (from agg_header_fingerprint_1h + ml_detected_anomalies)
+-- header_order_shared_count: number of IPs sharing the same fingerprint
+-- → low = rare fingerprint = suspicious behavior
+avg(ml.header_order_shared_count) AS hfp_shared_count,
+-- distinct_header_orders: number of distinct fingerprints sent by this IP
+-- → high = fingerprint rotation = bot behavior
+avg(ml.distinct_header_orders) AS hfp_distinct_orders,
+-- Cookie and Referer come from the dedicated fingerprint table
+any(hfp.hfp_cookie) AS hfp_cookie,
+any(hfp.hfp_referer) AS hfp_referer
 FROM mabase_prod.agg_host_ip_ja4_1h t
 LEFT JOIN mabase_prod.ml_detected_anomalies ml
 ON t.src_ip = ml.src_ip AND t.ja4 = ml.ja4
@@ -112,6 +123,15 @@ LEFT JOIN (
 AND log_date >= today() - 2
 GROUP BY src_ip_v6, ja4
 ) vh ON t.src_ip = vh.src_ip_v6 AND t.ja4 = vh.ja4
+LEFT JOIN (
+SELECT
+src_ip,
+avg(has_cookie) AS hfp_cookie,
+avg(has_referer) AS hfp_referer
+FROM mabase_prod.agg_header_fingerprint_1h
+WHERE window_start >= now() - INTERVAL %(hours)s HOUR
+GROUP BY src_ip
+) hfp ON t.src_ip = hfp.src_ip
 WHERE t.window_start >= now() - INTERVAL %(hours)s HOUR
 AND t.tcp_ttl_raw > 0
 GROUP BY t.src_ip, t.ja4
@@ -124,6 +144,7 @@ _SQL_COLS = [
 "h2_eff", "hdr_conf", "ua_ch_mismatch", "asset_ratio", "direct_ratio",
 "ja4_count", "ua_rotating", "threat", "country", "asn_org",
 "hdr_accept_lang", "hdr_has_encoding", "hdr_has_sec_fetch", "hdr_count_raw",
+"hfp_shared_count", "hfp_distinct_orders", "hfp_cookie", "hfp_referer",
 ]
 
 
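The four new SELECT expressions and the four names appended to `_SQL_COLS` presumably have to stay in the same order, since query results are positional. A minimal Python sketch, assuming rows come back as tuples (the `%(hours)s` placeholders suggest a pyformat-style driver such as clickhouse-driver, but the diff does not say which client is used); `rows_to_dicts` and `HFP_COLS` are hypothetical names, and the width check would catch a column added to the query but not to the list.

```python
from typing import Any, Dict, List, Sequence

# Last four entries of _SQL_COLS after this commit; the earlier names are elided here.
HFP_COLS = ["hfp_shared_count", "hfp_distinct_orders", "hfp_cookie", "hfp_referer"]


def rows_to_dicts(rows: Sequence[Sequence[Any]], cols: Sequence[str]) -> List[Dict[str, Any]]:
    """Zip positional result rows with column names, failing loudly on a width mismatch."""
    records = []
    for i, row in enumerate(rows):
        if len(row) != len(cols):
            raise ValueError(f"row {i} has {len(row)} values but {len(cols)} column names")
        records.append(dict(zip(cols, row)))
    return records


# Toy rows standing in for the tail of two query results.
rows = [(1200.0, 1.0, 0.9, 0.7), (2.0, 6.0, 0.0, 0.0)]
for rec in rows_to_dicts(rows, HFP_COLS):
    print(rec["hfp_shared_count"], rec["hfp_cookie"])
```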