feat(ml): replace Autoencoder with RealNVP Normalizing Flow and add SessionTransformer embeddings

Replace TrafficAutoEncoder (MSE reconstruction scoring) with TrafficNormalizingFlow
(RealNVP via FrEIA, 4 affine coupling blocks, anomaly score = -log p(x)) for
exact, likelihood-based density estimation. Add SessionTransformer module producing
32-dimensional sequence embeddings from raw HTTP request sequences (path, method,
timing) via a lightweight TransformerEncoder, replacing path_transition_entropy and
cadence_cv features. Update thesis documentation sections 2.4.2b and 3.8 accordingly.
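
To illustrate the scoring rule named above (anomaly score = -log p(x)), here is a minimal sketch of a RealNVP-style flow in plain NumPy. It is not the FrEIA-based TrafficNormalizingFlow from this commit: the function names, the linear sub-networks, and the random untrained weights are all assumptions made for illustration.

```python
import numpy as np

def coupling_forward(x, w_s, w_t, flip):
    """One affine coupling step (hypothetical helper); returns (z, log|det J|)."""
    d = x.shape[1] // 2
    if flip:
        # Reversing the feature order is a permutation, so its log-det is 0.
        x = x[:, ::-1]
    x1, x2 = x[:, :d], x[:, d:]
    s = np.tanh(x1 @ w_s)            # bounded log-scale, depends on x1 only
    t = x1 @ w_t                     # shift, depends on x1 only
    z2 = x2 * np.exp(s) + t          # x1 passes through unchanged
    return np.concatenate([x1, z2], axis=1), s.sum(axis=1)

def anomaly_score(x, params):
    """-log p(x) via change of variables with a standard normal base density."""
    z = x
    log_det = np.zeros(x.shape[0])
    for k, (w_s, w_t) in enumerate(params):
        z, ld = coupling_forward(z, w_s, w_t, flip=(k % 2 == 1))
        log_det += ld
    dim = x.shape[1]
    log_pz = -0.5 * (z ** 2).sum(axis=1) - 0.5 * dim * np.log(2 * np.pi)
    return -(log_pz + log_det)

# Usage: 4 coupling blocks on 4-dimensional inputs, weights untrained.
rng = np.random.default_rng(0)
params = [(0.1 * rng.normal(size=(2, 2)), 0.1 * rng.normal(size=(2, 2)))
          for _ in range(4)]
scores = anomaly_score(rng.normal(size=(5, 4)), params)
```

Unlike an MSE reconstruction error, this score is a proper negative log-likelihood: the Jacobian term accounts for how each coupling block stretches or compresses the input space.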

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Jacquin Antoine
2026-04-13 15:11:21 +02:00
parent 0e5f94dd0d
commit c1821dcbc4
14 changed files with 515 additions and 3590 deletions

@@ -168,6 +168,22 @@ def fetch_and_analyze():
    except Exception as e:
        log_info(f'[Thesis §5] view_thesis_features_1h inaccessible: {e}; advanced features skipped.')
    # ── §5.2: Sequence Transformer embeddings (replaces path_transition_entropy + cadence_cv)
    try:
        from .session_transformer import extract_sequence_embeddings
        df_embs = extract_sequence_embeddings(df, client)
        if df_embs is not None and not df_embs.empty:
            df = df.merge(df_embs, on=['src_ip', 'ja4', 'host'], how='left')
            for i in range(32):
                col = f'seq_emb_{i}'
                if col in df.columns:
                    df[col] = df[col].fillna(0.0)
            log_info(f'[Transformer §5.2] {len(df_embs)} sessions enriched with 32 sequence embeddings.')
    except Exception as e:
        log_info(f'[Transformer §5.2] Embeddings unavailable: {e}')
        for i in range(32):
            df[f'seq_emb_{i}'] = 0.0
    df = preprocess_df(df)
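
The merge-and-fill step in this hunk can be sketched as a standalone helper. One thing worth noting: in the hunk, the zero columns are created only in the `except` branch, so a `None` or empty result leaves the frame without `seq_emb_*` columns. The variant below always returns all 32 columns. The helper name `attach_sequence_embeddings` and the constants are hypothetical and not part of this commit.

```python
import pandas as pd

EMB_DIM = 32                         # embedding width stated in the commit message
KEYS = ['src_ip', 'ja4', 'host']     # join keys used in the hunk above

def attach_sequence_embeddings(df, df_embs, dim=EMB_DIM):
    """Left-join embeddings onto df and guarantee seq_emb_0..seq_emb_{dim-1}."""
    if df_embs is not None and not df_embs.empty:
        df = df.merge(df_embs, on=KEYS, how='left')
    else:
        df = df.copy()               # avoid mutating the caller's frame
    for i in range(dim):
        col = f'seq_emb_{i}'
        if col in df.columns:
            # Sessions without a matching embedding row get zeros.
            df[col] = df[col].fillna(0.0)
        else:
            # Column missing entirely (no embeddings, or a partial frame).
            df[col] = 0.0
    return df
```

Whether zero is a sensible stand-in for a missing embedding depends on how the downstream model treats it; that choice is taken from the hunk, not evaluated here.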
    # §5: Enrichment with the JA4×ASN fleet score (bipartite fleet detection)