feat(ml): replace NetworkX/Louvain with PyTorch Geometric GraphSAGE for fleet detection

Rewrite fleet.py to use a GNN-based approach: nodes are source IPs (src_ip)
carrying ML feature vectors, and edges connect IPs that share a (JA4, ASN) pair.
A two-layer GraphSAGE encoder (SAGEConv, in→64→32) produces 32-D embeddings,
which are clustered with HDBSCAN. PyG's NeighborLoader (neighbor-sampled
mini-batches) is used for graphs with more than 50k nodes. Update thesis docs
(§5.2, §6.4, §2, §8) to reflect the GraphSAGE architecture and PyG scalability.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Jacquin Antoine
2026-04-13 15:45:34 +02:00
parent c1821dcbc4
commit c6cb12981c
8 changed files with 378 additions and 264 deletions


@@ -206,7 +206,7 @@ All features in families F1–F6 and F8 come from this layer, aggr
|-----|---------|-------------------|
| `view_ai_features_1h` | Main ML features per session/hour | F1–F7, correlated |
| `view_thesis_features_1h` | Advanced temporal features | F8 (Benford, entropy, autocorrelation) |
-| `agg_path_sequences_1h` | Sequences of visited paths | path_transition_entropy, path_diversity |
+| `agg_path_sequences_1h` | Sequences of visited paths | session_transformer_embedding (via raw http_logs), path_diversity |
| `agg_request_timing_1h` | Inter-request timing (ms) | cadence_cv, lag1_autocorrelation, burst_ratio |
| `agg_resource_cascade_1h` | Resource cascade (HTML→CSS→JS→images) | root_to_first_asset_delay, asset_load_stddev |
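The timing features listed for `agg_request_timing_1h` can be illustrated with a short sketch, assuming the common definitions: `cadence_cv` as the coefficient of variation (population standard deviation divided by mean) of inter-request intervals, and `lag1_autocorrelation` as the lag-1 autocorrelation of that interval series. The thesis may define them differently; `burst_ratio` is left out because its definition is not given here, and the function names are hypothetical.

```python
from statistics import fmean, pstdev


def inter_request_intervals(timestamps_ms):
    """Gaps in ms between consecutive requests of one session."""
    ts = sorted(timestamps_ms)
    return [later - earlier for earlier, later in zip(ts, ts[1:])]


def cadence_cv(intervals):
    """Coefficient of variation of the gaps; a value near 0 means very regular pacing."""
    if not intervals:
        return 0.0
    mean = fmean(intervals)
    return pstdev(intervals) / mean if mean > 0 else 0.0


def lag1_autocorrelation(intervals):
    """Correlation between each gap and the next one (assumed definition)."""
    if len(intervals) < 2:
        return 0.0
    mean = fmean(intervals)
    dev = [v - mean for v in intervals]
    denom = sum(d * d for d in dev)
    if denom == 0:
        return 0.0  # constant series: autocorrelation is undefined, report 0
    return sum(a * b for a, b in zip(dev, dev[1:])) / denom


if __name__ == "__main__":
    gaps = inter_request_intervals([0, 100, 400, 500, 800])  # alternating 100/300 ms
    print(gaps, cadence_cv(gaps), lag1_autocorrelation(gaps))
```

A strictly alternating rhythm, as in the usage line, gives a strongly negative lag-1 value, while a metronome-like client drives `cadence_cv` toward 0.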