Commit Graph

6 Commits

Author SHA1 Message Date
aa233bc55c docs(thesis): v3 — corrections + §3.9 browser_matcher + XFF proxy accuracy
User-authored updates verified and corrected:
- Correction 1: 85 features / 8 familles (was 65+/7)
- Correction 2: diagram adds MetaLearner, ExIFFI, fleet.py
- Correction 3: axis 5 weight 0.20→0.15, new axis 6 (H2 Coherence 0.05)
- Correction 4: 5 quantiles (p10-p90), p5/p95 as future work
- Correction 5: §5.6 DNS Shadow + §5.7 Compression Ratio named as future
- New §3.9 Browser Signature Detection (browser_matcher 7 dimensions)
- New §2.4.5 ExIFFI + MetaLearner + KL divergence drift
- New §2.5.3 HTTP/2 fingerprinting passif literature
- Updated §5.2 fleet.py implementation details
- Updated §5.8 cross_domain_path_similarity + Jaccard

Additional fixes (code-accuracy alignment):
- XFF proxy: 4 H2 dimensions neutralized (70% weight), not redistributed
- Module count: 12→13 (browser_signatures.py added)
- §6.5 limitations table: precise proxy weight impact (70%→30%)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-10 19:45:02 +02:00
c77d479d6c docs(thesis): 5 corrections — 85 features, MetaLearner diagram, browser axes note, quantile clarification, §5.6/5.7 named
- Correction 1 (l.65, 701): '65+ features sur 7 familles' → '85 features sur 8 familles'
- Correction 2 (l.374-378): diagramme ASCII bot_detector — ajout MetaLearner, ExIFFI, fleet.py
- Correction 3 (après l.506): note poids axe 5 réduit 0.20→0.15, axe 6 ajouté 0.05
- Correction 4 (l.279): clarification quantiles actuels 5 (p10→p90), p5/p95 = futur
- Correction 5 (l.776): §5.6 (DNS UDP/53) et §5.7 (Apache compression) nommés explicitement

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-10 13:51:31 +02:00
51dd376f7a docs: mise à jour complète — 7/8 techniques, 85 features, 12 modules
Reflète l'état réel du système après les étapes 1-9 du roadmap :

- §5.2 (fleet_detector NetworkX/Louvain) et §5.8 (Jaccard cross-domain) : 
- MetaLearner (régression logistique, fallback poids fixes) : documenté
- ExIFFI (profondeur isolation EIF) + erreur AE par feature : documenté
- KL divergence en complément du KS, drift adversarial : documenté
- HTTP/2 fingerprinting (h2_fingerprint, dict_browser_h2, axis_h2_coherence) : documenté
- Métriques de cycle (metrics.py, ml_performance_metrics, alertes) : documenté
- Browser confidence : 5 axes → 6 axes (axis_h2_coherence)
- 85 features (73 FEATURES + 12 FEATURES_COMPLET), 12 modules, 53 routes dashboard
- Conformité thèse : 99.4% (était 97.9%), §5 : 87.5% (était 62.5%)
- Tables nouvelles : fleet_detections, ml_performance_metrics, soc_feedback
- Dictionnaires : 8 (dict_browser_h2 ajouté)
- Dashboard : 16 pages + 37 API routes (fleet, health ajoutés)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-10 01:31:20 +02:00
7bdc6e2865 docs: mise à jour du document de thèse (§2-§8)
- §2.1.3: Simplifié Anubis à 2 dictionnaires (dict_anubis_ip, dict_anubis_asn) avec priorité COALESCE
- §2.4.2: Ajouté bibliothèque isotree, formule de calibration, ntrees=300, sérialisation joblib
- §2.4.2b/§2.4.4: Remplacé DBSCAN par HDBSCAN partout
- §2.4.2c: Remplacé régression logistique par pondération linéaire fixe, ajouté formule et poids
- §2.4.3: Clarifié approximation par 5 quantiles pour la détection de dérive
- §3.1: Mis à jour le diagramme ASCII (dual-database, 3×EIF+AE+XGB, HDBSCAN, 55 routes)
- §3.8: Mis à jour la trifurcation + ajouté détection multifactorielle navigateur (5 axes)
- §4: Élargi taxonomie de 51 à 65+ features sur 8 familles
- §5: Ajouté statut d'implémentation (/) à chaque technique
- §6: Ajouté §6.6 résultats de déploiement (3M+ logs, 34K sessions/cycle)
- §7: Mis à jour conclusion (65+ features, 5/8 techniques, refactorisation modulaire)
- §8: Ajouté références isotree, PyTorch, HDBSCAN, XGBoost, SHAP

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-09 21:59:34 +02:00
0d1a6a81e0 docs: update thesis with EIF, autoencoders, ensemble architecture, quantile drift
- §2.4.2: Add Extended Isolation Forest theory (Hariri et al., TKDE 2021)
- §2.4.2b: New section on autoencoders for network anomaly detection
  (Kitsune, β-VAE, hybrid AE+IF studies)
- §2.4.2c: New section on hybrid supervised+unsupervised ensembles
  (triple-voice architecture: EIF + AE + XGBoost + meta-learner)
- §2.4.3: Enhanced drift detection with quantile digest and validation gate
- §6.2: Multi-level baseline contamination mitigation
- §7: Updated conclusion reflecting ensemble architecture
- §8: 10 new references (Hariri 2021, Mirsky 2018, Baptiste 2026, etc.)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-08 02:23:00 +02:00
51b8eb57a8 feat: port v14 schema fixes, migration, MV verifier, thesis from ja4/
deploy_views.sql (v13 → v14):
- CRITICAL: ml_detected_anomalies ORDER BY (src_ip) → (src_ip, ja4, host, model_name)
  ReplacingMergeTree was collapsing all detections to 1 row per IP on merge
- Add PARTITION BY toDate + ttl_only_drop_parts on all 4 data tables
- ml_all_scores TTL 3d → 7d; ml_detected_anomalies TTL 30d → 7d
- agg_host_ip_ja4_1h + agg_header_fingerprint_1h: add partition + TTL 7d
- view_ip_recurrence: add WHERE detected_at >= now() - 7 DAY (was full scan)
- Remove dead views: summary/timeseries/threat_dist/variability
- Add view_dashboard_entities (fixes HTTP 500 in clustering/incidents/fingerprints)
- Add view_dashboard_user_agents (fixes HTTP 500 in fingerprints/metrics)
- Add view_ai_features_24h (enables ENABLE_MULTIWINDOW in bot_detector)
- Mark max_requests_per_sec as DEPRECATED (always 0)

New files:
- correlator/sql/migrations/01_ttl_adjustments.sql: ALTER TABLE migration
- tests/integration/verify_mvs.py: MV pipeline verification assertions
- docs/THESIS_HTTP_Traffic_Detection.md: detection techniques thesis

All DB references use ja4_processing/ja4_logs (no mabase_prod).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-07 23:51:56 +02:00