ja4-platform

Author	SHA1	Message	Date
toto	85d3b95b7b	feat: HTTP/2 passive fingerprinting with individual SETTINGS fields Complete implementation of HTTP/2 passive fingerprinting per thesis §2.5.3: mod-reqin-log (C module): - Replace connection-level filter with ap_hook_process_connection (APR_HOOK_FIRST) to capture H2 preface before mod_http2 takes over the connection - AP_MODE_SPECULATIVE read of 512 bytes from c->input_filters - Parse SETTINGS, WINDOW_UPDATE, PRIORITY flags, pseudo-header order - Output individual SETTINGS params as separate JSON fields (IDs 1-6, 8) - Read H2 notes from c1 (master connection) for mod_http2 secondary conns - Fix header_order_signature JSON length bug (26→strlen) ClickHouse schema: - Add 8 new columns to http_logs: h2_has_priority, h2_header_table_size, h2_enable_push, h2_max_concurrent_streams, h2_initial_window_size, h2_max_frame_size, h2_max_header_list_size, h2_enable_connect_protocol - Use Int32/Int64 with DEFAULT -1 to distinguish absent vs zero - Update mv_http_logs to extract individual fields via JSONHas/JSONExtractInt - Migration 04_http2_fields.sql updated for existing deployments Correlator: - Accept both timestamp_ns and timestamp field names (backward compat) Integration: - Enable HTTP/2 in Apache: Protocols h2 http/1.1 in httpd-integration.conf Validated end-to-end via Playwright: H2 curl traffic → mod-reqin-log → correlator → ClickHouse with all 12 H2 columns populated correctly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-11 02:33:45 +02:00
toto	bd81331411	Update thesis	2026-04-11 00:27:20 +02:00
toto	8da1b7d8e6	tests/integration/platform/csv-stubs/browser_h2.csv	2026-04-10 23:13:35 +02:00
toto	aa233bc55c	docs(thesis): v3, corrections + §3.9 browser_matcher + XFF proxy accuracy User-authored updates verified and corrected: - Correction 1: 85 features / 8 families (was 65+/7) - Correction 2: diagram adds MetaLearner, ExIFFI, fleet.py - Correction 3: axis 5 weight 0.20→0.15, new axis 6 (H2 Coherence 0.05) - Correction 4: 5 quantiles (p10-p90), p5/p95 as future work - Correction 5: §5.6 DNS Shadow + §5.7 Compression Ratio named as future - New §3.9 Browser Signature Detection (browser_matcher 7 dimensions) - New §2.4.5 ExIFFI + MetaLearner + KL divergence drift - New §2.5.3 passive HTTP/2 fingerprinting literature - Updated §5.2 fleet.py implementation details - Updated §5.8 cross_domain_path_similarity + Jaccard Additional fixes (code-accuracy alignment): - XFF proxy: 4 H2 dimensions neutralized (70% weight), not redistributed - Module count: 12→13 (browser_signatures.py added) - §6.5 limitations table: precise proxy weight impact (70%→30%) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-10 19:45:02 +02:00
toto	d098de1a66	fix(bot-detector): neutralize H2 dimensions behind proxy (X-Forwarded-For) When has_xff=1, the H2 connection is terminated by the reverse proxy/CDN, so client H2 fingerprints are lost. Previously only D1 (h2_settings) was neutralized; D2 (window_update), D3 (pseudo_order), and D4 (priority) still penalized proxied traffic — a real Chrome behind Cloudflare scored 0.0 on 3 dimensions (45% of total weight). Now all 4 H2 dimensions return 0.5 (neutral) when has_xff>0, and non-browser H2 detection is also disabled behind proxies. Tests: 10/10 passed including 3 new XFF-specific cases. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-10 15:15:20 +02:00
toto	261205028d	fix(dashboard): campaigns scatter chart shows campaigns, not IPs - API /api/campaigns/scatter: aggregate by campaign_id instead of per-IP Returns avg_score, avg_velocity, unique_ips, ja4_list, asn_list, country_list - Template: one bubble per campaign, sized by IP count - Tooltip: campaign-level info (IPs, score, velocity, ASNs, countries, JA4s) - Click navigates to campaign detail (not IP detail) - Updated doc panel text Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-10 15:09:02 +02:00
toto	fb73c60e7d	feat(dashboard): fingerprint discovery page — extract and group JA4/H2/headers from traffic - GET /api/fingerprint-discovery: queries http_logs, groups by JA4, aggregates UA family, header presence rates (Sec-CH-UA, Sec-Fetch, Accept-Language, zstd, brotli, gzip, XFF), H2 data, TLS info, dict lookups - /fingerprints page: KPIs, doughnut chart by family, stacked header bars, filterable/sortable profile table, expandable detail panel - Promote button: push H2 fingerprints to browser_h2_signatures via existing POST /api/browser-signatures/entries endpoint - Nav link: Découverte added after Navigateurs in sidebar Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-10 15:02:53 +02:00
toto	fde6864311	feat(dashboard): browser signatures management UI - Add dict_browser_h2 to /reflists (read-only via dict_browser_h2) - New API endpoints: GET /api/browser-signatures/entries: list browser_h2_signatures (falls back to the CSV dict if migration 06 is not applied) POST /api/browser-signatures/entries: add a fingerprint + reload dict DELETE /api/browser-signatures/entries: delete + reload dict - /browsers page: 2 new sections 'Base de signatures H2': table of the 10 fingerprints, add form, automatic read-only mode if migration 06 is not applied 'Règles de scoring browser_matcher.py': static table of the 7 dimensions (weights, per-family values, bypass thresholds) - Integration: browser_h2.csv copied into user_files at ClickHouse startup Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-10 14:46:07 +02:00
toto	da1b579d4f	fix(dashboard): rename duplicate /api/browsers route to /api/browser-signatures The /api/browsers route already existed (JA4 distribution by family). The new browser_matcher route conflicted with it, and FastAPI used the first definition. Renamed to /api/browser-signatures. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-10 14:17:38 +02:00
toto	9c308747bd	feat(dashboard): Browser Signature Detection page (/browsers) New page dedicated to passive analysis of browser signatures (§4): API - GET /api/browsers: Queries view_ai_features_1h for: - Global counters (total, sessions_with_h2, matched, mismatch %) - h2_dict_family distribution (Chrome/Firefox/Safari/Edge) - Breakdown of WINDOW_UPDATE signals (chrome/firefox/safari/absent/other) - TLS↔H2 mismatch per JA4 family (total + count + %) - Top 20 suspicious sessions (tls_h2_family_mismatch=1, sorted by hits) /browsers page: - 6 header KPIs (sessions, with H2, known family, match rate, mismatch, % mismatch) - Doc banner explaining browser_matcher §4 and DUAL_MODE - Donut: H2 families (dict_browser_h2 lookup) - Horizontal bar: WINDOW_UPDATE signals per family - Grouped bar + line: TLS↔H2 mismatch per JA4 family (count + %) - Table: top 20 potential impostors with clickable IP, pseudo-order, coherence - Mini-KPIs: Chrome/Safari, Firefox and unknown pseudo-header orders, PRIORITY frames - 'Navigateurs' nav link in the Surveillance group of base.html Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-10 14:02:39 +02:00
toto	e52cdcc01f	feat(bot-detector): Browser Signature Detection engine (parallel mode) Step A - browser_signatures.py Pure data: BROWSER_SIGNATURES (Chrome/Firefox/Safari), NON_BROWSER_SIGNATURES (curl/httpx/go), BROWSER_THRESHOLDS, DIMENSION_WEIGHTS. H2 values extracted from real captures (Akamai format with commas, not semicolons). Step B - browser_matcher.py Vectorized 7-dimension engine (H2 SETTINGS 0.30, WINDOW_UPDATE 0.15, pseudo-header order 0.15, H2 PRIORITY 0.10, HTTP headers 0.15, TLS 0.10, JA4 dict 0.05). run_browser_matcher(df) adds bm_family/bm_score/bm_decision. CDN edge case: H2 dimension neutralized (0.5) if has_xff=1. BROWSER_MATCHER_REPLACE=false by default (DUAL_MODE, logging only). Step C - 06_browser_signature_detection.sql (migration) Creates browser_h2_signatures (MergeTree table with 12 reference fingerprints). Recreates dict_browser_h2 from the table with a confidence field (replaces the CSV). Step D - 07_ai_features_view.sql +h2_wu_val in the http_logs JOIN, +h2_window_update_value, +h2_dict_family, +h2_dict_confidence, +h2_window_{chrome,firefox,safari,absent}, +h2_order_{chromesafari,firefox}, +h2_priority_present, +h2_pseudo_ord_raw, +tls_h2_family_mismatch (detects inconsistency between the JA4 family and the H2 family). Step E - preprocessing.py + pipeline.py preprocessing.py: calls run_browser_matcher() after compute_browser_axes(), adds 7 new binary H2 features to FEATURES and binary_features. pipeline.py: calls log_dual_mode_comparison() after the A9 classification. BROWSER_MATCHER_REPLACE=true enables replacement of the bypass. Step F - test_browser_matcher.py 8 tests: Chrome/Firefox/Safari full match, curl rejected, httpcloak partial, TLS↔H2 mismatch, CDN proxy neutralization, go net/http rejected. All 8 PASSED (+ 36 existing tests unchanged). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-10 13:52:57 +02:00
toto	c77d479d6c	docs(thesis): 5 corrections (85 features, MetaLearner diagram, browser axes note, quantile clarification, §5.6/5.7 named) - Correction 1 (l.65, 701): '65+ features sur 7 familles' → '85 features sur 8 familles' - Correction 2 (l.374-378): bot_detector ASCII diagram, added MetaLearner, ExIFFI, fleet.py - Correction 3 (after l.506): note that the axis 5 weight is reduced 0.20→0.15 and axis 6 is added at 0.05 - Correction 4 (l.279): clarified that there are currently 5 quantiles (p10→p90); p5/p95 = future work - Correction 5 (l.776): §5.6 (DNS UDP/53) and §5.7 (Apache compression) named explicitly Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-10 13:51:31 +02:00
toto	79dbb23d6f	feat(dashboard): time-range selector on /campaigns Before: all campaign views were fixed at 7 days. After: selector for 1 / 7 (default) / 14 / 30 / 90 days at the top right. - Added the ?days= parameter (1–90, default 7) to: GET /api/campaigns GET /api/campaigns/graph GET /api/campaigns/scatter GET /api/campaigns/{cid} - The selector reloads the 3 views (cards, scatter, graph) and the detail panel simultaneously with the same time window - The campaign counter shows the active range (e.g. 4 campaigns, 30 days) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-10 13:24:08 +02:00
toto	9548b1782d	fix: correct ORDER BY of ml_detected_anomalies in the base schema CH 24.8 rejects MODIFY ORDER BY on existing columns (error BAD_ARGUMENTS 36). Migration 01 therefore could not fix the ORDER BY after init. Fix: - 06_ml_tables.sql: ORDER BY (src_ip) → ORDER BY (src_ip, ja4, host, model_name) + TTL 30d → 7d (consistent with the documented architecture) - 01_ttl_adjustments.sql: removes the impossible MODIFY ORDER BY, keeps only the MODIFY TTL statements (valid for existing deployments) Result: make init-stack with no ⚠ or ✗ Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-10 01:34:07 +02:00
toto	51dd376f7a	docs: full update (7/8 techniques, 85 features, 12 modules) Reflects the actual state of the system after roadmap steps 1-9: - §5.2 (fleet_detector NetworkX/Louvain) and §5.8 (Jaccard cross-domain): ✅ - MetaLearner (logistic regression, fixed-weight fallback): documented - ExIFFI (EIF isolation depth) + per-feature AE error: documented - KL divergence alongside KS, adversarial drift: documented - HTTP/2 fingerprinting (h2_fingerprint, dict_browser_h2, axis_h2_coherence): documented - Cycle metrics (metrics.py, ml_performance_metrics, alerts): documented - Browser confidence: 5 axes → 6 axes (axis_h2_coherence) - 85 features (73 FEATURES + 12 FEATURES_COMPLET), 12 modules, 53 dashboard routes - Thesis conformance: 99.4% (was 97.9%), §5: 87.5% (was 62.5%) - New tables: fleet_detections, ml_performance_metrics, soc_feedback - Dictionaries: 8 (dict_browser_h2 added) - Dashboard: 16 pages + 37 API routes (fleet, health added) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-10 01:31:20 +02:00
toto	edbb4aed2c	fix(import): add h2 columns with defaults for prod data missing 4 cols The prod data export was made before http/2 columns were added to http_logs (h2_fingerprint, h2_settings_fp, h2_window_update, h2_pseudo_order). The INSERT SELECT now provides empty/zero literals for those 4 columns so the 56-col Native export imports into the 60-col table without a column count mismatch. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-10 01:16:36 +02:00
toto	92432085e2	fix(campaigns): fix IP navigation URL encoding fmtIP() returns an HTML <a> tag string. Using encodeURIComponent(fmtIP(ip)) was URL-encoding the entire HTML markup instead of the raw IP address, resulting in /ip/%3Ca%20href%3D... navigation. Fix: extract raw IP (stripping ::ffff: prefix) before building the URL. Applied to all 3 click handlers in campaigns.html: - members table row onclick - scatter chart point click - force graph node click Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-10 01:08:53 +02:00
toto	7a04e47041	fix(sql+api): fix view column mismatches and ClickHouse 24.8 JOIN issue - view_form_bruteforce_detected: add post_count, distinct_paths, first_seen, last_seen - view_host_ip_ja4_rotation: add host, distinct_ja4, ja4_list, window_start - Replace uniqExact/groupUniqArray with count()/groupArray (no nested-agg error) - api.py campaigns/graph: move a.src_ip < b.src_ip from JOIN ON to WHERE (ClickHouse 24.8 forbids cross-table inequality in JOIN ON condition) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-10 01:05:04 +02:00
toto	040437921c	fix(init-stack): pre-drop mv_http_logs + http_logs before schema apply Ensure h2 columns are always included on fresh init. Also add migration loop for fleet_detections and ml_performance_metrics tables. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-10 01:00:04 +02:00
toto	b409a70970	fix(views): align SQL views with dashboard API expected columns - view_form_bruteforce_detected: add post_count, distinct_paths, first_seen, last_seen - view_host_ip_ja4_rotation: add host, distinct_ja4, ja4_list, window_start - view_ip_recurrence: add worst_threat alias + top_ja4, top_host columns All three views were missing columns referenced by /api/brute-force, /api/ja4-rotation and /api/recurrence endpoints, causing 500 errors on the Tactiques page. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-10 00:59:57 +02:00
toto	2f2c5e03bb	fix(sql): work around ClickHouse 24.8 scope bug in view_ai_features_1h - Restructure 07_ai_features_view.sql: single anonymous inner subquery with explicit aliases on all columns (a.xxx AS xxx, h.xxx AS xxx, h2.xxx AS xxx) to resolve the PARTITION BY src_ip ambiguity in the outer SELECT - Remove the multiple CTEs (h2_agg, enriched) that triggered the bug - Fix migration 04_http2_fields.sql: DEFAULT placed before CODEC (ClickHouse syntax) - make init-stack: 0 errors across 13 SQL files Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-10 00:48:05 +02:00
toto	a108814a56	feat: bot detection roadmap §2-9 (HTTP/2, coherence, drift, fleet, Jaccard, ExIFFI, meta-learner, metrics) Step 2 - HTTP/2 fingerprinting in the ML pipeline: - Added the dict_browser_h2 dictionary (11 browser families) in 05_aggregation_tables.sql - Added the h2_agg CTE and 4 HTTP/2 features in 07_ai_features_view.sql: h2_settings_known, h2_pseudo_order_match, h2_ja4_coherence, h2_settings_rare - Computation of fingerprint_coherence_score (5 weighted axes) in the view - Added the 6th axis axis_h2_coherence in browser.py (weights rebalanced) - browser_h2.csv: 11 Akamai fingerprints → browser family Step 3 - Coherence pre-filter on the human baseline: - pipeline.py excludes sessions with fingerprint_coherence_score < threshold from the training baseline - FINGERPRINT_COHERENCE_THRESHOLD configurable via env (default 0.25) - Excluded sessions are logged for SOC analysis Step 4 - Improved drift detection: - scoring.py: from 5 to 9 quantiles (p5…p95) - Added KL divergence alongside the KS test - Adversarial drift detection (≥80% of features drift in the same direction) - Strict temporal split for validation Step 5 - JA4×ASN bipartite graph (§5.2): - fleet.py: fleet detection via NetworkX + Louvain (optional imports) - enrich_with_fleet_score(): adds fleet_score + fleet_campaign_flag to the DataFrame - cycle.py: called after preprocess_df, logging the number of sessions in a fleet - SQL migration 05_fleet_metrics_tables.sql: fleet_detections table (TTL 7d) - Dashboard: /fleet + /api/fleet (detected communities) + fleet.html template Step 6 - Cross-domain Jaccard §5.8: - 12_thesis_features.sql: jaccard_paths CTE → cross_domain_path_similarity - Signal: same paths (/admin, /wp-login) on several hosts = scanner Step 7 - ExIFFI + per-feature AE errors: - scoring.py: compute_exiffi_importance() by permutation, compute_ae_feature_errors() - pipeline.py: ExIFFI computed on X_test, index → dict mapping for anomalies - build_reason() enriched with exiffi_top when SHAP is inactive Step 8 - Meta-learner for ensemble weighting: - scoring.py: MetaLearner class (LogisticRegression, fixed-weight fallback below 1000 labels) - Labels collected from the current cycle (known_bots, legitimate, Anubis) - pipeline.py: fixed weights replaced by MetaLearner.predict() Step 9 - Performance metrics and monitoring: - metrics.py: record_cycle_metrics() records anomaly rate, drift, correlation, latency - SQL migration 05_fleet_metrics_tables.sql: ml_performance_metrics table (TTL 90d) - Dashboard: /health + /api/health + health.html template - cycle.py: record_cycle_metrics called at the end of the cycle (Complet + Applicatif) Tests: 36/36 bot-detector tests pass Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-10 00:11:35 +02:00
toto	8ca4a1e849	feat(mod_reqin_log): passive HTTP/2 fingerprinting (Akamai format) Adds a connection input filter (AP_FTYPE_CONNECTION, APR_HOOK_LAST) that sits between mod_ssl and mod_http2 to read the HTTP/2 preface (RFC 9113 §3.4) non-destructively and extract: - h2_fingerprint: full Akamai fingerprint, e.g. '1:65536,2:0,4:6291456,6:262144|15663105|0|m,a,s,p' - h2_settings_fp: raw SETTINGS entries (e.g. '1:65536,4:6291456') - h2_window_update: WINDOW_UPDATE increment (e.g. '15663105') - h2_pseudo_order: pseudo-header order (e.g. 'm,a,s,p' Chrome, 'm,p,s,a' Firefox) Technique: speculative AP_MODE_SPECULATIVE read (non-destructive) of 512 bytes; the data remains available to mod_http2. The filter removes itself from the chain after the first invocation. Values are stored in c->notes (H2_NOTE_*) and then emitted as JSON in log_request(). ClickHouse: 4 new columns in http_logs + JSONExtract in mv_http_logs. Migration for existing deployments: 04_http2_fields.sql. 14 unit tests (cmocka) cover Chrome/Firefox/HTTP1/truncation/HPACK. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 23:46:50 +02:00
toto	bc11cfa8eb	fix: init-stack rock-solid — drop/recreate derived tables before views Root cause: CREATE TABLE IF NOT EXISTS is a no-op on existing tables, so stale schemas miss new columns. Views (07+) then fail with UNKNOWN_IDENTIFIER errors. Fix: split SQL execution into 3 phases: Phase 1: databases, raw tables, dictionaries (00-04) Phase 2: DROP all derived tables (agg_, ml_) — safe, repopulated by MVs Phase 3: recreate derived tables + views with full current schema (05-12) This removes the incomplete inline migrations and makes the script truly idempotent regardless of prior schema version. Tested: fresh --reset, existing stale DB, idempotent re-run. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 23:21:15 +02:00
toto	895d7894a9	docs: update copilot-instructions.md - bot-detector: monolith → 10 modules - Added the UA-free browser detection convention (5 axes, Client Hints) - Added Makefile targets: init-stack, import-prod-data, purge-db, help - Anubis: simplified to IP/CIDR + ASN (removed dict_anubis_ua / REGEXP_TREE) - bot-detector tests: clarified heavy imports Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 23:11:24 +02:00
toto	14db3d9040	refactor: remove User-Agent dependency from browser detection SQL changes: - modern_browser_score: sec-ch-ua→100, Sec-Fetch→70 (no more UA fallback) - Added has_sec_ch_ua (UInt8) to agg_header_fingerprint_1h and ml_all_scores - mss_mobile_mismatch uses has_sec_ch_ua instead of modern_browser_score - header_order_confidence: PARTITION BY ja4 instead of first_ua - sec_ch_mobile_mismatch: internal Client Hints comparison (no UA) - Migration 03_remove_ua_browser_detection.sql Python changes: - browser.py Axis 3: Client Hints + Sec-Fetch + is_fake_navigation (NO UA) - Axis weights: ja4_known 0.30, tls_coherence 0.20 (TLS signals strengthened) - preprocessing.py: has_sec_ch_ua added to features and binary_features Files modified: 8 SQL/Python + 1 migration, 36/36 tests pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 23:06:01 +02:00
toto	00e99e5464	fix(bot-detector): make scoring functions public (remove underscore prefix) compute_shap_top_features, build_reason, cluster_anomalies renamed from private (_prefixed) to public to match pipeline.py imports. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 22:49:48 +02:00
toto	629f7b334d	fix(bot-detector): rename _compute_drift_score to public, fix import Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 22:48:21 +02:00
toto	de6d8da931	fix(bot-detector): FEATURES_BASE → FEATURES import name mismatch Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 22:42:32 +02:00
toto	1fa6aec784	fix: SQL view ordering, purge-db flag, ctest directory - 12_thesis_features.sql: move view_resource_cascade_1h before view_thesis_features_1h - Makefile: purge-db uses --reset (not --clean) - mod-reqin-log: ctest --test-dir build/tests Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 22:39:25 +02:00
toto	6d64c2a8a8	fix(rpm): add systemd-rpm-macros to Dockerfile.package, fix correlator spec_version - sentinel/correlator: install systemd-rpm-macros in rpm-builder stage - correlator: use build_version macro (not version) to avoid recursive expansion - mod-reqin-log: fix ctest --test-dir to find tests in build/tests/ Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 22:33:53 +02:00
toto	ea488c0b11	feat: add make help with all targets documented Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 22:22:25 +02:00
toto	0ba66729da	feat: add make purge-db target for full database reset Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 22:21:15 +02:00
toto	6b3cc54652	docs: rewrite audit, DOCUMENTATION.md and IMPROVEMENTS.md for the modular architecture - AUDIT: conformance updated to 97.9% (142/145), modular references - DOCUMENTATION.md: 1083 lines, 7 sections, 11 modules documented - IMPROVEMENTS.md: A1-A10/B1-B10 annotated ✅/🔄/❌ with locations Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 22:14:18 +02:00
toto	c96c41fb45	docs: complete rewrite of the service documentation in French - bot-detector.md: 11-module architecture, 77/65 features, triple-voice ensemble (EIF+AE+XGBoost), 5-axis browser detection, HDBSCAN, all environment variables verified against the source code - dashboard.md: corrected stack (Jinja2+htmx, not React+Vite), 14 pages + 35 API routes + health, dual-database, IPv4/IPv6 - python-ja4common.md: added CLICKHOUSE_DB_PROCESSING/LOGS, dual-database schema, note that the dashboard does not use ja4_common Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 22:04:58 +02:00
toto	8f5e771096	docs: complete rewrite of the database documentation in French Rewrite of the 3 ClickHouse database documentation files: - docs/database/schema.md: full coverage of the 2 databases, 14+ tables, 7 dictionaries, 8 MVs, 8 views, TTL, partitions, engines and columns - docs/database/migrations.md: 13 SQL files (added 10-12), updated prerequisites (ClickHouse 24.8+, 5 CSV), deploy_schema.sh, init-stack.sh, full verification and rollback - shared/clickhouse/README.md: quick reference for the 13 files, deploy_schema.sh, dual-database pattern, prerequisites Removed obsolete references: dict_anubis_ua, dict_anubis_country, anubis_ua_rules, anubis_country_rules. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 22:03:37 +02:00
toto	d05969867f	docs: rewrite architecture/README, update deployment/development - architecture.md: complete rewrite (French) with dual-database diagram, 5-phase data flow, full table ownership, triple-voice ML pipeline, 7 dictionaries, 13 SQL files, updated tech stack - README.md: complete rewrite (English) with updated pipeline diagram, services table, scripts section, integration tests, full doc index, Go 1.24.6 workspace - deployment.md: update to 13 SQL files, remove Anubis UA/Country refs, add scripts section, add ensemble env vars (AE_WEIGHT, XGB_WEIGHT), update verification queries and network diagram - development.md: translate to French, add bot-detector 11-module structure, add Python ML deps, add scripts/integration test sections, fix bot-detector run command, add make targets Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 22:00:29 +02:00
toto	7bdc6e2865	docs: update the thesis document (§2-§8) - §2.1.3: Simplified Anubis to 2 dictionaries (dict_anubis_ip, dict_anubis_asn) with COALESCE priority - §2.4.2: Added the isotree library, calibration formula, ntrees=300, joblib serialization - §2.4.2b/§2.4.4: Replaced DBSCAN with HDBSCAN throughout - §2.4.2c: Replaced logistic regression with fixed linear weighting, added formula and weights - §2.4.3: Clarified the 5-quantile approximation used for drift detection - §3.1: Updated the ASCII diagram (dual-database, 3×EIF+AE+XGB, HDBSCAN, 55 routes) - §3.8: Updated the trifurcation + added multifactor browser detection (5 axes) - §4: Expanded the taxonomy from 51 to 65+ features across 8 families - §5: Added implementation status (✅/❌) to each technique - §6: Added §6.6 deployment results (3M+ logs, 34K sessions/cycle) - §7: Updated the conclusion (65+ features, 5/8 techniques, modular refactoring) - §8: Added references for isotree, PyTorch, HDBSCAN, XGBoost, SHAP Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 21:59:34 +02:00
toto	9ea36ad22e	feat(scripts): complete stack init + prod data import with date shift Schema cleanup: - Remove anubis_ua_rules table stub from 03_anubis_tables.sql - Remove anubis_ua_rules from bot-detector deploy_schema.sql - Remove UA seed step from clickhouse-init.sh (no more REGEXP_TREE dependency) - Drop dict_anubis_ua, dict_anubis_country, anubis_ua_rules, anubis_country_rules New scripts: - scripts/init-stack.sh: comprehensive ClickHouse init (13 SQL files + migrations + validation + cleanup of obsolete tables). Supports --reset, --import-prod. - scripts/import-prod-data.sh: imports pre-exported prod data (Native format) with dynamic date shift (max(time) → now). Supports --shift, --no-truncate. - scripts/data/prod-export/: directory for cached Native format exports Makefile targets: init-stack, import-prod-data, init-and-import Tested: init-stack.sh passes all 13 SQL + 7 critical tables + 7 dicts import-prod-data.sh: 3M rows in ~37s with auto date shift Dashboard: 55 routes OK, bot-detector: 36/36 tests pass Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 21:40:05 +02:00
toto	d8ca804a55	feat(scripts): add reload-prod-logs.sh for prod→dev data sync Exports http_logs from prod ClickHouse via HTTP API, imports into dev with dynamic date shifting (max(time) → now() by default). Features: - Batch export in Native format (200K rows/batch, ~10s each) - Auto date shift: prod max(time) aligned to current time - --shift N: manual override (seconds) - --days N: filter to last N days only - --cron: silent mode for scheduled runs - Staging table approach: export → staging → INSERT SELECT with shift → cleanup Tested: 3,054,122 rows imported in ~3 minutes, dates 2026-04-03→2026-04-09. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 15:41:38 +02:00
toto	8180f4af04	refactor(anubis): simplify to IP/CIDR + ASN only, remove UA and Country rules - Remove UA regex extraction (extract_ua_regex, _extract_ua_from_all/any) - Remove Country rule collection from parse_bot_policies_inline - Simplify fetch_rules.py: collect_all_rules returns (ip_rules, asn_rules) - Remove insert_ua_rules and insert_country_rules functions - reload_dicts now only reloads dict_anubis_ip + dict_anubis_asn - Simplify CASE blocks in 04_mv_http_logs.sql, 07_ai_features_view.sql, view_ai_features_anubis.sql, mv_http_logs.sql: IP > ASN (was 5-level UA+IP > UA > IP > ASN > Country cascade) - Remove dict_anubis_country + dict_anubis_ua from 03_anubis_tables.sql (UA table kept as stub for REGEXP_TREE catch-all compatibility) - Remove anubis_country_rules table from schema - Remove Anubis UA and Country tabs from dashboard reflists page - Remove anubis_ua_rules/country_rules from API reflist queries - deploy_schema.sql simplified from 339 to 122 lines - 764 lines removed across 9 files Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 15:25:33 +02:00
toto	98abbc80c7	feat(dashboard): 'Listes de référence' (reference lists) page for viewing CSVs/dictionaries New /reflists page to view the 9 ClickHouse dictionaries: - bot_ip (3.5K entries): IP/CIDR of known bots - bot_ja4 (31): bot JA4 fingerprints - browser_ja4 (1.2K): browser JA4 fingerprints → family, TLS lib - asn_reputation (82.5K): ASN → reputation (isp, datacenter, cdn…) - iplocate_asn (714K): IP geolocation → ASN, country, name - anubis_ua_rules, anubis_ip_rules, anubis_asn_rules, anubis_country_rules Features: - 9 tabs to navigate between lists - Text search with ClickHouse-side filtering - Pagination (200 entries/page) - Column sorting (ASC/DESC) - Distribution chart (ECharts) by category - Dictionary KPIs at the top of the page - Documentation tooltips API: /api/dictionaries, /api/reflist/{name}, /api/reflist/{name}/stats Helpers: esc() (HTML escape) added to base.html Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 14:56:54 +02:00
toto	039086a0b3	feat: new detection techniques and SOC tactics page SQL: - Added 5 aggregation columns (count_xff, count_unusual_ct, count_non_std_port, count_login_post, sec_ch_mobile_mismatch) - Exposed 5 computed features in view_ai_features_1h - ALTER TABLE migration for existing deployments Bot-detector: - 7 new ML features (has_xff, unusual_content_type_ratio, non_standard_port_ratio, login_post_concentration, sec_ch_mobile_mismatch, true_window_size, window_mss_ratio) - Propagated campaign_id to ml_all_scores (was always -1) - Campaign escalation: HIGH→CRITICAL if the cluster has ≥5 members Dashboard: - SOC Tactics page: brute-force, JA4 rotation, recurrence, real-time alerts, with 4 KPIs + 4 panels + doc tooltips - Added global fmtDate() helper - Sidebar navigation updated Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 14:29:18 +02:00
toto	702c0d5edb	feat(dashboard): add JA4 fingerprint and cluster investigation pages - /ja4/{fingerprint} page: 8 KPIs, timeline, threat pie, IP scores table, ASN/geo charts, HTTP logs, AI features — full JA4 investigation - /cluster/{cid} page: 8 KPIs, timeline, threat/JA4/ASN/host charts, member table with bulk classify — full campaign investigation - /api/ja4/{fingerprint} and /api/cluster/{cid} API endpoints - fmtJA4 links now navigate to /ja4/ investigation page - campaigns.html: 'Ouvrir' button links to /cluster/{cid} full page - Fix: double-brace {{param}} in non-f-string queries → single {param} (was causing HTTP 500 on all parameterized ClickHouse queries) - 50 routes total, all tests pass, 0 JS console errors Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 14:05:52 +02:00
toto	70188b508c	fix(dashboard): eliminate @apply CSS, fix status column, fix click propagation Playwright testing revealed 3 critical bugs: 1. Tailwind CDN @apply with custom brand-* colors produces empty CSS rules, breaking ALL design components (kpi-card, data-table, badges, filter-btn, section-card, nav-item). Fix: replace all @apply directives with equivalent raw CSS values. 2. Traffic API and IP detail API reference non-existent 'status' column in http_logs table → HTTP 500 on /traffic and /ip/{ip}. Fix: remove status from SELECT, sort whitelist, filters, and templates. 3. Nested <a> links (fmtJA4, fmtASN, fmtCountry, fmtBotName) inside clickable <tr onclick> capture clicks, preventing row navigation to /ip/ detail. Fix: add event.stopPropagation() to all formatter links. Verified with Playwright: 10 pages × 0 JS errors, all tooltips hidden by default, sidebar toggle works, keyboard shortcuts (Alt+1-9, Alt+B), classification form saves to DB, campaign detail panel opens on click. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 13:54:38 +02:00
toto	6babc55e3e	fix(dashboard): hover tooltips, full-width layout, UX polish - Fix doc tooltips: split CSS into <style type='text/tailwindcss'> for @apply directives + raw CSS for reliable doc panel rendering - Convert doc panels from click-toggle to hover-based tooltips with arrow pointer, fade-in animation, and auto-dismiss on mobile - Replace '?' icons with 'ⓘ' across all 11 templates (51 tooltips) - Full-width layout: reduce padding on mobile (px-3), scale up on desktop (lg:px-5, xl:px-6) for maximum screen utilization - Auto-collapse sidebar on narrow screens (<1024px) - Keyboard shortcuts: Alt+1–9 for page navigation, Alt+B toggle sidebar - Add LEGITIMATE_BROWSER filter button to detections page - Sticky header with stronger blur (backdrop-blur-md) - All 46 routes pass tests Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 13:30:16 +02:00
toto	63ba6d203c	feat(dashboard): complete SOC dashboard with full monitoring and workflows - models.html: Full rewrite — 6 KPIs, scoring volume timeline, anomaly rate chart, threat breakdown per model, enhanced model cards with validation gate - classify.html: SOC workflow — suggested unclassified IPs, quick-classify buttons, classification stats pie, pre-fill from URL params - traffic.html: Clickable rows → ip_detail, column sorting, status column, search filter, doc tooltips on all chart sections - scores.html: Search input, clickable rows → ip_detail, LEGITIMATE_BROWSER filter button, doc tooltips on distribution + scatter charts - ip_detail.html: Resource cascade section (headless browser detection), status column in HTTP logs table - detections.html: Doc tooltips on threat/reason/ASN chart sections - features.html: Doc tooltips on radar/importance/scatter sections - api.py: 4 new endpoints — /api/models/timeline, /api/models/threats, /api/classify/stats, /api/classify/suggested. Traffic API: status + search. 46 routes total. All tests pass (dashboard + bot-detector 36/36). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 01:25:01 +02:00
toto	396baa90d2	feat(dashboard): HDBSCAN cluster visualization - Dedicated /campaigns page with 4 chart views: · Scatter plot (score vs velocity, bubbles colored by campaign) · Force-directed network graph (IPs linked by shared JA4) · Campaign card grid (KPIs, ASN, country, JA4) · Detail panel (behavioral radar, hourly timeline, member table) - 4 new API endpoints: · GET /api/campaigns (fix: campaign_id >= 0 instead of != '') · GET /api/campaigns/graph (nodes + edges) · GET /api/campaigns/scatter (score/velocity per IP) · GET /api/campaigns/{cid} (detail + profile + timeline) - Sidebar: Campagnes link added under Surveillance - Overview: campaigns clickable → link to /campaigns Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 01:11:16 +02:00
toto	f1547423b5	refactor(bot-detector): remove monolith, multifactorial tests - Remove bot_detector.py (1982 lines), replaced by 11 modules - Browser tests updated for the multifactorial system (browser_confidence) - 36/36 tests pass with the new modular structure Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 01:03:17 +02:00
toto	1f103392ac	refactor(bot-detector): extract monolith into modular package Split bot_detector.py (~1982 lines) into 10 focused modules: - config.py: all configuration constants and optional imports - log.py: logging utilities (log_info, log_decision, append_training_history) - infra.py: ClickHouse client, health check HTTP server, shutdown - browser.py: multifactorial browser identification (5 axes) - scoring.py: drift detection, feature validation, SHAP, clustering - models.py: EIF, Autoencoder, XGBoost model management - preprocessing.py: data preprocessing and feature list definitions - pipeline.py: core semi-supervised scoring loop - cycle.py: main analysis cycle orchestration - __main__.py: entry point with startup banner Update Dockerfile to copy package directory and use python -m bot_detector. All 36 existing tests pass unchanged. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 01:02:04 +02:00

145 Commits