ja4-platform

Author	SHA1	Message	Date
toto	a1e4c1dad5	feat: add ja4ebpf service — eBPF-based TLS/TCP fingerprinting daemon - TC ingress hook captures TCP SYN (L3/L4) and TLS ClientHello - Uprobes on SSL_read/SSL_set_fd capture decrypted TLS data - Kprobes on accept4 correlate socket FDs to client IP:port - JA4 fingerprint computed from parsed TLS ClientHello - HTTP/2 SETTINGS and WINDOW_UPDATE extracted from decrypted streams - Session manager with sharded map (256 shards) and GC goroutine - Slowloris detection: sessions with no requests after 10s threshold - ClickHouse batch writer to ja4_logs.http_logs_raw (raw_json) - All tests pass: 17 parser + 10 correlation tests Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-11 22:43:26 +02:00
toto	7eb3ad21fd	feat(dashboard): display individual H2 SETTINGS in the mismatch table - /api/browser-signatures: top_mismatches now includes the 7 individual SETTINGS columns (h2_header_table_size, h2_enable_push, h2_max_concurrent_streams, h2_initial_window_size, h2_max_frame_size, h2_max_header_list_size, h2_enable_connect_protocol) - stats: add sessions_with_priority (countIf h2_priority_present > 0) - browsers.html: compact SETTINGS column in the suspects table (format '3:100, 4:65536, 2:0', Akamai IDs with non-null values) - The pseudo-priority counter uses the real sessions_with_priority value instead of displaying '—' Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-11 03:11:17 +02:00
toto	f704541f83	feat(h2): direct per-parameter SETTINGS comparison in browser_matcher - Rewrote _d1_h2_settings() with 3-signal weighted formula: direct_score×0.60 + dict_match×0.30 + ja4_coherence×0.10 when individual SETTINGS cols are available in the DataFrame - Added _H2_SETTINGS_COLS dict (IDs 1,2,3,4,5,6,8 → column names) - Fallback to dict_match×0.80 + ja4_coherence×0.20 for backward compat - Fix view_ai_features_1h: pass 7 individual SETTINGS columns through base_data CTE (h2_header_table_size, h2_enable_push, h2_max_concurrent_streams, h2_initial_window_size, h2_max_frame_size, h2_max_header_list_size, h2_enable_connect_protocol) - Remove non-existent h2_dict_confidence reference from view SQL (dict_browser_h2 only exposes browser_family attribute) - Add 7 new pytest cases: exact match, one wrong setting, forbidden key penalty, unknown fingerprint with correct settings, fallback path, CDN proxy neutralisation, full Chrome simulation - 53/53 bot-detector tests pass - Update thesis §3.9.2: document direct comparison algorithm + fallback Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-11 03:05:36 +02:00
toto	85d3b95b7b	feat: HTTP/2 passive fingerprinting with individual SETTINGS fields Complete implementation of HTTP/2 passive fingerprinting per thesis §2.5.3: mod-reqin-log (C module): - Replace connection-level filter with ap_hook_process_connection (APR_HOOK_FIRST) to capture H2 preface before mod_http2 takes over the connection - AP_MODE_SPECULATIVE read of 512 bytes from c->input_filters - Parse SETTINGS, WINDOW_UPDATE, PRIORITY flags, pseudo-header order - Output individual SETTINGS params as separate JSON fields (IDs 1-6, 8) - Read H2 notes from c1 (master connection) for mod_http2 secondary conns - Fix header_order_signature JSON length bug (26→strlen) ClickHouse schema: - Add 8 new columns to http_logs: h2_has_priority, h2_header_table_size, h2_enable_push, h2_max_concurrent_streams, h2_initial_window_size, h2_max_frame_size, h2_max_header_list_size, h2_enable_connect_protocol - Use Int32/Int64 with DEFAULT -1 to distinguish absent vs zero - Update mv_http_logs to extract individual fields via JSONHas/JSONExtractInt - Migration 04_http2_fields.sql updated for existing deployments Correlator: - Accept both timestamp_ns and timestamp field names (backward compat) Integration: - Enable HTTP/2 in Apache: Protocols h2 http/1.1 in httpd-integration.conf Validated end-to-end via Playwright: H2 curl traffic → mod-reqin-log → correlator → ClickHouse with all 12 H2 columns populated correctly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-11 02:33:45 +02:00
toto	d098de1a66	fix(bot-detector): neutralize H2 dimensions behind proxy (X-Forwarded-For) When has_xff=1, the H2 connection is terminated by the reverse proxy/CDN, so client H2 fingerprints are lost. Previously only D1 (h2_settings) was neutralized; D2 (window_update), D3 (pseudo_order), and D4 (priority) still penalized proxied traffic — a real Chrome behind Cloudflare scored 0.0 on 3 dimensions (45% of total weight). Now all 4 H2 dimensions return 0.5 (neutral) when has_xff>0, and non-browser H2 detection is also disabled behind proxies. Tests: 10/10 passed including 3 new XFF-specific cases. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-10 15:15:20 +02:00
toto	261205028d	fix(dashboard): campaigns scatter chart: show campaigns, not IPs - API /api/campaigns/scatter: aggregate by campaign_id instead of per-IP. Returns avg_score, avg_velocity, unique_ips, ja4_list, asn_list, country_list - Template: one bubble per campaign, sized by IP count - Tooltip: campaign-level info (IPs, score, velocity, ASNs, countries, JA4s) - Click navigates to campaign detail (not IP detail) - Updated doc panel text Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-10 15:09:02 +02:00
toto	fb73c60e7d	feat(dashboard): fingerprint discovery page — extract and group JA4/H2/headers from traffic - GET /api/fingerprint-discovery: queries http_logs, groups by JA4, aggregates UA family, header presence rates (Sec-CH-UA, Sec-Fetch, Accept-Language, zstd, brotli, gzip, XFF), H2 data, TLS info, dict lookups - /fingerprints page: KPIs, doughnut chart by family, stacked header bars, filterable/sortable profile table, expandable detail panel - Promote button: push H2 fingerprints to browser_h2_signatures via existing POST /api/browser-signatures/entries endpoint - Nav link: Découverte added after Navigateurs in sidebar Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-10 15:02:53 +02:00
toto	fde6864311	feat(dashboard): browser signatures management UI - Add dict_browser_h2 to /reflists (read-only via dict_browser_h2) - New API endpoints: GET /api/browser-signatures/entries: list browser_h2_signatures (falls back to the CSV dict if migration 06 is not applied) POST /api/browser-signatures/entries: add fingerprint + reload dict DELETE /api/browser-signatures/entries: delete + reload dict - /browsers page: 2 new sections 'Base de signatures H2' (H2 signature database): table of the 10 fingerprints, add form, automatic read-only mode if migration 06 is not applied 'Règles de scoring browser_matcher.py' (browser_matcher.py scoring rules): static table of the 7 dimensions (weights, per-family values, bypass thresholds) - Integration: browser_h2.csv copied into user_files at ClickHouse startup Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-10 14:46:07 +02:00
toto	da1b579d4f	fix(dashboard): rename duplicate /api/browsers route to /api/browser-signatures The /api/browsers route already existed (JA4 distribution by family). The new browser_matcher route conflicted with it, and FastAPI used the first definition. Renamed to /api/browser-signatures. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-10 14:17:38 +02:00
toto	9c308747bd	feat(dashboard): Browser Signature Detection page (/browsers) New page dedicated to passive analysis of browser signatures (§4): API - GET /api/browsers: queries view_ai_features_1h for: - Global counters (total, sessions_with_h2, matched, mismatch %) - h2_dict_family distribution (Chrome/Firefox/Safari/Edge) - Breakdown of WINDOW_UPDATE signals (chrome/firefox/safari/absent/other) - TLS↔H2 mismatch by JA4 family (total + count + %) - Top 20 suspicious sessions (tls_h2_family_mismatch=1, sorted by hits) /browsers page: - 6 header KPIs (sessions, with H2, known family, match rate, mismatch, mismatch %) - Doc banner explaining browser_matcher §4 and DUAL_MODE - Donut: H2 families (dict_browser_h2 lookup) - Horizontal bar: WINDOW_UPDATE signals by family - Grouped bar + line: TLS↔H2 mismatch by JA4 family (count + %) - Table: top 20 potential impostors with clickable IP, pseudo-order, coherence - Mini-KPIs: pseudo-header orders Chrome/Safari, Firefox, unknown, PRIORITY frames - 'Navigateurs' nav link in the Surveillance group of base.html Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-10 14:02:39 +02:00
toto	e52cdcc01f	feat(bot-detector): Browser Signature Detection engine (parallel mode) Step A: browser_signatures.py Pure data: BROWSER_SIGNATURES (Chrome/Firefox/Safari), NON_BROWSER_SIGNATURES (curl/httpx/go), BROWSER_THRESHOLDS, DIMENSION_WEIGHTS. H2 values extracted from real captures (Akamai format with commas, not semicolons). Step B: browser_matcher.py Vectorized 7-dimension engine (H2 SETTINGS 0.30, WINDOW_UPDATE 0.15, pseudo-header order 0.15, H2 PRIORITY 0.10, HTTP headers 0.15, TLS 0.10, JA4 dict 0.05). run_browser_matcher(df) adds bm_family/bm_score/bm_decision. CDN edge case: H2 dimension neutralized (0.5) if has_xff=1. BROWSER_MATCHER_REPLACE=false by default (DUAL_MODE, logging only). Step C: 06_browser_signature_detection.sql (migration) Creates browser_h2_signatures (MergeTree table with 12 reference fingerprints). Recreates dict_browser_h2 from the table with a confidence field (replaces the CSV). Step D: 07_ai_features_view.sql +h2_wu_val in the http_logs JOIN, +h2_window_update_value, +h2_dict_family, +h2_dict_confidence, +h2_window_{chrome,firefox,safari,absent}, +h2_order_{chromesafari,firefox}, +h2_priority_present, +h2_pseudo_ord_raw, +tls_h2_family_mismatch (detects inconsistency between the JA4 family and the H2 family). Step E: preprocessing.py + pipeline.py preprocessing.py: calls run_browser_matcher() after compute_browser_axes(), adds 7 new binary H2 features to FEATURES and binary_features. pipeline.py: calls log_dual_mode_comparison() after the A9 classification. BROWSER_MATCHER_REPLACE=true enables replacement of the bypass. Step F: test_browser_matcher.py 8 tests: Chrome/Firefox/Safari full match, curl rejected, httpcloak partial, TLS↔H2 mismatch, CDN proxy neutralisation, go net/http rejected. All 8 PASSED (+ 36 existing tests unchanged). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-10 13:52:57 +02:00
toto	79dbb23d6f	feat(dashboard): time range selector on /campaigns Before: all campaign views were fixed at 7 days. After: 1d / 7d (default) / 14d / 30d / 90d selector at the top right. - Add the ?days= parameter (1–90, default 7) to: GET /api/campaigns GET /api/campaigns/graph GET /api/campaigns/scatter GET /api/campaigns/{cid} - The selector reloads the 3 views (cards, scatter, graph) and the detail panel simultaneously with the same time window - The campaign counter shows the active range: (4 campaigns, 30d) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-10 13:24:08 +02:00
toto	9548b1782d	fix: correct the ml_detected_anomalies ORDER BY in the base schema CH 24.8 rejects MODIFY ORDER BY on existing columns (error BAD_ARGUMENTS 36). Migration 01 therefore could not fix the ORDER BY after init. Fix: - 06_ml_tables.sql: ORDER BY (src_ip) → ORDER BY (src_ip, ja4, host, model_name) + TTL 30d → 7d (consistent with the documented architecture) - 01_ttl_adjustments.sql: removes the impossible MODIFY ORDER BY, keeps only the MODIFY TTL statements (valid for existing deployments) Result: make init-stack with no ⚠ or ✗ Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-10 01:34:07 +02:00
toto	92432085e2	fix(campaigns): fix IP navigation URL encoding fmtIP() returns an HTML <a> tag string. Using encodeURIComponent(fmtIP(ip)) was URL-encoding the entire HTML markup instead of the raw IP address, resulting in /ip/%3Ca%20href%3D... navigation. Fix: extract raw IP (stripping ::ffff: prefix) before building the URL. Applied to all 3 click handlers in campaigns.html: - members table row onclick - scatter chart point click - force graph node click Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-10 01:08:53 +02:00
toto	7a04e47041	fix(sql+api): fix view column mismatches and ClickHouse 24.8 JOIN issue - view_form_bruteforce_detected: add post_count, distinct_paths, first_seen, last_seen - view_host_ip_ja4_rotation: add host, distinct_ja4, ja4_list, window_start - Replace uniqExact/groupUniqArray with count()/groupArray (no nested-agg error) - api.py campaigns/graph: move a.src_ip < b.src_ip from JOIN ON to WHERE (ClickHouse 24.8 forbids cross-table inequality in JOIN ON condition) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-10 01:05:04 +02:00
toto	2f2c5e03bb	fix(sql): work around ClickHouse 24.8 scope bug in view_ai_features_1h - Restructure 07_ai_features_view.sql: single anonymous inner subquery with explicit aliases on all columns (a.xxx AS xxx, h.xxx AS xxx, h2.xxx AS xxx) to resolve the PARTITION BY src_ip ambiguity in the outer SELECT - Remove the multiple CTEs (h2_agg, enriched) that triggered the bug - Fix migration 04_http2_fields.sql: DEFAULT must come before CODEC (ClickHouse syntax) - make init-stack: 0 errors across 13 SQL files Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-10 00:48:05 +02:00
toto	a108814a56	feat: bot detection roadmap §2-9: HTTP/2, coherence, drift, fleet, Jaccard, ExIFFI, meta-learner, metrics Step 2: HTTP/2 fingerprinting in the ML pipeline: - Add the dict_browser_h2 dictionary (11 browser families) in 05_aggregation_tables.sql - Add the h2_agg CTE and 4 HTTP/2 features in 07_ai_features_view.sql: h2_settings_known, h2_pseudo_order_match, h2_ja4_coherence, h2_settings_rare - Compute fingerprint_coherence_score (5 weighted axes) in the view - Add the 6th axis axis_h2_coherence in browser.py (weights rebalanced) - browser_h2.csv: 11 Akamai fingerprints → browser family Step 3: Coherence pre-filter on the human baseline: - pipeline.py excludes sessions with fingerprint_coherence_score < threshold from the training baseline - FINGERPRINT_COHERENCE_THRESHOLD configurable via env (default 0.25) - Log excluded sessions for SOC analysis Step 4: Improved drift detection: - scoring.py: from 5 to 9 quantiles (p5…p95) - Add KL divergence alongside the KS test - Adversarial drift detection (≥80% of features drift in the same direction) - Strict temporal split for validation Step 5: JA4×ASN bipartite graph (§5.2): - fleet.py: fleet detection via NetworkX + Louvain (optional imports) - enrich_with_fleet_score(): adds fleet_score + fleet_campaign_flag to the DataFrame - cycle.py: called after preprocess_df, logging the number of fleet sessions - SQL migration 05_fleet_metrics_tables.sql: fleet_detections table (TTL 7d) - Dashboard: /fleet + /api/fleet (detected communities) + fleet.html template Step 6: Cross-domain Jaccard §5.8: - 12_thesis_features.sql: jaccard_paths CTE → cross_domain_path_similarity - Signal: same paths (/admin, /wp-login) across multiple hosts = scanner Step 7: ExIFFI + per-feature AE errors: - scoring.py: compute_exiffi_importance() by permutation, compute_ae_feature_errors() - pipeline.py: compute ExIFFI on X_test, index → dict mapping for anomalies - build_reason() enriched with exiffi_top when SHAP is inactive Step 8: Meta-learner for ensemble weighting: - scoring.py: MetaLearner class (LogisticRegression, falls back to fixed weights below 1000 labels) - Collect labels from the current cycle (known_bots, legitimate, Anubis) - pipeline.py: replace fixed weights with MetaLearner.predict() Step 9: Performance metrics and monitoring: - metrics.py: record_cycle_metrics(): anomaly rate, drift, correlation, latency - SQL migration 05_fleet_metrics_tables.sql: ml_performance_metrics table (TTL 90d) - Dashboard: /health + /api/health + health.html template - cycle.py: call record_cycle_metrics at the end of each cycle (Complet + Applicatif) Tests: 36/36 bot-detector tests pass Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-10 00:11:35 +02:00
toto	8ca4a1e849	feat(mod_reqin_log): passive HTTP/2 fingerprinting (Akamai format) Adds a connection input filter (AP_FTYPE_CONNECTION, APR_HOOK_LAST) that sits between mod_ssl and mod_http2 to read the HTTP/2 preface (RFC 9113 §3.4) non-destructively and extract: - h2_fingerprint: full Akamai fingerprint, e.g. '1:65536,2:0,4:6291456,6:262144|15663105|0|m,a,s,p' - h2_settings_fp: raw SETTINGS entries (e.g. '1:65536,4:6291456') - h2_window_update: WINDOW_UPDATE increment (e.g. '15663105') - h2_pseudo_order: pseudo-header order (e.g. 'm,a,s,p' Chrome, 'm,p,s,a' Firefox) Technique: speculative AP_MODE_SPECULATIVE read (non-destructive) of 512 bytes; the data remains available to mod_http2. The filter removes itself from the chain after the first invocation. Stored in c->notes (H2_NOTE_*), then emitted as JSON in log_request(). ClickHouse: 4 new columns in http_logs + JSONExtract in mv_http_logs. Migration for existing deployments: 04_http2_fields.sql. 14 unit tests (cmocka) cover Chrome/Firefox/HTTP1/truncation/HPACK. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 23:46:50 +02:00
toto	14db3d9040	refactor: remove User-Agent dependency from browser detection SQL changes: - modern_browser_score: sec-ch-ua→100, Sec-Fetch→70 (no more UA fallback) - Add has_sec_ch_ua (UInt8) to agg_header_fingerprint_1h and ml_all_scores - mss_mobile_mismatch uses has_sec_ch_ua instead of modern_browser_score - header_order_confidence: PARTITION BY ja4 instead of first_ua - sec_ch_mobile_mismatch: internal Client Hints comparison (no UA) - Migration 03_remove_ua_browser_detection.sql Python changes: - browser.py Axis 3: Client Hints + Sec-Fetch + is_fake_navigation (NO UA) - Axis weights: ja4_known 0.30, tls_coherence 0.20 (TLS signals strengthened) - preprocessing.py: has_sec_ch_ua added to features and binary_features Files changed: 8 SQL/Python + 1 migration, 36/36 tests pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 23:06:01 +02:00
toto	00e99e5464	fix(bot-detector): make scoring functions public (remove underscore prefix) compute_shap_top_features, build_reason, cluster_anomalies renamed from private (_prefixed) to public to match pipeline.py imports. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 22:49:48 +02:00
toto	629f7b334d	fix(bot-detector): rename _compute_drift_score to public, fix import Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 22:48:21 +02:00
toto	de6d8da931	fix(bot-detector): FEATURES_BASE → FEATURES import name mismatch Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 22:42:32 +02:00
toto	6d64c2a8a8	fix(rpm): add systemd-rpm-macros to Dockerfile.package, fix correlator spec_version - sentinel/correlator: install systemd-rpm-macros in rpm-builder stage - correlator: use build_version macro (not version) to avoid recursive expansion - mod-reqin-log: fix ctest --test-dir to find tests in build/tests/ Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 22:33:53 +02:00
toto	6b3cc54652	docs: rewrite audit, DOCUMENTATION.md and IMPROVEMENTS.md for the modular architecture - AUDIT: compliance updated to 97.9% (142/145), modular references - DOCUMENTATION.md: 1083 lines, 7 sections, 11 modules documented - IMPROVEMENTS.md: A1-A10/B1-B10 annotated ✅/🔄/❌ with locations Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 22:14:18 +02:00
toto	9ea36ad22e	feat(scripts): complete stack init + prod data import with date shift Schema cleanup: - Remove anubis_ua_rules table stub from 03_anubis_tables.sql - Remove anubis_ua_rules from bot-detector deploy_schema.sql - Remove UA seed step from clickhouse-init.sh (no more REGEXP_TREE dependency) - Drop dict_anubis_ua, dict_anubis_country, anubis_ua_rules, anubis_country_rules New scripts: - scripts/init-stack.sh: comprehensive ClickHouse init (13 SQL files + migrations + validation + cleanup of obsolete tables). Supports --reset, --import-prod. - scripts/import-prod-data.sh: imports pre-exported prod data (Native format) with dynamic date shift (max(time) → now). Supports --shift, --no-truncate. - scripts/data/prod-export/: directory for cached Native format exports Makefile targets: init-stack, import-prod-data, init-and-import Tested: init-stack.sh passes all 13 SQL + 7 critical tables + 7 dicts import-prod-data.sh: 3M rows in ~37s with auto date shift Dashboard: 55 routes OK, bot-detector: 36/36 tests pass Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 21:40:05 +02:00
toto	8180f4af04	refactor(anubis): simplify to IP/CIDR + ASN only, remove UA and Country rules - Remove UA regex extraction (extract_ua_regex, _extract_ua_from_all/any) - Remove Country rule collection from parse_bot_policies_inline - Simplify fetch_rules.py: collect_all_rules returns (ip_rules, asn_rules) - Remove insert_ua_rules and insert_country_rules functions - reload_dicts now only reloads dict_anubis_ip + dict_anubis_asn - Simplify CASE blocks in 04_mv_http_logs.sql, 07_ai_features_view.sql, view_ai_features_anubis.sql, mv_http_logs.sql: IP > ASN (was 5-level UA+IP > UA > IP > ASN > Country cascade) - Remove dict_anubis_country + dict_anubis_ua from 03_anubis_tables.sql (UA table kept as stub for REGEXP_TREE catch-all compatibility) - Remove anubis_country_rules table from schema - Remove Anubis UA and Country tabs from dashboard reflists page - Remove anubis_ua_rules/country_rules from API reflist queries - deploy_schema.sql simplified from 339 to 122 lines - 764 lines removed across 9 files Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 15:25:33 +02:00
toto	98abbc80c7	feat(dashboard): reference lists page (Listes de référence): CSV/dictionary visualization New /reflists page to view the 9 ClickHouse dictionaries: - bot_ip (3.5K entries): IP/CIDR of known bots - bot_ja4 (31): bot JA4 fingerprints - browser_ja4 (1.2K): browser JA4 fingerprints → family, TLS lib - asn_reputation (82.5K): ASN → reputation (isp, datacenter, cdn…) - iplocate_asn (714K): IP geolocation → ASN, country, name - anubis_ua_rules, anubis_ip_rules, anubis_asn_rules, anubis_country_rules Features: - 9 tabs to navigate between the lists - Text search with ClickHouse-side filtering - Pagination (200 entries/page) - Column sorting (ASC/DESC) - Distribution chart (ECharts) by category - Dictionary KPIs at the top of the page - Documentation tooltips API: /api/dictionaries, /api/reflist/{name}, /api/reflist/{name}/stats Helpers: esc() (HTML escape) added to base.html Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 14:56:54 +02:00
toto	039086a0b3	feat: new detection techniques and SOC tactics page SQL: - Add 5 aggregation columns (count_xff, count_unusual_ct, count_non_std_port, count_login_post, sec_ch_mobile_mismatch) - Expose 5 computed features in view_ai_features_1h - ALTER TABLE migration for existing deployments Bot-detector: - 7 new ML features (has_xff, unusual_content_type_ratio, non_standard_port_ratio, login_post_concentration, sec_ch_mobile_mismatch, true_window_size, window_mss_ratio) - Propagate campaign_id to ml_all_scores (was always -1) - Campaign escalation: HIGH→CRITICAL if cluster ≥5 members Dashboard: - SOC Tactics page: brute-force, JA4 rotation, recurrence, real-time alerts, with 4 KPIs + 4 panels + doc tooltips - Add global fmtDate() helper - Sidebar navigation updated Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 14:29:18 +02:00
toto	702c0d5edb	feat(dashboard): add JA4 fingerprint and cluster investigation pages - /ja4/{fingerprint} page: 8 KPIs, timeline, threat pie, IP scores table, ASN/geo charts, HTTP logs, AI features — full JA4 investigation - /cluster/{cid} page: 8 KPIs, timeline, threat/JA4/ASN/host charts, member table with bulk classify — full campaign investigation - /api/ja4/{fingerprint} and /api/cluster/{cid} API endpoints - fmtJA4 links now navigate to /ja4/ investigation page - campaigns.html: 'Ouvrir' button links to /cluster/{cid} full page - Fix: double-brace {{param}} in non-f-string queries → single {param} (was causing HTTP 500 on all parameterized ClickHouse queries) - 50 routes total, all tests pass, 0 JS console errors Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 14:05:52 +02:00
toto	70188b508c	fix(dashboard): eliminate @apply CSS, fix status column, fix click propagation Playwright testing revealed 3 critical bugs: 1. Tailwind CDN @apply with custom brand-* colors produces empty CSS rules, breaking ALL design components (kpi-card, data-table, badges, filter-btn, section-card, nav-item). Fix: replace all @apply directives with equivalent raw CSS values. 2. Traffic API and IP detail API reference non-existent 'status' column in http_logs table → HTTP 500 on /traffic and /ip/{ip}. Fix: remove status from SELECT, sort whitelist, filters, and templates. 3. Nested <a> links (fmtJA4, fmtASN, fmtCountry, fmtBotName) inside clickable <tr onclick> capture clicks, preventing row navigation to /ip/ detail. Fix: add event.stopPropagation() to all formatter links. Verified with Playwright: 10 pages × 0 JS errors, all tooltips hidden by default, sidebar toggle works, keyboard shortcuts (Alt+1-9, Alt+B), classification form saves to DB, campaign detail panel opens on click. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 13:54:38 +02:00
toto	6babc55e3e	fix(dashboard): hover infobulles, full-width layout, UX polish - Fix doc tooltips: split CSS into <style type='text/tailwindcss'> for @apply directives + raw CSS for reliable doc panel rendering - Convert doc panels from click-toggle to hover-based infobulles with arrow pointer, fade-in animation, and auto-dismiss on mobile - Replace '?' icons with 'ⓘ' across all 11 templates (51 tooltips) - Full-width layout: reduce padding on mobile (px-3), scale up on desktop (lg:px-5, xl:px-6) for maximum screen utilization - Auto-collapse sidebar on narrow screens (<1024px) - Keyboard shortcuts: Alt+1–9 for page navigation, Alt+B toggle sidebar - Add LEGITIMATE_BROWSER filter button to detections page - Sticky header with stronger blur (backdrop-blur-md) - All 46 routes pass tests Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 13:30:16 +02:00
toto	63ba6d203c	feat(dashboard): complete SOC dashboard with full monitoring and workflows - models.html: Full rewrite — 6 KPIs, scoring volume timeline, anomaly rate chart, threat breakdown per model, enhanced model cards with validation gate - classify.html: SOC workflow — suggested unclassified IPs, quick-classify buttons, classification stats pie, pre-fill from URL params - traffic.html: Clickable rows → ip_detail, column sorting, status column, search filter, doc tooltips on all chart sections - scores.html: Search input, clickable rows → ip_detail, LEGITIMATE_BROWSER filter button, doc tooltips on distribution + scatter charts - ip_detail.html: Resource cascade section (headless browser detection), status column in HTTP logs table - detections.html: Doc tooltips on threat/reason/ASN chart sections - features.html: Doc tooltips on radar/importance/scatter sections - api.py: 4 new endpoints — /api/models/timeline, /api/models/threats, /api/classify/stats, /api/classify/suggested. Traffic API: status + search. 46 routes total. All tests pass (dashboard + bot-detector 36/36). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 01:25:01 +02:00
toto	396baa90d2	feat(dashboard): HDBSCAN cluster visualization - Dedicated /campaigns page with 4 chart views: · Scatter plot (score vs velocity, bubbles colored by campaign) · Force-directed network graph (IPs linked by shared JA4) · Campaign card grid (KPIs, ASN, country, JA4) · Detail panel (behavioral radar, hourly timeline, members table) - 4 new API endpoints: · GET /api/campaigns (fix: campaign_id >= 0 instead of != '') · GET /api/campaigns/graph (nodes + edges) · GET /api/campaigns/scatter (score/velocity per IP) · GET /api/campaigns/{cid} (detail + profile + timeline) - Sidebar: Campagnes link added under Surveillance - Overview: campaigns clickable → link to /campaigns Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 01:11:16 +02:00
toto	f1547423b5	refactor(bot-detector): remove monolith, multifactorial tests - Remove bot_detector.py (1982 lines), replaced by 11 modules - Browser tests updated for the multifactorial system (browser_confidence) - 36/36 tests pass with the new modular structure Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 01:03:17 +02:00
toto	1f103392ac	refactor(bot-detector): extract monolith into modular package Split bot_detector.py (~1982 lines) into 10 focused modules: - config.py: all configuration constants and optional imports - log.py: logging utilities (log_info, log_decision, append_training_history) - infra.py: ClickHouse client, health check HTTP server, shutdown - browser.py: multifactorial browser identification (5 axes) - scoring.py: drift detection, feature validation, SHAP, clustering - models.py: EIF, Autoencoder, XGBoost model management - preprocessing.py: data preprocessing and feature list definitions - pipeline.py: core semi-supervised scoring loop - cycle.py: main analysis cycle orchestration - __main__.py: entry point with startup banner Update Dockerfile to copy package directory and use python -m bot_detector. All 36 existing tests pass unchanged. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 01:02:04 +02:00
toto	2d04288e95	feat(dashboard): SOC workflow overhaul — sidebar nav, doc tooltips, full-width layout - base.html: collapsible sidebar navigation, doc tooltip system, JS helpers (fmtNum, fmtPct, fmtDuration, ecGrid, buildTable, docHTML) - overview.html: SOC command center with stacked timeline, live alerts, campaigns panel, browser donut, 6 KPIs - detections.html: threat color dots, raw score column, click-to-navigate rows - network.html: JA4 rotation, brute-force, persistent threats tables, 6 KPIs - ip_detail.html: ASN/country KPIs, AE/XGB/campaign columns, enriched features - scores/traffic/features/models/classify: page_title blocks + doc tooltips - api.py: 9 new endpoints (campaigns, brute-force, ja4-rotation, recurrence, cascade, alerts, timeline-detail, ua-rotation) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 00:29:34 +02:00
toto	c994ad4466	fix: XGB label query + SHAP isotree compatibility XGB: query was selecting features from ml_all_scores which doesn't store them. Now joins ml_all_scores (labels) with view_ai_features_1h (features). Dynamically discovers available columns to skip thesis §5 features not present in the view. Returns (model, features) tuple. SHAP: TreeExplainer doesn't support isotree. Fall back to permutation- based Explainer(model.decision_function, X_sample) for isotree. Verified: XGB trained on 50000 labels (18436 positives), triple-voice ensemble scoring active (EIF+AE+XGB), SHAP silent. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 00:06:54 +02:00
toto	c6666e2bba	fix: isotree score convention — proper sklearn calibration isotree decision_function returns [0,1] (higher=anomalous, 0.5=boundary). The entire pipeline (normalize_scores, score_to_threat_level, compute_adaptive_threshold) expects sklearn convention (negative=anomalous). Previous fix (-raw_scores) negated all values, making everything below -0.30 → all CRITICAL. New fix: 0.5 - isotree_score maps correctly to sklearn's convention: isotree 0.80 → -0.30 (CRITICAL) isotree 0.65 → -0.15 (HIGH) isotree 0.55 → -0.05 (MEDIUM) isotree 0.50 → 0.00 (boundary) Verified: 27,952 LEGITIMATE_BROWSER + 15,843 HIGH + 15,059 MEDIUM Tests: 36/36 pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-08 23:56:05 +02:00
toto	db306fb9da	fix: P0 audit bugs — bot-detector + dashboard + SQL Bot-detector: - B1.1: campaign_id and raw_anomaly_score now inserted into ml_detected_anomalies - B1.4/B1.5: log_decision argument order fixed (cycle_id, name) - B1.7: AE broadcast error — model now returns features list, scoring uses model's features instead of current cycle's (prevents dim mismatch) - B1.8: Anubis ALLOW bots now get bot_name from anubis_bot_name Dashboard: - C1.1: XSS in ip_detail.html — {{ ip | tojson }} instead of raw string - C1.2: Stored XSS via innerHTML — added escapeHtml() helper, all user-facing formatters (fmtIP, fmtASN, fmtCountry, fmtJA4, fmtBotName, fmtLabel) sanitized - C2.1: status filter now correctly filters http_version column - C2.2: heatmap toDayOfWeek() - 1 for 0-indexed JS days SQL: - B1.3: view_ip_recurrence worst_score uses max() not min() (0=normal, 1=anomalous) - B1.6: view_resource_cascade_1h joined into view_thesis_features_1h (§5.4) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-08 23:33:00 +02:00
toto	98289ccf04	fix: ASN dictionary pipeline + verbose bot-detector logging - Fix dict_iplocate_asn: remove non-existent org/domain columns (4→4 cols) - Add CSV header to iplocate-ip-to-asn.csv (CSVWithNames format) - Replace org/domain dictGet calls with empty string literals in MV - Full 714K CIDR stub for complete ASN resolution in tests - Add header generation to generate_asn_data.py - Verbose bot-detector stdout: data summary, triage breakdown, model training details, scoring stats, browser classification, boxed results - Fix IPv6 filter in traffic seeder (_ips_from_cidrs) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-08 17:43:55 +02:00
toto	5c5bca71d1	feat: rewrite ASN classification with PeeringDB + expanded heuristics Major improvements to generate_asn_data.py: - Add PeeringDB network data source (34K networks with info_type) - Add new categories: education, government, enterprise - Rename 'human' label to 'isp' across all consumers - Expand keyword heuristics (ISP, datacenter, hosting, CDN, education, gov) - Add hard-coded lists for education, government, enterprise ASNs - Support both --output-dir and --output-asn/--output-ipasn CLI interfaces - Add --no-peeringdb flag for offline use Results: unknown dropped from 86% to 57%, ISP coverage 21.8K ASNs, education 3.1K, enterprise 5.7K, government 520. Updated consumers: - bot_detector.py: 'human' -> 'isp' for baseline selection - dashboard api.py: 'human' -> 'isp' in SQL queries - run-tests.sh: 'human' -> 'isp' in integration test assertions - update-csv-data.sh: updated label description comment Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-08 16:02:07 +02:00
toto	9a48fb9d29	feat: LEGITIMATE_BROWSER classification from JA4 + behavioral consistency Add browser legitimacy classification (A9) to the bot detection pipeline: - New features: is_known_browser (binary) and browser_consistency_score [0..5] combining 5 signals: JA4 browser match, modern_browser_score, Accept-Language, cookies, Sec-Fetch-* presence - Post-scoring: sessions with known browser JA4 + consistency >= 4/5 + NORMAL/LOW threat level are reclassified as LEGITIMATE_BROWSER - Spoofing detection: inconsistent behavior (known JA4 but low consistency) stays in normal anomaly scoring — prevents evasion via JA4 spoofing - XGBoost treats LEGITIMATE_BROWSER as non-threat (negative label) - ClickHouse: browser_family column added to ml_detected_anomalies and ml_all_scores - Dashboard: browser_family filter/sort on detections and scores endpoints, legitimate_browsers count and browser_stats in overview - 6 new unit tests covering classification threshold, spoofing, exclusion logic Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-08 15:46:22 +02:00
toto	7d09c614c3	feat: browser JA4 detection, Anubis bot rules, worldwide ASN data - Add generate_browser_ja4.py: 1,186 browser JA4 fingerprints from FoxIO + ja4db.com covering 11 families (Chromium, Firefox, Safari, Edge, Tor, Opera, Vivaldi...) - Rewrite generate_bot_ip.py: Anubis YAML rules (Google, Bing, Apple, DuckDuck, OpenAI, Perplexity bots) + Tor exit nodes + cloud scanner IPs (3,555 entries) - Rewrite generate_asn_data.py: worldwide iptoasn.com data (78,049 ASNs, 714K CIDRs) - Add dict_browser_ja4 ClickHouse dictionary + browser_family in AI features views - Add /api/browsers dashboard endpoint - Fix CSV quoting for fields containing commas (User-Agent strings) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-08 15:27:37 +02:00
toto	b6184e6529	feat: CSV generation scripts, API filter params, enriched CSV stubs - scripts/generate_bot_ip.py: download Tor exit nodes + curate scanner IPs (1353 entries) - scripts/generate_bot_ja4.py: 31 bot JA4 fingerprints across 16 families - scripts/generate_asn_data.py: 38 ASNs + 96 IP-to-ASN prefixes - scripts/update-csv-data.sh: master orchestrator with --install-stubs - api.py: add asn_org/country_code/ja4/bot_name filters on detections+scores - pages.py: add /network route - csv-stubs: enriched with generated data (Tor nodes, scanner IPs, etc.) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-08 15:05:43 +02:00
toto	c6ca352db9	feat(dashboard): add clickable drill-down to all data elements Add navigation helpers (fmtASN, fmtCountry, fmtJA4, fmtBotName, fmtThreatLink, fmtLabel) to base.html for SOC analyst drill-down. Update all templates: - overview.html: clickable table cells + ECharts click handlers for ASN, country, JA4, bot, and threat charts - detections.html: URL param pre-filters, active filter bar with clear buttons, clickable ASN/country/JA4/threat in table - scores.html: URL param pre-filters, clickable threat/JA4/country - traffic.html: clickable JA4 and country columns - ip_detail.html: clickable threat/JA4 in detections, clickable asn_org/country_code/asn_label in AI features grid - network.html: click handlers on ASN treemap and country sunburst, fmtJA4Full/fmtLabel/fmtBotName/fmtASN in tables - features.html: scatter plot click navigates to /ip/{ip} Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-08 14:58:48 +02:00
toto	f448dcb4b0	fix(rpm): standardize systemd scriptlets and unit installation paths - Add BuildRequires: systemd-rpm-macros to sentinel and correlator specs - Replace manual systemctl calls with %systemd_post, %systemd_preun, %systemd_postun_with_restart macros (handles daemon-reload, stop/disable, try-restart on upgrade correctly and is a no-op in containers) - ja4sentinel.spec: use %{_unitdir} macro instead of hardcoded path (/usr/lib/systemd/system); remove cross-service /var/run/logcorrelator from %files and %post (owned by logcorrelator package, not sentinel) - logcorrelator.spec: move unit from /etc/systemd/system (admin namespace) to %{_unitdir} (/usr/lib/systemd/system) — correct packaging location; move user/group creation from %post to %pre so file ownership is valid during RPM install phase; add Requires(pre): shadow-utils; fix bare directory entries in %files with %dir macro; add version fallback macro so spec is buildable without --define version - test-rpm.sh: auto-build RPM via Dockerfile.package if dist/rpm/ is empty; update service file path check to /usr/lib/systemd/system/ Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-08 10:49:21 +02:00
toto	f7ee5e63f8	fix(docker): add g++ for isotree build, add dashboard Dockerfile.tests - bot-detector Dockerfile + Dockerfile.tests: install g++ for isotree C++ extension - dashboard Dockerfile.tests: new smoke test (verify FastAPI app loads) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-08 08:08:13 +02:00
toto	b735bab5a5	feat(dashboard): rebuild SOC dashboard + fix ClickHouse SQL Complete rewrite of the SOC dashboard using FastAPI + Jinja2 + htmx + Chart.js + Tailwind CSS. Replaces the old React/Vite frontend with server-rendered templates. Dashboard pages: - Overview: KPIs, timeline chart, threat distribution, top IPs - Detections: paginated/filterable anomaly table - Scores: ml_all_scores with AE error & XGB prob columns - Traffic: HTTP logs with method/host filters - IP Investigation: full deep-dive (scores, features, HTTP logs, classify) - Classification: SOC feedback form + history - Features: AI + thesis feature stats - Models: scoring stats + model metadata API: 9 JSON endpoints with parameterized queries, sort whitelists SQL fixes: - 05_aggregation_tables: add deduplicate_merge_projection_mode - 11_views: fix nested aggregate (argMax inside sum) - 12_thesis_features: remove invalid 'let' bindings, fix groupArrayIf type Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-08 03:21:05 +02:00
toto	8d58f2b932	feat(bot-detector): add XGBoost supervised third voice (#10) Triple-voice ensemble architecture: - EIF (unsupervised, zero-day anomalies) - Autoencoder (unsupervised, non-linear correlations) - XGBoost (supervised, known patterns + SOC feedback) XGBoost implementation: - Trained on historical ml_all_scores labels (NORMAL=0, HIGH/CRITICAL/DENY/KNOWN=1) - Weekly retraining (XGB_RETRAIN_INTERVAL_H=168), min 100 labels required - Score = predict_proba, combined via meta-learner: (1-β)(EIF+AE) + β·xgb_prob - Configurable: XGB_WEIGHT (β=0.20), XGB_MIN_LABELS, XGB_RETRAIN_INTERVAL_HOURS - Graceful fallback: if xgboost unavailable or labels insufficient, EIF+AE only - ClickHouse: xgb_prob column added to ml_all_scores - Tests: 4 new tests (availability, train/predict, meta-learner, save/load) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-08 02:45:57 +02:00
toto	57cf6c3828	feat(bot-detector): add parallel Autoencoder scorer (#9) - TrafficAutoEncoder class: symmetric AE (n→64→32→16→32→64→n) with BatchNorm+ReLU - Trained alongside EIF on human_baseline, saved/loaded with model versioning - Score = per-sample MSE reconstruction error, combined with EIF via AE_WEIGHT (α=0.30) - AE latent space (16-dim) used for HDBSCAN clustering instead of raw features - Configurable: AE_WEIGHT, AE_EPOCHS, AE_LATENT_DIM, AE_LEARNING_RATE - Graceful fallback: if torch unavailable or AE fails, EIF-only scoring continues - ClickHouse: ae_recon_error column added to ml_all_scores - Tests: 5 new tests (AE train/score, encode latent, state dict save/load, weight combination) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-08 02:40:39 +02:00
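
Note on commits 57cf6c3828 and 8d58f2b932: the weighting they describe can be sketched as below. This is an illustration only, not the repository's code. The function name combine_scores is hypothetical, all three scores are assumed to be already normalized to [0, 1], and the convex blend used for the "EIF+AE" term is an assumption, since the messages give only the weights (α=0.30, β=0.20) and the outer formula.

```python
from typing import Optional

AE_WEIGHT = 0.30   # alpha, as quoted in commit 57cf6c3828
XGB_WEIGHT = 0.20  # beta, as quoted in commit 8d58f2b932


def combine_scores(
    eif: float,
    ae: Optional[float] = None,
    xgb_prob: Optional[float] = None,
    alpha: float = AE_WEIGHT,
    beta: float = XGB_WEIGHT,
) -> float:
    """Blend three anomaly scores (hypothetical helper).

    Assumes each score is normalized to [0, 1], higher = more anomalous.
    """
    # Assumed form of the "EIF+AE" term: a convex blend weighted by alpha.
    # If the autoencoder score is missing, fall back to EIF only.
    base = eif if ae is None else (1.0 - alpha) * eif + alpha * ae
    # If XGBoost is unavailable or lacks labels, keep the unsupervised score.
    if xgb_prob is None:
        return base
    # Outer formula from the commit message: (1 - beta) * (EIF+AE) + beta * xgb_prob
    return (1.0 - beta) * base + beta * xgb_prob
```

With these defaults, combine_scores(0.4) returns the EIF score unchanged, which mirrors the graceful-fallback behaviour both commits mention.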

62 Commits
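
Note on commit c6666e2bba: the score conversion it describes can be reproduced with a few lines of Python. This is a minimal sketch under the conventions stated in that message (isotree scores in [0, 1] with 0.5 as the boundary, sklearn-style scores where negative means anomalous). The function names are hypothetical and do not come from the repository.

```python
from typing import Iterable, List


def to_sklearn_convention(isotree_scores: Iterable[float]) -> List[float]:
    """Map isotree scores (higher = anomalous, 0.5 = boundary) to the
    sklearn-style convention (negative = anomalous, 0.0 = boundary)."""
    return [0.5 - s for s in isotree_scores]


def negate_only(isotree_scores: Iterable[float]) -> List[float]:
    """The earlier fix described in the commit: plain negation.
    Kept here only to show why it misbehaves."""
    return [-s for s in isotree_scores]


# Example values taken from the commit message.
examples = [0.80, 0.65, 0.55, 0.50]
converted = to_sklearn_convention(examples)  # approx. -0.30, -0.15, -0.05, 0.00
negated = negate_only(examples)              # -0.80, -0.65, -0.55, -0.50

# With plain negation even the boundary score (0.50) lands below -0.30,
# which is the "all CRITICAL" symptom the commit reports.
all_below_critical = all(v <= -0.30 for v in negated)
```

The subtraction keeps the ordering of sessions intact and only shifts the boundary to zero, so downstream thresholds written for sklearn-style scores can stay as they are.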