ja4-platform

Author	SHA1	Message	Date
Jacquin Antoine	f88b739992	feat(e2e): add distributed E2E test framework with parametric traffic generation Add run-e2e-test.sh with CLI parameters (--hits, --http-ratio, --dns, --tls, --src-ips, --keep-analysis, --up) for configurable traffic generation. Traffic runs from VM endpoints with multiple source IPs (alias IPs on eth0) to produce distinct sessions for the ML pipeline. Fix curl TLS flags (--tlsv1.2 instead of --tls-v1-2), skip redundant local verification in distributed mode, and fix dashboard is_available() cache that never retried after ClickHouse recovery. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-15 00:09:32 +02:00
toto	7eb3ad21fd	feat(dashboard): show individual H2 SETTINGS in the mismatch table - /api/browser-signatures: top_mismatches now includes the 7 individual SETTINGS columns (h2_header_table_size, h2_enable_push, h2_max_concurrent_streams, h2_initial_window_size, h2_max_frame_size, h2_max_header_list_size, h2_enable_connect_protocol) - stats: add sessions_with_priority (countIf h2_priority_present > 0) - browsers.html: compact SETTINGS column in the suspects table (format '3:100, 4:65536, 2:0': Akamai IDs with non-null values) - The pseudo-priority counter uses the real sessions_with_priority value instead of displaying '—' Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-11 03:11:17 +02:00
toto	85d3b95b7b	feat: HTTP/2 passive fingerprinting with individual SETTINGS fields Complete implementation of HTTP/2 passive fingerprinting per thesis §2.5.3: mod-reqin-log (C module): - Replace connection-level filter with ap_hook_process_connection (APR_HOOK_FIRST) to capture H2 preface before mod_http2 takes over the connection - AP_MODE_SPECULATIVE read of 512 bytes from c->input_filters - Parse SETTINGS, WINDOW_UPDATE, PRIORITY flags, pseudo-header order - Output individual SETTINGS params as separate JSON fields (IDs 1-6, 8) - Read H2 notes from c1 (master connection) for mod_http2 secondary conns - Fix header_order_signature JSON length bug (26→strlen) ClickHouse schema: - Add 8 new columns to http_logs: h2_has_priority, h2_header_table_size, h2_enable_push, h2_max_concurrent_streams, h2_initial_window_size, h2_max_frame_size, h2_max_header_list_size, h2_enable_connect_protocol - Use Int32/Int64 with DEFAULT -1 to distinguish absent vs zero - Update mv_http_logs to extract individual fields via JSONHas/JSONExtractInt - Migration 04_http2_fields.sql updated for existing deployments Correlator: - Accept both timestamp_ns and timestamp field names (backward compat) Integration: - Enable HTTP/2 in Apache: Protocols h2 http/1.1 in httpd-integration.conf Validated end-to-end via Playwright: H2 curl traffic → mod-reqin-log → correlator → ClickHouse with all 12 H2 columns populated correctly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-11 02:33:45 +02:00
toto	261205028d	fix(dashboard): campaigns scatter chart: show campaigns, not IPs - API /api/campaigns/scatter: aggregate by campaign_id instead of per-IP Returns avg_score, avg_velocity, unique_ips, ja4_list, asn_list, country_list - Template: one bubble per campaign, sized by IP count - Tooltip: campaign-level info (IPs, score, velocity, ASNs, countries, JA4s) - Click navigates to campaign detail (not IP detail) - Updated doc panel text Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-10 15:09:02 +02:00
toto	fb73c60e7d	feat(dashboard): fingerprint discovery page — extract and group JA4/H2/headers from traffic - GET /api/fingerprint-discovery: queries http_logs, groups by JA4, aggregates UA family, header presence rates (Sec-CH-UA, Sec-Fetch, Accept-Language, zstd, brotli, gzip, XFF), H2 data, TLS info, dict lookups - /fingerprints page: KPIs, doughnut chart by family, stacked header bars, filterable/sortable profile table, expandable detail panel - Promote button: push H2 fingerprints to browser_h2_signatures via existing POST /api/browser-signatures/entries endpoint - Nav link: Découverte added after Navigateurs in sidebar Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-10 15:02:53 +02:00
toto	fde6864311	feat(dashboard): browser signatures management UI - Add dict_browser_h2 to /reflists (read-only via dict_browser_h2) - New API endpoints: GET /api/browser-signatures/entries: list browser_h2_signatures (falls back to the CSV dict if migration 06 has not been applied) POST /api/browser-signatures/entries: add a fingerprint + reload dict DELETE /api/browser-signatures/entries: delete + reload dict - /browsers page: 2 new sections 'Base de signatures H2' (H2 signature database): table of the 10 fingerprints, add form, automatic read-only mode if migration 06 has not been applied 'Règles de scoring browser_matcher.py' (browser_matcher.py scoring rules): static table of the 7 dimensions (weights, values per family, bypass thresholds) - Integration: browser_h2.csv copied into user_files at ClickHouse startup Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-10 14:46:07 +02:00
toto	da1b579d4f	fix(dashboard): rename duplicate /api/browsers route to /api/browser-signatures The /api/browsers route already existed (JA4 distribution by family). The new browser_matcher route conflicted with it, and FastAPI used the first definition. Renamed to /api/browser-signatures. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-10 14:17:38 +02:00
toto	9c308747bd	feat(dashboard): Browser Signature Detection page (/browsers) New page dedicated to passive analysis of browser signatures (§4): API: GET /api/browsers: queries view_ai_features_1h for: - Global counters (total, sessions_with_h2, matched, mismatch %) - h2_dict_family distribution (Chrome/Firefox/Safari/Edge) - Breakdown of WINDOW_UPDATE signals (chrome/firefox/safari/absent/autre) - TLS↔H2 mismatch per JA4 family (total + count + %) - Top 20 suspicious sessions (tls_h2_family_mismatch=1, sorted by hits) /browsers page: - 6 header KPIs (sessions, with H2, known family, match rate, mismatch, % mismatch) - Doc banner explaining browser_matcher §4 and DUAL_MODE - Donut: H2 families (dict_browser_h2 lookup) - Horizontal bar: WINDOW_UPDATE signals per family - Grouped bar + line: TLS↔H2 mismatch per JA4 family (count + %) - Table: top 20 potential impostors with clickable IP, pseudo-order, coherence - Mini-KPIs: pseudo-header orders Chrome/Safari, Firefox, unknown, PRIORITY frames - 'Navigateurs' nav link in the Surveillance group of base.html Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-10 14:02:39 +02:00
toto	79dbb23d6f	feat(dashboard): time-range selector on /campaigns Before: all campaign views were fixed at 7 days. After: 1d / 7d (default) / 14d / 30d / 90d selector at the top right. - Add the ?days= parameter (1–90, default 7) to: GET /api/campaigns GET /api/campaigns/graph GET /api/campaigns/scatter GET /api/campaigns/{cid} - The selector simultaneously reloads the 3 views (cards, scatter, graph) and the detail panel with the same time window - The campaign counter shows the active range: (4 campagnes — 30j) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-10 13:24:08 +02:00
toto	7a04e47041	fix(sql+api): fix view column mismatches and ClickHouse 24.8 JOIN issue - view_form_bruteforce_detected: add post_count, distinct_paths, first_seen, last_seen - view_host_ip_ja4_rotation: add host, distinct_ja4, ja4_list, window_start - Replace uniqExact/groupUniqArray with count()/groupArray (no nested-agg error) - api.py campaigns/graph: move a.src_ip < b.src_ip from JOIN ON to WHERE (ClickHouse 24.8 forbids cross-table inequality in JOIN ON condition) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-10 01:05:04 +02:00
toto	a108814a56	feat: bot detection roadmap §2-9: HTTP/2, coherence, drift, fleet, Jaccard, ExIFFI, meta-learner, metrics Step 2: HTTP/2 fingerprinting in the ML pipeline: - Add the dict_browser_h2 dictionary (11 browser families) in 05_aggregation_tables.sql - Add the h2_agg CTE and 4 HTTP/2 features in 07_ai_features_view.sql: h2_settings_known, h2_pseudo_order_match, h2_ja4_coherence, h2_settings_rare - Compute fingerprint_coherence_score (5 weighted axes) in the view - Add the 6th axis axis_h2_coherence in browser.py (weights rebalanced) - browser_h2.csv: 11 Akamai fingerprints → browser family Step 3: Coherence pre-filter on the human baseline: - pipeline.py excludes sessions with fingerprint_coherence_score < threshold from the training baseline - FINGERPRINT_COHERENCE_THRESHOLD configurable via env (default 0.25) - Log excluded sessions for SOC analysis Step 4: Improved drift detection: - scoring.py: move from 5 to 9 quantiles (p5…p95) - Add KL divergence alongside the KS test - Adversarial drift detection (≥80% of features drift in the same direction) - Strict temporal split for validation Step 5: JA4×ASN bipartite graph (§5.2): - fleet.py: fleet detection via NetworkX + Louvain (optional imports) - enrich_with_fleet_score(): adds fleet_score + fleet_campaign_flag to the DataFrame - cycle.py: called after preprocess_df, logging the number of fleet sessions - SQL migration 05_fleet_metrics_tables.sql: fleet_detections table (TTL 7d) - Dashboard: /fleet + /api/fleet (detected communities) + fleet.html template Step 6: Cross-domain Jaccard §5.8: - 12_thesis_features.sql: jaccard_paths CTE → cross_domain_path_similarity - Signal: same paths (/admin, /wp-login) on several hosts = scanner Step 7: ExIFFI + per-feature AE errors: - scoring.py: compute_exiffi_importance() by permutation, compute_ae_feature_errors() - pipeline.py: compute ExIFFI on X_test, index → dict mapping for anomalies - build_reason() enriched with exiffi_top when SHAP is inactive Step 8: Meta-learner for ensemble weighting: - scoring.py: MetaLearner class (LogisticRegression, falls back to fixed weights below 1000 labels) - Collect labels from the current cycle (known_bots, legitimate, Anubis) - pipeline.py: replace fixed weights with MetaLearner.predict() Step 9: Performance metrics and monitoring: - metrics.py: record_cycle_metrics(): anomaly rate, drift, correlation, latency - SQL migration 05_fleet_metrics_tables.sql: ml_performance_metrics table (TTL 90d) - Dashboard: /health + /api/health + health.html template - cycle.py: call record_cycle_metrics at the end of each cycle (Complet + Applicatif) Tests: 36/36 bot-detector tests pass Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-10 00:11:35 +02:00
toto	8180f4af04	refactor(anubis): simplify to IP/CIDR + ASN only, remove UA and Country rules - Remove UA regex extraction (extract_ua_regex, _extract_ua_from_all/any) - Remove Country rule collection from parse_bot_policies_inline - Simplify fetch_rules.py: collect_all_rules returns (ip_rules, asn_rules) - Remove insert_ua_rules and insert_country_rules functions - reload_dicts now only reloads dict_anubis_ip + dict_anubis_asn - Simplify CASE blocks in 04_mv_http_logs.sql, 07_ai_features_view.sql, view_ai_features_anubis.sql, mv_http_logs.sql: IP > ASN (was 5-level UA+IP > UA > IP > ASN > Country cascade) - Remove dict_anubis_country + dict_anubis_ua from 03_anubis_tables.sql (UA table kept as stub for REGEXP_TREE catch-all compatibility) - Remove anubis_country_rules table from schema - Remove Anubis UA and Country tabs from dashboard reflists page - Remove anubis_ua_rules/country_rules from API reflist queries - deploy_schema.sql simplified from 339 to 122 lines - 764 lines removed across 9 files Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 15:25:33 +02:00
toto	98abbc80c7	feat(dashboard): 'Listes de référence' (reference lists) page: CSV/dictionary visualization New /reflists page to view the 9 ClickHouse dictionaries: - bot_ip (3.5K entries): IP/CIDR of known bots - bot_ja4 (31): bot JA4 fingerprints - browser_ja4 (1.2K): browser JA4 fingerprints → family, TLS lib - asn_reputation (82.5K): ASN → reputation (isp, datacenter, cdn…) - iplocate_asn (714K): IP geolocation → ASN, country, name - anubis_ua_rules, anubis_ip_rules, anubis_asn_rules, anubis_country_rules Features: - 9 tabs to navigate between the lists - Text search with ClickHouse-side filtering - Pagination (200 entries/page) - Per-column sorting (ASC/DESC) - Breakdown chart (ECharts) by category - Dictionary KPIs at the top of the page - Documentation tooltips API: /api/dictionaries, /api/reflist/{name}, /api/reflist/{name}/stats Helpers: esc() (HTML escape) added to base.html Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 14:56:54 +02:00
toto	039086a0b3	feat: new detection techniques and SOC tactics page SQL: - Add 5 aggregation columns (count_xff, count_unusual_ct, count_non_std_port, count_login_post, sec_ch_mobile_mismatch) - Expose 5 computed features in view_ai_features_1h - ALTER TABLE migration for existing deployments Bot-detector: - 7 new ML features (has_xff, unusual_content_type_ratio, non_standard_port_ratio, login_post_concentration, sec_ch_mobile_mismatch, true_window_size, window_mss_ratio) - Propagate campaign_id to ml_all_scores (was always -1) - Campaign escalation: HIGH→CRITICAL if cluster ≥5 members Dashboard: - 'Tactiques SOC' page: brute-force, JA4 rotation, recurrence, real-time alerts; 4 KPIs + 4 panels + doc tooltips - Add global fmtDate() helper - Sidebar navigation updated Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 14:29:18 +02:00
toto	702c0d5edb	feat(dashboard): add JA4 fingerprint and cluster investigation pages - /ja4/{fingerprint} page: 8 KPIs, timeline, threat pie, IP scores table, ASN/geo charts, HTTP logs, AI features — full JA4 investigation - /cluster/{cid} page: 8 KPIs, timeline, threat/JA4/ASN/host charts, member table with bulk classify — full campaign investigation - /api/ja4/{fingerprint} and /api/cluster/{cid} API endpoints - fmtJA4 links now navigate to /ja4/ investigation page - campaigns.html: 'Ouvrir' button links to /cluster/{cid} full page - Fix: double-brace {{param}} in non-f-string queries → single {param} (was causing HTTP 500 on all parameterized ClickHouse queries) - 50 routes total, all tests pass, 0 JS console errors Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 14:05:52 +02:00
toto	70188b508c	fix(dashboard): eliminate @apply CSS, fix status column, fix click propagation Playwright testing revealed 3 critical bugs: 1. Tailwind CDN @apply with custom brand-* colors produces empty CSS rules, breaking ALL design components (kpi-card, data-table, badges, filter-btn, section-card, nav-item). Fix: replace all @apply directives with equivalent raw CSS values. 2. Traffic API and IP detail API reference non-existent 'status' column in http_logs table → HTTP 500 on /traffic and /ip/{ip}. Fix: remove status from SELECT, sort whitelist, filters, and templates. 3. Nested <a> links (fmtJA4, fmtASN, fmtCountry, fmtBotName) inside clickable <tr onclick> capture clicks, preventing row navigation to /ip/ detail. Fix: add event.stopPropagation() to all formatter links. Verified with Playwright: 10 pages × 0 JS errors, all tooltips hidden by default, sidebar toggle works, keyboard shortcuts (Alt+1-9, Alt+B), classification form saves to DB, campaign detail panel opens on click. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 13:54:38 +02:00
toto	63ba6d203c	feat(dashboard): complete SOC dashboard with full monitoring and workflows - models.html: Full rewrite — 6 KPIs, scoring volume timeline, anomaly rate chart, threat breakdown per model, enhanced model cards with validation gate - classify.html: SOC workflow — suggested unclassified IPs, quick-classify buttons, classification stats pie, pre-fill from URL params - traffic.html: Clickable rows → ip_detail, column sorting, status column, search filter, doc tooltips on all chart sections - scores.html: Search input, clickable rows → ip_detail, LEGITIMATE_BROWSER filter button, doc tooltips on distribution + scatter charts - ip_detail.html: Resource cascade section (headless browser detection), status column in HTTP logs table - detections.html: Doc tooltips on threat/reason/ASN chart sections - features.html: Doc tooltips on radar/importance/scatter sections - api.py: 4 new endpoints — /api/models/timeline, /api/models/threats, /api/classify/stats, /api/classify/suggested. Traffic API: status + search. 46 routes total. All tests pass (dashboard + bot-detector 36/36). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 01:25:01 +02:00
toto	396baa90d2	feat(dashboard): HDBSCAN cluster visualization - Dedicated /campaigns page with 4 chart views: · Scatter plot (score vs velocity, bubbles colored by campaign) · Force-directed network graph (IPs linked by shared JA4) · Campaign card grid (KPIs, ASN, country, JA4) · Detail panel (behavioral radar, hourly timeline, member table) - 4 new API endpoints: · GET /api/campaigns (fix: campaign_id >= 0 instead of != '') · GET /api/campaigns/graph (nodes + edges) · GET /api/campaigns/scatter (score/velocity per IP) · GET /api/campaigns/{cid} (detail + profile + timeline) - Sidebar: Campagnes link added under Surveillance - Overview: clickable campaigns → link to /campaigns Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 01:11:16 +02:00
toto	2d04288e95	feat(dashboard): SOC workflow overhaul — sidebar nav, doc tooltips, full-width layout - base.html: collapsible sidebar navigation, doc tooltip system, JS helpers (fmtNum, fmtPct, fmtDuration, ecGrid, buildTable, docHTML) - overview.html: SOC command center with stacked timeline, live alerts, campaigns panel, browser donut, 6 KPIs - detections.html: threat color dots, raw score column, click-to-navigate rows - network.html: JA4 rotation, brute-force, persistent threats tables, 6 KPIs - ip_detail.html: ASN/country KPIs, AE/XGB/campaign columns, enriched features - scores/traffic/features/models/classify: page_title blocks + doc tooltips - api.py: 9 new endpoints (campaigns, brute-force, ja4-rotation, recurrence, cascade, alerts, timeline-detail, ua-rotation) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 00:29:34 +02:00
toto	db306fb9da	fix: P0 audit bugs in bot-detector + dashboard + SQL Bot-detector: - B1.1: campaign_id and raw_anomaly_score now inserted into ml_detected_anomalies - B1.4/B1.5: log_decision argument order fixed (cycle_id, name) - B1.7: AE broadcast error: model now returns features list, scoring uses model's features instead of current cycle's (prevents dim mismatch) - B1.8: Anubis ALLOW bots now get bot_name from anubis_bot_name Dashboard: - C1.1: XSS in ip_detail.html: {{ ip | tojson }} instead of raw string - C1.2: Stored XSS via innerHTML: added escapeHtml() helper, all user-facing formatters (fmtIP, fmtASN, fmtCountry, fmtJA4, fmtBotName, fmtLabel) sanitized - C2.1: status filter now correctly filters http_version column - C2.2: heatmap toDayOfWeek() - 1 for 0-indexed JS days SQL: - B1.3: view_ip_recurrence worst_score uses max() not min() (0=normal, 1=anomalous) - B1.6: view_resource_cascade_1h joined into view_thesis_features_1h (§5.4) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-08 23:33:00 +02:00
toto	5c5bca71d1	feat: rewrite ASN classification with PeeringDB + expanded heuristics Major improvements to generate_asn_data.py: - Add PeeringDB network data source (34K networks with info_type) - Add new categories: education, government, enterprise - Rename 'human' label to 'isp' across all consumers - Expand keyword heuristics (ISP, datacenter, hosting, CDN, education, gov) - Add hard-coded lists for education, government, enterprise ASNs - Support both --output-dir and --output-asn/--output-ipasn CLI interfaces - Add --no-peeringdb flag for offline use Results: unknown dropped from 86% to 57%, ISP coverage 21.8K ASNs, education 3.1K, enterprise 5.7K, government 520. Updated consumers: - bot_detector.py: 'human' -> 'isp' for baseline selection - dashboard api.py: 'human' -> 'isp' in SQL queries - run-tests.sh: 'human' -> 'isp' in integration test assertions - update-csv-data.sh: updated label description comment Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-08 16:02:07 +02:00
toto	9a48fb9d29	feat: LEGITIMATE_BROWSER classification from JA4 + behavioral consistency Add browser legitimacy classification (A9) to the bot detection pipeline: - New features: is_known_browser (binary) and browser_consistency_score [0..5] combining 5 signals: JA4 browser match, modern_browser_score, Accept-Language, cookies, Sec-Fetch-* presence - Post-scoring: sessions with known browser JA4 + consistency >= 4/5 + NORMAL/LOW threat level are reclassified as LEGITIMATE_BROWSER - Spoofing detection: inconsistent behavior (known JA4 but low consistency) stays in normal anomaly scoring — prevents evasion via JA4 spoofing - XGBoost treats LEGITIMATE_BROWSER as non-threat (negative label) - ClickHouse: browser_family column added to ml_detected_anomalies and ml_all_scores - Dashboard: browser_family filter/sort on detections and scores endpoints, legitimate_browsers count and browser_stats in overview - 6 new unit tests covering classification threshold, spoofing, exclusion logic Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-08 15:46:22 +02:00
toto	7d09c614c3	feat: browser JA4 detection, Anubis bot rules, worldwide ASN data - Add generate_browser_ja4.py: 1,186 browser JA4 fingerprints from FoxIO + ja4db.com covering 11 families (Chromium, Firefox, Safari, Edge, Tor, Opera, Vivaldi...) - Rewrite generate_bot_ip.py: Anubis YAML rules (Google, Bing, Apple, DuckDuck, OpenAI, Perplexity bots) + Tor exit nodes + cloud scanner IPs (3,555 entries) - Rewrite generate_asn_data.py: worldwide iptoasn.com data (78,049 ASNs, 714K CIDRs) - Add dict_browser_ja4 ClickHouse dictionary + browser_family in AI features views - Add /api/browsers dashboard endpoint - Fix CSV quoting for fields containing commas (User-Agent strings) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-08 15:27:37 +02:00
toto	b6184e6529	feat: CSV generation scripts, API filter params, enriched CSV stubs - scripts/generate_bot_ip.py: download Tor exit nodes + curate scanner IPs (1353 entries) - scripts/generate_bot_ja4.py: 31 bot JA4 fingerprints across 16 families - scripts/generate_asn_data.py: 38 ASNs + 96 IP-to-ASN prefixes - scripts/update-csv-data.sh: master orchestrator with --install-stubs - api.py: add asn_org/country_code/ja4/bot_name filters on detections+scores - pages.py: add /network route - csv-stubs: enriched with generated data (Tor nodes, scanner IPs, etc.) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-08 15:05:43 +02:00
toto	b735bab5a5	feat(dashboard): rebuild SOC dashboard + fix ClickHouse SQL Complete rewrite of the SOC dashboard using FastAPI + Jinja2 + htmx + Chart.js + Tailwind CSS. Replaces the old React/Vite frontend with server-rendered templates. Dashboard pages: - Overview: KPIs, timeline chart, threat distribution, top IPs - Detections: paginated/filterable anomaly table - Scores: ml_all_scores with AE error & XGB prob columns - Traffic: HTTP logs with method/host filters - IP Investigation: full deep-dive (scores, features, HTTP logs, classify) - Classification: SOC feedback form + history - Features: AI + thesis feature stats - Models: scoring stats + model metadata API: 9 JSON endpoints with parameterized queries, sort whitelists SQL fixes: - 05_aggregation_tables: add deduplicate_merge_projection_mode - 11_views: fix nested aggregate (argMax inside sum) - 12_thesis_features: remove invalid 'let' bindings, fix groupArrayIf type Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-08 03:21:05 +02:00
toto	ecceb04174	perf(clickhouse): P3 - view_ip_recurrence with TTL filter + remove FINAL view_ip_recurrence: Add WHERE detected_at >= now() - INTERVAL 30 DAY → With PARTITION BY (P1), ClickHouse prunes partitions outside this range before reading any data. The view scans only the active partitions (instead of the 30 complete daily partitions). → ORDER BY (src_ip) ensures that GROUP BY src_ip reads contiguous data (no in-memory reordering). rotation.py: remove FINAL on ml_detected_anomalies: FINAL forces a full in-memory deduplication of the ReplacingMergeTree (equivalent to a DISTINCT over the whole table), one of the most expensive operations in ClickHouse. Fix: replace the FINAL sub-SELECT with view_ip_recurrence (already aggregated by src_ip, returns recurrence directly without FINAL). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-07 22:33:29 +02:00
toto	2bfb4b7282	perf(dashboard): P2 - replace replaceRegexpAll in WHERE clauses with IPv4MappedToIPv6 Problem: 8 WHERE clauses applied a function to the src_ip column: WHERE replaceRegexpAll(toString(src_ip), '^::ffff:', '') = %(ip)s → ClickHouse cannot use the sort index or skipping indexes when a function is applied to the filtered column. Fix: transform the INPUT (the parameter) rather than the column: WHERE src_ip = IPv4MappedToIPv6(toIPv4(%(ip)s)) → src_ip stays untouched → ClickHouse uses the indexes (P1) and the proj_by_ip projection (P1) for these queries. Files changed: investigation_summary.py: 6 WHERE (ml_detected_anomalies, agg_host_ip_ja4_1h, view_form_bruteforce_detected, view_host_ip_ja4_rotation, view_ip_recurrence) ml_features.py: 1 WHERE (view_ai_features_1h) rotation.py: 1 WHERE (agg_host_ip_ja4_1h) Note: the 27 other occurrences of replaceRegexpAll in SELECT clauses are display transformations (IPv6→IPv4 for the UI) and do not block index use. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-07 22:31:57 +02:00
toto	14323f7b05	perf(clickhouse): P10 - create the 4 missing business views + fix DB prefixes Production bug: view_form_bruteforce_detected, view_host_ip_ja4_rotation, view_dashboard_entities, view_dashboard_user_agents were referenced in 13 dashboard endpoints but did not exist anywhere in the schema. All of these endpoints returned HTTP 500 in production. shared/clickhouse/11_views.sql (new): view_form_bruteforce_detected Source: agg_host_ip_ja4_1h (24h) Logic: GROUP BY (src_ip, host) HAVING count_post >= 10 Usage: bruteforce.py (3 endpoints), investigation_summary.py view_host_ip_ja4_rotation Source: agg_host_ip_ja4_1h (24h) Logic: uniqExact(ja4) per src_ip, HAVING >= 2 (fingerprint rotation) Usage: rotation.py (3 endpoints), investigation_summary.py view_dashboard_entities Source: http_logs (7 days), UNION ALL 5 branches (ip/ja4/country/asn/host) Columns: entity_type, entity_value, src_ip, ja4, host, log_date, client_headers Array(String), asns Array, countries Array, user_agents Array Usage: entities.py (5 endpoints), clustering.py view_dashboard_user_agents Source: http_logs (7 days), GROUP BY (src_ip, ja4, hour) Columns: src_ip, ja4, hour, log_date, user_agents Array(String), requests Usage: variability.py (4 endpoints), fingerprints.py (5 endpoints) attributes.py (2 endpoints) deploy_schema.sh: add 10_perf_indexes.sql and 11_views.sql to the list routes/variability.py + fingerprints.py: Fix 9 queries using view_dashboard_user_agents without a database prefix → replaced with {settings.CLICKHOUSE_DB_PROCESSING}.view_* Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-07 22:30:09 +02:00
toto	3dfeba860b	docs: add standardized comments to all services (Python, Go, Bash) - Add docs/commenting-standard.md defining per-language comment standards (Go godoc, Python PEP-257, C Doxygen, Bash header blocks, SQL banners) - services/dashboard: 100% docstring coverage (100/100 functions) - All FastAPI route handlers, helpers, classes, and models documented - Language: French (project convention) - services/bot-detector: 100% docstring coverage (53/53 symbols) - bot_detector.py: 14 functions + module docstring - anubis/fetch_rules.py: 9 functions - shared/python/ja4_common: full docstrings on ClickHouseClient (7 methods) and ClickHouseSettings class - services/correlator: 24 godoc comments added across 6 Go files - correlation_service.go: 10 private helpers - unixsocket/source.go: 6 parsing/socket helpers - correlated_log.go: 4 field extraction helpers - orchestrator.go, logger.go, main.go: 4 comments - services/correlator/scripts/audit-architecture.sh: standardized header block Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-07 21:32:29 +02:00
toto	b6391afbeb	refactor: replace hardcoded mabase_prod DB prefix with configurable settings Replace all hardcoded 'mabase_prod.' table prefixes in dashboard route SQL queries with configurable database names from settings: - http_logs, http_logs_raw → settings.CLICKHOUSE_DB_LOGS - All other tables → settings.CLICKHOUSE_DB_PROCESSING Also qualify previously unqualified table references (bare FROM/JOIN table_name) with the appropriate database prefix for consistency. Each route file now imports 'from ..config import settings' and uses f-strings with {settings.CLICKHOUSE_DB_PROCESSING} or {settings.CLICKHOUSE_DB_LOGS} for database-qualified table names. Files updated: analysis, attributes, audit, botnets, bruteforce, clustering, detections, entities, fingerprints, header_fingerprint, heatmap, incidents, investigation_summary, metrics, ml_features, rotation, search, tcp_spoofing, variability (19 files). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-07 19:03:05 +02:00
toto	d469e39da7	feat: ja4-platform monorepo — 5 services unified, tests & RPM builds standardized Services: - ja4sentinel: TLS/JA4 fingerprint capture daemon (Go, libpcap) - logcorrelator: JA4 log correlation engine (Go, ClickHouse) - mod_reqin_log: Apache module (C, JSON request logging) - bot_detector: ML bot detection pipeline (Python) - dashboard: FastAPI/Streamlit analytics UI (Python) Shared libraries: - shared/go/ja4common: logger, config, shutdown, ipfilter (Go module) - shared/python/ja4_common: ClickHouseClient, ClickHouseSettings (Python package) - shared/clickhouse/: canonical SQL migrations (10 files) Build & packaging: - Unified 3-stage Dockerfile.package for Go RPMs (el8/el9/el10) - go.work workspace linking sentinel, correlator, ja4common - Makefile with test-all, build-all, rpm-* targets Fixes applied: - go.work: 1.21 → 1.24.6 (required by sentinel) - correlator Dockerfiles: golang:1.21 → golang:1.24 - replace directives in go.mod for ja4common local path - pyproject.toml: setuptools.backends → setuptools.build_meta - Removed static libpcap linking (unavailable on Rocky 9) - Fixed data races in output/writers_test.go (sync.Mutex + atomic.Int32) - Rewrote corrupted test files (logger_test.go × 2) Test coverage: - correlator: 67.1% total (unixsocket 80.5%, config 91.7%, app 83.3%, multi 87.7%, stdout 100%) - sentinel: all 10 packages pass (api, capture, config, fingerprint, ipfilter, logging, output, tlsparse) Documentation: - README.md + docs/ (architecture, development, 5 services, shared libs, DB schema & migrations) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-07 16:42:59 +02:00

31 Commits
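
The LEGITIMATE_BROWSER rule described in commit 9a48fb9d29 (five signals, a consistency threshold of 4/5, and a NORMAL/LOW threat level) could look roughly like the sketch below. This is only an illustration under stated assumptions, not the code from bot_detector.py: the `Session` class, its field names, and the idea that `modern_browser_score` is reduced to a boolean are all hypothetical.

```python
from dataclasses import dataclass

LEGITIMATE_BROWSER = "LEGITIMATE_BROWSER"
CONSISTENCY_THRESHOLD = 4  # "consistency >= 4/5" in the commit message


@dataclass
class Session:
    # Hypothetical field names; the real feature columns may differ.
    ja4_is_browser: bool       # JA4 matches a known browser fingerprint
    modern_browser: bool       # modern_browser_score, binarized (assumption)
    has_accept_language: bool  # Accept-Language header present
    has_cookies: bool          # cookies present
    has_sec_fetch: bool        # Sec-Fetch-* headers present
    threat_level: str          # e.g. "NORMAL", "LOW", "HIGH", "CRITICAL"


def browser_consistency_score(s: Session) -> int:
    """Count how many of the five signals are present (0..5)."""
    return sum([
        s.ja4_is_browser,
        s.modern_browser,
        s.has_accept_language,
        s.has_cookies,
        s.has_sec_fetch,
    ])


def classify(s: Session) -> str:
    """Reclassify as LEGITIMATE_BROWSER only when all three conditions hold."""
    if (
        s.ja4_is_browser
        and browser_consistency_score(s) >= CONSISTENCY_THRESHOLD
        and s.threat_level in ("NORMAL", "LOW")
    ):
        return LEGITIMATE_BROWSER
    # Known JA4 with low consistency (possible spoofing) keeps its
    # ordinary anomaly-based threat level.
    return s.threat_level
```

A session that presents a known browser JA4 but none of the other signals scores 1/5 and keeps its original threat level, which matches the spoofing case the commit message describes.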