ja4-platform

Author	SHA1	Message	Date
toto	039086a0b3	feat: nouvelles techniques de détection et page tactiques SOC SQL: - Ajout 5 colonnes d'agrégation (count_xff, count_unusual_ct, count_non_std_port, count_login_post, sec_ch_mobile_mismatch) - Exposition de 5 features calculées dans view_ai_features_1h - Migration ALTER TABLE pour déploiements existants Bot-detector: - 7 nouvelles features ML (has_xff, unusual_content_type_ratio, non_standard_port_ratio, login_post_concentration, sec_ch_mobile_mismatch, true_window_size, window_mss_ratio) - Propagation campaign_id vers ml_all_scores (était toujours -1) - Escalade campagne : HIGH→CRITICAL si cluster ≥5 membres Dashboard: - Page Tactiques SOC : brute-force, rotation JA4, récurrence, alertes temps réel — 4 KPIs + 4 panneaux + infobulles doc - Ajout fmtDate() helper global - Navigation sidebar mise à jour Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 14:29:18 +02:00
toto	702c0d5edb	feat(dashboard): add JA4 fingerprint and cluster investigation pages - /ja4/{fingerprint} page: 8 KPIs, timeline, threat pie, IP scores table, ASN/geo charts, HTTP logs, AI features — full JA4 investigation - /cluster/{cid} page: 8 KPIs, timeline, threat/JA4/ASN/host charts, member table with bulk classify — full campaign investigation - /api/ja4/{fingerprint} and /api/cluster/{cid} API endpoints - fmtJA4 links now navigate to /ja4/ investigation page - campaigns.html: 'Ouvrir' button links to /cluster/{cid} full page - Fix: double-brace {{param}} in non-f-string queries → single {param} (was causing HTTP 500 on all parameterized ClickHouse queries) - 50 routes total, all tests pass, 0 JS console errors Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 14:05:52 +02:00
toto	70188b508c	fix(dashboard): eliminate @apply CSS, fix status column, fix click propagation Playwright testing revealed 3 critical bugs: 1. Tailwind CDN @apply with custom brand-* colors produces empty CSS rules, breaking ALL design components (kpi-card, data-table, badges, filter-btn, section-card, nav-item). Fix: replace all @apply directives with equivalent raw CSS values. 2. Traffic API and IP detail API reference non-existent 'status' column in http_logs table → HTTP 500 on /traffic and /ip/{ip}. Fix: remove status from SELECT, sort whitelist, filters, and templates. 3. Nested <a> links (fmtJA4, fmtASN, fmtCountry, fmtBotName) inside clickable <tr onclick> capture clicks, preventing row navigation to /ip/ detail. Fix: add event.stopPropagation() to all formatter links. Verified with Playwright: 10 pages × 0 JS errors, all tooltips hidden by default, sidebar toggle works, keyboard shortcuts (Alt+1-9, Alt+B), classification form saves to DB, campaign detail panel opens on click. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 13:54:38 +02:00
toto	63ba6d203c	feat(dashboard): complete SOC dashboard with full monitoring and workflows - models.html: Full rewrite — 6 KPIs, scoring volume timeline, anomaly rate chart, threat breakdown per model, enhanced model cards with validation gate - classify.html: SOC workflow — suggested unclassified IPs, quick-classify buttons, classification stats pie, pre-fill from URL params - traffic.html: Clickable rows → ip_detail, column sorting, status column, search filter, doc tooltips on all chart sections - scores.html: Search input, clickable rows → ip_detail, LEGITIMATE_BROWSER filter button, doc tooltips on distribution + scatter charts - ip_detail.html: Resource cascade section (headless browser detection), status column in HTTP logs table - detections.html: Doc tooltips on threat/reason/ASN chart sections - features.html: Doc tooltips on radar/importance/scatter sections - api.py: 4 new endpoints — /api/models/timeline, /api/models/threats, /api/classify/stats, /api/classify/suggested. Traffic API: status + search. 46 routes total. All tests pass (dashboard + bot-detector 36/36). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 01:25:01 +02:00
toto	396baa90d2	feat(dashboard): visualisation clusters HDBSCAN - Page /campaigns dédiée avec 4 vues graphiques : · Scatter plot (score vs vélocité, bulles colorées par campagne) · Graphe réseau force-directed (IPs liées par JA4 partagé) · Grille de cartes campagne (KPIs, ASN, pays, JA4) · Panneau détail (radar comportemental, timeline horaire, table membres) - 4 nouveaux endpoints API : · GET /api/campaigns (fix: campaign_id >= 0 au lieu de != '') · GET /api/campaigns/graph (nœuds + arêtes) · GET /api/campaigns/scatter (score/vélocité par IP) · GET /api/campaigns/{cid} (détail + profil + timeline) - Sidebar: lien Campagnes ajouté dans Surveillance - Overview: campagnes clickables → lien vers /campaigns Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 01:11:16 +02:00
toto	2d04288e95	feat(dashboard): SOC workflow overhaul — sidebar nav, doc tooltips, full-width layout - base.html: collapsible sidebar navigation, doc tooltip system, JS helpers (fmtNum, fmtPct, fmtDuration, ecGrid, buildTable, docHTML) - overview.html: SOC command center with stacked timeline, live alerts, campaigns panel, browser donut, 6 KPIs - detections.html: threat color dots, raw score column, click-to-navigate rows - network.html: JA4 rotation, brute-force, persistent threats tables, 6 KPIs - ip_detail.html: ASN/country KPIs, AE/XGB/campaign columns, enriched features - scores/traffic/features/models/classify: page_title blocks + doc tooltips - api.py: 9 new endpoints (campaigns, brute-force, ja4-rotation, recurrence, cascade, alerts, timeline-detail, ua-rotation) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 00:29:34 +02:00
toto	db306fb9da	fix: P0 audit bugs — bot-detector + dashboard + SQL Bot-detector: - B1.1: campaign_id and raw_anomaly_score now inserted into ml_detected_anomalies - B1.4/B1.5: log_decision argument order fixed (cycle_id, name) - B1.7: AE broadcast error — model now returns features list, scoring uses model's features instead of current cycle's (prevents dim mismatch) - B1.8: Anubis ALLOW bots now get bot_name from anubis_bot_name Dashboard: - C1.1: XSS in ip_detail.html — {{ ip \| tojson }} instead of raw string - C1.2: Stored XSS via innerHTML — added escapeHtml() helper, all user-facing formatters (fmtIP, fmtASN, fmtCountry, fmtJA4, fmtBotName, fmtLabel) sanitized - C2.1: status filter now correctly filters http_version column - C2.2: heatmap toDayOfWeek() - 1 for 0-indexed JS days SQL: - B1.3: view_ip_recurrence worst_score uses max() not min() (0=normal, 1=anomal) - B1.6: view_resource_cascade_1h joined into view_thesis_features_1h (§5.4) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-08 23:33:00 +02:00
toto	5c5bca71d1	feat: rewrite ASN classification with PeeringDB + expanded heuristics Major improvements to generate_asn_data.py: - Add PeeringDB network data source (34K networks with info_type) - Add new categories: education, government, enterprise - Rename 'human' label to 'isp' across all consumers - Expand keyword heuristics (ISP, datacenter, hosting, CDN, education, gov) - Add hard-coded lists for education, government, enterprise ASNs - Support both --output-dir and --output-asn/--output-ipasn CLI interfaces - Add --no-peeringdb flag for offline use Results: unknown dropped from 86% to 57%, ISP coverage 21.8K ASNs, education 3.1K, enterprise 5.7K, government 520. Updated consumers: - bot_detector.py: 'human' -> 'isp' for baseline selection - dashboard api.py: 'human' -> 'isp' in SQL queries - run-tests.sh: 'human' -> 'isp' in integration test assertions - update-csv-data.sh: updated label description comment Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-08 16:02:07 +02:00
toto	9a48fb9d29	feat: LEGITIMATE_BROWSER classification from JA4 + behavioral consistency Add browser legitimacy classification (A9) to the bot detection pipeline: - New features: is_known_browser (binary) and browser_consistency_score [0..5] combining 5 signals: JA4 browser match, modern_browser_score, Accept-Language, cookies, Sec-Fetch-* presence - Post-scoring: sessions with known browser JA4 + consistency >= 4/5 + NORMAL/LOW threat level are reclassified as LEGITIMATE_BROWSER - Spoofing detection: inconsistent behavior (known JA4 but low consistency) stays in normal anomaly scoring — prevents evasion via JA4 spoofing - XGBoost treats LEGITIMATE_BROWSER as non-threat (negative label) - ClickHouse: browser_family column added to ml_detected_anomalies and ml_all_scores - Dashboard: browser_family filter/sort on detections and scores endpoints, legitimate_browsers count and browser_stats in overview - 6 new unit tests covering classification threshold, spoofing, exclusion logic Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-08 15:46:22 +02:00
toto	7d09c614c3	feat: browser JA4 detection, Anubis bot rules, worldwide ASN data - Add generate_browser_ja4.py: 1,186 browser JA4 fingerprints from FoxIO + ja4db.com covering 11 families (Chromium, Firefox, Safari, Edge, Tor, Opera, Vivaldi...) - Rewrite generate_bot_ip.py: Anubis YAML rules (Google, Bing, Apple, DuckDuck, OpenAI, Perplexity bots) + Tor exit nodes + cloud scanner IPs (3,555 entries) - Rewrite generate_asn_data.py: worldwide iptoasn.com data (78,049 ASNs, 714K CIDRs) - Add dict_browser_ja4 ClickHouse dictionary + browser_family in AI features views - Add /api/browsers dashboard endpoint - Fix CSV quoting for fields containing commas (User-Agent strings) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-08 15:27:37 +02:00
toto	b6184e6529	feat: CSV generation scripts, API filter params, enriched CSV stubs - scripts/generate_bot_ip.py: download Tor exit nodes + curate scanner IPs (1353 entries) - scripts/generate_bot_ja4.py: 31 bot JA4 fingerprints across 16 families - scripts/generate_asn_data.py: 38 ASNs + 96 IP-to-ASN prefixes - scripts/update-csv-data.sh: master orchestrator with --install-stubs - api.py: add asn_org/country_code/ja4/bot_name filters on detections+scores - pages.py: add /network route - csv-stubs: enriched with generated data (Tor nodes, scanner IPs, etc.) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-08 15:05:43 +02:00
toto	b735bab5a5	feat(dashboard): rebuild SOC dashboard + fix ClickHouse SQL Complete rewrite of the SOC dashboard using FastAPI + Jinja2 + htmx + Chart.js + Tailwind CSS. Replaces the old React/Vite frontend with server-rendered templates. Dashboard pages: - Overview: KPIs, timeline chart, threat distribution, top IPs - Detections: paginated/filterable anomaly table - Scores: ml_all_scores with AE error & XGB prob columns - Traffic: HTTP logs with method/host filters - IP Investigation: full deep-dive (scores, features, HTTP logs, classify) - Classification: SOC feedback form + history - Features: AI + thesis feature stats - Models: scoring stats + model metadata API: 9 JSON endpoints with parameterized queries, sort whitelists SQL fixes: - 05_aggregation_tables: add deduplicate_merge_projection_mode - 11_views: fix nested aggregate (argMax inside sum) - 12_thesis_features: remove invalid 'let' bindings, fix groupArrayIf type Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-08 03:21:05 +02:00
toto	ecceb04174	perf(clickhouse): P3 — view_ip_recurrence avec filtre TTL + supprimer FINAL view_ip_recurrence : Ajout de WHERE detected_at >= now() - INTERVAL 30 DAY → Avec PARTITION BY (P1), ClickHouse élagage les partitions hors de cette plage avant même de lire les données. La vue ne scanne que les partitions actives (au lieu des 30 partitions journalières complètes). → ORDER BY (src_ip) garantit que le GROUP BY src_ip lit des données contiguës (aucune réorganisation mémoire). rotation.py — supprimer FINAL sur ml_detected_anomalies : FINAL force une déduplication complète du ReplacingMergeTree en mémoire (équivalent à un DISTINCT sur toute la table) — une des opérations les plus coûteuses dans ClickHouse. Fix : remplacer le sous-SELECT FINAL par view_ip_recurrence (déjà aggrégée par src_ip, retourne recurrence directement sans FINAL). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-07 22:33:29 +02:00
toto	2bfb4b7282	perf(dashboard): P2 — remplacer replaceRegexpAll dans les WHERE par IPv4MappedToIPv6 Problème : 8 clauses WHERE appliquaient une fonction sur la colonne src_ip : WHERE replaceRegexpAll(toString(src_ip), '^::ffff:', '') = %(ip)s → ClickHouse ne peut pas utiliser l'index de tri ou les skipping indexes quand une fonction est appliquée à la colonne filtrée. Fix : transformer l'INPUT (le paramètre) plutôt que la colonne : WHERE src_ip = IPv4MappedToIPv6(toIPv4(%(ip)s)) → src_ip reste intact → ClickHouse utilise les indexes (P1) et la projection proj_by_ip (P1) pour ces requêtes. Fichiers modifiés : investigation_summary.py — 6 WHERE (ml_detected_anomalies, agg_host_ip_ja4_1h, view_form_bruteforce_detected, view_host_ip_ja4_rotation, view_ip_recurrence) ml_features.py — 1 WHERE (view_ai_features_1h) rotation.py — 1 WHERE (agg_host_ip_ja4_1h) Note : les 27 autres occurrences de replaceRegexpAll dans les SELECT sont des transformations d'affichage (IPv6→IPv4 pour l'UI) et ne bloquent pas les indexes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-07 22:31:57 +02:00
toto	14323f7b05	perf(clickhouse): P10 — créer les 4 vues métier manquantes + corriger préfixes DB Bug de production : view_form_bruteforce_detected, view_host_ip_ja4_rotation, view_dashboard_entities, view_dashboard_user_agents étaient référencées dans 13 endpoints du dashboard mais n'existaient nulle part dans le schéma. Tous ces endpoints retournaient HTTP 500 en production. shared/clickhouse/11_views.sql (nouveau) : view_form_bruteforce_detected Source : agg_host_ip_ja4_1h (24h) Logique : GROUP BY (src_ip, host) HAVING count_post >= 10 Usage : bruteforce.py (3 endpoints), investigation_summary.py view_host_ip_ja4_rotation Source : agg_host_ip_ja4_1h (24h) Logique : uniqExact(ja4) par src_ip, HAVING >= 2 (rotation de fingerprint) Usage : rotation.py (3 endpoints), investigation_summary.py view_dashboard_entities Source : http_logs (7 jours), UNION ALL 5 branches (ip/ja4/country/asn/host) Colonnes : entity_type, entity_value, src_ip, ja4, host, log_date, client_headers Array(String), asns Array, countries Array, user_agents Array Usage : entities.py (5 endpoints), clustering.py view_dashboard_user_agents Source : http_logs (7 jours), GROUP BY (src_ip, ja4, hour) Colonnes : src_ip, ja4, hour, log_date, user_agents Array(String), requests Usage : variability.py (4 endpoints), fingerprints.py (5 endpoints) attributes.py (2 endpoints) deploy_schema.sh : ajout de 10_perf_indexes.sql et 11_views.sql dans la liste routes/variability.py + fingerprints.py : Correction de 9 requêtes utilisant view_dashboard_user_agents sans préfixe de base de données → remplacé par {settings.CLICKHOUSE_DB_PROCESSING}.view_* Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-07 22:30:09 +02:00
toto	3dfeba860b	docs: add standardized comments to all services (Python, Go, Bash) - Add docs/commenting-standard.md defining per-language comment standards (Go godoc, Python PEP-257, C Doxygen, Bash header blocks, SQL banners) - services/dashboard: 100% docstring coverage (100/100 functions) - All FastAPI route handlers, helpers, classes, and models documented - Language: French (project convention) - services/bot-detector: 100% docstring coverage (53/53 symbols) - bot_detector.py: 14 functions + module docstring - anubis/fetch_rules.py: 9 functions - shared/python/ja4_common: full docstrings on ClickHouseClient (7 methods) and ClickHouseSettings class - services/correlator: 24 godoc comments added across 6 Go files - correlation_service.go: 10 private helpers - unixsocket/source.go: 6 parsing/socket helpers - correlated_log.go: 4 field extraction helpers - orchestrator.go, logger.go, main.go: 4 comments - services/correlator/scripts/audit-architecture.sh: standardized header block Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-07 21:32:29 +02:00
toto	b6391afbeb	refactor: replace hardcoded mabase_prod DB prefix with configurable settings Replace all hardcoded 'mabase_prod.' table prefixes in dashboard route SQL queries with configurable database names from settings: - http_logs, http_logs_raw → settings.CLICKHOUSE_DB_LOGS - All other tables → settings.CLICKHOUSE_DB_PROCESSING Also qualify previously unqualified table references (bare FROM/JOIN table_name) with the appropriate database prefix for consistency. Each route file now imports 'from ..config import settings' and uses f-strings with {settings.CLICKHOUSE_DB_PROCESSING} or {settings.CLICKHOUSE_DB_LOGS} for database-qualified table names. Files updated: analysis, attributes, audit, botnets, bruteforce, clustering, detections, entities, fingerprints, header_fingerprint, heatmap, incidents, investigation_summary, metrics, ml_features, rotation, search, tcp_spoofing, variability (19 files). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-07 19:03:05 +02:00
toto	d469e39da7	feat: ja4-platform monorepo — 5 services unified, tests & RPM builds standardized Services: - ja4sentinel: TLS/JA4 fingerprint capture daemon (Go, libpcap) - logcorrelator: JA4 log correlation engine (Go, ClickHouse) - mod_reqin_log: Apache module (C, JSON request logging) - bot_detector: ML bot detection pipeline (Python) - dashboard: FastAPI/Streamlit analytics UI (Python) Shared libraries: - shared/go/ja4common: logger, config, shutdown, ipfilter (Go module) - shared/python/ja4_common: ClickHouseClient, ClickHouseSettings (Python package) - shared/clickhouse/: canonical SQL migrations (10 files) Build & packaging: - Unified 3-stage Dockerfile.package for Go RPMs (el8/el9/el10) - go.work workspace linking sentinel, correlator, ja4common - Makefile with test-all, build-all, rpm-* targets Fixes applied: - go.work: 1.21 → 1.24.6 (required by sentinel) - correlator Dockerfiles: golang:1.21 → golang:1.24 - replace directives in go.mod for ja4common local path - pyproject.toml: setuptools.backends → setuptools.build_meta - Removed static libpcap linking (unavailable on Rocky 9) - Fixed data races in output/writers_test.go (sync.Mutex + atomic.Int32) - Rewrote corrupted test files (logger_test.go × 2) Test coverage: - correlator: 67.1% total (unixsocket 80.5%, config 91.7%, app 83.3%, multi 87.7%, stdout 100%) - sentinel: all 10 packages pass (api, capture, config, fingerprint, ipfilter, logging, output, tlsparse) Documentation: - README.md + docs/ (architecture, development, 5 services, shared libs, DB schema & migrations) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-07 16:42:59 +02:00

18 Commits