b6184e6529
feat: CSV generation scripts, API filter params, enriched CSV stubs
...
- scripts/generate_bot_ip.py: download Tor exit nodes + curate scanner IPs (1353 entries)
- scripts/generate_bot_ja4.py: 31 bot JA4 fingerprints across 16 families
- scripts/generate_asn_data.py: 38 ASNs + 96 IP-to-ASN prefixes
- scripts/update-csv-data.sh: master orchestrator with --install-stubs
- api.py: add asn_org/country_code/ja4/bot_name filters on detections+scores
- pages.py: add /network route
- csv-stubs: enriched with generated data (Tor nodes, scanner IPs, etc.)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com >
2026-04-08 15:05:43 +02:00
c6ca352db9
feat(dashboard): add clickable drill-down to all data elements
...
Add navigation helpers (fmtASN, fmtCountry, fmtJA4, fmtBotName,
fmtThreatLink, fmtLabel) to base.html for SOC analyst drill-down.
Update all templates:
- overview.html: clickable table cells + ECharts click handlers for
ASN, country, JA4, bot, and threat charts
- detections.html: URL param pre-filters, active filter bar with
clear buttons, clickable ASN/country/JA4/threat in table
- scores.html: URL param pre-filters, clickable threat/JA4/country
- traffic.html: clickable JA4 and country columns
- ip_detail.html: clickable threat/JA4 in detections, clickable
asn_org/country_code/asn_label in AI features grid
- network.html: click handlers on ASN treemap and country sunburst,
fmtJA4Full/fmtLabel/fmtBotName/fmtASN in tables
- features.html: scatter plot click navigates to /ip/{ip}
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com >
2026-04-08 14:58:48 +02:00
b735bab5a5
feat(dashboard): rebuild SOC dashboard + fix ClickHouse SQL
...
Complete rewrite of the SOC dashboard using FastAPI + Jinja2 + htmx + Chart.js + Tailwind CSS.
Replaces the old React/Vite frontend with server-rendered templates.
Dashboard pages:
- Overview: KPIs, timeline chart, threat distribution, top IPs
- Detections: paginated/filterable anomaly table
- Scores: ml_all_scores with AE error & XGB prob columns
- Traffic: HTTP logs with method/host filters
- IP Investigation: full deep-dive (scores, features, HTTP logs, classify)
- Classification: SOC feedback form + history
- Features: AI + thesis feature stats
- Models: scoring stats + model metadata
API: 9 JSON endpoints with parameterized queries, sort whitelists
SQL fixes:
- 05_aggregation_tables: add deduplicate_merge_projection_mode
- 11_views: fix nested aggregate (argMax inside sum)
- 12_thesis_features: remove invalid 'let' bindings, fix groupArrayIf type
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com >
2026-04-08 03:21:05 +02:00
ecceb04174
perf(clickhouse): P3 — view_ip_recurrence avec filtre TTL + supprimer FINAL
...
view_ip_recurrence :
Ajout de WHERE detected_at >= now() - INTERVAL 30 DAY
→ Avec PARTITION BY (P1), ClickHouse élagage les partitions hors de cette
plage avant même de lire les données. La vue ne scanne que les partitions
actives (au lieu des 30 partitions journalières complètes).
→ ORDER BY (src_ip) garantit que le GROUP BY src_ip lit des données
contiguës (aucune réorganisation mémoire).
rotation.py — supprimer FINAL sur ml_detected_anomalies :
FINAL force une déduplication complète du ReplacingMergeTree en mémoire
(équivalent à un DISTINCT sur toute la table) — une des opérations les plus
coûteuses dans ClickHouse.
Fix : remplacer le sous-SELECT FINAL par view_ip_recurrence (déjà aggrégée
par src_ip, retourne recurrence directement sans FINAL).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com >
2026-04-07 22:33:29 +02:00
2bfb4b7282
perf(dashboard): P2 — remplacer replaceRegexpAll dans les WHERE par IPv4MappedToIPv6
...
Problème : 8 clauses WHERE appliquaient une fonction sur la colonne src_ip :
WHERE replaceRegexpAll(toString(src_ip), '^::ffff:', '') = %(ip)s
→ ClickHouse ne peut pas utiliser l'index de tri ou les skipping indexes
quand une fonction est appliquée à la colonne filtrée.
Fix : transformer l'INPUT (le paramètre) plutôt que la colonne :
WHERE src_ip = IPv4MappedToIPv6(toIPv4(%(ip)s))
→ src_ip reste intact → ClickHouse utilise les indexes (P1) et la
projection proj_by_ip (P1) pour ces requêtes.
Fichiers modifiés :
investigation_summary.py — 6 WHERE (ml_detected_anomalies, agg_host_ip_ja4_1h,
view_form_bruteforce_detected, view_host_ip_ja4_rotation,
view_ip_recurrence)
ml_features.py — 1 WHERE (view_ai_features_1h)
rotation.py — 1 WHERE (agg_host_ip_ja4_1h)
Note : les 27 autres occurrences de replaceRegexpAll dans les SELECT sont des
transformations d'affichage (IPv6→IPv4 pour l'UI) et ne bloquent pas les indexes.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com >
2026-04-07 22:31:57 +02:00
14323f7b05
perf(clickhouse): P10 — créer les 4 vues métier manquantes + corriger préfixes DB
...
Bug de production : view_form_bruteforce_detected, view_host_ip_ja4_rotation,
view_dashboard_entities, view_dashboard_user_agents étaient référencées dans
13 endpoints du dashboard mais n'existaient nulle part dans le schéma.
Tous ces endpoints retournaient HTTP 500 en production.
shared/clickhouse/11_views.sql (nouveau) :
view_form_bruteforce_detected
Source : agg_host_ip_ja4_1h (24h)
Logique : GROUP BY (src_ip, host) HAVING count_post >= 10
Usage : bruteforce.py (3 endpoints), investigation_summary.py
view_host_ip_ja4_rotation
Source : agg_host_ip_ja4_1h (24h)
Logique : uniqExact(ja4) par src_ip, HAVING >= 2 (rotation de fingerprint)
Usage : rotation.py (3 endpoints), investigation_summary.py
view_dashboard_entities
Source : http_logs (7 jours), UNION ALL 5 branches (ip/ja4/country/asn/host)
Colonnes : entity_type, entity_value, src_ip, ja4, host, log_date,
client_headers Array(String), asns Array, countries Array,
user_agents Array
Usage : entities.py (5 endpoints), clustering.py
view_dashboard_user_agents
Source : http_logs (7 jours), GROUP BY (src_ip, ja4, hour)
Colonnes : src_ip, ja4, hour, log_date, user_agents Array(String), requests
Usage : variability.py (4 endpoints), fingerprints.py (5 endpoints)
attributes.py (2 endpoints)
deploy_schema.sh : ajout de 10_perf_indexes.sql et 11_views.sql dans la liste
routes/variability.py + fingerprints.py :
Correction de 9 requêtes utilisant view_dashboard_user_agents sans préfixe
de base de données → remplacé par {settings.CLICKHOUSE_DB_PROCESSING}.view_*
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com >
2026-04-07 22:30:09 +02:00
3dfeba860b
docs: add standardized comments to all services (Python, Go, Bash)
...
- Add docs/commenting-standard.md defining per-language comment standards
(Go godoc, Python PEP-257, C Doxygen, Bash header blocks, SQL banners)
- services/dashboard: 100% docstring coverage (100/100 functions)
- All FastAPI route handlers, helpers, classes, and models documented
- Language: French (project convention)
- services/bot-detector: 100% docstring coverage (53/53 symbols)
- bot_detector.py: 14 functions + module docstring
- anubis/fetch_rules.py: 9 functions
- shared/python/ja4_common: full docstrings on ClickHouseClient (7 methods)
and ClickHouseSettings class
- services/correlator: 24 godoc comments added across 6 Go files
- correlation_service.go: 10 private helpers
- unixsocket/source.go: 6 parsing/socket helpers
- correlated_log.go: 4 field extraction helpers
- orchestrator.go, logger.go, main.go: 4 comments
- services/correlator/scripts/audit-architecture.sh: standardized header block
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com >
2026-04-07 21:32:29 +02:00
9f3e0621e5
feat: split ClickHouse into dual configurable databases (ja4_logs / ja4_processing)
...
Architecture:
- ja4_logs: raw log ingestion (http_logs_raw, http_logs, mv_http_logs)
- ja4_processing: analytics, aggregation, ML, dictionaries, audit
Configuration (env vars):
- CLICKHOUSE_DB_LOGS (default: ja4_logs)
- CLICKHOUSE_DB_PROCESSING (default: ja4_processing)
Changes:
- SQL migrations (10 files): all mabase_prod refs → ja4_logs or ja4_processing
with correct cross-database references (MVs, views, dicts)
- deploy_schema.sh: substitutes DB names from env vars at deploy time
- Python shared settings: added CLICKHOUSE_DB_LOGS + CLICKHOUSE_DB_PROCESSING
- Dashboard routes (19 files): replaced ~80 hardcoded mabase_prod refs
with settings.CLICKHOUSE_DB_LOGS / settings.CLICKHOUSE_DB_PROCESSING
- Bot-detector: DB → CLICKHOUSE_DB_PROCESSING, fetch_rules.py configurable
- Correlator: DSN example updated to ja4_logs
- Docker-compose + .env files: new env vars with defaults
- All documentation updated (14 markdown files)
All tests pass: sentinel 10/10, correlator 67.1%, bot-detector 11, dashboard 20, ja4_common 18
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com >
2026-04-07 19:10:35 +02:00
b6391afbeb
refactor: replace hardcoded mabase_prod DB prefix with configurable settings
...
Replace all hardcoded 'mabase_prod.' table prefixes in dashboard route
SQL queries with configurable database names from settings:
- http_logs, http_logs_raw → settings.CLICKHOUSE_DB_LOGS
- All other tables → settings.CLICKHOUSE_DB_PROCESSING
Also qualify previously unqualified table references (bare FROM/JOIN
table_name) with the appropriate database prefix for consistency.
Each route file now imports 'from ..config import settings' and uses
f-strings with {settings.CLICKHOUSE_DB_PROCESSING} or
{settings.CLICKHOUSE_DB_LOGS} for database-qualified table names.
Files updated: analysis, attributes, audit, botnets, bruteforce,
clustering, detections, entities, fingerprints, header_fingerprint,
heatmap, incidents, investigation_summary, metrics, ml_features,
rotation, search, tcp_spoofing, variability (19 files).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com >
2026-04-07 19:03:05 +02:00
d469e39da7
feat: ja4-platform monorepo — 5 services unified, tests & RPM builds standardized
...
Services:
- ja4sentinel: TLS/JA4 fingerprint capture daemon (Go, libpcap)
- logcorrelator: JA4 log correlation engine (Go, ClickHouse)
- mod_reqin_log: Apache module (C, JSON request logging)
- bot_detector: ML bot detection pipeline (Python)
- dashboard: FastAPI/Streamlit analytics UI (Python)
Shared libraries:
- shared/go/ja4common: logger, config, shutdown, ipfilter (Go module)
- shared/python/ja4_common: ClickHouseClient, ClickHouseSettings (Python package)
- shared/clickhouse/: canonical SQL migrations (10 files)
Build & packaging:
- Unified 3-stage Dockerfile.package for Go RPMs (el8/el9/el10)
- go.work workspace linking sentinel, correlator, ja4common
- Makefile with test-all, build-all, rpm-* targets
Fixes applied:
- go.work: 1.21 → 1.24.6 (required by sentinel)
- correlator Dockerfiles: golang:1.21 → golang:1.24
- replace directives in go.mod for ja4common local path
- pyproject.toml: setuptools.backends → setuptools.build_meta
- Removed static libpcap linking (unavailable on Rocky 9)
- Fixed data races in output/writers_test.go (sync.Mutex + atomic.Int32)
- Rewrote corrupted test files (logger_test.go × 2)
Test coverage:
- correlator: 67.1% total (unixsocket 80.5%, config 91.7%, app 83.3%, multi 87.7%, stdout 100%)
- sentinel: all 10 packages pass (api, capture, config, fingerprint, ipfilter, logging, output, tlsparse)
Documentation:
- README.md + docs/ (architecture, development, 5 services, shared libs, DB schema & migrations)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com >
2026-04-07 16:42:59 +02:00