Commit Graph

5 Commits

14323f7b05 perf(clickhouse): P10 - create the 4 missing business views + fix DB prefixes
Production bug: view_form_bruteforce_detected, view_host_ip_ja4_rotation,
view_dashboard_entities and view_dashboard_user_agents were referenced by
13 dashboard endpoints but did not exist anywhere in the schema.
All of these endpoints returned HTTP 500 in production.

shared/clickhouse/11_views.sql (new):

  view_form_bruteforce_detected
    Source:  agg_host_ip_ja4_1h (24h)
    Logic:   GROUP BY (src_ip, host) HAVING count_post >= 10
    Used by: bruteforce.py (3 endpoints), investigation_summary.py

  view_host_ip_ja4_rotation
    Source:  agg_host_ip_ja4_1h (24h)
    Logic:   uniqExact(ja4) per src_ip, HAVING >= 2 (fingerprint rotation)
    Used by: rotation.py (3 endpoints), investigation_summary.py

  view_dashboard_entities
    Source:  http_logs (7 days), UNION ALL of 5 branches (ip/ja4/country/asn/host)
    Columns: entity_type, entity_value, src_ip, ja4, host, log_date,
             client_headers Array(String), asns Array, countries Array,
             user_agents Array
    Used by: entities.py (5 endpoints), clustering.py

  view_dashboard_user_agents
    Source:  http_logs (7 days), GROUP BY (src_ip, ja4, hour)
    Columns: src_ip, ja4, hour, log_date, user_agents Array(String), requests
    Used by: variability.py (4 endpoints), fingerprints.py (5 endpoints),
             attributes.py (2 endpoints)
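
The filtering logic of the two agg_host_ip_ja4_1h views can be modeled in plain Python as below. This is an illustrative sketch, not the SQL in 11_views.sql: the AggRow field names are assumed, count_post is assumed to be summed over the hourly windows of the 24h range, and the sample rows are made up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AggRow:
    # Hypothetical row shape for agg_host_ip_ja4_1h (column names assumed).
    src_ip: str
    host: str
    ja4: str
    count_post: int  # POST requests in one 1h window


def bruteforce_detected(rows, threshold=10):
    """Model of GROUP BY (src_ip, host) HAVING count_post >= threshold."""
    totals = defaultdict(int)
    for r in rows:
        totals[(r.src_ip, r.host)] += r.count_post  # assumed: summed over 24h
    return {key: n for key, n in totals.items() if n >= threshold}


def ja4_rotation(rows, min_distinct=2):
    """Model of uniqExact(ja4) per src_ip HAVING >= min_distinct."""
    seen = defaultdict(set)
    for r in rows:
        seen[r.src_ip].add(r.ja4)
    return {ip: len(fps) for ip, fps in seen.items() if len(fps) >= min_distinct}


# Made-up sample data.
rows = [
    AggRow("10.0.0.1", "shop.example", "fp_a", 6),
    AggRow("10.0.0.1", "shop.example", "fp_b", 5),
    AggRow("10.0.0.2", "shop.example", "fp_a", 3),
]
print(bruteforce_detected(rows))  # {('10.0.0.1', 'shop.example'): 11}
print(ja4_rotation(rows))         # {'10.0.0.1': 2}
```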

deploy_schema.sh: add 10_perf_indexes.sql and 11_views.sql to the migration file list

routes/variability.py + fingerprints.py:
  Fixed 9 queries that referenced view_dashboard_user_agents without a
  database prefix → now {settings.CLICKHOUSE_DB_PROCESSING}.view_*
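
A minimal sketch of the prefix fix, assuming a settings object that exposes CLICKHOUSE_DB_PROCESSING. The Settings dataclass and the user_agents_query helper are hypothetical stand-ins for the real route code. Qualifying the view with its database means the query no longer depends on the connection's current database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    # Hypothetical stand-in: only the attribute name comes from the commit.
    CLICKHOUSE_DB_PROCESSING: str = "ja4_processing"


settings = Settings()


def user_agents_query(cfg: Settings) -> str:
    """Build a query against the fully qualified view.

    Without the database prefix, ClickHouse resolves the view in the
    connection's current database, which is not necessarily ja4_processing.
    """
    return (
        "SELECT src_ip, ja4, hour, user_agents, requests "
        f"FROM {cfg.CLICKHOUSE_DB_PROCESSING}.view_dashboard_user_agents "
        "WHERE log_date >= today() - 1"
    )


print(user_agents_query(settings))
```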

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-07 22:30:09 +02:00
f4ffe3410a perf(clickhouse): P1 — partition + skipping indexes sur ml_detected_anomalies, http_logs, agg_host_ip_ja4_1h
Problem: every dashboard query filtering WHERE detected_at >= now() - INTERVAL N
did a full scan, because ml_detected_anomalies had ORDER BY (src_ip) with no
partitioning and no time-based index.
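
The effect of PARTITION BY toYYYYMMDD(detected_at) can be modeled as follows. This is a simplified sketch, not ClickHouse internals: each partition is reduced to its integer day key, and partitions whose day is before the lower bound of the time filter are never read. partition_key and partitions_to_read are hypothetical names.

```python
from datetime import datetime, timedelta


def partition_key(ts: datetime) -> int:
    # Python equivalent of toYYYYMMDD(ts), e.g. 2026-04-07 -> 20260407.
    return ts.year * 10000 + ts.month * 100 + ts.day


def partitions_to_read(partitions, since: datetime):
    """Daily partitions that may hold rows with detected_at >= since.

    A partition for a day before since's day only contains older rows,
    so it is pruned without reading any of its data.
    """
    lower = partition_key(since)
    return sorted(p for p in partitions if p >= lower)


now = datetime(2026, 4, 7, 22, 0)
existing = [partition_key(now - timedelta(days=d)) for d in range(30)]
print(partitions_to_read(existing, now - timedelta(days=2)))
# [20260405, 20260406, 20260407]
```

Only 3 of the 30 daily partitions are touched; rows in the 20260405 partition that are older than the cutoff are still filtered row by row.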

Changes:
- 06_ml_tables.sql:
  * ml_detected_anomalies: PARTITION BY toYYYYMMDD(detected_at)
    → daily partition pruning on every time-range query
  * INDEX idx_detected_at (minmax) → skips granules outside the time range
  * INDEX idx_threat_level set(8) → skipping for countIf(threat_level = ...)
  * INDEX idx_bot_name bloom_filter → skipping for bot_name != ''
  * ttl_only_drop_parts = 1 → TTL removes expired data by dropping whole parts instead of rewriting them
  * ml_all_scores: same treatment (PARTITION BY + 2 indexes)

- 04_mv_http_logs.sql:
  * http_logs: INDEX idx_src_ip bloom_filter(0.01)
    → queries with WHERE src_ip = X (analysis.py, variability.py) skip
    ~90% of granules without scanning the whole time range
  * INDEX idx_ja4 bloom_filter(0.01) → same for JA4 filters

- 05_aggregation_tables.sql:
  * agg_host_ip_ja4_1h: PROJECTION proj_by_ip ORDER BY (src_ip, window_start, ...)
    → investigation_summary.py and rotation.py (WHERE src_ip = X) automatically
    use the projection instead of scanning every window_start

- 10_perf_indexes.sql (new):
  * ALTER TABLE migration for existing instances
  * ADD INDEX + MATERIALIZE INDEX for the 4 tables
  * ADD PROJECTION + MATERIALIZE PROJECTION for agg_host_ip_ja4_1h
  * Note: adding PARTITION BY to an existing table requires recreating it (documented)
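
The granule-skipping idea behind the minmax and set(8) indexes can be modeled as below. It is a deliberately simplified sketch: one index entry per granule, the index GRANULARITY setting is ignored, and the threat level values are invented. A granule is skipped only when its index proves that no row can match.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple

SET_LIMIT = 8  # mirrors set(8): at most 8 distinct values kept per granule


@dataclass(frozen=True)
class GranuleIndex:
    min_ts: int
    max_ts: int
    threat_levels: Optional[FrozenSet[str]]  # None if the granule exceeded SET_LIMIT


def build_index(rows: List[Tuple[int, str]]) -> GranuleIndex:
    """rows are (detected_at as epoch seconds, threat_level) pairs of one granule."""
    timestamps = [ts for ts, _ in rows]
    levels = frozenset(level for _, level in rows)
    return GranuleIndex(
        min(timestamps),
        max(timestamps),
        levels if len(levels) <= SET_LIMIT else None,
    )


def can_skip(idx: GranuleIndex, since: int, level: str) -> bool:
    """True if no row can satisfy detected_at >= since AND threat_level = level."""
    if idx.max_ts < since:  # minmax: the whole granule is older than the range
        return True
    # set: the value is provably absent; an overflowed (None) set proves nothing
    return idx.threat_levels is not None and level not in idx.threat_levels


old = build_index([(100, "LOW"), (150, "HIGH")])
recent_low = build_index([(900, "LOW"), (950, "MEDIUM")])
recent_high = build_index([(900, "HIGH"), (990, "LOW")])
print([can_skip(g, since=500, level="HIGH") for g in (old, recent_low, recent_high)])
# [True, True, False]
```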

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-07 22:28:04 +02:00
d4e7e674d8 feat: full-stack Docker Compose integration tests
- 4-container stack: ClickHouse, platform (Rocky 9), bot-detector, dashboard
- Platform builds sentinel on Rocky (CGO+libpcap native), correlator static
- mod-reqin-log compiled with apxs on Rocky (matching RPM build target)
- ClickHouse init script patches credentials for test env (sed-based)
- 8-phase test runner: schema, traffic gen, pipeline, dashboard API, bot-detector, sentinel
- All 13 checks pass, 3 non-blocking warnings (empty dicts, log paths)
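
The pass/warn/fail accounting described above could look like the sketch below. Everything here (Status, Check, run_checks and the sample checks) is hypothetical; the actual runner's phases and reporting are not shown in this commit. The point it illustrates is that warnings are reported but, unlike failures, do not fail the run.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Tuple


class Status(Enum):
    PASS = "pass"
    WARN = "warn"  # non-blocking: reported, but does not fail the run
    FAIL = "fail"


@dataclass
class Check:
    phase: str
    name: str
    run: Callable[[], Status]


def run_checks(checks: List[Check]) -> Tuple[bool, Dict[Status, int]]:
    """Run every check, print one line per result, succeed only if nothing failed."""
    counts = {status: 0 for status in Status}
    for check in checks:
        try:
            status = check.run()
        except Exception:
            status = Status.FAIL  # a crashing check counts as a failure
        counts[status] += 1
        print(f"[{check.phase}] {check.name}: {status.value}")
    return counts[Status.FAIL] == 0, counts


# Hypothetical checks for illustration only.
checks = [
    Check("schema", "tables created", lambda: Status.PASS),
    Check("pipeline", "dictionaries populated", lambda: Status.WARN),
]
ok, counts = run_checks(checks)
print(ok, counts[Status.PASS], counts[Status.WARN])  # True 1 1
```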

SQL schema fixes discovered during integration:
- 02_dictionaries: IPv6CIDR → String (not a valid ClickHouse type)
- 03_anubis_tables: dict_anubis_ua missing has_ip/rule_id/category attrs
- 03_anubis_tables: dict_anubis_country FLAT() → COMPLEX_KEY_HASHED() (String key)
- 09_audit_table: CODEC before DEFAULT → DEFAULT before CODEC

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-07 20:33:25 +02:00
9f3e0621e5 feat: split ClickHouse into dual configurable databases (ja4_logs / ja4_processing)
Architecture:
- ja4_logs: raw log ingestion (http_logs_raw, http_logs, mv_http_logs)
- ja4_processing: analytics, aggregation, ML, dictionaries, audit

Configuration (env vars):
- CLICKHOUSE_DB_LOGS (default: ja4_logs)
- CLICKHOUSE_DB_PROCESSING (default: ja4_processing)
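
A minimal sketch of how a service might resolve both database names with the documented defaults. DatabaseSettings and load_db_settings are hypothetical; the real fields live in the shared Python settings, whose exact shape is not shown here.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DatabaseSettings:
    logs: str        # raw log ingestion (http_logs_raw, http_logs, mv_http_logs)
    processing: str  # analytics, aggregation, ML, dictionaries, audit


def load_db_settings(env: Optional[Mapping[str, str]] = None) -> DatabaseSettings:
    """Read both database names from the environment, falling back to the defaults."""
    env = os.environ if env is None else env
    return DatabaseSettings(
        logs=env.get("CLICKHOUSE_DB_LOGS", "ja4_logs"),
        processing=env.get("CLICKHOUSE_DB_PROCESSING", "ja4_processing"),
    )


print(load_db_settings({"CLICKHOUSE_DB_LOGS": "staging_logs"}))
# DatabaseSettings(logs='staging_logs', processing='ja4_processing')
```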

Changes:
- SQL migrations (10 files): all mabase_prod refs → ja4_logs or ja4_processing
  with correct cross-database references (MVs, views, dicts)
- deploy_schema.sh: substitutes DB names from env vars at deploy time
- Python shared settings: added CLICKHOUSE_DB_LOGS + CLICKHOUSE_DB_PROCESSING
- Dashboard routes (19 files): replaced ~80 hardcoded mabase_prod refs
  with settings.CLICKHOUSE_DB_LOGS / settings.CLICKHOUSE_DB_PROCESSING
- Bot-detector: DB → CLICKHOUSE_DB_PROCESSING, fetch_rules.py configurable
- Correlator: DSN example updated to ja4_logs
- Docker-compose + .env files: new env vars with defaults
- All documentation updated (14 markdown files)
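
The deploy-time substitution in deploy_schema.sh is not shown in this commit. Assuming the SQL files reference the default names ja4_logs and ja4_processing literally, a Python sketch of the rewrite could look like this; render_migration is a hypothetical helper. A single regex pass with word boundaries leaves longer identifiers untouched and never re-replaces a name it has just substituted.

```python
import re
from typing import Mapping

# Matches the default database names only as whole identifiers,
# so e.g. "ja4_logs_archive" is left alone.
_DB_NAME = re.compile(r"\b(ja4_logs|ja4_processing)\b")


def render_migration(sql: str, env: Mapping[str, str]) -> str:
    """Rewrite default DB names to the configured ones in a single pass."""
    targets = {
        "ja4_logs": env.get("CLICKHOUSE_DB_LOGS", "ja4_logs"),
        "ja4_processing": env.get("CLICKHOUSE_DB_PROCESSING", "ja4_processing"),
    }
    # A callable replacement avoids backslash-escape handling in the target names.
    return _DB_NAME.sub(lambda m: targets[m.group(1)], sql)


sql = (
    "SELECT count() FROM ja4_logs.http_logs WHERE src_ip IN "
    "(SELECT src_ip FROM ja4_processing.agg_host_ip_ja4_1h)"
)
print(render_migration(sql, {"CLICKHOUSE_DB_LOGS": "staging_logs"}))
```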

All tests pass: sentinel 10/10, correlator (67.1% coverage), bot-detector 11, dashboard 20, ja4_common 18

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-07 19:10:35 +02:00
d469e39da7 feat: ja4-platform monorepo — 5 services unified, tests & RPM builds standardized
Services:
- ja4sentinel: TLS/JA4 fingerprint capture daemon (Go, libpcap)
- logcorrelator: JA4 log correlation engine (Go, ClickHouse)
- mod_reqin_log: Apache module (C, JSON request logging)
- bot_detector: ML bot detection pipeline (Python)
- dashboard: FastAPI/Streamlit analytics UI (Python)

Shared libraries:
- shared/go/ja4common: logger, config, shutdown, ipfilter (Go module)
- shared/python/ja4_common: ClickHouseClient, ClickHouseSettings (Python package)
- shared/clickhouse/: canonical SQL migrations (10 files)

Build & packaging:
- Unified 3-stage Dockerfile.package for Go RPMs (el8/el9/el10)
- go.work workspace linking sentinel, correlator, ja4common
- Makefile with test-all, build-all, rpm-* targets

Fixes applied:
- go.work: 1.21 → 1.24.6 (required by sentinel)
- correlator Dockerfiles: golang:1.21 → golang:1.24
- replace directives in go.mod for ja4common local path
- pyproject.toml: setuptools.backends → setuptools.build_meta
- Removed static libpcap linking (unavailable on Rocky 9)
- Fixed data races in output/writers_test.go (sync.Mutex + atomic.Int32)
- Rewrote corrupted test files (logger_test.go × 2)

Test coverage:
- correlator: 67.1% total (unixsocket 80.5%, config 91.7%, app 83.3%, multi 87.7%, stdout 100%)
- sentinel: all 10 packages pass (api, capture, config, fingerprint, ipfilter, logging, output, tlsparse)

Documentation:
- README.md + docs/ (architecture, development, 5 services, shared libs, DB schema & migrations)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-07 16:42:59 +02:00