ja4-platform/services/bot-detector/docker-compose.yml
toto d469e39da7 feat: ja4-platform monorepo — 5 services unified, tests & RPM builds standardized
Services:
- ja4sentinel: TLS/JA4 fingerprint capture daemon (Go, libpcap)
- logcorrelator: JA4 log correlation engine (Go, ClickHouse)
- mod_reqin_log: Apache module (C, JSON request logging)
- bot_detector: ML bot detection pipeline (Python)
- dashboard: FastAPI/Streamlit analytics UI (Python)

Shared libraries:
- shared/go/ja4common: logger, config, shutdown, ipfilter (Go module)
- shared/python/ja4_common: ClickHouseClient, ClickHouseSettings (Python package)
- shared/clickhouse/: canonical SQL migrations (10 files)

Build & packaging:
- Unified 3-stage Dockerfile.package for Go RPMs (el8/el9/el10)
- go.work workspace linking sentinel, correlator, ja4common
- Makefile with test-all, build-all, rpm-* targets

Fixes applied:
- go.work: 1.21 → 1.24.6 (required by sentinel)
- correlator Dockerfiles: golang:1.21 → golang:1.24
- replace directives in go.mod for ja4common local path
- pyproject.toml: setuptools.backends → setuptools.build_meta
- Removed static libpcap linking (unavailable on Rocky 9)
- Fixed data races in output/writers_test.go (sync.Mutex + atomic.Int32)
- Rewrote corrupted test files (logger_test.go × 2)

Test coverage:
- correlator: 67.1% total (unixsocket 80.5%, config 91.7%, app 83.3%, multi 87.7%, stdout 100%)
- sentinel: all 10 packages pass (api, capture, config, fingerprint, ipfilter, logging, output, tlsparse)

Documentation:
- README.md + docs/ (architecture, development, 5 services, shared libs, DB schema & migrations)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-07 16:42:59 +02:00


version: '3.8'  # Deprecated since Docker Compose v2.x but still tolerated; can be removed
services:
  bot_detector_ai:
    build: bot_detector
    container_name: bot_detector_ai
    restart: unless-stopped
    ports:
      - "8080:8080"  # Health check → GET http://localhost:8080/
    env_file:
      - .env
    environment:
      # ── ClickHouse ────────────────────────────────────────────────────────
      CLICKHOUSE_HOST: ${CLICKHOUSE_HOST:-clickhouse}
      CLICKHOUSE_DB: ${CLICKHOUSE_DB:-mabase_prod}
      CLICKHOUSE_USER: ${CLICKHOUSE_USER:-admin}
      CLICKHOUSE_PASSWORD: ${CLICKHOUSE_PASSWORD:-}
      # ── AI model ──────────────────────────────────────────────────────────
      ISOLATION_CONTAMINATION: ${ISOLATION_CONTAMINATION:-0.02}
      ANOMALY_THRESHOLD: ${ANOMALY_THRESHOLD:--0.03}
      # ── Cycle ─────────────────────────────────────────────────────────────
      CYCLE_INTERVAL_SEC: ${CYCLE_INTERVAL_SEC:-300}
      MAX_CONSECUTIVE_FAILURES: ${MAX_CONSECUTIVE_FAILURES:-3}
      # ── Logs ──────────────────────────────────────────────────────────────
      BOT_DETECTOR_LOG: ${BOT_DETECTOR_LOG:-/var/log/bot_detector/decisions.jsonl}
      LOG_BACKUP_COUNT: ${LOG_BACKUP_COUNT:-7}
      # ── Persistent models ─────────────────────────────────────────────────
      MODEL_DIR: ${MODEL_DIR:-/var/lib/bot_detector}
      RETRAIN_INTERVAL_HOURS: ${RETRAIN_INTERVAL_HOURS:-24}
      MODEL_HISTORY_COUNT: ${MODEL_HISTORY_COUNT:-10}
      # ── A1: Concept drift ─────────────────────────────────────────────────
      DRIFT_THRESHOLD: ${DRIFT_THRESHOLD:-0.30}
      # ── A2: Adaptive threshold ────────────────────────────────────────────
      ANOMALY_PERCENTILE: ${ANOMALY_PERCENTILE:-5}
      # ── A3: Multi-window analysis ─────────────────────────────────────────
      ENABLE_MULTIWINDOW: ${ENABLE_MULTIWINDOW:-false}
      MULTIWINDOW_VIEW: ${MULTIWINDOW_VIEW:-view_ai_features_24h}
      # ── A4: SHAP explainability ───────────────────────────────────────────
      ENABLE_SHAP: ${ENABLE_SHAP:-true}
      # ── A5: Cross-cycle deduplication with TTL ────────────────────────────
      DEDUP_TTL_MIN: ${DEDUP_TTL_MIN:-60}
      # ── A6: Recurrence-based score weighting ──────────────────────────────
      RECURRENCE_WEIGHT: ${RECURRENCE_WEIGHT:-0.005}
      # ── A7: Feature completeness validation ───────────────────────────────
      MIN_VALID_FEATURE_RATIO: ${MIN_VALID_FEATURE_RATIO:-0.50}
      # ── A8: Behavioral clustering of anomalies ────────────────────────────
      ENABLE_CLUSTERING: ${ENABLE_CLUSTERING:-true}
      CLUSTERING_MIN_SAMPLES: ${CLUSTERING_MIN_SAMPLES:-3}
      # ── Health check ──────────────────────────────────────────────────────
      HEALTH_PORT: ${HEALTH_PORT:-8080}
    volumes:
      # Structured JSONL logs (for after-the-fact analysis)
      - ./bot_detector_logs:/var/log/bot_detector
      # Serialized Isolation Forest models (joblib)
      - ./bot_detector_models:/var/lib/bot_detector
      # Reputation CSV files shared with ClickHouse (File engine)
      # Mounted read-only on the bot_detector side (written by ClickHouse only)
      - ./reputation/data/user_files/bot_ip.csv:/data/bot_ip.csv:ro
      - ./reputation/data/user_files/bot_ja4.csv:/data/bot_ja4.csv:ro
      - ./reputation/data/user_files/asn_reputation.csv:/data/asn_reputation.csv:ro
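
Every entry in the environment block uses the `${NAME:-default}` form, which Compose resolves to the default when the variable is unset or empty. The bot_detector source is not shown on this page, so the following is only a minimal sketch of how a Python service could read a subset of these variables with the same fallback rule. `env_or_default`, `parse_bool`, `DetectorSettings`, and `load_settings` are hypothetical names, and the truthy spellings accepted for the `ENABLE_*` flags are an assumption.

```python
import os
from dataclasses import dataclass


def env_or_default(name, default, env=None):
    """Mimic Compose's ${NAME:-default}: fall back when NAME is unset or empty."""
    env = os.environ if env is None else env
    value = env.get(name, "")
    return value if value != "" else default


def parse_bool(raw):
    """Assumed convention: a few truthy spellings, everything else is False."""
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class DetectorSettings:
    # Hypothetical container for a subset of the variables in the compose file.
    clickhouse_host: str
    isolation_contamination: float
    anomaly_threshold: float
    cycle_interval_sec: int
    enable_multiwindow: bool
    enable_shap: bool
    health_port: int


def load_settings(env=None):
    """Build settings from a mapping (os.environ by default).

    The defaults are copied from the compose file so the service behaves
    the same way when started outside Compose.
    """
    def get(name, default):
        return env_or_default(name, default, env)

    return DetectorSettings(
        clickhouse_host=get("CLICKHOUSE_HOST", "clickhouse"),
        isolation_contamination=float(get("ISOLATION_CONTAMINATION", "0.02")),
        anomaly_threshold=float(get("ANOMALY_THRESHOLD", "-0.03")),
        cycle_interval_sec=int(get("CYCLE_INTERVAL_SEC", "300")),
        enable_multiwindow=parse_bool(get("ENABLE_MULTIWINDOW", "false")),
        enable_shap=parse_bool(get("ENABLE_SHAP", "true")),
        health_port=int(get("HEALTH_PORT", "8080")),
    )


if __name__ == "__main__":
    # An empty mapping exercises every default.
    print(load_settings({}))
```

When the container is started through Compose, the defaults have already been substituted before the process starts, so the in-code fallbacks only matter when the service runs on its own, for example in unit tests.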