f4ffe3410aa36c5f8602d44267a61e1e115d60eb
Problem: every dashboard query of the form WHERE detected_at >= now() - INTERVAL N
did a full scan, because ml_detected_anomalies had ORDER BY (src_ip) with no
partitioning and no time index.
Changes:
- 06_ml_tables.sql:
* ml_detected_anomalies: PARTITION BY toYYYYMMDD(detected_at)
→ daily partition pruning on every time-range query
* INDEX idx_detected_at (minmax) → skips granules outside the queried range
* INDEX idx_threat_level set(8) → granule skipping for countIf(threat_level = ...)
* INDEX idx_bot_name bloom_filter → granule skipping for bot_name != ''
* ttl_only_drop_parts = 1 → expired data is removed by dropping whole parts (whole days, given the daily partitions)
* ml_all_scores: same treatment (PARTITION BY + 2 indexes)
- 04_mv_http_logs.sql:
* http_logs: INDEX idx_src_ip bloom_filter(0.01)
→ WHERE src_ip = X queries (analysis.py, variability.py) skip
~90% of granules instead of scanning the whole time range
* INDEX idx_ja4 bloom_filter(0.01) → same for JA4 filters
- 05_aggregation_tables.sql:
* agg_host_ip_ja4_1h: PROJECTION proj_by_ip ORDER BY (src_ip, window_start, ...)
→ investigation_summary.py and rotation.py (WHERE src_ip = X) use
the projection automatically instead of scanning every window_start
- 10_perf_indexes.sql (new):
* ALTER TABLE migration for existing instances
* ADD INDEX + MATERIALIZE INDEX for the 4 tables
* ADD PROJECTION + MATERIALIZE PROJECTION for agg_host_ip_ja4_1h
* Note: adding PARTITION BY to an existing table requires recreating it (documented)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
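The time-range fix in this commit relies on two ClickHouse mechanisms: pruning whole daily partitions (PARTITION BY toYYYYMMDD(detected_at)) and skipping granules through the idx_detected_at minmax index. The Python sketch below is a conceptual model of that read path, not ClickHouse code. The Granule type, the function names, and the simplification of one minmax entry per granule are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class Granule:
    """Simplified stand-in for one granule's minmax entry on detected_at."""
    partition_id: str  # toYYYYMMDD(detected_at) as a string, e.g. "20240110"
    min_ts: datetime
    max_ts: datetime


def partition_id(ts: datetime) -> str:
    """Mimic the PARTITION BY toYYYYMMDD(detected_at) key."""
    return ts.strftime("%Y%m%d")


def granules_to_read(granules: List[Granule], now: datetime,
                     interval: timedelta) -> List[Granule]:
    """Model which granules `WHERE detected_at >= now - interval` must read."""
    lower = now - interval
    first_partition = partition_id(lower)
    kept = []
    for g in granules:
        # Partition pruning: YYYYMMDD strings sort chronologically, so any
        # partition before the day of `lower` cannot contain matching rows.
        if g.partition_id < first_partition:
            continue
        # Minmax skipping: if the newest row in the granule is older than
        # `lower`, the whole granule can be skipped without reading it.
        if g.max_ts < lower:
            continue
        kept.append(g)
    return kept


if __name__ == "__main__":
    now = datetime(2024, 1, 10, 12, 0)
    granules = [
        Granule("20240108", datetime(2024, 1, 8, 0, 0), datetime(2024, 1, 8, 23, 59)),
        Granule("20240110", datetime(2024, 1, 10, 0, 0), datetime(2024, 1, 10, 5, 0)),
        Granule("20240110", datetime(2024, 1, 10, 6, 0), datetime(2024, 1, 10, 11, 59)),
    ]
    # Only the 06:00-11:59 granule of 2024-01-10 survives a 6-hour window.
    print(granules_to_read(granules, now, timedelta(hours=6)))
```

The example shows why ORDER BY (src_ip) alone could not help these queries. Without a time-based partition key or index, no granule can be excluded by its detected_at values, so every granule has to be read.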
ja4-platform
ja4-platform is a monorepo security pipeline for TLS fingerprinting (JA4/JA3) and bot detection. It captures live network traffic, correlates TLS handshakes with HTTP requests, detects anomalous behavior using machine learning (Isolation Forest), and presents results through a SOC analyst dashboard — all backed by ClickHouse as the central data store.
Pipeline Overview
┌─────────────────────────────────────────────────────────────────────────────┐
│  Linux Server (Apache)                                                      │
│                                                                             │
│  ┌─────────────────┐        ┌─────────────────────┐                         │
│  │ mod-reqin-log   │───────▶│ UNIX socket (HTTP)  │───────────┐             │
│  │ (Apache module) │ JSON   │ /var/run/logcorr/   │           │             │
│  │ C · httpd DSO   │        │ http.socket         │           │             │
│  └─────────────────┘        └─────────────────────┘           │             │
│                                                               ▼             │
│  ┌─────────────────┐        ┌─────────────────────┐  ┌──────────────────┐   │
│  │ sentinel        │───────▶│ UNIX socket (TLS)   │─▶│ correlator       │   │
│  │ (TLS capture)   │ JSON   │ /var/run/logcorr/   │  │ (event join)     │   │
│  │ Go · libpcap    │        │ network.socket      │  │ Go · hex. arch   │   │
│  └─────────────────┘        └─────────────────────┘  └────────┬─────────┘   │
│                                                               │             │
└───────────────────────────────────────────────────────────────┼─────────────┘
                                                                │ INSERT
                                                                ▼
                                                       ┌──────────────────┐
                                                       │ ClickHouse       │
                                                       │ ja4_processing   │
                                                       │ (all tables)     │
                                                       └────────┬─────────┘
                                                                │ SELECT
                                                     ┌──────────┼──────────┐
                                                     ▼                     ▼
                                            ┌──────────────────┐  ┌──────────────────┐
                                            │ bot-detector     │  │ dashboard        │
                                            │ (ML anomaly det) │  │ (SOC web UI)     │
                                            │ Python · sklearn │  │ FastAPI + React  │
                                            └──────────────────┘  └──────────────────┘
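In the diagram, mod-reqin-log and sentinel each emit JSON over a UNIX socket that the correlator consumes. Below is a minimal Python sketch of such a consumer. It assumes newline-delimited JSON on a stream (SOCK_STREAM) socket, but this README does not state the actual framing or socket type. The name read_json_events is hypothetical.

```python
import json
import socket
from typing import Iterator


def read_json_events(conn: socket.socket, bufsize: int = 4096) -> Iterator[dict]:
    """Yield one dict per newline-terminated JSON document read from conn.

    Assumption: newline-delimited JSON over a SOCK_STREAM UNIX socket; the
    real framing used by mod-reqin-log and sentinel may differ.
    """
    buf = b""
    while True:
        chunk = conn.recv(bufsize)
        if not chunk:  # peer closed the connection
            break
        buf += chunk
        # Emit every complete line; keep a trailing partial line buffered
        # until the rest of it arrives in a later recv().
        while b"\n" in buf:
            line, buf = buf.split(b"\n", 1)
            if line.strip():
                yield json.loads(line)
    if buf.strip():
        yield json.loads(buf)  # last document sent without a trailing newline
```

Against the real pipeline, the connection would be accepted on a socket bound at a path such as /var/run/logcorr/http.socket. This README does not say which side binds the path. For local testing, `socket.socketpair()` provides a connected pair of UNIX stream sockets.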
Services
| Service | Language | Purpose | Interface |
|---|---|---|---|
| sentinel | Go | Live TLS packet capture, JA4/JA3 fingerprint generation | UNIX socket (network.socket) |
| mod-reqin-log | C | Apache HTTPD module, HTTP request JSON logging | UNIX socket (http.socket) |
| correlator | Go | Joins HTTP + TLS events by src_ip:src_port + time window | ClickHouse INSERT, file, stdout |
| bot-detector | Python | Isolation Forest ML anomaly detection on aggregated traffic | ClickHouse read/write, HTTP :8080 |
| dashboard | Python/JS | SOC analyst web dashboard (FastAPI + React) | HTTP :8000 |
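The correlator row says HTTP and TLS events are joined on src_ip:src_port within a time window. The Python sketch below is a simplified batch version of that join. It assumes the TLS handshake precedes the HTTP request and that the most recent qualifying handshake wins. The event fields, the 5-second default window, and the `correlate` name are hypothetical, and the real correlator is written in Go and works on event streams rather than lists.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, int]  # (src_ip, src_port)


@dataclass
class TlsEvent:
    src_ip: str
    src_port: int
    ts: float  # seconds since the epoch
    ja4: str


@dataclass
class HttpEvent:
    src_ip: str
    src_port: int
    ts: float
    path: str


def correlate(tls_events: List[TlsEvent], http_events: List[HttpEvent],
              window: float = 5.0) -> List[Tuple[HttpEvent, Optional[TlsEvent]]]:
    """Pair each HTTP event with the latest TLS handshake from the same
    src_ip:src_port that happened at most `window` seconds before it."""
    by_key: Dict[Key, List[TlsEvent]] = defaultdict(list)
    for t in tls_events:
        by_key[(t.src_ip, t.src_port)].append(t)

    pairs = []
    for h in http_events:
        # Only handshakes from the same client address and port, inside the window.
        candidates = [t for t in by_key.get((h.src_ip, h.src_port), [])
                      if 0.0 <= h.ts - t.ts <= window]
        # Most recent handshake wins; None when nothing matches.
        best = max(candidates, key=lambda t: t.ts, default=None)
        pairs.append((h, best))
    return pairs
```

Keying on the client port as well as the IP matters because many clients behind one NAT address share an IP. The time window then guards against an ephemeral port being reused for a different connection later on.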
Shared Libraries
| Library | Language | Description |
|---|---|---|
| go/ja4common | Go | Logger, config loader, shutdown handler, IP filter |
| python/ja4_common | Python | ClickHouse client singleton, settings |
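python/ja4_common is described as providing a ClickHouse client singleton. One common way to build such a singleton is lazy, lock-protected initialization. The sketch below takes an injectable factory that actually opens the connection. `ClientSingleton` and its `factory` parameter are hypothetical and are not the library's actual API.

```python
import threading
from typing import Any, Callable, Optional


class ClientSingleton:
    """Build one shared client on first use and return it to every caller.

    `factory` (hypothetical) is whatever opens the real connection.
    """

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory
        self._client: Optional[Any] = None
        self._lock = threading.Lock()

    def get(self) -> Any:
        # Fast path: once built, no locking is needed.
        if self._client is None:
            with self._lock:
                # Re-check under the lock so two threads cannot both build it.
                if self._client is None:
                    self._client = self._factory()
        return self._client
```

With the clickhouse-connect driver installed, the factory could be `lambda: clickhouse_connect.get_client(host="localhost")`. This README does not say which driver ja4_common actually uses.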
Quickstart
Prerequisites
- Docker (with BuildKit) and Docker Compose
- make
- No native Go, Python, or C toolchains are required: all builds run inside Docker
Build All Services
make build-all
Run All Tests
make test-all
Build RPM Packages
make rpm-all
# RPMs written to services/<service>/dist/
Documentation
| Document | Description |
|---|---|
| Architecture | System architecture, data flow, component interactions |
| Development | Build, test, package, and extend the platform |
| Database Schema | Every ClickHouse table, view, dictionary, and materialized view |
| Database Migrations | Migration order, application, verification, and rollback |
Service Documentation
- Sentinel — TLS capture daemon
- mod-reqin-log — Apache HTTP logging module
- Correlator — HTTP/TLS event correlation engine
- Bot Detector — ML anomaly detection
- Dashboard — SOC web dashboard and API
Shared Library Documentation
- go-ja4common — Go shared library
- python-ja4common — Python shared library
Go Workspace
The repository uses a Go workspace (go.work) to link the Go modules:
go 1.21
use (
	./services/sentinel
	./services/correlator
	./shared/go/ja4common
)
License
See individual service directories for license information.