ja4-platform
ja4-platform is a monorepo security pipeline for TLS fingerprinting (JA4/JA3) and bot detection. It captures live network traffic, correlates TLS handshakes with HTTP requests, applies triple-voice ML anomaly detection (Extended Isolation Forest + Autoencoder + XGBoost), and surfaces results through a SOC analyst dashboard — all backed by ClickHouse with a dual-database architecture.
Pipeline Overview
┌──────────────────────────────────────────────────────────────────────────────┐
│ Linux Server (Apache) │
│ │
│ ┌─────────────────┐ UNIX socket (DGRAM) ┌──────────────────┐ │
│ │ mod-reqin-log │──── http.socket ────────────────▶│ │ │
│ │ (Apache C11) │ (source A) │ correlator │ │
│ └─────────────────┘ │ (Go · hex. │ │
│ │ architecture) │ │
│ ┌─────────────────┐ UNIX socket (DGRAM) │ │ │
│ │ sentinel │──── network.socket ─────────────▶│ Joins by │ │
│ │ (Go · libpcap) │ (source B) │ src_ip:src_port│ │
│ │ JA4/JA3 gen. │ └────────┬─────────┘ │
│ └─────────────────┘ │ │
└──────────────────────────────────────────────────────────────────┼────────────┘
│ INSERT
▼
┌──────────────────────────────────────┐
│ ClickHouse 24.8 │
│ │
│ ja4_logs ja4_processing │
│ ┌──────────┐ ┌──────────────┐ │
│ │_raw → MV │────▶│ agg_* (×6) │ │
│ │→ http_logs│ │ ml_* (×2) │ │
│ └──────────┘ │ views, dicts │ │
│ └──────────────┘ │
└─────────┬───────────────┬────────────┘
│ │
┌────────────────┘ └───────────────┐
▼ ▼
┌────────────────────┐ ┌────────────────────┐
│ bot-detector │ │ dashboard │
│ Python 3.11 │ │ FastAPI + Jinja2 │
│ EIF + AE + XGBoost │ │ htmx + Chart.js │
│ HDBSCAN · SHAP │ │ 55 routes · 14 pp │
└────────────────────┘ └────────────────────┘
Services
| Service | Language | Description | Interface |
|---|---|---|---|
| sentinel | Go 1.24.6 | TLS/TCP packet capture via libpcap, JA4/JA3 fingerprint generation | UNIX socket → network.socket |
| mod-reqin-log | C11 | Apache HTTPD module, HTTP request JSON logging | UNIX socket → http.socket |
| correlator | Go 1.24.6 | Hexagonal architecture, correlates HTTP+TLS events by src_ip:src_port | ClickHouse INSERT (Native TCP) |
| bot-detector | Python 3.11 | Triple-voice ML ensemble (EIF+AE+XGB), HDBSCAN campaigns, SHAP explainability | ClickHouse read/write, HTTP :8080 |
| dashboard | Python 3.11 | SOC analyst dashboard: 55 routes, 15 templates, 14 pages | HTTP :8000 |
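The correlator row describes a join of HTTP and TLS events on `src_ip:src_port`. The following Go sketch shows one simple way such a join could work, using an in-memory map keyed by that pair. The type names, fields, and the `Correlate` and `joinKey` functions are hypothetical; the real correlator's data model and its handling of timing or unmatched events are not described here.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// HTTPEvent is a hypothetical, trimmed-down record from http.socket (source A).
type HTTPEvent struct {
	SrcIP   string
	SrcPort int
	Method  string
	Path    string
}

// TLSEvent is a hypothetical record from network.socket (source B).
type TLSEvent struct {
	SrcIP   string
	SrcPort int
	JA4     string
}

// Correlated pairs an HTTP request with the TLS fingerprint of its connection.
type Correlated struct {
	HTTP HTTPEvent
	TLS  TLSEvent
}

// joinKey builds the "src_ip:src_port" key.
// net.JoinHostPort wraps IPv6 addresses in brackets so keys stay unambiguous.
func joinKey(ip string, port int) string {
	return net.JoinHostPort(ip, strconv.Itoa(port))
}

// Correlate matches each HTTP event to the TLS event sharing its key.
// HTTP events without a TLS match are skipped in this sketch.
func Correlate(https []HTTPEvent, tlss []TLSEvent) []Correlated {
	byKey := make(map[string]TLSEvent, len(tlss))
	for _, t := range tlss {
		byKey[joinKey(t.SrcIP, t.SrcPort)] = t
	}
	var out []Correlated
	for _, h := range https {
		if t, ok := byKey[joinKey(h.SrcIP, h.SrcPort)]; ok {
			out = append(out, Correlated{HTTP: h, TLS: t})
		}
	}
	return out
}

func main() {
	tlss := []TLSEvent{{SrcIP: "203.0.113.7", SrcPort: 51234, JA4: "example-ja4"}}
	https := []HTTPEvent{
		{SrcIP: "203.0.113.7", SrcPort: 51234, Method: "GET", Path: "/"},
		{SrcIP: "203.0.113.9", SrcPort: 40000, Method: "GET", Path: "/login"},
	}
	for _, c := range Correlate(https, tlss) {
		fmt.Printf("%s %s -> %s\n", c.HTTP.Method, c.HTTP.Path, c.TLS.JA4)
	}
}
```

A production join would likely also need a time window and eviction, since source ports are reused across connections; this sketch leaves that out.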
Shared Libraries
| Library | Language | Description |
|---|---|---|
| go/ja4common | Go | Logger, config loader, graceful shutdown handler, IP filter |
| python/ja4_common | Python | ClickHouseClient singleton, ClickHouseSettings (pydantic-settings) |
Quickstart
Prerequisites
- Docker (with BuildKit) and Docker Compose
- make
- No native Go, Python, or C toolchains required: all builds run inside Docker
Build All Services
make build-all
Run All Tests
make test-all
Build RPM Packages
make rpm-all
# RPMs written to services/<service>/dist/rpm/el{8,9,10}/
Scripts
Helper scripts are located in scripts/:
| Script | Description |
|---|---|
| init-stack.sh | Full ClickHouse stack initialization: deploys schema, loads CSV data, verifies all components |
| import-prod-data.sh | Imports pre-exported production data into the dev database with dynamic date shifting |
| reload-prod-logs.sh | Exports http_logs from production and re-imports into the dev database |
| update-csv-data.sh | Downloads and generates all CSV reference data (bot IPs, JA4 signatures, ASN reputation) |
| generate_bot_ip.py | Generates bot_ip.csv from known scanner/bot sources + Tor exit nodes |
| generate_bot_ja4.py | Generates bot_ja4.csv from known bot TLS fingerprints |
| generate_asn_data.py | Generates asn_reputation.csv (ASN→label mapping) |
| generate_browser_ja4.py | Generates browser JA4 reference data for legitimate browser detection |
Corresponding Makefile targets:
make init-stack # runs scripts/init-stack.sh
make import-prod-data # runs scripts/import-prod-data.sh
make init-and-import # init-stack + import-prod-data
make reload-prod-logs # runs scripts/reload-prod-logs.sh
Integration Tests
Full-stack integration tests run against Docker Compose with a real ClickHouse instance:
make test-integration # 8 phases: build → start → schema → traffic → pipeline → dashboard → bot-detector → sentinel
make test-integration-keep # same but leaves stack running after
make test-integration-down # tear down integration stack
The integration test suite is located in tests/integration/ and resets the database between runs.
Documentation
| Document | Description |
|---|---|
| Architecture | System architecture, data flow, component interactions |
| Deployment | Step-by-step production deployment guide |
| Development | Build, test, package, and extend the platform |
| Database Schema | Every ClickHouse table, view, dictionary, and materialized view |
| Database Migrations | Migration order, application, verification, and rollback |
| Commenting Standard | Code commenting conventions (French comments, English identifiers) |
| Thesis Reference | Academic reference: HTTP traffic detection techniques |
| Audit vs Thesis | Comparison between platform implementation and thesis techniques |
Service Documentation
- Sentinel — TLS/TCP capture daemon (Go + libpcap)
- mod-reqin-log — Apache HTTP logging module (C11)
- Correlator — HTTP/TLS event correlation engine (Go)
- Bot Detector — Triple-voice ML anomaly detection (Python)
- Dashboard — SOC analyst dashboard and API (FastAPI)
Shared Library Documentation
- go-ja4common — Go shared library (logger, config, shutdown, ipfilter)
- python-ja4common — Python shared library (ClickHouse client, settings)
Go Workspace
The repository uses a Go workspace (go.work) to link the Go modules:
go 1.24.6

use (
	./services/sentinel
	./services/correlator
	./shared/go/ja4common
)
Both Go services have a replace directive in their go.mod pointing to ../../shared/go/ja4common. The workspace takes precedence for local development; the replace is needed for Docker builds where go.work is not available.
License
See individual service directories for license information.