# ja4-platform

**ja4-platform** is a monorepo security pipeline for TLS fingerprinting (JA4/JA3) and bot detection. It captures live network traffic, correlates TLS handshakes with HTTP requests, applies triple-voice ML anomaly detection (Extended Isolation Forest + Autoencoder + XGBoost), and surfaces results through a SOC analyst dashboard — all backed by ClickHouse with a dual-database architecture.

## Pipeline Overview

```
┌──────────────────────────────────────────────────────────────────────────────┐
│  Linux Server (Apache)                                                       │
│                                                                              │
│  ┌─────────────────┐     UNIX socket (DGRAM)          ┌──────────────────┐   │
│  │ mod-reqin-log   │──── http.socket ────────────────▶│                  │   │
│  │ (Apache C11)    │        (source A)                │  correlator      │   │
│  └─────────────────┘                                  │  (Go · hex.      │   │
│                                                       │  architecture)   │   │
│  ┌─────────────────┐     UNIX socket (DGRAM)          │                  │   │
│  │ sentinel        │──── network.socket ─────────────▶│  Joins by        │   │
│  │ (Go · libpcap)  │        (source B)                │  src_ip:src_port │   │
│  │ JA4/JA3 gen.    │                                  └────────┬─────────┘   │
│  └─────────────────┘                                           │             │
└────────────────────────────────────────────────────────────────┼─────────────┘
                                                                 │ INSERT
                                                                 ▼
                              ┌──────────────────────────────────────┐
                              │           ClickHouse 24.8            │
                              │                                      │
                              │    ja4_logs         ja4_processing   │
                              │  ┌───────────┐     ┌──────────────┐  │
                              │  │_raw → MV  │────▶│ agg_* (×6)   │  │
                              │  │→ http_logs│     │ ml_* (×2)    │  │
                              │  └───────────┘     │ views, dicts │  │
                              │                    └──────────────┘  │
                              └─────────┬───────────────┬────────────┘
                                        │               │
                       ┌────────────────┘               └───────────────┐
                       ▼                                                ▼
             ┌────────────────────┐                           ┌────────────────────┐
             │ bot-detector       │                           │ dashboard          │
             │ Python 3.11        │                           │ FastAPI + Jinja2   │
             │ EIF + AE + XGBoost │                           │ htmx + Chart.js    │
             │ HDBSCAN · SHAP     │                           │ 55 routes · 14 pp  │
             └────────────────────┘                           └────────────────────┘
```

## Services

| Service | Language | Description | Interface |
|---------|----------|-------------|-----------|
| [sentinel](docs/services/sentinel.md) | Go 1.24.6 | TLS/TCP packet capture via libpcap, JA4/JA3 fingerprint generation | UNIX socket → `network.socket` |
| [mod-reqin-log](docs/services/mod-reqin-log.md) | C11 | Apache HTTPD module, HTTP request JSON logging | UNIX socket → `http.socket` |
| [correlator](docs/services/correlator.md) | Go 1.24.6 | Hexagonal architecture, correlates HTTP+TLS events by `src_ip:src_port` | ClickHouse INSERT (Native TCP) |
| [bot-detector](docs/services/bot-detector.md) | Python 3.11 | Triple-voice ML ensemble (EIF+AE+XGB), HDBSCAN campaigns, SHAP explainability | ClickHouse read/write, HTTP `:8080` |
| [dashboard](docs/services/dashboard.md) | Python 3.11 | SOC analyst dashboard: 55 routes, 15 templates, 14 pages | HTTP `:8000` |

## Shared Libraries

| Library | Language | Description |
|---------|----------|-------------|
| [go/ja4common](docs/shared/go-ja4common.md) | Go | Logger, config loader, graceful shutdown handler, IP filter |
| [python/ja4_common](docs/shared/python-ja4common.md) | Python | `ClickHouseClient` singleton, `ClickHouseSettings` (pydantic-settings) |

## Quickstart

### Prerequisites

- Docker (with BuildKit) and Docker Compose
- `make`
- No native Go, Python, or C toolchains required — all builds run inside Docker

### Build All Services

```bash
make build-all
```

### Run All Tests

```bash
make test-all
```

### Build RPM Packages

```bash
make rpm-all
# RPMs written to services//dist/rpm/el{8,9,10}/
```

## Scripts

Helper scripts are located in `scripts/`:

| Script | Description |
|--------|-------------|
| `init-stack.sh` | Full ClickHouse stack initialization — deploys schema, loads CSV data, verifies all components |
| `import-prod-data.sh` | Imports pre-exported production data into the dev database with dynamic date shifting |
| `reload-prod-logs.sh` | Exports `http_logs` from production and re-imports into the dev database |
| `update-csv-data.sh` | Downloads and generates all CSV reference data (bot IPs, JA4 signatures, ASN reputation) |
| `generate_bot_ip.py` | Generates `bot_ip.csv` from known scanner/bot sources + Tor exit nodes |
| `generate_bot_ja4.py` | Generates `bot_ja4.csv` from known bot TLS fingerprints |
| `generate_asn_data.py` | Generates `asn_reputation.csv` (ASN→label mapping) |
| `generate_browser_ja4.py` | Generates browser JA4 reference data for legitimate browser detection |

Corresponding Makefile targets:

```bash
make init-stack          # runs scripts/init-stack.sh
make import-prod-data    # runs scripts/import-prod-data.sh
make init-and-import     # init-stack + import-prod-data
make reload-prod-logs    # runs scripts/reload-prod-logs.sh
```

## Integration Tests

Full-stack integration tests run against Docker Compose with a real ClickHouse instance:

```bash
make test-integration        # 8 phases: build → start → schema → traffic → pipeline → dashboard → bot-detector → sentinel
make test-integration-keep   # same but leaves stack running after
make test-integration-down   # tear down integration stack
```

The integration test suite is located in `tests/integration/` and resets the database between runs.

## Documentation

| Document | Description |
|----------|-------------|
| [Architecture](docs/architecture.md) | System architecture, data flow, component interactions |
| [Deployment](docs/deployment.md) | Step-by-step production deployment guide |
| [Development](docs/development.md) | Build, test, package, and extend the platform |
| [Database Schema](docs/database/schema.md) | Every ClickHouse table, view, dictionary, and materialized view |
| [Database Migrations](docs/database/migrations.md) | Migration order, application, verification, and rollback |
| [Commenting Standard](docs/commenting-standard.md) | Code commenting conventions (French comments, English identifiers) |
| [Thesis Reference](docs/THESIS_HTTP_Traffic_Detection.md) | Academic reference: HTTP traffic detection techniques |
| [Audit vs Thesis](docs/AUDIT_Detection_vs_Thesis.md) | Comparison between platform implementation and thesis techniques |

### Service Documentation

- [Sentinel](docs/services/sentinel.md) — TLS/TCP capture daemon (Go + libpcap)
- [mod-reqin-log](docs/services/mod-reqin-log.md) — Apache HTTP logging module (C11)
- [Correlator](docs/services/correlator.md) — HTTP/TLS event correlation engine (Go)
- [Bot Detector](docs/services/bot-detector.md) — Triple-voice ML anomaly detection (Python)
- [Dashboard](docs/services/dashboard.md) — SOC analyst dashboard and API (FastAPI)

### Shared Library Documentation

- [go-ja4common](docs/shared/go-ja4common.md) — Go shared library (logger, config, shutdown, ipfilter)
- [python-ja4common](docs/shared/python-ja4common.md) — Python shared library (ClickHouse client, settings)

## Go Workspace

The repository uses a Go workspace (`go.work`) to link the Go modules:

```
go 1.24.6

use (
    ./services/sentinel
    ./services/correlator
    ./shared/go/ja4common
)
```

Both Go services have a `replace` directive in their `go.mod` pointing to `../../shared/go/ja4common`. The workspace takes precedence for local development; the `replace` is needed for Docker builds where `go.work` is not available.

## License

See individual service directories for license information.