fc882dd3e72e31ed364e897af315ff9e21c6bb90
Option A — X-Forwarded-For + mod_remoteip:
- httpd-integration.conf: load mod_remoteip, trust all Docker RFC-1918
subnets (172/192.168/10). mod_reqin_log uses r->useragent_ip which
mod_remoteip updates from XFF → each request logged with distinct src_ip
- generate_traffic.py: XFF always set (was 30% only); human scenarios
use 91.121/78.41/90.x ranges, bot scenarios use 185.220/45.155/193.32;
pool of 1168 human IPs and 180 bot IPs; default --requests 500
Option D — Direct ClickHouse seeder (seed_clickhouse.py, stdlib only):
- Inserts ~4000 rows into http_logs_raw triggering full MV chain:
http_logs_raw → mv_http_logs → http_logs
→ mv_agg_host_ip_ja4_1h → agg_host_ip_ja4_1h
• 720 human sessions: IPs in OVH/SFR/Orange ASN ranges (16276/15557/3215)
→ dict_asn_reputation maps these to asn_label='human'
→ satisfies bot_detector human_baseline >= 500 threshold
• 150 scanner sessions: datacenter IPs, attack paths (/.env, wp-login,
SQLi, path traversal), scanner UAs, minimal TCP fingerprints
• 100 known-bot sessions: IPs matching bot_ip.csv entries
• 20 brute-force clusters: 20-50 POST /login per IP
All TCP/TLS metadata is profile-realistic (window, MSS, TTL, JA4, JA3)
CSV stubs (mounted at /var/lib/clickhouse/user_files/):
- iplocate-ip-to-asn.csv: 13 CIDR→ASN mappings (OVH/SFR/Orange/Tor/Contabo)
- asn_reputation.csv: 13 ASN→label (8 'human', 3 'datacenter'/'hosting')
- bot_ip.csv: 14 known scanner/Tor IPs (Shodan, Censys, Tor exits)
- bot_ja4.csv: 5 bot JA4 fingerprints (curl, python-requests, masscan, zgrab)
run-tests.sh:
- Phase 4a: seeder runs before live traffic (ensures bot_detector baseline)
- Phase 4b: live traffic gen at 500 requests (up from 200)
- Phase 5f: new assertions — agg_host_ip_ja4_1h populated, ≥500 human
rows in view_ai_features_1h, known-bot labels present
- Phase 7: verifies ml_all_scores populated (bot_detector ran a cycle)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ja4-platform
ja4-platform is a monorepo security pipeline for TLS fingerprinting (JA4/JA3) and bot detection. It captures live network traffic, correlates TLS handshakes with HTTP requests, detects anomalous behavior using machine learning (Isolation Forest), and presents results through a SOC analyst dashboard — all backed by ClickHouse as the central data store.
Pipeline Overview
┌─────────────────────────────────────────────────────────────────────────────┐
│ Linux Server (Apache) │
│ │
│ ┌─────────────────┐ ┌─────────────────────┐ │
│ │ mod-reqin-log │───────▶│ UNIX socket (HTTP) │──┐ │
│ │ (Apache module) │ JSON │ /var/run/logcorr/ │ │ │
│ │ C · httpd DSO │ │ http.socket │ │ │
│ └─────────────────┘ └─────────────────────┘ │ │
│ ▼ │
│ ┌─────────────────┐ ┌─────────────────────┐ ┌──────────────────┐ │
│ │ sentinel │───────▶│ UNIX socket (TLS) │─▶│ correlator │ │
│ │ (TLS capture) │ JSON │ /var/run/logcorr/ │ │ (event join) │ │
│ │ Go · libpcap │ │ network.socket │ │ Go · hex. arch │ │
│ └─────────────────┘ └─────────────────────┘ └────────┬─────────┘ │
│ │ │
└────────────────────────────────────────────────────────────────┼────────────┘
│ INSERT
▼
┌──────────────────┐
│ ClickHouse │
│ ja4_processing │
│ (all tables) │
└────────┬─────────┘
│ SELECT
┌────────────────────┼────────────────────┐
▼ ▼
┌──────────────────┐ ┌──────────────────┐
│ bot-detector │ │ dashboard │
│ (ML anomaly det) │ │ (SOC web UI) │
│ Python · sklearn │ │ FastAPI + React │
└──────────────────┘ └──────────────────┘
Services
| Service | Language | Purpose | Interface |
|---|---|---|---|
| sentinel | Go | Live TLS packet capture, JA4/JA3 fingerprint generation | UNIX socket (network.socket) |
| mod-reqin-log | C | Apache HTTPD module, HTTP request JSON logging | UNIX socket (http.socket) |
| correlator | Go | Joins HTTP + TLS events by src_ip:src_port + time window |
ClickHouse INSERT, file, stdout |
| bot-detector | Python | Isolation Forest ML anomaly detection on aggregated traffic | ClickHouse read/write, HTTP :8080 |
| dashboard | Python/JS | SOC analyst web dashboard (FastAPI + React) | HTTP :8000 |
Shared Libraries
| Library | Language | Description |
|---|---|---|
| go/ja4common | Go | Logger, config loader, shutdown handler, IP filter |
| python/ja4_common | Python | ClickHouse client singleton, settings |
Quickstart
Prerequisites
- Docker (with BuildKit) and Docker Compose
make- No native Go, Python, or C toolchains required — all builds run inside Docker
Build All Services
make build-all
Run All Tests
make test-all
Build RPM Packages
make rpm-all
# RPMs written to services/<service>/dist/
Documentation
| Document | Description |
|---|---|
| Architecture | System architecture, data flow, component interactions |
| Development | Build, test, package, and extend the platform |
| Database Schema | Every ClickHouse table, view, dictionary, and materialized view |
| Database Migrations | Migration order, application, verification, and rollback |
Service Documentation
- Sentinel — TLS capture daemon
- mod-reqin-log — Apache HTTP logging module
- Correlator — HTTP/TLS event correlation engine
- Bot Detector — ML anomaly detection
- Dashboard — SOC web dashboard and API
Shared Library Documentation
- go-ja4common — Go shared library
- python-ja4common — Python shared library
Go Workspace
The repository uses a Go workspace (go.work) to link the Go modules:
go 1.21
use (
./services/sentinel
./services/correlator
./shared/go/ja4common
)
License
See individual service directories for license information.
Description
Languages
Python
38.2%
HTML
24.8%
Go
16.1%
Shell
15.1%
C
3.5%
Other
2.3%