ja4-platform/README.md
toto 9f3e0621e5 feat: split ClickHouse into dual configurable databases (ja4_logs / ja4_processing)
Architecture:
- ja4_logs: raw log ingestion (http_logs_raw, http_logs, mv_http_logs)
- ja4_processing: analytics, aggregation, ML, dictionaries, audit

Configuration (env vars):
- CLICKHOUSE_DB_LOGS (default: ja4_logs)
- CLICKHOUSE_DB_PROCESSING (default: ja4_processing)

Changes:
- SQL migrations (10 files): all mabase_prod refs → ja4_logs or ja4_processing
  with correct cross-database references (MVs, views, dicts)
- deploy_schema.sh: substitutes DB names from env vars at deploy time
- Python shared settings: added CLICKHOUSE_DB_LOGS + CLICKHOUSE_DB_PROCESSING
- Dashboard routes (19 files): replaced ~80 hardcoded mabase_prod refs
  with settings.CLICKHOUSE_DB_LOGS / settings.CLICKHOUSE_DB_PROCESSING
- Bot-detector: DB → CLICKHOUSE_DB_PROCESSING, fetch_rules.py configurable
- Correlator: DSN example updated to ja4_logs
- Docker-compose + .env files: new env vars with defaults
- All documentation updated (14 markdown files)

All tests pass: sentinel 10/10, correlator 67.1%, bot-detector 11, dashboard 20, ja4_common 18

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-07 19:10:35 +02:00


# ja4-platform
**ja4-platform** is a monorepo security pipeline for TLS fingerprinting (JA4/JA3) and bot detection. It captures live network traffic, correlates TLS handshakes with HTTP requests, detects anomalous behavior using machine learning (Isolation Forest), and presents results through a SOC analyst dashboard — all backed by ClickHouse as the central data store.
## Pipeline Overview
```
┌─────────────────────────────────────────────────────────────────────────────┐
│                            Linux Server (Apache)                            │
│                                                                             │
│   ┌─────────────────┐        ┌─────────────────────┐                        │
│   │ mod-reqin-log   │───────▶│ UNIX socket (HTTP)  │──┐                     │
│   │ (Apache module) │  JSON  │ /var/run/logcorr/   │  │                     │
│   │ C · httpd DSO   │        │ http.socket         │  │                     │
│   └─────────────────┘        └─────────────────────┘  │                     │
│                                                       ▼                     │
│   ┌─────────────────┐        ┌─────────────────────┐  ┌──────────────────┐  │
│   │ sentinel        │───────▶│ UNIX socket (TLS)   │─▶│    correlator    │  │
│   │ (TLS capture)   │  JSON  │ /var/run/logcorr/   │  │   (event join)   │  │
│   │ Go · libpcap    │        │ network.socket      │  │  Go · hex. arch  │  │
│   └─────────────────┘        └─────────────────────┘  └────────┬─────────┘  │
│                                                                │            │
└────────────────────────────────────────────────────────────────┼────────────┘
                                                                 │ INSERT
                                                        ┌──────────────────┐
                                                        │    ClickHouse    │
                                                        │  ja4_logs (raw)  │
                                                        │  ja4_processing  │
                                                        └────────┬─────────┘
                                                                 │ SELECT
                                            ┌────────────────────┼────────────────────┐
                                            ▼                                         ▼
                                   ┌──────────────────┐                      ┌──────────────────┐
                                   │   bot-detector   │                      │    dashboard     │
                                   │ (ML anomaly det) │                      │   (SOC web UI)   │
                                   │ Python · sklearn │                      │ FastAPI + React  │
                                   └──────────────────┘                      └──────────────────┘
```
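
The diagram shows both producers handing JSON events to the correlator over UNIX sockets. The following is a minimal Python sketch of that hand-off, not the services' actual code. It assumes newline-delimited JSON over a stream socket, which the README does not specify; the field names (`kind`, `src_ip`, `src_port`) and the helper names are hypothetical. A `socketpair()` stands in for `/var/run/logcorr/http.socket` so the example runs without touching the filesystem.

```python
import json
import socket


def send_event(sock, event):
    """Serialize one event as a single JSON line (assumed framing)."""
    sock.sendall(json.dumps(event).encode("utf-8") + b"\n")


def read_events(sock):
    """Yield decoded events until the peer closes the connection."""
    buf = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break
        buf += chunk
        # A chunk may hold several lines or a partial one, so split on newlines.
        while b"\n" in buf:
            line, buf = buf.split(b"\n", 1)
            if line:
                yield json.loads(line)


# socketpair() returns two connected AF_UNIX stream sockets on Unix systems.
producer, consumer = socket.socketpair()
send_event(producer, {"kind": "http", "src_ip": "203.0.113.7", "src_port": 51544})
send_event(producer, {"kind": "tls", "src_ip": "203.0.113.7", "src_port": 51544})
producer.close()

events = list(read_events(consumer))
consumer.close()
print(events)
```

Buffering until a newline arrives matters on a stream socket because a single `recv` call is not guaranteed to return exactly one event.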
## Services
| Service | Language | Purpose | Interface |
|---------|----------|---------|-----------|
| [sentinel](docs/services/sentinel.md) | Go | Live TLS packet capture, JA4/JA3 fingerprint generation | UNIX socket (`network.socket`) |
| [mod-reqin-log](docs/services/mod-reqin-log.md) | C | Apache HTTPD module, HTTP request JSON logging | UNIX socket (`http.socket`) |
| [correlator](docs/services/correlator.md) | Go | Joins HTTP + TLS events by `src_ip:src_port` + time window | ClickHouse INSERT, file, stdout |
| [bot-detector](docs/services/bot-detector.md) | Python | Isolation Forest ML anomaly detection on aggregated traffic | ClickHouse read/write, HTTP `:8080` |
| [dashboard](docs/services/dashboard.md) | Python/JS | SOC analyst web dashboard (FastAPI + React) | HTTP `:8000` |
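
The correlator row describes a join on `src_ip:src_port` within a time window. The sketch below illustrates that idea in Python under stated assumptions; it is not the correlator's Go implementation. The `Event` type, the `correlate` function, the 2-second default window, and the "closest timestamp wins" rule are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    src_ip: str
    src_port: int
    ts: float                      # event time in seconds
    payload: dict = field(default_factory=dict)


def correlate(http_events, tls_events, window_s=2.0):
    """Pair each HTTP event with the nearest TLS event on the same src_ip:src_port."""
    # Index TLS events by connection key so each lookup is a dict access.
    tls_by_key = {}
    for tls in tls_events:
        tls_by_key.setdefault((tls.src_ip, tls.src_port), []).append(tls)

    joined = []
    for http in http_events:
        candidates = tls_by_key.get((http.src_ip, http.src_port), [])
        in_window = [t for t in candidates if abs(t.ts - http.ts) <= window_s]
        if not in_window:
            continue  # no TLS handshake close enough in time
        best = min(in_window, key=lambda t: abs(t.ts - http.ts))
        joined.append({
            "src_ip": http.src_ip,
            "src_port": http.src_port,
            **best.payload,
            **http.payload,
        })
    return joined


tls = [
    Event("203.0.113.7", 51544, 100.0, {"ja4": "example-fingerprint"}),
    Event("203.0.113.7", 51544, 500.0, {"ja4": "stale-fingerprint"}),
]
http = [
    Event("203.0.113.7", 51544, 100.4, {"path": "/login"}),
    Event("198.51.100.9", 40000, 100.5, {"path": "/"}),
]
rows = correlate(http, tls)
print(rows)
```

Only the first HTTP event is joined here: the second has no TLS event for its `src_ip:src_port`, and the TLS event at `ts=500.0` falls outside the window.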
## Shared Libraries
| Library | Language | Description |
|---------|----------|-------------|
| [go/ja4common](docs/shared/go-ja4common.md) | Go | Logger, config loader, shutdown handler, IP filter |
| [python/ja4_common](docs/shared/python-ja4common.md) | Python | ClickHouse client singleton, settings |
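
The commit notes above list two environment variables, `CLICKHOUSE_DB_LOGS` and `CLICKHOUSE_DB_PROCESSING`, with defaults `ja4_logs` and `ja4_processing`. A shared settings object could resolve them roughly as follows. This is a sketch, assuming plain environment lookups; the names `ClickHouseDatabases`, `load_databases`, and `qualified` are hypothetical and are not the actual `ja4_common` API.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ClickHouseDatabases:
    logs: str          # raw log ingestion tables
    processing: str    # analytics, aggregation, ML, dictionaries, audit


def load_databases(env=None):
    """Read database names from the environment, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return ClickHouseDatabases(
        logs=env.get("CLICKHOUSE_DB_LOGS", "ja4_logs"),
        processing=env.get("CLICKHOUSE_DB_PROCESSING", "ja4_processing"),
    )


def qualified(database, table):
    """Build a database-qualified table name instead of hardcoding the database."""
    return f"{database}.{table}"


# Passing a dict keeps the example independent of the real process environment.
dbs = load_databases({"CLICKHOUSE_DB_LOGS": "ja4_logs_staging"})
query = f"SELECT count() FROM {qualified(dbs.logs, 'http_logs')}"
print(query)  # SELECT count() FROM ja4_logs_staging.http_logs
```

Qualifying every table name with a configured database is what lets queries reference tables across the two databases without hardcoded names.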
## Quickstart
### Prerequisites
- Docker (with BuildKit) and Docker Compose
- `make`
- No native Go, Python, or C toolchains required — all builds run inside Docker
### Build All Services
```bash
make build-all
```
### Run All Tests
```bash
make test-all
```
### Build RPM Packages
```bash
make rpm-all
# RPMs written to services/<service>/dist/
```
## Documentation
| Document | Description |
|----------|-------------|
| [Architecture](docs/architecture.md) | System architecture, data flow, component interactions |
| [Development](docs/development.md) | Build, test, package, and extend the platform |
| [Database Schema](docs/database/schema.md) | Every ClickHouse table, view, dictionary, and materialized view |
| [Database Migrations](docs/database/migrations.md) | Migration order, application, verification, and rollback |
### Service Documentation
- [Sentinel](docs/services/sentinel.md) — TLS capture daemon
- [mod-reqin-log](docs/services/mod-reqin-log.md) — Apache HTTP logging module
- [Correlator](docs/services/correlator.md) — HTTP/TLS event correlation engine
- [Bot Detector](docs/services/bot-detector.md) — ML anomaly detection
- [Dashboard](docs/services/dashboard.md) — SOC web dashboard and API
### Shared Library Documentation
- [go-ja4common](docs/shared/go-ja4common.md) — Go shared library
- [python-ja4common](docs/shared/python-ja4common.md) — Python shared library
## Go Workspace
The repository uses a Go workspace (`go.work`) to link the Go modules:
```
go 1.21

use (
    ./services/sentinel
    ./services/correlator
    ./shared/go/ja4common
)
```
## License
See individual service directories for license information.