fix(ja4ebpf): split bpf2go generate into Ja4Tc + Ja4Ssl, fix RPM systemd-rpm-macros

- Use two separate //go:generate directives (Ja4Tc for tc_capture.c, Ja4Ssl
  for uprobe_ssl.c) to avoid duplicate LICENSE symbol and multi-file clang issue
- Update loader.go to hold tcObjs/sslObjs separately with correct field names:
  UprobeSslSetFd, UprobeSslReadEntry, UretprobeSslReadExit,
  KprobeAccept4Entry, KretprobeAccept4Exit
- Add systemd-rpm-macros to all three RPM build stages (el8/el9/el10)
  so that %{_unitdir} macro resolves correctly
- RPMs now build successfully for el8, el9, el10

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
toto
2026-04-11 23:21:11 +02:00
parent a1e4c1dad5
commit 3b047b680a
155 changed files with 197011 additions and 599 deletions

219
README.md

@ -1,169 +1,166 @@
# ja4-platform
**ja4-platform** is a monorepo security pipeline for TLS fingerprinting (JA4/JA3) and bot detection. It captures live network traffic, correlates TLS handshakes with HTTP requests, applies triple-voice ML anomaly detection (Extended Isolation Forest + Autoencoder + XGBoost), and surfaces results through a SOC analyst dashboard — all backed by ClickHouse with a dual-database architecture.
**ja4-platform** is a monorepo security pipeline for TLS fingerprinting (JA4/JA3) and HTTP bot detection. A single eBPF-based agent passively and non-intrusively observes network traffic, reconstructs TCP/TLS/HTTP sessions in memory, and feeds a ClickHouse database used for machine-learning anomaly detection and for display in a SOC dashboard.
## Pipeline Overview
```
┌────────────────────────────────────────────────────────────────────────────┐
│                           Linux Server (Apache)                            │
│  ┌─────────────────┐  UNIX socket (DGRAM)         ┌──────────────────┐     │
│  │  mod-reqin-log  │──── http.socket ────────────▶│                  │     │
│  │  (Apache C11)   │  (source A)                  │   correlator     │     │
│  └─────────────────┘                              │   (Go · hex.     │     │
│                                                   │    architecture) │     │
│  ┌─────────────────┐  UNIX socket (DGRAM)         │                  │     │
│  │    sentinel     │──── network.socket ─────────▶│   Joins by       │     │
│  │  (Go · libpcap) │  (source B)                  │   src_ip:src_port│     │
│  │  JA4/JA3 gen.   │                              └────────┬─────────┘     │
│  └─────────────────┘                                       │               │
└────────────────────────────────────────────────────────────┼───────────────┘
                                                             │ INSERT
                        ┌──────────────────────────────────────┐
                        │           ClickHouse 24.8            │
                        │                                      │
                        │  ja4_logs          ja4_processing    │
                        │  ┌───────────┐     ┌──────────────┐  │
                        │  │_raw → MV  │────▶│ agg_* (×6)   │  │
                        │  │→ http_logs│     │ ml_* (×2)    │  │
                        │  └───────────┘     │ views, dicts │  │
                        │                    └──────────────┘  │
                        └─────────┬───────────────┬────────────┘
                 ┌────────────────┘               └───────────────┐
       ┌────────────────────┐                           ┌────────────────────┐
       │    bot-detector    │                           │     dashboard      │
       │    Python 3.11     │                           │  FastAPI + Jinja2  │
       │ EIF + AE + XGBoost │                           │  htmx + Chart.js   │
       │  HDBSCAN · SHAP    │                           │ 55 routes · 14 pp  │
       └────────────────────┘                           └────────────────────┘
+-----------------------------------------------------------------------+
|            Linux Server (Apache / Nginx / Varnish / Hitch)            |
|                                                                       |
|       +-------------------------------------------------------------+ |
|       |                   ja4ebpf (Go eBPF agent)                   | |
|       |                                                             | |
|       |  TC ingress hook (L3/L4/L5 - passive):                      | |
| net   |    SYN  -> TTL, DF, IP-ID, MSS, Window, Scale               | |
| ----->|    TLS  -> ClientHello: JA4, ALPN, SNI, extensions          | |
| XDP/TC|    HTTP -> payload port 80/8080 (magic bytes router)        | |
|       |                                                             | |
| uprobe|  SSL_read hook (L7 - decrypted traffic):                    | |
| ----->|    HTTP/1.1 -> method, path, headers (exact order)          | |
|       |    HTTP/2   -> SETTINGS, WINDOW_UPDATE, pseudo-headers      | |
|       |                                                             | |
|       |  In-memory correlation by src_ip:src_port                   | |
|       |  256 shards . timeout 500ms . GC 100ms                      | |
|       +------------------------------+------------------------------+ |
+-----------------------------------------------------------------------+
                                       | INSERT batch (Native TCP :9000)
                                       v
                    +--------------------------------------+
                    |           ClickHouse 24.8            |
                    |                                      |
                    |  ja4_logs           ja4_processing   |
                    |  +------------+     +--------------+ |
                    |  |_raw -> MV  |---->| agg_* (x6)   | |
                    |  |-> http_logs|     | ml_* (x2)    | |
                    |  +------------+     | views, dicts | |
                    |                     +--------------+ |
                    +------+--------------------------+----+
                           |                          |
                  +--------+                          +----------+
                  v                                              v
       +--------------------+                         +--------------------+
       |    bot-detector    |                         |     dashboard      |
       |    Python 3.11     |                         |  FastAPI + Jinja2  |
       | EIF + AE + XGBoost |                         |  htmx + Chart.js   |
       |  HDBSCAN . SHAP    |                         |  SOC analyst UI    |
       +--------------------+                         +--------------------+
```
## Services
| Service | Language | Description | Interface |
|---------|----------|-------------|-----------|
| [sentinel](docs/services/sentinel.md) | Go 1.24.6 | TLS/TCP packet capture via libpcap, JA4/JA3 fingerprint generation | UNIX socket → `network.socket` |
| [mod-reqin-log](docs/services/mod-reqin-log.md) | C11 | Apache HTTPD module, HTTP request JSON logging | UNIX socket → `http.socket` |
| [correlator](docs/services/correlator.md) | Go 1.24.6 | Hexagonal architecture, correlates HTTP+TLS events by `src_ip:src_port` | ClickHouse INSERT (Native TCP) |
| [bot-detector](docs/services/bot-detector.md) | Python 3.11 | Triple-voice ML ensemble (EIF+AE+XGB), HDBSCAN campaigns, SHAP explainability | ClickHouse read/write, HTTP `:8080` |
| [dashboard](docs/services/dashboard.md) | Python 3.11 | SOC analyst dashboard: 55 routes, 15 templates, 14 pages | HTTP `:8000` |
| Service | Language | Description | Interface |
|---------|----------|-------------|-----------|
| [ja4ebpf](docs/services/ja4ebpf.md) | Go 1.24 + C eBPF (CO-RE) | Passive eBPF agent: TC ingress (L3/L4/L5), SSL_read uprobe (L7 HTTPS), TC on ports 80/8080 (cleartext HTTP), in-memory correlation, ClickHouse insert | ClickHouse batch INSERT |
| [bot-detector](docs/services/bot-detector.md) | Python 3.11 | Triple-voice ML ensemble (EIF+AE+XGB), HDBSCAN clustering, SHAP explainability | ClickHouse read/write, HTTP `:8080` |
| [dashboard](docs/services/dashboard.md) | Python 3.11 | SOC dashboard: 9 JSON endpoints, 8 HTML pages | HTTP `:8000` |
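The ja4ebpf agent is described as routing payloads by their magic bytes. The sketch below is one plausible reading of that idea, assuming the router only needs to tell a TLS ClientHello, an HTTP/2 connection preface, and an HTTP/1.x request line apart. The `Classify` function and the `Proto` constants are hypothetical names, not taken from the ja4ebpf sources.

```go
package main

import (
	"bytes"
	"fmt"
)

// Proto is a hypothetical label for the protocol guessed from the first bytes.
type Proto string

const (
	ProtoTLS     Proto = "tls-clienthello"
	ProtoHTTP1   Proto = "http/1.x"
	ProtoHTTP2   Proto = "http/2"
	ProtoUnknown Proto = "unknown"
)

// h2Preface is the client connection preface defined by the HTTP/2 specification.
var h2Preface = []byte("PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n")

// http1Methods lists request-line prefixes; the trailing space avoids
// matching arbitrary data that merely starts with the same letters.
var http1Methods = [][]byte{
	[]byte("GET "), []byte("POST "), []byte("PUT "), []byte("HEAD "),
	[]byte("DELETE "), []byte("OPTIONS "), []byte("PATCH "),
	[]byte("CONNECT "), []byte("TRACE "),
}

// Classify inspects only the leading bytes of a payload.
func Classify(p []byte) Proto {
	// TLS record header: content type 0x16 (handshake), major version 0x03,
	// two version/length bytes skipped, then handshake type 0x01 (ClientHello).
	if len(p) >= 6 && p[0] == 0x16 && p[1] == 0x03 && p[5] == 0x01 {
		return ProtoTLS
	}
	if bytes.HasPrefix(p, h2Preface) {
		return ProtoHTTP2
	}
	for _, m := range http1Methods {
		if bytes.HasPrefix(p, m) {
			return ProtoHTTP1
		}
	}
	return ProtoUnknown
}

func main() {
	samples := [][]byte{
		[]byte("GET /index.html HTTP/1.1\r\nHost: example.test\r\n\r\n"),
		h2Preface,
		{0x16, 0x03, 0x01, 0x02, 0x00, 0x01},
		[]byte("\x00\x01garbage"),
	}
	for _, s := range samples {
		fmt.Println(Classify(s))
	}
}
```

Checking a handful of fixed prefixes keeps the decision cheap enough to run per packet, at the cost of missing requests that are split across segments before the method token is complete.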
## Shared Libraries
## Shared Libraries
| Library | Language | Description |
|---------|----------|-------------|
| Library | Language | Description |
|---------|----------|-------------|
| [go/ja4common](docs/shared/go-ja4common.md) | Go | Logger, config loader, graceful shutdown handler, IP filter |
| [python/ja4_common](docs/shared/python-ja4common.md) | Python | `ClickHouseClient` singleton, `ClickHouseSettings` (pydantic-settings) |
## Quickstart
### Prerequisites
### Prerequisites
- Docker (with BuildKit) and Docker Compose
- Docker (with BuildKit) and Docker Compose
- `make`
- No native Go, Python, or C toolchains required — all builds run inside Docker
- No Go, Python, C, or eBPF toolchain is required on the host machine; all builds run in Docker on Rocky Linux images.
### Build All Services
### Build All Services
```bash
make build-all
```
### Run All Tests
### Run All Tests
```bash
make test-all
```
### Build RPM Packages
### Build RPM Packages
```bash
make rpm-all
# RPMs written to services/<service>/dist/rpm/el{8,9,10}/
make rpm-ja4ebpf
# RPMs written to services/ja4ebpf/dist/rpm/el{8,9,10}/
```
## Integration Tests
Full-stack tests with Docker Compose and a real ClickHouse instance:
```bash
make test-integration       # reference Apache stack (8 phases)
make test-nginx             # nginx + ja4ebpf stack
make test-nginx-varnish     # nginx + Varnish + ja4ebpf
make test-hitch-varnish     # hitch (TLS) + Varnish + ja4ebpf
make test-all-stacks        # the 3 server stacks in sequence
make test-integration-keep  # leaves the stack running
make test-integration-down  # tears down the stack
```
The test suite lives in `tests/integration/` and resets the database between runs.
## Scripts
Helper scripts are located in `scripts/`:
| Script | Description |
|--------|-------------|
| `init-stack.sh` | Full ClickHouse stack initialization — deploys schema, loads CSV data, verifies all components |
| `import-prod-data.sh` | Imports pre-exported production data into the dev database with dynamic date shifting |
| `reload-prod-logs.sh` | Exports `http_logs` from production and re-imports into the dev database |
| `update-csv-data.sh` | Downloads and generates all CSV reference data (bot IPs, JA4 signatures, ASN reputation) |
| `generate_bot_ip.py` | Generates `bot_ip.csv` from known scanner/bot sources + Tor exit nodes |
| `generate_bot_ja4.py` | Generates `bot_ja4.csv` from known bot TLS fingerprints |
| `generate_asn_data.py` | Generates `asn_reputation.csv` (ASN→label mapping) |
| `generate_browser_ja4.py` | Generates browser JA4 reference data for legitimate browser detection |
| `init-stack.sh` | Full ClickHouse schema initialization + CSV loading |
| `import-prod-data.sh` | Production data import with time shifting |
| `reload-prod-logs.sh` | Prod export -> dev re-import with time shifting |
| `update-csv-data.sh` | Generates the reference CSVs (bot IPs, JA4 signatures, ASN) |
Corresponding Makefile targets:
Corresponding Makefile targets:
```bash
make init-stack # runs scripts/init-stack.sh
make import-prod-data # runs scripts/import-prod-data.sh
make init-stack # schema + CSV
make import-prod-data # prod data
make init-and-import # init-stack + import-prod-data
make reload-prod-logs # runs scripts/reload-prod-logs.sh
make reload-prod-logs # reload from prod
```
## Integration Tests
Full-stack integration tests run against Docker Compose with a real ClickHouse instance:
```bash
make test-integration # 8 phases: build → start → schema → traffic → pipeline → dashboard → bot-detector → sentinel
make test-integration-keep # same but leaves stack running after
make test-integration-down # tear down integration stack
```
The integration test suite is located in `tests/integration/` and resets the database between runs.
## Documentation
| Document | Description |
|----------|-------------|
| [Architecture](docs/architecture.md) | System architecture, data flow, component interactions |
| [Deployment](docs/deployment.md) | Step-by-step production deployment guide |
| [Development](docs/development.md) | Build, test, package, and extend the platform |
| [Database Schema](docs/database/schema.md) | Every ClickHouse table, view, dictionary, and materialized view |
| [Database Migrations](docs/database/migrations.md) | Migration order, application, verification, and rollback |
| [Commenting Standard](docs/commenting-standard.md) | Code commenting conventions (French comments, English identifiers) |
| [Thesis Reference](docs/THESIS_HTTP_Traffic_Detection.md) | Academic reference: HTTP traffic detection techniques |
| [Audit vs Thesis](docs/AUDIT_Detection_vs_Thesis.md) | Comparison between platform implementation and thesis techniques |
### Service Documentation
- [Sentinel](docs/services/sentinel.md) — TLS/TCP capture daemon (Go + libpcap)
- [mod-reqin-log](docs/services/mod-reqin-log.md) — Apache HTTP logging module (C11)
- [Correlator](docs/services/correlator.md) — HTTP/TLS event correlation engine (Go)
- [Bot Detector](docs/services/bot-detector.md) — Triple-voice ML anomaly detection (Python)
- [Dashboard](docs/services/dashboard.md) — SOC analyst dashboard and API (FastAPI)
### Shared Library Documentation
- [go-ja4common](docs/shared/go-ja4common.md) — Go shared library (logger, config, shutdown, ipfilter)
- [python-ja4common](docs/shared/python-ja4common.md) — Python shared library (ClickHouse client, settings)
## Go Workspace
The repository uses a Go workspace (`go.work`) to link the Go modules:
The `go.work` workspace links the repository's Go modules:
```
go 1.24.6
use (
./services/sentinel
./services/correlator
./shared/go/ja4common
./services/ja4ebpf
)
```
Both Go services have a `replace` directive in their `go.mod` pointing to `../../shared/go/ja4common`. The workspace takes precedence for local development; the `replace` is needed for Docker builds where `go.work` is not available.
`ja4ebpf` uses a `replace` directive in its `go.mod` pointing to `../../shared/go/ja4common`. The workspace takes precedence in local development; the `replace` directive is needed for Docker builds.
## Documentation
| Document | Description |
|----------|-------------|
| [Architecture](docs/architecture.md) | System architecture, data flow, component interactions |
| [Deployment](docs/deployment.md) | Production deployment guide |
| [Development](docs/development.md) | Build, test, packaging, and extension of the platform |
| [Database Schema](docs/database/schema.md) | ClickHouse tables, views, dictionaries, and materialized views |
| [Database Migrations](docs/database/migrations.md) | Migration order, application, verification, and rollback |
| [Commenting Standard](docs/commenting-standard.md) | Commenting conventions (French comments, English identifiers) |
| [Thesis Reference](docs/THESIS_HTTP_Traffic_Detection.md) | Academic reference: HTTP traffic detection techniques |
| [Audit vs Thesis](docs/AUDIT_Detection_vs_Thesis.md) | Comparison between the implementation and the thesis techniques |
### Service Documentation
- [ja4ebpf](docs/services/ja4ebpf.md): CO-RE eBPF agent (Go + C), passive multi-layer network capture
- [Bot Detector](docs/services/bot-detector.md): Triple-voice ML anomaly detection (Python)
- [Dashboard](docs/services/dashboard.md): SOC dashboard and API (FastAPI)
### Shared Library Documentation
- [go-ja4common](docs/shared/go-ja4common.md): Shared Go library (logger, config, shutdown, ipfilter)
- [python-ja4common](docs/shared/python-ja4common.md): Shared Python library (ClickHouse client, settings)
## License
See individual service directories for license information.
See the individual service directories for license information.