feat: ja4-platform monorepo — 5 services unified, tests & RPM builds standardized

Services:
- ja4sentinel: TLS/JA4 fingerprint capture daemon (Go, libpcap)
- logcorrelator: JA4 log correlation engine (Go, ClickHouse)
- mod_reqin_log: Apache module (C, JSON request logging)
- bot_detector: ML bot detection pipeline (Python)
- dashboard: FastAPI/Streamlit analytics UI (Python)

Shared libraries:
- shared/go/ja4common: logger, config, shutdown, ipfilter (Go module)
- shared/python/ja4_common: ClickHouseClient, ClickHouseSettings (Python package)
- shared/clickhouse/: canonical SQL migrations (10 files)

Build & packaging:
- Unified 3-stage Dockerfile.package for Go RPMs (el8/el9/el10)
- go.work workspace linking sentinel, correlator, ja4common
- Makefile with test-all, build-all, rpm-* targets

Fixes applied:
- go.work: 1.21 → 1.24.6 (required by sentinel)
- correlator Dockerfiles: golang:1.21 → golang:1.24
- replace directives in go.mod for ja4common local path
- pyproject.toml: setuptools.backends → setuptools.build_meta
- Removed static libpcap linking (unavailable on Rocky 9)
- Fixed data races in output/writers_test.go (sync.Mutex + atomic.Int32)
- Rewrote corrupted test files (logger_test.go × 2)

Test coverage:
- correlator: 67.1% total (unixsocket 80.5%, config 91.7%, app 83.3%, multi 87.7%, stdout 100%)
- sentinel: all 10 packages pass (api, capture, config, fingerprint, ipfilter, logging, output, tlsparse)

Documentation:
- README.md + docs/ (architecture, development, 5 services, shared libs, DB schema & migrations)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This commit is contained in:
toto
2026-04-07 16:42:59 +02:00
commit d469e39da7
278 changed files with 1621301 additions and 0 deletions

247
docs/services/sentinel.md Normal file
View File

@ -0,0 +1,247 @@
# Sentinel
Sentinel (`ja4sentinel`) is a Go daemon that performs live network packet capture on a Linux server, extracts TLS ClientHello handshakes, generates JA4 and JA3 fingerprints, enriches them with IP/TCP metadata, and outputs structured JSON log records to configurable destinations (UNIX socket, file, or stdout).
## Role in the Pipeline
Sentinel is the **network-layer ingestion point**. It sits on the target server, captures TLS traffic via libpcap, and feeds fingerprinted events to the [correlator](correlator.md) through a UNIX datagram socket.
```
Network traffic (port 443/8443)
│ pcap
┌───────────────┐
│ sentinel │
│ ┌─────────┐ │
│ │ capture │──▶ Raw packets
│ └─────────┘ │
│ ┌─────────┐ │
│ │ tlsparse│──▶ TLS ClientHello extraction + TCP reassembly
│ └─────────┘ │
│ ┌─────────┐ │
│ │ finger- │──▶ JA4/JA3 fingerprint generation
│ │ print │ │
│ └─────────┘ │
│ ┌─────────┐ │
│ │ output │──▶ UNIX socket / file / stdout
│ └─────────┘ │
└───────────────┘
```
## Architecture
Sentinel uses a pipeline of goroutines:
1. **Capture goroutine** — Opens pcap handle on the configured interface, applies BPF filter, reads raw packets into a buffered channel (`packet_buffer_size`).
2. **Packet processor goroutine** — Reads from the channel, feeds packets to the TLS parser, generates fingerprints, and writes output.
3. **Watchdog goroutine** — Sends systemd watchdog heartbeats at half the configured interval.
4. **Signal handler** — Listens for `SIGINT`/`SIGTERM` (graceful shutdown) and `SIGHUP` (log rotation).
### Key Interfaces
| Interface | Package | Description |
|-----------|---------|-------------|
| `Capture` | `internal/capture` | Packet capture via libpcap |
| `Parser` | `internal/tlsparse` | TCP reassembly + ClientHello extraction |
| `Engine` | `internal/fingerprint` | JA4/JA3 fingerprint generation |
| `Writer` | `internal/output` | Log record output (stdout, file, UNIX socket) |
| `MultiWriter` | `internal/output` | Fan-out to multiple writers |
| `Builder` | `internal/output` | Factory for constructing writers from config |
## Configuration Reference
Configuration is loaded from a YAML file (default: `config.yml`) with environment variable overrides.
### Core Settings
| Name | Type | Default | Env Override | Description |
|------|------|---------|-------------|-------------|
| `core.interface` | string | `any` | `JA4SENTINEL_INTERFACE` | Network interface to capture (`any` = all interfaces) |
| `core.listen_ports` | []uint16 | `[443]` | `JA4SENTINEL_PORTS` | TCP ports to monitor (comma-separated in env) |
| `core.bpf_filter` | string | `""` (auto) | `JA4SENTINEL_BPF_FILTER` | Custom BPF filter (empty = auto-generated) |
| `core.local_ips` | []string | `[]` (auto) | — | Local IPs to monitor (empty = auto-detect, excludes loopback) |
| `core.exclude_source_ips` | []string | `[]` | — | Source IPs or CIDRs to exclude (e.g., `["10.0.0.0/8"]`) |
| `core.flow_timeout_sec` | int | `30` | `JA4SENTINEL_FLOW_TIMEOUT` | Timeout for TLS handshake extraction (1300) |
| `core.packet_buffer_size` | int | `1000` | `JA4SENTINEL_PACKET_BUFFER_SIZE` | Packet channel buffer size (11,000,000) |
| `core.log_level` | string | `info` | — | Log level: `debug`, `info`, `warn`, `error` (YAML only) |
> **Note:** `log_level` is intentionally not overridable via environment variable (architecture decision since v1.1.12).
### Output Settings
Each output is an entry in the `outputs` array:
| Name | Type | Default | Description |
|------|------|---------|-------------|
| `type` | string | — | Output type: `unix_socket`, `stdout`, `file` |
| `enabled` | bool | — | Whether this output is active |
| `async_buffer` | int | `1000` | Queue size for async writes |
| `params.socket_path` | string | — | Path for `unix_socket` type |
| `params.path` | string | — | File path for `file` type |
### Example Configuration
```yaml
core:
interface: any
listen_ports: [443, 8443]
bpf_filter: ""
local_ips: []
exclude_source_ips: ["10.0.0.0/8", "192.168.1.1"]
flow_timeout_sec: 30
packet_buffer_size: 1000
log_level: info
outputs:
- type: unix_socket
enabled: true
params:
socket_path: /var/run/logcorrelator/network.socket
- type: file
enabled: false
params:
path: /var/log/ja4sentinel/ja4.log
```
## Output Format (LogRecord JSON Schema)
Each output record is a flat JSON object:
```json
{
"src_ip": "203.0.113.42",
"src_port": 52341,
"dst_ip": "192.168.1.10",
"dst_port": 443,
"ip_meta_ttl": 64,
"ip_meta_total_length": 583,
"ip_meta_id": 12345,
"ip_meta_df": true,
"tcp_meta_window_size": 65535,
"tcp_meta_mss": 1460,
"tcp_meta_window_scale": 8,
"tcp_meta_options": "MSS,NOP,WScale,NOP,NOP,Timestamps,SACK",
"conn_id": "203.0.113.42:52341-192.168.1.10:443",
"sensor_id": "",
"tls_version": "1.3",
"tls_sni": "example.com",
"tls_alpn": "h2",
"syn_to_clienthello_ms": 12,
"ja4": "t13d1516h2_8daaf6152771_b0da82dd1658",
"ja3": "771,4866-4867-4865-49196-49200...",
"ja3_hash": "e7d705a3286e19ea42f587b344ee6865",
"timestamp": 1709312345678901234
}
```
### Field Reference
| Field | Type | Description |
|-------|------|-------------|
| `src_ip` | string | Client source IP address |
| `src_port` | uint16 | Client source port |
| `dst_ip` | string | Server destination IP address |
| `dst_port` | uint16 | Server destination port |
| `ip_meta_ttl` | uint8 | IP Time-To-Live |
| `ip_meta_total_length` | uint16 | IP total packet length |
| `ip_meta_id` | uint16 | IP identification field |
| `ip_meta_df` | bool | IP Don't Fragment flag |
| `tcp_meta_window_size` | uint16 | TCP window size |
| `tcp_meta_mss` | uint16 | TCP Maximum Segment Size (omitted if 0) |
| `tcp_meta_window_scale` | uint8 | TCP window scale factor (omitted if 0) |
| `tcp_meta_options` | string | Comma-separated TCP options |
| `conn_id` | string | Unique flow identifier |
| `sensor_id` | string | Sensor/captor identifier |
| `tls_version` | string | Max TLS version from ClientHello |
| `tls_sni` | string | Server Name Indication |
| `tls_alpn` | string | ALPN protocol (e.g., `h2`, `http/1.1`) |
| `syn_to_clienthello_ms` | uint32 | Time from SYN to ClientHello (ms) |
| `ja4` | string | JA4 TLS fingerprint |
| `ja3` | string | JA3 TLS fingerprint |
| `ja3_hash` | string | MD5 hash of JA3 string |
| `timestamp` | int64 | Unix nanoseconds |
## UNIX Socket Output Protocol
- **Socket type**: `unixgram` (DGRAM — connectionless)
- **Encoding**: One JSON object per datagram (no delimiter)
- **Max datagram size**: 64 KB
- **Reconnection**: Exponential backoff (100 ms → 2 s), max 3 attempts per write
- **Queue**: Async write queue (default 1000 items) absorbs transient socket failures
- **Error callback**: Consecutive failures are tracked and reported
## Signal Handling
| Signal | Behavior |
|--------|----------|
| `SIGTERM` / `SIGINT` | Graceful shutdown: cancel context, close capture, flush outputs, log filter stats |
| `SIGHUP` | Log rotation: reopen file outputs (used by `systemctl reload` + logrotate) |
## JA4 Fingerprint Algorithm
1. Extract TLS ClientHello from the TCP payload (with TCP reassembly for fragmented handshakes)
2. Parse cipher suites, extensions, ALPN, SNI, supported versions
3. Build JA4 string: `t{version}{sni_flag}{cipher_count}{ext_count}_{cipher_hash}_{ext_hash}`
4. Build JA3 string: `{version},{ciphers},{extensions},{curves},{formats}`
5. Compute JA3 MD5 hash
Sentinel uses the `tlsfingerprint` library for ALPN and TLS version parsing, with custom sanitization for malformed/truncated ClientHellos.
## Deployment
### systemd
```ini
[Unit]
Description=ja4sentinel TLS fingerprinting daemon
After=network.target
[Service]
Type=notify
ExecStart=/usr/bin/ja4sentinel -config /etc/ja4sentinel/config.yml
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
WatchdogSec=30
TimeoutStopSec=2
[Install]
WantedBy=multi-user.target
```
Sentinel uses systemd `sd_notify` for:
- `READY` — sent after initialization
- `WATCHDOG` — sent at half the `WatchdogSec` interval
- `STOPPING` — sent before shutdown
### Docker
```bash
make build-sentinel
docker run --cap-add=NET_RAW --cap-add=NET_ADMIN \
-v /var/run/logcorrelator:/var/run/logcorrelator \
ja4-platform/sentinel:latest
```
## RPM Package Contents
| Path | Description |
|------|-------------|
| `/usr/bin/ja4sentinel` | Binary (statically linked Go) |
| `/etc/ja4sentinel/config.yml.default` | Default configuration (noreplace) |
| `/usr/share/ja4sentinel/config.yml` | Reference configuration |
| `/usr/lib/systemd/system/ja4sentinel.service` | systemd unit |
| `/etc/logrotate.d/ja4sentinel` | logrotate configuration |
| `/var/lib/ja4sentinel/` | State directory |
| `/var/log/ja4sentinel/` | Log directory |
| `/var/run/logcorrelator/` | Socket directory |
### RPM Dependencies
- `systemd`
- `libpcap >= 1.9.0`
### Supported Distributions
- Rocky Linux 8, 9, 10
- AlmaLinux 8, 9
- RHEL 8, 9