# Sentinel
Sentinel (`ja4sentinel`) is a Go daemon that performs live network packet capture on a Linux server, extracts TLS ClientHello handshakes, generates JA4 and JA3 fingerprints, enriches them with IP/TCP metadata, and outputs structured JSON log records to configurable destinations (UNIX socket, file, or stdout).
## Role in the Pipeline
Sentinel is the **network-layer ingestion point**. It sits on the target server, captures TLS traffic via libpcap, and feeds fingerprinted events to the [correlator](correlator.md) through a UNIX datagram socket.
```
Network traffic (port 443/8443)
        │ pcap
┌───────────────┐
│   sentinel    │
│  ┌─────────┐  │
│  │ capture │──▶ Raw packets
│  └─────────┘  │
│  ┌─────────┐  │
│  │ tlsparse│──▶ TLS ClientHello extraction + TCP reassembly
│  └─────────┘  │
│  ┌─────────┐  │
│  │ finger- │──▶ JA4/JA3 fingerprint generation
│  │ print   │  │
│  └─────────┘  │
│  ┌─────────┐  │
│  │ output  │──▶ UNIX socket / file / stdout
│  └─────────┘  │
└───────────────┘
```
## Architecture
Sentinel uses a pipeline of goroutines:
1. **Capture goroutine** — Opens pcap handle on the configured interface, applies BPF filter, reads raw packets into a buffered channel (`packet_buffer_size`).
2. **Packet processor goroutine** — Reads from the channel, feeds packets to the TLS parser, generates fingerprints, and writes output.
3. **Watchdog goroutine** — Sends systemd watchdog heartbeats at half the configured interval.
4. **Signal handler** — Listens for `SIGINT`/`SIGTERM` (graceful shutdown) and `SIGHUP` (log rotation).
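
The capture and processor stages above can be sketched as two goroutines joined by a buffered channel. This is a minimal illustration, not the sentinel source: the `packet` type and `runPipeline` function are hypothetical, and the real processor would parse TLS and write output where this one only counts.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// packet stands in for a raw captured frame (hypothetical type).
type packet []byte

// runPipeline connects a producer ("capture") to a consumer ("processor")
// through a buffered channel and returns how many packets were processed.
// Closing done stops the producer early; a nil done never fires.
func runPipeline(done <-chan struct{}, bufSize int, input []packet) int {
	packets := make(chan packet, bufSize)
	var wg sync.WaitGroup
	processed := 0

	// Capture goroutine: pushes packets until input ends or done is closed.
	wg.Add(1)
	go func() {
		defer wg.Done()
		defer close(packets)
		for _, p := range input {
			select {
			case packets <- p:
			case <-done:
				return
			}
		}
	}()

	// Processor goroutine: drains the channel. TLS parsing, fingerprinting
	// and output would happen here.
	wg.Add(1)
	go func() {
		defer wg.Done()
		for range packets {
			processed++
		}
	}()

	wg.Wait()
	return processed
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	in := []packet{{0x16, 0x03, 0x01}, {0x16, 0x03, 0x03}}
	fmt.Println(runPipeline(ctx.Done(), 1000, in))
}
```

The channel capacity plays the role of `packet_buffer_size`: a larger buffer absorbs bursts when the processor falls behind the capture loop.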
### Key Interfaces
| Interface | Package | Description |
|-----------|---------|-------------|
| `Capture` | `internal/capture` | Packet capture via libpcap |
| `Parser` | `internal/tlsparse` | TCP reassembly + ClientHello extraction |
| `Engine` | `internal/fingerprint` | JA4/JA3 fingerprint generation |
| `Writer` | `internal/output` | Log record output (stdout, file, UNIX socket) |
| `MultiWriter` | `internal/output` | Fan-out to multiple writers |
| `Builder` | `internal/output` | Factory for constructing writers from config |
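
To illustrate the fan-out role of `MultiWriter`, here is a small sketch. The `Writer` method set shown is an assumption, since the real interface in `internal/output` is not reproduced in this document; `memWriter` and `failWriter` are hypothetical helpers for the demo.

```go
package main

import (
	"errors"
	"fmt"
)

// Writer is a hypothetical stand-in for the output interface; the real
// method set in internal/output may differ.
type Writer interface {
	Write(record []byte) error
}

// MultiWriter sends each record to every writer and joins any errors,
// so one failing destination does not stop delivery to the others.
type MultiWriter []Writer

func (m MultiWriter) Write(record []byte) error {
	var errs []error
	for _, w := range m {
		if err := w.Write(record); err != nil {
			errs = append(errs, err)
		}
	}
	return errors.Join(errs...) // nil when errs is empty
}

// memWriter collects copies of records in memory (illustration only).
type memWriter struct{ records [][]byte }

func (w *memWriter) Write(record []byte) error {
	w.records = append(w.records, append([]byte(nil), record...))
	return nil
}

// failWriter always fails, to show error aggregation.
type failWriter struct{}

func (failWriter) Write([]byte) error { return errors.New("destination unavailable") }

func main() {
	mem := &memWriter{}
	out := MultiWriter{mem, failWriter{}}
	err := out.Write([]byte(`{"ja4":"t13d1516h2_8daaf6152771_b0da82dd1658"}`))
	fmt.Println(len(mem.records), err)
}
```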
## Configuration Reference
Configuration is loaded from a YAML file (default: `config.yml`) with environment variable overrides.
### Core Settings
| Name | Type | Default | Env Override | Description |
|------|------|---------|-------------|-------------|
| `core.interface` | string | `any` | `JA4SENTINEL_INTERFACE` | Network interface to capture (`any` = all interfaces) |
| `core.listen_ports` | []uint16 | `[443]` | `JA4SENTINEL_PORTS` | TCP ports to monitor (comma-separated in env) |
| `core.bpf_filter` | string | `""` (auto) | `JA4SENTINEL_BPF_FILTER` | Custom BPF filter (empty = auto-generated) |
| `core.local_ips` | []string | `[]` (auto) | — | Local IPs to monitor (empty = auto-detect, excludes loopback) |
| `core.exclude_source_ips` | []string | `[]` | — | Source IPs or CIDRs to exclude (e.g., `["10.0.0.0/8"]`) |
| `core.flow_timeout_sec` | int | `30` | `JA4SENTINEL_FLOW_TIMEOUT` | Timeout for TLS handshake extraction, in seconds (range 1 to 300) |
| `core.packet_buffer_size` | int | `1000` | `JA4SENTINEL_PACKET_BUFFER_SIZE` | Packet channel buffer size (range 1 to 1,000,000) |
| `core.log_level` | string | `info` | — | Log level: `debug`, `info`, `warn`, `error` (YAML only) |
> **Note:** `log_level` is intentionally not overridable via environment variable (architecture decision since v1.1.12).
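
As a sketch of how the `JA4SENTINEL_PORTS` override and an auto-generated BPF filter could fit together, consider the following. The function names are hypothetical, and the filter expression is an assumption: this document does not state the exact filter sentinel generates.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parsePorts parses a comma-separated port list such as "443,8443".
func parsePorts(s string) ([]uint16, error) {
	var ports []uint16
	for _, field := range strings.Split(s, ",") {
		n, err := strconv.ParseUint(strings.TrimSpace(field), 10, 16)
		if err != nil || n == 0 {
			return nil, fmt.Errorf("invalid port %q", field)
		}
		ports = append(ports, uint16(n))
	}
	return ports, nil
}

// autoBPF builds a filter matching TCP traffic on the given ports.
// Assumption: the expression sentinel really generates may differ.
func autoBPF(ports []uint16) string {
	terms := make([]string, len(ports))
	for i, p := range ports {
		terms[i] = "tcp port " + strconv.Itoa(int(p))
	}
	return strings.Join(terms, " or ")
}

func main() {
	ports := []uint16{443} // documented default for core.listen_ports
	if v, ok := os.LookupEnv("JA4SENTINEL_PORTS"); ok {
		p, err := parsePorts(v)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		ports = p
	}
	fmt.Println(autoBPF(ports))
}
```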
### Output Settings
Each output is an entry in the `outputs` array:
| Name | Type | Default | Description |
|------|------|---------|-------------|
| `type` | string | — | Output type: `unix_socket`, `stdout`, `file` |
| `enabled` | bool | — | Whether this output is active |
| `async_buffer` | int | `1000` | Queue size for async writes |
| `params.socket_path` | string | — | Path for `unix_socket` type |
| `params.path` | string | — | File path for `file` type |
### Example Configuration
```yaml
core:
  interface: any
  listen_ports: [443, 8443]
  bpf_filter: ""
  local_ips: []
  exclude_source_ips: ["10.0.0.0/8", "192.168.1.1"]
  flow_timeout_sec: 30
  packet_buffer_size: 1000
  log_level: info
outputs:
  - type: unix_socket
    enabled: true
    params:
      socket_path: /var/run/logcorrelator/network.socket
  - type: file
    enabled: false
    params:
      path: /var/log/ja4sentinel/ja4.log
```
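
The `exclude_source_ips` entries above mix a CIDR block and a bare address. One way to match against such a list, sketched with the standard `net/netip` package, is to normalize every entry to a prefix. The helper names are hypothetical and this is not the `ipfilter` implementation.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// parseExcludes turns entries like "10.0.0.0/8" or "192.168.1.1" into
// prefixes. A bare IP becomes a single-address prefix (/32 or /128).
func parseExcludes(entries []string) ([]netip.Prefix, error) {
	out := make([]netip.Prefix, 0, len(entries))
	for _, e := range entries {
		if strings.Contains(e, "/") {
			p, err := netip.ParsePrefix(e)
			if err != nil {
				return nil, err
			}
			out = append(out, p.Masked())
			continue
		}
		a, err := netip.ParseAddr(e)
		if err != nil {
			return nil, err
		}
		out = append(out, netip.PrefixFrom(a, a.BitLen()))
	}
	return out, nil
}

// excluded reports whether src falls inside any excluded prefix.
// Unparseable addresses are treated as not excluded.
func excluded(prefixes []netip.Prefix, src string) bool {
	addr, err := netip.ParseAddr(src)
	if err != nil {
		return false
	}
	addr = addr.Unmap() // treat IPv4-mapped IPv6 as plain IPv4
	for _, p := range prefixes {
		if p.Contains(addr) {
			return true
		}
	}
	return false
}

func main() {
	prefixes, err := parseExcludes([]string{"10.0.0.0/8", "192.168.1.1"})
	if err != nil {
		panic(err)
	}
	for _, src := range []string{"10.1.2.3", "192.168.1.1", "203.0.113.42"} {
		fmt.Println(src, excluded(prefixes, src))
	}
}
```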
## Output Format (LogRecord JSON Schema)
Each output record is a flat JSON object:
```json
{
  "src_ip": "203.0.113.42",
  "src_port": 52341,
  "dst_ip": "192.168.1.10",
  "dst_port": 443,
  "ip_meta_ttl": 64,
  "ip_meta_total_length": 583,
  "ip_meta_id": 12345,
  "ip_meta_df": true,
  "tcp_meta_window_size": 65535,
  "tcp_meta_mss": 1460,
  "tcp_meta_window_scale": 8,
  "tcp_meta_options": "MSS,NOP,WScale,NOP,NOP,Timestamps,SACK",
  "conn_id": "203.0.113.42:52341-192.168.1.10:443",
  "sensor_id": "",
  "tls_version": "1.3",
  "tls_sni": "example.com",
  "tls_alpn": "h2",
  "syn_to_clienthello_ms": 12,
  "ja4": "t13d1516h2_8daaf6152771_b0da82dd1658",
  "ja3": "771,4866-4867-4865-49196-49200...",
  "ja3_hash": "e7d705a3286e19ea42f587b344ee6865",
  "timestamp": 1709312345678901234
}
```
### Field Reference
| Field | Type | Description |
|-------|------|-------------|
| `src_ip` | string | Client source IP address |
| `src_port` | uint16 | Client source port |
| `dst_ip` | string | Server destination IP address |
| `dst_port` | uint16 | Server destination port |
| `ip_meta_ttl` | uint8 | IP Time-To-Live |
| `ip_meta_total_length` | uint16 | IP total packet length |
| `ip_meta_id` | uint16 | IP identification field |
| `ip_meta_df` | bool | IP Don't Fragment flag |
| `tcp_meta_window_size` | uint16 | TCP window size |
| `tcp_meta_mss` | uint16 | TCP Maximum Segment Size (omitted if 0) |
| `tcp_meta_window_scale` | uint8 | TCP window scale factor (omitted if 0) |
| `tcp_meta_options` | string | Comma-separated TCP options |
| `conn_id` | string | Unique flow identifier |
| `sensor_id` | string | Sensor identifier |
| `tls_version` | string | Max TLS version from ClientHello |
| `tls_sni` | string | Server Name Indication |
| `tls_alpn` | string | ALPN protocol (e.g., `h2`, `http/1.1`) |
| `syn_to_clienthello_ms` | uint32 | Time from SYN to ClientHello (ms) |
| `ja4` | string | JA4 TLS fingerprint |
| `ja3` | string | JA3 TLS fingerprint |
| `ja3_hash` | string | MD5 hash of JA3 string |
| `timestamp` | int64 | Unix nanoseconds |
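
A consumer of these records might decode them with a struct like the one below. It is an illustrative sketch covering a subset of the fields, not the type used in the sentinel source; `omitempty` reproduces the documented "omitted if 0" behavior of the MSS and window-scale fields.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// LogRecord mirrors a subset of the documented fields. The struct is
// illustrative and not copied from the sentinel source.
type LogRecord struct {
	SrcIP          string `json:"src_ip"`
	SrcPort        uint16 `json:"src_port"`
	DstIP          string `json:"dst_ip"`
	DstPort        uint16 `json:"dst_port"`
	TCPMSS         uint16 `json:"tcp_meta_mss,omitempty"`
	TCPWindowScale uint8  `json:"tcp_meta_window_scale,omitempty"`
	TLSSNI         string `json:"tls_sni"`
	JA4            string `json:"ja4"`
	JA3Hash        string `json:"ja3_hash"`
	Timestamp      int64  `json:"timestamp"` // Unix nanoseconds
}

// decode parses one JSON record; unknown fields are ignored.
func decode(data []byte) (LogRecord, error) {
	var r LogRecord
	err := json.Unmarshal(data, &r)
	return r, err
}

// encode serializes a record; zero MSS and window scale are left out.
func encode(r LogRecord) (string, error) {
	b, err := json.Marshal(r)
	return string(b), err
}

// when converts the nanosecond timestamp to a UTC time.
func (r LogRecord) when() time.Time { return time.Unix(0, r.Timestamp).UTC() }

func main() {
	raw := []byte(`{"src_ip":"203.0.113.42","src_port":52341,"tls_sni":"example.com",
		"ja4":"t13d1516h2_8daaf6152771_b0da82dd1658","timestamp":1709312345678901234}`)
	r, err := decode(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(r.SrcIP, r.JA4, r.when().Format(time.RFC3339Nano))
}
```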
## UNIX Socket Output Protocol
- **Socket type**: `unixgram` (DGRAM — connectionless)
- **Encoding**: One JSON object per datagram (no delimiter)
- **Max datagram size**: 64 KB
- **Reconnection**: Exponential backoff (100 ms → 2 s), max 3 attempts per write
- **Queue**: Async write queue (default 1000 items) absorbs transient socket failures
- **Error callback**: Consecutive failures are tracked and reported
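
The datagram protocol and retry policy listed above could look roughly like this on the sending side. It is a sketch under assumptions: the doubling schedule between 100 ms and 2 s is a guess at the backoff curve, and `send` and `backoff` are hypothetical names. The `main` function plays both roles, binding a temporary socket in place of the correlator.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"path/filepath"
	"time"
)

const (
	baseDelay   = 100 * time.Millisecond
	maxDelay    = 2 * time.Second
	maxAttempts = 3
)

// backoff returns the wait before retry n (0-based).
// Assumption: the delay doubles each time and is capped at maxDelay.
func backoff(n int) time.Duration {
	d := baseDelay << n
	if d > maxDelay || d <= 0 {
		return maxDelay
	}
	return d
}

// send writes one JSON record as a single datagram, redialing with
// backoff for up to maxAttempts tries.
func send(path string, record []byte) error {
	var err error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		if attempt > 0 {
			time.Sleep(backoff(attempt - 1))
		}
		conn, dialErr := net.Dial("unixgram", path)
		if dialErr != nil {
			err = dialErr
			continue
		}
		_, err = conn.Write(record)
		conn.Close()
		if err == nil {
			return nil
		}
	}
	return fmt.Errorf("send failed after %d attempts: %w", maxAttempts, err)
}

func main() {
	dir, err := os.MkdirTemp("", "ja4")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "network.socket")

	// Receiver side, standing in for the correlator.
	l, err := net.ListenUnixgram("unixgram", &net.UnixAddr{Name: path, Net: "unixgram"})
	if err != nil {
		panic(err)
	}
	defer l.Close()

	if err := send(path, []byte(`{"ja4":"t13d1516h2_8daaf6152771_b0da82dd1658"}`)); err != nil {
		panic(err)
	}
	buf := make([]byte, 64*1024) // matches the documented maximum datagram size
	n, _, err := l.ReadFromUnix(buf)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(buf[:n]))
}
```

Because each datagram carries exactly one JSON object, the receiver needs no framing or delimiter logic: one read yields one record.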
## Signal Handling
| Signal | Behavior |
|--------|----------|
| `SIGTERM` / `SIGINT` | Graceful shutdown: cancel context, close capture, flush outputs, log filter stats |
| `SIGHUP` | Log rotation: reopen file outputs (used by `systemctl reload` + logrotate) |
## JA4 Fingerprint Algorithm
1. Extract TLS ClientHello from the TCP payload (with TCP reassembly for fragmented handshakes)
2. Parse cipher suites, extensions, ALPN, SNI, supported versions
3. Build JA4 string: `t{version}{sni_flag}{cipher_count}{ext_count}{alpn}_{cipher_hash}_{ext_hash}` (for example, `t13d1516h2_8daaf6152771_b0da82dd1658`)
4. Build JA3 string: `{version},{ciphers},{extensions},{curves},{formats}`
5. Compute JA3 MD5 hash
Sentinel uses the `tlsfingerprint` library for ALPN and TLS version parsing, with custom sanitization for malformed/truncated ClientHellos.
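
The JA4 string in step 3 can be sketched as follows. This is a simplified illustration, not sentinel's `Engine`: `ja4Prefix` and `hash12` are hypothetical names, GREASE filtering is omitted, and the extension hash here covers only the extension list, whereas the published JA4 format also leaves SNI and ALPN out of that hash and appends the signature algorithms.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// clamp99 limits a count to the two digits JA4 allows.
func clamp99(n int) int {
	if n > 99 {
		return 99
	}
	return n
}

// ja4Prefix builds the first JA4 section, e.g. "t13d1516h2".
// version is the two-character code ("13", "12", ...); alpn is the first
// ALPN value, or "" when the extension is absent.
func ja4Prefix(version string, hasSNI bool, nCiphers, nExts int, alpn string) string {
	sni := "i" // no SNI: client addressed the server by IP
	if hasSNI {
		sni = "d" // SNI present: client addressed a domain
	}
	tag := "00"
	if alpn != "" {
		tag = alpn[:1] + alpn[len(alpn)-1:] // first and last character
	}
	return fmt.Sprintf("t%s%s%02d%02d%s", version, sni, clamp99(nCiphers), clamp99(nExts), tag)
}

// hash12 returns the first 12 hex characters of the SHA-256 of the values,
// rendered as 4-digit hex, sorted, and comma-joined.
func hash12(values []uint16) string {
	if len(values) == 0 {
		return "000000000000"
	}
	hexed := make([]string, len(values))
	for i, v := range values {
		hexed[i] = fmt.Sprintf("%04x", v)
	}
	sort.Strings(hexed)
	sum := sha256.Sum256([]byte(strings.Join(hexed, ",")))
	return hex.EncodeToString(sum[:])[:12]
}

func main() {
	ciphers := []uint16{0x1301, 0x1302, 0x1303}
	exts := []uint16{0x000a, 0x000d, 0x002b}
	// The extension count includes SNI and ALPN even though they are
	// not part of the hashed list, hence the +2.
	prefix := ja4Prefix("13", true, len(ciphers), len(exts)+2, "h2")
	fmt.Printf("%s_%s_%s\n", prefix, hash12(ciphers), hash12(exts))
}
```

Sorting before hashing is what makes JA4 resistant to clients that randomize cipher or extension order, which is a known weakness of JA3.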
## Deployment
### systemd
```ini
[Unit]
Description=ja4sentinel TLS fingerprinting daemon
After=network.target
[Service]
Type=notify
ExecStart=/usr/bin/ja4sentinel -config /etc/ja4sentinel/config.yml
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
WatchdogSec=30
TimeoutStopSec=2
[Install]
WantedBy=multi-user.target
```
Sentinel uses systemd `sd_notify` for:
- `READY` — sent after initialization
- `WATCHDOG` — sent at half the `WatchdogSec` interval
- `STOPPING` — sent before shutdown
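
For reference, the notification protocol is simple enough to sketch with the standard library alone: systemd passes a datagram socket path in `NOTIFY_SOCKET` and the watchdog period in `WATCHDOG_USEC`, and the service writes strings such as `READY=1`, `WATCHDOG=1` and `STOPPING=1`. The helper names below are hypothetical, and sentinel may use a library for this instead.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"strconv"
	"time"
)

// notify sends one state string (e.g. "READY=1") to the socket named by
// NOTIFY_SOCKET. It reports false when that variable is not set.
func notify(state string) (bool, error) {
	path := os.Getenv("NOTIFY_SOCKET")
	if path == "" {
		return false, nil
	}
	conn, err := net.Dial("unixgram", path)
	if err != nil {
		return false, err
	}
	defer conn.Close()
	_, err = conn.Write([]byte(state))
	return err == nil, err
}

// watchdogInterval converts a WATCHDOG_USEC value into the heartbeat
// period, which is half the configured watchdog timeout.
func watchdogInterval(usec string) (time.Duration, bool) {
	n, err := strconv.ParseInt(usec, 10, 64)
	if err != nil || n <= 0 {
		return 0, false
	}
	return time.Duration(n) * time.Microsecond / 2, true
}

func main() {
	if ok, err := notify("READY=1"); err != nil {
		fmt.Fprintln(os.Stderr, "sd_notify:", err)
	} else if !ok {
		fmt.Println("NOTIFY_SOCKET not set; not running under systemd")
	}
	if every, ok := watchdogInterval(os.Getenv("WATCHDOG_USEC")); ok {
		fmt.Println("would send WATCHDOG=1 every", every)
	}
	notify("STOPPING=1") // best effort; errors are ignored at shutdown
}
```

With `WatchdogSec=30` as in the unit above, the heartbeat period works out to 15 seconds.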
### Docker
```bash
make build-sentinel
docker run --cap-add=NET_RAW --cap-add=NET_ADMIN \
  -v /var/run/logcorrelator:/var/run/logcorrelator \
  ja4-platform/sentinel:latest
```
## RPM Package Contents
| Path | Description |
|------|-------------|
| `/usr/bin/ja4sentinel` | Binary (Go, dynamically linked against the system `libpcap`) |
| `/etc/ja4sentinel/config.yml.default` | Default configuration (noreplace) |
| `/usr/share/ja4sentinel/config.yml` | Reference configuration |
| `/usr/lib/systemd/system/ja4sentinel.service` | systemd unit |
| `/etc/logrotate.d/ja4sentinel` | logrotate configuration |
| `/var/lib/ja4sentinel/` | State directory |
| `/var/log/ja4sentinel/` | Log directory |
| `/var/run/logcorrelator/` | Socket directory |
### RPM Dependencies
- `systemd`
- `libpcap >= 1.9.0`
### Supported Distributions
- Rocky Linux 8, 9, 10
- AlmaLinux 8, 9
- RHEL 8, 9