feat(e2e): add multi-IP endpoint architecture with dedicated traffic VM

Replace single-service-per-endpoint with all-ips mode running nginx, apache,
and hitch+varnish simultaneously on 3 dedicated IPs per VM (eth1 alias IPs).
Add a dedicated traffic VM with curl-impersonate for realistic TLS fingerprints,
parallelized traffic generation, and paired SNI_HOSTS/TARGET_IPS lists for
per-VM per-service hostname identification (e.g. rocky9-nginx-platform.test).

Key changes:
- run-tests-vm.sh: add setup_all_ips(), IP-specific Listen/bind directives
  with reset-before-apply pattern, graceful service availability checks
- run-e2e-test.sh: traffic VM architecture, all-ips mode, eth1 network,
  paired IP/SNI lists, updated cleanup for alias IPs
- generate-traffic.sh: parallel background jobs, curl-impersonate detection,
  auto source interface detection via ip route get, Host header in HTTP traffic
- Vagrantfile: add traffic VM with provision-traffic.sh
- provision-traffic.sh: install curl-impersonate and httpx for traffic gen
- test-rpm.sh: multi-interface TC check, updated ja4ebpf config
- clickhouse-init.sh: load CSV stubs for Anubis/bot-networks dictionaries
- Remove obsolete correlator/sentinel/mod-reqin-log docs
- Add h2_settings_ack column to http_logs schema
- Upgrade Go toolchain to 1.25.0

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
Jacquin Antoine
2026-04-16 14:25:24 +02:00
parent f0c8fe81c6
commit 36b5065a0a
17 changed files with 674 additions and 924 deletions

@@ -1,220 +0,0 @@
# Correlator
The correlator (`logcorrelator`) is a Go daemon that joins HTTP events from [mod-reqin-log](mod-reqin-log.md) (source A) with TLS/network events from [sentinel](sentinel.md) (source B) into unified correlated log entries. It uses a `src_ip:src_port` key with a configurable time window to match events, supports HTTP Keep-Alive connections, and writes results to ClickHouse, file, and/or stdout.
## Correlation Algorithm
### Key Matching
Events are correlated by their **correlation key**: `src_ip:src_port`. For a given server endpoint, the client IP and ephemeral source port together identify one TCP connection for as long as it stays open, so matching on this pair joins the HTTP request (seen by Apache) with the TLS handshake (seen by sentinel) from the same connection. The time window described next limits false matches when a source port is later reused.
### Time Window
Events must arrive within the configured time window (default: **10 seconds**) to be matched. This accounts for:
- Processing latency between Apache and sentinel
- Packet capture buffering
- UNIX socket delivery ordering
### Keep-Alive Support
In `one_to_many` mode (default), a single TLS handshake event (source B) can match **multiple** HTTP requests (source A) on the same TCP connection:
1. Source B event arrives → buffered with TTL (default: 120 s)
2. Source A event arrives with same key → correlation match, B event TTL resets
3. Next A event on same connection → matches same B event (TTL resets again)
4. Connection closes → B event expires after TTL
Each A event within a Keep-Alive session gets an incrementing `keepalives` counter.
### Orphan Handling
- **Source A orphans** (HTTP without TLS match): Emitted after `apache_emit_delay_ms` (default: 500 ms) with `correlated=false`, `orphan_side=A`
- **Source B orphans** (TLS without HTTP match): Not emitted by default (`network_emit: false`)
- **Buffer overflow**: Oldest events are rotated out and emitted as orphans
### Field Merging
When two events are correlated:
- HTTP fields (method, path, headers, etc.) come from source A
- TLS/network fields (JA4, JA3, IP/TCP metadata) come from source B
- On field collision with different values: both are kept with `a_` and `b_` prefixes
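A minimal sketch of these merge rules follows. The function name `mergeFields` is hypothetical, and the sketch assumes that on a collision the unprefixed key is dropped in favor of the `a_` and `b_` variants, which the text above does not spell out.

```go
package main

import (
	"fmt"
	"reflect"
)

// mergeFields combines source A (HTTP) and source B (TLS/network) fields.
// Fields present on one side only are copied as-is. A field present on both
// sides with equal values is kept once. A field with differing values is
// emitted twice, as a_<name> and b_<name> (assumed behavior).
func mergeFields(a, b map[string]any) map[string]any {
	out := make(map[string]any, len(a)+len(b))
	for k, v := range a {
		out[k] = v
	}
	for k, bv := range b {
		av, inA := a[k]
		switch {
		case !inA:
			out[k] = bv
		case !reflect.DeepEqual(av, bv):
			delete(out, k)
			out["a_"+k] = av
			out["b_"+k] = bv
		}
	}
	return out
}

func main() {
	a := map[string]any{"src_ip": "203.0.113.42", "method": "GET", "dst_port": 443}
	b := map[string]any{"src_ip": "203.0.113.42", "ja4": "t13d1516h2_8daaf6152771_b0da82dd1658", "dst_port": 8443}
	fmt.Println(mergeFields(a, b))
}
```

`reflect.DeepEqual` is used instead of `!=` so that non-comparable values such as slices do not cause a runtime panic.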
## Configuration Reference
Configuration is loaded from a YAML file (default: `/etc/logcorrelator/logcorrelator.yml`).
### Log Settings
| Name | Type | Default | Description |
|------|------|---------|-------------|
| `log.level` | string | `INFO` | Log level: `DEBUG`, `INFO`, `WARN`, `ERROR` |
### Input Settings
| Name | Type | Default | Description |
|------|------|---------|-------------|
| `inputs.unix_sockets[].name` | string | — | Human-readable source name (e.g., `http`, `network`) |
| `inputs.unix_sockets[].path` | string | — | UNIX socket path to listen on |
| `inputs.unix_sockets[].format` | string | `json` | Input format |
| `inputs.unix_sockets[].source_type` | string | — | Event source: `A` (HTTP), `B` (Network) |
| `inputs.unix_sockets[].socket_permissions` | string | `0666` | Socket file permissions (octal) |
### Output Settings
#### File Output
| Name | Type | Default | Description |
|------|------|---------|-------------|
| `outputs.file.enabled` | bool | `true` | Enable file output |
| `outputs.file.path` | string | `/var/log/logcorrelator/correlated.log` | Output file path |
#### ClickHouse Output
| Name | Type | Default | Description |
|------|------|---------|-------------|
| `outputs.clickhouse.enabled` | bool | `false` | Enable ClickHouse output |
| `outputs.clickhouse.dsn` | string | — | ClickHouse DSN (e.g., `clickhouse://user:pass@host:9000/db`) |
| `outputs.clickhouse.table` | string | — | Target table name |
| `outputs.clickhouse.batch_size` | int | `500` | Records per batch insert |
| `outputs.clickhouse.flush_interval_ms` | int | `200` | Flush interval in milliseconds |
| `outputs.clickhouse.max_buffer_size` | int | `5000` | Maximum in-memory buffer size |
| `outputs.clickhouse.drop_on_overflow` | bool | `true` | Drop records when buffer is full |
| `outputs.clickhouse.async_insert` | bool | `true` | Use ClickHouse async inserts |
| `outputs.clickhouse.timeout_ms` | int | `1000` | Operation timeout in milliseconds |
#### Stdout Output
| Name | Type | Default | Description |
|------|------|---------|-------------|
| `outputs.stdout.enabled` | bool | `false` | Enable stdout output |
| `outputs.stdout.level` | string | — | Output verbosity filter |
### Correlation Settings
| Name | Type | Default | Description |
|------|------|---------|-------------|
| `correlation.time_window.value` | int | `10` | Time window value |
| `correlation.time_window.unit` | string | `s` | Time window unit (`s`, `ms`) |
| `correlation.orphan_policy.apache_always_emit` | bool | `true` | Always emit A events even without B match |
| `correlation.orphan_policy.apache_emit_delay_ms` | int | `500` | Delay before emitting orphan A (ms) |
| `correlation.orphan_policy.network_emit` | bool | `false` | Emit B events without A match |
| `correlation.matching.mode` | string | `one_to_many` | Matching mode: `one_to_one` or `one_to_many` |
| `correlation.buffers.max_http_items` | int | `10000` | Max buffered HTTP (source A) events |
| `correlation.buffers.max_network_items` | int | `20000` | Max buffered network (source B) events |
| `correlation.ttl.network_ttl_s` | int | `120` | TTL for source B events (seconds) |
| `correlation.exclude_source_ips` | []string | `[]` | IPs or CIDRs to exclude from correlation |
| `correlation.include_dest_ports` | []int | `[]` | If non-empty, only correlate events on these ports |
### Metrics Settings
| Name | Type | Default | Description |
|------|------|---------|-------------|
| `metrics.enabled` | bool | `false` | Enable metrics HTTP server |
| `metrics.addr` | string | `:8080` | Metrics server listen address |
## Input Events
### Source A (HTTP — from mod-reqin-log)
JSON fields: `time`, `src_ip`, `src_port`, `dst_ip`, `dst_port`, `method`, `scheme`, `host`, `path`, `query`, `http_version`, `client_headers`, `header_*`
### Source B (Network — from sentinel)
JSON fields: `src_ip`, `src_port`, `dst_ip`, `dst_port`, `ip_meta_*`, `tcp_meta_*`, `tls_version`, `tls_sni`, `tls_alpn`, `ja4`, `ja3`, `ja3_hash`, `conn_id`, `syn_to_clienthello_ms`, `timestamp`
## Output CorrelatedLog JSON Schema
```json
{
"timestamp": "2026-03-09T14:30:00Z",
"src_ip": "203.0.113.42",
"src_port": 52341,
"dst_ip": "192.168.1.10",
"dst_port": 443,
"correlated": true,
"method": "GET",
"host": "example.com",
"path": "/api/v1/users",
"ja4": "t13d1516h2_8daaf6152771_b0da82dd1658",
"ja3_hash": "e7d705a3286e19ea42f587b344ee6865",
"ip_meta_ttl": 64,
"tcp_meta_window_size": 65535,
"tls_version": "1.3",
"tls_sni": "example.com",
"tls_alpn": "h2",
"header_User-Agent": "Mozilla/5.0 ...",
"keepalives": 3
}
```
Core fields are always present; additional fields are merged from A and B event raw data.
## ClickHouse Sink
- **Protocol**: ClickHouse native TCP (port 9000) via `clickhouse-go/v2`
- **Target table**: `http_logs_raw` (raw JSON stored, then parsed by materialized views)
- **Batch inserts**: Buffered up to `batch_size` records (default 500)
- **Flush interval**: Default 200 ms timer triggers flush if batch not full
- **Retry behavior**: Up to 3 retries with exponential backoff (100 ms base)
- **Connection ping**: 5-second timeout on startup
- **Buffer overflow**: Records dropped when buffer exceeds `max_buffer_size` (configurable)
## Metrics HTTP Server
When `metrics.enabled: true`, exposes:
| Endpoint | Description |
|----------|-------------|
| `GET /metrics` | Correlation metrics as JSON (events received, correlated, orphans, buffer sizes) |
| `GET /health` | Health check endpoint |
## systemd Service
```ini
[Unit]
Description=logcorrelator service
After=network.target
[Service]
Type=simple
User=logcorrelator
Group=logcorrelator
ExecStart=/usr/bin/logcorrelator -config /etc/logcorrelator/logcorrelator.yml
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
RestartSec=5
RuntimeDirectory=logcorrelator
RuntimeDirectoryMode=0755
# Security hardening
NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/var/log/logcorrelator /etc/logcorrelator
# Resource limits
LimitNOFILE=65536
TimeoutStartSec=10
TimeoutStopSec=30
[Install]
WantedBy=multi-user.target
```
### Security Hardening
- Runs as dedicated `logcorrelator` user/group
- `NoNewPrivileges=true` — prevents privilege escalation
- `ProtectSystem=strict` — read-only filesystem except `ReadWritePaths`
- `ProtectHome=true` — no access to home directories
- `RuntimeDirectory=logcorrelator` — systemd creates socket directory with correct ownership
## RPM Package Contents
| Path | Description |
|------|-------------|
| `/usr/bin/logcorrelator` | Binary |
| `/etc/logcorrelator/logcorrelator.yml` | Configuration file |
| `/usr/lib/systemd/system/logcorrelator.service` | systemd unit |
| `/var/log/logcorrelator/` | Log directory |
| `/var/run/logcorrelator/` | Socket directory (RuntimeDirectory) |

@@ -1,200 +0,0 @@
# mod-reqin-log
`mod_reqin_log` is an Apache HTTPD module (C shared object) that captures HTTP request metadata and sends it as JSON to a UNIX datagram socket. It serves as the HTTP-layer ingestion point for the ja4-platform pipeline, feeding request data to the [correlator](correlator.md) for joining with TLS fingerprint data from [sentinel](sentinel.md).
## Purpose
Apache processes HTTP requests after TLS termination, so it has access to the decoded HTTP method, path, headers, and client IP/port. mod-reqin-log hooks into the `post_read_request` phase to serialize this data immediately, before any rewrite or auth module modifies the request.
## Apache Directives Reference
All directives are server-level (`RSRC_CONF`):
| Directive | Type | Default | Description |
|-----------|------|---------|-------------|
| `JsonSockLogEnabled` | Flag (On/Off) | Off | Enable or disable the module |
| `JsonSockLogSocket` | String | — | UNIX domain socket path for JSON output |
| `JsonSockLogHeaders` | String list | — | HTTP header names to log (repeatable) |
| `JsonSockLogMaxHeaders` | Integer | `25` | Maximum number of headers to log |
| `JsonSockLogMaxHeaderValueLen` | Integer | `256` | Maximum length of each header value (truncated beyond) |
| `JsonSockLogReconnectInterval` | Integer (seconds) | `10` | Minimum seconds between reconnection attempts |
| `JsonSockLogErrorReportInterval` | Integer (seconds) | `10` | Minimum seconds between error log entries (throttling) |
| `JsonSockLogLevel` | String | `WARNING` | Module log level: `DEBUG`, `INFO`, `WARNING`, `ERROR`, `EMERG` |
### Example httpd.conf
```apache
LoadModule reqin_log_module modules/mod_reqin_log.so
JsonSockLogEnabled On
JsonSockLogSocket /var/run/logcorrelator/http.socket
JsonSockLogHeaders User-Agent Accept Accept-Encoding Accept-Language
JsonSockLogHeaders Content-Type X-Request-Id X-Trace-Id X-Forwarded-For
JsonSockLogHeaders Sec-CH-UA Sec-CH-UA-Mobile Sec-CH-UA-Platform
JsonSockLogHeaders Sec-Fetch-Dest Sec-Fetch-Mode Sec-Fetch-Site
JsonSockLogMaxHeaders 25
JsonSockLogMaxHeaderValueLen 256
JsonSockLogReconnectInterval 10
JsonSockLogErrorReportInterval 10
JsonSockLogLevel WARNING
```
## Output JSON Schema
Each HTTP request is serialized as a flat JSON object and sent as a single UNIX datagram:
```json
{
"time": "2026-03-09T14:30:00Z",
"src_ip": "203.0.113.42",
"src_port": 52341,
"dst_ip": "192.168.1.10",
"dst_port": 443,
"method": "GET",
"scheme": "https",
"host": "example.com",
"path": "/api/v1/users",
"query": "page=1&limit=20",
"http_version": "HTTP/2.0",
"client_headers": "User-Agent,Accept,Accept-Encoding,Accept-Language",
"header_User-Agent": "Mozilla/5.0 ...",
"header_Accept": "text/html,application/xhtml+xml",
"header_Accept-Encoding": "gzip, deflate, br",
"header_Accept-Language": "en-US,en;q=0.9",
"header_Sec-Fetch-Dest": "document",
"header_Sec-Fetch-Mode": "navigate",
"header_Sec-Fetch-Site": "none"
}
```
### Field Reference
| Field | Type | Description |
|-------|------|-------------|
| `time` | string (ISO 8601) | Request timestamp (UTC) |
| `src_ip` | string | Client IP address |
| `src_port` | int | Client source port |
| `dst_ip` | string | Server IP address |
| `dst_port` | int | Server port |
| `method` | string | HTTP method (`GET`, `POST`, etc.) |
| `scheme` | string | URL scheme (`http` or `https`) |
| `host` | string | HTTP Host header value |
| `path` | string | Request URI path |
| `query` | string | Query string (without `?`) |
| `http_version` | string | HTTP version (`HTTP/1.1`, `HTTP/2.0`) |
| `client_headers` | string | Comma-separated list of header names sent by client (order preserved) |
| `header_<Name>` | string | Value of each configured header (one field per header) |
### Sensitive Headers
The following headers are **always excluded** from output regardless of `JsonSockLogHeaders`:
- `Authorization`
- `Cookie`
- `Set-Cookie`
- `X-Api-Key`
- `X-Auth-Token`
- `Proxy-Authorization`
- `WWW-Authenticate`
### Size Limits
- Maximum JSON size: **64 KB** (prevents memory exhaustion DoS)
- Header values are truncated to `JsonSockLogMaxHeaderValueLen` bytes
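The filtering rules above can be sketched as follows. The module itself is written in C; this Go version only illustrates the logic, and `loggableHeaders` and its parameters are hypothetical names. For brevity the request map is looked up case-sensitively, which a real implementation would not do.

```go
package main

import (
	"fmt"
	"strings"
)

// sensitive lists the headers that are never logged, in lower case.
var sensitive = map[string]bool{
	"authorization": true, "cookie": true, "set-cookie": true,
	"x-api-key": true, "x-auth-token": true,
	"proxy-authorization": true, "www-authenticate": true,
}

// loggableHeaders returns header_<Name> fields for the configured header
// names, skipping sensitive headers, stopping at maxHeaders entries, and
// truncating each value to maxValueLen bytes.
func loggableHeaders(configured []string, request map[string]string, maxHeaders, maxValueLen int) map[string]string {
	out := make(map[string]string)
	for _, name := range configured {
		if len(out) >= maxHeaders {
			break
		}
		if sensitive[strings.ToLower(name)] {
			continue
		}
		value, ok := request[name]
		if !ok {
			continue
		}
		if len(value) > maxValueLen {
			value = value[:maxValueLen] // byte-based cut, as in the directive
		}
		out["header_"+name] = value
	}
	return out
}

func main() {
	req := map[string]string{
		"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
		"Cookie":     "session=secret",
		"Accept":     "text/html",
	}
	fmt.Println(loggableHeaders([]string{"User-Agent", "Cookie", "Accept"}, req, 25, 11))
}
```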
## Thread Safety
mod-reqin-log is designed for Apache's `worker` and `event` MPMs (multi-threaded):
- **Socket FD** is protected by an `apr_thread_mutex_t` (`fd_mutex`)
- **Per-child process state** includes the socket file descriptor, mutex, and error tracking
- **Error reporting** uses `LOG_THROTTLED` macro with timestamp-based deduplication
- All JSON serialization uses per-request pool allocation — no shared buffers
### Architecture
```
Apache HTTPD process
├── child process 1
│   ├── fd_mutex (apr_thread_mutex_t)
│   ├── socket_fd (shared across threads)
│   ├── thread 1 → post_read_request → serialize JSON → mutex lock → sendto() → unlock
│   ├── thread 2 → post_read_request → serialize JSON → mutex lock → sendto() → unlock
│   └── ...
├── child process 2
│   ├── fd_mutex
│   ├── socket_fd (independent)
│   └── ...
```
## Reconnection Behavior
- Socket is opened during `child_init` (per-child process startup)
- If the socket is unavailable at startup, connection is deferred
- On send failure, reconnection is attempted respecting `JsonSockLogReconnectInterval`
- Failed sends are silently dropped (HTTP request processing is not blocked)
- Error log entries are throttled by `JsonSockLogErrorReportInterval`
- Socket type: `SOCK_DGRAM` (connectionless UNIX datagram)
- Non-blocking sends with `MSG_NOSIGNAL`
## Deployment
### Installation via RPM
```bash
rpm -ivh mod_reqin_log-1.0.19-1.el10.x86_64.rpm
```
### LoadModule Directive
```apache
LoadModule reqin_log_module modules/mod_reqin_log.so
```
### Verifying Installation
```bash
httpd -M | grep reqin_log
# Expected: reqin_log_module (shared)
```
## Build
All builds run inside Docker:
```bash
# Run unit tests
make test-mod-reqin-log
# Build RPM packages (el8, el9, el10)
make rpm-mod-reqin-log
# RPMs in services/mod-reqin-log/dist/rpm/el{8,9,10}/
```
### Local Build (requires Apache development headers)
```bash
cd services/mod-reqin-log
make build # Compiles mod_reqin_log.so via apxs
make test # Runs unit tests
```
### Test Coverage
Unit tests cover:
- JSON serialization (escaping, size limits, field output)
- Config parsing (all directives, edge cases)
- Header handling (sensitive header exclusion, max headers, truncation)
- Module integration (real Apache module hooks)
## Source Files
| File | Description |
|------|-------------|
| `src/mod_reqin_log.c` | Main module source |
| `src/mod_reqin_log.h` | Header with types, constants, defaults |
| `conf/mod_reqin_log.conf` | Example Apache configuration |
| `tests/unit/test_json_serialization.c` | JSON output tests |
| `tests/unit/test_config_parsing.c` | Directive parsing tests |
| `tests/unit/test_header_handling.c` | Header filtering tests |
| `tests/unit/test_module_real.c` | Integration tests |

@@ -1,247 +0,0 @@
# Sentinel
Sentinel (`ja4sentinel`) is a Go daemon that performs live network packet capture on a Linux server, extracts TLS ClientHello handshakes, generates JA4 and JA3 fingerprints, enriches them with IP/TCP metadata, and outputs structured JSON log records to configurable destinations (UNIX socket, file, or stdout).
## Role in the Pipeline
Sentinel is the **network-layer ingestion point**. It sits on the target server, captures TLS traffic via libpcap, and feeds fingerprinted events to the [correlator](correlator.md) through a UNIX datagram socket.
```
Network traffic (port 443/8443)
        │ pcap
┌───────────────┐
│   sentinel    │
│  ┌─────────┐  │
│  │ capture │──▶ Raw packets
│  └─────────┘  │
│  ┌─────────┐  │
│  │ tlsparse│──▶ TLS ClientHello extraction + TCP reassembly
│  └─────────┘  │
│  ┌─────────┐  │
│  │ finger- │──▶ JA4/JA3 fingerprint generation
│  │  print  │  │
│  └─────────┘  │
│  ┌─────────┐  │
│  │ output  │──▶ UNIX socket / file / stdout
│  └─────────┘  │
└───────────────┘
```
## Architecture
Sentinel uses a pipeline of goroutines:
1. **Capture goroutine** — Opens pcap handle on the configured interface, applies BPF filter, reads raw packets into a buffered channel (`packet_buffer_size`).
2. **Packet processor goroutine** — Reads from the channel, feeds packets to the TLS parser, generates fingerprints, and writes output.
3. **Watchdog goroutine** — Sends systemd watchdog heartbeats at half the configured interval.
4. **Signal handler** — Listens for `SIGINT`/`SIGTERM` (graceful shutdown) and `SIGHUP` (log rotation).
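The capture-to-processor handoff can be sketched with a buffered channel. This is an illustration only: `runPipeline` is a hypothetical name, packets come from a slice instead of a pcap handle, and the sketch blocks when the buffer is full, which may differ from what sentinel does under load.

```go
package main

import (
	"fmt"
	"sync"
)

// runPipeline feeds packets into a buffered channel (the capture side) and
// drains it in a separate goroutine (the packet processor). It returns once
// every packet has been processed.
func runPipeline(packets [][]byte, bufferSize int, process func([]byte)) {
	ch := make(chan []byte, bufferSize)
	var wg sync.WaitGroup

	wg.Add(1)
	go func() {
		defer wg.Done()
		for p := range ch {
			process(p)
		}
	}()

	for _, p := range packets {
		ch <- p // blocks when the buffer is full
	}
	close(ch)
	wg.Wait()
}

func main() {
	packets := [][]byte{[]byte("a"), []byte("bb"), []byte("ccc")}
	total := 0
	runPipeline(packets, 1000, func(p []byte) { total += len(p) })
	fmt.Println("bytes processed:", total)
}
```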
### Key Interfaces
| Interface | Package | Description |
|-----------|---------|-------------|
| `Capture` | `internal/capture` | Packet capture via libpcap |
| `Parser` | `internal/tlsparse` | TCP reassembly + ClientHello extraction |
| `Engine` | `internal/fingerprint` | JA4/JA3 fingerprint generation |
| `Writer` | `internal/output` | Log record output (stdout, file, UNIX socket) |
| `MultiWriter` | `internal/output` | Fan-out to multiple writers |
| `Builder` | `internal/output` | Factory for constructing writers from config |
## Configuration Reference
Configuration is loaded from a YAML file (default: `config.yml`) with environment variable overrides.
### Core Settings
| Name | Type | Default | Env Override | Description |
|------|------|---------|-------------|-------------|
| `core.interface` | string | `any` | `JA4SENTINEL_INTERFACE` | Network interface to capture (`any` = all interfaces) |
| `core.listen_ports` | []uint16 | `[443]` | `JA4SENTINEL_PORTS` | TCP ports to monitor (comma-separated in env) |
| `core.bpf_filter` | string | `""` (auto) | `JA4SENTINEL_BPF_FILTER` | Custom BPF filter (empty = auto-generated) |
| `core.local_ips` | []string | `[]` (auto) | — | Local IPs to monitor (empty = auto-detect, excludes loopback) |
| `core.exclude_source_ips` | []string | `[]` | — | Source IPs or CIDRs to exclude (e.g., `["10.0.0.0/8"]`) |
| `core.flow_timeout_sec` | int | `30` | `JA4SENTINEL_FLOW_TIMEOUT` | Timeout for TLS handshake extraction (range 1-300) |
| `core.packet_buffer_size` | int | `1000` | `JA4SENTINEL_PACKET_BUFFER_SIZE` | Packet channel buffer size (range 1-1,000,000) |
| `core.log_level` | string | `info` | — | Log level: `debug`, `info`, `warn`, `error` (YAML only) |
> **Note:** `log_level` is intentionally not overridable via environment variable (architecture decision since v1.1.12).
### Output Settings
Each output is an entry in the `outputs` array:
| Name | Type | Default | Description |
|------|------|---------|-------------|
| `type` | string | — | Output type: `unix_socket`, `stdout`, `file` |
| `enabled` | bool | — | Whether this output is active |
| `async_buffer` | int | `1000` | Queue size for async writes |
| `params.socket_path` | string | — | Path for `unix_socket` type |
| `params.path` | string | — | File path for `file` type |
### Example Configuration
```yaml
core:
  interface: any
  listen_ports: [443, 8443]
  bpf_filter: ""
  local_ips: []
  exclude_source_ips: ["10.0.0.0/8", "192.168.1.1"]
  flow_timeout_sec: 30
  packet_buffer_size: 1000
  log_level: info
outputs:
  - type: unix_socket
    enabled: true
    params:
      socket_path: /var/run/logcorrelator/network.socket
  - type: file
    enabled: false
    params:
      path: /var/log/ja4sentinel/ja4.log
## Output Format (LogRecord JSON Schema)
Each output record is a flat JSON object:
```json
{
"src_ip": "203.0.113.42",
"src_port": 52341,
"dst_ip": "192.168.1.10",
"dst_port": 443,
"ip_meta_ttl": 64,
"ip_meta_total_length": 583,
"ip_meta_id": 12345,
"ip_meta_df": true,
"tcp_meta_window_size": 65535,
"tcp_meta_mss": 1460,
"tcp_meta_window_scale": 8,
"tcp_meta_options": "MSS,NOP,WScale,NOP,NOP,Timestamps,SACK",
"conn_id": "203.0.113.42:52341-192.168.1.10:443",
"sensor_id": "",
"tls_version": "1.3",
"tls_sni": "example.com",
"tls_alpn": "h2",
"syn_to_clienthello_ms": 12,
"ja4": "t13d1516h2_8daaf6152771_b0da82dd1658",
"ja3": "771,4866-4867-4865-49196-49200...",
"ja3_hash": "e7d705a3286e19ea42f587b344ee6865",
"timestamp": 1709312345678901234
}
```
### Field Reference
| Field | Type | Description |
|-------|------|-------------|
| `src_ip` | string | Client source IP address |
| `src_port` | uint16 | Client source port |
| `dst_ip` | string | Server destination IP address |
| `dst_port` | uint16 | Server destination port |
| `ip_meta_ttl` | uint8 | IP Time-To-Live |
| `ip_meta_total_length` | uint16 | IP total packet length |
| `ip_meta_id` | uint16 | IP identification field |
| `ip_meta_df` | bool | IP Don't Fragment flag |
| `tcp_meta_window_size` | uint16 | TCP window size |
| `tcp_meta_mss` | uint16 | TCP Maximum Segment Size (omitted if 0) |
| `tcp_meta_window_scale` | uint8 | TCP window scale factor (omitted if 0) |
| `tcp_meta_options` | string | Comma-separated TCP options |
| `conn_id` | string | Unique flow identifier |
| `sensor_id` | string | Sensor identifier |
| `tls_version` | string | Max TLS version from ClientHello |
| `tls_sni` | string | Server Name Indication |
| `tls_alpn` | string | ALPN protocol (e.g., `h2`, `http/1.1`) |
| `syn_to_clienthello_ms` | uint32 | Time from SYN to ClientHello (ms) |
| `ja4` | string | JA4 TLS fingerprint |
| `ja3` | string | JA3 TLS fingerprint |
| `ja3_hash` | string | MD5 hash of JA3 string |
| `timestamp` | int64 | Unix nanoseconds |
## UNIX Socket Output Protocol
- **Socket type**: `unixgram` (DGRAM — connectionless)
- **Encoding**: One JSON object per datagram (no delimiter)
- **Max datagram size**: 64 KB
- **Reconnection**: Exponential backoff (100 ms → 2 s), max 3 attempts per write
- **Queue**: Async write queue (default 1000 items) absorbs transient socket failures
- **Error callback**: Consecutive failures are tracked and reported
## Signal Handling
| Signal | Behavior |
|--------|----------|
| `SIGTERM` / `SIGINT` | Graceful shutdown: cancel context, close capture, flush outputs, log filter stats |
| `SIGHUP` | Log rotation: reopen file outputs (used by `systemctl reload` + logrotate) |
## JA4 Fingerprint Algorithm
1. Extract TLS ClientHello from the TCP payload (with TCP reassembly for fragmented handshakes)
2. Parse cipher suites, extensions, ALPN, SNI, supported versions
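3. Build JA4 string: `t{version}{sni_flag}{cipher_count}{ext_count}{alpn}_{cipher_hash}_{ext_hash}` (for example, `t13d1516h2` reads as TLS over TCP, version 1.3, SNI present, 15 cipher suites, 16 extensions, ALPN `h2`)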
4. Build JA3 string: `{version},{ciphers},{extensions},{curves},{formats}`
5. Compute JA3 MD5 hash
Sentinel uses the `tlsfingerprint` library for ALPN and TLS version parsing, with custom sanitization for malformed/truncated ClientHellos.
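Steps 4 and 5 can be sketched with the standard library alone. This is an illustration rather than sentinel's code: the `ja3` function and its parameters are hypothetical, the inputs are assumed to be already parsed from the ClientHello, and GREASE filtering is not shown.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"strconv"
	"strings"
)

// joinIDs renders numeric identifiers as a dash-separated decimal list.
func joinIDs(ids []uint16) string {
	parts := make([]string, len(ids))
	for i, id := range ids {
		parts[i] = strconv.Itoa(int(id))
	}
	return strings.Join(parts, "-")
}

// ja3 builds the "{version},{ciphers},{extensions},{curves},{formats}"
// string and returns it together with its hex-encoded MD5 hash.
func ja3(version uint16, ciphers, extensions, curves, formats []uint16) (string, string) {
	s := strings.Join([]string{
		strconv.Itoa(int(version)),
		joinIDs(ciphers),
		joinIDs(extensions),
		joinIDs(curves),
		joinIDs(formats),
	}, ",")
	sum := md5.Sum([]byte(s))
	return s, hex.EncodeToString(sum[:])
}

func main() {
	s, hash := ja3(771, []uint16{4866, 4867, 4865}, []uint16{0, 11, 10}, []uint16{29, 23}, []uint16{0})
	fmt.Println(s)
	fmt.Println(hash)
}
```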
## Deployment
### systemd
```ini
[Unit]
Description=ja4sentinel TLS fingerprinting daemon
After=network.target
[Service]
Type=notify
ExecStart=/usr/bin/ja4sentinel -config /etc/ja4sentinel/config.yml
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
WatchdogSec=30
TimeoutStopSec=2
[Install]
WantedBy=multi-user.target
```
Sentinel uses systemd `sd_notify` for:
- `READY` — sent after initialization
- `WATCHDOG` — sent at half the `WatchdogSec` interval
- `STOPPING` — sent before shutdown
### Docker
```bash
make build-sentinel
docker run --cap-add=NET_RAW --cap-add=NET_ADMIN \
-v /var/run/logcorrelator:/var/run/logcorrelator \
ja4-platform/sentinel:latest
```
## RPM Package Contents
| Path | Description |
|------|-------------|
| `/usr/bin/ja4sentinel` | Binary (statically linked Go) |
| `/etc/ja4sentinel/config.yml.default` | Default configuration (noreplace) |
| `/usr/share/ja4sentinel/config.yml` | Reference configuration |
| `/usr/lib/systemd/system/ja4sentinel.service` | systemd unit |
| `/etc/logrotate.d/ja4sentinel` | logrotate configuration |
| `/var/lib/ja4sentinel/` | State directory |
| `/var/log/ja4sentinel/` | Log directory |
| `/var/run/logcorrelator/` | Socket directory |
### RPM Dependencies
- `systemd`
- `libpcap >= 1.9.0`
### Supported Distributions
- Rocky Linux 8, 9, 10
- AlmaLinux 8, 9
- RHEL 8, 9