feat: ja4-platform monorepo — 5 services unified, tests & RPM builds standardized

Services:
- ja4sentinel: TLS/JA4 fingerprint capture daemon (Go, libpcap)
- logcorrelator: JA4 log correlation engine (Go, ClickHouse)
- mod_reqin_log: Apache module (C, JSON request logging)
- bot_detector: ML bot detection pipeline (Python)
- dashboard: FastAPI/Streamlit analytics UI (Python)

Shared libraries:
- shared/go/ja4common: logger, config, shutdown, ipfilter (Go module)
- shared/python/ja4_common: ClickHouseClient, ClickHouseSettings (Python package)
- shared/clickhouse/: canonical SQL migrations (10 files)

Build & packaging:
- Unified 3-stage Dockerfile.package for Go RPMs (el8/el9/el10)
- go.work workspace linking sentinel, correlator, ja4common
- Makefile with test-all, build-all, rpm-* targets

Fixes applied:
- go.work: 1.21 → 1.24.6 (required by sentinel)
- correlator Dockerfiles: golang:1.21 → golang:1.24
- replace directives in go.mod for ja4common local path
- pyproject.toml: setuptools.backends → setuptools.build_meta
- Removed static libpcap linking (unavailable on Rocky 9)
- Fixed data races in output/writers_test.go (sync.Mutex + atomic.Int32)
- Rewrote corrupted test files (logger_test.go × 2)

Test coverage:
- correlator: 67.1% total (unixsocket 80.5%, config 91.7%, app 83.3%, multi 87.7%, stdout 100%)
- sentinel: all 10 packages pass (api, capture, config, fingerprint, ipfilter, logging, output, tlsparse)

Documentation:
- README.md + docs/ (architecture, development, 5 services, shared libs, DB schema & migrations)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-07 16:42:59 +02:00
commit d469e39da7
278 changed files with 1621301 additions and 0 deletions
--- a/docs/services/bot-detector.md
+++ b/docs/services/bot-detector.md
@@ -0,0 +1,265 @@
+# Bot Detector
+
+The bot-detector is a Python service that performs machine-learning anomaly detection on aggregated HTTP/TLS traffic features stored in ClickHouse. It runs in a continuous cycle (default: every 5 minutes) and uses Isolation Forest to identify suspicious traffic patterns. Detections are then augmented with SHAP explanations, DBSCAN campaign clustering, and Anubis bot-rule matches.
+
+## ML Algorithm
+
+### Isolation Forest (Semi-Supervised)
+
+The core algorithm is **Isolation Forest** (Liu, Ting & Zhou, 2008) — an unsupervised anomaly detection algorithm that isolates anomalies by randomly partitioning feature space. Anomalies require fewer partitions to isolate than normal points.
+
+The approach is **semi-supervised** because:
+1. **Known bots** are identified a priori via reputation dictionaries (IP, JA4, ASN)
+2. **Human baseline** is identified via ASN reputation labels (`asn_label = 'human'`)
+3. The model trains **only on human-baseline traffic** (minimum 500 sessions required)
+4. Unknown traffic is scored by deviation from the human profile
+
+### Two-Model Architecture
+
+| Model | Condition | Features | Data |
+|-------|-----------|----------|------|
+| **Complet** | `correlated = 1` | 35 | HTTP + TCP + TLS (full pipeline data) |
+| **Applicatif** | `correlated = 0` | 31 | HTTP only (no TLS correlation available) |
+
+### Threat Levels
+
+| Score Range | Level | Interpretation |
+|------------|-------|----------------|
+| `< -0.30` | **CRITICAL** | Extremely anomalous behavior |
+| `≥ -0.30` and `< -0.15` | **HIGH** | Strong anomaly signal |
+| `≥ -0.15` and `< -0.05` | **MEDIUM** | Moderate anomaly |
+| `≥ -0.05` | **LOW** | Slightly unusual |
+
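The table above can be read as a top-down cascade in which the first matching band wins. A minimal Python sketch, assuming the score is already normalized to [-1, 0]; the function name `threat_level` is hypothetical and not taken from the service code:

```python
def threat_level(score: float) -> str:
    """Map a normalized anomaly score to a threat level.

    Bands are checked from most to least severe, so each score
    lands in exactly one level.
    """
    if score < -0.30:
        return "CRITICAL"
    if score < -0.15:
        return "HIGH"
    if score < -0.05:
        return "MEDIUM"
    return "LOW"
```

For example, the score `-0.25` shown in the decision log sample falls in the HIGH band.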
+## Feature List
+
+### Common Features (31 — Applicatif model)
+
+#### HTTP Behavior
+
+| Feature | Description |
+|---------|-------------|
+| `hits` | Request count in the window |
+| `hit_velocity` | Requests per second |
+| `fuzzing_index` | Path/parameter diversity anomaly score |
+| `post_ratio` | Fraction of POST requests |
+| `port_exhaustion_ratio` | Fraction of distinct source ports / total |
+| `orphan_ratio` | Requests without TLS correlation |
+| `head_ratio` | Fraction of HEAD requests |
+| `http10_ratio` | Fraction of HTTP/1.0 requests |
+| `generic_accept_ratio` | Fraction of short Accept headers |
+| `sec_fetch_absence_rate` | Fraction missing Sec-Fetch-Site |
+| `missing_accept_enc_ratio` | Fraction missing Accept-Encoding |
+| `http_scheme_ratio` | Fraction using HTTP (not HTTPS) |
+
+#### Connection Management
+
+| Feature | Description |
+|---------|-------------|
+| `max_keepalives` | Max requests on a single Keep-Alive connection |
+| `tcp_shared_count` | TCP connections shared between sessions |
+| `multiplexing_efficiency` | HTTP/2 multiplexing efficiency |
+
+#### Browser Fingerprint
+
+| Feature | Description |
+|---------|-------------|
+| `header_count` | HTTP headers sent |
+| `has_accept_language` | Accept-Language header presence |
+| `has_cookie` | Cookie header presence |
+| `has_referer` | Referer header presence |
+| `modern_browser_score` | Composite browser compliance score (0–100) |
+| `ua_ch_mismatch` | User-Agent vs Client Hints inconsistency |
+| `ip_id_zero_ratio` | IP packets with ID=0 (headless/minimal stack) |
+| `header_order_shared_count` | IPs sharing same header order |
+| `header_order_confidence` | Normalized entropy of header order |
+| `distinct_header_orders` | Distinct header orderings per IP |
+| `is_fake_navigation` | Sec-Fetch-Mode=navigate with non-document dest |
+
+#### Navigation Patterns
+
+| Feature | Description |
+|---------|-------------|
+| `request_size_variance` | Variance of request sizes |
+| `mss_mobile_mismatch` | TCP MSS vs mobile profile inconsistency |
+| `asset_ratio` | Static asset request fraction |
+| `direct_access_ratio` | Direct accesses (no referer) |
+| `is_ua_rotating` | User-Agent rotation detected (flag) |
+| `distinct_ja4_count` | Distinct JA4 fingerprints per IP |
+| `anomalous_payload_ratio` | Anomalous payload size fraction |
+
+#### Concentration & Rarity
+
+| Feature | Description |
+|---------|-------------|
+| `src_port_density` | Source port entropy |
+| `ja4_asn_concentration` | JA4 concentration within ASN |
+| `ja4_country_concentration` | JA4 concentration per country |
+| `is_rare_ja4` | Rare JA4 fingerprint (< 100 total hits) |
+
+#### Temporal & Diversity
+
+| Feature | Description |
+|---------|-------------|
+| `temporal_entropy` | Temporal distribution entropy |
+| `path_diversity_ratio` | URL path diversity |
+| `url_depth_variance` | URL depth variance |
+| `ja3_diversity_ratio` | JA3 diversity ratio per IP |
+
+### Additional TCP/TLS Features (Complet model only — 4 extra)
+
+| Feature | Description |
+|---------|-------------|
+| `tcp_jitter_variance` | TCP inter-packet jitter variance |
+| `alpn_http_mismatch` | ALPN vs actual HTTP protocol mismatch |
+| `is_alpn_missing` | ALPN absent in ClientHello |
+| `sni_host_mismatch` | TLS SNI vs HTTP Host mismatch |
+
+### L4 Fingerprint Features (Complet model)
+
+| Feature | Description |
+|---------|-------------|
+| `avg_ttl` | Average IP TTL (OS fingerprint) |
+| `ttl_std` | TTL standard deviation |
+| `no_window_scale_ratio` | Fraction without TCP window scale |
+| `syn_timing_cv` | SYN timing coefficient of variation |
+| `tls12_ratio` | Fraction of TLS 1.2 connections |
+| `ip_df_variance` | IP Don't-Fragment flag variance |
+
+## Detection Pipeline
+
+```
+1. Read view_ai_features_1h (last 24h) → DataFrame
+2. Read view_ip_recurrence → recurrence map
+3. Clean columns (fillna, astype)
+4. Split by correlated=1 / correlated=0
+5. For each model (Complet, Applicatif):
+   a. A7: Validate features (exclude missing/constant)
+   b. Separate known bots → log as KNOWN_BOT
+   c. Filter human baseline (asn_label='human', min 500 sessions)
+   d. Load or train Isolation Forest model
+   e. A1: Check concept drift (KS test on features)
+   f. Score unknown traffic
+   g. A10: Normalize scores to [-1, 0]
+   h. A2: Compute adaptive threshold = min(percentile_5, ANOMALY_THRESHOLD)
+   i. A6: Apply recurrence weighting
+   j. Filter scores below threshold
+   k. A4: SHAP explainability (top 5 features)
+   l. A8: DBSCAN clustering (campaign detection)
+6. Concatenate results, deduplicate by src_ip (keep lowest score)
+7. A5: Deduplication with TTL (skip recently reported IPs)
+8. Insert into ml_detected_anomalies + ml_all_scores
+```
+
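Steps h and j of the pipeline (adaptive threshold, then filtering) can be sketched in plain Python. This is an illustration under stated assumptions, not the service's implementation: the nearest-rank percentile method, the strict `<` comparison, and the helper names `percentile`, `adaptive_threshold`, and `select_anomalies` are all hypothetical.

```python
import math

ANOMALY_THRESHOLD = -0.03  # documented default
ANOMALY_PERCENTILE = 5     # documented default


def percentile(values, pct):
    """Nearest-rank percentile (assumed method; the service may interpolate)."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def adaptive_threshold(scores, pct=ANOMALY_PERCENTILE, fixed=ANOMALY_THRESHOLD):
    """A2: take the stricter (lower) of the percentile and the fixed threshold."""
    return min(percentile(scores, pct), fixed)


def select_anomalies(scores_by_ip):
    """Keep only the IPs whose score is strictly below the adaptive threshold."""
    threshold = adaptive_threshold(list(scores_by_ip.values()))
    return {ip: s for ip, s in scores_by_ip.items() if s < threshold}
```

Taking the minimum means the fixed `ANOMALY_THRESHOLD` acts as an upper bound: when most traffic scores close to zero, the percentile alone would flag ordinary sessions, and the fixed value prevents that.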
+## Concept Drift Detection (A1)
+
+Uses the **Kolmogorov-Smirnov test** to compare feature distributions between the current data and the training data. If the fraction of drifted features exceeds `DRIFT_THRESHOLD` (default: 0.30), the model is retrained.
+
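The drift check can be sketched with a hand-written two-sample KS statistic so that the example has no third-party dependencies. The significance level `alpha=0.05`, the asymptotic critical-value formula, and the function names are assumptions for illustration; the service may rely on a library routine and a p-value instead.

```python
import math

DRIFT_THRESHOLD = 0.30  # documented default


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past every value <= x, then compare the CDFs.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def feature_drifted(train, current, alpha=0.05):
    """Compare D against the asymptotic critical value for level alpha (assumed)."""
    n, m = len(train), len(current)
    critical = math.sqrt(-math.log(alpha / 2) / 2) * math.sqrt((n + m) / (n * m))
    return ks_statistic(train, current) > critical


def should_retrain(train_features, current_features, drift_threshold=DRIFT_THRESHOLD):
    """Retrain when the fraction of drifted features exceeds the threshold."""
    drifted = [
        name for name in train_features
        if feature_drifted(train_features[name], current_features[name])
    ]
    return len(drifted) / len(train_features) > drift_threshold
```

With two features of which one has shifted, the drifted fraction is 0.5, which exceeds the default 0.30 and would trigger retraining.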
+## SHAP Explainability (A4)
+
+When enabled (`ENABLE_SHAP=true`), the detector computes SHAP values for each detected anomaly using `shap.TreeExplainer`. The top 5 contributing features are stored in the `reason` field.
+
+## DBSCAN Clustering (A8)
+
+When enabled (`ENABLE_CLUSTERING=true`), the detector applies DBSCAN to the anomaly feature vectors to group related anomalies into campaigns. Each anomaly gets a `campaign_id` (`-1` means the anomaly belongs to no cluster).
+
+## Anubis Bot-Rule Enrichment
+
+The `view_ai_features_1h` view enriches each IP with Anubis bot detection using a priority cascade:
+1. **UA + IP combined** (same `rule_id`) — highest confidence
+2. **UA only** (no IP requirement)
+3. **IP only** (no UA requirement)
+4. **ASN match**
+5. **Country match**
+
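The cascade itself lives in the ClickHouse view, but its selection logic can be illustrated in Python. The sketch below assumes each match type yields a list of rule IDs; the function `pick_rule` and the level labels are hypothetical names, not part of the view.

```python
def pick_rule(ua_rule_ids, ip_rule_ids, asn_rule_ids=(), country_rule_ids=()):
    """Return (level, rule_id) for the highest-priority match, or None.

    A combined match requires the same rule_id on both the UA and IP side.
    """
    ip_set = set(ip_rule_ids)
    combined = [rule_id for rule_id in ua_rule_ids if rule_id in ip_set]
    cascade = (
        ("ua+ip", combined),
        ("ua", list(ua_rule_ids)),
        ("ip", list(ip_rule_ids)),
        ("asn", list(asn_rule_ids)),
        ("country", list(country_rule_ids)),
    )
    for level, candidates in cascade:
        if candidates:
            return level, candidates[0]
    return None
```

A lower-priority match is used only when every level above it produced nothing, so a UA-only hit never overrides a combined UA + IP hit.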
+## Environment Variables
+
+| Variable | Type | Default | Description |
+|----------|------|---------|-------------|
+| `CLICKHOUSE_HOST` | string | `clickhouse` | ClickHouse server hostname |
+| `CLICKHOUSE_PORT` | int | `8123` | ClickHouse HTTP port |
+| `CLICKHOUSE_DB` | string | `mabase_prod` | Database name |
+| `CLICKHOUSE_USER` | string | `admin` | ClickHouse username |
+| `CLICKHOUSE_PASSWORD` | string | `""` | ClickHouse password |
+| `ISOLATION_CONTAMINATION` | float | `0.02` | Contamination parameter for Isolation Forest |
+| `ANOMALY_THRESHOLD` | float | `-0.03` | Score threshold for anomaly detection |
+| `ANOMALY_PERCENTILE` | int | `5` | Percentile for adaptive threshold (A2) |
+| `CYCLE_INTERVAL_SEC` | int | `300` | Seconds between detection cycles |
+| `MAX_CONSECUTIVE_FAILURES` | int | `3` | Max consecutive failures before exit |
+| `BOT_DETECTOR_LOG` | string | `/var/log/bot_detector/decisions.jsonl` | Decision log file path |
+| `LOG_BACKUP_COUNT` | int | `7` | Number of rotated log backups |
+| `MODEL_DIR` | string | `/var/lib/bot_detector` | Model persistence directory |
+| `RETRAIN_INTERVAL_HOURS` | int | `24` | Hours between model retraining |
+| `MODEL_HISTORY_COUNT` | int | `10` | Number of model versions to keep |
+| `DRIFT_THRESHOLD` | float | `0.30` | KS-test drift threshold (A1) |
+| `ENABLE_MULTIWINDOW` | bool | `false` | Enable 24h multi-window analysis (A3) |
+| `MULTIWINDOW_VIEW` | string | `view_ai_features_24h` | View for multi-window mode |
+| `ENABLE_SHAP` | bool | `true` | Enable SHAP explainability (A4) |
+| `DEDUP_TTL_MIN` | int | `60` | Deduplication TTL in minutes (A5) |
+| `RECURRENCE_WEIGHT` | float | `0.005` | Recurrence score weighting factor (A6) |
+| `MIN_VALID_FEATURE_RATIO` | float | `0.50` | Min valid feature ratio (A7) |
+| `ENABLE_CLUSTERING` | bool | `true` | Enable DBSCAN clustering (A8) |
+| `CLUSTERING_MIN_SAMPLES` | int | `3` | DBSCAN min samples per cluster |
+| `HEALTH_PORT` | int | `8080` | Health check HTTP server port |
+
+## Output Tables
+
+### ml_detected_anomalies
+
+Anomalies whose score falls below the detection threshold (lower scores are more anomalous). Engine: `ReplacingMergeTree(detected_at)`, ORDER BY `(src_ip)`, TTL 30 days.
+
+Key columns: `detected_at`, `src_ip`, `ja4`, `host`, `bot_name`, `anomaly_score`, `raw_anomaly_score`, `threat_level`, `model_name`, `recurrence`, `campaign_id`, `reason`, `anubis_bot_name`, `anubis_bot_action`, `anubis_bot_category`, plus all ML features.
+
+### ml_all_scores
+
+All classifications (no threshold filter) for observability. Engine: `ReplacingMergeTree(detected_at)`, ORDER BY `(window_start, src_ip, ja4, host, model_name)`, TTL 3 days.
+
+## Decision Log Format
+
+The `decisions.jsonl` file contains structured JSONL entries:
+
+```json
+{"event": "CYCLE_START", "cycle_id": "20260309T143000", "total": 5000, "human": 1500, "known_bot": 200, "correlated": 3000}
+{"event": "ANOMALY", "src_ip": "203.0.113.42", "score": -0.25, "threat_level": "HIGH", "reason": "hit_velocity=45.2, fuzzing_index=0.8, ...", "campaign_id": 3}
+{"event": "KNOWN_BOT", "src_ip": "198.51.100.10", "bot_name": "AhrefsBot"}
+{"event": "CYCLE_END", "cycle_id": "20260309T143000", "anomalies": 15, "known_bots": 200, "duration_sec": 12.5}
+```
+
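Because each line is an independent JSON object, the log can be summarized with the standard library alone. A small sketch, assuming only the `event` and `threat_level` fields shown above; the function name `summarize_decisions` is hypothetical.

```python
import json
from collections import Counter


def summarize_decisions(lines):
    """Count entries per event type and anomalies per threat level."""
    events = Counter()
    levels = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between entries
        entry = json.loads(line)
        events[entry["event"]] += 1
        if entry["event"] == "ANOMALY":
            levels[entry.get("threat_level", "UNKNOWN")] += 1
    return events, levels


sample = [
    '{"event": "CYCLE_START", "cycle_id": "20260309T143000", "total": 5000}',
    '{"event": "ANOMALY", "src_ip": "203.0.113.42", "score": -0.25, "threat_level": "HIGH"}',
    '{"event": "KNOWN_BOT", "src_ip": "198.51.100.10", "bot_name": "AhrefsBot"}',
    '{"event": "CYCLE_END", "cycle_id": "20260309T143000", "anomalies": 15}',
]
events, levels = summarize_decisions(sample)
```

Passing an open file object instead of `sample` works the same way, since iterating a text file yields one line at a time.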
+Log rotation: 50 MB max size × `LOG_BACKUP_COUNT` backups (default 7).
+
+## Health Check Endpoint
+
+- **URL**: `GET http://localhost:8080/`
+- **Response**: `200 OK` with status JSON
+- Runs in a separate thread
+
+## Model Persistence
+
+| File | Description |
+|------|-------------|
+| `model_<name>_<version>.joblib` | Serialized Isolation Forest (joblib) |
+| `model_<name>_<version>.meta.json` | Model metadata (features, thresholds, training stats) |
+| `model_<name>.current` | Pointer to active model version |
+| `training_history.jsonl` | Training history log |
+
+Models are rotated: only the last `MODEL_HISTORY_COUNT` versions (default 10) are kept.
+
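The rotation rule can be sketched as a pure function that lists which files to delete. It assumes version strings sort chronologically (for example, timestamps); the function name `files_to_prune` is hypothetical and the real cleanup code may differ.

```python
MODEL_HISTORY_COUNT = 10  # documented default


def files_to_prune(model_name, versions, keep=MODEL_HISTORY_COUNT):
    """Return the model and metadata files older than the newest `keep` versions."""
    ordered = sorted(versions)
    stale = ordered[:-keep] if keep > 0 else ordered
    return [
        f"model_{model_name}_{version}{ext}"
        for version in stale
        for ext in (".joblib", ".meta.json")
    ]
```

With 12 stored versions and the default of 10, the two oldest versions are selected, which amounts to four files.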
+## Docker Deployment
+
+```bash
+# Build
+make build-bot-detector
+
+# Run with docker-compose
+cd services/bot-detector
+docker-compose up -d
+```
+
+### Volumes
+
+| Host Path | Container Path | Description |
+|-----------|---------------|-------------|
+| `./bot_detector_logs` | `/var/log/bot_detector` | Decision logs (JSONL) |
+| `./bot_detector_models` | `/var/lib/bot_detector` | Persisted ML models |
+| `./reputation/data/user_files/bot_ip.csv` | `/data/bot_ip.csv` (ro) | Known bot IP list |
+| `./reputation/data/user_files/bot_ja4.csv` | `/data/bot_ja4.csv` (ro) | Known bot JA4 list |
+| `./reputation/data/user_files/asn_reputation.csv` | `/data/asn_reputation.csv` (ro) | ASN reputation labels |
--- a/docs/services/correlator.md
+++ b/docs/services/correlator.md
@@ -0,0 +1,220 @@
+# Correlator
+
+The correlator (`logcorrelator`) is a Go daemon that joins HTTP events from [mod-reqin-log](mod-reqin-log.md) (source A) with TLS/network events from [sentinel](sentinel.md) (source B) into unified correlated log entries. It uses a `src_ip:src_port` key with a configurable time window to match events, supports HTTP Keep-Alive connections, and writes results to ClickHouse, file, and/or stdout.
+
+## Correlation Algorithm
+
+### Key Matching
+
+Events are correlated by their **correlation key**: `src_ip:src_port`. For a given server address and port, a client IP and its ephemeral source port identify one TCP connection at a time, so matching on this pair joins the HTTP request (seen by Apache) with the TLS handshake (seen by sentinel) from the same connection.
+
+### Time Window
+
+Events must arrive within the configured time window (default: **10 seconds**) to be matched. This accounts for:
+- Processing latency between Apache and sentinel
+- Packet capture buffering
+- UNIX socket delivery ordering
+
+### Keep-Alive Support
+
+In `one_to_many` mode (default), a single TLS handshake event (source B) can match **multiple** HTTP requests (source A) on the same TCP connection:
+
+1. Source B event arrives → buffered with TTL (default: 120 s)
+2. Source A event arrives with same key → correlation match, B event TTL resets
+3. Next A event on same connection → matches same B event (TTL resets again)
+4. Connection closes → B event expires after TTL
+
+Each A event within a Keep-Alive session gets an incrementing `keepalives` counter.
+
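The `one_to_many` flow above can be illustrated with a small Python model. The daemon itself is written in Go, so this is only a sketch of the logic: the class and method names are hypothetical, the counter is assumed to start at 1, and the time window, source A buffering, and field-collision handling are left out.

```python
from dataclasses import dataclass

NETWORK_TTL_S = 120  # documented default TTL for source B events


@dataclass
class BufferedB:
    event: dict
    expires_at: float
    keepalives: int = 0


class OneToManyCorrelator:
    def __init__(self, ttl_s=NETWORK_TTL_S):
        self.ttl_s = ttl_s
        self.pending_b = {}

    @staticmethod
    def key(event):
        return f"{event['src_ip']}:{event['src_port']}"

    def add_b(self, event, now):
        """Buffer a TLS/network event until its TTL runs out."""
        self.pending_b[self.key(event)] = BufferedB(event, now + self.ttl_s)

    def add_a(self, event, now):
        """Match an HTTP event against a buffered B event, if one is still live."""
        k = self.key(event)
        entry = self.pending_b.get(k)
        if entry is None or entry.expires_at <= now:
            self.pending_b.pop(k, None)  # drop the expired entry, if any
            return {**event, "correlated": False, "orphan_side": "A"}
        entry.keepalives += 1
        entry.expires_at = now + self.ttl_s  # each match resets the TTL
        return {**entry.event, **event, "correlated": True,
                "keepalives": entry.keepalives}
```

Resetting the TTL on every match is what keeps a long-lived Keep-Alive connection correlated: the B event disappears only after the connection has been idle for a full TTL.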
+### Orphan Handling
+
+- **Source A orphans** (HTTP without TLS match): Emitted after `apache_emit_delay_ms` (default: 500 ms) with `correlated=false`, `orphan_side=A`
+- **Source B orphans** (TLS without HTTP match): Not emitted by default (`network_emit: false`)
+- **Buffer overflow**: Oldest events are rotated out and emitted as orphans
+
+### Field Merging
+
+When two events are correlated:
+- HTTP fields (method, path, headers, etc.) come from source A
+- TLS/network fields (JA4, JA3, IP/TCP metadata) come from source B
+- On field collision with different values: both are kept with `a_` and `b_` prefixes
+
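The merge rules can be expressed as a short Python function. This is a sketch of the documented behavior rather than the Go implementation; the name `merge_fields` is hypothetical, and dropping the unprefixed key on a collision is an assumption.

```python
def merge_fields(a_fields, b_fields):
    """Merge source A and source B fields, prefixing only conflicting values."""
    merged = {}
    for name in a_fields.keys() | b_fields.keys():
        in_a, in_b = name in a_fields, name in b_fields
        if in_a and in_b and a_fields[name] != b_fields[name]:
            # Same field, different values: keep both sides under prefixes.
            merged[f"a_{name}"] = a_fields[name]
            merged[f"b_{name}"] = b_fields[name]
        elif in_a:
            merged[name] = a_fields[name]
        else:
            merged[name] = b_fields[name]
    return merged
```

Fields that agree on both sides, such as `dst_port`, appear once without a prefix.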
+## Configuration Reference
+
+Configuration is loaded from a YAML file (default: `/etc/logcorrelator/logcorrelator.yml`).
+
+### Log Settings
+
+| Name | Type | Default | Description |
+|------|------|---------|-------------|
+| `log.level` | string | `INFO` | Log level: `DEBUG`, `INFO`, `WARN`, `ERROR` |
+
+### Input Settings
+
+| Name | Type | Default | Description |
+|------|------|---------|-------------|
+| `inputs.unix_sockets[].name` | string | — | Human-readable source name (e.g., `http`, `network`) |
+| `inputs.unix_sockets[].path` | string | — | UNIX socket path to listen on |
+| `inputs.unix_sockets[].format` | string | `json` | Input format |
+| `inputs.unix_sockets[].source_type` | string | — | Event source: `A` (HTTP), `B` (Network) |
+| `inputs.unix_sockets[].socket_permissions` | string | `0666` | Socket file permissions (octal) |
+
+### Output Settings
+
+#### File Output
+
+| Name | Type | Default | Description |
+|------|------|---------|-------------|
+| `outputs.file.enabled` | bool | `true` | Enable file output |
+| `outputs.file.path` | string | `/var/log/logcorrelator/correlated.log` | Output file path |
+
+#### ClickHouse Output
+
+| Name | Type | Default | Description |
+|------|------|---------|-------------|
+| `outputs.clickhouse.enabled` | bool | `false` | Enable ClickHouse output |
+| `outputs.clickhouse.dsn` | string | — | ClickHouse DSN (e.g., `clickhouse://user:pass@host:9000/db`) |
+| `outputs.clickhouse.table` | string | — | Target table name |
+| `outputs.clickhouse.batch_size` | int | `500` | Records per batch insert |
+| `outputs.clickhouse.flush_interval_ms` | int | `200` | Flush interval in milliseconds |
+| `outputs.clickhouse.max_buffer_size` | int | `5000` | Maximum in-memory buffer size |
+| `outputs.clickhouse.drop_on_overflow` | bool | `true` | Drop records when buffer is full |
+| `outputs.clickhouse.async_insert` | bool | `true` | Use ClickHouse async inserts |
+| `outputs.clickhouse.timeout_ms` | int | `1000` | Operation timeout in milliseconds |
+
+#### Stdout Output
+
+| Name | Type | Default | Description |
+|------|------|---------|-------------|
+| `outputs.stdout.enabled` | bool | `false` | Enable stdout output |
+| `outputs.stdout.level` | string | — | Output verbosity filter |
+
+### Correlation Settings
+
+| Name | Type | Default | Description |
+|------|------|---------|-------------|
+| `correlation.time_window.value` | int | `10` | Time window value |
+| `correlation.time_window.unit` | string | `s` | Time window unit (`s`, `ms`) |
+| `correlation.orphan_policy.apache_always_emit` | bool | `true` | Always emit A events even without B match |
+| `correlation.orphan_policy.apache_emit_delay_ms` | int | `500` | Delay before emitting orphan A (ms) |
+| `correlation.orphan_policy.network_emit` | bool | `false` | Emit B events without A match |
+| `correlation.matching.mode` | string | `one_to_many` | Matching mode: `one_to_one` or `one_to_many` |
+| `correlation.buffers.max_http_items` | int | `10000` | Max buffered HTTP (source A) events |
+| `correlation.buffers.max_network_items` | int | `20000` | Max buffered network (source B) events |
+| `correlation.ttl.network_ttl_s` | int | `120` | TTL for source B events (seconds) |
+| `correlation.exclude_source_ips` | []string | `[]` | IPs or CIDRs to exclude from correlation |
+| `correlation.include_dest_ports` | []int | `[]` | If non-empty, only correlate events on these ports |
+
+### Metrics Settings
+
+| Name | Type | Default | Description |
+|------|------|---------|-------------|
+| `metrics.enabled` | bool | `false` | Enable metrics HTTP server |
+| `metrics.addr` | string | `:8080` | Metrics server listen address |
+
+## Input Events
+
+### Source A (HTTP — from mod-reqin-log)
+
+JSON fields: `time`, `src_ip`, `src_port`, `dst_ip`, `dst_port`, `method`, `scheme`, `host`, `path`, `query`, `http_version`, `client_headers`, `header_*`
+
+### Source B (Network — from sentinel)
+
+JSON fields: `src_ip`, `src_port`, `dst_ip`, `dst_port`, `ip_meta_*`, `tcp_meta_*`, `tls_version`, `tls_sni`, `tls_alpn`, `ja4`, `ja3`, `ja3_hash`, `conn_id`, `syn_to_clienthello_ms`, `timestamp`
+
+## Output CorrelatedLog JSON Schema
+
+```json
+{
+  "timestamp": "2026-03-09T14:30:00Z",
+  "src_ip": "203.0.113.42",
+  "src_port": 52341,
+  "dst_ip": "192.168.1.10",
+  "dst_port": 443,
+  "correlated": true,
+  "method": "GET",
+  "host": "example.com",
+  "path": "/api/v1/users",
+  "ja4": "t13d1516h2_8daaf6152771_b0da82dd1658",
+  "ja3_hash": "e7d705a3286e19ea42f587b344ee6865",
+  "ip_meta_ttl": 64,
+  "tcp_meta_window_size": 65535,
+  "tls_version": "1.3",
+  "tls_sni": "example.com",
+  "tls_alpn": "h2",
+  "header_User-Agent": "Mozilla/5.0 ...",
+  "keepalives": 3
+}
+```
+
+Core fields are always present; additional fields are merged from A and B event raw data.
+
+## ClickHouse Sink
+
+- **Protocol**: ClickHouse native TCP (port 9000) via `clickhouse-go/v2`
+- **Target table**: `http_logs_raw` (raw JSON stored, then parsed by materialized views)
+- **Batch inserts**: Buffered up to `batch_size` records (default 500)
+- **Flush interval**: Default 200 ms timer triggers flush if batch not full
+- **Retry behavior**: Up to 3 retries with exponential backoff (100 ms base)
+- **Connection ping**: 5-second timeout on startup
+- **Buffer overflow**: Records dropped when buffer exceeds `max_buffer_size` (configurable)
+
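The retry behavior can be sketched in Python as follows. The doubling factor, the broad exception handling, and the names `flush_with_retry` and `send_batch` are assumptions for illustration; the Go sink may classify errors and time its retries differently.

```python
import time


def flush_with_retry(send_batch, batch, max_retries=3, base_delay_s=0.1,
                     sleep=time.sleep):
    """Send a batch once, then retry up to `max_retries` times.

    The delay doubles after each failure: 100 ms, 200 ms, 400 ms by default.
    Returns True on success and False once every retry has failed.
    """
    attempt = 0
    while True:
        try:
            send_batch(batch)
            return True
        except Exception:  # a sketch: real code would catch specific errors
            if attempt == max_retries:
                return False
            sleep(base_delay_s * (2 ** attempt))
            attempt += 1
```

Injecting `sleep` as a parameter keeps the function testable without real waiting.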
+## Metrics HTTP Server
+
+When `metrics.enabled: true`, exposes:
+
+| Endpoint | Description |
+|----------|-------------|
+| `GET /metrics` | Correlation metrics as JSON (events received, correlated, orphans, buffer sizes) |
+| `GET /health` | Health check endpoint |
+
+## systemd Service
+
+```ini
+[Unit]
+Description=logcorrelator service
+After=network.target
+
+[Service]
+Type=simple
+User=logcorrelator
+Group=logcorrelator
+ExecStart=/usr/bin/logcorrelator -config /etc/logcorrelator/logcorrelator.yml
+ExecReload=/bin/kill -HUP $MAINPID
+Restart=on-failure
+RestartSec=5
+RuntimeDirectory=logcorrelator
+RuntimeDirectoryMode=0755
+
+# Security hardening
+NoNewPrivileges=true
+ProtectSystem=strict
+ProtectHome=true
+ReadWritePaths=/var/log/logcorrelator /etc/logcorrelator
+
+# Resource limits
+LimitNOFILE=65536
+TimeoutStartSec=10
+TimeoutStopSec=30
+
+[Install]
+WantedBy=multi-user.target
+```
+
+### Security Hardening
+
+- Runs as dedicated `logcorrelator` user/group
+- `NoNewPrivileges=true` — prevents privilege escalation
+- `ProtectSystem=strict` — read-only filesystem except `ReadWritePaths`
+- `ProtectHome=true` — no access to home directories
+- `RuntimeDirectory=logcorrelator` — systemd creates socket directory with correct ownership
+
+## RPM Package Contents
+
+| Path | Description |
+|------|-------------|
+| `/usr/bin/logcorrelator` | Binary |
+| `/etc/logcorrelator/logcorrelator.yml` | Configuration file |
+| `/usr/lib/systemd/system/logcorrelator.service` | systemd unit |
+| `/var/log/logcorrelator/` | Log directory |
+| `/var/run/logcorrelator/` | Socket directory (RuntimeDirectory) |
--- a/docs/services/dashboard.md
+++ b/docs/services/dashboard.md
@@ -0,0 +1,308 @@
+# Dashboard
+
+The dashboard is a SOC (Security Operations Center) web application built with FastAPI (backend) and React (frontend) that provides real-time visualization, investigation, and analysis of bot detections generated by the [bot-detector](bot-detector.md). It queries ClickHouse (`mabase_prod`) for all data.
+
+## Technology Stack
+
+| Component | Technology |
+|-----------|-----------|
+| Backend | Python 3.11 + FastAPI |
+| Frontend | React + Vite |
+| Database | ClickHouse (via `ja4_common` shared client) |
+| API Docs | Swagger UI (`/docs`) and ReDoc (`/redoc`) |
+
+## Configuration
+
+| Variable | Type | Default | Description |
+|----------|------|---------|-------------|
+| `CLICKHOUSE_HOST` | string | `clickhouse` | ClickHouse hostname |
+| `CLICKHOUSE_PORT` | int | `8123` | ClickHouse HTTP port |
+| `CLICKHOUSE_DB` | string | `mabase_prod` | Database name |
+| `CLICKHOUSE_USER` | string | `admin` | ClickHouse user |
+| `CLICKHOUSE_PASSWORD` | string | `""` | ClickHouse password |
+| `API_HOST` | string | `0.0.0.0` | API listen address |
+| `API_PORT` | int | `8000` | API listen port |
+| `CORS_ORIGINS` | list | `["http://localhost:3000", "http://127.0.0.1:3000"]` | Allowed CORS origins |
+
+## API Reference
+
+Except for `/health`, all endpoints are prefixed with `/api/`. The dashboard exposes **74+ endpoints** across 20 routers.
+
+### Health
+
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/health` | Health check — returns ClickHouse connection status |
+
+---
+
+### Metrics (`/api/metrics`)
+
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/metrics` | Global dashboard metrics: detection counts by threat level, unique IPs, time series |
+| GET | `/api/metrics/threats` | Threat distribution summary |
+| GET | `/api/metrics/baseline` | Human baseline statistics |
+
+---
+
+### Detections (`/api/detections`)
+
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/detections` | Paginated detection list with filtering, sorting, and text search |
+| GET | `/api/detections/{detection_id}` | Single detection details |
+
+**Query Parameters** (GET `/api/detections`):
+
+| Parameter | Type | Description |
+|-----------|------|-------------|
+| `page` | int | Page number (default: 1) |
+| `page_size` | int | Items per page (default: 20) |
+| `threat_level` | string | Filter by threat level |
+| `model_name` | string | Filter by model name |
+| `search` | string | Full-text search across IP, JA4, host, bot_name |
+| `sort_by` | string | Sort field |
+| `sort_order` | string | `asc` or `desc` |
+
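A request URL for the detection list can be assembled from these parameters with the standard library. The sketch assumes the API listens on the default `API_PORT` of 8000 and that `HIGH` is an accepted `threat_level` value; the helper name `detections_url` is hypothetical.

```python
from urllib.parse import urlencode


def detections_url(base_url, **params):
    """Build a GET /api/detections URL, skipping parameters set to None."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = f"{base_url.rstrip('/')}/api/detections"
    return f"{url}?{query}" if query else url


url = detections_url(
    "http://localhost:8000",
    page=2,
    page_size=50,
    threat_level="HIGH",
    sort_order="desc",
)
```

`urlencode` also percent-encodes values, which matters for free-text `search` terms that contain spaces or punctuation.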
+---
+
+### Investigation (`/api/investigation`)
+
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/investigation/{ip}/summary` | **Primary investigation endpoint.** Aggregates ML score, brute-force, TCP spoofing, JA4 rotation, persistence, and 24h timeline into a single response with a `risk_score` (0–100) |
+
+---
+
+### Reputation (`/api/reputation`)
+
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/reputation/ip/{ip_address}` | Full IP reputation from IP-API.com and IPinfo.io (proxy, VPN, Tor, hosting detection) |
+| GET | `/api/reputation/ip/{ip_address}/summary` | Simplified reputation summary |
+
+---
+
+### Analysis (`/api/analysis`)
+
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/analysis/{ip}/subnet` | Subnet analysis for an IP (related IPs in same /24) |
+| GET | `/api/analysis/{ip}/country` | Country-level analysis for an IP |
+| GET | `/api/analysis/country` | Global country analysis across all detections |
+| GET | `/api/analysis/{ip}/ja4` | JA4 fingerprint analysis for an IP |
+| GET | `/api/analysis/{ip}/user-agents` | User-agent analysis for an IP |
+| GET | `/api/analysis/{ip}/recommendation` | SOC classification recommendation |
+| POST | `/api/analysis/classifications` | Create a classification (legitimate/suspicious/malicious) |
+| GET | `/api/analysis/classifications` | List all classifications |
+| GET | `/api/analysis/classifications/stats` | Classification statistics |
+
+---
+
+### Entities (`/api/entities`)
+
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/entities/types` | List available entity types |
+| GET | `/api/entities/subnet/{subnet}` | Investigate a subnet |
+| GET | `/api/entities/{entity_type}/{entity_value}` | Investigate any entity (IP, JA4, subnet, UA, host) |
+| GET | `/api/entities/{entity_type}/{entity_value}/related` | Related entities |
+| GET | `/api/entities/{entity_type}/{entity_value}/user_agents` | User-agents for entity |
+| GET | `/api/entities/{entity_type}/{entity_value}/client_headers` | Client headers for entity |
+| GET | `/api/entities/{entity_type}/{entity_value}/paths` | URL paths for entity |
+| GET | `/api/entities/{entity_type}/{entity_value}/query_params` | Query parameters for entity |
+
+---
+
+### Incidents (`/api/incidents`)
+
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/incidents` | List all incidents |
+| GET | `/api/incidents/clusters` | Active incident clusters (behavioral similarity grouping) |
+| GET | `/api/incidents/{cluster_id}` | Incident cluster details |
+| POST | `/api/incidents/{cluster_id}/classify` | Classify an incident cluster |
+
+---
+
+### Fingerprints (`/api/fingerprints`)
+
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/fingerprints/spoofing` | TLS fingerprint spoofing detection |
+| GET | `/api/fingerprints/ja4-ua-matrix` | JA4 ↔ User-Agent correlation matrix |
+| GET | `/api/fingerprints/ua-analysis` | Suspicious user-agent analysis |
+| GET | `/api/fingerprints/ip/{ip}/coherence` | Fingerprint coherence analysis per IP |
+| GET | `/api/fingerprints/legitimate-ja4` | Known legitimate JA4 fingerprints |
+| GET | `/api/fingerprints/asn-correlation` | JA4-ASN correlation analysis |
+
+---
+
+### Brute Force (`/api/bruteforce`)
+
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/bruteforce/targets` | Brute-force target hosts |
+| GET | `/api/bruteforce/attackers` | Brute-force attacker IPs |
+| GET | `/api/bruteforce/timeline` | Brute-force attack timeline |
+| GET | `/api/bruteforce/host/{host}/attackers` | Attackers for a specific host |
+
+---
+
+### TCP Spoofing (`/api/tcp-spoofing`)
+
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/tcp-spoofing/overview` | TCP/OS fingerprint spoofing overview |
+| GET | `/api/tcp-spoofing/list` | Spoofing detection list |
+| GET | `/api/tcp-spoofing/matrix` | TTL × MSS anomaly matrix |
+
+---
+
+### Header Fingerprint (`/api/headers`)
+
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/headers/clusters` | Header fingerprint clusters (suspicious patterns) |
+| GET | `/api/headers/cluster/{hash}/ips` | IPs sharing a header fingerprint |
+
+---
+
+### Heatmap (`/api/heatmap`)
+
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/heatmap/hourly` | Hourly traffic heatmap |
+| GET | `/api/heatmap/top-hosts` | Top hosts by traffic volume |
+| GET | `/api/heatmap/matrix` | Activity/hour matrix |
+
+---
+
+### Botnets (`/api/botnets`)
+
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/botnets/ja4-spread` | JA4 geographic spread (botnet indicator) |
+| GET | `/api/botnets/ja4/{ja4}/countries` | Country distribution for a JA4 fingerprint |
+| GET | `/api/botnets/summary` | Global botnet detection summary |
+
+---
+
+### Rotation (`/api/rotation`)
+
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/rotation/ja4-rotators` | IPs rotating JA4 fingerprints (evasion detection) |
+| GET | `/api/rotation/persistent-threats` | Persistent threats across time windows |
+| GET | `/api/rotation/ip/{ip}/ja4-history` | JA4 fingerprint history for an IP |
+| GET | `/api/rotation/sophistication` | Sophistication score analysis |
+| GET | `/api/rotation/proactive-hunt` | Proactive threat hunting suggestions |
+
+---
+
+### ML Features (`/api/ml`)
+
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/ml/top-anomalies` | Top anomalies with feature details |
+| GET | `/api/ml/ip/{ip}/radar` | Feature radar chart data for an IP |
+| GET | `/api/ml/score-distribution` | Anomaly score distribution histogram |
+| GET | `/api/ml/score-trends` | Score trends over time |
+| GET | `/api/ml/b-features` | Source B (TCP/TLS) feature analysis |
+| GET | `/api/ml/campaigns` | ML-detected campaign analysis |
+| GET | `/api/ml/scatter` | Feature scatter plot data |
+
+---
+
+### Attributes (`/api/attributes`)
+
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/attributes/{attr_type}` | List distinct values for an attribute (ja4, user_agent, asn, country, host) with counts |
+
+---
+
+### Variability (`/api/variability`)
+
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/variability/{attr_type}/{value}` | Behavioral variability analysis for an attribute value |
+| GET | `/api/variability/{attr_type}/{value}/ips` | IPs associated with an attribute value |
+| GET | `/api/variability/{attr_type}/{value}/attributes` | Attribute breakdown for a value |
+| GET | `/api/variability/{attr_type}/{value}/user_agents` | User-agents for an attribute value |
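+
+Because `{value}` can be an arbitrary string such as a user-agent, a client probably needs to percent-encode it before placing it in the path. The sketch below is a hypothetical helper, not part of the dashboard: `variability_url`, the base URL, and the assumption that variability accepts the same attribute types as `/api/attributes/{attr_type}` are all illustrative.
+
+```python
+from urllib.parse import quote
+
+# Attribute types listed for /api/attributes/{attr_type}; assumed to apply here too.
+ATTR_TYPES = {"ja4", "user_agent", "asn", "country", "host"}
+
+
+def variability_url(base_url: str, attr_type: str, value: str, sub: str = "") -> str:
+    """Build a /api/variability URL, percent-encoding the attribute value."""
+    if attr_type not in ATTR_TYPES:
+        raise ValueError(f"unknown attribute type: {attr_type}")
+    # safe="" also encodes "/" so a value like "Mozilla/5.0" stays one path segment.
+    encoded = quote(value, safe="")
+    url = f"{base_url.rstrip('/')}/api/variability/{attr_type}/{encoded}"
+    return f"{url}/{sub}" if sub else url
+
+
+print(variability_url("http://localhost:8000", "user_agent", "Mozilla/5.0 (X11)", "ips"))
+```
+
+Whether the server accepts an encoded `/` inside `{value}` depends on how the route is declared, so treat this as a starting point to verify against the running API.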
+
+---
+
+### Clustering (`/api/clustering`)
+
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/clustering/status` | Clustering cache status |
+| GET | `/api/clustering/clusters` | K-Means cluster list |
+| GET | `/api/clustering/cluster/{cluster_id}/points` | Data points in a cluster |
+| GET | `/api/clustering/cluster/{cluster_id}/ips` | IPs in a cluster |
+
+---
+
+### Search (`/api/search`)
+
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/search/quick` | Cross-entity search (IP, JA4, host, UA, country, ASN) |
+
+---
+
+### Audit (`/api/audit`)
+
+| Method | Path | Description |
+|--------|------|-------------|
+| POST | `/api/audit/logs` | Create an audit log entry |
+| GET | `/api/audit/logs` | Query audit logs (filtered, paginated) |
+| GET | `/api/audit/stats` | Audit statistics |
+| GET | `/api/audit/users/activity` | Per-user activity summary |
+
+## Frontend Structure
+
+The React frontend is built with Vite and served as static assets:
+
+- **Entry point**: `/` → `frontend/dist/index.html`
+- **Static assets**: `/assets/*` → `frontend/dist/assets/`
+- **SPA routing**: All non-`/api/` paths fall through to `index.html` (React Router)
+- **API proxy**: Frontend calls to `/api/*` are handled by the FastAPI routers
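+
+The routing rules above can be summarized as a small decision function. This is a framework-free sketch of the fallback logic only; `resolve_static` is a hypothetical name and the real service wires this up through FastAPI, which is not shown here.
+
+```python
+from pathlib import PurePosixPath
+
+
+def resolve_static(path: str) -> str:
+    """Map a request path to 'api' or to a file under frontend/dist."""
+    if path == "/api" or path.startswith("/api/"):
+        return "api"  # handled by the API routers, never by the SPA
+    if path.startswith("/assets/"):
+        rel = PurePosixPath(path.lstrip("/"))
+        if ".." not in rel.parts:  # refuse path traversal out of dist/
+            return f"frontend/dist/{rel}"
+    # Everything else is a client-side route: let React Router handle it.
+    return "frontend/dist/index.html"
+
+
+for p in ("/", "/assets/app.js", "/detections/203.0.113.42", "/api/heatmap/hourly"):
+    print(p, "->", resolve_static(p))
+```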
+
+## Services
+
+### IPReputationService
+
+Queries public IP reputation databases (IP-API.com, IPinfo.io) without API keys:
+- Proxy/VPN/Tor detection
+- ASN, country, ISP information
+- Hosting provider identification
+
+### ClusteringEngine
+
+K-Means clustering on ML features with caching:
+- Automatic cluster count selection
+- Feature normalization via StandardScaler
+- In-memory cache with TTL
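+
+As an illustration of the "in-memory cache with TTL" item, here is a minimal standard-library sketch. `TTLCache` and `get_or_compute` are hypothetical names and do not describe the actual `ClusteringEngine` internals; the K-Means and `StandardScaler` steps are deliberately left out.
+
+```python
+import time
+from typing import Any, Callable, Dict, Hashable, Tuple
+
+
+class TTLCache:
+    """Tiny in-memory cache whose entries expire after ttl_seconds."""
+
+    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
+        self._ttl = ttl_seconds
+        self._clock = clock  # injectable so expiry can be tested without sleeping
+        self._items: Dict[Hashable, Tuple[float, Any]] = {}
+
+    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
+        now = self._clock()
+        hit = self._items.get(key)
+        if hit is not None and now - hit[0] < self._ttl:
+            return hit[1]  # still fresh
+        value = compute()  # e.g. an expensive clustering run
+        self._items[key] = (now, value)
+        return value
+```
+
+A caller would wrap the expensive clustering call in `compute`, so repeated requests to `/api/clustering/clusters` within the TTL reuse the previous result.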
+
+## Deployment
+
+```bash
+# Build Docker image
+make build-dashboard
+
+# Run tests
+make test-dashboard
+
+# Run locally (development)
+cd services/dashboard
+uvicorn backend.main:app --reload --host 0.0.0.0 --port 8000
+```
+
+### Health Check
+
+```
+GET /health → {"status": "healthy", "clickhouse": "connected"}
+```
--- a/docs/services/mod-reqin-log.md
+++ b/docs/services/mod-reqin-log.md
@ -0,0 +1,200 @@
+# mod-reqin-log
+
+`mod_reqin_log` is an Apache HTTPD module (C shared object) that captures HTTP request metadata and sends it as JSON to a UNIX datagram socket. It serves as the HTTP-layer ingestion point for the ja4-platform pipeline, feeding request data to the [correlator](correlator.md) for joining with TLS fingerprint data from [sentinel](sentinel.md).
+
+## Purpose
+
+Apache processes HTTP requests after TLS termination, so it has access to the decoded HTTP method, path, headers, and client IP/port. mod-reqin-log hooks into the `post_read_request` phase to serialize this data immediately, before any rewrite or auth module modifies the request.
+
+## Apache Directives Reference
+
+All directives are server-level (`RSRC_CONF`):
+
+| Directive | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `JsonSockLogEnabled` | Flag (On/Off) | Off | Enable or disable the module |
+| `JsonSockLogSocket` | String | — | UNIX domain socket path for JSON output |
+| `JsonSockLogHeaders` | String list | — | HTTP header names to log (repeatable) |
+| `JsonSockLogMaxHeaders` | Integer | `25` | Maximum number of headers to log |
+| `JsonSockLogMaxHeaderValueLen` | Integer | `256` | Maximum length of each header value (truncated beyond) |
+| `JsonSockLogReconnectInterval` | Integer (seconds) | `10` | Minimum seconds between reconnection attempts |
+| `JsonSockLogErrorReportInterval` | Integer (seconds) | `10` | Minimum seconds between error log entries (throttling) |
+| `JsonSockLogLevel` | String | `WARNING` | Module log level: `DEBUG`, `INFO`, `WARNING`, `ERROR`, `EMERG` |
+
+### Example httpd.conf
+
+```apache
+LoadModule reqin_log_module modules/mod_reqin_log.so
+
+JsonSockLogEnabled On
+JsonSockLogSocket /var/run/logcorrelator/http.socket
+JsonSockLogHeaders User-Agent Accept Accept-Encoding Accept-Language
+JsonSockLogHeaders Content-Type X-Request-Id X-Trace-Id X-Forwarded-For
+JsonSockLogHeaders Sec-CH-UA Sec-CH-UA-Mobile Sec-CH-UA-Platform
+JsonSockLogHeaders Sec-Fetch-Dest Sec-Fetch-Mode Sec-Fetch-Site
+JsonSockLogMaxHeaders 25
+JsonSockLogMaxHeaderValueLen 256
+JsonSockLogReconnectInterval 10
+JsonSockLogErrorReportInterval 10
+JsonSockLogLevel WARNING
+```
+
+## Output JSON Schema
+
+Each HTTP request is serialized as a flat JSON object and sent as a single UNIX datagram:
+
+```json
+{
+  "time": "2026-03-09T14:30:00Z",
+  "src_ip": "203.0.113.42",
+  "src_port": 52341,
+  "dst_ip": "192.168.1.10",
+  "dst_port": 443,
+  "method": "GET",
+  "scheme": "https",
+  "host": "example.com",
+  "path": "/api/v1/users",
+  "query": "page=1&limit=20",
+  "http_version": "HTTP/2.0",
+  "client_headers": "User-Agent,Accept,Accept-Encoding,Accept-Language,Sec-Fetch-Dest,Sec-Fetch-Mode,Sec-Fetch-Site",
+  "header_User-Agent": "Mozilla/5.0 ...",
+  "header_Accept": "text/html,application/xhtml+xml",
+  "header_Accept-Encoding": "gzip, deflate, br",
+  "header_Accept-Language": "en-US,en;q=0.9",
+  "header_Sec-Fetch-Dest": "document",
+  "header_Sec-Fetch-Mode": "navigate",
+  "header_Sec-Fetch-Site": "none"
+}
+```
+
+### Field Reference
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `time` | string (ISO 8601) | Request timestamp (UTC) |
+| `src_ip` | string | Client IP address |
+| `src_port` | int | Client source port |
+| `dst_ip` | string | Server IP address |
+| `dst_port` | int | Server port |
+| `method` | string | HTTP method (`GET`, `POST`, etc.) |
+| `scheme` | string | URL scheme (`http` or `https`) |
+| `host` | string | HTTP Host header value |
+| `path` | string | Request URI path |
+| `query` | string | Query string (without `?`) |
+| `http_version` | string | HTTP version (`HTTP/1.1`, `HTTP/2.0`) |
+| `client_headers` | string | Comma-separated list of header names sent by the client (order preserved) |
+| `header_<Name>` | string | Value of each configured header (one field per header) |
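+
+A consumer of this flat schema usually wants the `header_<Name>` fields separated from the core fields. The following sketch shows one way to do that; `split_record` is a hypothetical helper and the sample record is shortened for illustration.
+
+```python
+import json
+from typing import Dict, List, Tuple
+
+
+def split_record(raw: str) -> Tuple[dict, List[str], Dict[str, str]]:
+    """Return (core fields, client header order, logged header values)."""
+    rec = json.loads(raw)
+    prefix = "header_"
+    headers = {k[len(prefix):]: v for k, v in rec.items() if k.startswith(prefix)}
+    core = {k: v for k, v in rec.items() if not k.startswith(prefix)}
+    # client_headers keeps the order in which the client sent its headers.
+    order = [h for h in core.pop("client_headers", "").split(",") if h]
+    return core, order, headers
+
+
+sample = (
+    '{"method": "GET", "path": "/", '
+    '"client_headers": "User-Agent,Accept", "header_User-Agent": "curl/8.0"}'
+)
+core, order, headers = split_record(sample)
+print(core["method"], order, headers)
+```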
+
+### Sensitive Headers
+
+The following headers are **always excluded** from output regardless of `JsonSockLogHeaders`:
+
+- `Authorization`
+- `Cookie`
+- `Set-Cookie`
+- `X-Api-Key`
+- `X-Auth-Token`
+- `Proxy-Authorization`
+- `WWW-Authenticate`
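+
+The exclusion rule can be pictured as a filter applied to the configured header list. The sketch below is an assumption about behavior, not the module's C code: it matches names case-insensitively (HTTP header names are case-insensitive) and applies the `JsonSockLogMaxHeaders` cap after filtering, an ordering the documentation does not specify.
+
+```python
+SENSITIVE = frozenset(
+    name.lower()
+    for name in (
+        "Authorization", "Cookie", "Set-Cookie", "X-Api-Key",
+        "X-Auth-Token", "Proxy-Authorization", "WWW-Authenticate",
+    )
+)
+
+
+def loggable_headers(configured, max_headers=25):
+    """Drop sensitive names (case-insensitively), then cap the list length."""
+    kept = [name for name in configured if name.lower() not in SENSITIVE]
+    return kept[:max_headers]
+
+
+print(loggable_headers(["User-Agent", "cookie", "AUTHORIZATION", "Accept"]))
+```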
+
+### Size Limits
+
+- Maximum JSON size: **64 KB** (prevents memory exhaustion DoS)
+- Header values are truncated to `JsonSockLogMaxHeaderValueLen` bytes
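+
+Both limits are byte-based, which matters for non-ASCII header values. This sketch shows a byte-safe truncation and a size check; `truncate_value` and `serialize` are hypothetical names, and returning `None` for an oversized record is an assumption, since the text does not say whether the module drops or shortens such records.
+
+```python
+import json
+
+MAX_JSON_BYTES = 64 * 1024  # 64 KB cap from the list above
+
+
+def truncate_value(value: str, max_len: int = 256) -> str:
+    """Cut a header value to at most max_len UTF-8 bytes."""
+    data = value.encode("utf-8")[:max_len]
+    # errors="ignore" drops a multi-byte character that the cut split in half.
+    return data.decode("utf-8", errors="ignore")
+
+
+def serialize(record: dict):
+    """Return the encoded record, or None if it exceeds the size cap (assumed policy)."""
+    payload = json.dumps(record, separators=(",", ":")).encode("utf-8")
+    return payload if len(payload) <= MAX_JSON_BYTES else None
+
+
+print(len(truncate_value("a" * 300)), serialize({"method": "GET"}))
+```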
+
+## Thread Safety
+
+mod-reqin-log is designed for Apache's `worker` and `event` MPMs (multi-threaded):
+
+- **Socket FD** is protected by an `apr_thread_mutex_t` (`fd_mutex`)
+- **Per-child process state** includes the socket file descriptor, mutex, and error tracking
+- **Error reporting** uses the `LOG_THROTTLED` macro with timestamp-based deduplication
+- All JSON serialization uses per-request pool allocation — no shared buffers
+
+### Architecture
+
+```
+Apache HTTPD process
+├── child process 1
+│   ├── fd_mutex (apr_thread_mutex_t)
+│   ├── socket_fd (shared across threads)
+│   ├── thread 1 → post_read_request → serialize JSON → mutex lock → sendto() → unlock
+│   ├── thread 2 → post_read_request → serialize JSON → mutex lock → sendto() → unlock
+│   └── ...
+├── child process 2
+│   ├── fd_mutex
+│   ├── socket_fd (independent)
+│   └── ...
+```
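+
+The locking pattern in the diagram (serialize outside the lock, hold the mutex only around the send) can be imitated in a few lines. This is a Python analogue for illustration, not the module's C implementation: a connected `socketpair` stands in for the shared socket and the correlator, and `fd_lock` plays the role of `fd_mutex`.
+
+```python
+import json
+import socket
+import threading
+
+# A connected datagram pair stands in for the module's socket and its receiver.
+sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
+fd_lock = threading.Lock()  # plays the role of fd_mutex
+
+
+def handle_request(n: int) -> None:
+    payload = json.dumps({"path": f"/r/{n}"}).encode()  # built outside the lock
+    with fd_lock:  # only the send itself is serialized across threads
+        sender.send(payload)
+
+
+threads = [threading.Thread(target=handle_request, args=(i,)) for i in range(8)]
+for t in threads:
+    t.start()
+for t in threads:
+    t.join()
+
+# Each recv() returns exactly one datagram, i.e. one JSON object.
+paths = sorted(json.loads(receiver.recv(65536))["path"] for _ in range(8))
+print(paths)
+```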
+
+## Reconnection Behavior
+
+- Socket is opened during `child_init` (per-child process startup)
+- If the socket is unavailable at startup, connection is deferred
+- On send failure, reconnection is attempted respecting `JsonSockLogReconnectInterval`
+- Failed sends are silently dropped (HTTP request processing is not blocked)
+- Error log entries are throttled by `JsonSockLogErrorReportInterval`
+- Socket type: `SOCK_DGRAM` (connectionless UNIX datagram)
+- Non-blocking sends with `MSG_NOSIGNAL`
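+
+The drop-and-retry behavior described in this list could look roughly like the sketch below. It is a Python illustration under stated assumptions, not a port of the module: `DatagramSender` is a hypothetical class, the injectable `clock` exists only for testing, and connecting the datagram socket is one possible reading of "reconnection".
+
+```python
+import socket
+import time
+
+
+class DatagramSender:
+    """Best-effort sender: drops on failure, retries connect once per interval."""
+
+    def __init__(self, path, reconnect_interval=10.0, clock=time.monotonic):
+        self.path, self.interval, self.clock = path, reconnect_interval, clock
+        self.sock = None
+        self.last_attempt = None
+        self.dropped = 0
+
+    def _connect(self):
+        now = self.clock()
+        if self.last_attempt is not None and now - self.last_attempt < self.interval:
+            return  # too soon since the last attempt
+        self.last_attempt = now
+        sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
+        sock.setblocking(False)
+        try:
+            sock.connect(self.path)
+            self.sock = sock
+        except OSError:
+            sock.close()  # receiver not available yet: stay disconnected
+
+    def send(self, payload: bytes) -> bool:
+        if self.sock is None:
+            self._connect()
+        if self.sock is None:
+            self.dropped += 1
+            return False
+        try:
+            self.sock.send(payload, socket.MSG_NOSIGNAL)  # flag named in the list above
+            return True
+        except BlockingIOError:
+            self.dropped += 1  # receiver is slow: drop, keep the socket
+            return False
+        except OSError:
+            self.sock.close()  # receiver went away: reconnect later
+            self.sock = None
+            self.dropped += 1
+            return False
+```
+
+The caller never waits or raises: a `False` return simply means the record was dropped, which matches the goal of not blocking HTTP request processing.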
+
+## Deployment
+
+### Installation via RPM
+
+```bash
+rpm -ivh mod_reqin_log-1.0.19-1.el10.x86_64.rpm
+```
+
+### LoadModule Directive
+
+```apache
+LoadModule reqin_log_module modules/mod_reqin_log.so
+```
+
+### Verifying Installation
+
+```bash
+httpd -M | grep reqin_log
+# Expected: reqin_log_module (shared)
+```
+
+## Build
+
+All builds run inside Docker:
+
+```bash
+# Run unit tests
+make test-mod-reqin-log
+
+# Build RPM packages (el8, el9, el10)
+make rpm-mod-reqin-log
+# RPMs in services/mod-reqin-log/dist/rpm/el{8,9,10}/
+```
+
+### Local Build (requires Apache development headers)
+
+```bash
+cd services/mod-reqin-log
+make build    # Compiles mod_reqin_log.so via apxs
+make test     # Runs unit tests
+```
+
+### Test Coverage
+
+Unit tests cover:
+- JSON serialization (escaping, size limits, field output)
+- Config parsing (all directives, edge cases)
+- Header handling (sensitive header exclusion, max headers, truncation)
+- Module integration (real Apache module hooks)
+
+## Source Files
+
+| File | Description |
+|------|-------------|
+| `src/mod_reqin_log.c` | Main module source |
+| `src/mod_reqin_log.h` | Header with types, constants, defaults |
+| `conf/mod_reqin_log.conf` | Example Apache configuration |
+| `tests/unit/test_json_serialization.c` | JSON output tests |
+| `tests/unit/test_config_parsing.c` | Directive parsing tests |
+| `tests/unit/test_header_handling.c` | Header filtering tests |
+| `tests/unit/test_module_real.c` | Integration tests |
--- a/docs/services/sentinel.md
+++ b/docs/services/sentinel.md
@ -0,0 +1,247 @@
+# Sentinel
+
+Sentinel (`ja4sentinel`) is a Go daemon that performs live network packet capture on a Linux server, extracts TLS ClientHello handshakes, generates JA4 and JA3 fingerprints, enriches them with IP/TCP metadata, and outputs structured JSON log records to configurable destinations (UNIX socket, file, or stdout).
+
+## Role in the Pipeline
+
+Sentinel is the **network-layer ingestion point**. It sits on the target server, captures TLS traffic via libpcap, and feeds fingerprinted events to the [correlator](correlator.md) through a UNIX datagram socket.
+
+```
+Network traffic (port 443/8443)
+        │ pcap
+        ▼
+┌───────────────┐
+│   sentinel    │
+│  ┌─────────┐  │
+│  │ capture │──▶ Raw packets
+│  └─────────┘  │
+│  ┌─────────┐  │
+│  │ tlsparse│──▶ TLS ClientHello extraction + TCP reassembly
+│  └─────────┘  │
+│  ┌─────────┐  │
+│  │ finger- │──▶ JA4/JA3 fingerprint generation
+│  │ print   │  │
+│  └─────────┘  │
+│  ┌─────────┐  │
+│  │ output  │──▶ UNIX socket / file / stdout
+│  └─────────┘  │
+└───────────────┘
+```
+
+## Architecture
+
+Sentinel uses a pipeline of goroutines:
+
+1. **Capture goroutine** — Opens pcap handle on the configured interface, applies BPF filter, reads raw packets into a buffered channel (`packet_buffer_size`).
+2. **Packet processor goroutine** — Reads from the channel, feeds packets to the TLS parser, generates fingerprints, and writes output.
+3. **Watchdog goroutine** — Sends systemd watchdog heartbeats at half the configured interval.
+4. **Signal handler** — Listens for `SIGINT`/`SIGTERM` (graceful shutdown) and `SIGHUP` (log rotation).
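+
+The capture and processor stages communicate through a bounded buffer. The sketch below is a Python analogue of that producer/consumer shape using threads and `queue.Queue` in place of goroutines and a channel. It is illustrative only: the frame bytes are placeholders, and blocking when the buffer is full is an assumption, since the text does not say whether sentinel blocks or drops in that case.
+
+```python
+import queue
+import threading
+
+packets = queue.Queue(maxsize=1000)  # mirrors core.packet_buffer_size
+STOP = object()                      # marker that ends the processor loop
+records = []
+
+
+def capture(frames):
+    """Stand-in for the capture goroutine: push raw frames into the buffer."""
+    for frame in frames:
+        packets.put(frame)  # blocks while the buffer is full (assumed policy)
+    packets.put(STOP)
+
+
+def process():
+    """Stand-in for the packet processor: parse, fingerprint, write."""
+    while True:
+        frame = packets.get()
+        if frame is STOP:
+            return
+        records.append({"len": len(frame)})  # placeholder for parse + fingerprint
+
+
+frames = [b"\x16\x03\x01", b"\x16\x03\x03\x00"]
+workers = [threading.Thread(target=capture, args=(frames,)),
+           threading.Thread(target=process)]
+for w in workers:
+    w.start()
+for w in workers:
+    w.join()
+print(records)
+```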
+
+### Key Interfaces
+
+| Interface | Package | Description |
+|-----------|---------|-------------|
+| `Capture` | `internal/capture` | Packet capture via libpcap |
+| `Parser` | `internal/tlsparse` | TCP reassembly + ClientHello extraction |
+| `Engine` | `internal/fingerprint` | JA4/JA3 fingerprint generation |
+| `Writer` | `internal/output` | Log record output (stdout, file, UNIX socket) |
+| `MultiWriter` | `internal/output` | Fan-out to multiple writers |
+| `Builder` | `internal/output` | Factory for constructing writers from config |
+
+## Configuration Reference
+
+Configuration is loaded from a YAML file (default: `config.yml`) with environment variable overrides.
+
+### Core Settings
+
+| Name | Type | Default | Env Override | Description |
+|------|------|---------|-------------|-------------|
+| `core.interface` | string | `any` | `JA4SENTINEL_INTERFACE` | Network interface to capture (`any` = all interfaces) |
+| `core.listen_ports` | []uint16 | `[443]` | `JA4SENTINEL_PORTS` | TCP ports to monitor (comma-separated in env) |
+| `core.bpf_filter` | string | `""` (auto) | `JA4SENTINEL_BPF_FILTER` | Custom BPF filter (empty = auto-generated) |
+| `core.local_ips` | []string | `[]` (auto) | — | Local IPs to monitor (empty = auto-detect, excludes loopback) |
+| `core.exclude_source_ips` | []string | `[]` | — | Source IPs or CIDRs to exclude (e.g., `["10.0.0.0/8"]`) |
+| `core.flow_timeout_sec` | int | `30` | `JA4SENTINEL_FLOW_TIMEOUT` | Timeout in seconds for TLS handshake extraction (1–300) |
+| `core.packet_buffer_size` | int | `1000` | `JA4SENTINEL_PACKET_BUFFER_SIZE` | Packet channel buffer size (1–1,000,000) |
+| `core.log_level` | string | `info` | — | Log level: `debug`, `info`, `warn`, `error` (YAML only) |
+
+> **Note:** `log_level` is intentionally not overridable via environment variable (architecture decision since v1.1.12).
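+
+To make the override rules concrete, here is a sketch of how the environment variables in the table could be layered over YAML values. It is written in Python for illustration and is not sentinel's Go loader; `apply_env` and `DEFAULTS` are hypothetical names, while the variable names and ranges come from the table. `log_level` is left out on purpose because it has no environment override.
+
+```python
+import os
+
+DEFAULTS = {"interface": "any", "listen_ports": [443], "bpf_filter": "",
+            "flow_timeout_sec": 30, "packet_buffer_size": 1000}
+
+
+def apply_env(core: dict, env=os.environ) -> dict:
+    """Overlay the documented JA4SENTINEL_* variables on top of YAML values."""
+    cfg = dict(core)
+    if "JA4SENTINEL_INTERFACE" in env:
+        cfg["interface"] = env["JA4SENTINEL_INTERFACE"]
+    if "JA4SENTINEL_PORTS" in env:  # comma-separated list of ports
+        cfg["listen_ports"] = [int(p) for p in env["JA4SENTINEL_PORTS"].split(",")]
+    if "JA4SENTINEL_BPF_FILTER" in env:
+        cfg["bpf_filter"] = env["JA4SENTINEL_BPF_FILTER"]
+    if "JA4SENTINEL_FLOW_TIMEOUT" in env:
+        cfg["flow_timeout_sec"] = int(env["JA4SENTINEL_FLOW_TIMEOUT"])
+    if "JA4SENTINEL_PACKET_BUFFER_SIZE" in env:
+        cfg["packet_buffer_size"] = int(env["JA4SENTINEL_PACKET_BUFFER_SIZE"])
+    # Range checks taken from the table above.
+    if not 1 <= cfg["flow_timeout_sec"] <= 300:
+        raise ValueError("flow_timeout_sec must be in 1..300")
+    if not 1 <= cfg["packet_buffer_size"] <= 1_000_000:
+        raise ValueError("packet_buffer_size must be in 1..1000000")
+    return cfg
+
+
+print(apply_env(DEFAULTS, {"JA4SENTINEL_PORTS": "443,8443"}))
+```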
+
+### Output Settings
+
+Each output is an entry in the `outputs` array:
+
+| Name | Type | Default | Description |
+|------|------|---------|-------------|
+| `type` | string | — | Output type: `unix_socket`, `stdout`, `file` |
+| `enabled` | bool | — | Whether this output is active |
+| `async_buffer` | int | `1000` | Queue size for async writes |
+| `params.socket_path` | string | — | Path for `unix_socket` type |
+| `params.path` | string | — | File path for `file` type |
+
+### Example Configuration
+
+```yaml
+core:
+  interface: any
+  listen_ports: [443, 8443]
+  bpf_filter: ""
+  local_ips: []
+  exclude_source_ips: ["10.0.0.0/8", "192.168.1.1"]
+  flow_timeout_sec: 30
+  packet_buffer_size: 1000
+  log_level: info
+
+outputs:
+  - type: unix_socket
+    enabled: true
+    params:
+      socket_path: /var/run/logcorrelator/network.socket
+  - type: file
+    enabled: false
+    params:
+      path: /var/log/ja4sentinel/ja4.log
+```
+
+## Output Format (LogRecord JSON Schema)
+
+Each output record is a flat JSON object:
+
+```json
+{
+  "src_ip": "203.0.113.42",
+  "src_port": 52341,
+  "dst_ip": "192.168.1.10",
+  "dst_port": 443,
+  "ip_meta_ttl": 64,
+  "ip_meta_total_length": 583,
+  "ip_meta_id": 12345,
+  "ip_meta_df": true,
+  "tcp_meta_window_size": 65535,
+  "tcp_meta_mss": 1460,
+  "tcp_meta_window_scale": 8,
+  "tcp_meta_options": "MSS,NOP,WScale,NOP,NOP,Timestamps,SACK",
+  "conn_id": "203.0.113.42:52341-192.168.1.10:443",
+  "sensor_id": "",
+  "tls_version": "1.3",
+  "tls_sni": "example.com",
+  "tls_alpn": "h2",
+  "syn_to_clienthello_ms": 12,
+  "ja4": "t13d1516h2_8daaf6152771_b0da82dd1658",
+  "ja3": "771,4866-4867-4865-49196-49200...",
+  "ja3_hash": "e7d705a3286e19ea42f587b344ee6865",
+  "timestamp": 1709312345678901234
+}
+```
+
+### Field Reference
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `src_ip` | string | Client source IP address |
+| `src_port` | uint16 | Client source port |
+| `dst_ip` | string | Server destination IP address |
+| `dst_port` | uint16 | Server destination port |
+| `ip_meta_ttl` | uint8 | IP Time-To-Live |
+| `ip_meta_total_length` | uint16 | IP total packet length |
+| `ip_meta_id` | uint16 | IP identification field |
+| `ip_meta_df` | bool | IP Don't Fragment flag |
+| `tcp_meta_window_size` | uint16 | TCP window size |
+| `tcp_meta_mss` | uint16 | TCP Maximum Segment Size (omitted if 0) |
+| `tcp_meta_window_scale` | uint8 | TCP window scale factor (omitted if 0) |
+| `tcp_meta_options` | string | Comma-separated TCP options |
+| `conn_id` | string | Unique flow identifier |
+| `sensor_id` | string | Sensor identifier |
+| `tls_version` | string | Highest TLS version offered in the ClientHello |
+| `tls_sni` | string | Server Name Indication |
+| `tls_alpn` | string | ALPN protocol (e.g., `h2`, `http/1.1`) |
+| `syn_to_clienthello_ms` | uint32 | Time from SYN to ClientHello (ms) |
+| `ja4` | string | JA4 TLS fingerprint |
+| `ja3` | string | JA3 TLS fingerprint |
+| `ja3_hash` | string | MD5 hash of JA3 string |
+| `timestamp` | int64 | Unix nanoseconds |
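+
+Two fields need a little care when consuming a record: `timestamp` is in nanoseconds, and `tcp_meta_mss` may be absent. The sketch below shows one way to handle both; `summarize` is a hypothetical helper, and the `flow` string simply reproduces the `conn_id` layout seen in the single example above, which may not be the general rule.
+
+```python
+import json
+from datetime import datetime, timezone
+
+
+def summarize(raw: str) -> dict:
+    """Pull a few derived values out of a sentinel LogRecord."""
+    rec = json.loads(raw)
+    # Split nanoseconds so the integer part converts without float rounding.
+    seconds, nanos = divmod(rec["timestamp"], 1_000_000_000)
+    when = datetime.fromtimestamp(seconds, tz=timezone.utc)
+    return {
+        "when": when.isoformat(),
+        "nanos": nanos,
+        "flow": f'{rec["src_ip"]}:{rec["src_port"]}-{rec["dst_ip"]}:{rec["dst_port"]}',
+        "mss": rec.get("tcp_meta_mss", 0),  # field is omitted when 0
+    }
+
+
+raw = json.dumps({"src_ip": "203.0.113.42", "src_port": 52341,
+                  "dst_ip": "192.168.1.10", "dst_port": 443,
+                  "timestamp": 1709312345678901234})
+summary = summarize(raw)
+print(summary)
+```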
+
+## UNIX Socket Output Protocol
+
+- **Socket type**: `unixgram` (DGRAM — connectionless)
+- **Encoding**: One JSON object per datagram (no delimiter)
+- **Max datagram size**: 64 KB
+- **Reconnection**: Exponential backoff (100 ms → 2 s), max 3 attempts per write
+- **Queue**: Async write queue (default 1000 items) absorbs transient socket failures
+- **Error callback**: Consecutive failures are tracked and reported
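+
+On the receiving side, the protocol reduces to "one `recv()` per record". The following sketch binds a `unixgram` socket and reads a single record; it is an illustrative receiver, not the correlator's implementation, and the temporary socket path is chosen only so the example can run anywhere.
+
+```python
+import json
+import os
+import socket
+import tempfile
+
+MAX_DATAGRAM = 64 * 1024  # max datagram size listed above
+
+
+def open_receiver(path: str) -> socket.socket:
+    """Bind a unixgram socket, removing a stale socket file first."""
+    if os.path.exists(path):
+        os.unlink(path)
+    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
+    sock.bind(path)
+    return sock
+
+
+def read_record(sock: socket.socket) -> dict:
+    """One recv() call yields exactly one JSON object; no delimiter to parse."""
+    return json.loads(sock.recv(MAX_DATAGRAM))
+
+
+path = os.path.join(tempfile.mkdtemp(), "network.socket")
+rx = open_receiver(path)
+tx = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
+tx.sendto(json.dumps({"ja4": "t13d1516h2_8daaf6152771_b0da82dd1658"}).encode(), path)
+record = read_record(rx)
+print(record["ja4"])
+```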
+
+## Signal Handling
+
+| Signal | Behavior |
+|--------|----------|
+| `SIGTERM` / `SIGINT` | Graceful shutdown: cancel context, close capture, flush outputs, log filter stats |
+| `SIGHUP` | Log rotation: reopen file outputs (used by `systemctl reload` + logrotate) |
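+
+The two signal paths can be sketched as follows. This is a Python analogue of the behavior in the table, not sentinel's Go code: the counter stands in for reopening file outputs, and the process signals itself only to demonstrate both paths.
+
+```python
+import os
+import signal
+import threading
+
+stop = threading.Event()
+reopen_count = 0
+
+
+def on_terminate(signum, frame):
+    stop.set()  # main loop sees this, then closes capture and flushes outputs
+
+
+def on_hup(signum, frame):
+    global reopen_count
+    reopen_count += 1  # a real writer would close and reopen its log file here
+
+
+signal.signal(signal.SIGTERM, on_terminate)
+signal.signal(signal.SIGINT, on_terminate)
+signal.signal(signal.SIGHUP, on_hup)
+
+os.kill(os.getpid(), signal.SIGHUP)   # what a reload or logrotate would send
+os.kill(os.getpid(), signal.SIGTERM)  # what a service stop would send
+stop.wait(timeout=1.0)
+print(reopen_count, stop.is_set())
+```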
+
+## JA4 Fingerprint Algorithm
+
+1. Extract TLS ClientHello from the TCP payload (with TCP reassembly for fragmented handshakes)
+2. Parse cipher suites, extensions, ALPN, SNI, supported versions
+3. Build JA4 string: `t{version}{sni_flag}{cipher_count}{ext_count}{alpn}_{cipher_hash}_{ext_hash}` (in the example record, `h2` fills the `{alpn}` slot)
+4. Build JA3 string: `{version},{ciphers},{extensions},{curves},{formats}`
+5. Compute JA3 MD5 hash
+
+Sentinel uses the `tlsfingerprint` library for ALPN and TLS version parsing, with custom sanitization for malformed/truncated ClientHellos.
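+
+Steps 3 and 5 can be illustrated with a short sketch. It is written in Python for illustration and is not sentinel's Go code; it covers only the part of the JA4 string before the first underscore and the JA3 hash. `ja4_prefix` and `ja3_hash` are hypothetical names, the two JA4 hash segments are not computed, and the `00` placeholder for a missing ALPN follows the public JA4 description rather than anything stated here.
+
+```python
+import hashlib
+
+
+def ja3_hash(ja3: str) -> str:
+    """The JA3 hash is the MD5 hex digest of the JA3 string."""
+    return hashlib.md5(ja3.encode("ascii")).hexdigest()
+
+
+def ja4_prefix(version: str, has_sni: bool, n_ciphers: int, n_exts: int, alpn: str) -> str:
+    """Build the part of a JA4 string before the first underscore (TCP only)."""
+    ver = version.replace(".", "")                # "1.3" -> "13"
+    sni = "d" if has_sni else "i"                 # d = SNI present, i = absent
+    tag = (alpn[0] + alpn[-1]) if alpn else "00"  # first and last ALPN character
+    return f"t{ver}{sni}{min(n_ciphers, 99):02d}{min(n_exts, 99):02d}{tag}"
+
+
+print(ja4_prefix("1.3", True, 15, 16, "h2"))
+```
+
+With these inputs the function prints `t13d1516h2`, which matches the first segment of the `ja4` value in the example record.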
+
+## Deployment
+
+### systemd
+
+```ini
+[Unit]
+Description=ja4sentinel TLS fingerprinting daemon
+After=network.target
+
+[Service]
+Type=notify
+ExecStart=/usr/bin/ja4sentinel -config /etc/ja4sentinel/config.yml
+ExecReload=/bin/kill -HUP $MAINPID
+Restart=on-failure
+WatchdogSec=30
+TimeoutStopSec=2
+
+[Install]
+WantedBy=multi-user.target
+```
+
+Sentinel uses systemd `sd_notify` for:
+- `READY` — sent after initialization
+- `WATCHDOG` — sent at half the `WatchdogSec` interval
+- `STOPPING` — sent before shutdown
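+
+On the wire, these notifications are short strings (`READY=1`, `WATCHDOG=1`, `STOPPING=1`) sent as datagrams to the socket named in `NOTIFY_SOCKET`. The sketch below shows that protocol in Python for illustration; it is not sentinel's Go implementation, and the 15-second fallback in `watchdog_interval` is an arbitrary placeholder.
+
+```python
+import os
+import socket
+
+
+def sd_notify(state: str) -> bool:
+    """Send one state string (e.g. "READY=1") to systemd's notify socket."""
+    addr = os.environ.get("NOTIFY_SOCKET")
+    if not addr:
+        return False  # not started by systemd with Type=notify
+    if addr.startswith("@"):
+        addr = "\0" + addr[1:]  # leading "@" means an abstract socket
+    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
+        sock.sendto(state.encode("ascii"), addr)
+    return True
+
+
+def watchdog_interval(default: float = 15.0) -> float:
+    """Half of WATCHDOG_USEC, in seconds, as the heartbeat period."""
+    usec = os.environ.get("WATCHDOG_USEC")
+    return int(usec) / 2 / 1_000_000 if usec else default
+```
+
+A daemon would call `sd_notify("READY=1")` once after initialization, `sd_notify("WATCHDOG=1")` every `watchdog_interval()` seconds, and `sd_notify("STOPPING=1")` when shutdown begins. With `WatchdogSec=30`, systemd sets `WATCHDOG_USEC=30000000`, giving a 15-second heartbeat.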
+
+### Docker
+
+```bash
+make build-sentinel
+docker run --cap-add=NET_RAW --cap-add=NET_ADMIN \
+  -v /var/run/logcorrelator:/var/run/logcorrelator \
+  ja4-platform/sentinel:latest
+```
+
+## RPM Package Contents
+
+| Path | Description |
+|------|-------------|
+| `/usr/bin/ja4sentinel` | Binary (Go, dynamically linked against libpcap) |
+| `/etc/ja4sentinel/config.yml.default` | Default configuration (noreplace) |
+| `/usr/share/ja4sentinel/config.yml` | Reference configuration |
+| `/usr/lib/systemd/system/ja4sentinel.service` | systemd unit |
+| `/etc/logrotate.d/ja4sentinel` | logrotate configuration |
+| `/var/lib/ja4sentinel/` | State directory |
+| `/var/log/ja4sentinel/` | Log directory |
+| `/var/run/logcorrelator/` | Socket directory |
+
+### RPM Dependencies
+
+- `systemd`
+- `libpcap >= 1.9.0`
+
+### Supported Distributions
+
+- Rocky Linux 8, 9, 10
+- AlmaLinux 8, 9
+- RHEL 8, 9