# mod-reqin-log
`mod_reqin_log` is an Apache HTTPD module (C shared object) that captures HTTP request metadata and sends it as JSON to a UNIX datagram socket. It serves as the HTTP-layer ingestion point for the ja4-platform pipeline, feeding request data to the [correlator](correlator.md) for joining with TLS fingerprint data from [sentinel](sentinel.md).
## Purpose
Apache processes HTTP requests after TLS termination, so it has access to the decoded HTTP method, path, headers, and client IP/port. mod-reqin-log hooks into the `post_read_request` phase to serialize this data immediately, before any rewrite or auth module modifies the request.
## Apache Directives Reference
All directives are server-level (`RSRC_CONF`):
| Directive | Type | Default | Description |
|-----------|------|---------|-------------|
| `JsonSockLogEnabled` | Flag (On/Off) | Off | Enable or disable the module |
| `JsonSockLogSocket` | String | — | UNIX domain socket path for JSON output |
| `JsonSockLogHeaders` | String list | — | HTTP header names to log (repeatable) |
| `JsonSockLogMaxHeaders` | Integer | `25` | Maximum number of headers to log |
| `JsonSockLogMaxHeaderValueLen` | Integer | `256` | Maximum length of each header value in bytes (longer values are truncated) |
| `JsonSockLogReconnectInterval` | Integer (seconds) | `10` | Minimum seconds between reconnection attempts |
| `JsonSockLogErrorReportInterval` | Integer (seconds) | `10` | Minimum seconds between error log entries (throttling) |
| `JsonSockLogLevel` | String | `WARNING` | Module log level: `DEBUG`, `INFO`, `WARNING`, `ERROR`, `EMERG` |
### Example httpd.conf
```apache
LoadModule reqin_log_module modules/mod_reqin_log.so
JsonSockLogEnabled On
JsonSockLogSocket /var/run/logcorrelator/http.socket
JsonSockLogHeaders User-Agent Accept Accept-Encoding Accept-Language
JsonSockLogHeaders Content-Type X-Request-Id X-Trace-Id X-Forwarded-For
JsonSockLogHeaders Sec-CH-UA Sec-CH-UA-Mobile Sec-CH-UA-Platform
JsonSockLogHeaders Sec-Fetch-Dest Sec-Fetch-Mode Sec-Fetch-Site
JsonSockLogMaxHeaders 25
JsonSockLogMaxHeaderValueLen 256
JsonSockLogReconnectInterval 10
JsonSockLogErrorReportInterval 10
JsonSockLogLevel WARNING
```
## Output JSON Schema
Each HTTP request is serialized as a flat JSON object and sent as a single UNIX datagram:
```json
{
  "time": "2026-03-09T14:30:00Z",
  "src_ip": "203.0.113.42",
  "src_port": 52341,
  "dst_ip": "192.168.1.10",
  "dst_port": 443,
  "method": "GET",
  "scheme": "https",
  "host": "example.com",
  "path": "/api/v1/users",
  "query": "page=1&limit=20",
  "http_version": "HTTP/2.0",
  "client_headers": "User-Agent,Accept,Accept-Encoding,Accept-Language,Sec-Fetch-Dest,Sec-Fetch-Mode,Sec-Fetch-Site",
  "header_User-Agent": "Mozilla/5.0 ...",
  "header_Accept": "text/html,application/xhtml+xml",
  "header_Accept-Encoding": "gzip, deflate, br",
  "header_Accept-Language": "en-US,en;q=0.9",
  "header_Sec-Fetch-Dest": "document",
  "header_Sec-Fetch-Mode": "navigate",
  "header_Sec-Fetch-Site": "none"
}
```
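For debugging, it can help to stand in for the correlator and look at the raw datagrams. The sketch below is not part of the module, and `open_receiver` and `recv_record` are hypothetical helper names. It binds a UNIX datagram socket and reads one record per `recv()` call, relying on the one-record-per-datagram framing described above.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Bind a UNIX datagram socket at `path` (hypothetical helper).
 * Returns the socket fd, or -1 on error. */
int open_receiver(const char *path)
{
    struct sockaddr_un addr;
    int fd;

    if (strlen(path) >= sizeof(addr.sun_path))
        return -1;                      /* path does not fit in sun_path */

    fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);        /* length checked above */
    unlink(path);                       /* remove a stale socket file */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Receive one datagram (one JSON record) into buf and NUL-terminate it.
 * Returns the number of bytes stored, or -1 on error. */
ssize_t recv_record(int fd, char *buf, size_t cap)
{
    ssize_t n;

    if (cap == 0)
        return -1;
    n = recv(fd, buf, cap - 1, 0);
    if (n < 0)
        return -1;
    buf[n] = '\0';
    return n;
}
```

Pointing `JsonSockLogSocket` at the bound path and printing each buffer gives a quick view of what the module emits. Note that `unlink()` removes whatever already lives at the path, so a scratch path is safer than the live correlator socket. The receive buffer should be somewhat larger than the 64 KB limit listed under Size Limits so that a maximum-size record is not cut short.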
### Field Reference
| Field | Type | Description |
|-------|------|-------------|
| `time` | string (ISO 8601) | Request timestamp (UTC) |
| `src_ip` | string | Client IP address |
| `src_port` | int | Client source port |
| `dst_ip` | string | Server IP address |
| `dst_port` | int | Server port |
| `method` | string | HTTP method (`GET`, `POST`, etc.) |
| `scheme` | string | URL scheme (`http` or `https`) |
| `host` | string | HTTP Host header value |
| `path` | string | Request URI path |
| `query` | string | Query string (without `?`) |
| `http_version` | string | HTTP version (`HTTP/1.1`, `HTTP/2.0`) |
| `client_headers` | string | Comma-separated list of header names sent by client (order preserved) |
| `header_<Name>` | string | Value of each configured header (one field per header) |
### Sensitive Headers
The following headers are **always excluded** from output regardless of `JsonSockLogHeaders`:
- `Authorization`
- `Cookie`
- `Set-Cookie`
- `X-Api-Key`
- `X-Auth-Token`
- `Proxy-Authorization`
- `WWW-Authenticate`
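A blocklist check of this kind can be sketched as follows. This is an illustration, not the module's source: `is_sensitive_header` is a hypothetical name, and the case-insensitive comparison is an assumption, chosen because HTTP header names are case-insensitive.

```c
#include <stddef.h>
#include <strings.h>   /* strcasecmp */

/* The seven header names listed above. */
static const char *const SENSITIVE_HEADERS[] = {
    "Authorization", "Cookie", "Set-Cookie", "X-Api-Key",
    "X-Auth-Token", "Proxy-Authorization", "WWW-Authenticate",
};

/* Returns 1 if `name` is on the blocklist, 0 otherwise.
 * Comparison ignores case (assumption, see lead-in). */
int is_sensitive_header(const char *name)
{
    size_t n = sizeof(SENSITIVE_HEADERS) / sizeof(SENSITIVE_HEADERS[0]);
    size_t i;

    if (name == NULL)
        return 0;
    for (i = 0; i < n; i++) {
        if (strcasecmp(name, SENSITIVE_HEADERS[i]) == 0)
            return 1;
    }
    return 0;
}
```

With a check like this applied before a header is written, a configuration line such as `JsonSockLogHeaders Cookie` would have no effect on the output.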
### Size Limits
- Maximum JSON size: **64 KB** (prevents memory exhaustion DoS)
- Header values are truncated to `JsonSockLogMaxHeaderValueLen` bytes
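The two limits can be combined in one bounded append routine. The following is a minimal sketch under stated assumptions: `append_json_string` and `MAX_JSON_SIZE` are hypothetical names, 64 KB is taken to mean 65536 bytes, and the escaping rules shown are generic JSON escaping rather than the module's exact behavior.

```c
#include <stdio.h>
#include <string.h>

#define MAX_JSON_SIZE (64 * 1024)  /* assumed byte value of the 64 KB cap */

/* Append `value` as a quoted JSON string at buf[*len], copying at most
 * max_value_len bytes of the input. Returns 0 on success and advances *len.
 * Returns -1 if the result (plus a trailing NUL) would exceed `cap`; *len is
 * then left unchanged, though bytes past *len may have been written. */
int append_json_string(char *buf, size_t cap, size_t *len,
                       const char *value, size_t max_value_len)
{
    size_t pos = *len;
    size_t n = strlen(value);
    size_t i;

    if (n > max_value_len)
        n = max_value_len;              /* truncate by bytes */

    if (pos + 1 >= cap)
        return -1;
    buf[pos++] = '"';

    for (i = 0; i < n; i++) {
        unsigned char c = (unsigned char)value[i];
        char esc[8];
        size_t elen;

        if (c == '"' || c == '\\') {    /* quote and backslash */
            esc[0] = '\\';
            esc[1] = (char)c;
            elen = 2;
        } else if (c < 0x20) {          /* control characters */
            snprintf(esc, sizeof(esc), "\\u%04x", (unsigned)c);
            elen = 6;
        } else {                        /* everything else is copied as-is */
            esc[0] = (char)c;
            elen = 1;
        }
        if (pos + elen >= cap)
            return -1;
        memcpy(buf + pos, esc, elen);
        pos += elen;
    }

    if (pos + 2 > cap)                  /* closing quote + NUL */
        return -1;
    buf[pos++] = '"';
    buf[pos] = '\0';
    *len = pos;
    return 0;
}
```

A caller would pass `MAX_JSON_SIZE` as `cap` and the `JsonSockLogMaxHeaderValueLen` setting as `max_value_len`, and skip the record when the function returns -1. One caveat of byte-based truncation is that it can cut a multi-byte UTF-8 sequence in half.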
## Thread Safety
mod-reqin-log is designed for Apache's `worker` and `event` MPMs (multi-threaded):
- **Socket FD** is protected by an `apr_thread_mutex_t` (`fd_mutex`)
- **Per-child process state** includes the socket file descriptor, mutex, and error tracking
- **Error reporting** uses `LOG_THROTTLED` macro with timestamp-based deduplication
- All JSON serialization uses per-request pool allocation — no shared buffers
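The definition of `LOG_THROTTLED` is not shown here. As a rough illustration of timestamp-based throttling, the sketch below uses hypothetical names (`throttle_t`, `throttle_allow`) and allows at most one report per interval.

```c
#include <time.h>

typedef struct {
    time_t last_report;   /* 0 means "never reported" */
    int    interval_sec;  /* cf. JsonSockLogErrorReportInterval */
} throttle_t;

/* Returns 1 if an error may be logged at time `now` and records that time;
 * returns 0 if the previous report is less than interval_sec seconds old. */
int throttle_allow(throttle_t *t, time_t now)
{
    if (t->last_report != 0 && now - t->last_report < t->interval_sec)
        return 0;
    t->last_report = now;
    return 1;
}
```

In a threaded MPM, state like `last_report` is shared by all threads of a child, so it would presumably need the same kind of protection as the socket FD.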
### Architecture
```
Apache HTTPD process
├── child process 1
│   ├── fd_mutex (apr_thread_mutex_t)
│   ├── socket_fd (shared across threads)
│   ├── thread 1 → post_read_request → serialize JSON → mutex lock → sendto() → unlock
│   ├── thread 2 → post_read_request → serialize JSON → mutex lock → sendto() → unlock
│   └── ...
└── child process 2
    ├── fd_mutex
    ├── socket_fd (independent)
    └── ...
```
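The per-thread path in the diagram (serialize first, then lock, send, unlock) can be sketched as below. To keep the example free of APR dependencies, a `pthread_mutex_t` stands in for `apr_thread_mutex_t`; `child_state_t` and `send_record` are hypothetical names, and the use of `MSG_DONTWAIT` on an already-connected socket is an assumption.

```c
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

typedef struct {
    pthread_mutex_t fd_mutex;   /* stand-in for apr_thread_mutex_t */
    int             socket_fd;  /* shared by all threads of one child; -1 if closed */
} child_state_t;

/* Send one already-serialized record. Only the send happens under the lock;
 * serialization is done beforehand in per-request memory. Returns the number
 * of bytes sent, or -1 if the socket is closed or the send failed. */
ssize_t send_record(child_state_t *st, const char *json, size_t len)
{
    ssize_t rc = -1;

    pthread_mutex_lock(&st->fd_mutex);
    if (st->socket_fd >= 0) {
        /* NULL destination: the socket is assumed to be connected already */
        rc = sendto(st->socket_fd, json, len,
                    MSG_DONTWAIT | MSG_NOSIGNAL, NULL, 0);
    }
    pthread_mutex_unlock(&st->fd_mutex);
    return rc;
}
```

Keeping serialization outside the critical section means threads contend only for the duration of one non-blocking system call.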
## Reconnection Behavior
- Socket is opened during `child_init` (per-child process startup)
- If the socket is unavailable at startup, connection is deferred
- On send failure, reconnection is attempted respecting `JsonSockLogReconnectInterval`
- Failed sends are dropped so that HTTP request processing is not blocked
- Error log entries are throttled by `JsonSockLogErrorReportInterval`
- Socket type: `SOCK_DGRAM` (connectionless UNIX datagram)
- Non-blocking sends with `MSG_NOSIGNAL`
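The behavior listed above can be approximated with a small state machine. This is a sketch under assumptions, not the module's code: `sock_state_t`, `maybe_connect`, and `send_or_drop` are hypothetical names, the socket is assumed to be `connect()`ed to the path, and `MSG_DONTWAIT` is assumed as the non-blocking mechanism.

```c
#include <errno.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>

typedef struct {
    int         fd;                  /* -1 while disconnected */
    time_t      last_attempt;        /* last connect() try; 0 = never */
    int         reconnect_interval;  /* seconds, cf. JsonSockLogReconnectInterval */
    const char *path;                /* cf. JsonSockLogSocket */
} sock_state_t;

/* Try to (re)connect unless the previous attempt was too recent. */
static void maybe_connect(sock_state_t *s, time_t now)
{
    struct sockaddr_un addr;
    int fd;

    if (s->fd >= 0)
        return;
    if (s->last_attempt != 0 && now - s->last_attempt < s->reconnect_interval)
        return;                      /* rate limit reconnection attempts */
    s->last_attempt = now;

    if (strlen(s->path) >= sizeof(addr.sun_path))
        return;
    fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (fd < 0)
        return;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, s->path);  /* length checked above */
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);                   /* receiver not available yet */
        return;
    }
    s->fd = fd;
}

/* Returns 0 if the record was sent, -1 if it was dropped. Does not block. */
int send_or_drop(sock_state_t *s, time_t now, const char *json, size_t len)
{
    maybe_connect(s, now);
    if (s->fd < 0)
        return -1;
    if (send(s->fd, json, len, MSG_DONTWAIT | MSG_NOSIGNAL) < 0) {
        /* A full receive queue is transient; other errors force a reconnect. */
        if (errno != EAGAIN && errno != EWOULDBLOCK) {
            close(s->fd);
            s->fd = -1;
        }
        return -1;
    }
    return 0;
}
```

Passing `now` in explicitly keeps the rate limit easy to exercise in a unit test; a real caller would supply the current time on each request.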
## Deployment
### Installation via RPM
```bash
rpm -ivh mod_reqin_log-1.0.19-1.el10.x86_64.rpm
```
### LoadModule Directive
```apache
LoadModule reqin_log_module modules/mod_reqin_log.so
```
### Verifying Installation
```bash
httpd -M | grep reqin_log
# Expected: reqin_log_module (shared)
```
## Build
The standard test and RPM builds run inside Docker:
```bash
# Run unit tests
make test-mod-reqin-log
# Build RPM packages (el8, el9, el10)
make rpm-mod-reqin-log
# RPMs are written to services/mod-reqin-log/dist/rpm/el{8,9,10}/
```
### Local Build (requires Apache development headers)
```bash
cd services/mod-reqin-log
make build # Compiles mod_reqin_log.so via apxs
make test # Runs unit tests
```
### Test Coverage
Unit tests cover:
- JSON serialization (escaping, size limits, field output)
- Config parsing (all directives, edge cases)
- Header handling (sensitive header exclusion, max headers, truncation)
- Module integration (real Apache module hooks)
## Source Files
| File | Description |
|------|-------------|
| `src/mod_reqin_log.c` | Main module source |
| `src/mod_reqin_log.h` | Header with types, constants, defaults |
| `conf/mod_reqin_log.conf` | Example Apache configuration |
| `tests/unit/test_json_serialization.c` | JSON output tests |
| `tests/unit/test_config_parsing.c` | Directive parsing tests |
| `tests/unit/test_header_handling.c` | Header filtering tests |
| `tests/unit/test_module_real.c` | Integration tests |