feat: ja4-platform monorepo — 5 services unified, tests & RPM builds standardized
Services:
- ja4sentinel: TLS/JA4 fingerprint capture daemon (Go, libpcap)
- logcorrelator: JA4 log correlation engine (Go, ClickHouse)
- mod_reqin_log: Apache module (C, JSON request logging)
- bot_detector: ML bot detection pipeline (Python)
- dashboard: FastAPI/Streamlit analytics UI (Python)

Shared libraries:
- shared/go/ja4common: logger, config, shutdown, ipfilter (Go module)
- shared/python/ja4_common: ClickHouseClient, ClickHouseSettings (Python package)
- shared/clickhouse/: canonical SQL migrations (10 files)

Build & packaging:
- Unified 3-stage Dockerfile.package for Go RPMs (el8/el9/el10)
- go.work workspace linking sentinel, correlator, ja4common
- Makefile with test-all, build-all, rpm-* targets

Fixes applied:
- go.work: 1.21 → 1.24.6 (required by sentinel)
- correlator Dockerfiles: golang:1.21 → golang:1.24
- replace directives in go.mod for ja4common local path
- pyproject.toml: setuptools.backends → setuptools.build_meta
- Removed static libpcap linking (unavailable on Rocky 9)
- Fixed data races in output/writers_test.go (sync.Mutex + atomic.Int32)
- Rewrote corrupted test files (logger_test.go × 2)

Test coverage:
- correlator: 67.1% total (unixsocket 80.5%, config 91.7%, app 83.3%, multi 87.7%, stdout 100%)
- sentinel: all 10 packages pass (api, capture, config, fingerprint, ipfilter, logging, output, tlsparse)

Documentation:
- README.md + docs/ (architecture, development, 5 services, shared libs, DB schema & migrations)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

docs/services/mod-reqin-log.md
# mod-reqin-log

`mod_reqin_log` is an Apache HTTPD module (C shared object) that captures HTTP request metadata and sends it as JSON to a UNIX datagram socket. It serves as the HTTP-layer ingestion point for the ja4-platform pipeline, feeding request data to the [correlator](correlator.md) for joining with TLS fingerprint data from [sentinel](sentinel.md).

## Purpose

Apache processes HTTP requests after TLS termination, so it has access to the decoded HTTP method, path, headers, and client IP/port. mod-reqin-log hooks into the `post_read_request` phase to serialize this data immediately, before any rewrite or auth module modifies the request.

## Apache Directives Reference

All directives are server-level (`RSRC_CONF`):

| Directive | Type | Default | Description |
|-----------|------|---------|-------------|
| `JsonSockLogEnabled` | Flag (On/Off) | Off | Enable or disable the module |
| `JsonSockLogSocket` | String | none | UNIX domain socket path for JSON output |
| `JsonSockLogHeaders` | String list | none | HTTP header names to log (repeatable) |
| `JsonSockLogMaxHeaders` | Integer | `25` | Maximum number of headers to log |
| `JsonSockLogMaxHeaderValueLen` | Integer (bytes) | `256` | Maximum length of each header value; longer values are truncated |
| `JsonSockLogReconnectInterval` | Integer (seconds) | `10` | Minimum seconds between reconnection attempts |
| `JsonSockLogErrorReportInterval` | Integer (seconds) | `10` | Minimum seconds between error log entries (throttling) |
| `JsonSockLogLevel` | String | `WARNING` | Module log level: `DEBUG`, `INFO`, `WARNING`, `ERROR`, `EMERG` |

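
The `JsonSockLogLevel` values form an ordered severity scale. The sketch below shows one way to map a directive argument to a threshold and then check messages against it. The enum and function names are hypothetical and are not taken from `src/mod_reqin_log.h`. Case-insensitive matching is also an assumption, since the table above does not specify it.

```c
#include <stddef.h>
#include <strings.h> /* strcasecmp (POSIX) */

/* Hypothetical scale mirroring the JsonSockLogLevel values,
 * ordered from most to least verbose. */
typedef enum {
    RL_DEBUG = 0,
    RL_INFO,
    RL_WARNING,
    RL_ERROR,
    RL_EMERG
} reqin_log_level_t;

/* Map a directive argument to a level. Returns 0 on success, -1 for an
 * unknown name; on failure *out is left untouched so the caller can keep
 * the WARNING default or reject the directive. */
int parse_log_level(const char *arg, reqin_log_level_t *out)
{
    static const struct { const char *name; reqin_log_level_t level; } table[] = {
        { "DEBUG", RL_DEBUG },     { "INFO", RL_INFO },
        { "WARNING", RL_WARNING }, { "ERROR", RL_ERROR },
        { "EMERG", RL_EMERG },
    };
    if (arg == NULL || out == NULL)
        return -1;
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcasecmp(arg, table[i].name) == 0) {
            *out = table[i].level;
            return 0;
        }
    }
    return -1;
}

/* A message is emitted when its severity is at or above the threshold. */
int level_enabled(reqin_log_level_t threshold, reqin_log_level_t msg_level)
{
    return msg_level >= threshold;
}
```

In an Apache directive handler, an unknown name would typically become an error string returned to the config parser. That way `httpd -t` reports the bad value before the server starts.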
### Example httpd.conf

```apache
LoadModule reqin_log_module modules/mod_reqin_log.so

JsonSockLogEnabled On
JsonSockLogSocket /var/run/logcorrelator/http.socket
JsonSockLogHeaders User-Agent Accept Accept-Encoding Accept-Language
JsonSockLogHeaders Content-Type X-Request-Id X-Trace-Id X-Forwarded-For
JsonSockLogHeaders Sec-CH-UA Sec-CH-UA-Mobile Sec-CH-UA-Platform
JsonSockLogHeaders Sec-Fetch-Dest Sec-Fetch-Mode Sec-Fetch-Site
JsonSockLogMaxHeaders 25
JsonSockLogMaxHeaderValueLen 256
JsonSockLogReconnectInterval 10
JsonSockLogErrorReportInterval 10
JsonSockLogLevel WARNING
```

## Output JSON Schema

Each HTTP request is serialized as a flat JSON object and sent as a single UNIX datagram:

```json
{
  "time": "2026-03-09T14:30:00Z",
  "src_ip": "203.0.113.42",
  "src_port": 52341,
  "dst_ip": "192.168.1.10",
  "dst_port": 443,
  "method": "GET",
  "scheme": "https",
  "host": "example.com",
  "path": "/api/v1/users",
  "query": "page=1&limit=20",
  "http_version": "HTTP/2.0",
  "client_headers": "User-Agent,Accept,Accept-Encoding,Accept-Language",
  "header_User-Agent": "Mozilla/5.0 ...",
  "header_Accept": "text/html,application/xhtml+xml",
  "header_Accept-Encoding": "gzip, deflate, br",
  "header_Accept-Language": "en-US,en;q=0.9",
  "header_Sec-Fetch-Dest": "document",
  "header_Sec-Fetch-Mode": "navigate",
  "header_Sec-Fetch-Site": "none"
}
```

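
Header values are controlled by the client, so every string field must be JSON-escaped before it is placed in the datagram. The sketch below is a bounded escaper. `json_escape` is a hypothetical name. Writing into a caller-supplied buffer is a simplification: per the Thread Safety section, the module allocates from the per-request pool instead.

```c
#include <stddef.h>
#include <stdio.h>

/* Escape src as the body of a JSON string (no surrounding quotes) into dst.
 * Returns the number of bytes written (excluding the NUL terminator), or
 * (size_t)-1 if the escaped text would not fit; dst contents are then
 * unspecified. Bytes >= 0x80 are copied as-is, so valid UTF-8 stays valid. */
size_t json_escape(const char *src, char *dst, size_t dst_size)
{
    size_t n = 0;
    if (dst_size == 0)
        return (size_t)-1;
    for (const unsigned char *p = (const unsigned char *)src; *p; p++) {
        char tmp[7];            /* room for "\u00XX" plus NUL */
        const char *out = tmp;
        size_t len;
        switch (*p) {
        case '"':  out = "\\\""; len = 2; break;
        case '\\': out = "\\\\"; len = 2; break;
        case '\n': out = "\\n";  len = 2; break;
        case '\r': out = "\\r";  len = 2; break;
        case '\t': out = "\\t";  len = 2; break;
        default:
            if (*p < 0x20) {
                /* Remaining control characters use the \u00XX form. */
                snprintf(tmp, sizeof tmp, "\\u%04x", (unsigned)*p);
                len = 6;
            } else {
                tmp[0] = (char)*p;
                len = 1;
            }
        }
        if (n + len >= dst_size)   /* always keep room for the NUL */
            return (size_t)-1;
        for (size_t i = 0; i < len; i++)
            dst[n++] = out[i];
    }
    dst[n] = '\0';
    return n;
}
```

Returning an error instead of silently cutting the output guarantees that a truncated buffer can never produce half an escape sequence inside the JSON.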
### Field Reference

| Field | Type | Description |
|-------|------|-------------|
| `time` | string (ISO 8601) | Request timestamp (UTC) |
| `src_ip` | string | Client IP address |
| `src_port` | int | Client source port |
| `dst_ip` | string | Server IP address |
| `dst_port` | int | Server port |
| `method` | string | HTTP method (`GET`, `POST`, etc.) |
| `scheme` | string | URL scheme (`http` or `https`) |
| `host` | string | HTTP Host header value |
| `path` | string | Request URI path |
| `query` | string | Query string (without `?`) |
| `http_version` | string | HTTP version (`HTTP/1.1`, `HTTP/2.0`) |
| `client_headers` | string | Comma-separated list of header names sent by the client (order preserved) |
| `header_<Name>` | string | Value of each configured header (one field per header) |

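
Two fields are built from header data. `client_headers` keeps header names in the order the client sent them, and each configured header becomes its own `header_<Name>` key. The sketch below performs both steps on a plain array of names. `join_header_names` and `header_field_key` are hypothetical helpers. A real Apache module would walk the request's `r->headers_in` table rather than a C array.

```c
#include <stdio.h>
#include <string.h>

/* Join header names with commas, preserving the client's order.
 * Returns 0 on success or -1 if the list does not fit in dst. */
int join_header_names(const char *const names[], size_t count,
                      char *dst, size_t dst_size)
{
    size_t used = 0;
    if (dst_size == 0)
        return -1;
    dst[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(names[i]);
        size_t need = len + (i > 0 ? 1 : 0);   /* name plus separator */
        if (used + need >= dst_size)           /* keep room for the NUL */
            return -1;
        if (i > 0)
            dst[used++] = ',';
        memcpy(dst + used, names[i], len);
        used += len;
        dst[used] = '\0';
    }
    return 0;
}

/* Build the JSON key for a configured header, e.g. "header_User-Agent".
 * Returns 0 on success or -1 if the key would be truncated. */
int header_field_key(const char *name, char *dst, size_t dst_size)
{
    int n = snprintf(dst, dst_size, "header_%s", name);
    return (n < 0 || (size_t)n >= dst_size) ? -1 : 0;
}
```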
### Sensitive Headers

The following headers are **always excluded** from output regardless of `JsonSockLogHeaders`:

- `Authorization`
- `Cookie`
- `Set-Cookie`
- `X-Api-Key`
- `X-Auth-Token`
- `Proxy-Authorization`
- `WWW-Authenticate`

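
HTTP field names are case-insensitive (RFC 9110), so an exclusion check has to ignore case. Otherwise `cookie` or `COOKIE` would slip through. The sketch below is one such check. `is_sensitive_header` is a hypothetical name, and the assumption is that it runs against every configured name before a value is logged.

```c
#include <stddef.h>
#include <strings.h> /* strcasecmp (POSIX) */

/* Headers that are never logged, matching the list above. */
static const char *const SENSITIVE_HEADERS[] = {
    "Authorization", "Cookie",        "Set-Cookie",          "X-Api-Key",
    "X-Auth-Token",  "Proxy-Authorization", "WWW-Authenticate",
};

/* Returns 1 if name matches a sensitive header, ignoring case. */
int is_sensitive_header(const char *name)
{
    size_t count = sizeof SENSITIVE_HEADERS / sizeof SENSITIVE_HEADERS[0];
    for (size_t i = 0; i < count; i++) {
        if (strcasecmp(name, SENSITIVE_HEADERS[i]) == 0)
            return 1;
    }
    return 0;
}
```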
### Size Limits

- Maximum JSON size: **64 KB** per record, which bounds per-request memory use and prevents memory-exhaustion DoS
- Header values are truncated to `JsonSockLogMaxHeaderValueLen` bytes

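
The sketch below implements the two limits. `truncated_len` and `json_size_ok` are hypothetical helpers. Two choices in it are design assumptions, not documented module behavior: the cut backs off to a UTF-8 character boundary, and a record of exactly 64 KB (65,536 bytes) is allowed.

```c
#include <stddef.h>
#include <string.h>

/* 64 KB cap on one serialized record (assumed inclusive). */
#define REQIN_MAX_JSON_SIZE (64u * 1024u)

/* Number of bytes of value to keep when truncating to max_len bytes.
 * Assumption: the cut is moved back so it never splits a UTF-8
 * multi-byte sequence. */
size_t truncated_len(const char *value, size_t max_len)
{
    size_t len = strlen(value);
    if (len <= max_len)
        return len;
    /* If the first dropped byte is a continuation byte (10xxxxxx), the
     * cut falls inside a character; step back to its lead byte. */
    while (max_len > 0 && ((unsigned char)value[max_len] & 0xC0) == 0x80)
        max_len--;
    return max_len;
}

/* Returns 1 if a serialized record of json_len bytes fits under the cap. */
int json_size_ok(size_t json_len)
{
    return json_len <= REQIN_MAX_JSON_SIZE;
}
```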
## Thread Safety

mod-reqin-log is designed for Apache's multi-threaded `worker` and `event` MPMs:

- **Socket FD** is protected by an `apr_thread_mutex_t` (`fd_mutex`)
- **Per-child process state** holds the socket file descriptor, the mutex, and error-tracking data
- **Error reporting** uses the `LOG_THROTTLED` macro with timestamp-based throttling
- **JSON serialization** allocates from the per-request pool, so threads never share buffers

### Architecture

```
Apache HTTPD process
├── child process 1
│   ├── fd_mutex (apr_thread_mutex_t)
│   ├── socket_fd (shared across threads)
│   ├── thread 1 → post_read_request → serialize JSON → mutex lock → sendto() → unlock
│   ├── thread 2 → post_read_request → serialize JSON → mutex lock → sendto() → unlock
│   └── ...
├── child process 2
│   ├── fd_mutex
│   ├── socket_fd (independent)
│   └── ...
```

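
The locking pattern in the diagram can be sketched with POSIX threads standing in for APR. In this hypothetical `child_state`, the JSON is serialized before the lock is taken, and the lock covers only reading the fd and sending. The mutex matters mainly because another thread may close and reopen the fd during reconnection. The sketch calls `send()` on a connected datagram socket, which is equivalent to `sendto()` with no destination address.

```c
#include <pthread.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical per-child state: one datagram socket shared by all worker
 * threads, guarded by a mutex (pthread stands in for apr_thread_mutex_t). */
struct child_state {
    pthread_mutex_t fd_mutex;
    int socket_fd;              /* -1 while disconnected */
};

/* Send one already-serialized JSON record. Returns 0 on success or -1 if
 * the record was dropped (no socket, full queue, or short send). */
int send_record(struct child_state *st, const char *json, size_t len)
{
    int rc = -1;
    pthread_mutex_lock(&st->fd_mutex);
    if (st->socket_fd >= 0) {
        /* MSG_DONTWAIT: never stall the request thread on a full queue.
         * MSG_NOSIGNAL: never raise SIGPIPE in the Apache child. */
        ssize_t n = send(st->socket_fd, json, len, MSG_DONTWAIT | MSG_NOSIGNAL);
        if (n == (ssize_t)len)
            rc = 0;
    }
    pthread_mutex_unlock(&st->fd_mutex);
    return rc;
}
```

With `MSG_DONTWAIT`, a full receive queue makes `send()` fail with `EAGAIN` instead of blocking. The record is then dropped, consistent with the drop-on-failure behavior listed under Reconnection Behavior.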
## Reconnection Behavior

- The socket is opened during `child_init` (per-child process startup)
- If the socket is unavailable at startup, the connection is deferred to a later reconnection attempt
- On send failure, reconnection is attempted no more often than `JsonSockLogReconnectInterval` allows
- Records that cannot be sent are dropped; HTTP request processing is never blocked
- Error log entries are throttled by `JsonSockLogErrorReportInterval`
- Socket type: `SOCK_DGRAM` (connectionless UNIX datagram)
- Sends are non-blocking and pass `MSG_NOSIGNAL`, so a missing receiver cannot raise `SIGPIPE` in the Apache child

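
`JsonSockLogReconnectInterval` and `JsonSockLogErrorReportInterval` both follow the same pattern: act at most once every N seconds. The sketch below is a minimal gate for that pattern. `struct throttle` and `throttle_allow` are hypothetical names, and the module's `LOG_THROTTLED` macro may be structured differently. In a threaded MPM, this state would have to be read and updated under a lock, for example the `fd_mutex` already held around the socket.

```c
#include <time.h>

/* Hypothetical throttle state; last == 0 means "never attempted". */
struct throttle {
    time_t last;
    int interval_sec;
};

/* Returns 1 (and records now) if at least interval_sec seconds have
 * passed since the last allowed action, otherwise 0. */
int throttle_allow(struct throttle *t, time_t now)
{
    if (t->last != 0 && now - t->last < t->interval_sec)
        return 0;
    t->last = now;
    return 1;
}
```

With the default interval of 10, an attempt at t=1000 blocks further attempts until t=1010.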
## Deployment

### Installation via RPM

```bash
rpm -ivh mod_reqin_log-1.0.19-1.el10.x86_64.rpm
```

### LoadModule Directive

```apache
LoadModule reqin_log_module modules/mod_reqin_log.so
```

### Verifying Installation

```bash
httpd -M | grep reqin_log
# Expected: reqin_log_module (shared)
```

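
`httpd -M` only proves that the module is loaded. On a test host where the correlator is not running, a throwaway receiver bound to the configured socket path shows whether records actually arrive. The sketch below uses a hypothetical helper, `open_receiver`. It deletes any existing file at the path before binding, so never point it at a socket the correlator is using. The Apache child processes also need write permission on the resulting socket file to send to it.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Bind a UNIX datagram socket at path so records sent by the module can be
 * read with recv(). Returns the fd, or -1 on error. */
int open_receiver(const char *path)
{
    struct sockaddr_un addr;
    if (strlen(path) >= sizeof addr.sun_path)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    int fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    unlink(path);   /* bind() fails with EADDRINUSE if a stale file exists */
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Calling `recv(fd, buf, sizeof buf, 0)` in a loop then returns one datagram, and therefore one JSON object, per logged request.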
## Build

The repository-level Make targets run all builds inside Docker:

```bash
# Run unit tests
make test-mod-reqin-log

# Build RPM packages (el8, el9, el10)
make rpm-mod-reqin-log
# RPMs are written to services/mod-reqin-log/dist/rpm/el{8,9,10}/
```

### Local Build (requires Apache development headers)

```bash
cd services/mod-reqin-log
make build   # Compiles mod_reqin_log.so via apxs
make test    # Runs unit tests
```

### Test Coverage

Unit tests cover:

- JSON serialization (escaping, size limits, field output)
- Config parsing (all directives, edge cases)
- Header handling (sensitive header exclusion, max headers, truncation)
- Module integration (real Apache module hooks)

## Source Files

| File | Description |
|------|-------------|
| `src/mod_reqin_log.c` | Main module source |
| `src/mod_reqin_log.h` | Header with types, constants, defaults |
| `conf/mod_reqin_log.conf` | Example Apache configuration |
| `tests/unit/test_json_serialization.c` | JSON output tests |
| `tests/unit/test_config_parsing.c` | Directive parsing tests |
| `tests/unit/test_header_handling.c` | Header filtering tests |
| `tests/unit/test_module_real.c` | Integration tests |