mod-reqin-log

mod_reqin_log is an Apache HTTPD module (C shared object) that captures HTTP request metadata and sends it as JSON to a UNIX datagram socket. It serves as the HTTP-layer ingestion point for the ja4-platform pipeline, feeding request data to the correlator for joining with TLS fingerprint data from sentinel.

Purpose

Apache processes HTTP requests after TLS termination, so it has access to the decoded HTTP method, path, headers, and client IP/port. mod_reqin_log hooks into the post_read_request phase and serializes this data immediately, before any rewrite or auth module can modify the request.

Apache Directives Reference

All directives are server-level (RSRC_CONF):

| Directive | Type | Default | Description |
|---|---|---|---|
| JsonSockLogEnabled | Flag (On/Off) | Off | Enable or disable the module |
| JsonSockLogSocket | String | | UNIX domain socket path for JSON output |
| JsonSockLogHeaders | String list | | HTTP header names to log (repeatable) |
| JsonSockLogMaxHeaders | Integer | 25 | Maximum number of headers to log |
| JsonSockLogMaxHeaderValueLen | Integer | 256 | Maximum length of each header value; longer values are truncated |
| JsonSockLogReconnectInterval | Integer (seconds) | 10 | Minimum seconds between reconnection attempts |
| JsonSockLogErrorReportInterval | Integer (seconds) | 10 | Minimum seconds between error log entries (throttling) |
| JsonSockLogLevel | String | WARNING | Module log level: DEBUG, INFO, WARNING, ERROR, EMERG |

Example httpd.conf

LoadModule reqin_log_module modules/mod_reqin_log.so

JsonSockLogEnabled On
JsonSockLogSocket /var/run/logcorrelator/http.socket
JsonSockLogHeaders User-Agent Accept Accept-Encoding Accept-Language
JsonSockLogHeaders Content-Type X-Request-Id X-Trace-Id X-Forwarded-For
JsonSockLogHeaders Sec-CH-UA Sec-CH-UA-Mobile Sec-CH-UA-Platform
JsonSockLogHeaders Sec-Fetch-Dest Sec-Fetch-Mode Sec-Fetch-Site
JsonSockLogMaxHeaders 25
JsonSockLogMaxHeaderValueLen 256
JsonSockLogReconnectInterval 10
JsonSockLogErrorReportInterval 10
JsonSockLogLevel WARNING

Output JSON Schema

Each HTTP request is serialized as a flat JSON object and sent as a single UNIX datagram:

{
  "time": "2026-03-09T14:30:00Z",
  "src_ip": "203.0.113.42",
  "src_port": 52341,
  "dst_ip": "192.168.1.10",
  "dst_port": 443,
  "method": "GET",
  "scheme": "https",
  "host": "example.com",
  "path": "/api/v1/users",
  "query": "page=1&limit=20",
  "http_version": "HTTP/2.0",
  "client_headers": "User-Agent,Accept,Accept-Encoding,Accept-Language,Sec-Fetch-Dest,Sec-Fetch-Mode,Sec-Fetch-Site",
  "header_User-Agent": "Mozilla/5.0 ...",
  "header_Accept": "text/html,application/xhtml+xml",
  "header_Accept-Encoding": "gzip, deflate, br",
  "header_Accept-Language": "en-US,en;q=0.9",
  "header_Sec-Fetch-Dest": "document",
  "header_Sec-Fetch-Mode": "navigate",
  "header_Sec-Fetch-Site": "none"
}

Field Reference

| Field | Type | Description |
|---|---|---|
| time | string (ISO 8601) | Request timestamp (UTC) |
| src_ip | string | Client IP address |
| src_port | int | Client source port |
| dst_ip | string | Server IP address |
| dst_port | int | Server port |
| method | string | HTTP method (GET, POST, etc.) |
| scheme | string | URL scheme (http or https) |
| host | string | HTTP Host header value |
| path | string | Request URI path |
| query | string | Query string (without the leading ?) |
| http_version | string | HTTP version (HTTP/1.1, HTTP/2.0) |
| client_headers | string | Comma-separated list of header names sent by the client (order preserved) |
| header_<Name> | string | Value of each configured header (one field per header) |
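
For local testing it can help to stand in for the correlator with a small listener. The sketch below binds a UNIX datagram socket, reads one datagram, and splits the flat object into core fields and `header_*` fields. The helper names (`open_listener`, `read_event`), the temporary socket path, and the receive buffer size are assumptions made for illustration; they are not part of the module or the correlator.

```python
import json
import os
import socket
import tempfile

MAX_DATAGRAM = 64 * 1024  # sized to the documented 64 KB JSON cap


def open_listener(path):
    """Bind a UNIX datagram socket at `path` (hypothetical helper)."""
    if os.path.exists(path):
        os.unlink(path)  # remove a stale socket file from a previous run
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    sock.bind(path)
    return sock


def read_event(sock):
    """Receive one datagram; return (core fields, logged headers)."""
    data = sock.recv(MAX_DATAGRAM)
    event = json.loads(data.decode("utf-8"))
    prefix = "header_"
    headers = {k[len(prefix):]: v for k, v in event.items() if k.startswith(prefix)}
    core = {k: v for k, v in event.items() if not k.startswith(prefix)}
    return core, headers


if __name__ == "__main__":
    # Self-contained demo: send one sample event to our own listener.
    path = os.path.join(tempfile.mkdtemp(), "http.socket")
    listener = open_listener(path)
    sender = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    sample = {"method": "GET", "path": "/api/v1/users", "header_Accept": "text/html"}
    sender.sendto(json.dumps(sample).encode("utf-8"), path)
    core, headers = read_event(listener)
    print(core["method"], core["path"], headers)
    sender.close()
    listener.close()
```

Because each request arrives as exactly one datagram, the listener needs no framing or delimiter handling: one `recv` call yields one complete JSON object.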

Sensitive Headers

The following headers are always excluded from output regardless of JsonSockLogHeaders:

  • Authorization
  • Cookie
  • Set-Cookie
  • X-Api-Key
  • X-Auth-Token
  • Proxy-Authorization
  • WWW-Authenticate
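
The exclusion rule can be modeled in a few lines. This Python sketch is an illustration of the behavior described above, not the module's C code; in particular, the case-insensitive comparison is an assumption (HTTP header names are case-insensitive, but the source does not state how the module compares them), and `loggable_headers` is a hypothetical name.

```python
# Deny list taken from the documentation above, stored lowercase.
SENSITIVE_HEADERS = frozenset(
    name.lower()
    for name in (
        "Authorization",
        "Cookie",
        "Set-Cookie",
        "X-Api-Key",
        "X-Auth-Token",
        "Proxy-Authorization",
        "WWW-Authenticate",
    )
)


def loggable_headers(configured):
    """Return the configured header names minus the always-excluded ones."""
    # Assumption: matching ignores case.
    return [name for name in configured if name.lower() not in SENSITIVE_HEADERS]
```

With this model, `loggable_headers(["User-Agent", "cookie", "Accept"])` returns `["User-Agent", "Accept"]`: listing a sensitive header in JsonSockLogHeaders has no effect.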

Size Limits

  • Maximum JSON size: 64 KB (prevents memory exhaustion DoS)
  • Header values are truncated to JsonSockLogMaxHeaderValueLen bytes
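
A minimal Python model of these limits follows. It is a sketch under stated assumptions: the source does not say what happens to an object that exceeds 64 KB, so returning `None` (dropping the event) is a guess, and trimming a split multi-byte character after truncation is likewise an assumption. The names `truncate_value` and `serialize` are hypothetical.

```python
import json

MAX_JSON_SIZE = 64 * 1024  # documented cap on the serialized object


def truncate_value(value, max_len=256):
    """Cut a header value to at most max_len bytes of UTF-8."""
    raw = value.encode("utf-8")[:max_len]
    # Assumption: drop a trailing partial multi-byte character if the cut split one.
    return raw.decode("utf-8", errors="ignore")


def serialize(fields, headers, max_headers=25, max_len=256):
    """Build the flat JSON payload, or return None if it exceeds the cap."""
    event = dict(fields)
    # Log at most max_headers headers, each truncated to max_len bytes.
    for name, value in list(headers.items())[:max_headers]:
        event["header_" + name] = truncate_value(value, max_len)
    payload = json.dumps(event).encode("utf-8")
    # Assumption: an oversized object is dropped rather than trimmed.
    return payload if len(payload) <= MAX_JSON_SIZE else None
```

The defaults mirror JsonSockLogMaxHeaders (25) and JsonSockLogMaxHeaderValueLen (256) from the directives reference.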

Thread Safety

mod_reqin_log is designed for Apache's multi-threaded worker and event MPMs:

  • The socket FD is protected by an apr_thread_mutex_t (fd_mutex)
  • Each child process keeps its own state: the socket file descriptor, the mutex, and error tracking
  • Error reporting uses the LOG_THROTTLED macro, which rate-limits entries by timestamp
  • All JSON serialization uses per-request pool allocation; there are no shared buffers

Architecture

Apache HTTPD process
├── child process 1
│   ├── fd_mutex (apr_thread_mutex_t)
│   ├── socket_fd (shared across threads)
│   ├── thread 1 → post_read_request → serialize JSON → mutex lock → sendto() → unlock
│   ├── thread 2 → post_read_request → serialize JSON → mutex lock → sendto() → unlock
│   └── ...
├── child process 2
│   ├── fd_mutex
│   ├── socket_fd (independent)
│   └── ...
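
The per-child pattern in the diagram (serialize first, then lock, send, unlock) can be sketched in Python with one socket and one lock shared by several threads. This is an analogy rather than the module's code: `threading.Lock` stands in for fd_mutex, and `SharedSender` and `worker` are hypothetical names.

```python
import socket
import threading


class SharedSender:
    """One datagram socket per process, shared by all worker threads."""

    def __init__(self, path):
        self.path = path
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
        self.lock = threading.Lock()  # plays the role of fd_mutex

    def send(self, payload):
        # The payload is built by the caller, outside the lock;
        # only the socket operation sits in the critical section.
        with self.lock:
            self.sock.sendto(payload, self.path)


def worker(sender, thread_id, count):
    """Simulate a request thread: serialize, then hand off to the shared socket."""
    for seq in range(count):
        payload = f'{{"thread": {thread_id}, "seq": {seq}}}'.encode("utf-8")
        sender.send(payload)
```

Keeping serialization outside the lock means threads contend only for the duration of a single send call, which matches the order of steps shown in the diagram.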

Reconnection Behavior

  • The socket is opened during child_init (once per child process, at startup)
  • If the socket is unavailable at startup, connection is deferred
  • On send failure, reconnection is attempted at most once per JsonSockLogReconnectInterval seconds
  • Messages that cannot be sent are dropped; HTTP request processing is never blocked
  • Error log entries are throttled by JsonSockLogErrorReportInterval
  • Socket type: SOCK_DGRAM (connectionless UNIX datagram)
  • Sends are non-blocking and use MSG_NOSIGNAL
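
The drop-and-retry policy above can be modeled as follows. This Python sketch makes several assumptions that the source does not confirm: it uses `connect()` plus `send()` to make the reconnect step explicit, it counts dropped messages, and it takes an injectable clock so the interval logic can be exercised without waiting. `ThrottledSender` is a hypothetical name.

```python
import socket
import time

# MSG_NOSIGNAL is not defined on every platform; fall back to no flag.
SEND_FLAGS = getattr(socket, "MSG_NOSIGNAL", 0)


class ThrottledSender:
    """Best-effort datagram sender that never raises into the caller."""

    def __init__(self, path, reconnect_interval=10.0, clock=time.monotonic):
        self.path = path
        self.reconnect_interval = reconnect_interval
        self.clock = clock
        self.sock = None
        self.last_attempt = None  # time of the most recent connect attempt
        self.dropped = 0

    def _try_connect(self):
        now = self.clock()
        if self.last_attempt is not None and now - self.last_attempt < self.reconnect_interval:
            return  # too soon since the last attempt
        self.last_attempt = now
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
        sock.setblocking(False)
        try:
            sock.connect(self.path)
            self.sock = sock
        except OSError:
            sock.close()  # receiver unavailable; retry after the interval

    def send(self, payload):
        """Return True if the datagram was handed to the kernel, else drop it."""
        if self.sock is None:
            self._try_connect()
        if self.sock is not None:
            try:
                self.sock.send(payload, SEND_FLAGS)
                return True
            except OSError:
                self.sock.close()
                self.sock = None  # next send triggers a throttled reconnect
        self.dropped += 1
        return False
```

A caller simply ignores the return value on the request path; with the default interval of 10 seconds, a missing receiver costs at most one failed connect attempt per interval per child process.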

Deployment

Installation via RPM

rpm -ivh mod_reqin_log-1.0.19-1.el10.x86_64.rpm

LoadModule Directive

LoadModule reqin_log_module modules/mod_reqin_log.so

Verifying Installation

httpd -M | grep reqin_log
# Expected: reqin_log_module (shared)

Build

The packaged test and RPM build targets run inside Docker:

# Run unit tests
make test-mod-reqin-log

# Build RPM packages (el8, el9, el10)
make rpm-mod-reqin-log
# RPMs in services/mod-reqin-log/dist/rpm/el{8,9,10}/

Local Build (requires Apache development headers)

cd services/mod-reqin-log
make build    # Compiles mod_reqin_log.so via apxs
make test     # Runs unit tests

Test Coverage

Unit tests cover:

  • JSON serialization (escaping, size limits, field output)
  • Config parsing (all directives, edge cases)
  • Header handling (sensitive header exclusion, max headers, truncation)
  • Module integration (real Apache module hooks)

Source Files

| File | Description |
|---|---|
| src/mod_reqin_log.c | Main module source |
| src/mod_reqin_log.h | Header with types, constants, defaults |
| conf/mod_reqin_log.conf | Example Apache configuration |
| tests/unit/test_json_serialization.c | JSON output tests |
| tests/unit/test_config_parsing.c | Directive parsing tests |
| tests/unit/test_header_handling.c | Header filtering tests |
| tests/unit/test_module_real.c | Integration tests |