Copilot Instructions — ja4-platform
What is this?
A monorepo for a JA4/JA3 TLS fingerprinting security pipeline. Five services capture network traffic, correlate logs, detect bots via ML, and present results in a SOC dashboard, all backed by ClickHouse.
Data flow: mod-reqin-log (Apache HTTP logs) → unix socket → correlator ← unix socket ← sentinel (TLS/TCP capture) → ClickHouse → bot-detector (ML scoring) → dashboard (FastAPI SOC UI)
Build, test, lint
All builds run in Docker — no native Go/Python/C toolchain required on the host.
# Full suite
make test-all # run all tests (Docker)
make build-all # build all service images
make rpm-all # build RPMs (sentinel, correlator, mod-reqin-log) for el8/el9/el10
# Per-service tests
make test-sentinel # Go tests (needs --cap-add=NET_RAW inside)
make test-correlator # Go tests with 60% coverage gate
make test-bot-detector # Python pytest
make test-dashboard # Python pytest
make test-ja4common-python # Python pytest (shared lib)
make test-mod-reqin-log # C cmocka tests
# Single Go test (from service dir, or via Docker):
docker run --rm -v $(pwd):/build -w /build/services/correlator golang:1.24 \
go test -v -run TestConfigLoad ./internal/config/
# Single Python test (from repo root):
docker build -f services/dashboard/Dockerfile.tests -t dash-tests .
docker run --rm dash-tests pytest backend/tests/test_metrics.py -v -k test_health
# Linting (Go only — no Python linter configured)
cd services/sentinel && go vet ./... && gofmt -l .
cd services/correlator && go vet ./... && gofmt -l .
Architecture
Go workspace (go.work, Go 1.24.6)
Three modules in the workspace:
- services/sentinel: TLS/TCP packet capture daemon (gopacket/pcap, systemd)
- services/correlator: log correlation engine, hexagonal architecture
- shared/go/ja4common: shared logger, config, shutdown, ipfilter
Both services have a replace directive in their go.mod pointing to ../../shared/go/ja4common. The workspace takes precedence for local dev; the replace is needed for Docker builds.
Correlator hexagonal architecture
ports/source.go → EventSource, CorrelatedLogSink, CorrelationProcessor interfaces
adapters/inbound/ → unixsocket (reads from sentinel + mod-reqin-log)
adapters/outbound/ → clickhouse, file, stdout, multi (fan-out wrapper)
domain/ → CorrelationService, CorrelatedLog, NormalizedEvent
app/ → Orchestrator (wires everything together)
config/ → YAML config loader
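The ports-and-adapters split above, and the multi fan-out wrapper in particular, can be pictured with a short sketch. This is written in Python to match the other examples in this file; it is not the correlator's Go code. The write() method and the MemorySink adapter are assumptions made for illustration, since the actual method set of CorrelatedLogSink is not shown here.

```python
from typing import Protocol


class CorrelatedLogSink(Protocol):
    """Outbound port. write() is an assumed method name, not the real Go signature."""

    def write(self, log: dict) -> None: ...


class MemorySink:
    """Hypothetical adapter standing in for clickhouse/file/stdout: keeps logs in a list."""

    def __init__(self) -> None:
        self.logs: list[dict] = []

    def write(self, log: dict) -> None:
        self.logs.append(log)


class MultiSink:
    """Fan-out wrapper: forwards each log to every wrapped sink, in order."""

    def __init__(self, *sinks: CorrelatedLogSink) -> None:
        self._sinks = sinks

    def write(self, log: dict) -> None:
        for sink in self._sinks:
            sink.write(log)


# The domain layer only sees one sink; the wrapper hides how many outputs exist.
first, second = MemorySink(), MemorySink()
MultiSink(first, second).write({"src_ip": "192.0.2.1"})
```

Because MultiSink satisfies the same port as the adapters it wraps, the orchestrator can hand the domain a single sink regardless of how many outputs are configured.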
Python services
- bot-detector: scikit-learn IsolationForest + DBSCAN. Single monolithic module (bot_detector.py). Uses os.getenv() directly for config, NOT pydantic-settings.
- dashboard: FastAPI + React SPA. 20 route modules in backend/routes/. Uses pydantic-settings (backend/config.py).
- shared/python/ja4_common: ClickHouseClient singleton + ClickHouseSettings (pydantic-settings). Installed as a local package in each Python Dockerfile.
C module
- mod-reqin-log: Apache HTTPD module (C11, built with apxs). Logs HTTP requests as JSON to a Unix socket. Tests use cmocka.
ClickHouse dual-database pattern
Two configurable databases (env vars with defaults):
| Env var | Default | Contains |
|---|---|---|
| CLICKHOUSE_DB_LOGS | ja4_logs | http_logs_raw, http_logs, mv_http_logs |
| CLICKHOUSE_DB_PROCESSING | ja4_processing | Aggregations, ML tables, views, dicts, audit |
Cross-database references exist — materialized views in one DB read from the other:
- ja4_logs.mv_http_logs references ja4_processing.dict_anubis_* and ja4_processing.dict_iplocate_asn
- ja4_processing.mv_agg_* reads FROM ja4_logs.http_logs
In Python code, always use fully qualified table names:
from ..config import settings
query = f"SELECT ... FROM {settings.CLICKHOUSE_DB_PROCESSING}.ml_detected_anomalies ..."
query = f"SELECT ... FROM {settings.CLICKHOUSE_DB_LOGS}.http_logs ..."
Never hardcode database names in queries.
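One way to keep qualified names consistent is a small helper that joins a database and a table and rejects anything that is not a plain identifier. The sketch below is only an illustration: qualified() is a hypothetical helper, not something that exists in the repo, and it reads the two env vars directly with os.getenv so that it runs standalone (dashboard code would use settings instead).

```python
import os
import re

# Env var names and defaults are the ones from the table above.
DB_LOGS = os.getenv("CLICKHOUSE_DB_LOGS", "ja4_logs")
DB_PROCESSING = os.getenv("CLICKHOUSE_DB_PROCESSING", "ja4_processing")

# Plain identifiers only: letters, digits, underscore, not starting with a digit.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def qualified(db: str, table: str) -> str:
    """Return 'db.table' (hypothetical helper); raise ValueError on unsafe input."""
    for part in (db, table):
        if not _IDENT.fullmatch(part):
            raise ValueError(f"unsafe identifier: {part!r}")
    return f"{db}.{table}"


# The database name comes from configuration, never from a literal in the query.
query = f"SELECT count() FROM {qualified(DB_PROCESSING, 'ml_detected_anomalies')}"
```

The identifier check is a design choice of this sketch: database names arrive through environment variables and end up inside f-strings, so validating them once keeps a misconfigured value from producing malformed SQL.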
In Go (correlator), the database is part of the ClickHouse DSN (clickhouse://user:pass@host:9000/ja4_logs). The target table is configurable via YAML (outputs.clickhouse.table).
SQL migrations live in shared/clickhouse/ (10 ordered files). Deploy with shared/clickhouse/deploy_schema.sh which substitutes DB names from env vars.
Key conventions
Docker-first builds
Every service has a Dockerfile (prod) and either Dockerfile.dev or Dockerfile.tests (tests). The Go and C services also have Dockerfile.package, which builds RPMs in three stages: builder → rpmbuild × 3 distros → alpine output.
Go config: YAML + env vars
- Sentinel: config.yml, env prefix JA4SENTINEL_
- Correlator: config.yml, env prefix LOGCORRELATOR_
- Both support SIGHUP for log rotation
Python config: pydantic-settings
- Dashboard: backend/config.py → Settings(BaseSettings) with .env file
- ja4_common: ClickHouseSettings(BaseSettings), singleton at settings
- bot-detector (exception): uses raw os.getenv(), not pydantic-settings
Dashboard route structure
Every route file follows this pattern:
from fastapi import APIRouter, HTTPException, Query
from ..config import settings
from ..database import db
router = APIRouter()
@router.get("/api/something")
async def get_something():
    query = f"SELECT ... FROM {settings.CLICKHOUSE_DB_PROCESSING}.table_name ..."
    result = db.query(query)
    ...
RPM spec files
Located at services/<name>/packaging/rpm/<name>.spec. Version injected via --define "build_version X.Y.Z" at build time.
Inter-service communication
Services communicate via Unix sockets, not HTTP:
- sentinel → /var/run/logcorrelator/network.socket → correlator (source B: TLS/TCP data)
- mod-reqin-log → /var/run/logcorrelator/http.socket → correlator (source A: HTTP data)
- correlator → ClickHouse (batch inserts into ja4_logs.http_logs_raw)
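For local experiments it can help to push a hand-written event into one of these sockets. The sketch below is a rough illustration under two assumptions that this file does not confirm: that the socket accepts datagrams (SOCK_DGRAM) and that each event is one JSON object. The event fields are made up, and send_event() is a hypothetical helper. To stay runnable without a correlator, the demo sends to a throwaway socket in a temp directory.

```python
import json
import os
import socket
import tempfile


def send_event(socket_path: str, event: dict) -> None:
    """Send one JSON-encoded event as a single datagram (socket type is an assumption)."""
    payload = json.dumps(event).encode("utf-8")
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, socket_path)


# Local round trip: bind a temporary receiver, send one event, read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "http.socket")
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as receiver:
        receiver.bind(path)
        # Field names are illustrative, not the mod-reqin-log schema.
        send_event(path, {"method": "GET", "path": "/"})
        received = json.loads(receiver.recv(65536))
```

Against a running correlator the call would target /var/run/logcorrelator/http.socket instead of the temporary path, provided the assumptions about socket type and payload hold.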
Sentinel requires elevated privileges
Tests need --cap-add=NET_RAW --cap-add=NET_ADMIN for packet capture (pcap).