Covers: build/test/lint commands, architecture overview, ClickHouse dual-DB pattern, inter-service communication, key conventions for Go (hexagonal, YAML config), Python (pydantic-settings, FastAPI routes), C (Apache module), Docker-first builds, and RPM packaging. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
# Copilot Instructions — ja4-platform

## What is this?

A monorepo for a JA4/JA3 TLS fingerprinting security pipeline. Five services capture network traffic, correlate logs, detect bots via ML, and present results in a SOC dashboard. All are backed by ClickHouse.

**Data flow:** `mod-reqin-log` (Apache HTTP logs) → Unix socket → `correlator` ← Unix socket ← `sentinel` (TLS/TCP capture); `correlator` → ClickHouse → `bot-detector` (ML scoring) → `dashboard` (FastAPI SOC UI)

## Build, test, lint

All builds run in Docker, so no native Go/Python/C toolchain is required on the host.

```sh
# Full suite
make test-all              # run all tests (Docker)
make build-all             # build all service images
make rpm-all               # build RPMs (sentinel, correlator, mod-reqin-log) for el8/el9/el10

# Per-service tests
make test-sentinel         # Go tests (container needs --cap-add=NET_RAW --cap-add=NET_ADMIN)
make test-correlator       # Go tests with 60% coverage gate
make test-bot-detector     # Python pytest
make test-dashboard        # Python pytest
make test-ja4common-python # Python pytest (shared lib)
make test-mod-reqin-log    # C cmocka tests

# Single Go test (from service dir, or via Docker):
docker run --rm -v "$(pwd)":/build -w /build/services/correlator golang:1.24 \
  go test -v -run TestConfigLoad ./internal/config/

# Single Python test (from repo root):
docker build -f services/dashboard/Dockerfile.tests -t dash-tests .
docker run --rm dash-tests pytest backend/tests/test_metrics.py -v -k test_health

# Linting (Go only; no Python linter configured)
# Subshells keep each cd local, so both lines work from the repo root.
(cd services/sentinel && go vet ./... && gofmt -l .)
(cd services/correlator && go vet ./... && gofmt -l .)
```

## Architecture

### Go workspace (`go.work`, Go 1.24.6)

Three modules in the workspace:
- `services/sentinel`: TLS/TCP packet capture daemon (gopacket/pcap, systemd)
- `services/correlator`: log correlation engine, hexagonal architecture
- `shared/go/ja4common`: shared logger, config, shutdown, ipfilter

Both services have a `replace` directive in their `go.mod` pointing to `../../shared/go/ja4common`. The workspace takes precedence for local dev; the `replace` is needed for Docker builds.

### Correlator hexagonal architecture

```
ports/source.go     → EventSource, CorrelatedLogSink, CorrelationProcessor interfaces
adapters/inbound/   → unixsocket (reads from sentinel + mod-reqin-log)
adapters/outbound/  → clickhouse, file, stdout, multi (fan-out wrapper)
domain/             → CorrelationService, CorrelatedLog, NormalizedEvent
app/                → Orchestrator (wires everything together)
config/             → YAML config loader
```
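
To make the port/adapter split concrete, here is a minimal Python sketch of an outbound port and a `multi`-style fan-out adapter. The real correlator is written in Go, so this is an illustration only. The port name `CorrelatedLogSink` comes from the layout above; the `write` method, `ListSink`, `MultiSink`, and the record fields are hypothetical names chosen for the example.

```python
from typing import Protocol


class CorrelatedLogSink(Protocol):
    """Outbound port: anything that can accept a correlated log record."""

    def write(self, record: dict) -> None: ...


class ListSink:
    """Toy adapter that keeps records in memory (stand-in for file/stdout/clickhouse)."""

    def __init__(self) -> None:
        self.records: list[dict] = []

    def write(self, record: dict) -> None:
        self.records.append(record)


class MultiSink:
    """Fan-out wrapper: forwards each record to every wrapped sink."""

    def __init__(self, *sinks: CorrelatedLogSink) -> None:
        self._sinks = sinks

    def write(self, record: dict) -> None:
        for sink in self._sinks:
            sink.write(record)


# The domain code only sees the port, never a concrete adapter.
first, second = ListSink(), ListSink()
fan_out = MultiSink(first, second)
fan_out.write({"src_ip": "192.0.2.1", "ja4": "example"})  # hypothetical fields
```

Because `MultiSink` itself satisfies the port, callers can treat one sink and many sinks the same way.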

### Python services

- `bot-detector`: scikit-learn IsolationForest + DBSCAN. Single monolithic module (`bot_detector.py`). Uses `os.getenv()` directly for config, NOT pydantic-settings.
- `dashboard`: FastAPI + React SPA. 20 route modules in `backend/routes/`. Uses pydantic-settings (`backend/config.py`).
- `shared/python/ja4_common`: `ClickHouseClient` singleton + `ClickHouseSettings` (pydantic-settings). Installed as a local package in each Python Dockerfile.

### C module

- `mod-reqin-log`: Apache HTTPD module (C11, built with `apxs`). Logs HTTP requests as JSON to a Unix socket. Tests use cmocka.

## ClickHouse dual-database pattern

Two configurable databases (env vars with defaults):

| Env var | Default | Contains |
|---------|---------|----------|
| `CLICKHOUSE_DB_LOGS` | `ja4_logs` | `http_logs_raw`, `http_logs`, `mv_http_logs` |
| `CLICKHOUSE_DB_PROCESSING` | `ja4_processing` | Aggregations, ML tables, views, dicts, audit |

**Cross-database references exist.** Materialized views in one DB read from the other:
- `ja4_logs.mv_http_logs` references `ja4_processing.dict_anubis_*` and `ja4_processing.dict_iplocate_asn`
- `ja4_processing.mv_agg_*` reads `FROM ja4_logs.http_logs`

**In Python code**, always use fully qualified table names:
```python
from ..config import settings

query = f"SELECT ... FROM {settings.CLICKHOUSE_DB_PROCESSING}.ml_detected_anomalies ..."
query = f"SELECT ... FROM {settings.CLICKHOUSE_DB_LOGS}.http_logs ..."
```
Never hardcode database names in queries.
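
As a self-contained illustration of this rule, the sketch below resolves both database names from the environment and builds a qualified table name. The `Settings` dataclass is only a stdlib stand-in for the real pydantic-settings class, and `qualified` is a hypothetical helper that does not exist in the repo. The env var names and defaults are the ones from the table above.

```python
import os
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Settings:
    """Stdlib stand-in for the real settings class; same env vars and defaults."""

    CLICKHOUSE_DB_LOGS: str = field(
        default_factory=lambda: os.getenv("CLICKHOUSE_DB_LOGS", "ja4_logs"))
    CLICKHOUSE_DB_PROCESSING: str = field(
        default_factory=lambda: os.getenv("CLICKHOUSE_DB_PROCESSING", "ja4_processing"))


def qualified(db: str, table: str) -> str:
    """Return `db.table` (hypothetical helper, not part of the repo)."""
    return f"{db}.{table}"


settings = Settings()
# The database name always comes from settings, never from a string literal.
query = f"SELECT count() FROM {qualified(settings.CLICKHOUSE_DB_LOGS, 'http_logs')}"
```

With this shape, pointing a deployment at differently named databases only requires changing the two env vars.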

**In Go (correlator)**, the database is part of the ClickHouse DSN (`clickhouse://user:pass@host:9000/ja4_logs`). The target table is configurable via YAML (`outputs.clickhouse.table`).
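
To show where the database name sits in such a DSN, here is a short Python sketch that splits the example DSN with the standard library. It only demonstrates the URL layout and does not claim to mirror how the Go ClickHouse driver parses it. The table name `http_logs_raw` stands in for whatever `outputs.clickhouse.table` is set to.

```python
from urllib.parse import urlsplit

dsn = "clickhouse://user:pass@host:9000/ja4_logs"  # example DSN from the text
parts = urlsplit(dsn)

database = parts.path.lstrip("/")  # the path component carries the database name
table = "http_logs_raw"            # assumed value of outputs.clickhouse.table
target = f"{database}.{table}"     # where batch inserts would land
```

Changing the database therefore means editing the DSN, while changing the table means editing the YAML key.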

**SQL migrations** live in `shared/clickhouse/` (10 ordered files). Deploy them with `shared/clickhouse/deploy_schema.sh`, which substitutes DB names from env vars.
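
The substitution step can be pictured with the Python sketch below. It is an assumption-heavy illustration: the `${...}` placeholder syntax and the one-line SQL statement are invented for the example, and the real script and migration files may work differently.

```python
import os
from string import Template

# Hypothetical migration text; the real files may use another placeholder syntax.
migration_sql = Template("CREATE DATABASE IF NOT EXISTS ${CLICKHOUSE_DB_LOGS}")

# Resolve both names from the environment, falling back to the documented defaults.
db_names = {
    "CLICKHOUSE_DB_LOGS": os.getenv("CLICKHOUSE_DB_LOGS", "ja4_logs"),
    "CLICKHOUSE_DB_PROCESSING": os.getenv("CLICKHOUSE_DB_PROCESSING", "ja4_processing"),
}

rendered = migration_sql.substitute(db_names)
```

`Template.substitute` raises `KeyError` on an unknown placeholder, which is a useful property for a deploy step that should fail loudly.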

## Key conventions

### Docker-first builds
Every service has a `Dockerfile` (prod) and a `Dockerfile.dev` or `Dockerfile.tests` (tests). Go and C services also have a `Dockerfile.package` for RPM packaging, built in 3 stages: builder → rpmbuild × 3 distros → alpine output.

### Go config: YAML + env vars
- Sentinel: `config.yml`, env prefix `JA4SENTINEL_`
- Correlator: `config.yml`, env prefix `LOGCORRELATOR_`
- Both support `SIGHUP` for log rotation

### Python config: pydantic-settings
- Dashboard: `backend/config.py` → `Settings(BaseSettings)` with `.env` file
- ja4_common: `ClickHouseSettings(BaseSettings)`, singleton instance at `settings`
- bot-detector: the exception; uses raw `os.getenv()`, not pydantic-settings

### Dashboard route structure
Every route file follows this pattern:
```python
from fastapi import APIRouter, HTTPException, Query
from ..config import settings
from ..database import db

router = APIRouter()

@router.get("/api/something")
async def get_something():
    query = f"SELECT ... FROM {settings.CLICKHOUSE_DB_PROCESSING}.table_name ..."
    result = db.query(query)
    ...
```

### RPM spec files
Spec files live at `services/<name>/packaging/rpm/<name>.spec`. The version is injected at build time via `--define "build_version X.Y.Z"`.
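
As a rough sketch of how that define could be passed, the Python helper below assembles an `rpmbuild` argument list. The helper name `rpmbuild_command` and the choice of `-bb` (build binary package only) are assumptions. The actual invocation lives in each `Dockerfile.package` and may use other flags.

```python
def rpmbuild_command(name: str, version: str) -> list[str]:
    """Build an rpmbuild argv for one service (hypothetical helper)."""
    spec = f"services/{name}/packaging/rpm/{name}.spec"
    # --define takes a single "macro value" string, so it stays one argv element.
    return ["rpmbuild", "-bb", "--define", f"build_version {version}", spec]


command = rpmbuild_command("correlator", "1.2.3")
```

Passing the list to `subprocess.run` without a shell avoids quoting problems around the space inside the define.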

### Inter-service communication
The capture services feed the correlator over **Unix sockets**, not HTTP; the correlator then writes to ClickHouse:
- `sentinel` → `/var/run/logcorrelator/network.socket` → `correlator` (source B: TLS/TCP data)
- `mod-reqin-log` → `/var/run/logcorrelator/http.socket` → `correlator` (source A: HTTP data)
- `correlator` → ClickHouse (batch inserts into `ja4_logs.http_logs_raw`)
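
The socket hop can be sketched in Python as follows. This is a toy under stated assumptions: `socket.socketpair()` replaces the real socket files so the example runs anywhere, the datagram socket type is a guess (the real services may use stream sockets), and the event fields are hypothetical.

```python
import json
import socket

# socketpair() stands in for a socket file such as /var/run/logcorrelator/http.socket.
producer, consumer = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)

# Producer side: serialize one event as JSON and send it as a single datagram.
event = {"source": "A", "method": "GET", "path": "/"}  # hypothetical fields
producer.send(json.dumps(event).encode("utf-8"))

# Consumer side: read one datagram and decode it back into a dict.
received = json.loads(consumer.recv(65536).decode("utf-8"))

producer.close()
consumer.close()
```

No HTTP framing is involved on either side; the JSON payload is the whole message.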

### Sentinel requires elevated privileges
Sentinel tests need `--cap-add=NET_RAW --cap-add=NET_ADMIN` on the container for packet capture (pcap).