From d8dbd4e7062f4fa2591d0aaa5d40d188c6e5f77f Mon Sep 17 00:00:00 2001
From: toto
Date: Tue, 7 Apr 2026 19:21:32 +0200
Subject: [PATCH] docs: add .github/copilot-instructions.md for Copilot context

Covers: build/test/lint commands, architecture overview, ClickHouse
dual-DB pattern, inter-service communication, key conventions for
Go (hexagonal, YAML config), Python (pydantic-settings, FastAPI routes),
C (Apache module), Docker-first builds, and RPM packaging.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 .github/copilot-instructions.md | 138 ++++++++++++++++++++++++++++++++
 1 file changed, 138 insertions(+)
 create mode 100644 .github/copilot-instructions.md

diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
new file mode 100644
index 0000000..8aa4498
--- /dev/null
+++ b/.github/copilot-instructions.md
@@ -0,0 +1,138 @@
+# Copilot Instructions — ja4-platform
+
+## What is this?
+
+A monorepo for a JA4/JA3 TLS fingerprinting security pipeline. Five services capture network traffic, correlate logs, detect bots via ML, and present results in a SOC dashboard. All backed by ClickHouse.
+
+**Data flow:** `mod-reqin-log` (Apache HTTP logs) → unix socket → `correlator` ← unix socket ← `sentinel` (TLS/TCP capture) → ClickHouse → `bot-detector` (ML scoring) → `dashboard` (FastAPI SOC UI)
+
+## Build, test, lint
+
+All builds run in Docker — no native Go/Python/C toolchain required on the host.
+
+```sh
+# Full suite
+make test-all                # run all tests (Docker)
+make build-all               # build all service images
+make rpm-all                 # build RPMs (sentinel, correlator, mod-reqin-log) for el8/el9/el10
+
+# Per-service tests
+make test-sentinel           # Go tests (needs --cap-add=NET_RAW inside)
+make test-correlator         # Go tests with 60% coverage gate
+make test-bot-detector       # Python pytest
+make test-dashboard          # Python pytest
+make test-ja4common-python   # Python pytest (shared lib)
+make test-mod-reqin-log      # C cmocka tests
+
+# Single Go test (from service dir, or via Docker):
+docker run --rm -v $(pwd):/build -w /build/services/correlator golang:1.24 \
+  go test -v -run TestConfigLoad ./internal/config/
+
+# Single Python test (from repo root):
+docker build -f services/dashboard/Dockerfile.tests -t dash-tests .
+docker run --rm dash-tests pytest backend/tests/test_metrics.py -v -k test_health
+
+# Linting (Go only — no Python linter configured)
+cd services/sentinel && go vet ./... && gofmt -l .
+cd services/correlator && go vet ./... && gofmt -l .
+```
+
+## Architecture
+
+### Go workspace (`go.work`, Go 1.24.6)
+
+Three modules in the workspace:
+- `services/sentinel` — TLS/TCP packet capture daemon (gopacket/pcap, systemd)
+- `services/correlator` — log correlation engine, hexagonal architecture
+- `shared/go/ja4common` — shared logger, config, shutdown, ipfilter
+
+Both services have a `replace` directive in their `go.mod` pointing to `../../shared/go/ja4common`. The workspace takes precedence for local dev; the `replace` is needed for Docker builds.
+
+### Correlator hexagonal architecture
+
+```
+ports/source.go      → EventSource, CorrelatedLogSink, CorrelationProcessor interfaces
+adapters/inbound/    → unixsocket (reads from sentinel + mod-reqin-log)
+adapters/outbound/   → clickhouse, file, stdout, multi (fan-out wrapper)
+domain/              → CorrelationService, CorrelatedLog, NormalizedEvent
+app/                 → Orchestrator (wires everything together)
+config/              → YAML config loader
+```
+
+### Python services
+
+- `bot-detector` — scikit-learn IsolationForest + DBSCAN. Single monolithic module (`bot_detector.py`). Uses `os.getenv()` directly for config, NOT pydantic-settings.
+- `dashboard` — FastAPI + React SPA. 20 route modules in `backend/routes/`. Uses pydantic-settings (`backend/config.py`).
+- `shared/python/ja4_common` — `ClickHouseClient` singleton + `ClickHouseSettings` (pydantic-settings). Installed as a local package in each Python Dockerfile.
+
+### C module
+
+- `mod-reqin-log` — Apache HTTPD module (C11, built with `apxs`). Logs HTTP requests as JSON to a Unix socket. Tests use cmocka.
+
+## ClickHouse dual-database pattern
+
+Two configurable databases (env vars with defaults):
+
+| Env var | Default | Contains |
+|---------|---------|----------|
+| `CLICKHOUSE_DB_LOGS` | `ja4_logs` | `http_logs_raw`, `http_logs`, `mv_http_logs` |
+| `CLICKHOUSE_DB_PROCESSING` | `ja4_processing` | Aggregations, ML tables, views, dicts, audit |
+
+**Cross-database references exist** — materialized views in one DB read from the other:
+- `ja4_logs.mv_http_logs` references `ja4_processing.dict_anubis_*` and `ja4_processing.dict_iplocate_asn`
+- `ja4_processing.mv_agg_*` reads `FROM ja4_logs.http_logs`
+
+**In Python code**, always use fully qualified table names:
+```python
+from ..config import settings
+query = f"SELECT ... FROM {settings.CLICKHOUSE_DB_PROCESSING}.ml_detected_anomalies ..."
+query = f"SELECT ... FROM {settings.CLICKHOUSE_DB_LOGS}.http_logs ..."
+```
+Never hardcode database names in queries.
+
+**In Go (correlator)**, the database is part of the ClickHouse DSN (`clickhouse://user:pass@host:9000/ja4_logs`). The target table is configurable via YAML (`outputs.clickhouse.table`).
+
+**SQL migrations** live in `shared/clickhouse/` (10 ordered files). Deploy with `shared/clickhouse/deploy_schema.sh` which substitutes DB names from env vars.
+
+## Key conventions
+
+### Docker-first builds
+Every service has `Dockerfile` (prod), `Dockerfile.dev` or `Dockerfile.tests` (tests), and Go/C services have `Dockerfile.package` (RPM packaging via 3-stage: builder → rpmbuild × 3 distros → alpine output).
+
+### Go config: YAML + env vars
+- Sentinel: `config.yml`, env prefix `JA4SENTINEL_`
+- Correlator: `config.yml`, env prefix `LOGCORRELATOR_`
+- Both support `SIGHUP` for log rotation
+
+### Python config: pydantic-settings
+- Dashboard: `backend/config.py` → `Settings(BaseSettings)` with `.env` file
+- ja4_common: `ClickHouseSettings(BaseSettings)` — singleton at `settings`
+- bot-detector: exception — uses raw `os.getenv()`, not pydantic-settings
+
+### Dashboard route structure
+Every route file follows this pattern:
+```python
+from fastapi import APIRouter, HTTPException, Query
+from ..config import settings
+from ..database import db
+
+router = APIRouter()
+
+@router.get("/api/something")
+async def get_something():
+    query = f"SELECT ... FROM {settings.CLICKHOUSE_DB_PROCESSING}.table_name ..."
+    result = db.query(query)
+    ...
+```
+
+### RPM spec files
+Located at `services//packaging/rpm/.spec`. Version injected via `--define "build_version X.Y.Z"` at build time.
+
+### Inter-service communication
+Services communicate via **Unix sockets**, not HTTP:
+- `sentinel` → `/var/run/logcorrelator/network.socket` → `correlator` (source B: TLS/TCP data)
+- `mod-reqin-log` → `/var/run/logcorrelator/http.socket` → `correlator` (source A: HTTP data)
+- `correlator` → ClickHouse (batch inserts into `ja4_logs.http_logs_raw`)
+
+### Sentinel requires elevated privileges
+Tests need `--cap-add=NET_RAW --cap-add=NET_ADMIN` for packet capture (pcap).
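
Note on the Inter-service communication section: the Unix-socket hand-off it describes can be sketched in Python as below. This is an illustration only, not the project's code, since the real producers are the Go and C services. The patch does not state the socket type or the framing, so the sketch assumes a stream socket (`SOCK_STREAM`) carrying one JSON object per line. The field names `src_ip` and `method` and the helpers `send_event`, `read_events` and `demo` are hypothetical. It binds a temporary path rather than `/var/run/logcorrelator/http.socket`.

```python
import json
import os
import socket
import tempfile


def send_event(socket_path: str, event: dict) -> None:
    """Send one event as newline-terminated JSON (assumed framing)."""
    payload = json.dumps(event).encode("utf-8") + b"\n"
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(socket_path)
        client.sendall(payload)


def read_events(conn: socket.socket) -> list:
    """Read until the peer closes, then decode one JSON object per line."""
    chunks = []
    while True:
        chunk = conn.recv(4096)
        if not chunk:  # empty read means the sender closed its end
            break
        chunks.append(chunk)
    lines = b"".join(chunks).decode("utf-8").splitlines()
    return [json.loads(line) for line in lines if line]


def demo() -> list:
    # Throwaway socket path; the documented sockets live under
    # /var/run/logcorrelator/ and are owned by the correlator.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "http.socket")
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
            server.bind(path)
            server.listen(1)
            # connect() completes against the listen backlog, so a single
            # thread can send first and accept afterwards.
            send_event(path, {"src_ip": "192.0.2.10", "method": "GET"})
            conn, _ = server.accept()
            with conn:
                return read_events(conn)


if __name__ == "__main__":
    print(demo())
```

Running the module prints the list of events decoded on the receiving side. A producer that emits several events would call `send_event` once per event, or keep one connection open and write one line per event.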