docs: add .github/copilot-instructions.md for Copilot context
Covers: build/test/lint commands, architecture overview, ClickHouse dual-DB pattern, inter-service communication, key conventions for Go (hexagonal, YAML config), Python (pydantic-settings, FastAPI routes), C (Apache module), Docker-first builds, and RPM packaging.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
.github/copilot-instructions.md

# Copilot Instructions: ja4-platform

## What is this?

A monorepo for a JA4/JA3 TLS fingerprinting security pipeline. Five services capture network traffic, correlate logs, detect bots via ML, and present results in a SOC dashboard. All are backed by ClickHouse.

**Data flow:** `mod-reqin-log` (Apache HTTP logs) and `sentinel` (TLS/TCP capture) each write to `correlator` over a Unix socket; `correlator` → ClickHouse → `bot-detector` (ML scoring) → `dashboard` (FastAPI SOC UI)

## Build, test, lint

All builds run in Docker; no native Go/Python/C toolchain is required on the host.

```sh
# Full suite
make test-all                # run all tests (Docker)
make build-all               # build all service images
make rpm-all                 # build RPMs (sentinel, correlator, mod-reqin-log) for el8/el9/el10

# Per-service tests
make test-sentinel           # Go tests (needs --cap-add=NET_RAW --cap-add=NET_ADMIN inside)
make test-correlator         # Go tests with 60% coverage gate
make test-bot-detector       # Python pytest
make test-dashboard          # Python pytest
make test-ja4common-python   # Python pytest (shared lib)
make test-mod-reqin-log      # C cmocka tests

# Single Go test (from service dir, or via Docker):
docker run --rm -v "$(pwd)":/build -w /build/services/correlator golang:1.24 \
  go test -v -run TestConfigLoad ./internal/config/

# Single Python test (from repo root):
docker build -f services/dashboard/Dockerfile.tests -t dash-tests .
docker run --rm dash-tests pytest backend/tests/test_metrics.py -v -k test_health

# Linting (Go only; no Python linter configured)
cd services/sentinel && go vet ./... && gofmt -l .
cd services/correlator && go vet ./... && gofmt -l .
```

## Architecture

### Go workspace (`go.work`, Go 1.24.6)

Three modules in the workspace:
- `services/sentinel`: TLS/TCP packet capture daemon (gopacket/pcap, systemd)
- `services/correlator`: log correlation engine, hexagonal architecture
- `shared/go/ja4common`: shared logger, config, shutdown, ipfilter

Both services have a `replace` directive in their `go.mod` pointing to `../../shared/go/ja4common`. The workspace takes precedence for local dev; the `replace` is needed for Docker builds.

### Correlator hexagonal architecture

```
ports/source.go      → EventSource, CorrelatedLogSink, CorrelationProcessor interfaces
adapters/inbound/    → unixsocket (reads from sentinel + mod-reqin-log)
adapters/outbound/   → clickhouse, file, stdout, multi (fan-out wrapper)
domain/              → CorrelationService, CorrelatedLog, NormalizedEvent
app/                 → Orchestrator (wires everything together)
config/              → YAML config loader
```
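
To illustrate how an outbound port and the `multi` fan-out wrapper relate, here is a minimal Python sketch (the correlator itself is written in Go). The class names, the `write` method, and the toy `correlate` function are hypothetical stand-ins chosen for the example; they are not the correlator's actual API.

```python
from typing import Protocol


class CorrelatedLogSink(Protocol):
    """Outbound port: anything that can accept a correlated log record."""

    def write(self, record: dict) -> None: ...


class MemorySink:
    """Hypothetical adapter that keeps records in a list (stands in for file/stdout)."""

    def __init__(self) -> None:
        self.records: list[dict] = []

    def write(self, record: dict) -> None:
        self.records.append(record)


class MultiSink:
    """Fan-out wrapper: forwards each record to every wrapped sink."""

    def __init__(self, *sinks: CorrelatedLogSink) -> None:
        self._sinks = sinks

    def write(self, record: dict) -> None:
        for sink in self._sinks:
            sink.write(record)


def correlate(http_event: dict, network_event: dict) -> dict:
    """Toy domain step: merge two events. Real matching rules are not modelled."""
    return {**network_event, **http_event}


first, second = MemorySink(), MemorySink()
sink = MultiSink(first, second)
sink.write(correlate({"path": "/"}, {"ja4": "example"}))
```

Because the domain code only depends on the port, adding another output means writing one more adapter and handing it to the fan-out wrapper.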

### Python services

- `bot-detector`: scikit-learn IsolationForest + DBSCAN. Single monolithic module (`bot_detector.py`). Uses `os.getenv()` directly for config, NOT pydantic-settings.
- `dashboard`: FastAPI + React SPA. 20 route modules in `backend/routes/`. Uses pydantic-settings (`backend/config.py`).
- `shared/python/ja4_common`: `ClickHouseClient` singleton + `ClickHouseSettings` (pydantic-settings). Installed as a local package in each Python Dockerfile.

### C module

- `mod-reqin-log`: Apache HTTPD module (C11, built with `apxs`). Logs HTTP requests as JSON to a Unix socket. Tests use cmocka.

## ClickHouse dual-database pattern

Two configurable databases (env vars with defaults):

| Env var | Default | Contains |
|---------|---------|----------|
| `CLICKHOUSE_DB_LOGS` | `ja4_logs` | `http_logs_raw`, `http_logs`, `mv_http_logs` |
| `CLICKHOUSE_DB_PROCESSING` | `ja4_processing` | Aggregations, ML tables, views, dicts, audit |

**Cross-database references exist**: materialized views in one DB read from the other:
- `ja4_logs.mv_http_logs` references `ja4_processing.dict_anubis_*` and `ja4_processing.dict_iplocate_asn`
- `ja4_processing.mv_agg_*` reads `FROM ja4_logs.http_logs`

**In Python code**, always use fully qualified table names:
```python
from ..config import settings
query = f"SELECT ... FROM {settings.CLICKHOUSE_DB_PROCESSING}.ml_detected_anomalies ..."
query = f"SELECT ... FROM {settings.CLICKHOUSE_DB_LOGS}.http_logs ..."
```
Never hardcode database names in queries.
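
One way to keep queries free of hardcoded names is a small helper that builds the qualified name from configuration. The sketch below reads the two env vars from the table with `os.getenv()` so that it runs standalone; the `qualified` helper and its identifier check are illustrative assumptions, and dashboard code would read the same values from `settings` instead.

```python
import os
import re

# Defaults mirror the table above; real services read these through their config layer.
DB_LOGS = os.getenv("CLICKHOUSE_DB_LOGS", "ja4_logs")
DB_PROCESSING = os.getenv("CLICKHOUSE_DB_PROCESSING", "ja4_processing")

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def qualified(database: str, table: str) -> str:
    """Return `database.table`, rejecting anything that is not a plain identifier."""
    for name in (database, table):
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return f"{database}.{table}"


# Hypothetical query; the database part always comes from configuration.
query = f"SELECT count() FROM {qualified(DB_LOGS, 'http_logs')}"
```

The identifier check is optional, but it is a cheap guard when a database name arrives from the environment and ends up interpolated into SQL.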

**In Go (correlator)**, the database is part of the ClickHouse DSN (`clickhouse://user:pass@host:9000/ja4_logs`). The target table is configurable via YAML (`outputs.clickhouse.table`).
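
The DSN is an ordinary URL, so the database is simply its path component. A quick check with Python's standard library makes the structure visible (illustrative only; the correlator handles the DSN in Go, and how its driver parses it is not shown here):

```python
from urllib.parse import urlsplit

dsn = "clickhouse://user:pass@host:9000/ja4_logs"
parts = urlsplit(dsn)

# The path carries the database name, with a leading slash to strip.
database = parts.path.lstrip("/")
host, port = parts.hostname, parts.port
```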

**SQL migrations** live in `shared/clickhouse/` (10 ordered files). Deploy them with `shared/clickhouse/deploy_schema.sh`, which substitutes the DB names from env vars.
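
The substitution step can be pictured with the following Python sketch. The placeholder syntax used by `deploy_schema.sh` is not documented here, so the `${VAR}` form and the migration text below are assumptions made purely for illustration.

```python
import os
from string import Template

# Hypothetical migration text; the real files and their placeholder syntax may differ.
migration = Template(
    "CREATE DATABASE IF NOT EXISTS ${CLICKHOUSE_DB_LOGS};\n"
    "CREATE DATABASE IF NOT EXISTS ${CLICKHOUSE_DB_PROCESSING};"
)

# Fall back to the documented defaults when the env vars are unset.
names = {
    "CLICKHOUSE_DB_LOGS": os.getenv("CLICKHOUSE_DB_LOGS", "ja4_logs"),
    "CLICKHOUSE_DB_PROCESSING": os.getenv("CLICKHOUSE_DB_PROCESSING", "ja4_processing"),
}

rendered = migration.substitute(names)
```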

## Key conventions

### Docker-first builds
Every service has a `Dockerfile` (prod) and a `Dockerfile.dev` or `Dockerfile.tests` (tests). Go and C services also have a `Dockerfile.package` for RPM packaging, built in three stages: builder → rpmbuild × 3 distros → alpine output.

### Go config: YAML + env vars
- Sentinel: `config.yml`, env prefix `JA4SENTINEL_`
- Correlator: `config.yml`, env prefix `LOGCORRELATOR_`
- Both support `SIGHUP` for log rotation
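
As a rough picture of how a prefixed environment can be layered over YAML values, consider the Python sketch below. It assumes that env vars override YAML and that the remainder of the variable name maps to a lowercased key; neither rule is confirmed by this document, and the keys shown are hypothetical.

```python
def env_overrides(prefix: str, environ: dict[str, str]) -> dict[str, str]:
    """Collect variables that start with `prefix`, keyed by the lowercased remainder."""
    return {
        key[len(prefix):].lower(): value
        for key, value in environ.items()
        if key.startswith(prefix)
    }


# Hypothetical YAML values and environment, for illustration only.
yaml_values = {"log_level": "info", "socket_path": "/tmp/example.sock"}
environ = {"LOGCORRELATOR_LOG_LEVEL": "debug", "PATH": "/usr/bin"}

# Later entries win, so prefixed env vars take precedence over YAML here.
config = {**yaml_values, **env_overrides("LOGCORRELATOR_", environ)}
```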

### Python config: pydantic-settings
- Dashboard: `backend/config.py` → `Settings(BaseSettings)` with `.env` file
- ja4_common: `ClickHouseSettings(BaseSettings)`, with a singleton at `settings`
- bot-detector: the exception; uses raw `os.getenv()`, not pydantic-settings

### Dashboard route structure
Every route file follows this pattern:
```python
from fastapi import APIRouter, HTTPException, Query
from ..config import settings
from ..database import db

router = APIRouter()

@router.get("/api/something")
async def get_something():
    query = f"SELECT ... FROM {settings.CLICKHOUSE_DB_PROCESSING}.table_name ..."
    result = db.query(query)
    ...
```

### RPM spec files
Located at `services/<name>/packaging/rpm/<name>.spec`. Version injected via `--define "build_version X.Y.Z"` at build time.

### Inter-service communication
Services communicate via **Unix sockets**, not HTTP:
- `sentinel` → `/var/run/logcorrelator/network.socket` → `correlator` (source B: TLS/TCP data)
- `mod-reqin-log` → `/var/run/logcorrelator/http.socket` → `correlator` (source A: HTTP data)
- `correlator` → ClickHouse (batch inserts into `ja4_logs.http_logs_raw`)
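
The producer/consumer relationship over a Unix socket can be sketched in Python as follows. The stream socket type, the newline-delimited JSON framing, the event fields, and the temporary socket path are all assumptions for the example; the real services may frame their messages differently.

```python
import json
import os
import socket
import tempfile
import threading

# Temporary stand-in for a path such as /var/run/logcorrelator/http.socket.
socket_path = os.path.join(tempfile.mkdtemp(), "http.socket")

server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
server.bind(socket_path)
server.listen(1)

received = []


def accept_one() -> None:
    """Accept a single connection and read one newline-terminated JSON event."""
    conn, _ = server.accept()
    with conn, conn.makefile("r", encoding="utf-8") as stream:
        received.append(json.loads(stream.readline()))


listener = threading.Thread(target=accept_one)
listener.start()

# Producer side: connect and send one event (fields are made up for the example).
with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
    client.connect(socket_path)
    client.sendall(json.dumps({"method": "GET", "path": "/"}).encode() + b"\n")

listener.join()
server.close()
os.unlink(socket_path)
os.rmdir(os.path.dirname(socket_path))
```

A sketch like this can also serve as a local test double when working on one side of the pipeline without the other running.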

### Sentinel requires elevated privileges
Tests need `--cap-add=NET_RAW --cap-add=NET_ADMIN` for packet capture (pcap).