docs: add .github/copilot-instructions.md for Copilot context
Covers: build/test/lint commands, architecture overview, ClickHouse dual-DB pattern, inter-service communication, key conventions for Go (hexagonal, YAML config), Python (pydantic-settings, FastAPI routes), C (Apache module), Docker-first builds, and RPM packaging.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
138
.github/copilot-instructions.md
vendored
Normal file
@ -0,0 +1,138 @@
# Copilot Instructions — ja4-platform

## What is this?

A monorepo for a JA4/JA3 TLS fingerprinting security pipeline. Five services capture network traffic, correlate logs, detect bots via ML, and present results in a SOC dashboard. All backed by ClickHouse.

**Data flow:** `mod-reqin-log` (Apache HTTP logs) → unix socket → `correlator` ← unix socket ← `sentinel` (TLS/TCP capture) → ClickHouse → `bot-detector` (ML scoring) → `dashboard` (FastAPI SOC UI)

## Build, test, lint

All builds run in Docker — no native Go/Python/C toolchain required on the host.

```sh
# Full suite
make test-all                # run all tests (Docker)
make build-all               # build all service images
make rpm-all                 # build RPMs (sentinel, correlator, mod-reqin-log) for el8/el9/el10

# Per-service tests
make test-sentinel           # Go tests (needs --cap-add=NET_RAW inside)
make test-correlator         # Go tests with 60% coverage gate
make test-bot-detector       # Python pytest
make test-dashboard          # Python pytest
make test-ja4common-python   # Python pytest (shared lib)
make test-mod-reqin-log      # C cmocka tests

# Single Go test (from service dir, or via Docker):
docker run --rm -v $(pwd):/build -w /build/services/correlator golang:1.24 \
  go test -v -run TestConfigLoad ./internal/config/

# Single Python test (from repo root):
docker build -f services/dashboard/Dockerfile.tests -t dash-tests .
docker run --rm dash-tests pytest backend/tests/test_metrics.py -v -k test_health

# Linting (Go only — no Python linter configured)
cd services/sentinel && go vet ./... && gofmt -l .
cd services/correlator && go vet ./... && gofmt -l .
```

## Architecture

### Go workspace (`go.work`, Go 1.24.6)

Three modules in the workspace:
- `services/sentinel` — TLS/TCP packet capture daemon (gopacket/pcap, systemd)
- `services/correlator` — log correlation engine, hexagonal architecture
- `shared/go/ja4common` — shared logger, config, shutdown, ipfilter

Both services have a `replace` directive in their `go.mod` pointing to `../../shared/go/ja4common`. The workspace takes precedence for local dev; the `replace` is needed for Docker builds.

### Correlator hexagonal architecture

```
ports/source.go      → EventSource, CorrelatedLogSink, CorrelationProcessor interfaces
adapters/inbound/    → unixsocket (reads from sentinel + mod-reqin-log)
adapters/outbound/   → clickhouse, file, stdout, multi (fan-out wrapper)
domain/              → CorrelationService, CorrelatedLog, NormalizedEvent
app/                 → Orchestrator (wires everything together)
config/              → YAML config loader
```
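
The port/adapter split and the `multi` fan-out wrapper listed above can be sketched as follows. This is an illustration only: the correlator itself is written in Go, and the `write` method name, the `MemorySink` adapter, and the record fields are hypothetical, chosen to show the shape of the pattern rather than the real interfaces.

```python
from dataclasses import dataclass, field
from typing import Protocol


class CorrelatedLogSink(Protocol):
    """Outbound port: anything that can accept a correlated log record."""

    def write(self, log: dict) -> None: ...


@dataclass
class MemorySink:
    """Hypothetical in-memory adapter standing in for clickhouse/file/stdout."""

    records: list = field(default_factory=list)

    def write(self, log: dict) -> None:
        self.records.append(log)


class MultiSink:
    """Fan-out wrapper: forwards each record to every wrapped sink."""

    def __init__(self, *sinks: CorrelatedLogSink) -> None:
        self._sinks = sinks

    def write(self, log: dict) -> None:
        for sink in self._sinks:
            sink.write(log)


# The domain layer only sees one sink, regardless of how many are configured.
first, second = MemorySink(), MemorySink()
multi = MultiSink(first, second)
multi.write({"src_ip": "192.0.2.1", "ja4": "example"})
```

Because `MultiSink` satisfies the same port as the sinks it wraps, the orchestrator can swap a single output for several without the domain code changing.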

### Python services

- `bot-detector` — scikit-learn IsolationForest + DBSCAN. Single monolithic module (`bot_detector.py`). Uses `os.getenv()` directly for config, NOT pydantic-settings.
- `dashboard` — FastAPI + React SPA. 20 route modules in `backend/routes/`. Uses pydantic-settings (`backend/config.py`).
- `shared/python/ja4_common` — `ClickHouseClient` singleton + `ClickHouseSettings` (pydantic-settings). Installed as a local package in each Python Dockerfile.

### C module

- `mod-reqin-log` — Apache HTTPD module (C11, built with `apxs`). Logs HTTP requests as JSON to a Unix socket. Tests use cmocka.

## ClickHouse dual-database pattern

Two configurable databases (env vars with defaults):

| Env var | Default | Contains |
|---------|---------|----------|
| `CLICKHOUSE_DB_LOGS` | `ja4_logs` | `http_logs_raw`, `http_logs`, `mv_http_logs` |
| `CLICKHOUSE_DB_PROCESSING` | `ja4_processing` | Aggregations, ML tables, views, dicts, audit |

**Cross-database references exist** — materialized views in one DB read from the other:
- `ja4_logs.mv_http_logs` references `ja4_processing.dict_anubis_*` and `ja4_processing.dict_iplocate_asn`
- `ja4_processing.mv_agg_*` reads `FROM ja4_logs.http_logs`

**In Python code**, always use fully qualified table names:
```python
from ..config import settings
query = f"SELECT ... FROM {settings.CLICKHOUSE_DB_PROCESSING}.ml_detected_anomalies ..."
query = f"SELECT ... FROM {settings.CLICKHOUSE_DB_LOGS}.http_logs ..."
```
Never hardcode database names in queries.
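
As a minimal, self-contained sketch of the same rule outside the dashboard package, the snippet below reads the two env vars with the defaults from the table above and builds qualified names through one helper. The `qualified` function and its identifier check are assumptions added for illustration; they are not part of the repository.

```python
import os

# Env var names and defaults are the ones listed in the table above.
DB_LOGS = os.getenv("CLICKHOUSE_DB_LOGS", "ja4_logs")
DB_PROCESSING = os.getenv("CLICKHOUSE_DB_PROCESSING", "ja4_processing")


def qualified(database: str, table: str) -> str:
    """Return `database.table`, rejecting anything that is not a plain identifier."""
    for part in (database, table):
        if not part.isidentifier():
            raise ValueError(f"unsafe identifier: {part!r}")
    return f"{database}.{table}"


# Hypothetical query text; only the table reference matters here.
query = f"SELECT count() FROM {qualified(DB_LOGS, 'http_logs')}"
```

Routing every table reference through one helper keeps the database name configurable and gives a single place to reject values that should never reach a query string.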

**In Go (correlator)**, the database is part of the ClickHouse DSN (`clickhouse://user:pass@host:9000/ja4_logs`). The target table is configurable via YAML (`outputs.clickhouse.table`).

**SQL migrations** live in `shared/clickhouse/` (10 ordered files). Deploy with `shared/clickhouse/deploy_schema.sh` which substitutes DB names from env vars.
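
The substitution step can be pictured with the sketch below. It assumes the SQL files spell out the default database names literally (as in the cross-database references above) and rewrites them in a single pass; the actual `deploy_schema.sh` may use a different mechanism, and `render_sql` is a hypothetical name.

```python
import os
import re
from typing import Mapping, Optional

# Env var name -> default database name, as documented above.
DEFAULTS = {
    "CLICKHOUSE_DB_LOGS": "ja4_logs",
    "CLICKHOUSE_DB_PROCESSING": "ja4_processing",
}


def render_sql(sql: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Rewrite `default_db.` prefixes to the database names configured in env."""
    env = os.environ if env is None else env
    mapping = {default: env.get(var, default) for var, default in DEFAULTS.items()}
    # One combined pattern so a replaced name is never rewritten a second time.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\.")
    return pattern.sub(lambda m: mapping[m.group(1)] + ".", sql)


rendered = render_sql(
    "SELECT * FROM ja4_logs.http_logs",
    {"CLICKHOUSE_DB_LOGS": "prod_logs"},
)
```

Matching only names followed by a dot limits the rewrite to qualified table references, and the single-pass substitution stays correct even if the two configured names were swapped.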

## Key conventions

### Docker-first builds
Every service has `Dockerfile` (prod), `Dockerfile.dev` or `Dockerfile.tests` (tests), and Go/C services have `Dockerfile.package` (RPM packaging via 3-stage: builder → rpmbuild × 3 distros → alpine output).

### Go config: YAML + env vars
- Sentinel: `config.yml`, env prefix `JA4SENTINEL_`
- Correlator: `config.yml`, env prefix `LOGCORRELATOR_`
- Both support `SIGHUP` for log rotation

### Python config: pydantic-settings
- Dashboard: `backend/config.py` → `Settings(BaseSettings)` with `.env` file
- ja4_common: `ClickHouseSettings(BaseSettings)` — singleton at `settings`
- bot-detector: exception — uses raw `os.getenv()`, not pydantic-settings

### Dashboard route structure
Every route file follows this pattern:
```python
from fastapi import APIRouter, HTTPException, Query
from ..config import settings
from ..database import db

router = APIRouter()

@router.get("/api/something")
async def get_something():
    query = f"SELECT ... FROM {settings.CLICKHOUSE_DB_PROCESSING}.table_name ..."
    result = db.query(query)
    ...
```

### RPM spec files
Located at `services/<name>/packaging/rpm/<name>.spec`. Version injected via `--define "build_version X.Y.Z"` at build time.

### Inter-service communication
Services communicate via **Unix sockets**, not HTTP:
- `sentinel` → `/var/run/logcorrelator/network.socket` → `correlator` (source B: TLS/TCP data)
- `mod-reqin-log` → `/var/run/logcorrelator/http.socket` → `correlator` (source A: HTTP data)
- `correlator` → ClickHouse (batch inserts into `ja4_logs.http_logs_raw`)
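
For local experiments, a producer on one of these sockets might look like the sketch below. The wire format is an assumption: the text above only says that events travel as JSON over Unix sockets, so the datagram socket type, the one-object-per-message framing, and the field names are hypothetical and should be checked against the correlator's inbound adapter.

```python
import json
import socket


def send_event(socket_path: str, event: dict) -> None:
    """Send one JSON-encoded event as a single datagram to a Unix socket."""
    payload = json.dumps(event).encode("utf-8")
    # Assumption: the listener uses SOCK_DGRAM; a stream listener would need connect().
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as client:
        client.sendto(payload, socket_path)
```

A call such as `send_event("/var/run/logcorrelator/http.socket", {"src_ip": "192.0.2.1"})` would then emulate a single source A event, provided the listener is running and the assumptions above hold.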

### Sentinel requires elevated privileges
Tests need `--cap-add=NET_RAW --cap-add=NET_ADMIN` for packet capture (pcap).