docs: update copilot-instructions.md for v14 changes

- Fix coverage gate: 60% → 80% for correlator
- Document dual-model pattern (Complet/Applicatif) in bot-detector
- Add SQL deployment paths: deploy_views.sql + service migrations
- Add data retention TTL table with partition info
- Fix integration test description (8 phases, --build-only flag)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
.github/copilot-instructions.md (vendored): 30 lines changed, 26 additions and 4 deletions

    @@ -18,7 +18,7 @@ make rpm-all # build RPMs (sentinel, correlator, mod-reqin-log) for el8/
     
     # Per-service tests
     make test-sentinel # Go tests (needs --cap-add=NET_RAW inside)
    -make test-correlator # Go tests with 60% coverage gate
    +make test-correlator # Go tests with 80% coverage gate
     make test-bot-detector # Python pytest
     make test-dashboard # Python pytest
     make test-ja4common-python # Python pytest (shared lib)
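
The commit raises the correlator's coverage gate from 60% to 80% but does not show how `make test-correlator` enforces it. A minimal Python sketch of such a gate, assuming it reads the `total:` line that `go tool cover -func` prints and fails below the threshold; `total_coverage`, `passes_gate` and the sample output are illustrative, not taken from the repository's Makefile.

```python
import re

# Assumed threshold, mirroring the "80% coverage gate" comment in the diff.
COVERAGE_THRESHOLD = 80.0

# `go tool cover -func` ends with a summary line such as
# "total:\t\t\t(statements)\t\t81.3%".
_TOTAL_RE = re.compile(r"^total:\s+\(statements\)\s+([0-9.]+)%\s*$", re.MULTILINE)


def total_coverage(cover_func_output: str) -> float:
    """Extract overall statement coverage from `go tool cover -func` output."""
    match = _TOTAL_RE.search(cover_func_output)
    if match is None:
        raise ValueError("no 'total:' line found in coverage output")
    return float(match.group(1))


def passes_gate(cover_func_output: str, threshold: float = COVERAGE_THRESHOLD) -> bool:
    """Return True when total coverage meets or exceeds the threshold."""
    return total_coverage(cover_func_output) >= threshold


if __name__ == "__main__":
    # Hypothetical output, for illustration only.
    sample = (
        "example.com/correlator/config/loader.go:12:\tLoad\t\t90.0%\n"
        "total:\t\t\t\t\t(statements)\t\t78.4%\n"
    )
    print(total_coverage(sample), passes_gate(sample))  # 78.4 False
```

The comparison is inclusive (exactly 80.0% passes); whether the real gate behaves the same way is not stated in the commit.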

    @@ -40,9 +40,10 @@ cd services/sentinel && go vet ./... && gofmt -l .
     cd services/correlator && go vet ./... && gofmt -l .
     
     # Full-stack integration tests (Docker Compose, resets DB each run)
    -make test-integration # runs tests/integration/run-tests.sh → down -v + up + traffic + verify
    +make test-integration # 8 phases: build → start → schema → traffic → pipeline → dashboard → bot-detector → sentinel
     make test-integration-keep # same but leaves stack running after
     make test-integration-down # tear down integration stack
    +# run-tests.sh also accepts: --build-only (build images without running tests)
     ```
     
     ## Architecture
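
The updated comment lists eight integration phases in a fixed order, and `run-tests.sh` also accepts `--build-only`. The sketch below models that control flow in Python, assuming phases run strictly in sequence, a failing phase stops the run, and `--build-only` executes only the build phase. `select_phases`, `run` and the per-phase callables are hypothetical; the real runner is a shell script whose internals are not part of this diff.

```python
from typing import Callable, Dict, List

# Phase order as listed in the updated `make test-integration` comment.
PHASES: List[str] = [
    "build", "start", "schema", "traffic",
    "pipeline", "dashboard", "bot-detector", "sentinel",
]


def select_phases(build_only: bool) -> List[str]:
    """Return the phases to run; --build-only keeps only the image build."""
    return PHASES[:1] if build_only else list(PHASES)


def run(actions: Dict[str, Callable[[], bool]], build_only: bool = False) -> List[str]:
    """Run phases in order, stopping at the first failure.

    `actions` maps each phase name to a hypothetical callable returning True
    on success. Returns the names of the phases that completed successfully.
    """
    completed: List[str] = []
    for phase in select_phases(build_only):
        if not actions[phase]():
            break
        completed.append(phase)
    return completed


if __name__ == "__main__":
    ok = {name: (lambda: True) for name in PHASES}
    print(run(ok, build_only=True))  # ['build']
    print(len(run(ok)))  # 8
```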

    @@ -69,7 +70,12 @@ config/ → YAML config loader
     
     ### Python services
     
    -- `bot-detector` — scikit-learn IsolationForest + DBSCAN. Single monolithic module (`bot_detector.py`). Uses `os.getenv()` directly for config, NOT pydantic-settings.
    +- `bot-detector` — scikit-learn IsolationForest + DBSCAN + SHAP. Single monolithic module (`bot_detector.py`). Runs **two parallel IF models** per cycle:
    +- `Complet` (45 features L3→L7) on correlated traffic (`correlated=1`, TCP+TLS+HTTP)
    +- `Applicatif` (35 features L7-only) on uncorrelated traffic (`correlated=0`)
    +- Optional 24h variants (`Complet_24h`/`Applicatif_24h`) when `ENABLE_MULTIWINDOW=true`
    +
    +`model_name` is part of the ORDER BY key in both `ml_detected_anomalies` and `ml_all_scores`. Uses `os.getenv()` directly for config, NOT pydantic-settings.
     - `dashboard` — FastAPI + React SPA. 20 route modules in `backend/routes/`. Uses pydantic-settings (`backend/config.py`).
     - `shared/python/ja4_common` — `ClickHouseClient` singleton + `ClickHouseSettings` (pydantic-settings). Installed as a local package in each Python Dockerfile.
     
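
The dual-model pattern splits each cycle's traffic on the `correlated` flag before scoring. A dependency-free sketch of that routing, assuming rows arrive as dicts carrying `correlated` as 1 or 0 and that the optional 24h variants apply the same split to a 24-hour window; the helper names and row shape are hypothetical, and the IsolationForest fitting itself is omitted. `ENABLE_MULTIWINDOW` is read with `os.getenv()`, matching the convention stated in the diff.

```python
import os
from typing import Dict, List

Row = Dict[str, object]


def multiwindow_enabled() -> bool:
    # Config comes straight from the environment (os.getenv), not pydantic-settings.
    return os.getenv("ENABLE_MULTIWINDOW", "false").strip().lower() == "true"


def active_models() -> List[str]:
    """Model names scored in one cycle; the 24h pair is optional."""
    names = ["Complet", "Applicatif"]
    if multiwindow_enabled():
        names += ["Complet_24h", "Applicatif_24h"]
    return names


def model_for_row(row: Row, suffix: str = "") -> str:
    """correlated=1 (TCP+TLS+HTTP) maps to Complet; anything else to L7-only Applicatif."""
    base = "Complet" if row.get("correlated") == 1 else "Applicatif"
    return base + suffix


def route_rows(rows: List[Row], suffix: str = "") -> Dict[str, List[Row]]:
    """Group rows into per-model batches keyed by model name.

    Pass suffix="_24h" for rows drawn from the (assumed) 24-hour window.
    """
    batches: Dict[str, List[Row]] = {}
    for row in rows:
        batches.setdefault(model_for_row(row, suffix), []).append(row)
    return batches


if __name__ == "__main__":
    sample = [{"ip": "a", "correlated": 1}, {"ip": "b", "correlated": 0}]
    print(active_models())
    print({name: len(batch) for name, batch in route_rows(sample).items()})
```

Keying batches by model name presumably reflects why `model_name` sits in the ORDER BY key: the same traffic window can be scored once per model, and those result rows must stay distinguishable.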

    @@ -100,7 +106,23 @@ Never hardcode database names in queries.
     
     **In Go (correlator)**, the database is part of the ClickHouse DSN (`clickhouse://user:pass@host:9000/ja4_logs`). The target table is configurable via YAML (`outputs.clickhouse.table`).
     
    -**SQL migrations** live in `shared/clickhouse/` (10 ordered files). Deploy with `shared/clickhouse/deploy_schema.sh` which substitutes DB names from env vars.
    +**SQL schema** has two deployment paths:
    +- **Base schema**: `shared/clickhouse/` (10 ordered files). Deploy with `shared/clickhouse/deploy_schema.sh` which substitutes DB names from env vars.
    +- **Bot-detector views**: `services/bot-detector/deploy_views.sql` — aggregation tables, MVs, ML result tables, dashboard views. Version-controlled separately (currently v14).
    +- **Post-deploy migrations**: `services/correlator/sql/migrations/` — ALTER TABLE statements for existing deployments (TTL changes, ORDER BY fixes). Run manually: `clickhouse-client --multiquery < file.sql`.
    +
    +### Data retention (TTL)
    +
    +| Table | TTL | Partition |
    +|-------|-----|-----------|
    +| `http_logs_raw` | 2 hours | `toStartOfHour(ingest_time)` |
    +| `http_logs` | 30 days | `toDate(log_date)` |
    +| `agg_host_ip_ja4_1h` | 7 days | `toDate(window_start)` |
    +| `agg_header_fingerprint_1h` | 7 days | `toDate(window_start)` |
    +| `ml_detected_anomalies` | 7 days | `toDate(detected_at)` |
    +| `ml_all_scores` | 7 days | `toDate(window_start)` |
    +
    +All aggregation/ML tables use `ttl_only_drop_parts=1` for efficient partition-level expiry.
    +
     ## Key conventions
     
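
With `ttl_only_drop_parts=1`, ClickHouse removes a data part only once every row in it has expired, instead of rewriting parts to delete individual rows. Because a part never spans partitions, a whole partition is certain to be eligible for dropping once its newest possible row has passed the TTL. A sketch of that bound for the retention table in this commit, assuming each table's TTL is computed from the same column as its partition key (e.g. `window_start + INTERVAL 7 DAY`); the `RETENTION` map simply restates the table, and actual removal still happens asynchronously in background merges.

```python
from datetime import datetime, timedelta
from typing import Dict, Tuple

# (TTL, partition granularity) per table, restated from the retention table.
# Assumption: each TTL is computed from the column the partition key uses.
RETENTION: Dict[str, Tuple[timedelta, timedelta]] = {
    "http_logs_raw": (timedelta(hours=2), timedelta(hours=1)),  # toStartOfHour
    "http_logs": (timedelta(days=30), timedelta(days=1)),  # toDate
    "agg_host_ip_ja4_1h": (timedelta(days=7), timedelta(days=1)),
    "agg_header_fingerprint_1h": (timedelta(days=7), timedelta(days=1)),
    "ml_detected_anomalies": (timedelta(days=7), timedelta(days=1)),
    "ml_all_scores": (timedelta(days=7), timedelta(days=1)),
}


def partition_expiry_bound(table: str, partition_start: datetime) -> datetime:
    """Time by which every row in the partition has passed its TTL.

    Rows in the partition are strictly earlier than partition_start + granularity,
    so none can outlive partition_start + granularity + ttl.
    """
    ttl, granularity = RETENTION[table]
    return partition_start + granularity + ttl


def fully_expired(table: str, partition_start: datetime, now: datetime) -> bool:
    """True once, by this bound, every part of the partition qualifies for dropping."""
    return now >= partition_expiry_bound(table, partition_start)


if __name__ == "__main__":
    day = datetime(2024, 1, 1)
    print(partition_expiry_bound("ml_all_scores", day))  # 2024-01-09 00:00:00
    print(fully_expired("http_logs", day, datetime(2024, 1, 20)))  # False
```

The bound is conservative: for hourly aggregates whose `window_start` is truncated to the hour, the last rows of a day actually expire an hour earlier than computed here.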