Services:
- ja4sentinel: TLS/JA4 fingerprint capture daemon (Go, libpcap)
- logcorrelator: JA4 log correlation engine (Go, ClickHouse)
- mod_reqin_log: Apache module (C, JSON request logging)
- bot_detector: ML bot detection pipeline (Python)
- dashboard: FastAPI/Streamlit analytics UI (Python)

Shared libraries:
- shared/go/ja4common: logger, config, shutdown, ipfilter (Go module)
- shared/python/ja4_common: ClickHouseClient, ClickHouseSettings (Python package)
- shared/clickhouse/: canonical SQL migrations (10 files)

Build & packaging:
- Unified 3-stage Dockerfile.package for Go RPMs (el8/el9/el10)
- go.work workspace linking sentinel, correlator, ja4common
- Makefile with test-all, build-all, rpm-* targets

Fixes applied:
- go.work: 1.21 → 1.24.6 (required by sentinel)
- correlator Dockerfiles: golang:1.21 → golang:1.24
- replace directives in go.mod for ja4common local path
- pyproject.toml: setuptools.backends → setuptools.build_meta
- Removed static libpcap linking (unavailable on Rocky 9)
- Fixed data races in output/writers_test.go (sync.Mutex + atomic.Int32)
- Rewrote corrupted test files (logger_test.go × 2)

Test coverage:
- correlator: 67.1% total (unixsocket 80.5%, config 91.7%, app 83.3%, multi 87.7%, stdout 100%)
- sentinel: all 10 packages pass (api, capture, config, fingerprint, ipfilter, logging, output, tlsparse)

Documentation:
- README.md + docs/ (architecture, development, 5 services, shared libs, DB schema & migrations)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot Instructions — Bot Detector Dashboard
Architecture Overview
This is a SOC (Security Operations Center) dashboard for visualizing bot detections from an upstream bot_detector_ai service. It is a single-service, full-stack app: the FastAPI backend serves the built React frontend as static files and exposes a REST API, all on port 8000. There is no separate frontend server in production and no authentication.
Data source: ClickHouse database (mabase_prod), primarily the ml_detected_anomalies table and the view_dashboard_entities view.
dashboard/
├── backend/ # Python 3.11 + FastAPI — REST API + static file serving
│ ├── main.py # App entry point: CORS, router registration, SPA catch-all
│ ├── config.py # pydantic-settings Settings, reads .env
│ ├── database.py # ClickHouseClient singleton (db)
│ ├── models.py # All Pydantic v2 response models
│ ├── routes/ # One module per domain: metrics, detections, variability,
│ │ # attributes, analysis, entities, incidents, audit, reputation
│ └── services/
│ └── reputation_ip.py # Async httpx → ip-api.com + ipinfo.io (no API keys)
└── frontend/ # React 18 + TypeScript 5 + Vite 5 + Tailwind CSS 3
└── src/
├── App.tsx # BrowserRouter + Sidebar + TopHeader + all Routes
├── ThemeContext.tsx # dark/light/auto, persisted to localStorage (key: soc_theme)
├── api/client.ts # Axios instance (baseURL: /api) + all TS interfaces
├── components/ # One component per route view + shared panels + ui/
├── hooks/ # useMetrics, useDetections, useVariability (polling wrappers)
└── utils/STIXExporter.ts
Dev Commands
# Backend (run from repo root)
pip install -r requirements.txt
python -m uvicorn backend.main:app --reload --host 0.0.0.0 --port 8000
# Frontend (separate terminal)
cd frontend && npm install
npm run dev # :3000 with HMR, proxies /api → localhost:8000
npm run build # tsc type-check + vite build → frontend/dist/
npm run preview # preview the production build
# Docker (production)
docker compose up -d dashboard_web
docker compose build dashboard_web && docker compose up -d dashboard_web
docker compose logs -f dashboard_web
There is no test suite or linter configured (no pytest, vitest, ESLint, Black, etc.).
# Manual smoke tests
curl http://localhost:8000/health
curl http://localhost:8000/api/metrics | jq '.summary'
curl "http://localhost:8000/api/detections?page=1&page_size=5" | jq '.items | length'
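Because there is no test suite, the manual curl checks above can also be scripted. The following is a minimal, dependency-free sketch using only the Python standard library. It assumes the backend is listening on `http://localhost:8000` and checks only what the curl/jq commands check: HTTP 200 from `/health`, a `summary` key in `/api/metrics`, and at most `page_size` entries in `items` from `/api/detections`. The function names are hypothetical and are not part of the repository.

```python
import json
import urllib.request
from typing import Any


def fetch_json(base_url: str, path: str, timeout: float = 5.0) -> Any:
    """GET base_url + path and decode the JSON body (urlopen raises on non-2xx)."""
    with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
        return json.load(resp)


def check_metrics(payload: Any) -> list[str]:
    """Mirror of: curl .../api/metrics | jq '.summary'"""
    if not isinstance(payload, dict) or "summary" not in payload:
        return ["/api/metrics: missing 'summary' key"]
    return []


def check_detections(payload: Any, page_size: int) -> list[str]:
    """Mirror of: curl '.../api/detections?page=1&page_size=N' | jq '.items | length'"""
    items = payload.get("items") if isinstance(payload, dict) else None
    if not isinstance(items, list):
        return ["/api/detections: 'items' is not a list"]
    if len(items) > page_size:
        return [f"/api/detections: {len(items)} items for page_size={page_size}"]
    return []


def run_smoke(base_url: str = "http://localhost:8000") -> list[str]:
    """Run every check against a live server; an empty list means all passed."""
    with urllib.request.urlopen(base_url + "/health", timeout=5.0) as resp:
        problems = [] if resp.status == 200 else [f"/health: HTTP {resp.status}"]
    problems += check_metrics(fetch_json(base_url, "/api/metrics"))
    problems += check_detections(
        fetch_json(base_url, "/api/detections?page=1&page_size=5"), page_size=5
    )
    return problems
```

With the backend running, `run_smoke()` returns an empty list when every check passes. The `check_*` helpers are kept separate from the HTTP calls so they can be exercised without a live server.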
Key Conventions
Backend
- All routes use raw SQL, with no ORM. Results are accessed by positional index, `result.result_rows[0][n]`, so column order is determined by the `SELECT` statement.
- Query parameters use `%(name)s` dict syntax: `db.query(sql, {"param": value})`.
- Every router module defines `router = APIRouter(prefix="/api/<domain>", tags=["..."])` and is registered in `main.py` via `app.include_router(...)`.
- The SPA catch-all (`/{full_path:path}`) must remain the last registered route in `main.py`. New routers must be added with `app.include_router()` before it.
- IPv4 addresses are stored as IPv6-mapped (`::ffff:x.x.x.x`) in `src_ip`; queries normalize them with `replaceRegexpAll(toString(src_ip), '^::ffff:', '')`.
- NULL guards: all row fields are coalesced, e.g. `row[n] or ""`, `row[n] or 0`, `row[n] or "LOW"`.
- `anomaly_score` can be negative in the DB; always normalize it with `abs()` for display.
- `analysis.py` stores SOC classifications in a `classifications` ClickHouse table.
- The `audit_logs` table is optional; routes silently return empty results if it is absent.
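A minimal sketch of these backend conventions working together, written without FastAPI or a live ClickHouse connection so it runs on its own. Only `src_ip`, `anomaly_score`, and `ml_detected_anomalies` come from this document; the `ja4`, `threat_level`, and `detected_at` columns, the helper names, and the use of a dataclass in place of a Pydantic model are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Any, Sequence

# Raw SQL with a %(name)s placeholder; the params dict travels alongside it:
#   result = db.query(DETECTIONS_SQL, {"limit": 5})
# Hypothetical columns: ja4, threat_level, detected_at.
DETECTIONS_SQL = """
SELECT
    replaceRegexpAll(toString(src_ip), '^::ffff:', '') AS ip,  -- row[0]
    ja4,                                                        -- row[1]
    anomaly_score,                                              -- row[2]
    threat_level                                                -- row[3]
FROM ml_detected_anomalies
ORDER BY detected_at DESC
LIMIT %(limit)s
"""


@dataclass
class Detection:
    # Stand-in for the Pydantic response model in backend/models.py.
    ip: str
    ja4: str
    anomaly_score: float
    threat_level: str


def row_to_detection(row: Sequence[Any]) -> Detection:
    """Map one positional row; indices must follow the SELECT order above."""
    return Detection(
        ip=row[0] or "",
        ja4=row[1] or "",
        anomaly_score=abs(row[2] or 0),  # scores may be negative in the DB
        threat_level=row[3] or "LOW",
    )


def rows_to_detections(result_rows: Sequence[Sequence[Any]]) -> list[Detection]:
    # Typical call site: rows_to_detections(result.result_rows)
    return [row_to_detection(row) for row in result_rows]
```

Because fields are read by position, reordering the `SELECT` list silently shifts every `row[n]`; keeping the index next to each column, as in the SQL comments above, makes that kind of drift easier to catch in review.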
Frontend
- API calls use the axios instance from `src/api/client.ts` (baseURL `/api`) or a direct `fetch('/api/...')`. There is no global state manager; components use `useState`/`useEffect` or custom hooks directly.
- TypeScript interfaces in `client.ts` mirror the Pydantic models in `backend/models.py`. Both must be kept in sync when changing data shapes.
- Tailwind uses semantic CSS-variable tokens. Always use `bg-background`, `bg-background-secondary`, `bg-background-card`, `text-text-primary`, `text-text-secondary`, `text-text-disabled`, `bg-accent-primary`, and `threat-critical/high/medium/low` rather than raw Tailwind color classes (e.g., `slate-800`). This keeps components compatible with both the dark and light themes.
- Threat level taxonomy: `CRITICAL` > `HIGH` > `MEDIUM` > `LOW`, always as uppercase strings; colors are red / orange / yellow / green respectively.
- URL encoding: entity values with special characters (JA4 fingerprints, subnets) are encoded with `encodeURIComponent`. Subnets use `_24` in place of `/24` (e.g., `/entities/subnet/141.98.11.0_24`).
- Recent investigations are stored in `localStorage` under `soc_recent_investigations` (max 8) and tracked by the `RouteTracker` component. Only the types `ip`, `ja4`, and `subnet` are tracked.
- Auto-refresh: metrics every 30 s, incidents every 60 s.
- French UI text: all user-facing strings and log messages are in French; code identifiers are in English.
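The frontend implements these rules in TypeScript. As an illustration kept in Python to match the backend, here is a sketch of the entity-path encoding and the threat ordering. `quote(value, safe="!*'()")` is used because it leaves unescaped the same set of characters as `encodeURIComponent` (letters, digits, and `-_.!~*'()`). The helper names and the `/entities/ja4/...` path are hypothetical, and applying the `/` → `_` substitution to prefix lengths other than `/24` is an assumption.

```python
from urllib.parse import quote, unquote

# quote() always keeps letters, digits and "_.-~"; adding these makes it
# match encodeURIComponent's unescaped set.
_URI_COMPONENT_SAFE = "!*'()"

# CRITICAL > HIGH > MEDIUM > LOW, always uppercase.
THREAT_RANK = {"CRITICAL": 3, "HIGH": 2, "MEDIUM": 1, "LOW": 0}


def entity_path(entity_type: str, value: str) -> str:
    """Build an entity route such as /entities/subnet/141.98.11.0_24."""
    if entity_type == "subnet":
        # "/24" becomes "_24" so the subnet stays a single path segment.
        value = value.replace("/", "_")
    return f"/entities/{entity_type}/{quote(value, safe=_URI_COMPONENT_SAFE)}"


def subnet_from_segment(segment: str) -> str:
    """Inverse for subnets: '141.98.11.0_24' -> '141.98.11.0/24'."""
    network, _, prefix = unquote(segment).rpartition("_")
    return f"{network}/{prefix}"


def sort_by_threat(levels: list[str]) -> list[str]:
    """Most severe first; unknown or lowercase levels raise KeyError."""
    return sorted(levels, key=THREAT_RANK.__getitem__, reverse=True)
```

Raising on an unknown level, rather than ranking it silently, surfaces any value that breaks the uppercase convention.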
Frontend → Backend in Dev vs Production
- Dev: the Vite dev server on `:3000` proxies `/api/*` to `http://localhost:8000` (see `vite.config.ts`).
- Production: the React SPA is served by FastAPI from `frontend/dist/`. API calls hit the same origin at `:8000`, so no proxy is needed.
Docker
- Single service using `network_mode: "host"`: no port mapping; the container shares the host network stack.
- Multi-stage Dockerfile: `node:20-alpine` builds the frontend → `python:3.11-slim` installs deps → the final image copies both.
Environment Variables (.env)
| Variable | Default | Description |
|---|---|---|
| `CLICKHOUSE_HOST` | `clickhouse` | ClickHouse hostname |
| `CLICKHOUSE_PORT` | `8123` | ClickHouse HTTP port (set in code) |
| `CLICKHOUSE_DB` | `mabase_prod` | Database name |
| `CLICKHOUSE_USER` | `admin` | ClickHouse username |
| `CLICKHOUSE_PASSWORD` | (empty) | ClickHouse password |
| `API_HOST` | `0.0.0.0` | Uvicorn bind host |
| `API_PORT` | `8000` | Uvicorn bind port |
| `CORS_ORIGINS` | `["http://localhost:3000", ...]` | Allowed CORS origins |
⚠️ The `.env` file contains real credentials; never commit it to public repositories.
ClickHouse Tables
| Table / View | Used by |
|---|---|
| `ml_detected_anomalies` | Primary source for detections, metrics, variability, analysis |
| `view_dashboard_entities` | User agents, client headers, paths, query params (entities routes) |
| `classifications` | SOC analyst classifications (created by `analysis.py`) |
| `mabase_prod.audit_logs` | Audit trail (optional; a missing table is handled silently) |
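The document states that a missing `audit_logs` table is handled silently but not how. One possible approach, sketched below, is to probe with ClickHouse's `EXISTS TABLE` statement (which returns a single row holding `1` or `0`) before querying, instead of catching every exception, which would also hide genuine query errors. The injected `query` callable stands in for `db.query` and, like the helper name, is hypothetical; the actual routes may catch the error instead.

```python
from collections.abc import Callable, Mapping, Sequence
from typing import Any

# Stand-in for the backend's db.query(sql, params) returning result rows.
QueryFn = Callable[[str, Mapping[str, Any]], Sequence[Sequence[Any]]]

AUDIT_TABLE = "mabase_prod.audit_logs"


def fetch_audit_rows(query: QueryFn, limit: int = 100) -> list[Sequence[Any]]:
    """Return audit rows, or [] when the optional table does not exist."""
    # EXISTS TABLE yields one row with a single UInt8: 1 if present, 0 if not.
    exists = query(f"EXISTS TABLE {AUDIT_TABLE}", {})
    if not exists or not exists[0][0]:
        return []
    return list(query(f"SELECT * FROM {AUDIT_TABLE} LIMIT %(limit)s", {"limit": limit}))
```

The existence probe costs one extra round trip per request; caching its result is possible but would miss a table created after startup.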