feat: ja4-platform monorepo — 5 services unified, tests & RPM builds standardized

Services:
- ja4sentinel: TLS/JA4 fingerprint capture daemon (Go, libpcap)
- logcorrelator: JA4 log correlation engine (Go, ClickHouse)
- mod_reqin_log: Apache module (C, JSON request logging)
- bot_detector: ML bot detection pipeline (Python)
- dashboard: FastAPI/Streamlit analytics UI (Python)

Shared libraries:
- shared/go/ja4common: logger, config, shutdown, ipfilter (Go module)
- shared/python/ja4_common: ClickHouseClient, ClickHouseSettings (Python package)
- shared/clickhouse/: canonical SQL migrations (10 files)

Build & packaging:
- Unified 3-stage Dockerfile.package for Go RPMs (el8/el9/el10)
- go.work workspace linking sentinel, correlator, ja4common
- Makefile with test-all, build-all, rpm-* targets

Fixes applied:
- go.work: 1.21 → 1.24.6 (required by sentinel)
- correlator Dockerfiles: golang:1.21 → golang:1.24
- replace directives in go.mod for ja4common local path
- pyproject.toml: setuptools.backends → setuptools.build_meta
- Removed static libpcap linking (unavailable on Rocky 9)
- Fixed data races in output/writers_test.go (sync.Mutex + atomic.Int32)
- Rewrote corrupted test files (logger_test.go × 2)

Test coverage:
- correlator: 67.1% total (unixsocket 80.5%, config 91.7%, app 83.3%, multi 87.7%, stdout 100%)
- sentinel: all 10 packages pass (api, capture, config, fingerprint, ipfilter, logging, output, tlsparse)

Documentation:
- README.md + docs/ (architecture, development, 5 services, shared libs, DB schema & migrations)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-07 16:42:59 +02:00
commit d469e39da7
278 changed files with 1621301 additions and 0 deletions
--- a/docs/services/dashboard.md
+++ b/docs/services/dashboard.md
@@ -0,0 +1,308 @@
+# Dashboard
+
+The dashboard is a SOC (Security Operations Center) web application built with FastAPI (backend) and React (frontend) that provides real-time visualization, investigation, and analysis of bot detections generated by the [bot-detector](bot-detector.md). It queries ClickHouse (`mabase_prod`) for all data.
+
+## Technology Stack
+
+| Component | Technology |
+|-----------|-----------|
+| Backend | Python 3.11 + FastAPI |
+| Frontend | React + Vite |
+| Database | ClickHouse (via `ja4_common` shared client) |
+| API Docs | Swagger UI (`/docs`) and ReDoc (`/redoc`) |
+
+## Configuration
+
+| Variable | Type | Default | Description |
+|----------|------|---------|-------------|
+| `CLICKHOUSE_HOST` | string | `clickhouse` | ClickHouse hostname |
+| `CLICKHOUSE_PORT` | int | `8123` | ClickHouse HTTP port |
+| `CLICKHOUSE_DB` | string | `mabase_prod` | Database name |
+| `CLICKHOUSE_USER` | string | `admin` | ClickHouse user |
+| `CLICKHOUSE_PASSWORD` | string | `""` | ClickHouse password |
+| `API_HOST` | string | `0.0.0.0` | API listen address |
+| `API_PORT` | int | `8000` | API listen port |
+| `CORS_ORIGINS` | list | `["http://localhost:3000", "http://127.0.0.1:3000"]` | Allowed CORS origins |
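The following is a minimal sketch of loading these variables with their documented defaults, using only the standard library. The names `DashboardSettings` and `load_settings` are hypothetical, because the real service may build its ClickHouse settings through `ja4_common`. Parsing `CORS_ORIGINS` as a JSON array is also an assumption, since the table only gives its type as "list".

```python
import json
import os
from dataclasses import dataclass, field
from typing import List, Mapping, Optional

# Defaults mirror the configuration table; class and function names are hypothetical.
DEFAULT_CORS = ["http://localhost:3000", "http://127.0.0.1:3000"]


@dataclass
class DashboardSettings:
    clickhouse_host: str = "clickhouse"
    clickhouse_port: int = 8123
    clickhouse_db: str = "mabase_prod"
    clickhouse_user: str = "admin"
    clickhouse_password: str = ""
    api_host: str = "0.0.0.0"
    api_port: int = 8000
    cors_origins: List[str] = field(default_factory=lambda: list(DEFAULT_CORS))


def load_settings(env: Optional[Mapping[str, str]] = None) -> DashboardSettings:
    """Build settings from environment variables, falling back to the defaults."""
    env = os.environ if env is None else env
    # Assumption: CORS_ORIGINS is a JSON array, e.g. '["https://soc.example"]'.
    cors_raw = env.get("CORS_ORIGINS")
    cors = json.loads(cors_raw) if cors_raw else list(DEFAULT_CORS)
    return DashboardSettings(
        clickhouse_host=env.get("CLICKHOUSE_HOST", "clickhouse"),
        clickhouse_port=int(env.get("CLICKHOUSE_PORT", "8123")),
        clickhouse_db=env.get("CLICKHOUSE_DB", "mabase_prod"),
        clickhouse_user=env.get("CLICKHOUSE_USER", "admin"),
        clickhouse_password=env.get("CLICKHOUSE_PASSWORD", ""),
        api_host=env.get("API_HOST", "0.0.0.0"),
        api_port=int(env.get("API_PORT", "8000")),
        cors_origins=cors,
    )
```

Passing an explicit mapping, for example `load_settings({"CLICKHOUSE_HOST": "ch01"})`, keeps the loader testable without changing the process environment.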
+
+## API Reference
+
+All endpoints except `/health` are prefixed with `/api/`. The dashboard exposes **74+ endpoints** across 20 routers.
+
+### Health
+
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/health` | Health check — returns ClickHouse connection status |
+
+---
+
+### Metrics (`/api/metrics`)
+
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/metrics` | Global dashboard metrics: detection counts by threat level, unique IPs, time series |
+| GET | `/api/metrics/threats` | Threat distribution summary |
+| GET | `/api/metrics/baseline` | Human baseline statistics |
+
+---
+
+### Detections (`/api/detections`)
+
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/detections` | Paginated detection list with filtering, sorting, and text search |
+| GET | `/api/detections/{detection_id}` | Single detection details |
+
+**Query Parameters** (GET `/api/detections`):
+
+| Parameter | Type | Description |
+|-----------|------|-------------|
+| `page` | int | Page number (default: 1) |
+| `page_size` | int | Items per page (default: 20) |
+| `threat_level` | string | Filter by threat level |
+| `model_name` | string | Filter by model name |
+| `search` | string | Full-text search across IP, JA4, host, bot_name |
+| `sort_by` | string | Sort field |
+| `sort_order` | string | `asc` or `desc` |
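A client can combine these parameters into a single request URL. The sketch below is a hypothetical helper built on `urllib.parse.urlencode`. It omits any filter that is not set, so the server's defaults apply. The valid values for `threat_level` and `sort_by` are not listed here, so the helper passes them through unchanged.

```python
from typing import Optional
from urllib.parse import urlencode


def detections_url(
    base_url: str,
    page: int = 1,
    page_size: int = 20,
    threat_level: Optional[str] = None,
    model_name: Optional[str] = None,
    search: Optional[str] = None,
    sort_by: Optional[str] = None,
    sort_order: Optional[str] = None,
) -> str:
    """Build a GET /api/detections URL (hypothetical client helper)."""
    if sort_order is not None and sort_order not in ("asc", "desc"):
        raise ValueError("sort_order must be 'asc' or 'desc'")
    params = {
        "page": page,
        "page_size": page_size,
        "threat_level": threat_level,
        "model_name": model_name,
        "search": search,
        "sort_by": sort_by,
        "sort_order": sort_order,
    }
    # Drop unset filters; urlencode handles escaping of search text.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{base_url.rstrip('/')}/api/detections?{query}"
```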
+
+---
+
+### Investigation (`/api/investigation`)
+
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/investigation/{ip}/summary` | **Primary investigation endpoint.** Aggregates ML score, brute-force, TCP spoofing, JA4 rotation, persistence, and 24h timeline into a single response with a `risk_score` (0–100) |
+
+---
+
+### Reputation (`/api/reputation`)
+
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/reputation/ip/{ip_address}` | Full IP reputation from IP-API.com and IPinfo.io (proxy, VPN, Tor, hosting detection) |
+| GET | `/api/reputation/ip/{ip_address}/summary` | Simplified reputation summary |
+
+---
+
+### Analysis (`/api/analysis`)
+
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/analysis/{ip}/subnet` | Subnet analysis for an IP (related IPs in the same /24) |
+| GET | `/api/analysis/{ip}/country` | Country-level analysis for an IP |
+| GET | `/api/analysis/country` | Global country analysis across all detections |
+| GET | `/api/analysis/{ip}/ja4` | JA4 fingerprint analysis for an IP |
+| GET | `/api/analysis/{ip}/user-agents` | User-agent analysis for an IP |
+| GET | `/api/analysis/{ip}/recommendation` | SOC classification recommendation |
+| POST | `/api/analysis/classifications` | Create a classification (legitimate/suspicious/malicious) |
+| GET | `/api/analysis/classifications` | List all classifications |
+| GET | `/api/analysis/classifications/stats` | Classification statistics |
+
+---
+
+### Entities (`/api/entities`)
+
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/entities/types` | List available entity types |
+| GET | `/api/entities/subnet/{subnet}` | Investigate a subnet |
+| GET | `/api/entities/{entity_type}/{entity_value}` | Investigate any entity (IP, JA4, subnet, UA, host) |
+| GET | `/api/entities/{entity_type}/{entity_value}/related` | Related entities |
+| GET | `/api/entities/{entity_type}/{entity_value}/user_agents` | User-agents for entity |
+| GET | `/api/entities/{entity_type}/{entity_value}/client_headers` | Client headers for entity |
+| GET | `/api/entities/{entity_type}/{entity_value}/paths` | URL paths for entity |
+| GET | `/api/entities/{entity_type}/{entity_value}/query_params` | Query parameters for entity |
+
+---
+
+### Incidents (`/api/incidents`)
+
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/incidents` | List all incidents |
+| GET | `/api/incidents/clusters` | Active incident clusters (behavioral similarity grouping) |
+| GET | `/api/incidents/{cluster_id}` | Incident cluster details |
+| POST | `/api/incidents/{cluster_id}/classify` | Classify an incident cluster |
+
+---
+
+### Fingerprints (`/api/fingerprints`)
+
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/fingerprints/spoofing` | TLS fingerprint spoofing detection |
+| GET | `/api/fingerprints/ja4-ua-matrix` | JA4 ↔ User-Agent correlation matrix |
+| GET | `/api/fingerprints/ua-analysis` | Suspicious user-agent analysis |
+| GET | `/api/fingerprints/ip/{ip}/coherence` | Fingerprint coherence analysis per IP |
+| GET | `/api/fingerprints/legitimate-ja4` | Known legitimate JA4 fingerprints |
+| GET | `/api/fingerprints/asn-correlation` | JA4-ASN correlation analysis |
+
+---
+
+### Brute Force (`/api/bruteforce`)
+
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/bruteforce/targets` | Brute-force target hosts |
+| GET | `/api/bruteforce/attackers` | Brute-force attacker IPs |
+| GET | `/api/bruteforce/timeline` | Brute-force attack timeline |
+| GET | `/api/bruteforce/host/{host}/attackers` | Attackers for a specific host |
+
+---
+
+### TCP Spoofing (`/api/tcp-spoofing`)
+
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/tcp-spoofing/overview` | TCP/OS fingerprint spoofing overview |
+| GET | `/api/tcp-spoofing/list` | Spoofing detection list |
+| GET | `/api/tcp-spoofing/matrix` | TTL × MSS anomaly matrix |
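A TTL-based spoofing check usually compares the observed IP TTL with the initial TTL of the operating system the client claims to run. Common initial values are 64 for Linux, macOS and Android, and 128 for Windows. The sketch below shows only that general heuristic. It is not the dashboard's implementation. The OS table and `ttl_mismatch` are illustrative assumptions, and MSS is left out for brevity.

```python
from typing import Dict

# Common initial TTLs; an observed TTL is roughly the initial value minus the hop count.
INITIAL_TTLS = (32, 64, 128, 255)

# Illustrative mapping from a claimed OS family (e.g. parsed from the User-Agent).
EXPECTED_INITIAL_TTL: Dict[str, int] = {
    "linux": 64, "macos": 64, "ios": 64, "android": 64, "windows": 128,
}


def initial_ttl(observed: int) -> int:
    """Guess the sender's initial TTL as the smallest common value >= observed."""
    if not 0 < observed <= 255:
        raise ValueError("TTL must be in 1..255")
    return next(t for t in INITIAL_TTLS if observed <= t)


def ttl_mismatch(observed_ttl: int, claimed_os: str) -> bool:
    """True when the TTL-derived OS family disagrees with the claimed OS."""
    expected = EXPECTED_INITIAL_TTL.get(claimed_os.lower())
    if expected is None:
        return False  # unknown OS family: no verdict
    return initial_ttl(observed_ttl) != expected
```

For example, a client with a Windows User-Agent whose packets arrive with TTL 57 most likely started at 64, which points to a Unix-like TCP stack.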
+
+---
+
+### Header Fingerprint (`/api/headers`)
+
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/headers/clusters` | Header fingerprint clusters (suspicious patterns) |
+| GET | `/api/headers/cluster/{hash}/ips` | IPs sharing a header fingerprint |
+
+---
+
+### Heatmap (`/api/heatmap`)
+
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/heatmap/hourly` | Hourly traffic heatmap |
+| GET | `/api/heatmap/top-hosts` | Top hosts by traffic volume |
+| GET | `/api/heatmap/matrix` | Activity/hour matrix |
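Hourly heatmaps reduce event timestamps to (day, hour) cells. The sketch below shows that bucketing in memory. It is not the dashboard's ClickHouse query. The weekday × hour layout, the UTC normalization, and the `hourly_matrix` name are assumptions.

```python
from datetime import datetime, timezone
from typing import Iterable, List


def hourly_matrix(timestamps: Iterable[datetime]) -> List[List[int]]:
    """Count events into a 7x24 grid: rows are weekdays (Monday=0), columns are UTC hours."""
    grid = [[0] * 24 for _ in range(7)]
    for ts in timestamps:
        # Treat naive datetimes as UTC; convert timezone-aware ones to UTC.
        ts = ts.replace(tzinfo=timezone.utc) if ts.tzinfo is None else ts.astimezone(timezone.utc)
        grid[ts.weekday()][ts.hour] += 1
    return grid
```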
+
+---
+
+### Botnets (`/api/botnets`)
+
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/botnets/ja4-spread` | JA4 geographic spread (botnet indicator) |
+| GET | `/api/botnets/ja4/{ja4}/countries` | Country distribution for a JA4 fingerprint |
+| GET | `/api/botnets/summary` | Global botnet detection summary |
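JA4 spread treats one client fingerprint seen from many countries as a sign of distributed automation. The sketch below runs that aggregation over in-memory rows. The `(ja4, ip, country_code)` tuple layout and the default threshold of three countries are assumptions, not values taken from the dashboard.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def ja4_spread(
    observations: Iterable[Tuple[str, str, str]], min_countries: int = 3
) -> List[Tuple[str, int, int]]:
    """Return (ja4, distinct_countries, distinct_ips) for widely spread fingerprints."""
    countries: Dict[str, Set[str]] = defaultdict(set)
    ips: Dict[str, Set[str]] = defaultdict(set)
    for ja4, ip, country in observations:
        countries[ja4].add(country)
        ips[ja4].add(ip)
    ranked = [
        (ja4, len(countries[ja4]), len(ips[ja4]))
        for ja4 in countries
        if len(countries[ja4]) >= min_countries
    ]
    # Widest geographic spread first; ties broken by IP count, then by fingerprint.
    ranked.sort(key=lambda row: (-row[1], -row[2], row[0]))
    return ranked
```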
+
+---
+
+### Rotation (`/api/rotation`)
+
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/rotation/ja4-rotators` | IPs rotating JA4 fingerprints (evasion detection) |
+| GET | `/api/rotation/persistent-threats` | Persistent threats across time windows |
+| GET | `/api/rotation/ip/{ip}/ja4-history` | JA4 fingerprint history for an IP |
+| GET | `/api/rotation/sophistication` | Sophistication score analysis |
+| GET | `/api/rotation/proactive-hunt` | Proactive threat hunting suggestions |
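Rotation detection asks whether an IP presents several distinct JA4 fingerprints within a short period. The sketch below counts distinct JA4s over a sliding time window per IP. The one-hour window, the threshold of three, and the `ja4_rotators` name are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple


def ja4_rotators(
    events: Iterable[Tuple[str, str, datetime]],
    window: timedelta = timedelta(hours=1),
    min_distinct: int = 3,
) -> Dict[str, int]:
    """Return {ip: max distinct JA4s seen in any window} for IPs reaching min_distinct."""
    by_ip: Dict[str, List[Tuple[datetime, str]]] = defaultdict(list)
    for ip, ja4, ts in events:
        by_ip[ip].append((ts, ja4))

    flagged: Dict[str, int] = {}
    for ip, rows in by_ip.items():
        rows.sort()
        counts: Dict[str, int] = defaultdict(int)
        start, best = 0, 0
        for ts, ja4 in rows:
            counts[ja4] += 1
            # Drop events from the left until the window spans at most `window`.
            while ts - rows[start][0] > window:
                old = rows[start][1]
                counts[old] -= 1
                if counts[old] == 0:
                    del counts[old]
                start += 1
            best = max(best, len(counts))
        if best >= min_distinct:
            flagged[ip] = best
    return flagged
```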
+
+---
+
+### ML Features (`/api/ml`)
+
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/ml/top-anomalies` | Top anomalies with feature details |
+| GET | `/api/ml/ip/{ip}/radar` | Feature radar chart data for an IP |
+| GET | `/api/ml/score-distribution` | Anomaly score distribution histogram |
+| GET | `/api/ml/score-trends` | Score trends over time |
+| GET | `/api/ml/b-features` | Source B (TCP/TLS) feature analysis |
+| GET | `/api/ml/campaigns` | ML-detected campaign analysis |
+| GET | `/api/ml/scatter` | Feature scatter plot data |
+
+---
+
+### Attributes (`/api/attributes`)
+
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/attributes/{attr_type}` | List distinct values for an attribute (ja4, user_agent, asn, country, host) with counts |
+
+---
+
+### Variability (`/api/variability`)
+
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/variability/{attr_type}/{value}` | Behavioral variability analysis for an attribute value |
+| GET | `/api/variability/{attr_type}/{value}/ips` | IPs associated with an attribute value |
+| GET | `/api/variability/{attr_type}/{value}/attributes` | Attribute breakdown for a value |
+| GET | `/api/variability/{attr_type}/{value}/user_agents` | User-agents for an attribute value |
+
+---
+
+### Clustering (`/api/clustering`)
+
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/clustering/status` | Clustering cache status |
+| GET | `/api/clustering/clusters` | K-Means cluster list |
+| GET | `/api/clustering/cluster/{cluster_id}/points` | Data points in a cluster |
+| GET | `/api/clustering/cluster/{cluster_id}/ips` | IPs in a cluster |
+
+---
+
+### Search (`/api/search`)
+
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/search/quick` | Cross-entity search (IP, JA4, host, UA, country, ASN) |
+
+---
+
+### Audit (`/api/audit`)
+
+| Method | Path | Description |
+|--------|------|-------------|
+| POST | `/api/audit/logs` | Create an audit log entry |
+| GET | `/api/audit/logs` | Query audit logs (filtered, paginated) |
+| GET | `/api/audit/stats` | Audit statistics |
+| GET | `/api/audit/users/activity` | Per-user activity summary |
+
+## Frontend Structure
+
+The React frontend is built with Vite and served as static assets:
+
+- **Entry point**: `/` → `frontend/dist/index.html`
+- **Static assets**: `/assets/*` → `frontend/dist/assets/`
+- **SPA routing**: All non-`/api/` paths fall through to `index.html` (React Router)
+- **API proxy**: The frontend calls `/api/*`, which is handled by the FastAPI routers
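The dispatch rules above can be written as a small pure function, which helps when reasoning about the fallback order. This is a hypothetical sketch, not the service's routing code. It assumes that `/health`, `/docs` and `/redoc` (plus FastAPI's default `/openapi.json`) are backend routes outside `/api/`, and it refuses `..` segments in asset paths.

```python
from pathlib import PurePosixPath

DIST = PurePosixPath("frontend/dist")
# Backend routes that live outside /api/ (health check and FastAPI's docs/schema routes).
BACKEND_PATHS = {"/health", "/docs", "/redoc", "/openapi.json"}


def resolve(path: str) -> str:
    """Return "backend" for FastAPI routes, a file under frontend/dist for assets,
    or index.html for everything else so React Router handles it client-side."""
    if path in BACKEND_PATHS or path.startswith("/api/"):
        return "backend"
    if path.startswith("/assets/"):
        rel = PurePosixPath(path[len("/assets/"):])
        if ".." not in rel.parts:  # never serve files outside the assets directory
            return str(DIST / "assets" / rel)
    return str(DIST / "index.html")
```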
+
+## Services
+
+### IPReputationService
+
+Queries public IP reputation databases (IP-API.com, IPinfo.io) without API keys:
+- Proxy/VPN/Tor detection
+- ASN, country, ISP information
+- Hosting provider identification
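The sketch below covers only the IP-API.com half of such a lookup and uses just the standard library. IPinfo.io is omitted. The URL and field names follow ip-api.com's public JSON endpoint. The shape returned by `summarize` is hypothetical and may differ from what `IPReputationService` returns.

```python
import ipaddress
import json
from typing import Any, Dict
from urllib.request import urlopen

# Fields requested from ip-api.com's free JSON endpoint (plain HTTP on the free tier).
IP_API_FIELDS = "status,message,country,countryCode,isp,org,as,proxy,hosting"


def ip_api_url(ip: str) -> str:
    # Validate and normalize first so arbitrary strings never reach the URL.
    return f"http://ip-api.com/json/{ipaddress.ip_address(ip)}?fields={IP_API_FIELDS}"


def summarize(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Reduce an ip-api.com response to a flat summary (hypothetical shape)."""
    if payload.get("status") != "success":
        return {"ok": False, "error": payload.get("message", "unknown error")}
    return {
        "ok": True,
        "country": payload.get("countryCode"),
        "isp": payload.get("isp"),
        "asn": payload.get("as"),
        # ip-api's `proxy` flag covers proxies, VPNs and Tor exit nodes.
        "proxy": bool(payload.get("proxy")),
        "hosting": bool(payload.get("hosting")),
    }


def lookup(ip: str, timeout: float = 5.0) -> Dict[str, Any]:
    """Fetch and summarize one IP (performs a real network request)."""
    with urlopen(ip_api_url(ip), timeout=timeout) as resp:
        return summarize(json.load(resp))
```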
+
+### ClusteringEngine
+
+K-Means clustering on ML features with caching:
+- Automatic cluster count selection
+- Feature normalization via StandardScaler
+- In-memory cache with TTL
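An in-memory TTL cache can be as simple as a dictionary of (expiry, value) pairs. The sketch below is hypothetical: the lazy eviction, the injectable clock, and `get_or_compute` are assumptions, not details of `ClusteringEngine`. The injectable clock makes expiry deterministic in tests.

```python
import time
from typing import Any, Callable, Dict, Hashable, Optional, Tuple

_MISSING = object()  # sentinel so cached None values are distinguishable from misses


class TTLCache:
    """Minimal in-memory cache; each entry expires `ttl` seconds after it is set."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self._clock = clock
        self._data: Dict[Hashable, Tuple[float, Any]] = {}

    def _lookup(self, key: Hashable) -> Any:
        entry = self._data.get(key)
        if entry is None:
            return _MISSING
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazy eviction: stale entries are dropped on read
            return _MISSING
        return value

    def get(self, key: Hashable, default: Optional[Any] = None) -> Any:
        value = self._lookup(key)
        return default if value is _MISSING else value

    def set(self, key: Hashable, value: Any) -> None:
        self._data[key] = (self._clock() + self.ttl, value)

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        """Return the cached value, calling `compute` (e.g. a K-Means fit) on a miss."""
        value = self._lookup(key)
        if value is _MISSING:
            value = compute()
            self.set(key, value)
        return value
```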
+
+## Deployment
+
+```bash
+# Build Docker image
+make build-dashboard
+
+# Run tests
+make test-dashboard
+
+# Run locally (development)
+cd services/dashboard
+uvicorn backend.main:app --reload --host 0.0.0.0 --port 8000
+```
+
+### Health Check
+
+```
+GET /health → {"status": "healthy", "clickhouse": "connected"}
+```
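For scripted checks such as container probes or post-deploy smoke tests, a minimal standard-library probe might look like this. The function names are hypothetical. The probe treats anything other than the documented response as unhealthy, which is an assumption, because the failure response is not documented here.

```python
import json
from typing import Any, Dict
from urllib.request import urlopen


def is_healthy(payload: Dict[str, Any]) -> bool:
    """True only when the API reports healthy and ClickHouse is connected."""
    return payload.get("status") == "healthy" and payload.get("clickhouse") == "connected"


def check_health(base_url: str = "http://localhost:8000", timeout: float = 3.0) -> bool:
    """Probe GET /health; network, HTTP and JSON errors all count as unhealthy."""
    try:
        with urlopen(f"{base_url.rstrip('/')}/health", timeout=timeout) as resp:
            return is_healthy(json.load(resp))
    except (OSError, ValueError):  # URLError/HTTPError subclass OSError; bad JSON is ValueError
        return False
```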