feat: ja4-platform monorepo — 5 services unified, tests & RPM builds standardized

Services:
- ja4sentinel: TLS/JA4 fingerprint capture daemon (Go, libpcap)
- logcorrelator: JA4 log correlation engine (Go, ClickHouse)
- mod_reqin_log: Apache module (C, JSON request logging)
- bot_detector: ML bot detection pipeline (Python)
- dashboard: FastAPI/Streamlit analytics UI (Python)

Shared libraries:
- shared/go/ja4common: logger, config, shutdown, ipfilter (Go module)
- shared/python/ja4_common: ClickHouseClient, ClickHouseSettings (Python package)
- shared/clickhouse/: canonical SQL migrations (10 files)

Build & packaging:
- Unified 3-stage Dockerfile.package for Go RPMs (el8/el9/el10)
- go.work workspace linking sentinel, correlator, ja4common
- Makefile with test-all, build-all, rpm-* targets

Fixes applied:
- go.work: 1.21 → 1.24.6 (required by sentinel)
- correlator Dockerfiles: golang:1.21 → golang:1.24
- replace directives in go.mod for ja4common local path
- pyproject.toml: setuptools.backends → setuptools.build_meta
- Removed static libpcap linking (unavailable on Rocky 9)
- Fixed data races in output/writers_test.go (sync.Mutex + atomic.Int32)
- Rewrote corrupted test files (logger_test.go × 2)

Test coverage:
- correlator: 67.1% total (unixsocket 80.5%, config 91.7%, app 83.3%, multi 87.7%, stdout 100%)
- sentinel: all 10 packages pass (api, capture, config, fingerprint, ipfilter, logging, output, tlsparse)

Documentation:
- README.md + docs/ (architecture, development, 5 services, shared libs, DB schema & migrations)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
toto
2026-04-07 16:42:59 +02:00
commit d469e39da7
278 changed files with 1621301 additions and 0 deletions

docs/services/dashboard.md Normal file

@@ -0,0 +1,308 @@
# Dashboard
The dashboard is a SOC (Security Operations Center) web application built with FastAPI (backend) and React (frontend) that provides real-time visualization, investigation, and analysis of bot detections generated by the [bot-detector](bot-detector.md). It queries ClickHouse (`mabase_prod`) for all data.
## Technology Stack
| Component | Technology |
|-----------|-----------|
| Backend | Python 3.11 + FastAPI |
| Frontend | React + Vite |
| Database | ClickHouse (via `ja4_common` shared client) |
| API Docs | Swagger UI (`/docs`) and ReDoc (`/redoc`) |
## Configuration
| Variable | Type | Default | Description |
|----------|------|---------|-------------|
| `CLICKHOUSE_HOST` | string | `clickhouse` | ClickHouse hostname |
| `CLICKHOUSE_PORT` | int | `8123` | ClickHouse HTTP port |
| `CLICKHOUSE_DB` | string | `mabase_prod` | Database name |
| `CLICKHOUSE_USER` | string | `admin` | ClickHouse user |
| `CLICKHOUSE_PASSWORD` | string | `""` | ClickHouse password |
| `API_HOST` | string | `0.0.0.0` | API listen address |
| `API_PORT` | int | `8000` | API listen port |
| `CORS_ORIGINS` | list | `["http://localhost:3000", "http://127.0.0.1:3000"]` | Allowed CORS origins |
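
The table above can be read as a set of environment variables with fallbacks. The following is a minimal sketch of how such settings might be loaded using only the standard library. `DashboardSettings` and `load_settings` are hypothetical names, not the dashboard's actual code (which uses `ClickHouseSettings` from `ja4_common`), and encoding `CORS_ORIGINS` as a JSON array is an assumption.

```python
import json
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional


def _default_origins() -> list[str]:
    return ["http://localhost:3000", "http://127.0.0.1:3000"]


@dataclass(frozen=True)
class DashboardSettings:
    """Hypothetical settings holder; defaults mirror the table above."""
    clickhouse_host: str = "clickhouse"
    clickhouse_port: int = 8123
    clickhouse_db: str = "mabase_prod"
    clickhouse_user: str = "admin"
    clickhouse_password: str = ""
    api_host: str = "0.0.0.0"
    api_port: int = 8000
    cors_origins: list[str] = field(default_factory=_default_origins)


def load_settings(env: Optional[Mapping[str, str]] = None) -> DashboardSettings:
    """Read each variable from env, falling back to the documented default."""
    env = os.environ if env is None else env
    d = DashboardSettings()
    # Assumption: CORS_ORIGINS is a JSON array such as '["https://soc.example"]'.
    raw_origins = env.get("CORS_ORIGINS")
    return DashboardSettings(
        clickhouse_host=env.get("CLICKHOUSE_HOST", d.clickhouse_host),
        clickhouse_port=int(env.get("CLICKHOUSE_PORT", d.clickhouse_port)),
        clickhouse_db=env.get("CLICKHOUSE_DB", d.clickhouse_db),
        clickhouse_user=env.get("CLICKHOUSE_USER", d.clickhouse_user),
        clickhouse_password=env.get("CLICKHOUSE_PASSWORD", d.clickhouse_password),
        api_host=env.get("API_HOST", d.api_host),
        api_port=int(env.get("API_PORT", d.api_port)),
        cors_origins=json.loads(raw_origins) if raw_origins else d.cors_origins,
    )
```

Passing a plain dict instead of `os.environ` keeps the loader easy to exercise in tests.
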
## API Reference
All endpoints except `/health` are prefixed with `/api/`. The dashboard exposes **74+ endpoints** across 20 routers.
### Health
| Method | Path | Description |
|--------|------|-------------|
| GET | `/health` | Health check — returns ClickHouse connection status |
---
### Metrics (`/api/metrics`)
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/metrics` | Global dashboard metrics: detection counts by threat level, unique IPs, time series |
| GET | `/api/metrics/threats` | Threat distribution summary |
| GET | `/api/metrics/baseline` | Human baseline statistics |
---
### Detections (`/api/detections`)
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/detections` | Paginated detection list with filtering, sorting, and text search |
| GET | `/api/detections/{detection_id}` | Single detection details |
**Query Parameters** (GET `/api/detections`):
| Parameter | Type | Description |
|-----------|------|-------------|
| `page` | int | Page number (default: 1) |
| `page_size` | int | Items per page (default: 20) |
| `threat_level` | string | Filter by threat level |
| `model_name` | string | Filter by model name |
| `search` | string | Full-text search across IP, JA4, host, bot_name |
| `sort_by` | string | Sort field |
| `sort_order` | string | `asc` or `desc` |
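
As an illustration of the query parameters above, here is a small client-side sketch that assembles a `GET /api/detections` URL. The helper name `detections_url`, the base URL, and the example values are assumptions; only the parameter names come from the table.

```python
from urllib.parse import urlencode


def detections_url(base_url: str, **params) -> str:
    """Build a GET /api/detections URL, dropping parameters set to None."""
    query = {key: value for key, value in params.items() if value is not None}
    url = f"{base_url.rstrip('/')}/api/detections"
    return f"{url}?{urlencode(query)}" if query else url


# Example: second page of 50 results matching an IP, newest first.
print(detections_url(
    "http://localhost:8000",   # assumed local development address
    page=2,
    page_size=50,
    search="203.0.113.7",
    sort_order="desc",
    model_name=None,           # omitted from the query string
))
```
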
---
### Investigation (`/api/investigation`)
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/investigation/{ip}/summary` | **Primary investigation endpoint.** Aggregates ML score, brute-force, TCP spoofing, JA4 rotation, persistence, and 24h timeline into a single response with a `risk_score` (0–100) |
---
### Reputation (`/api/reputation`)
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/reputation/ip/{ip_address}` | Full IP reputation from IP-API.com and IPinfo.io (proxy, VPN, Tor, hosting detection) |
| GET | `/api/reputation/ip/{ip_address}/summary` | Simplified reputation summary |
---
### Analysis (`/api/analysis`)
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/analysis/{ip}/subnet` | Subnet analysis for an IP (related IPs in same /24) |
| GET | `/api/analysis/{ip}/country` | Country-level analysis for an IP |
| GET | `/api/analysis/country` | Global country analysis across all detections |
| GET | `/api/analysis/{ip}/ja4` | JA4 fingerprint analysis for an IP |
| GET | `/api/analysis/{ip}/user-agents` | User-agent analysis for an IP |
| GET | `/api/analysis/{ip}/recommendation` | SOC classification recommendation |
| POST | `/api/analysis/classifications` | Create a classification (legitimate/suspicious/malicious) |
| GET | `/api/analysis/classifications` | List all classifications |
| GET | `/api/analysis/classifications/stats` | Classification statistics |
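
The subnet endpoint groups an IP with others in the same /24. A minimal sketch of that grouping with Python's `ipaddress` module follows; the function names are hypothetical and the dashboard may derive the subnet differently (for example inside the ClickHouse query). The sketch assumes IPv4 input.

```python
import ipaddress


def subnet_24(ip: str) -> str:
    """Return the /24 network containing an IPv4 address, in CIDR notation."""
    # strict=False lets us pass a host address and have the host bits masked off.
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))


def same_subnet_24(ip_a: str, ip_b: str) -> bool:
    """True when both IPv4 addresses fall inside the same /24."""
    return subnet_24(ip_a) == subnet_24(ip_b)


print(subnet_24("203.0.113.77"))                       # 203.0.113.0/24
print(same_subnet_24("203.0.113.77", "203.0.113.9"))   # True
print(same_subnet_24("203.0.113.77", "203.0.114.9"))   # False
```
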
---
### Entities (`/api/entities`)
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/entities/types` | List available entity types |
| GET | `/api/entities/subnet/{subnet}` | Investigate a subnet |
| GET | `/api/entities/{entity_type}/{entity_value}` | Investigate any entity (IP, JA4, subnet, UA, host) |
| GET | `/api/entities/{entity_type}/{entity_value}/related` | Related entities |
| GET | `/api/entities/{entity_type}/{entity_value}/user_agents` | User-agents for entity |
| GET | `/api/entities/{entity_type}/{entity_value}/client_headers` | Client headers for entity |
| GET | `/api/entities/{entity_type}/{entity_value}/paths` | URL paths for entity |
| GET | `/api/entities/{entity_type}/{entity_value}/query_params` | Query parameters for entity |
---
### Incidents (`/api/incidents`)
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/incidents` | List all incidents |
| GET | `/api/incidents/clusters` | Active incident clusters (behavioral similarity grouping) |
| GET | `/api/incidents/{cluster_id}` | Incident cluster details |
| POST | `/api/incidents/{cluster_id}/classify` | Classify an incident cluster |
---
### Fingerprints (`/api/fingerprints`)
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/fingerprints/spoofing` | TLS fingerprint spoofing detection |
| GET | `/api/fingerprints/ja4-ua-matrix` | JA4 ↔ User-Agent correlation matrix |
| GET | `/api/fingerprints/ua-analysis` | Suspicious user-agent analysis |
| GET | `/api/fingerprints/ip/{ip}/coherence` | Fingerprint coherence analysis per IP |
| GET | `/api/fingerprints/legitimate-ja4` | Known legitimate JA4 fingerprints |
| GET | `/api/fingerprints/asn-correlation` | JA4-ASN correlation analysis |
---
### Brute Force (`/api/bruteforce`)
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/bruteforce/targets` | Brute-force target hosts |
| GET | `/api/bruteforce/attackers` | Brute-force attacker IPs |
| GET | `/api/bruteforce/timeline` | Brute-force attack timeline |
| GET | `/api/bruteforce/host/{host}/attackers` | Attackers for a specific host |
---
### TCP Spoofing (`/api/tcp-spoofing`)
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/tcp-spoofing/overview` | TCP/OS fingerprint spoofing overview |
| GET | `/api/tcp-spoofing/list` | Spoofing detection list |
| GET | `/api/tcp-spoofing/matrix` | TTL × MSS anomaly matrix |
---
### Header Fingerprint (`/api/headers`)
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/headers/clusters` | Header fingerprint clusters (suspicious patterns) |
| GET | `/api/headers/cluster/{hash}/ips` | IPs sharing a header fingerprint |
---
### Heatmap (`/api/heatmap`)
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/heatmap/hourly` | Hourly traffic heatmap |
| GET | `/api/heatmap/top-hosts` | Top hosts by traffic volume |
| GET | `/api/heatmap/matrix` | Activity/hour matrix |
---
### Botnets (`/api/botnets`)
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/botnets/ja4-spread` | JA4 geographic spread (botnet indicator) |
| GET | `/api/botnets/ja4/{ja4}/countries` | Country distribution for a JA4 fingerprint |
| GET | `/api/botnets/summary` | Global botnet detection summary |
---
### Rotation (`/api/rotation`)
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/rotation/ja4-rotators` | IPs rotating JA4 fingerprints (evasion detection) |
| GET | `/api/rotation/persistent-threats` | Persistent threats across time windows |
| GET | `/api/rotation/ip/{ip}/ja4-history` | JA4 fingerprint history for an IP |
| GET | `/api/rotation/sophistication` | Sophistication score analysis |
| GET | `/api/rotation/proactive-hunt` | Proactive threat hunting suggestions |
---
### ML Features (`/api/ml`)
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/ml/top-anomalies` | Top anomalies with feature details |
| GET | `/api/ml/ip/{ip}/radar` | Feature radar chart data for an IP |
| GET | `/api/ml/score-distribution` | Anomaly score distribution histogram |
| GET | `/api/ml/score-trends` | Score trends over time |
| GET | `/api/ml/b-features` | Source B (TCP/TLS) feature analysis |
| GET | `/api/ml/campaigns` | ML-detected campaign analysis |
| GET | `/api/ml/scatter` | Feature scatter plot data |
---
### Attributes (`/api/attributes`)
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/attributes/{attr_type}` | List distinct values for an attribute (ja4, user_agent, asn, country, host) with counts |
---
### Variability (`/api/variability`)
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/variability/{attr_type}/{value}` | Behavioral variability analysis for an attribute value |
| GET | `/api/variability/{attr_type}/{value}/ips` | IPs associated with an attribute value |
| GET | `/api/variability/{attr_type}/{value}/attributes` | Attribute breakdown for a value |
| GET | `/api/variability/{attr_type}/{value}/user_agents` | User-agents for an attribute value |
---
### Clustering (`/api/clustering`)
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/clustering/status` | Clustering cache status |
| GET | `/api/clustering/clusters` | K-Means cluster list |
| GET | `/api/clustering/cluster/{cluster_id}/points` | Data points in a cluster |
| GET | `/api/clustering/cluster/{cluster_id}/ips` | IPs in a cluster |
---
### Search (`/api/search`)
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/search/quick` | Cross-entity search (IP, JA4, host, UA, country, ASN) |
---
### Audit (`/api/audit`)
| Method | Path | Description |
|--------|------|-------------|
| POST | `/api/audit/logs` | Create an audit log entry |
| GET | `/api/audit/logs` | Query audit logs (filtered, paginated) |
| GET | `/api/audit/stats` | Audit statistics |
| GET | `/api/audit/users/activity` | Per-user activity summary |
## Frontend Structure
The React frontend is built with Vite and served as static assets:
- **Entry point**: `/` → `frontend/dist/index.html`
- **Static assets**: `/assets/*` → `frontend/dist/assets/`
- **SPA routing**: All non-`/api/` paths fall through to `index.html` (React Router)
- **API calls**: The frontend calls `/api/*`, which the FastAPI routers handle
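
The routing rules above can be summarized as a single dispatch decision. The sketch below is a plain function, not the dashboard's actual FastAPI wiring; `resolve_route` and `FASTAPI_PATHS` are hypothetical names, and treating `/health`, `/docs`, and `/redoc` as backend paths is an assumption based on the tables earlier in this document.

```python
# Assumption: these non-/api/ paths are also served by FastAPI itself.
FASTAPI_PATHS = {"/health", "/docs", "/redoc"}


def resolve_route(path: str) -> str:
    """Classify a request path: backend handler, static asset, or SPA shell."""
    if path.startswith("/api/") or path in FASTAPI_PATHS:
        return "fastapi"
    if path.startswith("/assets/"):
        # A real server must also reject path traversal ("..") at this point.
        return "frontend/dist/assets/" + path.removeprefix("/assets/")
    # Everything else is a client-side route resolved by React Router.
    return "frontend/dist/index.html"


for p in ("/", "/assets/app.js", "/api/metrics", "/investigation/203.0.113.7"):
    print(p, "->", resolve_route(p))
```
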
## Services
### IPReputationService
Queries public IP reputation databases (IP-API.com, IPinfo.io) without API keys:
- Proxy/VPN/Tor detection
- ASN, country, ISP information
- Hosting provider identification
### ClusteringEngine
K-Means clustering on ML features with caching:
- Automatic cluster count selection
- Feature normalization via StandardScaler
- In-memory cache with TTL
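
To illustrate the "in-memory cache with TTL" item, here is a minimal standard-library sketch. `TTLCache` is a hypothetical class, not the `ClusteringEngine` implementation, and the K-Means step is replaced by an arbitrary `compute` callable so the example stays free of third-party dependencies.

```python
import time
from typing import Any, Callable


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._entries: dict[str, tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        """Return the cached value, or None when missing or expired."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        """Reuse a fresh entry; otherwise run compute() and store its result."""
        cached = self.get(key)
        if cached is not None:
            return cached
        value = compute()
        self.set(key, value)
        return value
```

An expensive clustering run would be passed as `compute`, so repeated requests within the TTL reuse the stored result instead of recomputing it.
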
## Deployment
```bash
# Build Docker image
make build-dashboard
# Run tests
make test-dashboard
# Run locally (development)
cd services/dashboard
uvicorn backend.main:app --reload --host 0.0.0.0 --port 8000
```
### Health Check
```
GET /health → {"status": "healthy", "clickhouse": "connected"}
```
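
A monitoring probe could interpret that response as sketched below. `is_healthy` is a hypothetical helper; the values the endpoint returns on failure are not documented here, so the sketch treats anything other than the body shown above as unhealthy.

```python
import json


def is_healthy(body: str) -> bool:
    """Return True only for the documented healthy response body."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return (
        isinstance(payload, dict)
        and payload.get("status") == "healthy"
        and payload.get("clickhouse") == "connected"
    )


print(is_healthy('{"status": "healthy", "clickhouse": "connected"}'))  # True
print(is_healthy("not json"))                                          # False
```
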