
toto 9f3e0621e5 feat: split ClickHouse into dual configurable databases (ja4_logs / ja4_processing)

Architecture:
- ja4_logs: raw log ingestion (http_logs_raw, http_logs, mv_http_logs)
- ja4_processing: analytics, aggregation, ML, dictionaries, audit

Configuration (env vars):
- CLICKHOUSE_DB_LOGS (default: ja4_logs)
- CLICKHOUSE_DB_PROCESSING (default: ja4_processing)

Changes:
- SQL migrations (10 files): all mabase_prod refs → ja4_logs or ja4_processing
  with correct cross-database references (MVs, views, dicts)
- deploy_schema.sh: substitutes DB names from env vars at deploy time
- Python shared settings: added CLICKHOUSE_DB_LOGS + CLICKHOUSE_DB_PROCESSING
- Dashboard routes (19 files): replaced ~80 hardcoded mabase_prod refs
  with settings.CLICKHOUSE_DB_LOGS / settings.CLICKHOUSE_DB_PROCESSING
- Bot-detector: DB → CLICKHOUSE_DB_PROCESSING, fetch_rules.py configurable
- Correlator: DSN example updated to ja4_logs
- Docker-compose + .env files: new env vars with defaults
- All documentation updated (14 markdown files)

All tests pass: sentinel 10/10, correlator 67.1%, bot-detector 11, dashboard 20, ja4_common 18

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

2026-04-07 19:10:35 +02:00


Dashboard

The dashboard is a SOC (Security Operations Center) web application built with FastAPI (backend) and React (frontend) that provides real-time visualization, investigation, and analysis of bot detections generated by the bot-detector. It queries ClickHouse (ja4_processing) for all data.

Technology Stack

Component	Technology
Backend	Python 3.11 + FastAPI
Frontend	React + Vite
Database	ClickHouse (via `ja4_common` shared client)
API Docs	Swagger UI (`/docs`) and ReDoc (`/redoc`)

Configuration

Variable	Type	Default	Description
`CLICKHOUSE_HOST`	string	`clickhouse`	ClickHouse hostname
`CLICKHOUSE_PORT`	int	`8123`	ClickHouse HTTP port
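`CLICKHOUSE_DB`	string	`ja4_processing`	Database name
`CLICKHOUSE_DB_LOGS`	string	`ja4_logs`	Raw log ingestion database (`http_logs_raw`, `http_logs`, `mv_http_logs`)
`CLICKHOUSE_DB_PROCESSING`	string	`ja4_processing`	Analytics, aggregation, ML, dictionaries, and audit database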
`CLICKHOUSE_USER`	string	`admin`	ClickHouse user
`CLICKHOUSE_PASSWORD`	string	`""`	ClickHouse password
`API_HOST`	string	`0.0.0.0`	API listen address
`API_PORT`	int	`8000`	API listen port
`CORS_ORIGINS`	list	`["http://localhost:3000", "http://127.0.0.1:3000"]`	Allowed CORS origins
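The service reads these variables through the `ja4_common` shared settings. The sketch below is a stdlib-only illustration of how the table maps to a typed settings object; the `DashboardSettings` class and `from_env` helper are hypothetical, and the JSON-array format for `CORS_ORIGINS` is an assumption.

```python
from __future__ import annotations

import json
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class DashboardSettings:
    """Hypothetical settings object mirroring the Configuration table defaults."""
    clickhouse_host: str = "clickhouse"
    clickhouse_port: int = 8123
    clickhouse_db: str = "ja4_processing"
    clickhouse_user: str = "admin"
    clickhouse_password: str = ""
    api_host: str = "0.0.0.0"
    api_port: int = 8000
    cors_origins: tuple[str, ...] = ("http://localhost:3000", "http://127.0.0.1:3000")

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> DashboardSettings:
        d = cls()  # defaults
        # Assumption: CORS_ORIGINS is a JSON array, e.g. '["https://soc.example"]'.
        raw_cors = env.get("CORS_ORIGINS")
        cors = tuple(json.loads(raw_cors)) if raw_cors else d.cors_origins
        return cls(
            clickhouse_host=env.get("CLICKHOUSE_HOST", d.clickhouse_host),
            clickhouse_port=int(env.get("CLICKHOUSE_PORT", d.clickhouse_port)),
            clickhouse_db=env.get("CLICKHOUSE_DB", d.clickhouse_db),
            clickhouse_user=env.get("CLICKHOUSE_USER", d.clickhouse_user),
            clickhouse_password=env.get("CLICKHOUSE_PASSWORD", d.clickhouse_password),
            api_host=env.get("API_HOST", d.api_host),
            api_port=int(env.get("API_PORT", d.api_port)),
            cors_origins=cors,
        )
```

Passing an explicit mapping instead of `os.environ` keeps the loader easy to test.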

API Reference

All endpoints except `/health` are prefixed with `/api/`. The dashboard exposes 74+ endpoints across 20 routers.

Health

Method	Path	Description
GET	`/health`	Health check — returns ClickHouse connection status

Metrics (`/api/metrics`)

Method	Path	Description
GET	`/api/metrics`	Global dashboard metrics: detection counts by threat level, unique IPs, time series
GET	`/api/metrics/threats`	Threat distribution summary
GET	`/api/metrics/baseline`	Human baseline statistics

Detections (`/api/detections`)

Method	Path	Description
GET	`/api/detections`	Paginated detection list with filtering, sorting, and text search
GET	`/api/detections/{detection_id}`	Single detection details

Query Parameters (GET /api/detections):

Parameter	Type	Description
`page`	int	Page number (default: 1)
`page_size`	int	Items per page (default: 20)
`threat_level`	string	Filter by threat level
`model_name`	string	Filter by model name
`search`	string	Full-text search across IP, JA4, host, bot_name
`sort_by`	string	Sort field
`sort_order`	string	`asc` or `desc`
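A client can build the detection-list URL from the parameters above with the standard library. This is a sketch: `detections_url` and `page_to_offset` are hypothetical helpers, the `HIGH` threat level in the example is an assumed value, and the LIMIT/OFFSET mapping describes common practice rather than the documented server behaviour.

```python
from __future__ import annotations

from urllib.parse import urlencode


def detections_url(base_url: str, page: int = 1, page_size: int = 20, **filters: str | None) -> str:
    """Build a GET /api/detections URL; filters left as None are omitted."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    params: dict[str, object] = {"page": page, "page_size": page_size}
    params.update({k: v for k, v in filters.items() if v is not None})
    return f"{base_url.rstrip('/')}/api/detections?{urlencode(params)}"


def page_to_offset(page: int, page_size: int) -> int:
    """Translate a 1-based page number into a SQL-style OFFSET (assumed mapping)."""
    return (page - 1) * page_size
```

The resulting URL can be fetched with `urllib.request.urlopen(url)` and decoded with `json.load`.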

Investigation (`/api/investigation`)

Method	Path	Description
GET	`/api/investigation/{ip}/summary`	Primary investigation endpoint. Aggregates ML score, brute-force, TCP spoofing, JA4 rotation, persistence, and 24h timeline into a single response with a `risk_score` (0–100)

Reputation (`/api/reputation`)

Method	Path	Description
GET	`/api/reputation/ip/{ip_address}`	Full IP reputation from IP-API.com and IPinfo.io (proxy, VPN, Tor, hosting detection)
GET	`/api/reputation/ip/{ip_address}/summary`	Simplified reputation summary

Analysis (`/api/analysis`)

Method	Path	Description
GET	`/api/analysis/{ip}/subnet`	Subnet analysis for an IP (related IPs in the same /24)
GET	`/api/analysis/{ip}/country`	Country-level analysis for an IP
GET	`/api/analysis/country`	Global country analysis across all detections
GET	`/api/analysis/{ip}/ja4`	JA4 fingerprint analysis for an IP
GET	`/api/analysis/{ip}/user-agents`	User-agent analysis for an IP
GET	`/api/analysis/{ip}/recommendation`	SOC classification recommendation
POST	`/api/analysis/classifications`	Create a classification (legitimate/suspicious/malicious)
GET	`/api/analysis/classifications`	List all classifications
GET	`/api/analysis/classifications/stats`	Classification statistics
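The subnet analysis groups an IP with its /24 neighbours. The following sketch shows that grouping with Python's `ipaddress` module; `subnet_24` and `related_ips` are hypothetical names, and rejecting IPv6 input is an assumption of this sketch, not documented dashboard behaviour.

```python
import ipaddress


def subnet_24(ip: str) -> str:
    """Return the /24 network containing an IPv4 address."""
    addr = ipaddress.ip_address(ip)
    if addr.version != 4:
        # Assumption: /24 grouping is only applied to IPv4 in this sketch.
        raise ValueError(f"/24 grouping expects IPv4, got {ip}")
    return str(ipaddress.ip_network(f"{addr}/24", strict=False))


def related_ips(ip: str, candidates: list[str]) -> list[str]:
    """Other addresses from `candidates` in the same /24 as `ip` (IPv6 never matches)."""
    net = ipaddress.ip_network(subnet_24(ip))
    return [c for c in candidates if c != ip and ipaddress.ip_address(c) in net]
```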

Entities (`/api/entities`)

Method	Path	Description
GET	`/api/entities/types`	List available entity types
GET	`/api/entities/subnet/{subnet}`	Investigate a subnet
GET	`/api/entities/{entity_type}/{entity_value}`	Investigate any entity (IP, JA4, subnet, UA, host)
GET	`/api/entities/{entity_type}/{entity_value}/related`	Related entities
GET	`/api/entities/{entity_type}/{entity_value}/user_agents`	User-agents for entity
GET	`/api/entities/{entity_type}/{entity_value}/client_headers`	Client headers for entity
GET	`/api/entities/{entity_type}/{entity_value}/paths`	URL paths for entity
GET	`/api/entities/{entity_type}/{entity_value}/query_params`	Query parameters for entity
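Entity values such as user-agents contain spaces, parentheses, and slashes, so they need percent-encoding before being placed in the `{entity_value}` path segment. A minimal sketch follows; `entity_url` is hypothetical, the `user_agent` type identifier is an assumption (the real identifiers come from `/api/entities/types`), and whether an encoded `%2F` survives routing depends on the ASGI server and route definition, so verify it against a running instance.

```python
from __future__ import annotations

from urllib.parse import quote


def entity_url(base_url: str, entity_type: str, entity_value: str, view: str | None = None) -> str:
    """Build /api/entities/{entity_type}/{entity_value}[/{view}] with percent-encoded segments."""
    # safe="" also encodes "/", so "Mozilla/5.0 (X11)" stays a single path segment.
    path = f"/api/entities/{quote(entity_type, safe='')}/{quote(entity_value, safe='')}"
    if view:  # e.g. "related", "user_agents", "paths"
        path += f"/{view}"
    return base_url.rstrip("/") + path
```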

Incidents (`/api/incidents`)

Method	Path	Description
GET	`/api/incidents`	List all incidents
GET	`/api/incidents/clusters`	Active incident clusters (behavioral similarity grouping)
GET	`/api/incidents/{cluster_id}`	Incident cluster details
POST	`/api/incidents/{cluster_id}/classify`	Classify an incident cluster

Fingerprints (`/api/fingerprints`)

Method	Path	Description
GET	`/api/fingerprints/spoofing`	TLS fingerprint spoofing detection
GET	`/api/fingerprints/ja4-ua-matrix`	JA4 ↔ User-Agent correlation matrix
GET	`/api/fingerprints/ua-analysis`	Suspicious user-agent analysis
GET	`/api/fingerprints/ip/{ip}/coherence`	Fingerprint coherence analysis per IP
GET	`/api/fingerprints/legitimate-ja4`	Known legitimate JA4 fingerprints
GET	`/api/fingerprints/asn-correlation`	JA4-ASN correlation analysis

Brute Force (`/api/bruteforce`)

Method	Path	Description
GET	`/api/bruteforce/targets`	Brute-force target hosts
GET	`/api/bruteforce/attackers`	Brute-force attacker IPs
GET	`/api/bruteforce/timeline`	Brute-force attack timeline
GET	`/api/bruteforce/host/{host}/attackers`	Attackers for a specific host

TCP Spoofing (`/api/tcp-spoofing`)

Method	Path	Description
GET	`/api/tcp-spoofing/overview`	TCP/OS fingerprint spoofing overview
GET	`/api/tcp-spoofing/list`	Spoofing detection list
GET	`/api/tcp-spoofing/matrix`	TTL × MSS anomaly matrix
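TCP/OS spoofing checks commonly rely on the fact that operating systems start packets at a small set of initial TTL values (typically 64 for Linux/macOS, 128 for Windows, 255 for many network devices) that decrease by one per hop. The sketch below applies that general passive-fingerprinting heuristic to flag a TTL that contradicts the OS claimed in the User-Agent. It is not necessarily how the dashboard computes its matrix; the function names and OS-family labels are hypothetical, and MSS is not considered here.

```python
# Common initial TTLs and the OS family they usually indicate (heuristic).
TTL_TO_OS_FAMILY = {64: "unix", 128: "windows", 255: "network-device"}


def infer_initial_ttl(observed_ttl: int) -> int:
    """Round an observed TTL up to the nearest common initial TTL."""
    if not 1 <= observed_ttl <= 255:
        raise ValueError(f"invalid TTL: {observed_ttl}")
    return next(t for t in sorted(TTL_TO_OS_FAMILY) if observed_ttl <= t)


def ttl_contradicts_ua(observed_ttl: int, ua_os_family: str) -> bool:
    """True if the TTL-implied OS family differs from the one the User-Agent claims."""
    return TTL_TO_OS_FAMILY[infer_initial_ttl(observed_ttl)] != ua_os_family
```

A client claiming Windows in its User-Agent but arriving with TTL 57 (initial 64) would be flagged. Paths longer than 64 hops would break the rounding, which is one reason such signals are combined with others.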

Header Fingerprint (`/api/headers`)

Method	Path	Description
GET	`/api/headers/clusters`	Header fingerprint clusters (suspicious patterns)
GET	`/api/headers/cluster/{hash}/ips`	IPs sharing a header fingerprint
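A header fingerprint groups clients that send the same headers in the same order, regardless of header values. The sketch below hashes the ordered, lower-cased header names and groups IPs by that hash; the SHA-256-truncated-to-16-hex scheme and both function names are hypothetical and may differ from the `{hash}` the dashboard actually stores.

```python
import hashlib
from collections import defaultdict


def header_fingerprint(header_names: list[str]) -> str:
    """Hash the ordered, lower-cased header names; values are ignored."""
    canonical = ",".join(name.strip().lower() for name in header_names)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def group_ips_by_fingerprint(requests: list[tuple[str, list[str]]]) -> dict[str, set[str]]:
    """Map each header fingerprint to the set of IPs that produced it."""
    clusters: dict[str, set[str]] = defaultdict(set)
    for ip, headers in requests:
        clusters[header_fingerprint(headers)].add(ip)
    return dict(clusters)
```

Keeping the order in the hash is deliberate: real browsers emit headers in a stable order, while many bot libraries use a different one.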

Heatmap (`/api/heatmap`)

Method	Path	Description
GET	`/api/heatmap/hourly`	Hourly traffic heatmap
GET	`/api/heatmap/top-hosts`	Top hosts by traffic volume
GET	`/api/heatmap/matrix`	Activity/hour matrix
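An hourly heatmap is a count of events per time cell. Assuming a weekday × hour layout in UTC (the actual axes of `/api/heatmap/hourly` are not specified here), the bucketing can be sketched as follows; `hourly_heatmap` is a hypothetical helper and expects timezone-aware timestamps.

```python
from collections import Counter
from datetime import datetime, timezone


def hourly_heatmap(timestamps: list[datetime]) -> list[list[int]]:
    """7x24 request counts: rows are weekdays (Monday = 0), columns are UTC hours."""
    counts: Counter[tuple[int, int]] = Counter()
    for ts in timestamps:
        # Timestamps must be timezone-aware; astimezone() would treat naive ones as local time.
        utc = ts.astimezone(timezone.utc)
        counts[(utc.weekday(), utc.hour)] += 1
    return [[counts[(day, hour)] for hour in range(24)] for day in range(7)]
```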

Botnets (`/api/botnets`)

Method	Path	Description
GET	`/api/botnets/ja4-spread`	JA4 geographic spread (botnet indicator)
GET	`/api/botnets/ja4/{ja4}/countries`	Country distribution for a JA4 fingerprint
GET	`/api/botnets/summary`	Global botnet detection summary
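The JA4 spread indicator counts how many distinct countries a single TLS fingerprint is seen from. A minimal sketch, assuming input as `(ja4, country_code)` pairs and a hypothetical threshold of 10 countries:

```python
from collections import defaultdict
from typing import Iterable


def ja4_country_spread(events: Iterable[tuple[str, str]], min_countries: int = 10) -> list[tuple[str, int]]:
    """Return (ja4, distinct_country_count) for widely spread fingerprints, widest first."""
    countries: dict[str, set[str]] = defaultdict(set)
    for ja4, country in events:
        countries[ja4].add(country)
    spread = [(ja4, len(cs)) for ja4, cs in countries.items() if len(cs) >= min_countries]
    return sorted(spread, key=lambda item: (-item[1], item[0]))
```

Mainstream browser fingerprints are also seen worldwide, so in practice this signal is only meaningful when read alongside known legitimate JA4 fingerprints.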

Rotation (`/api/rotation`)

Method	Path	Description
GET	`/api/rotation/ja4-rotators`	IPs rotating JA4 fingerprints (evasion detection)
GET	`/api/rotation/persistent-threats`	Persistent threats across time windows
GET	`/api/rotation/ip/{ip}/ja4-history`	JA4 fingerprint history for an IP
GET	`/api/rotation/sophistication`	Sophistication score analysis
GET	`/api/rotation/proactive-hunt`	Proactive threat hunting suggestions

ML Features (`/api/ml`)

Method	Path	Description
GET	`/api/ml/top-anomalies`	Top anomalies with feature details
GET	`/api/ml/ip/{ip}/radar`	Feature radar chart data for an IP
GET	`/api/ml/score-distribution`	Anomaly score distribution histogram
GET	`/api/ml/score-trends`	Score trends over time
GET	`/api/ml/b-features`	Source B (TCP/TLS) feature analysis
GET	`/api/ml/campaigns`	ML-detected campaign analysis
GET	`/api/ml/scatter`	Feature scatter plot data

Attributes (`/api/attributes`)

Method	Path	Description
GET	`/api/attributes/{attr_type}`	List distinct values for an attribute (ja4, user_agent, asn, country, host) with counts

Variability (`/api/variability`)

Method	Path	Description
GET	`/api/variability/{attr_type}/{value}`	Behavioral variability analysis for an attribute value
GET	`/api/variability/{attr_type}/{value}/ips`	IPs associated with an attribute value
GET	`/api/variability/{attr_type}/{value}/attributes`	Attribute breakdown for a value
GET	`/api/variability/{attr_type}/{value}/user_agents`	User-agents for an attribute value

Clustering (`/api/clustering`)

Method	Path	Description
GET	`/api/clustering/status`	Clustering cache status
GET	`/api/clustering/clusters`	K-Means cluster list
GET	`/api/clustering/cluster/{cluster_id}/points`	Data points in a cluster
GET	`/api/clustering/cluster/{cluster_id}/ips`	IPs in a cluster

Search (`/api/search`)

Method	Path	Description
GET	`/api/search/quick`	Cross-entity search (IP, JA4, host, UA, country, ASN)
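Cross-entity search has to decide what kind of value the user typed. The sketch below shows one set of heuristics (IP, `AS`-prefixed ASN, JA4 shape, ISO country code, hostname, otherwise user-agent); `guess_entity_type` and its regexes are hypothetical and only approximate the JA4 format, so the dashboard's own detection may differ.

```python
import ipaddress
import re

ASN_RE = re.compile(r"^AS\d+$", re.IGNORECASE)
# Approximate JA4 shape: protocol, TLS version, SNI flag, counts, ALPN, then two 12-hex hashes.
JA4_RE = re.compile(r"^[tqd][0-9a-z]{2}[di]\d{4}[0-9a-zA-Z]{2}_[0-9a-f]{12}_[0-9a-f]{12}$")
COUNTRY_RE = re.compile(r"^[A-Za-z]{2}$")
HOST_RE = re.compile(r"^(?=.{1,253}$)(?:[a-z0-9-]+\.)+[a-z]{2,}$", re.IGNORECASE)


def guess_entity_type(query: str) -> str:
    """Classify a search string; falls back to user_agent when nothing else matches."""
    q = query.strip()
    try:
        ipaddress.ip_address(q)
        return "ip"
    except ValueError:
        pass
    if ASN_RE.match(q):
        return "asn"
    if JA4_RE.match(q):
        return "ja4"
    if COUNTRY_RE.match(q):
        return "country"
    if HOST_RE.match(q):
        return "host"
    return "user_agent"
```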

Audit (`/api/audit`)

Method	Path	Description
POST	`/api/audit/logs`	Create an audit log entry
GET	`/api/audit/logs`	Query audit logs (filtered, paginated)
GET	`/api/audit/stats`	Audit statistics
GET	`/api/audit/users/activity`	Per-user activity summary
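Creating an audit entry is a JSON `POST` to `/api/audit/logs`. The request schema is not documented here, so the field names below (`user`, `action`, `entity_type`, `entity_value`) and the `audit_log_request` helper are hypothetical; check the Swagger UI at `/docs` for the real model.

```python
import json
from urllib.request import Request


def audit_log_request(base_url: str, user: str, action: str, entity_type: str, entity_value: str) -> Request:
    """Prepare (but do not send) a POST to /api/audit/logs with a JSON body."""
    payload = {  # hypothetical field names
        "user": user,
        "action": action,
        "entity_type": entity_type,
        "entity_value": entity_value,
    }
    return Request(
        f"{base_url.rstrip('/')}/api/audit/logs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is then `urllib.request.urlopen(req)`.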

Frontend Structure

The React frontend is built with Vite and served as static assets:

- Entry point: / → frontend/dist/index.html
- Static assets: /assets/* → frontend/dist/assets/
- SPA routing: all non-/api/ paths fall through to index.html (React Router)
- API proxy: the frontend calls /api/*, which is handled by the FastAPI routers
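The routing rules listed above can be modelled as a pure function, which makes the fallback order explicit. This sketch is hypothetical: `resolve_route` and the `BACKEND_PATHS` set (assuming `/health`, `/docs`, `/redoc`, and FastAPI's default `/openapi.json` are registered before the SPA catch-all) are illustrations, and the `..` guard shows why a static-file fallback must reject path traversal.

```python
from pathlib import PurePosixPath

# Assumption: these backend routes are matched before the SPA catch-all.
BACKEND_PATHS = {"/health", "/docs", "/redoc", "/openapi.json"}


def resolve_route(path: str) -> str:
    """Return "api", a static file path, "not_found", or the SPA index.html fallback."""
    if path in BACKEND_PATHS or path.startswith("/api/"):
        return "api"
    if ".." in PurePosixPath(path).parts:
        return "not_found"  # refuse traversal such as /assets/../main.py
    if path.startswith("/assets/"):
        return str(PurePosixPath("frontend/dist") / path.lstrip("/"))
    return "frontend/dist/index.html"  # React Router handles the rest client-side
```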

Services

IPReputationService

Queries public IP reputation databases (IP-API.com, IPinfo.io) without API keys:

- Proxy/VPN/Tor detection
- ASN, country, ISP information
- Hosting provider identification
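IP-API.com's free JSON endpoint accepts a `fields` list and returns flags such as `proxy` (which covers proxies, VPNs, and Tor exits) and `hosting`. The sketch below builds the lookup URL and normalizes a response; `ip_api_url` and `summarize_ip_api` are hypothetical helpers rather than the `IPReputationService` API, and field names should be checked against ip-api.com's current documentation. The free endpoint is plain HTTP and rate-limited, which is one reason to cache results.

```python
import ipaddress

IP_API_FIELDS = "status,message,country,countryCode,isp,as,proxy,hosting,query"


def ip_api_url(ip: str) -> str:
    """Build an ip-api.com JSON lookup URL after validating the address."""
    ipaddress.ip_address(ip)  # raises ValueError for malformed input
    return f"http://ip-api.com/json/{ip}?fields={IP_API_FIELDS}"


def summarize_ip_api(payload: dict) -> dict:
    """Normalize an ip-api.com response into a flat reputation summary."""
    if payload.get("status") != "success":
        raise ValueError(payload.get("message", "lookup failed"))
    as_field = payload.get("as") or ""  # e.g. "AS15169 Google LLC"
    return {
        "ip": payload.get("query"),
        "country": payload.get("countryCode"),
        "isp": payload.get("isp"),
        "asn": as_field.split(" ", 1)[0] or None,
        "proxy_vpn_tor": bool(payload.get("proxy")),
        "hosting": bool(payload.get("hosting")),
    }
```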

ClusteringEngine

K-Means clustering on ML features with caching:

- Automatic cluster count selection
- Feature normalization via StandardScaler
- In-memory cache with TTL
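Because K-Means over the full feature set is expensive, results are cached in memory with a TTL. The cache key and TTL used by `ClusteringEngine` are not documented here; the sketch below is a hypothetical `TTLCache` with an injectable clock, so expiry can be tested without sleeping.

```python
import time
from typing import Any, Callable, Hashable


class TTLCache:
    """Minimal in-memory cache whose entries expire ttl_seconds after being computed."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: dict[Hashable, tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and now < entry[0]:  # entry = (expires_at, value)
            return entry[1]
        value = compute()  # cache miss or expired: recompute, e.g. rerun K-Means
        self._store[key] = (now + self._ttl, value)
        return value
```

`time.monotonic` is used rather than wall-clock time so that system clock changes cannot extend or shorten an entry's lifetime.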

Deployment

```bash
# Build Docker image
make build-dashboard

# Run tests
make test-dashboard

# Run locally (development)
cd services/dashboard
uvicorn backend.main:app --reload --host 0.0.0.0 --port 8000
```

Health Check

GET /health → {"status": "healthy", "clickhouse": "connected"}
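Deployment scripts often wait until the health check reports both a healthy service and a connected ClickHouse before routing traffic. A minimal polling sketch; `wait_for_healthy` is hypothetical, and it takes the HTTP fetch as a callable so it can be tested offline.

```python
import json
import time
from typing import Callable


def wait_for_healthy(fetch: Callable[[], str], attempts: int = 10, delay_s: float = 2.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll a /health body until status is healthy and ClickHouse is connected."""
    for attempt in range(attempts):
        try:
            body = json.loads(fetch())
            if isinstance(body, dict) and body.get("status") == "healthy" and body.get("clickhouse") == "connected":
                return True
        except (OSError, ValueError):
            pass  # connection refused, HTTP error, or non-JSON body: retry
        if attempt < attempts - 1:
            sleep(delay_s)
    return False
```

Against a local instance, `fetch` could be `lambda: urllib.request.urlopen("http://localhost:8000/health", timeout=2).read().decode()`.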
