feat: ja4-platform monorepo — 5 services unified, tests & RPM builds standardized
Services:
- ja4sentinel: TLS/JA4 fingerprint capture daemon (Go, libpcap)
- logcorrelator: JA4 log correlation engine (Go, ClickHouse)
- mod_reqin_log: Apache module (C, JSON request logging)
- bot_detector: ML bot detection pipeline (Python)
- dashboard: FastAPI/Streamlit analytics UI (Python)

Shared libraries:
- shared/go/ja4common: logger, config, shutdown, ipfilter (Go module)
- shared/python/ja4_common: ClickHouseClient, ClickHouseSettings (Python package)
- shared/clickhouse/: canonical SQL migrations (10 files)

Build & packaging:
- Unified 3-stage Dockerfile.package for Go RPMs (el8/el9/el10)
- go.work workspace linking sentinel, correlator, ja4common
- Makefile with test-all, build-all, rpm-* targets

Fixes applied:
- go.work: 1.21 → 1.24.6 (required by sentinel)
- correlator Dockerfiles: golang:1.21 → golang:1.24
- replace directives in go.mod for ja4common local path
- pyproject.toml: setuptools.backends → setuptools.build_meta
- Removed static libpcap linking (unavailable on Rocky 9)
- Fixed data races in output/writers_test.go (sync.Mutex + atomic.Int32)
- Rewrote corrupted test files (logger_test.go × 2)

Test coverage:
- correlator: 67.1% total (unixsocket 80.5%, config 91.7%, app 83.3%, multi 87.7%, stdout 100%)
- sentinel: all 10 packages pass (api, capture, config, fingerprint, ipfilter, logging, output, tlsparse)

Documentation:
- README.md + docs/ (architecture, development, 5 services, shared libs, DB schema & migrations)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
35
.gitignore
vendored
Normal file
@ -0,0 +1,35 @@
# Secrets — never commit
.env
.env.*
!.env.example
*.env

# Python
__pycache__/
*.pyc
*.pyo
.pytest_cache/
*.egg-info/
dist/
build/
.coverage
coverage.xml
htmlcov/

# Go
*.test
coverage.out
coverage.html

# Node
node_modules/
frontend/dist/

# Models and logs (runtime artifacts)
bot_detector_models/
bot_detector_logs/

# IDE
.vscode/
.idea/
*.swp
129
Makefile
Normal file
@ -0,0 +1,129 @@
# =============================================================================
# ja4-platform — Monorepo Makefile
# All targets use new service names:
# sentinel, correlator, bot-detector, dashboard, mod-reqin-log
# =============================================================================

.PHONY: build-all test-all rpm-all dist \
	build-sentinel test-sentinel rpm-sentinel \
	test-mod-reqin-log rpm-mod-reqin-log \
	build-correlator test-correlator rpm-correlator \
	build-bot-detector test-bot-detector \
	build-dashboard test-dashboard \
	test-ja4common-python

# --- Root -------------------------------------------------------------------

build-all: build-sentinel build-correlator build-bot-detector build-dashboard
	@echo "All services built."

test-all: test-sentinel test-correlator test-bot-detector test-dashboard test-ja4common-python
	@echo "All tests completed."

rpm-all: rpm-sentinel rpm-correlator rpm-mod-reqin-log
	@echo "All RPMs built."

dist: rpm-all
	@echo "Distribution packages ready in services/*/dist/"

# --- sentinel (was ja4sentinel) ---------------------------------------------

build-sentinel:
	docker build \
		--build-arg VERSION=$$(git -C services/sentinel describe --tags --always 2>/dev/null || echo dev) \
		--build-arg GIT_COMMIT=$$(git rev-parse --short HEAD 2>/dev/null || echo unknown) \
		--build-arg BUILD_TIME=$$(date -u +%Y-%m-%dT%H:%M:%SZ) \
		-f services/sentinel/Dockerfile \
		-t ja4-platform/sentinel:latest \
		.

test-sentinel:
	# Tests run inside Docker — no native Go required on the host
	docker build -f services/sentinel/Dockerfile.dev -t ja4-platform/sentinel-tests:latest .
	docker run --rm --cap-add=NET_RAW --cap-add=NET_ADMIN ja4-platform/sentinel-tests:latest

rpm-sentinel:
	# Method: Dockerfile.package → Go builder → rpm-builder (rpmbuild ×3) → alpine output stage
	docker build \
		-f services/sentinel/Dockerfile.package \
		--target output \
		--output type=local,dest=services/sentinel/dist \
		--build-arg VERSION=$(shell git -C services/sentinel describe --tags --always 2>/dev/null || echo dev) \
		.
	@echo "📦 sentinel RPMs in services/sentinel/dist/"

# --- mod-reqin-log (was mod_reqin_log) --------------------------------------

test-mod-reqin-log:
	docker build -f services/mod-reqin-log/Dockerfile.tests -t ja4-platform/mod-reqin-log-tests:latest .
	docker run --rm ja4-platform/mod-reqin-log-tests:latest

rpm-mod-reqin-log:
	# Method: Dockerfile.package → C builder (×3 distros) → rpm-builder (rpmbuild ×3) → alpine output stage
	docker build \
		-f services/mod-reqin-log/Dockerfile.package \
		--target output \
		--output type=local,dest=services/mod-reqin-log/dist \
		.
	@echo "📦 mod-reqin-log RPMs in services/mod-reqin-log/dist/"

# --- correlator (was logcorrelator) -----------------------------------------

build-correlator:
	docker build \
		-f services/correlator/Dockerfile \
		-t ja4-platform/correlator:latest \
		.

test-correlator:
	# Tests run inside the Dockerfile builder stage (80% coverage gate enforced)
	docker build --target builder -f services/correlator/Dockerfile -t ja4-platform/correlator-tests:latest .

rpm-correlator:
	# Method: Dockerfile.package → Go builder → rpm-builder (rpmbuild ×3) → alpine output stage
	docker build \
		-f services/correlator/Dockerfile.package \
		--target output \
		--output type=local,dest=services/correlator/dist \
		--build-arg VERSION=$(shell git -C services/correlator describe --tags --always 2>/dev/null || echo dev) \
		.
	@echo "📦 correlator RPMs in services/correlator/dist/"

# --- bot-detector (was bot_detector) ----------------------------------------

build-bot-detector:
	docker build \
		-f services/bot-detector/bot_detector/Dockerfile \
		-t ja4-platform/bot-detector:latest \
		.

test-bot-detector:
	docker build \
		-f services/bot-detector/bot_detector/Dockerfile.tests \
		-t ja4-platform/bot-detector-tests:latest \
		.
	docker run --rm ja4-platform/bot-detector-tests:latest

# --- dashboard --------------------------------------------------------------

build-dashboard:
	docker build \
		-f services/dashboard/Dockerfile \
		-t ja4-platform/dashboard:latest \
		.

test-dashboard:
	docker build \
		-f services/dashboard/Dockerfile.tests \
		-t ja4-platform/dashboard-tests:latest \
		.
	docker run --rm ja4-platform/dashboard-tests:latest

# --- shared/python/ja4_common -----------------------------------------------

test-ja4common-python:
	docker build \
		-f shared/python/ja4_common/Dockerfile.tests \
		-t ja4-platform/ja4common-python-tests:latest \
		shared/python/ja4_common/
	docker run --rm ja4-platform/ja4common-python-tests:latest
123
README.md
Normal file
@ -0,0 +1,123 @@
# ja4-platform

**ja4-platform** is a monorepo security pipeline for TLS fingerprinting (JA4/JA3) and bot detection. It captures live network traffic, correlates TLS handshakes with HTTP requests, detects anomalous behavior using machine learning (Isolation Forest), and presents results through a SOC analyst dashboard — all backed by ClickHouse as the central data store.

## Pipeline Overview

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                            Linux Server (Apache)                            │
│                                                                             │
│   ┌─────────────────┐        ┌─────────────────────┐                        │
│   │ mod-reqin-log   │───────▶│ UNIX socket (HTTP)  │──┐                     │
│   │ (Apache module) │  JSON  │ /var/run/logcorr/   │  │                     │
│   │ C · httpd DSO   │        │   http.socket       │  │                     │
│   └─────────────────┘        └─────────────────────┘  │                     │
│                                                       ▼                     │
│   ┌─────────────────┐        ┌─────────────────────┐  ┌──────────────────┐  │
│   │ sentinel        │───────▶│ UNIX socket (TLS)   │─▶│ correlator       │  │
│   │ (TLS capture)   │  JSON  │ /var/run/logcorr/   │  │ (event join)     │  │
│   │ Go · libpcap    │        │   network.socket    │  │ Go · hex. arch   │  │
│   └─────────────────┘        └─────────────────────┘  └────────┬─────────┘  │
│                                                                │            │
└────────────────────────────────────────────────────────────────┼────────────┘
                                                                 │ INSERT
                                                                 ▼
                                                        ┌──────────────────┐
                                                        │    ClickHouse    │
                                                        │   mabase_prod    │
                                                        │   (all tables)   │
                                                        └────────┬─────────┘
                                                                 │ SELECT
                                            ┌────────────────────┼────────────────────┐
                                            ▼                                         ▼
                                   ┌──────────────────┐                      ┌──────────────────┐
                                   │ bot-detector     │                      │ dashboard        │
                                   │ (ML anomaly det) │                      │ (SOC web UI)     │
                                   │ Python · sklearn │                      │ FastAPI + React  │
                                   └──────────────────┘                      └──────────────────┘
```

## Services

| Service | Language | Purpose | Interface |
|---------|----------|---------|-----------|
| [sentinel](docs/services/sentinel.md) | Go | Live TLS packet capture, JA4/JA3 fingerprint generation | UNIX socket (`network.socket`) |
| [mod-reqin-log](docs/services/mod-reqin-log.md) | C | Apache HTTPD module, HTTP request JSON logging | UNIX socket (`http.socket`) |
| [correlator](docs/services/correlator.md) | Go | Joins HTTP + TLS events by `src_ip:src_port` + time window | ClickHouse INSERT, file, stdout |
| [bot-detector](docs/services/bot-detector.md) | Python | Isolation Forest ML anomaly detection on aggregated traffic | ClickHouse read/write, HTTP `:8080` |
| [dashboard](docs/services/dashboard.md) | Python/JS | SOC analyst web dashboard (FastAPI + React) | HTTP `:8000` |

## Shared Libraries

| Library | Language | Description |
|---------|----------|-------------|
| [go/ja4common](docs/shared/go-ja4common.md) | Go | Logger, config loader, shutdown handler, IP filter |
| [python/ja4_common](docs/shared/python-ja4common.md) | Python | ClickHouse client singleton, settings |

## Quickstart

### Prerequisites

- Docker (with BuildKit) and Docker Compose
- `make`
- No native Go, Python, or C toolchains required — all builds run inside Docker

### Build All Services

```bash
make build-all
```

### Run All Tests

```bash
make test-all
```

### Build RPM Packages

```bash
make rpm-all
# RPMs written to services/<service>/dist/
```

## Documentation

| Document | Description |
|----------|-------------|
| [Architecture](docs/architecture.md) | System architecture, data flow, component interactions |
| [Development](docs/development.md) | Build, test, package, and extend the platform |
| [Database Schema](docs/database/schema.md) | Every ClickHouse table, view, dictionary, and materialized view |
| [Database Migrations](docs/database/migrations.md) | Migration order, application, verification, and rollback |

### Service Documentation

- [Sentinel](docs/services/sentinel.md) — TLS capture daemon
- [mod-reqin-log](docs/services/mod-reqin-log.md) — Apache HTTP logging module
- [Correlator](docs/services/correlator.md) — HTTP/TLS event correlation engine
- [Bot Detector](docs/services/bot-detector.md) — ML anomaly detection
- [Dashboard](docs/services/dashboard.md) — SOC web dashboard and API

### Shared Library Documentation

- [go-ja4common](docs/shared/go-ja4common.md) — Go shared library
- [python-ja4common](docs/shared/python-ja4common.md) — Python shared library

## Go Workspace

The repository uses a Go workspace (`go.work`) to link the Go modules:

```
go 1.24.6

use (
	./services/sentinel
	./services/correlator
	./shared/go/ja4common
)
```

## License

See individual service directories for license information.
162
docs/architecture.md
Normal file
@ -0,0 +1,162 @@
# Architecture

The ja4-platform is a security pipeline that captures live network traffic, generates JA4/JA3 TLS fingerprints, correlates them with HTTP requests, applies machine-learning anomaly detection, and surfaces results through a SOC analyst dashboard. ClickHouse serves as the central data store linking all services.

## System Architecture

```
┌───────────────────────────────────────────────────────────────────────────────────┐
│                                Target Linux Server                                │
│                                                                                   │
│   ┌─────────────┐  HTTP req   ┌───────────────────────┐ UNIX socket (DGRAM)       │
│   │ Client      │────────────▶│ Apache HTTPD          │──────────────┐            │
│   │ (browser /  │             │ + mod-reqin-log       │              │            │
│   │  bot)       │             └───────────────────────┘              │            │
│   │             │                                                    ▼            │
│   │             │   TLS CH    ┌───────────────────────┐   ┌─────────────────────┐ │
│   │             │────────────▶│ sentinel              │   │ correlator          │ │
│   │             │   (pcap)    │ (packet capture)      │──▶│ (event join)        │ │
│   └─────────────┘             └───────────────────────┘   └────────┬────────────┘ │
│                                                                    │              │
└────────────────────────────────────────────────────────────────────┼──────────────┘
                                                                     │ INSERT JSON
                                                                     ▼
                                                          ┌─────────────────────┐
                                                          │     ClickHouse      │
                                                          │     mabase_prod     │
                                                          │                     │
                                                          │  http_logs_raw      │
                                                          │  ──(MV)──▶ http_logs│
                                                          │  ──(MV)──▶ agg_*    │
                                                          │  view_ai_features   │
                                                          │  ml_detected_anom.  │
                                                          │  ml_all_scores      │
                                                          └──────┬──────┬───────┘
                                                                 │      │
                                              ┌──────────────────┘      └──────────────────┐
                                              ▼                                            ▼
                                   ┌──────────────────────┐                     ┌──────────────────────┐
                                   │ bot-detector         │                     │ dashboard            │
                                   │ (Python)             │                     │ (FastAPI + React)    │
                                   │                      │                     │                      │
                                   │ Reads:               │                     │ Reads:               │
                                   │  view_ai_features    │                     │  ml_detected_anom.   │
                                   │  view_ip_recurrence  │                     │  ml_all_scores       │
                                   │ Writes:              │                     │  http_logs           │
                                   │  ml_detected_anom.   │                     │  agg_* tables        │
                                   │  ml_all_scores       │                     │  audit_logs          │
                                   └──────────────────────┘                     └──────────────────────┘
```

## Data Flow

### 1. Capture Phase

1. **mod-reqin-log** (Apache C module) hooks into `post_read_request`. On each HTTP request, it serializes method, path, headers, client IP/port into JSON and sends it via UNIX datagram socket to `/var/run/logcorrelator/http.socket`.

2. **sentinel** (Go daemon) uses libpcap to capture live TLS ClientHello packets on configured ports (default: 443, 8443). It extracts IP/TCP metadata, generates JA4 and JA3 fingerprints, and sends the result as JSON via UNIX datagram socket to `/var/run/logcorrelator/network.socket`.
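
Both producers hand their events to the correlator the same way: one JSON document per UNIX datagram. The sketch below round-trips a single event through a temporary socket to illustrate that transport. It is only an illustration in Python (the real producers are written in C and Go), and the socket path and the field names `src_ip`, `src_port`, and `ja4` are assumptions, since the exact payload schema is not shown here.

```python
import json
import os
import socket
import tempfile


def send_event(sock_path: str, event: dict) -> None:
    """Serialize one event as JSON and send it as a single datagram."""
    payload = json.dumps(event).encode("utf-8")
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as client:
        client.sendto(payload, sock_path)


def receive_event(server: socket.socket, max_size: int = 65536) -> dict:
    """Read one datagram from an already-bound socket and decode it."""
    return json.loads(server.recv(max_size).decode("utf-8"))


def demo() -> dict:
    """Round-trip one illustrative TLS event through a temporary socket."""
    # Hypothetical path; the platform uses /var/run/logcorrelator/*.socket.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "network.socket")
        with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as server:
            server.bind(path)
            send_event(path, {
                "src_ip": "203.0.113.7",       # assumed field name
                "src_port": 51234,             # assumed field name
                "ja4": "t13d1516h2_8daaf6152771_b0da82dd1658",
            })
            return receive_event(server)
```

Datagram sockets preserve message boundaries, so the receiver gets exactly one JSON document per `recv` call and needs no framing logic.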

### 2. Correlation Phase

3. **correlator** (Go daemon) listens on both UNIX sockets. It buffers incoming events and correlates them by matching `src_ip:src_port` within a configurable time window (default: 10 s). HTTP Keep-Alive connections are supported via `one_to_many` matching mode where a single TLS handshake (source B) is reused for multiple HTTP requests (source A). Correlated events merge HTTP fields (method, path, headers) with TLS fields (JA4, JA3, IP/TCP metadata) into a single `CorrelatedLog` JSON object, which is inserted into `http_logs_raw`.

### 3. Enrichment Phase (ClickHouse)

4. **mv_http_logs** materialized view automatically transforms `http_logs_raw` JSON into the structured `http_logs` table, enriching each row with:
   - ASN/geo data via `dict_iplocate_asn`
   - Anubis bot identification via `dict_anubis_ua`, `dict_anubis_ip`, `dict_anubis_asn`, `dict_anubis_country`

5. **mv_agg_host_ip_ja4_1h** and **mv_agg_header_fingerprint_1h** aggregate `http_logs` into 1-hour behavioral windows.

6. **view_ai_features_1h** joins the two aggregation tables and computes 50+ ML features per `(src_ip, ja4, host)` tuple.
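
The 1-hour windows in step 5 amount to truncating each request timestamp to the start of its hour and grouping on that value. A minimal Python sketch of the idea follows. The grouping key `(window, host, src_ip, ja4)` is inferred from the table name `agg_host_ip_ja4_1h`, and a plain request count stands in for whatever aggregates the real materialized view computes, so treat both as assumptions.

```python
from collections import Counter
from datetime import datetime


def hour_window(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def aggregate_hits(rows: list[dict]) -> Counter:
    """Count requests per (window, host, src_ip, ja4) tuple."""
    counts: Counter = Counter()
    for row in rows:
        key = (hour_window(row["time"]), row["host"], row["src_ip"], row["ja4"])
        counts[key] += 1
    return counts
```

Two requests at 10:05 and 10:55 from the same client land in the same 10:00 window, while a request at 11:01 opens a new one.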

### 4. Detection Phase

7. **bot-detector** (Python) runs on a 5-minute cycle:
   - Reads `view_ai_features_1h` for the last 24 hours
   - Separates known bots (via reputation dictionaries) from unknown traffic
   - Trains/loads Isolation Forest models on human-baseline traffic
   - Scores unknown traffic and writes anomalies to `ml_detected_anomalies` and all scores to `ml_all_scores`

### 5. Visualization Phase

8. **dashboard** (FastAPI + React) queries ClickHouse to display detections, feature analysis, investigation summaries, and clustering to SOC analysts.

## Component Interaction Matrix

| From → To | mod-reqin-log | sentinel | correlator | ClickHouse | bot-detector | dashboard |
|-----------|:---:|:---:|:---:|:---:|:---:|:---:|
| **mod-reqin-log** | — | — | UNIX socket (DGRAM) | — | — | — |
| **sentinel** | — | — | UNIX socket (DGRAM) | — | — | — |
| **correlator** | — | — | — | Native TCP :9000 (INSERT) | — | — |
| **ClickHouse** | — | — | — | — | — | — |
| **bot-detector** | — | — | — | HTTP :8123 (SELECT/INSERT) | — | — |
| **dashboard** | — | — | — | HTTP :8123 (SELECT/INSERT) | — | — |

## ClickHouse Table Ownership

| Table/View | Written By | Read By |
|------------|-----------|---------|
| `http_logs_raw` | correlator | mv_http_logs (MV) |
| `http_logs` | mv_http_logs (MV) | mv_agg_*, dashboard |
| `agg_host_ip_ja4_1h` | mv_agg_host_ip_ja4_1h (MV) | view_ai_features_1h |
| `agg_header_fingerprint_1h` | mv_agg_header_fingerprint_1h (MV) | view_ai_features_1h |
| `view_ai_features_1h` | — (view) | bot-detector |
| `view_ip_recurrence` | — (view) | bot-detector |
| `ml_detected_anomalies` | bot-detector | dashboard |
| `ml_all_scores` | bot-detector | dashboard |
| `audit_logs` | dashboard | dashboard |

## Correlation Algorithm

The correlator joins HTTP events (source A) with TLS/network events (source B) according to the following rules:

1. **Key**: `src_ip + src_port` — the client's source IP and ephemeral port uniquely identify a TCP connection.
2. **Time window**: Events must arrive within the configured window (default 10 seconds).
3. **Matching mode**:
   - `one_to_one`: Each B event matches at most one A event (consumed after match).
   - `one_to_many` (default, Keep-Alive): A single B (TLS handshake) can match multiple A events (HTTP requests) on the same connection. The B event has a configurable TTL (default 120 s) that resets on each match.
4. **Orphan handling**: Unmatched A events are emitted after a configurable delay (default 500 ms) with `correlated=false` and `orphan_side=A`.
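
The rules above can be sketched in a few lines of Python. This is a simplified illustration, not the Go implementation: the class and method names are hypothetical, B events are assumed to arrive before their A events, the time window is used as the B lifetime in `one_to_one` mode, and orphans are emitted immediately instead of after the 500 ms delay.

```python
from dataclasses import dataclass, field


@dataclass
class Correlator:
    """Toy correlator keyed by (src_ip, src_port); all names are illustrative."""

    window_s: float = 10.0      # time window for one_to_one matching
    b_ttl_s: float = 120.0      # lifetime of a B event in one_to_many mode
    one_to_many: bool = True
    _b: dict = field(default_factory=dict)  # key -> (b_event, expires_at)

    @staticmethod
    def _key(event: dict) -> tuple:
        return (event["src_ip"], event["src_port"])

    def add_b(self, event: dict, now: float) -> None:
        """Store a TLS/network event until it expires or is consumed."""
        lifetime = self.b_ttl_s if self.one_to_many else self.window_s
        self._b[self._key(event)] = (event, now + lifetime)

    def match_a(self, event: dict, now: float) -> dict:
        """Join an HTTP event with its stored B event, or emit an orphan."""
        key = self._key(event)
        entry = self._b.get(key)
        if entry is None or now > entry[1]:
            self._b.pop(key, None)  # drop an expired B event, if any
            return {**event, "correlated": False, "orphan_side": "A"}
        b_event, _ = entry
        if self.one_to_many:
            # Keep-Alive: the TTL resets on every successful match.
            self._b[key] = (b_event, now + self.b_ttl_s)
        else:
            del self._b[key]        # one_to_one: B is consumed
        return {**b_event, **event, "correlated": True, "orphan_side": ""}
```

With the defaults, a TLS event stored at `t=0` still matches HTTP requests at `t=1` and `t=100`, because each match pushes the expiry forward by 120 s; a request at `t=300` finds the entry expired and is emitted as an A-side orphan.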

## JA4/JA3 Fingerprint Format

### JA4

JA4 is a modern TLS fingerprinting format (successor to JA3) with the structure:

```
t{TLS_VER}{SNI}{CIPHER_COUNT}{EXT_COUNT}{ALPN}_{CIPHER_HASH}_{EXT_HASH}
```

Example: `t13d1516h2_8daaf6152771_b0da82dd1658`

- Prefix `t` = TLS, followed by version (`13` = TLS 1.3)
- `d` = SNI present, `i` = SNI absent
- Cipher suite count (`15`) and extension count (`16`), followed by the ALPN indicator (`h2`)
- SHA-256 truncated hashes of sorted cipher suites and extensions
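
To make the field layout concrete, here is a small Python sketch that splits a JA4 string into its parts. The names `JA4Parts` and `parse_ja4` are hypothetical, and the regular expression assumes the common layout (two-character version, two-digit counts, two-character ALPN indicator, 12-character hex hashes); it is not sentinel's parser.

```python
import re
from typing import NamedTuple


class JA4Parts(NamedTuple):
    protocol: str         # "t" for TLS over TCP
    tls_version: str      # e.g. "13" for TLS 1.3
    sni: str              # "d" = SNI present, "i" = SNI absent
    cipher_count: int
    extension_count: int
    alpn: str             # e.g. "h2"
    cipher_hash: str      # truncated SHA-256 of the sorted cipher suites
    extension_hash: str   # truncated SHA-256 of the sorted extensions


_JA4_RE = re.compile(
    r"^(?P<protocol>[a-z])(?P<version>[0-9a-z]{2})(?P<sni>[di])"
    r"(?P<ciphers>\d{2})(?P<extensions>\d{2})(?P<alpn>[0-9a-z]{2})"
    r"_(?P<cipher_hash>[0-9a-f]{12})_(?P<extension_hash>[0-9a-f]{12})$"
)


def parse_ja4(value: str) -> JA4Parts:
    """Split a JA4 fingerprint into its fields, or raise ValueError."""
    match = _JA4_RE.match(value)
    if match is None:
        raise ValueError(f"not a JA4 fingerprint: {value!r}")
    return JA4Parts(
        protocol=match["protocol"],
        tls_version=match["version"],
        sni=match["sni"],
        cipher_count=int(match["ciphers"]),
        extension_count=int(match["extensions"]),
        alpn=match["alpn"],
        cipher_hash=match["cipher_hash"],
        extension_hash=match["extension_hash"],
    )
```

Applied to the example fingerprint, the first section decodes as TLS 1.3, SNI present, 15 cipher suites, 16 extensions, ALPN `h2`.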

### JA3

JA3 is the original TLS fingerprinting format:

```
{TLS_VER},{CIPHERS},{EXTENSIONS},{ELLIPTIC_CURVES},{EC_POINT_FORMATS}
```

The `ja3_hash` is the MD5 hash of the JA3 string.
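
As a rough Python illustration of that definition: the five fields are joined with commas, the values inside each field with dashes, and the resulting string is hashed with MD5. The function names and sample values below are made up for the example and are not taken from sentinel.

```python
import hashlib


def ja3_string(tls_version, ciphers, extensions, curves, point_formats) -> str:
    """Build the comma-separated JA3 string from decimal field values."""
    lists = (ciphers, extensions, curves, point_formats)
    fields = [str(tls_version)] + ["-".join(str(v) for v in values) for values in lists]
    return ",".join(fields)


def ja3_hash(ja3: str) -> str:
    """Return the MD5 hex digest of a JA3 string."""
    return hashlib.md5(ja3.encode("ascii")).hexdigest()
```

An empty list yields an empty field, so a ClientHello without elliptic-curve extensions still produces a five-field string.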

Both fingerprints are generated by sentinel from the TLS ClientHello payload.

## Technology Stack

| Component | Technology |
|-----------|-----------|
| Packet capture | Go + libpcap (gopacket) |
| HTTP logging | C Apache module (APR) |
| Event correlation | Go (hexagonal architecture) |
| ML detection | Python 3.11 + scikit-learn |
| Dashboard backend | FastAPI (Python) |
| Dashboard frontend | React + Vite |
| Data store | ClickHouse |
| Deployment | systemd, Docker, RPM |
| IPC | UNIX datagram sockets |
256
docs/database/migrations.md
Normal file
@ -0,0 +1,256 @@
# Database Migrations

The ClickHouse schema for ja4-platform is managed through numbered SQL migration files in `shared/clickhouse/`. Migrations are idempotent (using `IF NOT EXISTS` / `IF EXISTS`) and must be applied in numeric order.

## Migration Order

| File | Purpose |
|------|---------|
| `00_database.sql` | Creates the `mabase_prod` database |
| `01_raw_tables.sql` | Creates `http_logs_raw` ingest table (MergeTree, 1-day TTL) |
| `02_dictionaries.sql` | Creates ASN geo dictionary (`dict_iplocate_asn`), bot IP/JA4 reference tables, `ref_bot_networks` |
| `03_anubis_tables.sql` | Creates Anubis crawler rule tables (`anubis_ua_rules`, `anubis_ip_rules`, `anubis_asn_rules`, `anubis_country_rules`) and their dictionaries (`dict_anubis_ua`, `dict_anubis_ip`, `dict_anubis_asn`, `dict_anubis_country`) |
| `04_mv_http_logs.sql` | Creates the canonical `http_logs` table and `mv_http_logs` materialized view with full Anubis enrichment |
| `05_aggregation_tables.sql` | Creates reputation dictionaries (`dict_bot_ip`, `dict_bot_ja4`, `dict_asn_reputation`), behavioral aggregation tables (`agg_host_ip_ja4_1h`, `agg_header_fingerprint_1h`), and their materialized views |
| `06_ml_tables.sql` | Creates ML output tables (`ml_detected_anomalies`, `ml_all_scores`) and `view_ip_recurrence` |
| `07_ai_features_view.sql` | Creates `view_ai_features_1h` — the 50+ feature view used by bot-detector |
| `08_users.sql` | Creates ClickHouse users (`data_writer`, `analyst`) and grants permissions |
| `09_audit_table.sql` | Creates `audit_logs` table for SOC dashboard audit trail |

## Prerequisites

### 1. ClickHouse Server

A running ClickHouse server (version 23.8+ recommended for `REGEXP_TREE` dictionary support).

### 2. CSV Data Files

Place the following files in `/var/lib/clickhouse/user_files/`:

| File | Source | Description |
|------|--------|-------------|
| `iplocate-ip-to-asn.csv` | [IPLocate](https://iplocate.io) | IP-to-ASN mapping with country, org, domain |
| `bot_ip.csv` | Custom | Known bot IP prefixes (CIDR format) |
| `bot_ja4.csv` | Custom | Known bot JA4 fingerprints |
| `asn_reputation.csv` | Custom | ASN reputation labels (`human`, `bot`, `unknown`) |

### 3. Anubis Passwords

Migration `03_anubis_tables.sql` contains placeholder passwords (`CHANGE_ME`) for the Anubis dictionaries. Replace these with the actual ClickHouse admin password before applying:

```bash
sed -i "s/CHANGE_ME/your_actual_password/g" 03_anubis_tables.sql
```

## How to Apply

### Full Initial Setup

Apply all migrations in order:

```bash
cd shared/clickhouse/

clickhouse-client --multiquery < 00_database.sql
clickhouse-client --multiquery < 01_raw_tables.sql
clickhouse-client --multiquery < 02_dictionaries.sql
clickhouse-client --multiquery < 03_anubis_tables.sql
clickhouse-client --multiquery < 04_mv_http_logs.sql
clickhouse-client --multiquery < 05_aggregation_tables.sql
clickhouse-client --multiquery < 06_ml_tables.sql
clickhouse-client --multiquery < 07_ai_features_view.sql
clickhouse-client --multiquery < 08_users.sql
clickhouse-client --multiquery < 09_audit_table.sql
```

### With Authentication

```bash
clickhouse-client --user admin --password 'your_password' --multiquery < 00_database.sql
# ... repeat for each file
```

### One-Liner (All at Once)

```bash
cd shared/clickhouse/
for f in 0*.sql; do
  echo "Applying $f..."
  clickhouse-client --multiquery < "$f"
done
```
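
The shell loop relies on the `0*.sql` glob, which would skip a future `10_*.sql` file, and it keeps going after a failed migration. A hedged Python sketch of a stricter runner is shown below: it orders files by their numeric prefix and stops at the first error. The helper names are hypothetical; the only command used is the `clickhouse-client --multiquery` invocation shown above, fed through standard input.

```python
import re
import subprocess
from pathlib import Path

# Matches file names such as "04_mv_http_logs.sql" and captures the number.
_NUMBERED = re.compile(r"^(\d+)_.*\.sql$")


def ordered_migrations(names) -> list[str]:
    """Return migration file names sorted by numeric prefix, ignoring others."""
    numbered = []
    for name in names:
        match = _NUMBERED.match(name)
        if match:
            numbered.append((int(match.group(1)), name))
    return [name for _, name in sorted(numbered)]


def apply_all(directory: str) -> None:
    """Apply every migration in order; raise on the first failing file."""
    folder = Path(directory)
    for name in ordered_migrations(p.name for p in folder.iterdir()):
        print(f"Applying {name}...")
        with open(folder / name, "rb") as sql:
            # check=True aborts the run if clickhouse-client exits non-zero.
            subprocess.run(["clickhouse-client", "--multiquery"], stdin=sql, check=True)
```

Calling `apply_all("shared/clickhouse")` would mirror the loop above; authentication flags would need to be added to the argument list where required.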

## How to Verify

After applying all migrations, run these queries to verify each migration was successful:

### 00 — Database

```sql
SHOW DATABASES LIKE 'mabase_prod';
-- Expected: mabase_prod
```

### 01 — Raw Tables

```sql
EXISTS mabase_prod.http_logs_raw;
-- Expected: 1
```

### 02 — Dictionaries

```sql
SELECT dictGetOrDefault('mabase_prod.dict_iplocate_asn', 'country_code',
    toIPv6(toIPv4('8.8.8.8')), 'MISSING');
-- Expected: US (if CSV loaded) or MISSING
```

### 03 — Anubis Tables

```sql
EXISTS mabase_prod.anubis_ua_rules;
EXISTS mabase_prod.anubis_ip_rules;
EXISTS mabase_prod.anubis_asn_rules;
EXISTS mabase_prod.anubis_country_rules;
-- Expected: 1 for each
```

### 04 — MV + http_logs

```sql
EXISTS mabase_prod.http_logs;
SELECT name FROM system.tables WHERE database = 'mabase_prod' AND name = 'mv_http_logs';
-- Expected: mv_http_logs
```

### 05 — Aggregation Tables

```sql
EXISTS mabase_prod.agg_host_ip_ja4_1h;
EXISTS mabase_prod.agg_header_fingerprint_1h;
SELECT name FROM system.dictionaries WHERE database = 'mabase_prod' AND name = 'dict_bot_ip';
-- Expected: dict_bot_ip
```

### 06 — ML Tables

```sql
EXISTS mabase_prod.ml_detected_anomalies;
EXISTS mabase_prod.ml_all_scores;
SELECT name FROM system.tables WHERE database = 'mabase_prod' AND name LIKE 'view_ip%';
-- Expected: view_ip_recurrence
```

### 07 — AI Features View

```sql
SELECT name FROM system.tables WHERE database = 'mabase_prod' AND name = 'view_ai_features_1h';
-- Expected: view_ai_features_1h
```

### 08 — Users

```sql
SHOW GRANTS FOR data_writer;
-- Expected: GRANT INSERT, SELECT ON mabase_prod.http_logs_raw TO data_writer
SHOW GRANTS FOR analyst;
-- Expected: GRANT SELECT ON multiple tables
```

### 09 — Audit Table

```sql
EXISTS mabase_prod.audit_logs;
-- Expected: 1
```

### Full Verification Query

```sql
SELECT
    count() AS total_tables
FROM system.tables
WHERE database = 'mabase_prod'
  AND name IN (
    'http_logs_raw', 'http_logs', 'agg_host_ip_ja4_1h', 'agg_header_fingerprint_1h',
    'ml_detected_anomalies', 'ml_all_scores', 'ref_bot_networks',
    'anubis_ua_rules', 'anubis_ip_rules', 'anubis_asn_rules', 'anubis_country_rules',
    'audit_logs', 'bot_ip', 'bot_ja4'
  );
-- Expected: 14
```
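
A count of 14 confirms that everything exists, but a lower number does not say which table is missing. One possible follow-up, sketched here in Python, compares the names returned by `system.tables` against the same list. The helper name `missing_tables` is hypothetical, and fetching the names from ClickHouse is left to whichever client is in use.

```python
# Same 14 names as in the verification query above.
EXPECTED_TABLES = frozenset({
    "http_logs_raw", "http_logs", "agg_host_ip_ja4_1h", "agg_header_fingerprint_1h",
    "ml_detected_anomalies", "ml_all_scores", "ref_bot_networks",
    "anubis_ua_rules", "anubis_ip_rules", "anubis_asn_rules", "anubis_country_rules",
    "audit_logs", "bot_ip", "bot_ja4",
})


def missing_tables(found) -> list[str]:
    """Return the expected tables that are absent from `found`, sorted by name."""
    return sorted(EXPECTED_TABLES - set(found))
```

An empty result corresponds to the expected count of 14.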

## Rollback Notes

### General Approach

ClickHouse does not support transactional DDL. To roll back a migration:

1. **Tables**: `DROP TABLE IF EXISTS mabase_prod.<table_name>`
2. **Materialized Views**: `DROP VIEW IF EXISTS mabase_prod.<mv_name>` (drop MV before its target table)
3. **Dictionaries**: `DROP DICTIONARY IF EXISTS mabase_prod.<dict_name>`
4. **Views**: `DROP VIEW IF EXISTS mabase_prod.<view_name>`
5. **Users**: `DROP USER IF EXISTS <username>`

### Rollback Order (Reverse of Apply)

```sql
-- 09: Audit
DROP TABLE IF EXISTS mabase_prod.audit_logs;

-- 08: Users
DROP USER IF EXISTS data_writer;
DROP USER IF EXISTS analyst;

-- 07: AI Features View
DROP VIEW IF EXISTS mabase_prod.view_ai_features_1h;

-- 06: ML Tables
DROP VIEW IF EXISTS mabase_prod.view_ip_recurrence;
DROP TABLE IF EXISTS mabase_prod.ml_all_scores;
DROP TABLE IF EXISTS mabase_prod.ml_detected_anomalies;

-- 05: Aggregation
DROP VIEW IF EXISTS mabase_prod.mv_agg_header_fingerprint_1h;
DROP VIEW IF EXISTS mabase_prod.mv_agg_host_ip_ja4_1h;
DROP TABLE IF EXISTS mabase_prod.agg_header_fingerprint_1h;
DROP TABLE IF EXISTS mabase_prod.agg_host_ip_ja4_1h;
DROP DICTIONARY IF EXISTS mabase_prod.dict_asn_reputation;
DROP DICTIONARY IF EXISTS mabase_prod.dict_bot_ja4;
DROP DICTIONARY IF EXISTS mabase_prod.dict_bot_ip;

-- 04: MV + http_logs
DROP VIEW IF EXISTS mabase_prod.mv_http_logs;
DROP TABLE IF EXISTS mabase_prod.http_logs;

-- 03: Anubis
DROP DICTIONARY IF EXISTS mabase_prod.dict_anubis_country;
DROP DICTIONARY IF EXISTS mabase_prod.dict_anubis_asn;
DROP DICTIONARY IF EXISTS mabase_prod.dict_anubis_ip;
DROP DICTIONARY IF EXISTS mabase_prod.dict_anubis_ua;
DROP TABLE IF EXISTS mabase_prod.anubis_country_rules;
DROP TABLE IF EXISTS mabase_prod.anubis_asn_rules;
DROP TABLE IF EXISTS mabase_prod.anubis_ip_rules;
DROP TABLE IF EXISTS mabase_prod.anubis_ua_rules;

-- 02: Dictionaries
DROP DICTIONARY IF EXISTS mabase_prod.dict_iplocate_asn;
DROP TABLE IF EXISTS mabase_prod.bot_ja4;
DROP TABLE IF EXISTS mabase_prod.bot_ip;
DROP TABLE IF EXISTS mabase_prod.ref_bot_networks;

-- 01: Raw Tables
DROP TABLE IF EXISTS mabase_prod.http_logs_raw;

-- 00: Database
DROP DATABASE IF EXISTS mabase_prod;
```
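
The script above is the creation order read backwards, which is what guarantees that a materialized view is dropped before its target table. If the list of created objects is kept in one place, the rollback can be derived from it mechanically. The Python sketch below illustrates that idea under the assumption of a hand-maintained list of `(kind, name)` pairs; the list format and function name are hypothetical.

```python
# Maps an object kind to the keyword used in its DROP statement.
DROP_KEYWORD = {
    "table": "TABLE",
    "view": "VIEW",
    "materialized_view": "VIEW",   # MVs are dropped with DROP VIEW
    "dictionary": "DICTIONARY",
    "user": "USER",
}


def rollback_statements(created, database: str = "mabase_prod") -> list[str]:
    """Build DROP statements in reverse creation order."""
    statements = []
    for kind, name in reversed(list(created)):
        # Users are global; every other object is qualified with the database.
        target = name if kind == "user" else f"{database}.{name}"
        statements.append(f"DROP {DROP_KEYWORD[kind]} IF EXISTS {target};")
    return statements
```

For a creation list of `http_logs` (table), `mv_http_logs` (materialized view), and `analyst` (user), the generated statements drop the user first, then the view, then the table.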

### Important Notes

- **Data loss**: Dropping tables destroys all data. Always back up before rollback.
- **MV dependency**: Materialized views must be dropped before their target tables.
- **Dictionary dependency**: Views/MVs using dictionaries will fail if dictionaries are dropped while they still reference them.
- **Idempotent re-apply**: After rollback, migrations can be safely re-applied since they use `IF NOT EXISTS`.
- **`04_mv_http_logs.sql`** is the canonical version of the MV, superseding any base version in `services/correlator/sql/init.sql`.
334
docs/database/schema.md
Normal file
@ -0,0 +1,334 @@
# Database Schema

The ja4-platform uses ClickHouse as its central data store with database `mabase_prod`. This document describes every table, materialized view, dictionary, and view in the schema.

## Tables

### http_logs_raw

Raw JSON ingest table — direct target for correlator INSERTs.

| Column | Type | Description |
|--------|------|-------------|
| `raw_json` | String (ZSTD(3)) | Complete correlated log as JSON string |
| `ingest_time` | DateTime | Insertion timestamp (default: `now()`) |

- **Engine**: MergeTree
- **Partition by**: `toDate(ingest_time)`
- **Order by**: `ingest_time`
- **TTL**: `ingest_time + INTERVAL 1 DAY`
|
||||
|
||||
---
|
||||
|
||||
### http_logs
|
||||
|
||||
Parsed and enriched HTTP log table — populated by `mv_http_logs` materialized view.
|
||||
|
||||
| Column | Type | Nullable | Description |
|
||||
|--------|------|----------|-------------|
|
||||
| `time` | DateTime | No | Request timestamp |
|
||||
| `log_date` | Date | No | Date partition key (default: `toDate(time)`) |
|
||||
| `src_ip` | IPv4 | No | Client source IP |
|
||||
| `src_port` | UInt16 | No | Client source port |
|
||||
| `dst_ip` | IPv4 | No | Server destination IP |
|
||||
| `dst_port` | UInt16 | No | Server destination port |
|
||||
| `src_asn` | UInt32 | No | Source ASN (enriched via dict_iplocate_asn) |
|
||||
| `src_country_code` | LowCardinality(String) | No | Source country code |
|
||||
| `src_as_name` | LowCardinality(String) | No | AS name |
|
||||
| `src_org` | LowCardinality(String) | No | AS organization |
|
||||
| `src_domain` | LowCardinality(String) | No | AS domain |
|
||||
| `method` | LowCardinality(String) | No | HTTP method |
|
||||
| `scheme` | LowCardinality(String) | No | URL scheme (http/https) |
|
||||
| `host` | LowCardinality(String) | No | HTTP Host header |
|
||||
| `path` | String (ZSTD(3)) | No | Request path |
|
||||
| `query` | String (ZSTD(3)) | No | Query string |
|
||||
| `http_version` | LowCardinality(String) | No | HTTP version |
|
||||
| `orphan_side` | LowCardinality(String) | No | Orphan side (A, B, or empty) |
|
||||
| `correlated` | UInt8 | No | 1 if HTTP+TLS correlated |
|
||||
| `keepalives` | UInt16 | No | Keep-alive request sequence |
|
||||
| `a_timestamp` | UInt64 | No | Source A event timestamp (ns) |
|
||||
| `b_timestamp` | UInt64 | No | Source B event timestamp (ns) |
|
||||
| `conn_id` | String (ZSTD(3)) | No | TCP connection identifier |
|
||||
| `ip_meta_df` | UInt8 | No | IP Don't Fragment flag |
|
||||
| `ip_meta_id` | UInt16 | No | IP identification |
|
||||
| `ip_meta_total_length` | UInt16 | No | IP total length |
|
||||
| `ip_meta_ttl` | UInt8 | No | IP TTL |
|
||||
| `tcp_meta_options` | LowCardinality(String) | No | TCP options list |
|
||||
| `tcp_meta_window_size` | UInt32 | No | TCP window size |
|
||||
| `tcp_meta_mss` | UInt16 | No | TCP MSS |
|
||||
| `tcp_meta_window_scale` | UInt8 | No | TCP window scale |
|
||||
| `syn_to_clienthello_ms` | Int32 | No | SYN-to-ClientHello timing (ms) |
|
||||
| `tls_version` | LowCardinality(String) | No | TLS version |
|
||||
| `tls_sni` | LowCardinality(String) | No | TLS SNI |
|
||||
| `tls_alpn` | LowCardinality(String) | No | TLS ALPN |
|
||||
| `ja3` | String (ZSTD(3)) | No | JA3 fingerprint |
|
||||
| `ja3_hash` | String (ZSTD(3)) | No | JA3 MD5 hash |
|
||||
| `ja4` | String (ZSTD(3)) | No | JA4 fingerprint |
|
||||
| `client_headers` | String (ZSTD(3)) | No | Comma-separated header names |
|
||||
| `header_user_agent` | String (ZSTD(3)) | No | User-Agent header |
|
||||
| `header_accept` | String (ZSTD(3)) | No | Accept header |
|
||||
| `header_accept_encoding` | String (ZSTD(3)) | No | Accept-Encoding header |
|
||||
| `header_accept_language` | String (ZSTD(3)) | No | Accept-Language header |
|
||||
| `header_content_type` | String (ZSTD(3)) | No | Content-Type header |
|
||||
| `header_x_request_id` | String (ZSTD(3)) | No | X-Request-Id header |
|
||||
| `header_x_trace_id` | String (ZSTD(3)) | No | X-Trace-Id header |
|
||||
| `header_x_forwarded_for` | String (ZSTD(3)) | No | X-Forwarded-For header |
|
||||
| `header_sec_ch_ua` | String (ZSTD(3)) | No | Sec-CH-UA header |
|
||||
| `header_sec_ch_ua_mobile` | String (ZSTD(3)) | No | Sec-CH-UA-Mobile header |
|
||||
| `header_sec_ch_ua_platform` | String (ZSTD(3)) | No | Sec-CH-UA-Platform header |
|
||||
| `header_sec_fetch_dest` | String (ZSTD(3)) | No | Sec-Fetch-Dest header |
|
||||
| `header_sec_fetch_mode` | String (ZSTD(3)) | No | Sec-Fetch-Mode header |
|
||||
| `header_sec_fetch_site` | String (ZSTD(3)) | No | Sec-Fetch-Site header |
|
||||
| `anubis_bot_name` | LowCardinality(String) | No | Anubis-detected bot name (default: '') |
|
||||
| `anubis_bot_action` | LowCardinality(String) | No | Anubis-detected bot action (default: '') |
|
||||
| `anubis_bot_category` | LowCardinality(String) | No | Anubis-detected bot category (default: '') |
|
||||
|
||||
- **Engine**: MergeTree
|
||||
- **Partition by**: `log_date`
|
||||
- **Order by**: `(time, src_ip, dst_ip, ja4)`
|
||||
- **TTL**: `log_date + INTERVAL 7 DAY`
|
||||
|
||||
---
|
||||
|
||||
### agg_host_ip_ja4_1h
|
||||
|
||||
Behavioral aggregation per `(src_ip, ja4, host)` per hour. Uses `AggregatingMergeTree` with `SimpleAggregateFunction` and `AggregateFunction` columns for incremental aggregation.
|
||||
|
||||
Key columns include: `window_start`, `src_ip`, `ja4`, `host`, `src_asn`, `hits`, `count_post`, `uniq_paths`, `uniq_query_params`, `tcp_jitter_variance`, `unique_src_ports`, `unique_conn_id`, `orphan_count`, `ip_id_zero_count`, `mss_1460_count`, `uniq_ua`, `url_depth_variance`, `count_anomalous_payload`, `uniq_ja3`, `avg_syn_ms`, `tls12_count`, `count_head`, `count_no_sec_fetch`, `count_generic_accept`, `count_http10`, `ip_df_var`, `avg_ttl`, `ttl_var`, `count_no_wscale`, `count_correlated`, `count_no_accept_enc`, `count_http_scheme`.
|
||||
|
||||
- **Engine**: AggregatingMergeTree
|
||||
- **Order by**: `(window_start, src_ip, ja4, host)`
|
||||
|
||||
---
|
||||
|
||||
### agg_header_fingerprint_1h
|
||||
|
||||
Header-level behavioral fingerprint aggregation per `(src_ip)` per hour.
|
||||
|
||||
| Column | Type | Description |
|
||||
|--------|------|-------------|
|
||||
| `window_start` | DateTime | Hour window start |
|
||||
| `src_ip` | IPv6 | Source IP |
|
||||
| `header_order_hash` | SimpleAggregateFunction(any, String) | Hash of header order |
|
||||
| `header_count` | SimpleAggregateFunction(max, UInt16) | Max header count |
|
||||
| `has_accept_language` | SimpleAggregateFunction(max, UInt8) | Accept-Language presence |
|
||||
| `has_cookie` | SimpleAggregateFunction(max, UInt8) | Cookie presence |
|
||||
| `has_referer` | SimpleAggregateFunction(max, UInt8) | Referer presence |
|
||||
| `modern_browser_score` | SimpleAggregateFunction(max, UInt8) | Browser compliance score |
|
||||
| `ua_ch_mismatch` | SimpleAggregateFunction(max, UInt8) | UA/Client Hints mismatch |
|
||||
| `sec_fetch_mode` | SimpleAggregateFunction(any, String) | Sec-Fetch-Mode value |
|
||||
| `sec_fetch_dest` | SimpleAggregateFunction(any, String) | Sec-Fetch-Dest value |
|
||||
|
||||
- **Engine**: AggregatingMergeTree
|
||||
- **Order by**: `(window_start, src_ip)`
|
||||
|
||||
---
|
||||
|
||||
### ml_detected_anomalies
|
||||
|
||||
Anomaly detections above the threat threshold.
|
||||
|
||||
Key columns: `detected_at`, `src_ip` (IPv6), `ja4`, `host`, `bot_name`, `anomaly_score` (Float32), `raw_anomaly_score` (Float32), `threat_level`, `model_name`, `recurrence` (UInt32), `campaign_id` (Int32), `reason`, plus all ML feature columns and Anubis enrichment (`anubis_bot_name`, `anubis_bot_action`, `anubis_bot_category`).
|
||||
|
||||
- **Engine**: ReplacingMergeTree(detected_at)
|
||||
- **Order by**: `(src_ip)`
|
||||
- **TTL**: `detected_at + INTERVAL 30 DAY`
|
||||
|
||||
---
|
||||
|
||||
### ml_all_scores
|
||||
|
||||
All ML classifications (no threshold filter) for observability.
|
||||
|
||||
Key columns: `detected_at`, `window_start`, `src_ip`, `ja4`, `host`, `bot_name`, `anomaly_score`, `raw_anomaly_score`, `threat_level`, `model_name`, `correlated`, `campaign_id`, plus ASN and Anubis enrichment.
|
||||
|
||||
- **Engine**: ReplacingMergeTree(detected_at)
|
||||
- **Order by**: `(window_start, src_ip, ja4, host, model_name)`
|
||||
- **TTL**: `window_start + INTERVAL 3 DAY`
|
||||
|
||||
---
|
||||
|
||||
### ref_bot_networks
|
||||
|
||||
Bot network CIDR reference table.
|
||||
|
||||
| Column | Type | Description |
|
||||
|--------|------|-------------|
|
||||
| `network` | IPv6CIDR | Network CIDR |
|
||||
| `bot_name` | LowCardinality(String) | Bot name |
|
||||
| `is_legitimate` | UInt8 | 1 = legitimate bot |
|
||||
| `last_update` | DateTime | Last update timestamp |
|
||||
|
||||
- **Engine**: ReplacingMergeTree(last_update)
|
||||
- **Order by**: `(network, bot_name)`
|
||||
|
||||
---
|
||||
|
||||
### bot_ip / bot_ja4
|
||||
|
||||
CSV-backed flat tables for quick bot lookups.
|
||||
|
||||
- `bot_ip`: single column `ip` (String) — Engine: File(CSV, 'bot_ip.csv')
|
||||
- `bot_ja4`: single column `ja4` (String) — Engine: File(CSV, 'bot_ja4.csv')
|
||||
|
||||
---
|
||||
|
||||
### Anubis Rule Tables
|
||||
|
||||
| Table | Key | Columns | Engine |
|
||||
|-------|-----|---------|--------|
|
||||
| `anubis_ua_rules` | `id` (UInt64) | `parent_id`, `regexp`, `keys`, `values` | ReplacingMergeTree |
|
||||
| `anubis_ip_rules` | `prefix` (String) | `bot_name`, `action`, `rule_id`, `has_ua`, `category` | ReplacingMergeTree |
|
||||
| `anubis_asn_rules` | `asn` (UInt32) | `bot_name`, `action`, `category` | ReplacingMergeTree |
|
||||
| `anubis_country_rules` | `country_code` (String) | `bot_name`, `action`, `category` | ReplacingMergeTree |
|
||||
|
||||
---
|
||||
|
||||
### audit_logs
|
||||
|
||||
SOC audit trail for dashboard activity.
|
||||
|
||||
| Column | Type | Default | Description |
|
||||
|--------|------|---------|-------------|
|
||||
| `timestamp` | DateTime | `now()` | Event time |
|
||||
| `user_name` | LowCardinality(String) | `'soc_user'` | Analyst name |
|
||||
| `action` | LowCardinality(String) | — | Action performed |
|
||||
| `entity_type` | LowCardinality(String) | `''` | Entity type (ip, ja4, etc.) |
|
||||
| `entity_id` | String | `''` | Entity identifier |
|
||||
| `entity_count` | UInt32 | `0` | Entity count |
|
||||
| `details` | String (ZSTD(3)) | `''` | JSON details |
|
||||
| `client_ip` | String | `''` | Analyst client IP |
|
||||
|
||||
- **Engine**: MergeTree
|
||||
- **Partition by**: `toDate(timestamp)`
|
||||
- **Order by**: `(timestamp, user_name, action)`
|
||||
- **TTL**: `toDate(timestamp) + INTERVAL 90 DAY`
|
||||
|
||||
---
|
||||
|
||||
## Materialized Views
|
||||
|
||||
### mv_http_logs
|
||||
|
||||
- **Source**: `http_logs_raw`
|
||||
- **Target**: `http_logs`
|
||||
- **Transformation**: Parses `raw_json` via `JSONExtract*` functions, enriches with ASN data from `dict_iplocate_asn` and Anubis bot detection from `dict_anubis_ua`, `dict_anubis_ip`, `dict_anubis_asn`, `dict_anubis_country`. Uses a 5-level priority cascade for Anubis: UA+IP combined > UA only > IP only > ASN > Country.
|
||||
|
||||
### mv_agg_host_ip_ja4_1h
|
||||
|
||||
- **Source**: `http_logs`
|
||||
- **Target**: `agg_host_ip_ja4_1h`
|
||||
- **Transformation**: Groups by `(toStartOfHour(time), src_ip, ja4, host, src_asn)`. Computes counts, unique values, variances, and aggregate functions for 50+ behavioral features.
|
||||
|
||||
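The grouping used by `mv_agg_host_ip_ja4_1h` can be pictured with a small Python sketch. It covers only two of the aggregated columns (`hits` and `count_post`), assumes each row is a dict carrying the `http_logs` column names, and uses hypothetical helper names (`to_start_of_hour`, `aggregate_hits`). It is not the materialized view's actual SQL.

```python
from collections import Counter
from datetime import datetime


def to_start_of_hour(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour, like toStartOfHour(time)."""
    return ts.replace(minute=0, second=0, microsecond=0)


def aggregate_hits(rows):
    """Group rows by (window_start, src_ip, ja4, host, src_asn) and count requests."""
    hits = Counter()
    posts = Counter()
    for row in rows:
        key = (
            to_start_of_hour(row["time"]),
            row["src_ip"],
            row["ja4"],
            row["host"],
            row["src_asn"],
        )
        hits[key] += 1
        if row["method"] == "POST":
            posts[key] += 1
    # Counter returns 0 for keys that never saw a POST.
    return {key: {"hits": hits[key], "count_post": posts[key]} for key in hits}
```

Two requests from the same client at 10:05 and 10:55 land in the same 10:00 window, while a request at 11:01 opens a new one.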
### mv_agg_header_fingerprint_1h
|
||||
|
||||
- **Source**: `http_logs`
|
||||
- **Target**: `agg_header_fingerprint_1h`
|
||||
- **Transformation**: Groups by `(toStartOfHour(time), src_ip)`. Computes header order hash, header count, browser compliance score, Client Hints mismatch.
|
||||
|
||||
---
|
||||
|
||||
## Dictionaries
|
||||
|
||||
### dict_iplocate_asn
|
||||
|
||||
- **Source**: CSV file `/var/lib/clickhouse/user_files/iplocate-ip-to-asn.csv`
|
||||
- **Key**: `network` (String)
|
||||
- **Layout**: `IP_TRIE`
|
||||
- **Attributes**: `asn` (UInt32), `country_code`, `name`, `org`, `domain`
|
||||
- **Lifetime**: 3600–7200 seconds
|
||||
|
||||
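An `IP_TRIE` dictionary resolves an address to the most specific network that contains it. The sketch below reproduces that longest-prefix behavior in plain Python to show what a lookup returns when networks overlap. The sample networks, ASNs, and the `lookup_asn` name are made up for the example; the real data comes from the CSV file listed above.

```python
import ipaddress

# Hypothetical sample rows keyed by network CIDR (documentation address ranges).
SAMPLE_NETWORKS = {
    "203.0.113.0/24": {"asn": 64500, "country_code": "FR"},
    "203.0.113.128/25": {"asn": 64501, "country_code": "DE"},
}


def lookup_asn(ip, networks=SAMPLE_NETWORKS):
    """Return the attributes of the most specific network containing ip, or None."""
    addr = ipaddress.ip_address(ip)
    best_net, best_attrs = None, None
    for cidr, attrs in networks.items():
        net = ipaddress.ip_network(cidr)
        if addr.version != net.version or addr not in net:
            continue
        # Keep the match with the longest prefix (the narrowest network).
        if best_net is None or net.prefixlen > best_net.prefixlen:
            best_net, best_attrs = net, attrs
    return best_attrs
```

The linear scan is only for readability; a trie gives the same answer without visiting every network.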
### dict_bot_ip
|
||||
|
||||
- **Source**: CSV file `/var/lib/clickhouse/user_files/bot_ip.csv`
|
||||
- **Key**: `prefix` (String)
|
||||
- **Layout**: `IP_TRIE`
|
||||
- **Attributes**: `bot_name` (String)
|
||||
- **Lifetime**: 300 seconds
|
||||
|
||||
### dict_bot_ja4
|
||||
|
||||
- **Source**: CSV file `/var/lib/clickhouse/user_files/bot_ja4.csv`
|
||||
- **Key**: `ja4` (String)
|
||||
- **Layout**: `COMPLEX_KEY_HASHED`
|
||||
- **Attributes**: `bot_name` (String)
|
||||
- **Lifetime**: 300 seconds
|
||||
|
||||
### dict_asn_reputation
|
||||
|
||||
- **Source**: CSV file `/var/lib/clickhouse/user_files/asn_reputation.csv`
|
||||
- **Key**: `src_asn` (UInt64)
|
||||
- **Layout**: `HASHED`
|
||||
- **Attributes**: `label` (String)
|
||||
- **Lifetime**: 300 seconds
|
||||
|
||||
### dict_anubis_ua
|
||||
|
||||
- **Source**: ClickHouse table `anubis_ua_rules`
|
||||
- **Key**: `regexp` (String)
|
||||
- **Layout**: `REGEXP_TREE`
|
||||
- **Attributes**: `bot_name`, `action`, `has_ip`, `rule_id`, `category`
|
||||
- **Lifetime**: 300–600 seconds
|
||||
|
||||
### dict_anubis_ip
|
||||
|
||||
- **Source**: ClickHouse table `anubis_ip_rules`
|
||||
- **Key**: `prefix` (String)
|
||||
- **Layout**: `IP_TRIE`
|
||||
- **Attributes**: `bot_name`, `action`, `rule_id`, `has_ua`, `category`
|
||||
- **Lifetime**: 300–600 seconds
|
||||
|
||||
### dict_anubis_asn
|
||||
|
||||
- **Source**: ClickHouse table `anubis_asn_rules`
|
||||
- **Key**: `asn` (UInt32)
|
||||
- **Layout**: `FLAT`
|
||||
- **Attributes**: `bot_name`, `action`, `category`
|
||||
- **Lifetime**: 300–600 seconds
|
||||
|
||||
### dict_anubis_country
|
||||
|
||||
- **Source**: ClickHouse table `anubis_country_rules`
|
||||
- **Key**: `country_code` (String)
|
||||
- **Layout**: `FLAT`
|
||||
- **Attributes**: `bot_name`, `action`, `category`
|
||||
- **Lifetime**: 300–600 seconds
|
||||
|
||||
---
|
||||
|
||||
## Views
|
||||
|
||||
### view_ai_features_1h
|
||||
|
||||
Computes 50+ ML features per `(src_ip, ja4, host)` from the last 24 hours by joining `agg_host_ip_ja4_1h` and `agg_header_fingerprint_1h`. Includes:
|
||||
|
||||
- Behavioral features: `hits`, `hit_velocity`, `fuzzing_index`, `post_ratio`, `orphan_ratio`
|
||||
- Connection features: `max_keepalives`, `multiplexing_efficiency`, `port_exhaustion_ratio`
|
||||
- Browser features: `modern_browser_score`, `ua_ch_mismatch`, `header_order_shared_count`
|
||||
- TLS features: `alpn_http_mismatch`, `is_alpn_missing`, `sni_host_mismatch`
|
||||
- L4 features: `tcp_jitter_variance`, `avg_ttl`, `ttl_std`, `syn_timing_cv`
|
||||
- Reputation: `bot_name` (from dict_bot_ip/dict_bot_ja4), `anubis_bot_name/action/category`
|
||||
- Derived: `temporal_entropy`, `ja3_diversity_ratio`
|
||||
|
||||
### view_ip_recurrence
|
||||
|
||||
Aggregates recurrence data from `ml_detected_anomalies`:
|
||||
|
||||
```sql
SELECT src_ip,
       count() AS recurrence,
       min(detected_at) AS first_seen,
       max(detected_at) AS last_seen,
       min(anomaly_score) AS worst_score,
       argMin(threat_level, anomaly_score) AS worst_threat_level
FROM ml_detected_anomalies
GROUP BY src_ip;
```
|
||||
|
||||
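For readers less familiar with `argMin`, here is a Python sketch of the same aggregation over a list of dicts. It assumes each dict carries the four columns used by the view; the function name `ip_recurrence` is illustrative only.

```python
from collections import defaultdict


def ip_recurrence(rows):
    """Mirror the view: one summary per src_ip over ml_detected_anomalies rows."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["src_ip"]].append(row)

    summary = {}
    for src_ip, items in groups.items():
        # argMin(threat_level, anomaly_score): threat level of the lowest-scoring row.
        worst = min(items, key=lambda r: r["anomaly_score"])
        summary[src_ip] = {
            "recurrence": len(items),
            "first_seen": min(r["detected_at"] for r in items),
            "last_seen": max(r["detected_at"] for r in items),
            "worst_score": worst["anomaly_score"],
            "worst_threat_level": worst["threat_level"],
        }
    return summary
```

Because lower scores are more anomalous, `min(anomaly_score)` is the worst score, and `argMin` picks the threat level recorded on that same row.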
---
|
||||
|
||||
## User Accounts
|
||||
|
||||
| User | Permissions | Purpose |
|
||||
|------|------------|---------|
|
||||
| `data_writer` | INSERT + SELECT on `http_logs_raw` | Used by correlator service |
|
||||
| `analyst` | SELECT on `http_logs`, `ml_detected_anomalies`, `ml_all_scores`, `view_ai_features_1h`, `view_ip_recurrence`, `audit_logs` | Used by dashboard/SOC analysts |
|
||||
|
||||
> **Security note**: Default passwords are `ChangeMe` — replace with strong passwords before production use. Store credentials in a secrets manager.
|
||||
246
docs/development.md
Normal file
@ -0,0 +1,246 @@
|
||||
# Development Guide
|
||||
|
||||
This guide covers building, testing, packaging, and extending the ja4-platform monorepo. All build and test operations run inside Docker — no native Go, Python, or C toolchains are required on the host.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
| Requirement | Minimum Version | Notes |
|
||||
|-------------|----------------|-------|
|
||||
| Docker | 20.10+ | BuildKit enabled (`DOCKER_BUILDKIT=1`) |
|
||||
| Docker Compose | 2.x | For bot-detector and dashboard |
|
||||
| make | 3.81+ | GNU Make |
|
||||
| git | 2.x | For version tagging |
|
||||
|
||||
No Go, Python, or C compilers are needed on the host machine.
|
||||
|
||||
## Building All Services
|
||||
|
||||
```bash
|
||||
make build-all
|
||||
```
|
||||
|
||||
This builds Docker images for:
|
||||
- `ja4-platform/sentinel:latest`
|
||||
- `ja4-platform/correlator:latest`
|
||||
- `ja4-platform/bot-detector:latest`
|
||||
- `ja4-platform/dashboard:latest`
|
||||
|
||||
mod-reqin-log is an Apache module and is only built as part of the RPM packaging process.
|
||||
|
||||
### Building Individual Services
|
||||
|
||||
```bash
|
||||
make build-sentinel # Go binary in Docker
|
||||
make build-correlator # Go binary in Docker
|
||||
make build-bot-detector # Python image
|
||||
make build-dashboard # FastAPI + React image
|
||||
```
|
||||
|
||||
## Running Tests
|
||||
|
||||
```bash
|
||||
make test-all
|
||||
```
|
||||
|
||||
### Per-Service Testing
|
||||
|
||||
| Service | Command | Details |
|
||||
|---------|---------|---------|
|
||||
| sentinel | `make test-sentinel` | Go tests with `-race` flag, requires `NET_RAW`/`NET_ADMIN` caps |
|
||||
| correlator | `make test-correlator` | Go tests with 80% coverage gate enforced |
|
||||
| mod-reqin-log | `make test-mod-reqin-log` | C unit tests (JSON serialization, config parsing, header handling) |
|
||||
| bot-detector | `make test-bot-detector` | Python pytest suite |
|
||||
| dashboard | `make test-dashboard` | Python pytest for FastAPI routes |
|
||||
| ja4_common (Python) | `make test-ja4common-python` | Shared Python library tests |
|
||||
|
||||
## Building RPM Packages
|
||||
|
||||
```bash
|
||||
make rpm-all
|
||||
```
|
||||
|
||||
Builds RPMs for sentinel, correlator, and mod-reqin-log targeting Rocky Linux 8/9/10:
|
||||
|
||||
```bash
|
||||
make rpm-sentinel # → services/sentinel/dist/rpm/
|
||||
make rpm-correlator # → services/correlator/dist/rpm/
|
||||
make rpm-mod-reqin-log # → services/mod-reqin-log/dist/rpm/
|
||||
```
|
||||
|
||||
Each RPM build uses a multi-stage Docker pipeline:
|
||||
1. Builder stage compiles the binary (Go) or shared object (C)
|
||||
2. RPM builder stage runs `rpmbuild` for each target distro (el8, el9, el10)
|
||||
3. Output stage copies RPMs to the host via `--output type=local`
|
||||
|
||||
### Distribution Packages
|
||||
|
||||
```bash
|
||||
make dist # Alias for rpm-all
|
||||
# RPMs in services/<service>/dist/rpm/el{8,9,10}/
|
||||
```
|
||||
|
||||
## Local Development Workflow
|
||||
|
||||
### Go Services (sentinel, correlator)
|
||||
|
||||
The `go.work` workspace links Go modules:
|
||||
|
||||
```
go 1.24.6

use (
	./services/sentinel
	./services/correlator
	./shared/go/ja4common
)
```
|
||||
|
||||
If you have Go 1.24.6 or later installed locally (the version `go.work` requires for sentinel), you can develop without Docker:
|
||||
|
||||
```bash
# Run each command from the repository root; the subshells leave the working directory unchanged.

# Run sentinel tests locally
(cd services/sentinel && go test ./... -race -v)

# Run correlator tests locally
(cd services/correlator && go test ./... -race -cover -v)

# Build sentinel binary locally (requires libpcap-dev)
(cd services/sentinel && go build -o ja4sentinel ./cmd/ja4sentinel/)
```
|
||||
|
||||
### Python Services (bot-detector, dashboard)
|
||||
|
||||
```bash
# Run each step from the repository root; the subshells leave the working directory unchanged.

# Install shared library in development mode
(cd shared/python/ja4_common && pip install -e .)

# Run bot-detector locally
(cd services/bot-detector && pip install -r bot_detector/requirements.txt && python -m bot_detector.bot_detector)

# Run dashboard locally
(cd services/dashboard && pip install -r backend/requirements.txt && uvicorn backend.main:app --reload --host 0.0.0.0 --port 8000)
```
|
||||
|
||||
### C Module (mod-reqin-log)
|
||||
|
||||
Requires `apxs` (Apache extension tool) and development headers:
|
||||
|
||||
```bash
|
||||
cd services/mod-reqin-log
|
||||
make build # Compiles mod_reqin_log.so
|
||||
make test # Runs unit tests
|
||||
make rpm # Builds RPM packages
|
||||
```
|
||||
|
||||
## Adding a New Service
|
||||
|
||||
### Go Service
|
||||
|
||||
1. Create the service directory:
|
||||
```bash
|
||||
mkdir -p services/my-service/cmd/my-service
|
||||
mkdir -p services/my-service/internal
|
||||
```
|
||||
|
||||
2. Initialize the Go module:
|
||||
```bash
|
||||
cd services/my-service
|
||||
go mod init github.com/antitbone/ja4/my-service
|
||||
```
|
||||
|
||||
3. Add to `go.work`:
|
||||
```
|
||||
use (
|
||||
./services/sentinel
|
||||
./services/correlator
|
||||
./services/my-service # ← add this
|
||||
./shared/go/ja4common
|
||||
)
|
||||
```
|
||||
|
||||
4. Import the shared library:
|
||||
```go
|
||||
import (
|
||||
"github.com/antitbone/ja4/ja4common/logger"
|
||||
"github.com/antitbone/ja4/ja4common/config"
|
||||
"github.com/antitbone/ja4/ja4common/shutdown"
|
||||
)
|
||||
```
|
||||
|
||||
5. Add Makefile targets:
|
||||
```makefile
|
||||
build-my-service:
|
||||
docker build -f services/my-service/Dockerfile -t ja4-platform/my-service:latest .
|
||||
|
||||
test-my-service:
|
||||
docker build -f services/my-service/Dockerfile.dev -t ja4-platform/my-service-tests:latest .
|
||||
docker run --rm ja4-platform/my-service-tests:latest
|
||||
```
|
||||
|
||||
6. Update `build-all` and `test-all` dependencies.
|
||||
|
||||
### Python Service
|
||||
|
||||
1. Create the service directory with a `requirements.txt` or `pyproject.toml`.
|
||||
2. Add `ja4-common` as a dependency (installed from `shared/python/ja4_common`).
|
||||
3. Use `from ja4_common.clickhouse import get_client` for ClickHouse access.
|
||||
4. Add Makefile targets following the bot-detector/dashboard pattern.
|
||||
|
||||
## go.work Workspace
|
||||
|
||||
The `go.work` file at the repository root links all Go modules, allowing cross-module development without publishing:
|
||||
|
||||
```
go 1.24.6

use (
	./services/sentinel
	./services/correlator
	./shared/go/ja4common
)
```
|
||||
|
||||
When adding a new Go module:
|
||||
1. `go mod init` in the service directory
|
||||
2. Add the path to `go.work`
|
||||
3. Reference shared packages via their module path: `github.com/antitbone/ja4/ja4common/...`
|
||||
4. Run `go work sync` to update the workspace
|
||||
|
||||
## ja4_common Python Package
|
||||
|
||||
The shared Python package (`shared/python/ja4_common`) provides:
|
||||
|
||||
- `ClickHouseSettings` — pydantic-settings model reading from `.env`
|
||||
- `ClickHouseClient` — singleton client with auto-reconnect
|
||||
- `get_client()` — module-level singleton accessor
|
||||
|
||||
### Extending ja4_common
|
||||
|
||||
1. Add new modules under `shared/python/ja4_common/ja4_common/`
|
||||
2. Export them in `__init__.py`
|
||||
3. Add dependencies to `pyproject.toml`
|
||||
4. Run tests: `make test-ja4common-python`
|
||||
|
||||
### Using in a New Service
|
||||
|
||||
Add to `requirements.txt`:
|
||||
```
|
||||
ja4-common @ file:///app/shared/python/ja4_common
|
||||
```
|
||||
|
||||
Or in Docker, copy the shared library and install:
|
||||
```dockerfile
|
||||
COPY shared/python/ja4_common /app/shared/python/ja4_common
|
||||
RUN pip install /app/shared/python/ja4_common
|
||||
```
|
||||
|
||||
## Environment Variables
|
||||
|
||||
Each service reads configuration from environment variables and/or YAML config files. See individual service documentation for the full reference:
|
||||
|
||||
- [Sentinel configuration](services/sentinel.md#configuration-reference)
|
||||
- [Correlator configuration](services/correlator.md#configuration-reference)
|
||||
- [Bot Detector configuration](services/bot-detector.md#environment-variables)
|
||||
- [Dashboard configuration](services/dashboard.md#configuration)
|
||||
265
docs/services/bot-detector.md
Normal file
@ -0,0 +1,265 @@
|
||||
# Bot Detector
|
||||
|
||||
The bot-detector is a Python service that performs machine-learning anomaly detection on aggregated HTTP/TLS traffic features stored in ClickHouse. It runs in a continuous cycle (default: every 5 minutes) and uses Isolation Forest to identify suspicious traffic patterns. Detections are then annotated with SHAP explanations, DBSCAN campaign clusters, and Anubis bot-rule matches.
|
||||
|
||||
## ML Algorithm
|
||||
|
||||
### Isolation Forest (Semi-Supervised)
|
||||
|
||||
The core algorithm is **Isolation Forest** (Liu, Ting & Zhou, 2008) — an unsupervised anomaly detection algorithm that isolates anomalies by randomly partitioning feature space. Anomalies require fewer partitions to isolate than normal points.
|
||||
|
||||
The approach is **semi-supervised** because:
|
||||
1. **Known bots** are identified a priori via reputation dictionaries (IP, JA4, ASN)
|
||||
2. **Human baseline** is identified via ASN reputation labels (`asn_label = 'human'`)
|
||||
3. The model trains **only on human-baseline traffic** (minimum 500 sessions required)
|
||||
4. Unknown traffic is scored by deviation from the human profile
|
||||
|
||||
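The partitioning behind this semi-supervised setup can be sketched as follows. The sketch assumes each session is a dict with a `bot_name` field (non-empty when a reputation dictionary matched) and an `asn_label` field. The function name and the choice to exclude baseline sessions from the "unknown" set are assumptions for illustration, not details confirmed by the bot-detector source.

```python
MIN_HUMAN_SESSIONS = 500  # minimum human-baseline size stated above


def split_traffic(rows, min_sessions=MIN_HUMAN_SESSIONS):
    """Split sessions into known bots, human baseline, and unknown traffic.

    Returns (known_bots, human_baseline, unknown, can_train), where can_train
    is True only when the baseline holds at least min_sessions sessions.
    """
    known_bots = [r for r in rows if r.get("bot_name")]
    remaining = [r for r in rows if not r.get("bot_name")]
    human_baseline = [r for r in remaining if r.get("asn_label") == "human"]
    # Assumption: baseline sessions are used for training only and are not scored.
    unknown = [r for r in remaining if r.get("asn_label") != "human"]
    can_train = len(human_baseline) >= min_sessions
    return known_bots, human_baseline, unknown, can_train
```

When `can_train` is false, a caller would typically skip training for that cycle rather than fit a model on too small a baseline.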
### Two-Model Architecture
|
||||
|
||||
| Model | Condition | Features | Data |
|-------|-----------|----------|------|
| **Complet** | `correlated = 1` | 35 | HTTP + TCP + TLS (full pipeline data) |
| **Applicatif** | `correlated = 0` | 31 | HTTP only (no TLS correlation available) |
|
||||
|
||||
### Threat Levels
|
||||
|
||||
| Score Range | Level | Interpretation |
|------------|-------|----------------|
| `< -0.30` | **CRITICAL** | Extremely anomalous behavior |
| `< -0.15` | **HIGH** | Strong anomaly signal |
| `< -0.05` | **MEDIUM** | Moderate anomaly |
| `≥ -0.05` | **LOW** | Slightly unusual |
|
||||
|
||||
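The score ranges overlap as written, so the table is best read as a cascade in which the first matching row wins. A minimal sketch of that reading, assuming the score has already been normalized; the function name `threat_level` is illustrative and not taken from the bot-detector source:

```python
def threat_level(score: float) -> str:
    """Map a normalized anomaly score to a threat level, most severe first."""
    if score < -0.30:
        return "CRITICAL"
    if score < -0.15:
        return "HIGH"
    if score < -0.05:
        return "MEDIUM"
    return "LOW"
```

Under this reading a score of exactly `-0.30` is HIGH rather than CRITICAL, because every bound in the table is a strict inequality.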
## Feature List
|
||||
|
||||
### Common Features (31 — Applicatif model)
|
||||
|
||||
#### HTTP Behavior
|
||||
|
||||
| Feature | Description |
|---------|-------------|
| `hits` | Request count in the window |
| `hit_velocity` | Requests per second |
| `fuzzing_index` | Path/parameter diversity anomaly score |
| `post_ratio` | Fraction of POST requests |
| `port_exhaustion_ratio` | Distinct source ports as a fraction of total requests |
| `orphan_ratio` | Fraction of requests without TLS correlation |
| `head_ratio` | Fraction of HEAD requests |
| `http10_ratio` | Fraction of HTTP/1.0 requests |
| `generic_accept_ratio` | Fraction of short Accept headers |
| `sec_fetch_absence_rate` | Fraction missing Sec-Fetch-Site |
| `missing_accept_enc_ratio` | Fraction missing Accept-Encoding |
| `http_scheme_ratio` | Fraction using HTTP (not HTTPS) |
|
||||
|
||||
#### Connection Management
|
||||
|
||||
| Feature | Description |
|
||||
|---------|-------------|
|
||||
| `max_keepalives` | Max requests on a single Keep-Alive connection |
|
||||
| `tcp_shared_count` | TCP connections shared between sessions |
|
||||
| `multiplexing_efficiency` | HTTP/2 multiplexing efficiency |
|
||||
|
||||
#### Browser Fingerprint
|
||||
|
||||
| Feature | Description |
|---------|-------------|
| `header_count` | Number of HTTP headers sent |
| `has_accept_language` | Accept-Language header presence |
| `has_cookie` | Cookie header presence |
| `has_referer` | Referer header presence |
| `modern_browser_score` | Composite browser compliance score (0–100) |
| `ua_ch_mismatch` | User-Agent vs Client Hints inconsistency |
| `ip_id_zero_ratio` | Fraction of IP packets with ID=0 (headless/minimal stack) |
| `header_order_shared_count` | Number of IPs sharing the same header order |
| `header_order_confidence` | Normalized entropy of header order |
| `distinct_header_orders` | Distinct header orderings per IP |
| `is_fake_navigation` | Sec-Fetch-Mode=navigate with non-document dest |
|
||||
|
||||
#### Navigation Patterns
|
||||
|
||||
| Feature | Description |
|
||||
|---------|-------------|
|
||||
| `request_size_variance` | Variance of request sizes |
|
||||
| `mss_mobile_mismatch` | TCP MSS vs mobile profile inconsistency |
|
||||
| `asset_ratio` | Static asset request fraction |
|
||||
| `direct_access_ratio` | Direct accesses (no referer) |
|
||||
| `is_ua_rotating` | User-Agent rotation detected (flag) |
|
||||
| `distinct_ja4_count` | Distinct JA4 fingerprints per IP |
|
||||
| `anomalous_payload_ratio` | Anomalous payload size fraction |
|
||||
|
||||
#### Concentration & Rarity
|
||||
|
||||
| Feature | Description |
|
||||
|---------|-------------|
|
||||
| `src_port_density` | Source port entropy |
|
||||
| `ja4_asn_concentration` | JA4 concentration within ASN |
|
||||
| `ja4_country_concentration` | JA4 concentration per country |
|
||||
| `is_rare_ja4` | Rare JA4 fingerprint (< 100 total hits) |
|
||||
|
||||
#### Temporal & Diversity
|
||||
|
||||
| Feature | Description |
|
||||
|---------|-------------|
|
||||
| `temporal_entropy` | Temporal distribution entropy |
|
||||
| `path_diversity_ratio` | URL path diversity |
|
||||
| `url_depth_variance` | URL depth variance |
|
||||
| `ja3_diversity_ratio` | JA3 diversity ratio per IP |
|
||||
|
||||
### Additional TCP/TLS Features (Complet model only — 4 extra)
|
||||
|
||||
| Feature | Description |
|
||||
|---------|-------------|
|
||||
| `tcp_jitter_variance` | TCP inter-packet jitter variance |
|
||||
| `alpn_http_mismatch` | ALPN vs actual HTTP protocol mismatch |
|
||||
| `is_alpn_missing` | ALPN absent in ClientHello |
|
||||
| `sni_host_mismatch` | TLS SNI vs HTTP Host mismatch |
|
||||
|
||||
### L4 Fingerprint Features (Complet model)
|
||||
|
||||
| Feature | Description |
|
||||
|---------|-------------|
|
||||
| `avg_ttl` | Average IP TTL (OS fingerprint) |
|
||||
| `ttl_std` | TTL standard deviation |
|
||||
| `no_window_scale_ratio` | Fraction without TCP window scale |
|
||||
| `syn_timing_cv` | SYN timing coefficient of variation |
|
||||
| `tls12_ratio` | Fraction of TLS 1.2 connections |
|
||||
| `ip_df_variance` | IP Don't-Fragment flag variance |
|
||||
|
||||
## Detection Pipeline
|
||||
|
||||
```
|
||||
1. Read view_ai_features_1h (last 24h) → DataFrame
|
||||
2. Read view_ip_recurrence → recurrence map
|
||||
3. Clean columns (fillna, astype)
|
||||
4. Split by correlated=1 / correlated=0
|
||||
5. For each model (Complet, Applicatif):
|
||||
a. A7: Validate features (exclude missing/constant)
|
||||
b. Separate known bots → log as KNOWN_BOT
|
||||
c. Filter human baseline (asn_label='human', min 500 sessions)
|
||||
d. Load or train Isolation Forest model
|
||||
e. A1: Check concept drift (KS test on features)
|
||||
f. Score unknown traffic
|
||||
g. A10: Normalize scores to [-1, 0]
|
||||
h. A2: Compute adaptive threshold = min(percentile_5, ANOMALY_THRESHOLD)
|
||||
i. A6: Apply recurrence weighting
|
||||
j. Filter scores below threshold
|
||||
k. A4: SHAP explainability (top 5 features)
|
||||
l. A8: DBSCAN clustering (campaign detection)
|
||||
6. Concatenate results, deduplicate by src_ip (keep lowest score)
|
||||
7. A5: Deduplication with TTL (skip recently reported IPs)
|
||||
8. Insert into ml_detected_anomalies + ml_all_scores
|
||||
```
|
||||
|
||||
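Steps h and j of the pipeline (adaptive threshold, then filtering) can be sketched in a few lines. This is a dependency-free illustration: the percentile uses linear interpolation, which is an assumption about the convention, and the function names are hypothetical. The defaults mirror `ANOMALY_THRESHOLD` and `ANOMALY_PERCENTILE` from the environment-variable table.

```python
import math


def percentile(values, pct):
    """Percentile with linear interpolation between the two nearest ranks."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    lower, upper = math.floor(rank), math.ceil(rank)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def adaptive_threshold(scores, anomaly_threshold=-0.03, anomaly_percentile=5):
    """A2: the stricter (lower) of the data-driven percentile and the fixed threshold."""
    return min(percentile(scores, anomaly_percentile), anomaly_threshold)


def select_anomalies(scores, **kwargs):
    """Step j: keep only the scores strictly below the adaptive threshold."""
    threshold = adaptive_threshold(scores, **kwargs)
    return [s for s in scores if s < threshold]
```

Taking the minimum means the fixed threshold acts as a ceiling: when the bulk of traffic scores close to zero, the percentile alone would flag benign sessions, and the fixed value prevents that.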
## Concept Drift Detection (A1)
|
||||
|
||||
Uses the **Kolmogorov-Smirnov test** to compare feature distributions between the current data and the training data. If the fraction of drifted features exceeds `DRIFT_THRESHOLD` (default: 0.30), the model is retrained.
|
||||
|
||||
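The drift check can be sketched without external dependencies as shown below. The sketch computes the two-sample KS statistic directly and flags a feature as drifted with the large-sample rejection rule at a significance level of about 0.05 (critical value `1.358 * sqrt((n + m) / (n * m))`). The significance level and all function names are assumptions; the source only states that a KS test is used and that `DRIFT_THRESHOLD` defaults to 0.30.

```python
import math
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    sorted_a, sorted_b = sorted(a), sorted(b)
    gap = 0.0
    for x in sorted_a + sorted_b:
        cdf_a = bisect_right(sorted_a, x) / len(sorted_a)
        cdf_b = bisect_right(sorted_b, x) / len(sorted_b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def feature_drifted(train, current, c_alpha=1.358):
    """Reject 'same distribution' when D exceeds the large-sample critical value."""
    n, m = len(train), len(current)
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(train, current) > critical


def should_retrain(train_features, current_features, drift_threshold=0.30):
    """Retrain when the fraction of drifted features exceeds drift_threshold."""
    names = [name for name in train_features if name in current_features]
    if not names:
        return False
    drifted = sum(
        feature_drifted(train_features[name], current_features[name]) for name in names
    )
    return drifted / len(names) > drift_threshold
```

Both arguments to `should_retrain` map feature names to lists of values. With two features of which one has shifted, the drifted fraction is 0.5, which exceeds the 0.30 default and triggers a retrain.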
## SHAP Explainability (A4)
|
||||
|
||||
When enabled (`ENABLE_SHAP=true`), computes SHAP values for each detected anomaly using `shap.TreeExplainer`. The top 5 contributing features are stored in the `reason` field.
|
||||
|
||||
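Selecting the top contributors from already-computed SHAP values might look like the sketch below. It does not call `shap` itself: it takes a mapping of feature name to SHAP value for one anomaly, ranks by absolute contribution (an assumption about what "top" means here), and joins the result into a string. The helper names and the `name=value` layout of the string are hypothetical, not the stored `reason` format.

```python
def top_contributors(shap_values, k=5):
    """Return the k (feature, value) pairs with the largest absolute SHAP value."""
    ranked = sorted(shap_values.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:k]


def format_reason(shap_values, k=5):
    """Render the top contributors as a compact, signed, comma-separated string."""
    return ", ".join(
        f"{name}={value:+.3f}" for name, value in top_contributors(shap_values, k)
    )
```

Keeping the sign in the rendered string lets an analyst see whether a feature pushed the score toward or away from the anomalous side.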
## DBSCAN Clustering (A8)
|
||||
|
||||
When enabled (`ENABLE_CLUSTERING=true`), applies DBSCAN on anomaly feature vectors to group related anomalies into campaigns. Each anomaly gets a `campaign_id` (-1 = no cluster).
|
||||
|
||||
## Anubis Bot-Rule Enrichment
|
||||
|
||||
The `view_ai_features_1h` view enriches each IP with Anubis bot detection using a priority cascade:
|
||||
1. **UA + IP combined** (same `rule_id`) — highest confidence
|
||||
2. **UA only** (no IP requirement)
|
||||
3. **IP only** (no UA requirement)
|
||||
4. **ASN match**
|
||||
5. **Country match**
|
||||
|
||||
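The cascade above runs inside the ClickHouse view. The Python sketch below only illustrates the priority order. The rule and session shapes (`ua_substring`, `ip`, `asn`, `country`, `bot_name`) and substring matching on the User-Agent are hypothetical and chosen for readability.

```python
from typing import Optional

# Highest-confidence tier first, mirroring the documented order.
PRIORITY = ("ua+ip", "ua", "ip", "asn", "country")


def match_kind(rule: dict, session: dict) -> Optional[str]:
    """Return the cascade tier at which this rule matches, or None."""
    ua_pat, ip = rule.get("ua_substring"), rule.get("ip")
    ua_hit = ua_pat is not None and ua_pat in session["user_agent"]
    ip_hit = ip is not None and ip == session["src_ip"]
    if ua_pat is not None and ip is not None:
        # A combined rule only counts when both conditions hold.
        return "ua+ip" if (ua_hit and ip_hit) else None
    if ua_hit:
        return "ua"
    if ip_hit:
        return "ip"
    if rule.get("asn") is not None and rule["asn"] == session.get("asn"):
        return "asn"
    if rule.get("country") is not None and rule["country"] == session.get("country"):
        return "country"
    return None


def enrich(session: dict, rules: list) -> Optional[dict]:
    """Pick the matching rule from the highest-priority tier."""
    best = None
    for rule in rules:
        kind = match_kind(rule, session)
        if kind is None:
            continue
        rank = PRIORITY.index(kind)
        if best is None or rank < best[0]:
            best = (rank, rule)
    return best[1] if best else None
```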
## Environment Variables
|
||||
|
||||
| Variable | Type | Default | Description |
|----------|------|---------|-------------|
| `CLICKHOUSE_HOST` | string | `clickhouse` | ClickHouse server hostname |
| `CLICKHOUSE_PORT` | int | `8123` | ClickHouse HTTP port |
| `CLICKHOUSE_DB` | string | `mabase_prod` | Database name |
| `CLICKHOUSE_USER` | string | `admin` | ClickHouse username |
| `CLICKHOUSE_PASSWORD` | string | `""` | ClickHouse password |
| `ISOLATION_CONTAMINATION` | float | `0.02` | Contamination parameter for Isolation Forest |
| `ANOMALY_THRESHOLD` | float | `-0.03` | Score threshold for anomaly detection |
| `ANOMALY_PERCENTILE` | int | `5` | Percentile for adaptive threshold (A2) |
| `CYCLE_INTERVAL_SEC` | int | `300` | Seconds between detection cycles |
| `MAX_CONSECUTIVE_FAILURES` | int | `3` | Max consecutive failures before exit |
| `BOT_DETECTOR_LOG` | string | `/var/log/bot_detector/decisions.jsonl` | Decision log file path |
| `LOG_BACKUP_COUNT` | int | `7` | Number of rotated log backups |
| `MODEL_DIR` | string | `/var/lib/bot_detector` | Model persistence directory |
| `RETRAIN_INTERVAL_HOURS` | int | `24` | Hours between model retraining |
| `MODEL_HISTORY_COUNT` | int | `10` | Number of model versions to keep |
| `DRIFT_THRESHOLD` | float | `0.30` | KS-test drift threshold (A1) |
| `ENABLE_MULTIWINDOW` | bool | `false` | Enable 24h multi-window analysis (A3) |
| `MULTIWINDOW_VIEW` | string | `view_ai_features_24h` | View for multi-window mode |
| `ENABLE_SHAP` | bool | `true` | Enable SHAP explainability (A4) |
| `DEDUP_TTL_MIN` | int | `60` | Deduplication TTL in minutes (A5) |
| `RECURRENCE_WEIGHT` | float | `0.005` | Recurrence score weighting factor (A6) |
| `MIN_VALID_FEATURE_RATIO` | float | `0.50` | Min valid feature ratio (A7) |
| `ENABLE_CLUSTERING` | bool | `true` | Enable DBSCAN clustering (A8) |
| `CLUSTERING_MIN_SAMPLES` | int | `3` | DBSCAN min samples per cluster |
| `HEALTH_PORT` | int | `8080` | Health check HTTP server port |
|
||||
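One way to read a handful of these variables with their documented defaults is sketched below. The `DetectorSettings` class, the `load_settings` helper, and the accepted boolean spellings (`1`, `true`, `yes`, `on`) are assumptions for illustration. The service's real settings code may differ.

```python
import os
from dataclasses import dataclass


def _env_bool(env, name, default):
    """Parse a boolean; the accepted truthy spellings are an assumption."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in ("1", "true", "yes", "on")


@dataclass(frozen=True)
class DetectorSettings:
    clickhouse_host: str
    clickhouse_port: int
    anomaly_threshold: float
    drift_threshold: float
    enable_shap: bool
    enable_multiwindow: bool
    dedup_ttl_min: int


def load_settings(env=None) -> DetectorSettings:
    """Build settings from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    return DetectorSettings(
        clickhouse_host=env.get("CLICKHOUSE_HOST", "clickhouse"),
        clickhouse_port=int(env.get("CLICKHOUSE_PORT", "8123")),
        anomaly_threshold=float(env.get("ANOMALY_THRESHOLD", "-0.03")),
        drift_threshold=float(env.get("DRIFT_THRESHOLD", "0.30")),
        enable_shap=_env_bool(env, "ENABLE_SHAP", True),
        enable_multiwindow=_env_bool(env, "ENABLE_MULTIWINDOW", False),
        dedup_ttl_min=int(env.get("DEDUP_TTL_MIN", "60")),
    )
```

Passing a plain dict instead of `os.environ` keeps the loader easy to exercise in tests.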
## Output Tables
|
||||
|
||||
### ml_detected_anomalies
|
||||
|
||||
Detections whose score crossed the anomaly threshold (scores below it; lower is more anomalous). Engine: `ReplacingMergeTree(detected_at)`, ORDER BY `(src_ip)`, TTL 30 days.
|
||||
|
||||
Key columns: `detected_at`, `src_ip`, `ja4`, `host`, `bot_name`, `anomaly_score`, `raw_anomaly_score`, `threat_level`, `model_name`, `recurrence`, `campaign_id`, `reason`, `anubis_bot_name`, `anubis_bot_action`, `anubis_bot_category`, plus all ML features.
|
||||
|
||||
### ml_all_scores
|
||||
|
||||
All classifications (no threshold filter) for observability. Engine: `ReplacingMergeTree(detected_at)`, ORDER BY `(window_start, src_ip, ja4, host, model_name)`, TTL 3 days.
|
||||
|
||||
## Decision Log Format
|
||||
|
||||
The `decisions.jsonl` file contains structured JSONL entries:
|
||||
|
||||
```json
{"event": "CYCLE_START", "cycle_id": "20260309T143000", "total": 5000, "human": 1500, "known_bot": 200, "correlated": 3000}
{"event": "ANOMALY", "src_ip": "203.0.113.42", "score": -0.25, "threat_level": "HIGH", "reason": "hit_velocity=45.2, fuzzing_index=0.8, ...", "campaign_id": 3}
{"event": "KNOWN_BOT", "src_ip": "198.51.100.10", "bot_name": "AhrefsBot"}
{"event": "CYCLE_END", "cycle_id": "20260309T143000", "anomalies": 15, "known_bots": 200, "duration_sec": 12.5}
```
|
||||
Log rotation: each log file is capped at 50 MB, and up to `LOG_BACKUP_COUNT` rotated backups (default 7) are kept.
|
||||
|
||||
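Size-based rotation of a JSONL file like this can be reproduced with the standard library's `RotatingFileHandler`. The sketch below assumes that handler, a hypothetical logger name, and hypothetical helper names. The source does not state which mechanism the service uses.

```python
import json
import logging
from logging.handlers import RotatingFileHandler

MAX_BYTES = 50 * 1024 * 1024  # 50 MB per file, as documented


def build_decision_logger(path, backup_count=7, max_bytes=MAX_BYTES):
    """Logger that writes one JSON object per line and rotates by size."""
    logger = logging.getLogger("bot_detector.decisions.sketch")  # hypothetical name
    logger.setLevel(logging.INFO)
    logger.propagate = False
    for old in list(logger.handlers):  # avoid duplicate handlers on re-init
        logger.removeHandler(old)
        old.close()
    handler = RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backup_count, encoding="utf-8"
    )
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(handler)
    return logger


def log_event(logger, event, **fields):
    """Emit one decision record, e.g. log_event(lg, "KNOWN_BOT", src_ip=...)."""
    logger.info(json.dumps({"event": event, **fields}))
```

With `backupCount=7` the directory holds at most eight files: the active `decisions.jsonl` plus `decisions.jsonl.1` through `decisions.jsonl.7`.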
## Health Check Endpoint
|
||||
|
||||
- **URL**: `GET http://localhost:8080/` (the port is set by `HEALTH_PORT`)
|
||||
- **Response**: `200 OK` with status JSON
|
||||
- Runs in a separate thread
|
||||
|
||||
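A health endpoint of this kind fits in a few lines of standard-library Python. This is a sketch under stated assumptions: the response body `{"status": "ok"}` and the names `HealthHandler` and `start_health_server` are hypothetical, since the source only says the endpoint returns "status JSON".

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Body shape is an assumption; the docs only promise "status JSON".
        body = json.dumps({"status": "ok"}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep health probes out of stderr


def start_health_server(port=8080, host="0.0.0.0"):
    """Serve health checks from a daemon thread; returns the server object."""
    server = HTTPServer((host, port), HealthHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

Running the server in a daemon thread keeps the detection loop in the main thread and lets the process exit without joining the HTTP thread.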
## Model Persistence
|
||||
|
||||
| File | Description |
|------|-------------|
| `model_<name>_<version>.joblib` | Serialized Isolation Forest (joblib) |
| `model_<name>_<version>.meta.json` | Model metadata (features, thresholds, training stats) |
| `model_<name>.current` | Pointer to active model version |
| `training_history.jsonl` | Training history log |
|
||||
Models are rotated: only the last `MODEL_HISTORY_COUNT` versions (default 10) are kept.
|
||||
|
||||
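The rotation rule can be sketched as a small pruning helper. It assumes that version strings sort chronologically (for example timestamps such as `20260309T143000`), which the source does not state, and `prune_models` is a hypothetical name.

```python
from pathlib import Path


def prune_models(model_dir, name, keep=10):
    """Delete all but the newest `keep` versions of model `name`.

    Assumes version strings sort chronologically (e.g. timestamps).
    Returns the list of removed versions.
    """
    root = Path(model_dir)
    prefix = f"model_{name}_"
    versions = sorted(
        p.name[len(prefix):-len(".joblib")] for p in root.glob(f"{prefix}*.joblib")
    )
    stale = versions[:-keep] if keep > 0 else versions
    for version in stale:
        # Remove the serialized model and its metadata together.
        for suffix in (".joblib", ".meta.json"):
            (root / f"{prefix}{version}{suffix}").unlink(missing_ok=True)
    return stale
```

The `model_<name>.current` pointer and `training_history.jsonl` do not match the glob pattern, so they are left untouched.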
## Docker Deployment
|
||||
|
||||
```bash
# Build
make build-bot-detector

# Run with docker-compose
cd services/bot-detector
docker-compose up -d
```
|
||||
### Volumes
|
||||
|
||||
| Host Path | Container Path | Description |
|-----------|---------------|-------------|
| `./bot_detector_logs` | `/var/log/bot_detector` | Decision logs (JSONL) |
| `./bot_detector_models` | `/var/lib/bot_detector` | Persisted ML models |
| `./reputation/data/user_files/bot_ip.csv` | `/data/bot_ip.csv` (ro) | Known bot IP list |
| `./reputation/data/user_files/bot_ja4.csv` | `/data/bot_ja4.csv` (ro) | Known bot JA4 list |
| `./reputation/data/user_files/asn_reputation.csv` | `/data/asn_reputation.csv` (ro) | ASN reputation labels |
220
docs/services/correlator.md
Normal file
@ -0,0 +1,220 @@
|
||||
# Correlator
|
||||
|
||||
The correlator (`logcorrelator`) is a Go daemon that joins HTTP events from [mod-reqin-log](mod-reqin-log.md) (source A) with TLS/network events from [sentinel](sentinel.md) (source B) into unified correlated log entries. It uses a `src_ip:src_port` key with a configurable time window to match events, supports HTTP Keep-Alive connections, and writes results to ClickHouse, file, and/or stdout.
|
||||
|
||||
## Correlation Algorithm
|
||||
|
||||
### Key Matching
|
||||
|
||||
Events are correlated by their **correlation key**: `src_ip:src_port`. While a TCP connection is open, the client's IP and ephemeral source port identify it on a given server endpoint, so matching on this pair joins the HTTP request (seen by Apache) with the TLS handshake (seen by sentinel) from the same connection. The time window and TTL described below limit the risk of a later connection reusing the same port.
|
||||
|
||||
### Time Window
|
||||
|
||||
Events must arrive within the configured time window (default: **10 seconds**) to be matched. This accounts for:
|
||||
- Processing latency between Apache and sentinel
|
||||
- Packet capture buffering
|
||||
- UNIX socket delivery ordering
|
||||
|
||||
### Keep-Alive Support
|
||||
|
||||
In `one_to_many` mode (default), a single TLS handshake event (source B) can match **multiple** HTTP requests (source A) on the same TCP connection:
|
||||
|
||||
1. Source B event arrives → buffered with TTL (default: 120 s)
|
||||
2. Source A event arrives with same key → correlation match, B event TTL resets
|
||||
3. Next A event on same connection → matches same B event (TTL resets again)
|
||||
4. Connection closes → B event expires after TTL
|
||||
|
||||
Each A event within a Keep-Alive session gets an incrementing `keepalives` counter.
|
||||
|
||||
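The four steps above can be sketched as a small buffer with a sliding TTL. The correlator itself is written in Go, so this Python version is only a language-neutral illustration. The `NetworkBuffer` class, the explicit `now` argument, and a counter that starts at 1 are assumptions, and the sketch leaves out the time window, A-before-B buffering, and collision handling.

```python
class NetworkBuffer:
    """Source B events keyed by 'src_ip:src_port', with a sliding TTL."""

    def __init__(self, ttl_s=120.0):
        self.ttl_s = ttl_s
        self._entries = {}  # key -> [event, expires_at, keepalives]

    @staticmethod
    def _key(event):
        return f"{event['src_ip']}:{event['src_port']}"

    def add_b(self, event, now):
        """Step 1: buffer the TLS/network event with a fresh TTL."""
        self._entries[self._key(event)] = [event, now + self.ttl_s, 0]

    def match_a(self, event, now):
        """Steps 2-4: merged record for a source A event, or None if orphan."""
        key = self._key(event)
        entry = self._entries.get(key)
        if entry is None or entry[1] <= now:
            self._entries.pop(key, None)  # expired or unknown connection
            return None
        entry[1] = now + self.ttl_s  # TTL resets on every match
        entry[2] += 1                # per-connection request counter
        return {**entry[0], **event, "correlated": True, "keepalives": entry[2]}
```

Because the TTL slides forward on each match, a busy Keep-Alive connection keeps its handshake data available, while an idle one ages out `ttl_s` seconds after its last request.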
### Orphan Handling
|
||||
|
||||
- **Source A orphans** (HTTP without TLS match): Emitted after `apache_emit_delay_ms` (default: 500 ms) with `correlated=false`, `orphan_side=A`
|
||||
- **Source B orphans** (TLS without HTTP match): Not emitted by default (`network_emit: false`)
|
||||
- **Buffer overflow**: Oldest events are rotated out and emitted as orphans
|
||||
|
||||
### Field Merging
|
||||
|
||||
When two events are correlated:
|
||||
- HTTP fields (method, path, headers, etc.) come from source A
|
||||
- TLS/network fields (JA4, JA3, IP/TCP metadata) come from source B
|
||||
- On field collision with different values: both are kept with `a_` and `b_` prefixes
|
||||
|
||||
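A minimal sketch of these merge rules follows. Whether the unprefixed key is also kept on a collision is not stated in the source. This version drops it, and `merge_events` is a hypothetical name.

```python
def merge_events(a, b):
    """Merge source A (HTTP) and source B (network) dicts.

    Fields present on one side, or equal on both, are copied once.
    Conflicting fields are kept twice, as a_<key> and b_<key>.
    """
    merged = {}
    keys = list(a) + [k for k in b if k not in a]  # stable, A-first order
    for key in keys:
        if key in a and key in b and a[key] != b[key]:
            merged[f"a_{key}"] = a[key]
            merged[f"b_{key}"] = b[key]
        else:
            merged[key] = a[key] if key in a else b[key]
    return merged
```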
## Configuration Reference
|
||||
|
||||
Configuration is loaded from a YAML file (default: `/etc/logcorrelator/logcorrelator.yml`).
|
||||
|
||||
### Log Settings
|
||||
|
||||
| Name | Type | Default | Description |
|
||||
|------|------|---------|-------------|
|
||||
| `log.level` | string | `INFO` | Log level: `DEBUG`, `INFO`, `WARN`, `ERROR` |
|
||||
|
||||
### Input Settings
|
||||
|
||||
| Name | Type | Default | Description |
|
||||
|------|------|---------|-------------|
|
||||
| `inputs.unix_sockets[].name` | string | — | Human-readable source name (e.g., `http`, `network`) |
|
||||
| `inputs.unix_sockets[].path` | string | — | UNIX socket path to listen on |
|
||||
| `inputs.unix_sockets[].format` | string | `json` | Input format |
|
||||
| `inputs.unix_sockets[].source_type` | string | — | Event source: `A` (HTTP), `B` (Network) |
|
||||
| `inputs.unix_sockets[].socket_permissions` | string | `0666` | Socket file permissions (octal) |
|
||||
|
||||
### Output Settings
|
||||
|
||||
#### File Output
|
||||
|
||||
| Name | Type | Default | Description |
|
||||
|------|------|---------|-------------|
|
||||
| `outputs.file.enabled` | bool | `true` | Enable file output |
|
||||
| `outputs.file.path` | string | `/var/log/logcorrelator/correlated.log` | Output file path |
|
||||
|
||||
#### ClickHouse Output
|
||||
|
||||
| Name | Type | Default | Description |
|
||||
|------|------|---------|-------------|
|
||||
| `outputs.clickhouse.enabled` | bool | `false` | Enable ClickHouse output |
|
||||
| `outputs.clickhouse.dsn` | string | — | ClickHouse DSN (e.g., `clickhouse://user:pass@host:9000/db`) |
|
||||
| `outputs.clickhouse.table` | string | — | Target table name |
|
||||
| `outputs.clickhouse.batch_size` | int | `500` | Records per batch insert |
|
||||
| `outputs.clickhouse.flush_interval_ms` | int | `200` | Flush interval in milliseconds |
|
||||
| `outputs.clickhouse.max_buffer_size` | int | `5000` | Maximum in-memory buffer size |
|
||||
| `outputs.clickhouse.drop_on_overflow` | bool | `true` | Drop records when buffer is full |
|
||||
| `outputs.clickhouse.async_insert` | bool | `true` | Use ClickHouse async inserts |
|
||||
| `outputs.clickhouse.timeout_ms` | int | `1000` | Operation timeout in milliseconds |
|
||||
|
||||
#### Stdout Output
|
||||
|
||||
| Name | Type | Default | Description |
|
||||
|------|------|---------|-------------|
|
||||
| `outputs.stdout.enabled` | bool | `false` | Enable stdout output |
|
||||
| `outputs.stdout.level` | string | — | Output verbosity filter |
|
||||
|
||||
### Correlation Settings
|
||||
|
||||
| Name | Type | Default | Description |
|
||||
|------|------|---------|-------------|
|
||||
| `correlation.time_window.value` | int | `10` | Time window value |
|
||||
| `correlation.time_window.unit` | string | `s` | Time window unit (`s`, `ms`) |
|
||||
| `correlation.orphan_policy.apache_always_emit` | bool | `true` | Always emit A events even without B match |
|
||||
| `correlation.orphan_policy.apache_emit_delay_ms` | int | `500` | Delay before emitting orphan A (ms) |
|
||||
| `correlation.orphan_policy.network_emit` | bool | `false` | Emit B events without A match |
|
||||
| `correlation.matching.mode` | string | `one_to_many` | Matching mode: `one_to_one` or `one_to_many` |
|
||||
| `correlation.buffers.max_http_items` | int | `10000` | Max buffered HTTP (source A) events |
|
||||
| `correlation.buffers.max_network_items` | int | `20000` | Max buffered network (source B) events |
|
||||
| `correlation.ttl.network_ttl_s` | int | `120` | TTL for source B events (seconds) |
|
||||
| `correlation.exclude_source_ips` | []string | `[]` | IPs or CIDRs to exclude from correlation |
|
||||
| `correlation.include_dest_ports` | []int | `[]` | If non-empty, only correlate events on these ports |
|
||||
|
||||
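The two filter settings at the end of this table (`exclude_source_ips` and `include_dest_ports`) can be illustrated with Python's `ipaddress` module. The correlator is a Go program, so this is only a sketch of the semantics described in the table. `build_filter` and the event shape are hypothetical.

```python
import ipaddress


def build_filter(exclude_source_ips, include_dest_ports):
    """Return a predicate telling whether an event should be correlated."""
    # A bare IP such as "192.0.2.7" becomes a /32 (or /128) network.
    networks = [ipaddress.ip_network(item, strict=False) for item in exclude_source_ips]
    ports = set(include_dest_ports)

    def should_correlate(event):
        src = ipaddress.ip_address(event["src_ip"])
        if any(src in net for net in networks):
            return False
        # An empty port list means "all ports".
        return not ports or event["dst_port"] in ports

    return should_correlate
```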
### Metrics Settings
|
||||
|
||||
| Name | Type | Default | Description |
|
||||
|------|------|---------|-------------|
|
||||
| `metrics.enabled` | bool | `false` | Enable metrics HTTP server |
|
||||
| `metrics.addr` | string | `:8080` | Metrics server listen address |
|
||||
|
||||
## Input Events
|
||||
|
||||
### Source A (HTTP — from mod-reqin-log)
|
||||
|
||||
JSON fields: `time`, `src_ip`, `src_port`, `dst_ip`, `dst_port`, `method`, `scheme`, `host`, `path`, `query`, `http_version`, `client_headers`, `header_*`
|
||||
|
||||
### Source B (Network — from sentinel)
|
||||
|
||||
JSON fields: `src_ip`, `src_port`, `dst_ip`, `dst_port`, `ip_meta_*`, `tcp_meta_*`, `tls_version`, `tls_sni`, `tls_alpn`, `ja4`, `ja3`, `ja3_hash`, `conn_id`, `syn_to_clienthello_ms`, `timestamp`
|
||||
|
||||
## Output CorrelatedLog JSON Schema
|
||||
|
||||
```json
{
  "timestamp": "2026-03-09T14:30:00Z",
  "src_ip": "203.0.113.42",
  "src_port": 52341,
  "dst_ip": "192.168.1.10",
  "dst_port": 443,
  "correlated": true,
  "method": "GET",
  "host": "example.com",
  "path": "/api/v1/users",
  "ja4": "t13d1516h2_8daaf6152771_b0da82dd1658",
  "ja3_hash": "e7d705a3286e19ea42f587b344ee6865",
  "ip_meta_ttl": 64,
  "tcp_meta_window_size": 65535,
  "tls_version": "1.3",
  "tls_sni": "example.com",
  "tls_alpn": "h2",
  "header_User-Agent": "Mozilla/5.0 ...",
  "keepalives": 3
}
```
|
||||
Core fields are always present; additional fields are merged from A and B event raw data.
|
||||
|
||||
## ClickHouse Sink
|
||||
|
||||
- **Protocol**: ClickHouse native TCP (port 9000) via `clickhouse-go/v2`
|
||||
- **Target table**: `http_logs_raw` (raw JSON stored, then parsed by materialized views)
|
||||
- **Batch inserts**: Buffered up to `batch_size` records (default 500)
|
||||
- **Flush interval**: Default 200 ms timer triggers flush if batch not full
|
||||
- **Retry behavior**: Up to 3 retries with exponential backoff (100 ms base)
|
||||
- **Connection ping**: 5-second timeout on startup
|
||||
- **Buffer overflow**: When the in-memory buffer reaches `max_buffer_size` (default 5000), records are dropped if `drop_on_overflow` is `true` (the default)
|
||||
|
||||
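The retry behavior listed above can be sketched as follows. The sink is implemented in Go. This Python version assumes that the delay doubles on each retry (100 ms, 200 ms, 400 ms) and that "up to 3 retries" means one initial attempt plus three more. Neither detail is confirmed by the source, and `insert_with_retry` is a hypothetical name.

```python
import time


def insert_with_retry(send_batch, batch, max_retries=3, base_delay_s=0.1, sleep=time.sleep):
    """Call send_batch(batch), retrying with exponential backoff.

    Returns the number of retries that were needed. Re-raises the last
    error once max_retries is exhausted.
    """
    attempt = 0
    while True:
        try:
            send_batch(batch)
            return attempt
        except Exception:
            if attempt >= max_retries:
                raise
            sleep(base_delay_s * (2 ** attempt))  # 0.1 s, 0.2 s, 0.4 s
            attempt += 1
```

Injecting `sleep` makes the backoff schedule observable in tests without waiting.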
## Metrics HTTP Server
|
||||
|
||||
When `metrics.enabled: true`, exposes:
|
||||
|
||||
| Endpoint | Description |
|
||||
|----------|-------------|
|
||||
| `GET /metrics` | Correlation metrics as JSON (events received, correlated, orphans, buffer sizes) |
|
||||
| `GET /health` | Health check endpoint |
|
||||
|
||||
## systemd Service
|
||||
|
||||
```ini
[Unit]
Description=logcorrelator service
After=network.target

[Service]
Type=simple
User=logcorrelator
Group=logcorrelator
ExecStart=/usr/bin/logcorrelator -config /etc/logcorrelator/logcorrelator.yml
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
RestartSec=5
RuntimeDirectory=logcorrelator
RuntimeDirectoryMode=0755

# Security hardening
NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/var/log/logcorrelator /etc/logcorrelator

# Resource limits
LimitNOFILE=65536
TimeoutStartSec=10
TimeoutStopSec=30

[Install]
WantedBy=multi-user.target
```
|
||||
### Security Hardening
|
||||
|
||||
- Runs as dedicated `logcorrelator` user/group
|
||||
- `NoNewPrivileges=true` — prevents privilege escalation
|
||||
- `ProtectSystem=strict` — read-only filesystem except `ReadWritePaths`
|
||||
- `ProtectHome=true` — no access to home directories
|
||||
- `RuntimeDirectory=logcorrelator` — systemd creates socket directory with correct ownership
|
||||
|
||||
## RPM Package Contents
|
||||
|
||||
| Path | Description |
|
||||
|------|-------------|
|
||||
| `/usr/bin/logcorrelator` | Binary |
|
||||
| `/etc/logcorrelator/logcorrelator.yml` | Configuration file |
|
||||
| `/usr/lib/systemd/system/logcorrelator.service` | systemd unit |
|
||||
| `/var/log/logcorrelator/` | Log directory |
|
||||
| `/var/run/logcorrelator/` | Socket directory (RuntimeDirectory) |
|
||||
308
docs/services/dashboard.md
Normal file
@ -0,0 +1,308 @@
|
||||
# Dashboard
|
||||
|
||||
The dashboard is a SOC (Security Operations Center) web application built with FastAPI (backend) and React (frontend) that provides real-time visualization, investigation, and analysis of bot detections generated by the [bot-detector](bot-detector.md). It queries ClickHouse (`mabase_prod`) for all data.
|
||||
|
||||
## Technology Stack
|
||||
|
||||
| Component | Technology |
|
||||
|-----------|-----------|
|
||||
| Backend | Python 3.11 + FastAPI |
|
||||
| Frontend | React + Vite |
|
||||
| Database | ClickHouse (via `ja4_common` shared client) |
|
||||
| API Docs | Swagger UI (`/docs`) and ReDoc (`/redoc`) |
|
||||
|
||||
## Configuration
|
||||
|
||||
| Variable | Type | Default | Description |
|
||||
|----------|------|---------|-------------|
|
||||
| `CLICKHOUSE_HOST` | string | `clickhouse` | ClickHouse hostname |
|
||||
| `CLICKHOUSE_PORT` | int | `8123` | ClickHouse HTTP port |
|
||||
| `CLICKHOUSE_DB` | string | `mabase_prod` | Database name |
|
||||
| `CLICKHOUSE_USER` | string | `admin` | ClickHouse user |
|
||||
| `CLICKHOUSE_PASSWORD` | string | `""` | ClickHouse password |
|
||||
| `API_HOST` | string | `0.0.0.0` | API listen address |
|
||||
| `API_PORT` | int | `8000` | API listen port |
|
||||
| `CORS_ORIGINS` | list | `["http://localhost:3000", "http://127.0.0.1:3000"]` | Allowed CORS origins |
|
||||
|
||||
## API Reference
|
||||
|
||||
All endpoints except `/health` are prefixed with `/api/`. The dashboard exposes **74+ endpoints** across 20 routers.
|
||||
|
||||
### Health
|
||||
|
||||
| Method | Path | Description |
|
||||
|--------|------|-------------|
|
||||
| GET | `/health` | Health check — returns ClickHouse connection status |
|
||||
|
||||
---
|
||||
|
||||
### Metrics (`/api/metrics`)
|
||||
|
||||
| Method | Path | Description |
|
||||
|--------|------|-------------|
|
||||
| GET | `/api/metrics` | Global dashboard metrics: detection counts by threat level, unique IPs, time series |
|
||||
| GET | `/api/metrics/threats` | Threat distribution summary |
|
||||
| GET | `/api/metrics/baseline` | Human baseline statistics |
|
||||
|
||||
---
|
||||
|
||||
### Detections (`/api/detections`)
|
||||
|
||||
| Method | Path | Description |
|
||||
|--------|------|-------------|
|
||||
| GET | `/api/detections` | Paginated detection list with filtering, sorting, and text search |
|
||||
| GET | `/api/detections/{detection_id}` | Single detection details |
|
||||
|
||||
**Query Parameters** (GET `/api/detections`):
|
||||
|
||||
| Parameter | Type | Description |
|
||||
|-----------|------|-------------|
|
||||
| `page` | int | Page number (default: 1) |
|
||||
| `page_size` | int | Items per page (default: 20) |
|
||||
| `threat_level` | string | Filter by threat level |
|
||||
| `model_name` | string | Filter by model name |
|
||||
| `search` | string | Full-text search across IP, JA4, host, bot_name |
|
||||
| `sort_by` | string | Sort field |
|
||||
| `sort_order` | string | `asc` or `desc` |
|
||||
|
||||
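As an illustration of how these parameters combine, the helper below builds a request URL for `GET /api/detections`. The function name and the example values (`HIGH`, `anomaly_score`) are assumptions. The accepted sort fields and threat-level spellings are defined by the API, not by this sketch.

```python
from urllib.parse import urlencode


def detections_url(base_url, **filters):
    """Build a /api/detections URL using the documented pagination defaults."""
    params = {"page": 1, "page_size": 20}
    # Skip filters left as None so they do not appear as "None" in the URL.
    params.update({k: v for k, v in filters.items() if v is not None})
    return f"{base_url.rstrip('/')}/api/detections?{urlencode(params)}"
```

For example, `detections_url("http://localhost:8000", threat_level="HIGH", sort_by="anomaly_score", sort_order="asc")` produces a URL listing high-severity detections with the lowest score first.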
---
|
||||
|
||||
### Investigation (`/api/investigation`)
|
||||
|
||||
| Method | Path | Description |
|
||||
|--------|------|-------------|
|
||||
| GET | `/api/investigation/{ip}/summary` | **Primary investigation endpoint.** Aggregates ML score, brute-force, TCP spoofing, JA4 rotation, persistence, and 24h timeline into a single response with a `risk_score` (0–100) |
|
||||
|
||||
---
|
||||
|
||||
### Reputation (`/api/reputation`)
|
||||
|
||||
| Method | Path | Description |
|
||||
|--------|------|-------------|
|
||||
| GET | `/api/reputation/ip/{ip_address}` | Full IP reputation from IP-API.com and IPinfo.io (proxy, VPN, Tor, hosting detection) |
|
||||
| GET | `/api/reputation/ip/{ip_address}/summary` | Simplified reputation summary |
|
||||
|
||||
---
|
||||
|
||||
### Analysis (`/api/analysis`)
|
||||
|
||||
| Method | Path | Description |
|
||||
|--------|------|-------------|
|
||||
| GET | `/api/analysis/{ip}/subnet` | Subnet analysis for an IP (related IPs in same /24) |
|
||||
| GET | `/api/analysis/{ip}/country` | Country-level analysis for an IP |
|
||||
| GET | `/api/analysis/country` | Global country analysis across all detections |
|
||||
| GET | `/api/analysis/{ip}/ja4` | JA4 fingerprint analysis for an IP |
|
||||
| GET | `/api/analysis/{ip}/user-agents` | User-agent analysis for an IP |
|
||||
| GET | `/api/analysis/{ip}/recommendation` | SOC classification recommendation |
|
||||
| POST | `/api/analysis/classifications` | Create a classification (legitimate/suspicious/malicious) |
|
||||
| GET | `/api/analysis/classifications` | List all classifications |
|
||||
| GET | `/api/analysis/classifications/stats` | Classification statistics |
|
||||
|
||||
---
|
||||
|
||||
### Entities (`/api/entities`)
|
||||
|
||||
| Method | Path | Description |
|
||||
|--------|------|-------------|
|
||||
| GET | `/api/entities/types` | List available entity types |
|
||||
| GET | `/api/entities/subnet/{subnet}` | Investigate a subnet |
|
||||
| GET | `/api/entities/{entity_type}/{entity_value}` | Investigate any entity (IP, JA4, subnet, UA, host) |
|
||||
| GET | `/api/entities/{entity_type}/{entity_value}/related` | Related entities |
|
||||
| GET | `/api/entities/{entity_type}/{entity_value}/user_agents` | User-agents for entity |
|
||||
| GET | `/api/entities/{entity_type}/{entity_value}/client_headers` | Client headers for entity |
|
||||
| GET | `/api/entities/{entity_type}/{entity_value}/paths` | URL paths for entity |
|
||||
| GET | `/api/entities/{entity_type}/{entity_value}/query_params` | Query parameters for entity |
|
||||
|
||||
---
|
||||
|
||||
### Incidents (`/api/incidents`)
|
||||
|
||||
| Method | Path | Description |
|
||||
|--------|------|-------------|
|
||||
| GET | `/api/incidents` | List all incidents |
|
||||
| GET | `/api/incidents/clusters` | Active incident clusters (behavioral similarity grouping) |
|
||||
| GET | `/api/incidents/{cluster_id}` | Incident cluster details |
|
||||
| POST | `/api/incidents/{cluster_id}/classify` | Classify an incident cluster |
|
||||
|
||||
---
|
||||
|
||||
### Fingerprints (`/api/fingerprints`)
|
||||
|
||||
| Method | Path | Description |
|
||||
|--------|------|-------------|
|
||||
| GET | `/api/fingerprints/spoofing` | TLS fingerprint spoofing detection |
|
||||
| GET | `/api/fingerprints/ja4-ua-matrix` | JA4 ↔ User-Agent correlation matrix |
|
||||
| GET | `/api/fingerprints/ua-analysis` | Suspicious user-agent analysis |
|
||||
| GET | `/api/fingerprints/ip/{ip}/coherence` | Fingerprint coherence analysis per IP |
|
||||
| GET | `/api/fingerprints/legitimate-ja4` | Known legitimate JA4 fingerprints |
|
||||
| GET | `/api/fingerprints/asn-correlation` | JA4-ASN correlation analysis |
|
||||
|
||||
---
|
||||
|
||||
### Brute Force (`/api/bruteforce`)
|
||||
|
||||
| Method | Path | Description |
|
||||
|--------|------|-------------|
|
||||
| GET | `/api/bruteforce/targets` | Brute-force target hosts |
|
||||
| GET | `/api/bruteforce/attackers` | Brute-force attacker IPs |
|
||||
| GET | `/api/bruteforce/timeline` | Brute-force attack timeline |
|
||||
| GET | `/api/bruteforce/host/{host}/attackers` | Attackers for a specific host |
|
||||
|
||||
---
|
||||
|
||||
### TCP Spoofing (`/api/tcp-spoofing`)
|
||||
|
||||
| Method | Path | Description |
|
||||
|--------|------|-------------|
|
||||
| GET | `/api/tcp-spoofing/overview` | TCP/OS fingerprint spoofing overview |
|
||||
| GET | `/api/tcp-spoofing/list` | Spoofing detection list |
|
||||
| GET | `/api/tcp-spoofing/matrix` | TTL × MSS anomaly matrix |
|
||||
|
||||
---
|
||||
|
||||
### Header Fingerprint (`/api/headers`)
|
||||
|
||||
| Method | Path | Description |
|
||||
|--------|------|-------------|
|
||||
| GET | `/api/headers/clusters` | Header fingerprint clusters (suspicious patterns) |
|
||||
| GET | `/api/headers/cluster/{hash}/ips` | IPs sharing a header fingerprint |
|
||||
|
||||
---
|
||||
|
||||
### Heatmap (`/api/heatmap`)
|
||||
|
||||
| Method | Path | Description |
|
||||
|--------|------|-------------|
|
||||
| GET | `/api/heatmap/hourly` | Hourly traffic heatmap |
|
||||
| GET | `/api/heatmap/top-hosts` | Top hosts by traffic volume |
|
||||
| GET | `/api/heatmap/matrix` | Activity/hour matrix |
|
||||
|
||||
---
|
||||
|
||||
### Botnets (`/api/botnets`)
|
||||
|
||||
| Method | Path | Description |
|
||||
|--------|------|-------------|
|
||||
| GET | `/api/botnets/ja4-spread` | JA4 geographic spread (botnet indicator) |
|
||||
| GET | `/api/botnets/ja4/{ja4}/countries` | Country distribution for a JA4 fingerprint |
|
||||
| GET | `/api/botnets/summary` | Global botnet detection summary |
|
||||
|
||||
---
|
||||
|
||||
### Rotation (`/api/rotation`)
|
||||
|
||||
| Method | Path | Description |
|
||||
|--------|------|-------------|
|
||||
| GET | `/api/rotation/ja4-rotators` | IPs rotating JA4 fingerprints (evasion detection) |
|
||||
| GET | `/api/rotation/persistent-threats` | Persistent threats across time windows |
|
||||
| GET | `/api/rotation/ip/{ip}/ja4-history` | JA4 fingerprint history for an IP |
|
||||
| GET | `/api/rotation/sophistication` | Sophistication score analysis |
|
||||
| GET | `/api/rotation/proactive-hunt` | Proactive threat hunting suggestions |
|
||||
|
||||
---
|
||||
|
||||
### ML Features (`/api/ml`)
|
||||
|
||||
| Method | Path | Description |
|
||||
|--------|------|-------------|
|
||||
| GET | `/api/ml/top-anomalies` | Top anomalies with feature details |
|
||||
| GET | `/api/ml/ip/{ip}/radar` | Feature radar chart data for an IP |
|
||||
| GET | `/api/ml/score-distribution` | Anomaly score distribution histogram |
|
||||
| GET | `/api/ml/score-trends` | Score trends over time |
|
||||
| GET | `/api/ml/b-features` | Source B (TCP/TLS) feature analysis |
|
||||
| GET | `/api/ml/campaigns` | ML-detected campaign analysis |
|
||||
| GET | `/api/ml/scatter` | Feature scatter plot data |
|
||||
|
||||
---
|
||||
|
||||
### Attributes (`/api/attributes`)
|
||||
|
||||
| Method | Path | Description |
|
||||
|--------|------|-------------|
|
||||
| GET | `/api/attributes/{attr_type}` | List distinct values for an attribute (ja4, user_agent, asn, country, host) with counts |
|
||||
|
||||
---
|
||||
|
||||
### Variability (`/api/variability`)
|
||||
|
||||
| Method | Path | Description |
|
||||
|--------|------|-------------|
|
||||
| GET | `/api/variability/{attr_type}/{value}` | Behavioral variability analysis for an attribute value |
|
||||
| GET | `/api/variability/{attr_type}/{value}/ips` | IPs associated with an attribute value |
|
||||
| GET | `/api/variability/{attr_type}/{value}/attributes` | Attribute breakdown for a value |
|
||||
| GET | `/api/variability/{attr_type}/{value}/user_agents` | User-agents for an attribute value |
|
||||
|
||||
---
|
||||
|
||||
### Clustering (`/api/clustering`)
|
||||
|
||||
| Method | Path | Description |
|
||||
|--------|------|-------------|
|
||||
| GET | `/api/clustering/status` | Clustering cache status |
|
||||
| GET | `/api/clustering/clusters` | K-Means cluster list |
|
||||
| GET | `/api/clustering/cluster/{cluster_id}/points` | Data points in a cluster |
|
||||
| GET | `/api/clustering/cluster/{cluster_id}/ips` | IPs in a cluster |
|
||||
|
||||
---
|
||||
|
||||
### Search (`/api/search`)
|
||||
|
||||
| Method | Path | Description |
|
||||
|--------|------|-------------|
|
||||
| GET | `/api/search/quick` | Cross-entity search (IP, JA4, host, UA, country, ASN) |
|
||||
|
||||
---
|
||||
|
||||
### Audit (`/api/audit`)
|
||||
|
||||
| Method | Path | Description |
|
||||
|--------|------|-------------|
|
||||
| POST | `/api/audit/logs` | Create an audit log entry |
|
||||
| GET | `/api/audit/logs` | Query audit logs (filtered, paginated) |
|
||||
| GET | `/api/audit/stats` | Audit statistics |
|
||||
| GET | `/api/audit/users/activity` | Per-user activity summary |
|
||||
|
||||
## Frontend Structure
|
||||
|
||||
The React frontend is built with Vite and served as static assets:
|
||||
|
||||
- **Entry point**: `/` → `frontend/dist/index.html`
|
||||
- **Static assets**: `/assets/*` → `frontend/dist/assets/`
|
||||
- **SPA routing**: All non-`/api/` paths fall through to `index.html` (React Router)
|
||||
- **API proxy**: Frontend calls `/api/*` which is handled by FastAPI routers
|
||||
|
||||
## Services
|
||||
|
||||
### IPReputationService
|
||||
|
||||
Queries public IP reputation databases (IP-API.com, IPinfo.io) without API keys:
|
||||
- Proxy/VPN/Tor detection
|
||||
- ASN, country, ISP information
|
||||
- Hosting provider identification
|
||||
|
||||
### ClusteringEngine
|
||||
|
||||
K-Means clustering on ML features with caching:
|
||||
- Automatic cluster count selection
|
||||
- Feature normalization via StandardScaler
|
||||
- In-memory cache with TTL
|
||||
|
||||
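The "in-memory cache with TTL" idea can be sketched as below. The class name, the `get_or_compute` method, and the injectable clock are hypothetical. The source does not describe the engine's cache API or its TTL value.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_s seconds."""

    def __init__(self, ttl_s, clock=time.monotonic):
        self.ttl_s = ttl_s
        self._clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        """Return the cached value, recomputing it once it has expired."""
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        value = compute()
        self._store[key] = (now + self.ttl_s, value)
        return value
```

Caching matters here because a K-Means run over all ML features is far more expensive than serving the previous result for a short period.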
## Deployment
|
||||
|
||||
```bash
# Build Docker image
make build-dashboard

# Run tests
make test-dashboard

# Run locally (development)
cd services/dashboard
uvicorn backend.main:app --reload --host 0.0.0.0 --port 8000
```
|
||||
### Health Check
|
||||
|
||||
```
GET /health → {"status": "healthy", "clickhouse": "connected"}
```
200
docs/services/mod-reqin-log.md
Normal file
@ -0,0 +1,200 @@
|
||||
# mod-reqin-log
|
||||
|
||||
`mod_reqin_log` is an Apache HTTPD module (C shared object) that captures HTTP request metadata and sends it as JSON to a UNIX datagram socket. It serves as the HTTP-layer ingestion point for the ja4-platform pipeline, feeding request data to the [correlator](correlator.md) for joining with TLS fingerprint data from [sentinel](sentinel.md).
|
||||
|
||||
## Purpose
|
||||
|
||||
Apache processes HTTP requests after TLS termination, so it has access to the decoded HTTP method, path, headers, and client IP/port. mod-reqin-log hooks into the `post_read_request` phase to serialize this data immediately, before any rewrite or auth module modifies the request.
|
||||
|
||||
## Apache Directives Reference
|
||||
|
||||
All directives are server-level (`RSRC_CONF`):
|
||||
|
||||
| Directive | Type | Default | Description |
|
||||
|-----------|------|---------|-------------|
|
||||
| `JsonSockLogEnabled` | Flag (On/Off) | Off | Enable or disable the module |
|
||||
| `JsonSockLogSocket` | String | — | UNIX domain socket path for JSON output |
|
||||
| `JsonSockLogHeaders` | String list | — | HTTP header names to log (repeatable) |
|
||||
| `JsonSockLogMaxHeaders` | Integer | `25` | Maximum number of headers to log |
|
||||
| `JsonSockLogMaxHeaderValueLen` | Integer | `256` | Maximum length of each header value (truncated beyond) |
|
||||
| `JsonSockLogReconnectInterval` | Integer (seconds) | `10` | Minimum seconds between reconnection attempts |
|
||||
| `JsonSockLogErrorReportInterval` | Integer (seconds) | `10` | Minimum seconds between error log entries (throttling) |
|
||||
| `JsonSockLogLevel` | String | `WARNING` | Module log level: `DEBUG`, `INFO`, `WARNING`, `ERROR`, `EMERG` |
|
||||
|
||||
### Example httpd.conf
|
||||
|
||||
```apache
LoadModule reqin_log_module modules/mod_reqin_log.so

JsonSockLogEnabled On
JsonSockLogSocket /var/run/logcorrelator/http.socket
JsonSockLogHeaders User-Agent Accept Accept-Encoding Accept-Language
JsonSockLogHeaders Content-Type X-Request-Id X-Trace-Id X-Forwarded-For
JsonSockLogHeaders Sec-CH-UA Sec-CH-UA-Mobile Sec-CH-UA-Platform
JsonSockLogHeaders Sec-Fetch-Dest Sec-Fetch-Mode Sec-Fetch-Site
JsonSockLogMaxHeaders 25
JsonSockLogMaxHeaderValueLen 256
JsonSockLogReconnectInterval 10
JsonSockLogErrorReportInterval 10
JsonSockLogLevel WARNING
```
|
||||
## Output JSON Schema
|
||||
|
||||
Each HTTP request is serialized as a flat JSON object and sent as a single UNIX datagram:
|
||||
|
||||
```json
{
  "time": "2026-03-09T14:30:00Z",
  "src_ip": "203.0.113.42",
  "src_port": 52341,
  "dst_ip": "192.168.1.10",
  "dst_port": 443,
  "method": "GET",
  "scheme": "https",
  "host": "example.com",
  "path": "/api/v1/users",
  "query": "page=1&limit=20",
  "http_version": "HTTP/2.0",
  "client_headers": "User-Agent,Accept,Accept-Encoding,Accept-Language",
  "header_User-Agent": "Mozilla/5.0 ...",
  "header_Accept": "text/html,application/xhtml+xml",
  "header_Accept-Encoding": "gzip, deflate, br",
  "header_Accept-Language": "en-US,en;q=0.9",
  "header_Sec-Fetch-Dest": "document",
  "header_Sec-Fetch-Mode": "navigate",
  "header_Sec-Fetch-Site": "none"
}
```
|
||||
|
||||
### Field Reference
|
||||
|
||||
| Field | Type | Description |
|-------|------|-------------|
| `time` | string (ISO 8601) | Request timestamp (UTC) |
| `src_ip` | string | Client IP address |
| `src_port` | int | Client source port |
| `dst_ip` | string | Server IP address |
| `dst_port` | int | Server port |
| `method` | string | HTTP method (`GET`, `POST`, etc.) |
| `scheme` | string | URL scheme (`http` or `https`) |
| `host` | string | HTTP Host header value |
| `path` | string | Request URI path |
| `query` | string | Query string (without `?`) |
| `http_version` | string | HTTP version (`HTTP/1.1`, `HTTP/2.0`) |
| `client_headers` | string | Comma-separated list of header names sent by client (order preserved) |
| `header_<Name>` | string | Value of each configured header (one field per header) |
|
||||
|
||||
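A consumer of these datagrams can be sketched in a few lines of Python. This is an illustration only, not the correlator's code: `parse_record` and the sample payload are hypothetical, and an in-process `socketpair` stands in for the real socket path so the snippet runs without Apache.

```python
import json
import socket

MAX_DATAGRAM = 64 * 1024  # matches the documented 64 KB JSON size limit


def parse_record(datagram: bytes) -> dict:
    """Decode one datagram and split out the header_* fields."""
    record = json.loads(datagram.decode("utf-8"))
    headers = {
        key[len("header_"):]: value
        for key, value in record.items()
        if key.startswith("header_")
    }
    # client_headers is a comma-separated list of names in client order
    names = record.get("client_headers", "")
    order = names.split(",") if names else []
    return {"record": record, "headers": headers, "header_order": order}


# Demonstration over an in-process datagram pair (assumption: Linux/UNIX host).
sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
sample = {
    "time": "2026-03-09T14:30:00Z",
    "src_ip": "203.0.113.42",
    "src_port": 52341,
    "method": "GET",
    "client_headers": "User-Agent,Accept",
    "header_User-Agent": "Mozilla/5.0 ...",
    "header_Accept": "text/html",
}
sender.send(json.dumps(sample).encode("utf-8"))
parsed = parse_record(receiver.recv(MAX_DATAGRAM))
sender.close()
receiver.close()
print(parsed["header_order"])
```

Because each datagram carries exactly one JSON object, the receiver needs no framing or delimiter handling; a single `recv` with a 64 KB buffer is enough.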
### Sensitive Headers
|
||||
|
||||
The following headers are **always excluded** from output regardless of `JsonSockLogHeaders`:
|
||||
|
||||
- `Authorization`
|
||||
- `Cookie`
|
||||
- `Set-Cookie`
|
||||
- `X-Api-Key`
|
||||
- `X-Auth-Token`
|
||||
- `Proxy-Authorization`
|
||||
- `WWW-Authenticate`
|
||||
|
||||
### Size Limits
|
||||
|
||||
- Maximum JSON size: **64 KB** (prevents memory exhaustion DoS)
|
||||
- Header values are truncated to `JsonSockLogMaxHeaderValueLen` bytes
|
||||
|
||||
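The header rules above (sensitive-header exclusion, the header count cap, value truncation) can be modeled as follows. This is a Python sketch of the documented behavior, not the module's C implementation: `select_headers` is a hypothetical helper, case-insensitive name matching is an assumption, and truncation is done per character here whereas the module truncates by bytes.

```python
SENSITIVE = {
    "authorization", "cookie", "set-cookie", "x-api-key",
    "x-auth-token", "proxy-authorization", "www-authenticate",
}


def select_headers(request_headers, configured, max_headers=25, max_value_len=256):
    """Return header_<Name> fields for configured, non-sensitive headers."""
    # Case-insensitive view of the headers present on the request (assumption)
    present = {name.lower(): value for name, value in request_headers.items()}
    out = {}
    for name in configured:
        if len(out) >= max_headers:
            break  # JsonSockLogMaxHeaders reached
        key = name.lower()
        if key in SENSITIVE or key not in present:
            continue  # never emit sensitive headers; skip absent ones
        # Truncate long values (JsonSockLogMaxHeaderValueLen)
        out[f"header_{name}"] = present[key][:max_value_len]
    return out


fields = select_headers(
    {"User-Agent": "x" * 300, "Cookie": "session=1", "Accept": "*/*"},
    ["User-Agent", "Cookie", "Accept"],
)
print(sorted(fields))
```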
## Thread Safety
|
||||
|
||||
`mod_reqin_log` is designed for Apache's `worker` and `event` MPMs (multi-threaded):
|
||||
|
||||
- **Socket FD** is protected by an `apr_thread_mutex_t` (`fd_mutex`)
|
||||
- **Per-child process state** includes the socket file descriptor, mutex, and error tracking
|
||||
- **Error reporting** uses `LOG_THROTTLED` macro with timestamp-based deduplication
|
||||
- All JSON serialization uses per-request pool allocation — no shared buffers
|
||||
|
||||
### Architecture
|
||||
|
||||
```
Apache HTTPD process
├── child process 1
│   ├── fd_mutex (apr_thread_mutex_t)
│   ├── socket_fd (shared across threads)
│   ├── thread 1 → post_read_request → serialize JSON → mutex lock → sendto() → unlock
│   ├── thread 2 → post_read_request → serialize JSON → mutex lock → sendto() → unlock
│   └── ...
├── child process 2
│   ├── fd_mutex
│   ├── socket_fd (independent)
│   └── ...
```
|
||||
|
||||
## Reconnection Behavior
|
||||
|
||||
- Socket is opened during `child_init` (per-child process startup)
|
||||
- If the socket is unavailable at startup, connection is deferred
|
||||
- On send failure, reconnection is attempted respecting `JsonSockLogReconnectInterval`
|
||||
- Failed sends are silently dropped (HTTP request processing is not blocked)
|
||||
- Error log entries are throttled by `JsonSockLogErrorReportInterval`
|
||||
- Socket type: `SOCK_DGRAM` (connectionless UNIX datagram)
|
||||
- Non-blocking sends with `MSG_NOSIGNAL`
|
||||
|
||||
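The drop-on-failure and throttled-reconnect rules listed above can be modeled with a small state machine. The Python sketch below is an illustration under stated assumptions, not a port of the module: `ThrottledSender`, its `connect` callable, and the injectable `clock` are hypothetical names introduced so the timing logic can be exercised without a real socket.

```python
import time


class ThrottledSender:
    """Drop-on-failure sender that attempts to connect at most once per interval."""

    def __init__(self, connect, reconnect_interval=10.0, clock=time.monotonic):
        self._connect = connect            # callable returning an object with .send(bytes)
        self._interval = reconnect_interval
        self._clock = clock
        self._sock = None
        self._last_attempt = None
        self.dropped = 0

    def _try_connect(self):
        now = self._clock()
        if self._last_attempt is not None and now - self._last_attempt < self._interval:
            return  # too soon since the last attempt: stay disconnected
        self._last_attempt = now
        try:
            self._sock = self._connect()
        except OSError:
            self._sock = None

    def send(self, payload: bytes) -> bool:
        """Send one record; return False (and count a drop) instead of blocking."""
        if self._sock is None:
            self._try_connect()
        if self._sock is None:
            self.dropped += 1
            return False
        try:
            self._sock.send(payload)
            return True
        except OSError:
            self._sock = None  # force a reconnect attempt on a later call
            self.dropped += 1
            return False
```

The caller never waits on the socket: a failed send costs one counter increment, which mirrors the stated goal that HTTP request processing is not blocked.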
## Deployment
|
||||
|
||||
### Installation via RPM
|
||||
|
||||
```bash
|
||||
rpm -ivh mod_reqin_log-1.0.19-1.el10.x86_64.rpm
|
||||
```
|
||||
|
||||
### LoadModule Directive
|
||||
|
||||
```apache
|
||||
LoadModule reqin_log_module modules/mod_reqin_log.so
|
||||
```
|
||||
|
||||
### Verifying Installation
|
||||
|
||||
```bash
|
||||
httpd -M | grep reqin_log
|
||||
# Expected: reqin_log_module (shared)
|
||||
```
|
||||
|
||||
## Build
|
||||
|
||||
All builds run inside Docker:
|
||||
|
||||
```bash
|
||||
# Run unit tests
|
||||
make test-mod-reqin-log
|
||||
|
||||
# Build RPM packages (el8, el9, el10)
|
||||
make rpm-mod-reqin-log
|
||||
# RPMs in services/mod-reqin-log/dist/rpm/el{8,9,10}/
|
||||
```
|
||||
|
||||
### Local Build (requires Apache development headers)
|
||||
|
||||
```bash
|
||||
cd services/mod-reqin-log
|
||||
make build # Compiles mod_reqin_log.so via apxs
|
||||
make test # Runs unit tests
|
||||
```
|
||||
|
||||
### Test Coverage
|
||||
|
||||
Unit tests cover:
|
||||
- JSON serialization (escaping, size limits, field output)
|
||||
- Config parsing (all directives, edge cases)
|
||||
- Header handling (sensitive header exclusion, max headers, truncation)
|
||||
- Module integration (real Apache module hooks)
|
||||
|
||||
## Source Files
|
||||
|
||||
| File | Description |
|------|-------------|
| `src/mod_reqin_log.c` | Main module source |
| `src/mod_reqin_log.h` | Header with types, constants, defaults |
| `conf/mod_reqin_log.conf` | Example Apache configuration |
| `tests/unit/test_json_serialization.c` | JSON output tests |
| `tests/unit/test_config_parsing.c` | Directive parsing tests |
| `tests/unit/test_header_handling.c` | Header filtering tests |
| `tests/unit/test_module_real.c` | Integration tests |
|
||||
247
docs/services/sentinel.md
Normal file
@ -0,0 +1,247 @@
|
||||
# Sentinel
|
||||
|
||||
Sentinel (`ja4sentinel`) is a Go daemon that performs live network packet capture on a Linux server, extracts TLS ClientHello handshakes, generates JA4 and JA3 fingerprints, enriches them with IP/TCP metadata, and outputs structured JSON log records to configurable destinations (UNIX socket, file, or stdout).
|
||||
|
||||
## Role in the Pipeline
|
||||
|
||||
Sentinel is the **network-layer ingestion point**. It sits on the target server, captures TLS traffic via libpcap, and feeds fingerprinted events to the [correlator](correlator.md) through a UNIX datagram socket.
|
||||
|
||||
```
Network traffic (port 443/8443)
        │ pcap
        ▼
┌───────────────┐
│   sentinel    │
│  ┌─────────┐  │
│  │ capture │──▶ Raw packets
│  └─────────┘  │
│  ┌─────────┐  │
│  │ tlsparse│──▶ TLS ClientHello extraction + TCP reassembly
│  └─────────┘  │
│  ┌─────────┐  │
│  │ finger- │──▶ JA4/JA3 fingerprint generation
│  │ print   │  │
│  └─────────┘  │
│  ┌─────────┐  │
│  │ output  │──▶ UNIX socket / file / stdout
│  └─────────┘  │
└───────────────┘
```
|
||||
|
||||
## Architecture
|
||||
|
||||
Sentinel uses a pipeline of goroutines:
|
||||
|
||||
1. **Capture goroutine** — Opens pcap handle on the configured interface, applies BPF filter, reads raw packets into a buffered channel (`packet_buffer_size`).
|
||||
2. **Packet processor goroutine** — Reads from the channel, feeds packets to the TLS parser, generates fingerprints, and writes output.
|
||||
3. **Watchdog goroutine** — Sends systemd watchdog heartbeats at half the configured interval.
|
||||
4. **Signal handler** — Listens for `SIGINT`/`SIGTERM` (graceful shutdown) and `SIGHUP` (log rotation).
|
||||
|
||||
### Key Interfaces
|
||||
|
||||
| Interface | Package | Description |
|-----------|---------|-------------|
| `Capture` | `internal/capture` | Packet capture via libpcap |
| `Parser` | `internal/tlsparse` | TCP reassembly + ClientHello extraction |
| `Engine` | `internal/fingerprint` | JA4/JA3 fingerprint generation |
| `Writer` | `internal/output` | Log record output (stdout, file, UNIX socket) |
| `MultiWriter` | `internal/output` | Fan-out to multiple writers |
| `Builder` | `internal/output` | Factory for constructing writers from config |
|
||||
|
||||
## Configuration Reference
|
||||
|
||||
Configuration is loaded from a YAML file (default: `config.yml`) with environment variable overrides.
|
||||
|
||||
### Core Settings
|
||||
|
||||
| Name | Type | Default | Env Override | Description |
|------|------|---------|-------------|-------------|
| `core.interface` | string | `any` | `JA4SENTINEL_INTERFACE` | Network interface to capture (`any` = all interfaces) |
| `core.listen_ports` | []uint16 | `[443]` | `JA4SENTINEL_PORTS` | TCP ports to monitor (comma-separated in env) |
| `core.bpf_filter` | string | `""` (auto) | `JA4SENTINEL_BPF_FILTER` | Custom BPF filter (empty = auto-generated) |
| `core.local_ips` | []string | `[]` (auto) | — | Local IPs to monitor (empty = auto-detect, excludes loopback) |
| `core.exclude_source_ips` | []string | `[]` | — | Source IPs or CIDRs to exclude (e.g., `["10.0.0.0/8"]`) |
| `core.flow_timeout_sec` | int | `30` | `JA4SENTINEL_FLOW_TIMEOUT` | Timeout for TLS handshake extraction (1–300) |
| `core.packet_buffer_size` | int | `1000` | `JA4SENTINEL_PACKET_BUFFER_SIZE` | Packet channel buffer size (1–1,000,000) |
| `core.log_level` | string | `info` | — | Log level: `debug`, `info`, `warn`, `error` (YAML only) |
|
||||
|
||||
> **Note:** `log_level` is intentionally not overridable via environment variable (architecture decision since v1.1.12).
|
||||
|
||||
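As an illustration of the comma-separated environment override for `core.listen_ports`, the Python sketch below parses `JA4SENTINEL_PORTS` into a list of ports. It is not Sentinel's Go code: `ports_from_env` is a hypothetical helper, and the 1 to 65535 range check is an assumption about what a `[]uint16` port list would reasonably accept.

```python
import os


def ports_from_env(default=(443,), env=os.environ):
    """Parse JA4SENTINEL_PORTS (e.g. "443,8443") into a list of TCP ports."""
    raw = env.get("JA4SENTINEL_PORTS", "").strip()
    if not raw:
        return list(default)  # no override: keep the YAML/default value
    ports = []
    for item in raw.split(","):
        port = int(item.strip())  # raises ValueError on non-numeric input
        if not 1 <= port <= 65535:
            raise ValueError(f"port out of range: {port}")
        ports.append(port)
    return ports


print(ports_from_env(env={"JA4SENTINEL_PORTS": "443, 8443"}))
```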
### Output Settings
|
||||
|
||||
Each output is an entry in the `outputs` array:
|
||||
|
||||
| Name | Type | Default | Description |
|------|------|---------|-------------|
| `type` | string | — | Output type: `unix_socket`, `stdout`, `file` |
| `enabled` | bool | — | Whether this output is active |
| `async_buffer` | int | `1000` | Queue size for async writes |
| `params.socket_path` | string | — | Path for `unix_socket` type |
| `params.path` | string | — | File path for `file` type |
|
||||
|
||||
### Example Configuration
|
||||
|
||||
```yaml
core:
  interface: any
  listen_ports: [443, 8443]
  bpf_filter: ""
  local_ips: []
  exclude_source_ips: ["10.0.0.0/8", "192.168.1.1"]
  flow_timeout_sec: 30
  packet_buffer_size: 1000
  log_level: info

outputs:
  - type: unix_socket
    enabled: true
    params:
      socket_path: /var/run/logcorrelator/network.socket
  - type: file
    enabled: false
    params:
      path: /var/log/ja4sentinel/ja4.log
```
|
||||
|
||||
## Output Format (LogRecord JSON Schema)
|
||||
|
||||
Each output record is a flat JSON object:
|
||||
|
||||
```json
{
  "src_ip": "203.0.113.42",
  "src_port": 52341,
  "dst_ip": "192.168.1.10",
  "dst_port": 443,
  "ip_meta_ttl": 64,
  "ip_meta_total_length": 583,
  "ip_meta_id": 12345,
  "ip_meta_df": true,
  "tcp_meta_window_size": 65535,
  "tcp_meta_mss": 1460,
  "tcp_meta_window_scale": 8,
  "tcp_meta_options": "MSS,NOP,WScale,NOP,NOP,Timestamps,SACK",
  "conn_id": "203.0.113.42:52341-192.168.1.10:443",
  "sensor_id": "",
  "tls_version": "1.3",
  "tls_sni": "example.com",
  "tls_alpn": "h2",
  "syn_to_clienthello_ms": 12,
  "ja4": "t13d1516h2_8daaf6152771_b0da82dd1658",
  "ja3": "771,4866-4867-4865-49196-49200...",
  "ja3_hash": "e7d705a3286e19ea42f587b344ee6865",
  "timestamp": 1709312345678901234
}
```
|
||||
|
||||
### Field Reference
|
||||
|
||||
| Field | Type | Description |
|-------|------|-------------|
| `src_ip` | string | Client source IP address |
| `src_port` | uint16 | Client source port |
| `dst_ip` | string | Server destination IP address |
| `dst_port` | uint16 | Server destination port |
| `ip_meta_ttl` | uint8 | IP Time-To-Live |
| `ip_meta_total_length` | uint16 | IP total packet length |
| `ip_meta_id` | uint16 | IP identification field |
| `ip_meta_df` | bool | IP Don't Fragment flag |
| `tcp_meta_window_size` | uint16 | TCP window size |
| `tcp_meta_mss` | uint16 | TCP Maximum Segment Size (omitted if 0) |
| `tcp_meta_window_scale` | uint8 | TCP window scale factor (omitted if 0) |
| `tcp_meta_options` | string | Comma-separated TCP options |
| `conn_id` | string | Unique flow identifier |
| `sensor_id` | string | Sensor/captor identifier |
| `tls_version` | string | Max TLS version from ClientHello |
| `tls_sni` | string | Server Name Indication |
| `tls_alpn` | string | ALPN protocol (e.g., `h2`, `http/1.1`) |
| `syn_to_clienthello_ms` | uint32 | Time from SYN to ClientHello (ms) |
| `ja4` | string | JA4 TLS fingerprint |
| `ja3` | string | JA3 TLS fingerprint |
| `ja3_hash` | string | MD5 hash of JA3 string |
| `timestamp` | int64 | Unix nanoseconds |
|
||||
|
||||
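When consuming these records, two details from the field reference matter in practice: `timestamp` is in nanoseconds, and `tcp_meta_mss` / `tcp_meta_window_scale` may be absent. A minimal Python sketch of a reader follows; `summarize` is a hypothetical helper, not part of Sentinel or the correlator.

```python
import json
from datetime import datetime, timezone


def summarize(line: str) -> dict:
    """Extract the flow tuple and a UTC time from one LogRecord JSON string."""
    record = json.loads(line)
    # timestamp is Unix nanoseconds; divmod keeps full integer precision
    seconds, nanos = divmod(record["timestamp"], 1_000_000_000)
    when = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return {
        "when": when.isoformat(),
        "nanos": nanos,
        "flow": (record["src_ip"], record["src_port"],
                 record["dst_ip"], record["dst_port"]),
        # These two fields are omitted when 0, so fall back explicitly
        "mss": record.get("tcp_meta_mss", 0),
        "window_scale": record.get("tcp_meta_window_scale", 0),
        "ja4": record["ja4"],
    }


example = json.dumps({
    "src_ip": "203.0.113.42", "src_port": 52341,
    "dst_ip": "192.168.1.10", "dst_port": 443,
    "ja4": "t13d1516h2_8daaf6152771_b0da82dd1658",
    "timestamp": 1709312345678901234,
})
summary = summarize(example)
print(summary["when"], summary["flow"])
```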
## UNIX Socket Output Protocol
|
||||
|
||||
- **Socket type**: `unixgram` (DGRAM — connectionless)
|
||||
- **Encoding**: One JSON object per datagram (no delimiter)
|
||||
- **Max datagram size**: 64 KB
|
||||
- **Reconnection**: Exponential backoff (100 ms → 2 s), max 3 attempts per write
|
||||
- **Queue**: Async write queue (default 1000 items) absorbs transient socket failures
|
||||
- **Error callback**: Consecutive failures are tracked and reported
|
||||
|
||||
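The retry policy above (backoff from 100 ms toward a 2 s cap, at most 3 attempts per write) can be sketched as below. This is a Python illustration, not Sentinel's Go writer: the doubling factor is an assumption, since only the 100 ms and 2 s bounds are stated, and `write_with_retry` with its injectable `sleep` is a hypothetical helper.

```python
import time


def backoff_delays(attempts=3, base=0.1, cap=2.0):
    """Delays in seconds between attempts: doubling from base, capped at cap."""
    return [min(base * (2 ** i), cap) for i in range(attempts)]


def write_with_retry(send, payload, attempts=3, base=0.1, cap=2.0, sleep=time.sleep):
    """Try send(payload) up to `attempts` times; return True on success."""
    delays = backoff_delays(attempts, base, cap)
    for i in range(attempts):
        try:
            send(payload)
            return True
        except OSError:
            if i < attempts - 1:
                sleep(delays[i])  # wait before the next attempt, not after the last
    return False


print(backoff_delays(6))
```

With three attempts the cap is never reached; it only matters if the attempt limit is raised.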
## Signal Handling
|
||||
|
||||
| Signal | Behavior |
|--------|----------|
| `SIGTERM` / `SIGINT` | Graceful shutdown: cancel context, close capture, flush outputs, log filter stats |
| `SIGHUP` | Log rotation: reopen file outputs (used by `systemctl reload` + logrotate) |
|
||||
|
||||
## JA4 Fingerprint Algorithm
|
||||
|
||||
1. Extract TLS ClientHello from the TCP payload (with TCP reassembly for fragmented handshakes)
|
||||
2. Parse cipher suites, extensions, ALPN, SNI, supported versions
|
||||
3. Build JA4 string: `t{version}{sni_flag}{cipher_count}{ext_count}{alpn}_{cipher_hash}_{ext_hash}` (for example, `t13d1516h2_...` ends its first section with the ALPN tag `h2`)
|
||||
4. Build JA3 string: `{version},{ciphers},{extensions},{curves},{formats}`
|
||||
5. Compute JA3 MD5 hash
|
||||
|
||||
Sentinel uses the `tlsfingerprint` library for ALPN and TLS version parsing, with custom sanitization for malformed/truncated ClientHellos.
|
||||
|
||||
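The string-building steps can be sketched in Python as follows. This follows the publicly documented JA4 and JA3 formats in simplified form and is not Sentinel's implementation: all function names are hypothetical, inputs are assumed to be already parsed from the ClientHello, and details such as non-alphanumeric ALPN values are not handled.

```python
import hashlib


def is_grease(value: int) -> bool:
    """GREASE values (RFC 8701) have the form 0x?A?A with both bytes equal."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def ja4(version, sni_present, ciphers, extensions, sig_algs, alpn):
    """Build a JA4 string from parsed ClientHello fields (TCP transport, 't')."""
    ciphers = [c for c in ciphers if not is_grease(c)]
    extensions = [e for e in extensions if not is_grease(e)]
    alpn_tag = (alpn[0] + alpn[-1]) if alpn else "00"
    part_a = "t{}{}{:02d}{:02d}{}".format(
        version, "d" if sni_present else "i",
        min(len(ciphers), 99), min(len(extensions), 99), alpn_tag)

    # Cipher hash: sorted 4-digit hex values, SHA-256 truncated to 12 chars
    cipher_hex = sorted(f"{c:04x}" for c in ciphers)
    cipher_hash = (hashlib.sha256(",".join(cipher_hex).encode()).hexdigest()[:12]
                   if cipher_hex else "000000000000")

    # Extension hash: SNI (0x0000) and ALPN (0x0010) are excluded from the list;
    # signature algorithms are appended in their original order
    ext_hex = sorted(f"{e:04x}" for e in extensions if e not in (0x0000, 0x0010))
    if ext_hex:
        raw = ",".join(ext_hex)
        if sig_algs:
            raw += "_" + ",".join(f"{s:04x}" for s in sig_algs)
        ext_hash = hashlib.sha256(raw.encode()).hexdigest()[:12]
    else:
        ext_hash = "000000000000"
    return f"{part_a}_{cipher_hash}_{ext_hash}"


def ja3(version, ciphers, extensions, curves, point_formats):
    """Build the JA3 string (decimal values, GREASE removed) and its MD5 hash."""
    groups = (ciphers, extensions, curves, point_formats)
    fields = [str(version)] + [
        "-".join(str(v) for v in group if not is_grease(v)) for group in groups]
    text = ",".join(fields)
    return text, hashlib.md5(text.encode()).hexdigest()


fingerprint = ja4("13", True, [0x1301, 0x1302, 0x0A0A, 0xC02F],
                  [0x0000, 0x0010, 0x000D, 0x002B, 0x1A1A], [0x0403, 0x0804], "h2")
ja3_text, ja3_md5 = ja3(771, [4866, 4867], [0, 10], [29], [0])
print(fingerprint.split("_")[0], ja3_text)
```

JA4 sorts ciphers and extensions before hashing, so it is stable against clients that randomize extension order; JA3 keeps the original order, which is why the two fingerprints are logged side by side.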
## Deployment
|
||||
|
||||
### systemd
|
||||
|
||||
```ini
[Unit]
Description=ja4sentinel TLS fingerprinting daemon
After=network.target

[Service]
Type=notify
ExecStart=/usr/bin/ja4sentinel -config /etc/ja4sentinel/config.yml
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
WatchdogSec=30
TimeoutStopSec=2

[Install]
WantedBy=multi-user.target
```
|
||||
|
||||
Sentinel uses systemd `sd_notify` for:
|
||||
- `READY` — sent after initialization
|
||||
- `WATCHDOG` — sent at half the `WatchdogSec` interval
|
||||
- `STOPPING` — sent before shutdown
|
||||
|
||||
### Docker
|
||||
|
||||
```bash
make build-sentinel
docker run --cap-add=NET_RAW --cap-add=NET_ADMIN \
  -v /var/run/logcorrelator:/var/run/logcorrelator \
  ja4-platform/sentinel:latest
```
|
||||
|
||||
## RPM Package Contents
|
||||
|
||||
| Path | Description |
|------|-------------|
| `/usr/bin/ja4sentinel` | Binary (statically linked Go) |
| `/etc/ja4sentinel/config.yml.default` | Default configuration (noreplace) |
| `/usr/share/ja4sentinel/config.yml` | Reference configuration |
| `/usr/lib/systemd/system/ja4sentinel.service` | systemd unit |
| `/etc/logrotate.d/ja4sentinel` | logrotate configuration |
| `/var/lib/ja4sentinel/` | State directory |
| `/var/log/ja4sentinel/` | Log directory |
| `/var/run/logcorrelator/` | Socket directory |
|
||||
|
||||
### RPM Dependencies
|
||||
|
||||
- `systemd`
|
||||
- `libpcap >= 1.9.0`
|
||||
|
||||
### Supported Distributions
|
||||
|
||||
- Rocky Linux 8, 9, 10
|
||||
- AlmaLinux 8, 9
|
||||
- RHEL 8, 9
|
||||
244
docs/shared/go-ja4common.md
Normal file
@ -0,0 +1,244 @@
|
||||
# go-ja4common
|
||||
|
||||
`ja4common` is the shared Go library for the ja4-platform, providing unified logging, YAML configuration loading with environment variable overrides, graceful shutdown handling, and IP address filtering. It is used by both [sentinel](../services/sentinel.md) and [correlator](../services/correlator.md).
|
||||
|
||||
**Module path**: `github.com/antitbone/ja4/ja4common`
|
||||
|
||||
**Go version**: 1.21+
|
||||
|
||||
**Dependencies**: `gopkg.in/yaml.v3`
|
||||
|
||||
## Packages
|
||||
|
||||
### logger
|
||||
|
||||
Unified structured logging with two styles:
|
||||
- **Prefix+Fields style** (correlator pattern) — `Logger`
|
||||
- **Component style** (sentinel pattern) — `ComponentLogger`
|
||||
|
||||
#### Types
|
||||
|
||||
```go
type LogLevel int

const (
	DEBUG LogLevel = iota
	INFO
	WARN
	ERROR
)
```
|
||||
|
||||
#### Logger API
|
||||
|
||||
| Method | Signature | Description |
|--------|-----------|-------------|
| `New` | `New(prefix string) *Logger` | Create logger with INFO level |
| `NewWithLevel` | `NewWithLevel(prefix, level string) *Logger` | Create logger with specified level |
| `SetLevel` | `(l *Logger) SetLevel(level string)` | Change minimum log level at runtime |
| `ShouldLog` | `(l *Logger) ShouldLog(level LogLevel) bool` | Check if level would be logged |
| `WithFields` | `(l *Logger) WithFields(fields map[string]any) *Logger` | Return new logger with additional fields |
| `Info` | `(l *Logger) Info(msg string)` | Log info message |
| `Infof` | `(l *Logger) Infof(msg string, args ...any)` | Log formatted info |
| `Warn` | `(l *Logger) Warn(msg string)` | Log warning |
| `Warnf` | `(l *Logger) Warnf(msg string, args ...any)` | Log formatted warning |
| `Error` | `(l *Logger) Error(msg string, err error)` | Log error with optional error value |
| `Debug` | `(l *Logger) Debug(msg string)` | Log debug message |
| `Debugf` | `(l *Logger) Debugf(msg string, args ...any)` | Log formatted debug |
| `ParseLogLevel` | `ParseLogLevel(level string) LogLevel` | Parse string to LogLevel |
|
||||
|
||||
#### ComponentLogger API
|
||||
|
||||
Wraps `Logger` to satisfy sentinel's component-based logging interface:
|
||||
|
||||
| Method | Signature | Description |
|--------|-----------|-------------|
| `NewComponentLogger` | `NewComponentLogger(level string) *ComponentLogger` | Create component logger |
| `Log` | `(c *ComponentLogger) Log(component, level, message string, details map[string]string)` | Log with component context |
| `Debug` | `(c *ComponentLogger) Debug(component, message string, details map[string]string)` | Debug with component |
| `Info` | `(c *ComponentLogger) Info(component, message string, details map[string]string)` | Info with component |
| `Warn` | `(c *ComponentLogger) Warn(component, message string, details map[string]string)` | Warn with component |
| `Error` | `(c *ComponentLogger) Error(component, message string, details map[string]string)` | Error with component |
|
||||
|
||||
#### Usage Example
|
||||
|
||||
```go
import "github.com/antitbone/ja4/ja4common/logger"

// Prefix+Fields style
log := logger.NewWithLevel("myservice", "DEBUG")
log.Info("starting up")
log.WithFields(map[string]any{"port": 8080}).Info("listening")

// Component style (sentinel compatibility)
clog := logger.NewComponentLogger("info")
clog.Info("capture", "packets received", map[string]string{"count": "1000"})
```
|
||||
|
||||
---
|
||||
|
||||
### config
|
||||
|
||||
Generic YAML configuration loading with environment variable overrides using struct tags.
|
||||
|
||||
#### API
|
||||
|
||||
| Function | Signature | Description |
|----------|-----------|-------------|
| `LoadYAML` | `LoadYAML[T any](path string, optional bool) (T, error)` | Load and unmarshal YAML file |
| `OverrideFromEnv` | `OverrideFromEnv[T any](cfg *T, envPrefix string) error` | Apply env var overrides via `env` struct tags |
|
||||
|
||||
#### Supported Types for Environment Override
|
||||
|
||||
- `string`
|
||||
- `int`, `int8`, `int16`, `int32`, `int64`
|
||||
- `uint`, `uint8`, `uint16`, `uint32`, `uint64`
|
||||
- `bool`
|
||||
- `[]string` (comma-separated)
|
||||
|
||||
#### Usage Example
|
||||
|
||||
```go
import "github.com/antitbone/ja4/ja4common/config"

type MyConfig struct {
	Host  string   `yaml:"host" env:"HOST"`
	Port  int      `yaml:"port" env:"PORT"`
	Debug bool     `yaml:"debug" env:"DEBUG"`
	Tags  []string `yaml:"tags" env:"TAGS"`
}

// Load YAML (optional=true means missing file returns zero value)
cfg, err := config.LoadYAML[MyConfig]("config.yml", true)

// Override from environment: the prefix "MYAPP" is joined to each env tag
// (an empty prefix means the tag is used directly)
err = config.OverrideFromEnv(&cfg, "MYAPP")
// Reads: MYAPP_HOST, MYAPP_PORT, MYAPP_DEBUG, MYAPP_TAGS
```
|
||||
|
||||
---
|
||||
|
||||
### shutdown
|
||||
|
||||
Graceful shutdown handler that blocks until `SIGTERM`/`SIGINT`, then runs cleanup hooks.
|
||||
|
||||
#### API
|
||||
|
||||
```go
type Hook struct {
	Name string
	Fn   func() error
}

func Handle(ctx context.Context, cancel context.CancelFunc, hooks []Hook, logger simpleLogger)
```
|
||||
|
||||
The `Handle` function:
|
||||
1. Blocks until `SIGTERM`, `SIGINT`, or context cancellation
|
||||
2. Calls `cancel()` to propagate shutdown
|
||||
3. Runs all hooks in order, logging errors but not aborting
|
||||
|
||||
#### Usage Example
|
||||
|
||||
```go
import "github.com/antitbone/ja4/ja4common/shutdown"

ctx, cancel := context.WithCancel(context.Background())

hooks := []shutdown.Hook{
	{Name: "close-db", Fn: func() error { return db.Close() }},
	{Name: "flush-logs", Fn: func() error { return logger.Flush() }},
}

// This blocks until signal received
shutdown.Handle(ctx, cancel, hooks, myLogger)
```
|
||||
|
||||
---
|
||||
|
||||
### ipfilter
|
||||
|
||||
IP address and CIDR range matching for source IP exclusion.
|
||||
|
||||
#### API
|
||||
|
||||
| Method | Signature | Description |
|--------|-----------|-------------|
| `New` | `New(excludeList []string) (*Filter, error)` | Create filter from IP/CIDR list |
| `ShouldExclude` | `(f *Filter) ShouldExclude(ipStr string) bool` | Check if IP should be excluded |
| `Count` | `(f *Filter) Count() (ips int, networks int)` | Return number of loaded entries |
|
||||
|
||||
Accepts: single IPs (`192.168.1.1`), CIDR ranges (`10.0.0.0/8`), IPv6 addresses and ranges.
|
||||
|
||||
#### Usage Example
|
||||
|
||||
```go
import "github.com/antitbone/ja4/ja4common/ipfilter"

filter, err := ipfilter.New([]string{
	"10.0.0.0/8",
	"192.168.1.1",
	"2001:db8::/32",
})
if err != nil {
	// An entry was not a valid IP address or CIDR range
	panic(err)
}

if filter.ShouldExclude("10.0.0.5") {
	// Skip this IP
}

ips, nets := filter.Count() // 1 IP, 2 networks
```
|
||||
|
||||
## Using from a New Service
|
||||
|
||||
### 1. Add to go.mod
|
||||
|
||||
```bash
|
||||
cd services/my-service
|
||||
go mod init github.com/antitbone/ja4/my-service
|
||||
```
|
||||
|
||||
Add the dependency:
|
||||
```
|
||||
require github.com/antitbone/ja4/ja4common v0.0.0
|
||||
```
|
||||
|
||||
### 2. Add to go.work
|
||||
|
||||
In the repository root `go.work`:
|
||||
```
use (
	./services/sentinel
	./services/correlator
	./services/my-service  // ← add
	./shared/go/ja4common
)
```
|
||||
|
||||
### 3. Import and Use
|
||||
|
||||
```go
package main

import (
	"context"
	"github.com/antitbone/ja4/ja4common/config"
	"github.com/antitbone/ja4/ja4common/logger"
	"github.com/antitbone/ja4/ja4common/shutdown"
)

func main() {
	log := logger.NewWithLevel("myservice", "INFO")

	cfg, _ := config.LoadYAML[MyConfig]("config.yml", true)
	config.OverrideFromEnv(&cfg, "MYSERVICE")

	ctx, cancel := context.WithCancel(context.Background())
	shutdown.Handle(ctx, cancel, nil, log)
}
```
|
||||
|
||||
### 4. Sync Workspace
|
||||
|
||||
```bash
|
||||
go work sync
|
||||
```
|
||||
216
docs/shared/python-ja4common.md
Normal file
@ -0,0 +1,216 @@
|
||||
# python-ja4common
|
||||
|
||||
`ja4_common` is the shared Python library for the ja4-platform, providing a unified ClickHouse client singleton and configuration settings. It is used by [bot-detector](../services/bot-detector.md) and [dashboard](../services/dashboard.md).
|
||||
|
||||
**Package name**: `ja4-common`
|
||||
|
||||
**Python version**: ≥ 3.11
|
||||
|
||||
**Dependencies**:
|
||||
- `clickhouse-connect >= 0.8.0`
|
||||
- `pydantic-settings >= 2.1.0`
|
||||
|
||||
## ClickHouseSettings
|
||||
|
||||
Pydantic-settings model that reads configuration from environment variables and `.env` files.
|
||||
|
||||
### Fields
|
||||
|
||||
| Field | Type | Default | Env Variable | Description |
|-------|------|---------|-------------|-------------|
| `CLICKHOUSE_HOST` | str | `"clickhouse"` | `CLICKHOUSE_HOST` | ClickHouse server hostname |
| `CLICKHOUSE_PORT` | int | `8123` | `CLICKHOUSE_PORT` | ClickHouse HTTP API port |
| `CLICKHOUSE_DB` | str | `"mabase_prod"` | `CLICKHOUSE_DB` | Database name |
| `CLICKHOUSE_USER` | str | `"admin"` | `CLICKHOUSE_USER` | Username for authentication |
| `CLICKHOUSE_PASSWORD` | str | `""` | `CLICKHOUSE_PASSWORD` | Password for authentication |
|
||||
|
||||
### Configuration Sources
|
||||
|
||||
Settings are loaded in order of precedence:
|
||||
1. **Environment variables** (highest priority)
|
||||
2. **`.env` file** in the current working directory
|
||||
3. **Default values** (lowest priority)
|
||||
|
||||
Environment variable names are **case-sensitive** (e.g., `CLICKHOUSE_HOST`, not `clickhouse_host`).
|
||||
|
||||
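The precedence order can be illustrated without any third-party dependency. The sketch below is a standard-library model of the lookup order only; the real behavior comes from pydantic-settings, and `parse_dotenv`, `resolve`, and the reduced `DEFAULTS` mapping are hypothetical names used for illustration.

```python
import os

# Subset of the documented defaults, kept as strings for simplicity
DEFAULTS = {"CLICKHOUSE_HOST": "clickhouse", "CLICKHOUSE_PORT": "8123"}


def parse_dotenv(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def resolve(name: str, dotenv: dict, env=os.environ) -> str:
    """Look a setting up in precedence order: environment, .env, defaults."""
    for source in (env, dotenv, DEFAULTS):
        if name in source:  # exact, case-sensitive match
            return source[name]
    raise KeyError(name)


dotenv_values = parse_dotenv("# local overrides\nCLICKHOUSE_HOST=from-file\n")
print(resolve("CLICKHOUSE_HOST", dotenv_values, env={"CLICKHOUSE_HOST": "from-env"}))
```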
### Usage
|
||||
|
||||
```python
|
||||
from ja4_common.settings import settings
|
||||
|
||||
print(settings.CLICKHOUSE_HOST) # "clickhouse" or from env
|
||||
print(settings.CLICKHOUSE_PORT) # 8123 or from env
|
||||
```
|
||||
|
||||
## ClickHouseClient
|
||||
|
||||
Wraps `clickhouse_connect` with auto-reconnection and a clean API.
|
||||
|
||||
### Methods
|
||||
|
||||
| Method | Signature | Description |
|--------|-----------|-------------|
| `connect` | `connect() -> Client` | Returns the underlying `clickhouse_connect` client, creating or reconnecting as needed |
| `query` | `query(query: str, params: dict = None)` | Execute a SELECT query, returns result set |
| `command` | `command(query: str, params: dict = None)` | Execute a DDL/DML command (CREATE, INSERT, etc.) |
| `insert` | `insert(table: str, data, column_names=None)` | Bulk insert data into a table |
| `close` | `close()` | Close the connection and release resources |
|
||||
|
||||
### Auto-Reconnection
|
||||
|
||||
The `connect()` method automatically reconnects if the current connection is lost:
|
||||
|
||||
```python
def connect(self):
    if self._client is None or not self._ping():
        self._client = clickhouse_connect.get_client(
            host=settings.CLICKHOUSE_HOST,
            port=settings.CLICKHOUSE_PORT,
            database=settings.CLICKHOUSE_DB,
            user=settings.CLICKHOUSE_USER,
            password=settings.CLICKHOUSE_PASSWORD,
            connect_timeout=10,
        )
    return self._client
```
|
||||
|
||||
### Usage Example
|
||||
|
||||
```python
from datetime import datetime

from ja4_common.clickhouse import get_client

client = get_client()

# SELECT query
result = client.query("SELECT count() FROM http_logs WHERE src_ip = {ip:String}", {"ip": "203.0.113.42"})
print(result.result_rows)

# INSERT
client.insert("audit_logs", [[datetime.now(), "analyst1", "investigate", "ip", "203.0.113.42"]],
              column_names=["timestamp", "user_name", "action", "entity_type", "entity_id"])

# Command
client.command("OPTIMIZE TABLE http_logs FINAL")
```
|
||||
|
||||
## get_client() Singleton
|
||||
|
||||
The `get_client()` function provides a module-level singleton `ClickHouseClient`:
|
||||
|
||||
```python
|
||||
from ja4_common.clickhouse import get_client
|
||||
|
||||
# First call creates the client
|
||||
client1 = get_client()
|
||||
|
||||
# Subsequent calls return the same instance
|
||||
client2 = get_client()
|
||||
assert client1 is client2
|
||||
```
|
||||
|
||||
### Implementation
|
||||
|
||||
```python
_client: Optional[ClickHouseClient] = None

def get_client() -> ClickHouseClient:
    global _client
    if _client is None:
        _client = ClickHouseClient()
    return _client
```
|
||||
|
||||
## Using from a New Service
|
||||
|
||||
### 1. Add Dependency
|
||||
|
||||
In your service's `requirements.txt`:
|
||||
```
|
||||
ja4-common @ file:///app/shared/python/ja4_common
|
||||
```
|
||||
|
||||
Or in `pyproject.toml`:
|
||||
```toml
[project]
dependencies = [
    "ja4-common",
]
```
|
||||
|
||||
### 2. Docker Setup
|
||||
|
||||
```dockerfile
|
||||
# Copy shared library
|
||||
COPY shared/python/ja4_common /app/shared/python/ja4_common
|
||||
RUN pip install /app/shared/python/ja4_common
|
||||
|
||||
# Copy service code
|
||||
COPY services/my-service /app/services/my-service
|
||||
```
|
||||
|
||||
### 3. Use in Code
|
||||
|
||||
```python
|
||||
from ja4_common.clickhouse import get_client
|
||||
from ja4_common.settings import settings
|
||||
|
||||
# Access settings
|
||||
print(f"Connecting to {settings.CLICKHOUSE_HOST}:{settings.CLICKHOUSE_PORT}")
|
||||
|
||||
# Use client
|
||||
db = get_client()
|
||||
result = db.query("SELECT count() FROM ml_detected_anomalies")
|
||||
```
|
||||
|
||||
### 4. Environment Configuration
|
||||
|
||||
Create a `.env` file or set environment variables:
|
||||
```bash
|
||||
CLICKHOUSE_HOST=clickhouse.example.com
|
||||
CLICKHOUSE_PORT=8123
|
||||
CLICKHOUSE_DB=mabase_prod
|
||||
CLICKHOUSE_USER=data_writer
|
||||
CLICKHOUSE_PASSWORD=secret
|
||||
```
|
||||
|
||||
## Testing: Mocking the Client
|
||||
|
||||
### Using unittest.mock
|
||||
|
||||
```python
from unittest.mock import MagicMock, patch
from ja4_common.clickhouse import ClickHouseClient

def test_my_service():
    mock_client = MagicMock(spec=ClickHouseClient)
    mock_client.query.return_value = MagicMock(result_rows=[(42,)])

    with patch("ja4_common.clickhouse._client", mock_client):
        from ja4_common.clickhouse import get_client
        client = get_client()
        result = client.query("SELECT count() FROM http_logs")
        assert result.result_rows == [(42,)]
```
|
||||
|
||||
### Overriding Settings in Tests
|
||||
|
||||
```python
|
||||
from ja4_common.settings import ClickHouseSettings
|
||||
|
||||
# Create custom settings for tests
|
||||
test_settings = ClickHouseSettings(
    CLICKHOUSE_HOST="localhost",
    CLICKHOUSE_PORT=8123,
    CLICKHOUSE_DB="test_db",
    CLICKHOUSE_USER="test_user",
    CLICKHOUSE_PASSWORD="test_pass",
)
```
|
||||
|
||||
## Source Files
|
||||
|
||||
| File | Description |
|
||||
|------|-------------|
|
||||
| `ja4_common/settings.py` | `ClickHouseSettings` pydantic-settings model |
|
||||
| `ja4_common/clickhouse.py` | `ClickHouseClient` class and `get_client()` singleton |
|
||||
| `pyproject.toml` | Package metadata and dependencies |
|
||||
7
go.work
Normal file
@ -0,0 +1,7 @@
|
||||
go 1.24.6
|
||||
|
||||
use (
|
||||
./services/sentinel
|
||||
./services/correlator
|
||||
./shared/go/ja4common
|
||||
)
|
||||
2
go.work.sum
Normal file
@ -0,0 +1,2 @@
|
||||
github.com/ClickHouse/clickhouse-go v1.5.4 h1:cKjXeYLNWVJIx2J1K6H2CqyRmfwVJVY1OV1coaaFcI0=
|
||||
github.com/kr/pretty v0.2.1/go.mod h1:ipq/a2n7PKx3OHsz4KJII5eveXtPO4qwEXGdVfWzfnI=
|
||||
10
services/bot-detector/.env.example
Normal file
@ -0,0 +1,10 @@
|
||||
# bot-detector configuration — DO NOT COMMIT real values
|
||||
CLICKHOUSE_HOST=clickhouse
|
||||
CLICKHOUSE_PORT=8123
|
||||
CLICKHOUSE_DB=mabase_prod
|
||||
CLICKHOUSE_USER=admin
|
||||
CLICKHOUSE_PASSWORD=
|
||||
ANOMALY_THRESHOLD=-0.1
|
||||
DEDUP_TTL_MIN=60
|
||||
HEALTH_PORT=8080
|
||||
MIN_VALID_FEATURE_RATIO=0.5
|
||||
2
services/bot-detector/.gitignore
vendored
Normal file
@ -0,0 +1,2 @@
|
||||
bot_detector_models/
|
||||
bot_detector_logs/
|
||||
204
services/bot-detector/CLICKHOUSE_FEATURES_DIAGNOSTIC.md
Normal file
@ -0,0 +1,204 @@
|
||||
# Diagnostic: Missing features in `view_ai_features_1h`

> Generated 2026-03-17. Updated 2026-03-17 (fixes applied). Intended for the ClickHouse administrator.

## ✅ Fix status (2026-03-17 13:05)

| Problem | Fix applied | Result |
|---------|-------------|--------|
| **1**: MV `mv_agg_header_fingerprint_1h` missing | MV recreated + 25 h backfill | ✅ 10 header features active |
| **2**: `header_order_shared_count` / `distinct_header_orders` computed globally | Fixed together with Problem 1 | ✅ Resolved automatically |
| **3**: `orphan_ratio` = 0 for `correlated=1` | Expected behavior (by design) | ℹ️ No action required |
| **4**: 4 dashboard views missing | Views created | ✅ |
| **5**: `view_dashboard_variability` references a nonexistent `header_user_agent` column | Column replaced with `reason` | ✅ Bug fixed |
| **6**: Orphaned legacy heuristic views | Dropped | ✅ |

Post-fix cycle (13:05), features still listed in the warnings:

- `Complet`: only `orphan_ratio` (by design)
- `Applicatif`: `request_size_variance`, `mss_mobile_mismatch`, `is_rare_ja4` (see the Problem 4 section below)
- Header features **no longer appear in the warnings** → pipeline operational ✅
|
||||
---
|
||||
|
||||
## Summary

The Bot Detector service reports **non-discriminating features** on every cycle. This document explains the exact causes and the fixes required on the ClickHouse side.

These warnings **do not prevent the service from running**: invalid features are automatically excluded from the model (A7). Their absence does, however, reduce detection quality.
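
To make the A7 exclusion concrete, here is a minimal sketch of such a check. It assumes a feature is invalid when it is missing from a row or constant across all rows. The function body, its signature, and the sample rows are illustrative assumptions, not the service's actual code; only the name `validate_features` and the `MIN_VALID_FEATURE_RATIO` setting come from the project.

```python
from typing import Dict, List, Tuple


def validate_features(
    rows: List[Dict[str, float]],
    features: List[str],
    min_valid_ratio: float = 0.5,  # mirrors MIN_VALID_FEATURE_RATIO
) -> Tuple[List[str], List[str], bool]:
    """Split features into valid and excluded ones.

    A feature is excluded when it is missing from any row or takes a single
    constant value across all rows (it cannot discriminate between IPs).
    """
    valid, excluded = [], []
    for name in features:
        values = [row.get(name) for row in rows]
        if any(v is None for v in values) or len(set(values)) <= 1:
            excluded.append(name)
        else:
            valid.append(name)
    enough = bool(features) and len(valid) / len(features) >= min_valid_ratio
    return valid, excluded, enough


# Made-up rows: header_count and orphan_ratio are stuck at 0.
rows = [
    {"hits": 12.0, "header_count": 0.0, "orphan_ratio": 0.0},
    {"hits": 340.0, "header_count": 0.0, "orphan_ratio": 0.0},
    {"hits": 7.0, "header_count": 0.0, "orphan_ratio": 0.0},
]
valid, excluded, enough = validate_features(
    rows, ["hits", "header_count", "orphan_ratio"]
)
print(valid, excluded, enough)
```

In this example only one of three features survives, so the ratio check fails and a cycle would presumably be skipped rather than scored on too few features.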
|
||||
---
|
||||
|
||||
## Problem 1: `agg_header_fingerprint_1h` pipeline stopped ⚠️ CRITICAL

### Symptom

The following features are always **0** in `view_ai_features_1h`:

- `header_count`
- `has_accept_language`
- `has_cookie`
- `has_referer`
- `modern_browser_score`
- `ua_ch_mismatch`
- `mss_mobile_mismatch` *(depends on `modern_browser_score`)*

### Cause

The table `mabase_prod.agg_header_fingerprint_1h` (AggregatingMergeTree) has received no data since **2026-03-13 23:00**:

```sql
SELECT max(window_start), count()
FROM mabase_prod.agg_header_fingerprint_1h;
-- Result: 2026-03-13 23:00:00, 73024 rows
```

The view performs a `LEFT JOIN` with the condition `window_start >= now() - INTERVAL 24 HOUR`. Because `agg_header_fingerprint_1h` contains no recent rows, **every column coming from this JOIN returns NULL** (→ 0 after coalesce).

### Locating the source MV

The list of materialized views shows no MV dedicated to `agg_header_fingerprint_1h`:

```sql
SELECT name FROM system.tables
WHERE database = 'mabase_prod' AND engine = 'MaterializedView';
-- mv_agg_host_ip_ja4_1h
-- mv_http_logs
-- view_dashboard_entities_mv
-- view_dashboard_user_agents_mv
```

No MV targets `agg_header_fingerprint_1h`. The table was presumably fed by an **external process** (ETL, script, Kafka pipeline, etc.) that stopped.
|
||||
### Fix applied ✅

The MV `mv_agg_header_fingerprint_1h` was **defined in `deploy_views.sql`** but had never been created in the database. It was recreated on 2026-03-17:

```sql
-- Recreation of the MV (already applied)
CREATE MATERIALIZED VIEW mabase_prod.mv_agg_header_fingerprint_1h
TO mabase_prod.agg_header_fingerprint_1h AS
SELECT
    toStartOfHour(src.time) AS window_start,
    toIPv6(src.src_ip) AS src_ip,
    any(toString(cityHash64(src.client_headers))) AS header_order_hash,
    max(toUInt16(length(src.client_headers) - length(replaceAll(src.client_headers, ',', '')) + 1)) AS header_count,
    -- ... (see deploy_views.sql §5)
FROM mabase_prod.http_logs AS src
GROUP BY window_start, src.src_ip;
```
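
The `header_count` expression in the MV counts comma-separated entries by comparing the string length with and without commas. The Python sketch below mirrors that arithmetic, assuming `client_headers` is a comma-separated list of header names. The sample values are made up, and ClickHouse `length()` counts bytes while Python `len()` counts characters, so the two agree only for ASCII input.

```python
def header_count(client_headers: str) -> int:
    """Mirror of length(s) - length(replaceAll(s, ',', '')) + 1."""
    return len(client_headers) - len(client_headers.replace(",", "")) + 1


print(header_count("Host,User-Agent,Accept"))  # 3
print(header_count("Host"))                    # 1
print(header_count(""))                        # 1: an empty string still counts as one
```

The last case shows a property of the formula worth keeping in mind: a row with an empty `client_headers` value yields 1, not 0.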
|
||||
A **25-hour backfill** from `http_logs` loaded historical data into the table (377,689 rows inserted). New data is now fed in real time by the MV.

### Historical cause

The MV was omitted during the initial deployment. The table `agg_header_fingerprint_1h` held 73,024 rows dated 2026-03-13 (probably from a one-off manual backfill) and was not fed afterwards.
|
||||
---
|
||||
|
||||
## Problem 2: Non-discriminating features (global aggregate, not per-IP)

### Symptom

The following features have a **single non-zero value that is identical for every IP**:

- `header_order_shared_count` (value ≈ 421,000 on every row)
- `distinct_header_orders` (identical value on every row)

### Cause

These features are computed with window functions over `header_order_hash`:

```sql
-- In the view:
count() OVER (PARTITION BY h.header_order_hash) AS header_order_shared_count
uniqExact(h.header_order_hash) OVER (PARTITION BY a.src_ip) AS distinct_header_orders
```

Because `h.header_order_hash` is **NULL on every row** (Problem 1 above), `PARTITION BY NULL` puts **all rows into a single partition** → `count()` returns the total row count for every IP.
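
The partition collapse can be reproduced outside ClickHouse. The sketch below emulates `count() OVER (PARTITION BY header_order_hash)` in plain Python; the helper name `shared_count` and the hash values are hypothetical and serve only to illustrate the effect.

```python
from collections import Counter
from typing import List, Optional


def shared_count(hashes: List[Optional[str]]) -> List[int]:
    """Emulate count() OVER (PARTITION BY header_order_hash) row by row."""
    # None is a valid dictionary key: all NULL hashes land in one partition.
    partition_sizes = Counter(hashes)
    return [partition_sizes[h] for h in hashes]


# Healthy pipeline: each row is counted within its own hash partition.
print(shared_count(["h1", "h2", "h1", "h3"]))  # [2, 1, 2, 1]

# Broken pipeline: every hash is NULL, so every row sees the global total.
print(shared_count([None, None, None, None]))  # [4, 4, 4, 4]
```

With every hash NULL, the feature takes the same value on all rows, which is exactly what makes it non-discriminating.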
|
||||
### Fix ✅ (resolved automatically with Problem 1)

This problem resolved itself once the MV `mv_agg_header_fingerprint_1h` was recreated. `header_order_hash` is now non-NULL, so the window-function partitions are computed correctly per header-order hash.
|
||||
---
|
||||
|
||||
## Problem 3: `orphan_ratio` absent for TCP-correlated traffic

### Symptom

`orphan_ratio` = 0 for **every row with `correlated = 1`** (enriched TCP traffic).

### Cause

The `orphan_count` column in `mabase_prod.agg_host_ip_ja4_1h` is computed by the MV `mv_agg_host_ip_ja4_1h`:

```sql
sum(IF(src.orphan_side = 'A' OR src.correlated = 0, 1, 0)) AS orphan_count
```

For `correlated=1` connections, `correlated = 0` is always false and `orphan_side = 'A'` is never true for correlated traffic → `orphan_count = 0` systematically.

**This is intentional**: correlated TCP connections have a confirmed response, so by definition they are not orphan requests.

### Status

No action required. The feature remains automatically excluded by A7 for the `Complet` model (correlated=1).
|
||||
---
|
||||
|
||||
## Problem 4: Features persistently at 0 in the Applicatif model

### Symptom (post-fix)

Since 2026-03-17 13:05, the `Applicatif` model (uncorrelated traffic) still reports these features at 0:

- `request_size_variance`
- `mss_mobile_mismatch`
- `is_rare_ja4`

### Cause

These features are computed from L4/TCP columns that are **missing or not relevant for purely application-level traffic** (`correlated=0`):

| Feature | Cause |
|---------|-------|
| `request_size_variance` | `varPopMerge(total_ip_length_var)`: variance of IP length; uncorrelated traffic has no reliable raw IP data |
| `mss_mobile_mismatch` | Depends on `tcp_meta_mss` and `modern_browser_score`; MSS is unreliable without TCP correlation |
| `is_rare_ja4` | `sum(hits) OVER (PARTITION BY ja4) < 100`: in the Applicatif window (1 h, reduced traffic), every JA4 is rare |

### Impact

Low. These features are excluded automatically (A7) and do not degrade the model.
|
||||
---
|
||||
|
||||
## Impact on the AI model

| Feature | Impact if missing | Status |
|---------|-------------------|--------|
| `header_count` | Loss of a strong signal: bots often send few headers | ✅ Fixed |
| `has_accept_language` | Loss of detection for bots without locale information | ✅ Fixed |
| `has_cookie` | Loss of detection for stateless sessions | ✅ Fixed |
| `has_referer` | Loss of the direct-navigation signal | ✅ Fixed |
| `modern_browser_score` | Loss of the composite browser-conformance score | ✅ Fixed |
| `ua_ch_mismatch` | Loss of detection for false UA declarations | ✅ Fixed |
| `header_order_shared_count` | Loss of detection for shared header fingerprints | ✅ Fixed |
| `orphan_ratio` | Weak signal for correlated traffic | ℹ️ By design |
| `request_size_variance` | Weak L4 signal for Applicatif | ℹ️ Normal |
| `mss_mobile_mismatch` | Weak TCP signal for Applicatif | ℹ️ Normal |
|
||||
---
|
||||
|
||||
## Post-fix verification

Cycle of 2026-03-17 13:05, observed result:

```
[Complet]    Features à 0 : ['orphan_ratio']  ← by design ✅
[Applicatif] Features à 0 : ['request_size_variance', 'mss_mobile_mismatch', 'is_rare_ja4']  ← expected ✅
[Applicatif] Features non-discriminantes : ['tcp_shared_count']  ← residual global aggregate
```

The **10 header features** (`header_count`, `has_accept_language`, `has_cookie`, `has_referer`, `modern_browser_score`, `ua_ch_mismatch`, `header_order_shared_count`, `distinct_header_orders`, `header_order_confidence`, `mss_mobile_mismatch` for Complet) **no longer appear in the warnings**. The pipeline is operational.
710
services/bot-detector/DOCUMENTATION.md
Normal file
@ -0,0 +1,710 @@
|
||||
# Bot Detector IA: Technical Documentation

> Code version: v11 | Last updated: 2026-03-17
|
||||
---
|
||||
|
||||
## Table of contents

1. [Overview](#1-overview)
2. [System architecture](#2-system-architecture)
3. [Detection pipeline](#3-detection-pipeline)
4. [Models and features](#4-models-and-features)
5. [Semi-supervised approach](#5-semi-supervised-approach)
6. [Model management](#6-model-management)
7. [Input data: ClickHouse view](#7-input-data-clickhouse-view)
8. [Output data](#8-output-data)
9. [Configuration](#9-configuration)
10. [Observability](#10-observability)
11. [Reputation and enrichment](#11-réputation-et-enrichissement)
12. [Scientific foundations](#12-fondements-scientifiques)
13. [Implemented improvements (v11)](#13-améliorations-implémentées-v11)
14. [ClickHouse schema migration](#14-migration-de-schéma-clickhouse)
|
||||
---
|
||||
|
||||
## 1. Overview

The **Bot Detector IA** is a service that detects suspicious activity and bots in HTTP traffic. It runs in a continuous loop (every 5 minutes by default) and analyzes aggregated data from ClickHouse.

### General principle

```
ClickHouse (view_ai_features_1h)
            │
            ▼
┌───────────────────────┐
│ Traffic separation    │
│ ├─ Known bots         │ → Labeled via IP / JA4 / ASN reputation
│ ├─ Human traffic      │ → Serves as the training baseline for the IF
│ └─ Unknown traffic    │ → Scored by the Isolation Forest
└───────────────────────┘
            │
            ▼
┌───────────────────────┐
│ Isolation Forest      │
│ (semi-supervised)     │
│ ├─ Complet model      │ TCP + TLS + HTTP (35 features, correlated=1)
│ └─ Applicatif model   │ HTTP only (31 features, correlated=0)
└───────────────────────┘
            │
            ▼
ClickHouse (ml_detected_anomalies)
```
|
||||
### Key characteristics

| Property | Value |
|----------|-------|
| Algorithm | Isolation Forest (sklearn) |
| Supervision | Semi-supervised (human baseline + reputation) |
| Analysis window | 1-hour sliding window (optional: 24 h with `ENABLE_MULTIWINDOW`) |
| Execution cycle | 300 s (configurable) |
| Retraining | Every 1 h (configurable) + forced retrain on concept drift |
| Contamination | 2 % (expected fraction of anomalies in the baseline) |
| Anomaly threshold | Adaptive: min(percentile_5, -0.03) |
|
||||
---
|
||||
|
||||
## 2. System architecture

```
┌──────────────────────────────────────────────────────────────────┐
│                          Docker Compose                          │
│                                                                  │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │                      bot_detector_ai                       │  │
│  │                                                            │  │
│  │  ┌────────────┐  ┌──────────────┐  ┌─────────────────┐     │  │
│  │  │ Health     │  │ Main Loop    │  │ ClickHouse      │     │  │
│  │  │ :8080      │  │ (300s cycle) │  │ Client          │     │  │
│  │  │ (thread)   │  │              │  │ (reconnect)     │     │  │
│  │  └────────────┘  └──────────────┘  └─────────────────┘     │  │
│  │                                                            │  │
│  │  Volumes:                                                  │  │
│  │  ├─ ./bot_detector_logs   → /var/log/bot_detector          │  │
│  │  ├─ ./bot_detector_models → /var/lib/bot_detector          │  │
│  │  ├─ ./reputation/data/user_files/bot_ip.csv (ro)           │  │
│  │  ├─ ./reputation/data/user_files/bot_ja4.csv (ro)          │  │
│  │  └─ ./reputation/data/user_files/asn_reputation.csv (ro)   │  │
│  └────────────────────────────────────────────────────────────┘  │
│                                                                  │
└────────────────────────────────┬─────────────────────────────────┘
                                 │ HTTP :8123
                                 ▼
                        External ClickHouse
                      (test-sdv-anubis.sdv.fr)
```
|
||||
### Files and directories

| Path | Role |
|------|------|
| `bot_detector/bot_detector.py` | Main source code |
| `bot_detector/requirements.txt` | Python dependencies |
| `bot_detector/Dockerfile` | Python 3.11-slim image |
| `docker-compose.yml` | Docker orchestration |
| `.env` | Environment variables (not committed) |
| `bot_detector_logs/decisions.jsonl` | Structured JSONL log (rotation 50 MB × 7) |
| `bot_detector_models/model_<name>_<version>.joblib` | Serialized model |
| `bot_detector_models/model_<name>_<version>.meta.json` | Model metadata |
| `bot_detector_models/model_<name>.current` | Pointer to the active version |
| `bot_detector_models/training_history.jsonl` | Training history |
| `reputation/bot_ip.csv` | ~288 k IP/CIDR entries for known bots |
| `reputation/bot_ja4.csv` | JA4 fingerprints of bots |
| `reputation/asn_reputation.csv` | ASN labels (human / bot) |
|
||||
---
|
||||
|
||||
## 3. Detection pipeline

### 3.1 Main cycle (`fetch_and_analyze`)

```
1. Generate a cycle_id (timestamp)
2. Query view_ai_features_1h → DataFrame df
3. Query view_ip_recurrence → recurrence_map {src_ip: count}
4. Clean the columns (fillna, astype)
5. Log CYCLE_START (total, human, known_bot, correlated)
6. Split df → correlated=1 / correlated=0
7. Call run_semi_supervised_logic() × 2 (Complet + Applicatif models)
8. Concatenate, deduplicate by src_ip (lowest score wins)
9. Insert into ml_detected_anomalies
10. Log CYCLE_END
11. Wait CYCLE_INTERVAL seconds
```
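
Step 8 of the cycle merges the results of both models and keeps one row per IP. The sketch below shows one plausible way to do that with plain dictionaries. The service itself works on DataFrames, so the helper `dedupe_lowest_score` and the sample rows are illustrative assumptions only.

```python
from typing import Dict, Iterable, List


def dedupe_lowest_score(rows: Iterable[dict]) -> List[dict]:
    """Keep one row per src_ip: the one with the lowest (most anomalous) score."""
    best: Dict[str, dict] = {}
    for row in rows:
        current = best.get(row["src_ip"])
        if current is None or row["anomaly_score"] < current["anomaly_score"]:
            best[row["src_ip"]] = row
    return list(best.values())


# Made-up results from the two models; one IP appears in both.
complet = [{"src_ip": "203.0.113.7", "anomaly_score": -0.12, "model_name": "Complet"}]
applicatif = [
    {"src_ip": "203.0.113.7", "anomaly_score": -0.31, "model_name": "Applicatif"},
    {"src_ip": "198.51.100.4", "anomaly_score": -0.08, "model_name": "Applicatif"},
]
merged = dedupe_lowest_score(complet + applicatif)
for row in merged:
    print(row["src_ip"], row["anomaly_score"], row["model_name"])
```

Keeping the lowest score means an IP flagged by both models is reported once, with its most severe assessment.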
|
||||
### 3.2 Semi-supervised logic (`run_semi_supervised_logic`)

```
df (traffic in the 1 h window)
  │
  ├─ A7 → validate_features(): exclude missing or constant features
  │
  ├─ bot_name != '' → known_bots → KNOWN_BOT (log + insert)
  │
  └─ bot_name == '' → unknown_traffic
       │
       ├─ asn_label == 'human' → human_baseline
       │     (min. 500 sessions required)
       │     └──► load_or_train_model()
       │             ├─ A1: drift check (z-score / features)
       │             └─ If drift ≥ DRIFT_THRESHOLD: forced retrain
       │
       └─ remaining unknown traffic
            │
            ▼
          IsolationForest.decision_function() → raw_scores
            │
          A10: normalize_scores() → anomaly_score [-1, 0]
            │
          A2: effective_threshold = min(percentile_5, ANOMALY_THRESHOLD)
            │
          A6: raw_score -= log1p(recurrence) × RECURRENCE_WEIGHT
            │
          raw_score < effective_threshold ?
            │
           YES → A4: SHAP top-5 features → reason
                 A8: DBSCAN clustering → campaign_id
                 ANOMALY (log + insert)
            │
           NO  → ignored
```
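
Steps A2 and A6 are simple arithmetic and can be sketched with the standard library alone. The helper names, the linear-interpolation percentile, and the sample scores are assumptions for illustration. The defaults mirror `ANOMALY_THRESHOLD`, `ANOMALY_PERCENTILE` and `RECURRENCE_WEIGHT` from the configuration table, and the real service may compute the percentile differently.

```python
import math
from typing import List, Sequence, Tuple


def percentile(values: Sequence[float], q: float) -> float:
    """Linear-interpolation percentile (q in 0..100) over a non-empty sequence."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def flag_anomalies(
    scores: Sequence[float],
    recurrences: Sequence[int],
    anomaly_threshold: float = -0.05,
    anomaly_percentile: float = 5,
    recurrence_weight: float = 0.005,
) -> Tuple[float, List[float], List[bool]]:
    """Return (effective_threshold, adjusted_scores, flags)."""
    # A2: the adaptive threshold is never more permissive than the static one.
    effective = min(percentile(scores, anomaly_percentile), anomaly_threshold)
    # A6: repeat offenders receive a small extra penalty.
    adjusted = [
        s - math.log1p(r) * recurrence_weight for s, r in zip(scores, recurrences)
    ]
    return effective, adjusted, [a < effective for a in adjusted]


# Made-up raw scores; only the last IP has been seen before (3 times).
effective, adjusted, flags = flag_anomalies(
    [0.10, 0.05, 0.02, -0.02, -0.20], [0, 0, 0, 0, 3]
)
print(flags)
```

Using `log1p` keeps the penalty at zero for first-time IPs and makes it grow slowly, so recurrence nudges a borderline score rather than dominating it.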
|
||||
### 3.3 Threat levels

| Score | Level | Interpretation |
|-------|-------|----------------|
| `< -0.30` | **CRITICAL** | Extremely abnormal behavior |
| `< -0.15` | **HIGH** | Strong anomaly signal |
| `< -0.05` | **MEDIUM** | Moderate anomaly |
| `≥ -0.05` | **LOW** | Slightly unusual |

> The insertion threshold (`ANOMALY_THRESHOLD = -0.03`) is more permissive than the LOW boundary. Every IP whose score falls below this threshold is inserted, whatever its threat level.
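
The table translates directly into a cascade of comparisons. The function below is a sketch of that mapping. The name `threat_level` is hypothetical, and the boundaries are copied from the table.

```python
def threat_level(score: float) -> str:
    """Map an anomaly score to the threat level from the table above."""
    if score < -0.30:
        return "CRITICAL"
    if score < -0.15:
        return "HIGH"
    if score < -0.05:
        return "MEDIUM"
    return "LOW"


for s in (-0.42, -0.20, -0.06, -0.04):
    print(s, threat_level(s))
```

A score of -0.04 is below the -0.03 insertion threshold yet maps to LOW, which is the case the note above describes.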
|
||||
---
|
||||
|
||||
## 4. Models and features

### 4.1 Two-tier architecture

| Model | Condition | Feature count | Data used |
|-------|-----------|---------------|-----------|
| **Complet** | `correlated = 1` | 35 | HTTP + TCP + TLS |
| **Applicatif** | `correlated = 0` | 31 | HTTP only |

The `correlated` flag indicates whether the HTTP logs could be enriched with the TCP/TLS data of the same connection. Without correlation (incomplete capture, or encrypted traffic without inspection), only the Applicatif model is used.
|
||||
### 4.2 Common features (31, Applicatif model)

#### Basic HTTP behavior

| Feature | Description |
|---------|-------------|
| `hits` | Number of requests in the window |
| `hit_velocity` | Requests per second |
| `fuzzing_index` | Score for abnormal diversity of paths/parameters |
| `post_ratio` | Fraction of POST requests |
| `port_exhaustion_ratio` | Fraction of distinct source ports / total ports |
| `orphan_ratio` | Requests without an associated response |

#### Connection handling

| Feature | Description |
|---------|-------------|
| `max_keepalives` | Max number of requests on a single keep-alive connection |
| `tcp_shared_count` | TCP connections shared across several HTTP sessions |
|
||||
#### Browser fingerprint

| Feature | Description |
|---------|-------------|
| `header_count` | Number of HTTP headers sent |
| `has_accept_language` | Presence of Accept-Language |
| `has_cookie` | Presence of Cookie |
| `has_referer` | Presence of Referer |
| `modern_browser_score` | Composite score of modern-browser conformance |
| `ua_ch_mismatch` | Inconsistency between User-Agent and Client Hints |
| `ip_id_zero_ratio` | Ratio of IP packets with ID=0 (headless / minimal stack) |
| `header_order_shared_count` | Sharing of the same header order across IPs |
| `header_order_confidence` | Confidence in the header order (normalized entropy) |
| `distinct_header_orders` | Number of distinct header orders observed |

#### Navigation patterns

| Feature | Description |
|---------|-------------|
| `request_size_variance` | Variance of request size |
| `multiplexing_efficiency` | HTTP/2 multiplexing efficiency |
| `mss_mobile_mismatch` | Inconsistency between TCP MSS and the advertised mobile profile |
| `asset_ratio` | Fraction of requests for static resources |
| `direct_access_ratio` | Fraction of direct accesses (no referer) |
| `is_ua_rotating` | User-Agent rotation detected (0/1 flag) |
| `distinct_ja4_count` | Number of distinct JA4 fingerprints per IP |
|
||||
#### Concentration and rarity

| Feature | Description |
|---------|-------------|
| `src_port_density` | Source-port density (entropy) |
| `ja4_asn_concentration` | Concentration of a given JA4 within an ASN |
| `ja4_country_concentration` | Concentration of a given JA4 per country |
| `is_rare_ja4` | JA4 uncommon in the population (0/1 flag) |

#### Temporal and diversity dimensions (academic)

| Feature | Description |
|---------|-------------|
| `temporal_entropy` | Entropy of the temporal distribution of requests |
| `path_diversity_ratio` | Diversity of the URL paths accessed |
| `url_depth_variance` | Variance of URL depth |
| `anomalous_payload_ratio` | Fraction of payloads with abnormal patterns |

### 4.3 Additional TCP/TLS features (Complet model only)

| Feature | Description |
|---------|-------------|
| `tcp_jitter_variance` | Variance of TCP inter-packet jitter |
| `alpn_http_mismatch` | Inconsistency between the negotiated ALPN and the actual HTTP protocol |
| `is_alpn_missing` | ALPN absent from the TLS ClientHello |
| `sni_host_mismatch` | Inconsistency between TLS SNI and HTTP Host |
|
||||
---
|
||||
|
||||
## 5. Semi-supervised approach

### 5.1 Theoretical basis

The **Isolation Forest** (Liu, Ting & Zhou, 2008) is an unsupervised learning algorithm designed for anomaly detection. Its principle: anomalies, being rare and different, are **isolated in fewer splits** of a random decision tree than normal points.

The decision score (`decision_function`) is negative for points considered anomalous and positive for normal points; the lower the score, the more abnormal the point. The `contamination` parameter sets the fraction of training points treated as anomalies, which determines where the zero of that score lies.

### 5.2 Semi-supervised dimension

The approach is **semi-supervised** because:

1. **Partial labeling**: known bots (via IP/JA4 reputation) and humans (via ASN reputation) are identified *a priori*.
2. **Training on the normal class only**: the IF is trained **exclusively on the human baseline** (`asn_label = 'human'`, `bot_name = ''`), so it learns the profile of legitimate traffic.
3. **Detection by deviation**: any unknown traffic that departs from the human profile receives a negative score.

This approach follows the **One-Class Classification** paradigm (Tax & Duin, 2004) applied to bot detection, close to the work of Kruegel & Vigna (2003) on network anomaly detection.

### 5.3 Quality of the human baseline

The minimum of 500 human sessions is an empirical safeguard. Below this threshold, the IF lacks enough examples to define a robust normal profile, which increases the risk of false positives.

In practice, observed cycles show between **1,264** and **1,725** human sessions per one-hour window.
|
||||
---
|
||||
|
||||
## 6. Model management

### 6.1 Model lifecycle

```
Cycle start
    │
    ▼
.current exists? ──NO──► Train new model
    │
   YES
    │
    ▼
Age < RETRAIN_INTERVAL_H ?
    │              │
   YES             NO
    │              │
    ▼              └──► Train new model
A1: Drift check         (MODEL_TRAINED)
(z-score vs baseline_stats)
    │
Drift ≥ DRIFT_THRESHOLD ?
    │              │
    NO            YES
    │              │
Load model     Train new model
(MODEL_LOADED) (DRIFT_DETECTED + MODEL_TRAINED)
```
|
||||
### 6.2 Model versioning

Each model is identified by a `version_id` in the format `YYYYMMDD_HHMMSS`. The associated files are:

- `model_{name}_{version_id}.joblib`: serialized model (joblib/pickle)
- `model_{name}_{version_id}.meta.json`: metadata (features, contamination, sample count, etc.)
- `model_{name}.current`: atomic pointer to the active version
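
One common way to make such a pointer atomic is to write the new value to a temporary file and rename it over the pointer. The sketch below illustrates that pattern. The helper names are hypothetical, and the assumption that the `.current` file holds the bare `version_id` as text is not confirmed by this document.

```python
import os
import tempfile
from datetime import datetime


def new_version_id(now: datetime) -> str:
    """Format a timestamp as YYYYMMDD_HHMMSS."""
    return now.strftime("%Y%m%d_%H%M%S")


def set_current(model_dir: str, name: str, version_id: str) -> str:
    """Point model_<name>.current at version_id using write-then-rename."""
    pointer = os.path.join(model_dir, f"model_{name}.current")
    fd, tmp_path = tempfile.mkstemp(dir=model_dir, prefix=".current-")
    with os.fdopen(fd, "w") as tmp:
        tmp.write(version_id)
    # os.replace is atomic when source and target are on the same filesystem,
    # so readers see either the old version or the new one, never a partial file.
    os.replace(tmp_path, pointer)
    return pointer


def read_current(model_dir: str, name: str) -> str:
    with open(os.path.join(model_dir, f"model_{name}.current")) as fh:
        return fh.read().strip()


with tempfile.TemporaryDirectory() as model_dir:
    version = new_version_id(datetime(2026, 3, 17, 13, 5, 0))
    set_current(model_dir, "Complet", version)
    print(read_current(model_dir, "Complet"))  # 20260317_130500
```

Creating the temporary file inside `model_dir` keeps both paths on the same filesystem, which is what the atomic rename relies on.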
|
||||
History is limited to `MODEL_HISTORY_COUNT` versions (72 in production = 3 days at a 1 h retrain interval).

The `.meta.json` file now contains a `baseline_stats` field holding the distribution statistics (mean, std, p25, p75) of each feature, used for drift detection (A1).
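
A drift check built on `baseline_stats` could look like the sketch below: compute a z-score of each feature's current mean against the stored baseline, then compare the fraction of drifting features with `DRIFT_THRESHOLD`. The z-score cutoff of 3.0, the function name, and the sample statistics are assumptions, since the document does not give the exact formula.

```python
from statistics import mean
from typing import Dict, List


def drift_ratio(
    current: Dict[str, List[float]],
    baseline_stats: Dict[str, Dict[str, float]],
    z_cutoff: float = 3.0,  # assumed cutoff, not taken from the service
) -> float:
    """Fraction of comparable features whose mean moved by >= z_cutoff baseline stds."""
    drifting = 0
    checked = 0
    for name, stats in baseline_stats.items():
        values = current.get(name)
        if not values or stats["std"] <= 0:
            continue  # nothing to compare, or a constant baseline
        checked += 1
        z = abs(mean(values) - stats["mean"]) / stats["std"]
        if z >= z_cutoff:
            drifting += 1
    return drifting / checked if checked else 0.0


# Made-up baseline and current window: hits has shifted, post_ratio has not.
baseline_stats = {
    "hits": {"mean": 10.0, "std": 2.0},
    "post_ratio": {"mean": 0.10, "std": 0.05},
}
current = {"hits": [30.0, 34.0, 32.0], "post_ratio": [0.10, 0.12, 0.08]}
ratio = drift_ratio(current, baseline_stats)
print(ratio, ratio >= 0.30)  # 0.30 mirrors the DRIFT_THRESHOLD default
```

Here one feature out of two drifts, so the ratio exceeds the default threshold and a forced retrain would follow.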
|
||||
### 6.3 Isolation Forest parameters

```python
from sklearn.ensemble import IsolationForest

IsolationForest(
    n_estimators=300,     # Number of trees (accuracy/time trade-off)
    contamination=0.02,   # 2% estimated anomalies in the baseline
    random_state=42,      # Reproducibility
    n_jobs=-1             # Parallelize across all cores
)
```
|
||||
---
|
||||
|
||||
## 7. Input data: ClickHouse view

### 7.1 Main view: `view_ai_features_1h`

Aggregation over a 1-hour sliding window, one row per `src_ip`. Key columns:

| Column | Type | Source |
|--------|------|--------|
| `src_ip` | String | TCP/IP |
| `ja4` | String | TLS fingerprint (JA4+) |
| `host` | String | HTTP Host header |
| `bot_name` | String | IP/JA4 reputation (empty if unknown) |
| `asn_number` | String | GeoIP/ASN lookup |
| `asn_org` | String | ASN organization |
| `asn_domain` | String | ASN domain |
| `country_code` | String | Source country |
| `asn_label` | String | `human` / `bot` / `unknown` |
| `correlated` | Int | 1 if TCP/TLS available, 0 otherwise |
| `hits` | Float | Request count |
| `hit_velocity` | Float | Req/s |
| *…(26+ features)* | Float | See section 4.2 |

### 7.2 Recurrence view: `view_ip_recurrence`

```sql
SELECT src_ip, recurrence FROM {DB}.view_ip_recurrence
```

Returns the number of times an IP has already been detected as a threat in the history. It populates the `recurrence` field in the output.
|
||||
---
|
||||
|
||||
## 8. Output data

### 8.1 ClickHouse table: `ml_detected_anomalies`

All detected anomalies and known bots are inserted into this table. Notable columns:

| Column | Description |
|--------|-------------|
| `detected_at` | Detection timestamp |
| `src_ip` | Source IP |
| `ja4` | TLS/JA4 fingerprint (`HTTP_CLEAR_TEXT` if absent) |
| `host` | Targeted vhost |
| `bot_name` | Bot name (empty for an IF anomaly) |
| `anomaly_score` | IF score (0.0 for known bots) |
| `threat_level` | `CRITICAL` / `HIGH` / `MEDIUM` / `LOW` / `KNOWN_BOT` |
| `model_name` | `Complet` or `Applicatif` |
| `recurrence` | Number of historical appearances + 1 |
| `reason` | Textual description of the anomaly |
| `is_headless` | Derived from `is_fake_navigation` |
| *…(all features)* | For post-mortem analysis |

### 8.2 JSONL log: `decisions.jsonl`

Structured events in JSON Lines format, rotated (50 MB × 7 files).

| Event | Trigger |
|-------|---------|
| `SERVICE_START` | Container start |
| `SERVICE_STOP` | Clean shutdown (SIGTERM/SIGINT) |
| `CYCLE_START` | Start of an analysis cycle |
| `CYCLE_END` | End of the cycle (summary of inserted rows) |
| `MODEL_LOADED` | Reuse of an existing model |
| `MODEL_TRAINED` | New training run |
| `KNOWN_BOT` | Known bot identified |
| `ANOMALY` | IF anomaly detected |
| `SKIPPED_LOW_DATA` | Cycle skipped (baseline < 500) |
| `CONSECUTIVE_FAILURES` | Repeated ClickHouse error |
|
||||
---
|
||||
|
||||
## 9. Configuration

All values are supplied through environment variables (`.env` file).

| Variable | Default | Description |
|----------|---------|-------------|
| `CLICKHOUSE_HOST` | `clickhouse` | ClickHouse host |
| `CLICKHOUSE_DB` | `mabase_prod` | Database |
| `CLICKHOUSE_USER` | `default` | User |
| `CLICKHOUSE_PASSWORD` | *(empty)* | Password |
| `ISOLATION_CONTAMINATION` | `0.001` | Expected fraction of anomalies (0 < x < 0.5) |
| `ANOMALY_THRESHOLD` | `-0.05` | Static score threshold for insertion |
| `CYCLE_INTERVAL_SEC` | `300` | Delay between cycles (seconds) |
| `MAX_CONSECUTIVE_FAILURES` | `3` | Failures before switching to DEGRADED |
| `BOT_DETECTOR_LOG` | `/var/log/bot_detector/decisions.jsonl` | Log file |
| `LOG_BACKUP_COUNT` | `7` | Number of rotated files kept |
| `MODEL_DIR` | `/var/lib/bot_detector` | Model directory |
| `RETRAIN_INTERVAL_HOURS` | `24` | Retraining frequency |
| `MODEL_HISTORY_COUNT` | `10` | Number of model versions kept |
| `HEALTH_PORT` | `8080` | HTTP health check port |
| **A1** `DRIFT_THRESHOLD` | `0.30` | Fraction of drifting features that triggers a forced retrain |
| **A2** `ANOMALY_PERCENTILE` | `5` | Percentile for the adaptive threshold (0–20) |
| **A3** `ENABLE_MULTIWINDOW` | `false` | Enables analysis over a 24 h window |
| **A3** `MULTIWINDOW_VIEW` | `view_ai_features_24h` | Name of the 24 h view in ClickHouse |
| **A4** `ENABLE_SHAP` | `true` | Enables SHAP computation (disabled if shap is not installed) |
| **A5** `DEDUP_TTL_MIN` | `60` | Cross-cycle deduplication TTL (0 = disabled) |
| **A6** `RECURRENCE_WEIGHT` | `0.005` | Score penalty per log(recurrence) |
| **A7** `MIN_VALID_FEATURE_RATIO` | `0.50` | Minimum ratio of valid features required to proceed |
| **A8** `ENABLE_CLUSTERING` | `true` | Enables DBSCAN clustering of anomalies |
| **A8** `CLUSTERING_MIN_SAMPLES` | `3` | Minimum size of a DBSCAN cluster |
|
||||
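As an illustration of the table above, here is a minimal sketch of reading a few of these variables with their documented defaults. The `Settings` dataclass and `load_settings` helper are hypothetical names chosen for this example, not taken from the service code, and only a subset of the variables is shown.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for a subset of the documented variables."""
    clickhouse_host: str
    anomaly_threshold: float
    anomaly_percentile: float
    cycle_interval_sec: int
    enable_multiwindow: bool
    dedup_ttl_min: int


def _as_bool(value: str) -> bool:
    # Accept the usual spellings of "true"; everything else is False.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env=os.environ) -> Settings:
    """Read settings from a mapping, falling back to the documented defaults."""
    settings = Settings(
        clickhouse_host=env.get("CLICKHOUSE_HOST", "clickhouse"),
        anomaly_threshold=float(env.get("ANOMALY_THRESHOLD", "-0.05")),
        anomaly_percentile=float(env.get("ANOMALY_PERCENTILE", "5")),
        cycle_interval_sec=int(env.get("CYCLE_INTERVAL_SEC", "300")),
        enable_multiwindow=_as_bool(env.get("ENABLE_MULTIWINDOW", "false")),
        dedup_ttl_min=int(env.get("DEDUP_TTL_MIN", "60")),
    )
    # Range taken from the table: ANOMALY_PERCENTILE is documented as 0-20.
    if not 0 <= settings.anomaly_percentile <= 20:
        raise ValueError("ANOMALY_PERCENTILE must be between 0 and 20")
    return settings
```

Passing a plain dict instead of `os.environ` keeps the function easy to exercise in tests, for example `load_settings({"DEDUP_TTL_MIN": "0"})`.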
---
|
||||
|
||||
## 10. Observabilité
|
||||
|
||||
### 10.1 Health check
|
||||
|
||||
```bash
curl -i http://localhost:8080/
# → 200 OK        service operational
# → 503 DEGRADED  at least MAX_CONSECUTIVE_FAILURES consecutive ClickHouse failures
```
|
||||
|
||||
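The health check behavior described above could be sketched as follows. This is an assumption-laden illustration using only the standard library: `HealthState`, `HealthHandler`, and `serve` are hypothetical names, and the real service may structure this differently.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthState:
    """Thread-safe counter of consecutive ClickHouse failures (hypothetical helper)."""

    def __init__(self, max_failures: int = 3):
        self.max_failures = max_failures
        self._failures = 0
        self._lock = threading.Lock()

    def record(self, success: bool) -> None:
        # A success resets the streak; a failure extends it.
        with self._lock:
            self._failures = 0 if success else self._failures + 1

    def response(self) -> tuple:
        with self._lock:
            degraded = self._failures >= self.max_failures
        return (503, "DEGRADED") if degraded else (200, "OK")


STATE = HealthState(max_failures=3)


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = STATE.response()
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8080) -> None:
    """Blocking call; a service would typically run this in a daemon thread."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

The analysis loop would call `STATE.record(True)` after a successful ClickHouse round trip and `STATE.record(False)` after a failed one.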
### 10.2 Logs opérationnels
|
||||
|
||||
Les logs console suivent le format `[YYYY-MM-DD HH:MM:SS] message`. Le fichier JSONL permet des analyses post-mortem avec des outils comme `jq` :
|
||||
|
||||
```bash
# Show CRITICAL anomalies
jq 'select(.event=="ANOMALY" and .threat_level=="CRITICAL")' decisions.jsonl

# Show the top SHAP features for HIGH anomalies
jq 'select(.event=="ANOMALY" and .threat_level=="HIGH") | .reason' decisions.jsonl

# Detect distribution drift
jq 'select(.event=="DRIFT_DETECTED")' decisions.jsonl

# Show coordinated campaigns (campaign_id >= 0)
jq 'select(.event=="ANOMALY" and .campaign_id >= 0) | {src_ip, campaign_id, threat_level}' decisions.jsonl

# Count known bots by name
jq -r 'select(.event=="KNOWN_BOT") | .bot_name' decisions.jsonl | sort | uniq -c | sort -rn

# Cycle summaries
jq 'select(.event=="CYCLE_END")' decisions.jsonl
```
|
||||
|
||||
| Event | Trigger |
|-------|---------|
| `SERVICE_START` | Container start |
| `SERVICE_STOP` | Clean shutdown (SIGTERM/SIGINT) |
| `CYCLE_START` | Start of an analysis cycle |
| `CYCLE_END` | End of cycle (summary, inserted count + dedup_ttl_min) |
| `MODEL_LOADED` | Existing model reused (+ drift_score) |
| `MODEL_TRAINED` | New training run |
| `DRIFT_DETECTED` | Concept drift detected → forced retrain |
| `FEATURE_WARNING` | Missing, constant, or global-aggregate features detected (logged only when the situation changes) |
| `SKIPPED_INVALID_FEATURES` | Cycle skipped (too few valid features) |
| `KNOWN_BOT` | Known bot identified |
| `ANOMALY` | IF anomaly detected (+ effective_threshold, campaign_id, raw_anomaly_score) |
| `SKIPPED_LOW_DATA` | Cycle skipped (baseline < 500) |
| `CONSECUTIVE_FAILURES` | Repeated ClickHouse error |
|
||||
|
||||
### 10.3 Avertissements sur les features (A7)
|
||||
|
||||
Les avertissements de features ne sont affichés en console **qu'une seule fois** (à la première détection ou lors d'un changement). Les cycles suivants avec la même situation ne génèrent pas de bruit. L'événement `FEATURE_WARNING` reste dans le JSONL pour traçabilité.
|
||||
|
||||
| Catégorie | Message console | Cause typique |
|
||||
|-----------|-----------------|---------------|
|
||||
| `zero` | `Features à 0 (pipeline non-alimenté)` | Table source vide / LEFT JOIN sans match |
|
||||
| `unique_nonzero` | `Features non-discriminantes (agrégat global)` | `PARTITION BY` sur valeur NULL → partition unique |
|
||||
| `missing` | `Features absentes du schéma` | Colonne manquante dans la vue ClickHouse |
|
||||
|
||||
Voir [`CLICKHOUSE_FEATURES_DIAGNOSTIC.md`](CLICKHOUSE_FEATURES_DIAGNOSTIC.md) pour le détail des corrections ClickHouse nécessaires.
|
||||
|
||||
### 11.1 Sources de réputation
|
||||
|
||||
| Fichier | Format | Contenu |
|
||||
|---------|--------|---------|
|
||||
| `bot_ip.csv` | `ip_cidr,bot_name` | ~288 k IP/CIDR de bots référencés |
|
||||
| `bot_ja4.csv` | `ja4,bot_name` | Fingerprints JA4 de bots |
|
||||
| `asn_reputation.csv` | `asn_number,label` | Labels ASN (human/bot) |
|
||||
|
||||
Ces fichiers sont montés en lecture seule dans le conteneur. Ils sont écrits par ClickHouse (FILE engine) et partagés via volume Docker.
|
||||
|
||||
### 11.2 Hiérarchie de classification
|
||||
|
||||
```
1. bot_name != ''  (from view_ai_features_1h)
   → KNOWN_BOT: bot identified by IP or JA4 reputation

2. asn_label == 'human'  (from view_ai_features_1h)
   → Used as the training baseline for the IF

3. Remaining traffic
   → Scored by the Isolation Forest
   → Anomaly if score < ANOMALY_THRESHOLD
```
|
||||
|
||||
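The three-step hierarchy can be read as a simple decision function. The sketch below is illustrative only: `classify` and the labels `HUMAN_BASELINE` and `NORMAL` are hypothetical, and it assumes a row is a dict carrying the `bot_name` and `asn_label` columns mentioned above.

```python
def classify(row: dict, anomaly_score: float, threshold: float = -0.05) -> str:
    """Return KNOWN_BOT, HUMAN_BASELINE, ANOMALY or NORMAL for one session row."""
    # 1. A reputation match (IP or JA4) takes precedence over everything else.
    if row.get("bot_name", "") != "":
        return "KNOWN_BOT"
    # 2. Traffic from ASNs labelled "human" feeds the training baseline.
    if row.get("asn_label") == "human":
        return "HUMAN_BASELINE"
    # 3. Everything else is judged on its Isolation Forest score.
    return "ANOMALY" if anomaly_score < threshold else "NORMAL"
```

The default `threshold` mirrors the documented `ANOMALY_THRESHOLD` default of `-0.05`.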
---
|
||||
|
||||
## 12. Fondements scientifiques
|
||||
|
||||
### 12.1 Isolation Forest (Liu et al., 2008)
|
||||
|
||||
The algorithm relies on the property that anomalies are **isolated faster** in random partitioning trees. In the original paper, the average isolation path length is normalized into a score between 0 and 1. The `decision_function` used here re-centers that score so that negative values indicate anomalies and positive values indicate normal points; the lower the score, the more anomalous the sample.
|
||||
|
||||
**Propriétés clés :**
|
||||
- Complexité O(n log n) pour l'entraînement
|
||||
- Robuste aux données de haute dimensionnalité (31–35 features ici)
|
||||
- Pas d'hypothèse sur la distribution des données
|
||||
- Efficace sur de grands volumes (n_estimators=300, n_jobs=-1)
|
||||
|
||||
### 12.2 JA4+ Fingerprinting (FoxIO, 2023)
|
||||
|
||||
JA4 est la 4e génération de fingerprints TLS/QUIC/HTTP, successeur de JA3. Il capture les caractéristiques du ClientHello TLS (versions, ciphers, extensions) en une empreinte compacte permettant d'identifier des familles de clients (navigateurs, bots, outils). L'utilisation de `is_rare_ja4`, `distinct_ja4_count` et `ja4_asn_concentration` exploite cette propriété.
|
||||
|
||||
### 12.3 One-Class Classification appliquée aux bots
|
||||
|
||||
L'approche s'inscrit dans la lignée des travaux sur la détection de bots web :
|
||||
- **Stevanovic et al. (2013)** : détection de bots par analyse comportementale de flux HTTP
|
||||
- **Kruegel & Vigna (2003)** : détection d'anomalies réseau par profils normaux
|
||||
- **Barford & Yegneswaran (2007)** : classification comportementale des botnets
|
||||
|
||||
La combinaison de features HTTP comportementales (velocity, fuzzing, post_ratio), de features d'empreinte (JA4, headers), et de features TCP/TLS (jitter, ALPN, SNI) reproduit l'approche multi-couche recommandée par la littérature récente.
|
||||
|
||||
### 12.4 Entropie temporelle comme signal d'anomalie
|
||||
|
||||
Le feature `temporal_entropy` mesure l'entropie de Shannon sur la distribution temporelle des requêtes dans la fenêtre. Un bot avec un timing régulier (scripted polling) produit une entropie faible, tandis qu'un humain naviguant naturellement produit une distribution plus aléatoire. Ce signal est utilisé dans les travaux de **Wang et al. (2014)** sur la détection de crawlers web.
|
||||
|
||||
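To make the signal concrete, here is one possible reading of `temporal_entropy`, sketched as the Shannon entropy of binned gaps between consecutive requests. This is an assumption: the actual formula lives in the ClickHouse view and is not shown in this document, and the `bin_sec` parameter is hypothetical.

```python
import math
from collections import Counter


def temporal_entropy(timestamps, bin_sec: float = 1.0) -> float:
    """Shannon entropy (bits) of the binned gaps between consecutive requests."""
    ordered = sorted(timestamps)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    if not gaps:
        return 0.0
    # Bucket each gap so that near-identical delays fall into the same bin.
    counts = Counter(int(g // bin_sec) for g in gaps)
    total = len(gaps)
    return sum((c / total) * math.log2(total / c) for c in counts.values())
```

Under this definition, a scripted poller hitting the server every 10 seconds has a single gap value and an entropy of 0, while irregular human browsing spreads its gaps over several bins and scores higher.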
---
|
||||
|
||||
## 13. Améliorations implémentées (v11)
|
||||
|
||||
### A1 — Détection de dérive conceptuelle
|
||||
|
||||
**Fonctionnement** : À chaque cycle, avant de décider de charger ou de réentraîner le modèle, on compare la distribution courante de la baseline humaine avec celle sauvegardée lors du dernier entraînement. Pour chaque feature, un z-score est calculé :
|
||||
|
||||
```
|
||||
z = |mean_current - mean_trained| / std_trained
|
||||
```
|
||||
|
||||
Si la fraction de features avec `z > 2.0` dépasse `DRIFT_THRESHOLD` (30% par défaut), un re-entraînement est forcé et l'événement `DRIFT_DETECTED` est loggué.
|
||||
|
||||
**Métadonnées sauvegardées** : `baseline_stats` dans le `.meta.json` contient `{mean, std, p25, p75}` par feature.
|
||||
|
||||
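The drift check above can be sketched in a few lines. This is a minimal illustration, assuming `baseline_stats` has the `{feature: {"mean": ..., "std": ...}}` shape stored in the `.meta.json` and that the current baseline has been reduced to per-feature means; `drift_score` and `should_retrain` are hypothetical names.

```python
def drift_score(current_means: dict, baseline_stats: dict, z_limit: float = 2.0) -> float:
    """Fraction of features whose mean moved by more than z_limit trained std devs."""
    drifted, compared = 0, 0
    for feat, stats in baseline_stats.items():
        if feat not in current_means:
            continue
        compared += 1
        std = stats["std"]
        if std == 0:
            # Assumption: a feature that was constant at training time
            # counts as drifted as soon as its mean changes at all.
            z = float("inf") if current_means[feat] != stats["mean"] else 0.0
        else:
            z = abs(current_means[feat] - stats["mean"]) / std
        if z > z_limit:
            drifted += 1
    return drifted / compared if compared else 0.0


def should_retrain(score: float, drift_threshold: float = 0.30) -> bool:
    """A forced retrain is triggered when the drifting fraction exceeds the threshold."""
    return score > drift_threshold
```

With four features of which two have shifted by more than two standard deviations, `drift_score` returns 0.5 and `should_retrain` returns `True` under the default `DRIFT_THRESHOLD` of 0.30.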
**Références** : Gama et al. (2014) — *A Survey on Concept Drift Adaptation*
|
||||
|
||||
---
|
||||
|
||||
### A2 — Seuil adaptatif par percentile
|
||||
|
||||
**Fonctionnement** :
|
||||
|
||||
```python
|
||||
effective_threshold = min(np.percentile(raw_scores[raw_scores < 0], ANOMALY_PERCENTILE),
|
||||
ANOMALY_THRESHOLD)
|
||||
```
|
||||
|
||||
Le seuil effectif est le minimum entre le `ANOMALY_PERCENTILE`-ème percentile des scores négatifs et le seuil statique. Cela garantit que le seuil ne peut pas remonter au-dessus du seuil configuré, mais peut s'adapter vers le bas selon la distribution courante.
|
||||
|
||||
Le seuil utilisé est loggué dans chaque événement `ANOMALY`.
|
||||
|
||||
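For readers who want to try the rule without NumPy, the following pure-Python sketch reproduces it. The `percentile` helper uses linear interpolation, which is the default convention of `np.percentile`; the fallback when no score is negative is an assumption, since the document does not describe that case.

```python
def percentile(values, q: float) -> float:
    """Linear-interpolation percentile of a non-empty sequence (q in 0-100)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of an empty sequence")
    rank = (len(ordered) - 1) * q / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def effective_threshold(raw_scores, anomaly_percentile: float = 5.0,
                        anomaly_threshold: float = -0.05) -> float:
    """Minimum of the percentile of negative scores and the static threshold."""
    negatives = [s for s in raw_scores if s < 0]
    if not negatives:
        # Assumption: with no negative score, fall back to the static threshold.
        return anomaly_threshold
    return min(percentile(negatives, anomaly_percentile), anomaly_threshold)
```

When most negative scores sit close to zero, the percentile lands above `ANOMALY_THRESHOLD` and the static value wins; when the cycle contains strongly negative scores, the threshold moves down with them.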
---
|
||||
|
||||
### A3 — Analyse multi-fenêtres (optionnelle)
|
||||
|
||||
**Activation** : `ENABLE_MULTIWINDOW=true` + une vue `view_ai_features_24h` dans ClickHouse.
|
||||
|
||||
**Fonctionnement** : Deux paires de modèles supplémentaires (`Complet_24h`, `Applicatif_24h`) tournent sur la fenêtre de 24h. Les anomalies des deux fenêtres sont fusionnées via une logique OR : une IP est flaggée si elle est anormale dans au moins une fenêtre. En cas de doublon, le score le plus bas (le plus anormal) est conservé.
|
||||
|
||||
**Utilité** : Détection des bots low-and-slow invisibles sur 1h mais clairement anormaux sur 24h.
|
||||
|
||||
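The OR merge with "lowest score wins" can be illustrated with plain dictionaries. This is a sketch under the assumption that each window yields a `{src_ip: score}` mapping; `merge_windows` is a hypothetical name.

```python
def merge_windows(anomalies_1h: dict, anomalies_24h: dict) -> dict:
    """OR-merge two {src_ip: score} maps, keeping the lowest score per IP."""
    merged = dict(anomalies_1h)
    for ip, score in anomalies_24h.items():
        # An IP flagged in both windows keeps its most anomalous (lowest) score.
        merged[ip] = min(score, merged.get(ip, score))
    return merged
```

An IP that appears only in the 24h result is still kept, which is what lets low-and-slow clients surface even when their 1h profile looks normal.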
---
|
||||
|
||||
### A4 — Explainabilité par SHAP
|
||||
|
||||
**Fonctionnement** : Pour chaque anomalie détectée, `shap.TreeExplainer` calcule la contribution de chaque feature au score d'anomalie. Les 5 features les plus négatives (les plus responsables de l'anomalie) sont incluses dans le champ `reason` :
|
||||
|
||||
```
|
||||
[Complet] Score: -0.112 | SHAP: is_alpn_missing(-1.081) | tcp_jitter_variance(-1.073) |
|
||||
ja4_asn_concentration(-1.062) | temporal_entropy(-0.887) |
|
||||
direct_access_ratio(-0.886) | Threat: MEDIUM
|
||||
```
|
||||
|
||||
**Désactivation** : `ENABLE_SHAP=false` ou si le package `shap` n'est pas installé.
|
||||
|
||||
**Références** : Lundberg & Lee (2017) — *A Unified Approach to Interpreting Model Predictions*
|
||||
|
||||
---
|
||||
|
||||
### A5 — Déduplication inter-cycles avec TTL
|
||||
|
||||
**Fonctionnement** : Avant chaque insertion, la table `ml_detected_anomalies` est interrogée pour identifier les IPs déjà insérées dans les `DEDUP_TTL_MIN` dernières minutes. Une IP est réinsérée uniquement si son score brut s'est dégradé d'au moins 0.05 points.
|
||||
|
||||
**Désactivation** : `DEDUP_TTL_MIN=0`
|
||||
|
||||
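The re-insertion rule can be sketched as a filter over the anomalies of the current cycle. It assumes `recent` maps each IP already inserted within the TTL to its last raw score (the result of the ClickHouse lookup); `filter_recent` and its parameters are hypothetical names.

```python
def filter_recent(anomalies: dict, recent: dict, ttl_min: int = 60,
                  min_degradation: float = 0.05) -> dict:
    """Drop IPs inserted within the TTL unless their raw score got clearly worse."""
    if ttl_min == 0:
        # DEDUP_TTL_MIN=0 disables deduplication entirely.
        return dict(anomalies)
    kept = {}
    for ip, score in anomalies.items():
        previous = recent.get(ip)
        # New IP, or score lower than the previous one by at least min_degradation.
        if previous is None or score <= previous - min_degradation:
            kept[ip] = score
    return kept
```

For example, an IP previously stored at -0.10 is re-inserted at -0.20 but skipped at -0.12.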
---
|
||||
|
||||
### A6 — Pondération du score par récurrence
|
||||
|
||||
**Fonctionnement** :
|
||||
|
||||
```python
|
||||
raw_score_adjusted = raw_score - np.log1p(recurrence) * RECURRENCE_WEIGHT
|
||||
```
|
||||
|
||||
Une IP détectée 10 fois reçoit une pénalité de `log(11) × 0.005 ≈ 0.012` sur son score brut, ce qui la rapproche du seuil de détection. Ce mécanisme simule un prior bayésien : les IPs récidivistes sont plus probablement malveillantes.
|
||||
|
||||
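A short worked example of the penalty, using only the standard library. The starting score of -0.04 is an illustrative value, and `adjust_score` is a hypothetical helper name.

```python
import math


def adjust_score(raw_score: float, recurrence: int, weight: float = 0.005) -> float:
    """Lower the raw score by log1p(recurrence) * weight."""
    return raw_score - math.log1p(recurrence) * weight


# Penalty after 10 and 100 past detections with the default weight.
penalty_10 = math.log1p(10) * 0.005    # about 0.012
penalty_100 = math.log1p(100) * 0.005  # about 0.023

# A score just above the -0.05 threshold crosses it after 10 detections.
adjusted = adjust_score(-0.04, 10)     # about -0.052
```

Because the penalty grows logarithmically, the first few detections matter most and very frequent offenders are not pushed arbitrarily far down.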
---
|
||||
|
||||
### A7 — Validation de complétude des features
|
||||
|
||||
**Fonctionnement** : Avant entraînement et scoring, `validate_features()` détecte :
|
||||
- Les features absentes de la vue ClickHouse
|
||||
- Les features constantes (std = 0, donc non discriminantes)
|
||||
|
||||
Les features invalides sont exclues du modèle. Si la fraction de features valides est inférieure à `MIN_VALID_FEATURE_RATIO` (50%), le cycle est ignoré.
|
||||
|
||||
**Bénéfice** : Les features constantes (souvent dues à des colonnes non encore implémentées dans la vue) ne biaisent plus le modèle.
|
||||
|
||||
---
|
||||
|
||||
### A8 — Clustering comportemental (DBSCAN)
|
||||
|
||||
**Fonctionnement** : Après détection, DBSCAN est appliqué sur les features normalisées des anomalies :
|
||||
|
||||
```python
|
||||
X_scaled = StandardScaler().fit_transform(anomalies[valid_features])
|
||||
labels = DBSCAN(eps=0.5, min_samples=CLUSTERING_MIN_SAMPLES).fit_predict(X_scaled)
|
||||
```
|
||||
|
||||
- `campaign_id = -1` : IP isolée (comportement unique)
|
||||
- `campaign_id >= 0` : membre d'une campagne coordonnée
|
||||
|
||||
Le `campaign_id` est loggué dans les événements `ANOMALY` (JSONL). Il n'est pas encore dans le schéma ClickHouse (voir §14).
|
||||
|
||||
**Références** : Ester et al. (1996) — *A Density-Based Algorithm for Discovering Clusters*
|
||||
|
||||
---
|
||||
|
||||
### A10 — Normalisation des scores entre modèles
|
||||
|
||||
**Fonctionnement** :
|
||||
|
||||
```python
|
||||
# Scores négatifs normalisés en [-1, 0], scores positifs inchangés
|
||||
anomaly_score_normalized = normalize_scores(raw_score)
|
||||
```
|
||||
|
||||
Le champ `anomaly_score` dans ClickHouse contient désormais le score normalisé, permettant une comparaison cohérente entre le modèle Complet (35 features) et le modèle Applicatif (31 features). Le score brut IF est conservé dans `raw_anomaly_score` (logs JSONL uniquement) et est utilisé pour l'assignation du threat level.
|
||||
|
||||
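The body of `normalize_scores` is not shown in this document. The sketch below is one scaling that satisfies the stated contract (negative scores mapped into [-1, 0], positive scores unchanged); it is an assumption, named `normalize_scores_sketch` to avoid confusion with the real function.

```python
def normalize_scores_sketch(raw_scores):
    """Scale negative scores into [-1, 0]; leave non-negative scores untouched."""
    negatives = [s for s in raw_scores if s < 0]
    if not negatives:
        return list(raw_scores)
    floor = min(negatives)  # most anomalous score of the cycle, maps to -1
    return [s / abs(floor) if s < 0 else s for s in raw_scores]
```

With this scaling the most anomalous score of each model always maps to -1, so scores from the Complet and Applicatif models share the same negative range.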
---
|
||||
|
||||
## 14. Migration de schéma ClickHouse
|
||||
|
||||
Les nouvelles colonnes suivantes sont disponibles dans les logs JSONL mais pas encore dans la table `ml_detected_anomalies`. Pour les activer :
|
||||
|
||||
```sql
|
||||
ALTER TABLE mabase_prod.ml_detected_anomalies
|
||||
ADD COLUMN IF NOT EXISTS campaign_id Int32 DEFAULT -1,
|
||||
ADD COLUMN IF NOT EXISTS raw_anomaly_score Float32 DEFAULT 0;
|
||||
```
|
||||
|
||||
Après cette migration, ajouter ces colonnes à la liste `cols` dans `fetch_and_analyze()` (elles sont déjà calculées en mémoire).
|
||||
756
services/bot-detector/IMPROVEMENTS.md
Normal file
@ -0,0 +1,756 @@
|
||||
# Bot Detector IA — Axes d'amélioration
|
||||
|
||||
> Document de propositions techniques — à valider avant implémentation
|
||||
|
||||
---
|
||||
|
||||
## Résumé des axes proposés
|
||||
|
||||
| # | Axe | Impact | Complexité | Priorité suggérée |
|
||||
|---|-----|--------|------------|-------------------|
|
||||
| A1 | [Détection de dérive conceptuelle (concept drift)](#a1-détection-de-dérive-conceptuelle) | 🔴 Élevé | Moyenne | ⭐⭐⭐ |
|
||||
| A2 | [Seuil adaptatif par percentile](#a2-seuil-adaptatif-par-percentile) | 🔴 Élevé | Faible | ⭐⭐⭐ |
|
||||
| A3 | [Analyse multi-fenêtres temporelles](#a3-analyse-multi-fenêtres-temporelles) | 🔴 Élevé | Élevée | ⭐⭐ |
|
||||
| A4 | [Explainabilité par SHAP](#a4-explainabilité-par-shap) | 🟠 Moyen | Moyenne | ⭐⭐⭐ |
|
||||
| A5 | [Déduplication avec TTL inter-cycles](#a5-déduplication-avec-ttl-inter-cycles) | 🟠 Moyen | Faible | ⭐⭐⭐ |
|
||||
| A6 | [Pondération par récurrence dans le score](#a6-pondération-par-récurrence-dans-le-score) | 🟠 Moyen | Faible | ⭐⭐ |
|
||||
| A7 | [Validation de complétude des features](#a7-validation-de-complétude-des-features) | 🟠 Moyen | Faible | ⭐⭐⭐ |
|
||||
| A8 | [Clustering comportemental des anomalies](#a8-clustering-comportemental-des-anomalies) | 🟡 Utile | Moyenne | ⭐⭐ |
|
||||
| A9 | [Métriques Prometheus / health check enrichi](#a9-métriques-prometheus--health-check-enrichi) | 🟡 Utile | Faible | ⭐⭐ |
|
||||
| A10 | [Normalisation des scores entre modèles](#a10-normalisation-des-scores-entre-modèles) | 🟡 Utile | Faible | ⭐ |
|
||||
|
||||
---
|
||||
|
||||
## A1 — Détection de dérive conceptuelle
|
||||
|
||||
### Problème
|
||||
|
||||
The Isolation Forest is trained on the current human baseline. If the profile of legitimate traffic shifts gradually (a newly popular browser, a change in user behavior, a network migration), the aging model can:
|
||||
- Generate **false positives** on newly emerged human traffic
|
||||
- Produce **false negatives** if bots imitate the old patterns
|
||||
|
||||
Le re-entraînement périodique (toutes les X heures) atténue le problème mais ne détecte pas quand une dérive significative a eu lieu **entre deux cycles de retraining**.
|
||||
|
||||
### Approche proposée
|
||||
|
||||
Calculer à chaque cycle un score de **dérive statistique** entre la baseline d'entraînement du modèle actif et la baseline courante. Si la dérive dépasse un seuil, forcer un re-entraînement anticipé.
|
||||
|
||||
**Méthode : Kolmogorov-Smirnov (KS test) ou Maximum Mean Discrepancy (MMD)**
|
||||
|
||||
Pour chaque feature :
|
||||
```python
|
||||
from scipy import stats
|
||||
ks_stat, p_value = stats.ks_2samp(baseline_trained[feat], baseline_current[feat])
|
||||
```
|
||||
|
||||
Si la fraction de features avec `p_value < 0.05` dépasse un seuil configurable (ex. 30%), déclencher un retrain et logguer un événement `DRIFT_DETECTED`.
|
||||
|
||||
### Bénéfices
|
||||
- Retrain opportuniste plutôt que temporel fixe
|
||||
- Détection proactive des changements de comportement réseau
|
||||
- Réduction des faux positifs liés à la dérive
|
||||
|
||||
### Références
|
||||
- Gama et al. (2014) — *A Survey on Concept Drift Adaptation*
|
||||
- Rabanser et al. (2019) — *Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift*
|
||||
|
||||
### Implémentation suggérée
|
||||
- Sauvegarder la distribution de la baseline d'entraînement dans le `.meta.json`
|
||||
- Calculer le KS test au début de chaque cycle avant la décision de chargement
|
||||
- Ajouter un paramètre `DRIFT_THRESHOLD` (défaut : 0.30)
|
||||
- Log the `DRIFT_DETECTED` event together with the list of drifting features
|
||||
|
||||
---
|
||||
|
||||
## A2 — Seuil adaptatif par percentile
|
||||
|
||||
### Problème
|
||||
|
||||
`ANOMALY_THRESHOLD = -0.03` est un seuil **global et statique**. Ce seuil a une signification différente selon :
|
||||
- Le volume de trafic (plus de trafic = distribution de scores plus resserrée)
|
||||
- La contamination effective du cycle (jour calme vs attaque active)
|
||||
- Les caractéristiques du modèle actif (entraîné sur 1 264 vs 1 725 sessions)
|
||||
|
||||
Un seuil fixe peut produire des **rafales de faux positifs** lors d'événements légitimes inhabituels (campagne marketing, crawler partenaire) ou rater des menaces réelles lors de trafic atypique.
|
||||
|
||||
### Approche proposée
|
||||
|
||||
Calculer dynamiquement le seuil à partir de la **distribution des scores du cycle courant** :
|
||||
|
||||
```python
import numpy as np

scores = model.decision_function(X_test)
# Threshold = percentile P of the distribution of negative scores
adaptive_threshold = np.percentile(scores[scores < 0], ANOMALY_PERCENTILE)
# Take the min with the static threshold so the result never rises above it
threshold = min(adaptive_threshold, ANOMALY_THRESHOLD)
```
|
||||
|
||||
**Paramètre ajoutable** : `ANOMALY_PERCENTILE` (défaut : 5 → top 5% des scores les plus négatifs).
|
||||
|
||||
Cette approche est complémentaire au seuil statique (garde-fou) : elle s'adapte vers le bas mais ne remonte jamais au-dessus du seuil configuré.
|
||||
|
||||
### Bénéfices
|
||||
- Stabilité du taux de faux positifs au fil du temps
|
||||
- Auto-adaptation aux variations de volume
|
||||
- Comportement plus prédictible en production
|
||||
|
||||
### Implémentation suggérée
|
||||
- Ajouter `ANOMALY_PERCENTILE` (0–20, défaut 5) comme variable d'environnement
|
||||
- Calculer le seuil adaptatif dans `run_semi_supervised_logic()`
|
||||
- Logguer le seuil effectif utilisé dans `CYCLE_START` / `ANOMALY`
|
||||
|
||||
---
|
||||
|
||||
## A3 — Analyse multi-fenêtres temporelles
|
||||
|
||||
### Problème
|
||||
|
||||
La fenêtre 1h est un compromis. Elle manque :
|
||||
- Les **attaques rapides** (burst de quelques minutes) : le signal est dilué
|
||||
- Les **bots lents** (low-and-slow, 1–2 req/min sur 24h) : comportement normal sur 1h
|
||||
|
||||
### Approche proposée
|
||||
|
||||
Ajouter une deuxième vue ClickHouse agrégée sur **24h** et un troisième modèle sur cette fenêtre. Les scores des deux modèles peuvent être combinés :
|
||||
|
||||
```
|
||||
score_final = w1 * score_1h + w2 * score_24h
|
||||
```
|
||||
|
||||
Ou, plus simplement, un AND logique : une IP n'est flaggée que si elle est anomalie sur les **deux fenêtres**, réduisant drastiquement les faux positifs.
|
||||
|
||||
### Bénéfices
|
||||
- Détection des bots low-and-slow (reconnaissance, scraping discret)
|
||||
- Réduction des faux positifs par corrélation multi-temporelle
|
||||
- Complémentarité avec le modèle 1h existant
|
||||
|
||||
### Considerations
|
||||
- Nécessite une vue `view_ai_features_24h` dans ClickHouse
|
||||
- Modèle 24h beaucoup plus stable (moins de bruit)
|
||||
- Le volume de données à traiter augmente
|
||||
|
||||
### Références
|
||||
- Stalmans & Irwin (2011) — *A Framework for Web Bot Detection Using Request Rate Monitoring*
|
||||
- Stevanovic et al. (2013) — *An Efficient Flow-based Botnet Detection Using Supervised Machine Learning*
|
||||
|
||||
---
|
||||
|
||||
## A4 — Explainabilité par SHAP
|
||||
|
||||
### Problème
|
||||
|
||||
Le champ `reason` actuel est basique :
|
||||
```
|
||||
"[Complet] Score: -0.312 | Vel: 45.2 req/s | Fuzzing: 8.3 | Threat: CRITICAL"
|
||||
```
|
||||
|
||||
Pour un opérateur de sécurité, il manque :
|
||||
- **Quelles features** ont le plus contribué à ce score ?
|
||||
- Est-ce principalement comportemental (velocity) ou fingerprint (JA4) ?
|
||||
- Comment comparer deux anomalies de même score ?
|
||||
|
||||
### Approche proposée
|
||||
|
||||
Utiliser **TreeSHAP** (Lundberg & Lee, 2017) qui supporte nativement les forêts d'arbres :
|
||||
|
||||
```python
|
||||
import shap
|
||||
explainer = shap.TreeExplainer(model)
|
||||
shap_values = explainer.shap_values(X_test.iloc[[idx]])
|
||||
top_features = sorted(zip(features, shap_values[0]), key=lambda x: abs(x[1]), reverse=True)[:5]
|
||||
```
|
||||
|
||||
Enrichir le champ `reason` avec les 5 features les plus contributives et leur valeur SHAP.
|
||||
|
||||
### Bénéfices
|
||||
- Triage des alertes facilité pour les analystes SOC
|
||||
- Détection des features systématiquement sur-représentées (potentiel bug de feature engineering)
|
||||
- Conformité avec les exigences de traçabilité des décisions IA
|
||||
|
||||
### Implémentation suggérée
|
||||
- Ajouter `shap` aux requirements (compatible sklearn)
|
||||
- Calculer SHAP uniquement pour les IP flaggées (pas sur tout le dataset)
|
||||
- Stocker `shap_top5` comme JSON dans le log JSONL
|
||||
- Option : `ENABLE_SHAP=true/false` pour contrôler la charge CPU
|
||||
|
||||
### Références
|
||||
- Lundberg & Lee (2017) — *A Unified Approach to Interpreting Model Predictions*
|
||||
|
||||
---
|
||||
|
||||
## A5 — Déduplication avec TTL inter-cycles
|
||||
|
||||
### Problème
|
||||
|
||||
Avec un cycle de 5 min et une fenêtre 1h, la même IP malveillante est potentiellement **réinsérée 12 fois par heure** dans `ml_detected_anomalies`. Cela :
|
||||
- Artificially inflates the table
|
||||
- Complique les requêtes d'analyse (nécessite un DISTINCT)
|
||||
- Fausse les métriques de comptage
|
||||
|
||||
Le mécanisme actuel de `drop_duplicates(subset=['src_ip'])` ne fonctionne qu'au sein d'un seul cycle, pas entre cycles.
|
||||
|
||||
### Approche proposée
|
||||
|
||||
Avant insertion, interroger ClickHouse pour filtrer les IPs déjà insérées récemment :
|
||||
|
||||
```python
# Fetch the IPs already detected in the last N minutes
recent_ips = client.query_df(f"""
    SELECT DISTINCT src_ip
    FROM {DB}.ml_detected_anomalies
    WHERE detected_at > now() - INTERVAL {DEDUP_TTL_MIN} MINUTE
""")
# Exclude those IPs (re-inserting on significant score degradation is the variant described below)
new_anomalies = anomalies[~anomalies['src_ip'].isin(recent_ips['src_ip'])]
```
|
||||
|
||||
**Paramètre ajoutable** : `DEDUP_TTL_MIN` (défaut : 60 minutes).
|
||||
|
||||
**Variante** : ne re-insérer que si `new_score < existing_score - 0.05` (dégradation significative).
|
||||
|
||||
### Bénéfices
|
||||
- Réduction du volume de la table de détection
|
||||
- Requêtes d'analyse plus simples
|
||||
- Gestion de la montée en charge (moins d'insertions)
|
||||
|
||||
### Implémentation suggérée
|
||||
- Paramètre `DEDUP_TTL_MIN` (0 pour désactiver)
|
||||
- La requête de déduplication est légère (index sur `detected_at`)
|
||||
- Logguer le nb d'IP filtrées dans `CYCLE_END`
|
||||
|
||||
---
|
||||
|
||||
## A6 — Pondération par récurrence dans le score
|
||||
|
||||
### Problème
|
||||
|
||||
La récurrence est actuellement un champ **informatif seulement** : une IP détectée 50 fois a le même seuil de filtrage qu'une IP vue pour la première fois. Un bot persistant et connu ne reçoit pas de pénalité de score.
|
||||
|
||||
### Approche proposée
|
||||
|
||||
Ajuster le score de décision en fonction de la récurrence :
|
||||
|
||||
```python
|
||||
# Score ajusté : plus une IP est récurrente, plus son score s'aggrave
|
||||
recurrence_penalty = np.log1p(recurrence) * RECURRENCE_WEIGHT
|
||||
adjusted_score = anomaly_score - recurrence_penalty
|
||||
```
|
||||
|
||||
Avec `RECURRENCE_WEIGHT = 0.005` par défaut (configurable). Une IP vue 10 fois voit son score pénalisé de ~0.012, une IP vue 100 fois de ~0.023.
|
||||
|
||||
Cette approche simule un **Prior bayésien** : la probabilité qu'une IP soit malveillante augmente avec ses détections passées.
|
||||
|
||||
### Bénéfices
|
||||
- Menaces persistantes classifiées plus sévèrement
|
||||
- Réduction du bruit des anomalies éphémères
|
||||
- Signal plus fort pour les blocages automatisés
|
||||
|
||||
### Implémentation suggérée
|
||||
- Ajouter `RECURRENCE_WEIGHT` (défaut 0.005, 0 pour désactiver)
|
||||
- Stocker `raw_score` et `adjusted_score` séparément dans les logs
|
||||
|
||||
---
|
||||
|
||||
## A7 — Validation de complétude des features
|
||||
|
||||
### Problème
|
||||
|
||||
Si une feature est absente de la vue (colonne manquante, erreur de schéma), elle est silencieusement remplacée par `0` via `fillna(0)`. Cela **dégrade la qualité du modèle sans avertissement** : une feature entièrement à zéro n'apporte aucune information discriminante et biaise les scores.
|
||||
|
||||
### Approche proposée
|
||||
|
||||
Au début de chaque cycle, après chargement du DataFrame :
|
||||
|
||||
```python
import pandas as pd


def validate_features(df: pd.DataFrame, features: list, name: str) -> list:
    """Return the features that are present in df and not constant."""
    missing_features = [f for f in features if f not in df.columns]
    # std() == 0 means the column is constant (typically all zeros after fillna(0))
    constant_features = [f for f in features if f in df.columns and df[f].std() == 0]

    # log_info: logging helper assumed to be defined elsewhere in the service
    if missing_features:
        log_info(f"[{name}] ATTENTION: {len(missing_features)} features manquantes: {missing_features}")
    if constant_features:
        log_info(f"[{name}] ATTENTION: {len(constant_features)} features constantes (std = 0): {constant_features}")

    # Return only the usable features
    return [f for f in features if f in df.columns and df[f].std() > 0]
```
|
||||
|
||||
Un événement `FEATURE_WARNING` serait loggué, et si plus de 20% des features sont invalides, le cycle peut être `SKIPPED`.
|
||||
|
||||
### Bénéfices
|
||||
- Détection rapide des régressions de schéma ClickHouse
|
||||
- Qualité de modèle assurée
|
||||
- Facilite le debugging lors des évolutions de la vue
|
||||
|
||||
### Implémentation suggérée
|
||||
- Paramètre `MIN_VALID_FEATURE_RATIO` (défaut 0.8)
|
||||
- Comparaison avec les features du modèle chargé (détecte les dérives de schéma post-mise à jour)
|
||||
|
||||
---
|
||||
|
||||
## A8 — Clustering comportemental des anomalies
|
||||
|
||||
### Problème
|
||||
|
||||
Les anomalies sont analysées et insérées individuellement. Or, une campagne de botnet coordonnée peut impliquer des **dizaines d'IPs avec des profils similaires**. Cette information de **corrélation horizontale** est aujourd'hui invisible.
|
||||
|
||||
### Approche proposée
|
||||
|
||||
Après la détection, appliquer un **DBSCAN** sur les features des anomalies pour identifier des clusters d'attaque :
|
||||
|
||||
```python
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

X_anomalies = anomalies[features].fillna(0)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_anomalies)
labels = DBSCAN(eps=0.5, min_samples=3).fit_predict(X_scaled)
anomalies['campaign_id'] = labels  # -1 = isolated, 0+ = cluster
```
|
||||
|
||||
Les IPs d'un même cluster partagent un comportement similaire et peuvent faire partie d'une même infrastructure d'attaque.
|
||||
|
||||
### Bénéfices
|
||||
- Identification des campagnes coordonnées (botnets distribués)
|
||||
- Enrichissement de `reason` avec un identifiant de campagne
|
||||
- Permet des blocages de plages d'IPs entières
|
||||
|
||||
### Implémentation suggérée
|
||||
- DBSCAN uniquement si ≥ 5 anomalies dans le cycle (pas de coût si peu d'anomalies)
|
||||
- Stocker `campaign_id` dans `ml_detected_anomalies`
|
||||
- `eps` et `min_samples` configurables
|
||||
|
||||
### Références
|
||||
- Ester et al. (1996) — *A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases*
|
||||
|
||||
---
|
||||
|
||||
## A9 — Métriques Prometheus / health check enrichi
|
||||
|
||||
### Problème
|
||||
|
||||
Le health check actuel est binaire (OK/DEGRADED). Cela ne permet pas :
|
||||
- De monitorer la dérive du taux d'anomalies dans le temps
|
||||
- D'alerter si aucun cycle ne s'est exécuté depuis X minutes
|
||||
- De suivre l'âge du modèle en production
|
||||
|
||||
### Approche proposée
|
||||
|
||||
Exposer un endpoint `/metrics` au format **Prometheus text** sur le même port :
|
||||
|
||||
```
# HELP botdetector_cycle_duration_seconds Duration of last analysis cycle
# TYPE botdetector_cycle_duration_seconds gauge
botdetector_cycle_duration_seconds 12.4

# HELP botdetector_anomalies_total Total anomalies detected in last cycle
# TYPE botdetector_anomalies_total gauge
botdetector_anomalies_total{model="Complet"} 3
botdetector_anomalies_total{model="Applicatif"} 7

# HELP botdetector_model_age_hours Age of active model in hours
botdetector_model_age_hours{model="Applicatif"} 0.91

# HELP botdetector_human_baseline_size Number of human samples used for training
botdetector_human_baseline_size{model="Applicatif"} 1725
```
|
||||
|
||||
Implémenté sans dépendance externe (format texte manuel ou lib légère `prometheus_client`).
|
||||
|
||||
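The "manual text format" option could look like the sketch below: a small thread-safe registry that renders gauges in the Prometheus text exposition format. `MetricsRegistry`, `set_gauge`, and `render` are hypothetical names; label values are not escaped here, which a production version would need to handle.

```python
import threading


class MetricsRegistry:
    """Minimal thread-safe store for gauge metrics (hypothetical helper)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._help = {}
        self._values = {}  # (name, sorted label items) -> float value

    def set_gauge(self, name: str, value: float, help_text: str = "", **labels) -> None:
        key = (name, tuple(sorted(labels.items())))
        with self._lock:
            if help_text:
                self._help[name] = help_text
            self._values[key] = float(value)

    def render(self) -> str:
        """Return all metrics as Prometheus text, one HELP/TYPE header per name."""
        with self._lock:
            items = sorted(self._values.items())
            helps = dict(self._help)
        lines, seen = [], set()
        for (name, labels), value in items:
            if name not in seen:
                seen.add(name)
                if name in helps:
                    lines.append(f"# HELP {name} {helps[name]}")
                lines.append(f"# TYPE {name} gauge")
            label_str = ",".join(f'{k}="{v}"' for k, v in labels)
            suffix = f"{{{label_str}}}" if label_str else ""
            lines.append(f"{name}{suffix} {value}")
        return "\n".join(lines) + "\n"
```

The analysis loop would call `set_gauge` after each cycle, and the existing HTTP handler would return `render()` with content type `text/plain` on `/metrics`.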
### Bénéfices
|
||||
- Intégration Grafana/Alertmanager
|
||||
- Alertes sur dérive du taux d'anomalies (ex. : >50% d'une heure à l'autre)
|
||||
- Monitoring de la fraîcheur du modèle
|
||||
|
||||
### Implémentation suggérée
|
||||
- Ajouter `prometheus_client` ou générer le format texte manuellement
|
||||
- Endpoint `/metrics` sur le même `HTTPServer` existant
|
||||
- Métriques stockées dans un dict thread-safe mis à jour après chaque cycle
|
||||
|
||||
---
|
||||
|
||||
## A10 — Normalisation des scores entre modèles
|
||||
|
||||
### Problème
|
||||
|
||||
Les scores `decision_function` de l'IF ne sont **pas comparables entre modèles** entraînés sur des données différentes. Un score de -0.10 sur le modèle Complet et -0.10 sur le modèle Applicatif n'ont pas la même signification si les baselines et les features sont différentes.
|
||||
|
||||
La déduplication actuelle par `src_ip` prend le score le plus bas sans tenir compte de cette non-comparabilité.
|
||||
|
||||
### Approche proposée
|
||||
|
||||
Normaliser les scores par rapport à la distribution des scores négatifs du cycle courant :
|
||||
|
||||
```python
# Min-max normalization over the subset of scores < 0
scores = unknown_traffic['anomaly_score']
neg_scores = scores[scores < 0]
if len(neg_scores) > 0:
    score_min, score_max = neg_scores.min(), neg_scores.max()
    # The most negative score maps to -1, the least negative to 0;
    # scores >= 0 are clipped to 0
    unknown_traffic['normalized_score'] = (
        (score_max - scores) / (score_max - score_min + 1e-9)
    ).clip(0, 1) * -1  # between -1 and 0
```
|
||||
|
||||
Les niveaux de menace seraient alors calculés sur le score normalisé, rendant la comparaison entre modèles cohérente.
|
||||
|
||||
### Benefits

- Consistent CRITICAL/HIGH/MEDIUM levels across models
- Fairer deduplication
- `threat_level` thresholds that can be interpreted the same way every cycle
|
||||
|
||||
---
|
||||
|
||||
## General implementation notes

- **Compatibility**: every improvement must remain backward compatible with the existing `ml_detected_anomalies` schema (optional column additions only)
- **Readability**: keep the code organized in sections delimited by the existing `═══` banners
- **Tests**: validate each change with a Docker run against the real database
- **Documentation**: update `DOCUMENTATION.md` after each implementation
- **Feature flags**: new behavioral features should be switchable through an environment variable to allow a progressive rollout
|
||||
|
||||
---
|
||||
|
||||
# New feature dimensions — B Proposals

> Proposals for additional Isolation Forest features, validated against the real data in `mabase_prod`.
> Each proposal gives the signal strength observed in the database, the data source, the formula and the scientific references.
|
||||
|
||||
## Signal summary

| # | Feature | Observed signal | Model | Estimated impact |
|---|---------|-----------------|-------|------------------|
| B1 | JA3/JA4 diversity ratio | 1619 JA3 for 2 JA4, ratio ≈ 809 (known bot IP) | Complet | 🔴 High |
| B2 | SYN timing regularity | 386/3222 IPs (12%) with variance = 0 | Complet | 🔴 High |
| B3 | TLS 1.2 exclusive ratio | 136/3259 IPs (4%) never use TLS 1.3 | Complet | 🔴 High |
| B4 | HEAD method ratio | 67/3335 IPs (2%) with >50% HEAD | Both | 🟠 Medium |
| B5 | Sec-Fetch absence rate | Universal L7 signal (also available for correlated=0) | Both | 🟠 Medium |
| B6 | Accept header entropy | Bots send an empty or constant `*/*` Accept | Both | 🟠 Medium |
| B7 | TLS version entropy | TLS 1.3 = 97.3% of legitimate traffic | Complet | 🟠 Medium |
| B8 | HTTP/TLS protocol mismatch | HTTP/1.1 + TLS 1.3 = abnormal ratio | Complet | 🟡 Useful |
| B9 | IP DF-bit variance | Inconsistent DF = spoofed stack | Complet | 🟡 Useful |
| B10 | JA4 concentration within ASN | JA4 rare within its ASN = exotic tool | Complet | 🟡 Useful |
|
||||
|
||||
---
|
||||
|
||||
## B1 — JA3/JA4 Diversity Ratio (TLS fingerprint rotation)

### Observation

```
185.177.72.60   → 1619 distinct JA3 / 2 JA4 → ratio 809.5
194.187.171.160 → 153 distinct JA3 / 2 JA4  → ratio 76.5
```

The JA4 stays stable (it encodes the TLS client type and ALPN, and sorts cipher suites and extensions before hashing) while the JA3 varies massively. This is the signature of a **bot that randomizes its TLS extensions** to evade fingerprint-based detection.

### Proposed feature

```sql
-- In mv_agg_host_ip_ja4_1h
uniqState(ja3) AS uniq_ja3 -- to be added to the aggregation table
```

```sql
-- In view_ai_features_1h (uniq_ja3 and uniq_ja4 are the merged counts)
uniq_ja3 / greatest(uniq_ja4, 1) AS ja3_diversity_ratio
```

### Signal in the database

- Human traffic: ratio typically 1–3 (same browser, slight variations)
- Bot with rotation: ratio 17–809, an extremely discriminating signal
- Availability: `ja3` is present in `http_logs` with 100% non-empty values for correlated=1
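
As a sanity check outside ClickHouse, the same ratio can be computed exactly from raw rows. This is only a sketch: the function name `ja3_diversity_by_ip`, the `(src_ip, ja3, ja4)` tuple layout and the sample fingerprints are assumptions made for the example, and it counts distinct values exactly whereas the `uniq` aggregate family is approximate.

```python
from collections import defaultdict


def ja3_diversity_by_ip(rows):
    """Return {src_ip: distinct JA3 count / max(distinct JA4 count, 1)}."""
    ja3_seen = defaultdict(set)
    ja4_seen = defaultdict(set)
    for src_ip, ja3, ja4 in rows:
        ja3_seen[src_ip].add(ja3)
        ja4_seen[src_ip].add(ja4)
    return {
        ip: len(ja3_seen[ip]) / max(len(ja4_seen[ip]), 1)
        for ip in ja3_seen
    }


# Hypothetical sample: one rotating client, one stable browser.
rows = [("192.0.2.10", f"ja3-{i}", f"ja4-{i % 2}") for i in range(6)]
rows += [("192.0.2.20", "ja3-browser", "ja4-browser")] * 4
ratios = ja3_diversity_by_ip(rows)
print(ratios)
```

The `max(..., 1)` guard mirrors `greatest(uniq_ja4, 1)` and keeps the ratio defined for rows without a JA4.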
|
||||
|
||||
### Required changes

1. Add `uniqState(ja3) AS uniq_ja3` to `mv_agg_host_ip_ja4_1h` and `agg_host_ip_ja4_1h`
2. Add `uniqMerge(uniq_ja3) / greatest(uniq_ja4_merged, 1) AS ja3_diversity_ratio` to `view_ai_features_1h`
3. Add `ja3_diversity_ratio` to `feats_complet` in `bot_detector.py`

### References

- Siby et al. (2020), *Encrypted DNS → Privacy? A Traffic Analysis Perspective*: fingerprint diversity methods
- Anderson & McGrew (2016), *Machine Learning for Encrypted Malware Traffic Classification*: JA3 as a primary feature
- Husák et al. (2022), *TLS fingerprinting for bot detection*: JA3 rotation as signature evasion
|
||||
|
||||
---
|
||||
|
||||
## B2 — SYN-to-ClientHello Timing Regularity
|
||||
|
||||
### Observation

```
88.202.237.59 : 45 connections, avg=22ms, std=0.00ms → perfectly robotic timing
92.184.144.129: 41 connections, avg=10ms, std=0.00ms → same
386/3222 analyzed IPs (12%) have variance=0
```

A human client shows a random distribution (Weibull or log-normal) of network response times. A bot driven by a fixed scheduler or running over a local connection has a variance close to zero.

### Proposed feature

```sql
-- In view_ai_features_1h (CTE)
varPopMerge(tcp_jitter_variance) AS syn_jitter_variance, -- already present (tcp_jitter_variance)
-- Add the coefficient of variation (normalized)
```

```sql
-- cv = std / mean → 0 = robotic, >0.5 = human
sqrt(syn_jitter_variance) / greatest(avg_syn_ms, 1) AS syn_timing_cv
```

**Note**: `tcp_jitter_variance` is already in the model, but it is the raw variance. The **coefficient of variation** (std/mean) normalizes by the mean delay, so the regularity of fast bots (10 ms) and slow bots (100 ms) is measured on the same scale.
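
The effect of that normalization can be checked in plain Python. This is a minimal sketch: the function name `syn_timing_cv` is borrowed from the proposed feature, but the implementation and the sample delay lists are illustrative assumptions, not code from `bot_detector.py`.

```python
import math
import statistics


def syn_timing_cv(delays_ms):
    """Coefficient of variation (std / mean) of SYN-to-ClientHello delays."""
    if not delays_ms:
        return 0.0
    mean = statistics.fmean(delays_ms)
    # Population variance, matching ClickHouse varPop.
    std = math.sqrt(statistics.pvariance(delays_ms))
    return std / max(mean, 1.0)  # same guard as greatest(avg_syn_ms, 1)


robotic = syn_timing_cv([22] * 45)             # identical delays
jittery = syn_timing_cv([18, 25, 40, 12, 60])  # made-up irregular delays
fast = syn_timing_cv([9, 10, 11])              # around 10 ms
slow = syn_timing_cv([99, 100, 101])           # around 100 ms
print(robotic, round(jittery, 3), round(fast, 4), round(slow, 4))
```

The last two samples have the same raw variance (2/3) but a tenfold difference in `syn_timing_cv`, which is the scaling by the mean delay that the note describes.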
|
||||
|
||||
### Required changes

1. Add `avgState(syn_to_clienthello_ms) AS avg_syn_ms` to `mv_agg_host_ip_ja4_1h` (`avg` needs an `AggregateFunction` state column, read back with `avgMerge`)
2. Compute `syn_timing_cv = sqrt(tcp_jitter_variance) / greatest(avg_syn_ms, 1)` in `view_ai_features_1h`
3. Add `syn_timing_cv` to `feats_complet`

### References

- Zeber et al. (2020), *The Measurement of Web Timing*: log-normal distribution for humans
- Beugin et al. (2021), *Robustness of Traffic Analysis Against Adversarial Timing*: variance as a discriminator
- Stevanovic & Pedersen (2015), *Detecting Bots Using Multi-level Traffic Analysis*: timing regularity as an L4 bot signal
|
||||
|
||||
---
|
||||
|
||||
## B3 — TLS 1.2 Exclusive Ratio
|
||||
|
||||
### Observation

```
95.217.144.244 : 360/360 requests over TLS 1.2 (never TLS 1.3)
37.65.177.201  : 267/267 requests over TLS 1.2
136 of 3259 analyzed IPs (4.2%) use TLS 1.2 exclusively
```

TLS 1.3 accounts for 97.3% of the traffic in 2026. Modern browsers only use TLS 1.2 as an exceptional fallback. An IP that uses TLS 1.2 **exclusively** is running an obsolete client, a custom library, or a scanning tool.

### Proposed feature

```sql
-- In mv_agg_host_ip_ja4_1h
sum(IF(tls_version = '1.2', 1, 0)) AS tls12_count -- new
-- tls_version is already stored via tls_alpn_raw → to be separated out or added
```

```sql
-- In view_ai_features_1h
tls12_count / greatest(hits, 1) AS tls12_ratio
```

### Required changes

1. Add `sum(IF(src.tls_version = '1.2', 1, 0)) AS tls12_count` to `mv_agg_host_ip_ja4_1h`
2. Add `tls12_count` to `agg_host_ip_ja4_1h`
3. Compute `tls12_count / greatest(hits, 1) AS tls12_ratio` in `view_ai_features_1h`

### References

- Kotzias et al. (2018), *Coming of Age: A Longitudinal Study of TLS Deployment*: aging of TLS stacks
- Naylor et al. (2014), *The Cost of the S in HTTPS*: cost and adoption of HTTPS
- Cloudflare Radar 2024: TLS 1.3 = 95%+ of global web traffic
|
||||
|
||||
---
|
||||
|
||||
## B4 — HEAD Method Ratio
|
||||
|
||||
### Observation

```
34.140.199.84 : 11/12 HEAD requests (91.7%) → Google Cloud uptime checker
67/3335 IPs send more than 50% HEAD requests
```

The HEAD method checks that a resource is available without downloading its body. It is the signature of:
- **Uptime checkers** (Pingdom, UptimeRobot, Google Cloud Health Check)
- **Vulnerability scanners** (Nikto, Nuclei)
- **Discreet reconnaissance bots**

### Proposed feature

```sql
-- head_ratio: computable the same way as the existing count_post (method breakdown)
-- Add to mv_agg_host_ip_ja4_1h:
sum(IF(method = 'HEAD', 1, 0)) AS count_head
```

```sql
count_head / greatest(hits, 1) AS head_ratio
```

### Note: availability in both models

Unlike the TCP features, `head_ratio` is also available for `correlated=0`, because it is a pure HTTP feature. Add it to both the `feats` and `feats_complet` lists.

### References

- Barracuda Networks (2023), *Bot Traffic Report*: HEAD request patterns
- OWASP Automated Threat Handbook: OAT-011 Scraping, OAT-018 Footprinting
|
||||
|
||||
---
|
||||
|
||||
## B5 — Sec-Fetch Absence Rate
|
||||
|
||||
### Observation

The `Sec-Fetch-Site`, `Sec-Fetch-Mode` and `Sec-Fetch-Dest` headers are added **automatically** by modern browsers (Chrome 76+ since 2019, Firefox 90+ since 2021). Their absence signals:
- A non-browser HTTP client (curl, requests, Scrapy, headless Chrome without complete headers)
- An old browser or a spoofed UA
- An HTTP CONNECT proxy

### Proposed feature

```sql
-- In mv_agg_host_ip_ja4_1h
sum(IF(length(src.header_sec_fetch_site) = 0, 1, 0)) AS count_no_sec_fetch
```

```sql
count_no_sec_fetch / greatest(hits, 1) AS sec_fetch_absence_rate
```

### Combination with `modern_browser_score`

`sec_fetch_absence_rate` and `modern_browser_score` form a complementary pair:
- Bot with a forged Chrome UA → high `modern_browser_score` but `sec_fetch_absence_rate` = 1 → strong contradiction

### Required changes

1. Add `count_no_sec_fetch` to the MV and the table
2. Compute the ratio in the view

### References

- West & Loshbough (2019), *Fetch Metadata Request Headers* (W3C spec)
- Invernizzi et al. (2016), *CLOAK of Visibility*: inconsistent headers indicate a bot
|
||||
|
||||
---
|
||||
|
||||
## B6 — Accept Header Entropy
|
||||
|
||||
### Observation

Legitimate browsers send complex, consistent `Accept` headers:
```
image/avif,image/webp,image/apng,image/svg+xml,image/*,*/*;q=0.8
```
Bots send:
```
*/*        (curl, wget, Scrapy)
(empty)    (minimalist bots)
text/html  (basic tools)
```

### Proposed feature

```python
from math import log2

# Diversity of Accept values per IP (proxy for browser behavior);
# accept_value_probs holds the relative frequency of each distinct Accept value
accept_entropy = -sum(p * log2(p + 1e-9) for p in accept_value_probs)

# Or more simply: fraction of requests with a generic/empty Accept,
# where generic = length(Accept) < 10 or Accept IN ('*/*', '')
generic_accept_ratio = count_generic_accept / max(hits, 1)
```

```sql
sum(IF(length(src.header_accept) < 10, 1, 0)) AS count_generic_accept
```
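
Both variants can be tried on a list of raw `Accept` values. The following is a sketch under assumptions: the function names `accept_entropy` and `generic_accept_ratio` reuse the feature names, but the signatures and the sample values are made up for the example. Probabilities come from a `Counter`, so they are never zero and no epsilon is needed inside the logarithm.

```python
import math
from collections import Counter


def accept_entropy(accept_values):
    """Shannon entropy (bits) of the Accept values seen for one IP."""
    total = len(accept_values)
    if total == 0:
        return 0.0
    counts = Counter(accept_values)
    # sum of p * log2(1/p), equivalent to -sum(p * log2(p))
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def generic_accept_ratio(accept_values):
    """Fraction of requests whose Accept is empty, '*/*' or shorter than 10 chars."""
    if not accept_values:
        return 0.0
    generic = sum(1 for v in accept_values if len(v) < 10)
    return generic / len(accept_values)


browser_like = "image/avif,image/webp,image/apng,image/svg+xml,image/*,*/*;q=0.8"
constant_bot = ["*/*"] * 4
mixed = ["*/*", "", "text/html", browser_like]

print(accept_entropy(constant_bot), generic_accept_ratio(constant_bot))
print(accept_entropy(mixed), generic_accept_ratio(mixed))
```

A client that always sends the same value has zero entropy whatever that value is, which is why the ratio of generic values is a useful complement: it separates a constant `*/*` from a constant browser-style header.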
|
||||
|
||||
### References

- Nikiforakis et al. (2013), *Cookieless Monster: Exploring the Ecosystem of Web-based Device Fingerprinting*: Accept as a stable component
- Acar et al. (2014), *The Web Never Forgets*: entropy of HTTP headers
|
||||
|
||||
---
|
||||
|
||||
## B7 — HTTP/TLS Protocol Version Mismatch
|
||||
|
||||
### Observation

```
HTTP/2.0 → 160855 requests (84%)
HTTP/1.1 → 26421 requests (14%)
TLS 1.3  → 177330 requests (97%)
```

HTTP/2 requires TLS in modern browsers. Abnormal combinations:
- HTTP/1.1 + TLS 1.3: legitimate but rare for real browsers (they negotiate HTTP/2 when TLS 1.3 is available)
- HTTP/1.0 + TLS: extremely suspicious (custom tool or old bot)
- HTTP/2 + TLS 1.2: possible but declining

### Proposed feature

```sql
-- Fraction of requests using HTTP/1.x even though TLS 1.3 is available
count_http1_tls13 / greatest(hits, 1) AS http1_tls13_ratio
-- count_http10 / greatest(hits, 1) AS http10_ratio  -- strong signal
```

```sql
sum(IF(http_version = 'HTTP/1.0', 1, 0)) AS count_http10,
sum(IF(http_version LIKE 'HTTP/1%' AND tls_version = '1.3', 1, 0)) AS count_http1_tls13
```
|
||||
|
||||
---
|
||||
|
||||
## B8 — IP DF-Bit Consistency
|
||||
|
||||
### Observation

```
df=1 : 172490 packets (92%)
df=0 : 15016 packets (8%)
```

The "Don't Fragment" bit is normally constant for a given TCP session. An IP that alternates between DF=0 and DF=1 within one session, or between sessions, may indicate:
- **IP spoofing** (spoofed source packets within a botnet)
- **Custom TCP stack** (bots implementing their own TCP)
- **NAT traversal** with packet rewriting

### Proposed feature

```sql
-- Per IP: 0 = consistent, >0 = mixed
varPop(toFloat64(ip_meta_df)) AS ip_df_variance
```

Low impact on its own, but useful in combination with TTL variance for multi-dimensional TCP fingerprinting.
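
Because `ip_meta_df` only takes the values 0 and 1, its population variance has a closed form: with `p` the share of packets that have DF=1, the variance is `p * (1 - p)`, which peaks at 0.25 for a half-and-half mix. The sketch below checks this in plain Python; the function name `df_variance` and the sample sequences are illustrative assumptions.

```python
import statistics


def df_variance(df_bits):
    """Population variance of a 0/1 sequence, using the closed form p * (1 - p)."""
    if not df_bits:
        return 0.0
    p = sum(df_bits) / len(df_bits)  # share of packets with DF=1
    return p * (1.0 - p)


consistent = df_variance([1] * 10)  # always DF=1
mixed = df_variance([1, 0, 1, 0])   # alternating DF bit
# Cross-check the closed form against the generic population variance.
reference = statistics.pvariance([1, 1, 1, 0])
print(consistent, mixed, df_variance([1, 1, 1, 0]), reference)
```

Since the value is bounded by 0.25, the feature has a naturally small range, which is consistent with treating it as a supporting signal.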
|
||||
|
||||
---
|
||||
|
||||
## Summary of the required ClickHouse changes

### Columns to add to `agg_host_ip_ja4_1h`

```sql
ALTER TABLE mabase_prod.agg_host_ip_ja4_1h
    ADD COLUMN uniq_ja3             AggregateFunction(uniq, String),
    -- avg is not supported by SimpleAggregateFunction: store a full state
    -- (written with avgState, read with avgMerge)
    ADD COLUMN avg_syn_ms           AggregateFunction(avg, Int32),
    ADD COLUMN tls12_count          SimpleAggregateFunction(sum, UInt64),
    ADD COLUMN count_head           SimpleAggregateFunction(sum, UInt64),
    ADD COLUMN count_no_sec_fetch   SimpleAggregateFunction(sum, UInt64),
    ADD COLUMN count_generic_accept SimpleAggregateFunction(sum, UInt64),
    ADD COLUMN count_http10         SimpleAggregateFunction(sum, UInt64);
```

### New features in `view_ai_features_1h`

| Feature | Formula | Model |
|---------|---------|-------|
| `ja3_diversity_ratio` | `uniq_ja3 / greatest(uniq_ja4, 1)` | Complet |
| `syn_timing_cv` | `sqrt(tcp_jitter_variance) / greatest(avg_syn_ms, 1)` | Complet |
| `tls12_ratio` | `tls12_count / greatest(hits, 1)` | Complet |
| `head_ratio` | `count_head / greatest(hits, 1)` | Both |
| `sec_fetch_absence_rate` | `count_no_sec_fetch / greatest(hits, 1)` | Both |
| `generic_accept_ratio` | `count_generic_accept / greatest(hits, 1)` | Both |
| `http10_ratio` | `count_http10 / greatest(hits, 1)` | Both |

> ⚠️ Columns added with ALTER are not backfilled for historical data. A backfill from `http_logs` will be required.
> ⚠️ The MV `mv_agg_host_ip_ja4_1h` must be **recreated** to include the new fields (`ADD COLUMN` on the target table does not change the MV's SELECT query).
|
||||
|
||||
339
services/bot-detector/anubis/deploy_schema.sql
Normal file
@ -0,0 +1,339 @@
|
||||
-- ============================================================================
-- ANUBIS CRAWLER RULES: http_logs labeling + ML pipeline
-- Architecture:
--   anubis_ua_rules (table) → dict_anubis_ua (REGEXP_TREE)
--   anubis_ip_rules (table) → dict_anubis_ip (IP_TRIE)
--   http_logs             : +anubis_bot_name, +anubis_bot_action
--   mv_http_logs          : rebuilt with Anubis enrichment
--   view_ai_features_1h   : +anubis_bot_name, +anubis_bot_action (via dictGet)
--   ml_detected_anomalies : +anubis_bot_name, +anubis_bot_action
--   ml_all_scores         : +anubis_bot_name, +anubis_bot_action
-- ============================================================================
|
||||
|
||||
-- ----------------------------------------------------------------------------
-- 1. SOURCE TABLE: User-Agent rules (for the REGEXP_TREE dictionary)
--
-- Format expected by ClickHouse regexp_tree (v23.5+):
--   id        UInt64        : unique identifier
--   parent_id UInt64        : 0 = root, otherwise id of the parent (attribute inheritance)
--   regexp    String        : regular expression (re2/vectorscan)
--   keys      Array(String) : attribute names, e.g. ['bot_name', 'action']
--   values    Array(String) : matching values
--
-- Hierarchy used for priority:
--   Generic DENY rules (parent_id=0) → specific ALLOW children
--   Example: ai-crawlers-training (parent) → openai-gptbot (child)
--   When the UA matches both the child and the parent, the child's name is
--   returned (the child inherits AND overrides the parent's attributes).
-- ----------------------------------------------------------------------------
|
||||
CREATE TABLE IF NOT EXISTS mabase_prod.anubis_ua_rules
|
||||
(
|
||||
id UInt64,
|
||||
parent_id UInt64,
|
||||
regexp String,
|
||||
keys Array(String),
|
||||
values Array(String)
|
||||
)
|
||||
ENGINE = ReplacingMergeTree()
|
||||
ORDER BY id;
|
||||
|
||||
-- ----------------------------------------------------------------------------
-- 2. SOURCE TABLE: IP/CIDR rules (for the IP_TRIE dictionary)
--
-- Columns required by dict_anubis_ip and mv_http_logs:
--   rule_id  : rule identifier, cross-referenced with dict_anubis_ua for
--              the UA+IP logic (same rule_id → combined match)
--   has_ua   : 1 if the rule also has a UA regex (cross-check required)
--   category : Anubis category (bots, crawlers, clients, policies…)
-- ----------------------------------------------------------------------------
|
||||
CREATE TABLE IF NOT EXISTS mabase_prod.anubis_ip_rules
|
||||
(
|
||||
prefix String,
|
||||
bot_name LowCardinality(String),
|
||||
action LowCardinality(String),
|
||||
rule_id UInt64,
|
||||
has_ua UInt8,
|
||||
category LowCardinality(String)
|
||||
)
|
||||
ENGINE = ReplacingMergeTree()
|
||||
ORDER BY prefix;
|
||||
|
||||
-- ----------------------------------------------------------------------------
-- 3. UA DICTIONARY: REGEXP_TREE
--   dictGet('mabase_prod.dict_anubis_ua', 'bot_name', header_user_agent)
--
-- The PRIMARY KEY is 'regexp' (String), as required by ClickHouse 26.x.
-- Internal connection (HOST localhost PORT 9000) to avoid an HTTP deadlock.
-- Replace 'admin' and the password with the actual ClickHouse credentials.
-- ----------------------------------------------------------------------------
|
||||
DROP DICTIONARY IF EXISTS mabase_prod.dict_anubis_ua;
|
||||
CREATE DICTIONARY mabase_prod.dict_anubis_ua
|
||||
(
|
||||
regexp String,
|
||||
bot_name String,
|
||||
action String
|
||||
)
|
||||
PRIMARY KEY regexp
|
||||
SOURCE(CLICKHOUSE(HOST 'localhost' PORT 9000 USER 'admin' PASSWORD 'CHANGE_ME' DB 'mabase_prod' TABLE 'anubis_ua_rules'))
|
||||
LAYOUT(REGEXP_TREE)
|
||||
LIFETIME(MIN 300 MAX 600);
|
||||
|
||||
-- ----------------------------------------------------------------------------
-- 4. IP DICTIONARY: IP_TRIE
--   dictGetOrDefault('mabase_prod.dict_anubis_ip', 'bot_name', toIPv6(src_ip), '')
-- Internal connection (HOST localhost PORT 9000), for the same reason as dict_anubis_ua.
-- ----------------------------------------------------------------------------
|
||||
DROP DICTIONARY IF EXISTS mabase_prod.dict_anubis_ip;
|
||||
CREATE DICTIONARY mabase_prod.dict_anubis_ip
|
||||
(
|
||||
prefix String,
|
||||
bot_name String,
|
||||
action String,
|
||||
rule_id UInt64,
|
||||
has_ua UInt8,
|
||||
category String
|
||||
)
|
||||
PRIMARY KEY prefix
|
||||
SOURCE(CLICKHOUSE(HOST 'localhost' PORT 9000 USER 'admin' PASSWORD 'CHANGE_ME' DB 'mabase_prod' TABLE 'anubis_ip_rules'))
|
||||
LAYOUT(IP_TRIE())
|
||||
LIFETIME(MIN 300 MAX 600);
|
||||
|
||||
-- ----------------------------------------------------------------------------
-- 5. SOURCE TABLE: ASN rules (for the Flat dictionary)
--   Populated from botPolicies.yaml via fetch_rules.py → insert_asn_rules()
-- ----------------------------------------------------------------------------
|
||||
CREATE TABLE IF NOT EXISTS mabase_prod.anubis_asn_rules
|
||||
(
|
||||
asn UInt32,
|
||||
bot_name LowCardinality(String),
|
||||
action LowCardinality(String),
|
||||
category LowCardinality(String)
|
||||
)
|
||||
ENGINE = ReplacingMergeTree()
|
||||
ORDER BY asn;
|
||||
|
||||
-- ----------------------------------------------------------------------------
-- 6. SOURCE TABLE: ISO 3166 country rules (for the country dictionary)
--   Populated from botPolicies.yaml via fetch_rules.py → insert_country_rules()
-- ----------------------------------------------------------------------------
|
||||
CREATE TABLE IF NOT EXISTS mabase_prod.anubis_country_rules
|
||||
(
|
||||
country_code LowCardinality(String),
|
||||
bot_name LowCardinality(String),
|
||||
action LowCardinality(String),
|
||||
category LowCardinality(String)
|
||||
)
|
||||
ENGINE = ReplacingMergeTree()
|
||||
ORDER BY country_code;
|
||||
|
||||
-- ----------------------------------------------------------------------------
-- 7. ASN DICTIONARY: Flat
--   dictGetOrDefault('mabase_prod.dict_anubis_asn', 'bot_name', src_asn, '')
-- ----------------------------------------------------------------------------
|
||||
DROP DICTIONARY IF EXISTS mabase_prod.dict_anubis_asn;
|
||||
CREATE DICTIONARY mabase_prod.dict_anubis_asn
|
||||
(
|
||||
asn UInt32,
|
||||
bot_name String,
|
||||
action String,
|
||||
category String
|
||||
)
|
||||
PRIMARY KEY asn
|
||||
SOURCE(CLICKHOUSE(HOST 'localhost' PORT 9000 USER 'admin' PASSWORD 'CHANGE_ME' DB 'mabase_prod' TABLE 'anubis_asn_rules'))
|
||||
LAYOUT(FLAT())
|
||||
LIFETIME(MIN 300 MAX 600);
|
||||
|
||||
-- ----------------------------------------------------------------------------
-- 8. COUNTRY DICTIONARY: complex_key_hashed
--   The key is a String, which the FLAT layout (UInt64 keys only) cannot hold.
--   dictGetOrDefault('mabase_prod.dict_anubis_country', 'bot_name', tuple(src_country_code), '')
-- ----------------------------------------------------------------------------
DROP DICTIONARY IF EXISTS mabase_prod.dict_anubis_country;
CREATE DICTIONARY mabase_prod.dict_anubis_country
(
    country_code String,
    bot_name     String,
    action       String,
    category     String
)
PRIMARY KEY country_code
SOURCE(CLICKHOUSE(HOST 'localhost' PORT 9000 USER 'admin' PASSWORD 'CHANGE_ME' DB 'mabase_prod' TABLE 'anubis_country_rules'))
LAYOUT(COMPLEX_KEY_HASHED())
LIFETIME(MIN 300 MAX 600);
|
||||
|
||||
-- ----------------------------------------------------------------------------
-- 9. ADD THE ANUBIS COLUMNS to http_logs
--   Idempotent: does not fail if they already exist
-- ----------------------------------------------------------------------------
|
||||
ALTER TABLE mabase_prod.http_logs
|
||||
ADD COLUMN IF NOT EXISTS anubis_bot_name LowCardinality(String) DEFAULT '',
|
||||
ADD COLUMN IF NOT EXISTS anubis_bot_action LowCardinality(String) DEFAULT '',
|
||||
ADD COLUMN IF NOT EXISTS anubis_bot_category LowCardinality(String) DEFAULT '';
|
||||
|
||||
-- ----------------------------------------------------------------------------
-- 10. REBUILD mv_http_logs with Anubis enrichment
--   Priority logic:
--     1. UA regex (more informative: identifies the exact bot)
--     2. IP/CIDR (fallback: identifies the cloud network)
-- ----------------------------------------------------------------------------
|
||||
DROP VIEW IF EXISTS mabase_prod.mv_http_logs;
|
||||
|
||||
CREATE MATERIALIZED VIEW mabase_prod.mv_http_logs
|
||||
TO mabase_prod.http_logs
|
||||
(
|
||||
`time` DateTime,
|
||||
`log_date` Date,
|
||||
`src_ip` IPv4,
|
||||
`src_port` UInt16,
|
||||
`src_asn` UInt32,
|
||||
`src_country_code` String,
|
||||
`dst_ip` IPv4,
|
||||
`dst_port` UInt16,
|
||||
`src_as_name` String,
|
||||
`src_org` String,
|
||||
`src_domain` String,
|
||||
`method` String,
|
||||
`scheme` String,
|
||||
`host` String,
|
||||
`path` String,
|
||||
`query` String,
|
||||
`http_version` String,
|
||||
`orphan_side` String,
|
||||
`correlated` UInt8,
|
||||
`keepalives` UInt16,
|
||||
`a_timestamp` UInt64,
|
||||
`b_timestamp` UInt64,
|
||||
`conn_id` String,
|
||||
`ip_meta_df` UInt8,
|
||||
`ip_meta_id` UInt16,
|
||||
`ip_meta_total_length` UInt16,
|
||||
`ip_meta_ttl` UInt8,
|
||||
`tcp_meta_options` String,
|
||||
`tcp_meta_window_size` UInt32,
|
||||
`tcp_meta_mss` UInt16,
|
||||
`tcp_meta_window_scale` UInt8,
|
||||
`syn_to_clienthello_ms` Int32,
|
||||
`tls_version` String,
|
||||
`tls_sni` String,
|
||||
`tls_alpn` String,
|
||||
`ja3` String,
|
||||
`ja3_hash` String,
|
||||
`ja4` String,
|
||||
`client_headers` String,
|
||||
`header_user_agent` String,
|
||||
`header_accept` String,
|
||||
`header_accept_encoding` String,
|
||||
`header_accept_language` String,
|
||||
`header_content_type` String,
|
||||
`header_x_request_id` String,
|
||||
`header_x_trace_id` String,
|
||||
`header_x_forwarded_for` String,
|
||||
`header_sec_ch_ua` String,
|
||||
`header_sec_ch_ua_mobile` String,
|
||||
`header_sec_ch_ua_platform` String,
|
||||
`header_sec_fetch_dest` String,
|
||||
`header_sec_fetch_mode` String,
|
||||
`header_sec_fetch_site` String,
|
||||
`anubis_bot_name` String,
|
||||
`anubis_bot_action` String
|
||||
)
|
||||
AS SELECT
|
||||
parseDateTimeBestEffort(coalesce(JSONExtractString(raw_json, 'time'), '1970-01-01T00:00:00Z')) AS time,
|
||||
toDate(time) AS log_date,
|
||||
toIPv4(coalesce(JSONExtractString(raw_json, 'src_ip'), '0.0.0.0')) AS src_ip,
|
||||
toUInt16(coalesce(JSONExtractUInt(raw_json, 'src_port'), 0)) AS src_port,
|
||||
dictGetOrDefault('mabase_prod.dict_iplocate_asn', 'asn', toIPv6(src_ip), toUInt32(0)) AS src_asn,
|
||||
dictGetOrDefault('mabase_prod.dict_iplocate_asn', 'country_code', toIPv6(src_ip), '') AS src_country_code,
|
||||
toIPv4(coalesce(JSONExtractString(raw_json, 'dst_ip'), '0.0.0.0')) AS dst_ip,
|
||||
toUInt16(coalesce(JSONExtractUInt(raw_json, 'dst_port'), 0)) AS dst_port,
|
||||
dictGetOrDefault('mabase_prod.dict_iplocate_asn', 'name', toIPv6(src_ip), '') AS src_as_name,
|
||||
dictGetOrDefault('mabase_prod.dict_iplocate_asn', 'org', toIPv6(src_ip), '') AS src_org,
|
||||
dictGetOrDefault('mabase_prod.dict_iplocate_asn', 'domain', toIPv6(src_ip), '') AS src_domain,
|
||||
coalesce(JSONExtractString(raw_json, 'method'), '') AS method,
|
||||
coalesce(JSONExtractString(raw_json, 'scheme'), '') AS scheme,
|
||||
coalesce(JSONExtractString(raw_json, 'host'), '') AS host,
|
||||
coalesce(JSONExtractString(raw_json, 'path'), '') AS path,
|
||||
coalesce(JSONExtractString(raw_json, 'query'), '') AS query,
|
||||
coalesce(JSONExtractString(raw_json, 'http_version'), '') AS http_version,
|
||||
coalesce(JSONExtractString(raw_json, 'orphan_side'), '') AS orphan_side,
|
||||
toUInt8(coalesce(JSONExtractBool(raw_json, 'correlated'), 0)) AS correlated,
|
||||
toUInt16(coalesce(JSONExtractUInt(raw_json, 'keepalives'), 0)) AS keepalives,
|
||||
coalesce(JSONExtractUInt(raw_json, 'a_timestamp'), 0) AS a_timestamp,
|
||||
coalesce(JSONExtractUInt(raw_json, 'b_timestamp'), 0) AS b_timestamp,
|
||||
coalesce(JSONExtractString(raw_json, 'conn_id'), '') AS conn_id,
|
||||
toUInt8(coalesce(JSONExtractBool(raw_json, 'ip_meta_df'), 0)) AS ip_meta_df,
|
||||
toUInt16(coalesce(JSONExtractUInt(raw_json, 'ip_meta_id'), 0)) AS ip_meta_id,
|
||||
toUInt16(coalesce(JSONExtractUInt(raw_json, 'ip_meta_total_length'), 0)) AS ip_meta_total_length,
|
||||
toUInt8(coalesce(JSONExtractUInt(raw_json, 'ip_meta_ttl'), 0)) AS ip_meta_ttl,
|
||||
coalesce(JSONExtractString(raw_json, 'tcp_meta_options'), '') AS tcp_meta_options,
|
||||
toUInt32(coalesce(JSONExtractUInt(raw_json, 'tcp_meta_window_size'), 0)) AS tcp_meta_window_size,
|
||||
toUInt16(coalesce(JSONExtractUInt(raw_json, 'tcp_meta_mss'), 0)) AS tcp_meta_mss,
|
||||
toUInt8(coalesce(JSONExtractUInt(raw_json, 'tcp_meta_window_scale'), 0)) AS tcp_meta_window_scale,
|
||||
toInt32(coalesce(JSONExtractInt(raw_json, 'syn_to_clienthello_ms'), 0)) AS syn_to_clienthello_ms,
|
||||
coalesce(JSONExtractString(raw_json, 'tls_version'), '') AS tls_version,
|
||||
coalesce(JSONExtractString(raw_json, 'tls_sni'), '') AS tls_sni,
|
||||
coalesce(JSONExtractString(raw_json, 'tls_alpn'), '') AS tls_alpn,
|
||||
coalesce(JSONExtractString(raw_json, 'ja3'), '') AS ja3,
|
||||
coalesce(JSONExtractString(raw_json, 'ja3_hash'), '') AS ja3_hash,
|
||||
coalesce(JSONExtractString(raw_json, 'ja4'), '') AS ja4,
|
||||
coalesce(JSONExtractString(raw_json, 'client_headers'), '') AS client_headers,
|
||||
coalesce(JSONExtractString(raw_json, 'header_User-Agent'), '') AS header_user_agent,
|
||||
coalesce(JSONExtractString(raw_json, 'header_Accept'), '') AS header_accept,
|
||||
coalesce(JSONExtractString(raw_json, 'header_Accept-Encoding'), '') AS header_accept_encoding,
|
||||
coalesce(JSONExtractString(raw_json, 'header_Accept-Language'), '') AS header_accept_language,
|
||||
coalesce(JSONExtractString(raw_json, 'header_Content-Type'), '') AS header_content_type,
|
||||
coalesce(JSONExtractString(raw_json, 'header_X-Request-Id'), '') AS header_x_request_id,
|
||||
coalesce(JSONExtractString(raw_json, 'header_X-Trace-Id'), '') AS header_x_trace_id,
|
||||
coalesce(JSONExtractString(raw_json, 'header_X-Forwarded-For'), '') AS header_x_forwarded_for,
|
||||
coalesce(JSONExtractString(raw_json, 'header_Sec-CH-UA'), '') AS header_sec_ch_ua,
|
||||
coalesce(JSONExtractString(raw_json, 'header_Sec-CH-UA-Mobile'), '') AS header_sec_ch_ua_mobile,
|
||||
coalesce(JSONExtractString(raw_json, 'header_Sec-CH-UA-Platform'), '') AS header_sec_ch_ua_platform,
|
||||
coalesce(JSONExtractString(raw_json, 'header_Sec-Fetch-Dest'), '') AS header_sec_fetch_dest,
|
||||
coalesce(JSONExtractString(raw_json, 'header_Sec-Fetch-Mode'), '') AS header_sec_fetch_mode,
|
||||
coalesce(JSONExtractString(raw_json, 'header_Sec-Fetch-Site'), '') AS header_sec_fetch_site,
|
||||
    -- ── Anubis enrichment ────────────────────────────────────────────────────
    -- Priority: UA regex > IP/CIDR (the UA identifies the bot precisely)
|
||||
COALESCE(
|
||||
nullIf(dictGet('mabase_prod.dict_anubis_ua', 'bot_name',
|
||||
coalesce(JSONExtractString(raw_json, 'header_User-Agent'), '')), ''),
|
||||
nullIf(dictGetOrDefault('mabase_prod.dict_anubis_ip', 'bot_name',
|
||||
toIPv6(toIPv4(coalesce(JSONExtractString(raw_json, 'src_ip'), '0.0.0.0'))), ''), ''),
|
||||
''
|
||||
) AS anubis_bot_name,
|
||||
COALESCE(
|
||||
nullIf(dictGet('mabase_prod.dict_anubis_ua', 'action',
|
||||
coalesce(JSONExtractString(raw_json, 'header_User-Agent'), '')), ''),
|
||||
nullIf(dictGetOrDefault('mabase_prod.dict_anubis_ip', 'action',
|
||||
toIPv6(toIPv4(coalesce(JSONExtractString(raw_json, 'src_ip'), '0.0.0.0'))), ''), ''),
|
||||
''
|
||||
) AS anubis_bot_action
|
||||
FROM mabase_prod.http_logs_raw;
|
||||
|
||||
-- ============================================================================
-- ML INTEGRATION: propagate Anubis labels to the bot_detector pipeline
-- ============================================================================

-- ----------------------------------------------------------------------------
-- 11. ANUBIS COLUMNS in ml_detected_anomalies
-- ----------------------------------------------------------------------------
|
||||
ALTER TABLE mabase_prod.ml_detected_anomalies
|
||||
ADD COLUMN IF NOT EXISTS anubis_bot_name LowCardinality(String) DEFAULT '',
|
||||
ADD COLUMN IF NOT EXISTS anubis_bot_action LowCardinality(String) DEFAULT '',
|
||||
ADD COLUMN IF NOT EXISTS anubis_bot_category LowCardinality(String) DEFAULT '';
|
||||
|
||||
-- ----------------------------------------------------------------------------
|
||||
-- 12. ANUBIS COLUMNS in ml_all_scores
|
||||
-- ----------------------------------------------------------------------------
|
||||
ALTER TABLE mabase_prod.ml_all_scores
|
||||
ADD COLUMN IF NOT EXISTS anubis_bot_name LowCardinality(String) DEFAULT '',
|
||||
ADD COLUMN IF NOT EXISTS anubis_bot_action LowCardinality(String) DEFAULT '',
|
||||
ADD COLUMN IF NOT EXISTS anubis_bot_category LowCardinality(String) DEFAULT '';
|
||||
|
||||
-- ----------------------------------------------------------------------------
|
||||
-- 13. VIEW view_ai_features_1h: Anubis enrichment
-- Adds anubis_bot_name and anubis_bot_action via dictGet.
-- Priority: UA regex (first_ua → dict_anubis_ua) > IP/CIDR (src_ip → dict_anubis_ip)
-- See the full file in /tmp/update_view_ai_features.sql, or recreate the view
-- with CREATE OR REPLACE VIEW after applying the previous steps.
-- ----------------------------------------------------------------------------
-- NOTE: run the contents of /tmp/update_view_ai_features.sql here (too long to inline).
-- Or run it from the repo: clickhouse-client --queries-file services/bot-detector/anubis/view_ai_features_anubis.sql
|
||||
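The `COALESCE(nullIf(...), nullIf(...), '')` chains that build `anubis_bot_name` and `anubis_bot_action` in the SQL above amount to "take the first non-empty lookup result, UA first, then IP". A minimal Python sketch of that fallback, for illustration only: `first_non_empty` and `anubis_field` are hypothetical names, and the two dictionary lookups are replaced by plain string arguments.

```python
def first_non_empty(*candidates: str) -> str:
    """Return the first non-empty string, or '' when every candidate is empty.

    Mirrors COALESCE(nullIf(a, ''), nullIf(b, ''), ''): nullIf turns an empty
    lookup result into NULL, and COALESCE skips NULLs.
    """
    for value in candidates:
        if value:
            return value
    return ""


def anubis_field(ua_lookup: str, ip_lookup: str) -> str:
    """UA regex result takes priority over the IP/CIDR result."""
    return first_non_empty(ua_lookup, ip_lookup)


if __name__ == "__main__":
    print(anubis_field("googlebot", "some-cloud-range"))
    print(anubis_field("", "some-cloud-range"))
```

The UA lookup comes first because, as the SQL comment notes, a User-Agent match identifies the bot more precisely than an address range.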
486
services/bot-detector/anubis/fetch_rules.py
Normal file
@ -0,0 +1,486 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
fetch_rules.py: fetch ALL Anubis rules from GitHub and insert them into ClickHouse.

Sources:
  - data/bots/**/*.yaml    (pathological, AI and IRC bots)
  - data/crawlers/*.yaml   (legitimate crawlers and clouds)
  - data/clients/*.yaml    (AI clients acting on behalf of users)
  - data/common/*.yaml     (common rules: private IPs, etc.)
  - data/botPolicies.yaml  (inline ASN and country rules)

Usage (from the dashboard_web container):
    python /tmp/fetch_rules.py

Environment variables:
    CLICKHOUSE_HOST, CLICKHOUSE_DB, CLICKHOUSE_USER, CLICKHOUSE_PASSWORD
|
||||
"""
|
||||
|
||||
import json
|
||||
import os
|
||||
import re
|
||||
import sys
|
||||
import urllib.request
|
||||
import urllib.error
|
||||
|
||||
try:
    import yaml
except ImportError:
    print("[ERROR] pyyaml is missing.", file=sys.stderr)
    sys.exit(1)

try:
    import clickhouse_connect
except ImportError:
    print("[ERROR] clickhouse-connect is missing.", file=sys.stderr)
    sys.exit(1)
|
||||
|
||||
# ──────────────────────────────────────────────────────────────────────────────
|
||||
# Config
|
||||
# ──────────────────────────────────────────────────────────────────────────────
|
||||
|
||||
GITHUB_API = "https://api.github.com/repos/TecharoHQ/anubis/contents"
|
||||
GITHUB_RAW = "https://raw.githubusercontent.com/TecharoHQ/anubis/main"
|
||||
|
||||
# Directories to walk. ORDER IS CRITICAL for REGEXP_TREE:
# among root-level REGEXP_TREE rules, the rule with the lowest ID wins when several match.
# → SPECIFIC rules must be loaded FIRST (low IDs) so they win over the catch-alls.
# → Catch-alls (ai-robots-txt, ai-catchall) must be loaded LAST (high IDs).
#
# Within each directory, files are sorted in REVERSE ALPHABETICAL ORDER so that
# specific rules (e.g. openai-chatgpt-user.yaml) get lower IDs than the catch-alls (ai.yaml).
DIRECTORIES = [
    ("data/clients", "clients"),              # AI client rules with IPs (openai-chatgpt-user, etc.)
    ("data/bots/irc-bots", "bots/irc-bots"),  # Specific IRC bots
    ("data/crawlers", "crawlers"),            # Specific crawlers + clouds
    ("data/common", "common"),                # Private IPs, common routes
    ("data/bots", "bots"),                    # Broad catch-alls (ai-robots-txt, ai-catchall), LAST
]

# Main policy file (inline ASN + country rules)
BOT_POLICIES_PATH = "data/botPolicies.yaml"

# UA_PARENT_OVERRIDE: mapping rule_name → parent_name to force the REGEXP_TREE hierarchy.
# Intentionally left empty: the load order (specific before catch-all)
# guarantees priority without an explicit parent_id hierarchy.
# Populate this dict if a rule must inherit from another via parent_id in REGEXP_TREE.
UA_PARENT_OVERRIDE: dict[str, str] = {}
|
||||
|
||||
|
||||
# ──────────────────────────────────────────────────────────────────────────────
|
||||
# HTTP helpers
|
||||
# ──────────────────────────────────────────────────────────────────────────────
|
||||
|
||||
def _fetch_url(url: str, timeout: int = 15) -> str | None:
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8")
    except urllib.error.URLError as e:
        print(f"[WARN] {url}: {e}", file=sys.stderr)
        return None


def fetch_yaml_url(url: str) -> list | dict | None:
    content = _fetch_url(url)
    if content:
        return yaml.safe_load(content)
    return None


def list_yaml_files(api_path: str) -> list[str]:
    """
    Return the raw URLs of the .yaml/.yml files under api_path, via the GitHub API.

    Files are sorted in REVERSE ALPHABETICAL order so that specific rules
    (e.g. openai-chatgpt-user.yaml) get a lower ID than the catch-alls (ai.yaml).
    """
    content = _fetch_url(f"{GITHUB_API}/{api_path}")
    if not content:
        return []
    try:
        entries = json.loads(content)
    except json.JSONDecodeError:
        return []
    # The contents API returns a JSON object, not a list, when api_path is a file.
    if not isinstance(entries, list):
        return []
    files = [
        entry for entry in entries
        if entry.get("type") == "file" and entry.get("name", "").endswith((".yaml", ".yml"))
    ]
    # Reverse sort: e.g. openai-*.yaml is listed before ai.yaml
    files.sort(key=lambda e: e["name"], reverse=True)
    return [f["download_url"] for f in files]
|
||||
|
||||
|
||||
# ──────────────────────────────────────────────────────────────────────────────
|
||||
# Extracting UA patterns from CEL-like expressions
|
||||
# ──────────────────────────────────────────────────────────────────────────────
|
||||
|
||||
def _extract_ua_from_all(conditions: list) -> str | None:
    """Extract a UA regex from an 'all' expression (e.g. yandexbot userAgent.matches)."""
    for cond in conditions:
        if not isinstance(cond, str):
            continue
        m = re.search(r'userAgent\.matches\("(.+?)"\)', cond)
        if m:
            return m.group(1).replace("\\\\", "\\")
    return None


def _extract_ua_from_any(conditions: list) -> str | None:
    """
    Extract a UA regex from an 'any' expression using userAgent.contains(...).
    Example: aggressive-brazilian-scrapers.yaml
    Returns an OR-ed regex: MSIE|Trident|...
    """
    patterns = []
    for cond in conditions:
        if not isinstance(cond, str):
            continue
        m = re.search(r'userAgent\.contains\("(.+?)"\)', cond)
        if m:
            patterns.append(re.escape(m.group(1)))
    if patterns:
        return "|".join(patterns)
    return None


def extract_ua_regex(rule: dict) -> str | None:
    """Extract the User-Agent regex from every supported rule form."""
    # Direct form
    if ua := rule.get("user_agent_regex"):
        return ua.strip()

    expr = rule.get("expression")
    if not expr:
        return None

    # Scalar expression (CEL string)
    if isinstance(expr, str):
        m = re.search(r'userAgent\.matches\("(.+?)"\)', expr)
        if m:
            return m.group(1).replace("\\\\", "\\")
        m = re.search(r'userAgent\.contains\("(.+?)"\)', expr)
        if m:
            return re.escape(m.group(1))
        return None

    # Structured dict expression
    if isinstance(expr, dict):
        if ua := _extract_ua_from_all(expr.get("all", [])):
            return ua
        if ua := _extract_ua_from_any(expr.get("any", [])):
            return ua

    return None
|
||||
|
||||
|
||||
# ──────────────────────────────────────────────────────────────────────────────
|
||||
# Parsing the YAML files
|
||||
# ──────────────────────────────────────────────────────────────────────────────
|
||||
|
||||
def parse_file(
    url: str,
    category: str,
    ua_name_to_id: dict,
    ua_id_counter_ref: list,    # [int]: mutable counter shared across calls
    rule_id_counter_ref: list,  # [int]: same
) -> tuple[list[dict], list[dict]]:
    """
    Parse an Anubis YAML file.
    Returns (ua_rules, ip_rules).

    Note: ua_name_to_id is maintained to support the parent_id hierarchy in
    REGEXP_TREE (via UA_PARENT_OVERRIDE). As long as UA_PARENT_OVERRIDE is empty,
    parent_id is always 0 and ua_name_to_id is never consulted in practice.
    """
    data = fetch_yaml_url(url)
    if not data or not isinstance(data, list):
        return [], []

    ua_rules, ip_rules = [], []

    for rule in data:
        if not isinstance(rule, dict):
            continue
        # Skip imports (references to other files)
        if "import" in rule:
            continue

        name = rule.get("name", "").strip()
        action = rule.get("action", "").strip()
        if not name or not action:
            continue

        remote_addrs = [str(c).strip() for c in rule.get("remote_addresses", []) if c]
        has_ip = bool(remote_addrs)

        rule_id = rule_id_counter_ref[0]
        rule_id_counter_ref[0] += 1

        # ── User-Agent regex ─────────────────────────────────────────────────
        ua_regex = extract_ua_regex(rule)
        if ua_regex:
            parent_name = UA_PARENT_OVERRIDE.get(name)
            parent_id = ua_name_to_id.get(parent_name, 0) if parent_name else 0

            uid = ua_id_counter_ref[0]
            ua_id_counter_ref[0] += 1
            ua_name_to_id[name] = uid

            ua_rules.append({
                "id": uid,
                "parent_id": parent_id,
                "regexp": ua_regex,
                "bot_name": name,
                "action": action,
                "has_ip": "1" if has_ip else "0",
                "rule_id": str(rule_id),
                "category": category,
            })

        # ── IP/CIDR ranges ───────────────────────────────────────────────────
        has_ua = bool(ua_regex)
        for cidr in remote_addrs:
            ip_rules.append({
                "prefix": cidr,
                "bot_name": name,
                "action": action,
                "rule_id": rule_id,
                "has_ua": 1 if has_ua else 0,
                "category": category,
            })

    return ua_rules, ip_rules
|
||||
|
||||
|
||||
def parse_bot_policies_inline(url: str) -> tuple[list[dict], list[dict]]:
    """
    Parse botPolicies.yaml for inline rules using geoip.countries and asns.match.
    Returns (asn_rules, country_rules).
    """
    data = fetch_yaml_url(url)
    if not data or not isinstance(data, dict):
        return [], []

    asn_rules: list[dict] = []
    country_rules: list[dict] = []

    for rule in data.get("bots", []):
        if not isinstance(rule, dict):
            continue
        if "import" in rule:
            continue

        name = rule.get("name", "").strip()
        action = rule.get("action", "").strip()
        if not name or not action:
            continue

        # ASN rules
        asns = rule.get("asns", {})
        if isinstance(asns, dict):
            for asn in asns.get("match", []):
                asn_rules.append({
                    "asn": int(asn),
                    "bot_name": name,
                    "action": action,
                    "category": "policies",
                })

        # Country rules
        geoip = rule.get("geoip", {})
        if isinstance(geoip, dict):
            for cc in geoip.get("countries", []):
                country_rules.append({
                    "country_code": str(cc).upper(),
                    "bot_name": name,
                    "action": action,
                    "category": "policies",
                })

    return asn_rules, country_rules
|
||||
|
||||
|
||||
# ──────────────────────────────────────────────────────────────────────────────
|
||||
# Collecting all rules
|
||||
# ──────────────────────────────────────────────────────────────────────────────
|
||||
|
||||
def collect_all_rules() -> tuple[list, list, list, list]:
    """Return (ua_rules, ip_rules, asn_rules, country_rules)."""
    ua_name_to_id: dict[str, int] = {}
    ua_id_counter_ref: list[int] = [1]
    rule_id_counter: list[int] = [1]

    all_ua: list[dict] = []
    all_ip: list[dict] = []

    for api_path, category in DIRECTORIES:
        print(f"[INFO] Walking {api_path} ({category})…")
        file_urls = list_yaml_files(api_path)
        print(f" {len(file_urls)} files found")
        for url in file_urls:
            ua, ip = parse_file(url, category, ua_name_to_id, ua_id_counter_ref, rule_id_counter)
            all_ua.extend(ua)
            all_ip.extend(ip)

    # ASN + country rules from botPolicies.yaml
    print("[INFO] Reading botPolicies.yaml…")
    policies_url = f"{GITHUB_RAW}/{BOT_POLICIES_PATH}"
    asn_rules, country_rules = parse_bot_policies_inline(policies_url)

    return all_ua, all_ip, asn_rules, country_rules
|
||||
|
||||
|
||||
# ──────────────────────────────────────────────────────────────────────────────
|
||||
# ClickHouse
|
||||
# ──────────────────────────────────────────────────────────────────────────────
|
||||
|
||||
def get_ch_client():
    return clickhouse_connect.get_client(
        host=os.environ.get("CLICKHOUSE_HOST", "clickhouse"),
        database=os.environ.get("CLICKHOUSE_DB", "mabase_prod"),
        username=os.environ.get("CLICKHOUSE_USER", "admin"),
        password=os.environ.get("CLICKHOUSE_PASSWORD", ""),
    )
|
||||
|
||||
|
||||
def insert_ua_rules(client, rules: list[dict]) -> None:
    if not rules:
        print("[INFO] No UA rules.")
        return
    client.command("TRUNCATE TABLE mabase_prod.anubis_ua_rules")
    # REGEXP_TREE format: id, parent_id, regexp, keys[], values[]
    # keys = ['bot_name', 'action', 'has_ip', 'rule_id', 'category']
    data = [
        [
            r["id"], r["parent_id"], r["regexp"],
            ["bot_name", "action", "has_ip", "rule_id", "category"],
            [r["bot_name"], r["action"], r["has_ip"], r["rule_id"], r["category"]],
        ]
        for r in rules
    ]
    client.insert("mabase_prod.anubis_ua_rules", data,
                  column_names=["id", "parent_id", "regexp", "keys", "values"])
    print(f"[OK] {len(rules)} UA rules inserted.")
|
||||
|
||||
|
||||
def insert_ip_rules(client, rules: list[dict]) -> None:
    if not rules:
        print("[INFO] No IP rules.")
        return
    client.command("TRUNCATE TABLE mabase_prod.anubis_ip_rules")
    data = [
        [r["prefix"], r["bot_name"], r["action"],
         r["rule_id"], r["has_ua"], r["category"]]
        for r in rules
    ]
    client.insert("mabase_prod.anubis_ip_rules", data,
                  column_names=["prefix", "bot_name", "action", "rule_id", "has_ua", "category"])
    print(f"[OK] {len(rules)} IP rules inserted.")
|
||||
|
||||
|
||||
def insert_asn_rules(client, rules: list[dict]) -> None:
    if not rules:
        print("[INFO] No ASN rules.")
        return
    client.command("TRUNCATE TABLE mabase_prod.anubis_asn_rules")
    data = [[r["asn"], r["bot_name"], r["action"], r["category"]] for r in rules]
    client.insert("mabase_prod.anubis_asn_rules", data,
                  column_names=["asn", "bot_name", "action", "category"])
    print(f"[OK] {len(rules)} ASN rules inserted.")
|
||||
|
||||
|
||||
def insert_country_rules(client, rules: list[dict]) -> None:
    if not rules:
        print("[INFO] No country rules.")
        return
    client.command("TRUNCATE TABLE mabase_prod.anubis_country_rules")
    data = [[r["country_code"], r["bot_name"], r["action"], r["category"]] for r in rules]
    client.insert("mabase_prod.anubis_country_rules", data,
                  column_names=["country_code", "bot_name", "action", "category"])
    print(f"[OK] {len(rules)} country rules inserted.")
|
||||
|
||||
|
||||
def reload_dicts(client) -> None:
    dicts = [
        "mabase_prod.dict_anubis_ua",
        "mabase_prod.dict_anubis_ip",
        "mabase_prod.dict_anubis_asn",
        "mabase_prod.dict_anubis_country",
    ]
    for d in dicts:
        try:
            client.command(f"SYSTEM RELOAD DICTIONARY {d}")
            print(f"[OK] {d} reloaded.")
        except Exception as e:
            print(f"[WARN] Reloading {d}: {e}", file=sys.stderr)
|
||||
|
||||
|
||||
# ──────────────────────────────────────────────────────────────────────────────
|
||||
# Report
|
||||
# ──────────────────────────────────────────────────────────────────────────────
|
||||
|
||||
def print_summary(ua_rules, ip_rules, asn_rules, country_rules):
    print("\n── UA rules ──")
    by_cat: dict[str, list] = {}
    for r in ua_rules:
        by_cat.setdefault(r["category"], []).append(r)
    for cat, rules in sorted(by_cat.items()):
        print(f" [{cat}] {len(rules)} rule(s)")
        for r in rules[:5]:
            has = " [+IP]" if r["has_ip"] == "1" else ""
            par = f" [parent={r['parent_id']}]" if r["parent_id"] else ""
            print(f" [{r['action']:9s}] {r['bot_name']}{has}{par}: {r['regexp'][:50]}")
        if len(rules) > 5:
            print(f" … and {len(rules) - 5} more")

    print(f"\n── IP rules: {len(ip_rules)} CIDRs ──")
    by_bot: dict[str, list] = {}
    for r in ip_rules:
        by_bot.setdefault(r["bot_name"], []).append(r)
    for bot, rs in sorted(by_bot.items())[:15]:
        print(f" [{rs[0]['action']:9s}] {bot}: {len(rs)} CIDRs (cat={rs[0]['category']}, has_ua={rs[0]['has_ua']})")
    if len(by_bot) > 15:
        print(f" … and {len(by_bot) - 15} more bots")

    if asn_rules:
        print(f"\n── ASN rules: {len(asn_rules)} ──")
        for r in asn_rules:
            print(f" [{r['action']:9s}] ASN {r['asn']}: {r['bot_name']}")

    if country_rules:
        print(f"\n── Country rules: {len(country_rules)} ──")
        for r in country_rules:
            print(f" [{r['action']:9s}] {r['country_code']}: {r['bot_name']}")
|
||||
|
||||
|
||||
# ──────────────────────────────────────────────────────────────────────────────
|
||||
# Main
|
||||
# ──────────────────────────────────────────────────────────────────────────────
|
||||
|
||||
def main() -> None:
    print("[INFO] Collecting Anubis rules from GitHub…")
    ua_rules, ip_rules, asn_rules, country_rules = collect_all_rules()

    total = len(ua_rules) + len(ip_rules) + len(asn_rules) + len(country_rules)
    print(f"\n[INFO] {len(ua_rules)} UA rules, {len(ip_rules)} IP CIDRs, "
          f"{len(asn_rules)} ASN, {len(country_rules)} countries (total={total})")

    if total == 0:
        print("[ERROR] No rules fetched.", file=sys.stderr)
        sys.exit(1)

    print_summary(ua_rules, ip_rules, asn_rules, country_rules)

    print("\n[INFO] Connecting to ClickHouse…")
    client = get_ch_client()

    insert_ua_rules(client, ua_rules)
    insert_ip_rules(client, ip_rules)
    insert_asn_rules(client, asn_rules)
    insert_country_rules(client, country_rules)
    reload_dicts(client)

    print("\n[OK] Anubis rules loaded successfully.")
|
||||
|
||||
|
||||
if __name__ == "__main__":
    main()
|
||||
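The comments in fetch_rules.py rely on one ordering rule: IDs are handed out in load order, and among root-level rules the lowest ID wins when several regexes match. The sketch below imitates that precedence in plain Python to show why reverse alphabetical file order keeps a specific rule ahead of a catch-all. It is an illustration under assumptions, not ClickHouse's implementation: the file names, the one-rule-per-file contents, the regexes, and the helpers `assign_ids` and `lookup` are all hypothetical.

```python
import re

# Hypothetical directory listing; only the names matter for the ordering.
file_names = ["ai.yaml", "openai-chatgpt-user.yaml", "mistral-mistralai-user.yaml"]

# Hypothetical one-rule-per-file contents: file name -> (bot_name, UA regex).
rules_by_file = {
    "ai.yaml": ("ai-catchall", r"(?i)bot|gpt"),
    "openai-chatgpt-user.yaml": ("openai-chatgpt-user", r"ChatGPT-User"),
    "mistral-mistralai-user.yaml": ("mistral-mistralai-user", r"MistralAI-User"),
}


def assign_ids(names):
    """Number the rules in reverse alphabetical file order, starting at 1."""
    ordered = sorted(names, reverse=True)
    return [(i, *rules_by_file[n]) for i, n in enumerate(ordered, start=1)]


def lookup(ua, rules):
    """Return the bot_name of the matching rule with the lowest ID, or ''."""
    matches = [(rid, name) for rid, name, rx in rules if re.search(rx, ua)]
    return min(matches)[1] if matches else ""


rules = assign_ids(file_names)

if __name__ == "__main__":
    for rid, name, rx in rules:
        print(rid, name, rx)
    print(lookup("Mozilla/5.0 ChatGPT-User/1.0", rules))
```

With forward alphabetical order, `ai.yaml` would receive ID 1 and its broad regex would shadow the ChatGPT-specific rule for the same User-Agent.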
126
services/bot-detector/anubis/mv_http_logs.sql
Normal file
@ -0,0 +1,126 @@
|
||||
CREATE MATERIALIZED VIEW mabase_prod.mv_http_logs
|
||||
TO mabase_prod.http_logs
|
||||
AS
|
||||
WITH
|
||||
coalesce(JSONExtractString(raw_json, 'header_User-Agent'), '') AS _ua,
|
||||
toIPv6(toIPv4(coalesce(JSONExtractString(raw_json, 'src_ip'), '0.0.0.0'))) AS _ip,
|
||||
toUInt32(dictGetOrDefault('mabase_prod.dict_iplocate_asn', 'asn', _ip, toUInt32(0))) AS _asn,
|
||||
dictGetOrDefault('mabase_prod.dict_iplocate_asn', 'country_code', _ip, '') AS _cc
|
||||
SELECT
|
||||
parseDateTimeBestEffort(coalesce(JSONExtractString(raw_json, 'time'), '1970-01-01T00:00:00Z')) AS time,
|
||||
toDate(time) AS log_date,
|
||||
toIPv4(coalesce(JSONExtractString(raw_json, 'src_ip'), '0.0.0.0')) AS src_ip,
|
||||
toUInt16(coalesce(JSONExtractUInt(raw_json, 'src_port'), 0)) AS src_port,
|
||||
_asn AS src_asn,
|
||||
_cc AS src_country_code,
|
||||
toIPv4(coalesce(JSONExtractString(raw_json, 'dst_ip'), '0.0.0.0')) AS dst_ip,
|
||||
toUInt16(coalesce(JSONExtractUInt(raw_json, 'dst_port'), 0)) AS dst_port,
|
||||
dictGetOrDefault('mabase_prod.dict_iplocate_asn', 'name', _ip, '') AS src_as_name,
|
||||
dictGetOrDefault('mabase_prod.dict_iplocate_asn', 'org', _ip, '') AS src_org,
|
||||
dictGetOrDefault('mabase_prod.dict_iplocate_asn', 'domain', _ip, '') AS src_domain,
|
||||
coalesce(JSONExtractString(raw_json, 'method'), '') AS method,
|
||||
coalesce(JSONExtractString(raw_json, 'scheme'), '') AS scheme,
|
||||
coalesce(JSONExtractString(raw_json, 'host'), '') AS host,
|
||||
coalesce(JSONExtractString(raw_json, 'path'), '') AS path,
|
||||
coalesce(JSONExtractString(raw_json, 'query'), '') AS query,
|
||||
coalesce(JSONExtractString(raw_json, 'http_version'), '') AS http_version,
|
||||
coalesce(JSONExtractString(raw_json, 'orphan_side'), '') AS orphan_side,
|
||||
toUInt8(coalesce(JSONExtractBool(raw_json, 'correlated'), 0)) AS correlated,
|
||||
toUInt16(coalesce(JSONExtractUInt(raw_json, 'keepalives'), 0)) AS keepalives,
|
||||
coalesce(JSONExtractUInt(raw_json, 'a_timestamp'), 0) AS a_timestamp,
|
||||
coalesce(JSONExtractUInt(raw_json, 'b_timestamp'), 0) AS b_timestamp,
|
||||
coalesce(JSONExtractString(raw_json, 'conn_id'), '') AS conn_id,
|
||||
toUInt8(coalesce(JSONExtractBool(raw_json, 'ip_meta_df'), 0)) AS ip_meta_df,
|
||||
toUInt16(coalesce(JSONExtractUInt(raw_json, 'ip_meta_id'), 0)) AS ip_meta_id,
|
||||
toUInt16(coalesce(JSONExtractUInt(raw_json, 'ip_meta_total_length'), 0)) AS ip_meta_total_length,
|
||||
toUInt8(coalesce(JSONExtractUInt(raw_json, 'ip_meta_ttl'), 0)) AS ip_meta_ttl,
|
||||
coalesce(JSONExtractString(raw_json, 'tcp_meta_options'), '') AS tcp_meta_options,
|
||||
toUInt32(coalesce(JSONExtractUInt(raw_json, 'tcp_meta_window_size'), 0)) AS tcp_meta_window_size,
|
||||
toUInt16(coalesce(JSONExtractUInt(raw_json, 'tcp_meta_mss'), 0)) AS tcp_meta_mss,
|
||||
toUInt8(coalesce(JSONExtractUInt(raw_json, 'tcp_meta_window_scale'), 0)) AS tcp_meta_window_scale,
|
||||
toInt32(coalesce(JSONExtractInt(raw_json, 'syn_to_clienthello_ms'), 0)) AS syn_to_clienthello_ms,
|
||||
coalesce(JSONExtractString(raw_json, 'tls_version'), '') AS tls_version,
|
||||
coalesce(JSONExtractString(raw_json, 'tls_sni'), '') AS tls_sni,
|
||||
coalesce(JSONExtractString(raw_json, 'tls_alpn'), '') AS tls_alpn,
|
||||
coalesce(JSONExtractString(raw_json, 'ja3'), '') AS ja3,
|
||||
coalesce(JSONExtractString(raw_json, 'ja3_hash'), '') AS ja3_hash,
|
||||
coalesce(JSONExtractString(raw_json, 'ja4'), '') AS ja4,
|
||||
coalesce(JSONExtractString(raw_json, 'client_headers'), '') AS client_headers,
|
||||
coalesce(JSONExtractString(raw_json, 'header_User-Agent'), '') AS header_user_agent,
|
||||
coalesce(JSONExtractString(raw_json, 'header_Accept'), '') AS header_accept,
|
||||
coalesce(JSONExtractString(raw_json, 'header_Accept-Encoding'), '') AS header_accept_encoding,
|
||||
coalesce(JSONExtractString(raw_json, 'header_Accept-Language'), '') AS header_accept_language,
|
||||
coalesce(JSONExtractString(raw_json, 'header_Content-Type'), '') AS header_content_type,
|
||||
coalesce(JSONExtractString(raw_json, 'header_X-Request-Id'), '') AS header_x_request_id,
|
||||
coalesce(JSONExtractString(raw_json, 'header_X-Trace-Id'), '') AS header_x_trace_id,
|
||||
coalesce(JSONExtractString(raw_json, 'header_X-Forwarded-For'), '') AS header_x_forwarded_for,
|
||||
coalesce(JSONExtractString(raw_json, 'header_Sec-CH-UA'), '') AS header_sec_ch_ua,
|
||||
coalesce(JSONExtractString(raw_json, 'header_Sec-CH-UA-Mobile'), '') AS header_sec_ch_ua_mobile,
|
||||
coalesce(JSONExtractString(raw_json, 'header_Sec-CH-UA-Platform'), '') AS header_sec_ch_ua_platform,
|
||||
coalesce(JSONExtractString(raw_json, 'header_Sec-Fetch-Dest'), '') AS header_sec_fetch_dest,
|
||||
coalesce(JSONExtractString(raw_json, 'header_Sec-Fetch-Mode'), '') AS header_sec_fetch_mode,
|
||||
coalesce(JSONExtractString(raw_json, 'header_Sec-Fetch-Site'), '') AS header_sec_fetch_site,
|
||||
|
||||
-- Anubis enrichment: combined UA+IP matching logic
-- Priority: (1) UA+IP [same rule_id] > (2) UA only > (3) IP only > (4) ASN > (5) Country
|
||||
CASE
|
||||
WHEN dictGet('mabase_prod.dict_anubis_ua', 'has_ip', _ua) = '1'
|
||||
AND dictGet('mabase_prod.dict_anubis_ua', 'bot_name', _ua) != ''
|
||||
AND dictGetOrDefault('mabase_prod.dict_anubis_ip', 'bot_name', _ip, '') != ''
|
||||
AND toUInt64OrZero(dictGet('mabase_prod.dict_anubis_ua', 'rule_id', _ua))
|
||||
= dictGetOrDefault('mabase_prod.dict_anubis_ip', 'rule_id', _ip, toUInt64(0))
|
||||
THEN dictGet('mabase_prod.dict_anubis_ua', 'bot_name', _ua)
|
||||
WHEN dictGet('mabase_prod.dict_anubis_ua', 'has_ip', _ua) = '0'
|
||||
AND dictGet('mabase_prod.dict_anubis_ua', 'bot_name', _ua) != ''
|
||||
THEN dictGet('mabase_prod.dict_anubis_ua', 'bot_name', _ua)
|
||||
WHEN dictGetOrDefault('mabase_prod.dict_anubis_ip', 'has_ua', _ip, toUInt8(0)) = 0
|
||||
AND dictGetOrDefault('mabase_prod.dict_anubis_ip', 'bot_name', _ip, '') != ''
|
||||
THEN dictGetOrDefault('mabase_prod.dict_anubis_ip', 'bot_name', _ip, '')
|
||||
WHEN dictGetOrDefault('mabase_prod.dict_anubis_asn', 'bot_name', _asn, '') != ''
|
||||
THEN dictGetOrDefault('mabase_prod.dict_anubis_asn', 'bot_name', _asn, '')
|
||||
WHEN dictGetOrDefault('mabase_prod.dict_anubis_country', 'bot_name', _cc, '') != ''
|
||||
THEN dictGetOrDefault('mabase_prod.dict_anubis_country', 'bot_name', _cc, '')
|
||||
ELSE ''
|
||||
END AS anubis_bot_name,
|
||||
|
||||
CASE
|
||||
WHEN dictGet('mabase_prod.dict_anubis_ua', 'has_ip', _ua) = '1'
|
||||
AND dictGet('mabase_prod.dict_anubis_ua', 'bot_name', _ua) != ''
|
||||
AND dictGetOrDefault('mabase_prod.dict_anubis_ip', 'bot_name', _ip, '') != ''
|
||||
AND toUInt64OrZero(dictGet('mabase_prod.dict_anubis_ua', 'rule_id', _ua))
|
||||
= dictGetOrDefault('mabase_prod.dict_anubis_ip', 'rule_id', _ip, toUInt64(0))
|
||||
THEN dictGet('mabase_prod.dict_anubis_ua', 'action', _ua)
|
||||
WHEN dictGet('mabase_prod.dict_anubis_ua', 'has_ip', _ua) = '0'
|
||||
AND dictGet('mabase_prod.dict_anubis_ua', 'bot_name', _ua) != ''
|
||||
THEN dictGet('mabase_prod.dict_anubis_ua', 'action', _ua)
|
||||
WHEN dictGetOrDefault('mabase_prod.dict_anubis_ip', 'has_ua', _ip, toUInt8(0)) = 0
|
||||
AND dictGetOrDefault('mabase_prod.dict_anubis_ip', 'bot_name', _ip, '') != ''
|
||||
THEN dictGetOrDefault('mabase_prod.dict_anubis_ip', 'action', _ip, '')
|
||||
WHEN dictGetOrDefault('mabase_prod.dict_anubis_asn', 'bot_name', _asn, '') != ''
|
||||
THEN dictGetOrDefault('mabase_prod.dict_anubis_asn', 'action', _asn, '')
|
||||
WHEN dictGetOrDefault('mabase_prod.dict_anubis_country', 'bot_name', _cc, '') != ''
|
||||
THEN dictGetOrDefault('mabase_prod.dict_anubis_country', 'action', _cc, '')
|
||||
ELSE ''
|
||||
END AS anubis_bot_action,
|
||||
|
||||
CASE
|
||||
WHEN dictGet('mabase_prod.dict_anubis_ua', 'has_ip', _ua) = '1'
|
||||
AND dictGet('mabase_prod.dict_anubis_ua', 'bot_name', _ua) != ''
|
||||
AND dictGetOrDefault('mabase_prod.dict_anubis_ip', 'bot_name', _ip, '') != ''
|
||||
AND toUInt64OrZero(dictGet('mabase_prod.dict_anubis_ua', 'rule_id', _ua))
|
||||
= dictGetOrDefault('mabase_prod.dict_anubis_ip', 'rule_id', _ip, toUInt64(0))
|
||||
THEN dictGet('mabase_prod.dict_anubis_ua', 'category', _ua)
|
||||
WHEN dictGet('mabase_prod.dict_anubis_ua', 'has_ip', _ua) = '0'
|
||||
AND dictGet('mabase_prod.dict_anubis_ua', 'bot_name', _ua) != ''
|
||||
THEN dictGet('mabase_prod.dict_anubis_ua', 'category', _ua)
|
||||
WHEN dictGetOrDefault('mabase_prod.dict_anubis_ip', 'has_ua', _ip, toUInt8(0)) = 0
|
||||
AND dictGetOrDefault('mabase_prod.dict_anubis_ip', 'bot_name', _ip, '') != ''
|
||||
THEN dictGetOrDefault('mabase_prod.dict_anubis_ip', 'category', _ip, '')
|
||||
WHEN dictGetOrDefault('mabase_prod.dict_anubis_asn', 'bot_name', _asn, '') != ''
|
||||
THEN dictGetOrDefault('mabase_prod.dict_anubis_asn', 'category', _asn, '')
|
||||
WHEN dictGetOrDefault('mabase_prod.dict_anubis_country', 'bot_name', _cc, '') != ''
|
||||
THEN dictGetOrDefault('mabase_prod.dict_anubis_country', 'category', _cc, '')
|
||||
ELSE ''
|
||||
END AS anubis_bot_category
|
||||
|
||||
FROM mabase_prod.http_logs_raw
|
||||
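The three `CASE` expressions in mv_http_logs.sql repeat one five-tier decision: UA+IP on the same rule, then UA only, then IP only, then ASN, then country. The Python sketch below restates that decision for a single request so the tiers can be checked in isolation. It is a reading aid, not the code the pipeline runs: `UaHit`, `IpHit` and `resolve_bot_name` are hypothetical names, and a `None` hit stands for a dictionary lookup that returned an empty `bot_name`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UaHit:
    bot_name: str
    rule_id: int
    has_ip: bool  # the matched UA rule also lists remote_addresses


@dataclass
class IpHit:
    bot_name: str
    rule_id: int
    has_ua: bool  # the matched IP rule also carries a UA regex


def resolve_bot_name(ua: Optional[UaHit], ip: Optional[IpHit],
                     asn_bot: str = "", country_bot: str = "") -> str:
    # (1) UA and IP both match, and both come from the same rule.
    if ua and ua.has_ip and ip and ua.rule_id == ip.rule_id:
        return ua.bot_name
    # (2) The UA rule has no IP constraint, so the UA alone is enough.
    if ua and not ua.has_ip:
        return ua.bot_name
    # (3) The IP rule has no UA constraint, so the IP alone is enough.
    if ip and not ip.has_ua:
        return ip.bot_name
    # (4) ASN rule, then (5) country rule, else no match.
    return asn_bot or country_bot


if __name__ == "__main__":
    print(resolve_bot_name(UaHit("openai-chatgpt-user", 7, True),
                           IpHit("openai-chatgpt-user", 7, True)))
```

A UA rule that also requires an IP range does not match on the UA alone: when the source address falls outside that rule's ranges, the request drops through to the IP-only, ASN and country tiers.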
183
services/bot-detector/anubis/view_ai_features_anubis.sql
Normal file
@ -0,0 +1,183 @@
|
||||
CREATE OR REPLACE VIEW mabase_prod.view_ai_features_1h AS
|
||||
WITH base_data AS (
|
||||
SELECT
|
||||
a.window_start, a.src_ip, a.ja4, a.host,
|
||||
toString(a.src_asn) AS asn_number,
|
||||
a.src_as_name AS asn_org, a.src_org AS asn_detail, a.src_domain AS asn_domain,
|
||||
a.src_country_code AS country_code,
|
||||
dictGetOrDefault('mabase_prod.dict_asn_reputation', 'label', toUInt64(a.src_asn), 'unknown') AS asn_label,
|
||||
-- Known bot via JA4/IP (existing dictionaries)
|
||||
COALESCE(
|
||||
nullIf(dictGetOrDefault('mabase_prod.dict_bot_ip', 'bot_name', a.src_ip, ''), ''),
|
||||
nullIf(dictGetOrDefault('mabase_prod.dict_bot_ja4', 'bot_name', tuple(a.ja4), ''), ''),
|
||||
''
|
||||
) AS bot_name,
|
||||
-- Anubis: combined logic, UA+IP (same rule_id) > UA only > IP only > ASN > Country
|
||||
CASE
|
||||
WHEN dictGet('mabase_prod.dict_anubis_ua', 'has_ip', a.first_ua) = '1'
|
||||
AND dictGet('mabase_prod.dict_anubis_ua', 'bot_name', a.first_ua) != ''
|
||||
AND dictGetOrDefault('mabase_prod.dict_anubis_ip', 'bot_name', a.src_ip, '') != ''
|
||||
AND toUInt64OrZero(dictGet('mabase_prod.dict_anubis_ua', 'rule_id', a.first_ua))
|
||||
= dictGetOrDefault('mabase_prod.dict_anubis_ip', 'rule_id', a.src_ip, toUInt64(0))
|
||||
THEN dictGet('mabase_prod.dict_anubis_ua', 'bot_name', a.first_ua)
|
||||
WHEN dictGet('mabase_prod.dict_anubis_ua', 'has_ip', a.first_ua) = '0'
|
||||
AND dictGet('mabase_prod.dict_anubis_ua', 'bot_name', a.first_ua) != ''
|
||||
THEN dictGet('mabase_prod.dict_anubis_ua', 'bot_name', a.first_ua)
|
||||
WHEN dictGetOrDefault('mabase_prod.dict_anubis_ip', 'has_ua', a.src_ip, toUInt8(0)) = 0
|
||||
AND dictGetOrDefault('mabase_prod.dict_anubis_ip', 'bot_name', a.src_ip, '') != ''
|
||||
THEN dictGetOrDefault('mabase_prod.dict_anubis_ip', 'bot_name', a.src_ip, '')
|
||||
WHEN dictGetOrDefault('mabase_prod.dict_anubis_asn', 'bot_name', toUInt32(a.src_asn), '') != ''
|
||||
THEN dictGetOrDefault('mabase_prod.dict_anubis_asn', 'bot_name', toUInt32(a.src_asn), '')
|
||||
WHEN dictGetOrDefault('mabase_prod.dict_anubis_country', 'bot_name', a.src_country_code, '') != ''
|
||||
THEN dictGetOrDefault('mabase_prod.dict_anubis_country', 'bot_name', a.src_country_code, '')
|
||||
ELSE ''
|
||||
END AS anubis_bot_name,
|
||||
CASE
|
||||
WHEN dictGet('mabase_prod.dict_anubis_ua', 'has_ip', a.first_ua) = '1'
|
||||
AND dictGet('mabase_prod.dict_anubis_ua', 'bot_name', a.first_ua) != ''
|
||||
AND dictGetOrDefault('mabase_prod.dict_anubis_ip', 'bot_name', a.src_ip, '') != ''
|
||||
AND toUInt64OrZero(dictGet('mabase_prod.dict_anubis_ua', 'rule_id', a.first_ua))
|
||||
= dictGetOrDefault('mabase_prod.dict_anubis_ip', 'rule_id', a.src_ip, toUInt64(0))
|
||||
THEN dictGet('mabase_prod.dict_anubis_ua', 'action', a.first_ua)
|
||||
WHEN dictGet('mabase_prod.dict_anubis_ua', 'has_ip', a.first_ua) = '0'
|
||||
AND dictGet('mabase_prod.dict_anubis_ua', 'bot_name', a.first_ua) != ''
|
||||
THEN dictGet('mabase_prod.dict_anubis_ua', 'action', a.first_ua)
|
||||
WHEN dictGetOrDefault('mabase_prod.dict_anubis_ip', 'has_ua', a.src_ip, toUInt8(0)) = 0
|
||||
AND dictGetOrDefault('mabase_prod.dict_anubis_ip', 'bot_name', a.src_ip, '') != ''
|
||||
THEN dictGetOrDefault('mabase_prod.dict_anubis_ip', 'action', a.src_ip, '')
|
||||
WHEN dictGetOrDefault('mabase_prod.dict_anubis_asn', 'bot_name', toUInt32(a.src_asn), '') != ''
|
||||
THEN dictGetOrDefault('mabase_prod.dict_anubis_asn', 'action', toUInt32(a.src_asn), '')
|
||||
WHEN dictGetOrDefault('mabase_prod.dict_anubis_country', 'bot_name', a.src_country_code, '') != ''
|
||||
THEN dictGetOrDefault('mabase_prod.dict_anubis_country', 'action', a.src_country_code, '')
|
||||
ELSE ''
|
||||
END AS anubis_bot_action,
|
||||
CASE
|
||||
WHEN dictGet('mabase_prod.dict_anubis_ua', 'has_ip', a.first_ua) = '1'
|
||||
AND dictGet('mabase_prod.dict_anubis_ua', 'bot_name', a.first_ua) != ''
|
||||
AND dictGetOrDefault('mabase_prod.dict_anubis_ip', 'bot_name', a.src_ip, '') != ''
|
||||
AND toUInt64OrZero(dictGet('mabase_prod.dict_anubis_ua', 'rule_id', a.first_ua))
|
||||
= dictGetOrDefault('mabase_prod.dict_anubis_ip', 'rule_id', a.src_ip, toUInt64(0))
|
||||
THEN dictGet('mabase_prod.dict_anubis_ua', 'category', a.first_ua)
|
||||
WHEN dictGet('mabase_prod.dict_anubis_ua', 'has_ip', a.first_ua) = '0'
|
||||
AND dictGet('mabase_prod.dict_anubis_ua', 'bot_name', a.first_ua) != ''
|
||||
THEN dictGet('mabase_prod.dict_anubis_ua', 'category', a.first_ua)
|
||||
WHEN dictGetOrDefault('mabase_prod.dict_anubis_ip', 'has_ua', a.src_ip, toUInt8(0)) = 0
|
||||
AND dictGetOrDefault('mabase_prod.dict_anubis_ip', 'bot_name', a.src_ip, '') != ''
|
||||
THEN dictGetOrDefault('mabase_prod.dict_anubis_ip', 'category', a.src_ip, '')
|
||||
WHEN dictGetOrDefault('mabase_prod.dict_anubis_asn', 'bot_name', toUInt32(a.src_asn), '') != ''
|
||||
THEN dictGetOrDefault('mabase_prod.dict_anubis_asn', 'category', toUInt32(a.src_asn), '')
|
||||
WHEN dictGetOrDefault('mabase_prod.dict_anubis_country', 'bot_name', a.src_country_code, '') != ''
|
||||
THEN dictGetOrDefault('mabase_prod.dict_anubis_country', 'category', a.src_country_code, '')
|
||||
ELSE ''
|
||||
END AS anubis_bot_category,
|
||||
a.hits AS hits,
|
||||
sum(a.hits) OVER (PARTITION BY a.src_ip) AS total_ip_hits,
|
||||
a.correlated AS correlated,
|
||||
a.tcp_jitter_variance AS tcp_jitter_variance,
|
||||
a.true_window_size AS true_window_size,
|
||||
a.window_mss_ratio AS window_mss_ratio,
|
||||
a.max_keepalives AS max_keepalives,
|
||||
h.header_order_hash AS header_order_hash, h.header_count AS header_count,
|
||||
h.has_accept_language AS has_accept_language, h.has_cookie AS has_cookie,
|
||||
h.has_referer AS has_referer, h.modern_browser_score AS modern_browser_score,
|
||||
h.ua_ch_mismatch AS ua_ch_mismatch,
|
||||
(a.count_post / (a.hits + 1)) AS post_ratio,
|
||||
(a.uniq_query_params / (a.uniq_paths + 1)) AS fuzzing_index,
|
||||
(a.hits / (dateDiff('second', a.first_seen, a.last_seen) + 1)) AS hit_velocity,
|
||||
(a.unique_src_ports / (a.hits + 1)) AS port_exhaustion_ratio,
|
||||
(a.orphan_count / (a.hits + 1)) AS orphan_ratio,
|
||||
(a.ip_id_zero_count / (a.hits + 1)) AS ip_id_zero_ratio,
|
||||
(a.hits / (a.unique_conn_id + 1)) AS multiplexing_efficiency,
|
||||
IF(a.mss_1460_count > (a.hits * 0.8) AND h.modern_browser_score > 70, 1, 0) AS mss_mobile_mismatch,
|
||||
a.request_size_variance AS request_size_variance,
|
||||
IF(a.tls_alpn = 'h2' AND a.http_version != '2', 1, 0) AS alpn_http_mismatch,
|
||||
IF(length(a.tls_alpn) = 0 OR a.tls_alpn = '00', 1, 0) AS is_alpn_missing,
|
||||
IF(length(a.tls_sni) > 0 AND a.tls_sni != a.host, 1, 0) AS sni_host_mismatch,
|
||||
IF(h.sec_fetch_mode = 'navigate' AND h.sec_fetch_dest != 'document', 1, 0) AS is_fake_navigation,
|
||||
count() OVER (PARTITION BY a.tcp_fingerprint) AS tcp_shared_count,
|
||||
count() OVER (PARTITION BY h.header_order_hash) AS header_order_shared_count,
|
||||
(a.count_assets / (a.hits + 1)) AS asset_ratio,
|
||||
(a.count_no_referer / (a.hits + 1)) AS direct_access_ratio,
|
||||
IF(a.unique_ua > 2, 1, 0) AS is_ua_rotating,
|
||||
uniqExact(a.ja4) OVER (PARTITION BY a.src_ip) AS distinct_ja4_count,
|
||||
((a.hits / (a.unique_src_ports + 1)) / (dateDiff('second', a.first_seen, a.last_seen) + 1)) AS src_port_density,
|
||||
(sum(a.hits) OVER (PARTITION BY a.ja4, a.src_asn) / (sum(a.hits) OVER (PARTITION BY a.ja4) + 1)) AS ja4_asn_concentration,
|
||||
(sum(a.hits) OVER (PARTITION BY a.ja4, a.src_country_code) / (sum(a.hits) OVER (PARTITION BY a.ja4) + 1)) AS ja4_country_concentration,
|
||||
IF(sum(a.hits) OVER (PARTITION BY a.ja4) < 100, 1, 0) AS is_rare_ja4,
|
||||
(count() OVER (PARTITION BY h.header_order_hash, a.first_ua) / (count() OVER (PARTITION BY a.first_ua) + 1)) AS header_order_confidence,
|
||||
uniqExact(h.header_order_hash) OVER (PARTITION BY a.src_ip) AS distinct_header_orders,
|
||||
(a.uniq_paths / (a.hits + 1)) AS path_diversity_ratio,
|
||||
a.url_depth_variance AS url_depth_variance,
|
||||
(a.count_anomalous_payload / (a.hits + 1)) AS anomalous_payload_ratio,
|
||||
a.uniq_ja3_val AS uniq_ja3_per_row,
|
||||
sqrt(a.tcp_jitter_variance) / greatest(a.avg_syn_ms_val, 1) AS syn_timing_cv,
|
||||
a.tls12_count / (a.hits + 1) AS tls12_ratio,
|
||||
a.count_head / (a.hits + 1) AS head_ratio,
|
||||
a.count_no_sec_fetch / (a.hits + 1) AS sec_fetch_absence_rate,
|
||||
a.count_generic_accept / (a.hits + 1) AS generic_accept_ratio,
|
||||
a.count_http10 / (a.hits + 1) AS http10_ratio,
|
||||
a.ip_df_variance AS ip_df_variance,
|
||||
    -- New TTL features (OS fingerprint, L4 → 'Complet' model)
|
||||
a.avg_ttl_val AS avg_ttl,
|
||||
sqrt(a.ttl_variance_val) AS ttl_std,
|
||||
IF(a.count_correlated_val > 0, a.count_no_wscale_val / a.count_correlated_val, 0) AS no_window_scale_ratio,
|
||||
    -- New HTTP features (available to both models)
|
||||
a.count_no_accept_enc_val / (a.hits + 1) AS missing_accept_enc_ratio,
|
||||
a.count_http_scheme_val / (a.hits + 1) AS http_scheme_ratio
|
||||
FROM (
|
||||
SELECT
|
||||
window_start, src_ip, ja4, host, src_asn,
|
||||
any(src_country_code) AS src_country_code, any(src_as_name) AS src_as_name,
|
||||
any(src_org) AS src_org, any(src_domain) AS src_domain, any(first_ua) AS first_ua,
|
||||
sum(hits) AS hits, uniqMerge(uniq_paths) AS uniq_paths,
|
||||
uniqMerge(uniq_query_params) AS uniq_query_params, sum(count_post) AS count_post,
|
||||
min(first_seen) AS first_seen, max(last_seen) AS last_seen,
|
||||
any(tcp_fp_raw) AS tcp_fingerprint, varPopMerge(tcp_jitter_variance) AS tcp_jitter_variance,
|
||||
varPopMerge(total_ip_length_var) AS request_size_variance,
|
||||
any(tcp_win_raw * exp2(tcp_scale_raw)) AS true_window_size,
|
||||
IF(any(tcp_mss_raw) > 0, any(tcp_win_raw) / any(tcp_mss_raw), 0) AS window_mss_ratio,
|
||||
any(http_ver_raw) AS http_version, any(tls_alpn_raw) AS tls_alpn, any(tls_sni_raw) AS tls_sni,
|
||||
max(correlated_raw) AS correlated, uniqMerge(unique_src_ports) AS unique_src_ports,
|
||||
uniqMerge(unique_conn_id) AS unique_conn_id, max(max_keepalives) AS max_keepalives,
|
||||
sum(orphan_count) AS orphan_count, sum(ip_id_zero_count) AS ip_id_zero_count,
|
||||
sum(mss_1460_count) AS mss_1460_count,
|
||||
sum(count_assets) AS count_assets, sum(count_no_referer) AS count_no_referer,
|
||||
uniqMerge(uniq_ua) AS unique_ua,
|
||||
varPopMerge(url_depth_variance) AS url_depth_variance,
|
||||
sum(count_anomalous_payload) AS count_anomalous_payload,
|
||||
uniqMerge(uniq_ja3) AS uniq_ja3_val,
|
||||
avgMerge(avg_syn_ms) AS avg_syn_ms_val,
|
||||
sum(tls12_count) AS tls12_count,
|
||||
sum(count_head) AS count_head,
|
||||
sum(count_no_sec_fetch) AS count_no_sec_fetch,
|
||||
sum(count_generic_accept) AS count_generic_accept,
|
||||
sum(count_http10) AS count_http10,
|
||||
varPopMerge(ip_df_var) AS ip_df_variance,
|
||||
        -- New features: TTL fingerprint (L4) + HTTP
|
||||
avgIfMerge(avg_ttl) AS avg_ttl_val,
|
||||
varPopIfMerge(ttl_var) AS ttl_variance_val,
|
||||
sum(count_no_wscale) AS count_no_wscale_val,
|
||||
sum(count_correlated) AS count_correlated_val,
|
||||
sum(count_no_accept_enc) AS count_no_accept_enc_val,
|
||||
sum(count_http_scheme) AS count_http_scheme_val
|
||||
FROM mabase_prod.agg_host_ip_ja4_1h
|
||||
WHERE window_start >= now() - INTERVAL 24 HOUR
|
||||
GROUP BY window_start, src_ip, ja4, host, src_asn
|
||||
) a
|
||||
LEFT JOIN (
|
||||
SELECT
|
||||
window_start, src_ip, any(header_order_hash) AS header_order_hash,
|
||||
max(header_count) AS header_count, max(has_accept_language) AS has_accept_language,
|
||||
max(has_cookie) AS has_cookie, max(has_referer) AS has_referer,
|
||||
max(modern_browser_score) AS modern_browser_score, max(ua_ch_mismatch) AS ua_ch_mismatch,
|
||||
any(sec_fetch_mode) AS sec_fetch_mode, any(sec_fetch_dest) AS sec_fetch_dest
|
||||
FROM mabase_prod.agg_header_fingerprint_1h
|
||||
WHERE window_start >= now() - INTERVAL 24 HOUR
|
||||
GROUP BY window_start, src_ip
|
||||
) h ON a.src_ip = h.src_ip AND a.window_start = h.window_start
|
||||
)
|
||||
SELECT
|
||||
*,
|
||||
-(sum((hits / (total_ip_hits + 1)) * log2((hits / (total_ip_hits + 1)) + 0.000001)) OVER (PARTITION BY src_ip)) AS temporal_entropy,
|
||||
sum(uniq_ja3_per_row) OVER (PARTITION BY src_ip) / greatest(distinct_ja4_count, 1) AS ja3_diversity_ratio
|
||||
FROM base_data;
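The `temporal_entropy` column above is a smoothed Shannon entropy over the rows that share a `src_ip`. As a rough illustration only, the same arithmetic can be reproduced outside ClickHouse; the function name and the sample hit counts below are made up for the example.

```python
import math


def temporal_entropy(row_hits):
    """Mirror of the SQL expression: -sum(p * log2(p + 1e-6)) with p = hits / (total + 1)."""
    total = sum(row_hits)
    entropy = 0.0
    for hits in row_hits:
        share = hits / (total + 1)  # same +1 smoothing as the view
        entropy -= share * math.log2(share + 0.000001)
    return entropy


# Hypothetical hit counts for one IP, spread over 1, 2 and 4 rows.
concentrated = temporal_entropy([20])
split_in_two = temporal_entropy([10, 10])
split_in_four = temporal_entropy([5, 5, 5, 5])
print(concentrated, split_in_two, split_in_four)
```

Traffic concentrated in a single row yields a value near 0, while traffic spread evenly over more rows yields a higher value.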
|
||||
15
services/bot-detector/bot_detector/Dockerfile
Normal file
@ -0,0 +1,15 @@
|
||||
FROM python:3.11-slim
|
||||
ENV PYTHONDONTWRITEBYTECODE=1
|
||||
ENV PYTHONUNBUFFERED=1
|
||||
WORKDIR /app
|
||||
|
||||
# Install shared package first
|
||||
COPY shared/python/ja4_common/ /app/shared/ja4_common/
|
||||
RUN pip install --no-cache-dir /app/shared/ja4_common/
|
||||
|
||||
COPY services/bot-detector/bot_detector/requirements.txt .
|
||||
RUN pip install --no-cache-dir -r requirements.txt
|
||||
|
||||
COPY services/bot-detector/bot_detector/bot_detector.py .
|
||||
|
||||
CMD ["python", "bot_detector.py"]
|
||||
10
services/bot-detector/bot_detector/Dockerfile.tests
Normal file
@ -0,0 +1,10 @@
|
||||
FROM python:3.11-slim
|
||||
WORKDIR /app
|
||||
COPY shared/python/ja4_common/ /app/shared/ja4_common/
|
||||
RUN pip install --no-cache-dir /app/shared/ja4_common/
|
||||
COPY services/bot-detector/bot_detector/requirements.txt .
|
||||
RUN pip install --no-cache-dir -r requirements.txt
|
||||
RUN pip install --no-cache-dir pytest pytest-mock
|
||||
COPY services/bot-detector/bot_detector/ /app/bot_detector/
|
||||
WORKDIR /app
|
||||
CMD ["pytest", "bot_detector/tests/", "-v"]
|
||||
906
services/bot-detector/bot_detector/bot_detector.py
Normal file
@ -0,0 +1,906 @@
|
||||
import time
|
||||
import os
|
||||
import json
|
||||
import glob
|
||||
import signal
|
||||
import sys
|
||||
import logging
|
||||
import threading
|
||||
import joblib
|
||||
import pandas as pd
|
||||
import numpy as np
|
||||
import clickhouse_connect
|
||||
from logging.handlers import RotatingFileHandler
|
||||
from http.server import HTTPServer, BaseHTTPRequestHandler
|
||||
from sklearn.ensemble import IsolationForest
|
||||
from sklearn.cluster import DBSCAN
|
||||
from sklearn.preprocessing import StandardScaler
|
||||
import warnings
|
||||
from datetime import datetime
|
||||
|
||||
try:
|
||||
import shap as _shap
|
||||
SHAP_AVAILABLE = True
|
||||
except ImportError:
|
||||
SHAP_AVAILABLE = False
|
||||
|
||||
warnings.filterwarnings('ignore')
|
||||
|
||||
# ═══════════════════════════════════════════════════════════════════════════════
|
||||
# CONFIGURATION
|
||||
# ═══════════════════════════════════════════════════════════════════════════════
|
||||
def _require_float(name, default, lo=None, hi=None):
|
||||
raw = os.getenv(name, str(default))
|
||||
try:
|
||||
v = float(raw)
|
||||
except ValueError:
|
||||
raise SystemExit(f"[CONFIG] {name}={raw!r} invalide — doit être un nombre décimal.")
|
||||
    if lo is not None and hi is not None and not (lo < v < hi):
|
||||
raise SystemExit(f"[CONFIG] {name}={v} hors plage ({lo} < valeur < {hi}).")
|
||||
return v
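A minimal standalone sketch of the bounded parsing above, for experimenting outside the service. The variable name `DEMO_CONTAMINATION` is hypothetical, and unlike the helper in this file the sketch checks each bound independently, so passing only `lo` or only `hi` is safe.

```python
import os


def require_float(name, default, lo=None, hi=None):
    """Illustrative copy of the env parsing logic; bounds are exclusive and optional."""
    raw = os.getenv(name, str(default))
    try:
        value = float(raw)
    except ValueError:
        raise SystemExit(f"[CONFIG] {name}={raw!r} is not a decimal number.")
    if lo is not None and value <= lo:
        raise SystemExit(f"[CONFIG] {name}={value} must be greater than {lo}.")
    if hi is not None and value >= hi:
        raise SystemExit(f"[CONFIG] {name}={value} must be less than {hi}.")
    return value


# DEMO_CONTAMINATION is a hypothetical variable used only for this example.
os.environ["DEMO_CONTAMINATION"] = "0.01"
contamination = require_float("DEMO_CONTAMINATION", 0.001, lo=0, hi=0.5)
print(contamination)
```

Raising `SystemExit` at import time makes a misconfigured container fail fast instead of training with a silently wrong contamination rate.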
|
||||
|
||||
# ClickHouse database name.
# Note: interpolated into SQL queries via f-strings (e.g. f'SELECT * FROM {DB}.view_ai_features_1h').
# The value comes only from controlled environment variables (docker-compose, K8s, etc.)
# and is never exposed to user input, so the SQL injection risk is considered negligible.
|
||||
DB = os.getenv('CLICKHOUSE_DB', 'mabase_prod')
|
||||
|
||||
CONTAMINATION = _require_float('ISOLATION_CONTAMINATION', 0.001, 0, 0.5)
|
||||
ANOMALY_THRESHOLD = _require_float('ANOMALY_THRESHOLD', -0.05)
|
||||
LOG_FILE = os.getenv('BOT_DETECTOR_LOG', '/var/log/bot_detector/decisions.jsonl')
|
||||
LOG_BACKUP_COUNT = int(os.getenv('LOG_BACKUP_COUNT', '7'))
|
||||
MODEL_DIR = os.getenv('MODEL_DIR', '/var/lib/bot_detector')
|
||||
RETRAIN_INTERVAL_H = int(os.getenv('RETRAIN_INTERVAL_HOURS', '24'))
|
||||
MODEL_HISTORY_COUNT = int(os.getenv('MODEL_HISTORY_COUNT', '10'))
|
||||
MAX_FAILURES = int(os.getenv('MAX_CONSECUTIVE_FAILURES', '3'))
|
||||
HEALTH_PORT = int(os.getenv('HEALTH_PORT', '8080'))
|
||||
CYCLE_INTERVAL = int(os.getenv('CYCLE_INTERVAL_SEC', '300'))
|
||||
|
||||
# ── Improvements A1 / A2 / A3 / A4 / A5 / A6 / A7 / A8 / A10 ──────────────
|
||||
# A1: Concept drift
|
||||
DRIFT_THRESHOLD = _require_float('DRIFT_THRESHOLD', 0.30, 0, 1)
|
||||
|
||||
# A2: Adaptive threshold
|
||||
ANOMALY_PERCENTILE = int(os.getenv('ANOMALY_PERCENTILE', '5'))
|
||||
|
||||
# A3: Multi-window analysis
|
||||
ENABLE_MULTIWINDOW = os.getenv('ENABLE_MULTIWINDOW', 'false').lower() == 'true'
|
||||
MULTIWINDOW_VIEW = os.getenv('MULTIWINDOW_VIEW', 'view_ai_features_24h')
|
||||
|
||||
# A4: SHAP explainability
|
||||
ENABLE_SHAP = SHAP_AVAILABLE and os.getenv('ENABLE_SHAP', 'true').lower() == 'true'
|
||||
|
||||
# A5: Cross-cycle deduplication with TTL
|
||||
DEDUP_TTL_MIN = int(os.getenv('DEDUP_TTL_MIN', '60'))
|
||||
|
||||
# A6: Recurrence weighting
|
||||
RECURRENCE_WEIGHT = _require_float('RECURRENCE_WEIGHT', 0.005)
|
||||
|
||||
# A7: Feature completeness validation
|
||||
MIN_VALID_FEATURE_RATIO = _require_float('MIN_VALID_FEATURE_RATIO', 0.50, 0, 1)
|
||||
|
||||
# A8: Behavioral clustering of anomalies
|
||||
ENABLE_CLUSTERING = os.getenv('ENABLE_CLUSTERING', 'true').lower() == 'true'
|
||||
CLUSTERING_MIN_SAMPLES = int(os.getenv('CLUSTERING_MIN_SAMPLES', '3'))
|
||||
|
||||
# Features that are structurally unavailable per model (no L4 data for uncorrelated traffic).
# These features do not raise "pipeline" warnings; their absence is by design.
|
||||
STRUCTURAL_EXCLUDED_FEATURES: dict[str, list] = {
|
||||
'Complet': ['orphan_ratio'],
|
||||
'Applicatif': ['orphan_ratio', 'is_rare_ja4', 'tcp_shared_count',
|
||||
'request_size_variance', 'mss_mobile_mismatch',
|
||||
                   # B features (TLS/TCP): unavailable for uncorrelated traffic
|
||||
'ja3_diversity_ratio', 'syn_timing_cv', 'tls12_ratio', 'ip_df_variance',
|
||||
                   # L4 only: TTL and window scale are unavailable without TCP capture
|
||||
'avg_ttl', 'ttl_std', 'no_window_scale_ratio'],
|
||||
}
|
||||
|
||||
TRAINING_HISTORY_FILE = os.path.join(MODEL_DIR, 'training_history.jsonl')
|
||||
|
||||
# ═══════════════════════════════════════════════════════════════════════════════
|
||||
# LOGGING
|
||||
# ═══════════════════════════════════════════════════════════════════════════════
|
||||
os.makedirs(os.path.dirname(LOG_FILE), exist_ok=True)
|
||||
os.makedirs(MODEL_DIR, exist_ok=True)
|
||||
|
||||
logger = logging.getLogger('bot_detector')
|
||||
logger.setLevel(logging.DEBUG)
|
||||
|
||||
_console_handler = logging.StreamHandler()
|
||||
_console_handler.setFormatter(logging.Formatter('[%(asctime)s] %(message)s', '%Y-%m-%d %H:%M:%S'))
|
||||
logger.addHandler(_console_handler)
|
||||
|
||||
_file_handler = RotatingFileHandler(
|
||||
LOG_FILE, maxBytes=50 * 1024 * 1024, backupCount=LOG_BACKUP_COUNT, encoding='utf-8'
|
||||
)
|
||||
_file_handler.setFormatter(logging.Formatter('%(message)s'))
|
||||
logger.addHandler(_file_handler)
|
||||
|
||||
# Short wrapper that keeps logging calls uniform (avoids importing logger everywhere).
|
||||
def log_info(message: str):
|
||||
logger.info(message)
|
||||
|
||||
def log_decision(event: str, cycle_id: str, model: str = '', row: dict = None):
|
||||
entry = {
|
||||
'ts': datetime.now().strftime('%Y-%m-%dT%H:%M:%S'),
|
||||
'cycle_id': cycle_id,
|
||||
'event': event,
|
||||
'model': model,
|
||||
'contamination': CONTAMINATION,
|
||||
'threshold': ANOMALY_THRESHOLD,
|
||||
}
|
||||
if row:
|
||||
entry.update(row)
|
||||
_file_handler.stream.write(json.dumps(entry, ensure_ascii=False, default=str) + '\n')
|
||||
_file_handler.stream.flush()
|
||||
|
||||
def _append_training_history(entry: dict):
|
||||
with open(TRAINING_HISTORY_FILE, 'a', encoding='utf-8') as f:
|
||||
f.write(json.dumps(entry, ensure_ascii=False, default=str) + '\n')
|
||||
|
||||
# ═══════════════════════════════════════════════════════════════════════════════
|
||||
# GRACEFUL SHUTDOWN AND HEALTH CHECK
|
||||
# ═══════════════════════════════════════════════════════════════════════════════
|
||||
def _shutdown(sig, frame):
|
||||
log_info(f"Signal {sig} reçu — arrêt propre.")
|
||||
log_decision('SERVICE_STOP', 'shutdown', '', {'signal': sig})
|
||||
sys.exit(0)
|
||||
|
||||
signal.signal(signal.SIGTERM, _shutdown)
|
||||
signal.signal(signal.SIGINT, _shutdown)
|
||||
|
||||
_service_healthy = True
|
||||
class _HealthHandler(BaseHTTPRequestHandler):
|
||||
def do_GET(self):
|
||||
code = 200 if _service_healthy else 503
|
||||
self.send_response(code)
|
||||
self.end_headers()
|
||||
self.wfile.write(b'OK' if _service_healthy else b'DEGRADED')
|
||||
def log_message(self, *args): pass
|
||||
|
||||
threading.Thread(
|
||||
target=lambda: HTTPServer(('', HEALTH_PORT), _HealthHandler).serve_forever(),
|
||||
daemon=True
|
||||
).start()
|
||||
|
||||
# ═══════════════════════════════════════════════════════════════════════════════
|
||||
# CLICKHOUSE CONNECTION: delegated to the ja4_common shared client
|
||||
# ═══════════════════════════════════════════════════════════════════════════════
|
||||
from ja4_common.clickhouse import get_client as _ja4_get_client
|
||||
|
||||
def get_client():
|
||||
"""Return the shared ja4_common ClickHouse client, reconnecting on ping failure."""
|
||||
return _ja4_get_client().connect()
|
||||
|
||||
def score_to_threat_level(score: float) -> str:
|
||||
    # Thresholds: CRITICAL < -0.30 | HIGH < -0.15 | MEDIUM < -0.05 | LOW < 0 | NORMAL ≥ 0
|
||||
if score < -0.30: return 'CRITICAL'
|
||||
if score < -0.15: return 'HIGH'
|
||||
if score < -0.05: return 'MEDIUM'
|
||||
if score < 0: return 'LOW'
|
||||
return 'NORMAL'
|
||||
|
||||
# ═══════════════════════════════════════════════════════════════════════════════
|
||||
# MODEL MANAGEMENT
|
||||
# ═══════════════════════════════════════════════════════════════════════════════
|
||||
def _current_pointer_path(name: str) -> str:
|
||||
return os.path.join(MODEL_DIR, f'model_{name}.current')
|
||||
|
||||
def _get_current_version(name: str):
|
||||
pointer = _current_pointer_path(name)
|
||||
if not os.path.exists(pointer): return None, None
|
||||
with open(pointer) as f: version_id = f.read().strip()
|
||||
model_path = os.path.join(MODEL_DIR, f'model_{name}_{version_id}.joblib')
|
||||
meta_path = os.path.join(MODEL_DIR, f'model_{name}_{version_id}.meta.json')
|
||||
if not os.path.exists(model_path) or not os.path.exists(meta_path): return None, None
|
||||
with open(meta_path) as f: meta = json.load(f)
|
||||
return model_path, meta
|
||||
|
||||
def _purge_old_versions(name: str):
|
||||
pattern = os.path.join(MODEL_DIR, f'model_{name}_*.joblib')
|
||||
versions = sorted(glob.glob(pattern))
|
||||
to_delete = versions[:-MODEL_HISTORY_COUNT] if len(versions) > MODEL_HISTORY_COUNT else []
|
||||
for joblib_path in to_delete:
|
||||
version_id = os.path.basename(joblib_path).replace(f'model_{name}_', '').replace('.joblib', '')
|
||||
meta_path = os.path.join(MODEL_DIR, f'model_{name}_{version_id}.meta.json')
|
||||
os.remove(joblib_path)
|
||||
if os.path.exists(meta_path): os.remove(meta_path)
|
||||
log_info(f"[{name}] Version purgée : {version_id} (limite={MODEL_HISTORY_COUNT})")
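The purge above relies on version IDs formatted as `%Y%m%d_%H%M%S`, so a plain lexical sort is also a chronological sort. A small sketch of that selection step, with a hypothetical helper name and made-up file names, and without touching the filesystem:

```python
def versions_to_purge(paths, keep):
    """Return the oldest entries beyond `keep`, assuming timestamp-sortable names."""
    ordered = sorted(paths)  # lexical order == chronological order for these IDs
    return ordered[:-keep] if len(ordered) > keep else []


# Made-up artifact names following the model_{name}_{version_id}.joblib pattern.
paths = [
    "model_Complet_20250103_120000.joblib",
    "model_Complet_20250101_120000.joblib",
    "model_Complet_20250102_120000.joblib",
]
purged = versions_to_purge(paths, keep=2)
print(purged)
```

One caveat of the slicing idiom: with `keep=0` the expression `ordered[:-0]` is an empty list, so nothing is purged.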
|
||||
|
||||
def load_or_train_model(name: str, human_baseline: pd.DataFrame, features: list, cycle_id: str):
|
||||
model_path, meta = _get_current_version(name)
|
||||
if model_path and meta:
|
||||
trained_at = datetime.fromisoformat(meta['trained_at'])
|
||||
age_h = (datetime.now() - trained_at).total_seconds() / 3600
|
||||
age_ok = age_h < RETRAIN_INTERVAL_H
|
||||
|
||||
        # A1: concept drift. Compare the current distribution with the one seen at training time.
|
||||
drift_score = 0.0
|
||||
drift_forced = False
|
||||
if age_ok and 'baseline_stats' in meta:
|
||||
drift_score = _compute_drift_score(meta['baseline_stats'], human_baseline, features)
|
||||
if drift_score >= DRIFT_THRESHOLD:
|
||||
drift_forced = True
|
||||
log_info(f"[{name}] Dérive détectée ({drift_score:.0%} features) — retraining forcé.")
|
||||
log_decision('DRIFT_DETECTED', cycle_id, name, {
|
||||
'version_id': meta['version_id'], 'drift_score': round(drift_score, 3),
|
||||
'drift_threshold': DRIFT_THRESHOLD, 'model_age_hours': round(age_h, 2)
|
||||
})
|
||||
|
||||
if age_ok and not drift_forced:
|
||||
log_info(f"[{name}] Modèle v{meta['version_id']} valide ({age_h:.1f}h / {RETRAIN_INTERVAL_H}h, drift={drift_score:.0%}) — réutilisation.")
|
||||
log_decision('MODEL_LOADED', cycle_id, name, {
|
||||
'version_id': meta['version_id'], 'model_age_hours': round(age_h, 2),
|
||||
'trained_at': meta['trained_at'], 'human_samples': meta.get('human_samples', '?'),
|
||||
'retrain_in_hours': round(RETRAIN_INTERVAL_H - age_h, 1), 'drift_score': round(drift_score, 3)
|
||||
})
|
||||
return joblib.load(model_path)
|
||||
elif not drift_forced:
|
||||
log_info(f"[{name}] Modèle v{meta['version_id']} expiré ({age_h:.1f}h ≥ {RETRAIN_INTERVAL_H}h) — retraining.")
|
||||
|
||||
version_id = datetime.now().strftime('%Y%m%d_%H%M%S')
|
||||
log_info(f"[{name}] Entraînement version {version_id} sur {len(human_baseline)} sessions humaines... (contamination={CONTAMINATION})")
|
||||
|
||||
X = human_baseline[features].replace([np.inf, -np.inf], np.nan).fillna(0)
|
||||
model = IsolationForest(n_estimators=300, contamination=CONTAMINATION, random_state=42, n_jobs=-1)
|
||||
model.fit(X)
|
||||
|
||||
    # A1: save the baseline distribution statistics for future drift detection
|
||||
baseline_stats = {
|
||||
f: {'mean': float(X[f].mean()), 'std': float(X[f].std()), 'p25': float(X[f].quantile(0.25)), 'p75': float(X[f].quantile(0.75))}
|
||||
for f in features
|
||||
}
|
||||
|
||||
new_model_path = os.path.join(MODEL_DIR, f'model_{name}_{version_id}.joblib')
|
||||
new_meta_path = os.path.join(MODEL_DIR, f'model_{name}_{version_id}.meta.json')
|
||||
joblib.dump(model, new_model_path)
|
||||
|
||||
previous_version = meta.get('version_id', None) if meta else None
|
||||
new_meta = {
|
||||
'version_id': version_id, 'trained_at': datetime.now().isoformat(),
|
||||
'human_samples': len(human_baseline), 'contamination': CONTAMINATION,
|
||||
'threshold': ANOMALY_THRESHOLD, 'features': features,
|
||||
'model_name': name, 'previous_version': previous_version,
|
||||
'retrain_interval': RETRAIN_INTERVAL_H, 'baseline_stats': baseline_stats
|
||||
}
|
||||
with open(new_meta_path, 'w') as f: json.dump(new_meta, f, indent=2)
|
||||
with open(_current_pointer_path(name), 'w') as f: f.write(version_id)
|
||||
|
||||
_append_training_history({k: v for k, v in new_meta.items() if k != 'baseline_stats'})
|
||||
_purge_old_versions(name)
|
||||
|
||||
log_info(f"[{name}] Modèle v{version_id} sauvegardé → {new_model_path}")
|
||||
log_decision('MODEL_TRAINED', cycle_id, name, {
|
||||
'version_id': version_id, 'previous_version': previous_version,
|
||||
'human_samples': len(human_baseline), 'next_retrain_in_h': RETRAIN_INTERVAL_H,
|
||||
'history_kept': MODEL_HISTORY_COUNT
|
||||
})
|
||||
return model
|
||||
|
||||
# ═══════════════════════════════════════════════════════════════════════════════
|
||||
# A1: CONCEPT DRIFT DETECTION
|
||||
# ═══════════════════════════════════════════════════════════════════════════════
|
||||
def _compute_drift_score(baseline_stats: dict, current_baseline: pd.DataFrame, features: list) -> float:
|
||||
"""
|
||||
    Compares the current human-baseline distribution with the one used at training time.
    For each feature, computes the z-score of the current mean against the training mean
    and standard deviation. Returns the fraction of features that have drifted.
    A value >= DRIFT_THRESHOLD triggers a forced retraining.
|
||||
"""
|
||||
if not baseline_stats or current_baseline.empty:
|
||||
return 0.0
|
||||
drifted = 0
|
||||
tested = 0
|
||||
for feat in features:
|
||||
if feat not in baseline_stats or feat not in current_baseline.columns:
|
||||
continue
|
||||
stats = baseline_stats[feat]
|
||||
curr_mean = current_baseline[feat].mean()
|
||||
trained_std = stats.get('std', 0)
|
||||
if trained_std < 1e-9:
|
||||
continue
|
||||
        # Z-score: gap between the current mean and the training mean
|
||||
z = abs(curr_mean - stats['mean']) / trained_std
|
||||
        # z > 2 indicates a significant drift of the distribution
|
||||
if z > 2.0:
|
||||
drifted += 1
|
||||
tested += 1
|
||||
return drifted / max(tested, 1)
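To make the drift score concrete, here is a small standalone sketch of the same per-feature z-score check. The function name `drift_fraction`, the feature statistics and the sample values are all invented for the example.

```python
import pandas as pd


def drift_fraction(baseline_stats, current, features, z_limit=2.0):
    """Fraction of testable features whose current mean moved more than z_limit stds."""
    drifted = 0
    tested = 0
    for feat in features:
        if feat not in baseline_stats or feat not in current.columns:
            continue
        std = baseline_stats[feat].get("std", 0)
        if std < 1e-9:
            continue  # constant at training time: no meaningful z-score
        z = abs(current[feat].mean() - baseline_stats[feat]["mean"]) / std
        if z > z_limit:
            drifted += 1
        tested += 1
    return drifted / max(tested, 1)


# Hypothetical training statistics and a current baseline sample.
stats = {
    "hit_velocity": {"mean": 1.0, "std": 0.5},
    "post_ratio": {"mean": 0.10, "std": 0.05},
}
current = pd.DataFrame({
    "hit_velocity": [3.0, 3.2, 2.8],   # mean 3.0, i.e. 4 stds above training
    "post_ratio": [0.10, 0.11, 0.09],  # mean unchanged
})
score = drift_fraction(stats, current, ["hit_velocity", "post_ratio"])
print(score)
```

With the default `DRIFT_THRESHOLD` of 0.30, one drifted feature out of two would be enough to force a retrain.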
|
||||
|
||||
|
||||
# Per-model cache holding the last known state of invalid features.
# It suppresses repetitive logs: a warning is logged only if the state changed since the previous cycle.
|
||||
_feature_warning_cache: dict = {}
|
||||
|
||||
|
||||
# ═══════════════════════════════════════════════════════════════════════════════
|
||||
# A7: FEATURE COMPLETENESS VALIDATION
|
||||
# ═══════════════════════════════════════════════════════════════════════════════
|
||||
def validate_features(df: pd.DataFrame, features: list, name: str, cycle_id: str):
|
||||
"""
|
||||
    Checks that the features are present and non-constant in the DataFrame.
    Invalid features are categorized as:
      - structural : absent by design for this model (see STRUCTURAL_EXCLUDED_FEATURES)
      - zero       : column always 0, which points to a pipeline problem
      - unique     : column with a single non-zero value, i.e. a non-discriminating global aggregate
      - missing    : column absent from the DataFrame
    Returns the list of valid features, or None if too many features are invalid.
    Warnings are logged only when the state has changed since the previous cycle
    (tracked in _feature_warning_cache), to avoid flooding the logs on every cycle.
|
||||
"""
|
||||
structural = STRUCTURAL_EXCLUDED_FEATURES.get(name, [])
|
||||
    # Exclude structural features up front (no pipeline warning)
|
||||
active_features = [f for f in features if f not in structural]
|
||||
|
||||
missing = [f for f in active_features if f not in df.columns]
|
||||
present = [f for f in active_features if f in df.columns]
|
||||
|
||||
zero_val = [f for f in present if df[f].nunique() == 1 and df[f].max() == 0]
|
||||
unique_val = [f for f in present if df[f].nunique() == 1 and df[f].max() != 0]
|
||||
constant = zero_val + unique_val
|
||||
valid = [f for f in present if f not in constant]
|
||||
|
||||
current_state = (frozenset(missing), frozenset(zero_val), frozenset(unique_val))
|
||||
state_changed = _feature_warning_cache.get(name) != current_state
|
||||
_feature_warning_cache[name] = current_state
|
||||
|
||||
if structural:
|
||||
log_info(f"[{name}] Features exclues (structurelles / L4 indisponible) : {structural}")
|
||||
    # Log warnings only when the state has changed (new problem or resolution)
|
||||
if state_changed:
|
||||
if missing:
|
||||
log_info(f"[{name}] Features absentes du schéma : {missing}")
|
||||
if zero_val:
|
||||
log_info(f"[{name}] Features à 0 (pipeline non-alimenté) : {zero_val}")
|
||||
if unique_val:
|
||||
log_info(f"[{name}] Features non-discriminantes (agrégat global) : {unique_val}")
|
||||
if missing or zero_val or unique_val:
|
||||
log_decision('FEATURE_WARNING', cycle_id, name, {
|
||||
'structural': structural, 'missing': missing,
|
||||
'zero': zero_val, 'unique_nonzero': unique_val,
|
||||
'valid_count': len(valid), 'total': len(active_features)
|
||||
})
|
||||
|
||||
ratio = len(valid) / max(len(active_features), 1)
|
||||
if ratio < MIN_VALID_FEATURE_RATIO:
|
||||
log_info(f"[{name}] Ratio features valides insuffisant ({ratio:.0%} < {MIN_VALID_FEATURE_RATIO:.0%}) — cycle ignoré.")
|
||||
log_decision('SKIPPED_INVALID_FEATURES', cycle_id, name, {
|
||||
'valid_ratio': round(ratio, 3), 'threshold': MIN_VALID_FEATURE_RATIO
|
||||
})
|
||||
return None
|
||||
return valid
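The categorization performed by `validate_features` can be tried on a toy DataFrame. The sketch below leaves out logging, caching and the structural exclusions; `classify_features` and the sample data are hypothetical.

```python
import pandas as pd


def classify_features(df, features):
    """Split features into missing, all-zero, single-valued and valid groups."""
    missing = [f for f in features if f not in df.columns]
    present = [f for f in features if f in df.columns]
    zero = [f for f in present if df[f].nunique() == 1 and df[f].max() == 0]
    unique = [f for f in present if df[f].nunique() == 1 and df[f].max() != 0]
    constant = set(zero + unique)
    valid = [f for f in present if f not in constant]
    return {"missing": missing, "zero": zero, "unique": unique, "valid": valid}


# Made-up sample: one informative column, one all-zero, one constant non-zero.
df = pd.DataFrame({
    "hit_velocity": [0.2, 3.5, 1.1],
    "orphan_ratio": [0.0, 0.0, 0.0],
    "tcp_shared_count": [7, 7, 7],
})
wanted = ["hit_velocity", "orphan_ratio", "tcp_shared_count", "avg_ttl"]
report = classify_features(df, wanted)
valid_ratio = len(report["valid"]) / max(len(wanted), 1)
print(report, valid_ratio)
```

Here only one of four requested features is usable, so with the default `MIN_VALID_FEATURE_RATIO` of 0.50 the cycle would be skipped.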
|
||||
|
||||
|
||||
# ═══════════════════════════════════════════════════════════════════════════════
|
||||
# A2 / A10: ADAPTIVE THRESHOLD AND SCORE NORMALIZATION
|
||||
# ═══════════════════════════════════════════════════════════════════════════════
|
||||
def compute_adaptive_threshold(scores: np.ndarray) -> float:
|
||||
"""
|
||||
    A2: Computes an adaptive threshold from the ANOMALY_PERCENTILE-th percentile of the negative scores.
    Returns the lower (stricter) of the adaptive threshold and the configured static threshold.
|
||||
"""
|
||||
neg_scores = scores[scores < 0]
|
||||
if len(neg_scores) == 0:
|
||||
return ANOMALY_THRESHOLD
|
||||
adaptive = float(np.percentile(neg_scores, ANOMALY_PERCENTILE))
|
||||
return min(adaptive, ANOMALY_THRESHOLD)
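A short sketch of how the adaptive threshold behaves on a handful of scores. The defaults mirror `ANOMALY_PERCENTILE=5` and `ANOMALY_THRESHOLD=-0.05`; the function name and the score values are invented for illustration.

```python
import numpy as np


def adaptive_threshold(scores, percentile=5, static_threshold=-0.05):
    """Stricter of the static threshold and the given percentile of negative scores."""
    negative = scores[scores < 0]
    if len(negative) == 0:
        return static_threshold
    return min(float(np.percentile(negative, percentile)), static_threshold)


# Made-up Isolation Forest scores: mostly mild, one strongly negative outlier.
scores = np.array([0.12, 0.05, -0.01, -0.02, -0.03, -0.40])
threshold = adaptive_threshold(scores)
flagged = scores[scores < threshold]
print(threshold, flagged)
```

Because `min` is taken, the adaptive value can only tighten the static threshold, never relax it.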
|
||||
|
||||
|
||||
def normalize_scores(scores: np.ndarray) -> np.ndarray:
|
||||
"""
|
||||
    A10: Normalizes negative scores into [−1, 0] so that different models can be compared.
    Positive scores (normal traffic) are left unchanged.

    Caution: the formula maps the MOST negative score (most anomalous) to 0
    and the LEAST negative score (least anomalous) to −1.
    This counter-intuitive result is intentional: anomaly_score is only indicative
    in the result tables. Actual decisions rely on raw_anomaly_score.
|
||||
"""
|
||||
result = scores.copy()
|
||||
mask = scores < 0
|
||||
if mask.sum() == 0:
|
||||
return result
|
||||
s_min, s_max = scores[mask].min(), scores[mask].max()
|
||||
if s_min == s_max:
|
||||
return result
|
||||
result[mask] = (scores[mask] - s_min) / (s_max - s_min + 1e-9) * -1
|
||||
return result
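The inverted mapping described in the docstring is easier to see on a worked example. The sketch repeats the same arithmetic in a standalone function and applies it to three made-up scores.

```python
import numpy as np


def normalize_scores(scores):
    """Same min-max arithmetic as above, applied only to the negative scores."""
    result = scores.copy()
    mask = scores < 0
    if mask.sum() == 0:
        return result
    s_min, s_max = scores[mask].min(), scores[mask].max()
    if s_min == s_max:
        return result
    result[mask] = (scores[mask] - s_min) / (s_max - s_min + 1e-9) * -1
    return result


# Made-up raw scores: strongly anomalous, mildly anomalous, normal.
raw = np.array([-0.40, -0.10, 0.20])
normalized = normalize_scores(raw)
print(normalized)
```

The most anomalous raw score (−0.40) lands at about 0 and the mildest negative one (−0.10) at about −1, which is why decisions should keep using `raw_anomaly_score`.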
|
||||
|
||||
|
||||
# ═══════════════════════════════════════════════════════════════════════════════
|
||||
# A4: SHAP EXPLAINABILITY
|
||||
# ═══════════════════════════════════════════════════════════════════════════════
|
||||
def _compute_shap_top_features(model, X: pd.DataFrame, features: list, n_top: int = 5) -> list:
|
||||
"""
|
||||
    Computes SHAP values for every row of X and returns the n_top most contributing
    features (most negative SHAP value = most responsible for the anomaly).
    Returns a list with one {feature: shap_value} dict per row.
|
||||
"""
|
||||
if not ENABLE_SHAP or X.empty:
|
||||
return [{}] * len(X)
|
||||
try:
|
||||
explainer = _shap.TreeExplainer(model)
|
||||
shap_values = explainer.shap_values(X)
|
||||
result = []
|
||||
for sv in shap_values:
|
||||
            # Most negative features = most responsible for the anomaly
|
||||
pairs = sorted(zip(features, sv), key=lambda x: x[1])
|
||||
result.append({f: round(float(v), 4) for f, v in pairs[:n_top]})
|
||||
return result
|
||||
except Exception as e:
|
||||
log_info(f"[SHAP] Erreur de calcul SHAP: {e}")
|
||||
return [{}] * len(X)
|
||||
|
||||
|
||||
def _build_reason(name: str, row: pd.Series, shap_top: dict) -> str:
    """Build the reason field, enriched with the top SHAP features or the key metrics."""
    # Display the raw score (easier to interpret than the normalized score)
    score = round(float(row.get('raw_anomaly_score', row.get('anomaly_score', 0))), 3)
    threat = row.get('threat_level', '')
    if shap_top:
        top_str = ' | '.join(f"{f}({v:+.3f})" for f, v in shap_top.items())
        return f"[{name}] Score: {score} | SHAP: {top_str} | Threat: {threat}"
    vel = round(float(row.get('hit_velocity', 0)), 1)
    fuzz = round(float(row.get('fuzzing_index', 0)), 1)
    return f"[{name}] Score: {score} | Vel: {vel} req/s | Fuzzing: {fuzz} | Threat: {threat}"
|
||||
|
||||
|
||||
# ═══════════════════════════════════════════════════════════════════════════════
# A8 — BEHAVIORAL CLUSTERING OF ANOMALIES (DBSCAN)
# ═══════════════════════════════════════════════════════════════════════════════
def _cluster_anomalies(anomalies: pd.DataFrame, features: list) -> pd.DataFrame:
    """
    A8: Run DBSCAN on the standardized features of the anomalies.
    Adds a campaign_id column: -1 = isolated IP, >= 0 = coordinated campaign identifier.
    """
    anomalies = anomalies.copy()
    if len(anomalies) < CLUSTERING_MIN_SAMPLES:
        anomalies['campaign_id'] = -1
        return anomalies
    try:
        X = anomalies[features].replace([np.inf, -np.inf], np.nan).fillna(0)
        X_scaled = StandardScaler().fit_transform(X)
        labels = DBSCAN(eps=0.5, min_samples=CLUSTERING_MIN_SAMPLES).fit_predict(X_scaled)
        anomalies['campaign_id'] = labels
        n_campaigns = len(set(labels)) - (1 if -1 in labels else 0)
        if n_campaigns > 0:
            log_info(f"[DBSCAN] {n_campaigns} campagne(s) détectée(s) parmi {len(anomalies)} anomalies.")
    except Exception as e:
        log_info(f"[DBSCAN] Erreur de clustering: {e}")
        anomalies['campaign_id'] = -1
    return anomalies
|
||||
|
||||
|
||||
# ═══════════════════════════════════════════════════════════════════════════════
# SEMI-SUPERVISED ANALYSIS
# ═══════════════════════════════════════════════════════════════════════════════
def run_semi_supervised_logic(df, features, name, cycle_id, recurrence_map):
    # ── Three-way traffic split based on bot_name and Anubis ────────────────
    # 1. Known bots (dict_bot_ip / dict_bot_ja4) → excluded from IF scoring
    known_bots = df[df['bot_name'] != ''].copy()
    rest = df[df['bot_name'] == ''].copy()

    # 2. Anubis ALLOW bots → legitimate bots, excluded from IF scoring
    anubis_allow = rest[rest['anubis_bot_action'] == 'ALLOW'].copy()

    # 3. Everything else goes through the IsolationForest (IF) to get a real score:
    #    - DENY: threats identified by Anubis rules → the IF provides the severity score
    #    - WEIGH / unknown → scored normally (anubis_is_flagged=1 for WEIGH)
    #    DENY rows are ALWAYS included in the threats, regardless of the IF threshold.
    unknown_traffic = rest[rest['anubis_bot_action'] != 'ALLOW'].copy()
    human_baseline = unknown_traffic[unknown_traffic['asn_label'] == 'human']
|
||||
|
||||
# A7 — Valider les features avant tout traitement
|
||||
valid_features = validate_features(df, features, name, cycle_id)
|
||||
if valid_features is None:
|
||||
return pd.DataFrame(), pd.DataFrame()
|
||||
|
||||
if len(human_baseline) < 500:
|
||||
log_info(f"[{name}] Données humaines insuffisantes ({len(human_baseline)} < 500).")
|
||||
log_decision('SKIPPED_LOW_DATA', cycle_id, name, {
|
||||
'human_count': len(human_baseline), 'unknown_count': len(unknown_traffic)
|
||||
})
|
||||
return pd.DataFrame(), pd.DataFrame()
|
||||
|
||||
# A1 — Dérive conceptuelle intégrée dans load_or_train_model
|
||||
model = load_or_train_model(name, human_baseline, valid_features, cycle_id)
|
||||
unknown_traffic = unknown_traffic.copy()
|
||||
|
||||
X_test = unknown_traffic[valid_features].replace([np.inf, -np.inf], np.nan).fillna(0)
|
||||
raw_scores = model.decision_function(X_test)
|
||||
|
||||
# raw_anomaly_score : score brut IF pour comparaison au seuil et assignation du threat_level
|
||||
# anomaly_score : score normalisé [-1, 0] pour cohérence cross-modèles (A10)
|
||||
unknown_traffic['raw_anomaly_score'] = raw_scores
|
||||
unknown_traffic['anomaly_score'] = normalize_scores(raw_scores)
|
||||
unknown_traffic['model_name'] = name
|
||||
|
||||
# A2 — Seuil adaptatif calculé sur les scores BRUTS (même échelle que ANOMALY_THRESHOLD)
|
||||
effective_threshold = compute_adaptive_threshold(raw_scores)
|
||||
log_info(f"[{name}] Seuil effectif : {effective_threshold:.4f} (statique={ANOMALY_THRESHOLD}, percentile={ANOMALY_PERCENTILE})")
|
||||
|
||||
# A6 — Pénaliser les IPs récurrentes sur le score BRUT avant comparaison au seuil
|
||||
if RECURRENCE_WEIGHT > 0:
|
||||
recurrences = unknown_traffic['src_ip'].map(recurrence_map).fillna(0)
|
||||
penalty = np.log1p(recurrences.values) * RECURRENCE_WEIGHT
|
||||
unknown_traffic['raw_anomaly_score'] = unknown_traffic['raw_anomaly_score'] - penalty
|
||||
|
||||
# Assigner threat_level à TOUTES les sessions scorées (pour ml_all_scores)
|
||||
unknown_traffic['threat_level'] = unknown_traffic['raw_anomaly_score'].apply(score_to_threat_level)
|
||||
unknown_traffic['recurrence'] = unknown_traffic['src_ip'].map(recurrence_map).fillna(0).astype(int) + 1
|
||||
unknown_traffic['campaign_id'] = -1
|
||||
|
||||
# Extraire les DENY (maintenant avec leur vrai score IF) et forcer leur threat_level
|
||||
deny_mask = unknown_traffic['anubis_bot_action'] == 'DENY'
|
||||
unknown_traffic.loc[deny_mask, 'threat_level'] = 'ANUBIS_DENY'
|
||||
|
||||
# Capturer toutes les sessions scorées (avant filtrage par seuil) — pour ml_all_scores
|
||||
all_scored = unknown_traffic.copy()
|
||||
|
||||
if not known_bots.empty:
|
||||
known_bots = known_bots.copy()
|
||||
known_bots['anomaly_score'] = 0.0
|
||||
known_bots['raw_anomaly_score'] = 0.0
|
||||
known_bots['threat_level'] = 'KNOWN_BOT'
|
||||
known_bots['model_name'] = name
|
||||
known_bots['campaign_id'] = -1
|
||||
known_bots['reason'] = '[Identification] Bot légitime: ' + known_bots['bot_name']
|
||||
known_bots['recurrence'] = known_bots['src_ip'].map(recurrence_map).fillna(0).astype(int) + 1
|
||||
for _, row in known_bots.iterrows():
|
||||
log_decision('KNOWN_BOT', cycle_id, name, {
|
||||
'src_ip': row.get('src_ip', ''), 'bot_name': row.get('bot_name', ''),
|
||||
'asn_number': row.get('asn_number', ''), 'asn_org': row.get('asn_org', ''),
|
||||
'asn_domain': row.get('asn_domain', ''), 'country_code': row.get('country_code', ''),
|
||||
'recurrence': int(row.get('recurrence', 1))
|
||||
})
|
||||
|
||||
# ── Anubis ALLOW : bots légitimes identifiés par règles Anubis ───────────
|
||||
if not anubis_allow.empty:
|
||||
anubis_allow = anubis_allow.copy()
|
||||
anubis_allow['anomaly_score'] = 0.0
|
||||
anubis_allow['raw_anomaly_score'] = 0.0
|
||||
anubis_allow['threat_level'] = 'KNOWN_BOT'
|
||||
anubis_allow['model_name'] = name
|
||||
anubis_allow['campaign_id'] = -1
|
||||
anubis_allow['reason'] = '[Anubis ALLOW] ' + anubis_allow['anubis_bot_name']
|
||||
anubis_allow['recurrence'] = anubis_allow['src_ip'].map(recurrence_map).fillna(0).astype(int) + 1
|
||||
for _, row in anubis_allow.iterrows():
|
||||
log_decision('KNOWN_BOT', cycle_id, name, {
|
||||
'src_ip': row.get('src_ip', ''), 'bot_name': row.get('anubis_bot_name', ''),
|
||||
'anubis_bot_name': row.get('anubis_bot_name', ''),
|
||||
'anubis_bot_action': row.get('anubis_bot_action', ''),
|
||||
'anubis_bot_category': row.get('anubis_bot_category', ''),
|
||||
'asn_number': row.get('asn_number', ''), 'asn_org': row.get('asn_org', ''),
|
||||
'asn_domain': row.get('asn_domain', ''), 'country_code': row.get('country_code', ''),
|
||||
'recurrence': int(row.get('recurrence', 1)),
|
||||
})
|
||||
|
||||
# ── Anubis DENY : scorés par IF, toujours inclus dans les threats ────────
|
||||
# Extraits de unknown_traffic après scoring — ils ont leur vrai score IF.
|
||||
anubis_deny = unknown_traffic[deny_mask].copy()
|
||||
if not anubis_deny.empty:
|
||||
anubis_deny['reason'] = '[Anubis DENY] ' + anubis_deny['anubis_bot_name'].fillna('') + \
|
||||
' | ' + anubis_deny['raw_anomaly_score'].apply(lambda s: f'IF={s:.4f}')
|
||||
log_info(f"[{name}] Anubis DENY: {len(anubis_deny)} IP(s) scorées par IF "
|
||||
f"(score moyen: {anubis_deny['raw_anomaly_score'].mean():.4f}).")
|
||||
for _, row in anubis_deny.iterrows():
|
||||
log_decision('ANUBIS_DENY', cycle_id, name, {
|
||||
'src_ip': row.get('src_ip', ''), 'anubis_bot_name': row.get('anubis_bot_name', ''),
|
||||
'anubis_bot_action': row.get('anubis_bot_action', ''),
|
||||
'anubis_bot_category': row.get('anubis_bot_category', ''),
|
||||
'anomaly_score': round(float(row.get('anomaly_score', 0)), 4),
|
||||
'raw_anomaly_score': round(float(row.get('raw_anomaly_score', 0)), 4),
|
||||
'asn_number': row.get('asn_number', ''), 'asn_org': row.get('asn_org', ''),
|
||||
'asn_domain': row.get('asn_domain', ''), 'country_code': row.get('country_code', ''),
|
||||
'recurrence': int(row.get('recurrence', 1)),
|
||||
})
|
||||
|
||||
# Filtrer sur raw_anomaly_score (A6 inclus) — seulement le trafic non-DENY
|
||||
# Les DENY sont toujours des threats, indépendamment du seuil IF
|
||||
non_deny_traffic = unknown_traffic[~deny_mask]
|
||||
anomalies = non_deny_traffic[non_deny_traffic['raw_anomaly_score'] < effective_threshold].copy()
|
||||
if not anomalies.empty:
|
||||
log_info(f"[{name}] ALERT: {len(anomalies)} anomalies détectées (seuil={effective_threshold:.4f}).")
|
||||
anomalies['recurrence'] = anomalies['src_ip'].map(recurrence_map).fillna(0).astype(int) + 1
|
||||
|
||||
# A4 — Explainabilité SHAP : top features responsables de chaque anomalie
|
||||
X_anomalies = X_test.loc[anomalies.index]
|
||||
shap_tops = _compute_shap_top_features(model, X_anomalies, valid_features)
|
||||
anomalies['reason'] = [
|
||||
_build_reason(name, row, shap)
|
||||
for (_, row), shap in zip(anomalies.iterrows(), shap_tops)
|
||||
]
|
||||
|
||||
# A8 — Clustering DBSCAN pour identifier les campagnes coordonnées
|
||||
if ENABLE_CLUSTERING:
|
||||
anomalies = _cluster_anomalies(anomalies, valid_features)
|
||||
|
||||
anomalies['ja4'] = anomalies['ja4'].replace({'': 'HTTP_CLEAR_TEXT'})
|
||||
for _, row in anomalies.iterrows():
|
||||
log_decision('ANOMALY', cycle_id, name, {
|
||||
'src_ip': row.get('src_ip', ''), 'anomaly_score': round(float(row.get('anomaly_score', 0)), 4),
|
||||
'raw_anomaly_score': round(float(row.get('raw_anomaly_score', 0)), 4),
|
||||
'threat_level': row.get('threat_level', ''), 'recurrence': int(row.get('recurrence', 1)),
|
||||
'hit_velocity': round(float(row.get('hit_velocity', 0)), 2),
|
||||
'fuzzing_index': round(float(row.get('fuzzing_index', 0)), 2),
|
||||
'post_ratio': round(float(row.get('post_ratio', 0)), 3),
|
||||
'asn_number': row.get('asn_number', ''), 'asn_org': row.get('asn_org', ''),
|
||||
'asn_detail': row.get('asn_detail', ''), 'asn_domain': row.get('asn_domain', ''),
|
||||
'country_code': row.get('country_code', ''), 'asn_label': row.get('asn_label', ''),
|
||||
'ja4': row.get('ja4', ''), 'host': row.get('host', ''),
|
||||
'correlated': int(row.get('correlated', 0)), 'campaign_id': int(row.get('campaign_id', -1)),
|
||||
'effective_threshold': round(effective_threshold, 4), 'reason': row.get('reason', '')
|
||||
})
|
||||
|
||||
    # Keep only non-empty frames. pd.concat raises ValueError on an empty list,
    # which would otherwise happen on a cycle with no anomalies and no known bots.
    parts = [part for part in (anomalies, known_bots, anubis_allow, anubis_deny) if not part.empty]
    threats = pd.concat(parts, ignore_index=True) if parts else pd.DataFrame()
|
||||
|
||||
# Inclure anubis_allow dans all_scored pour traçabilité dans ml_all_scores.
|
||||
# Ces IPs sont exclues de l'analyse IF mais doivent apparaître dans la table
|
||||
# de scores avec threat_level='KNOWN_BOT' et anomaly_score=0.0.
|
||||
if not anubis_allow.empty:
|
||||
all_scored = pd.concat([all_scored, anubis_allow], ignore_index=True)
|
||||
|
||||
return threats, all_scored
|
||||
|
||||
# ═══════════════════════════════════════════════════════════════════════════════
# A5 — CROSS-CYCLE DEDUPLICATION WITH TTL
# ═══════════════════════════════════════════════════════════════════════════════
def _filter_recent_detections(client, all_anom: pd.DataFrame) -> pd.DataFrame:
    """
    A5: Filter out IPs already inserted into ml_detected_anomalies within the last DEDUP_TTL_MIN minutes.
    Exception: an IP is re-inserted if its new score is more than 0.05 lower (the anomaly got worse).
    """
    if DEDUP_TTL_MIN <= 0 or all_anom.empty:
        return all_anom
    try:
        recent_df = client.query_df(
            f"SELECT src_ip, min(anomaly_score) AS best_score "
            f"FROM {DB}.ml_detected_anomalies "
            f"WHERE detected_at > now() - INTERVAL {DEDUP_TTL_MIN} MINUTE "
            f"GROUP BY src_ip"
        )
        if recent_df.empty:
            return all_anom
        recent_map = dict(zip(recent_df['src_ip'], recent_df['best_score']))

        def _should_insert(row):
            prev = recent_map.get(row['src_ip'])
            if prev is None:
                return True
            # Re-insert only if the raw score got significantly worse.
            # Note: prev is read from the stored anomaly_score column, while the candidate
            # prefers raw_anomaly_score, so the two values may be on different scales.
            return float(row.get('raw_anomaly_score', row['anomaly_score'])) < float(prev) - 0.05

        mask = all_anom.apply(_should_insert, axis=1)
        filtered = all_anom[mask]
        skipped = len(all_anom) - len(filtered)
        if skipped > 0:
            log_info(f"[Dedup TTL={DEDUP_TTL_MIN}min] {skipped} IP(s) filtrée(s) (déjà détectées récemment).")
        return filtered
    except Exception as e:
        log_info(f"[Dedup] Erreur lors de la déduplication TTL : {e}")
        return all_anom
|
||||
|
||||
|
||||
# ═══════════════════════════════════════════════════════════════════════════════
# A3 — MULTI-WINDOW ANALYSIS: SHARED PREPROCESSING
# ═══════════════════════════════════════════════════════════════════════════════
def _preprocess_df(df: pd.DataFrame) -> pd.DataFrame:
    """Normalize column names and fill missing values (shared by the 1h and 24h windows)."""
    df.columns = [c.split('.')[-1] for c in df.columns]
    for col in ['src_ip', 'ja4', 'host', 'bot_name', 'anubis_bot_name', 'anubis_bot_action', 'anubis_bot_category',
                'asn_number', 'asn_org', 'asn_detail', 'asn_domain', 'country_code', 'asn_label']:
        if col in df.columns:
            df[col] = df[col].fillna('').astype(str)
    df.fillna(0, inplace=True)

    # ── Numeric features derived from Anubis labels (for the IsolationForest) ──
    # anubis_is_flagged: 1 if Anubis tagged the traffic with an action other than
    #   ALLOW or DENY (e.g. WEIGH/CHALLENGE)
    #   → moderate suspicion signal passed to the IF (ALLOW rows bypass the IF,
    #     DENY rows are always reported as threats, so neither sets this flag)
    df['anubis_is_flagged'] = (
        (df.get('anubis_bot_name', pd.Series('', index=df.index)) != '') &
        (~df.get('anubis_bot_action', pd.Series('', index=df.index)).isin(['ALLOW', 'DENY', '']))
    ).astype(int)

    return df
|
||||
|
||||
|
||||
# ═══════════════════════════════════════════════════════════════════════════════
|
||||
# CYCLE PRINCIPAL
|
||||
# ═══════════════════════════════════════════════════════════════════════════════
|
||||
_consecutive_failures = 0
|
||||
def fetch_and_analyze():
|
||||
global _service_healthy, _consecutive_failures
|
||||
cycle_id = datetime.now().strftime('%Y%m%d_%H%M%S')
|
||||
log_info('=== Lancement cycle IA ===')
|
||||
|
||||
client = get_client()
|
||||
|
||||
# ── Récupération du trafic (fenêtre 1h) ──────────────────────────────────
|
||||
try:
|
||||
df = client.query_df(f'SELECT * FROM {DB}.view_ai_features_1h')
|
||||
except Exception as e:
|
||||
log_info(f'ERREUR REQUETE: {e}')
|
||||
_consecutive_failures += 1
|
||||
if _consecutive_failures >= MAX_FAILURES:
|
||||
_service_healthy = False
|
||||
log_decision('CONSECUTIVE_FAILURES', cycle_id, '', {'count': _consecutive_failures, 'error': str(e)})
|
||||
return
|
||||
|
||||
_consecutive_failures = 0
|
||||
_service_healthy = True
|
||||
|
||||
if df is None or df.empty:
|
||||
log_info('Aucun trafic trouvé.')
|
||||
return
|
||||
|
||||
df = _preprocess_df(df)
|
||||
|
||||
log_decision('CYCLE_START', cycle_id, '', {
|
||||
'total_rows': len(df),
|
||||
'human_rows': int((df.get('asn_label', pd.Series()) == 'human').sum()),
|
||||
'known_bot_rows': int((df.get('bot_name', pd.Series()) != '').sum()),
|
||||
'correlated_rows': int((df.get('correlated', pd.Series()) == 1).sum()),
|
||||
'anubis_allow_rows': int((df.get('anubis_bot_action', pd.Series()) == 'ALLOW').sum()),
|
||||
'anubis_deny_rows': int((df.get('anubis_bot_action', pd.Series()) == 'DENY').sum()),
|
||||
'anubis_weigh_rows': int((df.get('anubis_bot_action', pd.Series()) == 'WEIGH').sum()),
|
||||
'multiwindow': ENABLE_MULTIWINDOW,
|
||||
})
|
||||
|
||||
try:
|
||||
rec_df = client.query_df(f'SELECT src_ip, recurrence FROM {DB}.view_ip_recurrence')
|
||||
recurrence_map = dict(zip(rec_df['src_ip'], rec_df['recurrence']))
|
||||
except Exception:
|
||||
recurrence_map = {}
|
||||
|
||||
# ── Features par modèle (voir DOCUMENTATION.md §4) ───────────────────────
|
||||
# Features communes aux deux modèles (L7 HTTP pur, disponibles correlated=0 et 1)
|
||||
feats = [
|
||||
'hits', 'hit_velocity', 'fuzzing_index', 'post_ratio', 'port_exhaustion_ratio',
|
||||
'orphan_ratio', 'max_keepalives', 'tcp_shared_count', 'header_order_shared_count',
|
||||
'header_count', 'has_accept_language', 'has_cookie', 'has_referer',
|
||||
'modern_browser_score', 'ua_ch_mismatch', 'ip_id_zero_ratio',
|
||||
'request_size_variance', 'multiplexing_efficiency', 'mss_mobile_mismatch',
|
||||
'asset_ratio', 'direct_access_ratio', 'is_ua_rotating', 'distinct_ja4_count',
|
||||
'src_port_density', 'ja4_asn_concentration', 'ja4_country_concentration', 'is_rare_ja4',
|
||||
'header_order_confidence', 'distinct_header_orders', 'temporal_entropy',
|
||||
'path_diversity_ratio', 'url_depth_variance', 'anomalous_payload_ratio',
|
||||
# B4-B7 : features L7 pures (disponibles correlated=0 et 1)
|
||||
'head_ratio', 'sec_fetch_absence_rate', 'generic_accept_ratio', 'http10_ratio',
|
||||
# Anubis : signal de suspicion modéré (WEIGH/CHALLENGE) — bypass pour ALLOW/DENY
|
||||
'anubis_is_flagged',
|
||||
# HTTP : header incomplet et usage HTTP plain (disponibles pour les deux modèles)
|
||||
'missing_accept_enc_ratio', 'http_scheme_ratio',
|
||||
]
|
||||
# Features supplémentaires pour le modèle Complet (nécessitent des données TCP/TLS)
|
||||
feats_complet = feats + [
|
||||
'tcp_jitter_variance', 'alpn_http_mismatch', 'is_alpn_missing', 'sni_host_mismatch',
|
||||
# B1-B3, B8 : features TLS/TCP (disponibles correlated=1 uniquement)
|
||||
'ja3_diversity_ratio', 'syn_timing_cv', 'tls12_ratio', 'ip_df_variance',
|
||||
# TTL fingerprinting OS + TCP window scale (L4 uniquement)
|
||||
'avg_ttl', 'ttl_std', 'no_window_scale_ratio',
|
||||
]
|
||||
|
||||
# ── Analyse fenêtre 1h ────────────────────────────────────────────────────
|
||||
anom_a, scored_a = run_semi_supervised_logic(df[df['correlated'] == 1].copy(), feats_complet, 'Complet', cycle_id, recurrence_map)
|
||||
anom_b, scored_b = run_semi_supervised_logic(df[df['correlated'] == 0].copy(), feats, 'Applicatif', cycle_id, recurrence_map)
|
||||
all_anom = pd.concat([anom_a, anom_b], ignore_index=True)
|
||||
all_scored = pd.concat([scored_a, scored_b], ignore_index=True)
|
||||
|
||||
# ── A3 : Analyse fenêtre 24h (optionnelle) ────────────────────────────────
|
||||
if ENABLE_MULTIWINDOW:
|
||||
try:
|
||||
df_24h = client.query_df(f'SELECT * FROM {DB}.{MULTIWINDOW_VIEW}')
|
||||
if df_24h is not None and not df_24h.empty:
|
||||
df_24h = _preprocess_df(df_24h)
|
||||
log_info(f"[24h] {len(df_24h)} sessions dans la fenêtre 24h.")
|
||||
anom_c, scored_c = run_semi_supervised_logic(df_24h[df_24h['correlated'] == 1].copy(), feats_complet, 'Complet_24h', cycle_id, recurrence_map)
|
||||
anom_d, scored_d = run_semi_supervised_logic(df_24h[df_24h['correlated'] == 0].copy(), feats, 'Applicatif_24h', cycle_id, recurrence_map)
|
||||
all_anom_24h = pd.concat([anom_c, anom_d], ignore_index=True)
|
||||
all_scored_24h = pd.concat([scored_c, scored_d], ignore_index=True)
|
||||
# Fusion : pour les IPs présentes dans les deux fenêtres, conserver le score le plus bas
|
||||
if not all_anom_24h.empty:
|
||||
all_anom = pd.concat([all_anom, all_anom_24h], ignore_index=True)
|
||||
log_info(f"[24h] Fusion 1h+24h : {len(all_anom)} entrées avant déduplication.")
|
||||
all_scored = pd.concat([all_scored, all_scored_24h], ignore_index=True)
|
||||
else:
|
||||
log_info(f"[24h] Vue {MULTIWINDOW_VIEW} vide — analyse mono-fenêtre.")
|
||||
except Exception as e:
|
||||
log_info(f"[24h] Vue {MULTIWINDOW_VIEW} inaccessible : {e} — analyse mono-fenêtre.")
|
||||
|
||||
# ── Insertion de toutes les classifications dans ml_all_scores ───────────
|
||||
if not all_scored.empty:
|
||||
try:
|
||||
now = datetime.now().replace(microsecond=0)
|
||||
all_scored['detected_at'] = now
|
||||
all_scored['ja4'] = all_scored['ja4'].replace({'': 'HTTP_CLEAR_TEXT'})
|
||||
all_scores_cols = [
|
||||
'detected_at', 'window_start', 'src_ip', 'ja4', 'host', 'bot_name',
|
||||
'anubis_bot_name', 'anubis_bot_action', 'anubis_bot_category',
|
||||
'anomaly_score', 'raw_anomaly_score', 'threat_level', 'model_name',
|
||||
'correlated', 'asn_number', 'asn_org', 'country_code', 'asn_label',
|
||||
'hits', 'hit_velocity', 'fuzzing_index', 'post_ratio', 'campaign_id'
|
||||
]
|
||||
scores_df = all_scored[[c for c in all_scores_cols if c in all_scored.columns]]
|
||||
client.insert_df(f'{DB}.ml_all_scores', scores_df)
|
||||
log_info(f'[ml_all_scores] {len(scores_df)} sessions scorées enregistrées.')
|
||||
except Exception as e:
|
||||
log_info(f'[ml_all_scores] ERREUR INSERTION: {e}')
|
||||
|
||||
if not all_anom.empty:
|
||||
all_anom = all_anom.sort_values('raw_anomaly_score', ascending=True).drop_duplicates(subset=['src_ip'], keep='first')
|
||||
log_info(f'Après déduplication intra-cycle : {len(all_anom)} IP uniques.')
|
||||
|
||||
# A5 — Déduplication inter-cycles avec TTL
|
||||
all_anom = _filter_recent_detections(client, all_anom)
|
||||
|
||||
if all_anom.empty:
|
||||
log_info('Toutes les anomalies filtrées par déduplication TTL.')
|
||||
log_decision('CYCLE_END', cycle_id, '', {'inserted': 0, 'anomalies': 0, 'known_bots': 0, 'critical': 0, 'high': 0, 'dedup_ttl_min': DEDUP_TTL_MIN})
|
||||
return
|
||||
|
||||
all_anom['detected_at'] = datetime.now().replace(microsecond=0)
|
||||
fake_nav_col = 'is_fake_navigation'
|
||||
all_anom['is_headless'] = all_anom[fake_nav_col].astype(int) if fake_nav_col in all_anom.columns else 0
|
||||
|
||||
cols = [
|
||||
'detected_at', 'src_ip', 'ja4', 'host', 'bot_name', 'anomaly_score',
|
||||
'threat_level', 'model_name', 'recurrence',
|
||||
'asn_number', 'asn_org', 'asn_detail', 'asn_domain', 'country_code', 'asn_label',
|
||||
'hits', 'hit_velocity', 'fuzzing_index', 'post_ratio', 'port_exhaustion_ratio', 'max_keepalives', 'orphan_ratio',
|
||||
'tcp_jitter_variance', 'tcp_shared_count', 'true_window_size', 'window_mss_ratio',
|
||||
'alpn_http_mismatch', 'is_alpn_missing', 'sni_host_mismatch',
|
||||
'header_count', 'has_accept_language', 'has_cookie', 'has_referer',
|
||||
'modern_browser_score', 'is_headless', 'ua_ch_mismatch',
|
||||
'header_order_shared_count', 'ip_id_zero_ratio', 'request_size_variance',
|
||||
'multiplexing_efficiency', 'mss_mobile_mismatch',
|
||||
'correlated', 'reason', 'asset_ratio', 'direct_access_ratio', 'is_ua_rotating',
|
||||
'distinct_ja4_count', 'src_port_density', 'ja4_asn_concentration',
|
||||
'ja4_country_concentration', 'is_rare_ja4',
|
||||
'header_order_confidence', 'distinct_header_orders', 'temporal_entropy',
|
||||
'path_diversity_ratio', 'url_depth_variance', 'anomalous_payload_ratio',
|
||||
'anubis_bot_name', 'anubis_bot_action', 'anubis_bot_category',
|
||||
]
|
||||
|
||||
try:
|
||||
final_df = all_anom[[c for c in cols if c in all_anom.columns]]
|
||||
client.insert_df(f'{DB}.ml_detected_anomalies', final_df)
|
||||
log_info(f'Succès: {len(final_df)} menaces enregistrées.')
|
||||
log_decision('CYCLE_END', cycle_id, '', {
|
||||
'inserted': len(final_df),
|
||||
'anomalies': int((final_df.get('bot_name', pd.Series()) == '').sum()),
|
||||
'known_bots': int((final_df.get('bot_name', pd.Series()) != '').sum()),
|
||||
'critical': int((final_df.get('threat_level', pd.Series()) == 'CRITICAL').sum()),
|
||||
'high': int((final_df.get('threat_level', pd.Series()) == 'HIGH').sum()),
|
||||
'dedup_ttl_min': DEDUP_TTL_MIN,
|
||||
})
|
||||
except Exception as e:
|
||||
log_info(f'ERREUR INSERTION: {e}')
|
||||
else:
|
||||
log_info('Aucune menace détectée.')
|
||||
log_decision('CYCLE_END', cycle_id, '', {'inserted': 0, 'anomalies': 0, 'known_bots': 0, 'critical': 0, 'high': 0, 'dedup_ttl_min': DEDUP_TTL_MIN})
|
||||
|
||||
if __name__ == '__main__':
|
||||
log_info('*' * 65)
|
||||
log_info(' DÉMARRAGE DU SERVICE BOT DETECTOR IA v12 (+ Anubis)')
|
||||
log_info(f' DB : {DB}')
|
||||
log_info(f' Contamination : {CONTAMINATION}')
|
||||
log_info(f' Seuil anomalie : {ANOMALY_THRESHOLD} (adaptatif percentile={ANOMALY_PERCENTILE})')
|
||||
log_info(f' Cycle : {CYCLE_INTERVAL}s | Fenêtre 1h | Multi-fenêtres : {ENABLE_MULTIWINDOW}')
|
||||
log_info(f' Retraining : toutes les {RETRAIN_INTERVAL_H}h | Drift threshold : {DRIFT_THRESHOLD:.0%}')
|
||||
log_info(f' Modèles : {MODEL_DIR}')
|
||||
log_info(f' SHAP : {"activé" if ENABLE_SHAP else "désactivé (shap non installé)" if not SHAP_AVAILABLE else "désactivé"}')
|
||||
log_info(f' Clustering : {"activé" if ENABLE_CLUSTERING else "désactivé"} | Dedup TTL : {DEDUP_TTL_MIN}min')
|
||||
log_info(f' Récurrence weight : {RECURRENCE_WEIGHT} | Min features ratio : {MIN_VALID_FEATURE_RATIO:.0%}')
|
||||
log_info(f' Anubis : ALLOW→KNOWN_BOT (score=0), DENY→ANUBIS_DENY (score IF réel)')
|
||||
log_info('*' * 65)
|
||||
log_decision('SERVICE_START', 'boot', '', {
|
||||
'db': DB, 'contamination': CONTAMINATION, 'anomaly_threshold': ANOMALY_THRESHOLD,
|
||||
'cycle_interval': CYCLE_INTERVAL, 'retrain_interval_h': RETRAIN_INTERVAL_H
|
||||
})
|
||||
    while True:
        try:
            fetch_and_analyze()
        except Exception as e:
            log_info(f"Erreur globale : {e}")
        time.sleep(CYCLE_INTERVAL)
|
||||
|
||||
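The interplay between the adaptive threshold and the normalized score in the detector file above can be checked on a toy input. The following is a minimal sketch, not the service's actual configuration: the values of `ANOMALY_THRESHOLD` and `ANOMALY_PERCENTILE` are assumptions chosen for illustration, and the signature of `compute_adaptive_threshold` is inferred from its call site.

```python
import numpy as np

# Assumed values for illustration only; the service reads its own configuration.
ANOMALY_THRESHOLD = -0.1
ANOMALY_PERCENTILE = 5


def compute_adaptive_threshold(scores: np.ndarray) -> float:
    """Percentile of the negative scores, never looser than the static threshold."""
    neg_scores = scores[scores < 0]
    if len(neg_scores) == 0:
        return ANOMALY_THRESHOLD
    adaptive = float(np.percentile(neg_scores, ANOMALY_PERCENTILE))
    return min(adaptive, ANOMALY_THRESHOLD)


def normalize_scores(scores: np.ndarray) -> np.ndarray:
    """Rescale negative scores to [-1, 0]; positive scores stay unchanged."""
    result = scores.copy()
    mask = scores < 0
    if mask.sum() == 0:
        return result
    s_min, s_max = scores[mask].min(), scores[mask].max()
    if s_min == s_max:
        return result
    result[mask] = (scores[mask] - s_min) / (s_max - s_min + 1e-9) * -1
    return result


# Toy raw IsolationForest scores: three negative (suspicious), two positive.
raw = np.array([-0.40, -0.25, -0.10, 0.05, 0.20])

normalized = normalize_scores(raw)           # most negative raw score maps to 0
threshold = compute_adaptive_threshold(raw)  # about -0.385 with the values above
flagged = raw[raw < threshold]               # the decision uses the raw scores

print(normalized, threshold, flagged)
```

With these assumed settings only the `-0.40` row falls below the threshold, and its normalized score is 0 rather than -1, which is the inversion the `normalize_scores` docstring warns about.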
6
services/bot-detector/bot_detector/requirements.txt
Normal file
@ -0,0 +1,6 @@
|
||||
clickhouse-connect==0.8.0
pandas==2.2.0
scikit-learn==1.4.0
shap==0.44.1
pyyaml>=6.0
ja4-common @ file:///app/shared/ja4_common
|
||||
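The detector's startup banner distinguishes between SHAP being disabled by configuration and `shap` not being installed, which suggests the pinned `shap` requirement is treated as optional at runtime. A minimal sketch of such a guard follows. The `ENABLE_SHAP` environment variable name and the `shap_status` helper are hypothetical; only the `SHAP_AVAILABLE` and `ENABLE_SHAP` flag names come from the detector code.

```python
import importlib.util
import os

# True when the shap package can be located, without importing it eagerly.
SHAP_AVAILABLE = importlib.util.find_spec("shap") is not None

# Hypothetical switch: the real service may read this from another source.
ENABLE_SHAP = SHAP_AVAILABLE and os.getenv("ENABLE_SHAP", "true").lower() == "true"


def shap_status() -> str:
    """Return a human-readable status, mirroring the three cases in the banner."""
    if ENABLE_SHAP:
        return "enabled"
    if not SHAP_AVAILABLE:
        return "disabled (shap not installed)"
    return "disabled"


print(f"SHAP: {shap_status()}")
```

Keeping the availability check separate from the feature switch lets the service start without `shap` installed and fall back to the metric-based reason string.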
17
services/bot-detector/bot_detector/tests/conftest.py
Normal file
@ -0,0 +1,17 @@
|
||||
import pytest
from unittest.mock import MagicMock, patch


@pytest.fixture
def mock_ch_client():
    """Mock ClickHouse client."""
    client = MagicMock()
    client.query.return_value = MagicMock(result_rows=[])
    client.command.return_value = None
    return client


@pytest.fixture(autouse=False)
def mock_get_client(mock_ch_client):
    with patch("ja4_common.clickhouse.get_client", return_value=mock_ch_client):
        yield mock_ch_client
|
||||
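The fixture above stubs `query` and `command`, while the detector's deduplication step reads recent detections through `query_df`. A minimal sketch of how a `MagicMock` client could drive that score-based rule is shown below. `filter_recent` is a hypothetical stand-in that mirrors the rule in `_filter_recent_detections` (re-insert only when the score is more than 0.05 lower); it is not the service function itself, and the query string is a placeholder.

```python
from unittest.mock import MagicMock

import pandas as pd


def filter_recent(client, all_anom: pd.DataFrame, margin: float = 0.05) -> pd.DataFrame:
    """Hypothetical stand-in: drop IPs seen recently unless their score got worse."""
    recent_df = client.query_df("SELECT src_ip, best_score FROM recent_detections")  # placeholder query
    if recent_df.empty:
        return all_anom
    recent_map = dict(zip(recent_df["src_ip"], recent_df["best_score"]))

    def should_insert(row) -> bool:
        prev = recent_map.get(row["src_ip"])
        if prev is None:
            return True  # never seen within the TTL window
        return float(row["raw_anomaly_score"]) < float(prev) - margin

    return all_anom[all_anom.apply(should_insert, axis=1)]


# The mocked client returns two IPs already recorded with a score of -0.30.
client = MagicMock()
client.query_df.return_value = pd.DataFrame(
    {"src_ip": ["10.0.0.1", "10.0.0.2"], "best_score": [-0.30, -0.30]}
)

candidates = pd.DataFrame(
    {
        "src_ip": ["10.0.0.1", "10.0.0.2", "10.0.0.3"],
        "raw_anomaly_score": [-0.32, -0.40, -0.20],
    }
)

kept = filter_recent(client, candidates)
print(kept["src_ip"].tolist())
```

Here `10.0.0.1` is dropped because its score moved by only 0.02, `10.0.0.2` is kept because it worsened by 0.10, and `10.0.0.3` is kept because it has no recent detection.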
166
services/bot-detector/bot_detector/tests/test_detector.py
Normal file
@ -0,0 +1,166 @@
|
||||
import os
|
||||
import pytest
|
||||
import pandas as pd
|
||||
import numpy as np
|
||||
from unittest.mock import patch, MagicMock
|
||||
|
||||
|
||||
def test_settings_from_env(monkeypatch):
|
||||
"""ClickHouseSettings loads CLICKHOUSE_HOST from env."""
|
||||
monkeypatch.setenv("CLICKHOUSE_HOST", "testhost")
|
||||
from ja4_common.settings import ClickHouseSettings
|
||||
s = ClickHouseSettings()
|
||||
assert s.CLICKHOUSE_HOST == "testhost"
|
||||
|
||||
|
||||
def test_feature_dataframe_validation():
|
||||
"""MIN_VALID_FEATURE_RATIO logic: if < ratio of features have data, skip."""
|
||||
MIN_VALID_FEATURE_RATIO = 0.5
|
||||
df = pd.DataFrame({"f1": [1.0], "f2": [None], "f3": [None], "f4": [None]})
|
||||
non_null_ratio = df.notna().mean().mean()
|
||||
assert non_null_ratio < MIN_VALID_FEATURE_RATIO, "Should detect insufficient features"
|
||||
|
||||
|
||||
def test_anomaly_threshold():
|
||||
"""Scores below ANOMALY_THRESHOLD trigger detection."""
|
||||
ANOMALY_THRESHOLD = -0.1
|
||||
anomaly_scores = np.array([-0.5, -0.3, 0.1, 0.2])
|
||||
anomalies = anomaly_scores[anomaly_scores < ANOMALY_THRESHOLD]
|
||||
assert len(anomalies) == 2, "Should detect 2 anomalies"
|
||||
|
||||
|
||||
def test_dedup_logic():
|
||||
"""Duplicate detections within DEDUP_TTL_MIN are skipped."""
|
||||
from datetime import datetime, timedelta
|
||||
DEDUP_TTL_MIN = 60
|
||||
dedup_cache = {}
|
||||
|
||||
def should_insert(ip: str, now: datetime) -> bool:
|
||||
if ip in dedup_cache:
|
||||
if (now - dedup_cache[ip]).total_seconds() < DEDUP_TTL_MIN * 60:
|
||||
return False
|
||||
dedup_cache[ip] = now
|
||||
return True
|
||||
|
||||
now = datetime(2024, 1, 1, 12, 0, 0)
|
||||
assert should_insert("1.2.3.4", now) is True
|
||||
assert should_insert("1.2.3.4", now + timedelta(minutes=30)) is False # within TTL
|
||||
assert should_insert("1.2.3.4", now + timedelta(minutes=61)) is True # past TTL
|
||||
|
||||
|
||||
def test_health_check():
    """Health check endpoint returns 200."""
    import threading
    import urllib.request
    from http.server import HTTPServer, BaseHTTPRequestHandler

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            self.end_headers()

        def log_message(self, *args):
            pass

    server = HTTPServer(("127.0.0.1", 0), HealthHandler)
    port = server.server_address[1]
    # Serve exactly one request in a background thread
    t = threading.Thread(target=server.handle_request)
    t.start()

    try:
        with urllib.request.urlopen(f"http://127.0.0.1:{port}/") as resp:
            assert resp.status == 200
    finally:
        # Wait for the server thread before releasing the socket
        t.join(timeout=5)
        server.server_close()
|
||||
|
||||
|
||||
def test_dedup_different_ips_are_independent():
|
||||
"""Different IPs are tracked independently in dedup cache."""
|
||||
from datetime import datetime, timedelta
|
||||
DEDUP_TTL_MIN = 60
|
||||
dedup_cache = {}
|
||||
|
||||
def should_insert(ip: str, now: datetime) -> bool:
|
||||
if ip in dedup_cache:
|
||||
if (now - dedup_cache[ip]).total_seconds() < DEDUP_TTL_MIN * 60:
|
||||
return False
|
||||
dedup_cache[ip] = now
|
||||
return True
|
||||
|
||||
now = datetime(2024, 1, 1, 12, 0, 0)
|
||||
assert should_insert("1.1.1.1", now) is True
|
||||
assert should_insert("2.2.2.2", now) is True # Different IP, should be allowed
|
||||
assert should_insert("1.1.1.1", now + timedelta(minutes=30)) is False # Same IP within TTL
|
||||
assert should_insert("2.2.2.2", now + timedelta(minutes=30)) is False # Same IP within TTL
|
||||
|
||||
|
||||
def test_dedup_exact_ttl_boundary():
    """Dedup: insertion exactly at the TTL boundary is allowed again."""
    from datetime import datetime, timedelta

    DEDUP_TTL_MIN = 60
    dedup_cache = {}

    def should_insert(ip: str, now: datetime) -> bool:
        if ip in dedup_cache:
            if (now - dedup_cache[ip]).total_seconds() < DEDUP_TTL_MIN * 60:
                return False
        dedup_cache[ip] = now
        return True

    now = datetime(2024, 1, 1, 12, 0, 0)
    assert should_insert("1.2.3.4", now) is True
    # At exactly 60 minutes the elapsed time equals the TTL. The check is a strict `<`
    # (not `<=`), so the entry no longer counts as a duplicate and is inserted again.
    assert should_insert("1.2.3.4", now + timedelta(minutes=60)) is True

def test_anomaly_threshold_no_anomalies():
    """No anomalies when all scores are above threshold."""
    import numpy as np

    ANOMALY_THRESHOLD = -0.1
    scores = np.array([0.0, 0.1, 0.5, 1.0])
    anomalies = scores[scores < ANOMALY_THRESHOLD]
    assert len(anomalies) == 0

def test_anomaly_threshold_all_anomalies():
    """All items flagged when all scores are below threshold."""
    import numpy as np

    ANOMALY_THRESHOLD = -0.1
    scores = np.array([-0.5, -0.3, -0.2, -0.15])
    anomalies = scores[scores < ANOMALY_THRESHOLD]
    assert len(anomalies) == 4

def test_feature_dataframe_all_valid():
    """Feature dataframe with all valid values passes ratio check."""
    import pandas as pd

    MIN_VALID_FEATURE_RATIO = 0.5
    df = pd.DataFrame({"f1": [1.0], "f2": [2.0], "f3": [3.0], "f4": [4.0]})
    # Mean of the per-column non-null fractions, i.e. the overall share of valid cells.
    non_null_ratio = df.notna().mean().mean()
    assert non_null_ratio >= MIN_VALID_FEATURE_RATIO

def test_health_check_returns_correct_status():
    """Health check endpoint body is readable."""
    import threading
    import urllib.request
    from http.server import HTTPServer, BaseHTTPRequestHandler

    class StatusHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(b'{"status": "ok"}')

        def log_message(self, *args):
            pass  # silence per-request logging during tests

    # Port 0 lets the OS pick a free port; the thread serves exactly one request.
    server = HTTPServer(("127.0.0.1", 0), StatusHandler)
    port = server.server_address[1]
    t = threading.Thread(target=server.handle_request)
    t.start()

    with urllib.request.urlopen(f"http://127.0.0.1:{port}/health") as resp:
        assert resp.status == 200
        body = resp.read()
    assert b"ok" in body
    t.join()
    server.server_close()
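The dedup tests above each re-declare `should_insert` inline. As a minimal sketch, the same TTL rule could be packaged as a small class so the tests and the detector share one implementation. The class name `DedupCache` and its constructor argument are hypothetical and do not come from the repository; only the strict `<` comparison and the 60-minute default mirror the tests.

```python
from datetime import datetime, timedelta
from typing import Dict


class DedupCache:
    """Hypothetical helper: remembers the last accepted insert per source IP."""

    def __init__(self, ttl_minutes: int = 60) -> None:
        self.ttl = timedelta(minutes=ttl_minutes)
        self._last_insert: Dict[str, datetime] = {}

    def should_insert(self, ip: str, now: datetime) -> bool:
        """Return False while the IP's last insert is younger than the TTL."""
        last = self._last_insert.get(ip)
        if last is not None and now - last < self.ttl:  # strict `<`, as in the tests
            return False
        self._last_insert[ip] = now
        return True
```

Passing `now` explicitly, as the tests do, keeps the rule deterministic and avoids patching the clock.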
411
services/bot-detector/deploy_views.sql
Normal file
@ -0,0 +1,411 @@
|
||||
-- ============================================================================
-- FULL DETECTION ARCHITECTURE (v13 - bot_detector v11 + ml_all_scores)
-- Database: mabase_prod | Window: 24h | Deduplication by src_ip
-- v11 changes: added campaign_id and raw_anomaly_score to ml_detected_anomalies;
--              fixed view_dashboard_variability (header_user_agent → reason)
-- v12 changes: added table ml_all_scores (all classifications, no threshold)
-- ============================================================================

-- 1. FULL CLEANUP
DROP TABLE IF EXISTS mabase_prod.ml_all_scores;
DROP DICTIONARY IF EXISTS mabase_prod.dict_bot_ip;
DROP DICTIONARY IF EXISTS mabase_prod.dict_bot_ja4;
DROP DICTIONARY IF EXISTS mabase_prod.dict_asn_reputation;
DROP TABLE IF EXISTS mabase_prod.ml_detected_anomalies;
DROP VIEW IF EXISTS mabase_prod.view_ip_recurrence;
DROP VIEW IF EXISTS mabase_prod.view_ai_features_1h;
-- Drop the legacy heuristic views
DROP VIEW IF EXISTS mabase_prod.view_host_ip_ja4_rotation;
DROP VIEW IF EXISTS mabase_prod.view_host_ja4_anomalies;
DROP VIEW IF EXISTS mabase_prod.view_form_bruteforce_detected;
DROP VIEW IF EXISTS mabase_prod.view_alpn_mismatch_detected;
DROP VIEW IF EXISTS mabase_prod.view_tcp_spoofing_detected;

DROP VIEW IF EXISTS mabase_prod.mv_agg_host_ip_ja4_1h;
DROP TABLE IF EXISTS mabase_prod.agg_host_ip_ja4_1h;
DROP VIEW IF EXISTS mabase_prod.mv_agg_header_fingerprint_1h;
DROP TABLE IF EXISTS mabase_prod.agg_header_fingerprint_1h;

-- ============================================================================
-- 2. IN-MEMORY REPUTATION DICTIONARIES
-- ============================================================================
CREATE DICTIONARY mabase_prod.dict_bot_ip (prefix String, bot_name String)
PRIMARY KEY prefix SOURCE(FILE(path '/var/lib/clickhouse/user_files/bot_ip.csv' format 'CSV'))
LAYOUT(IP_TRIE()) LIFETIME(MIN 300 MAX 300);

CREATE DICTIONARY mabase_prod.dict_bot_ja4 (ja4 String, bot_name String)
PRIMARY KEY ja4 SOURCE(FILE(path '/var/lib/clickhouse/user_files/bot_ja4.csv' format 'CSV'))
LAYOUT(COMPLEX_KEY_HASHED()) LIFETIME(MIN 300 MAX 300);

CREATE DICTIONARY mabase_prod.dict_asn_reputation (src_asn UInt64, label String)
PRIMARY KEY src_asn SOURCE(FILE(path '/var/lib/clickhouse/user_files/asn_reputation.csv' format 'CSV'))
LAYOUT(HASHED()) LIFETIME(MIN 300 MAX 300);

-- ============================================================================
|
||||
-- 3. BEHAVIORAL AGGREGATION TABLE (L4 / L5 / L7)
|
||||
-- ============================================================================
|
||||
CREATE TABLE mabase_prod.agg_host_ip_ja4_1h
|
||||
(
|
||||
window_start DateTime,
|
||||
src_ip IPv6, ja4 String, host String, src_asn UInt32,
|
||||
src_country_code SimpleAggregateFunction(any, String),
|
||||
src_as_name SimpleAggregateFunction(any, String),
|
||||
src_org SimpleAggregateFunction(any, String),
|
||||
src_domain SimpleAggregateFunction(any, String),
|
||||
first_seen SimpleAggregateFunction(min, DateTime),
|
||||
last_seen SimpleAggregateFunction(max, DateTime),
|
||||
hits SimpleAggregateFunction(sum, UInt64),
|
||||
count_post SimpleAggregateFunction(sum, UInt64),
|
||||
uniq_paths AggregateFunction(uniq, String),
|
||||
uniq_query_params AggregateFunction(uniq, String),
|
||||
tcp_fp_raw SimpleAggregateFunction(any, String),
|
||||
tcp_jitter_variance AggregateFunction(varPop, Float64),
|
||||
tcp_win_raw SimpleAggregateFunction(any, UInt32),
|
||||
tcp_scale_raw SimpleAggregateFunction(any, UInt32),
|
||||
tcp_mss_raw SimpleAggregateFunction(any, UInt32),
|
||||
tcp_ttl_raw SimpleAggregateFunction(any, UInt32),
|
||||
http_ver_raw SimpleAggregateFunction(any, String),
|
||||
tls_alpn_raw SimpleAggregateFunction(any, String),
|
||||
tls_sni_raw SimpleAggregateFunction(any, String),
|
||||
first_ua SimpleAggregateFunction(any, String),
|
||||
correlated_raw SimpleAggregateFunction(max, UInt8),
|
||||
unique_src_ports AggregateFunction(uniq, UInt16),
|
||||
unique_conn_id AggregateFunction(uniq, String),
|
||||
max_keepalives SimpleAggregateFunction(max, UInt32),
|
||||
orphan_count SimpleAggregateFunction(sum, UInt64),
|
||||
ip_id_zero_count SimpleAggregateFunction(sum, UInt64),
|
||||
total_ip_length_var AggregateFunction(varPop, Float64),
|
||||
mss_1460_count SimpleAggregateFunction(sum, UInt64),
|
||||
count_assets SimpleAggregateFunction(sum, UInt64),
|
||||
count_no_referer SimpleAggregateFunction(sum, UInt64),
|
||||
uniq_ua AggregateFunction(uniq, String),
|
||||
max_requests_per_sec SimpleAggregateFunction(max, UInt32),
|
||||
url_depth_variance AggregateFunction(varPop, Float64),
|
||||
count_anomalous_payload SimpleAggregateFunction(sum, UInt64),
|
||||
-- B features (added in v14)
uniq_ja3 AggregateFunction(uniq, String), -- B1: JA3/JA4 diversity
avg_syn_ms AggregateFunction(avg, Float64), -- B2: mean SYN timing (for the CV)
tls12_count SimpleAggregateFunction(sum, UInt64), -- B3: TLS 1.2 ratio
count_head SimpleAggregateFunction(sum, UInt64), -- B4: HEAD request ratio
count_no_sec_fetch SimpleAggregateFunction(sum, UInt64), -- B5: missing Sec-Fetch-*
count_generic_accept SimpleAggregateFunction(sum, UInt64), -- B6: generic Accept header
count_http10 SimpleAggregateFunction(sum, UInt64), -- B7: HTTP/1.0 ratio
ip_df_var AggregateFunction(varPop, Float64) -- B8: DF bit variance
|
||||
)
|
||||
ENGINE = AggregatingMergeTree()
|
||||
ORDER BY (window_start, src_ip, ja4, host);
|
||||
|
||||
-- ============================================================================
|
||||
-- 4. MATERIALIZED VIEW → agg_host_ip_ja4_1h
|
||||
-- ============================================================================
|
||||
CREATE MATERIALIZED VIEW mabase_prod.mv_agg_host_ip_ja4_1h
|
||||
TO mabase_prod.agg_host_ip_ja4_1h AS
|
||||
SELECT
|
||||
toStartOfHour(src.time) AS window_start,
|
||||
toIPv6(src.src_ip) AS src_ip, src.ja4, src.host, src.src_asn,
|
||||
any(src.src_country_code) AS src_country_code, any(src.src_as_name) AS src_as_name,
|
||||
any(src.src_org) AS src_org, any(src.src_domain) AS src_domain,
|
||||
min(src.time) AS first_seen, max(src.time) AS last_seen, count() AS hits,
|
||||
sum(IF(src.method = 'POST', 1, 0)) AS count_post,
|
||||
uniqState(src.path) AS uniq_paths, uniqState(src.query) AS uniq_query_params,
|
||||
any(toString(cityHash64(concat(toString(src.tcp_meta_window_size), toString(src.tcp_meta_mss), toString(src.tcp_meta_window_scale), src.tcp_meta_options)))) AS tcp_fp_raw,
|
||||
varPopState(toFloat64(src.syn_to_clienthello_ms)) AS tcp_jitter_variance,
|
||||
any(src.tcp_meta_window_size) AS tcp_win_raw, any(src.tcp_meta_window_scale) AS tcp_scale_raw,
|
||||
any(src.tcp_meta_mss) AS tcp_mss_raw, any(src.ip_meta_ttl) AS tcp_ttl_raw,
|
||||
any(src.http_version) AS http_ver_raw, any(src.tls_alpn) AS tls_alpn_raw, any(src.tls_sni) AS tls_sni_raw,
|
||||
any(src.header_user_agent) AS first_ua, max(toUInt8(src.correlated)) AS correlated_raw,
|
||||
uniqState(toUInt16(src.src_port)) AS unique_src_ports, uniqState(src.conn_id) AS unique_conn_id,
|
||||
max(toUInt32(src.keepalives)) AS max_keepalives,
|
||||
sum(IF(src.orphan_side = 'A' OR src.correlated = 0, 1, 0)) AS orphan_count,
|
||||
sum(IF(src.ip_meta_id == 0, 1, 0)) AS ip_id_zero_count,
|
||||
varPopState(toFloat64(src.ip_meta_total_length)) AS total_ip_length_var,
|
||||
sum(IF(src.tcp_meta_mss == 1460, 1, 0)) AS mss_1460_count,
|
||||
sum(IF(match(src.path, '(?i)\.(png|jpg|jpeg|gif|css|js|ico|woff2|svg|eot)$'), 1, 0)) AS count_assets,
|
||||
sum(IF(position(src.client_headers, 'Referer') = 0, 1, 0)) AS count_no_referer,
|
||||
uniqState(src.header_user_agent) AS uniq_ua,
|
||||
0 AS max_requests_per_sec,
|
||||
varPopState(toFloat64(length(replaceAll(src.path, '/', '//')) - length(src.path))) AS url_depth_variance,
|
||||
sum(IF(src.ip_meta_total_length < 60 OR src.ip_meta_total_length > 1500, 1, 0)) AS count_anomalous_payload,
|
||||
-- B features
|
||||
uniqState(src.ja3) AS uniq_ja3,
|
||||
avgState(toFloat64(src.syn_to_clienthello_ms)) AS avg_syn_ms,
|
||||
sum(IF(src.tls_version = '1.2', 1, 0)) AS tls12_count,
|
||||
sum(IF(src.method = 'HEAD', 1, 0)) AS count_head,
|
||||
sum(IF(length(src.header_sec_fetch_site) = 0, 1, 0)) AS count_no_sec_fetch,
|
||||
sum(IF(length(src.header_accept) < 5, 1, 0)) AS count_generic_accept,
|
||||
sum(IF(src.http_version = 'HTTP/1.0', 1, 0)) AS count_http10,
|
||||
varPopState(toFloat64(src.ip_meta_df)) AS ip_df_var
|
||||
FROM mabase_prod.http_logs AS src
|
||||
GROUP BY window_start, src_ip, ja4, host, src_asn;
|
||||
|
||||
-- ============================================================================
|
||||
-- 5. HEADER AGGREGATION TABLE (L7)
|
||||
-- ============================================================================
|
||||
CREATE TABLE mabase_prod.agg_header_fingerprint_1h
|
||||
(
|
||||
window_start DateTime,
|
||||
src_ip IPv6,
|
||||
header_order_hash SimpleAggregateFunction(any, String),
|
||||
header_count SimpleAggregateFunction(max, UInt16),
|
||||
has_accept_language SimpleAggregateFunction(max, UInt8),
|
||||
has_cookie SimpleAggregateFunction(max, UInt8),
|
||||
has_referer SimpleAggregateFunction(max, UInt8),
|
||||
modern_browser_score SimpleAggregateFunction(max, UInt8),
|
||||
ua_ch_mismatch SimpleAggregateFunction(max, UInt8),
|
||||
sec_fetch_mode SimpleAggregateFunction(any, String),
|
||||
sec_fetch_dest SimpleAggregateFunction(any, String)
|
||||
)
|
||||
ENGINE = AggregatingMergeTree()
|
||||
ORDER BY (window_start, src_ip);
|
||||
|
||||
CREATE MATERIALIZED VIEW mabase_prod.mv_agg_header_fingerprint_1h
|
||||
TO mabase_prod.agg_header_fingerprint_1h AS
|
||||
SELECT
|
||||
toStartOfHour(src.time) AS window_start,
|
||||
toIPv6(src.src_ip) AS src_ip,
|
||||
any(toString(cityHash64(src.client_headers))) AS header_order_hash,
|
||||
max(toUInt16(length(src.client_headers) - length(replaceAll(src.client_headers, ',', '')) + 1)) AS header_count,
|
||||
max(toUInt8(if(position(src.client_headers, 'Accept-Language') > 0, 1, 0))) AS has_accept_language,
|
||||
max(toUInt8(if(position(src.client_headers, 'Cookie') > 0, 1, 0))) AS has_cookie,
|
||||
max(toUInt8(if(position(src.client_headers, 'Referer') > 0, 1, 0))) AS has_referer,
|
||||
max(toUInt8(if(length(src.header_sec_ch_ua) > 0, 100, if(length(src.header_user_agent) > 0, 50, 0)))) AS modern_browser_score,
|
||||
max(toUInt8(if((position(src.header_user_agent, 'Windows') > 0 AND position(src.header_sec_ch_ua_platform, 'Windows') == 0) OR (position(src.header_user_agent, 'iPhone') > 0 AND position(src.header_sec_ch_ua_platform, 'iOS') == 0), 1, 0))) AS ua_ch_mismatch,
|
||||
any(src.header_sec_fetch_mode) AS sec_fetch_mode,
|
||||
any(src.header_sec_fetch_dest) AS sec_fetch_dest
|
||||
FROM mabase_prod.http_logs AS src
|
||||
GROUP BY window_start, src.src_ip;
|
||||
|
||||
-- ============================================================================
|
||||
-- 6. ML RESULTS TABLE: THREATS ONLY (scores < threshold)
|
||||
-- ============================================================================
|
||||
CREATE TABLE mabase_prod.ml_detected_anomalies
|
||||
(
|
||||
detected_at DateTime, src_ip IPv6, ja4 String, host String, bot_name String,
|
||||
anomaly_score Float32, threat_level String, model_name String, recurrence UInt32,
|
||||
asn_number String, asn_org String, asn_detail String, asn_domain String, country_code String, asn_label String,
|
||||
hits UInt64, hit_velocity Float32, fuzzing_index Float32, post_ratio Float32, port_exhaustion_ratio Float32,
|
||||
max_keepalives UInt32, orphan_ratio Float32, tcp_jitter_variance Float32, tcp_shared_count UInt32,
|
||||
true_window_size UInt64, window_mss_ratio Float32, alpn_http_mismatch UInt8, is_alpn_missing UInt8, sni_host_mismatch UInt8,
|
||||
header_count UInt16, has_accept_language UInt8, has_cookie UInt8, has_referer UInt8, modern_browser_score UInt8,
|
||||
is_headless UInt8, ua_ch_mismatch UInt8, header_order_shared_count UInt32, ip_id_zero_ratio Float32,
|
||||
request_size_variance Float32, multiplexing_efficiency Float32, mss_mobile_mismatch UInt8, correlated UInt8, reason String,
|
||||
asset_ratio Float32, direct_access_ratio Float32, is_ua_rotating UInt8, distinct_ja4_count UInt32,
|
||||
src_port_density Float32, ja4_asn_concentration Float32, ja4_country_concentration Float32, is_rare_ja4 UInt8,
|
||||
header_order_confidence Float32, distinct_header_orders UInt32, temporal_entropy Float32,
|
||||
path_diversity_ratio Float32, url_depth_variance Float32, anomalous_payload_ratio Float32,
|
||||
-- Columns added in v11 (bot_detector v11)
|
||||
campaign_id Int32 DEFAULT -1,
|
||||
raw_anomaly_score Float32 DEFAULT 0
|
||||
)
|
||||
ENGINE = ReplacingMergeTree(detected_at)
|
||||
ORDER BY (src_ip)
|
||||
TTL detected_at + INTERVAL 30 DAY;
|
||||
|
||||
-- ============================================================================
|
||||
-- 6b. TABLE OF ALL CLASSIFICATIONS (no threshold, for observability)
|
||||
-- ============================================================================
|
||||
CREATE TABLE mabase_prod.ml_all_scores
|
||||
(
|
||||
detected_at DateTime,
|
||||
window_start DateTime,
|
||||
src_ip IPv6,
|
||||
ja4 String,
|
||||
host String,
|
||||
bot_name String,
|
||||
anomaly_score Float32,
|
||||
raw_anomaly_score Float32,
|
||||
threat_level String,
|
||||
model_name String,
|
||||
correlated UInt8,
|
||||
asn_number String,
|
||||
asn_org String,
|
||||
country_code String,
|
||||
asn_label String,
|
||||
hits UInt64,
|
||||
hit_velocity Float32,
|
||||
fuzzing_index Float32,
|
||||
post_ratio Float32,
|
||||
campaign_id Int32
|
||||
)
|
||||
ENGINE = ReplacingMergeTree(detected_at)
|
||||
ORDER BY (window_start, src_ip, ja4, host, model_name)
|
||||
TTL window_start + INTERVAL 3 DAY
|
||||
SETTINGS index_granularity = 8192;
|
||||
|
||||
-- ============================================================================
|
||||
-- 7. RECURRENCE VIEW
|
||||
-- ============================================================================
|
||||
CREATE OR REPLACE VIEW mabase_prod.view_ip_recurrence AS
|
||||
SELECT src_ip, count() AS recurrence, min(detected_at) AS first_seen, max(detected_at) AS last_seen,
|
||||
min(anomaly_score) AS worst_score, argMin(threat_level, anomaly_score) AS worst_threat_level
|
||||
FROM mabase_prod.ml_detected_anomalies GROUP BY src_ip;
|
||||
|
||||
-- ============================================================================
|
||||
-- 8. MAIN AI VIEW (with a CTE for temporal entropy)
|
||||
-- ============================================================================
|
||||
CREATE OR REPLACE VIEW mabase_prod.view_ai_features_1h AS
|
||||
WITH base_data AS (
|
||||
SELECT
|
||||
a.window_start, a.src_ip, a.ja4, a.host,
|
||||
toString(a.src_asn) AS asn_number, a.src_as_name AS asn_org,
|
||||
a.src_org AS asn_detail, a.src_domain AS asn_domain, a.src_country_code AS country_code,
|
||||
dictGetOrDefault('mabase_prod.dict_asn_reputation', 'label', toUInt64(a.src_asn), 'unknown') AS asn_label,
|
||||
COALESCE(
|
||||
nullIf(dictGetOrDefault('mabase_prod.dict_bot_ip', 'bot_name', a.src_ip, ''), ''),
|
||||
nullIf(dictGetOrDefault('mabase_prod.dict_bot_ja4', 'bot_name', tuple(a.ja4), ''), ''),
|
||||
''
|
||||
) AS bot_name,
|
||||
a.hits AS hits,
|
||||
sum(a.hits) OVER (PARTITION BY a.src_ip) AS total_ip_hits,
|
||||
a.correlated AS correlated, a.tcp_jitter_variance AS tcp_jitter_variance,
|
||||
a.true_window_size AS true_window_size, a.window_mss_ratio AS window_mss_ratio, a.max_keepalives AS max_keepalives,
|
||||
h.header_order_hash AS header_order_hash, h.header_count AS header_count,
|
||||
h.has_accept_language AS has_accept_language, h.has_cookie AS has_cookie,
|
||||
h.has_referer AS has_referer, h.modern_browser_score AS modern_browser_score, h.ua_ch_mismatch AS ua_ch_mismatch,
|
||||
(a.count_post / (a.hits + 1)) AS post_ratio, (a.uniq_query_params / (a.uniq_paths + 1)) AS fuzzing_index,
|
||||
(a.hits / (dateDiff('second', a.first_seen, a.last_seen) + 1)) AS hit_velocity,
|
||||
(a.unique_src_ports / (a.hits + 1)) AS port_exhaustion_ratio, (a.orphan_count / (a.hits + 1)) AS orphan_ratio,
|
||||
(a.ip_id_zero_count / (a.hits + 1)) AS ip_id_zero_ratio, (a.hits / (a.unique_conn_id + 1)) AS multiplexing_efficiency,
|
||||
IF(a.mss_1460_count > (a.hits * 0.8) AND h.modern_browser_score > 70, 1, 0) AS mss_mobile_mismatch,
|
||||
a.request_size_variance AS request_size_variance,
|
||||
IF(a.tls_alpn = 'h2' AND a.http_version != '2', 1, 0) AS alpn_http_mismatch,
|
||||
IF(length(a.tls_alpn) = 0 OR a.tls_alpn = '00', 1, 0) AS is_alpn_missing,
|
||||
IF(length(a.tls_sni) > 0 AND a.tls_sni != a.host, 1, 0) AS sni_host_mismatch,
|
||||
IF(h.sec_fetch_mode = 'navigate' AND h.sec_fetch_dest != 'document', 1, 0) AS is_fake_navigation,
|
||||
count() OVER (PARTITION BY a.tcp_fingerprint) AS tcp_shared_count,
|
||||
count() OVER (PARTITION BY h.header_order_hash) AS header_order_shared_count,
|
||||
(a.count_assets / (a.hits + 1)) AS asset_ratio, (a.count_no_referer / (a.hits + 1)) AS direct_access_ratio,
|
||||
IF(a.unique_ua > 2, 1, 0) AS is_ua_rotating, uniqExact(a.ja4) OVER (PARTITION BY a.src_ip) AS distinct_ja4_count,
|
||||
((a.hits / (a.unique_src_ports + 1)) / (dateDiff('second', a.first_seen, a.last_seen) + 1)) AS src_port_density,
|
||||
(sum(a.hits) OVER (PARTITION BY a.ja4, a.src_asn) / (sum(a.hits) OVER (PARTITION BY a.ja4) + 1)) AS ja4_asn_concentration,
|
||||
(sum(a.hits) OVER (PARTITION BY a.ja4, a.src_country_code) / (sum(a.hits) OVER (PARTITION BY a.ja4) + 1)) AS ja4_country_concentration,
|
||||
IF(sum(a.hits) OVER (PARTITION BY a.ja4) < 100, 1, 0) AS is_rare_ja4,
|
||||
(count() OVER (PARTITION BY h.header_order_hash, a.first_ua) / (count() OVER (PARTITION BY a.first_ua) + 1)) AS header_order_confidence,
|
||||
uniqExact(h.header_order_hash) OVER (PARTITION BY a.src_ip) AS distinct_header_orders,
|
||||
(a.uniq_paths / (a.hits + 1)) AS path_diversity_ratio,
|
||||
a.url_depth_variance AS url_depth_variance,
|
||||
(a.count_anomalous_payload / (a.hits + 1)) AS anomalous_payload_ratio,
|
||||
-- B features: TLS/TCP (available only when correlated=1)
|
||||
a.uniq_ja3_val AS uniq_ja3_per_row,
|
||||
sqrt(a.tcp_jitter_variance) / greatest(a.avg_syn_ms_val, 1) AS syn_timing_cv, -- B2
|
||||
a.tls12_count / (a.hits + 1) AS tls12_ratio, -- B3
|
||||
-- B features: pure HTTP (available for correlated=0 and 1)
|
||||
a.count_head / (a.hits + 1) AS head_ratio, -- B4
|
||||
a.count_no_sec_fetch / (a.hits + 1) AS sec_fetch_absence_rate, -- B5
|
||||
a.count_generic_accept / (a.hits + 1) AS generic_accept_ratio, -- B6
|
||||
a.count_http10 / (a.hits + 1) AS http10_ratio, -- B7
|
||||
a.ip_df_variance AS ip_df_variance -- B8
|
||||
FROM (
|
||||
SELECT
|
||||
window_start, src_ip, ja4, host, src_asn,
|
||||
any(src_country_code) AS src_country_code, any(src_as_name) AS src_as_name,
|
||||
any(src_org) AS src_org, any(src_domain) AS src_domain, any(first_ua) AS first_ua,
|
||||
sum(hits) AS hits, uniqMerge(uniq_paths) AS uniq_paths,
|
||||
uniqMerge(uniq_query_params) AS uniq_query_params, sum(count_post) AS count_post,
|
||||
min(first_seen) AS first_seen, max(last_seen) AS last_seen,
|
||||
any(tcp_fp_raw) AS tcp_fingerprint, varPopMerge(tcp_jitter_variance) AS tcp_jitter_variance,
|
||||
varPopMerge(total_ip_length_var) AS request_size_variance,
|
||||
any(tcp_win_raw * exp2(tcp_scale_raw)) AS true_window_size,
|
||||
IF(any(tcp_mss_raw) > 0, any(tcp_win_raw) / any(tcp_mss_raw), 0) AS window_mss_ratio,
|
||||
any(http_ver_raw) AS http_version, any(tls_alpn_raw) AS tls_alpn, any(tls_sni_raw) AS tls_sni,
|
||||
max(correlated_raw) AS correlated, uniqMerge(unique_src_ports) AS unique_src_ports,
|
||||
uniqMerge(unique_conn_id) AS unique_conn_id, max(max_keepalives) AS max_keepalives,
|
||||
sum(orphan_count) AS orphan_count, sum(ip_id_zero_count) AS ip_id_zero_count,
|
||||
sum(mss_1460_count) AS mss_1460_count,
|
||||
sum(count_assets) AS count_assets, sum(count_no_referer) AS count_no_referer, uniqMerge(uniq_ua) AS unique_ua,
|
||||
varPopMerge(url_depth_variance) AS url_depth_variance,
|
||||
sum(count_anomalous_payload) AS count_anomalous_payload,
|
||||
-- B feature aggregates
|
||||
uniqMerge(uniq_ja3) AS uniq_ja3_val,
|
||||
avgMerge(avg_syn_ms) AS avg_syn_ms_val,
|
||||
sum(tls12_count) AS tls12_count,
|
||||
sum(count_head) AS count_head,
|
||||
sum(count_no_sec_fetch) AS count_no_sec_fetch,
|
||||
sum(count_generic_accept) AS count_generic_accept,
|
||||
sum(count_http10) AS count_http10,
|
||||
varPopMerge(ip_df_var) AS ip_df_variance
|
||||
FROM mabase_prod.agg_host_ip_ja4_1h
|
||||
WHERE window_start >= now() - INTERVAL 24 HOUR
|
||||
GROUP BY window_start, src_ip, ja4, host, src_asn
|
||||
) a
|
||||
LEFT JOIN (
|
||||
SELECT
|
||||
window_start, src_ip, any(header_order_hash) AS header_order_hash,
|
||||
max(header_count) AS header_count, max(has_accept_language) AS has_accept_language,
|
||||
max(has_cookie) AS has_cookie, max(has_referer) AS has_referer,
|
||||
max(modern_browser_score) AS modern_browser_score, max(ua_ch_mismatch) AS ua_ch_mismatch,
|
||||
any(sec_fetch_mode) AS sec_fetch_mode, any(sec_fetch_dest) AS sec_fetch_dest
|
||||
FROM mabase_prod.agg_header_fingerprint_1h
|
||||
WHERE window_start >= now() - INTERVAL 24 HOUR
|
||||
GROUP BY window_start, src_ip
|
||||
) h ON a.src_ip = h.src_ip AND a.window_start = h.window_start
|
||||
)
|
||||
SELECT
|
||||
*,
|
||||
-(sum((hits / (total_ip_hits + 1)) * log2((hits / (total_ip_hits + 1)) + 0.000001)) OVER (PARTITION BY src_ip)) AS temporal_entropy,
|
||||
-- B1: JA3/JA4 diversity ratio per src_ip (signal: bots rotating JA3 across few JA4)
|
||||
sum(uniq_ja3_per_row) OVER (PARTITION BY src_ip) / greatest(distinct_ja4_count, 1) AS ja3_diversity_ratio
|
||||
FROM base_data;
|
||||
|
||||
-- ============================================================================
-- VIEWS FOR THE WEB DASHBOARD
-- ============================================================================

-- View for the dashboard's global metrics
CREATE OR REPLACE VIEW mabase_prod.view_dashboard_summary AS
SELECT
    count() AS total_detections,
    countIf(threat_level = 'CRITICAL') AS critical_count,
    countIf(threat_level = 'HIGH') AS high_count,
    countIf(threat_level = 'MEDIUM') AS medium_count,
    countIf(threat_level = 'LOW') AS low_count,
    countIf(bot_name != '') AS known_bots_count,
    countIf(bot_name = '') AS anomalies_count,
    uniq(src_ip) AS unique_ips
FROM mabase_prod.ml_detected_anomalies
WHERE detected_at >= now() - INTERVAL 24 HOUR;

-- View for the time series (hourly)
CREATE OR REPLACE VIEW mabase_prod.view_dashboard_timeseries AS
SELECT
    toStartOfHour(detected_at) AS hour,
    count() AS total,
    countIf(threat_level = 'CRITICAL') AS critical,
    countIf(threat_level = 'HIGH') AS high,
    countIf(threat_level = 'MEDIUM') AS medium,
    countIf(threat_level = 'LOW') AS low
FROM mabase_prod.ml_detected_anomalies
WHERE detected_at >= now() - INTERVAL 24 HOUR
GROUP BY hour
ORDER BY hour;

-- View for the threat distribution
CREATE OR REPLACE VIEW mabase_prod.view_dashboard_threat_dist AS
SELECT
    threat_level,
    count() AS count,
    round(count() * 100.0 / sum(count()) OVER (), 2) AS percentage
FROM mabase_prod.ml_detected_anomalies
WHERE detected_at >= now() - INTERVAL 24 HOUR
GROUP BY threat_level
ORDER BY count DESC;

-- View for variability (used by the API)
-- Note v12: header_user_agent does not exist in ml_detected_anomalies → replaced by reason
CREATE OR REPLACE VIEW mabase_prod.view_dashboard_variability AS
SELECT
    detected_at,
    src_ip,
    ja4,
    host,
    reason AS sample_reason,
    country_code,
    asn_number,
    asn_org,
    threat_level,
    model_name,
    anomaly_score,
    campaign_id,
    raw_anomaly_score
FROM mabase_prod.ml_detected_anomalies
WHERE detected_at >= now() - INTERVAL 24 HOUR;
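The `temporal_entropy` column in `view_ai_features_1h` is a smoothed Shannon entropy over the share of hits each aggregated row contributes to its `src_ip`. The sketch below restates that expression in Python for illustration. The function name is hypothetical; the `+ 1` in the denominator and the `0.000001` epsilon are copied from the SQL expression.

```python
import math
from typing import Iterable


def temporal_entropy(hits_per_row: Iterable[int]) -> float:
    """Mirror of the SQL expression for one src_ip.

    Each element is the `hits` value of one aggregated row belonging to the
    same source IP (one row per window_start / ja4 / host combination).
    """
    hits = list(hits_per_row)
    total_ip_hits = sum(hits)
    entropy = 0.0
    for h in hits:
        share = h / (total_ip_hits + 1)              # same +1 smoothing as the view
        entropy -= share * math.log2(share + 1e-6)   # epsilon avoids log2(0)
    return entropy
```

Traffic concentrated in a single row yields a value near zero, while traffic spread evenly over many rows yields a higher value.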
78
services/bot-detector/docker-compose.yml
Normal file
@ -0,0 +1,78 @@
|
||||
version: '3.8'  # Deprecated since Docker Compose v2.x but still tolerated; can be removed

services:
  bot_detector_ai:
    build: bot_detector
    container_name: bot_detector_ai
    restart: unless-stopped
    ports:
      - "8080:8080"  # Health check → GET http://localhost:8080/

    env_file:
      - .env

    environment:
      # ── ClickHouse ────────────────────────────────────────────────────────
      CLICKHOUSE_HOST: ${CLICKHOUSE_HOST:-clickhouse}
      CLICKHOUSE_DB: ${CLICKHOUSE_DB:-mabase_prod}
      CLICKHOUSE_USER: ${CLICKHOUSE_USER:-admin}
      CLICKHOUSE_PASSWORD: ${CLICKHOUSE_PASSWORD:-}

      # ── AI model ──────────────────────────────────────────────────────────
      ISOLATION_CONTAMINATION: ${ISOLATION_CONTAMINATION:-0.02}
      ANOMALY_THRESHOLD: ${ANOMALY_THRESHOLD:--0.03}

      # ── Cycle ─────────────────────────────────────────────────────────────
      CYCLE_INTERVAL_SEC: ${CYCLE_INTERVAL_SEC:-300}
      MAX_CONSECUTIVE_FAILURES: ${MAX_CONSECUTIVE_FAILURES:-3}

      # ── Logs ──────────────────────────────────────────────────────────────
      BOT_DETECTOR_LOG: ${BOT_DETECTOR_LOG:-/var/log/bot_detector/decisions.jsonl}
      LOG_BACKUP_COUNT: ${LOG_BACKUP_COUNT:-7}

      # ── Persistent models ─────────────────────────────────────────────────
      MODEL_DIR: ${MODEL_DIR:-/var/lib/bot_detector}
      RETRAIN_INTERVAL_HOURS: ${RETRAIN_INTERVAL_HOURS:-24}
      MODEL_HISTORY_COUNT: ${MODEL_HISTORY_COUNT:-10}

      # ── A1: Concept drift ─────────────────────────────────────────────────
      DRIFT_THRESHOLD: ${DRIFT_THRESHOLD:-0.30}

      # ── A2: Adaptive threshold ────────────────────────────────────────────
      ANOMALY_PERCENTILE: ${ANOMALY_PERCENTILE:-5}

      # ── A3: Multi-window analysis ─────────────────────────────────────────
      ENABLE_MULTIWINDOW: ${ENABLE_MULTIWINDOW:-false}
      MULTIWINDOW_VIEW: ${MULTIWINDOW_VIEW:-view_ai_features_24h}

      # ── A4: SHAP explainability ───────────────────────────────────────────
      ENABLE_SHAP: ${ENABLE_SHAP:-true}

      # ── A5: Cross-cycle deduplication with TTL ────────────────────────────
      DEDUP_TTL_MIN: ${DEDUP_TTL_MIN:-60}

      # ── A6: Recurrence-weighted scoring ───────────────────────────────────
      RECURRENCE_WEIGHT: ${RECURRENCE_WEIGHT:-0.005}

      # ── A7: Feature completeness validation ───────────────────────────────
      MIN_VALID_FEATURE_RATIO: ${MIN_VALID_FEATURE_RATIO:-0.50}

      # ── A8: Behavioral clustering of anomalies ────────────────────────────
      ENABLE_CLUSTERING: ${ENABLE_CLUSTERING:-true}
      CLUSTERING_MIN_SAMPLES: ${CLUSTERING_MIN_SAMPLES:-3}

      # ── Health check ──────────────────────────────────────────────────────
      HEALTH_PORT: ${HEALTH_PORT:-8080}

    volumes:
      # Structured JSONL logs (for after-the-fact analysis)
      - ./bot_detector_logs:/var/log/bot_detector

      # Serialized Isolation Forest models (joblib)
      - ./bot_detector_models:/var/lib/bot_detector

      # Reputation CSV files shared with ClickHouse (FILE source)
      # Mounted read-only on the bot_detector side (writes go through ClickHouse only)
      - ./reputation/data/user_files/bot_ip.csv:/data/bot_ip.csv:ro
      - ./reputation/data/user_files/bot_ja4.csv:/data/bot_ja4.csv:ro
      - ./reputation/data/user_files/asn_reputation.csv:/data/asn_reputation.csv:ro
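The environment block above only supplies defaults; how the service reads them is not shown in this part of the diff. As a rough sketch, a few of these variables could be loaded with matching fallbacks as follows. `DetectorSettings`, `load_settings`, and the accepted boolean spellings are assumptions made for illustration, not the bot_detector implementation; only the variable names and default values come from the compose file.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class DetectorSettings:
    """Hypothetical container for a subset of the compose variables."""
    anomaly_threshold: float
    cycle_interval_sec: int
    dedup_ttl_min: int
    enable_shap: bool


def _as_bool(raw: str) -> bool:
    # Assumed convention: a handful of common truthy spellings.
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Mapping[str, str]) -> DetectorSettings:
    """Read settings from a mapping such as os.environ, using the compose defaults."""
    return DetectorSettings(
        anomaly_threshold=float(env.get("ANOMALY_THRESHOLD", "-0.03")),
        cycle_interval_sec=int(env.get("CYCLE_INTERVAL_SEC", "300")),
        dedup_ttl_min=int(env.get("DEDUP_TTL_MIN", "60")),
        enable_shap=_as_bool(env.get("ENABLE_SHAP", "true")),
    )
```

In a running container one would call `load_settings(os.environ)`; taking a plain mapping keeps the function easy to unit-test.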
36
services/bot-detector/reputation/asn_reputation.csv
Normal file
@ -0,0 +1,36 @@
|
||||
3215,human
12322,human
5410,human
15557,human
21502,human
9036,human
8218,human
39180,human
3303,human
6730,human
9044,human
15600,human
13030,human
25256,human
5432,human
6848,human
12392,human
49686,human
6714,human
49203,human
6661,human
8469,human
20676,human
3320,human
3209,human
8881,human
6805,human
29562,human
31334,human
8422,human
25255,human
8447,human
12635,human
6830,human
8412,human
35369,human
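The file is a header-less, two-column CSV of `asn,label` pairs, which `dict_asn_reputation` exposes to SQL through `dictGetOrDefault(..., 'unknown')`. The sketch below shows an equivalent lookup in plain Python, for example for offline checks. The function names are hypothetical, and the sample rows are the first two rows of the file.

```python
import csv
import io
from typing import Dict, TextIO


def load_asn_labels(handle: TextIO) -> Dict[int, str]:
    """Parse header-less `asn,label` rows into a dictionary keyed by ASN."""
    labels: Dict[int, str] = {}
    for row in csv.reader(handle):
        if len(row) != 2:  # skip blank or malformed lines
            continue
        asn, label = row
        labels[int(asn)] = label
    return labels


def asn_label(labels: Dict[int, str], asn: int, default: str = "unknown") -> str:
    """Fall back to `default` for ASNs missing from the file, like dictGetOrDefault."""
    return labels.get(asn, default)


SAMPLE = "3215,human\n12322,human\n"
labels = load_asn_labels(io.StringIO(SAMPLE))
```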
288598
services/bot-detector/reputation/bot_ip.csv
Normal file
File diff suppressed because it is too large
0
services/bot-detector/reputation/bot_ja4.csv
Normal file
1267296
services/bot-detector/reputation/iplocate-ip-to-asn.csv
Normal file
File diff suppressed because it is too large
19
services/correlator/.dockerignore
Normal file
@ -0,0 +1,19 @@
# Build outputs
dist/

# Dependency directories
vendor/

# IDE
.idea/
.vscode/
*.swp
*.swo
*~

# OS
.DS_Store
Thumbs.db

# Aider cache
.aider*
2
services/correlator/.env.example
Normal file
@ -0,0 +1,2 @@
# correlator configuration: DO NOT COMMIT real values
LOGCORRELATOR_CLICKHOUSE_DSN=clickhouse://data_writer:ChangeMe@clickhouse:9000/mabase_prod
73
services/correlator/.github/workflows/ci.yml
vendored
Normal file
@ -0,0 +1,73 @@
name: Build and Test

on:
  push:
    branches: [ master ]
  pull_request:
    branches: [ master ]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Go
        uses: actions/setup-go@v5
        with:
          go-version: '1.24'

      - name: Download dependencies
        run: go mod download

      - name: Run tests with coverage
        run: |
          go test -race -coverprofile=coverage.txt -covermode=atomic ./...
          TOTAL=$(go tool cover -func=coverage.txt | grep total | awk '{gsub(/%/, "", $3); print $3}')
          echo "Coverage: ${TOTAL}%"
          if (( $(echo "$TOTAL < 80" | bc -l) )); then
            echo "Coverage ${TOTAL}% is below 80% threshold"
            exit 1
          fi

      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v3
        with:
          file: ./coverage.txt

  build:
    runs-on: ubuntu-latest
    needs: test
    steps:
      - uses: actions/checkout@v4

      - name: Set up Go
        uses: actions/setup-go@v5
        with:
          go-version: '1.24'

      - name: Build binary
        run: |
          CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build \
            -ldflags="-w -s" \
            -o logcorrelator \
            ./cmd/logcorrelator

      - name: Upload binary artifact
        uses: actions/upload-artifact@v4
        with:
          name: logcorrelator-linux-amd64
          path: logcorrelator

  docker:
    runs-on: ubuntu-latest
    needs: test
    steps:
      - uses: actions/checkout@v4

      - name: Build Docker image
        run: docker build -t logcorrelator:latest .

      - name: Run tests in Docker
        run: |
          docker run --rm logcorrelator:latest --help || true
32
services/correlator/.gitignore
vendored
Normal file
@ -0,0 +1,32 @@
# Build directory
/build/
/dist/

# Binaries
*.exe
*.exe~
*.dll
*.so
*.dylib
/logcorrelator

# Test binary
*.test

# Output of the go coverage tool
*.out

# Dependency directories
vendor/

# IDE
.idea/
.vscode/
*.swp
*.swo
*~

# OS
.DS_Store
Thumbs.db
.aider*
43
services/correlator/Dockerfile
Normal file
@ -0,0 +1,43 @@
# syntax=docker/dockerfile:1
FROM golang:1.24 AS builder

WORKDIR /build

RUN apt-get update && apt-get install -y --no-install-recommends git bc && rm -rf /var/lib/apt/lists/*

COPY go.work go.work.sum* ./
COPY shared/go/ja4common/ ./shared/go/ja4common/
COPY services/sentinel/go.mod services/sentinel/go.sum* ./services/sentinel/
COPY services/correlator/go.mod services/correlator/go.sum* ./services/correlator/

WORKDIR /build/services/correlator
RUN --mount=type=cache,target=/go/pkg/mod go mod download

COPY services/correlator/ /build/services/correlator/

ARG SKIP_TESTS=false
RUN --mount=type=cache,target=/go/pkg/mod \
    if [ "$SKIP_TESTS" = "false" ]; then \
        go test -race -coverprofile=coverage.txt -covermode=atomic ./... && \
        echo "=== Coverage Report ===" && \
        go tool cover -func=coverage.txt | grep total && \
        TOTAL=$(go tool cover -func=coverage.txt | grep total | awk '{gsub(/%/, "", $3); print $3}') && \
        echo "Total coverage: ${TOTAL}%" && \
        if [ "$(echo "$TOTAL < 60" | bc -l)" -eq 1 ]; then \
            echo "ERROR: Coverage ${TOTAL}% is below 60% threshold"; \
            exit 1; \
        fi && \
        echo "Coverage check passed!"; \
    else \
        echo "Skipping tests (SKIP_TESTS=true)"; \
    fi

RUN --mount=type=cache,target=/go/pkg/mod \
    CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build \
    -ldflags="-w -s" -o /usr/bin/correlator ./cmd/logcorrelator

FROM scratch AS runtime
COPY --from=builder /usr/bin/correlator /usr/bin/correlator
COPY --from=builder /build/services/correlator/config.example.yml /etc/correlator/correlator.yml
ENTRYPOINT ["/usr/bin/correlator"]
CMD ["-config", "/etc/correlator/correlator.yml"]
110
services/correlator/Dockerfile.package
Normal file
@ -0,0 +1,110 @@
# syntax=docker/dockerfile:1
# =============================================================================
# correlator: RPM packaging Dockerfile (Rocky Linux 8/9, AlmaLinux 10)
# Build context: monorepo root (ja4-platform/)
# Method: 1 Go builder → 1 rpm-builder (rpmbuild, 3 × dist) → 1 alpine output stage
# =============================================================================

# =============================================================================
# Stage 1: builder, compiles the Go binary
# Official golang:1.24 image (statically linked, CGO_ENABLED=0 → portable binary)
# =============================================================================
FROM golang:1.24 AS builder

WORKDIR /build

RUN apt-get update && apt-get install -y --no-install-recommends git bc && \
    rm -rf /var/lib/apt/lists/*

# Copy the Go workspace and the shared module first (better layer caching)
COPY go.work go.work.sum* ./
COPY shared/go/ja4common/ ./shared/go/ja4common/
COPY services/sentinel/go.mod services/sentinel/go.sum* ./services/sentinel/
COPY services/correlator/go.mod services/correlator/go.sum* ./services/correlator/

WORKDIR /build/services/correlator
RUN --mount=type=cache,target=/go/pkg/mod go mod download

COPY services/correlator/ /build/services/correlator/

ARG VERSION=dev
RUN --mount=type=cache,target=/go/pkg/mod \
    CGO_ENABLED=0 GOOS=linux GOARCH=amd64 \
    go build -ldflags="-w -s -X main.Version=${VERSION}" \
    -o /tmp/correlator \
    ./cmd/logcorrelator

# =============================================================================
# Stage 2: rpm-builder, builds the RPMs with rpmbuild
# A single stage with three successive rpmbuild calls (el8, el9, el10).
# The spec reads its files from %{_builddir} (rpmbuild's BUILD directory).
# =============================================================================
FROM rockylinux:9 AS rpm-builder

WORKDIR /package

ARG VERSION=dev

RUN dnf install -y rpm-build rpmdevtools && dnf clean all

RUN mkdir -p /root/rpmbuild/{BUILD,BUILDROOT,RPMS,SOURCES,SPECS,SRPMS} && \
    mkdir -p /packages/rpm/{el8,el9,el10}

# File layout under BUILD/ (expected by the correlator spec)
RUN mkdir -p /root/rpmbuild/BUILD/usr/bin \
             /root/rpmbuild/BUILD/etc/logcorrelator \
             /root/rpmbuild/BUILD/etc/systemd/system \
             /root/rpmbuild/BUILD/etc/logrotate.d

COPY --from=builder /tmp/correlator /root/rpmbuild/BUILD/usr/bin/logcorrelator
COPY services/correlator/config.example.yml /root/rpmbuild/BUILD/etc/logcorrelator/logcorrelator.yml
COPY services/correlator/config.example.yml /root/rpmbuild/BUILD/etc/logcorrelator/logcorrelator.yml.example
COPY services/correlator/logcorrelator.service /root/rpmbuild/BUILD/etc/systemd/system/logcorrelator.service
COPY services/correlator/packaging/rpm/logrotate /root/rpmbuild/BUILD/etc/logrotate.d/logcorrelator

RUN chmod 755 /root/rpmbuild/BUILD/usr/bin/logcorrelator && \
    chmod 640 /root/rpmbuild/BUILD/etc/logcorrelator/logcorrelator.yml && \
    chmod 640 /root/rpmbuild/BUILD/etc/logcorrelator/logcorrelator.yml.example && \
    chmod 644 /root/rpmbuild/BUILD/etc/systemd/system/logcorrelator.service && \
    chmod 644 /root/rpmbuild/BUILD/etc/logrotate.d/logcorrelator

COPY services/correlator/packaging/rpm/logcorrelator.spec /root/rpmbuild/SPECS/logcorrelator.spec

# el8
RUN rpmbuild --define "_topdir /root/rpmbuild" \
             --define "dist .el8" \
             --define "version ${VERSION}" \
             --target x86_64 \
             -bb /root/rpmbuild/SPECS/logcorrelator.spec && \
    cp /root/rpmbuild/RPMS/x86_64/*.el8.x86_64.rpm /packages/rpm/el8/

# el9
RUN rpmbuild --define "_topdir /root/rpmbuild" \
             --define "dist .el9" \
             --define "version ${VERSION}" \
             --target x86_64 \
             -bb /root/rpmbuild/SPECS/logcorrelator.spec && \
    cp /root/rpmbuild/RPMS/x86_64/*.el9.x86_64.rpm /packages/rpm/el9/

# el10
RUN rpmbuild --define "_topdir /root/rpmbuild" \
             --define "dist .el10" \
             --define "version ${VERSION}" \
             --target x86_64 \
             -bb /root/rpmbuild/SPECS/logcorrelator.spec && \
    cp /root/rpmbuild/RPMS/x86_64/*.el10.x86_64.rpm /packages/rpm/el10/

# =============================================================================
# Stage 3: output, final image containing only the RPMs
# =============================================================================
FROM alpine:latest AS output

WORKDIR /packages
COPY --from=rpm-builder /packages/rpm/el8/*.rpm /packages/rpm/el8/
COPY --from=rpm-builder /packages/rpm/el9/*.rpm /packages/rpm/el9/
COPY --from=rpm-builder /packages/rpm/el10/*.rpm /packages/rpm/el10/

CMD ["sh", "-c", \
     "echo '=== RPM el8 ===' && ls -la /packages/rpm/el8/ && \
      echo '' && echo '=== RPM el9 ===' && ls -la /packages/rpm/el9/ && \
      echo '' && echo '=== RPM el10 ===' && ls -la /packages/rpm/el10/"]
148
services/correlator/Makefile
Normal file
@ -0,0 +1,148 @@
.PHONY: build test test-docker lint fmt clean help \
	docker-build-dev docker-build-dev-no-test docker-build-runtime \
	package package-rpm package-rpm-sequential test-package test-package-rpm \
	ci ci-test ci-build ci-package ci-package-test

# Docker parameters
DOCKER=docker
# Use buildx for better cache management
DOCKER_BUILD=$(DOCKER) build
DOCKER_BUILDX=$(DOCKER) buildx
DOCKER_RUN=$(DOCKER) run

# Docker build context: the Dockerfiles COPY go.work, shared/ and
# services/*, so the context must be the monorepo root, two levels
# above this directory.
DOCKER_CONTEXT=../..

# Image names
DEV_IMAGE=logcorrelator-dev:latest
RUNTIME_IMAGE=logcorrelator:latest
PACKAGER_IMAGE=logcorrelator-packager:latest

# Binary name
BINARY_NAME=logcorrelator
DIST_DIR=dist

# Package version
PKG_VERSION ?= 1.1.22

# Enable BuildKit for better performance
export DOCKER_BUILDKIT=1

## build: Build the logcorrelator binary locally
build:
	mkdir -p $(DIST_DIR)
	go build -ldflags="-w -s" -o $(DIST_DIR)/$(BINARY_NAME) ./cmd/$(BINARY_NAME)

## docker-build-dev: Build the development Docker image (with tests and coverage)
docker-build-dev:
	$(DOCKER_BUILD) --target builder -t $(DEV_IMAGE) -f Dockerfile $(DOCKER_CONTEXT)

## docker-build-dev-no-test: Build the development Docker image WITHOUT tests (faster)
docker-build-dev-no-test:
	$(DOCKER_BUILD) --target builder --no-cache --build-arg SKIP_TESTS=true -t $(DEV_IMAGE) -f Dockerfile $(DOCKER_CONTEXT)

## docker-build-runtime: Build the runtime Docker image (fast, no tests)
docker-build-runtime:
	$(DOCKER_BUILD) --target runtime --build-arg SKIP_TESTS=true -t $(RUNTIME_IMAGE) -f Dockerfile $(DOCKER_CONTEXT)

## test: Run unit tests locally
test:
	go test -race -coverprofile=coverage.out ./...

## test-docker: Run unit tests inside Docker container
test-docker: docker-build-dev
	@echo "Tests already run in builder stage"

## lint: Run linters
lint:
	go vet ./...
	gofmt -l .

## fmt: Format all Go files
fmt:
	gofmt -w .

## package: Build RPM packages for all target distributions
package: package-rpm

## package-rpm: Build RPM packages for Rocky Linux 8/9, AlmaLinux 10 (requires Docker)
## Uses buildx; el8, el9 and el10 are built by successive rpmbuild calls in one stage
package-rpm:
	mkdir -p $(DIST_DIR)/rpm/el8 $(DIST_DIR)/rpm/el9 $(DIST_DIR)/rpm/el10
	@echo "Starting RPM builds for el8, el9, el10..."
	# Build all three distributions with buildx
	$(DOCKER_BUILDX) build --target output -t $(PACKAGER_IMAGE) \
		--build-arg VERSION=$(PKG_VERSION) \
		-f Dockerfile.package $(DOCKER_CONTEXT) \
		--load
	@echo "Extracting RPM packages from Docker image..."
	$(DOCKER_RUN) --rm -v $(PWD)/$(DIST_DIR)/rpm:/output/rpm $(PACKAGER_IMAGE) sh -c \
		"cp -r /packages/rpm/el8 /output/rpm/ && \
		 cp -r /packages/rpm/el9 /output/rpm/ && \
		 cp -r /packages/rpm/el10 /output/rpm/"
	@echo "RPM packages created:"
	@echo " Enterprise Linux 8 (el8):"
	ls -la $(DIST_DIR)/rpm/el8/ 2>/dev/null || echo " (no packages)"
	@echo " Enterprise Linux 9 (el9):"
	ls -la $(DIST_DIR)/rpm/el9/ 2>/dev/null || echo " (no packages)"
	@echo " Enterprise Linux 10 (el10):"
	ls -la $(DIST_DIR)/rpm/el10/ 2>/dev/null || echo " (no packages)"

## package-rpm-sequential: Build RPM packages with plain docker build (fallback if buildx is unavailable)
package-rpm-sequential:
	mkdir -p $(DIST_DIR)/rpm/el8 $(DIST_DIR)/rpm/el9 $(DIST_DIR)/rpm/el10
	@echo "Building RPMs for el8, el9, el10 with plain docker build..."
	$(DOCKER_BUILD) --target output -t $(PACKAGER_IMAGE) \
		--build-arg VERSION=$(PKG_VERSION) \
		-f Dockerfile.package $(DOCKER_CONTEXT)
	@echo "Extracting RPM packages..."
	$(DOCKER_RUN) --rm -v $(PWD)/$(DIST_DIR)/rpm:/output/rpm $(PACKAGER_IMAGE) sh -c \
		"cp -r /packages/rpm/el8 /output/rpm/ && \
		 cp -r /packages/rpm/el9 /output/rpm/ && \
		 cp -r /packages/rpm/el10 /output/rpm/"
	@echo "RPM packages created:"
	@echo " Enterprise Linux 8 (el8):"
	ls -la $(DIST_DIR)/rpm/el8/ 2>/dev/null || echo " (no packages)"
	@echo " Enterprise Linux 9 (el9):"
	ls -la $(DIST_DIR)/rpm/el9/ 2>/dev/null || echo " (no packages)"
	@echo " Enterprise Linux 10 (el10):"
	ls -la $(DIST_DIR)/rpm/el10/ 2>/dev/null || echo " (no packages)"

## test-package-rpm: Test RPM package installation in Docker
test-package-rpm: package-rpm
	./packaging/test/test-rpm.sh

## test-package: Test RPM package installation
test-package: test-package-rpm

## ci: Full CI pipeline (tests, build, packages, package tests)
ci: ci-test ci-build ci-package ci-package-test

## ci-test: Run all tests for CI
ci-test: test lint

## ci-build: Build for CI (production binary)
ci-build: build

## ci-package: Build all packages for CI
ci-package: package

## ci-package-test: Test all packages for CI
ci-package-test: test-package

## clean: Clean build artifacts and Docker images
clean:
	rm -rf $(DIST_DIR)/
	rm -f coverage.out
	$(DOCKER) rmi $(DEV_IMAGE) 2>/dev/null || true
	$(DOCKER) rmi $(RUNTIME_IMAGE) 2>/dev/null || true
	$(DOCKER) rmi $(PACKAGER_IMAGE) 2>/dev/null || true

## help: Show this help message
help:
	@echo "Usage: make [target]"
	@echo ""
	@echo "Targets:"
	@sed -n 's/^##//p' $(MAKEFILE_LIST) | column -t -s ':' | sed -e 's/^/ /'
426
services/correlator/README.md
Normal file
@ -0,0 +1,426 @@
# logcorrelator

HTTP and network log correlation service written in Go.

## Description

**logcorrelator** receives two JSON log streams over Unix datagram sockets (SOCK_DGRAM):
- **Source A**: application HTTP logs (Apache, reverse proxy)
- **Source B**: network logs (IP/TCP metadata, JA3/JA4, etc.)

It correlates events on `src_ip + src_port` within a configurable time window and writes the correlated logs to:
- A local file (JSON lines)
- ClickHouse (for analysis and archiving)
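
The matching rule described above can be sketched in Go as follows. This is only an illustration under stated assumptions: the `Event` type and the `correlationKey`, `withinWindow` and `matches` helpers are hypothetical names, not the service's actual code, and timestamps are assumed to be Unix nanoseconds.

```go
package main

import (
	"fmt"
	"time"
)

// Event is a hypothetical, reduced view of one incoming log event.
type Event struct {
	SrcIP     string
	SrcPort   int
	Timestamp int64 // Unix time in nanoseconds (assumption)
}

// correlationKey builds the lookup key from src_ip and src_port.
func correlationKey(e Event) string {
	return fmt.Sprintf("%s:%d", e.SrcIP, e.SrcPort)
}

// withinWindow reports whether two nanosecond timestamps are at most
// window apart, in either direction.
func withinWindow(a, b int64, window time.Duration) bool {
	d := time.Duration(a - b)
	if d < 0 {
		d = -d
	}
	return d <= window
}

// matches is true when both events share the same key and fall inside
// the configured time window.
func matches(a, b Event, window time.Duration) bool {
	return correlationKey(a) == correlationKey(b) &&
		withinWindow(a.Timestamp, b.Timestamp, window)
}

func main() {
	a := Event{SrcIP: "192.168.1.1", SrcPort: 8080, Timestamp: 1704110400000000000}
	b := Event{SrcIP: "192.168.1.1", SrcPort: 8080, Timestamp: 1704110403000000000}
	fmt.Println(correlationKey(a), matches(a, b, 10*time.Second))
}
```

The real service additionally has to handle buffering, Keep-Alive (one B matching several A) and orphan emission, which this sketch leaves out.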

The service's operational logs (startup, errors, metrics) are written to **stderr** and collected by journald. No correlated data ever appears on stdout.

## Architecture

```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│  Source A       │────▶│                  │────▶│  File Sink      │
│  HTTP/Apache    │     │  Correlation     │     │  (JSON lines)   │
│  (Unix DGRAM)   │     │  Service         │     └─────────────────┘
└─────────────────┘     │                  │
                        │  - Buffers       │     ┌─────────────────┐
┌─────────────────┐     │  - Time Window   │────▶│  ClickHouse     │
│  Source B       │────▶│  - Orphan Policy │     │  Sink           │
│  Network/JA4    │     │  - Keep-Alive    │     └─────────────────┘
│  (Unix DGRAM)   │     └──────────────────┘
└─────────────────┘
```

Hexagonal architecture: pure domain (`internal/domain`), abstract ports (`internal/ports`), adapters (`internal/adapters`), orchestration (`internal/app`).

## Build (100% Docker)

All builds, tests and RPM packaging run inside containers:

```bash
# Full build with tests (builder stage)
make docker-build-dev

# RPM packaging (el8, el9, el10)
make package-rpm

# Quick build without tests
make docker-build-dev-no-test

# Local tests (requires Go 1.24+)
make test
```

### Prerequisites

- Docker 20.10+

## Installation

### RPM packages

```bash
# Build the packages
make package-rpm

# Install (Rocky Linux / AlmaLinux)
sudo dnf install -y dist/rpm/el8/logcorrelator-1.1.22-1.el8.x86_64.rpm
sudo dnf install -y dist/rpm/el9/logcorrelator-1.1.22-1.el9.x86_64.rpm
sudo dnf install -y dist/rpm/el10/logcorrelator-1.1.22-1.el10.x86_64.rpm

# Start
sudo systemctl enable --now logcorrelator
sudo systemctl status logcorrelator
```

### Manual build

```bash
# Local binary (requires Go 1.24+)
go build -o logcorrelator ./cmd/logcorrelator
./logcorrelator -config config.example.yml
```

## Configuration

YAML file. See `config.example.yml` for a complete example.

```yaml
log:
  level: INFO   # DEBUG, INFO, WARN, ERROR

inputs:
  unix_sockets:
    - name: http
      source_type: A   # HTTP source
      path: /var/run/logcorrelator/http.socket
      format: json
      socket_permissions: "0666"
    - name: network
      source_type: B   # Network source
      path: /var/run/logcorrelator/network.socket
      format: json
      socket_permissions: "0666"

outputs:
  file:
    path: /var/log/logcorrelator/correlated.log
  clickhouse:
    enabled: false
    dsn: clickhouse://user:pass@localhost:9000/db
    table: http_logs_raw
    batch_size: 500
    flush_interval_ms: 200
    max_buffer_size: 5000
    drop_on_overflow: true
    timeout_ms: 1000
  stdout:
    enabled: false   # no-op for data; operational logs always go to stderr

correlation:
  time_window:
    value: 10
    unit: s
  orphan_policy:
    apache_always_emit: true
    apache_emit_delay_ms: 500   # delay before emitting an orphan A (ms)
    network_emit: false
  matching:
    mode: one_to_many   # Keep-Alive: one B can correlate with several successive A
  buffers:
    max_http_items: 10000
    max_network_items: 20000
  ttl:
    network_ttl_s: 120   # TTL reset on every correlation (Keep-Alive)
  # Exclude source IPs (single IPs or CIDR ranges)
  exclude_source_ips:
    - 10.0.0.1
    - 172.16.0.0/12
  # Restrict correlation to specific destination ports (optional)
  # If the list is empty, all ports are correlated
  include_dest_ports:
    - 80
    - 443

metrics:
  enabled: false
  addr: ":8080"
```

### ClickHouse DSN format

```
clickhouse://username:password@host:port/database
```

Ports: `9000` (native, recommended) or `8123` (HTTP).
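
To check a DSN against this format before starting the service, a small Go sketch using only the standard `net/url` package can split it into its parts. The `DSN` type and `parseDSN` function are hypothetical helpers written for this illustration; the service itself may rely on the ClickHouse driver's own parsing.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// DSN holds the parts of a clickhouse:// URL (hypothetical helper type).
type DSN struct {
	User, Password, Host, Port, Database string
}

// parseDSN splits clickhouse://username:password@host:port/database.
func parseDSN(raw string) (DSN, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return DSN{}, err
	}
	if u.Scheme != "clickhouse" {
		return DSN{}, fmt.Errorf("unexpected scheme %q", u.Scheme)
	}
	pass, _ := u.User.Password() // safe even when no credentials are present
	return DSN{
		User:     u.User.Username(),
		Password: pass,
		Host:     u.Hostname(),
		Port:     u.Port(),
		Database: strings.TrimPrefix(u.Path, "/"),
	}, nil
}

func main() {
	d, err := parseDSN("clickhouse://user:pass@localhost:9000/db")
	if err != nil {
		fmt.Println("invalid DSN:", err)
		return
	}
	// The password is deliberately left out of the output.
	fmt.Printf("host=%s port=%s database=%s user=%s\n", d.Host, d.Port, d.Database, d.User)
}
```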

## Log format

### Source A (HTTP)

```json
{
  "src_ip": "192.168.1.1", "src_port": 8080,
  "dst_ip": "10.0.0.1", "dst_port": 443,
  "timestamp": 1704110400000000000,
  "method": "GET", "path": "/api/test"
}
```

### Source B (Network)

```json
{
  "src_ip": "192.168.1.1", "src_port": 8080,
  "dst_ip": "10.0.0.1", "dst_port": 443,
  "ja3": "abc123", "ja4": "xyz789"
}
```
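
For manual testing, an event in either format can be pushed to the service as a single datagram. The Go sketch below assumes the socket path from the configuration example; `sendEvent` is a hypothetical helper, not part of the service.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
)

// sendEvent serializes event as JSON and sends it as one datagram to the
// Unix socket at path.
func sendEvent(path string, event map[string]any) error {
	payload, err := json.Marshal(event)
	if err != nil {
		return err
	}
	conn, err := net.DialUnix("unixgram", nil, &net.UnixAddr{Name: path, Net: "unixgram"})
	if err != nil {
		return err
	}
	defer conn.Close()
	_, err = conn.Write(payload)
	return err
}

func main() {
	event := map[string]any{
		"src_ip": "192.168.1.1", "src_port": 8080,
		"dst_ip": "10.0.0.1", "dst_port": 443,
		"ja3": "abc123", "ja4": "xyz789",
	}
	// Assumed path of the Source B socket; adjust to your configuration.
	if err := sendEvent("/var/run/logcorrelator/network.socket", event); err != nil {
		fmt.Println("send failed:", err)
	}
}
```

Because the sockets are datagram sockets, each `Write` call carries exactly one JSON event and no framing is needed.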

### Correlated log (output)

Flat JSON structure, with all A and B fields merged at the root:

```json
{
  "timestamp": "2024-01-01T12:00:00Z",
  "src_ip": "192.168.1.1", "src_port": 8080,
  "dst_ip": "10.0.0.1", "dst_port": 443,
  "correlated": true,
  "method": "GET", "path": "/api/test",
  "ja3": "abc123", "ja4": "xyz789"
}
```

If a field name collides between A and B, both values are kept, prefixed with `a_` and `b_`.

Orphan A events (no matching B) are emitted with `"correlated": false, "orphan_side": "A"`.
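
One possible reading of the merge rule, sketched in Go. This rests on an assumption the README does not spell out: a field counts as a collision only when it is present on both sides with different values, while identical values (such as `src_ip`) are emitted once. `mergeFlat` is a hypothetical name and not the service's implementation.

```go
package main

import (
	"fmt"
	"reflect"
)

// mergeFlat merges the fields of an A event and a B event into one flat map.
// Assumption: fields with equal values on both sides are kept once; fields
// with differing values are kept as a_<name> and b_<name>.
func mergeFlat(a, b map[string]any) map[string]any {
	out := make(map[string]any, len(a)+len(b)+1)
	for k, v := range a {
		out[k] = v
	}
	for k, bv := range b {
		av, both := a[k]
		switch {
		case !both:
			out[k] = bv
		case reflect.DeepEqual(av, bv):
			// Same value on both sides: the copy from A is already in out.
		default:
			delete(out, k)
			out["a_"+k] = av
			out["b_"+k] = bv
		}
	}
	out["correlated"] = true
	return out
}

func main() {
	a := map[string]any{"src_ip": "192.168.1.1", "src_port": 8080, "method": "GET", "timestamp": 1}
	b := map[string]any{"src_ip": "192.168.1.1", "src_port": 8080, "ja4": "xyz789", "timestamp": 2}
	fmt.Println(mergeFlat(a, b))
}
```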

## ClickHouse schema

The `sql/init.sql` file contains the complete, ready-to-use schema.

```bash
clickhouse-client --multiquery < sql/init.sql
```

### Table architecture

```
http_logs_raw            ← inserts from the service (raw_json String)
     │
     └─ mv_http_logs     ← materialized view (parses JSON → typed columns)
            │
            ▼
        http_logs        ← table queried by analysts
```

### Table `http_logs`: columns

| Group | Columns |
|---|---|
| Time | `time` DateTime, `log_date` Date |
| Network | `src_ip` IPv4, `src_port` UInt16, `dst_ip` IPv4, `dst_port` UInt16 |
| HTTP | `method`, `scheme`, `host`, `path`, `query`, `http_version` (LowCardinality) |
| Correlation | `orphan_side`, `correlated` UInt8, `keepalives` UInt16, `a_timestamp`/`b_timestamp` UInt64, `conn_id` |
| IP meta | `ip_meta_df` UInt8, `ip_meta_id` UInt16, `ip_meta_total_length` UInt16, `ip_meta_ttl` UInt8 |
| TCP meta | `tcp_meta_options`, `tcp_meta_window_size` UInt32, `tcp_meta_mss` UInt16, `tcp_meta_window_scale` UInt8, `syn_to_clienthello_ms` Int32 |
| TLS / fingerprint | `tls_version`, `tls_sni`, `tls_alpn` (LowCardinality), `ja3`, `ja3_hash`, `ja4` |
| HTTP headers | `header_user_agent`, `header_accept`, `header_accept_encoding`, `header_accept_language`, `header_x_request_id`, `header_x_trace_id`, `header_x_forwarded_for`, `header_sec_ch_ua*`, `header_sec_fetch_*` |

### Users and permissions

```sql
-- data_writer: INSERT on http_logs_raw only (service account)
GRANT INSERT ON mabase_prod.http_logs_raw TO data_writer;
GRANT SELECT ON mabase_prod.http_logs_raw TO data_writer;

-- analyst: read access to the parsed table
GRANT SELECT ON mabase_prod.http_logs TO analyst;
```

### Checking ingestion

```sql
-- Raw data received
SELECT count(*), min(ingest_time), max(ingest_time) FROM mabase_prod.http_logs_raw;

-- Data parsed by the materialized view
SELECT count(*), min(time), max(time) FROM mabase_prod.http_logs;

-- Latest correlated logs
SELECT time, src_ip, dst_ip, method, host, path, ja4
FROM mabase_prod.http_logs
WHERE correlated = 1
ORDER BY time DESC LIMIT 10;
```

## Signals

| Signal | Behavior |
|--------|----------|
| `SIGINT` / `SIGTERM` | Graceful shutdown (drains buffers, flushes sinks) |
| `SIGHUP` | Reopens output files (log rotation) |

## Internal logs

Operational logs go to **stderr**:

```bash
# Systemd
journalctl -u logcorrelator -f

# Docker
docker logs -f logcorrelator
```

## Project structure

```
cmd/logcorrelator/          # Entry point
internal/
  adapters/
    inbound/unixsocket/     # Reads SOCK_DGRAM → NormalizedEvent
    outbound/
      clickhouse/           # ClickHouse sink (batching, retry, full logging)
      file/                 # File sink (JSON lines, reopen on SIGHUP)
      multi/                # Fan-out to several sinks
      stdout/               # No-op for data (operational logs go to stderr)
  app/                      # Orchestrator (sources → correlation → sinks)
  config/                   # YAML loading/validation
  domain/                   # CorrelationService, NormalizedEvent, CorrelatedLog
  observability/            # Logger, metrics, HTTP server /metrics /health
  ports/                    # EventSource, CorrelatedLogSink, CorrelationProcessor interfaces
config.example.yml          # Example configuration
Dockerfile                  # Multi-stage build (builder, runtime)
Dockerfile.package          # Multi-distro RPM packaging (el8, el9, el10)
Makefile                    # Build targets
architecture.yml            # Architecture specification
logcorrelator.service       # systemd unit
```

## Debugging

### DEBUG logs

```yaml
log:
  level: DEBUG
```

Examples of the logs produced:
```
[unixsocket:http] DEBUG event received: source=A src_ip=192.168.1.1 src_port=8080
[correlation] DEBUG processing A event: key=192.168.1.1:8080
[correlation] DEBUG correlation found: A(src_ip=... src_port=... ts=...) + B(...)
[correlation] DEBUG A event has no matching B key in buffer: key=...
[correlation] DEBUG event excluded by IP filter: source=A src_ip=10.0.0.1 src_port=8080
[correlation] DEBUG event excluded by dest port filter: source=A dst_port=22
[correlation] DEBUG TTL reset for B event (Keep-Alive): key=... new_ttl=120s
[clickhouse] DEBUG batch sent: rows=42 table=http_logs_raw
```

### Metrics server

```yaml
metrics:
  enabled: true
  addr: ":8080"
```

`GET /health` → `{"status":"healthy"}`

`GET /metrics`:

```json
{
  "events_received_a": 1542, "events_received_b": 1498,
  "correlations_success": 1450, "correlations_failed": 92,
  "failed_no_match_key": 45, "failed_time_window": 23,
  "failed_buffer_eviction": 5, "failed_ttl_expired": 12,
  "failed_ip_excluded": 7, "failed_dest_port_filtered": 3,
  "buffer_a_size": 23, "buffer_b_size": 18,
  "orphans_emitted_a": 92, "orphans_pending_a": 4,
  "keepalive_resets": 892
}
```

### Diagnosing with metrics

| High metric | Cause | Fix |
|---|---|---|
| `failed_no_match_key` | A and B do not share the same `src_ip:src_port` | Check both sources |
| `failed_time_window` | Timestamps too far apart | Increase `time_window.value` or check NTP |
| `failed_ttl_expired` | B expires before correlation | Increase `ttl.network_ttl_s` |
| `failed_buffer_eviction` | Buffers too small | Increase `buffers.max_http_items` / `max_network_items` |
| `failed_ip_excluded` | Traffic from excluded IPs | Normal if expected |
| `failed_dest_port_filtered` | Traffic on unlisted ports | Check `include_dest_ports` |
| `orphans_emitted_a` | Many A events without a B | Check that source B is sending events |

### Filtering by source IP

```yaml
correlation:
  exclude_source_ips:
    - 10.0.0.1        # Single IP (health checks)
    - 172.16.0.0/12   # CIDR range
```

Events from these IPs are silently dropped (neither correlated nor emitted as orphans). The `failed_ip_excluded` metric counts these exclusions.
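
A list that mixes single IPs and CIDR ranges can be handled uniformly by turning every entry into a prefix. The Go sketch below uses the standard `net/netip` package; `parseExclusions` and `isExcluded` are hypothetical names, and the service's own `ipfilter` code may work differently.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// parseExclusions converts entries (single IPs or CIDR ranges) into prefixes.
// A single IP becomes a prefix covering exactly that address.
func parseExclusions(entries []string) ([]netip.Prefix, error) {
	prefixes := make([]netip.Prefix, 0, len(entries))
	for _, e := range entries {
		if strings.Contains(e, "/") {
			p, err := netip.ParsePrefix(e)
			if err != nil {
				return nil, err
			}
			prefixes = append(prefixes, p.Masked())
			continue
		}
		addr, err := netip.ParseAddr(e)
		if err != nil {
			return nil, err
		}
		prefixes = append(prefixes, netip.PrefixFrom(addr, addr.BitLen()))
	}
	return prefixes, nil
}

// isExcluded reports whether ip falls inside any excluded prefix.
// Unparseable addresses are treated as not excluded.
func isExcluded(ip string, prefixes []netip.Prefix) bool {
	addr, err := netip.ParseAddr(ip)
	if err != nil {
		return false
	}
	for _, p := range prefixes {
		if p.Contains(addr) {
			return true
		}
	}
	return false
}

func main() {
	prefixes, err := parseExclusions([]string{"10.0.0.1", "172.16.0.0/12"})
	if err != nil {
		fmt.Println("invalid exclusion list:", err)
		return
	}
	for _, ip := range []string{"10.0.0.1", "172.20.5.9", "192.168.1.1"} {
		fmt.Println(ip, isExcluded(ip, prefixes))
	}
}
```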

### Filtering by destination port

```yaml
correlation:
  include_dest_ports:
    - 80     # HTTP
    - 443    # HTTPS
    - 8080
    - 8443
```

If the list is non-empty, only events whose `dst_port` is in the list take part in correlation; the others are silently dropped. An empty list means all ports are correlated (the default behavior). The `failed_dest_port_filtered` metric counts these exclusions.
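
The rule, including the empty-list default, fits in a few lines of Go. `portAllowed` is a hypothetical helper written for this illustration.

```go
package main

import "fmt"

// portAllowed reports whether an event with the given dst_port takes part
// in correlation. An empty include list means every port is accepted.
func portAllowed(dstPort int, include []int) bool {
	if len(include) == 0 {
		return true
	}
	for _, p := range include {
		if p == dstPort {
			return true
		}
	}
	return false
}

func main() {
	include := []int{80, 443, 8080, 8443}
	// prints: true false true
	fmt.Println(portAllowed(443, include), portAllowed(22, include), portAllowed(22, nil))
}
```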

### Test scripts

```bash
# Bash script (simple)
./scripts/test-correlation.sh -c 10 -v

# Python script (full scenarios: basic, time window, keepalive, different IPs)
pip install requests
python3 scripts/test-correlation-advanced.py --all
```

## Troubleshooting

### ClickHouse: insert errors

- **`No such column`**: check that the `http_logs_raw` table uses the single `raw_json` column (not separate columns)
- **`ACCESS_DENIED`**: `GRANT INSERT ON mabase_prod.http_logs_raw TO data_writer;`
- Flush errors are logged at ERROR level in the service logs

### Empty materialized view

If `http_logs_raw` has data but `http_logs` is empty:
```sql
-- Check the view
SHOW CREATE TABLE mabase_prod.mv_http_logs;
-- Check permissions (the MV runs under the service account)
GRANT SELECT ON mabase_prod.http_logs_raw TO data_writer;
```

### Unix sockets: permission denied

Check that `socket_permissions: "0666"` is set and that the `/var/run/logcorrelator` directory is owned by the `logcorrelator` user.

### systemd service does not start

```bash
journalctl -u logcorrelator -n 50 --no-pager
/usr/bin/logcorrelator -config /etc/logcorrelator/logcorrelator.yml
```

## License

MIT
974
services/correlator/architecture.yml
Normal file
@ -0,0 +1,974 @@
|
||||
service:
|
||||
name: logcorrelator
|
||||
context: http-network-correlation
|
||||
language: go
|
||||
pattern: hexagonal
|
||||
description: >
|
||||
logcorrelator is a system service (started by systemd) written in Go. It
receives two JSON log streams over Unix sockets and correlates application-level
HTTP events (source A, typically Apache or a reverse proxy) with network events
(source B: IP/TCP metadata, JA3/JA4, etc.) on the strictly defined key
src_ip + src_port, within a configurable time window. The service supports
HTTP Keep-Alive connections: one network log can be correlated with several
successive HTTP logs (one-to-many strategy). In-memory retention is bounded by
configurable cache sizes and a dynamic TTL for source B. The service always
emits A events even when no B event is available, never emits B logs on their
own, and pushes results to ClickHouse and/or a local file.

Debugging features included:
- HTTP metrics server (/metrics, /health)
- Detailed DEBUG logs with the reasons for correlation failures
- Source IP filtering (exclude_source_ips)
- Test scripts (Bash and Python)
- Metrics: events received, correlations, failures by reason, buffers, orphans
|
||||
|
||||
runtime:
|
||||
deployment:
|
||||
unit_type: systemd
|
||||
description: >
|
||||
logcorrelator ships as a standalone binary run as a systemd service. The
systemd unit provides automatic start at boot, restart after a crash, and
standard integration with the Linux ecosystem.
|
||||
binary_path: /usr/bin/logcorrelator
|
||||
config_path: /etc/logcorrelator/logcorrelator.yml
|
||||
user: logcorrelator
|
||||
group: logcorrelator
|
||||
restart: on-failure
|
||||
systemd_unit:
|
||||
path: /etc/systemd/system/logcorrelator.service
|
||||
content_example: |
|
||||
[Unit]
|
||||
Description=logcorrelator service
|
||||
After=network.target
|
||||
|
||||
[Service]
|
||||
Type=simple
|
||||
User=logcorrelator
|
||||
Group=logcorrelator
|
||||
ExecStart=/usr/bin/logcorrelator -config /etc/logcorrelator/logcorrelator.yml
|
||||
ExecReload=/bin/kill -HUP $MAINPID
|
||||
Restart=on-failure
|
||||
RestartSec=5
|
||||
|
||||
# Security hardening
|
||||
NoNewPrivileges=true
|
||||
ProtectSystem=strict
|
||||
ProtectHome=true
|
||||
ReadWritePaths=/var/log/logcorrelator /var/run/logcorrelator /etc/logcorrelator
|
||||
|
||||
# Resource limits
|
||||
LimitNOFILE=65536
|
||||
|
||||
# Systemd timeouts
|
||||
TimeoutStartSec=10
|
||||
TimeoutStopSec=30
|
||||
|
||||
[Install]
|
||||
WantedBy=multi-user.target
|
||||
os:
|
||||
supported:
|
||||
- rocky-linux-8
|
||||
- rocky-linux-9
|
||||
- almalinux-10
|
||||
- autres-linux-recentes
|
||||
logs:
|
||||
stdout_stderr: journald
|
||||
structured: true
|
||||
description: >
|
||||
The service's internal logs (errors, informational messages) are written to
stdout/stderr and collected by journald. They are structured and contain no
personal data.
|
||||
signals:
|
||||
graceful_shutdown:
|
||||
- SIGINT
|
||||
- SIGTERM
|
||||
reload:
|
||||
- SIGHUP
|
||||
description: >
|
||||
SIGINT/SIGTERM: graceful shutdown (sockets stopped, buffers flushed, sinks
closed). SIGHUP: reopens the output files (useful for log rotation with
logrotate) without stopping the service.
|
||||
filesystem:
|
||||
description: >
|
||||
Permissions and ownership of the files and directories used by logcorrelator.
|
||||
directories:
|
||||
- path: /var/run/logcorrelator
|
||||
owner: logcorrelator:logcorrelator
|
||||
permissions: "0755"
|
||||
purpose: >
|
||||
Holds the Unix sockets (http.socket, network.socket).
Sockets are created with 0666 permissions (world read/write).
|
||||
- path: /var/log/logcorrelator
|
||||
owner: logcorrelator:logcorrelator
|
||||
permissions: "0750"
|
||||
purpose: >
|
||||
Holds the correlated logs (correlated.log).
|
||||
- path: /var/lib/logcorrelator
|
||||
owner: logcorrelator:logcorrelator
|
||||
permissions: "0750"
|
||||
purpose: >
|
||||
Service home directory (internal data).
|
||||
- path: /etc/logcorrelator
|
||||
owner: logcorrelator:logcorrelator
|
||||
permissions: "0750"
|
||||
purpose: >
|
||||
Holds the configuration (logcorrelator.yml, logcorrelator.yml.example).
|
||||
files:
|
||||
- path: /etc/logcorrelator/logcorrelator.yml
|
||||
owner: logcorrelator:logcorrelator
|
||||
permissions: "0640"
|
||||
rpm_directive: "%config(noreplace)"
|
||||
- path: /etc/logcorrelator/logcorrelator.yml.example
|
||||
owner: logcorrelator:logcorrelator
|
||||
permissions: "0640"
|
||||
- path: /etc/systemd/system/logcorrelator.service
|
||||
owner: root:root
|
||||
permissions: "0644"
|
||||
- path: /etc/logrotate.d/logcorrelator
|
||||
owner: root:root
|
||||
permissions: "0644"
|
||||
rpm_directive: "%config(noreplace)"
|
||||
sockets:
|
||||
- path: /var/run/logcorrelator/http.socket
|
||||
owner: logcorrelator:logcorrelator
|
||||
permissions: "0666"
|
||||
type: unix_datagram
|
||||
purpose: "Source A - application HTTP logs"
|
||||
- path: /var/run/logcorrelator/network.socket
|
||||
owner: logcorrelator:logcorrelator
|
||||
permissions: "0666"
|
||||
type: unix_datagram
|
||||
purpose: "Source B - network logs"
|
||||
|
||||
packaging:
|
||||
description: >
|
||||
logcorrelator is distributed as .rpm packages (Rocky Linux, AlmaLinux, RHEL)
built entirely inside containers. The RPM changelog is updated on every
version change.
All version numbers must stay consistent across the RPM spec, the Makefile
(PKG_VERSION), CHANGELOG.md, and the git tags.

Configuration update policy:
- logcorrelator.yml is marked %config(noreplace): it is NEVER overwritten
during an upgrade. The existing configuration is preserved.
- logcorrelator.yml.example is ALWAYS updated to reflect newly available
configuration options.
- On first install, if logcorrelator.yml does not exist, it is created from
logcorrelator.yml.example.
|
||||
formats:
|
||||
- rpm
|
||||
target_distros:
|
||||
- rocky-linux-8
|
||||
- rocky-linux-9
|
||||
- almalinux-10
|
||||
- rhel-8
|
||||
- rhel-9
|
||||
- rhel-10
|
||||
rpm:
|
||||
tool: fpm
|
||||
changelog:
|
||||
source: git # or CHANGELOG.md
|
||||
description: >
|
||||
On every build, a script generates an RPM changelog file from the history
(tags/commits) and passes it to fpm (--rpm-changelog option).
|
||||
contents:
|
||||
- path: /usr/bin/logcorrelator
|
||||
type: binary
|
||||
- path: /etc/logcorrelator/logcorrelator.yml
|
||||
type: config
|
||||
directives: "%config(noreplace)"
|
||||
behavior: >
|
||||
Never overwritten during upgrades. Preserved automatically by RPM.
Created only on first install if it does not already exist.
|
||||
- path: /etc/logcorrelator/logcorrelator.yml.example
|
||||
type: doc
|
||||
behavior: >
|
||||
ALWAYS updated during upgrades. Serves as the reference for newly
available configuration options.
|
||||
- path: /etc/systemd/system/logcorrelator.service
|
||||
type: systemd_unit
|
||||
- path: /etc/logrotate.d/logcorrelator
|
||||
type: logrotate_script
|
||||
directives: "%config(noreplace)"
|
||||
logrotate_example: |
|
||||
/var/log/logcorrelator/correlated.log {
|
||||
daily
|
||||
rotate 7
|
||||
compress
|
||||
delaycompress
|
||||
missingok
|
||||
notifempty
|
||||
create 0640 logcorrelator logcorrelator
|
||||
sharedscripts
|
||||
postrotate
|
||||
/bin/systemctl reload logcorrelator > /dev/null 2>&1 || true
|
||||
endscript
|
||||
}
|
||||
|
||||
config:
|
||||
format: yaml
|
||||
location: /etc/logcorrelator/logcorrelator.yml
|
||||
reload_strategy: signal_sighup_for_files
|
||||
description: >
|
||||
All configuration is centralized in a single human-readable YAML file. The RPM
also ships an example file that is updated with each release.
|
||||
example: |
|
||||
# /etc/logcorrelator/logcorrelator.yml
|
||||
|
||||
log:
|
||||
level: INFO # DEBUG, INFO, WARN, ERROR
|
||||
|
||||
inputs:
|
||||
unix_sockets:
|
||||
# HTTP source (A): application logs in JSON, one datagram = one log.
|
||||
- name: http
|
||||
source_type: A
|
||||
path: /var/run/logcorrelator/http.socket
|
||||
format: json
|
||||
socket_permissions: "0666"
|
||||
|
||||
# Network source (B): IP/TCP/JA3... logs in JSON, one datagram = one log.
|
||||
- name: network
|
||||
source_type: B
|
||||
path: /var/run/logcorrelator/network.socket
|
||||
format: json
|
||||
socket_permissions: "0666"
|
||||
|
||||
outputs:
|
||||
file:
|
||||
enabled: true
|
||||
path: /var/log/logcorrelator/correlated.log
|
||||
|
||||
clickhouse:
|
||||
enabled: false
|
||||
dsn: clickhouse://user:pass@localhost:9000/db
|
||||
table: correlated_logs_http_network
|
||||
batch_size: 500
|
||||
flush_interval_ms: 200
|
||||
max_buffer_size: 5000
|
||||
drop_on_overflow: true
|
||||
async_insert: true
|
||||
timeout_ms: 1000
|
||||
|
||||
stdout:
|
||||
enabled: false
|
||||
level: INFO # DEBUG: all logs (including orphans), INFO: correlated only, WARN: correlated only, ERROR: none
|
||||
|
||||
correlation:
|
||||
# Correlation window: if the HTTP log arrives before the network log, it waits
# at most this long (unless it is evicted from the HTTP cache first).
# Raised to 10s to support HTTP Keep-Alive.
|
||||
time_window:
|
||||
value: 10
|
||||
unit: s
|
||||
|
||||
orphan_policy:
|
||||
apache_always_emit: true # Always emit A events, even without a matching B
|
||||
network_emit: false # Never emit B events on their own
|
||||
|
||||
matching:
|
||||
mode: one_to_many # Keep-Alive: one B can be correlated with several A events.
|
||||
|
||||
buffers:
|
||||
# Maximum in-memory cache sizes (in number of logs).
|
||||
max_http_items: 10000
|
||||
max_network_items: 20000
|
||||
|
||||
ttl:
|
||||
# Standard lifetime of a network log (B) in memory. Each successful
# correlation with an A event resets this TTL.
# Raised to 120s to support long HTTP Keep-Alive sessions.
|
||||
network_ttl_s: 120
|
||||
|
||||
# Source IPs to exclude (optional)
|
||||
exclude_source_ips:
|
||||
- 10.0.0.1 # Single IP
- 172.16.0.0/12 # CIDR range
# Events from these IPs are silently dropped
|
||||
|
||||
# HTTP metrics server (optional, for debugging and monitoring)
|
||||
metrics:
|
||||
enabled: false
|
||||
addr: ":8080" # Listen address of the HTTP server
# Endpoints:
# GET /metrics - Returns the correlation metrics as JSON
# GET /health - Health check
|
||||
|
||||
inputs:
|
||||
description: >
|
||||
Two JSON log streams over Unix datagram sockets (SOCK_DGRAM). Each datagram
carries one complete JSON document. The source_type field ("A" or "B") should
be set for each socket. If it is missing, the source is inferred automatically
(headers present = source A, otherwise source B).
|
||||
unix_sockets:
|
||||
- name: http
|
||||
id: A
|
||||
description: >
|
||||
Source A, application HTTP logs (Apache, reverse proxy, etc.). Variable JSON
schema; the timestamp field (int64, nanoseconds) is required; dynamic headers (header_*).
|
||||
path: /var/run/logcorrelator/http.socket
|
||||
source_type: A
|
||||
permissions: "0666"
|
||||
protocol: unix
|
||||
socket_type: dgram
|
||||
mode: datagram
|
||||
format: json
|
||||
framing: message
|
||||
max_datagram_bytes: 65535
|
||||
retry_on_error: true
|
||||
|
||||
- name: network
|
||||
id: B
|
||||
description: >
|
||||
Source B, network logs (IP/TCP metadata, JA3/JA4, etc.). Only src_ip and
src_port are required for correlation. The timestamp field is optional;
when it is absent, the reception time is used.
|
||||
path: /var/run/logcorrelator/network.socket
|
||||
source_type: B
|
||||
permissions: "0666"
|
||||
protocol: unix
|
||||
socket_type: dgram
|
||||
mode: datagram
|
||||
format: json
|
||||
framing: message
|
||||
max_datagram_bytes: 65535
|
||||
retry_on_error: true
|
||||
|
||||
outputs:
|
||||
description: >
|
||||
Correlated logs are sent to one or more sinks (MultiSink).
|
||||
sinks:
|
||||
file:
|
||||
enabled: true
|
||||
description: >
|
||||
Local file sink. One JSON document per line. Rotation is handled by logrotate,
and the file is reopened on SIGHUP. Setting `enabled: false` disables file
output entirely (the sink is not created).
|
||||
path: /var/log/logcorrelator/correlated.log
|
||||
format: json_lines
|
||||
rotate_managed_by: external_logrotate
|
||||
clickhouse:
|
||||
enabled: false
|
||||
description: >
|
||||
Primary sink for archiving and near-real-time analysis. Asynchronous batch
inserts, with drops when the buffer is saturated. The service inserts only
into a RAW table (raw_json String, ingest_time DateTime DEFAULT now()).
The parsed table and the materialized view are managed externally (separate DDL).
All connection, flush, and retry events are logged:
INFO on connection, ERROR on flush failure, WARN on drop/retry, DEBUG on successful send.
|
||||
dsn: clickhouse://user:pass@host:9000/db
|
||||
table: correlated_logs_http_network
|
||||
batch_size: 500
|
||||
flush_interval_ms: 200
|
||||
max_buffer_size: 5000
|
||||
drop_on_overflow: true
|
||||
async_insert: true
|
||||
timeout_ms: 1000
|
||||
stdout:
|
||||
enabled: false
|
||||
description: >
|
||||
No-op sink for data. No correlated or orphan data is ever written to stdout.
This sink exists only to satisfy the CorrelatedLogSink interface. The
service's operational logs (startup, errors, debug metrics) always go to
stderr via observability.Logger, independently of this sink.
|
||||
|
||||
correlation:
|
||||
description: >
|
||||
Strict correlation based on src_ip + src_port and a configurable time window.
No other field is used in the correlation decision.
|
||||
key:
|
||||
- src_ip
|
||||
- src_port
|
||||
time_window:
|
||||
value: 10
|
||||
unit: s
|
||||
description: >
|
||||
Time window applied to the timestamps of A and B. If B does not arrive within
this delay, A is emitted as an orphan. Raised to 10s for Keep-Alive.
|
||||
retention_limits:
|
||||
max_http_items: 10000
|
||||
max_network_items: 20000
|
||||
description: >
|
||||
Cache limits. When max_http_items is reached, the oldest A is evicted and
emitted as an orphan. When max_network_items is reached, the oldest B is
dropped silently.
|
||||
ttl_management:
|
||||
network_ttl_s: 120
|
||||
description: >
|
||||
TTL of network logs. Each time a B is correlated with an A (Keep-Alive), its
TTL is reset to this value. Raised to 120s for long sessions.
|
||||
timestamp_source:
|
||||
apache: timestamp (int64 field, nanoseconds)
network: timestamp (int64 field, nanoseconds) if present, otherwise time (RFC3339),
otherwise reception_time (time.Now())
|
||||
orphan_policy:
|
||||
apache_always_emit: true
|
||||
network_emit: false
|
||||
matching:
|
||||
mode: one_to_many
|
||||
description: >
|
||||
One-to-many strategy: a network log can be used for several successive HTTP
logs as long as it has neither expired nor been evicted.
|
||||
ip_filtering:
|
||||
directive: exclude_source_ips
|
||||
description: >
|
||||
List of source IPs (exact addresses or CIDR ranges) to drop silently.
Matching events are neither correlated nor emitted as orphans. Metric: failed_ip_excluded.
|
||||
dest_port_filtering:
|
||||
directive: include_dest_ports
|
||||
description: >
|
||||
Allow-list of destination ports. If non-empty, only events whose dst_port is
in the list take part in correlation. All others are silently dropped
(neither correlated nor emitted as orphans).
Empty list = all ports allowed (the default).
Metric: failed_dest_port_filtered.
|
||||
example:
|
||||
include_dest_ports: [80, 443, 8080, 8443]
|
||||
|
||||
schema:
|
||||
description: >
|
||||
Variable schemas for A and B. Only a few fields are required for correlation;
all others are accepted without code changes.
|
||||
source_A:
|
||||
description: >
|
||||
Application HTTP logs in JSON format.
|
||||
required_fields:
|
||||
- name: src_ip
|
||||
type: string
|
||||
- name: src_port
|
||||
type: int
|
||||
- name: timestamp
|
||||
type: int64
|
||||
unit: ns
|
||||
optional_fields:
|
||||
- name: dst_ip
|
||||
type: string
|
||||
- name: dst_port
|
||||
type: int
|
||||
- name: method
|
||||
type: string
|
||||
- name: path
|
||||
type: string
|
||||
- name: host
|
||||
type: string
|
||||
- name: http_version
|
||||
type: string
|
||||
dynamic_fields:
|
||||
- pattern: header_*
|
||||
target_map: headers
|
||||
- pattern: "*"
|
||||
target_map: extra
|
||||
source_B:
|
||||
description: Network logs in JSON (IP/TCP, JA3/JA4...).
|
||||
required_fields:
|
||||
- name: src_ip
|
||||
type: string
|
||||
- name: src_port
|
||||
type: int
|
||||
optional_fields:
|
||||
- name: dst_ip
|
||||
type: string
|
||||
- name: dst_port
|
||||
type: int
|
||||
- name: timestamp
|
||||
type: int64
|
||||
unit: ns
|
||||
- name: time
|
||||
type: string
|
||||
format: RFC3339 or RFC3339Nano
|
||||
dynamic_fields:
|
||||
- pattern: "*"
|
||||
target_map: extra
|
||||
|
||||
normalized_event:
|
||||
description: >
|
||||
Unified internal representation of A/B events.
|
||||
fields:
|
||||
- name: source
|
||||
type: enum("A","B")
|
||||
- name: timestamp
|
||||
type: time.Time
|
||||
- name: src_ip
|
||||
type: string
|
||||
- name: src_port
|
||||
type: int
|
||||
- name: dst_ip
|
||||
type: string
|
||||
optional: true
|
||||
- name: dst_port
|
||||
type: int
|
||||
optional: true
|
||||
- name: headers
|
||||
type: map[string]string
|
||||
optional: true
|
||||
- name: extra
|
||||
type: map[string]any
|
||||
|
||||
correlated_log:
|
||||
description: >
|
||||
Structure of the correlated log emitted to the sinks.
|
||||
fields:
|
||||
- name: timestamp
|
||||
type: time.Time
|
||||
- name: src_ip
|
||||
type: string
|
||||
- name: src_port
|
||||
type: int
|
||||
- name: dst_ip
|
||||
type: string
|
||||
optional: true
|
||||
- name: dst_port
|
||||
type: int
|
||||
optional: true
|
||||
- name: correlated
|
||||
type: bool
|
||||
- name: orphan_side
|
||||
type: string
|
||||
- name: "*"
|
||||
type: map[string]any
|
||||
|
||||
clickhouse_schema:
|
||||
strategy: external_ddls
|
||||
database: mabase_prod
|
||||
description: >
|
||||
The ClickHouse table is managed outside the service. The service inserts into
a RAW table with a single raw_json column holding the complete correlated log
serialized as JSON. The ingest_time column uses DEFAULT now().
All field extraction (parsed table, materialized view) is handled externally
through separate DDL and is not implemented in the service.
|
||||
tables:
|
||||
- name: http_logs_raw
|
||||
description: >
|
||||
Raw ingestion table. A single raw_json column holds the complete correlated
log serialized as JSON. The ingest_time column is auto-populated by
DEFAULT now(). Partitioned by day to optimize TTL.
|
||||
engine: MergeTree
|
||||
partition_by: toDate(ingest_time)
|
||||
order_by: ingest_time
|
||||
columns:
|
||||
- name: raw_json
|
||||
type: String
|
||||
- name: ingest_time
|
||||
type: DateTime
|
||||
default: now()
|
||||
insert_format: |
|
||||
INSERT INTO mabase_prod.http_logs_raw (raw_json) VALUES
|
||||
('{...correlated log serialized as JSON...}')
|
||||
notes: >
|
||||
The service uses the native clickhouse-go/v2 API (PrepareBatch + Append + Send).
The ingest_time column is NOT inserted explicitly (DEFAULT now() is used).
|
||||
|
||||
- name: http_logs
|
||||
description: >
|
||||
Parsed table (optional, managed externally). The service does NOT implement
extraction of the fields below. If this table is used, it must be populated
by a materialized view or an external ETL process.
|
||||
engine: MergeTree
|
||||
partition_by: log_date
|
||||
order_by: (time, src_ip, dst_ip, ja4)
|
||||
columns:
|
||||
- name: time
|
||||
type: DateTime
|
||||
- name: log_date
|
||||
type: Date
|
||||
default: toDate(time)
|
||||
- name: src_ip
|
||||
type: IPv4
|
||||
- name: src_port
|
||||
type: UInt16
|
||||
- name: dst_ip
|
||||
type: IPv4
|
||||
- name: dst_port
|
||||
type: UInt16
|
||||
- name: method
|
||||
type: LowCardinality(String)
|
||||
- name: scheme
|
||||
type: LowCardinality(String)
|
||||
- name: host
|
||||
type: LowCardinality(String)
|
||||
- name: path
|
||||
type: String
|
||||
- name: query
|
||||
type: String
|
||||
- name: http_version
|
||||
type: LowCardinality(String)
|
||||
- name: orphan_side
|
||||
type: LowCardinality(String)
|
||||
- name: correlated
|
||||
type: UInt8
|
||||
- name: keepalives
|
||||
type: UInt16
|
||||
status: non_implémenté
|
||||
- name: a_timestamp
|
||||
type: UInt64
|
||||
status: non_implémenté
|
||||
- name: b_timestamp
|
||||
type: UInt64
|
||||
status: non_implémenté
|
||||
- name: conn_id
|
||||
type: String
|
||||
status: non_implémenté
|
||||
- name: ip_meta_df
|
||||
type: UInt8
|
||||
status: non_implémenté
|
||||
- name: ip_meta_id
|
||||
type: UInt32
|
||||
status: non_implémenté
|
||||
- name: ip_meta_total_length
|
||||
type: UInt32
|
||||
status: non_implémenté
|
||||
- name: ip_meta_ttl
|
||||
type: UInt8
|
||||
status: non_implémenté
|
||||
- name: tcp_meta_options
|
||||
type: LowCardinality(String)
|
||||
status: non_implémenté
|
||||
- name: tcp_meta_window_size
|
||||
type: UInt32
|
||||
status: non_implémenté
|
||||
- name: syn_to_clienthello_ms
|
||||
type: Int32
|
||||
status: non_implémenté
|
||||
- name: tls_version
|
||||
type: LowCardinality(String)
|
||||
status: non_implémenté
|
||||
- name: tls_sni
|
||||
type: LowCardinality(String)
|
||||
status: non_implémenté
|
||||
- name: ja3
|
||||
type: String
|
||||
status: non_implémenté
|
||||
- name: ja3_hash
|
||||
type: String
|
||||
status: non_implémenté
|
||||
- name: ja4
|
||||
type: String
|
||||
status: non_implémenté
|
||||
- name: header_user_agent
|
||||
type: String
|
||||
status: non_implémenté
|
||||
- name: header_accept
|
||||
type: String
|
||||
status: non_implémenté
|
||||
- name: header_accept_encoding
|
||||
type: String
|
||||
status: non_implémenté
|
||||
- name: header_accept_language
|
||||
type: String
|
||||
status: non_implémenté
|
||||
- name: header_x_request_id
|
||||
type: String
|
||||
status: non_implémenté
|
||||
- name: header_x_trace_id
|
||||
type: String
|
||||
status: non_implémenté
|
||||
- name: header_x_forwarded_for
|
||||
type: String
|
||||
status: non_implémenté
|
||||
- name: header_sec_ch_ua
|
||||
type: String
|
||||
status: non_implémenté
|
||||
- name: header_sec_ch_ua_mobile
|
||||
type: String
|
||||
status: non_implémenté
|
||||
- name: header_sec_ch_ua_platform
|
||||
type: String
|
||||
status: non_implémenté
|
||||
- name: header_sec_fetch_dest
|
||||
type: String
|
||||
status: non_implémenté
|
||||
- name: header_sec_fetch_mode
|
||||
type: String
|
||||
status: non_implémenté
|
||||
- name: header_sec_fetch_site
|
||||
type: String
|
||||
status: non_implémenté
|
||||
notes: >
|
||||
This table and its associated materialized view are managed externally (separate DDL).
The service only inserts the raw JSON into http_logs_raw.
Fields marked "non_implémenté" (not implemented) are NOT extracted by the service.
|
||||
|
||||
users:
|
||||
description: >
|
||||
ClickHouse user management is external to the service. The DSN is set in the
YAML configuration file.
|
||||
notes: >
|
||||
This section is provided for information only, for ClickHouse administration.
|
||||
|
||||
migration:
|
||||
description: >
|
||||
No migration is implemented in the service. Schema management (tables,
materialized views) is entirely external (separate DDL).
|
||||
|
||||
architecture:
|
||||
description: >
|
||||
Hexagonal architecture: an independent correlation domain, abstract ports for
sources/sinks, adapters for Unix sockets, file, ClickHouse, and stdout, an
application orchestration layer, and infrastructure modules (config, observability).
|
||||
modules:
|
||||
- name: cmd/logcorrelator
|
||||
type: entrypoint
|
||||
responsibilities:
|
||||
- Loading the YAML configuration.
- Initializing the input/output adapters.
- Creating the CorrelationService.
- Starting the orchestrator.
- Signal handling (SIGINT, SIGTERM, SIGHUP).
- Versioning via -ldflags (main.Version).
|
||||
- name: internal/domain
|
||||
type: domain
|
||||
responsibilities:
|
||||
- NormalizedEvent and CorrelatedLog models.
- CorrelationService (window, TTL, bounded buffers, one-to-many/Keep-Alive, orphans).
- Source IP filtering (exclude_source_ips, CIDR).
- Destination port filtering (include_dest_ports, allow-list).
- Custom JSON marshaling for CorrelatedLog (flat structure).
|
||||
- name: internal/ports
|
||||
type: ports
|
||||
responsibilities:
|
||||
- Interfaces EventSource, CorrelatedLogSink, CorrelationProcessor.
|
||||
- name: internal/app
|
||||
type: application
|
||||
responsibilities:
|
||||
- Orchestrator: EventSource → CorrelationService → MultiSink.
- Shutdown context handling and event draining.
|
||||
- name: internal/adapters/inbound/unixsocket
|
||||
type: adapter_inbound
|
||||
responsibilities:
|
||||
- Reading Unix datagrams (SOCK_DGRAM) and parsing JSON → NormalizedEvent.
- Automatic source detection (A/B) via source_type or headers.
- Socket permission handling (default 0666).
- Cleanup of the socket file on shutdown.
|
||||
- name: internal/adapters/outbound/file
|
||||
type: adapter_outbound
|
||||
responsibilities:
|
||||
- Writing JSON lines.
- Reopening the file on SIGHUP (log rotation).
- Path validation (allowed directory).
|
||||
- name: internal/adapters/outbound/clickhouse
|
||||
type: adapter_outbound
|
||||
responsibilities:
|
||||
- Buffering + asynchronous batch inserts.
- drop_on_overflow handling.
- Retry with exponential backoff (MaxRetries=3).
- Native clickhouse-go/v2 API (PrepareBatch + Append + Send).
- Full logging via observability.Logger (SetLogger): INFO on connection,
DEBUG on successful send (rows/table), WARN on buffer drops and retries,
ERROR on flush failure (periodic, batch, close).
|
||||
- name: internal/adapters/outbound/stdout
|
||||
type: adapter_outbound
|
||||
responsibilities:
|
||||
- No-op sink for correlated data.
- Write/Flush/Close do nothing: data never goes through stdout.
- Operational logs go to stderr via observability.Logger (independent of this sink).
|
||||
- name: internal/adapters/outbound/multi
|
||||
type: adapter_outbound
|
||||
responsibilities:
|
||||
- Fan-out to several sinks.
- Implements Reopen() for log rotation.
|
||||
- name: internal/config
|
||||
type: infrastructure
|
||||
responsibilities:
|
||||
- Loading/validating the YAML configuration.
- Default values and fallbacks for deprecated fields.
|
||||
- name: internal/observability
|
||||
type: infrastructure
|
||||
responsibilities:
|
||||
- Structured logger with levels (DEBUG, INFO, WARN, ERROR).
- CorrelationMetrics: tracks correlation statistics.
- MetricsServer: HTTP server exposing the metrics (/metrics, /health).
- Tracing of excluded events (exclude_source_ips).
- Logs for: events received, correlations, orphans, buffer full.
|
||||
|
||||
testing:
|
||||
unit:
|
||||
description: >
|
||||
Table-driven unit tests, target coverage ≥ 80%. Current coverage is roughly
74-80% depending on the version. Tests focus on the correlation logic, the
caches, the sinks, and datagram parsing.
|
||||
coverage_minimum: 0.8
|
||||
coverage_actual: ~0.74-0.80
|
||||
focus:
|
||||
- CorrelationService (window, TTL, evictions, one-to-many/Keep-Alive)
- Parsing A/B → NormalizedEvent (JSON datagrams)
- ClickHouseSink (batching, retry, overflow, error/success logging)
- FileSink (reopen on SIGHUP)
- MultiSink (fan-out)
- StdoutSink (no-op for data; the test checks that stdout stays empty)
- Config (validation, default values, exclude_source_ips)
- UnixSocketSource (reading, permissions, cleanup)
- CorrelationMetrics (statistics tracking)
- MetricsServer (/metrics and /health endpoints)
|
||||
integration:
|
||||
description: >
|
||||
Limited integration tests. The full A+B → correlation → sinks flow is covered
by unit tests with mocks. ClickHouse is mocked (no tests against a real
ClickHouse). Keep-Alive scenarios are tested in correlation_service_test.go.
Test scripts provided: scripts/test-correlation.sh and scripts/test-correlation-advanced.py.
|
||||
|
||||
docker:
|
||||
description: >
|
||||
Build, tests, and RPM packaging all run inside containers via a multi-stage
build. Two Dockerfiles: Dockerfile (build + runtime + dev) and
Dockerfile.package (multi-distro RPM: el8, el9, el10).
|
||||
build_pipeline:
|
||||
multi_stage: true
|
||||
stages:
|
||||
- name: builder
|
||||
base: golang:1.24
|
||||
description: >
|
||||
go test -race -coverprofile=coverage.txt ./... with a coverage check
(fails below 80%). Builds a static binary (CGO_ENABLED=0,
GOOS=linux, GOARCH=amd64).
|
||||
- name: runtime
|
||||
base: scratch
|
||||
description: >
|
||||
Minimal image containing only the binary and the example config.
|
||||
- name: rpm_builder_el8
|
||||
base: rockylinux:8
|
||||
description: >
|
||||
Installs fpm (via Ruby) and builds the RPM for Enterprise Linux 8.
|
||||
- name: rpm_builder_el9
|
||||
base: rockylinux:9
|
||||
description: >
|
||||
Installs fpm (via Ruby) and builds the RPM for Enterprise Linux 9.
|
||||
- name: rpm_builder_el10
|
||||
base: almalinux:10
|
||||
description: >
|
||||
Installs fpm (via Ruby) and builds the RPM for Enterprise Linux 10.
|
||||
- name: output_export
|
||||
base: alpine:latest
|
||||
description: >
|
||||
Exports the RPM packages built for the 3 distributions (el8, el9, el10).
|
||||
files:
|
||||
- path: Dockerfile
|
||||
description: Main build (builder, runtime, dev) and single-distro RPM packaging.
|
||||
- path: Dockerfile.package
|
||||
description: Multi-distro RPM packaging (el8, el9, el10) with post/preun/postun scripts.
|
||||
|
||||
observability:
|
||||
description: >
|
||||
The service includes debugging and monitoring features for diagnosing
correlation problems and tracking performance.
|
||||
logging:
|
||||
levels:
|
||||
- DEBUG: Tous les événements reçus, tentatives de corrélation, raisons des échecs
|
||||
- INFO: Événements corrélés, démarrage/arrêt du service
|
||||
- WARN: Orphelins émis, buffer plein, TTL expiré
|
||||
- ERROR: Erreurs de parsing, échecs de sink, erreurs critiques
|
||||
debug_logs:
|
||||
- "event received: source=A src_ip=192.168.1.1 src_port=8080 timestamp=..."
|
||||
- "processing A event: key=192.168.1.1:8080 timestamp=..."
|
||||
- "correlation found: A(src_ip=... src_port=... ts=...) + B(src_ip=... src_port=... ts=...)"
|
||||
- "A event has no matching B key in buffer: key=..."
|
||||
- "A event has same key as B but outside time window: key=... time_diff=5s window=10s"
|
||||
- "event excluded by IP filter: source=A src_ip=10.0.0.1 src_port=8080"
|
||||
- "event excluded by dest port filter: source=A dst_port=22"
|
||||
- "TTL reset for B event (Keep-Alive): key=... new_ttl=120s"
|
||||
- "[clickhouse] DEBUG batch sent: rows=42 table=correlated_logs_http_network"
|
||||
info_logs:
|
||||
- "[clickhouse] INFO connected to ClickHouse: table=... batch_size=500 flush_interval_ms=200"
|
||||
warn_logs:
|
||||
- "[clickhouse] WARN buffer full, dropping log: table=... buffer_size=5000"
|
||||
- "[clickhouse] WARN retrying batch insert: attempt=2/3 delay=100ms rows=42 err=connection refused"
|
||||
error_logs:
|
||||
- "[clickhouse] ERROR periodic flush failed: ..."
|
||||
- "[clickhouse] ERROR batch flush failed: ..."
|
||||
- "[clickhouse] ERROR final flush on close failed: ..."
|
||||
  metrics_server:
    enabled: true
    endpoints:
      - path: /metrics
        method: GET
        description: Returns the correlation metrics as JSON
        response_example: |
          {
            "events_received_a": 1542,
            "events_received_b": 1498,
            "correlations_success": 1450,
            "correlations_failed": 92,
            "failed_no_match_key": 45,
            "failed_time_window": 23,
            "failed_buffer_eviction": 5,
            "failed_ttl_expired": 12,
            "failed_ip_excluded": 7,
            "failed_dest_port_filtered": 3,
            "buffer_a_size": 23,
            "buffer_b_size": 18,
            "orphans_emitted_a": 92,
            "keepalive_resets": 892
          }
      - path: /health
        method: GET
        description: Health check
        response_example: |
          {"status":"healthy"}
  metrics_tracked:
    events_received:
      - events_received_a: Number of HTTP events (source A) received
      - events_received_b: Number of network events (source B) received
    correlations:
      - correlations_success: Successful correlations
      - correlations_failed: Failed correlations
    failure_reasons:
      - failed_no_match_key: Key src_ip:src_port not found in the buffer
      - failed_time_window: Events outside the time window
      - failed_buffer_eviction: Buffer full, event evicted
      - failed_ttl_expired: TTL of the B event expired
      - failed_ip_excluded: Event excluded by the IP filter (exclude_source_ips)
      - failed_dest_port_filtered: Event excluded by the destination port filter (include_dest_ports)
    buffers:
      - buffer_a_size: Current size of the HTTP buffer
      - buffer_b_size: Current size of the network buffer
    orphans:
      - orphans_emitted_a: A orphans emitted (no matching B)
      - orphans_emitted_b: "B orphans emitted (always 0, policy: network_emit=false)"
      - orphans_pending_a: A orphans pending (delay before emission)
      - pending_orphan_match: B correlated with a pending A orphan
    keepalive:
      - keepalive_resets: TTL resets for Keep-Alive mode (one-to-many)
|
||||
  troubleshooting:
    description: >
      Diagnostic guide based on metrics and logs
    common_issues:
      - symptom: High failed_no_match_key
        cause: A and B logs do not share the same src_ip + src_port
        solution: Check that both sources use the same IP/port combination
      - symptom: High failed_time_window
        cause: Timestamps too far apart (> time_window.value)
        solution: Increase correlation.time_window.value or synchronize the clocks (NTP)
      - symptom: High failed_ttl_expired
        cause: B events expire before they are correlated
        solution: Increase correlation.ttl.network_ttl_s
      - symptom: High failed_buffer_eviction
        cause: Buffers too small for the log volume
        solution: Increase correlation.buffers.max_http_items and max_network_items
      - symptom: High failed_ip_excluded
        cause: Traffic from IPs listed in exclude_source_ips
        solution: Check the configuration; this is normal if the exclusion is intended
      - symptom: High failed_dest_port_filtered
        cause: Traffic on ports not listed in include_dest_ports
        solution: Check the include_dest_ports configuration, or empty the list to accept everything
      - symptom: High orphans_emitted_a
        cause: Many A logs without a matching B
        solution: Check that source B is actually sending the expected events
  test_scripts:
    - name: scripts/test-correlation.sh
      description: Bash script that tests correlation with synthetic events
      features:
        - Sends A+B pairs with the same src_ip:src_port
        - Checks metrics before and after
        - Options: -c (count), -d (delay), -v (verbose), -m (metrics-url)
    - name: scripts/test-correlation-advanced.py
      description: Advanced Python script with multiple test scenarios
      features:
        - Basic test: simple correlations
        - Time window test: checks that the time window expires
        - Different IP test: checks that different IPs are not correlated
        - Keep-Alive test: checks the one-to-many mode
        - Real-time metrics
|
||||
|
||||
202
services/correlator/cmd/logcorrelator/main.go
Normal file
@ -0,0 +1,202 @@
|
||||
package main
|
||||
|
||||
import (
|
||||
"context"
|
||||
"flag"
|
||||
"fmt"
|
||||
"os"
|
||||
"os/signal"
|
||||
"syscall"
|
||||
"time"
|
||||
|
||||
"github.com/antitbone/ja4/correlator/internal/adapters/inbound/unixsocket"
|
||||
"github.com/antitbone/ja4/correlator/internal/adapters/outbound/clickhouse"
|
||||
"github.com/antitbone/ja4/correlator/internal/adapters/outbound/file"
|
||||
"github.com/antitbone/ja4/correlator/internal/adapters/outbound/multi"
|
||||
"github.com/antitbone/ja4/correlator/internal/adapters/outbound/stdout"
|
||||
"github.com/antitbone/ja4/correlator/internal/app"
|
||||
"github.com/antitbone/ja4/correlator/internal/config"
|
||||
"github.com/antitbone/ja4/correlator/internal/domain"
|
||||
"github.com/antitbone/ja4/correlator/internal/observability"
|
||||
"github.com/antitbone/ja4/correlator/internal/ports"
|
||||
)
|
||||
|
||||
var Version = "dev"
|
||||
|
||||
func main() {
|
||||
configPath := flag.String("config", "config.yml", "path to configuration file")
|
||||
version := flag.Bool("version", false, "print version and exit")
|
||||
flag.Parse()
|
||||
|
||||
if *version {
|
||||
fmt.Println(Version)
|
||||
os.Exit(0)
|
||||
}
|
||||
|
||||
// Load configuration
|
||||
cfg, err := config.Load(*configPath)
|
||||
if err != nil {
|
||||
fmt.Fprintf(os.Stderr, "Error loading configuration: %v\n", err)
|
||||
os.Exit(1)
|
||||
}
|
||||
|
||||
// Initialize logger with configured level
|
||||
logger := observability.NewLoggerWithLevel("logcorrelator", cfg.Log.GetLevel())
|
||||
|
||||
logger.Info(fmt.Sprintf("Starting logcorrelator version %s (log_level=%s)", Version, cfg.Log.GetLevel()))
|
||||
|
||||
// Create sources
|
||||
sources := make([]ports.EventSource, 0, len(cfg.Inputs.UnixSockets))
|
||||
for _, inputCfg := range cfg.Inputs.UnixSockets {
|
||||
source := unixsocket.NewUnixSocketSource(unixsocket.Config{
|
||||
Name: inputCfg.Name,
|
||||
Path: inputCfg.Path,
|
||||
SourceType: inputCfg.SourceType,
|
||||
SocketPermissions: inputCfg.GetSocketPermissions(),
|
||||
})
|
||||
// Set logger for debug logging
|
||||
source.SetLogger(logger)
|
||||
sources = append(sources, source)
|
||||
logger.Info(fmt.Sprintf("Configured input source: name=%s, path=%s, permissions=%o", inputCfg.Name, inputCfg.Path, inputCfg.GetSocketPermissions()))
|
||||
}
|
||||
|
||||
// Create sinks
|
||||
sinks := make([]ports.CorrelatedLogSink, 0)
|
||||
|
||||
if cfg.Outputs.File.Enabled && cfg.Outputs.File.Path != "" {
|
||||
fileSink, err := file.NewFileSink(file.Config{
|
||||
Path: cfg.Outputs.File.Path,
|
||||
})
|
||||
if err != nil {
|
||||
logger.Error("Failed to create file sink", err)
|
||||
os.Exit(1)
|
||||
}
|
||||
sinks = append(sinks, fileSink)
|
||||
logger.Info(fmt.Sprintf("Configured file sink: path=%s", cfg.Outputs.File.Path))
|
||||
}
|
||||
|
||||
if cfg.Outputs.ClickHouse.Enabled {
|
||||
clickHouseSink, err := clickhouse.NewClickHouseSink(clickhouse.Config{
|
||||
DSN: cfg.Outputs.ClickHouse.DSN,
|
||||
Table: cfg.Outputs.ClickHouse.Table,
|
||||
BatchSize: cfg.Outputs.ClickHouse.BatchSize,
|
||||
FlushIntervalMs: cfg.Outputs.ClickHouse.FlushIntervalMs,
|
||||
MaxBufferSize: cfg.Outputs.ClickHouse.MaxBufferSize,
|
||||
DropOnOverflow: cfg.Outputs.ClickHouse.DropOnOverflow,
|
||||
AsyncInsert: cfg.Outputs.ClickHouse.AsyncInsert,
|
||||
TimeoutMs: cfg.Outputs.ClickHouse.TimeoutMs,
|
||||
})
|
||||
if err != nil {
|
||||
logger.Error("Failed to create ClickHouse sink", err)
|
||||
os.Exit(1)
|
||||
}
|
||||
clickHouseSink.SetLogger(logger)
|
||||
sinks = append(sinks, clickHouseSink)
|
||||
logger.Info(fmt.Sprintf("Configured ClickHouse sink: table=%s", cfg.Outputs.ClickHouse.Table))
|
||||
}
|
||||
|
||||
if cfg.Outputs.Stdout.Enabled {
|
||||
stdoutSink := stdout.NewStdoutSink(stdout.Config{Enabled: true})
|
||||
sinks = append(sinks, stdoutSink)
|
||||
logger.Info("Configured stdout sink (operational logs on stderr)")
|
||||
}
|
||||
|
||||
// Create multi-sink wrapper
|
||||
multiSink := multi.NewMultiSink(sinks...)
|
||||
|
||||
// Create correlation service
|
||||
correlationSvc := domain.NewCorrelationService(domain.CorrelationConfig{
|
||||
TimeWindow: cfg.Correlation.GetTimeWindow(),
|
||||
ApacheAlwaysEmit: cfg.Correlation.GetApacheAlwaysEmit(),
|
||||
ApacheEmitDelayMs: cfg.Correlation.GetApacheEmitDelayMs(),
|
||||
NetworkEmit: false,
|
||||
MaxHTTPBufferSize: cfg.Correlation.GetMaxHTTPBufferSize(),
|
||||
MaxNetworkBufferSize: cfg.Correlation.GetMaxNetworkBufferSize(),
|
||||
NetworkTTLS: cfg.Correlation.GetNetworkTTLS(),
|
||||
MatchingMode: cfg.Correlation.GetMatchingMode(),
|
||||
ExcludeSourceIPs: cfg.Correlation.GetExcludeSourceIPs(),
|
||||
IncludeDestPorts: cfg.Correlation.GetIncludeDestPorts(),
|
||||
}, &domain.RealTimeProvider{})
|
||||
|
||||
// Set logger for correlation service
|
||||
correlationSvc.SetLogger(logger.WithFields(map[string]any{"component": "correlation"}))
|
||||
|
||||
logger.Info(fmt.Sprintf("Correlation service initialized: time_window=%s, emit_orphans=%v, emit_delay_ms=%d",
|
||||
cfg.Correlation.GetTimeWindow().String(),
|
||||
cfg.Correlation.GetApacheAlwaysEmit(),
|
||||
cfg.Correlation.GetApacheEmitDelayMs()))
|
||||
|
||||
// Start metrics server if enabled
|
||||
var metricsServer *observability.MetricsServer
|
||||
if cfg.Metrics.Enabled {
|
||||
addr := cfg.Metrics.Addr
|
||||
if addr == "" {
|
||||
addr = ":8080" // Default address
|
||||
}
|
||||
var err error
|
||||
metricsServer, err = observability.NewMetricsServer(addr, correlationSvc.GetMetricsSnapshot)
|
||||
if err != nil {
|
||||
logger.Error("Failed to create metrics server", err)
|
||||
os.Exit(1)
|
||||
}
|
||||
if err := metricsServer.Start(); err != nil {
|
||||
logger.Error("Failed to start metrics server", err)
|
||||
os.Exit(1)
|
||||
}
|
||||
logger.Info(fmt.Sprintf("Metrics server started: addr=%s", metricsServer.Addr()))
|
||||
logger.Info("Metrics endpoints: /metrics (JSON), /health")
|
||||
}
|
||||
|
||||
// Create orchestrator
|
||||
orchestrator := app.NewOrchestrator(app.OrchestratorConfig{
|
||||
Sources: sources,
|
||||
Sink: multiSink,
|
||||
}, correlationSvc)
|
||||
|
||||
// Start the application
|
||||
if err := orchestrator.Start(); err != nil {
|
||||
logger.Error("Failed to start orchestrator", err)
|
||||
os.Exit(1)
|
||||
}
|
||||
|
||||
logger.Info("logcorrelator started successfully")
|
||||
|
||||
// Wait for shutdown signal
|
||||
sigChan := make(chan os.Signal, 1)
|
||||
signal.Notify(sigChan, syscall.SIGINT, syscall.SIGTERM, syscall.SIGHUP)
|
||||
|
||||
for {
|
||||
sig := <-sigChan
|
||||
|
||||
if sig == syscall.SIGHUP {
|
||||
// Reopen file sinks for log rotation
|
||||
logger.Info("SIGHUP received, reopening file sinks...")
|
||||
if err := multiSink.Reopen(); err != nil {
|
||||
logger.Error("Error reopening file sinks", err)
|
||||
} else {
|
||||
logger.Info("File sinks reopened successfully")
|
||||
}
|
||||
continue
|
||||
}
|
||||
|
||||
// Shutdown signal received
|
||||
logger.Info(fmt.Sprintf("Shutdown signal received: %v", sig))
|
||||
break
|
||||
}
|
||||
|
||||
// Graceful shutdown
|
||||
if err := orchestrator.Stop(); err != nil {
|
||||
logger.Error("Error during shutdown", err)
|
||||
}
|
||||
|
||||
// Stop metrics server
|
||||
if metricsServer != nil {
|
||||
shutdownCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
|
||||
defer cancel()
|
||||
if err := metricsServer.Stop(shutdownCtx); err != nil {
|
||||
logger.Error("Error stopping metrics server", err)
|
||||
}
|
||||
}
|
||||
|
||||
logger.Info("logcorrelator stopped")
|
||||
}
|
||||
92
services/correlator/config.example.yml
Normal file
@ -0,0 +1,92 @@
|
||||
# logcorrelator configuration file
# Format: YAML

# Logging configuration
log:
  level: INFO  # DEBUG, INFO, WARN, ERROR

inputs:
  unix_sockets:
    - name: http
      source_type: A
      path: /var/run/logcorrelator/http.socket
      format: json
      socket_permissions: "0666"  # world read/write
    - name: network
      source_type: B
      path: /var/run/logcorrelator/network.socket
      format: json
      socket_permissions: "0666"

outputs:
  file:
    enabled: true
    path: /var/log/logcorrelator/correlated.log

  clickhouse:
    enabled: false
    dsn: clickhouse://user:pass@localhost:9000/db
    table: correlated_logs_http_network
    batch_size: 500
    flush_interval_ms: 200
    max_buffer_size: 5000
    drop_on_overflow: true
    async_insert: true
    timeout_ms: 1000

  stdout:
    enabled: false

correlation:
  # Time window for correlation (A and B must be within this window)
  # Increased to 10s to support HTTP Keep-Alive scenarios
  time_window:
    value: 10
    unit: s

  # Orphan policy: what to do when no match is found
  orphan_policy:
    apache_always_emit: true    # Always emit A events, even without B match
    apache_emit_delay_ms: 500   # Wait 500ms before emitting as orphan (allows B to arrive)
    network_emit: false         # Never emit B events alone

  # Matching mode: one_to_one or one_to_many (Keep-Alive)
  matching:
    mode: one_to_many

  # Buffer limits (max events in memory)
  buffers:
    max_http_items: 10000
    max_network_items: 20000

  # TTL for network events (source B)
  # Increased to 120s to support long-lived HTTP Keep-Alive sessions
  ttl:
    network_ttl_s: 120

  # Exclude specific source IPs or CIDR ranges from correlation
  # Events from these IPs will be silently dropped (not correlated, not emitted)
  # Useful for excluding health checks, internal traffic, or known bad actors
  exclude_source_ips:
    - 10.0.0.1        # Single IP
    - 192.168.1.100   # Another single IP
    - 172.16.0.0/12   # CIDR range (private network)
    - 10.10.10.0/24   # Another CIDR range

  # Restrict correlation to specific destination ports (optional)
  # If non-empty, only events whose dst_port matches one of these values will be correlated
  # Events on other ports are silently ignored (not correlated, not emitted as orphans)
  # Useful to focus on HTTP/HTTPS traffic only and ignore unrelated connections
  # include_dest_ports:
  #   - 80    # HTTP
  #   - 443   # HTTPS
  #   - 8080  # HTTP alt
  #   - 8443  # HTTPS alt

# Metrics server configuration (optional, for debugging/monitoring)
metrics:
  enabled: false
  addr: ":8080"  # Address to listen on (e.g., ":8080", "localhost:8080")
  # Endpoints:
  #   GET /metrics - Returns correlation metrics as JSON
  #   GET /health  - Health check endpoint
|
||||
224
services/correlator/docs/detection.md
Normal file
@ -0,0 +1,224 @@
|
||||
# Detection architecture: logcorrelator

## Overview

The detection system is made up of **three layers** chained into a pipeline:
|
||||
|
||||
```
Captured HTTP/TLS traffic
         │
         ▼
┌───────────────────┐
│   ClickHouse      │  Storage, aggregation, heuristic views
│   (SQL pipeline)  │
└────────┬──────────┘
         │
         ▼
┌───────────────────┐
│ bot_detector.py   │  ML model (Isolation Forest, 5-minute cycle)
│ (Python / ML)     │
└────────┬──────────┘
         │
         ▼
┌───────────────────┐
│ ml_detected_      │  Results table (ReplacingMergeTree)
│ anomalies         │
└───────────────────┘
```
|
||||
|
||||
---
|
||||
|
||||
## 1. Log ingestion (`http_logs_raw` → `http_logs`)

Raw logs arrive as JSON in the `http_logs_raw` table. A **materialized view** (`mv_http_logs`) parses them in real time and feeds the `http_logs` table, which holds the following structured fields:

| Category | Key fields |
|---|---|
| Network | `src_ip`, `src_port`, `dst_ip`, `dst_port` |
| Enrichment | `src_asn`, `src_country_code`, `src_as_name` (via the IPLocate dictionary) |
| HTTP | `method`, `host`, `path`, `query`, `http_version` |
| Correlation | `correlated`, `orphan_side`, `conn_id`, `keepalives` |
| IP metadata | `ip_meta_ttl`, `ip_meta_id`, `ip_meta_df`, `ip_meta_total_length` |
| TCP metadata | `tcp_meta_window_size`, `tcp_meta_mss`, `tcp_meta_window_scale`, `tcp_meta_options` |
| TLS / Fingerprint | `tls_version`, `tls_sni`, `tls_alpn`, `ja3`, `ja3_hash`, `ja4` |
| HTTP headers | `header_user_agent`, `header_sec_ch_ua*`, `header_sec_fetch_*`, … |

IP enrichment uses the `dict_iplocate_asn` dictionary (a CSV file loaded into memory and reloaded every 1 to 2 hours).
|
||||
|
||||
---
|
||||
|
||||
## 2. Behavioral aggregation (hourly window)

Two `AggregatingMergeTree` aggregation tables are fed continuously by materialized views.

### 2.1 `agg_host_ip_ja4_1h`: network and application behavior

Aggregates hourly by the tuple **(window_start, src_ip, ja4, host)**:

| Aggregated metric | Meaning |
|---|---|
| `hits` | Total number of requests |
| `count_post` | POST requests |
| `uniq_paths` | Distinct paths visited |
| `uniq_query_params` | Distinct query parameters |
| `unique_src_ports` | Distinct source ports |
| `unique_conn_id` | Distinct TCP connections |
| `max_keepalives` | Maximum reuse of a single connection |
| `orphan_count` | Requests without a complete TCP correlation |
| `ip_id_zero_count` | Packets with IP ID = 0 (potential spoofing) |
| `tcp_fp_raw` | Hash of the TCP fingerprint (window, MSS, scale, options) |
| `tcp_jitter_variance` | Variance of the SYN→ClientHello delay (TLS jitter) |
| `total_ip_length_var` | Variance of the IP packet size |
| `mss_1460_count` | Requests with MSS = 1460 (Ethernet/desktop signature) |
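
To make the grouping concrete, here is a minimal pure-Python sketch of the hourly aggregation for a handful of the metrics above. It is only an illustration: the real aggregation is done by ClickHouse materialized views, and the row layout (`time`, `method`, `path`, `src_port` keys) and the function names are assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime


def window_start(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def aggregate_hourly(rows):
    """Group request rows by (window_start, src_ip, ja4, host).

    Each row is a dict with hypothetical keys: time, src_ip, ja4, host,
    method, path, src_port.
    """
    buckets = defaultdict(
        lambda: {"hits": 0, "count_post": 0, "paths": set(), "src_ports": set()}
    )
    for row in rows:
        key = (window_start(row["time"]), row["src_ip"], row["ja4"], row["host"])
        bucket = buckets[key]
        bucket["hits"] += 1
        if row["method"] == "POST":
            bucket["count_post"] += 1
        bucket["paths"].add(row["path"])
        bucket["src_ports"].add(row["src_port"])

    # Collapse the sets into distinct counts, as the uniq* columns do.
    return {
        key: {
            "hits": b["hits"],
            "count_post": b["count_post"],
            "uniq_paths": len(b["paths"]),
            "unique_src_ports": len(b["src_ports"]),
        }
        for key, b in buckets.items()
    }
```

Two requests at 10:05 and 10:55 from the same IP, JA4, and host land in the same bucket, while a request at 11:00 opens a new one.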
|
||||
|
||||
### 2.2 `agg_header_fingerprint_1h`: HTTP header fingerprint

Aggregates by **(window_start, src_ip)**:

| Metric | Meaning |
|---|---|
| `header_order_hash` | Hash of the header order (JA4H fingerprint) |
| `header_count` | Number of distinct headers |
| `has_accept_language` | Presence of `Accept-Language` |
| `has_cookie` | Presence of `Cookie` |
| `has_referer` | Presence of `Referer` |
| `modern_browser_score` | Score of 0/50/100 depending on the presence of a UA and `Sec-CH-UA` |
| `ua_ch_mismatch` | Mismatch between `User-Agent` and `Sec-CH-UA-Platform` |
| `sec_fetch_mode/dest` | Declared navigation context |
|
||||
|
||||
---
|
||||
|
||||
## 3. Exclusions (allowlists)

Before any analysis, three tables make it possible to **exclude known legitimate bots**:

- `bot_ip` (file `bot_ip.csv`): IPs to ignore (crawlers, monitoring, …)
- `bot_ja4` (file `bot_ja4.csv`): JA4 fingerprints to ignore
- `ref_bot_networks`: IPv4/IPv6 CIDR networks, categorized as legitimate or malicious

These exclusions are applied in the `view_ai_features_1h` view.
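
The exclusion logic can be sketched in Python with the standard `ipaddress` module. This is a hedged illustration, not the SQL used by `view_ai_features_1h`: the function name is hypothetical, the JA4 strings are placeholders, and the network list is assumed to contain only the entries categorized as legitimate.

```python
import ipaddress


def build_exclusion_filter(bot_ips, bot_ja4s, bot_networks):
    """Return a predicate telling whether a (src_ip, ja4) pair is excluded.

    bot_ips      -- single addresses, as in bot_ip.csv
    bot_ja4s     -- JA4 fingerprints, as in bot_ja4.csv
    bot_networks -- IPv4/IPv6 CIDR strings (assumed: legitimate entries only)
    """
    ips = {ipaddress.ip_address(ip) for ip in bot_ips}
    ja4s = set(bot_ja4s)
    networks = [ipaddress.ip_network(net) for net in bot_networks]

    def is_excluded(src_ip, ja4):
        addr = ipaddress.ip_address(src_ip)
        if addr in ips or ja4 in ja4s:
            return True
        # Compare only within the same IP version (v4 vs v6).
        return any(
            addr.version == net.version and addr in net for net in networks
        )

    return is_excluded
```

Building the predicate once and reusing it avoids re-parsing the lists for every row.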
|
||||
|
||||
---
|
||||
|
||||
## 4. ML view: `view_ai_features_1h`

This consolidated view computes, **over a sliding 24-hour window**, the **28 features** passed to the ML model. It joins the two aggregation tables and derives the following metrics:

| Feature | Computation | Signal detected |
|---|---|---|
| `hit_velocity` | `hits / duration_in_seconds` | Abnormally high request volume |
| `fuzzing_index` | `uniq_query_params / uniq_paths` | Parameter exploration (fuzzing) |
| `post_ratio` | `count_post / hits` | Mass form submission |
| `port_exhaustion_ratio` | `unique_src_ports / hits` | Port rotation (scan) |
| `orphan_ratio` | `orphan_count / hits` | Requests without a complete handshake |
| `ip_id_zero_ratio` | `ip_id_zero_count / hits` | IP address spoofing |
| `multiplexing_efficiency` | `hits / unique_conn_id` | Connection reuse (H2/H3) |
| `true_window_size` | `tcp_win * 2^tcp_scale` | Actual TCP window size |
| `window_mss_ratio` | `tcp_win / tcp_mss` | TCP stack consistency |
| `tcp_jitter_variance` | Variance of SYN→ClientHello | Irregular TLS timing |
| `alpn_http_mismatch` | ALPN=h2 but HTTP/1.1 | Misleading TLS negotiation |
| `is_alpn_missing` | ALPN absent or `00` | Non-standard client |
| `sni_host_mismatch` | SNI ≠ Host header | Transparent proxy / bot |
| `mss_mobile_mismatch` | MSS=1460 + high browser score | Mobile client simulated from a desktop |
| `is_fake_navigation` | `sec_fetch_mode=navigate` but `sec_fetch_dest≠document` | Simulated navigation |
| `tcp_shared_count` | Number of IPs sharing the same TCP fingerprint | Shared infrastructure / botnet |
| `header_order_shared_count` | Number of IPs sharing the same header order | Common automated tool |
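
As an illustration of the ratio features in the table, the sketch below derives a few of them from one aggregated row. It assumes a dict-shaped row and a `safe_div` helper that returns 0.0 on a zero denominator; how the actual SQL view handles division by zero is not stated here, so treat that choice as an assumption.

```python
def safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero (assumed policy)."""
    return numerator / denominator if denominator else 0.0


def derive_features(agg, duration_seconds):
    """Compute a subset of the derived features from one aggregated row."""
    hits = agg["hits"]
    return {
        "hit_velocity": safe_div(hits, duration_seconds),
        "fuzzing_index": safe_div(agg["uniq_query_params"], agg["uniq_paths"]),
        "post_ratio": safe_div(agg["count_post"], hits),
        "port_exhaustion_ratio": safe_div(agg["unique_src_ports"], hits),
        "orphan_ratio": safe_div(agg["orphan_count"], hits),
        "multiplexing_efficiency": safe_div(hits, agg["unique_conn_id"]),
        # The advertised window is scaled by 2^window_scale.
        "true_window_size": agg["tcp_win"] * 2 ** agg["tcp_scale"],
        "window_mss_ratio": safe_div(agg["tcp_win"], agg["tcp_mss"]),
    }
```

For instance, 30 distinct query parameters spread over 3 distinct paths give a `fuzzing_index` of 10, which is the kind of value the parameter-exploration signal is meant to surface.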
|
||||
|
||||
---
|
||||
|
||||
## 5. ML model: Isolation Forest (`bot_detector.py`)

### Execution cycle

The service runs in a loop with a **5-minute cycle**:

```
fetch_and_analyze()
    │
    ├─ Query: SELECT * FROM view_ai_features_1h
    │
    ├─ Data cleaning (fillna)
    │
    ├─ Dual-model routing:
    │     ├─ [Full]        correlated=1 → 23 features (network + TLS + headers)
    │     └─ [Application] correlated=0 → 19 features (headers + behavior)
    │
    └─ INSERT INTO ml_detected_anomalies
```
|
||||
|
||||
### Model parameters

| Parameter | Value | Meaning |
|---|---|---|
| `n_estimators` | 200 | Number of isolation trees |
| `contamination` | 0.2% | Expected proportion of bots in the traffic |
| `score threshold` | < -0.05 | Score below which a session is flagged as an anomaly |
| `minimum volume` | 500 sessions | Below this, the model is skipped (too little data) |

### Dual-model routing

Traffic is **split into two populations** based on the `correlated` field:

- **Full model** (`correlated=1`): the TCP↔HTTP correlation is available, so the network features (TTL, TLS jitter, ALPN, SNI) are reliable and added to the analysis.
- **Application model** (`correlated=0`): only the HTTP layer is available, so the analysis focuses on application behavior (headers, paths, POST ratio, …).
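
The routing, the minimum-volume guard, and the score threshold can be sketched without any ML dependency. In this hedged example `score_fn` is a hypothetical stand-in for the trained Isolation Forest scoring step, and the session dicts and function names are assumptions; only the two constants come from the parameter table.

```python
SCORE_THRESHOLD = -0.05  # sessions scoring below this are flagged
MIN_SESSIONS = 500       # below this volume a model is skipped


def route_sessions(sessions):
    """Split sessions into the two populations by the `correlated` field."""
    full = [s for s in sessions if s["correlated"] == 1]
    application = [s for s in sessions if s["correlated"] != 1]
    return {"full": full, "application": application}


def detect(sessions, score_fn, min_sessions=MIN_SESSIONS, threshold=SCORE_THRESHOLD):
    """Score each population separately and collect the anomalies.

    score_fn(model_name, population) must return one score per session;
    it stands in for the real model and is an assumption of this sketch.
    """
    anomalies = []
    for model_name, population in route_sessions(sessions).items():
        if len(population) < min_sessions:
            continue  # too little data to fit a meaningful model
        scores = score_fn(model_name, population)
        for session, score in zip(population, scores):
            if score < threshold:
                anomalies.append({**session, "model": model_name, "score": score})
    return anomalies
```

Keeping the scorer injectable makes the routing logic testable with a fake scorer, independently of the model library.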
|
||||
|
||||
---
|
||||
|
||||
## 6. Static heuristic views

Alongside the ML model, five SQL views provide **deterministic detections** without ML, over a 24-hour window:

| View | Detection rule |
|---|---|
| `view_host_ip_ja4_rotation` | IP with ≥ 5 distinct JA4 fingerprints and > 100 requests → identity rotation |
| `view_host_ja4_anomalies` | JA4 fingerprint seen from ≥ 20 IPs on ≥ 3 hosts → distributed scanning tool |
| `view_form_bruteforce_detected` | ≥ 10 distinct query params and ≥ 20 hits → form brute-force |
| `view_alpn_mismatch_detected` | HTTP/1.1 with ALPN h2 or h3 and ≥ 10 hits → fraudulent TLS negotiation |
| `view_tcp_spoofing_detected` | TTL ≤ 64 with a Windows or iPhone User-Agent → inconsistent OS fingerprint |
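
To show how one of these rules reads outside SQL, here is a small Python sketch of the JA4 rotation rule (at least 5 distinct fingerprints and more than 100 requests per IP). The input shape, a list of `(src_ip, ja4, hits)` tuples, and the function name are assumptions for the example; the thresholds are the ones in the table.

```python
from collections import defaultdict


def detect_ja4_rotation(rows, min_ja4=5, min_hits=100):
    """Return the IPs matching the JA4 rotation rule, sorted.

    rows -- iterable of (src_ip, ja4, hits) tuples (hypothetical shape)
    """
    fingerprints = defaultdict(set)
    total_hits = defaultdict(int)
    for src_ip, ja4, hits in rows:
        fingerprints[src_ip].add(ja4)
        total_hits[src_ip] += hits
    return sorted(
        ip
        for ip in fingerprints
        # ">= min_ja4" fingerprints and strictly more than min_hits requests
        if len(fingerprints[ip]) >= min_ja4 and total_hits[ip] > min_hits
    )
```

Note the asymmetry between the two comparisons: an IP with exactly 100 requests is not flagged, while one with exactly 5 fingerprints is.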
|
||||
|
||||
---
|
||||
|
||||
## 7. Results: `ml_detected_anomalies`

Detected anomalies are stored in a `ReplacingMergeTree(detected_at)` table with a **30-day TTL**. With the ordering key `(src_ip, ja4, host)`, each triplet keeps only the **most recent detection** (automatic deduplication). Because ClickHouse applies this deduplication during background merges, duplicates can remain visible until the parts are merged.

Each record contains:

- The scores and features that led to the detection
- The `reason` field: human-readable text with the score, velocity, and fuzzing index
- The `is_headless` field: derived from the `sec_fetch_mode` inconsistency
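
The "keep the most recent row per key" behavior can be mimicked in a few lines of Python. This is only a model of the end state after merges, under the assumption that records are dicts with `src_ip`, `ja4`, `host`, and a comparable `detected_at`; it does not reproduce ClickHouse's merge scheduling.

```python
def latest_detections(records):
    """Keep, for each (src_ip, ja4, host), the record with the newest detected_at.

    On equal detected_at values the later record in the input wins.
    """
    latest = {}
    for record in records:
        key = (record["src_ip"], record["ja4"], record["host"])
        current = latest.get(key)
        if current is None or record["detected_at"] >= current["detected_at"]:
            latest[key] = record
    return list(latest.values())
```

A consumer reading the table before a merge could apply the same reduction client-side to get one row per triplet.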
|
||||
|
||||
---
|
||||
|
||||
## 8. Complete flow diagram

```
                    ┌─────────────────────────────────────┐
                    │        http_logs_raw (JSON)         │
                    └──────────────┬──────────────────────┘
                                   │  mv_http_logs (MV)
                                   ▼
                    ┌─────────────────────────────────────┐
                    │         http_logs (parsed)          │
                    └────────┬──────────────┬─────────────┘
                             │              │
          mv_agg_host_ip_ja4 │              │ mv_agg_header_fingerprint
                             ▼              ▼
                    ┌──────────────────┐  ┌──────────────────────────┐
                    │ agg_host_ip_ja4  │  │ agg_header_fingerprint   │
                    │ _1h              │  │ _1h                      │
                    └────────┬─────────┘  └───────────┬──────────────┘
                             │                        │
                             └──────────┬─────────────┘
                                        │  view_ai_features_1h (JOIN + computations)
                                        ▼
                    ┌─────────────────────────────────────┐
                    │ bot_detector.py (Isolation Forest)  │
                    │     Cycle: 5 min | Window: 24h      │
                    └──────────────┬──────────────────────┘
                                   │
                                   ▼
                    ┌─────────────────────────────────────┐
                    │        ml_detected_anomalies        │
                    │    (ReplacingMergeTree, TTL 30d)    │
                    └─────────────────────────────────────┘
```
|
||||
29
services/correlator/go.mod
Normal file
@ -0,0 +1,29 @@
|
||||
module github.com/antitbone/ja4/correlator
|
||||
|
||||
go 1.21
|
||||
|
||||
require (
|
||||
github.com/ClickHouse/clickhouse-go/v2 v2.23.0
|
||||
gopkg.in/yaml.v3 v3.0.1
|
||||
)
|
||||
|
||||
require (
|
||||
github.com/ClickHouse/ch-go v0.61.5 // indirect
|
||||
github.com/andybalholm/brotli v1.1.0 // indirect
|
||||
github.com/go-faster/city v1.0.1 // indirect
|
||||
github.com/go-faster/errors v0.7.1 // indirect
|
||||
github.com/google/uuid v1.6.0 // indirect
|
||||
github.com/klauspost/compress v1.17.7 // indirect
|
||||
github.com/paulmach/orb v0.11.1 // indirect
|
||||
github.com/pierrec/lz4/v4 v4.1.21 // indirect
|
||||
github.com/pkg/errors v0.9.1 // indirect
|
||||
github.com/segmentio/asm v1.2.0 // indirect
|
||||
github.com/shopspring/decimal v1.3.1 // indirect
|
||||
go.opentelemetry.io/otel v1.24.0 // indirect
|
||||
go.opentelemetry.io/otel/trace v1.24.0 // indirect
|
||||
golang.org/x/sys v0.18.0 // indirect
|
||||
)
|
||||
|
||||
require github.com/antitbone/ja4/ja4common v0.1.0
|
||||
|
||||
replace github.com/antitbone/ja4/ja4common => ../../shared/go/ja4common
|
||||
110
services/correlator/go.sum
Normal file
@ -0,0 +1,110 @@
|
||||
github.com/ClickHouse/ch-go v0.61.5 h1:zwR8QbYI0tsMiEcze/uIMK+Tz1D3XZXLdNrlaOpeEI4=
|
||||
github.com/ClickHouse/ch-go v0.61.5/go.mod h1:s1LJW/F/LcFs5HJnuogFMta50kKDO0lf9zzfrbl0RQg=
|
||||
github.com/ClickHouse/clickhouse-go/v2 v2.23.0 h1:srmRrkS0BR8gEut87u8jpcZ7geOob6nGj9ifrb+aKmg=
|
||||
github.com/ClickHouse/clickhouse-go/v2 v2.23.0/go.mod h1:tBhdF3f3RdP7sS59+oBAtTyhWpy0024ZxDMhgxra0QE=
|
||||
github.com/andybalholm/brotli v1.1.0 h1:eLKJA0d02Lf0mVpIDgYnqXcUn0GqVmEFny3VuID1U3M=
|
||||
github.com/andybalholm/brotli v1.1.0/go.mod h1:sms7XGricyQI9K10gOSf56VKKWS4oLer58Q+mhRPtnY=
|
||||
github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
|
||||
github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c=
|
||||
github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
|
||||
github.com/go-faster/city v1.0.1 h1:4WAxSZ3V2Ws4QRDrscLEDcibJY8uf41H6AhXDrNDcGw=
|
||||
github.com/go-faster/city v1.0.1/go.mod h1:jKcUJId49qdW3L1qKHH/3wPeUstCVpVSXTM6vO3VcTw=
|
||||
github.com/go-faster/errors v0.7.1 h1:MkJTnDoEdi9pDabt1dpWf7AA8/BaSYZqibYyhZ20AYg=
|
||||
github.com/go-faster/errors v0.7.1/go.mod h1:5ySTjWFiphBs07IKuiL69nxdfd5+fzh1u7FPGZP2quo=
|
||||
github.com/gogo/protobuf v1.3.2/go.mod h1:P1XiOD3dCwIKUDQYPy72D8LYyHL2YPYrpS2s69NZV8Q=
|
||||
github.com/golang/protobuf v1.5.0/go.mod h1:FsONVRAS9T7sI+LIUmWTfcYkHO4aIWwzhcaSAoJOfIk=
|
||||
github.com/golang/snappy v0.0.1/go.mod h1:/XxbfmMg8lxefKM7IXC3fBNl/7bRcc72aCRzEWrmP2Q=
|
||||
github.com/google/go-cmp v0.5.2/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE=
|
||||
github.com/google/go-cmp v0.5.5/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE=
|
||||
github.com/google/go-cmp v0.6.0 h1:ofyhxvXcZhMsU5ulbFiLKl/XBFqE1GSq7atu8tAmTRI=
|
||||
github.com/google/go-cmp v0.6.0/go.mod h1:17dUlkBOakJ0+DkrSSNjCkIjxS6bF9zb3elmeNGIjoY=
|
||||
github.com/google/uuid v1.6.0 h1:NIvaJDMOsjHA8n1jAhLSgzrAzy1Hgr+hNrb57e+94F0=
|
||||
github.com/google/uuid v1.6.0/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo=
|
||||
github.com/kisielk/errcheck v1.5.0/go.mod h1:pFxgyoBC7bSaBwPgfKdkLd5X25qrDl4LWUI2bnpBCr8=
|
||||
github.com/kisielk/gotool v1.0.0/go.mod h1:XhKaO+MFFWcvkIS/tQcRk01m1F5IRFswLeQ+oQHNcck=
|
||||
github.com/klauspost/compress v1.13.6/go.mod h1:/3/Vjq9QcHkK5uEr5lBEmyoZ1iFhe47etQ6QUkpK6sk=
|
||||
github.com/klauspost/compress v1.17.7 h1:ehO88t2UGzQK66LMdE8tibEd1ErmzZjNEqWkjLAKQQg=
|
||||
github.com/klauspost/compress v1.17.7/go.mod h1:Di0epgTjJY877eYKx5yC51cX2A2Vl2ibi7bDH9ttBbw=
|
||||
github.com/kr/pretty v0.1.0/go.mod h1:dAy3ld7l9f0ibDNOQOHHMYYIIbhfbHSm3C4ZsoJORNo=
|
||||
github.com/kr/pretty v0.3.1 h1:flRD4NNwYAUpkphVc1HcthR4KEIFJ65n8Mw5qdRn3LE=
|
||||
github.com/kr/pretty v0.3.1/go.mod h1:hoEshYVHaxMs3cyo3Yncou5ZscifuDolrwPKZanG3xk=
|
||||
github.com/kr/pty v1.1.1/go.mod h1:pFQYn66WHrOpPYNljwOMqo10TkYh1fy3cYio2l3bCsQ=
|
||||
github.com/kr/text v0.1.0 h1:45sCR5RtlFHMR4UwH9sdQ5TC8v0qDQCHnXt+kaKSTVE=
|
||||
github.com/kr/text v0.1.0/go.mod h1:4Jbv+DJW3UT/LiOwJeYQe1efqtUx/iVham/4vfdArNI=
|
||||
github.com/montanaflynn/stats v0.0.0-20171201202039-1bf9dbcd8cbe/go.mod h1:wL8QJuTMNUDYhXwkmfOly8iTdp5TEcJFWZD2D7SIkUc=
|
||||
github.com/paulmach/orb v0.11.1 h1:3koVegMC4X/WeiXYz9iswopaTwMem53NzTJuTF20JzU=
|
||||
github.com/paulmach/orb v0.11.1/go.mod h1:5mULz1xQfs3bmQm63QEJA6lNGujuRafwA5S/EnuLaLU=
|
||||
github.com/paulmach/protoscan v0.2.1/go.mod h1:SpcSwydNLrxUGSDvXvO0P7g7AuhJ7lcKfDlhJCDw2gY=
|
||||
github.com/pierrec/lz4/v4 v4.1.21 h1:yOVMLb6qSIDP67pl/5F7RepeKYu/VmTyEXvuMI5d9mQ=
|
||||
github.com/pierrec/lz4/v4 v4.1.21/go.mod h1:gZWDp/Ze/IJXGXf23ltt2EXimqmTUXEy0GFuRQyBid4=
|
||||
github.com/pkg/errors v0.9.1 h1:FEBLx1zS214owpjy7qsBeixbURkuhQAwrK5UwLGTwt4=
|
||||
github.com/pkg/errors v0.9.1/go.mod h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINEl0=
|
||||
github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
|
||||
github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
|
||||
github.com/rogpeppe/go-internal v1.10.0 h1:TMyTOH3F/DB16zRVcYyreMH6GnZZrwQVAoYjRBZyWFQ=
|
||||
github.com/rogpeppe/go-internal v1.10.0/go.mod h1:UQnix2H7Ngw/k4C5ijL5+65zddjncjaFoBhdsK/akog=
|
||||
github.com/segmentio/asm v1.2.0 h1:9BQrFxC+YOHJlTlHGkTrFWf59nbL3XnCoFLTwDCI7ys=
|
||||
github.com/segmentio/asm v1.2.0/go.mod h1:BqMnlJP91P8d+4ibuonYZw9mfnzI9HfxselHZr5aAcs=
|
||||
github.com/shopspring/decimal v1.3.1 h1:2Usl1nmF/WZucqkFZhnfFYxxxu8LG21F6nPQBE5gKV8=
|
||||
github.com/shopspring/decimal v1.3.1/go.mod h1:DKyhrW/HYNuLGql+MJL6WCR6knT2jwCFRcu2hWCYk4o=
|
||||
github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME=
|
||||
github.com/stretchr/testify v1.6.1/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg=
|
||||
github.com/stretchr/testify v1.9.0 h1:HtqpIVDClZ4nwg75+f6Lvsy/wHu+3BoSGCbBAcpTsTg=
|
||||
github.com/stretchr/testify v1.9.0/go.mod h1:r2ic/lqez/lEtzL7wO/rwa5dbSLXVDPFyf8C91i36aY=
|
||||
github.com/tidwall/pretty v1.0.0/go.mod h1:XNkn88O1ChpSDQmQeStsy+sBenx6DDtFZJxhVysOjyk=
|
||||
github.com/xdg-go/pbkdf2 v1.0.0/go.mod h1:jrpuAogTd400dnrH08LKmI/xc1MbPOebTwRqcT5RDeI=
|
||||
github.com/xdg-go/scram v1.1.1/go.mod h1:RaEWvsqvNKKvBPvcKeFjrG2cJqOkHTiyTpzz23ni57g=
|
||||
github.com/xdg-go/stringprep v1.0.3/go.mod h1:W3f5j4i+9rC0kuIEJL0ky1VpHXQU3ocBgklLGvcBnW8=
|
||||
github.com/youmark/pkcs8 v0.0.0-20181117223130-1be2e3e5546d/go.mod h1:rHwXgn7JulP+udvsHwJoVG1YGAP6VLg4y9I5dyZdqmA=
|
||||
github.com/yuin/goldmark v1.1.27/go.mod h1:3hX8gzYuyVAZsxl0MRgGTJEmQBFcNTphYh9decYSb74=
|
||||
github.com/yuin/goldmark v1.2.1/go.mod h1:3hX8gzYuyVAZsxl0MRgGTJEmQBFcNTphYh9decYSb74=
|
||||
go.mongodb.org/mongo-driver v1.11.4/go.mod h1:PTSz5yu21bkT/wXpkS7WR5f0ddqw5quethTUn9WM+2g=
|
||||
go.opentelemetry.io/otel v1.24.0 h1:0LAOdjNmQeSTzGBzduGe/rU4tZhMwL5rWgtp9Ku5Jfo=
|
||||
go.opentelemetry.io/otel v1.24.0/go.mod h1:W7b9Ozg4nkF5tWI5zsXkaKKDjdVjpD4oAt9Qi/MArHo=
|
||||
go.opentelemetry.io/otel/trace v1.24.0 h1:CsKnnL4dUAr/0llH9FKuc698G04IrpWV0MQA/Y1YELI=
|
||||
go.opentelemetry.io/otel/trace v1.24.0/go.mod h1:HPc3Xr/cOApsBI154IU0OI0HJexz+aw5uPdbs3UCjNU=
|
||||
golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w=
|
||||
golang.org/x/crypto v0.0.0-20191011191535-87dc89f01550/go.mod h1:yigFU9vqHzYiE8UmvKecakEJjdnWj3jj499lnFckfCI=
|
||||
golang.org/x/crypto v0.0.0-20200622213623-75b288015ac9/go.mod h1:LzIPMQfyMNhhGPhUkYOs5KpL4U8rLKemX1yGLhDgUto=
|
||||
golang.org/x/crypto v0.0.0-20220622213112-05595931fe9d/go.mod h1:IxCIyHEi3zRg3s0A5j5BB6A9Jmi73HwBIUl50j+osU4=
|
||||
golang.org/x/mod v0.2.0/go.mod h1:s0Qsj1ACt9ePp/hMypM3fl4fZqREWJwdYDEqhRiZZUA=
|
||||
golang.org/x/mod v0.3.0/go.mod h1:s0Qsj1ACt9ePp/hMypM3fl4fZqREWJwdYDEqhRiZZUA=
|
||||
golang.org/x/net v0.0.0-20190404232315-eb5bcb51f2a3/go.mod h1:t9HGtf8HONx5eT2rtn7q6eTqICYqUVnKs3thJo3Qplg=
|
||||
golang.org/x/net v0.0.0-20190620200207-3b0461eec859/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s=
|
||||
golang.org/x/net v0.0.0-20200226121028-0de0cce0169b/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s=
|
||||
golang.org/x/net v0.0.0-20201021035429-f5854403a974/go.mod h1:sp8m0HH+o8qH0wwXwYZr8TS3Oi6o0r6Gce1SSxlDquU=
|
||||
golang.org/x/net v0.0.0-20211112202133-69e39bad7dc2/go.mod h1:9nx3DQGgdP8bBQD5qxJ1jj9UTztislL4KSBs9R2vV5Y=
|
||||
golang.org/x/sync v0.0.0-20190423024810-112230192c58/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
|
||||
golang.org/x/sync v0.0.0-20190911185100-cd5d95a43a6e/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
|
||||
golang.org/x/sync v0.0.0-20201020160332-67f06af15bc9/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
|
||||
golang.org/x/sync v0.0.0-20210220032951-036812b2e83c/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
|
||||
golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
|
||||
golang.org/x/sys v0.0.0-20190412213103-97732733099d/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
|
||||
golang.org/x/sys v0.0.0-20200930185726-fdedc70b468f/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
|
||||
golang.org/x/sys v0.0.0-20201119102817-f84b799fce68/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
|
||||
golang.org/x/sys v0.0.0-20210423082822-04245dca01da/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
|
||||
golang.org/x/sys v0.0.0-20210615035016-665e8c7367d1/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
|
||||
golang.org/x/sys v0.18.0 h1:DBdB3niSjOA/O0blCZBqDefyWNYveAYMNF1Wum0DYQ4=
|
||||
golang.org/x/sys v0.18.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA=
|
||||
golang.org/x/term v0.0.0-20201126162022-7de9c90e9dd1/go.mod h1:bj7SfCRtBDWHUb9snDiAeCFNEtKQo2Wmx5Cou7ajbmo=
|
||||
golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
|
||||
golang.org/x/text v0.3.3/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ=
|
||||
golang.org/x/text v0.3.6/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ=
|
||||
golang.org/x/text v0.3.7/go.mod h1:u+2+/6zg+i71rQMx5EYifcz6MCKuco9NR6JIITiCfzQ=
|
||||
golang.org/x/tools v0.0.0-20180917221912-90fa682c2a6e/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ=
|
||||
golang.org/x/tools v0.0.0-20191119224855-298f0cb1881e/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo=
|
||||
golang.org/x/tools v0.0.0-20200619180055-7c47624df98f/go.mod h1:EkVYQZoAsY45+roYkvgYkIh4xh/qjgUK9TdY2XT94GE=
|
||||
golang.org/x/tools v0.0.0-20210106214847-113979e3529a/go.mod h1:emZCQorbCU4vsT4fOWvOPXz4eW1wZW4PmDk9uLelYpA=
|
||||
golang.org/x/xerrors v0.0.0-20190717185122-a985d3407aa7/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
|
||||
golang.org/x/xerrors v0.0.0-20191011141410-1b5146add898/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
|
||||
golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
|
||||
golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
|
||||
google.golang.org/protobuf v1.26.0-rc.1/go.mod h1:jlhhOSvTdKEhbULTjvd4ARK9grFBp09yW+WbY/TyQbw=
|
||||
google.golang.org/protobuf v1.27.1/go.mod h1:9q0QmTI4eRPtz6boOQmLYwt+qCgq0jsYwAQnmE0givc=
|
||||
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
|
||||
gopkg.in/check.v1 v1.0.0-20180628173108-788fd7840127/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
|
||||
gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c h1:Hei/4ADfdWqJk1ZMxUNpqntNwaWcugrBjAiHlqqRiVk=
|
||||
gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c/go.mod h1:JHkPIbrfpd72SG/EVd6muEfDQjcINNoR0C8j2r3qZ4Q=
|
||||
gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
|
||||
gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
|
||||
gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
|
||||
111
services/correlator/idees/champs.md
Normal file
@ -0,0 +1,111 @@
|
||||
time
|
||||
log_date
|
||||
|
||||
src_ip
- source IP of the connection
src_port
- source port of the connection
dst_ip
- destination IP of the connection
dst_port
- destination port of the connection
|
||||
|
||||
src_asn
- AS number of the source IP
src_country_code
- country code of the source IP
src_as_name
- AS name of the source IP
src_org
- organization of the source AS
src_domain
- domain of the source IP's AS
|
||||
|
||||
method
- HTTP method [GET, POST, ...]
scheme
- HTTP connection type [http, https]
host
- hostname requested in the URL
path
- path requested in the URL
query
- query string requested in the URL
http_version
- version of the HTTP protocol used
|
||||
|
||||
orphan_side
- indicates whether the HTTP log could be enriched with the ip_, tcp_, ja3_ and ja4_ information
- "A" means that only the HTTP log is present, without enrichment
correlated
- whether the correlation algorithm (HTTP log + TCP parameters) succeeded (tcp + ja4/ja3)
keepalives
- sequence number of the request within a keep-alive HTTP connection
|
||||
a_timestamp
|
||||
b_timestamp
|
||||
conn_id
|
||||
|
||||
ip_meta_df
- Don't Fragment (DF) flag
ip_meta_id
- ID of the IP packet
ip_meta_total_length
- total length of the IP packet (IP header Total Length field)
ip_meta_ttl
- TTL of the IP packet as seen by the server receiving the packet
|
||||
|
||||
tcp_meta_options
- TCP options of the packet as seen by the server receiving the packet
tcp_meta_window_size
- TCP window size as seen by the server receiving the packet
tcp_meta_mss
- TCP MSS as seen by the server receiving the packet
tcp_meta_window_scale
- TCP window scale as seen by the server receiving the packet
syn_to_clienthello_ms
- time in ms between the first SYN packet and the TLS ClientHello
|
||||
|
||||
tls_version
- TLS version negotiated with the server receiving the packet
tls_sni
- SNI, the domain name requested for the TLS certificate
tls_alpn
- ALPN advertised during the TLS handshake
ja3
- list of the algorithms used for the JA3 signature
ja3_hash
- JA3 hash
ja4
- JA4 hash
|
||||
|
||||
client_headers
- list of the headers sent by the HTTP client, in the form Header,Header2,Header3,...
|
||||
|
||||
header_user_agent
- HTTP User-Agent header
header_accept
- HTTP Accept header
header_accept_encoding
- HTTP Accept-Encoding header
header_accept_language
- HTTP Accept-Language header
header_content_type
- Content-Type header
header_x_request_id
- X-Request-ID header
header_x_trace_id
- X-Trace-ID header
header_x_forwarded_for
- X-Forwarded-For header
header_sec_ch_ua
- Sec-CH-UA header
header_sec_ch_ua_mobile
- Sec-CH-UA-Mobile header
header_sec_ch_ua_platform
- Sec-CH-UA-Platform header
header_sec_fetch_dest
- Sec-Fetch-Dest header
header_sec_fetch_mode
- Sec-Fetch-Mode header
header_sec_fetch_site
- Sec-Fetch-Site header
|
||||
30
services/correlator/idees/idees.txt
Normal file
@ -0,0 +1,30 @@
|
||||
1. Signature Inconsistencies (Spoofing)

User-Agent vs TLS: header_user_agent claims to be a browser (Chrome/Safari), but the ja3/ja4 matches a scripting tool.
User-Agent vs modern headers: header_user_agent indicates a recent browser, but the header_sec_ch_ua_* headers are empty or absent from client_headers.
User-Agent vs ALPN: the declared browser does not match the protocol negotiated in tls_alpn (e.g. Chrome without h2).
OS vs IP TTL: the OS declared in header_user_agent (e.g. Windows) contradicts the value of ip_meta_ttl (e.g. 64, typical of Linux).
Host vs SNI: the domain name in the host header does not match the tls_sni requested during the TLS handshake.
|
||||
|
||||
2. Header Anomalies (HTTP Fingerprinting)

Order fingerprint: sudden appearance of a very rare client_headers layout (exact order) that generates a lot of traffic.
Header poverty: the total number of headers in client_headers is abnormally low (e.g. < 5), typical of basic scripts.
Missing vital headers: the traffic claims to be human but sends neither header_accept_language nor header_accept_encoding.
Fatal combination: a specific ja4 paired with a never-before-seen client_headers order (catches bots that alter their TLS stack but are betrayed by the application layer).
|
||||
|
||||
3. Network and TCP Anomalies (Layers 3 & 4)

Mass TCP mechanics: the same combination (tcp_meta_window_size, tcp_meta_window_scale, tcp_meta_mss) seen across thousands of different IPs.
Robotic handshake: an abnormally constant syn_to_clienthello_ms delay (near-zero variance) across a large number of connections, typical of a datacenter bot.
Atypical TCP options: tcp_meta_options values that are unusual for the regular web traffic of your real users.
|
||||
|
||||
4. Behavioral and Volumetric Anomalies (Request Side)

Request burst (spike): call volume (count) per src_ip or per ja4 far exceeding the historical 99th percentile over 5 minutes.
Distributed stealth scraping: the same (non-standard) ja4 used by hundreds of different src_ip values, each making very few requests.
Blind sweep (scanner): an abnormal number of unique path values (or path + query) visited by the same IP or the same ja4 within a few minutes (replaces detection of 404 errors).
Relentless targeting (blind brute force): an extreme concentration of requests aimed only at sensitive path values (login, API, password-reset) with no normal navigation on the rest of the site (replaces detection of 401/403 responses).
Suspicious methods: heavy or unusual use of non-standard method values (PUT, DELETE, OPTIONS, TRACE) compared with the baseline.
Suspicious payloads: injection patterns or highly unusual characters in query or path (extreme length, multiple encodings).
"Low and slow" bot: an IP or ja4 that stays under the radar over 5 minutes but whose cumulative volume over 24 hours or 7 days is statistically implausible for a human.
|
||||
521
services/correlator/idees/views.md
Normal file
@ -0,0 +1,521 @@
|
||||
# 🛡️ Technical Reference Manual: Antispam & Bot Detection Engine

This document details the detection algorithms implemented in the platform's ClickHouse views.
|
||||
|
||||
---
|
||||
|
||||
## 1. Transport Layer Analysis (L4): The "Physical Trace"
Before even looking at the URL, the engine inspects how the connection was established. This is the hardest layer for an attacker to forge.

### A. TCP Stack Fingerprint (`tcp_fingerprint`)
* **How it works:** We use `cityHash64` to build an identifier from three handshake parameters that stay fixed for a given TCP stack: the **MSS** (Maximum Segment Size), the **Window Size**, and the **Window Scale**.
* **What it detects:** Software uniqueness. A bot running on an Alpine Linux image will have a different TCP signature from a user on iOS 17 or Windows 11.
* **Botnet detection:** If 500 different IPs share exactly the same `tcp_fingerprint` AND the same `ja4`, there is a 99% probability that they form a cluster of cloned bots.
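
To make the grouping concrete, here is a minimal Python sketch of the fingerprint and the clone check. It is an illustration only: `tcp_fingerprint` and `cloned_clusters` are hypothetical helpers, the row keys reuse the field names from champs.md, and `hashlib.blake2b` stands in for ClickHouse's `cityHash64`, so the values will not match those stored in the views.

```python
import hashlib
from collections import defaultdict


def tcp_fingerprint(mss: int, window_size: int, window_scale: int) -> int:
    """64-bit identifier for a (MSS, window size, window scale) triple.

    Assumption: blake2b truncated to 8 bytes replaces cityHash64.
    """
    key = f"{mss}|{window_size}|{window_scale}".encode()
    return int.from_bytes(hashlib.blake2b(key, digest_size=8).digest(), "big")


def cloned_clusters(rows, min_ips=500):
    """Map (tcp_fingerprint, ja4) to its distinct-IP count when >= min_ips."""
    ips = defaultdict(set)
    for r in rows:
        fp = tcp_fingerprint(
            r["tcp_meta_mss"], r["tcp_meta_window_size"], r["tcp_meta_window_scale"]
        )
        ips[(fp, r["ja4"])].add(r["src_ip"])
    return {key: len(seen) for key, seen in ips.items() if len(seen) >= min_ips}
```

The 500-IP default mirrors the figure quoted above; a real deployment would likely tune it per site.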
|
||||
|
||||
|
||||
|
||||
### B. Jitter and Handshake Analysis
* **How it works:** We compute the variance (`varPop`) of the delay between the `SYN` and the TLS `ClientHello`.
* **What it detects:** Robotic stability.
    * **Human:** Variable latency (4G, Wi-Fi, movement). The variance is high.
    * **Datacenter bot:** Ultra-stable latency (dedicated fiber). A variance close to 0 indicates an automated connection from cloud infrastructure.
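
The variance test can be sketched in a few lines of Python. `statistics.pvariance` computes the population variance, the same definition as `varPop`; the 0.5 threshold comes from the scoring table in section 5, while `min_samples` is an assumption added so that a handful of connections is not judged.

```python
from statistics import pvariance


def handshake_variance(delays_ms):
    """Population variance of the SYN -> ClientHello delays (like varPop)."""
    return pvariance(delays_ms)


def looks_robotic(delays_ms, max_variance=0.5, min_samples=10):
    """True when enough samples exist and their variance is near zero."""
    if len(delays_ms) < min_samples:
        return False  # assumption: too few connections to decide
    return handshake_variance(delays_ms) < max_variance
```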
|
||||
|
||||
---
|
||||
|
||||
## 2. Session Analysis (L5): The "TLS Passport"
The TLS handshake is a gold mine for identifying the software library in use (OpenSSL, Go-TLS, etc.).

### A. UA vs JA4 Inconsistency
* **How it works:** The engine cross-checks the `header_user_agent` (declarative) against the `ja4` (structural).
* **What it detects:** **Browser spoofing**. A Python script can easily send `User-Agent: Mozilla/5.0...Chrome/120`, but it cannot reproduce the exact order of TLS extensions and cipher suites of a real Chrome without significant engineering (such as `utls`).
* **Scoring logic:** If UA = Chrome but JA4 != Signature_Chrome -> **+50 risk points**.
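
A minimal sketch of that scoring rule follows. The reference table `KNOWN_BROWSER_JA4` and its placeholder values are hypothetical; in practice it would be filled with JA4 values actually observed for genuine browsers. Only the +50 penalty is taken from the text above.

```python
# Hypothetical reference data: JA4 values observed for genuine browsers.
KNOWN_BROWSER_JA4 = {
    "Chrome": {"ja4_chrome_example_1", "ja4_chrome_example_2"},
    "Firefox": {"ja4_firefox_example_1"},
}


def ua_ja4_mismatch_score(user_agent, ja4, known=KNOWN_BROWSER_JA4, penalty=50):
    """Return `penalty` when the UA names a browser whose JA4 set excludes `ja4`."""
    for browser, signatures in known.items():
        if browser in user_agent and ja4 not in signatures:
            return penalty
    return 0
```

Matching the browser by substring is deliberately naive; a real implementation would need proper User-Agent parsing, since several browsers embed each other's names.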
|
||||
|
||||
### B. Host vs SNI Mismatch
* **How it works:** Comparison between the `tls_sni` field (sent in cleartext during the handshake) and the `Host` header (sent later inside the encrypted request).
* **What it detects:** **Domain fronting** or tunneling attacks. A bot can request a certificate for `domaine-innocent.com` (SNI) while trying to attack `api-critique.com` (Host).
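
The comparison itself is simple once both names are normalized. The sketch below assumes the `Host` value may carry a port or a trailing dot and that an empty SNI means the client sent none; `normalize_hostname` and `host_sni_mismatch` are hypothetical helper names.

```python
def normalize_hostname(value: str) -> str:
    """Lower-case a hostname and drop any port, brackets, or trailing dot."""
    host = value.strip().lower()
    if host.startswith("["):        # IPv6 literal such as "[::1]:443"
        host = host[1:].split("]", 1)[0]
    elif host.count(":") == 1:      # "name:port"
        host = host.split(":", 1)[0]
    return host.rstrip(".")


def host_sni_mismatch(host_header: str, tls_sni: str) -> bool:
    """True when both values are present and name different hosts."""
    if not tls_sni:
        return False  # assumption: no SNI sent, nothing to compare
    return normalize_hostname(host_header) != normalize_hostname(tls_sni)
```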
|
||||
|
||||
|
||||
|
||||
---
|
||||
|
||||
## 3. Application Analysis (L7): The "HTTP Behavior"
Once the tunnel is established, we analyze the structure of the HTTP request.

### A. Header Order Fingerprint (`http_fp`)
* **How it works:** We hash the ordered list of header names (`Accept`, `User-Agent`, `Connection`, etc.).
* **What it detects:** The signature of the browser engine. Each browser (Firefox, Safari, Chromium) sends its headers in a fixed order.
* **Detection:** If a client sends its headers in an unusual order or sends very few of them (header poverty: fewer than 6), it is flagged as suspicious.
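
As an illustration, the order-sensitive hash and the poverty check could look like this in Python. The choice of MD5 and the case folding are assumptions made for the sketch; the text above does not say which hash `http_fp` uses.

```python
import hashlib


def http_fp(header_names):
    """Hash of the header names, in the order the client sent them."""
    joined = ",".join(name.lower() for name in header_names)
    return hashlib.md5(joined.encode()).hexdigest()


def is_header_poor(header_names, min_headers=6):
    """True when the request carries fewer than `min_headers` headers."""
    return len(header_names) < min_headers
```

Because the names are joined in arrival order, two clients sending the same headers in a different order get different fingerprints, which is the property the detection relies on.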
|
||||
|
||||
### B. Payload and Entropy Analysis
* **How it works:** Regex-based pattern search in `query` and `path` (SQLi, XSS, and path traversal detection).
* **Complexity:** We detect multiple encodings (e.g. `%2520`) that try to fool simple firewalls.
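
One way to spot multiple encodings is to percent-decode repeatedly and count the passes until the value stops changing. The sketch below uses `urllib.parse.unquote` from the standard library; the function names and the `max_rounds` cap are assumptions, not part of the ClickHouse views.

```python
from urllib.parse import unquote


def encoding_depth(value: str, max_rounds: int = 5) -> int:
    """Number of percent-decoding passes before the value stops changing."""
    depth = 0
    while depth < max_rounds:
        decoded = unquote(value)
        if decoded == value:
            break
        value = decoded
        depth += 1
    return depth


def is_multi_encoded(value: str) -> bool:
    """True when the value was percent-encoded at least twice."""
    return encoding_depth(value) >= 2
```

For `%2520`, the first pass yields `%20` and the second a space, so the depth is 2 and the value is flagged.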
|
||||
|
||||
---
|
||||
|
||||
## 4. Temporal Correlation & Baseline: The "Statistical Neighborhood"
The final score depends on the history of the TLS signature.

### A. The Novelty Penalty (`agg_novelty`)
* **Logic:** A signature (JA4 + FP) seen for the first time today is "cold".
* **Processing:** A penalty is applied if `first_seen` is less than 2 hours old. A botnet that has just launched a signature-rotation campaign is immediately penalized for its lack of history.
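
The rule reduces to a timestamp comparison, sketched below. The 2-hour window comes from the text; the penalty value of 20 is a placeholder assumption, since the document does not state how many points the novelty penalty is worth.

```python
from datetime import datetime, timedelta, timezone


def novelty_malus(first_seen, now, window=timedelta(hours=2), malus=20):
    """Penalty for a signature first seen less than `window` ago, else 0."""
    return malus if now - first_seen < window else 0
```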
|
||||
|
||||
### B. Baseline Overshoot (`tbl_baseline_ja4_7d`)
* **How it works:** We compare the current `hits` with the historical 99th percentile (`p99`) of that specific signature.
* **Example:** If the "Chrome 122" JA4 usually makes 10 requests/min/IP on your site and one IP suddenly makes 300, the score spikes even if the request is technically perfect.
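
A small Python sketch of the comparison, assuming the baseline is a list of past per-minute hit counts for one signature. `statistics.quantiles` with `n=100` returns 99 cut points, so index 98 is the 99th percentile; the helper names and the "ratio" output are illustrative choices rather than what the view computes.

```python
from statistics import quantiles


def p99(history):
    """99th percentile of past hit counts (needs at least 2 samples)."""
    return quantiles(history, n=100, method="inclusive")[98]


def baseline_overshoot(hits, baseline_p99):
    """How many times `hits` exceeds the baseline; 0.0 when within it."""
    if baseline_p99 <= 0 or hits <= baseline_p99:
        return 0.0  # assumption: no usable baseline means no overshoot signal
    return hits / baseline_p99
```

With the figures from the example above, `baseline_overshoot(300, 10)` gives a factor of 30.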
|
||||
|
||||
---
|
||||
|
||||
## 5. Scoring Summary (The Verdict)

| Algorithm | Signal | Score Impact |
| :--- | :--- | :--- |
| **Fingerprint Mismatch** | UA vs TLS (spoofing) | **High (50)** |
| **L4 Anomaly** | Latency variance < 0.5 ms | **Medium (30)** |
| **Path Sensitivity** | Hit on `/admin` or `/config` | **High (40)** |
| **Payload Security** | Injection characters (SQL/XSS) | **Critical (60)** |
| **Mass Distribution** | 1 JA4 on > 50 different IPs | **Medium (30)** |
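
The table can be read as a weight map. The sketch below assumes the final score is a plain sum of the weights of the signals that fired, each counted once; the document does not spell out the combination rule, so treat the additive model and the signal keys as assumptions.

```python
# Weights copied from the table above; the keys are hypothetical signal names.
SIGNAL_WEIGHTS = {
    "fingerprint_mismatch": 50,
    "l4_anomaly": 30,
    "path_sensitivity": 40,
    "payload_security": 60,
    "mass_distribution": 30,
}


def risk_score(signals):
    """Sum the weights of the distinct signals that fired (assumed additive)."""
    return sum(SIGNAL_WEIGHTS[name] for name in set(signals))
```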
|
||||
|
||||
|
||||
|
||||
---
|
||||
|
||||
## 6. Host Identification by IP and JA4 (sql/hosts.sql)

This section details the aggregation and detection views used to identify which hosts are associated with which signatures (IP + JA4).

### A. Base Aggregates

| Table | Granularity | Description |
|-------|-------------|-------------|
| `agg_host_ip_ja4_1h` | hour | Hits, unique paths, query params, and methods per (IP, JA4, host) |
| `agg_host_ip_ja4_24h` | day | Daily rollup for long-term history |
|
||||
|
||||
### B. Identification Views

**`view_host_identification`** - Top hosts per signature
```sql
-- Which host is associated with this IP/JA4?
SELECT src_ip, ja4, host, total_hits, unique_paths, user_agent
FROM mabase_prod.view_host_identification
WHERE src_ip = '1.2.3.4'
ORDER BY total_hits DESC;
```

**`view_host_ja4_anomalies`** - JA4 shared across several hosts (botnet)
```sql
-- Is this JA4 used against several different hosts?
-- WHERE rather than HAVING: the outer query does no aggregation.
SELECT ja4, hosts, unique_hosts, unique_ips
FROM mabase_prod.view_host_ja4_anomalies
WHERE unique_hosts >= 3;
-- Interpretation: 1 JA4 on 3+ hosts = probable cloned botnet
```

**`view_host_ip_ja4_rotation`** - IP rotating fingerprints
```sql
-- Does this IP change JA4 frequently?
SELECT src_ip, ja4s, unique_ja4s
FROM mabase_prod.view_host_ip_ja4_rotation
WHERE unique_ja4s >= 5;
-- Interpretation: 1 IP with 5+ different JA4s = fingerprint spoofing
```
|
||||
|
||||
---
|
||||
|
||||
## 7. Brute Force Detection (sql/hosts.sql)

### A. Brute Force on POST (sensitive endpoints)

**Table:** `agg_bruteforce_post_5m` - 5-minute windows

**View:** `view_bruteforce_post_detected`
```sql
-- Detect brute force attempts on login endpoints
SELECT window, src_ip, ja4, host, path, attempts, attempts_per_minute
FROM mabase_prod.view_bruteforce_post_detected
WHERE host = 'api.example.com'
ORDER BY attempts DESC;

-- Threshold: ≥10 POSTs in 5 minutes on sensitive endpoints
-- Targeted endpoints: login, auth, signin, password, admin, wp-login, etc.
```

### B. Brute Force on Forms (variable query params)

**Table:** `agg_form_bruteforce_5m`

**View:** `view_form_bruteforce_detected`
```sql
-- Detect requests with highly variable query params
-- WHERE rather than HAVING: the outer query does no aggregation.
SELECT window, src_ip, ja4, host, path, requests, unique_query_patterns
FROM mabase_prod.view_form_bruteforce_detected
WHERE requests >= 20 AND unique_query_patterns >= 10;

-- Interpretation: 20+ requests with 10+ different query patterns
-- = fuzzing attempt or brute force on parameters
```
|
||||
|
||||
---
|
||||
|
||||
## 8. Header Fingerprinting (sql/hosts.sql)

The `client_headers` field contains the comma-separated list of the headers present.
Example: `"Accept,Accept-Encoding,Sec-CH-UA,Sec-Fetch-Dest,User-Agent"`

### A. Signature by Header Order

**Table:** `agg_header_fingerprint_1h`

| Field | Description |
|-------|-------------|
| `header_count` | Total number of headers (commas + 1) |
| `has_*` | Flags for each modern header (Sec-CH-UA, Sec-Fetch-*, etc.) |
| `header_order_hash` | MD5(client_headers) = unique signature of the header order |
| `modern_browser_score` | Score from 0 to 100 based on the modern headers present |
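
The fields in this table can be derived from `client_headers` alone, as the Python sketch below shows. The list `MODERN_HEADERS` and the equal weighting behind `modern_browser_score` are assumptions; the actual view may weight headers differently.

```python
import hashlib

# Assumption: the "modern" headers are the Sec-CH-UA and Sec-Fetch families.
MODERN_HEADERS = (
    "Sec-CH-UA", "Sec-CH-UA-Mobile", "Sec-CH-UA-Platform",
    "Sec-Fetch-Dest", "Sec-Fetch-Mode", "Sec-Fetch-Site",
)


def header_features(client_headers: str) -> dict:
    """Derive header_count, header_order_hash and modern_browser_score."""
    present = {name.strip().lower() for name in client_headers.split(",")}
    hits = sum(1 for name in MODERN_HEADERS if name.lower() in present)
    return {
        # "commas + 1", as in the table; 0 for an empty field
        "header_count": client_headers.count(",") + 1 if client_headers else 0,
        "header_order_hash": hashlib.md5(client_headers.encode()).hexdigest(),
        # assumption: every modern header carries the same weight
        "modern_browser_score": round(100 * hits / len(MODERN_HEADERS)),
    }
```

For the example string above, this yields 5 headers and a score of 33, because only two of the six modern headers are present.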
|
||||
|
||||
### B. Detection Views

**`view_header_missing_modern_headers`** - Missing modern headers
```sql
-- "Modern" browsers with missing headers
SELECT src_ip, ja4, header_user_agent, modern_browser_score, header_count
FROM mabase_prod.view_header_missing_modern_headers
WHERE header_user_agent ILIKE '%Chrome%';

-- Threshold: score < 70 for Chrome/Firefox = suspicious
-- A real Chrome automatically sends Sec-CH-UA, Sec-Fetch-*, etc.
```

**`view_header_ua_order_mismatch`** - Spoofing detected
```sql
-- Same User-Agent with a different header order
-- WHERE rather than HAVING: the outer query does no aggregation.
SELECT header_user_agent, ja4, unique_hashes, unique_ips
FROM mabase_prod.view_header_ua_order_mismatch
WHERE unique_hashes > 1;

-- Interpretation: 1 UA with 2+ header orders = spoofing or custom tool
```

**`view_header_minimalist_count`** - Minimalist bot
```sql
-- Clients sending too few headers
SELECT src_ip, ja4, header_count, header_user_agent
FROM mabase_prod.view_header_minimalist_count
WHERE header_count < 6;

-- Threshold: < 6 headers = scripted bot (curl, Python requests, etc.)
```

**`view_header_sec_ch_missing`** - Chrome inconsistency
```sql
-- Chrome without Sec-CH-UA (impossible for a real Chrome)
SELECT src_ip, ja4, header_user_agent
FROM mabase_prod.view_header_sec_ch_missing
WHERE header_user_agent ILIKE '%Chrome/%';
```

**`view_header_known_bot_signature`** - Botnet signature
```sql
-- Same header order on 10+ different IPs
SELECT header_order_hash, header_user_agent, unique_ips, total_hits
FROM mabase_prod.view_header_known_bot_signature
WHERE unique_ips >= 10;

-- Interpretation: 1 signature on 10+ IPs = cluster of cloned bots
```
|
||||
|
||||
---
|
||||
|
||||
## 9. ALPN Mismatch Detection (sql/hosts.sql)

### Principle

ALPN (Application-Layer Protocol Negotiation) is a TLS extension that negotiates the HTTP protocol **before** the request is sent.

| Declared ALPN | Actual HTTP | Interpretation |
|---------------|-------------|----------------|
| `h2` | `HTTP/2` | ✅ Normal |
| `h2` | `HTTP/1.1` | ❌ Misconfigured bot |
| `http/1.1` | `HTTP/1.1` | ✅ Normal |
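
The table translates into a small lookup plus a percentage, sketched below in Python. The version strings follow the table (`HTTP/2`, `HTTP/1.1`); actual `http_version` values in the logs may be spelled differently. The default thresholds (at least 5 requests, at least 80% mismatch) are those quoted for `view_alpn_mismatch_detected`.

```python
# Expected HTTP version for each declared ALPN value, per the table above.
EXPECTED_HTTP_VERSION = {"h2": "HTTP/2", "http/1.1": "HTTP/1.1"}


def is_alpn_mismatch(declared_alpn: str, http_version: str) -> bool:
    """True when a known ALPN value disagrees with the protocol spoken."""
    expected = EXPECTED_HTTP_VERSION.get(declared_alpn)
    return expected is not None and http_version != expected


def alpn_mismatch_pct(pairs) -> float:
    """Percentage of (declared_alpn, http_version) pairs that disagree."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    mismatches = sum(is_alpn_mismatch(alpn, version) for alpn, version in pairs)
    return 100.0 * mismatches / len(pairs)


def is_flagged(pairs, min_requests=5, min_pct=80.0) -> bool:
    """Apply both thresholds to one client's requests."""
    pairs = list(pairs)
    return len(pairs) >= min_requests and alpn_mismatch_pct(pairs) >= min_pct
```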
|
||||
|
||||
### Detection View

**`view_alpn_mismatch_detected`**
```sql
-- Clients declaring h2 but speaking HTTP/1.1
-- WHERE rather than HAVING: the outer query does no aggregation.
SELECT src_ip, ja4, declared_alpn, actual_http_version, mismatches, mismatch_pct
FROM mabase_prod.view_alpn_mismatch_detected
WHERE mismatch_pct >= 80;

-- Threshold: ≥5 requests with ≥80% mismatch
-- Cause: misconfigured curl, Python requests, bots spoofing ALPN
```
|
||||
|
||||
---
|
||||
|
||||
## 10. Rate Limiting & Burst Detection (sql/hosts.sql)

### A. Rate Limiting (1 minute)

**Table:** `agg_rate_limit_1m`

**View:** `view_rate_limit_exceeded`
```sql
-- IPs exceeding 50 requests/minute
SELECT minute, src_ip, ja4, requests_per_min, unique_paths
FROM mabase_prod.view_rate_limit_exceeded
ORDER BY requests_per_min DESC;

-- Threshold: > 50 req/min = automated traffic
-- A human cannot consistently sustain 50+ req/min
```

### B. Burst Detection (10 seconds)

**Table:** `agg_burst_10s`

**View:** `view_burst_detected`
```sql
-- Sudden traffic spikes
-- WHERE rather than HAVING: the outer query does no aggregation.
SELECT window, src_ip, ja4, burst_count
FROM mabase_prod.view_burst_detected
WHERE burst_count > 20;

-- Threshold: > 20 requests in 10 seconds = suspicious burst
-- Useful for detecting attacks that come in waves
```
|
||||
|
||||
---
|
||||
|
||||
## 11. Path Enumeration / Scanning (sql/hosts.sql)

### Detection View

**`view_path_scan_detected`**
```sql
-- Detect scanning of sensitive paths
-- WHERE rather than HAVING: the outer query does no aggregation.
SELECT window, src_ip, ja4, host, sensitive_hits, sensitive_ratio
FROM mabase_prod.view_path_scan_detected
WHERE sensitive_hits >= 5;

-- Monitored paths: admin, backup, config, .env, .git, wp-admin,
-- phpinfo, test, debug, log, sql, dump, passwd, shadow, htaccess, etc.

-- Threshold: ≥5 sensitive paths in 5 minutes = scanning
```

### Example Result

| src_ip | ja4 | host | sensitive_hits | sensitive_ratio |
|--------|-----|------|----------------|-----------------|
| 1.2.3.4 | t13d... | api.example.com | 47 | 94.00 |
| 5.6.7.8 | t13d... | www.example.com | 12 | 80.00 |

**Interpretation:** These IPs systematically probe sensitive paths, which points to tools such as Nikto, Dirb, or Gobuster.
|
||||
|
||||
---
|
||||
|
||||
## 12. Payload Attack Detection (sql/hosts.sql)

### A. Types of Attacks Detected

| Type | Patterns Detected |
|------|-------------------|
| **SQL Injection** | `UNION SELECT`, `OR 1=1`, `DROP TABLE`, `; --`, `/* */`, `WAITFOR DELAY`, `SLEEP()` |
| **XSS** | `<script>`, `javascript:`, `onerror=`, `onload=`, `<img src=data:`, `<svg onload>` |
| **Path Traversal** | `../`, `..\\`, `%2e%2e%2f`, `%252e%252e`, `%%32%65%%32%65` |
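
For illustration, the patterns in the table can be approximated with three case-insensitive regular expressions. These are simplified stand-ins written for this sketch, not the expressions used in sql/hosts.sql, and they assume the input has already been percent-decoded once where spaces matter.

```python
import re

# Simplified approximations of the table's patterns (assumption, not the SQL).
PATTERNS = {
    "sqli": re.compile(
        r"union\s+select|\bor\s+1\s*=\s*1\b|drop\s+table|;\s*--|/\*.*?\*/"
        r"|waitfor\s+delay|sleep\s*\(",
        re.IGNORECASE,
    ),
    "xss": re.compile(
        r"<script|javascript:|onerror\s*=|onload\s*=|<img\s+src\s*=\s*data:|<svg",
        re.IGNORECASE,
    ),
    "traversal": re.compile(
        r"\.\./|\.\.\\|%2e%2e%2f|%252e%252e|%%32%65%%32%65",
        re.IGNORECASE,
    ),
}


def classify_payload(text: str) -> list:
    """Sorted list of the attack families whose pattern matches `text`."""
    return sorted(name for name, rx in PATTERNS.items() if rx.search(text))
```

Returning the list of matching families, rather than a single boolean, maps onto the separate `sqli_attempts`, `xss_attempts`, and `traversal_attempts` counters exposed by the view.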
|
||||
|
||||
### B. Detection View

**`view_payload_attacks_detected`**
```sql
-- All injection attempts
SELECT window, src_ip, ja4, host, path,
       sqli_attempts, xss_attempts, traversal_attempts
FROM mabase_prod.view_payload_attacks_detected
ORDER BY sqli_attempts DESC, xss_attempts DESC, traversal_attempts DESC;

-- Threshold: ≥1 attempt = alert (zero tolerance)
```
|
||||
|
||||
---
|
||||
|
||||
## 13. JA4 Botnet Detection (sql/hosts.sql)

### Principle

A real browser has a unique TLS fingerprint. A bot deployed on 100 machines will have the **same JA4**.

### Detection View

**`view_ja4_botnet_suspected`**
```sql
-- JA4 shared by 20+ different IPs
-- WHERE rather than HAVING: the outer query does no aggregation.
SELECT ja4, ja3_hash, unique_ips, unique_asns, unique_countries, total_hits
FROM mabase_prod.view_ja4_botnet_suspected
WHERE unique_ips >= 20;

-- Threshold: ≥20 IPs with the same JA4 = cloned botnet
```

### Example Result

| ja4 | ja3_hash | unique_ips | unique_asns | unique_countries |
|-----|----------|------------|-------------|------------------|
| t13d1512... | a3b5c7... | 147 | 12 | 8 |
| t13d0918... | f1e2d3... | 52 | 3 | 2 |

**Interpretation:** 147 different IPs with the same fingerprint = cluster of cloned bots.
|
||||
|
||||
---
|
||||
|
||||
## 14. Correlation Quality (sql/hosts.sql)

### Principle

Measures the ratio of uncorrelated (orphan) events. Legitimate traffic shows good HTTP/TCP correlation.

### Detection View

**`view_high_orphan_ratio`**
```sql
-- Traffic with >80% uncorrelated events
SELECT hour, src_ip, ja4, host, correlated, orphans, orphan_pct
FROM mabase_prod.view_high_orphan_ratio
ORDER BY orphan_pct DESC;

-- Threshold: orphan_pct > 80% = suspicious traffic
-- May indicate artificially generated traffic
```
|
||||
|
||||
---
|
||||
|
||||
## 15. Maintenance et Faux Positifs
|
||||
|
||||
### Exceptions Connues
|
||||
|
||||
| Source | Faux Positif | Solution |
|
||||
|--------|--------------|----------|
|
||||
| **Googlebot/Bingbot** | Scan agressif mais légitime | Filtrer par ASN + Reverse DNS |
|
||||
| **Monitoring interne** | Rate limit élevé | Whitelist par IP/ASN |
|
||||
| **CDN/Proxy** | JA4 partagé (clients derrière proxy) | Vérifier ASN (Cloudflare, Akamai) |
|
||||
| **Navigateurs anciens** | Headers modernes manquants | Vérifier UA version |

### Score Reset

Aggregates are purged automatically by TTL:
- `agg_*_1h`: 7-day TTL
- `agg_*_5m`: 1-day TTL
- `agg_*_1m`: 1-day TTL

An IP blocked by mistake returns to a normal score once the TTL expires.

---

## 16. Summary of Detection Views

| View | Detection | Threshold | Impact |
|-----|-----------|-----------|--------|
| `view_bruteforce_post_detected` | POSTs to sensitive endpoints | ≥10 in 5 min | 🔴 High |
| `view_form_bruteforce_detected` | Varying query params | ≥20 req, ≥10 patterns | 🔴 High |
| `view_header_missing_modern_headers` | Missing modern headers | score < 70 | 🔴 High |
| `view_header_ua_order_mismatch` | UA spoofing (header order) | >1 hash | 🔴 High |
| `view_header_minimalist_count` | Minimalist bot | < 6 headers | 🔴 High |
| `view_header_sec_ch_missing` | Chrome without Sec-CH | absent | 🟡 Medium |
| `view_header_known_bot_signature` | Known signature (botnet) | 10+ IPs | 🔴 High |
| `view_alpn_mismatch_detected` | h2 advertised, HTTP/1.1 spoken | ≥80% mismatch | 🔴 High |
| `view_rate_limit_exceeded` | Rate limit exceeded | >50 req/min | 🔴 High |
| `view_burst_detected` | Sudden burst | >20 req/10s | 🟡 Medium |
| `view_path_scan_detected` | Path scanning | ≥5 sensitive | 🔴 High |
| `view_payload_attacks_detected` | SQLi/XSS injections | ≥1 attempt | 🔴 Critical |
| `view_ja4_botnet_suspected` | Shared JA4 (botnet) | ≥20 IPs | 🔴 High |
| `view_high_orphan_ratio` | Uncorrelated traffic | >80% orphans | 🟡 Medium |
| `view_host_ja4_anomalies` | JA4 across multiple hosts | ≥3 hosts | 🟡 Medium |
| `view_host_ip_ja4_rotation` | IP rotating JA4s | ≥5 JA4 | 🟡 Medium |

---

## 17. Example Investigation Queries

### Top 10 most suspicious IPs (cumulative score)
```sql
WITH threats AS (
    SELECT src_ip, ja4, 'bruteforce' AS type, sum(attempts) AS score
    FROM mabase_prod.view_bruteforce_post_detected GROUP BY src_ip, ja4
    UNION ALL
    SELECT src_ip, ja4, 'path_scan', sum(sensitive_hits)
    FROM mabase_prod.view_path_scan_detected GROUP BY src_ip, ja4
    UNION ALL
    SELECT src_ip, ja4, 'payload', sum(sqli_attempts + xss_attempts)
    FROM mabase_prod.view_payload_attacks_detected GROUP BY src_ip, ja4
)
SELECT src_ip, ja4, sum(score) AS total_score, groupArray(type) AS threat_types
FROM threats
GROUP BY src_ip, ja4
ORDER BY total_score DESC
LIMIT 10;
```

### History of a suspicious IP
```sql
SELECT
    hour,
    host,
    countMerge(hits) AS requests,
    uniqMerge(uniq_paths) AS unique_paths
FROM mabase_prod.agg_host_ip_ja4_1h
WHERE src_ip = '1.2.3.4'
  AND hour >= now() - INTERVAL 24 HOUR
GROUP BY hour, host
ORDER BY hour DESC;
```

### JA4 → User-Agent → Hosts correlation
```sql
SELECT
    ja4,
    any(first_ua) AS user_agent,
    groupArray(DISTINCT host) AS hosts,
    -- countMerge is itself an aggregate; it cannot be nested inside sum()
    countMerge(hits) AS total_requests
FROM mabase_prod.agg_host_ip_ja4_1h
WHERE hour >= now() - INTERVAL 1 HOUR
GROUP BY ja4
ORDER BY total_requests DESC
LIMIT 20;
```

---

## 18. Installation and Maintenance

### Installation
```bash
# Run after init.sql
clickhouse-client --multiquery < sql/hosts.sql
```

### Verification
```sql
-- Count records
SELECT count(*) FROM mabase_prod.agg_host_ip_ja4_1h;
SELECT count(*) FROM mabase_prod.agg_header_fingerprint_1h;

-- Test the views
SELECT * FROM mabase_prod.view_host_identification LIMIT 10;
SELECT * FROM mabase_prod.view_bruteforce_post_detected LIMIT 10;
SELECT * FROM mabase_prod.view_payload_attacks_detected LIMIT 10;
```

### Monitoring
```sql
-- Most active views (last hour)
-- The UNION is wrapped in a subquery so ORDER BY applies to the combined result
SELECT view_name, alerts
FROM (
    SELECT 'bruteforce_post' AS view_name, count() AS alerts
    FROM mabase_prod.view_bruteforce_post_detected
    UNION ALL
    SELECT 'path_scan', count() FROM mabase_prod.view_path_scan_detected
    UNION ALL
    SELECT 'payload_attacks', count() FROM mabase_prod.view_payload_attacks_detected
    UNION ALL
    SELECT 'ja4_botnet', count() FROM mabase_prod.view_ja4_botnet_suspected
)
ORDER BY alerts DESC;
```
@ -0,0 +1,376 @@
package unixsocket

import (
	"context"
	"encoding/json"
	"fmt"
	"math"
	"net"
	"os"
	"path/filepath"
	"strconv"
	"strings"
	"sync"
	"time"

	"github.com/antitbone/ja4/correlator/internal/domain"
	"github.com/antitbone/ja4/correlator/internal/observability"
)

const (
	// Maximum datagram size for JSON logs (64KB - Unix datagram limit)
	MaxDatagramSize = 65535
	// Rate limit: max events per second
	MaxEventsPerSecond = 10000
)

// Config holds the Unix socket source configuration.
type Config struct {
	Name              string
	Path              string
	SourceType        string // "A" for Apache/HTTP, "B" for Network, "" for auto-detect
	SocketPermissions os.FileMode
}

// UnixSocketSource reads JSON events from a Unix datagram socket.
type UnixSocketSource struct {
	config   Config
	mu       sync.Mutex
	conn     *net.UnixConn
	done     chan struct{}
	wg       sync.WaitGroup
	stopOnce sync.Once
	logger   *observability.Logger
}

// NewUnixSocketSource creates a new Unix socket source.
func NewUnixSocketSource(config Config) *UnixSocketSource {
	return &UnixSocketSource{
		config: config,
		done:   make(chan struct{}),
		logger: observability.NewLogger("unixsocket:" + config.Name),
	}
}

// SetLogger sets the logger for the source (for debug mode).
func (s *UnixSocketSource) SetLogger(logger *observability.Logger) {
	s.logger = logger.WithFields(map[string]any{"source": s.config.Name})
}

// Name returns the source name.
func (s *UnixSocketSource) Name() string {
	return s.config.Name
}

// Start begins listening on the Unix datagram socket.
func (s *UnixSocketSource) Start(ctx context.Context, eventChan chan<- *domain.NormalizedEvent) error {
	if strings.TrimSpace(s.config.Path) == "" {
		return fmt.Errorf("socket path cannot be empty")
	}

	// Create parent directory if it doesn't exist
	socketDir := filepath.Dir(s.config.Path)
	if err := os.MkdirAll(socketDir, 0755); err != nil {
		return fmt.Errorf("failed to create socket directory %s: %w", socketDir, err)
	}

	// Remove existing socket file if present
	if info, err := os.Stat(s.config.Path); err == nil {
		if info.Mode()&os.ModeSocket != 0 {
			if err := os.Remove(s.config.Path); err != nil {
				return fmt.Errorf("failed to remove existing socket: %w", err)
			}
		} else {
			return fmt.Errorf("path exists but is not a socket: %s", s.config.Path)
		}
	}

	// Create Unix datagram socket
	addr, err := net.ResolveUnixAddr("unixgram", s.config.Path)
	if err != nil {
		return fmt.Errorf("failed to resolve unix socket address: %w", err)
	}

	conn, err := net.ListenUnixgram("unixgram", addr)
	if err != nil {
		return fmt.Errorf("failed to create unix datagram socket: %w", err)
	}
	s.conn = conn

	// Set permissions - fail if we can't
	permissions := s.config.SocketPermissions
	if permissions == 0 {
		permissions = 0666 // default
	}
	if err := os.Chmod(s.config.Path, permissions); err != nil {
		_ = conn.Close()
		_ = os.Remove(s.config.Path)
		return fmt.Errorf("failed to set socket permissions: %w", err)
	}

	s.wg.Add(1)
	go func() {
		defer s.wg.Done()
		s.readDatagrams(ctx, eventChan)
	}()

	return nil
}

func (s *UnixSocketSource) readDatagrams(ctx context.Context, eventChan chan<- *domain.NormalizedEvent) {
	buf := make([]byte, MaxDatagramSize)

	for {
		select {
		case <-s.done:
			return
		case <-ctx.Done():
			return
		default:
		}

		// Set read deadline to allow periodic context checks
		_ = s.conn.SetReadDeadline(time.Now().Add(100 * time.Millisecond))

		n, _, err := s.conn.ReadFromUnix(buf)
		if err != nil {
			if netErr, ok := err.(net.Error); ok && netErr.Timeout() {
				// Read timeout, continue to check context
				continue
			}
			// Other errors (e.g., closed socket)
			select {
			case <-s.done:
				return
			case <-ctx.Done():
				return
			default:
				s.logger.Warnf("read error: %v", err)
				continue
			}
		}

		if n == 0 {
			continue
		}

		data := make([]byte, n)
		copy(data, buf[:n])

		event, err := parseJSONEvent(data, s.config.SourceType)
		if err != nil {
			// Log parse errors with the raw data for debugging
			s.logger.Warnf("parse error: %v | raw: %s", err, string(data))
			continue
		}

		// Debug: log raw events with all key details
		s.logger.Debugf("event received: source=%s src_ip=%s src_port=%d timestamp=%v raw_timestamp=%v",
			event.Source, event.SrcIP, event.SrcPort, event.Timestamp, event.Raw["timestamp"])

		select {
		case eventChan <- event:
		case <-s.done:
			// Without this case, Stop() would block in wg.Wait() whenever
			// eventChan is full and ctx has not been cancelled.
			return
		case <-ctx.Done():
			return
		}
	}
}

// resolveSource maps the configured source type to a domain.EventSource.
// When the type is empty or unrecognized, it falls back to auto-detection.
func resolveSource(sourceType string, headers map[string]string) domain.EventSource {
	switch strings.ToLower(strings.TrimSpace(sourceType)) {
	case "a", "apache", "http":
		return domain.SourceA
	case "b", "network", "net":
		return domain.SourceB
	default:
		// Backward-compatible fallback: events carrying header_* fields are
		// treated as source A, everything else as source B.
		if len(headers) > 0 {
			return domain.SourceA
		}
		return domain.SourceB
	}
}

func parseJSONEvent(data []byte, sourceType string) (*domain.NormalizedEvent, error) {
	var raw map[string]any
	if err := json.Unmarshal(data, &raw); err != nil {
		return nil, fmt.Errorf("invalid JSON: %w", err)
	}

	event := &domain.NormalizedEvent{
		Raw:     raw,
		Extra:   make(map[string]any),
		Headers: make(map[string]string),
	}

	// Extract headers (header_* fields) first
	for k, v := range raw {
		if strings.HasPrefix(k, "header_") {
			if sv, ok := v.(string); ok {
				event.Headers[strings.TrimPrefix(k, "header_")] = sv
			}
		}
	}

	// Resolve source first (strict timestamp logic depends on source)
	event.Source = resolveSource(sourceType, event.Headers)

	// Extract and validate src_ip
	if v, ok := getString(raw, "src_ip"); ok {
		v = strings.TrimSpace(v)
		if v == "" {
			return nil, fmt.Errorf("src_ip cannot be empty")
		}
		event.SrcIP = v
	} else {
		return nil, fmt.Errorf("missing required field: src_ip")
	}

	// Extract and validate src_port
	if v, ok := getInt(raw, "src_port"); ok {
		if v < 1 || v > 65535 {
			return nil, fmt.Errorf("src_port must be between 1 and 65535, got %d", v)
		}
		event.SrcPort = v
	} else {
		return nil, fmt.Errorf("missing required field: src_port")
	}

	// Extract dst_ip (optional)
	if v, ok := getString(raw, "dst_ip"); ok {
		event.DstIP = strings.TrimSpace(v)
	}

	// Extract dst_port (optional)
	if v, ok := getInt(raw, "dst_port"); ok {
		if v < 0 || v > 65535 {
			return nil, fmt.Errorf("dst_port must be between 0 and 65535, got %d", v)
		}
		event.DstPort = v
	}

	// Extract timestamp based on source contract
	switch event.Source {
	case domain.SourceA:
		ts, ok := getInt64(raw, "timestamp")
		if !ok {
			return nil, fmt.Errorf("missing required numeric field: timestamp for source A")
		}
		// Interpreted as nanoseconds since the Unix epoch. Note that
		// encoding/json decodes numbers into float64, so values above 2^53
		// may lose their low-order bits.
		event.Timestamp = time.Unix(0, ts)
	case domain.SourceB:
		// For network source, try to use event timestamp if available,
		// fallback to reception time. This improves correlation accuracy
		// when network logs include their own timestamp (e.g., from packet capture).
		if ts, ok := getInt64(raw, "timestamp"); ok {
			event.Timestamp = time.Unix(0, ts)
		} else if timeStr, ok := getString(raw, "time"); ok {
			// time.RFC3339 also accepts fractional seconds when parsing,
			// so a separate RFC3339Nano attempt is unnecessary.
			if t, err := time.Parse(time.RFC3339, timeStr); err == nil {
				event.Timestamp = t
			} else {
				event.Timestamp = time.Now()
			}
		} else {
			event.Timestamp = time.Now()
		}
	default:
		return nil, fmt.Errorf("unsupported source type: %s", event.Source)
	}

	// Extra fields
	knownFields := map[string]bool{
		"src_ip": true, "src_port": true, "dst_ip": true, "dst_port": true,
		"timestamp": true, "time": true,
	}
	for k, v := range raw {
		if knownFields[k] {
			continue
		}
		if strings.HasPrefix(k, "header_") {
			continue
		}
		event.Extra[k] = v
	}

	return event, nil
}

func getString(m map[string]any, key string) (string, bool) {
	if v, ok := m[key]; ok {
		if s, ok := v.(string); ok {
			return s, true
		}
	}
	return "", false
}

func getInt(m map[string]any, key string) (int, bool) {
	if v, ok := m[key]; ok {
		switch val := v.(type) {
		case float64:
			if math.Trunc(val) != val {
				return 0, false
			}
			return int(val), true
		case int:
			return val, true
		case int64:
			return int(val), true
		case string:
			if i, err := strconv.Atoi(val); err == nil {
				return i, true
			}
		}
	}
	return 0, false
}

func getInt64(m map[string]any, key string) (int64, bool) {
	if v, ok := m[key]; ok {
		switch val := v.(type) {
		case float64:
			if math.Trunc(val) != val {
				return 0, false
			}
			return int64(val), true
		case int:
			return int64(val), true
		case int64:
			return val, true
		case string:
			if i, err := strconv.ParseInt(val, 10, 64); err == nil {
				return i, true
			}
		}
	}
	return 0, false
}

// Stop gracefully stops the source.
func (s *UnixSocketSource) Stop() error {
	var stopErr error

	s.stopOnce.Do(func() {
		s.mu.Lock()
		defer s.mu.Unlock()

		close(s.done)

		if s.conn != nil {
			_ = s.conn.Close()
		}

		s.wg.Wait()

		// Clean up socket file
		if err := os.Remove(s.config.Path); err != nil && !os.IsNotExist(err) {
			stopErr = fmt.Errorf("failed to remove socket file: %w", err)
			return
		}
	})

	return stopErr
}
@ -0,0 +1,596 @@
package unixsocket

import (
	"context"
	"fmt"
	"net"
	"os"
	"testing"
	"time"

	"github.com/antitbone/ja4/correlator/internal/domain"
)

func TestParseJSONEvent_Apache(t *testing.T) {
	data := []byte(`{
		"src_ip": "192.168.1.1",
		"src_port": 8080,
		"dst_ip": "10.0.0.1",
		"dst_port": 80,
		"timestamp": 1704110400000000000,
		"method": "GET",
		"path": "/api/test",
		"header_host": "example.com",
		"header_user_agent": "Mozilla/5.0"
	}`)

	event, err := parseJSONEvent(data, "A")
	if err != nil {
		t.Fatalf("unexpected error: %v", err)
	}

	if event.SrcIP != "192.168.1.1" {
		t.Errorf("expected src_ip 192.168.1.1, got %s", event.SrcIP)
	}
	if event.SrcPort != 8080 {
		t.Errorf("expected src_port 8080, got %d", event.SrcPort)
	}
	if event.Headers["host"] != "example.com" {
		t.Errorf("expected header host example.com, got %s", event.Headers["host"])
	}
	if event.Headers["user_agent"] != "Mozilla/5.0" {
		t.Errorf("expected header_user_agent Mozilla/5.0, got %s", event.Headers["user_agent"])
	}
	if event.Source != domain.SourceA {
		t.Errorf("expected source A, got %s", event.Source)
	}
	expectedTs := time.Unix(0, 1704110400000000000)
	if !event.Timestamp.Equal(expectedTs) {
		t.Errorf("expected timestamp %v, got %v", expectedTs, event.Timestamp)
	}
}

func TestParseJSONEvent_Network(t *testing.T) {
	data := []byte(`{
		"src_ip": "192.168.1.1",
		"src_port": 8080,
		"dst_ip": "10.0.0.1",
		"dst_port": 443,
		"timestamp": 1704110400000000000,
		"ja3": "abc123def456",
		"ja4": "xyz789",
		"tcp_meta_flags": "SYN"
	}`)

	event, err := parseJSONEvent(data, "B")
	if err != nil {
		t.Fatalf("unexpected error: %v", err)
	}

	if event.SrcIP != "192.168.1.1" {
		t.Errorf("expected src_ip 192.168.1.1, got %s", event.SrcIP)
	}
	if event.Extra["ja3"] != "abc123def456" {
		t.Errorf("expected ja3 abc123def456, got %v", event.Extra["ja3"])
	}
	if event.Source != domain.SourceB {
		t.Errorf("expected source B, got %s", event.Source)
	}
	// Network source now uses payload timestamp if available
	expectedTs := time.Unix(0, 1704110400000000000)
	if !event.Timestamp.Equal(expectedTs) {
		t.Errorf("expected network timestamp %v, got %v", expectedTs, event.Timestamp)
	}
}

func TestParseJSONEvent_InvalidJSON(t *testing.T) {
	data := []byte(`{invalid json}`)

	_, err := parseJSONEvent(data, "")
	if err == nil {
		t.Error("expected error for invalid JSON")
	}
}

func TestParseJSONEvent_MissingFields(t *testing.T) {
	data := []byte(`{"other_field": "value"}`)

	_, err := parseJSONEvent(data, "")
	if err == nil {
		t.Error("expected error for missing src_ip/src_port")
	}
}

func TestParseJSONEvent_SourceARequiresNumericTimestamp(t *testing.T) {
	data := []byte(`{
		"src_ip": "192.168.1.1",
		"src_port": 8080,
		"time": "2024-01-01T12:00:00Z"
	}`)

	_, err := parseJSONEvent(data, "A")
	if err == nil {
		t.Fatal("expected error for source A without numeric timestamp")
	}
}

func TestParseJSONEvent_SourceBUsesPayloadTimestamp(t *testing.T) {
	expectedTs := int64(1704110400000000000)
	data := []byte(`{
		"src_ip": "192.168.1.1",
		"src_port": 8080,
		"timestamp": 1704110400000000000
	}`)

	event, err := parseJSONEvent(data, "B")
	if err != nil {
		t.Fatalf("unexpected error: %v", err)
	}

	expectedTime := time.Unix(0, expectedTs)
	if !event.Timestamp.Equal(expectedTime) {
		t.Errorf("expected source B to use payload timestamp %v, got %v", expectedTime, event.Timestamp)
	}
}

func TestParseJSONEvent_SourceBUsesTimeField(t *testing.T) {
	data := []byte(`{
		"src_ip": "192.168.1.1",
		"src_port": 8080,
		"time": "2024-01-01T12:00:00Z"
	}`)

	event, err := parseJSONEvent(data, "B")
	if err != nil {
		t.Fatalf("unexpected error: %v", err)
	}

	expectedTime := time.Unix(0, 1704110400000000000)
	if !event.Timestamp.Equal(expectedTime) {
		t.Errorf("expected source B to use time field %v, got %v", expectedTime, event.Timestamp)
	}
}

func TestParseJSONEvent_SourceBFallbackToNow(t *testing.T) {
	data := []byte(`{
		"src_ip": "192.168.1.1",
		"src_port": 8080
	}`)

	before := time.Now()
	event, err := parseJSONEvent(data, "B")
	after := time.Now()
	if err != nil {
		t.Fatalf("unexpected error: %v", err)
	}

	if event.Timestamp.Before(before.Add(-2*time.Second)) || event.Timestamp.After(after.Add(2*time.Second)) {
		t.Errorf("expected source B timestamp near now, got %v", event.Timestamp)
	}
}

func TestParseJSONEvent_ExplicitSourceType(t *testing.T) {
	tests := []struct {
		name       string
		data       string
		sourceType string
		expected   domain.EventSource
	}{
		{
			name:       "explicit A",
			data:       `{"src_ip": "192.168.1.1", "src_port": 8080, "timestamp": 1704110400000000000}`,
			sourceType: "A",
			expected:   domain.SourceA,
		},
		{
			name:       "explicit B",
			data:       `{"src_ip": "192.168.1.1", "src_port": 8080}`,
			sourceType: "B",
			expected:   domain.SourceB,
		},
		{
			name:       "explicit apache",
			data:       `{"src_ip": "192.168.1.1", "src_port": 8080, "timestamp": 1704110400000000000}`,
			sourceType: "apache",
			expected:   domain.SourceA,
		},
		{
			name:       "explicit network",
			data:       `{"src_ip": "192.168.1.1", "src_port": 8080}`,
			sourceType: "network",
			expected:   domain.SourceB,
		},
		{
			name:       "auto-detect A with headers",
			data:       `{"src_ip": "192.168.1.1", "src_port": 8080, "timestamp": 1704110400000000000, "header_host": "example.com"}`,
			sourceType: "",
			expected:   domain.SourceA,
		},
		{
			name:       "auto-detect B without headers",
			data:       `{"src_ip": "192.168.1.1", "src_port": 8080, "ja3": "abc"}`,
			sourceType: "",
			expected:   domain.SourceB,
		},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			event, err := parseJSONEvent([]byte(tt.data), tt.sourceType)
			if err != nil {
				t.Fatalf("unexpected error: %v", err)
			}
			if event.Source != tt.expected {
				t.Errorf("expected source %s, got %s", tt.expected, event.Source)
			}
		})
	}
}

func TestUnixSocketSource_Name(t *testing.T) {
	source := NewUnixSocketSource(Config{
		Name: "test_source",
		Path: "/tmp/test.sock",
	})

	if source.Name() != "test_source" {
		t.Errorf("expected name 'test_source', got %s", source.Name())
	}
}

func TestUnixSocketSource_StopWithoutStart(t *testing.T) {
	source := NewUnixSocketSource(Config{
		Name: "test_source",
		Path: "/tmp/test.sock",
	})

	// Should not panic
	err := source.Stop()
	if err != nil {
		t.Errorf("expected no error on stop without start, got %v", err)
	}
}

func TestUnixSocketSource_EmptyPath(t *testing.T) {
	source := NewUnixSocketSource(Config{
		Name: "test_source",
		Path: "",
	})

	ctx := context.Background()
	eventChan := make(chan *domain.NormalizedEvent, 10)

	err := source.Start(ctx, eventChan)
	if err == nil {
		t.Error("expected error for empty path")
	}
}

func TestGetString(t *testing.T) {
	m := map[string]any{
		"string": "hello",
		"int":    42,
		"nil":    nil,
	}

	v, ok := getString(m, "string")
	if !ok || v != "hello" {
		t.Errorf("expected 'hello', got %v, %v", v, ok)
	}

	_, ok = getString(m, "int")
	if ok {
		t.Error("expected false for int")
	}

	_, ok = getString(m, "missing")
	if ok {
		t.Error("expected false for missing key")
	}
}

func TestGetInt(t *testing.T) {
	m := map[string]any{
		"float":  42.5,
		"int":    42,
		"int64":  int64(42),
		"string": "42",
		"bad":    "not a number",
		"nil":    nil,
	}

	tests := []struct {
		key      string
		expected int
		ok       bool
	}{
		{"float", 0, false},
		{"int", 42, true},
		{"int64", 42, true},
		{"string", 42, true},
		{"bad", 0, false},
		{"nil", 0, false},
		{"missing", 0, false},
	}

	for _, tt := range tests {
		t.Run(tt.key, func(t *testing.T) {
			v, ok := getInt(m, tt.key)
			if ok != tt.ok {
				t.Errorf("getInt(%q) ok = %v, want %v", tt.key, ok, tt.ok)
			}
			if v != tt.expected {
				t.Errorf("getInt(%q) = %v, want %v", tt.key, v, tt.expected)
			}
		})
	}
}

func TestGetInt64(t *testing.T) {
	m := map[string]any{
		"float":  42.5,
		"int":    42,
		"int64":  int64(42),
		"string": "42",
		"bad":    "not a number",
		"nil":    nil,
	}

	tests := []struct {
		key      string
		expected int64
		ok       bool
	}{
		{"float", 0, false},
		{"int", 42, true},
		{"int64", 42, true},
		{"string", 42, true},
		{"bad", 0, false},
		{"nil", 0, false},
		{"missing", 0, false},
	}

	for _, tt := range tests {
		t.Run(tt.key, func(t *testing.T) {
			v, ok := getInt64(m, tt.key)
			if ok != tt.ok {
				t.Errorf("getInt64(%q) ok = %v, want %v", tt.key, ok, tt.ok)
			}
			if v != tt.expected {
				t.Errorf("getInt64(%q) = %v, want %v", tt.key, v, tt.expected)
			}
		})
	}
}

func TestParseJSONEvent_PortValidation(t *testing.T) {
	tests := []struct {
		name       string
		data       string
		sourceType string
		wantErr    bool
	}{
		{
			name:       "valid src_port",
			data:       `{"src_ip": "192.168.1.1", "src_port": 8080}`,
			sourceType: "B",
			wantErr:    false,
		},
		{
			name:       "src_port zero",
			data:       `{"src_ip": "192.168.1.1", "src_port": 0}`,
			sourceType: "B",
			wantErr:    true,
		},
		{
			name:       "src_port negative",
			data:       `{"src_ip": "192.168.1.1", "src_port": -1}`,
			sourceType: "B",
			wantErr:    true,
		},
		{
			name:       "src_port too high",
			data:       `{"src_ip": "192.168.1.1", "src_port": 70000}`,
			sourceType: "B",
			wantErr:    true,
		},
		{
			name:       "valid dst_port zero",
			data:       `{"src_ip": "192.168.1.1", "src_port": 8080, "dst_port": 0}`,
			sourceType: "B",
			wantErr:    false,
		},
		{
			name:       "dst_port too high",
			data:       `{"src_ip": "192.168.1.1", "src_port": 8080, "dst_port": 70000}`,
			sourceType: "B",
			wantErr:    true,
		},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			_, err := parseJSONEvent([]byte(tt.data), tt.sourceType)
			if (err != nil) != tt.wantErr {
				t.Errorf("parseJSONEvent() error = %v, wantErr %v", err, tt.wantErr)
			}
		})
	}
}

func TestParseJSONEvent_TimestampFallback(t *testing.T) {
	data := []byte(`{"src_ip": "192.168.1.1", "src_port": 8080}`)
	event, err := parseJSONEvent(data, "B")
	if err != nil {
		t.Fatalf("unexpected error: %v", err)
	}

	// For source B without a payload timestamp, the reception time is used
	if event.Timestamp.IsZero() {
		t.Error("expected non-zero timestamp")
	}
}

func TestUnixSocketSource_StartStopDatagram(t *testing.T) {
	tmpPath := "/tmp/test_logcorrelator_datagram.sock"
	// Clean up any existing socket
	os.Remove(tmpPath)

	source := NewUnixSocketSource(Config{
		Name:              "test_datagram",
		Path:              tmpPath,
		SourceType:        "B",
		SocketPermissions: 0666,
	})

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	eventChan := make(chan *domain.NormalizedEvent, 10)

	err := source.Start(ctx, eventChan)
	if err != nil {
		t.Fatalf("failed to start source: %v", err)
	}

	// Give socket time to start
	time.Sleep(100 * time.Millisecond)

	// Verify socket file exists
	if _, err := os.Stat(tmpPath); os.IsNotExist(err) {
		t.Error("socket file should exist")
	}

	// Stop the source
	err = source.Stop()
	if err != nil {
		t.Errorf("failed to stop source: %v", err)
	}

	// Socket file should be cleaned up
	time.Sleep(100 * time.Millisecond)
	if _, err := os.Stat(tmpPath); !os.IsNotExist(err) {
		t.Error("socket file should be removed after stop")
	}
}

func TestUnixSocketSource_SendDatagram(t *testing.T) {
	tmpPath := "/tmp/test_logcorrelator_send.sock"
	os.Remove(tmpPath)

	source := NewUnixSocketSource(Config{
		Name:              "test_send",
		Path:              tmpPath,
		SourceType:        "B",
		SocketPermissions: 0666,
	})

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	eventChan := make(chan *domain.NormalizedEvent, 10)

	err := source.Start(ctx, eventChan)
	if err != nil {
		t.Fatalf("failed to start source: %v", err)
	}

	// Give socket time to start
	time.Sleep(100 * time.Millisecond)

	// Connect and send a datagram
	conn, err := net.Dial("unixgram", tmpPath)
	if err != nil {
		t.Fatalf("failed to dial socket: %v", err)
	}
	defer conn.Close()

	data := []byte(`{"src_ip": "192.168.1.1", "src_port": 8080, "ja3": "test"}`)
	_, err = conn.Write(data)
	if err != nil {
		t.Fatalf("failed to write: %v", err)
	}

	// Wait for event
	select {
	case event := <-eventChan:
		if event.SrcIP != "192.168.1.1" {
			t.Errorf("expected src_ip 192.168.1.1, got %s", event.SrcIP)
		}
		if event.SrcPort != 8080 {
			t.Errorf("expected src_port 8080, got %d", event.SrcPort)
		}
	case <-time.After(2 * time.Second):
		t.Error("timeout waiting for event")
	case <-ctx.Done():
		t.Error("context cancelled")
	}

	err = source.Stop()
	if err != nil {
		t.Errorf("failed to stop source: %v", err)
	}
}

func TestUnixSocketSource_MultipleDatagrams(t *testing.T) {
	tmpPath := "/tmp/test_logcorrelator_multi.sock"
	os.Remove(tmpPath)

	source := NewUnixSocketSource(Config{
		Name:              "test_multi",
		Path:              tmpPath,
		SourceType:        "B",
		SocketPermissions: 0666,
	})

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	eventChan := make(chan *domain.NormalizedEvent, 100)

	err := source.Start(ctx, eventChan)
	if err != nil {
		t.Fatalf("failed to start source: %v", err)
	}

	// Give socket time to start
	time.Sleep(100 * time.Millisecond)

	// Connect and send multiple datagrams
	conn, err := net.Dial("unixgram", tmpPath)
	if err != nil {
		t.Fatalf("failed to dial socket: %v", err)
	}
	defer conn.Close()

	for i := 0; i < 5; i++ {
		data := []byte(fmt.Sprintf(`{"src_ip": "192.168.1.%d", "src_port": %d, "ja3": "test%d"}`, i+1, 8080+i, i))
		_, err = conn.Write(data)
		if err != nil {
			t.Fatalf("failed to write datagram %d: %v", i, err)
		}
	}

	// Wait for all events
	received := 0
	timeout := time.After(3 * time.Second)
	for received < 5 {
		select {
		case event := <-eventChan:
			received++
			t.Logf("received event %d: src_ip=%s", received, event.SrcIP)
		case <-timeout:
			t.Errorf("timeout waiting for events, received %d/5", received)
			goto done
		case <-ctx.Done():
			t.Error("context cancelled")
			goto done
		}
	}

done:
	err = source.Stop()
	if err != nil {
		t.Errorf("failed to stop source: %v", err)
	}
}
@ -0,0 +1,391 @@
package clickhouse

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"net"
	"strings"
	"sync"
	"time"

	"github.com/ClickHouse/clickhouse-go/v2"
	"github.com/antitbone/ja4/correlator/internal/domain"
	"github.com/antitbone/ja4/correlator/internal/observability"
)

const (
	// DefaultBatchSize is the default number of records per batch
	DefaultBatchSize = 500
	// DefaultFlushIntervalMs is the default flush interval in milliseconds
	DefaultFlushIntervalMs = 200
	// DefaultMaxBufferSize is the default maximum buffer size
	DefaultMaxBufferSize = 5000
	// DefaultTimeoutMs is the default timeout for operations in milliseconds
	DefaultTimeoutMs = 1000
	// DefaultPingTimeoutMs is the timeout for initial connection ping
	DefaultPingTimeoutMs = 5000
	// MaxRetries is the maximum number of retry attempts for failed inserts
	MaxRetries = 3
	// RetryBaseDelay is the base delay between retries
	RetryBaseDelay = 100 * time.Millisecond
)

// Config holds the ClickHouse sink configuration.
type Config struct {
	DSN             string
	Table           string
	BatchSize       int
	FlushIntervalMs int
	MaxBufferSize   int
	DropOnOverflow  bool
	AsyncInsert     bool
	TimeoutMs       int
}

// ClickHouseSink writes correlated logs to ClickHouse.
type ClickHouseSink struct {
	config    Config
	conn      clickhouse.Conn
	mu        sync.Mutex
	buffer    []domain.CorrelatedLog
	flushChan chan struct{}
	done      chan struct{}
	wg        sync.WaitGroup
	closeOnce sync.Once
	logger    *observability.Logger
}

// SetLogger sets the logger used by the sink.
func (s *ClickHouseSink) SetLogger(logger *observability.Logger) {
	s.logger = logger.WithFields(map[string]any{"sink": "clickhouse"})
}

// NewClickHouseSink creates a new ClickHouse sink.
func NewClickHouseSink(config Config) (*ClickHouseSink, error) {
	if strings.TrimSpace(config.DSN) == "" {
		return nil, fmt.Errorf("clickhouse DSN is required")
	}
	if strings.TrimSpace(config.Table) == "" {
		return nil, fmt.Errorf("clickhouse table is required")
	}

	// Apply defaults
	if config.BatchSize <= 0 {
		config.BatchSize = DefaultBatchSize
	}
	if config.FlushIntervalMs <= 0 {
		config.FlushIntervalMs = DefaultFlushIntervalMs
	}
	if config.MaxBufferSize <= 0 {
		config.MaxBufferSize = DefaultMaxBufferSize
	}
	if config.TimeoutMs <= 0 {
		config.TimeoutMs = DefaultTimeoutMs
	}

	s := &ClickHouseSink{
		config:    config,
		buffer:    make([]domain.CorrelatedLog, 0, config.BatchSize),
		flushChan: make(chan struct{}, 1),
		done:      make(chan struct{}),
		logger:    observability.NewLogger("clickhouse"),
	}

	// Parse DSN and create options
	options, err := clickhouse.ParseDSN(config.DSN)
	if err != nil {
		return nil, fmt.Errorf("failed to parse ClickHouse DSN: %w", err)
	}

	// Connect to ClickHouse using native API
	conn, err := clickhouse.Open(options)
	if err != nil {
		return nil, fmt.Errorf("failed to connect to ClickHouse: %w", err)
	}

	// Ping with timeout to verify connection
	pingCtx, pingCancel := context.WithTimeout(context.Background(), time.Duration(DefaultPingTimeoutMs)*time.Millisecond)
	defer pingCancel()

	if err := conn.Ping(pingCtx); err != nil {
		_ = conn.Close()
		return nil, fmt.Errorf("failed to ping ClickHouse: %w", err)
	}

	s.conn = conn
	s.log().Infof("connected to ClickHouse: table=%s batch_size=%d flush_interval_ms=%d",
		config.Table, config.BatchSize, config.FlushIntervalMs)

	// Start flush goroutine
	s.wg.Add(1)
	go s.flushLoop()

	return s, nil
}

// Name returns the sink name.
func (s *ClickHouseSink) Name() string {
	return "clickhouse"
}

// log returns the logger, initializing a default one if not set (e.g. in tests).
func (s *ClickHouseSink) log() *observability.Logger {
	if s.logger == nil {
		s.logger = observability.NewLogger("clickhouse")
	}
	return s.logger
}

// Reopen is a no-op for ClickHouse (connection is managed internally).
func (s *ClickHouseSink) Reopen() error {
	return nil
}

// Write appends a log to the buffer and signals the flush loop once BatchSize
// is reached. When the buffer is full it drops the log if DropOnOverflow is
// set; otherwise it polls every 10ms until space frees up, ctx is done, or
// TimeoutMs elapses.
func (s *ClickHouseSink) Write(ctx context.Context, log domain.CorrelatedLog) error {
	deadline := time.Now().Add(time.Duration(s.config.TimeoutMs) * time.Millisecond)

	for {
		s.mu.Lock()
		if len(s.buffer) < s.config.MaxBufferSize {
			s.buffer = append(s.buffer, log)
			if len(s.buffer) >= s.config.BatchSize {
				select {
				case s.flushChan <- struct{}{}:
				default:
				}
			}
			s.mu.Unlock()
			return nil
		}
		drop := s.config.DropOnOverflow
		s.mu.Unlock()

		if drop {
			s.log().Warnf("buffer full, dropping log: table=%s buffer_size=%d", s.config.Table, s.config.MaxBufferSize)
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("buffer full, timeout exceeded")
		}

		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(10 * time.Millisecond):
		}
	}
}

// Flush flushes the buffer to ClickHouse.
func (s *ClickHouseSink) Flush(ctx context.Context) error {
	return s.doFlush(ctx)
}

// Close stops the flush loop, flushes any remaining logs, and closes the
// connection. Only the first call does any work; later calls return nil.
func (s *ClickHouseSink) Close() error {
	var closeErr error

	s.closeOnce.Do(func() {
		if s.done != nil {
			close(s.done)
		}
		s.wg.Wait()

		flushCtx, cancel := context.WithTimeout(context.Background(), time.Duration(s.config.TimeoutMs)*time.Millisecond)
		defer cancel()
		if err := s.doFlush(flushCtx); err != nil {
			closeErr = err
		}

		if s.conn != nil {
			if err := s.conn.Close(); err != nil && closeErr == nil {
				closeErr = err
			}
		}
	})

	return closeErr
}

// flushLoop flushes the buffer on every ticker tick, on batch-size signals
// from Write, and one final time when done is closed.
func (s *ClickHouseSink) flushLoop() {
	defer s.wg.Done()

	ticker := time.NewTicker(time.Duration(s.config.FlushIntervalMs) * time.Millisecond)
	defer ticker.Stop()

	for {
		select {
		case <-s.done:
			ctx, cancel := context.WithTimeout(context.Background(), time.Duration(s.config.TimeoutMs)*time.Millisecond)
			if err := s.doFlush(ctx); err != nil {
				s.log().Error("final flush on close failed", err)
			}
			cancel()
			return

		case <-ticker.C:
			s.mu.Lock()
			needsFlush := len(s.buffer) > 0
			s.mu.Unlock()

			if needsFlush {
				ctx, cancel := context.WithTimeout(context.Background(), time.Duration(s.config.TimeoutMs)*time.Millisecond)
				if err := s.doFlush(ctx); err != nil {
					s.log().Error("periodic flush failed", err)
				}
				cancel()
			}

		case <-s.flushChan:
			s.mu.Lock()
			needsFlush := len(s.buffer) >= s.config.BatchSize
			s.mu.Unlock()

			if needsFlush {
				ctx, cancel := context.WithTimeout(context.Background(), time.Duration(s.config.TimeoutMs)*time.Millisecond)
				if err := s.doFlush(ctx); err != nil {
					s.log().Error("batch flush failed", err)
				}
				cancel()
			}
		}
	}
}

// doFlush swaps out the buffer and sends it as one batch, retrying retryable
// errors with exponential backoff for up to MaxRetries attempts. The
// swapped-out logs are not re-queued if the flush fails.
func (s *ClickHouseSink) doFlush(ctx context.Context) error {
	s.mu.Lock()
	if len(s.buffer) == 0 {
		s.mu.Unlock()
		return nil
	}

	// Copy buffer to flush
	buffer := make([]domain.CorrelatedLog, len(s.buffer))
	copy(buffer, s.buffer)
	s.buffer = make([]domain.CorrelatedLog, 0, s.config.BatchSize)
	s.mu.Unlock()

	if s.conn == nil {
		return fmt.Errorf("clickhouse connection is not initialized")
	}

	batchSize := len(buffer)

	// Retry logic with exponential backoff
	var lastErr error
	for attempt := 0; attempt < MaxRetries; attempt++ {
		if attempt > 0 {
			delay := RetryBaseDelay * time.Duration(1<<uint(attempt-1))
			s.log().Warnf("retrying batch insert: attempt=%d/%d delay=%s rows=%d err=%v",
				attempt+1, MaxRetries, delay, batchSize, lastErr)
			select {
			case <-time.After(delay):
			case <-ctx.Done():
				return ctx.Err()
			}
		}

		lastErr = s.executeBatch(ctx, buffer)
		if lastErr == nil {
			s.log().Debugf("batch sent: rows=%d table=%s", batchSize, s.config.Table)
			return nil
		}

		if !isRetryableError(lastErr) {
			return fmt.Errorf("non-retryable error: %w", lastErr)
		}
	}

	return fmt.Errorf("failed after %d retries (batch size: %d): %w", MaxRetries, batchSize, lastErr)
}

func (s *ClickHouseSink) executeBatch(ctx context.Context, buffer []domain.CorrelatedLog) error {
	if s.conn == nil {
		return fmt.Errorf("clickhouse connection is not initialized")
	}

	// Table schema: http_logs_raw (raw_json String)
	// Single column insert - the entire log is serialized as JSON string
	query := fmt.Sprintf(`INSERT INTO %s (raw_json)`, s.config.Table)

	// Prepare batch using native clickhouse-go/v2 API
	batch, err := s.conn.PrepareBatch(ctx, query)
	if err != nil {
		return fmt.Errorf("failed to prepare batch: %w", err)
	}

	for i, log := range buffer {
		// Marshal the entire CorrelatedLog to JSON
		logJSON, marshalErr := json.Marshal(log)
		if marshalErr != nil {
			return fmt.Errorf("failed to marshal log %d to JSON: %w", i, marshalErr)
		}

		// Append the JSON string as the raw_json column value
		appendErr := batch.Append(string(logJSON))
		if appendErr != nil {
			return fmt.Errorf("failed to append log %d to batch: %w", i, appendErr)
		}
	}

	// Send the batch - DO NOT FORGET this step
	sendErr := batch.Send()
	if sendErr != nil {
		return fmt.Errorf("failed to send batch (%d rows): %w", len(buffer), sendErr)
	}

	return nil
}

// isRetryableError checks if an error is retryable.
func isRetryableError(err error) bool {
	if err == nil {
		return false
	}

	if errors.Is(err, context.DeadlineExceeded) {
		return true
	}

	if errors.Is(err, context.Canceled) {
		return false
	}

	var netErr net.Error
	if errors.As(err, &netErr) {
		if netErr.Timeout() {
			return true
		}
	}

	errStr := strings.ToLower(err.Error())

	// Explicit non-retryable SQL/schema errors
	if strings.Contains(errStr, "syntax error") ||
		strings.Contains(errStr, "unknown table") ||
		strings.Contains(errStr, "unknown column") ||
		(strings.Contains(errStr, "table") && strings.Contains(errStr, "not found")) {
		return false
	}

	// Fallback network/transient errors
	retryableErrors := []string{
		"connection refused",
		"connection reset",
		"timeout",
		"temporary failure",
		"network is unreachable",
		"broken pipe",
		"no route to host",
	}
	for _, re := range retryableErrors {
		if strings.Contains(errStr, re) {
			return true
		}
	}

	return false
}
@ -0,0 +1,538 @@
package clickhouse

import (
	"context"
	"testing"
	"time"

	"github.com/antitbone/ja4/correlator/internal/domain"
	"github.com/antitbone/ja4/correlator/internal/observability"
)

func TestClickHouseSink_Name(t *testing.T) {
	sink := &ClickHouseSink{
		config: Config{
			DSN:   "clickhouse://test:test@localhost:9000/test",
			Table: "test_table",
		},
	}

	if sink.Name() != "clickhouse" {
		t.Errorf("expected name 'clickhouse', got %s", sink.Name())
	}
}

func TestClickHouseSink_ConfigDefaults(t *testing.T) {
	// Test that defaults are applied correctly
	config := Config{
		DSN:   "clickhouse://test:test@localhost:9000/test",
		Table: "test_table",
		// Other fields are zero, should get defaults
	}

	// Verify defaults would be applied (we can't actually connect in tests)
	if config.BatchSize <= 0 {
		config.BatchSize = DefaultBatchSize
	}
	if config.FlushIntervalMs <= 0 {
		config.FlushIntervalMs = DefaultFlushIntervalMs
	}
	if config.MaxBufferSize <= 0 {
		config.MaxBufferSize = DefaultMaxBufferSize
	}
	if config.TimeoutMs <= 0 {
		config.TimeoutMs = DefaultTimeoutMs
	}

	if config.BatchSize != DefaultBatchSize {
		t.Errorf("expected BatchSize %d, got %d", DefaultBatchSize, config.BatchSize)
	}
	if config.FlushIntervalMs != DefaultFlushIntervalMs {
		t.Errorf("expected FlushIntervalMs %d, got %d", DefaultFlushIntervalMs, config.FlushIntervalMs)
	}
	if config.MaxBufferSize != DefaultMaxBufferSize {
		t.Errorf("expected MaxBufferSize %d, got %d", DefaultMaxBufferSize, config.MaxBufferSize)
	}
	if config.TimeoutMs != DefaultTimeoutMs {
		t.Errorf("expected TimeoutMs %d, got %d", DefaultTimeoutMs, config.TimeoutMs)
	}
}

func TestClickHouseSink_Write_BufferOverflow(t *testing.T) {
	// This test verifies the buffer overflow logic without actually connecting
	config := Config{
		DSN:             "clickhouse://test:test@localhost:9000/test",
		Table:           "test_table",
		BatchSize:       10,
		MaxBufferSize:   10,
		DropOnOverflow:  true,
		TimeoutMs:       100,
		FlushIntervalMs: 1000,
	}

	// We can't test actual writes without a ClickHouse instance,
	// but we can verify the config is valid
	if config.BatchSize > config.MaxBufferSize {
		t.Error("BatchSize should not exceed MaxBufferSize")
	}
}

func TestClickHouseSink_IsRetryableError(t *testing.T) {
	tests := []struct {
		name     string
		err      error
		expected bool
	}{
		{"nil error", nil, false},
		{"connection refused", &mockError{"connection refused"}, true},
		{"connection reset", &mockError{"connection reset by peer"}, true},
		{"timeout", &mockError{"timeout waiting for response"}, true},
		{"network unreachable", &mockError{"network is unreachable"}, true},
		{"broken pipe", &mockError{"broken pipe"}, true},
		{"syntax error", &mockError{"syntax error in SQL"}, false},
		{"table not found", &mockError{"table test not found"}, false},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			result := isRetryableError(tt.err)
			if result != tt.expected {
				t.Errorf("expected %v, got %v", tt.expected, result)
			}
		})
	}
}

func TestClickHouseSink_FlushEmpty(t *testing.T) {
	// Test that flushing an empty buffer doesn't cause issues
	// (We can't test actual ClickHouse operations without a real instance)

	s := &ClickHouseSink{
		config: Config{
			DSN:   "clickhouse://test:test@localhost:9000/test",
			Table: "test_table",
		},
		buffer: make([]domain.CorrelatedLog, 0),
	}

	// Should not panic or error on empty flush
	ctx := context.Background()
	err := s.Flush(ctx)
	if err != nil {
		t.Errorf("expected no error on empty flush, got %v", err)
	}
}

func TestClickHouseSink_CloseWithoutConnect(t *testing.T) {
	// Test that closing without connecting doesn't panic
	s := &ClickHouseSink{
		config: Config{
			DSN:   "clickhouse://test:test@localhost:9000/test",
			Table: "test_table",
		},
		buffer: make([]domain.CorrelatedLog, 0),
		done:   make(chan struct{}),
	}

	err := s.Close()
	if err != nil {
		t.Errorf("expected no error on close without connect, got %v", err)
	}
}

func TestClickHouseSink_Constants(t *testing.T) {
	// Verify constants have reasonable values
	if DefaultBatchSize <= 0 {
		t.Error("DefaultBatchSize should be positive")
	}
	if DefaultFlushIntervalMs <= 0 {
		t.Error("DefaultFlushIntervalMs should be positive")
	}
	if DefaultMaxBufferSize <= 0 {
		t.Error("DefaultMaxBufferSize should be positive")
	}
	if DefaultTimeoutMs <= 0 {
		t.Error("DefaultTimeoutMs should be positive")
	}
	if DefaultPingTimeoutMs <= 0 {
		t.Error("DefaultPingTimeoutMs should be positive")
	}
	if MaxRetries <= 0 {
		t.Error("MaxRetries should be positive")
	}
	if RetryBaseDelay <= 0 {
		t.Error("RetryBaseDelay should be positive")
	}
}

// mockError implements error for testing
type mockError struct {
	msg string
}

func (e *mockError) Error() string {
	return e.msg
}

// Test the doFlush function with empty buffer (no actual DB connection)
func TestClickHouseSink_DoFlushEmpty(t *testing.T) {
	s := &ClickHouseSink{
		config: Config{
			DSN:   "clickhouse://test:test@localhost:9000/test",
			Table: "test_table",
		},
		buffer: make([]domain.CorrelatedLog, 0),
	}

	ctx := context.Background()
	err := s.doFlush(ctx)
	if err != nil {
		t.Errorf("expected no error when flushing empty buffer, got %v", err)
	}
}

// Test that buffer is properly managed (without actual DB operations)
func TestClickHouseSink_BufferManagement(t *testing.T) {
	log := domain.CorrelatedLog{
		SrcIP:      "192.168.1.1",
		SrcPort:    8080,
		Correlated: true,
	}

	s := &ClickHouseSink{
		config: Config{
			DSN:            "clickhouse://test:test@localhost:9000/test",
			Table:          "test_table",
			MaxBufferSize:  100, // Allow more than 1 element
			DropOnOverflow: false,
			TimeoutMs:      1000,
		},
		buffer: []domain.CorrelatedLog{log},
	}

	// Verify buffer has data
	if len(s.buffer) != 1 {
		t.Fatalf("expected buffer length 1, got %d", len(s.buffer))
	}

	// Test that Write properly adds to buffer
	ctx := context.Background()
	err := s.Write(ctx, log)
	if err != nil {
		t.Errorf("unexpected error on Write: %v", err)
	}

	if len(s.buffer) != 2 {
		t.Errorf("expected buffer length 2 after Write, got %d", len(s.buffer))
	}
}

// Test Write with context cancellation
func TestClickHouseSink_Write_ContextCancel(t *testing.T) {
	s := &ClickHouseSink{
		config: Config{
			DSN:            "clickhouse://test:test@localhost:9000/test",
			Table:          "test_table",
			MaxBufferSize:  1,
			DropOnOverflow: false,
			TimeoutMs:      10,
		},
		buffer: make([]domain.CorrelatedLog, 0, 1),
	}

	// Fill the buffer
	log := domain.CorrelatedLog{SrcIP: "192.168.1.1", SrcPort: 8080}
	s.buffer = append(s.buffer, log)

	// Try to write with cancelled context
	ctx, cancel := context.WithCancel(context.Background())
	cancel() // Cancel immediately

	err := s.Write(ctx, log)
	if err == nil {
		t.Error("expected error when writing with cancelled context")
	}
}

// Test DropOnOverflow behavior
func TestClickHouseSink_Write_DropOnOverflow(t *testing.T) {
	s := &ClickHouseSink{
		config: Config{
			DSN:            "clickhouse://test:test@localhost:9000/test",
			Table:          "test_table",
			MaxBufferSize:  1,
			DropOnOverflow: true,
			TimeoutMs:      10,
		},
		buffer: make([]domain.CorrelatedLog, 0, 1),
	}

	// Fill the buffer
	log := domain.CorrelatedLog{SrcIP: "192.168.1.1", SrcPort: 8080}
	s.buffer = append(s.buffer, log)

	// Try to write when buffer is full - should drop silently
	ctx := context.Background()
	err := s.Write(ctx, log)
	if err != nil {
		t.Errorf("expected no error when DropOnOverflow is true, got %v", err)
	}
}

// TestIsRetryableError_ContextDeadlineExceeded tests context.DeadlineExceeded is retryable.
func TestIsRetryableError_ContextDeadlineExceeded(t *testing.T) {
	if !isRetryableError(context.DeadlineExceeded) {
		t.Error("context.DeadlineExceeded should be retryable")
	}
}

// TestIsRetryableError_ContextCanceled tests context.Canceled is NOT retryable.
func TestIsRetryableError_ContextCanceled(t *testing.T) {
	if isRetryableError(context.Canceled) {
		t.Error("context.Canceled should not be retryable")
	}
}

// TestIsRetryableError_NetTimeout tests net.Error with Timeout() = true is retryable.
func TestIsRetryableError_NetTimeout(t *testing.T) {
	err := &mockNetError{timeout: true, temporary: false}
	if !isRetryableError(err) {
		t.Error("net.Error with Timeout()=true should be retryable")
	}
}

// TestIsRetryableError_NetNoTimeout tests net.Error with Timeout() = false is NOT retryable.
func TestIsRetryableError_NetNoTimeout(t *testing.T) {
	err := &mockNetError{timeout: false, temporary: false}
	if isRetryableError(err) {
		t.Error("net.Error with Timeout()=false should not be retryable (unless msg matches)")
	}
}

// TestIsRetryableError_UnknownTable tests "unknown table" is NOT retryable.
func TestIsRetryableError_UnknownTable(t *testing.T) {
	if isRetryableError(&mockError{"unknown table users"}) {
		t.Error("unknown table error should not be retryable")
	}
}

// TestIsRetryableError_UnknownColumn tests "unknown column" is NOT retryable.
func TestIsRetryableError_UnknownColumn(t *testing.T) {
	if isRetryableError(&mockError{"unknown column foo"}) {
		t.Error("unknown column error should not be retryable")
	}
}

// TestIsRetryableError_RandomError tests a random error is NOT retryable.
func TestIsRetryableError_RandomError(t *testing.T) {
	if isRetryableError(&mockError{"some random unrecognized error"}) {
		t.Error("random error should not be retryable")
	}
}

// TestIsRetryableError_NoRouteToHost tests "no route to host" is retryable.
func TestIsRetryableError_NoRouteToHost(t *testing.T) {
	if !isRetryableError(&mockError{"no route to host"}) {
		t.Error("'no route to host' should be retryable")
	}
}

// TestIsRetryableError_TemporaryFailure tests "temporary failure" is retryable.
func TestIsRetryableError_TemporaryFailure(t *testing.T) {
	if !isRetryableError(&mockError{"temporary failure in name resolution"}) {
		t.Error("'temporary failure' should be retryable")
	}
}

// mockNetError implements net.Error for testing.
type mockNetError struct {
	timeout   bool
	temporary bool
	msg       string
}

func (e *mockNetError) Error() string   { return e.msg }
func (e *mockNetError) Timeout() bool   { return e.timeout }
func (e *mockNetError) Temporary() bool { return e.temporary }

// TestNewClickHouseSink_EmptyDSN tests that empty DSN returns error.
func TestNewClickHouseSink_EmptyDSN(t *testing.T) {
	_, err := NewClickHouseSink(Config{
		DSN:   "",
		Table: "test_table",
	})
	if err == nil {
		t.Error("expected error for empty DSN")
	}
}

// TestNewClickHouseSink_WhitespaceDSN tests that whitespace DSN returns error.
func TestNewClickHouseSink_WhitespaceDSN(t *testing.T) {
	_, err := NewClickHouseSink(Config{
		DSN:   " ",
		Table: "test_table",
	})
	if err == nil {
		t.Error("expected error for whitespace-only DSN")
	}
}

// TestNewClickHouseSink_EmptyTable tests that empty Table returns error.
func TestNewClickHouseSink_EmptyTable(t *testing.T) {
	_, err := NewClickHouseSink(Config{
		DSN:   "clickhouse://localhost:9000/test",
		Table: "",
	})
	if err == nil {
		t.Error("expected error for empty Table")
	}
}

// TestNewClickHouseSink_WhitespaceTable tests that whitespace Table returns error.
func TestNewClickHouseSink_WhitespaceTable(t *testing.T) {
	_, err := NewClickHouseSink(Config{
		DSN:   "clickhouse://localhost:9000/test",
		Table: " ",
	})
	if err == nil {
		t.Error("expected error for whitespace-only Table")
	}
}

// TestNewClickHouseSink_InvalidDSN tests that an invalid DSN (no real connection) returns error.
func TestNewClickHouseSink_InvalidDSN(t *testing.T) {
	_, err := NewClickHouseSink(Config{
		DSN:   "not-a-valid-dsn",
		Table: "test_table",
	})
	if err == nil {
		t.Error("expected error for invalid DSN")
	}
}

// TestClickHouseSink_SetLogger tests that SetLogger sets a logger.
func TestClickHouseSink_SetLogger(t *testing.T) {
	s := &ClickHouseSink{
		config: Config{Table: "test_table"},
		buffer: make([]domain.CorrelatedLog, 0),
	}

	testLogger := observability.NewLogger("test")
	s.SetLogger(testLogger)

	if s.logger == nil {
		t.Error("expected logger to be set")
	}
}

// TestClickHouseSink_LogNilLogger tests that log() returns a logger even when s.logger is nil.
func TestClickHouseSink_LogNilLogger(t *testing.T) {
	s := &ClickHouseSink{
		config: Config{Table: "test_table"},
		buffer: make([]domain.CorrelatedLog, 0),
	}
	s.logger = nil

	// log() should auto-initialize
	logger := s.log()
	if logger == nil {
		t.Error("expected non-nil logger from log()")
	}
}

// TestClickHouseSink_Reopen tests that Reopen is a no-op and returns nil.
func TestClickHouseSink_Reopen(t *testing.T) {
	s := &ClickHouseSink{
		config: Config{Table: "test_table"},
		buffer: make([]domain.CorrelatedLog, 0),
	}
	if err := s.Reopen(); err != nil {
		t.Errorf("Reopen() should return nil, got: %v", err)
	}
}

// TestClickHouseSink_DoFlushNilConn tests doFlush returns error when conn is nil and buffer non-empty.
func TestClickHouseSink_DoFlushNilConn(t *testing.T) {
	log := domain.CorrelatedLog{SrcIP: "1.2.3.4", SrcPort: 1234}
	s := &ClickHouseSink{
		config: Config{
			Table:     "test_table",
			BatchSize: DefaultBatchSize,
		},
		buffer: []domain.CorrelatedLog{log},
		conn:   nil,
	}

	err := s.doFlush(context.Background())
	if err == nil {
		t.Error("expected error from doFlush when conn is nil")
	}
}

// TestClickHouseSink_CloseTwice tests that calling Close() twice does not panic or error.
func TestClickHouseSink_CloseTwice(t *testing.T) {
	s := &ClickHouseSink{
		config: Config{
			Table:     "test_table",
			TimeoutMs: DefaultTimeoutMs,
		},
		buffer: make([]domain.CorrelatedLog, 0),
		done:   make(chan struct{}),
	}

	if err := s.Close(); err != nil {
		t.Errorf("first Close() should not error, got: %v", err)
	}
	if err := s.Close(); err != nil {
		t.Errorf("second Close() should not error (closeOnce), got: %v", err)
	}
}

// TestClickHouseSink_Write_Timeout tests that Write returns error when buffer is full and timeout exceeded.
func TestClickHouseSink_Write_Timeout(t *testing.T) {
	s := &ClickHouseSink{
		config: Config{
			Table:          "test_table",
			MaxBufferSize:  1,
			DropOnOverflow: false,
			TimeoutMs:      1, // 1ms timeout
		},
		buffer: make([]domain.CorrelatedLog, 0, 1),
	}

	log := domain.CorrelatedLog{SrcIP: "1.2.3.4", SrcPort: 1234}
	// Fill the buffer
	s.buffer = append(s.buffer, log)

	ctx := context.Background()
	err := s.Write(ctx, log)
	if err == nil {
		t.Error("expected error when buffer full and timeout exceeded")
	}
}

// Benchmark Write operation (without actual DB)
func BenchmarkClickHouseSink_Write(b *testing.B) {
	s := &ClickHouseSink{
		config: Config{
			DSN:            "clickhouse://test:test@localhost:9000/test",
			Table:          "test_table",
			MaxBufferSize:  10000,
			DropOnOverflow: true,
		},
		buffer: make([]domain.CorrelatedLog, 0, 10000),
	}

	log := domain.CorrelatedLog{
		Timestamp:  time.Now(),
		SrcIP:      "192.168.1.1",
		SrcPort:    8080,
		Correlated: true,
	}

	ctx := context.Background()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		s.Write(ctx, log)
	}
}
191
services/correlator/internal/adapters/outbound/file/sink.go
Normal file
@ -0,0 +1,191 @@
package file

import (
	"context"
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"strings"
	"sync"

	"github.com/antitbone/ja4/correlator/internal/domain"
)

const (
	// DefaultFilePermissions for output files
	DefaultFilePermissions os.FileMode = 0644
	// DefaultDirPermissions for output directories
	DefaultDirPermissions os.FileMode = 0750
)

// Config holds the file sink configuration.
type Config struct {
	Path string
}

// FileSink writes correlated logs to a file as JSON lines.
type FileSink struct {
	config Config
	mu     sync.Mutex
	file   *os.File
}

// NewFileSink creates a new file sink.
|
||||
func NewFileSink(config Config) (*FileSink, error) {
|
||||
// Validate path
|
||||
if err := validateFilePath(config.Path); err != nil {
|
||||
return nil, fmt.Errorf("invalid file path: %w", err)
|
||||
}
|
||||
|
||||
s := &FileSink{
|
||||
config: config,
|
||||
}
|
||||
|
||||
// Open file on creation
|
||||
if err := s.openFile(); err != nil {
|
||||
return nil, err
|
||||
}
|
||||
|
||||
return s, nil
|
||||
}
|
||||
|
||||
// Name returns the sink name.
|
||||
func (s *FileSink) Name() string {
|
||||
return "file"
|
||||
}
|
||||
|
||||
// Reopen closes and reopens the file (for log rotation on SIGHUP).
|
||||
func (s *FileSink) Reopen() error {
|
||||
s.mu.Lock()
|
||||
defer s.mu.Unlock()
|
||||
|
||||
if s.file != nil {
|
||||
if err := s.file.Close(); err != nil {
|
||||
return fmt.Errorf("failed to close file: %w", err)
|
||||
}
|
||||
}
|
||||
|
||||
return s.openFile()
|
||||
}
|
||||
|
||||
// Write writes a correlated log to the file.
|
||||
func (s *FileSink) Write(ctx context.Context, log domain.CorrelatedLog) error {
|
||||
s.mu.Lock()
|
||||
defer s.mu.Unlock()
|
||||
|
||||
if s.file == nil {
|
||||
if err := s.openFile(); err != nil {
|
||||
return err
|
||||
}
|
||||
}
|
||||
|
||||
data, err := json.Marshal(log)
|
||||
if err != nil {
|
||||
return fmt.Errorf("failed to marshal log: %w", err)
|
||||
}
|
||||
|
||||
line := append(data, '\n')
|
||||
if _, err := s.file.Write(line); err != nil {
|
||||
return fmt.Errorf("failed to write log line: %w", err)
|
||||
}
|
||||
if err := s.file.Sync(); err != nil {
|
||||
return fmt.Errorf("failed to sync log line: %w", err)
|
||||
}
|
||||
|
||||
return nil
|
||||
}
|
||||
|
||||
// Flush flushes any buffered data.
|
||||
func (s *FileSink) Flush(ctx context.Context) error {
|
||||
s.mu.Lock()
|
||||
defer s.mu.Unlock()
|
||||
|
||||
if s.file != nil {
|
||||
return s.file.Sync()
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
// Close closes the sink.
|
||||
func (s *FileSink) Close() error {
|
||||
s.mu.Lock()
|
||||
defer s.mu.Unlock()
|
||||
|
||||
if s.file != nil {
|
||||
err := s.file.Close()
|
||||
s.file = nil
|
||||
return err
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
func (s *FileSink) openFile() error {
|
||||
// Validate path again before opening
|
||||
if err := validateFilePath(s.config.Path); err != nil {
|
||||
return fmt.Errorf("invalid file path: %w", err)
|
||||
}
|
||||
|
||||
// Ensure directory exists
|
||||
dir := filepath.Dir(s.config.Path)
|
||||
if err := os.MkdirAll(dir, DefaultDirPermissions); err != nil {
|
||||
return fmt.Errorf("failed to create directory: %w", err)
|
||||
}
|
||||
|
||||
file, err := os.OpenFile(s.config.Path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, DefaultFilePermissions)
|
||||
if err != nil {
|
||||
return fmt.Errorf("failed to open file: %w", err)
|
||||
}
|
||||
|
||||
s.file = file
|
||||
return nil
|
||||
}
|
||||
|
||||
// validateFilePath validates that the file path is safe and allowed.
|
||||
func validateFilePath(path string) error {
|
||||
if strings.TrimSpace(path) == "" {
|
||||
return fmt.Errorf("path cannot be empty")
|
||||
}
|
||||
|
||||
cleanPath := filepath.Clean(path)
|
||||
|
||||
// Allow relative paths for testing/dev
|
||||
if !filepath.IsAbs(cleanPath) {
|
||||
return nil
|
||||
}
|
||||
|
||||
absPath, err := filepath.Abs(cleanPath)
|
||||
if err != nil {
|
||||
return fmt.Errorf("failed to resolve absolute path: %w", err)
|
||||
}
|
||||
|
||||
allowedRoots := []string{
|
||||
"/var/log/logcorrelator",
|
||||
"/var/log",
|
||||
"/tmp",
|
||||
}
|
||||
|
||||
for _, root := range allowedRoots {
|
||||
absRoot, err := filepath.Abs(filepath.Clean(root))
|
||||
if err != nil {
|
||||
continue
|
||||
}
|
||||
|
||||
rel, err := filepath.Rel(absRoot, absPath)
|
||||
if err != nil {
|
||||
continue
|
||||
}
|
||||
|
||||
if rel == "." {
|
||||
return nil
|
||||
}
|
||||
if rel == ".." {
|
||||
continue
|
||||
}
|
||||
if !strings.HasPrefix(rel, ".."+string(os.PathSeparator)) {
|
||||
return nil
|
||||
}
|
||||
}
|
||||
|
||||
return fmt.Errorf("path must be under allowed directories: %v", allowedRoots)
|
||||
}
|
||||
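The containment test inside `validateFilePath` can be tried in isolation. The sketch below is a minimal illustration and not part of the commit: `isUnder` is a hypothetical helper name, and it assumes both arguments are absolute paths on a Unix-like system. It shows why comparing with `filepath.Rel` rejects lookalike directories such as `/var/logevil`, which a plain string-prefix check on `/var/log` would accept.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// isUnder reports whether path is root itself or lies inside root.
// Hypothetical helper for illustration; both arguments are assumed absolute.
func isUnder(root, path string) bool {
	rel, err := filepath.Rel(filepath.Clean(root), filepath.Clean(path))
	if err != nil {
		return false
	}
	// A relative path that starts by climbing out of root is outside it.
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return false
	}
	return true
}

func main() {
	for _, p := range []string{
		"/var/log/test.log",
		"/var/logevil/test.log",
		"/var/log/../../etc/passwd",
	} {
		fmt.Printf("%-28s under /var/log: %v\n", p, isUnder("/var/log", p))
	}
}
```

Relative inputs are outside the scope of this helper, just as `validateFilePath` returns early for them.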
524
services/correlator/internal/adapters/outbound/file/sink_test.go
Normal file
@@ -0,0 +1,524 @@
package file

import (
	"context"
	"os"
	"path/filepath"
	"testing"

	"github.com/antitbone/ja4/correlator/internal/domain"
)

func TestFileSink_Write(t *testing.T) {
	tmpDir := t.TempDir()
	testPath := filepath.Join(tmpDir, "test.log")

	sink, err := NewFileSink(Config{Path: testPath})
	if err != nil {
		t.Fatalf("failed to create sink: %v", err)
	}
	defer sink.Close()

	log := domain.CorrelatedLog{
		SrcIP:      "192.168.1.1",
		SrcPort:    8080,
		Correlated: true,
	}

	if err := sink.Write(context.Background(), log); err != nil {
		t.Fatalf("failed to write: %v", err)
	}

	if err := sink.Flush(context.Background()); err != nil {
		t.Fatalf("failed to flush: %v", err)
	}

	// Verify file exists and contains data
	data, err := os.ReadFile(testPath)
	if err != nil {
		t.Fatalf("failed to read file: %v", err)
	}

	if len(data) == 0 {
		t.Error("expected non-empty file")
	}
}

func TestFileSink_WriteImmediatePersist_NoFlushNeeded(t *testing.T) {
	tmpDir := t.TempDir()
	testPath := filepath.Join(tmpDir, "test.log")

	sink, err := NewFileSink(Config{Path: testPath})
	if err != nil {
		t.Fatalf("failed to create sink: %v", err)
	}
	defer sink.Close()

	log := domain.CorrelatedLog{
		SrcIP:      "192.168.1.1",
		SrcPort:    8080,
		Correlated: true,
	}

	if err := sink.Write(context.Background(), log); err != nil {
		t.Fatalf("failed to write: %v", err)
	}

	// Must be visible immediately without Flush()
	data, err := os.ReadFile(testPath)
	if err != nil {
		t.Fatalf("failed to read file: %v", err)
	}
	if len(data) == 0 {
		t.Error("expected data to be present immediately after Write without Flush")
	}
}

func TestFileSink_MultipleWrites(t *testing.T) {
	tmpDir := t.TempDir()
	testPath := filepath.Join(tmpDir, "test.log")

	sink, err := NewFileSink(Config{Path: testPath})
	if err != nil {
		t.Fatalf("failed to create sink: %v", err)
	}
	defer sink.Close()

	for i := 0; i < 5; i++ {
		log := domain.CorrelatedLog{
			SrcIP:   "192.168.1.1",
			SrcPort: 8080 + i,
		}
		if err := sink.Write(context.Background(), log); err != nil {
			t.Fatalf("failed to write: %v", err)
		}
	}

	sink.Close()

	// Verify file has 5 lines
	data, err := os.ReadFile(testPath)
	if err != nil {
		t.Fatalf("failed to read file: %v", err)
	}

	lines := 0
	for _, b := range data {
		if b == '\n' {
			lines++
		}
	}

	if lines != 5 {
		t.Errorf("expected 5 lines, got %d", lines)
	}
}

func TestFileSink_Name(t *testing.T) {
	sink, err := NewFileSink(Config{Path: "/tmp/test.log"})
	if err != nil {
		t.Fatalf("failed to create sink: %v", err)
	}
	defer sink.Close() // NewFileSink opens the file, so release the handle

	if sink.Name() != "file" {
		t.Errorf("expected name 'file', got %s", sink.Name())
	}
}

func TestFileSink_ValidateFilePath(t *testing.T) {
	tests := []struct {
		name    string
		path    string
		wantErr bool
	}{
		{"empty path", "", true},
		{"valid /var/log/logcorrelator", "/var/log/logcorrelator/test.log", false},
		{"valid /var/log", "/var/log/test.log", false},
		{"valid /tmp", "/tmp/test.log", false},
		{"reject lookalike /var/logevil", "/var/logevil/test.log", true},
		{"invalid directory", "/etc/logcorrelator/test.log", true},
		{"relative path", "test.log", false}, // Allowed for testing
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			err := validateFilePath(tt.path)
			if (err != nil) != tt.wantErr {
				t.Errorf("validateFilePath(%q) error = %v, wantErr %v", tt.path, err, tt.wantErr)
			}
		})
	}
}

func TestFileSink_OpenFile(t *testing.T) {
	tmpDir := t.TempDir()
	testPath := filepath.Join(tmpDir, "subdir", "test.log")

	sink := &FileSink{
		config: Config{Path: testPath},
	}

	err := sink.openFile()
	if err != nil {
		t.Fatalf("unexpected error: %v", err)
	}
	defer sink.Close()

	if sink.file == nil {
		t.Error("expected file to be opened")
	}
}

func TestFileSink_WriteBeforeOpen(t *testing.T) {
	tmpDir := t.TempDir()
	testPath := filepath.Join(tmpDir, "test.log")

	sink, err := NewFileSink(Config{Path: testPath})
	if err != nil {
		t.Fatalf("failed to create sink: %v", err)
	}
	defer sink.Close()

	// NewFileSink has already opened the file, so this checks the first write on a fresh sink
	log := domain.CorrelatedLog{SrcIP: "192.168.1.1", SrcPort: 8080}
	err = sink.Write(context.Background(), log)
	if err != nil {
		t.Fatalf("failed to write: %v", err)
	}

	// Verify file was created
	if _, err := os.Stat(testPath); os.IsNotExist(err) {
		t.Error("expected file to be created")
	}
}

func TestFileSink_FlushBeforeOpen(t *testing.T) {
	tmpDir := t.TempDir()
	testPath := filepath.Join(tmpDir, "test.log")

	sink, err := NewFileSink(Config{Path: testPath})
	if err != nil {
		t.Fatalf("failed to create sink: %v", err)
	}
	defer sink.Close()

	// Flush before any write should not error
	err = sink.Flush(context.Background())
	if err != nil {
		t.Errorf("expected no error on flush before open, got %v", err)
	}
}

func TestFileSink_InvalidPath(t *testing.T) {
	// "/etc/../passwd" cleans to "/passwd", which is outside the allowed directories
	_, err := NewFileSink(Config{Path: "/etc/../passwd"})
	if err == nil {
		t.Error("expected error for invalid path")
	}
}

func TestFileSink_Reopen(t *testing.T) {
	tmpDir := t.TempDir()
	testPath := filepath.Join(tmpDir, "test.log")

	sink, err := NewFileSink(Config{Path: testPath})
	if err != nil {
		t.Fatalf("failed to create sink: %v", err)
	}

	// Write initial data
	log := domain.CorrelatedLog{SrcIP: "192.168.1.1", SrcPort: 8080}
	if err := sink.Write(context.Background(), log); err != nil {
		t.Fatalf("failed to write: %v", err)
	}

	// Reopen should close and reopen the file
	err = sink.Reopen()
	if err != nil {
		t.Errorf("expected no error on Reopen, got %v", err)
	}

	// Write after reopen
	log2 := domain.CorrelatedLog{SrcIP: "192.168.1.2", SrcPort: 8081}
	if err := sink.Write(context.Background(), log2); err != nil {
		t.Fatalf("failed to write after reopen: %v", err)
	}

	sink.Close()

	// Verify both writes are present
	data, err := os.ReadFile(testPath)
	if err != nil {
		t.Fatalf("failed to read file: %v", err)
	}

	lines := 0
	for _, b := range data {
		if b == '\n' {
			lines++
		}
	}

	if lines != 2 {
		t.Errorf("expected 2 lines after reopen, got %d", lines)
	}
}

func TestFileSink_Close(t *testing.T) {
	tmpDir := t.TempDir()
	testPath := filepath.Join(tmpDir, "test.log")

	sink, err := NewFileSink(Config{Path: testPath})
	if err != nil {
		t.Fatalf("failed to create sink: %v", err)
	}

	// Close should succeed
	err = sink.Close()
	if err != nil {
		t.Errorf("expected no error on Close, got %v", err)
	}

	// Write after Close reopens the file (Write calls openFile when s.file is nil), so it must succeed
	log := domain.CorrelatedLog{SrcIP: "192.168.1.1", SrcPort: 8080}
	if err := sink.Write(context.Background(), log); err != nil {
		t.Errorf("expected Write after Close to reopen the file, got %v", err)
	}
	sink.Close() // release the handle reopened by Write
}

func TestFileSink_EmptyPath(t *testing.T) {
	_, err := NewFileSink(Config{Path: ""})
	if err == nil {
		t.Error("expected error for empty path")
	}
}

func TestFileSink_WhitespacePath(t *testing.T) {
	_, err := NewFileSink(Config{Path: " "})
	if err == nil {
		t.Error("expected error for whitespace-only path")
	}
}

func TestFileSink_ValidateFilePath_AllowedRoots(t *testing.T) {
	// Test paths under allowed roots
	allowedPaths := []string{
		"/var/log/logcorrelator/correlated.log",
		"/var/log/test.log",
		"/tmp/test.log",
		"/tmp/subdir/test.log",
		"relative/path/test.log",
		"./test.log",
	}

	for _, path := range allowedPaths {
		err := validateFilePath(path)
		if err != nil {
			t.Errorf("validateFilePath(%q) unexpected error: %v", path, err)
		}
	}
}

func TestFileSink_ValidateFilePath_RejectedPaths(t *testing.T) {
	// Test paths that should be rejected
	rejectedPaths := []string{
		"",
		" ",
		"/etc/passwd",
		"/etc/logcorrelator/test.log",
		"/root/test.log",
		"/home/user/test.log",
		"/var/logevil/test.log",
	}

	for _, path := range rejectedPaths {
		err := validateFilePath(path)
		if err == nil {
			t.Errorf("validateFilePath(%q) should have been rejected", path)
		}
	}
}

func TestFileSink_ConcurrentWrites(t *testing.T) {
	tmpDir := t.TempDir()
	testPath := filepath.Join(tmpDir, "test.log")

	sink, err := NewFileSink(Config{Path: testPath})
	if err != nil {
		t.Fatalf("failed to create sink: %v", err)
	}
	defer sink.Close()

	done := make(chan bool)
	for i := 0; i < 10; i++ {
		go func(n int) {
			log := domain.CorrelatedLog{SrcIP: "192.168.1.1", SrcPort: 8080 + n}
			// t.Errorf is safe to call from a goroutine (unlike t.Fatalf)
			if err := sink.Write(context.Background(), log); err != nil {
				t.Errorf("concurrent write %d failed: %v", n, err)
			}
			done <- true
		}(i)
	}

	for i := 0; i < 10; i++ {
		<-done
	}

	// Verify all writes completed
	data, err := os.ReadFile(testPath)
	if err != nil {
		t.Fatalf("failed to read file: %v", err)
	}

	lines := 0
	for _, b := range data {
		if b == '\n' {
			lines++
		}
	}

	if lines != 10 {
		t.Errorf("expected 10 lines from concurrent writes, got %d", lines)
	}
}

func TestFileSink_Flush(t *testing.T) {
	tmpDir := t.TempDir()
	testPath := filepath.Join(tmpDir, "test.log")

	sink, err := NewFileSink(Config{Path: testPath})
	if err != nil {
		t.Fatalf("failed to create sink: %v", err)
	}
	defer sink.Close()

	log := domain.CorrelatedLog{SrcIP: "192.168.1.1", SrcPort: 8080}
	if err := sink.Write(context.Background(), log); err != nil {
		t.Fatalf("failed to write: %v", err)
	}

	// Flush should succeed
	err = sink.Flush(context.Background())
	if err != nil {
		t.Errorf("expected no error on Flush, got %v", err)
	}
}

func TestFileSink_MarshalError(t *testing.T) {
	tmpDir := t.TempDir()
	testPath := filepath.Join(tmpDir, "test.log")

	sink, err := NewFileSink(Config{Path: testPath})
	if err != nil {
		t.Fatalf("failed to create sink: %v", err)
	}
	defer sink.Close()

	// Create a log holding a value that json.Marshal cannot encode (a channel)
	log := domain.CorrelatedLog{
		SrcIP:   "192.168.1.1",
		SrcPort: 8080,
		Fields:  map[string]any{"chan": make(chan int)},
	}

	err = sink.Write(context.Background(), log)
	if err == nil {
		t.Error("expected error when marshaling a channel value")
	}
}

// TestFileSink_CloseTwice tests that closing an already-closed sink does not error.
func TestFileSink_CloseTwice(t *testing.T) {
	tmpDir := t.TempDir()
	testPath := filepath.Join(tmpDir, "test.log")

	sink, err := NewFileSink(Config{Path: testPath})
	if err != nil {
		t.Fatalf("failed to create sink: %v", err)
	}

	if err := sink.Close(); err != nil {
		t.Errorf("first Close() should not error, got: %v", err)
	}

	// After close, file is nil, so second close should return nil
	if err := sink.Close(); err != nil {
		t.Errorf("second Close() on already-closed sink should not error, got: %v", err)
	}
}

// TestFileSink_WriteAfterCloseReopens tests that Write after Close re-opens the file.
func TestFileSink_WriteAfterCloseReopens(t *testing.T) {
	tmpDir := t.TempDir()
	testPath := filepath.Join(tmpDir, "test.log")

	sink, err := NewFileSink(Config{Path: testPath})
	if err != nil {
		t.Fatalf("failed to create sink: %v", err)
	}

	if err := sink.Close(); err != nil {
		t.Fatalf("Close() failed: %v", err)
	}

	// Write after close: FileSink.Write reopens the file when file == nil
	log := domain.CorrelatedLog{SrcIP: "1.2.3.4", SrcPort: 80}
	if err := sink.Write(context.Background(), log); err != nil {
		t.Errorf("Write after close should succeed (auto-reopen), got: %v", err)
	}

	// Verify data was written
	data, err := os.ReadFile(testPath)
	if err != nil {
		t.Fatalf("failed to read file: %v", err)
	}
	if len(data) == 0 {
		t.Error("expected data to be present after write on re-opened file")
	}
}

// TestFileSink_ReopenThenWrite tests that a write after Reopen appends to the same file.
func TestFileSink_ReopenThenWrite(t *testing.T) {
	tmpDir := t.TempDir()
	testPath := filepath.Join(tmpDir, "test.log")

	sink, err := NewFileSink(Config{Path: testPath})
	if err != nil {
		t.Fatalf("failed to create sink: %v", err)
	}
	defer sink.Close()

	// Write before reopen
	log1 := domain.CorrelatedLog{SrcIP: "1.1.1.1", SrcPort: 80}
	if err := sink.Write(context.Background(), log1); err != nil {
		t.Fatalf("first Write failed: %v", err)
	}

	// Simulate log rotation
	if err := sink.Reopen(); err != nil {
		t.Fatalf("Reopen failed: %v", err)
	}

	// Write after reopen
	log2 := domain.CorrelatedLog{SrcIP: "2.2.2.2", SrcPort: 443}
	if err := sink.Write(context.Background(), log2); err != nil {
		t.Fatalf("second Write failed: %v", err)
	}

	sink.Close()

	data, err := os.ReadFile(testPath)
	if err != nil {
		t.Fatalf("failed to read file: %v", err)
	}

	lines := 0
	for _, b := range data {
		if b == '\n' {
			lines++
		}
	}
	if lines != 2 {
		t.Errorf("expected 2 lines after reopen+write, got %d", lines)
	}
}
137
services/correlator/internal/adapters/outbound/multi/sink.go
Normal file
@@ -0,0 +1,137 @@
package multi

import (
	"context"
	"sync"

	"github.com/antitbone/ja4/correlator/internal/domain"
	"github.com/antitbone/ja4/correlator/internal/ports"
)

// MultiSink fans out correlated logs to multiple sinks.
type MultiSink struct {
	mu    sync.RWMutex
	sinks []ports.CorrelatedLogSink
}

// NewMultiSink creates a new multi-sink.
func NewMultiSink(sinks ...ports.CorrelatedLogSink) *MultiSink {
	return &MultiSink{
		sinks: sinks,
	}
}

// Name returns the sink name.
func (s *MultiSink) Name() string {
	return "multi"
}

// AddSink adds a sink to the fan-out.
func (s *MultiSink) AddSink(sink ports.CorrelatedLogSink) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.sinks = append(s.sinks, sink)
}

// Write writes a correlated log to all sinks concurrently and returns the first error received.
// If ctx is cancelled first, it returns ctx.Err() without waiting for the remaining sinks.
func (s *MultiSink) Write(ctx context.Context, log domain.CorrelatedLog) error {
	s.mu.RLock()
	sinks := make([]ports.CorrelatedLogSink, len(s.sinks))
	copy(sinks, s.sinks)
	s.mu.RUnlock()

	if len(sinks) == 0 {
		return nil
	}

	var wg sync.WaitGroup
	var firstErr error
	var firstErrMu sync.Mutex
	errChan := make(chan error, len(sinks))

	for _, sink := range sinks {
		wg.Add(1)
		go func(sk ports.CorrelatedLogSink) {
			defer wg.Done()
			if err := sk.Write(ctx, log); err != nil {
				// Non-blocking send to errChan
				select {
				case errChan <- err:
				default:
					// Not reached in practice: errChan holds len(sinks) errors and each goroutine sends at most once
				}
			}
		}(sink)
	}

	// Wait for all writes to complete in a separate goroutine
	done := make(chan struct{})
	go func() {
		wg.Wait()
		close(done)
	}()

	// Wait for every sink to finish, or return early if ctx is cancelled
	select {
	case <-done:
		close(errChan)
		// Collect first error
		for err := range errChan {
			if err != nil {
				firstErrMu.Lock()
				if firstErr == nil {
					firstErr = err
				}
				firstErrMu.Unlock()
			}
		}
	case <-ctx.Done():
		return ctx.Err()
	}

	firstErrMu.Lock()
	defer firstErrMu.Unlock()
	return firstErr
}

// Flush flushes the sinks in order and stops at the first error.
func (s *MultiSink) Flush(ctx context.Context) error {
	s.mu.RLock()
	defer s.mu.RUnlock()

	for _, sink := range s.sinks {
		if err := sink.Flush(ctx); err != nil {
			return err
		}
	}
	return nil
}

// Close closes every sink and returns the first error encountered.
func (s *MultiSink) Close() error {
	s.mu.RLock()
	defer s.mu.RUnlock()

	var firstErr error
	for _, sink := range s.sinks {
		if err := sink.Close(); err != nil && firstErr == nil {
			firstErr = err
		}
	}
	return firstErr
}

// Reopen reopens every sink (for log rotation on SIGHUP) and returns the first error encountered.
func (s *MultiSink) Reopen() error {
	s.mu.RLock()
	defer s.mu.RUnlock()

	var firstErr error
	for _, sink := range s.sinks {
		if err := sink.Reopen(); err != nil && firstErr == nil {
			firstErr = err
		}
	}
	return firstErr
}
@@ -0,0 +1,233 @@
package multi

import (
	"context"
	"sync"
	"testing"

	"github.com/antitbone/ja4/correlator/internal/domain"
)

type mockSink struct {
	name       string
	mu         sync.Mutex
	writeFunc  func(domain.CorrelatedLog) error
	flushFunc  func() error
	closeFunc  func() error
	reopenFunc func() error
}

func (m *mockSink) Name() string { return m.name }
func (m *mockSink) Write(ctx context.Context, log domain.CorrelatedLog) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.writeFunc(log)
}
func (m *mockSink) Flush(ctx context.Context) error { return m.flushFunc() }
func (m *mockSink) Close() error                    { return m.closeFunc() }
func (m *mockSink) Reopen() error {
	if m.reopenFunc != nil {
		return m.reopenFunc()
	}
	return nil
}

func TestMultiSink_Write(t *testing.T) {
	var mu sync.Mutex
	writeCount := 0

	sink1 := &mockSink{
		name: "sink1",
		writeFunc: func(log domain.CorrelatedLog) error {
			mu.Lock()
			writeCount++
			mu.Unlock()
			return nil
		},
		flushFunc: func() error { return nil },
		closeFunc: func() error { return nil },
	}

	sink2 := &mockSink{
		name: "sink2",
		writeFunc: func(log domain.CorrelatedLog) error {
			mu.Lock()
			writeCount++
			mu.Unlock()
			return nil
		},
		flushFunc: func() error { return nil },
		closeFunc: func() error { return nil },
	}

	ms := NewMultiSink(sink1, sink2)

	log := domain.CorrelatedLog{SrcIP: "192.168.1.1"}
	err := ms.Write(context.Background(), log)
	if err != nil {
		t.Fatalf("unexpected error: %v", err)
	}

	if writeCount != 2 {
		t.Errorf("expected 2 writes, got %d", writeCount)
	}
}

func TestMultiSink_Write_OneFails(t *testing.T) {
	sink1 := &mockSink{
		name: "sink1",
		writeFunc: func(log domain.CorrelatedLog) error {
			return nil
		},
		flushFunc: func() error { return nil },
		closeFunc: func() error { return nil },
	}

	sink2 := &mockSink{
		name: "sink2",
		writeFunc: func(log domain.CorrelatedLog) error {
			return context.Canceled
		},
		flushFunc: func() error { return nil },
		closeFunc: func() error { return nil },
	}

	ms := NewMultiSink(sink1, sink2)

	log := domain.CorrelatedLog{SrcIP: "192.168.1.1"}
	err := ms.Write(context.Background(), log)
	if err == nil {
		t.Error("expected error when one sink fails")
	}
}

func TestMultiSink_AddSink(t *testing.T) {
	ms := NewMultiSink()

	sink := &mockSink{
		name:      "dynamic",
		writeFunc: func(log domain.CorrelatedLog) error { return nil },
		flushFunc: func() error { return nil },
		closeFunc: func() error { return nil },
	}

	ms.AddSink(sink)

	log := domain.CorrelatedLog{SrcIP: "192.168.1.1"}
	err := ms.Write(context.Background(), log)
	if err != nil {
		t.Fatalf("unexpected error: %v", err)
	}
}

func TestMultiSink_Name(t *testing.T) {
	ms := NewMultiSink()
	if ms.Name() != "multi" {
		t.Errorf("expected name 'multi', got %s", ms.Name())
	}
}

func TestMultiSink_Flush(t *testing.T) {
	flushed := false
	sink := &mockSink{
		name:      "test",
		writeFunc: func(log domain.CorrelatedLog) error { return nil },
		flushFunc: func() error {
			flushed = true
			return nil
		},
		closeFunc: func() error { return nil },
	}

	ms := NewMultiSink(sink)
	err := ms.Flush(context.Background())
	if err != nil {
		t.Fatalf("unexpected error: %v", err)
	}
	if !flushed {
		t.Error("expected sink to be flushed")
	}
}

func TestMultiSink_Flush_Error(t *testing.T) {
	sink := &mockSink{
		name:      "test",
		writeFunc: func(log domain.CorrelatedLog) error { return nil },
		flushFunc: func() error { return context.Canceled },
		closeFunc: func() error { return nil },
	}

	ms := NewMultiSink(sink)
	err := ms.Flush(context.Background())
	if err != context.Canceled {
		t.Errorf("expected context.Canceled error, got %v", err)
	}
}

func TestMultiSink_Close(t *testing.T) {
	closed := false
	sink := &mockSink{
		name:      "test",
		writeFunc: func(log domain.CorrelatedLog) error { return nil },
		flushFunc: func() error { return nil },
		closeFunc: func() error {
			closed = true
			return nil
		},
	}

	ms := NewMultiSink(sink)
	err := ms.Close()
	if err != nil {
		t.Fatalf("unexpected error: %v", err)
	}
	if !closed {
		t.Error("expected sink to be closed")
	}
}

func TestMultiSink_Close_Error(t *testing.T) {
	sink := &mockSink{
		name:      "test",
		writeFunc: func(log domain.CorrelatedLog) error { return nil },
		flushFunc: func() error { return nil },
		closeFunc: func() error { return context.Canceled },
	}

	ms := NewMultiSink(sink)
	err := ms.Close()
	if err != context.Canceled {
		t.Errorf("expected context.Canceled error, got %v", err)
	}
}

func TestMultiSink_Write_EmptySinks(t *testing.T) {
	ms := NewMultiSink()
	log := domain.CorrelatedLog{SrcIP: "192.168.1.1"}
	err := ms.Write(context.Background(), log)
	if err != nil {
		t.Fatalf("unexpected error with empty sinks: %v", err)
	}
}

func TestMultiSink_Write_ContextCancelled(t *testing.T) {
	// block keeps the mock Write pending until the test returns, so MultiSink.Write
	// can only return through ctx.Done(). Closing it afterwards releases the goroutine
	// (context.Background().Done() is a nil channel and would block forever).
	block := make(chan struct{})
	defer close(block)

	sink := &mockSink{
		name: "test",
		writeFunc: func(log domain.CorrelatedLog) error {
			<-block
			return nil
		},
		flushFunc: func() error { return nil },
		closeFunc: func() error { return nil },
	}

	ms := NewMultiSink(sink)
	ctx, cancel := context.WithCancel(context.Background())
	cancel()

	log := domain.CorrelatedLog{SrcIP: "192.168.1.1"}
	err := ms.Write(ctx, log)
	if err != context.Canceled {
		t.Errorf("expected context.Canceled error, got %v", err)
	}
}
@@ -0,0 +1,46 @@
package stdout

import (
	"context"

	"github.com/antitbone/ja4/correlator/internal/domain"
)

// Config holds the stdout sink configuration.
type Config struct {
	Enabled bool
}

// StdoutSink is a no-op data sink. Operational logs are written to stderr
// by the observability.Logger; correlated data must never appear on stdout.
type StdoutSink struct{}

// NewStdoutSink creates a new stdout sink. The config is unused because the sink is a no-op.
func NewStdoutSink(config Config) *StdoutSink {
	return &StdoutSink{}
}

// Name returns the sink name.
func (s *StdoutSink) Name() string {
	return "stdout"
}

// Reopen is a no-op for stdout.
func (s *StdoutSink) Reopen() error {
	return nil
}

// Write is a no-op: correlated data must never be written to stdout.
func (s *StdoutSink) Write(_ context.Context, _ domain.CorrelatedLog) error {
	return nil
}

// Flush is a no-op for stdout.
func (s *StdoutSink) Flush(_ context.Context) error {
	return nil
}

// Close is a no-op for stdout.
func (s *StdoutSink) Close() error {
	return nil
}
@@ -0,0 +1,81 @@
|
||||
package stdout
|
||||
|
||||
import (
|
||||
"bytes"
|
||||
"context"
|
||||
"os"
|
||||
"testing"
|
||||
"time"
|
||||
|
||||
"github.com/antitbone/ja4/correlator/internal/domain"
|
||||
)
|
||||
|
||||
func makeLog(correlated bool) domain.CorrelatedLog {
|
||||
return domain.CorrelatedLog{
|
||||
Timestamp: time.Unix(1700000000, 0),
|
||||
SrcIP: "1.2.3.4",
|
||||
SrcPort: 12345,
|
||||
Correlated: correlated,
|
||||
}
|
||||
}
|
||||
|
||||
// captureStdout replaces os.Stdout temporarily and returns what was written.
|
||||
func captureStdout(t *testing.T, fn func()) string {
|
||||
t.Helper()
|
||||
r, w, err := os.Pipe()
|
||||
if err != nil {
|
||||
t.Fatalf("os.Pipe: %v", err)
|
||||
}
|
||||
old := os.Stdout
|
||||
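os.Stdout = w
// Restore stdout even if fn aborts the test via t.Fatalf (deferred calls still run).
defer func() { os.Stdout = old }()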
|
||||
|
||||
fn()
|
||||
|
||||
w.Close()
|
||||
os.Stdout = old
|
||||
|
||||
var buf bytes.Buffer
|
||||
buf.ReadFrom(r)
|
||||
r.Close()
|
||||
return buf.String()
|
||||
}
|
||||
|
||||
func TestStdoutSink_Name(t *testing.T) {
|
||||
s := NewStdoutSink(Config{Enabled: true})
|
||||
if s.Name() != "stdout" {
|
||||
t.Errorf("expected name 'stdout', got %q", s.Name())
|
||||
}
|
||||
}
|
||||
|
||||
// TestStdoutSink_WriteDoesNotProduceOutput verifies that no JSON data
|
||||
// (correlated or not) is ever written to stdout.
|
||||
func TestStdoutSink_WriteDoesNotProduceOutput(t *testing.T) {
|
||||
s := NewStdoutSink(Config{Enabled: true})
|
||||
|
||||
got := captureStdout(t, func() {
|
||||
if err := s.Write(context.Background(), makeLog(true)); err != nil {
|
||||
t.Fatalf("Write(correlated) returned error: %v", err)
|
||||
}
|
||||
if err := s.Write(context.Background(), makeLog(false)); err != nil {
|
||||
t.Fatalf("Write(orphan) returned error: %v", err)
|
||||
}
|
||||
})
|
||||
|
||||
if got != "" {
|
||||
t.Errorf("stdout must be empty but got: %q", got)
|
||||
}
|
||||
}
|
||||
|
||||
func TestStdoutSink_NoopMethods(t *testing.T) {
|
||||
s := NewStdoutSink(Config{Enabled: true})
|
||||
|
||||
if err := s.Flush(context.Background()); err != nil {
|
||||
t.Errorf("Flush returned error: %v", err)
|
||||
}
|
||||
if err := s.Close(); err != nil {
|
||||
t.Errorf("Close returned error: %v", err)
|
||||
}
|
||||
if err := s.Reopen(); err != nil {
|
||||
t.Errorf("Reopen returned error: %v", err)
|
||||
}
|
||||
}
|
||||
160
services/correlator/internal/app/orchestrator.go
Normal file
@ -0,0 +1,160 @@
|
||||
package app
|
||||
|
||||
import (
|
||||
"context"
|
||||
"sync"
|
||||
"sync/atomic"
|
||||
"time"
|
||||
|
||||
"github.com/antitbone/ja4/correlator/internal/domain"
|
||||
"github.com/antitbone/ja4/correlator/internal/ports"
|
||||
)
|
||||
|
||||
const (
|
||||
// DefaultEventChannelBufferSize is the default size for event channels
|
||||
DefaultEventChannelBufferSize = 1000
|
||||
// OrphanTickInterval is how often the orchestrator drains pending orphans.
|
||||
// Set to half the default emit delay (500ms/2) so orphans are emitted promptly
|
||||
// even when no new events arrive.
|
||||
OrphanTickInterval = 250 * time.Millisecond
|
||||
)
|
||||
|
||||
// OrchestratorConfig holds the orchestrator configuration.
|
||||
type OrchestratorConfig struct {
|
||||
Sources []ports.EventSource
|
||||
Sink ports.CorrelatedLogSink
|
||||
}
|
||||
|
||||
// Orchestrator connects sources to the correlation service and sinks.
|
||||
type Orchestrator struct {
|
||||
config OrchestratorConfig
|
||||
correlationSvc ports.CorrelationProcessor
|
||||
ctx context.Context
|
||||
cancel context.CancelFunc
|
||||
wg sync.WaitGroup
|
||||
running atomic.Bool
|
||||
}
|
||||
|
||||
// NewOrchestrator creates a new orchestrator.
|
||||
func NewOrchestrator(config OrchestratorConfig, correlationSvc ports.CorrelationProcessor) *Orchestrator {
|
||||
ctx, cancel := context.WithCancel(context.Background())
|
||||
return &Orchestrator{
|
||||
config: config,
|
||||
correlationSvc: correlationSvc,
|
||||
ctx: ctx,
|
||||
cancel: cancel,
|
||||
}
|
||||
}
|
||||
|
||||
// Start begins the orchestration.
|
||||
func (o *Orchestrator) Start() error {
|
||||
if !o.running.CompareAndSwap(false, true) {
|
||||
return nil // Already running
|
||||
}
|
||||
|
||||
// Start each source
|
||||
for _, source := range o.config.Sources {
|
||||
eventChan := make(chan *domain.NormalizedEvent, DefaultEventChannelBufferSize)
|
||||
|
||||
o.wg.Add(1)
|
||||
go func(src ports.EventSource, evChan chan *domain.NormalizedEvent) {
|
||||
defer o.wg.Done()
|
||||
|
||||
// Start the source in a separate goroutine
|
||||
sourceErr := make(chan error, 1)
|
||||
go func() {
|
||||
// Always send the result, nil included: the receive after processEvents
// would otherwise block forever when the source exits cleanly.
sourceErr <- src.Start(o.ctx, evChan)
|
||||
}()
|
||||
|
||||
// Process events in the current goroutine
|
||||
o.processEvents(evChan)
|
||||
|
||||
// Check for source start errors
|
||||
if err := <-sourceErr; err != nil {
|
||||
// Source failed to start, log error and exit
|
||||
return
|
||||
}
|
||||
}(source, eventChan)
|
||||
}
|
||||
|
||||
// Start a periodic ticker to drain pending orphan A events independently of the
|
||||
// event flow. Without this, orphans are only emitted when a new event arrives,
|
||||
// causing them to accumulate silently when the source goes quiet.
|
||||
o.wg.Add(1)
|
||||
go func() {
|
||||
defer o.wg.Done()
|
||||
ticker := time.NewTicker(OrphanTickInterval)
|
||||
defer ticker.Stop()
|
||||
for {
|
||||
select {
|
||||
case <-o.ctx.Done():
|
||||
return
|
||||
case <-ticker.C:
|
||||
logs := o.correlationSvc.EmitPendingOrphans()
|
||||
for _, log := range logs {
|
||||
o.config.Sink.Write(o.ctx, log) //nolint:errcheck
|
||||
}
|
||||
}
|
||||
}
|
||||
}()
|
||||
|
||||
return nil
|
||||
}
|
||||
|
||||
func (o *Orchestrator) processEvents(eventChan <-chan *domain.NormalizedEvent) {
|
||||
for {
|
||||
select {
|
||||
case <-o.ctx.Done():
|
||||
// Drain remaining events before exiting
|
||||
for {
|
||||
select {
|
||||
case event, ok := <-eventChan:
|
||||
if !ok {
|
||||
return
|
||||
}
|
||||
logs := o.correlationSvc.ProcessEvent(event)
|
||||
for _, log := range logs {
|
||||
o.config.Sink.Write(o.ctx, log)
|
||||
}
|
||||
default:
|
||||
return
|
||||
}
|
||||
}
|
||||
case event, ok := <-eventChan:
|
||||
if !ok {
|
||||
return
|
||||
}
|
||||
|
||||
// Process through correlation service
|
||||
logs := o.correlationSvc.ProcessEvent(event)
|
||||
|
||||
// Write correlated logs to sink
|
||||
for _, log := range logs {
|
||||
if err := o.config.Sink.Write(o.ctx, log); err != nil {
|
||||
// Log error but continue processing
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Stop gracefully stops the orchestrator.
|
||||
// It stops all sources and closes sinks immediately without waiting for queue drainage.
|
||||
// systemd TimeoutStopSec handles forced termination if needed.
|
||||
func (o *Orchestrator) Stop() error {
|
||||
if !o.running.CompareAndSwap(true, false) {
|
||||
return nil // Not running
|
||||
}
|
||||
|
||||
// Cancel context to stop accepting new events immediately
|
||||
o.cancel()
|
||||
|
||||
// Close sink (flush skipped - in-flight events are dropped)
|
||||
if err := o.config.Sink.Close(); err != nil {
|
||||
// Log error
|
||||
}
|
||||
|
||||
return nil
|
||||
}
|
||||
300
services/correlator/internal/app/orchestrator_test.go
Normal file
@ -0,0 +1,300 @@
|
||||
package app
|
||||
|
||||
import (
|
||||
"context"
|
||||
"sync"
|
||||
"testing"
|
||||
"time"
|
||||
|
||||
"github.com/antitbone/ja4/correlator/internal/domain"
|
||||
"github.com/antitbone/ja4/correlator/internal/ports"
|
||||
)
|
||||
|
||||
type mockEventSource struct {
|
||||
name string
|
||||
mu sync.RWMutex
|
||||
eventChan chan<- *domain.NormalizedEvent
|
||||
started bool
|
||||
stopped bool
|
||||
}
|
||||
|
||||
func (m *mockEventSource) Name() string { return m.name }
|
||||
func (m *mockEventSource) Start(ctx context.Context, eventChan chan<- *domain.NormalizedEvent) error {
|
||||
m.mu.Lock()
|
||||
m.started = true
|
||||
m.eventChan = eventChan
|
||||
m.mu.Unlock()
|
||||
<-ctx.Done()
|
||||
m.mu.Lock()
|
||||
m.stopped = true
|
||||
m.mu.Unlock()
|
||||
return nil
|
||||
}
|
||||
func (m *mockEventSource) Stop() error { return nil }
|
||||
|
||||
func (m *mockEventSource) getEventChan() chan<- *domain.NormalizedEvent {
|
||||
m.mu.RLock()
|
||||
defer m.mu.RUnlock()
|
||||
return m.eventChan
|
||||
}
|
||||
|
||||
func (m *mockEventSource) isStarted() bool {
|
||||
m.mu.RLock()
|
||||
defer m.mu.RUnlock()
|
||||
return m.started
|
||||
}
|
||||
|
||||
type mockSink struct {
|
||||
mu sync.Mutex
|
||||
written []domain.CorrelatedLog
|
||||
}
|
||||
|
||||
func (m *mockSink) Name() string { return "mock" }
|
||||
func (m *mockSink) Write(ctx context.Context, log domain.CorrelatedLog) error {
|
||||
m.mu.Lock()
|
||||
defer m.mu.Unlock()
|
||||
m.written = append(m.written, log)
|
||||
return nil
|
||||
}
|
||||
func (m *mockSink) Flush(ctx context.Context) error { return nil }
|
||||
func (m *mockSink) Close() error { return nil }
|
||||
func (m *mockSink) Reopen() error { return nil }
|
||||
|
||||
func (m *mockSink) getWritten() []domain.CorrelatedLog {
|
||||
m.mu.Lock()
|
||||
defer m.mu.Unlock()
|
||||
result := make([]domain.CorrelatedLog, len(m.written))
|
||||
copy(result, m.written)
|
||||
return result
|
||||
}
|
||||
|
||||
func TestOrchestrator_StartStop(t *testing.T) {
|
||||
source := &mockEventSource{name: "test"}
|
||||
sink := &mockSink{}
|
||||
|
||||
corrConfig := domain.CorrelationConfig{
|
||||
TimeWindow: time.Second,
|
||||
ApacheAlwaysEmit: true,
|
||||
NetworkEmit: false,
|
||||
}
|
||||
correlationSvc := domain.NewCorrelationService(corrConfig, &domain.RealTimeProvider{})
|
||||
|
||||
orchestrator := NewOrchestrator(OrchestratorConfig{
|
||||
Sources: []ports.EventSource{source},
|
||||
Sink: sink,
|
||||
}, correlationSvc)
|
||||
|
||||
if err := orchestrator.Start(); err != nil {
|
||||
t.Fatalf("failed to start: %v", err)
|
||||
}
|
||||
|
||||
// Let it run briefly
|
||||
time.Sleep(100 * time.Millisecond)
|
||||
|
||||
if err := orchestrator.Stop(); err != nil {
|
||||
t.Fatalf("failed to stop: %v", err)
|
||||
}
|
||||
|
||||
if !source.isStarted() {
|
||||
t.Error("expected source to be started")
|
||||
}
|
||||
}
|
||||
|
||||
func TestOrchestrator_ProcessEvent(t *testing.T) {
|
||||
source := &mockEventSource{name: "test"}
|
||||
sink := &mockSink{}
|
||||
|
||||
corrConfig := domain.CorrelationConfig{
|
||||
TimeWindow: time.Second,
|
||||
ApacheAlwaysEmit: true,
|
||||
NetworkEmit: false,
|
||||
}
|
||||
correlationSvc := domain.NewCorrelationService(corrConfig, &domain.RealTimeProvider{})
|
||||
|
||||
orchestrator := NewOrchestrator(OrchestratorConfig{
|
||||
Sources: []ports.EventSource{source},
|
||||
Sink: sink,
|
||||
}, correlationSvc)
|
||||
|
||||
if err := orchestrator.Start(); err != nil {
|
||||
t.Fatalf("failed to start: %v", err)
|
||||
}
|
||||
|
||||
// Wait for source to start and get the channel
|
||||
var eventChan chan<- *domain.NormalizedEvent
|
||||
for i := 0; i < 50; i++ {
|
||||
eventChan = source.getEventChan()
|
||||
if eventChan != nil {
|
||||
break
|
||||
}
|
||||
time.Sleep(10 * time.Millisecond)
|
||||
}
|
||||
|
||||
if eventChan == nil {
|
||||
t.Fatal("source did not start properly")
|
||||
}
|
||||
|
||||
// Send an event through the source
|
||||
event := &domain.NormalizedEvent{
|
||||
Source: domain.SourceA,
|
||||
Timestamp: time.Now(),
|
||||
SrcIP: "192.168.1.1",
|
||||
SrcPort: 8080,
|
||||
Raw: map[string]any{"method": "GET"},
|
||||
}
|
||||
|
||||
// Send event
|
||||
eventChan <- event
|
||||
|
||||
// Give it time to process
|
||||
time.Sleep(100 * time.Millisecond)
|
||||
|
||||
if err := orchestrator.Stop(); err != nil {
|
||||
t.Fatalf("failed to stop: %v", err)
|
||||
}
|
||||
|
||||
// Should have written at least one log (the orphan A)
|
||||
written := sink.getWritten()
|
||||
if len(written) == 0 {
|
||||
t.Error("expected at least one log to be written")
|
||||
}
|
||||
}
|
||||
|
||||
// TestOrchestrator_StartTwice tests that calling Start() twice is a no-op (already running).
|
||||
func TestOrchestrator_StartTwice(t *testing.T) {
|
||||
source := &mockEventSource{name: "test"}
|
||||
sink := &mockSink{}
|
||||
|
||||
corrConfig := domain.CorrelationConfig{
|
||||
TimeWindow: time.Second,
|
||||
ApacheAlwaysEmit: true,
|
||||
}
|
||||
correlationSvc := domain.NewCorrelationService(corrConfig, &domain.RealTimeProvider{})
|
||||
|
||||
o := NewOrchestrator(OrchestratorConfig{
|
||||
Sources: []ports.EventSource{source},
|
||||
Sink: sink,
|
||||
}, correlationSvc)
|
||||
|
||||
if err := o.Start(); err != nil {
|
||||
t.Fatalf("first Start() failed: %v", err)
|
||||
}
|
||||
if err := o.Start(); err != nil {
|
||||
t.Errorf("second Start() should be no-op, got: %v", err)
|
||||
}
|
||||
|
||||
o.Stop()
|
||||
}
|
||||
|
||||
// TestOrchestrator_StopTwice tests that calling Stop() twice is a no-op.
|
||||
func TestOrchestrator_StopTwice(t *testing.T) {
|
||||
source := &mockEventSource{name: "test"}
|
||||
sink := &mockSink{}
|
||||
|
||||
corrConfig := domain.CorrelationConfig{
|
||||
TimeWindow: time.Second,
|
||||
ApacheAlwaysEmit: true,
|
||||
}
|
||||
correlationSvc := domain.NewCorrelationService(corrConfig, &domain.RealTimeProvider{})
|
||||
|
||||
o := NewOrchestrator(OrchestratorConfig{
|
||||
Sources: []ports.EventSource{source},
|
||||
Sink: sink,
|
||||
}, correlationSvc)
|
||||
|
||||
o.Start()
|
||||
|
||||
if err := o.Stop(); err != nil {
|
||||
t.Errorf("first Stop() failed: %v", err)
|
||||
}
|
||||
if err := o.Stop(); err != nil {
|
||||
t.Errorf("second Stop() should be no-op, got: %v", err)
|
||||
}
|
||||
}
|
||||
|
||||
// TestOrchestrator_NoSources tests that Orchestrator works with no sources.
|
||||
func TestOrchestrator_NoSources(t *testing.T) {
|
||||
sink := &mockSink{}
|
||||
|
||||
corrConfig := domain.CorrelationConfig{TimeWindow: time.Second}
|
||||
correlationSvc := domain.NewCorrelationService(corrConfig, &domain.RealTimeProvider{})
|
||||
|
||||
o := NewOrchestrator(OrchestratorConfig{
|
||||
Sources: []ports.EventSource{},
|
||||
Sink: sink,
|
||||
}, correlationSvc)
|
||||
|
||||
if err := o.Start(); err != nil {
|
||||
t.Fatalf("Start() with no sources failed: %v", err)
|
||||
}
|
||||
|
||||
time.Sleep(50 * time.Millisecond)
|
||||
|
||||
if err := o.Stop(); err != nil {
|
||||
t.Errorf("Stop() failed: %v", err)
|
||||
}
|
||||
}
|
||||
|
||||
// TestOrchestrator_OrphanEmission tests that orphan A events are emitted via tick.
|
||||
func TestOrchestrator_OrphanEmission(t *testing.T) {
|
||||
source := &mockEventSource{name: "test"}
|
||||
sink := &mockSink{}
|
||||
|
||||
corrConfig := domain.CorrelationConfig{
|
||||
TimeWindow: 50 * time.Millisecond,
|
||||
ApacheAlwaysEmit: true,
|
||||
ApacheEmitDelayMs: 10, // Very short delay so orphans emit quickly
|
||||
}
|
||||
correlationSvc := domain.NewCorrelationService(corrConfig, &domain.RealTimeProvider{})
|
||||
|
||||
o := NewOrchestrator(OrchestratorConfig{
|
||||
Sources: []ports.EventSource{source},
|
||||
Sink: sink,
|
||||
}, correlationSvc)
|
||||
|
||||
if err := o.Start(); err != nil {
|
||||
t.Fatalf("Start() failed: %v", err)
|
||||
}
|
||||
|
||||
// Wait for source to be ready
|
||||
var eventChan chan<- *domain.NormalizedEvent
|
||||
for i := 0; i < 50; i++ {
|
||||
eventChan = source.getEventChan()
|
||||
if eventChan != nil {
|
||||
break
|
||||
}
|
||||
time.Sleep(5 * time.Millisecond)
|
||||
}
|
||||
if eventChan == nil {
|
||||
t.Fatal("source did not start")
|
||||
}
|
||||
|
||||
// Send a source A event (Apache/HTTP)
|
||||
eventChan <- &domain.NormalizedEvent{
|
||||
Source: domain.SourceA,
|
||||
Timestamp: time.Now(),
|
||||
SrcIP: "10.0.0.1",
|
||||
SrcPort: 12345,
|
||||
Raw: map[string]any{"method": "GET"},
|
||||
}
|
||||
|
||||
// Allow time for orphan ticker to fire (OrphanTickInterval = 250ms, but emit delay is 10ms)
|
||||
time.Sleep(600 * time.Millisecond)
|
||||
|
||||
o.Stop()
|
||||
|
||||
written := sink.getWritten()
|
||||
if len(written) == 0 {
|
||||
t.Error("expected at least one orphan log to be emitted")
|
||||
}
|
||||
}
|
||||
|
||||
// TestOrchestrator_Constants tests that constants have reasonable values.
|
||||
func TestOrchestrator_Constants(t *testing.T) {
|
||||
if DefaultEventChannelBufferSize <= 0 {
|
||||
t.Error("DefaultEventChannelBufferSize should be positive")
|
||||
}
|
||||
if OrphanTickInterval <= 0 {
|
||||
t.Error("OrphanTickInterval should be positive")
|
||||
}
|
||||
}
|
||||
406
services/correlator/internal/config/config.go
Normal file
@ -0,0 +1,406 @@
|
||||
package config
|
||||
|
||||
import (
|
||||
"fmt"
|
||||
"net"
|
||||
"os"
|
||||
"strconv"
|
||||
"strings"
|
||||
"time"
|
||||
|
||||
"github.com/antitbone/ja4/correlator/internal/domain"
|
||||
"gopkg.in/yaml.v3"
|
||||
)
|
||||
|
||||
// Config holds the complete application configuration.
|
||||
type Config struct {
|
||||
Log LogConfig `yaml:"log"`
|
||||
Inputs InputsConfig `yaml:"inputs"`
|
||||
Outputs OutputsConfig `yaml:"outputs"`
|
||||
Correlation CorrelationConfig `yaml:"correlation"`
|
||||
Metrics MetricsConfig `yaml:"metrics"`
|
||||
}
|
||||
|
||||
// MetricsConfig holds metrics server configuration.
|
||||
type MetricsConfig struct {
|
||||
Enabled bool `yaml:"enabled"`
|
||||
Addr string `yaml:"addr"` // e.g., ":8080", "localhost:8080"
|
||||
}
|
||||
|
||||
// LogConfig holds logging configuration.
|
||||
type LogConfig struct {
|
||||
Level string `yaml:"level"` // DEBUG, INFO, WARN, ERROR
|
||||
}
|
||||
|
||||
// GetLevel returns the log level in upper case, defaulting to INFO if not set.
|
||||
func (c *LogConfig) GetLevel() string {
|
||||
if c.Level == "" {
|
||||
return "INFO"
|
||||
}
|
||||
return strings.ToUpper(c.Level)
|
||||
}
|
||||
|
||||
// ServiceConfig holds service-level configuration.
|
||||
type ServiceConfig struct {
|
||||
Name string `yaml:"name"`
|
||||
Language string `yaml:"language"`
|
||||
}
|
||||
|
||||
// InputsConfig holds input sources configuration.
|
||||
type InputsConfig struct {
|
||||
UnixSockets []UnixSocketConfig `yaml:"unix_sockets"`
|
||||
}
|
||||
|
||||
// UnixSocketConfig holds a Unix socket source configuration.
|
||||
type UnixSocketConfig struct {
|
||||
Name string `yaml:"name"`
|
||||
Path string `yaml:"path"`
|
||||
Format string `yaml:"format"`
|
||||
SourceType string `yaml:"source_type"` // "A" for Apache/HTTP, "B" for Network
|
||||
SocketPermissions string `yaml:"socket_permissions"` // octal string, e.g., "0660", "0666"
|
||||
}
|
||||
|
||||
// OutputsConfig holds output sinks configuration.
|
||||
type OutputsConfig struct {
|
||||
File FileOutputConfig `yaml:"file"`
|
||||
ClickHouse ClickHouseOutputConfig `yaml:"clickhouse"`
|
||||
Stdout StdoutOutputConfig `yaml:"stdout"`
|
||||
}
|
||||
|
||||
// FileOutputConfig holds file sink configuration.
|
||||
type FileOutputConfig struct {
|
||||
Enabled bool `yaml:"enabled"`
|
||||
Path string `yaml:"path"`
|
||||
}
|
||||
|
||||
// ClickHouseOutputConfig holds ClickHouse sink configuration.
|
||||
type ClickHouseOutputConfig struct {
|
||||
Enabled bool `yaml:"enabled"`
|
||||
DSN string `yaml:"dsn"`
|
||||
Table string `yaml:"table"`
|
||||
BatchSize int `yaml:"batch_size"`
|
||||
FlushIntervalMs int `yaml:"flush_interval_ms"`
|
||||
MaxBufferSize int `yaml:"max_buffer_size"`
|
||||
DropOnOverflow bool `yaml:"drop_on_overflow"`
|
||||
AsyncInsert bool `yaml:"async_insert"`
|
||||
TimeoutMs int `yaml:"timeout_ms"`
|
||||
}
|
||||
|
||||
// StdoutOutputConfig holds stdout sink configuration.
|
||||
type StdoutOutputConfig struct {
|
||||
Enabled bool `yaml:"enabled"`
|
||||
Level string `yaml:"level"` // DEBUG, INFO, WARN, ERROR - filters output verbosity
|
||||
}
|
||||
|
||||
// CorrelationConfig holds correlation configuration.
|
||||
type CorrelationConfig struct {
|
||||
TimeWindow TimeWindowConfig `yaml:"time_window"`
|
||||
OrphanPolicy OrphanPolicyConfig `yaml:"orphan_policy"`
|
||||
Matching MatchingConfig `yaml:"matching"`
|
||||
Buffers BuffersConfig `yaml:"buffers"`
|
||||
TTL TTLConfig `yaml:"ttl"`
|
||||
ExcludeSourceIPs []string `yaml:"exclude_source_ips"` // List of source IPs or CIDR ranges to exclude
|
||||
IncludeDestPorts []int `yaml:"include_dest_ports"` // If non-empty, only correlate events matching these destination ports
|
||||
// Deprecated: Use TimeWindow.Value instead
|
||||
TimeWindowS int `yaml:"time_window_s"`
|
||||
// Deprecated: Use OrphanPolicy.ApacheAlwaysEmit instead
|
||||
EmitOrphans bool `yaml:"emit_orphans"`
|
||||
}
|
||||
|
||||
// TimeWindowConfig holds time window configuration.
|
||||
type TimeWindowConfig struct {
|
||||
Value int `yaml:"value"`
|
||||
Unit string `yaml:"unit"` // s, ms, etc.
|
||||
}
|
||||
|
||||
// GetDuration returns the time window as a duration.
|
||||
func (c *TimeWindowConfig) GetDuration() time.Duration {
|
||||
value := c.Value
|
||||
if value <= 0 {
|
||||
value = 1
|
||||
}
|
||||
switch c.Unit {
|
||||
case "ms", "millisecond", "milliseconds":
|
||||
return time.Duration(value) * time.Millisecond
|
||||
case "s", "sec", "second", "seconds":
|
||||
fallthrough
|
||||
default:
|
||||
return time.Duration(value) * time.Second
|
||||
}
|
||||
}
|
||||
|
||||
// OrphanPolicyConfig holds orphan event policy configuration.
|
||||
type OrphanPolicyConfig struct {
|
||||
ApacheAlwaysEmit bool `yaml:"apache_always_emit"`
|
||||
ApacheEmitDelayMs int `yaml:"apache_emit_delay_ms"` // Delay in ms before emitting orphan A
|
||||
NetworkEmit bool `yaml:"network_emit"`
|
||||
}
|
||||
|
||||
// MatchingConfig holds matching mode configuration.
|
||||
type MatchingConfig struct {
|
||||
Mode string `yaml:"mode"` // one_to_one or one_to_many
|
||||
}
|
||||
|
||||
// BuffersConfig holds buffer size configuration.
|
||||
type BuffersConfig struct {
|
||||
MaxHTTPItems int `yaml:"max_http_items"`
|
||||
MaxNetworkItems int `yaml:"max_network_items"`
|
||||
}
|
||||
|
||||
// TTLConfig holds TTL configuration.
|
||||
type TTLConfig struct {
|
||||
NetworkTTLS int `yaml:"network_ttl_s"`
|
||||
}
|
||||
|
||||
// Load loads configuration from a YAML file.
|
||||
func Load(path string) (*Config, error) {
|
||||
data, err := os.ReadFile(path)
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("failed to read config file: %w", err)
|
||||
}
|
||||
|
||||
cfg := defaultConfig()
|
||||
|
||||
if err := yaml.Unmarshal(data, cfg); err != nil {
|
||||
return nil, fmt.Errorf("failed to parse config file: %w", err)
|
||||
}
|
||||
|
||||
if err := cfg.Validate(); err != nil {
|
||||
return nil, fmt.Errorf("invalid config: %w", err)
|
||||
}
|
||||
|
||||
return cfg, nil
|
||||
}
|
||||
|
||||
// defaultConfig returns a Config with default values.
|
||||
func defaultConfig() *Config {
|
||||
return &Config{
|
||||
Log: LogConfig{
|
||||
Level: "INFO",
|
||||
},
|
||||
Inputs: InputsConfig{
|
||||
UnixSockets: make([]UnixSocketConfig, 0),
|
||||
},
|
||||
Outputs: OutputsConfig{
|
||||
File: FileOutputConfig{
|
||||
Enabled: true,
|
||||
Path: "/var/log/logcorrelator/correlated.log",
|
||||
},
|
||||
ClickHouse: ClickHouseOutputConfig{
|
||||
Enabled: false,
|
||||
BatchSize: 500,
|
||||
FlushIntervalMs: 200,
|
||||
MaxBufferSize: 5000,
|
||||
DropOnOverflow: true,
|
||||
AsyncInsert: true,
|
||||
TimeoutMs: 1000,
|
||||
},
|
||||
Stdout: StdoutOutputConfig{Enabled: false},
|
||||
},
|
||||
Correlation: CorrelationConfig{
|
||||
TimeWindowS: 1,
|
||||
EmitOrphans: true,
|
||||
},
|
||||
}
|
||||
}
|
||||
|
||||
// Validate validates the configuration.
|
||||
func (c *Config) Validate() error {
|
||||
if len(c.Inputs.UnixSockets) < 2 {
|
||||
return fmt.Errorf("at least two unix socket inputs are required")
|
||||
}
|
||||
|
||||
seenNames := make(map[string]struct{}, len(c.Inputs.UnixSockets))
|
||||
seenPaths := make(map[string]struct{}, len(c.Inputs.UnixSockets))
|
||||
|
||||
for i, input := range c.Inputs.UnixSockets {
|
||||
if strings.TrimSpace(input.Name) == "" {
|
||||
return fmt.Errorf("inputs.unix_sockets[%d].name is required", i)
|
||||
}
|
||||
if strings.TrimSpace(input.Path) == "" {
|
||||
return fmt.Errorf("inputs.unix_sockets[%d].path is required", i)
|
||||
}
|
||||
|
||||
if _, exists := seenNames[input.Name]; exists {
|
||||
return fmt.Errorf("duplicate unix socket input name: %s", input.Name)
|
||||
}
|
||||
seenNames[input.Name] = struct{}{}
|
||||
|
||||
if _, exists := seenPaths[input.Path]; exists {
|
||||
return fmt.Errorf("duplicate unix socket input path: %s", input.Path)
|
||||
}
|
||||
seenPaths[input.Path] = struct{}{}
|
||||
}
|
||||
|
||||
// At least one output must be enabled
|
||||
hasOutput := false
|
||||
if c.Outputs.File.Enabled && c.Outputs.File.Path != "" {
|
||||
hasOutput = true
|
||||
}
|
||||
if c.Outputs.ClickHouse.Enabled {
|
||||
hasOutput = true
|
||||
}
|
||||
if c.Outputs.Stdout.Enabled {
|
||||
hasOutput = true
|
||||
}
|
||||
|
||||
if !hasOutput {
|
||||
return fmt.Errorf("at least one output must be enabled (file, clickhouse, or stdout)")
|
||||
}
|
||||
|
||||
if c.Outputs.ClickHouse.Enabled {
|
||||
if strings.TrimSpace(c.Outputs.ClickHouse.DSN) == "" {
|
||||
return fmt.Errorf("clickhouse DSN is required when enabled")
|
||||
}
|
||||
if strings.TrimSpace(c.Outputs.ClickHouse.Table) == "" {
|
||||
return fmt.Errorf("clickhouse table is required when enabled")
|
||||
}
|
||||
if c.Outputs.ClickHouse.BatchSize <= 0 {
|
||||
return fmt.Errorf("clickhouse batch_size must be > 0")
|
||||
}
|
||||
if c.Outputs.ClickHouse.MaxBufferSize <= 0 {
|
||||
return fmt.Errorf("clickhouse max_buffer_size must be > 0")
|
||||
}
|
||||
if c.Outputs.ClickHouse.TimeoutMs <= 0 {
|
||||
return fmt.Errorf("clickhouse timeout_ms must be > 0")
|
||||
}
|
||||
}
|
||||
|
||||
if c.Correlation.TimeWindowS <= 0 {
|
||||
return fmt.Errorf("correlation.time_window_s must be > 0")
|
||||
}
|
||||
|
||||
return nil
|
||||
}
|
||||
|
||||
// GetTimeWindow returns the time window as a duration.
|
||||
// Deprecated: Use TimeWindow.GetDuration() instead.
|
||||
func (c *CorrelationConfig) GetTimeWindow() time.Duration {
|
||||
// New config takes precedence
|
||||
if c.TimeWindow.Value > 0 {
|
||||
return c.TimeWindow.GetDuration()
|
||||
}
|
||||
// Fallback to deprecated field
|
||||
value := c.TimeWindowS
|
||||
if value <= 0 {
|
||||
value = 1
|
||||
}
|
||||
return time.Duration(value) * time.Second
|
||||
}
|
||||
|
||||
// GetApacheAlwaysEmit returns whether to always emit Apache events.
|
||||
func (c *CorrelationConfig) GetApacheAlwaysEmit() bool {
|
||||
if c.OrphanPolicy.ApacheAlwaysEmit {
|
||||
return true
|
||||
}
|
||||
// Fallback to deprecated field
|
||||
return c.EmitOrphans
|
||||
}
|
||||
|
||||
// GetApacheEmitDelayMs returns the delay in milliseconds before emitting orphan A events.
|
||||
func (c *CorrelationConfig) GetApacheEmitDelayMs() int {
|
||||
if c.OrphanPolicy.ApacheEmitDelayMs > 0 {
|
||||
return c.OrphanPolicy.ApacheEmitDelayMs
|
||||
}
|
||||
return domain.DefaultApacheEmitDelayMs // Default: 500ms
|
||||
}
|
||||
|
||||
// GetMatchingMode returns the matching mode.
|
||||
func (c *CorrelationConfig) GetMatchingMode() string {
|
||||
if c.Matching.Mode != "" {
|
||||
return c.Matching.Mode
|
||||
}
|
||||
return "one_to_many" // Default to Keep-Alive
|
||||
}
|
||||
|
||||
// GetMaxHTTPBufferSize returns the max HTTP buffer size.
|
||||
func (c *CorrelationConfig) GetMaxHTTPBufferSize() int {
|
||||
if c.Buffers.MaxHTTPItems > 0 {
|
||||
return c.Buffers.MaxHTTPItems
|
||||
}
|
||||
return domain.DefaultMaxHTTPBufferSize
|
||||
}
|
||||
|
||||
// GetMaxNetworkBufferSize returns the max network buffer size.
|
||||
func (c *CorrelationConfig) GetMaxNetworkBufferSize() int {
|
||||
if c.Buffers.MaxNetworkItems > 0 {
|
||||
return c.Buffers.MaxNetworkItems
|
||||
}
|
||||
return domain.DefaultMaxNetworkBufferSize
|
||||
}
|
||||
|
||||
// GetNetworkTTLS returns the network TTL in seconds.
|
||||
func (c *CorrelationConfig) GetNetworkTTLS() int {
|
||||
if c.TTL.NetworkTTLS > 0 {
|
||||
return c.TTL.NetworkTTLS
|
||||
}
|
||||
return domain.DefaultNetworkTTLS
|
||||
}
|
||||
|
||||
// GetSocketPermissions returns the socket permissions as os.FileMode.
|
||||
// Default is 0666 (world read/write).
|
||||
func (c *UnixSocketConfig) GetSocketPermissions() os.FileMode {
|
||||
trimmed := strings.TrimSpace(c.SocketPermissions)
|
||||
if trimmed == "" {
|
||||
return 0666
|
||||
}
|
||||
|
||||
// Parse octal string (e.g., "0660", "660", "0666")
|
||||
perms, err := strconv.ParseUint(trimmed, 8, 32)
|
||||
if err != nil {
|
||||
return 0666
|
||||
}
|
||||
|
||||
return os.FileMode(perms)
|
||||
}
|
||||
|
||||
// GetIncludeDestPorts returns the list of destination ports allowed for correlation.
|
||||
// An empty list means all ports are allowed.
|
||||
func (c *CorrelationConfig) GetIncludeDestPorts() []int {
|
||||
return c.IncludeDestPorts
|
||||
}
|
||||
|
||||
// GetExcludeSourceIPs returns the list of excluded source IPs or CIDR ranges.
|
||||
func (c *CorrelationConfig) GetExcludeSourceIPs() []string {
|
||||
return c.ExcludeSourceIPs
|
||||
}
|
||||
|
||||
// IsSourceIPExcluded checks if a source IP should be excluded.
|
||||
// Supports both exact IP matches and CIDR ranges.
|
||||
func (c *CorrelationConfig) IsSourceIPExcluded(ip string) bool {
|
||||
if len(c.ExcludeSourceIPs) == 0 {
|
||||
return false
|
||||
}
|
||||
|
||||
// Parse the IP once
|
||||
parsedIP := net.ParseIP(ip)
|
||||
if parsedIP == nil {
|
||||
return false // Invalid IP
|
||||
}
|
||||
|
||||
for _, exclude := range c.ExcludeSourceIPs {
|
||||
// Try CIDR first
|
||||
if strings.Contains(exclude, "/") {
|
||||
_, cidr, err := net.ParseCIDR(exclude)
|
||||
if err != nil {
|
||||
continue // Invalid CIDR, skip
|
||||
}
|
||||
if cidr.Contains(parsedIP) {
|
||||
return true
|
||||
}
|
||||
} else {
|
||||
// Exact IP match
|
||||
if exclude == ip {
|
||||
return true
|
||||
}
|
||||
// Also compare the parsed forms, which treats equivalent notations as equal (e.g. ::1 vs 0:0:0:0:0:0:0:1)
|
||||
if excludeIP := net.ParseIP(exclude); excludeIP != nil {
|
||||
if excludeIP.Equal(parsedIP) {
|
||||
return true
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return false
|
||||
}
|
||||
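Two helpers in this file are easy to get subtly wrong: parsing an octal permission string and matching a source IP against a mixed list of addresses and CIDR ranges. The following is a standalone sketch of both, using only the standard library. `parsePerms` and `excluded` are illustrative names that mirror `GetSocketPermissions` and `IsSourceIPExcluded` in spirit but are not the package's API.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"strconv"
	"strings"
)

// parsePerms converts an octal string such as "0660" into an os.FileMode.
// It returns fallback when the string is empty or not valid octal.
func parsePerms(s string, fallback os.FileMode) os.FileMode {
	s = strings.TrimSpace(s)
	if s == "" {
		return fallback
	}
	v, err := strconv.ParseUint(s, 8, 32)
	if err != nil {
		return fallback
	}
	return os.FileMode(v)
}

// excluded reports whether ip matches any entry. Entries containing "/"
// are treated as CIDR ranges; all others are compared as parsed addresses.
func excluded(ip string, entries []string) bool {
	parsed := net.ParseIP(ip)
	if parsed == nil {
		return false // not a valid IP, so nothing can match
	}
	for _, e := range entries {
		if strings.Contains(e, "/") {
			if _, network, err := net.ParseCIDR(e); err == nil && network.Contains(parsed) {
				return true
			}
			continue
		}
		if other := net.ParseIP(e); other != nil && other.Equal(parsed) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Printf("%04o\n", uint32(parsePerms("0660", 0666)))
	fmt.Println(excluded("10.1.2.3", []string{"10.0.0.0/8"}))
	fmt.Println(excluded("192.168.1.1", []string{"10.0.0.0/8", "192.168.1.2"}))
}
```

Falling back silently on a malformed permission string matches the behaviour shown in the diff. A stricter variant could return the parse error so that a typo in the configuration is reported at load time.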
1253
services/correlator/internal/config/config_test.go
Normal file
File diff suppressed because it is too large
151
services/correlator/internal/domain/correlated_log.go
Normal file
@ -0,0 +1,151 @@
package domain

import (
	"encoding/json"
	"reflect"
	"time"
)

// CorrelatedLog represents the output correlated log entry.
// All fields are flattened into a single-level structure.
type CorrelatedLog struct {
	Timestamp  time.Time      `json:"timestamp"`
	SrcIP      string         `json:"src_ip"`
	SrcPort    int            `json:"src_port"`
	DstIP      string         `json:"dst_ip,omitempty"`
	DstPort    int            `json:"dst_port,omitempty"`
	Correlated bool           `json:"correlated"`
	OrphanSide string         `json:"orphan_side,omitempty"`
	Fields     map[string]any `json:"-"` // Additional fields, merged at marshal time
}

// MarshalJSON implements custom JSON marshaling to flatten the structure.
func (c CorrelatedLog) MarshalJSON() ([]byte, error) {
	// Create a flat map with all fields
	flat := make(map[string]any)

	// Add core fields
	flat["timestamp"] = c.Timestamp
	flat["src_ip"] = c.SrcIP
	flat["src_port"] = c.SrcPort
	if c.DstIP != "" {
		flat["dst_ip"] = c.DstIP
	}
	if c.DstPort != 0 {
		flat["dst_port"] = c.DstPort
	}
	flat["correlated"] = c.Correlated
	if c.OrphanSide != "" {
		flat["orphan_side"] = c.OrphanSide
	}

	// Merge additional fields while preserving reserved keys
	reservedKeys := map[string]struct{}{
		"timestamp":   {},
		"src_ip":      {},
		"src_port":    {},
		"dst_ip":      {},
		"dst_port":    {},
		"correlated":  {},
		"orphan_side": {},
	}
	for k, v := range c.Fields {
		if _, reserved := reservedKeys[k]; reserved {
			continue
		}
		flat[k] = v
	}

	return json.Marshal(flat)
}

// NewCorrelatedLogFromEvent creates a correlated log from a single event (orphan).
func NewCorrelatedLogFromEvent(event *NormalizedEvent, orphanSide string) CorrelatedLog {
	fields := extractFields(event)
	if event.KeepAliveSeq > 0 {
		fields["keepalives"] = event.KeepAliveSeq
	}
	return CorrelatedLog{
		Timestamp:  event.Timestamp,
		SrcIP:      event.SrcIP,
		SrcPort:    event.SrcPort,
		DstIP:      event.DstIP,
		DstPort:    event.DstPort,
		Correlated: false,
		OrphanSide: orphanSide,
		Fields:     fields,
	}
}

// NewCorrelatedLog creates a correlated log from two matched events.
func NewCorrelatedLog(apacheEvent, networkEvent *NormalizedEvent) CorrelatedLog {
	ts := apacheEvent.Timestamp
	if networkEvent.Timestamp.After(ts) {
		ts = networkEvent.Timestamp
	}

	fields := mergeFields(apacheEvent, networkEvent)
	if apacheEvent.KeepAliveSeq > 0 {
		fields["keepalives"] = apacheEvent.KeepAliveSeq
	}

	return CorrelatedLog{
		Timestamp:  ts,
		SrcIP:      apacheEvent.SrcIP,
		SrcPort:    apacheEvent.SrcPort,
		DstIP:      coalesceString(apacheEvent.DstIP, networkEvent.DstIP),
		DstPort:    coalesceInt(apacheEvent.DstPort, networkEvent.DstPort),
		Correlated: true,
		OrphanSide: "",
		Fields:     fields,
	}
}

func extractFields(e *NormalizedEvent) map[string]any {
	result := make(map[string]any)
	for k, v := range e.Raw {
		result[k] = v
	}
	return result
}

func mergeFields(a, b *NormalizedEvent) map[string]any {
	result := make(map[string]any)

	// Start with A fields
	for k, v := range a.Raw {
		result[k] = v
	}

	// Merge B fields with collision handling
	for k, v := range b.Raw {
		if existing, exists := result[k]; exists {
			if reflect.DeepEqual(existing, v) {
				continue
			}

			// Collision with different values: keep both with prefixes
			delete(result, k)
			result["a_"+k] = existing
			result["b_"+k] = v
			continue
		}
		result[k] = v
	}

	return result
}

func coalesceString(a, b string) string {
	if a != "" {
		return a
	}
	return b
}

func coalesceInt(a, b int) int {
	if a != 0 {
		return a
	}
	return b
}
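The flattening done by MarshalJSON in correlated_log.go can be tried in isolation with a minimal sketch. The type `flatLog` and the helper `encode` are hypothetical stand-ins, not part of the correlator package, and only two core fields are modelled; the point is that extra fields land at the same JSON level as core fields and cannot override them.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// flatLog is a hypothetical, trimmed-down stand-in for CorrelatedLog.
type flatLog struct {
	SrcIP      string
	Correlated bool
	Fields     map[string]any
}

// reserved lists the core keys that extra fields must not override.
var reserved = map[string]struct{}{"src_ip": {}, "correlated": {}}

// MarshalJSON writes core and extra fields into one flat JSON object.
func (l flatLog) MarshalJSON() ([]byte, error) {
	flat := map[string]any{
		"src_ip":     l.SrcIP,
		"correlated": l.Correlated,
	}
	for k, v := range l.Fields {
		if _, isReserved := reserved[k]; isReserved {
			continue // core value wins
		}
		flat[k] = v
	}
	return json.Marshal(flat)
}

// encode is a small helper so the result is easy to inspect.
func encode(l flatLog) string {
	data, err := json.Marshal(l)
	if err != nil {
		panic(err)
	}
	return string(data)
}

func main() {
	// Prints {"correlated":true,"ja4":"abc","src_ip":"1.2.3.4"}
	fmt.Println(encode(flatLog{
		SrcIP:      "1.2.3.4",
		Correlated: true,
		Fields:     map[string]any{"src_ip": "override", "ja4": "abc"},
	}))
}
```

Because encoding/json sorts map keys when marshaling, the output order is stable even though the fields come from a map.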
365
services/correlator/internal/domain/correlated_log_test.go
Normal file
@ -0,0 +1,365 @@
package domain

import (
	"encoding/json"
	"testing"
	"time"
)

func TestNormalizedEvent_CorrelationKey(t *testing.T) {
	tests := []struct {
		name     string
		event    *NormalizedEvent
		expected string
	}{
		{
			name: "basic key",
			event: &NormalizedEvent{
				SrcIP:   "192.168.1.1",
				SrcPort: 8080,
			},
			expected: "192.168.1.1:8080",
		},
		{
			name: "different port",
			event: &NormalizedEvent{
				SrcIP:   "10.0.0.1",
				SrcPort: 443,
			},
			expected: "10.0.0.1:443",
		},
		{
			name: "port zero",
			event: &NormalizedEvent{
				SrcIP:   "127.0.0.1",
				SrcPort: 0,
			},
			expected: "127.0.0.1:0",
		},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			key := tt.event.CorrelationKey()
			if key != tt.expected {
				t.Errorf("expected %s, got %s", tt.expected, key)
			}
		})
	}
}

func TestNewCorrelatedLogFromEvent(t *testing.T) {
	event := &NormalizedEvent{
		Source:    SourceA,
		Timestamp: time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC),
		SrcIP:     "192.168.1.1",
		SrcPort:   8080,
		DstIP:     "10.0.0.1",
		DstPort:   80,
		Raw: map[string]any{
			"method": "GET",
			"path":   "/api/test",
		},
	}

	log := NewCorrelatedLogFromEvent(event, "A")

	if log.Correlated {
		t.Error("expected correlated to be false")
	}
	if log.OrphanSide != "A" {
		t.Errorf("expected orphan_side A, got %s", log.OrphanSide)
	}
	if log.SrcIP != "192.168.1.1" {
		t.Errorf("expected src_ip 192.168.1.1, got %s", log.SrcIP)
	}
	if log.Fields == nil {
		t.Error("expected fields to be non-nil")
	}
}

func TestNewCorrelatedLog(t *testing.T) {
	apacheEvent := &NormalizedEvent{
		Source:    SourceA,
		Timestamp: time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC),
		SrcIP:     "192.168.1.1",
		SrcPort:   8080,
		DstIP:     "10.0.0.1",
		DstPort:   80,
		Raw:       map[string]any{"method": "GET"},
	}

	networkEvent := &NormalizedEvent{
		Source:    SourceB,
		Timestamp: time.Date(2024, 1, 1, 12, 0, 0, 500000000, time.UTC),
		SrcIP:     "192.168.1.1",
		SrcPort:   8080,
		DstIP:     "10.0.0.1",
		DstPort:   80,
		Raw:       map[string]any{"ja3": "abc123"},
	}

	log := NewCorrelatedLog(apacheEvent, networkEvent)

	if !log.Correlated {
		t.Error("expected correlated to be true")
	}
	if log.OrphanSide != "" {
		t.Errorf("expected orphan_side to be empty, got %s", log.OrphanSide)
	}
	if log.Fields == nil {
		t.Error("expected fields to be non-nil")
	}
}

// TestNewCorrelatedLog_TimestampSelectionAEarlier verifies that when A is earlier the later (B) timestamp is used.
func TestNewCorrelatedLog_TimestampSelectionAEarlier(t *testing.T) {
	tsA := time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC)
	tsB := time.Date(2024, 1, 1, 12, 0, 1, 0, time.UTC) // B is later

	a := &NormalizedEvent{Source: SourceA, Timestamp: tsA, SrcIP: "1.1.1.1", SrcPort: 100, Raw: map[string]any{}}
	b := &NormalizedEvent{Source: SourceB, Timestamp: tsB, SrcIP: "1.1.1.1", SrcPort: 100, Raw: map[string]any{}}

	log := NewCorrelatedLog(a, b)
	if !log.Timestamp.Equal(tsB) {
		t.Errorf("expected timestamp to be B's (later), got %v", log.Timestamp)
	}
}

// TestNewCorrelatedLog_TimestampSelectionBEarlier verifies that when B is earlier, A's timestamp is used.
func TestNewCorrelatedLog_TimestampSelectionBEarlier(t *testing.T) {
	tsA := time.Date(2024, 1, 1, 12, 0, 1, 0, time.UTC) // A is later
	tsB := time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC)

	a := &NormalizedEvent{Source: SourceA, Timestamp: tsA, SrcIP: "1.1.1.1", SrcPort: 100, Raw: map[string]any{}}
	b := &NormalizedEvent{Source: SourceB, Timestamp: tsB, SrcIP: "1.1.1.1", SrcPort: 100, Raw: map[string]any{}}

	log := NewCorrelatedLog(a, b)
	// The later timestamp wins. Since B is not After A, ts stays as A's timestamp.
	if !log.Timestamp.Equal(tsA) {
		t.Errorf("expected timestamp to be A's (later), got %v", log.Timestamp)
	}
}

// TestNewCorrelatedLog_TimestampEqual verifies equal timestamps yield A's timestamp.
func TestNewCorrelatedLog_TimestampEqual(t *testing.T) {
	ts := time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC)
	a := &NormalizedEvent{Source: SourceA, Timestamp: ts, SrcIP: "1.1.1.1", SrcPort: 100, Raw: map[string]any{}}
	b := &NormalizedEvent{Source: SourceB, Timestamp: ts, SrcIP: "1.1.1.1", SrcPort: 100, Raw: map[string]any{}}

	log := NewCorrelatedLog(a, b)
	if !log.Timestamp.Equal(ts) {
		t.Errorf("expected timestamp to be equal to both events' timestamp, got %v", log.Timestamp)
	}
}

// TestNewCorrelatedLogFromEvent_WithKeepAlive verifies keepalives field is added when KeepAliveSeq > 0.
func TestNewCorrelatedLogFromEvent_WithKeepAlive(t *testing.T) {
	event := &NormalizedEvent{
		Source:       SourceA,
		Timestamp:    time.Now(),
		SrcIP:        "1.1.1.1",
		SrcPort:      9999,
		KeepAliveSeq: 3,
		Raw:          map[string]any{"method": "GET"},
	}

	log := NewCorrelatedLogFromEvent(event, "A")
	if log.Fields["keepalives"] != 3 {
		t.Errorf("expected keepalives=3, got %v", log.Fields["keepalives"])
	}
}

// TestNewCorrelatedLogFromEvent_NoKeepAlive verifies keepalives field is absent when KeepAliveSeq == 0.
func TestNewCorrelatedLogFromEvent_NoKeepAlive(t *testing.T) {
	event := &NormalizedEvent{
		Source:       SourceA,
		Timestamp:    time.Now(),
		SrcIP:        "1.1.1.1",
		SrcPort:      9999,
		KeepAliveSeq: 0,
		Raw:          map[string]any{"method": "GET"},
	}

	log := NewCorrelatedLogFromEvent(event, "A")
	if _, ok := log.Fields["keepalives"]; ok {
		t.Error("keepalives field should not be present when KeepAliveSeq == 0")
	}
}

// TestMergeFields_NoCollision verifies fields from A and B are merged without conflict.
func TestMergeFields_NoCollision(t *testing.T) {
	a := &NormalizedEvent{Raw: map[string]any{"method": "GET", "path": "/foo"}}
	b := &NormalizedEvent{Raw: map[string]any{"ja4": "abc123", "proto": "TLS"}}

	fields := mergeFields(a, b)
	if fields["method"] != "GET" {
		t.Errorf("expected method=GET, got %v", fields["method"])
	}
	if fields["ja4"] != "abc123" {
		t.Errorf("expected ja4=abc123, got %v", fields["ja4"])
	}
}

// TestMergeFields_SameValueNoPrefix verifies same-value fields are not prefixed.
func TestMergeFields_SameValueNoPrefix(t *testing.T) {
	a := &NormalizedEvent{Raw: map[string]any{"proto": "TCP"}}
	b := &NormalizedEvent{Raw: map[string]any{"proto": "TCP"}}

	fields := mergeFields(a, b)
	if fields["proto"] != "TCP" {
		t.Errorf("expected proto=TCP (no prefix), got %v", fields["proto"])
	}
	if _, ok := fields["a_proto"]; ok {
		t.Error("a_proto should not exist for same-value collision")
	}
	if _, ok := fields["b_proto"]; ok {
		t.Error("b_proto should not exist for same-value collision")
	}
}

// TestMergeFields_DifferentValuePrefix verifies different-value fields get a_/b_ prefix.
func TestMergeFields_DifferentValuePrefix(t *testing.T) {
	a := &NormalizedEvent{Raw: map[string]any{"port": 80}}
	b := &NormalizedEvent{Raw: map[string]any{"port": 443}}

	fields := mergeFields(a, b)
	if fields["a_port"] != 80 {
		t.Errorf("expected a_port=80, got %v", fields["a_port"])
	}
	if fields["b_port"] != 443 {
		t.Errorf("expected b_port=443, got %v", fields["b_port"])
	}
	if _, ok := fields["port"]; ok {
		t.Error("original 'port' key should be removed on collision")
	}
}

// TestCoalesceString_EmptyA tests that when a is empty, b is returned.
func TestCoalesceString_EmptyA(t *testing.T) {
	result := coalesceString("", "fallback")
	if result != "fallback" {
		t.Errorf("expected 'fallback', got %q", result)
	}
}

// TestCoalesceString_NonEmptyA tests that when a is non-empty, a is returned.
func TestCoalesceString_NonEmptyA(t *testing.T) {
	result := coalesceString("primary", "fallback")
	if result != "primary" {
		t.Errorf("expected 'primary', got %q", result)
	}
}

// TestCoalesceInt_ZeroA tests that when a is zero, b is returned.
func TestCoalesceInt_ZeroA(t *testing.T) {
	result := coalesceInt(0, 443)
	if result != 443 {
		t.Errorf("expected 443, got %d", result)
	}
}

// TestCoalesceInt_NonZeroA tests that when a is non-zero, a is returned.
func TestCoalesceInt_NonZeroA(t *testing.T) {
	result := coalesceInt(80, 443)
	if result != 80 {
		t.Errorf("expected 80, got %d", result)
	}
}

// TestMarshalJSON_ReservedKeyProtection verifies reserved keys in Fields are not overwritten.
func TestMarshalJSON_ReservedKeyProtection(t *testing.T) {
	log := CorrelatedLog{
		Timestamp:  time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC),
		SrcIP:      "1.2.3.4",
		SrcPort:    1234,
		Correlated: true,
		Fields: map[string]any{
			"src_ip":     "EVIL_OVERRIDE", // should be ignored
			"correlated": false,           // should be ignored
			"extra":      "value",
		},
	}

	data, err := json.Marshal(log)
	if err != nil {
		t.Fatalf("MarshalJSON failed: %v", err)
	}

	var flat map[string]any
	if err := json.Unmarshal(data, &flat); err != nil {
		t.Fatalf("Unmarshal failed: %v", err)
	}

	if flat["src_ip"] != "1.2.3.4" {
		t.Errorf("reserved key src_ip should not be overwritten, got %v", flat["src_ip"])
	}
	if flat["correlated"] != true {
		t.Errorf("reserved key correlated should not be overwritten, got %v", flat["correlated"])
	}
	if flat["extra"] != "value" {
		t.Errorf("non-reserved key extra should be present, got %v", flat["extra"])
	}
}

// TestMarshalJSON_OptionalFieldsOmittedWhenZero verifies DstIP/DstPort are omitted when zero.
func TestMarshalJSON_OptionalFieldsOmittedWhenZero(t *testing.T) {
	log := CorrelatedLog{
		Timestamp:  time.Now(),
		SrcIP:      "1.2.3.4",
		SrcPort:    1234,
		Correlated: false,
	}

	data, err := json.Marshal(log)
	if err != nil {
		t.Fatalf("MarshalJSON failed: %v", err)
	}

	var flat map[string]any
	if err := json.Unmarshal(data, &flat); err != nil {
		t.Fatalf("Unmarshal failed: %v", err)
	}

	if _, ok := flat["dst_ip"]; ok {
		t.Error("dst_ip should be omitted when empty")
	}
	if _, ok := flat["dst_port"]; ok {
		t.Error("dst_port should be omitted when zero")
	}
	if _, ok := flat["orphan_side"]; ok {
		t.Error("orphan_side should be omitted when empty")
	}
}

// TestExtractFields_Basic verifies extractFields copies Raw fields.
func TestExtractFields_Basic(t *testing.T) {
	e := &NormalizedEvent{
		Raw: map[string]any{"key1": "val1", "key2": 42},
	}
	fields := extractFields(e)
	if fields["key1"] != "val1" {
		t.Errorf("expected key1=val1, got %v", fields["key1"])
	}
	if fields["key2"] != 42 {
		t.Errorf("expected key2=42, got %v", fields["key2"])
	}
}

// TestNewCorrelatedLog_KeepAliveSeq verifies keepalives is set from apache event.
func TestNewCorrelatedLog_KeepAliveSeq(t *testing.T) {
	a := &NormalizedEvent{
		Source: SourceA, Timestamp: time.Now(), SrcIP: "1.1.1.1", SrcPort: 100,
		KeepAliveSeq: 5,
		Raw:          map[string]any{},
	}
	b := &NormalizedEvent{
		Source: SourceB, Timestamp: time.Now(), SrcIP: "1.1.1.1", SrcPort: 100,
		Raw: map[string]any{},
	}

	log := NewCorrelatedLog(a, b)
	if log.Fields["keepalives"] != 5 {
		t.Errorf("expected keepalives=5, got %v", log.Fields["keepalives"])
	}
}
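The collision rule exercised by the TestMergeFields_* cases can be reproduced on plain maps. This is a minimal sketch: `merge` is a hypothetical helper that mirrors the behaviour of mergeFields without the NormalizedEvent wrapper. It also illustrates one consequence of using reflect.DeepEqual on `any` values: the comparison is type-sensitive, so if one side were decoded from JSON (an assumption about the callers, where numbers arrive as float64) and the other held an int, equal-looking numbers would be treated as a collision.

```go
package main

import (
	"fmt"
	"reflect"
)

// merge keeps equal values once and splits differing values into
// a_<key> and b_<key>, dropping the unprefixed key.
func merge(a, b map[string]any) map[string]any {
	out := make(map[string]any, len(a)+len(b))
	for k, v := range a {
		out[k] = v
	}
	for k, v := range b {
		existing, ok := out[k]
		if !ok || reflect.DeepEqual(existing, v) {
			out[k] = v
			continue
		}
		delete(out, k)
		out["a_"+k] = existing
		out["b_"+k] = v
	}
	return out
}

func main() {
	same := merge(map[string]any{"proto": "TCP"}, map[string]any{"proto": "TCP"})
	fmt.Println(len(same), same["proto"]) // 1 TCP

	// int 80 and float64 80 are different dynamic types, so this is a collision.
	mixed := merge(map[string]any{"port": 80}, map[string]any{"port": float64(80)})
	_, stillThere := mixed["port"]
	fmt.Println(len(mixed), stillThere) // 2 false
}
```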
1017
services/correlator/internal/domain/correlation_service.go
Normal file
File diff suppressed because it is too large
1865
services/correlator/internal/domain/correlation_service_test.go
Normal file
File diff suppressed because it is too large
33
services/correlator/internal/domain/event.go
Normal file
@ -0,0 +1,33 @@
package domain

import (
	"strconv"
	"time"
)

// EventSource identifies the source of an event.
type EventSource string

const (
	SourceA EventSource = "A" // Apache/HTTP source
	SourceB EventSource = "B" // Network source
)

// NormalizedEvent represents a unified internal event from either source.
type NormalizedEvent struct {
	Source       EventSource
	Timestamp    time.Time
	SrcIP        string
	SrcPort      int
	DstIP        string
	DstPort      int
	Headers      map[string]string
	Extra        map[string]any
	Raw          map[string]any // Original raw data
	KeepAliveSeq int            // Request sequence number within the Keep-Alive connection (1-based)
}

// CorrelationKey returns the key used for correlation (src_ip + src_port).
func (e *NormalizedEvent) CorrelationKey() string {
	return e.SrcIP + ":" + strconv.Itoa(e.SrcPort)
}
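To illustrate what a string key of the form "ip:port" is useful for, here is a minimal sketch of pairing events from the two sources through a map lookup. The `event` type and the `pair` function are hypothetical; the real matching lives in correlation_service.go, whose diff is suppressed above, and presumably also handles time windows, TTLs and buffer limits, none of which are modelled here.

```go
package main

import (
	"fmt"
	"strconv"
)

// event is a hypothetical, reduced version of NormalizedEvent.
type event struct {
	Source  string // "A" or "B"
	SrcIP   string
	SrcPort int
}

// key builds the same "ip:port" string as CorrelationKey.
func (e event) key() string { return e.SrcIP + ":" + strconv.Itoa(e.SrcPort) }

// pair matches an event with a pending event from the other source that
// shares its key. Each result holds the A event first, then the B event.
func pair(events []event) [][2]event {
	pending := make(map[string]event)
	var pairs [][2]event
	for _, e := range events {
		k := e.key()
		other, ok := pending[k]
		if !ok || other.Source == e.Source {
			pending[k] = e // first sighting, or a newer event from the same side
			continue
		}
		if e.Source == "A" {
			pairs = append(pairs, [2]event{e, other})
		} else {
			pairs = append(pairs, [2]event{other, e})
		}
		delete(pending, k)
	}
	return pairs
}

func main() {
	pairs := pair([]event{
		{Source: "B", SrcIP: "10.0.0.1", SrcPort: 443},
		{Source: "A", SrcIP: "10.0.0.2", SrcPort: 443},
		{Source: "A", SrcIP: "10.0.0.1", SrcPort: 443},
	})
	for _, p := range pairs {
		fmt.Println(p[0].key(), p[0].Source, p[1].Source) // 10.0.0.1:443 A B
	}
}
```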
25
services/correlator/internal/observability/logger.go
Normal file
@ -0,0 +1,25 @@
// Package observability provides structured logging for the correlator service.
// Implementation is delegated to shared/go/ja4common/logger to avoid duplication.
package observability

import jalogger "github.com/antitbone/ja4/ja4common/logger"

// Type aliases: all existing correlator code compiles unchanged.
type Logger = jalogger.Logger
type LogLevel = jalogger.LogLevel

const (
	DEBUG LogLevel = jalogger.DEBUG
	INFO  LogLevel = jalogger.INFO
	WARN  LogLevel = jalogger.WARN
	ERROR LogLevel = jalogger.ERROR
)

// NewLogger creates a new Logger with INFO level.
func NewLogger(prefix string) *Logger { return jalogger.New(prefix) }

// NewLoggerWithLevel creates a new Logger with the specified minimum level.
func NewLoggerWithLevel(prefix, level string) *Logger { return jalogger.NewWithLevel(prefix, level) }

// ParseLogLevel converts a string to LogLevel.
func ParseLogLevel(level string) LogLevel { return jalogger.ParseLogLevel(level) }
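The reason the aliases in logger.go keep existing call sites compiling is that a Go type alias (`type T = U`) introduces a second name for the same type, method set included, whereas a defined type (`type T U`) would create a new type without U's methods. A minimal single-file sketch, where `baseLogger` and `newBase` are hypothetical stand-ins for the shared ja4common logger and its constructor:

```go
package main

import "fmt"

// baseLogger stands in for the shared logger type (hypothetical).
type baseLogger struct{ prefix string }

// Tag is a method on the shared type; the alias below exposes it unchanged.
func (l *baseLogger) Tag() string { return "[" + l.prefix + "]" }

// newBase plays the role of the shared constructor.
func newBase(prefix string) *baseLogger { return &baseLogger{prefix: prefix} }

// Logger is an alias: the same type under a second name.
type Logger = baseLogger

// NewLogger can return the shared constructor's result directly, because
// *Logger and *baseLogger are identical types.
func NewLogger(prefix string) *Logger { return newBase(prefix) }

func main() {
	l := NewLogger("correlator")
	var b *baseLogger = l // no conversion needed
	fmt.Println(l.Tag(), b == l) // [correlator] true
}
```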
296
services/correlator/internal/observability/logger_test.go
Normal file
@ -0,0 +1,296 @@
// Package observability tests: behavioral tests for the Logger type alias.
// Since Logger = jalogger.Logger, we test the observable API only.
package observability_test

import (
	"testing"

	"github.com/antitbone/ja4/correlator/internal/observability"
)

func TestNewLogger_NonNil(t *testing.T) {
	logger := observability.NewLogger("test")
	if logger == nil {
		t.Fatal("expected non-nil logger")
	}
}

func TestLogger_DefaultLevel_IsInfo(t *testing.T) {
	logger := observability.NewLogger("test")
	if !logger.ShouldLog(observability.INFO) {
		t.Error("INFO should be enabled by default")
	}
	if logger.ShouldLog(observability.DEBUG) {
		t.Error("DEBUG should be disabled by default")
	}
}

func TestLogger_Info_NoPanic(t *testing.T) {
	logger := observability.NewLoggerWithLevel("test", "INFO")
	if !logger.ShouldLog(observability.INFO) {
		t.Error("INFO should be enabled")
	}
	logger.Info("test message")
}

func TestLogger_Error_NoPanic(t *testing.T) {
	logger := observability.NewLoggerWithLevel("test", "ERROR")
	if !logger.ShouldLog(observability.ERROR) {
		t.Error("ERROR should be enabled")
	}
	logger.Error("error message", nil)
}

func TestLogger_Debug_NoPanic(t *testing.T) {
	logger := observability.NewLogger("test")
	logger.SetLevel("DEBUG")
	if !logger.ShouldLog(observability.DEBUG) {
		t.Error("DEBUG should be enabled after SetLevel(DEBUG)")
	}
	logger.Debug("test message")
}

func TestLogger_SetLevel(t *testing.T) {
	logger := observability.NewLogger("test")

	logger.SetLevel("DEBUG")
	if !logger.ShouldLog(observability.DEBUG) {
		t.Error("DEBUG should be enabled after SetLevel(DEBUG)")
	}

	logger.SetLevel("INFO")
	if logger.ShouldLog(observability.DEBUG) {
		t.Error("DEBUG should be disabled after SetLevel(INFO)")
	}

	logger.SetLevel("WARN")
	if logger.ShouldLog(observability.INFO) {
		t.Error("INFO should be disabled after SetLevel(WARN)")
	}
	if !logger.ShouldLog(observability.WARN) {
		t.Error("WARN should be enabled after SetLevel(WARN)")
	}

	logger.SetLevel("ERROR")
	if logger.ShouldLog(observability.WARN) {
		t.Error("WARN should be disabled after SetLevel(ERROR)")
	}
	if !logger.ShouldLog(observability.ERROR) {
		t.Error("ERROR should be enabled after SetLevel(ERROR)")
	}
}

func TestParseLogLevel(t *testing.T) {
	cases := []struct {
		input    string
		expected observability.LogLevel
	}{
		{"DEBUG", observability.DEBUG},
		{"debug", observability.DEBUG},
		{"INFO", observability.INFO},
		{"info", observability.INFO},
		{"WARN", observability.WARN},
		{"warn", observability.WARN},
		{"WARNING", observability.WARN},
		{"ERROR", observability.ERROR},
		{"error", observability.ERROR},
		{"", observability.INFO},
		{"invalid", observability.INFO},
	}
	for _, tt := range cases {
		t.Run(tt.input, func(t *testing.T) {
			result := observability.ParseLogLevel(tt.input)
			if result != tt.expected {
				t.Errorf("ParseLogLevel(%q) = %v, want %v", tt.input, result, tt.expected)
			}
		})
	}
}

func TestLogger_WithFields_NoPanic(t *testing.T) {
	logger := observability.NewLogger("test")
	child := logger.WithFields(map[string]any{"key1": "value1", "key2": 42})
	if child == logger {
		t.Error("expected different logger instance")
	}
	child.Info("message with fields")
}

func TestLogLevel_String(t *testing.T) {
	cases := []struct {
		level    observability.LogLevel
		expected string
	}{
		{observability.DEBUG, "DEBUG"},
		{observability.INFO, "INFO"},
		{observability.WARN, "WARN"},
		{observability.ERROR, "ERROR"},
	}
	for _, tt := range cases {
		t.Run(tt.expected, func(t *testing.T) {
			if got := tt.level.String(); got != tt.expected {
				t.Errorf("LogLevel(%d).String() = %q, want %q", tt.level, got, tt.expected)
			}
		})
	}
}

func TestLogger_Warn_NoPanic(t *testing.T) {
	logger := observability.NewLoggerWithLevel("test", "WARN")
	if !logger.ShouldLog(observability.WARN) {
		t.Error("WARN should be enabled")
	}
	logger.Warn("warning message")
}

func TestLogger_Formatted_NoPanic(t *testing.T) {
	logger := observability.NewLoggerWithLevel("test", "DEBUG")
	logger.Warnf("formatted %s %d", "message", 42)
	logger.Infof("formatted %s %d", "message", 42)
	logger.Debugf("formatted %s %d", "message", 42)
}

func TestLogger_Error_WithError(t *testing.T) {
	logger := observability.NewLoggerWithLevel("test", "ERROR")
	logger.Error("error occurred", &testErr{"test error"})
}

func TestLogger_ShouldLog_Concurrent(t *testing.T) {
	logger := observability.NewLoggerWithLevel("test", "DEBUG")
	done := make(chan bool)
	for i := 0; i < 10; i++ {
		go func() {
			_ = logger.ShouldLog(observability.DEBUG)
			done <- true
		}()
	}
	for i := 0; i < 10; i++ {
		<-done
	}
}

func TestLogger_Log_Concurrent(t *testing.T) {
	logger := observability.NewLoggerWithLevel("test", "DEBUG")
	done := make(chan bool)
	for i := 0; i < 10; i++ {
		go func(n int) {
			logger.Debugf("message %d", n)
			done <- true
		}(i)
	}
	for i := 0; i < 10; i++ {
		<-done
	}
}

func TestLogger_WithFields_Concurrent(t *testing.T) {
	logger := observability.NewLogger("test")
	done := make(chan bool)
	for i := 0; i < 10; i++ {
		go func(n int) {
			_ = logger.WithFields(map[string]any{"key": n})
			done <- true
		}(i)
	}
	for i := 0; i < 10; i++ {
		<-done
	}
}

func TestLogger_SetLevel_Concurrent(t *testing.T) {
	logger := observability.NewLogger("test")
	done := make(chan bool)
	for i := 0; i < 10; i++ {
		go func() {
			logger.SetLevel("DEBUG")
			logger.SetLevel("INFO")
			done <- true
		}()
	}
	for i := 0; i < 10; i++ {
		<-done
	}
}

type testErr struct{ msg string }

func (e *testErr) Error() string { return e.msg }

func TestNewLoggerWithLevel_AllLevels(t *testing.T) {
	levels := []string{"DEBUG", "INFO", "WARN", "WARNING", "ERROR", "invalid", ""}
	for _, level := range levels {
		t.Run(level, func(t *testing.T) {
			logger := observability.NewLoggerWithLevel("test", level)
			if logger == nil {
				t.Errorf("NewLoggerWithLevel(%q) returned nil", level)
			}
		})
	}
}

func TestLogLevel_Constants(t *testing.T) {
	if observability.DEBUG >= observability.INFO {
		t.Error("DEBUG should be less than INFO")
	}
	if observability.INFO >= observability.WARN {
		t.Error("INFO should be less than WARN")
	}
	if observability.WARN >= observability.ERROR {
		t.Error("WARN should be less than ERROR")
	}
}

func TestLogger_ShouldLog_AllLevels(t *testing.T) {
	cases := []struct {
		minLevel string
		level    observability.LogLevel
		want     bool
	}{
		{"DEBUG", observability.DEBUG, true},
		{"DEBUG", observability.INFO, true},
		{"DEBUG", observability.WARN, true},
		{"DEBUG", observability.ERROR, true},
		{"INFO", observability.DEBUG, false},
		{"INFO", observability.INFO, true},
		{"INFO", observability.WARN, true},
		{"WARN", observability.INFO, false},
		{"WARN", observability.WARN, true},
		{"WARN", observability.ERROR, true},
		{"ERROR", observability.WARN, false},
		{"ERROR", observability.ERROR, true},
	}

	for _, tc := range cases {
		t.Run(tc.minLevel+"_"+tc.level.String(), func(t *testing.T) {
			logger := observability.NewLoggerWithLevel("test", tc.minLevel)
			got := logger.ShouldLog(tc.level)
			if got != tc.want {
				t.Errorf("ShouldLog(%v) with min=%s: expected %v, got %v",
					tc.level, tc.minLevel, tc.want, got)
			}
		})
	}
}

func TestParseLogLevel_WarningAlias(t *testing.T) {
	got := observability.ParseLogLevel("WARNING")
	if got != observability.WARN {
		t.Errorf("ParseLogLevel(WARNING) = %v, want WARN", got)
	}
}

func TestLogger_Errorf_NoPanic(t *testing.T) {
	logger := observability.NewLoggerWithLevel("test", "DEBUG")
	// Errorf is not defined in the interface, but Warnf/Infof/Debugf are tested
	// Just ensure Error with a formatted message doesn't panic
	logger.Error("formatted error", &testErr{"err detail"})
}

func TestNewLogger_PrefixIsUsed(t *testing.T) {
	logger := observability.NewLogger("my-prefix")
	if logger == nil {
		t.Fatal("expected non-nil logger")
	}
	// The logger should be usable
	logger.Infof("hello from %s", "my-prefix")
}
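The *_Concurrent tests in logger_test.go make no assertions of their own; they are only meaningful when run under the race detector (`go test -race`), which reports unsynchronized access. A minimal sketch of the same pattern using sync.WaitGroup instead of a done channel follows. The `levelGate` type is a hypothetical stand-in for the logger's level check, not the ja4common implementation, and it assumes the level is stored in an atomic.Int32.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// levelGate is a hypothetical stand-in for a logger's minimum-level check.
type levelGate struct{ min atomic.Int32 }

func (g *levelGate) SetLevel(l int32)       { g.min.Store(l) }
func (g *levelGate) ShouldLog(l int32) bool { return l >= g.min.Load() }

// hammer starts n goroutines that write and read the level concurrently
// and returns how many of them ran to completion.
func hammer(g *levelGate, n int) int32 {
	var wg sync.WaitGroup
	var finished atomic.Int32
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			g.SetLevel(0)
			g.SetLevel(1)
			_ = g.ShouldLog(1)
			finished.Add(1)
		}()
	}
	wg.Wait() // blocks until every goroutine has called Done
	return finished.Load()
}

func main() {
	g := &levelGate{}
	fmt.Println(hammer(g, 10), g.ShouldLog(1)) // 10 true
}
```

Replacing the atomic field with a plain int32 would still pass a normal run most of the time, but would be flagged by the race detector, which is the property the original tests rely on.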
176
services/correlator/internal/observability/metrics.go
Normal file
@ -0,0 +1,176 @@
|
||||
package observability
|
||||
|
||||
import (
|
||||
"encoding/json"
|
||||
"fmt"
|
||||
"strings"
|
||||
"sync"
|
||||
"sync/atomic"
|
||||
)
|
||||
|
||||
// CorrelationMetrics tracks correlation statistics for debugging and monitoring.
|
||||
type CorrelationMetrics struct {
|
||||
mu sync.RWMutex
|
||||
|
||||
// Events received
|
||||
eventsReceivedA atomic.Int64
|
||||
eventsReceivedB atomic.Int64
|
||||
|
||||
// Correlation results
|
||||
correlationsSuccess atomic.Int64
|
||||
correlationsFailed atomic.Int64
|
||||
|
||||
// Failure reasons
|
||||
failedNoMatchKey atomic.Int64 // No event with same key in buffer
|
||||
failedTimeWindow atomic.Int64 // Key found but outside time window
|
||||
failedBufferEviction atomic.Int64 // Event evicted due to buffer full
|
||||
failedTTLExpired atomic.Int64 // B event TTL expired before match
|
||||
failedIPExcluded atomic.Int64 // Event excluded by IP filter
|
||||
|
||||
// Buffer stats
|
||||
bufferASize atomic.Int64
|
||||
bufferBSize atomic.Int64
|
||||
|
||||
// Orphan stats
|
||||
orphansEmittedA atomic.Int64
|
||||
orphansEmittedB atomic.Int64
|
||||
orphansPendingA atomic.Int64
|
||||
pendingOrphanMatch atomic.Int64 // B matched with pending orphan A
|
||||
|
||||
// Keep-Alive stats
|
||||
keepAliveResets atomic.Int64 // Number of TTL resets (one-to-many mode)
|
||||
}
|
||||
|
||||
// NewCorrelationMetrics creates a new metrics tracker.
|
||||
func NewCorrelationMetrics() *CorrelationMetrics {
|
||||
return &CorrelationMetrics{}
|
||||
}
|
||||
|
||||
// RecordEventReceived records an event received from a source.
|
||||
func (m *CorrelationMetrics) RecordEventReceived(source string) {
|
||||
if source == "A" {
|
||||
m.eventsReceivedA.Add(1)
|
||||
} else if source == "B" {
|
||||
m.eventsReceivedB.Add(1)
|
||||
}
|
||||
}
|
||||
|
||||
// RecordCorrelationSuccess records a successful correlation.
|
||||
func (m *CorrelationMetrics) RecordCorrelationSuccess() {
|
||||
m.correlationsSuccess.Add(1)
|
||||
}
|
||||
|
||||
// RecordCorrelationFailed records a failed correlation attempt with the reason.
|
||||
func (m *CorrelationMetrics) RecordCorrelationFailed(reason string) {
|
||||
m.correlationsFailed.Add(1)
|
||||
switch reason {
|
||||
case "no_match_key":
|
||||
m.failedNoMatchKey.Add(1)
|
||||
case "time_window":
|
||||
m.failedTimeWindow.Add(1)
|
||||
case "buffer_eviction":
|
||||
m.failedBufferEviction.Add(1)
|
||||
case "ttl_expired":
|
||||
m.failedTTLExpired.Add(1)
|
||||
case "ip_excluded":
|
||||
m.failedIPExcluded.Add(1)
|
||||
}
|
||||
}
|
||||
|
||||
// RecordBufferEviction records an event evicted from buffer.
|
||||
func (m *CorrelationMetrics) RecordBufferEviction(source string) {
|
||||
// Currently a no-op; kept as a hook for additional per-source tracking if needed.
|
||||
}
|
||||
|
||||
// RecordOrphanEmitted records an orphan event emitted.
|
||||
func (m *CorrelationMetrics) RecordOrphanEmitted(source string) {
|
||||
if source == "A" {
|
||||
m.orphansEmittedA.Add(1)
|
||||
} else if source == "B" {
|
||||
m.orphansEmittedB.Add(1)
|
||||
}
|
||||
}
|
||||
|
||||
// RecordPendingOrphan records an A event added to pending orphans.
|
||||
func (m *CorrelationMetrics) RecordPendingOrphan() {
|
||||
m.orphansPendingA.Add(1)
|
||||
}
|
||||
|
||||
// RecordPendingOrphanMatch records a B event matching a pending orphan A.
|
||||
func (m *CorrelationMetrics) RecordPendingOrphanMatch() {
|
||||
m.pendingOrphanMatch.Add(1)
|
||||
}
|
||||
|
||||
// RecordKeepAliveReset records a TTL reset for Keep-Alive.
|
||||
func (m *CorrelationMetrics) RecordKeepAliveReset() {
|
||||
m.keepAliveResets.Add(1)
|
||||
}
|
||||
|
||||
// UpdateBufferSizes updates the current buffer sizes.
|
||||
func (m *CorrelationMetrics) UpdateBufferSizes(sizeA, sizeB int64) {
|
||||
m.bufferASize.Store(sizeA)
|
||||
m.bufferBSize.Store(sizeB)
|
||||
}
|
||||
|
||||
// Snapshot returns a point-in-time snapshot of all metrics.
|
||||
func (m *CorrelationMetrics) Snapshot() MetricsSnapshot {
|
||||
return MetricsSnapshot{
|
||||
EventsReceivedA: m.eventsReceivedA.Load(),
|
||||
EventsReceivedB: m.eventsReceivedB.Load(),
|
||||
CorrelationsSuccess: m.correlationsSuccess.Load(),
|
||||
CorrelationsFailed: m.correlationsFailed.Load(),
|
||||
FailedNoMatchKey: m.failedNoMatchKey.Load(),
|
||||
FailedTimeWindow: m.failedTimeWindow.Load(),
|
||||
FailedBufferEviction: m.failedBufferEviction.Load(),
|
||||
FailedTTLExpired: m.failedTTLExpired.Load(),
|
||||
FailedIPExcluded: m.failedIPExcluded.Load(),
|
||||
BufferASize: m.bufferASize.Load(),
|
||||
BufferBSize: m.bufferBSize.Load(),
|
||||
OrphansEmittedA: m.orphansEmittedA.Load(),
|
||||
OrphansEmittedB: m.orphansEmittedB.Load(),
|
||||
OrphansPendingA: m.orphansPendingA.Load(),
|
||||
PendingOrphanMatch: m.pendingOrphanMatch.Load(),
|
||||
KeepAliveResets: m.keepAliveResets.Load(),
|
||||
}
|
||||
}
|
||||
|
||||
// MetricsSnapshot is a point-in-time snapshot of metrics.
|
||||
type MetricsSnapshot struct {
|
||||
EventsReceivedA int64 `json:"events_received_a"`
|
||||
EventsReceivedB int64 `json:"events_received_b"`
|
||||
CorrelationsSuccess int64 `json:"correlations_success"`
|
||||
CorrelationsFailed int64 `json:"correlations_failed"`
|
||||
FailedNoMatchKey int64 `json:"failed_no_match_key"`
|
||||
FailedTimeWindow int64 `json:"failed_time_window"`
|
||||
FailedBufferEviction int64 `json:"failed_buffer_eviction"`
|
||||
FailedTTLExpired int64 `json:"failed_ttl_expired"`
|
||||
FailedIPExcluded int64 `json:"failed_ip_excluded"`
|
||||
BufferASize int64 `json:"buffer_a_size"`
|
||||
BufferBSize int64 `json:"buffer_b_size"`
|
||||
OrphansEmittedA int64 `json:"orphans_emitted_a"`
|
||||
OrphansEmittedB int64 `json:"orphans_emitted_b"`
|
||||
OrphansPendingA int64 `json:"orphans_pending_a"`
|
||||
PendingOrphanMatch int64 `json:"pending_orphan_match"`
|
||||
KeepAliveResets int64 `json:"keepalive_resets"`
|
||||
}
|
||||
|
||||
// MarshalJSON implements json.Marshaler.
|
||||
func (m *CorrelationMetrics) MarshalJSON() ([]byte, error) {
|
||||
return json.Marshal(m.Snapshot())
|
||||
}
|
||||
|
||||
// String returns a human-readable string of metrics.
|
||||
func (m *CorrelationMetrics) String() string {
|
||||
s := m.Snapshot()
|
||||
var b strings.Builder
|
||||
b.WriteString("Correlation Metrics:\n")
|
||||
fmt.Fprintf(&b, " Events Received: A=%d B=%d Total=%d\n", s.EventsReceivedA, s.EventsReceivedB, s.EventsReceivedA+s.EventsReceivedB)
|
||||
fmt.Fprintf(&b, " Correlations: Success=%d Failed=%d\n", s.CorrelationsSuccess, s.CorrelationsFailed)
|
||||
fmt.Fprintf(&b, " Failure Reasons: no_match_key=%d time_window=%d buffer_eviction=%d ttl_expired=%d ip_excluded=%d\n",
|
||||
s.FailedNoMatchKey, s.FailedTimeWindow, s.FailedBufferEviction, s.FailedTTLExpired, s.FailedIPExcluded)
|
||||
fmt.Fprintf(&b, " Buffer Sizes: A=%d B=%d\n", s.BufferASize, s.BufferBSize)
|
||||
fmt.Fprintf(&b, " Orphans: Emitted A=%d B=%d Pending A=%d\n", s.OrphansEmittedA, s.OrphansEmittedB, s.OrphansPendingA)
|
||||
fmt.Fprintf(&b, " Pending Orphan Match: %d\n", s.PendingOrphanMatch)
|
||||
fmt.Fprintf(&b, " Keep-Alive Resets: %d\n", s.KeepAliveResets)
|
||||
return b.String()
|
||||
}
|
||||
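The recorder methods above rely on lock-free counters that Snapshot copies into a plain struct. The sketch below illustrates that pattern in isolation. It assumes the counter fields are `atomic.Int64` (the struct definition is outside this excerpt), and the names `counters`, `countersSnapshot` and `countConcurrently` are hypothetical stand-ins, not part of the correlator.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
	"sync/atomic"
)

// counters is a hypothetical, reduced stand-in for CorrelationMetrics.
// Assumption: the real fields are atomic.Int64 values.
type counters struct {
	receivedA atomic.Int64
	receivedB atomic.Int64
}

// countersSnapshot is a plain copy that is safe to marshal or print.
type countersSnapshot struct {
	ReceivedA int64 `json:"events_received_a"`
	ReceivedB int64 `json:"events_received_b"`
}

// record increments the counter for a known source and ignores any other value.
func (c *counters) record(source string) {
	switch source {
	case "A":
		c.receivedA.Add(1)
	case "B":
		c.receivedB.Add(1)
	}
}

// snapshot loads each counter atomically. The fields are read one by one,
// so the result is not a single atomic view across all counters.
func (c *counters) snapshot() countersSnapshot {
	return countersSnapshot{
		ReceivedA: c.receivedA.Load(),
		ReceivedB: c.receivedB.Load(),
	}
}

// countConcurrently records n events per source from separate goroutines.
// Source "C" is unknown and must not affect either counter.
func countConcurrently(n int) countersSnapshot {
	var c counters
	var wg sync.WaitGroup
	for _, src := range []string{"A", "B", "C"} {
		wg.Add(1)
		go func(s string) {
			defer wg.Done()
			for i := 0; i < n; i++ {
				c.record(s)
			}
		}(src)
	}
	wg.Wait()
	return c.snapshot()
}

func main() {
	out, err := json.Marshal(countConcurrently(1000))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Because each field is loaded separately, a snapshot taken while events are arriving can show counters from slightly different instants, which is usually acceptable for monitoring.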
128
services/correlator/internal/observability/metrics_server.go
Normal file
@ -0,0 +1,128 @@
package observability

import (
	"context"
	"encoding/json"
	"fmt"
	"net"
	"net/http"
	"sync"
	"time"
)

// MetricsServer exposes correlation metrics via HTTP.
type MetricsServer struct {
	mu          sync.Mutex
	server      *http.Server
	listener    net.Listener
	metricsFunc func() MetricsSnapshot
	running     bool
}

// NewMetricsServer creates a new metrics HTTP server.
func NewMetricsServer(addr string, metricsFunc func() MetricsSnapshot) (*MetricsServer, error) {
	if metricsFunc == nil {
		return nil, fmt.Errorf("metricsFunc cannot be nil")
	}

	ms := &MetricsServer{
		metricsFunc: metricsFunc,
	}

	mux := http.NewServeMux()
	mux.HandleFunc("/metrics", ms.handleMetrics)
	mux.HandleFunc("/health", ms.handleHealth)

	ms.server = &http.Server{
		Addr:         addr,
		Handler:      mux,
		ReadTimeout:  5 * time.Second,
		WriteTimeout: 10 * time.Second,
	}

	return ms, nil
}

// Start begins listening on the configured address.
func (ms *MetricsServer) Start() error {
	ms.mu.Lock()
	defer ms.mu.Unlock()

	if ms.running {
		return nil
	}

	listener, err := net.Listen("tcp", ms.server.Addr)
	if err != nil {
		return fmt.Errorf("failed to start metrics server: %w", err)
	}

	ms.listener = listener
	ms.running = true

	go func() {
		if err := ms.server.Serve(listener); err != nil && err != http.ErrServerClosed {
			// Unexpected serve error (not a graceful shutdown); currently ignored.
		}
	}()

	return nil
}

// Stop gracefully stops the metrics server.
func (ms *MetricsServer) Stop(ctx context.Context) error {
	ms.mu.Lock()
	defer ms.mu.Unlock()

	if !ms.running {
		return nil
	}

	ms.running = false
	return ms.server.Shutdown(ctx)
}

// handleMetrics returns the correlation metrics as JSON.
func (ms *MetricsServer) handleMetrics(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodGet {
		http.Error(w, "Method not allowed", http.StatusMethodNotAllowed)
		return
	}

	metrics := ms.metricsFunc()

	w.Header().Set("Content-Type", "application/json")
	if err := json.NewEncoder(w).Encode(metrics); err != nil {
		http.Error(w, "Failed to encode metrics", http.StatusInternalServerError)
		return
	}
}

// handleHealth returns a simple health check response.
func (ms *MetricsServer) handleHealth(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodGet {
		http.Error(w, "Method not allowed", http.StatusMethodNotAllowed)
		return
	}

	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(http.StatusOK)
	fmt.Fprintf(w, `{"status":"healthy"}`)
}

// IsRunning returns true if the server is running.
func (ms *MetricsServer) IsRunning() bool {
	ms.mu.Lock()
	defer ms.mu.Unlock()
	return ms.running
}

// Addr returns the listening address.
func (ms *MetricsServer) Addr() string {
	ms.mu.Lock()
	defer ms.mu.Unlock()
	if ms.listener == nil {
		return ""
	}
	return ms.listener.Addr().String()
}
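The /metrics handler above can be exercised without opening a socket. The sketch below shows one way to do that with `net/http/httptest`. The `snapshot` type, `newMux` and `fetch` are hypothetical, reduced stand-ins written for this example; only the handler logic mirrors the file above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// snapshot is a hypothetical, reduced stand-in for MetricsSnapshot.
type snapshot struct {
	EventsReceivedA int64 `json:"events_received_a"`
}

// newMux wires a /metrics handler that mirrors handleMetrics from the excerpt:
// GET only, JSON body produced by the injected metricsFunc.
func newMux(metricsFunc func() snapshot) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodGet {
			http.Error(w, "Method not allowed", http.StatusMethodNotAllowed)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		if err := json.NewEncoder(w).Encode(metricsFunc()); err != nil {
			http.Error(w, "Failed to encode metrics", http.StatusInternalServerError)
		}
	})
	return mux
}

// fetch sends one in-process request to the mux and returns status and body.
func fetch(method, path string) (int, string) {
	mux := newMux(func() snapshot { return snapshot{EventsReceivedA: 3} })
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest(method, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := fetch(http.MethodGet, "/metrics")
	fmt.Println(code, body)
}
```

Injecting the metrics function, as NewMetricsServer does, is what keeps the handler testable: the test supplies a fixed snapshot instead of a live correlation service.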
57
services/correlator/internal/ports/source.go
Normal file
@ -0,0 +1,57 @@
package ports

import (
	"context"

	"github.com/antitbone/ja4/correlator/internal/domain"
)

// EventSource defines the interface for log sources.
type EventSource interface {
	// Start begins reading events and sending them to the channel.
	// Returns an error if the source cannot be started.
	Start(ctx context.Context, eventChan chan<- *domain.NormalizedEvent) error

	// Stop gracefully stops the source.
	Stop() error

	// Name returns the source name.
	Name() string
}

// CorrelatedLogSink defines the interface for correlated log destinations.
type CorrelatedLogSink interface {
	// Write sends a correlated log to the sink.
	Write(ctx context.Context, log domain.CorrelatedLog) error

	// Flush flushes any buffered logs.
	Flush(ctx context.Context) error

	// Close closes the sink.
	Close() error

	// Name returns the sink name.
	Name() string

	// Reopen closes and reopens the sink (for log rotation on SIGHUP).
	// Only FileSink has a file to reopen; other sinks still need the
	// method to satisfy this interface.
	Reopen() error
}

// CorrelationProcessor defines the interface for the correlation service.
// This allows for easier testing and alternative implementations.
type CorrelationProcessor interface {
	// ProcessEvent processes an incoming event and returns correlated logs.
	ProcessEvent(event *domain.NormalizedEvent) []domain.CorrelatedLog

	// Flush forces emission of remaining buffered events.
	Flush() []domain.CorrelatedLog

	// EmitPendingOrphans emits orphan A events whose delay has expired.
	// Called periodically by the Orchestrator ticker so orphans are not blocked
	// waiting for the next incoming event.
	EmitPendingOrphans() []domain.CorrelatedLog

	// GetBufferSizes returns the current buffer sizes for monitoring.
	GetBufferSizes() (int, int)
}
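Since Reopen is part of the CorrelatedLogSink method set, a sink with nothing to reopen still has to provide it. The sketch below shows a minimal in-memory sink with a no-op Reopen, which can be handy in tests. `correlatedLog`, `sink`, `memorySink` and `writeAll` are hypothetical names for this example, and `correlatedLog` only stands in for `domain.CorrelatedLog`, whose fields are not shown in this excerpt.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// correlatedLog is a hypothetical stand-in for domain.CorrelatedLog.
type correlatedLog struct {
	Key string
}

// sink mirrors the method set of CorrelatedLogSink from the excerpt.
type sink interface {
	Write(ctx context.Context, log correlatedLog) error
	Flush(ctx context.Context) error
	Close() error
	Name() string
	Reopen() error
}

// memorySink keeps logs in memory; it has no file handle to reopen.
type memorySink struct {
	mu   sync.Mutex
	logs []correlatedLog
}

// Compile-time check that memorySink satisfies the interface.
var _ sink = (*memorySink)(nil)

func (s *memorySink) Write(_ context.Context, log correlatedLog) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.logs = append(s.logs, log)
	return nil
}

func (s *memorySink) Flush(context.Context) error { return nil }
func (s *memorySink) Close() error                { return nil }
func (s *memorySink) Name() string                { return "memory" }

// Reopen is a no-op: there is nothing to rotate for an in-memory sink.
func (s *memorySink) Reopen() error { return nil }

// Len returns the number of logs written so far.
func (s *memorySink) Len() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.logs)
}

// writeAll writes one log per key and then flushes the sink.
func writeAll(s sink, keys ...string) error {
	ctx := context.Background()
	for _, k := range keys {
		if err := s.Write(ctx, correlatedLog{Key: k}); err != nil {
			return err
		}
	}
	return s.Flush(ctx)
}

func main() {
	s := &memorySink{}
	if err := writeAll(s, "10.0.0.1:40000", "10.0.0.2:40001"); err != nil {
		panic(err)
	}
	fmt.Println(s.Name(), s.Len())
}
```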
34
services/correlator/logcorrelator.service
Normal file
@ -0,0 +1,34 @@
[Unit]
Description=logcorrelator service
After=network.target

[Service]
Type=simple
User=logcorrelator
Group=logcorrelator
ExecStart=/usr/bin/logcorrelator -config /etc/logcorrelator/logcorrelator.yml
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
RestartSec=5

# Runtime directory: systemd creates /run/logcorrelator (= /var/run/logcorrelator)
# with the correct owner (logcorrelator:logcorrelator) on every start/restart,
# which keeps the sockets from ending up owned by root:root after a reboot (tmpfs wiped).
RuntimeDirectory=logcorrelator
RuntimeDirectoryMode=0755

# Security hardening
NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/var/log/logcorrelator /etc/logcorrelator

# Resource limits
LimitNOFILE=65536

# Systemd timeouts
TimeoutStartSec=10
TimeoutStopSec=30

[Install]
WantedBy=multi-user.target
383
services/correlator/packaging/rpm/logcorrelator.spec
Normal file
@ -0,0 +1,383 @@
# logcorrelator RPM spec file
# Compatible with CentOS 7, Rocky Linux 8, 9, 10
# Built with rpmbuild (not FPM)

Name: logcorrelator
Version: %{version}
Release: 1%{?dist}
Summary: Log correlation service for HTTP and network events

License: MIT
URL: https://github.com/logcorrelator/logcorrelator
Vendor: logcorrelator <dev@example.com>
Packager: logcorrelator <dev@example.com>

BuildArch: x86_64

# Dependencies
Requires: systemd
Requires(post): systemd
Requires(preun): systemd
Requires(postun): systemd

%description
logcorrelator is a system service written in Go that receives two JSON log streams
over Unix sockets, correlates application-level HTTP events with network
events, and writes correlated logs in real time to ClickHouse and/or a local file.

Security notes:
- The service runs as the logcorrelator user (non-root)
- Unix sockets are created with 0666 permissions (world read/write)
- Critical directories are protected: /var/log (750), /var/lib (750), /etc (750)
- /var/run/logcorrelator is 755 to allow socket creation

%prep
# Files are already in BUILD directory (copied by build-rpm.sh)
# No extraction needed
echo "Files available in BUILD directory:"
ls -la %{_builddir}/

%install
# Create directory structure in buildroot
mkdir -p %{buildroot}/usr/bin
mkdir -p %{buildroot}/etc/logcorrelator
mkdir -p %{buildroot}/var/log/logcorrelator
mkdir -p %{buildroot}/var/run/logcorrelator
mkdir -p %{buildroot}/var/lib/logcorrelator
mkdir -p %{buildroot}/etc/systemd/system
mkdir -p %{buildroot}/etc/logrotate.d
mkdir -p %{buildroot}/usr/lib/tmpfiles.d

# Install binary (from BUILD directory)
install -m 0755 %{_builddir}/usr/bin/logcorrelator %{buildroot}/usr/bin/logcorrelator

# Install config files
install -m 0640 %{_builddir}/etc/logcorrelator/logcorrelator.yml %{buildroot}/etc/logcorrelator/logcorrelator.yml
install -m 0640 %{_builddir}/etc/logcorrelator/logcorrelator.yml.example %{buildroot}/etc/logcorrelator/logcorrelator.yml.example

# Install systemd service
install -m 0644 %{_builddir}/etc/systemd/system/logcorrelator.service %{buildroot}/etc/systemd/system/logcorrelator.service

# Install logrotate config
install -m 0644 %{_builddir}/etc/logrotate.d/logcorrelator %{buildroot}/etc/logrotate.d/logcorrelator

%post
# Create logcorrelator user and group
if ! getent group logcorrelator >/dev/null 2>&1; then
    groupadd --system logcorrelator
fi

if ! getent passwd logcorrelator >/dev/null 2>&1; then
    useradd --system \
        --gid logcorrelator \
        --home-dir /var/lib/logcorrelator \
        --no-create-home \
        --shell /usr/sbin/nologin \
        logcorrelator
fi

# Create directories
mkdir -p /var/lib/logcorrelator
mkdir -p /var/log/logcorrelator
# Note: /var/run/logcorrelator is managed by RuntimeDirectory= (systemd) and tmpfiles.d

# Set ownership
chown -R logcorrelator:logcorrelator /var/lib/logcorrelator
chown -R logcorrelator:logcorrelator /var/log/logcorrelator
chown -R logcorrelator:logcorrelator /etc/logcorrelator

# Set permissions
chmod 750 /var/lib/logcorrelator
chmod 750 /var/log/logcorrelator
chmod 750 /etc/logcorrelator

# Copy default config if not exists
if [ ! -f /etc/logcorrelator/logcorrelator.yml ]; then
    cp /etc/logcorrelator/logcorrelator.yml.example /etc/logcorrelator/logcorrelator.yml
    chown logcorrelator:logcorrelator /etc/logcorrelator/logcorrelator.yml
    chmod 640 /etc/logcorrelator/logcorrelator.yml
fi

# Reload systemd and start service
if [ -x /bin/systemctl ]; then
    systemctl daemon-reload
    systemctl enable logcorrelator.service
    systemctl start logcorrelator.service
fi

exit 0

%preun
if [ $1 -eq 0 ]; then
    # Package removal, not upgrade
    if [ -x /bin/systemctl ]; then
        systemctl stop logcorrelator.service
        systemctl disable logcorrelator.service
    fi
fi

exit 0

%postun
if [ -x /bin/systemctl ]; then
    systemctl daemon-reload
    if [ $1 -ge 1 ]; then
        # Package upgrade, restart service
        systemctl try-restart logcorrelator.service
    fi
fi

exit 0

%files
/usr/bin/logcorrelator
%config(noreplace) /etc/logcorrelator/logcorrelator.yml
/etc/logcorrelator/logcorrelator.yml.example
/var/log/logcorrelator
/var/lib/logcorrelator
/etc/systemd/system/logcorrelator.service
%config(noreplace) /etc/logrotate.d/logcorrelator

%changelog
* Wed Mar 11 2026 logcorrelator <dev@example.com> - 1.1.22-1
- Feat(outputs): file output enabled/disabled toggle
Added the enabled: true/false field to outputs.file in the configuration.
The file sink is created only if enabled: true AND path: is set.
Allows the file output to be disabled entirely while keeping stdout/clickhouse.
Tests: TestValidate_FileOutputDisabled, TestLoadConfig_FileOutputDisabled

- Fix(systemd): immediate stop without draining the queue
orchestrator.Stop() no longer drains the buffers (in-flight events are lost).
Removed ShutdownTimeout and the flush/wait logic.
systemd TimeoutStopSec=30 handles the forced stop if needed.
Simplification: cancel() + Close() only.

- Feat(sql): TTL and ZSTD compression on ClickHouse tables
http_logs_raw: 1-day TTL, ZSTD compression on raw_json
http_logs: 7-day TTL, ZSTD compression on large text fields
Setting ttl_only_drop_parts = 1 to optimize deletions

* Mon Mar 09 2026 logcorrelator <dev@example.com> - 1.1.21-1
- Update: ClickHouse views and SQL schema
Added bots.sql for bot identification (User-Agent parsing)
Added tables.sql for the reference tables
Updated mv1.sql (materialized view) with the new correlation structure
Expanded the views.md documentation with query examples and the full schema

* Mon Mar 09 2026 logcorrelator <dev@example.com> - 1.1.20-1
- Fix(rpm): removed the redundant systemd-tmpfiles.conf
RuntimeDirectory=logcorrelator in the systemd service already manages /run/logcorrelator
automatically. The systemd-tmpfiles --create command caused errors on
systems with an existing /var/lib/mysql (a file instead of a directory).
Removed /usr/lib/tmpfiles.d/logcorrelator.conf and the systemd-tmpfiles --create call.

* Mon Mar 09 2026 logcorrelator <dev@example.com> - 1.1.19-1
- Fix(systemd): immediate stop/restart without waiting for the queue to drain
Stopping the service no longer drains the buffers (in-flight events are lost).
systemd TimeoutStopSec=30 already handles the forced stop if needed.
Simplified orchestrator.Stop(): cancel() + Close() only.
Removed ShutdownTimeout, which is no longer needed.

* Mon Mar 09 2026 logcorrelator <dev@example.com> - 1.1.18-1
- Fix(outputs): file output enabled: false did not stop writes to the file
The Enabled field was missing from FileOutputConfig. The file sink was created
even with enabled: false as long as path was set. The condition now
explicitly checks enabled && path != "" in main.go and Validate().
Test: added TestValidate_FileOutputDisabled and TestLoadConfig_FileOutputDisabled.

* Fri Mar 06 2026 logcorrelator <dev@example.com> - 1.1.17-1
- Fix(correlation): keepalives field not populated in ClickHouse
The KeepAliveSeq field of NormalizedEvent was not copied into the Fields
of CorrelatedLog. The ClickHouse materialized view extracted keepalives from the JSON
but always found 0. NewCorrelatedLog and NewCorrelatedLogFromEvent now
explicitly add keepalives = KeepAliveSeq to the Fields.

* Fri Mar 06 2026 logcorrelator <dev@example.com> - 1.1.16-1
- Feat(correlation): emit A events filtered by include_dest_ports to ClickHouse
When an A (HTTP) event was excluded by the include_dest_ports filter, it was
silently ignored. Now, if ApacheAlwaysEmit=true, the event is emitted as
uncorrelated (orphan_side=A) so that it appears in ClickHouse. B events are still
ignored. Test: TestCorrelationService_IncludeDestPorts_FilteredPort updated +
TestCorrelationService_IncludeDestPorts_FilteredPort_NoAlwaysEmit added.

* Thu Mar 05 2026 logcorrelator <dev@example.com> - 1.1.15-1
- Fix(correlation/bug3): data loss when B expires with pending orphans
cleanNetworkBufferByTTL deleted the pendingOrphans without emitting them (silent loss).
Orphan A events are now returned immediately to the caller when B expires,
and cleanExpired/ProcessEvent propagate these results to the sink.
Test: TestBTTLExpiry_PurgesPendingOrphans extended to verify that they are actually emitted.

* Thu Mar 05 2026 logcorrelator <dev@example.com> - 1.1.14-1
- Fix(correlation/bug1): Keep-Alive sessions longer than TimeWindow no longer turn into orphans
The matcher in processSourceA used eventsMatch (timestamp comparison) in
one_to_many mode. After ~10s, B.Timestamp_original fell outside the TimeWindow and all subsequent
requests became orphans. New matcher bEventHasValidTTL: a B event is valid
as long as its TTL has not expired (the TTL is reset on every Keep-Alive correlation).
- Fix(correlation/bug4): checkPendingOrphansForCorrelation used eventsMatch (same bug)
In one_to_many mode, a B arriving with an old timestamp no longer matched the pending orphans
for the same key. Replaced with a key-only check (same key = same connection).
- Fix(correlation/bug3): pendingOrphans not purged when the B expires (cleanNetworkBufferByTTL)
When a B event expired (TTL), the associated pending orphan A events were stuck indefinitely.
They are now emitted immediately when the corresponding B expires.
- Fix(correlation/bug2): orphans emitted only when an event was received (no dedicated timer)
EmitPendingOrphans() is now a public, thread-safe method. The Orchestrator
starts a ticker goroutine (250ms) that calls EmitPendingOrphans() independently of the event flow,
guaranteeing emission even when no new events arrive.
- Feat(ports): added EmitPendingOrphans() to the CorrelationProcessor interface
- Test: 4 new regression tests (Bug #1, #2, #3, #4)

* Thu Mar 05 2026 logcorrelator <dev@example.com> - 1.1.13-1
- Fix: Unix sockets no longer end up owned by root:root when the service restarts
- Fix: Added RuntimeDirectory=logcorrelator to the systemd service (systemd manages /run/logcorrelator with the correct owner on every start/restart)
- Fix: Added /usr/lib/tmpfiles.d/logcorrelator.conf to recreate /run/logcorrelator at boot
- Chore: Removed /var/run/logcorrelator from the RPM %files (managed by tmpfiles.d)
- Fix(correlation): emitPendingOrphans - slice corruption when several orphans for the same key expire simultaneously (slice aliasing bug, duplicate emissions)
- Fix(correlation): rotateOldestA - the rotated event was silently lost even with ApacheAlwaysEmit=true (now returns the CorrelatedLog)
- Fix(correlation): Keep-Alive broken in the pending-orphan-then-B path - the B event was not buffered in one_to_many mode, blocking correlation of requests A2+ on the same Keep-Alive connection
- Chore(correlation): removed the dead field timer *time.Timer from pendingOrphan
- Feat(correlation): added keepalive_seq to orphan logs to ease debugging (request number within the Keep-Alive connection, 1-based)
- Test: 4 new regression tests for the correlation bugs

* Thu Mar 05 2026 logcorrelator <dev@example.com> - 1.1.12-1
|
||||
- Feat: New config directive include_dest_ports - restrict correlation to specific destination ports
|
||||
- Feat: If include_dest_ports is non-empty, events on unlisted ports are silently ignored (not correlated, not emitted as orphan)
|
||||
- Feat: New metric failed_dest_port_filtered for monitoring filtered traffic
|
||||
- Feat: Debug log for filtered events: "event excluded by dest port filter: source=A dst_port=22"
|
||||
- Test: New unit tests for include_dest_ports (allowed port, filtered port, empty=all)
|
||||
- Docs: README.md updated with include_dest_ports section and current version references
|
||||
- Docs: architecture.yml updated with include_dest_ports
|
||||
- Fix: config.example.yml - removed obsolete stdout.level field
|
||||
|
||||
* Thu Mar 05 2026 logcorrelator <dev@example.com> - 1.1.11-1
|
||||
- Fix: StdoutSink no longer writes correlated/orphan JSON to stdout
|
||||
- Fix: stdout sink is now a no-op for data; operational logs go to stderr via logger
|
||||
- Fix: ClickHouse sink had no logger - all flush errors were silently discarded
|
||||
- Fix: Periodic, batch and final-close flush errors are now logged at ERROR level
|
||||
- Fix: Buffer overflow with DropOnOverflow=true is now logged at WARN level
|
||||
- Fix: Retry attempts are now logged at WARN level with attempt number, delay and error
|
||||
- Feat: ClickHouse connection success logged at INFO (table, batch_size, flush_interval_ms)
|
||||
- Feat: Successful batch sends logged at DEBUG (rows count, table)
|
||||
- Feat: SetLogger() method added to ClickHouseSink for external logger injection
|
||||
- Test: New unit tests for StdoutSink asserting stdout remains empty for all log types
|
||||
|
||||
* Wed Mar 04 2026 logcorrelator <dev@example.com> - 1.1.10-1
|
||||
- Feat: IP exclusion filter - exclude specific source IPs or CIDR ranges
|
||||
- Feat: Configuration exclude_source_ips supports single IPs and CIDR notation
|
||||
- Feat: Debug logging for excluded IPs
|
||||
- Feat: New metric failed_ip_excluded for monitoring filtered traffic
|
||||
- Feat: Architecture documentation updated with observability section
|
||||
- Use cases: exclude health checks, internal traffic, known bad actors
|
||||
- Docs: README.md updated with IP exclusion documentation
|
||||
- Docs: architecture.yml updated with metrics and troubleshooting guide
|
||||
|
||||
* Wed Mar 04 2026 logcorrelator <dev@example.com> - 1.1.9-1
|
||||
- Feat: Debug logging - detailed DEBUG logs for correlation troubleshooting
|
||||
- Feat: Correlation metrics server (HTTP endpoint /metrics and /health)
|
||||
- Feat: New metrics: events_received, correlations_success/failed, failure reasons
|
||||
- Feat: Failure reason tracking: no_match_key, time_window, buffer_eviction, ttl_expired
|
||||
- Feat: Buffer size monitoring (buffer_a_size, buffer_b_size)
|
||||
- Feat: Orphan tracking (orphans_emitted, orphans_pending, pending_orphan_match)
|
||||
- Feat: Keep-Alive reset counter for connection tracking
|
||||
- Feat: Test scripts added (test-correlation.sh, test-correlation-advanced.py)
|
||||
- Change: Config example updated with metrics section
|
||||
- Docs: README.md updated with debugging guide and troubleshooting table
|
||||
|
||||
* Tue Mar 03 2026 logcorrelator <dev@example.com> - 1.1.8-1
|
||||
- Migrated from FPM to rpmbuild (native RPM build)
|
||||
- Reduced build image size by 200MB (-40%)
|
||||
- Removed FPM gem dependency (use rpmbuild directly)
|
||||
- Scripts post/preun/postun now inline in spec file
|
||||
- Build image: rockylinux:8 instead of ruby:3.2-bookworm
|
||||
|
||||
* Tue Mar 03 2026 logcorrelator <dev@example.com> - 1.1.7-1
|
||||
- Fix: Critical Keep-Alive bug - network events evicted based on original timestamp instead of reset TTL
|
||||
- Fix: Correlation time window increased from 1s to 10s for HTTP Keep-Alive support
|
||||
- Fix: Network source now uses payload timestamp if available (fallback to reception time)
|
||||
- Change: Default network TTL increased from 30s to 120s for long Keep-Alive sessions
|
||||
- Test: Added comprehensive Keep-Alive tests (TTL reset, long session scenarios)
|
||||
|
||||
* Tue Mar 03 2026 logcorrelator <dev@example.com> - 1.1.6-1
|
||||
- Docs: Update ClickHouse schema documentation (http_logs_raw + http_logs tables)
|
||||
- Fix: ClickHouse insertion uses single raw_json column (FORMAT JSONEachRow)
|
||||
- Fix: ClickHouse native API (clickhouse-go/v2 PrepareBatch + Append + Send)
|
||||
|
||||
* Tue Mar 03 2026 logcorrelator <dev@example.com> - 1.1.5-1
|
||||
- Fix: ClickHouse insertion using native clickhouse-go/v2 API (PrepareBatch + Append + Send)
|
||||
- Fix: Replaced database/sql wrapper with clickhouse.Open() and clickhouse.Conn
|
||||
- Fix: Proper batch sending to avoid ATTEMPT_TO_READ_AFTER_EOF errors
|
||||
- Fix: Set correct permissions (755) on /var/run/logcorrelator in RPM post-install
|
||||
|
||||
* Mon Mar 02 2026 logcorrelator <dev@example.com> - 1.1.4-1
|
||||
- Fix: Log raw JSON data on parse errors for debugging
|
||||
|
||||
* Mon Mar 02 2026 logcorrelator <dev@example.com> - 1.1.3-1
|
||||
- Refactor: Switch Unix sockets from STREAM to DGRAM mode (SOCK_DGRAM)
|
||||
- Test: Comprehensive tests added - coverage improved to 74.4%
|
||||
- Fix: Example config file installed to /etc/logcorrelator/logcorrelator.yml.example
|
||||
- Change: Default socket permissions from 0660 to 0666 (world read/write)
|
||||
|
||||
* Mon Mar 02 2026 logcorrelator <dev@example.com> - 1.1.2-1
|
||||
- Fix: Example config file installed to /etc/logcorrelator/logcorrelator.yml.example
|
||||
- Change: Default socket permissions from 0660 to 0666 (world read/write)
|
||||
|
||||
* Mon Mar 02 2026 logcorrelator <dev@example.com> - 1.1.1-1
|
||||
- Fix: Move logcorrelator.yml.example from /usr/share/logcorrelator/ to /etc/logcorrelator/
|
||||
|
||||
* Mon Mar 02 2026 logcorrelator <dev@example.com> - 1.1.0-1
|
||||
- Feat: Keep-Alive support (one-to-many correlation mode)
|
||||
- Feat: Dynamic TTL for network events (source B)
|
||||
- Feat: Separate buffer sizes for HTTP and network events
|
||||
- Feat: SIGHUP signal handling for log rotation
|
||||
- Feat: File sink Reopen() method for log rotation
|
||||
- Feat: logrotate configuration included
|
||||
- Feat: ExecReload added to systemd service
|
||||
- Feat: New YAML config structure (time_window, orphan_policy, matching, buffers, ttl)
|
||||
- Docs: Updated architecture.yml and config.example.yml
|
||||
|
||||
* Sat Feb 28 2026 logcorrelator <dev@example.com> - 1.0.7-1
|
||||
- Added: Log levels DEBUG, INFO, WARN, ERROR configurable via log.level
|
||||
- Added: Warn and Warnf methods for warning messages
|
||||
- Added: Debug logs for events received from sockets and correlations
|
||||
- Added: Warning logs for orphan events and buffer overflow
|
||||
- Changed: Configuration log.enabled replaced by log.level
|
||||
- Changed: Orphan events and buffer overflow now logged as WARN instead of DEBUG
|
||||
|
||||
* Sat Feb 28 2026 logcorrelator <dev@example.com> - 1.0.6-1
|
||||
- Changed: Configuration YAML simplified, removed service.name, service.language
|
||||
- Changed: Correlation config simplified, time_window_s instead of nested object
|
||||
- Changed: Orphan policy simplified to emit_orphans boolean
|
||||
- Changed: Apache socket renamed to http.socket
|
||||
- Added: socket_permissions option on unix sockets
|
||||
|
||||
* Sat Feb 28 2026 logcorrelator <dev@example.com> - 1.0.5-1
|
||||
- Added: Systemd service auto-start after RPM installation
|
||||
- Added: Systemd service hardening (TimeoutStartSec, TimeoutStopSec, ReadWritePaths)
|
||||
- Fixed: Systemd service unit correct config path (.yml instead of .conf)
|
||||
- Fixed: CI workflow branch name main to master
|
||||
- Changed: RPM packaging generic el8/el9/el10 directory naming
|
||||
|
||||
* Sat Feb 28 2026 logcorrelator <dev@example.com> - 1.0.4-1
|
||||
- Breaking: Flattened JSON output structure - removed apache and network subdivisions
|
||||
- All log fields now merged into single-level JSON structure
|
||||
- ClickHouse schema: replaced apache JSON and network JSON columns with fields JSON column
|
||||
- Custom MarshalJSON() implementation for flat output
|
||||
|
||||
* Sat Feb 28 2026 logcorrelator <dev@example.com> - 1.0.3-1
|
||||
- Fix: Added missing ClickHouse driver dependency
|
||||
- Fix: Fixed race condition in orchestrator
|
||||
- Security: Added explicit source_type configuration for Unix socket sources
|
||||
- Added: Comprehensive test suite improvements
|
||||
- Added: Test coverage improved from 50.6% to 62.0%
|
||||
|
||||
* Sat Feb 28 2026 logcorrelator <dev@example.com> - 1.0.2-1
|
||||
- Added: Initial RPM packaging support for Rocky Linux 8/9 and AlmaLinux 10
|
||||
- Added: Docker multi-stage build pipeline
|
||||
- Added: Hexagonal architecture implementation
|
||||
- Added: Unix socket input sources (JSON line protocol)
|
||||
- Added: File output sink (JSON lines)
|
||||
- Added: ClickHouse output sink with batching and retry logic
|
||||
- Added: Time-window based correlation on src_ip + src_port
|
||||
- Added: Graceful shutdown with signal handling (SIGINT, SIGTERM)
|
||||
|
||||
* Sat Feb 28 2026 logcorrelator <dev@example.com> - 1.0.1-1
|
||||
- Initial package for CentOS 7, Rocky Linux 8, 9, 10
|
||||
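The 1.0.2 and 1.0.4 changelog entries above mention time-window correlation on src_ip + src_port and a flattened, single-level JSON output. The following Python sketch illustrates that idea under stated assumptions; it is not the service's Go implementation. The class name `WindowCorrelator`, the `window_ns` parameter, the nanosecond timestamps, and the rule that HTTP fields win on key conflicts are all hypothetical choices made for illustration.

```python
from typing import Any, Dict, Optional, Tuple

Key = Tuple[str, int]  # (src_ip, src_port)


class WindowCorrelator:
    """Hypothetical sketch: pair source A (HTTP) and source B (network) events."""

    def __init__(self, window_ns: int) -> None:
        self.window_ns = window_ns
        # One pending buffer per source, indexed by the correlation key.
        self.pending: Dict[str, Dict[Key, Dict[str, Any]]] = {"A": {}, "B": {}}

    def add(self, source: str, event: Dict[str, Any]) -> Optional[Dict[str, Any]]:
        """Return a flat merged record on a match, otherwise buffer the event."""
        key = (event["src_ip"], event["src_port"])
        other = "B" if source == "A" else "A"
        candidate = self.pending[other].get(key)
        if candidate is not None and abs(candidate["timestamp"] - event["timestamp"]) <= self.window_ns:
            del self.pending[other][key]
            http, net = (event, candidate) if source == "A" else (candidate, event)
            # Single-level merge; assumption: HTTP fields win on key conflicts.
            return {**net, **http}
        self.pending[source][key] = event
        return None


def demo() -> Optional[Dict[str, Any]]:
    correlator = WindowCorrelator(window_ns=1_000_000_000)  # assumed 1 s window
    correlator.add("A", {"src_ip": "192.168.1.2", "src_port": 8001,
                         "timestamp": 100, "method": "GET"})
    return correlator.add("B", {"src_ip": "192.168.1.2", "src_port": 8001,
                                "timestamp": 200, "ja4": "def456"})
```

Calling `demo()` returns one flat dictionary holding both the HTTP and the network fields. Events whose timestamps differ by more than the window stay buffered and are never merged in this sketch; TTL and orphan handling are deliberately left out.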
13
services/correlator/packaging/rpm/logrotate
Normal file
@ -0,0 +1,13 @@
|
||||
/var/log/logcorrelator/correlated.log {
|
||||
daily
|
||||
rotate 7
|
||||
compress
|
||||
delaycompress
|
||||
missingok
|
||||
notifempty
|
||||
create 0640 logcorrelator logcorrelator
|
||||
sharedscripts
|
||||
postrotate
|
||||
/bin/systemctl reload logcorrelator > /dev/null 2>&1 || true
|
||||
endscript
|
||||
}
|
||||
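The logrotate configuration above reloads the service after rotation, which per the changelog leads to a SIGHUP and a file sink Reopen(). The Python sketch below shows why the reopen step matters: after a rename, an open handle still points at the rotated file. `ReopenableFileSink` and `install_sighup_handler` are hypothetical names, and this is an illustration rather than the service's Go code.

```python
import os
import signal
import tempfile
from typing import Tuple


class ReopenableFileSink:
    """Hypothetical append-only sink that can reopen its file after rotation."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._fh = open(path, "a", encoding="utf-8")

    def write_line(self, line: str) -> None:
        self._fh.write(line + "\n")
        self._fh.flush()

    def reopen(self, signum=None, frame=None) -> None:
        # The (signum, frame) parameters let this method double as a signal handler.
        self._fh.close()
        self._fh = open(self.path, "a", encoding="utf-8")

    def close(self) -> None:
        self._fh.close()


def install_sighup_handler(sink: ReopenableFileSink) -> None:
    # Reopen the log file whenever the process receives SIGHUP (Unix only).
    signal.signal(signal.SIGHUP, sink.reopen)


def demo() -> Tuple[str, str]:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "correlated.log")
        sink = ReopenableFileSink(path)
        sink.write_line("before rotation")
        os.rename(path, path + ".1")  # simulates the rename done by logrotate
        sink.reopen()                 # what the SIGHUP handler would trigger
        sink.write_line("after rotation")
        sink.close()
        with open(path + ".1", encoding="utf-8") as old, open(path, encoding="utf-8") as new:
            return old.read(), new.read()
```

Without the `reopen()` call, the second line would land in the renamed `.1` file, because the process keeps writing to the inode it opened originally.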
258
services/correlator/packaging/test/test-rpm.sh
Executable file
@ -0,0 +1,258 @@
|
||||
#!/bin/bash
|
||||
# Test script for logcorrelator RPM package
|
||||
# Verifies installation, permissions, and service status
|
||||
#
|
||||
# Usage: ./packaging/test/test-rpm.sh [el8|el9|el10]
|
||||
#
|
||||
# This script tests the RPM package in a Docker container to ensure:
|
||||
# - Installation succeeds
|
||||
# - File permissions are correct
|
||||
# - Service starts properly
|
||||
# - Sockets are created with correct ownership
|
||||
|
||||
set -e
|
||||
|
||||
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||
PROJECT_ROOT="$(dirname "$(dirname "$SCRIPT_DIR")")"
|
||||
RPM_DIR="${PROJECT_ROOT}/dist/rpm"
|
||||
|
||||
# Default to el8 if no argument provided
|
||||
DISTRO="${1:-el8}"
|
||||
|
||||
echo "========================================="
|
||||
echo "Testing logcorrelator RPM for ${DISTRO}"
|
||||
echo "========================================="
|
||||
|
||||
# Find the RPM file
|
||||
case "${DISTRO}" in
|
||||
el8|rocky8)
|
||||
RPM_PATH="${RPM_DIR}/el8"
|
||||
BASE_IMAGE="rockylinux:8"
|
||||
;;
|
||||
el9|rocky9)
|
||||
RPM_PATH="${RPM_DIR}/el9"
|
||||
BASE_IMAGE="rockylinux:9"
|
||||
;;
|
||||
el10|alma10)
|
||||
RPM_PATH="${RPM_DIR}/el10"
|
||||
BASE_IMAGE="almalinux:10"
|
||||
;;
|
||||
*)
|
||||
echo "Unknown distribution: ${DISTRO}"
|
||||
echo "Valid options: el8, el9, el10"
|
||||
exit 1
|
||||
;;
|
||||
esac
|
||||
|
||||
# Find the latest RPM file
|
||||
RPM_FILE=$(ls -t "${RPM_PATH}"/logcorrelator-*.rpm 2>/dev/null | head -n 1)
|
||||
|
||||
if [ -z "${RPM_FILE}" ]; then
|
||||
echo "ERROR: No RPM file found in ${RPM_PATH}"
|
||||
echo "Please run 'make package-rpm' first"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo "Testing RPM: ${RPM_FILE}"
|
||||
echo "Base image: ${BASE_IMAGE}"
|
||||
echo ""
|
||||
|
||||
# Create test script
|
||||
TEST_SCRIPT=$(cat <<'EOF'
|
||||
#!/bin/bash
|
||||
set -e
|
||||
|
||||
echo "=== Installing logcorrelator RPM ==="
|
||||
rpm -ivh /tmp/logcorrelator.rpm
|
||||
|
||||
echo ""
|
||||
echo "=== Checking user and group ==="
|
||||
if ! getent group logcorrelator >/dev/null; then
|
||||
echo "FAIL: logcorrelator group not created"
|
||||
exit 1
|
||||
fi
|
||||
echo "OK: logcorrelator group exists"
|
||||
|
||||
if ! getent passwd logcorrelator >/dev/null; then
|
||||
echo "FAIL: logcorrelator user not created"
|
||||
exit 1
|
||||
fi
|
||||
echo "OK: logcorrelator user exists"
|
||||
|
||||
echo ""
|
||||
echo "=== Checking directory permissions ==="
|
||||
|
||||
# Check /var/run/logcorrelator
|
||||
DIR="/var/run/logcorrelator"
|
||||
if [ ! -d "$DIR" ]; then
|
||||
echo "FAIL: $DIR does not exist"
|
||||
exit 1
|
||||
fi
|
||||
OWNER=$(stat -c '%U:%G' "$DIR")
|
||||
PERMS=$(stat -c '%a' "$DIR")
|
||||
if [ "$OWNER" != "logcorrelator:logcorrelator" ]; then
|
||||
echo "FAIL: $DIR owner is $OWNER (expected logcorrelator:logcorrelator)"
|
||||
exit 1
|
||||
fi
|
||||
if [ "$PERMS" != "755" ]; then
|
||||
echo "FAIL: $DIR permissions are $PERMS (expected 755)"
|
||||
exit 1
|
||||
fi
|
||||
echo "OK: $DIR - owner=$OWNER, permissions=$PERMS"
|
||||
|
||||
# Check /var/log/logcorrelator
|
||||
DIR="/var/log/logcorrelator"
|
||||
if [ ! -d "$DIR" ]; then
|
||||
echo "FAIL: $DIR does not exist"
|
||||
exit 1
|
||||
fi
|
||||
OWNER=$(stat -c '%U:%G' "$DIR")
|
||||
PERMS=$(stat -c '%a' "$DIR")
|
||||
if [ "$OWNER" != "logcorrelator:logcorrelator" ]; then
|
||||
echo "FAIL: $DIR owner is $OWNER (expected logcorrelator:logcorrelator)"
|
||||
exit 1
|
||||
fi
|
||||
if [ "$PERMS" != "750" ]; then
|
||||
echo "FAIL: $DIR permissions are $PERMS (expected 750)"
|
||||
exit 1
|
||||
fi
|
||||
echo "OK: $DIR - owner=$OWNER, permissions=$PERMS"
|
||||
|
||||
# Check /var/lib/logcorrelator
|
||||
DIR="/var/lib/logcorrelator"
|
||||
if [ ! -d "$DIR" ]; then
|
||||
echo "FAIL: $DIR does not exist"
|
||||
exit 1
|
||||
fi
|
||||
OWNER=$(stat -c '%U:%G' "$DIR")
|
||||
PERMS=$(stat -c '%a' "$DIR")
|
||||
if [ "$OWNER" != "logcorrelator:logcorrelator" ]; then
|
||||
echo "FAIL: $DIR owner is $OWNER (expected logcorrelator:logcorrelator)"
|
||||
exit 1
|
||||
fi
|
||||
if [ "$PERMS" != "750" ]; then
|
||||
echo "FAIL: $DIR permissions are $PERMS (expected 750)"
|
||||
exit 1
|
||||
fi
|
||||
echo "OK: $DIR - owner=$OWNER, permissions=$PERMS"
|
||||
|
||||
echo ""
|
||||
echo "=== Checking config files ==="
|
||||
|
||||
# Check config file exists and has correct permissions
|
||||
CONFIG="/etc/logcorrelator/logcorrelator.yml"
|
||||
if [ ! -f "$CONFIG" ]; then
|
||||
echo "FAIL: $CONFIG does not exist"
|
||||
exit 1
|
||||
fi
|
||||
OWNER=$(stat -c '%U:%G' "$CONFIG")
|
||||
PERMS=$(stat -c '%a' "$CONFIG")
|
||||
if [ "$OWNER" != "logcorrelator:logcorrelator" ]; then
|
||||
echo "FAIL: $CONFIG owner is $OWNER (expected logcorrelator:logcorrelator)"
|
||||
exit 1
|
||||
fi
|
||||
if [ "$PERMS" != "640" ]; then
|
||||
echo "FAIL: $CONFIG permissions are $PERMS (expected 640)"
|
||||
exit 1
|
||||
fi
|
||||
echo "OK: $CONFIG - owner=$OWNER, permissions=$PERMS"
|
||||
|
||||
# Check example config file
|
||||
EXAMPLE_CONFIG="/etc/logcorrelator/logcorrelator.yml.example"
|
||||
if [ ! -f "$EXAMPLE_CONFIG" ]; then
|
||||
echo "FAIL: $EXAMPLE_CONFIG does not exist"
|
||||
exit 1
|
||||
fi
|
||||
OWNER=$(stat -c '%U:%G' "$EXAMPLE_CONFIG")
|
||||
PERMS=$(stat -c '%a' "$EXAMPLE_CONFIG")
|
||||
if [ "$OWNER" != "logcorrelator:logcorrelator" ]; then
|
||||
echo "FAIL: $EXAMPLE_CONFIG owner is $OWNER (expected logcorrelator:logcorrelator)"
|
||||
exit 1
|
||||
fi
|
||||
if [ "$PERMS" != "640" ]; then
|
||||
echo "FAIL: $EXAMPLE_CONFIG permissions are $PERMS (expected 640)"
|
||||
exit 1
|
||||
fi
|
||||
echo "OK: $EXAMPLE_CONFIG - owner=$OWNER, permissions=$PERMS"
|
||||
|
||||
echo ""
|
||||
echo "=== Checking systemd service ==="
|
||||
if [ ! -f /etc/systemd/system/logcorrelator.service ]; then
|
||||
echo "FAIL: systemd service file not found"
|
||||
exit 1
|
||||
fi
|
||||
echo "OK: systemd service file exists"
|
||||
|
||||
echo ""
|
||||
echo "=== Checking logrotate config ==="
|
||||
if [ ! -f /etc/logrotate.d/logcorrelator ]; then
|
||||
echo "FAIL: logrotate config not found"
|
||||
exit 1
|
||||
fi
|
||||
echo "OK: logrotate config exists"
|
||||
|
||||
echo ""
|
||||
echo "=== Testing service start ==="
|
||||
# Try to start the service (may fail in container without full systemd)
|
||||
if command -v systemctl >/dev/null 2>&1; then
|
||||
systemctl daemon-reload || true
|
||||
if systemctl start logcorrelator.service 2>/dev/null; then
|
||||
echo "OK: service started successfully"
|
||||
|
||||
# Wait for sockets to be created
|
||||
sleep 2
|
||||
|
||||
echo ""
|
||||
echo "=== Checking sockets ==="
|
||||
HTTP_SOCKET="/var/run/logcorrelator/http.socket"
|
||||
NETWORK_SOCKET="/var/run/logcorrelator/network.socket"
|
||||
|
||||
if [ -S "$HTTP_SOCKET" ]; then
|
||||
OWNER=$(stat -c '%U:%G' "$HTTP_SOCKET")
|
||||
PERMS=$(stat -c '%a' "$HTTP_SOCKET")
|
||||
echo "OK: $HTTP_SOCKET exists - owner=$OWNER, permissions=$PERMS"
|
||||
if [ "$PERMS" != "666" ]; then
|
||||
echo "WARN: socket permissions are $PERMS (expected 666)"
|
||||
fi
|
||||
else
|
||||
echo "WARN: $HTTP_SOCKET not found (service may not have started)"
|
||||
fi
|
||||
|
||||
if [ -S "$NETWORK_SOCKET" ]; then
|
||||
OWNER=$(stat -c '%U:%G' "$NETWORK_SOCKET")
|
||||
PERMS=$(stat -c '%a' "$NETWORK_SOCKET")
|
||||
echo "OK: $NETWORK_SOCKET exists - owner=$OWNER, permissions=$PERMS"
|
||||
if [ "$PERMS" != "666" ]; then
|
||||
echo "WARN: socket permissions are $PERMS (expected 666)"
|
||||
fi
|
||||
else
|
||||
echo "WARN: $NETWORK_SOCKET not found (service may not have started)"
|
||||
fi
|
||||
|
||||
systemctl stop logcorrelator.service || true
|
||||
else
|
||||
echo "WARN: service failed to start (expected in minimal container)"
|
||||
fi
|
||||
else
|
||||
echo "WARN: systemctl not available (minimal container)"
|
||||
fi
|
||||
|
||||
echo ""
|
||||
echo "========================================="
|
||||
echo "All tests passed!"
|
||||
echo "========================================="
|
||||
EOF
|
||||
)
|
||||
|
||||
# Run test in Docker container
|
||||
echo "Running tests in Docker container..."
|
||||
echo ""
|
||||
|
||||
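# TEST_SCRIPT holds the script text, not a path, so write it to a temporary
# file that can be bind-mounted into the container
TEST_SCRIPT_FILE="$(mktemp)"
trap 'rm -f "${TEST_SCRIPT_FILE}"' EXIT
printf '%s\n' "${TEST_SCRIPT}" > "${TEST_SCRIPT_FILE}"
chmod 0644 "${TEST_SCRIPT_FILE}"

docker run --rm \
    -v "${RPM_FILE}:/tmp/logcorrelator.rpm:ro" \
    -v "${TEST_SCRIPT_FILE}:/test.sh:ro" \
    "${BASE_IMAGE}" \
    bash /test.sh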
|
||||
|
||||
echo ""
|
||||
echo "Test completed successfully for ${DISTRO}"
|
||||
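The checks in test-rpm.sh repeat the same pattern for each path: read owner and mode with `stat -c '%U:%G'` and `stat -c '%a'`, then compare against expected values. A small Python sketch of that pattern is shown below as an illustration only. The helper names `owner_and_mode` and `check_path` are hypothetical, and falling back to numeric IDs when no passwd or group entry exists is an assumption of this sketch.

```python
import grp
import os
import pwd
import stat
import tempfile
from typing import List, Tuple


def owner_and_mode(path: str) -> Tuple[str, str]:
    """Return ('user:group', octal mode string) for a path."""
    st = os.stat(path)
    try:
        user = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        user = str(st.st_uid)  # assumption: fall back to the numeric uid
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        group = str(st.st_gid)  # assumption: fall back to the numeric gid
    return f"{user}:{group}", format(stat.S_IMODE(st.st_mode), "o")


def check_path(path: str, expected_owner: str, expected_mode: str) -> List[str]:
    """Return a list of failure messages; an empty list means the path is OK."""
    owner, mode = owner_and_mode(path)
    failures = []
    if owner != expected_owner:
        failures.append(f"FAIL: {path} owner is {owner} (expected {expected_owner})")
    if mode != expected_mode:
        failures.append(f"FAIL: {path} permissions are {mode} (expected {expected_mode})")
    return failures


def demo() -> Tuple[List[str], List[str]]:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "logcorrelator.yml")
        with open(path, "w", encoding="utf-8") as fh:
            fh.write("log:\n  level: INFO\n")
        os.chmod(path, 0o640)
        owner, _ = owner_and_mode(path)
        return check_path(path, owner, "640"), check_path(path, owner, "755")
```

Collecting failures in a list, instead of exiting on the first one, lets a caller report every mismatch in a single run.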
101
services/correlator/scripts/audit-architecture.sh
Executable file
@ -0,0 +1,101 @@
|
||||
#!/bin/bash
|
||||
set -e
|
||||
|
||||
echo "=== AUDIT ARCHITECTURE COMPLIANCE ==="
|
||||
echo ""
|
||||
|
||||
# 1. Runtime - systemd service
|
||||
echo "1. RUNTIME - SYSTEMD SERVICE"
|
||||
if [ -f /src/logcorrelator.service ]; then
|
||||
echo "✅ logcorrelator.service exists"
|
||||
grep -q "ExecStart=/usr/bin/logcorrelator" /src/logcorrelator.service && echo " ✅ ExecStart correct" || echo " ❌ ExecStart incorrect"
|
||||
grep -q "ExecReload=" /src/logcorrelator.service && echo " ✅ ExecReload present" || echo " ❌ ExecReload missing"
|
||||
grep -q "Restart=on-failure" /src/logcorrelator.service && echo " ✅ Restart policy correct" || echo " ❌ Restart policy incorrect"
|
||||
else
|
||||
echo "❌ logcorrelator.service missing"
|
||||
fi
|
||||
|
||||
# Check signal handling in code
|
||||
echo ""
|
||||
grep -q "SIGINT\|SIGTERM\|SIGHUP" /src/cmd/logcorrelator/main.go && echo "✅ Signal handling (SIGINT/SIGTERM/SIGHUP) implemented" || echo "❌ Signal handling missing"
|
||||
|
||||
# 2. Packaging - RPM
|
||||
echo ""
|
||||
echo "2. PACKAGING - RPM"
|
||||
[ -f /src/packaging/rpm/logcorrelator.spec ] && echo "✅ RPM spec file exists" || echo "❌ RPM spec missing"
|
||||
grep -q "fpm" /src/Dockerfile.package && echo "✅ fpm tool used for packaging" || echo "❌ fpm not found"
|
||||
|
||||
# 3. Config - YAML
|
||||
echo ""
|
||||
echo "3. CONFIG - YAML"
|
||||
[ -f /src/config.example.yml ] && echo "✅ config.example.yml exists" || echo "❌ config.example.yml missing"
|
||||
grep -q "log:" /src/config.example.yml && echo " ✅ log section present" || echo " ❌ log section missing"
|
||||
grep -q "inputs:" /src/config.example.yml && echo " ✅ inputs section present" || echo " ❌ inputs section missing"
|
||||
grep -q "outputs:" /src/config.example.yml && echo " ✅ outputs section present" || echo " ❌ outputs section missing"
|
||||
grep -q "correlation:" /src/config.example.yml && echo " ✅ correlation section present" || echo " ❌ correlation section missing"
|
||||
|
||||
# 4. Inputs - Unix datagram sockets
|
||||
echo ""
|
||||
echo "4. INPUTS - UNIX DATAGRAM SOCKETS"
|
||||
grep -q "ListenUnixgram" /src/internal/adapters/inbound/unixsocket/source.go && echo "✅ Using ListenUnixgram (SOCK_DGRAM)" || echo "❌ Not using SOCK_DGRAM"
|
||||
grep -q "ReadFromUnix" /src/internal/adapters/inbound/unixsocket/source.go && echo "✅ Using ReadFromUnix for datagrams" || echo "❌ Not using ReadFromUnix"
|
||||
grep -q "MaxDatagramSize = 65535" /src/internal/adapters/inbound/unixsocket/source.go && echo "✅ max_datagram_bytes = 65535" || echo "❌ max_datagram_bytes incorrect"
|
||||
grep -q "0666" /src/internal/adapters/inbound/unixsocket/source.go && echo "✅ Default socket permissions 0666" || echo "❌ Socket permissions not 0666"
|
||||
|
||||
# Check socket paths in config
|
||||
grep -q "http.socket" /src/config.example.yml && echo " ✅ http.socket path configured" || echo " ❌ http.socket path missing"
|
||||
grep -q "network.socket" /src/config.example.yml && echo " ✅ network.socket path configured" || echo " ❌ network.socket path missing"
|
||||
|
||||
# 5. Outputs - Sinks
|
||||
echo ""
|
||||
echo "5. OUTPUTS - SINKS"
|
||||
[ -f /src/internal/adapters/outbound/file/sink.go ] && echo "✅ File sink exists" || echo "❌ File sink missing"
|
||||
[ -f /src/internal/adapters/outbound/clickhouse/sink.go ] && echo "✅ ClickHouse sink exists" || echo "❌ ClickHouse sink missing"
|
||||
[ -f /src/internal/adapters/outbound/multi/sink.go ] && echo "✅ MultiSink exists" || echo "❌ MultiSink missing"
|
||||
|
||||
# Check SIGHUP reopen in file sink
|
||||
grep -q "Reopen" /src/internal/adapters/outbound/file/sink.go && echo " ✅ FileSink.Reopen() for SIGHUP" || echo " ❌ FileSink.Reopen() missing"
|
||||
|
||||
# Check ClickHouse batching
|
||||
grep -q "batch" /src/internal/adapters/outbound/clickhouse/sink.go && echo " ✅ ClickHouse batching implemented" || echo " ❌ ClickHouse batching missing"
|
||||
grep -q "drop_on_overflow\|DropOnOverflow" /src/internal/adapters/outbound/clickhouse/sink.go && echo " ✅ drop_on_overflow implemented" || echo " ❌ drop_on_overflow missing"
|
||||
|
||||
# 6. Correlation
|
||||
echo ""
|
||||
echo "6. CORRELATION"
|
||||
grep -q "src_ip" /src/internal/domain/correlation_service.go && echo "✅ src_ip in correlation key" || echo "❌ src_ip missing"
|
||||
grep -q "src_port" /src/internal/domain/correlation_service.go && echo "✅ src_port in correlation key" || echo "❌ src_port missing"
|
||||
grep -q "MatchingMode" /src/internal/domain/correlation_service.go && echo "✅ MatchingMode (one_to_one/one_to_many) implemented" || echo "❌ MatchingMode missing"
|
||||
grep -q "ApacheAlwaysEmit" /src/internal/domain/correlation_service.go && echo "✅ apache_always_emit orphan policy" || echo "❌ apache_always_emit missing"
|
||||
grep -q "network_ttl\|NetworkTTLS" /src/internal/domain/correlation_service.go && echo "✅ TTL management for network events" || echo "❌ TTL management missing"
|
||||
grep -q "max_http_items\|maxHttpItems\|MaxHTTPItems" /src/internal/domain/correlation_service.go && echo "✅ Buffer limit max_http_items" || echo " ⚠️ Buffer limit naming may differ"
|
||||
grep -q "max_network_items\|maxNetworkItems\|MaxNetworkItems" /src/internal/domain/correlation_service.go && echo "✅ Buffer limit max_network_items" || echo " ⚠️ Buffer limit naming may differ"
|
||||
|
||||
# 7. Schema - Source A and B
|
||||
echo ""
|
||||
echo "7. SCHEMA - SOURCE A AND B"
|
||||
grep -q "timestamp" /src/internal/adapters/inbound/unixsocket/source.go && echo "✅ timestamp field for Source A" || echo "❌ timestamp missing for Source A"
|
||||
grep -q "SourceA\|SourceB" /src/internal/domain/event.go && echo "✅ EventSource enum (A/B)" || echo "❌ EventSource enum missing"
|
||||
grep -q "header_" /src/internal/adapters/inbound/unixsocket/source.go && echo "✅ header_* dynamic fields" || echo "❌ header_* fields missing"
|
||||
grep -q "Extra" /src/internal/domain/event.go && echo "✅ Extra fields map" || echo "❌ Extra fields missing"
|
||||
|
||||
# 8. Architecture modules
|
||||
echo ""
|
||||
echo "8. ARCHITECTURE MODULES"
|
||||
[ -d /src/internal/domain ] && echo "✅ internal/domain" || echo "❌ internal/domain missing"
|
||||
[ -d /src/internal/ports ] && echo "✅ internal/ports" || echo "❌ internal/ports missing"
|
||||
[ -d /src/internal/app ] && echo "✅ internal/app" || echo "❌ internal/app missing"
|
||||
[ -d /src/internal/adapters/inbound ] && echo "✅ internal/adapters/inbound" || echo "❌ internal/adapters/inbound missing"
|
||||
[ -d /src/internal/adapters/outbound ] && echo "✅ internal/adapters/outbound" || echo "❌ internal/adapters/outbound missing"
|
||||
[ -d /src/internal/config ] && echo "✅ internal/config" || echo "❌ internal/config missing"
|
||||
[ -d /src/internal/observability ] && echo "✅ internal/observability" || echo "❌ internal/observability missing"
|
||||
[ -d /src/cmd/logcorrelator ] && echo "✅ cmd/logcorrelator" || echo "❌ cmd/logcorrelator missing"
|
||||
|
||||
# 9. Testing
|
||||
echo ""
|
||||
echo "9. TESTING"
|
||||
echo "Running tests with coverage..."
|
||||
cd /src && go test ./... -cover 2>&1 | grep -E "^(ok|FAIL|\?)" || true
|
||||
|
||||
echo ""
|
||||
echo "=== AUDIT COMPLETE ==="
|
||||
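Section 4 of the audit above checks for Unix datagram sockets, a 65535-byte maximum datagram size, and 0666 socket permissions. The Python sketch below shows a receiver and a sender exchanging one JSON event over such a socket. It illustrates the protocol shape only; `open_receiver` and `recv_event` are hypothetical helpers, not the Go adapter in `internal/adapters/inbound/unixsocket`.

```python
import json
import os
import socket
import tempfile
from typing import Any, Dict

MAX_DATAGRAM_SIZE = 65535  # value taken from the audit check above


def open_receiver(path: str, mode: int = 0o666) -> socket.socket:
    """Bind a SOCK_DGRAM Unix socket at path and set its permissions."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    sock.bind(path)
    os.chmod(path, mode)
    return sock


def recv_event(sock: socket.socket) -> Dict[str, Any]:
    """Read one datagram and decode it as a JSON object."""
    data = sock.recv(MAX_DATAGRAM_SIZE)
    return json.loads(data.decode("utf-8"))


def demo() -> Dict[str, Any]:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "http.socket")
        receiver = open_receiver(path)
        receiver.settimeout(2.0)  # avoid blocking forever if nothing arrives
        sender = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
        try:
            sender.connect(path)
            payload = {"src_ip": "192.168.1.2", "src_port": 8001}
            sender.send(json.dumps(payload).encode("utf-8"))
            return recv_event(receiver)
        finally:
            sender.close()
            receiver.close()
```

Because each datagram carries exactly one event, the receiver needs no framing or newline splitting, which is one reason a datagram socket suits this kind of event stream.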
582
services/correlator/scripts/test-correlation-advanced.py
Executable file
@ -0,0 +1,582 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
test-correlation-advanced.py - Advanced correlation testing tool
|
||||
|
||||
This script provides comprehensive testing for the logcorrelator service,
|
||||
including various scenarios to debug correlation issues.
|
||||
|
||||
Usage:
|
||||
python3 test-correlation-advanced.py [options]
|
||||
|
||||
Requirements:
|
||||
- Python 3.7+ (time.time_ns() is used for timestamps)
|
||||
- requests library (for metrics): pip install requests
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import json
|
||||
import socket
|
||||
import sys
|
||||
import time
|
||||
from datetime import datetime
|
||||
from typing import Dict, Any, Optional, Tuple
|
||||
|
||||
try:
|
||||
import requests
|
||||
HAS_REQUESTS = True
|
||||
except ImportError:
|
||||
HAS_REQUESTS = False
|
||||
|
||||
|
||||
class Colors:
|
||||
"""ANSI color codes for terminal output."""
|
||||
BLUE = '\033[0;34m'
|
||||
GREEN = '\033[0;32m'
|
||||
YELLOW = '\033[1;33m'
|
||||
RED = '\033[0;31m'
|
||||
NC = '\033[0m' # No Color
|
||||
BOLD = '\033[1m'
|
||||
|
||||
|
||||
def colorize(text: str, color: str) -> str:
|
||||
"""Wrap text with ANSI color codes."""
|
||||
return f"{color}{text}{Colors.NC}"
|
||||
|
||||
|
||||
def info(text: str):
|
||||
print(colorize("[INFO] ", Colors.BLUE) + text)
|
||||
|
||||
|
||||
def success(text: str):
|
||||
print(colorize("[OK] ", Colors.GREEN) + text)
|
||||
|
||||
|
||||
def warn(text: str):
|
||||
print(colorize("[WARN] ", Colors.YELLOW) + text)
|
||||
|
||||
|
||||
def error(text: str):
|
||||
print(colorize("[ERROR] ", Colors.RED) + text)
|
||||
|
||||
|
||||
def debug(text: str, verbose: bool = False):
|
||||
if verbose:
|
||||
print(colorize("[DEBUG] ", Colors.BLUE) + text)
|
||||
|
||||
|
||||
class CorrelationTester:
|
||||
"""Main test class for correlation testing."""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
http_socket: str = "/var/run/logcorrelator/http.socket",
|
||||
network_socket: str = "/var/run/logcorrelator/network.socket",
|
||||
metrics_url: str = "http://localhost:8080/metrics",
|
||||
verbose: bool = False,
|
||||
skip_metrics: bool = False
|
||||
):
|
||||
self.http_socket = http_socket
|
||||
self.network_socket = network_socket
|
||||
self.metrics_url = metrics_url
|
||||
self.verbose = verbose
|
||||
self.skip_metrics = skip_metrics
|
||||
self.http_sock: Optional[socket.socket] = None
|
||||
self.network_sock: Optional[socket.socket] = None
|
||||
|
||||
def connect(self) -> bool:
|
||||
"""Connect to Unix sockets."""
|
||||
try:
|
||||
# HTTP socket
|
||||
self.http_sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
|
||||
self.http_sock.connect(self.http_socket)
|
||||
debug(f"Connected to HTTP socket: {self.http_socket}", self.verbose)
|
||||
|
||||
# Network socket
|
||||
self.network_sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
|
||||
self.network_sock.connect(self.network_socket)
|
||||
debug(f"Connected to Network socket: {self.network_socket}", self.verbose)
|
||||
|
||||
return True
|
||||
except FileNotFoundError as e:
|
||||
error(f"Socket not found: {e}")
|
||||
return False
|
||||
except Exception as e:
|
||||
error(f"Connection error: {e}")
|
||||
return False
|
||||
|
||||
def close(self):
|
||||
"""Close socket connections."""
|
||||
if self.http_sock:
|
||||
self.http_sock.close()
|
||||
if self.network_sock:
|
||||
self.network_sock.close()
|
||||
|
||||
def send_http_event(
|
||||
self,
|
||||
src_ip: str,
|
||||
src_port: int,
|
||||
timestamp: int,
|
||||
method: str = "GET",
|
||||
path: str = "/test",
|
||||
host: str = "example.com",
|
||||
extra_headers: Optional[Dict[str, str]] = None
|
||||
) -> Dict[str, Any]:
|
||||
"""Send an HTTP (source A) event."""
|
||||
event = {
|
||||
"src_ip": src_ip,
|
||||
"src_port": src_port,
|
||||
"dst_ip": "10.0.0.1",
|
||||
"dst_port": 443,
|
||||
"timestamp": timestamp,
|
||||
"method": method,
|
||||
"path": path,
|
||||
"host": host,
|
||||
"http_version": "HTTP/1.1",
|
||||
"header_user_agent": "TestAgent/1.0",
|
||||
"header_accept": "*/*"
|
||||
}
|
||||
|
||||
if extra_headers:
|
||||
for key, value in extra_headers.items():
|
||||
event[f"header_{key}"] = value
|
||||
|
||||
json_data = json.dumps(event)
|
||||
|
||||
if self.http_sock:
|
||||
self.http_sock.sendall(json_data.encode())
|
||||
debug(f"Sent HTTP event: {src_ip}:{src_port} ts={timestamp}", self.verbose)
|
||||
|
||||
return event
|
||||
|
||||
def send_network_event(
|
||||
self,
|
||||
src_ip: str,
|
||||
src_port: int,
|
||||
timestamp: int,
|
||||
ja3: str = "abc123",
|
||||
ja4: str = "def456",
|
||||
tls_version: str = "TLS1.3",
|
||||
tls_sni: str = "example.com"
|
||||
) -> Dict[str, Any]:
|
||||
"""Send a Network (source B) event."""
|
||||
event = {
|
||||
"src_ip": src_ip,
|
||||
"src_port": src_port,
|
||||
"dst_ip": "10.0.0.1",
|
||||
"dst_port": 443,
|
||||
"timestamp": timestamp,
|
||||
"ja3": ja3,
|
||||
"ja4": ja4,
|
||||
"tls_version": tls_version,
|
||||
"tls_sni": tls_sni
|
||||
}
|
||||
|
||||
json_data = json.dumps(event)
|
||||
|
||||
if self.network_sock:
|
||||
self.network_sock.sendall(json_data.encode())
|
||||
debug(f"Sent Network event: {src_ip}:{src_port} ts={timestamp}", self.verbose)
|
||||
|
||||
return event
|
||||
|
||||
def get_metrics(self) -> Dict[str, Any]:
|
||||
"""Fetch metrics from the metrics server."""
|
||||
if self.skip_metrics:
|
||||
return {}
|
||||
|
||||
if not HAS_REQUESTS:
|
||||
warn("requests library not installed, skipping metrics")
|
||||
return {}
|
||||
|
||||
try:
|
||||
response = requests.get(self.metrics_url, timeout=5)
|
||||
response.raise_for_status()
|
||||
return response.json()
|
||||
except Exception as e:
|
||||
warn(f"Failed to fetch metrics: {e}")
|
||||
return {}
|
||||
|
||||
def print_metrics(self, metrics: Dict[str, Any], title: str = "Metrics"):
|
||||
"""Print metrics in a formatted way."""
|
||||
if not metrics:
|
||||
return
|
||||
|
||||
print(f"\n{colorize(f'=== {title} ===', Colors.BOLD)}")
|
||||
|
||||
keys_to_show = [
|
||||
("events_received_a", "Events A"),
|
||||
("events_received_b", "Events B"),
|
||||
("correlations_success", "Correlations"),
|
||||
("correlations_failed", "Failures"),
|
||||
("failed_no_match_key", " - No match key"),
|
||||
("failed_time_window", " - Time window"),
|
||||
("failed_buffer_eviction", " - Buffer eviction"),
|
||||
("failed_ttl_expired", " - TTL expired"),
|
||||
("buffer_a_size", "Buffer A size"),
|
||||
("buffer_b_size", "Buffer B size"),
|
||||
("orphans_emitted_a", "Orphans A"),
|
||||
("orphans_emitted_b", "Orphans B"),
|
||||
("pending_orphan_match", "Pending orphan matches"),
|
||||
("keepalive_resets", "Keep-Alive resets"),
|
||||
]
|
||||
|
||||
for key, label in keys_to_show:
|
||||
if key in metrics:
|
||||
print(f" {label}: {metrics[key]}")
|
||||
|
||||
    def check_sockets(self) -> bool:
        """Check that both socket paths exist and are Unix sockets."""
        import os
        import stat

        errors = 0
        for name, path in [("HTTP", self.http_socket), ("Network", self.network_socket)]:
            if not os.path.exists(path):
                error(f"{name} socket not found: {path}")
                errors += 1
            elif not stat.S_ISSOCK(os.stat(path).st_mode):
                # The path exists but is a regular file, directory, etc.
                error(f"{name} path exists but is not a socket: {path}")
                errors += 1
            else:
                debug(f"{name} socket found: {path}", self.verbose)

        return errors == 0
|
||||
|
||||
def run_basic_test(self, count: int = 10, delay_ms: int = 100) -> Tuple[bool, Dict[str, int]]:
|
||||
"""
|
||||
Run basic correlation test.
|
||||
|
||||
Sends N pairs of A+B events with matching src_ip:src_port and timestamps.
|
||||
All should correlate successfully.
|
||||
"""
|
||||
info(f"Running basic correlation test with {count} pairs...")
|
||||
|
||||
# Get initial metrics
|
||||
initial_metrics = self.get_metrics()
|
||||
self.print_metrics(initial_metrics, "Initial Metrics")
|
||||
|
||||
initial_success = initial_metrics.get("correlations_success", 0)
|
||||
initial_failed = initial_metrics.get("correlations_failed", 0)
|
||||
initial_a = initial_metrics.get("events_received_a", 0)
|
||||
initial_b = initial_metrics.get("events_received_b", 0)
|
||||
|
||||
# Send test events
|
||||
print(f"\nSending {count} event pairs...")
|
||||
|
||||
base_timestamp = time.time_ns()
|
||||
sent = 0
|
||||
|
||||
for i in range(1, count + 1):
|
||||
src_ip = f"192.168.1.{(i % 254) + 1}"
|
||||
src_port = 8000 + i
|
||||
|
||||
# Same timestamp for perfect correlation
|
||||
timestamp = base_timestamp + (i * 1_000_000)
|
||||
|
||||
self.send_http_event(src_ip, src_port, timestamp)
|
||||
self.send_network_event(src_ip, src_port, timestamp)
|
||||
|
||||
sent += 1
|
||||
|
||||
if delay_ms > 0:
|
||||
time.sleep(delay_ms / 1000.0)
|
||||
|
||||
success(f"Sent {sent} event pairs")
|
||||
|
||||
# Wait for processing
|
||||
info("Waiting for processing (2 seconds)...")
|
||||
time.sleep(2)
|
||||
|
||||
# Get final metrics
|
||||
final_metrics = self.get_metrics()
|
||||
self.print_metrics(final_metrics, "Final Metrics")
|
||||
|
||||
# Calculate deltas
|
||||
delta_success = final_metrics.get("correlations_success", 0) - initial_success
|
||||
delta_failed = final_metrics.get("correlations_failed", 0) - initial_failed
|
||||
delta_a = final_metrics.get("events_received_a", 0) - initial_a
|
||||
delta_b = final_metrics.get("events_received_b", 0) - initial_b
|
||||
|
||||
results = {
|
||||
"sent": sent,
|
||||
"received_a": delta_a,
|
||||
"received_b": delta_b,
|
||||
"correlations": delta_success,
|
||||
"failures": delta_failed
|
||||
}
|
||||
|
||||
# Print results
|
||||
print(f"\n{colorize('=== Results ===', Colors.BOLD)}")
|
||||
print(f" Events A received: {delta_a} (expected: {sent})")
|
||||
print(f" Events B received: {delta_b} (expected: {sent})")
|
||||
print(f" Correlations: {delta_success}")
|
||||
print(f" Failures: {delta_failed}")
|
||||
|
||||
# Validation
|
||||
test_passed = True
|
||||
|
||||
if delta_a != sent:
|
||||
error(f"Event A count mismatch: got {delta_a}, expected {sent}")
|
||||
test_passed = False
|
||||
|
||||
if delta_b != sent:
|
||||
error(f"Event B count mismatch: got {delta_b}, expected {sent}")
|
||||
test_passed = False
|
||||
|
||||
if delta_success != sent:
|
||||
error(f"Correlation count mismatch: got {delta_success}, expected {sent}")
|
||||
test_passed = False
|
||||
|
||||
if delta_failed > 0:
|
||||
warn(f"Unexpected correlation failures: {delta_failed}")
|
||||
|
||||
if test_passed:
|
||||
success("All tests passed! Correlation is working correctly.")
|
||||
else:
|
||||
error("Some tests failed. Check logs for details.")
|
||||
|
||||
return test_passed, results
|
||||
|
||||
def run_time_window_test(self) -> bool:
|
||||
"""Test time window expiration."""
|
||||
info("Running time window test...")
|
||||
|
||||
src_ip = "192.168.100.1"
|
||||
src_port = 9999
|
||||
|
||||
# Send A event
|
||||
ts_a = time.time_ns()
|
||||
self.send_http_event(src_ip, src_port, ts_a)
|
||||
info(f"Sent A event at {ts_a}")
|
||||
|
||||
# Wait for time window to expire (default 10s)
|
||||
info("Waiting 11 seconds (time window should expire)...")
|
||||
time.sleep(11)
|
||||
|
||||
# Send B event
|
||||
ts_b = time.time_ns()
|
||||
self.send_network_event(src_ip, src_port, ts_b)
|
||||
info(f"Sent B event at {ts_b}")
|
||||
|
||||
time_diff_sec = (ts_b - ts_a) / 1_000_000_000
|
||||
info(f"Time difference: {time_diff_sec:.1f} seconds")
|
||||
info("Expected: time_window failure (check metrics)")
|
||||
|
||||
return True
|
||||
|
||||
def run_different_ip_test(self) -> bool:
|
||||
"""Test different IP (should not correlate)."""
|
||||
info("Running different IP test...")
|
||||
|
||||
ts = time.time_ns()
|
||||
|
||||
# Send A with IP 192.168.200.1
|
||||
self.send_http_event("192.168.200.1", 7777, ts)
|
||||
info("Sent A event from 192.168.200.1:7777")
|
||||
|
||||
# Send B with different IP
|
||||
self.send_network_event("192.168.200.2", 7777, ts)
|
||||
info("Sent B event from 192.168.200.2:7777 (different IP)")
|
||||
|
||||
info("Expected: no_match_key failure (different src_ip)")
|
||||
|
||||
return True
|
||||
|
||||
def run_keepalive_test(self, count: int = 5) -> bool:
|
||||
"""Test Keep-Alive mode (one B correlates with multiple A)."""
|
||||
info(f"Running Keep-Alive test with {count} HTTP requests on same connection...")
|
||||
|
||||
src_ip = "192.168.50.1"
|
||||
src_port = 6000
|
||||
|
||||
# Send one B event first (network/TCP connection)
|
||||
ts_b = time.time_ns()
|
||||
self.send_network_event(src_ip, src_port, ts_b)
|
||||
info(f"Sent B event (connection): {src_ip}:{src_port}")
|
||||
|
||||
# Send multiple A events (HTTP requests) on same connection
|
||||
for i in range(count):
|
||||
ts_a = time.time_ns() + (i * 100_000_000) # 100ms apart
|
||||
self.send_http_event(src_ip, src_port, ts_a, path=f"/request{i}")
|
||||
info(f"Sent A event (request {i}): {src_ip}:{src_port}")
|
||||
time.sleep(0.05) # 50ms delay
|
||||
|
||||
time.sleep(2) # Wait for processing
|
||||
|
||||
# Check metrics
|
||||
metrics = self.get_metrics()
|
||||
keepalive_resets = metrics.get("keepalive_resets", 0)
|
||||
|
||||
info(f"Keep-Alive resets: {keepalive_resets} (expected: {count - 1})")
|
||||
|
||||
if keepalive_resets >= count - 1:
|
||||
success("Keep-Alive test passed!")
|
||||
return True
|
||||
else:
|
||||
warn("Keep-Alive resets lower than expected. This may be normal depending on timing.")
|
||||
return True
|
||||
|
||||
    def run_all_tests(self) -> bool:
        """Run all test scenarios."""
        results = []

        # Basic test
        passed, _ = self.run_basic_test(count=10)
        results.append(("Basic correlation", passed))

        print("\n" + "=" * 50 + "\n")

        # Time window test
        self.run_time_window_test()
        results.append(("Time window", True))  # Informational

        print("\n" + "=" * 50 + "\n")

        # Different IP test
        self.run_different_ip_test()
        results.append(("Different IP", True))  # Informational

        print("\n" + "=" * 50 + "\n")

        # Keep-Alive test
        self.run_keepalive_test()
        results.append(("Keep-Alive", True))

        # Summary
        print(f"\n{colorize('=== Test Summary ===', Colors.BOLD)}")
        for name, passed in results:
            status = colorize("PASS", Colors.GREEN) if passed else colorize("FAIL", Colors.RED)
            print(f" {name}: {status}")

        return all(r[1] for r in results)


def main():
    parser = argparse.ArgumentParser(
        description="Advanced correlation testing tool for logcorrelator",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  # Run basic test with 20 pairs
  python3 test-correlation-advanced.py -c 20

  # Run all tests with verbose output
  python3 test-correlation-advanced.py --all -v

  # Test with custom socket paths
  python3 test-correlation-advanced.py -H /tmp/http.sock -N /tmp/network.sock

  # Skip metrics check
  python3 test-correlation-advanced.py --skip-metrics
        """
    )

    parser.add_argument(
        "-H", "--http-socket",
        default="/var/run/logcorrelator/http.socket",
        help="Path to HTTP Unix socket (default: /var/run/logcorrelator/http.socket)"
    )
    parser.add_argument(
        "-N", "--network-socket",
        default="/var/run/logcorrelator/network.socket",
        help="Path to Network Unix socket (default: /var/run/logcorrelator/network.socket)"
    )
    parser.add_argument(
        "-m", "--metrics-url",
        default="http://localhost:8080/metrics",
        help="Metrics server URL (default: http://localhost:8080/metrics)"
    )
    parser.add_argument(
        "-c", "--count",
        type=int,
        default=10,
        help="Number of test pairs to send (default: 10)"
    )
    parser.add_argument(
        "-d", "--delay",
        type=int,
        default=100,
        help="Delay between pairs in milliseconds (default: 100)"
    )
    parser.add_argument(
        "-v", "--verbose",
        action="store_true",
        help="Enable verbose output"
    )
    parser.add_argument(
        "--skip-metrics",
        action="store_true",
        help="Skip metrics check"
    )
    parser.add_argument(
        "--all",
        action="store_true",
        help="Run all test scenarios"
    )
    parser.add_argument(
        "--time-window",
        action="store_true",
        help="Run time window test only"
    )
    parser.add_argument(
        "--different-ip",
        action="store_true",
        help="Run different IP test only"
    )
    parser.add_argument(
        "--keepalive",
        action="store_true",
        help="Run Keep-Alive test only"
    )

    args = parser.parse_args()

    # Create tester
    tester = CorrelationTester(
        http_socket=args.http_socket,
        network_socket=args.network_socket,
        metrics_url=args.metrics_url,
        verbose=args.verbose,
        skip_metrics=args.skip_metrics
    )

    # Check sockets
    if not tester.check_sockets():
        error("Socket check failed. Is logcorrelator running?")
        sys.exit(1)

    success("Socket check passed")

    # Connect
    if not tester.connect():
        error("Failed to connect to sockets")
        sys.exit(1)

    try:
        # The outcome is stored in `ok`, not `success`: assigning to `success`
        # inside main() would make it a local name and turn the success(...)
        # logging call above into an UnboundLocalError.
        if args.all:
            ok = tester.run_all_tests()
        elif args.time_window:
            tester.run_time_window_test()
            ok = True
        elif args.different_ip:
            tester.run_different_ip_test()
            ok = True
        elif args.keepalive:
            tester.run_keepalive_test()
            ok = True
        else:
            # Let the exit code reflect the basic test result
            ok, _ = tester.run_basic_test(count=args.count, delay_ms=args.delay)

        sys.exit(0 if ok else 1)

    finally:
        tester.close()


if __name__ == "__main__":
    main()
404
services/correlator/scripts/test-correlation.sh
Executable file
@ -0,0 +1,404 @@
#!/bin/bash
#
# test-correlation.sh - Test script for log correlation debugging
#
# This script sends test HTTP (A) and Network (B) events to the logcorrelator
# Unix sockets and verifies that correlation is working correctly.
#
# Usage:
#   ./test-correlation.sh [options]
#
# Options:
#   -h, --http-socket PATH      Path to HTTP socket (default: /var/run/logcorrelator/http.socket)
#   -n, --network-socket PATH   Path to Network socket (default: /var/run/logcorrelator/network.socket)
#   -c, --count NUM             Number of test pairs to send (default: 10)
#   -d, --delay MS              Delay between pairs in milliseconds (default: 100)
#   -v, --verbose               Enable verbose output
#   -m, --metrics-url URL       Metrics server URL (default: http://localhost:8080/metrics)
#   --skip-metrics              Skip metrics check
#   --help                      Show this help message
#

set -e

# Default values
HTTP_SOCKET="/var/run/logcorrelator/http.socket"
NETWORK_SOCKET="/var/run/logcorrelator/network.socket"
COUNT=10
DELAY_MS=100
VERBOSE=false
METRICS_URL="http://localhost:8080/metrics"
SKIP_METRICS=false

# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color

# Print functions
info() {
    echo -e "${BLUE}[INFO]${NC} $1"
}

success() {
    echo -e "${GREEN}[OK]${NC} $1"
}

warn() {
    echo -e "${YELLOW}[WARN]${NC} $1"
}

error() {
    echo -e "${RED}[ERROR]${NC} $1"
}

verbose() {
    if [ "$VERBOSE" = true ]; then
        echo -e "${BLUE}[DEBUG]${NC} $1"
    fi
}

# Show help (prints the header comment at the top of this file)
show_help() {
    head -20 "$0" | tail -17 | sed 's/^#//' | sed 's/^ //'
    exit 0
}

# Parse arguments
EXTRA_TESTS=()
while [[ $# -gt 0 ]]; do
    case $1 in
        -h|--http-socket)
            HTTP_SOCKET="$2"
            shift 2
            ;;
        -n|--network-socket)
            NETWORK_SOCKET="$2"
            shift 2
            ;;
        -c|--count)
            COUNT="$2"
            shift 2
            ;;
        -d|--delay)
            DELAY_MS="$2"
            shift 2
            ;;
        -v|--verbose)
            VERBOSE=true
            shift
            ;;
        -m|--metrics-url)
            METRICS_URL="$2"
            shift 2
            ;;
        --skip-metrics)
            SKIP_METRICS=true
            shift
            ;;
        --help)
            show_help
            ;;
        --test-time-window|--test-different-ip)
            # Accept the optional-test flags instead of rejecting them as unknown
            EXTRA_TESTS+=("$1")
            shift
            ;;
        *)
            error "Unknown option: $1"
            echo "Use --help for usage information"
            exit 1
            ;;
    esac
done

# The loop above consumes "$@"; restore the optional-test flags so the
# checks at the end of the script can still see them
set -- "${EXTRA_TESTS[@]}"

# Check if socat or netcat is available
if command -v socat &> /dev/null; then
    SEND_CMD="socat"
elif command -v nc &> /dev/null; then
    SEND_CMD="nc"
else
    error "Neither socat nor nc (netcat) found. Please install one of them."
    echo "  Ubuntu/Debian: apt-get install socat OR apt-get install netcat-openbsd"
    echo "  RHEL/CentOS: yum install socat OR yum install nc"
    exit 1
fi

# Function to send data to Unix socket
send_to_socket() {
    local socket="$1"
    local data="$2"

    if [ "$SEND_CMD" = "socat" ]; then
        echo "$data" | socat - "UNIX-SENDTO:$socket" 2>/dev/null
    else
        echo "$data" | nc -U -u "$socket" 2>/dev/null
    fi
}

# Function to generate timestamp in nanoseconds
get_timestamp_ns() {
    date +%s%N
}

# Function to send HTTP (A) event
send_http_event() {
    local src_ip="$1"
    local src_port="$2"
    local timestamp="$3"
    local method="${4:-GET}"
    local path="${5:-/test}"
    local host="${6:-example.com}"

    local json=$(cat <<EOF
{"src_ip":"$src_ip","src_port":$src_port,"dst_ip":"10.0.0.1","dst_port":443,"timestamp":$timestamp,"method":"$method","path":"$path","host":"$host","http_version":"HTTP/1.1","header_user_agent":"TestAgent/1.0","header_accept":"*/*"}
EOF
)

    verbose "Sending HTTP event: $json"
    send_to_socket "$HTTP_SOCKET" "$json"
}

# Function to send Network (B) event
send_network_event() {
    local src_ip="$1"
    local src_port="$2"
    local timestamp="$3"
    local ja3="${4:-abc123}"
    local ja4="${5:-def456}"

    local json=$(cat <<EOF
{"src_ip":"$src_ip","src_port":$src_port,"dst_ip":"10.0.0.1","dst_port":443,"timestamp":$timestamp,"ja3":"$ja3","ja4":"$ja4","tls_version":"TLS1.3","tls_sni":"example.com"}
EOF
)

    verbose "Sending Network event: $json"
    send_to_socket "$NETWORK_SOCKET" "$json"
}

# Check sockets exist
check_sockets() {
    local errors=0

    if [ ! -S "$HTTP_SOCKET" ]; then
        error "HTTP socket not found: $HTTP_SOCKET"
        errors=$((errors + 1))
    else
        verbose "HTTP socket found: $HTTP_SOCKET"
    fi

    if [ ! -S "$NETWORK_SOCKET" ]; then
        error "Network socket not found: $NETWORK_SOCKET"
        errors=$((errors + 1))
    else
        verbose "Network socket found: $NETWORK_SOCKET"
    fi

    if [ $errors -gt 0 ]; then
        error "$errors socket(s) not found. Is logcorrelator running?"
        exit 1
    fi

    success "Sockets check passed"
}

# Get metrics from server
get_metrics() {
    if [ "$SKIP_METRICS" = true ]; then
        return 0
    fi

    if command -v curl &> /dev/null; then
        curl -s "$METRICS_URL" 2>/dev/null || echo "{}"
    elif command -v wget &> /dev/null; then
        wget -qO- "$METRICS_URL" 2>/dev/null || echo "{}"
    else
        warn "Neither curl nor wget found. Skipping metrics check."
        echo "{}"
    fi
}

# Extract value from JSON (simple grep-based, requires jq for complex queries)
get_json_value() {
    local json="$1"
    local key="$2"

    if command -v jq &> /dev/null; then
        echo "$json" | jq -r ".$key // 0"
    else
        # Fallback: simple grep (works for flat JSON)
        echo "$json" | grep -o "\"$key\":[0-9]*" | cut -d: -f2 || echo "0"
    fi
}

# Main test function
run_test() {
    info "Starting correlation test..."
    info "Configuration:"
    echo "  HTTP Socket:    $HTTP_SOCKET"
    echo "  Network Socket: $NETWORK_SOCKET"
    echo "  Test pairs:     $COUNT"
    echo "  Delay between:  ${DELAY_MS}ms"
    echo "  Metrics URL:    $METRICS_URL"
    echo "  Send command:   $SEND_CMD"
    echo ""

    # Get initial metrics
    info "Fetching initial metrics..."
    local initial_metrics=$(get_metrics)
    local initial_success=$(get_json_value "$initial_metrics" "correlations_success")
    local initial_failed=$(get_json_value "$initial_metrics" "correlations_failed")
    local initial_a=$(get_json_value "$initial_metrics" "events_received_a")
    local initial_b=$(get_json_value "$initial_metrics" "events_received_b")

    info "Initial metrics:"
    echo "  Events A: $initial_a"
    echo "  Events B: $initial_b"
    echo "  Success:  $initial_success"
    echo "  Failed:   $initial_failed"
    echo ""

    # Send test events
    info "Sending $COUNT test event pairs..."

    local base_timestamp=$(get_timestamp_ns)
    local sent=0
    local correlated=0

    for i in $(seq 1 "$COUNT"); do
        local src_ip="192.168.1.$((i % 254 + 1))"
        local src_port=$((8000 + i))

        # Send A and B with same timestamp (should correlate)
        local ts_a=$((base_timestamp + i * 1000000))
        local ts_b=$ts_a # Same timestamp for perfect correlation

        send_http_event "$src_ip" "$src_port" "$ts_a"
        send_network_event "$src_ip" "$src_port" "$ts_b"

        sent=$((sent + 1))
        verbose "Sent pair $i: $src_ip:$src_port"

        if [ "$DELAY_MS" -gt 0 ]; then
            sleep $(echo "scale=3; $DELAY_MS / 1000" | bc)
        fi
    done

    success "Sent $sent event pairs"
    echo ""

    # Wait for processing
    info "Waiting for processing (2 seconds)..."
    sleep 2

    # With --skip-metrics there is nothing to compare against: every counter
    # would read as 0 and the validation below would always report a mismatch
    if [ "$SKIP_METRICS" = true ]; then
        warn "Metrics check skipped; correlation results were not verified."
        return 0
    fi

    # Get final metrics
    info "Fetching final metrics..."
    local final_metrics=$(get_metrics)
    local final_success=$(get_json_value "$final_metrics" "correlations_success")
    local final_failed=$(get_json_value "$final_metrics" "correlations_failed")
    local final_a=$(get_json_value "$final_metrics" "events_received_a")
    local final_b=$(get_json_value "$final_metrics" "events_received_b")

    # Calculate deltas
    local delta_success=$((final_success - initial_success))
    local delta_failed=$((final_failed - initial_failed))
    local delta_a=$((final_a - initial_a))
    local delta_b=$((final_b - initial_b))

    echo ""
    info "Results:"
    echo "  Events A sent: $delta_a (expected: $sent)"
    echo "  Events B sent: $delta_b (expected: $sent)"
    echo "  Correlations:  $delta_success"
    echo "  Failures:      $delta_failed"
    echo ""

    # Validation
    local test_passed=true

    if [ "$delta_a" -ne "$sent" ]; then
        error "Event A count mismatch: got $delta_a, expected $sent"
        test_passed=false
    fi

    if [ "$delta_b" -ne "$sent" ]; then
        error "Event B count mismatch: got $delta_b, expected $sent"
        test_passed=false
    fi

    if [ "$delta_success" -ne "$sent" ]; then
        error "Correlation count mismatch: got $delta_success, expected $sent"
        test_passed=false
    fi

    if [ "$delta_failed" -ne 0 ]; then
        warn "Unexpected correlation failures: $delta_failed"
    fi

    # Return instead of exit so the optional tests after run_test stay
    # reachable; under `set -e` a non-zero return still aborts the script
    if [ "$test_passed" = true ]; then
        success "All tests passed! Correlation is working correctly."
        return 0
    else
        error "Some tests failed. Check the logs for details."
        return 1
    fi
}

# Test with time window exceeded
run_time_window_test() {
    info "Running time window test (B arrives after time window)..."

    local src_ip="192.168.100.1"
    local src_port="9999"

    # Send A event
    local ts_a=$(get_timestamp_ns)
    send_http_event "$src_ip" "$src_port" "$ts_a"
    info "Sent A event at timestamp $ts_a"

    # Wait for time window to expire (default is 10s, we wait 11s)
    info "Waiting 11 seconds (time window should expire)..."
    sleep 11

    # Send B event
    local ts_b=$(get_timestamp_ns)
    send_network_event "$src_ip" "$src_port" "$ts_b"
    info "Sent B event at timestamp $ts_b"

    info "This should result in a time_window failure (check metrics)"
}

# Test with different src_ip
run_different_ip_test() {
    info "Running different IP test (should NOT correlate)..."

    # Send A with IP 192.168.200.1
    local ts=$(get_timestamp_ns)
    send_http_event "192.168.200.1" "7777" "$ts"
    info "Sent A event from 192.168.200.1:7777"

    # Send B with different IP
    send_network_event "192.168.200.2" "7777" "$ts"
    info "Sent B event from 192.168.200.2:7777 (different IP)"

    info "These should NOT correlate (different src_ip)"
}

# Run tests
check_sockets
echo ""

# Run main test
run_test

echo ""
info "Additional tests available:"
echo "  --test-time-window    Test time window expiration"
echo "  --test-different-ip   Test different IP (no correlation)"

# Check for additional test flags
if [[ "$*" == *"--test-time-window"* ]]; then
    echo ""
    run_time_window_test
fi

if [[ "$*" == *"--test-different-ip"* ]]; then
    echo ""
    run_different_ip_test
fi
21
services/correlator/sql/bots.sql
Normal file
@ -0,0 +1,21 @@
DROP TABLE IF EXISTS mabase_prod.ref_bot_networks;

CREATE TABLE mabase_prod.ref_bot_networks (
    -- IPv6CIDR is used because it also accepts IPv4 addresses in the ::ffff:1.2.3.4/120 form
    network IPv6CIDR,
    bot_name LowCardinality(String),
    is_legitimate UInt8,
    last_update DateTime
) ENGINE = ReplacingMergeTree(last_update)
ORDER BY (network, bot_name);


-- Table that reads the bot IP list file
CREATE TABLE mabase_prod.bot_ip (
    ip String
) ENGINE = File(CSV, 'bot_ip.csv');

-- Table that reads the JA4 signature file
CREATE TABLE mabase_prod.bot_ja4 (
    ja4 String
) ENGINE = File(CSV, 'bot_ja4.csv');
234
services/correlator/sql/init.sql
Normal file
@ -0,0 +1,234 @@
-- =============================================================================
-- logcorrelator - ClickHouse initialization
-- =============================================================================
-- This file creates the database, the tables, the materialized view,
-- and the users that logcorrelator needs to run.
--
-- Usage:
--   clickhouse-client --multiquery < sql/init.sql
-- =============================================================================

-- -----------------------------------------------------------------------------
-- Database
-- -----------------------------------------------------------------------------
CREATE DATABASE IF NOT EXISTS mabase_prod;

-- -----------------------------------------------------------------------------
-- Raw table: direct target of the service's inserts
-- The service inserts only into this table (raw_json column).
-- -----------------------------------------------------------------------------
CREATE TABLE IF NOT EXISTS mabase_prod.http_logs_raw
(
    `raw_json`    String CODEC(ZSTD(3)),
    `ingest_time` DateTime DEFAULT now()
)
ENGINE = MergeTree
PARTITION BY toDate(ingest_time)
ORDER BY ingest_time
TTL ingest_time + INTERVAL 1 DAY
SETTINGS
    index_granularity = 8192,
    ttl_only_drop_parts = 1;

-- -----------------------------------------------------------------------------
-- Parsed table: populated automatically by the materialized view
-- -----------------------------------------------------------------------------

-- IF NOT EXISTS keeps this script re-runnable, like the other CREATE statements
CREATE TABLE IF NOT EXISTS mabase_prod.http_logs
(
    -- Time
    `time`                       DateTime,
    `log_date`                   Date DEFAULT toDate(time),

    -- Network
    `src_ip`                     IPv4,
    `src_port`                   UInt16,
    `dst_ip`                     IPv4,
    `dst_port`                   UInt16,

    -- IPLocate enrichment
    `src_asn`                    UInt32,
    `src_country_code`           LowCardinality(String),
    `src_as_name`                LowCardinality(String),
    `src_org`                    LowCardinality(String),
    `src_domain`                 LowCardinality(String),

    -- HTTP
    `method`                     LowCardinality(String),
    `scheme`                     LowCardinality(String),
    `host`                       LowCardinality(String),
    `path`                       String CODEC(ZSTD(3)),
    `query`                      String CODEC(ZSTD(3)),
    `http_version`               LowCardinality(String),

    -- Correlation
    `orphan_side`                LowCardinality(String),
    `correlated`                 UInt8,
    `keepalives`                 UInt16,
    `a_timestamp`                UInt64,
    `b_timestamp`                UInt64,
    `conn_id`                    String CODEC(ZSTD(3)),

    -- IP metadata
    `ip_meta_df`                 UInt8,
    `ip_meta_id`                 UInt16,
    `ip_meta_total_length`       UInt16,
    `ip_meta_ttl`                UInt8,

    -- TCP metadata
    `tcp_meta_options`           LowCardinality(String),
    `tcp_meta_window_size`       UInt32,
    `tcp_meta_mss`               UInt16,
    `tcp_meta_window_scale`      UInt8,
    `syn_to_clienthello_ms`      Int32,

    -- TLS / fingerprint
    `tls_version`                LowCardinality(String),
    `tls_sni`                    LowCardinality(String),
    `tls_alpn`                   LowCardinality(String),
    `ja3`                        String CODEC(ZSTD(3)),
    `ja3_hash`                   String CODEC(ZSTD(3)),
    `ja4`                        String CODEC(ZSTD(3)),

    -- HTTP headers
    `client_headers`             String CODEC(ZSTD(3)),
    `header_user_agent`          String CODEC(ZSTD(3)),
    `header_accept`              String CODEC(ZSTD(3)),
    `header_accept_encoding`     String CODEC(ZSTD(3)),
    `header_accept_language`     String CODEC(ZSTD(3)),
    `header_content_type`        String CODEC(ZSTD(3)),
    `header_x_request_id`        String CODEC(ZSTD(3)),
    `header_x_trace_id`          String CODEC(ZSTD(3)),
    `header_x_forwarded_for`     String CODEC(ZSTD(3)),
    `header_sec_ch_ua`           String CODEC(ZSTD(3)),
    `header_sec_ch_ua_mobile`    String CODEC(ZSTD(3)),
    `header_sec_ch_ua_platform`  String CODEC(ZSTD(3)),
    `header_sec_fetch_dest`      String CODEC(ZSTD(3)),
    `header_sec_fetch_mode`      String CODEC(ZSTD(3)),
    `header_sec_fetch_site`      String CODEC(ZSTD(3))
)
ENGINE = MergeTree
PARTITION BY log_date
ORDER BY (time, src_ip, dst_ip, ja4)
TTL log_date + INTERVAL 7 DAY
SETTINGS
    index_granularity = 8192,
    ttl_only_drop_parts = 1;

-- -----------------------------------------------------------------------------
-- Materialized view: parses the JSON from http_logs_raw into http_logs
-- -----------------------------------------------------------------------------
DROP VIEW IF EXISTS mabase_prod.mv_http_logs;

CREATE MATERIALIZED VIEW IF NOT EXISTS mabase_prod.mv_http_logs
TO mabase_prod.http_logs
AS
SELECT
    parseDateTimeBestEffort(coalesce(JSONExtractString(raw_json, 'time'), '1970-01-01T00:00:00Z')) AS time,
    toDate(time) AS log_date,

    toIPv4(coalesce(JSONExtractString(raw_json, 'src_ip'), '0.0.0.0')) AS src_ip,
    toUInt16(coalesce(JSONExtractUInt(raw_json, 'src_port'), 0)) AS src_port,
    toIPv4(coalesce(JSONExtractString(raw_json, 'dst_ip'), '0.0.0.0')) AS dst_ip,
    toUInt16(coalesce(JSONExtractUInt(raw_json, 'dst_port'), 0)) AS dst_port,

    dictGetOrDefault(
        'mabase_prod.dict_iplocate_asn',
        'asn',
        IPv4ToIPv6(IPv4StringToNum(toString(src_ip))),
        toUInt32(0)
    ) AS src_asn,
    dictGetOrDefault(
        'mabase_prod.dict_iplocate_asn',
        'country_code',
        IPv4ToIPv6(IPv4StringToNum(toString(src_ip))),
        ''
    ) AS src_country_code,
    dictGetOrDefault(
        'mabase_prod.dict_iplocate_asn',
        'name',
        IPv4ToIPv6(IPv4StringToNum(toString(src_ip))),
        ''
    ) AS src_as_name,
    dictGetOrDefault(
        'mabase_prod.dict_iplocate_asn',
        'org',
        IPv4ToIPv6(IPv4StringToNum(toString(src_ip))),
        ''
    ) AS src_org,
    dictGetOrDefault(
        'mabase_prod.dict_iplocate_asn',
        'domain',
        IPv4ToIPv6(IPv4StringToNum(toString(src_ip))),
        ''
    ) AS src_domain,

    coalesce(JSONExtractString(raw_json, 'method'), '') AS method,
    coalesce(JSONExtractString(raw_json, 'scheme'), '') AS scheme,
    coalesce(JSONExtractString(raw_json, 'host'), '') AS host,
    coalesce(JSONExtractString(raw_json, 'path'), '') AS path,
    coalesce(JSONExtractString(raw_json, 'query'), '') AS query,
    coalesce(JSONExtractString(raw_json, 'http_version'), '') AS http_version,

    coalesce(JSONExtractString(raw_json, 'orphan_side'), '') AS orphan_side,
    toUInt8(coalesce(JSONExtractBool(raw_json, 'correlated'), 0)) AS correlated,
    toUInt16(coalesce(JSONExtractUInt(raw_json, 'keepalives'), 0)) AS keepalives,
    coalesce(JSONExtractUInt(raw_json, 'a_timestamp'), 0) AS a_timestamp,
    coalesce(JSONExtractUInt(raw_json, 'b_timestamp'), 0) AS b_timestamp,
    coalesce(JSONExtractString(raw_json, 'conn_id'), '') AS conn_id,

    toUInt8(coalesce(JSONExtractBool(raw_json, 'ip_meta_df'), 0)) AS ip_meta_df,
    toUInt16(coalesce(JSONExtractUInt(raw_json, 'ip_meta_id'), 0)) AS ip_meta_id,
    toUInt16(coalesce(JSONExtractUInt(raw_json, 'ip_meta_total_length'), 0)) AS ip_meta_total_length,
    toUInt8(coalesce(JSONExtractUInt(raw_json, 'ip_meta_ttl'), 0)) AS ip_meta_ttl,

    coalesce(JSONExtractString(raw_json, 'tcp_meta_options'), '') AS tcp_meta_options,
    toUInt32(coalesce(JSONExtractUInt(raw_json, 'tcp_meta_window_size'), 0)) AS tcp_meta_window_size,
    toUInt16(coalesce(JSONExtractUInt(raw_json, 'tcp_meta_mss'), 0)) AS tcp_meta_mss,
    toUInt8(coalesce(JSONExtractUInt(raw_json, 'tcp_meta_window_scale'), 0)) AS tcp_meta_window_scale,
    toInt32(coalesce(JSONExtractInt(raw_json, 'syn_to_clienthello_ms'), 0)) AS syn_to_clienthello_ms,

    coalesce(JSONExtractString(raw_json, 'tls_version'), '') AS tls_version,
    coalesce(JSONExtractString(raw_json, 'tls_sni'), '') AS tls_sni,
    coalesce(JSONExtractString(raw_json, 'tls_alpn'), '') AS tls_alpn,
    coalesce(JSONExtractString(raw_json, 'ja3'), '') AS ja3,
    coalesce(JSONExtractString(raw_json, 'ja3_hash'), '') AS ja3_hash,
    coalesce(JSONExtractString(raw_json, 'ja4'), '') AS ja4,

    coalesce(JSONExtractString(raw_json, 'client_headers'), '') AS client_headers,
    coalesce(JSONExtractString(raw_json, 'header_User-Agent'), '') AS header_user_agent,
    coalesce(JSONExtractString(raw_json, 'header_Accept'), '') AS header_accept,
    coalesce(JSONExtractString(raw_json, 'header_Accept-Encoding'), '') AS header_accept_encoding,
    coalesce(JSONExtractString(raw_json, 'header_Accept-Language'), '') AS header_accept_language,
    coalesce(JSONExtractString(raw_json, 'header_Content-Type'), '') AS header_content_type,
    coalesce(JSONExtractString(raw_json, 'header_X-Request-Id'), '') AS header_x_request_id,
    coalesce(JSONExtractString(raw_json, 'header_X-Trace-Id'), '') AS header_x_trace_id,
    coalesce(JSONExtractString(raw_json, 'header_X-Forwarded-For'), '') AS header_x_forwarded_for,
    coalesce(JSONExtractString(raw_json, 'header_Sec-CH-UA'), '') AS header_sec_ch_ua,
    coalesce(JSONExtractString(raw_json, 'header_Sec-CH-UA-Mobile'), '') AS header_sec_ch_ua_mobile,
    coalesce(JSONExtractString(raw_json, 'header_Sec-CH-UA-Platform'), '') AS header_sec_ch_ua_platform,
    coalesce(JSONExtractString(raw_json, 'header_Sec-Fetch-Dest'), '') AS header_sec_fetch_dest,
    coalesce(JSONExtractString(raw_json, 'header_Sec-Fetch-Mode'), '') AS header_sec_fetch_mode,
    coalesce(JSONExtractString(raw_json, 'header_Sec-Fetch-Site'), '') AS header_sec_fetch_site

FROM mabase_prod.http_logs_raw;

-- -----------------------------------------------------------------------------
-- Users and permissions
-- -----------------------------------------------------------------------------
CREATE USER IF NOT EXISTS data_writer IDENTIFIED WITH plaintext_password BY 'ChangeMe';
CREATE USER IF NOT EXISTS analyst IDENTIFIED WITH plaintext_password BY 'ChangeMe';

-- data_writer: access limited to the raw table (INSERT, plus SELECT on that same table)
GRANT INSERT ON mabase_prod.http_logs_raw TO data_writer;
GRANT SELECT ON mabase_prod.http_logs_raw TO data_writer;

-- analyst: read access to the parsed table
GRANT SELECT ON mabase_prod.http_logs TO analyst;

-- -----------------------------------------------------------------------------
-- Post-installation checks
-- -----------------------------------------------------------------------------
-- SELECT count(*), min(ingest_time), max(ingest_time) FROM mabase_prod.http_logs_raw;
-- SELECT count(*), min(time), max(time) FROM mabase_prod.http_logs;
-- SELECT time, src_ip, dst_ip, method, host, path, ja4 FROM mabase_prod.http_logs ORDER BY time DESC LIMIT 10;
29
services/correlator/sql/tables.sql
Normal file
@ -0,0 +1,29 @@
DROP DICTIONARY IF EXISTS mabase_prod.dict_iplocate_asn;

CREATE DICTIONARY IF NOT EXISTS mabase_prod.dict_iplocate_asn
(
    network String,
    asn UInt32,
    country_code String,
    name String,
    org String,
    domain String
)
PRIMARY KEY network
SOURCE(FILE(path '/var/lib/clickhouse/user_files/iplocate-ip-to-asn.csv' format 'CSVWithNames'))
LAYOUT(IP_TRIE())
LIFETIME(MIN 3600 MAX 7200);



-- Drop the table if it exists so it can be recreated
DROP TABLE IF EXISTS mabase_prod.ref_bot_networks;

-- Table optimized for binary CIDR filtering
CREATE TABLE mabase_prod.ref_bot_networks (
    network IPv6CIDR, -- Natively handles '1.2.3.0/24' and '2001:db8::/32'
    bot_name LowCardinality(String),
    is_legitimate UInt8, -- 1 = Whitelist, 0 = Blacklist
    last_update DateTime
) ENGINE = ReplacingMergeTree(last_update)
ORDER BY (network, bot_name);
8
services/dashboard/.env.example
Normal file
@ -0,0 +1,8 @@
# dashboard configuration — DO NOT COMMIT real values
CLICKHOUSE_HOST=clickhouse
CLICKHOUSE_PORT=8123
CLICKHOUSE_DB=mabase_prod
CLICKHOUSE_USER=analyst
CLICKHOUSE_PASSWORD=
API_HOST=0.0.0.0
CORS_ORIGINS=["http://localhost:3000"]
114
services/dashboard/.github/copilot-instructions.md
vendored
Normal file
@ -0,0 +1,114 @@
# Copilot Instructions — Bot Detector Dashboard

## Architecture Overview

This is a **SOC (Security Operations Center) dashboard** for visualizing bot detections from an upstream `bot_detector_ai` service. It is a **single-service, full-stack app**: the FastAPI backend serves the built React frontend as static files *and* exposes a REST API, all on port 8000. There is no separate frontend server in production and **no authentication**.

**Data source:** ClickHouse database (`mabase_prod`), primarily the `ml_detected_anomalies` table and the `view_dashboard_entities` view.

```
dashboard/
├── backend/                  # Python 3.11 + FastAPI — REST API + static file serving
│   ├── main.py               # App entry point: CORS, router registration, SPA catch-all
│   ├── config.py             # pydantic-settings Settings, reads .env
│   ├── database.py           # ClickHouseClient singleton (db)
│   ├── models.py             # All Pydantic v2 response models
│   ├── routes/               # One module per domain: metrics, detections, variability,
│   │                         # attributes, analysis, entities, incidents, audit, reputation
│   └── services/
│       └── reputation_ip.py  # Async httpx → ip-api.com + ipinfo.io (no API keys)
└── frontend/                 # React 18 + TypeScript 5 + Vite 5 + Tailwind CSS 3
    └── src/
        ├── App.tsx           # BrowserRouter + Sidebar + TopHeader + all Routes
        ├── ThemeContext.tsx  # dark/light/auto, persisted to localStorage (key: soc_theme)
        ├── api/client.ts     # Axios instance (baseURL: /api) + all TS interfaces
        ├── components/       # One component per route view + shared panels + ui/
        ├── hooks/            # useMetrics, useDetections, useVariability (polling wrappers)
        └── utils/STIXExporter.ts
```
|
||||
|
||||
## Dev Commands
|
||||
|
||||
```bash
|
||||
# Backend (run from repo root)
|
||||
pip install -r requirements.txt
|
||||
python -m uvicorn backend.main:app --reload --host 0.0.0.0 --port 8000
|
||||
|
||||
# Frontend (separate terminal)
|
||||
cd frontend && npm install
|
||||
npm run dev # :3000 with HMR, proxies /api → localhost:8000
|
||||
npm run build # tsc type-check + vite build → frontend/dist/
|
||||
npm run preview # preview the production build
|
||||
|
||||
# Docker (production)
|
||||
docker compose up -d dashboard_web
|
||||
docker compose build dashboard_web && docker compose up -d dashboard_web
|
||||
docker compose logs -f dashboard_web
|
||||
```
|
||||
|
||||
There is no test suite or linter configured (no pytest, vitest, ESLint, Black, etc.).
|
||||
|
||||
```bash
|
||||
# Manual smoke tests
|
||||
curl http://localhost:8000/health
|
||||
curl http://localhost:8000/api/metrics | jq '.summary'
|
||||
curl "http://localhost:8000/api/detections?page=1&page_size=5" | jq '.items | length'
|
||||
```
|
||||
|
||||
## Key Conventions
|
||||
|
||||
### Backend
|
||||
|
||||
- **All routes are raw SQL** — no ORM. Results are accessed by positional index: `result.result_rows[0][n]`. Column order is determined by the `SELECT` statement.
|
||||
- **Query parameters** use `%(name)s` dict syntax: `db.query(sql, {"param": value})`.
|
||||
- **Every router module** defines `router = APIRouter(prefix="/api/<domain>", tags=["..."])` and is registered in `main.py` via `app.include_router(...)`.
|
||||
- **SPA catch-all** (`/{full_path:path}`) **must remain the last registered route** in `main.py`. New routers must be added with `app.include_router()` before it.
|
||||
- **IPv4 IPs** are stored as IPv6-mapped (`::ffff:x.x.x.x`) in `src_ip`; queries normalize with `replaceRegexpAll(toString(src_ip), '^::ffff:', '')`.
|
||||
- **NULL guards** — all row fields are coalesced: `row[n] or ""`, `row[n] or 0`, `row[n] or "LOW"`.
|
||||
- **`anomaly_score`** can be negative in the DB; always normalize with `abs()` for display.
|
||||
- **`analysis.py`** stores SOC classifications in a `classifications` ClickHouse table. The `audit_logs` table is optional — routes silently return empty results if absent.
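
The IPv6-mapped normalization, `abs()` rule, and NULL guards listed above can be sketched in Python as follows. This is a minimal illustration rather than the project's code: `normalize_ip`, `row_to_detection`, and the three-column order are hypothetical names chosen for the example.

```python
import re
from typing import Any, Dict, Sequence

# Same pattern as the SQL call replaceRegexpAll(toString(src_ip), '^::ffff:', '').
_MAPPED_PREFIX = re.compile(r"^::ffff:")


def normalize_ip(raw: Any) -> str:
    """Strip the IPv4-mapped IPv6 prefix; a NULL value becomes an empty string."""
    return _MAPPED_PREFIX.sub("", str(raw or ""))


def row_to_detection(row: Sequence[Any]) -> Dict[str, Any]:
    """Map one positional result row to a dict.

    Assumed (hypothetical) column order: src_ip, anomaly_score, threat_level.
    """
    return {
        "src_ip": normalize_ip(row[0]),
        "anomaly_score": abs(row[1] or 0),   # scores can be negative in the DB
        "threat_level": row[2] or "LOW",     # NULL guard with the lowest level
    }
```

Because rows are read by position, the mapping function has to change whenever the `SELECT` column order changes.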
|
||||
|
||||
### Frontend
|
||||
|
||||
- **API calls** use the axios instance from `src/api/client.ts` (baseURL `/api`) or direct `fetch('/api/...')`. There is **no global state manager** — components use `useState`/`useEffect` or custom hooks directly.
|
||||
- **TypeScript interfaces** in `client.ts` mirror the Pydantic models in `backend/models.py`. Both must be kept in sync when changing data shapes.
|
||||
- **Tailwind uses semantic CSS-variable tokens** — always use `bg-background`, `bg-background-secondary`, `bg-background-card`, `text-text-primary`, `text-text-secondary`, `text-text-disabled`, `bg-accent-primary`, `threat-critical/high/medium/low` rather than raw Tailwind color classes (e.g., `slate-800`). This ensures dark/light theme compatibility.
|
||||
- **Threat level taxonomy**: `CRITICAL` > `HIGH` > `MEDIUM` > `LOW` — always uppercase strings; colors: red / orange / yellow / green.
|
||||
- **URL encoding**: entity values with special characters (JA4 fingerprints, subnets) are `encodeURIComponent`-encoded. Subnets use `_24` in place of `/24` (e.g., `/entities/subnet/141.98.11.0_24`).
|
||||
- **Recent investigations** are stored in `localStorage` under `soc_recent_investigations` (max 8). Tracked by `RouteTracker` component. Only types `ip`, `ja4`, `subnet` are tracked.
|
||||
- **Auto-refresh**: metrics every 30 s, incidents every 60 s.
|
||||
- **French UI text** — all user-facing strings and log messages are in French; code identifiers are in English.
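
The `_24` subnet convention means that whichever side receives the URL token has to turn it back into CIDR notation. A small sketch of that decoding step, assuming a hypothetical helper named `subnet_from_url_token` (the actual backend may handle this differently):

```python
import ipaddress
from typing import Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def subnet_from_url_token(token: str) -> Network:
    """Turn a URL token such as '141.98.11.0_24' back into 141.98.11.0/24."""
    address, separator, prefix = token.rpartition("_")
    if not separator:
        raise ValueError(f"expected '<address>_<prefix>', got {token!r}")
    # strict=True rejects tokens whose host bits are set, e.g. '141.98.11.5_24'.
    return ipaddress.ip_network(f"{address}/{prefix}", strict=True)
```

Parsing through `ipaddress` also validates the value before it reaches a SQL query.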
|
||||
|
||||
### Frontend → Backend in Dev vs Production
|
||||
|
||||
- **Dev**: Vite dev server on `:3000` proxies `/api/*` to `http://localhost:8000` (see `vite.config.ts`).
|
||||
- **Production**: React SPA is served by FastAPI from `frontend/dist/`. API calls hit the same origin at `:8000` — no proxy needed.
|
||||
|
||||
### Docker
|
||||
|
||||
- Single service using `network_mode: "host"` — no port mapping; the container shares the host network stack.
|
||||
- Multi-stage Dockerfile: `node:20-alpine` builds the frontend → `python:3.11-slim` installs deps → final image copies both.
|
||||
|
||||
## Environment Variables (`.env`)
|
||||
|
||||
| Variable | Default | Description |
|
||||
|---|---|---|
|
||||
| `CLICKHOUSE_HOST` | `clickhouse` | ClickHouse hostname |
|
||||
| `CLICKHOUSE_PORT` | `8123` | ClickHouse HTTP port (set in code) |
|
||||
| `CLICKHOUSE_DB` | `mabase_prod` | Database name |
|
||||
| `CLICKHOUSE_USER` | `admin` | |
|
||||
| `CLICKHOUSE_PASSWORD` | `` | |
|
||||
| `API_HOST` | `0.0.0.0` | Uvicorn bind host |
|
||||
| `API_PORT` | `8000` | Uvicorn bind port |
|
||||
| `CORS_ORIGINS` | `["http://localhost:3000", ...]` | Allowed origins |
|
||||
|
||||
> ⚠️ The `.env` file contains real credentials — never commit it to public repos.
|
||||
|
||||
## ClickHouse Tables
|
||||
|
||||
| Table / View | Used by |
|
||||
|---|---|
|
||||
| `ml_detected_anomalies` | Primary source for detections, metrics, variability, analysis |
|
||||
| `view_dashboard_entities` | User agents, client headers, paths, query params (entities routes) |
|
||||
| `classifications` | SOC analyst classifications (created by `analysis.py`) |
|
||||
| `mabase_prod.audit_logs` | Audit trail (optional — missing table is handled silently) |
|
||||
86
services/dashboard/.gitignore
vendored
Normal file
@ -0,0 +1,86 @@
|
||||
# ═══════════════════════════════════════════════════════════════════════════════
|
||||
# GITIGNORE - Bot Detector Dashboard
|
||||
# ═══════════════════════════════════════════════════════════════════════════════
|
||||
|
||||
# ───────────────────────────────────────────────────────────────────────────────
|
||||
# SECURITY - Never commit
|
||||
# ───────────────────────────────────────────────────────────────────────────────
|
||||
.env
|
||||
.env.local
|
||||
.env.production
|
||||
*.pem
|
||||
*.key
|
||||
secrets/
|
||||
credentials/
|
||||
|
||||
# ───────────────────────────────────────────────────────────────────────────────
|
||||
# Python
|
||||
# ───────────────────────────────────────────────────────────────────────────────
|
||||
__pycache__/
|
||||
*.py[cod]
|
||||
*$py.class
|
||||
*.so
|
||||
.Python
|
||||
build/
|
||||
develop-eggs/
|
||||
dist/
|
||||
downloads/
|
||||
eggs/
|
||||
.eggs/
|
||||
lib/
|
||||
lib64/
|
||||
parts/
|
||||
sdist/
|
||||
var/
|
||||
wheels/
|
||||
*.egg-info/
|
||||
.installed.cfg
|
||||
*.egg
|
||||
.pytest_cache/
|
||||
.coverage
|
||||
htmlcov/
|
||||
*.manifest
|
||||
*.spec
|
||||
|
||||
# ───────────────────────────────────────────────────────────────────────────────
|
||||
# Node.js / Frontend
|
||||
# ───────────────────────────────────────────────────────────────────────────────
|
||||
node_modules/
|
||||
npm-debug.log*
|
||||
yarn-debug.log*
|
||||
yarn-error.log*
|
||||
frontend/node_modules/
|
||||
frontend/dist/
|
||||
frontend/build/
|
||||
package-lock.json
|
||||
yarn.lock
|
||||
|
||||
# ───────────────────────────────────────────────────────────────────────────────
|
||||
# IDE / Editors
|
||||
# ───────────────────────────────────────────────────────────────────────────────
|
||||
.idea/
|
||||
.vscode/
|
||||
*.swp
|
||||
*.swo
|
||||
*~
|
||||
.DS_Store
|
||||
Thumbs.db
|
||||
|
||||
# ───────────────────────────────────────────────────────────────────────────────
|
||||
# Logs
|
||||
# ───────────────────────────────────────────────────────────────────────────────
|
||||
*.log
|
||||
logs/
|
||||
test_output.log
|
||||
|
||||
# ───────────────────────────────────────────────────────────────────────────────
|
||||
# Docker
|
||||
# ───────────────────────────────────────────────────────────────────────────────
|
||||
docker-compose.override.yml
|
||||
*.tar
|
||||
|
||||
# ───────────────────────────────────────────────────────────────────────────────
|
||||
# Temporary documentation
|
||||
# ───────────────────────────────────────────────────────────────────────────────
|
||||
# *.md.tmp
|
||||
# *.md.bak
|
||||
203
services/dashboard/AUDIT_SOC_DASHBOARD.md
Normal file
@ -0,0 +1,203 @@
|
||||
# SOC audit of the dashboard

## Executive summary

The dashboard is functionally rich (incidents, IP/JA4 investigation, threat intel), but it is **not ready for production SOC use** without hardening.

Main findings:

- **Insufficient access security**: no authentication or RBAC.
- **Inconsistent navigation**: several links point to routes that do not exist.
- **Partial traceability/audit**: logging can be bypassed and sometimes reports "success" even on failure.
- **UX organization needs work** for fast SOC triage (prioritization, workflow, "next actions").
|
||||
|
||||
|
||||
## Audit scope

- React frontend (`frontend/src/App.tsx` plus the navigation and investigation components).
- FastAPI backend (`backend/main.py` plus the `incidents`, `audit`, `entities`, `analysis`, `detections`, and `reputation` routes).
- Project documentation (`README.md`).
|
||||
|
||||
|
||||
## Page map and navigation

### Declared frontend routes
|
||||
|
||||
- `/` → `IncidentsView`
|
||||
- `/threat-intel` → `ThreatIntelView`
|
||||
- `/detections` → `DetectionsList`
|
||||
- `/detections/:type/:value` → `DetailsView`
|
||||
- `/investigation/:ip` → `InvestigationView`
|
||||
- `/investigation/ja4/:ja4` → `JA4InvestigationView`
|
||||
- `/entities/subnet/:subnet` → `SubnetInvestigation`
|
||||
- `/entities/:type/:value` → `EntityInvestigationView`
|
||||
- `/tools/correlation-graph/:ip` → `CorrelationGraph`
|
||||
- `/tools/timeline/:ip?` → `InteractiveTimeline`
|
||||
|
||||
### Navigation graph (pages)
|
||||
|
||||
```mermaid
|
||||
flowchart LR
|
||||
A["/ (Incidents)"] --> B["/investigation/:ip"]
|
||||
A --> C["/entities/subnet/:subnet"]
|
||||
A --> X["/bulk-classify?ips=... (missing route)"]
|
||||
A --> T["/threat-intel"]
|
||||
|
||||
D["/detections"] --> E["/detections/:type/:value"]
|
||||
D --> B
|
||||
E --> B
|
||||
E --> F["/investigation/ja4/:ja4"]
|
||||
|
||||
C --> B
|
||||
C --> G["/entities/ip/:ip"]
|
||||
G --> B
|
||||
G --> F
|
||||
F --> B
|
||||
|
||||
B --> H["/tools/correlation-graph/:ip"]
|
||||
B --> I["/tools/timeline/:ip?"]
|
||||
|
||||
Q["QuickSearch (global + local)"] --> Y["/investigate/... (missing route)"]
Q --> Z["/incidents?threat_level=CRITICAL (missing route)"]
|
||||
```
|
||||
|
||||
### Navigation inconsistencies identified

- `QuickSearch` navigates to `/investigate/...` and `/incidents...`, but these routes do not exist.
- `IncidentsView` links to `/bulk-classify?...`, which has no declared route.
- `DetectionsList` uses `window.location.href` (full page reload) instead of the router.
- Top-level navigation is limited to two entries ("Incidents", "Threat Intel"), although "Détections" is a central SOC view.
- `App.tsx` reads `window.location.pathname` to recover `:ip` on some tool routes (fragile, and not idiomatic React Router).
|
||||
|
||||
|
||||
## Security and robustness findings (SOC use)

### Critical

- **No authentication or RBAC** (also confirmed by the README, which describes "local use").
  - SOC impact: analyst actions cannot be attributed reliably, and access is uncontrolled.

- **Potential injection in `entities.py`**:
  - A SQL `IN (...)` clause is built by concatenating values (`ip_values`) instead of using query parameters.
  - Impact: injection surface on the backend.

- **Unreliable audit log**:
  - `/api/audit/logs` accepts a `user` supplied by the request (default `soc_user`).
  - When the audit insert fails, the code still returns `status: success`.
  - Impact: weak non-repudiation, compromised traceability.
|
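
One way to remove the concatenated `IN (...)` clause is to generate one named placeholder per value and pass the values separately, using the same `%(name)s` syntax as the rest of the backend. The sketch below is an assumption about how that could look: `build_ip_in_clause` is a hypothetical helper, and the query text is only illustrative.

```python
import ipaddress
from typing import Dict, Iterable, Tuple


def build_ip_in_clause(ip_values: Iterable[str], prefix: str = "ip") -> Tuple[str, Dict[str, str]]:
    """Return a placeholder list and its parameter dict for a SQL IN clause."""
    params: Dict[str, str] = {}
    for index, value in enumerate(ip_values):
        # ip_address() raises ValueError for anything that is not a valid IP,
        # so hostile strings never reach the SQL text or the parameters.
        params[f"{prefix}_{index}"] = str(ipaddress.ip_address(value))
    if not params:
        raise ValueError("ip_values must not be empty")
    placeholders = ", ".join(f"%({name})s" for name in params)
    return placeholders, params


placeholders, params = build_ip_in_clause(["203.0.113.7", "198.51.100.2"])
# Only placeholder names are interpolated into the SQL string; the values
# travel separately, e.g. db.query(sql, params).
sql = f"SELECT src_ip FROM ml_detected_anomalies WHERE src_ip IN ({placeholders})"
```

Validating with `ipaddress` is a second line of defense on top of parameter binding, not a replacement for it.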
||||
|
||||
### High

- **Rate limiting is not enforced**:
  - The `RATE_LIMIT_PER_MINUTE` variable exists, but no middleware applies it.
  - Impact: exposure to abuse, DoS, and mass scraping.

- **Internal error leakage**:
  - Several endpoints return `detail=f"Erreur: {str(e)}"`.
  - Impact: disclosure of technical details.
|
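
To make a per-minute limit effective, something has to count requests per client. The following is a framework-agnostic sketch of a sliding-window counter; `SlidingWindowLimiter` is a hypothetical class, the state is in-memory (so it only protects a single process), and wiring it into a FastAPI middleware is left out.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit_per_minute` requests per client in any 60 s window."""

    def __init__(self, limit_per_minute: int, clock: Callable[[], float] = time.monotonic):
        self.limit = limit_per_minute
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        hits = self._hits[client_id]
        # Drop timestamps that have fallen out of the 60-second window.
        while hits and now - hits[0] >= 60.0:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # the caller would typically answer with HTTP 429
        hits.append(now)
        return True
```

The limit passed to the constructor would come from `RATE_LIMIT_PER_MINUTE`, and `client_id` could be the client IP until authentication provides a real identity.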
||||
|
||||
### Medium

- **External IP reputation dependency** (`ip-api` over HTTP, plus `ipinfo`) with no advanced resilience controls (limited operational fallback).
- **Components still call `console.error`/`console.log`** in the production frontend.
- **Partially mocked incident endpoints** (`Implementation en cours`) that can mislead the analyst.
|
||||
|
||||
|
||||
## Page format: what to improve

### 1) Visual SOC prioritization

- Standardize severity conventions (color, wording, position).
- Add an "Incidents requiring immediate action" banner at the top of `/`.
- Always display: **level, confidence, impact, last activity, recommended action**.

### 2) Density and readability

- Reduce non-essential emojis in decision areas.
- Switch large tables to a "triage" mode:
  - minimal default columns,
  - sorting by criticality/recency,
  - compact tags with tooltips.

### 3) Explicit analyst workflow

- Introduce standardized CTAs:
  - `Investiguer` (investigate), `Escalader` (escalate), `Classer` (classify), `Créer IOC` (create IOC), `Exporter` (export).
- Add a timeline of SOC actions (who did what, when, and why) directly on the incident and investigation views.

### 4) Operational accessibility

- Consistent keyboard shortcuts (navigation, filters, next incident).
- Explicit empty states with suggested actions.
- A uniform breadcrumb across all views.
|
||||
|
||||
|
||||
## Information organization: recommendations

### IA) Rethink the navigation information architecture (menu)

Proposed structure:

- **Triage**
  - Incidents (default)
  - Detections
- **Investigation**
  - Entity search
  - IP view
  - JA4 view
  - Subnet
- **Knowledge**
  - Threat Intel
  - Tags/Patterns
- **Administration**
  - Audit logs
  - Platform health

### IB) Normalize routes

- Replace dead routes (`/investigate`, `/incidents`, the undeclared `/bulk-classify`) with existing routes, or implement them.
- Avoid `window.location.*` in routed components.
- Centralize paths in a single module (e.g. `routes.ts`) to prevent drift.

### IC) Standardize the page template

Every SOC page should share the same skeleton:

1. Context (title, scope, timestamp).
2. Critical KPIs.
3. Main triage table.
4. Actions panel.
5. Activity log for the page.
|
||||
|
||||
|
||||
## Prioritized improvement plan

### Phase 1 (blocking for SOC production)

- Add SSO/OIDC authentication and RBAC (viewer/analyst/admin).
- Fix dead routes and broken navigation.
- Fix the unparameterized SQL query in `entities.py`.
- Make the audit log reliable (identity derived from authentication, explicit failure when the log entry is not written).

### Phase 2 (reliability)

- Put effective rate limiting in place.
- Clean up error handling (user-facing messages plus structured server logs).
- Remove `window.location.href` and unify SPA navigation.

### Phase 3 (SOC UX)

- "Triage-first" redesign of the screens (priority, next action, handling time).
- Standardize design tokens and visual hierarchy.
- Add "analyst queue" and "handover" (shift handover) views.
|
||||
|
||||
|
||||
## Verdict

The foundation is promising for technical investigation, but an operational SOC first requires:

1. **Securing access and traceability**.
2. **Making navigation and routes reliable**.
3. **Refocusing pages on the SOC triage flow**.

Without these fixes, the main risks are **operational debt** (time lost in triage) and **compliance debt** (insufficient auditability).
|
||||
22
services/dashboard/Dockerfile
Normal file
@ -0,0 +1,22 @@
|
||||
FROM node:20-alpine AS frontend-builder
|
||||
WORKDIR /app/frontend
|
||||
COPY services/dashboard/frontend/package*.json ./
|
||||
RUN npm install
|
||||
COPY services/dashboard/frontend/ ./
|
||||
RUN npm run build
|
||||
|
||||
FROM python:3.11-slim AS backend
|
||||
WORKDIR /app
|
||||
COPY shared/python/ja4_common/ /app/shared/ja4_common/
|
||||
RUN pip install --no-cache-dir /app/shared/ja4_common/
|
||||
COPY services/dashboard/requirements.txt .
|
||||
RUN pip install --no-cache-dir -r requirements.txt
|
||||
COPY services/dashboard/backend/ ./backend/
|
||||
|
||||
FROM python:3.11-slim
|
||||
WORKDIR /app
|
||||
COPY --from=backend /usr/local/lib/python3.11/site-packages /usr/local/lib/python3.11/site-packages
|
||||
COPY --from=backend /app/backend ./backend
|
||||
COPY --from=frontend-builder /app/frontend/dist ./frontend/dist
|
||||
EXPOSE 8000
|
||||
CMD ["python", "-m", "uvicorn", "backend.main:app", "--host", "0.0.0.0", "--port", "8000"]
|
||||
10
services/dashboard/Dockerfile.tests
Normal file
@ -0,0 +1,10 @@
|
||||
FROM python:3.11-slim
|
||||
WORKDIR /app
|
||||
COPY shared/python/ja4_common/ /app/shared/ja4_common/
|
||||
RUN pip install --no-cache-dir /app/shared/ja4_common/
|
||||
COPY services/dashboard/requirements.txt .
|
||||
RUN pip install --no-cache-dir -r requirements.txt
|
||||
RUN pip install --no-cache-dir pytest pytest-mock httpx
|
||||
COPY services/dashboard/backend/ ./backend/
|
||||
COPY services/dashboard/backend/tests/ ./backend/tests/
|
||||
CMD ["pytest", "backend/tests/", "-v"]
|
||||
242
services/dashboard/RAPPORT_FINAL.md
Normal file
@ -0,0 +1,242 @@
|
||||
# Final Report: SOC Bot Detector Dashboard
**Date:** 2026-03-16
**Commits:** `8032eba` (bug fixes), `d4c3512` (improvements)
|
||||
|
||||
---
|
||||
|
||||
## 1. Bug fixes (commit 8032eba)

| Bug | Cause | Fix |
|-----|-------|-----|
| Brute Force > Attackers: IPs displayed as `::ffff:x.x.x.x` | No IPv6 normalization in the SQL query | Added `replaceRegexpAll(toString(src_ip), '^::ffff:', '')` |
| Brute Force > Targets: "Voir détails" link led to a nonexistent page | Navigation to `/investigation/{host}` (a hostname) instead of an IP | Replaced with a `TargetRow` component that expands the attackers per host inline |
| Header Fingerprint: detail table always empty | Frontend read `data.ips` instead of `data.items` | Fixed the key |
| Temporal Heatmap: "Top hosts ciblés" empty | Frontend read `data.hosts`, plus a TypeScript type error `{ hosts: TopHost[] }` | Fixed the key to `data.items` and the type annotation |
| Distributed Botnets: clicking a row showed nothing | Frontend read `data.countries` instead of `data.items` | Fixed the key |
| Rotation & Persistence: IPs shown as `::ffff:` and history always empty | No normalization, and the frontend read `data.history` instead of `data.ja4_history` | SQL normalization and key fix |
| TCP Spoofing: spoofing detected without TTL correlation | Python-side filter applied to data already filtered to TTL=30–31 | `spoof_only` SQL filter moved to the ClickHouse side |
|
||||
|
||||
---
|
||||
|
||||
## 2. Improvements implemented (commit d4c3512)

### J: Multi-source IP summary
- **Endpoint:** `GET /api/investigation/{ip}/summary`
- **Widget:** `IPActivitySummary` at the top of every IP investigation page
- **Data:** ML + brute force + TCP spoofing + JA4 rotation + persistence + 24h timeline
- **Risk score:** 0–100 (colored SVG gauge)
- **Result:** Immediate context at a glance, without navigating across 6 pages

### I: Baseline comparison (last 24h vs. previous day)
- **Endpoint:** `GET /api/metrics/baseline`
- **Widget:** 3 cards (24h detections, unique IPs, CRITICAL) with the ▲▼ variation in %
- **Impact:** Immediately surfaces abnormal spikes (e.g. +246% detections observed)
|
||||
|
||||
### M-4: Adversary sophistication score
- **Endpoint:** `GET /api/rotation/sophistication`
- **Computation:** JOIN across 3 tables (JA4 rotation × 10 + recurrence × 20 + log(bruteforce+1) × 5)
- **Tiers:** APT-like / Advanced / Automated / Basic
- **Result:** Prioritizes the most urgent investigations
|
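
The sophistication formula above can be written out as a small function. This is a sketch under stated assumptions: the natural logarithm and the cap at 100 are guesses consistent with the maximum score of 100 shown elsewhere in this report, the function and parameter names are hypothetical, and the tier thresholds are not given in the report, so they are left out.

```python
import math


def sophistication_score(ja4_rotation: int, recurrence: int, bruteforce_hits: int) -> float:
    """JA4 rotation x 10 + recurrence x 20 + log(bruteforce + 1) x 5, capped at 100."""
    raw = ja4_rotation * 10 + recurrence * 20 + math.log(bruteforce_hits + 1) * 5
    # Assumption: scores are clamped to the 0-100 range used by the dashboard.
    return min(100.0, raw)
```

With these weights, ten JA4 rotations alone are enough to reach the cap, whatever the recurrence value.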
||||
|
||||
### M-7: Proactive hunt (low-and-slow)
- **Endpoint:** `GET /api/rotation/proactive-hunt`
- **Logic:** recurring IPs with `abs(anomaly_score) < 0.5`, which fly under the ML radar
- **Assessment:** "Évadeur potentiel" (potential evader, recurrence/score ratio > 10) or "Persistant modéré" (moderately persistent)
- **Impact:** Detects low-and-slow botnets that the ML model under-scores
|
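
The assessment rule above (score below 0.5, then a recurrence/score ratio threshold of 10) can be sketched as follows. `assess_low_and_slow` is a hypothetical function, and the handling of a score of exactly zero is an assumption made to avoid a division by zero.

```python
from typing import Optional

SCORE_CEILING = 0.5   # IPs at or above this absolute score are not "under the radar"
EVADER_RATIO = 10.0   # recurrence/score ratio above which an IP is a likely evader


def assess_low_and_slow(recurrence: int, worst_score: float) -> Optional[str]:
    """Return the dashboard label for a recurring, low-scored IP, or None."""
    score = abs(worst_score)  # raw scores can be negative
    if recurrence <= 0 or score >= SCORE_CEILING:
        return None
    # Assumption: a zero score is replaced by a tiny value so the ratio stays finite.
    ratio = recurrence / max(score, 1e-9)
    return "Évadeur potentiel" if ratio > EVADER_RATIO else "Persistant modéré"
```

An IP seen 8 times with a score of -0.18 has a ratio of about 44 and is therefore labeled as a potential evader.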
||||
|
||||
### M-2: Inline ASN reputation badge
- **Change:** LEFT JOIN on `asn_reputation` in the detections query
- **Badge:** red (malicious/bot/scanner), orange (proxy/vpn), green (human)
- **Limitation:** the `asn_reputation` table contains 36 French ASNs (legitimate ISPs); known malicious ASNs are not yet cataloged
|
||||
|
||||
---
|
||||
|
||||
## 3. Exhaustive Playwright tests

| Page | Result | Notes |
|------|--------|-------|
| Main dashboard | ✅ | Baseline ▲ +246.5% detections, ▲ +11.6% IPs, = CRITICAL |
| Detections | ✅ | ASN badge displayed (null for ASNs absent from the reputation table) |
| IP investigation (162.55.94.175) | ✅ | Score 38, TCP Spoof TTL 59, JA4 Rotation 9 sig |
| Rotation > Sophistication | ✅ | APT-like: 162.55.94.175 (score 100), 46.4.81.149 (score 100) |
| Rotation > Proactive hunt | ✅ | IPs with negative scores under the ML radar |
| Brute Force > Attackers | ✅ | Clean IPs (no `::ffff:`) |
| Brute Force > Targets | ✅ | Inline expansion of attackers per host |
| Header Fingerprint | ✅ | Detail table populated on click |
| Temporal Heatmap | ✅ | "Top hosts ciblés" displayed |
| Distributed Botnets | ✅ | Country detail on click |
| TCP Spoofing | ✅ | `spoof_only` filter works |
|
||||
|
||||
---
|
||||
|
||||
## 4. Known issues and areas for improvement

### 🔴 Critical

1. **Incomplete `asn_reputation` table**: only 36 entries (French ISPs). To be useful it should contain the ASNs of known datacenters, VPS providers, and proxies (OVH, DigitalOcean, AWS, Linode, etc.). Suggested sources: AbuseIPDB ASN database, IPInfo, MaxMind.

2. **Proactive hunt, negative scores**: `view_ip_recurrence.worst_score` stores the raw score, which can be negative. The condition `abs(score) < 0.5` therefore captures HIGH IPs with a score of -0.18 that the ML model has already detected. Filtering by threat level (`worst_threat_level NOT IN ('HIGH', 'CRITICAL')`) would be needed to isolate the cases that are truly under the radar.

3. **SOC classifications are not persisted**: manual classifications (`/api/analysis/classify`) only last for the session if the `classifications` table has not been created. A DB init script would help.

### 🟡 Medium

4. **Biased sophistication score**: IPs with heavy JA4 rotation but `recurrence=0` in `view_ip_recurrence` (absent from the view) still reach a score of 100. The two views do not always cover the same time period.

5. **24h timeline in the IP summary**: uses `window_start >= now() - INTERVAL 24 HOUR` on `agg_host_ip_ja4_1h`. If less than 24h of history is available, the chart will be partial or empty. The window should adapt dynamically to the available data.

6. **Temporal Heatmap**: the endpoint only aggregates `agg_host_ip_ja4_1h` data for the last 24h. A time-range selector (7d, 30d) would reveal cyclic wave patterns (weekly botnets).

7. **No result export**: SOC analysts cannot export lists of malicious IPs (CSV, STIX). An endpoint such as `GET /api/rotation/sophistication?format=csv` would help with IOC sharing.

### 🟢 Minor

8. **"Investiguer" in RotationView does not carry context**: clicking "Investiguer" from the Sophistication tab navigates to `/investigation/{ip}` without preloading the context of the source tab. Adding `?source=sophistication&score=100` to the URL would make it possible to show a contextual banner.

9. **Tabs not reflected in the sidebar**: the 7 advanced analysis dashboards are not organized into submenus. With the Sophistication and Proactive hunt tabs added to Rotation, the sidebar is getting long.

10. **ASN badge does not filter detections**: there is not yet a "show only malicious ASNs" filter in the detections view.
|
||||
|
||||
---
|
||||
|
||||
## 5. Architecture: points to watch

- The **SPA catch-all** (`/{full_path:path}`) must remain **the last route registered** in `main.py`
- The `/api/investigation/{ip}/summary` endpoint uses the `/api/investigation` prefix, which is distinct from the SPA route `/investigation/:ip`, so the two do not conflict
- **Negative scores** in `anomaly_score` and `worst_score` are normal; always use `abs()` for display
- **IPv4-mapped IPv6 addresses** (`::ffff:x.x.x.x`) appear in all aggregated views; always apply `replaceRegexpAll(toString(src_ip), '^::ffff:', '')`
|
||||
|
||||
|
||||
---
|
||||
|
||||
# Report: v2.0.0, Multi-Signal TCP Fingerprinting + IP Clustering
**Date:** 2026-03-19
**Commit:** `e2db8ca`
|
||||
|
||||
---
|
||||
|
||||
## 1. Improved TCP OS fingerprinting

### Initial problem
The old `tcp_spoofing.py` relied on TTL alone, with 3 coarse ranges (≤64 = Linux, ≤128 = Windows, otherwise = Network). The result: false positives and no detection of scanner bots.

### Implemented solution

**`backend/services/tcp_fingerprint.py`**: 20 OS signatures, multi-signal scoring:

| Signal | Weight | ClickHouse source |
|--------|--------|-------------------|
| Initial TTL (estimated) | 40% | `tcp_ttl_raw` |
| MSS | 30% | `tcp_mss_raw` |
| TCP window | 20% | `tcp_win_raw` |
| Scale factor | 10% | `tcp_scale_raw` |
|
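
The weighting in the table above can be illustrated with a simplified scorer. This is a sketch, not the contents of `tcp_fingerprint.py`: it assumes an all-or-nothing match per signal (the real service may grade partial matches), it estimates the initial TTL by rounding up to a common default, and the `MASSCAN` signature is a hypothetical entry built from the Masscan values quoted in this report.

```python
from typing import Dict

# Weights from the table above, in percent.
WEIGHTS: Dict[str, int] = {"ttl": 40, "mss": 30, "win": 20, "scale": 10}


def estimate_initial_ttl(observed_ttl: int) -> int:
    """Round an observed TTL up to the nearest common initial value."""
    for initial in (32, 64, 128, 255):
        if observed_ttl <= initial:
            return initial
    return 255


def match_confidence(observed: Dict[str, int], signature: Dict[str, int]) -> int:
    """Sum the weights of the signals that match the signature exactly."""
    probe = dict(observed, ttl=estimate_initial_ttl(observed["ttl"]))
    return sum(
        weight
        for name, weight in WEIGHTS.items()
        if probe.get(name) == signature.get(name)
    )


# Hypothetical signature; the initial TTL of 64 is inferred from the
# observed 48-57 range and is an assumption.
MASSCAN = {"ttl": 64, "mss": 1452, "win": 5808, "scale": 4}
```

A packet with TTL 52 and the Masscan MSS, window, and scale values matches all four signals in this sketch; changing only the MSS drops the result to 70.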
||||
|
||||
**Detections validated in production:**
- **Masscan**: `win=5808, mss=1452, scale=4, TTL 48–57` → **97%** confidence
- **Googlebot**: Windows stack detected with an Android UA → **confirmed spoof**
- **Bot-tool**: `risk_score += 30` (vs. +15 for a plain spoof)

**MSS → network path:**
- 1460 → standard Ethernet
- 1452 → PPPoE / DSL (Masscan pattern)
- 1420–1452 → probable VPN
- < 1420 → tunnel / double encapsulation

**Files changed:**
- `backend/services/tcp_fingerprint.py` (new)
- `backend/routes/tcp_spoofing.py` (full rewrite; queries `agg_host_ip_ja4_1h`)
- `backend/routes/investigation_summary.py` (uses the tcp_fingerprint service)
- `frontend/src/components/TcpSpoofingView.tsx` (new MSS/scale/confidence columns, MSS distribution chart)
|
||||
|
||||
---
|
||||
|
||||
## 2. Multi-metric IP clustering

### Initial problem
The first version of the clustering used only rules on TCP properties. The user asked to use **all available metrics**.

### Implemented solution

**`backend/services/clustering_engine.py`**: pure-Python K-means++ (no ML dependencies):

**21 features normalized to [0,1]:**

| Category | Features |
|----------|----------|
| TCP stack (4) | initial TTL, MSS, scale, window |
| ML anomaly (6) | score, velocity, fuzzing, headless, POST ratio, zero IP-ID |
| TLS/Protocol (5) | ALPN mismatch, missing ALPN, H2 efficiency, header order, UA-CH mismatch |
| Browser (1) | modern-browser score (normalized /50) |
| Temporal (3) | entropy, JA4 diversity (log1p), rotating UA |
| Behavior (2) | asset ratio, direct-access ratio |
|
||||
|
||||
**Algorithm:**
```
K-means++  : O(k·n) init, n_init=3, best inertia kept
Power iter : X^T(Xv) trick, O(n·d) per iteration, no n×n matrix
Deflation  : Hotelling, for PC2 after extracting PC1
```
|
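
The `X^T(Xv)` trick named in the block above computes the leading principal direction with two matrix-vector products per iteration instead of materializing a covariance or Gram matrix. A pure-Python sketch follows; `top_component`, the fixed iteration count, and the uniform starting vector are assumptions for illustration, not the code in `clustering_engine.py`.

```python
import math
from typing import List, Sequence


def top_component(rows: Sequence[Sequence[float]], iterations: int = 100) -> List[float]:
    """Leading principal direction of the data via power iteration."""
    n, d = len(rows), len(rows[0])
    # Center each feature so the direction describes variance, not the mean.
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]

    # Assumption: a uniform start vector; it must not be orthogonal to the answer.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        # First product: X v, a vector of length n.
        xv = [sum(x[j] * v[j] for j in range(d)) for x in centered]
        # Second product: X^T (X v), a vector of length d. No d x d or n x n matrix.
        w = [sum(centered[i][j] * xv[i] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break  # degenerate data (all points identical)
        v = [c / norm for c in w]
    return v
```

For the second component, Hotelling deflation would subtract each point's projection onto this direction and run the same loop again.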
||||
|
||||
**Sampling strategy:** `ORDER BY avg(abs(anomaly_score)) DESC` → bots (high score) are included first, even when their individual hit counts are low (the Masscan case).

**Production results (k=14, 3000 IPs):**
- **289 confirmed bots**: clusters with rotating UA + UA-CH mismatch (cloud providers: Microsoft, Google, Akamai)
- **655 suspicious IPs**: moderate ML anomaly or inconsistent UA-CH
- **Dominant ASNs**: MICROSOFT-CORP-MSN-AS-BLOCK, GOOGLE-CLOUD-PLATFORM, OVH, AMAZON
- **Compute time**: ~5–9 seconds (pure Python, 3000 points × 21 features)
|
||||
|
||||
---
|
||||
|
||||
## 3. Redesigned clustering visualization

### Initial problem
The first version used ReactFlow bubbles positioned by PCA. The user reported that **the graph display was unreadable**.

### Implemented solution

**Two separate views, accessible through tabs:**

#### ⊞ Dashboard (default, always readable)
- Grid of cards grouped by risk level
- **Bots & confirmed threats** (red) → **Suspects** (orange) → **Legitimate** (green)
- Each card: label + IP count + hits + CRITIQUE/ÉLEVÉ/MODÉRÉ/SAIN badge + 4 mini-bars + TCP stack + country + ASN

#### ⬡ Relationship graph
- ReactFlow card nodes (220px, text fully readable)
- **Columns by threat level** (deterministic layout, not PCA)
- Colored edges: orange = similar, gray = distant, animated = very strong
- Built-in legend, minimap, zoom controls

#### Detail sidebar
- Behavioral RadarChart (10 axes)
- All metrics with progress bars
- IP list with threat/country badges
- Export: **Copier IPs** (copy IPs) + **⬇ CSV**
- Integrated into the flex flow (no longer blocks the control bar)

**Files changed:**
- `backend/routes/clustering.py` (full rewrite)
- `backend/services/clustering_engine.py` (new; thresholds calibrated on real data)
- `frontend/src/components/ClusteringView.tsx` (full rewrite)
- `frontend/src/App.tsx` (`/clustering` route + "🔬 Clustering IPs" nav entry)
|
||||
|
||||
---
|
||||
|
||||
## 4. Points to watch

### Performance
- K-means++ on 3000 × 21: **5–9 s** (acceptable; no cache implemented)
- The in-memory drill-down cache (`_cache["cluster_ips"]`) is volatile: a reload triggers a recomputation
- Possible improvement: a Redis cache, or an in-process cache with a 5-minute TTL. Note that `functools.lru_cache` has no expiry of its own, so the TTL has to be layered on top of it.

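A minimal sketch of the TTL idea above, assuming an in-process cache is sufficient. Because `functools.lru_cache` never expires entries, the sketch adds a time-bucket argument to the cache key. `compute_clusters` and `run_kmeans` are hypothetical names for illustration, not functions from the codebase.

```python
import time
from functools import lru_cache
from typing import Optional

TTL_SECONDS = 300  # 5 minutes, as suggested above


def run_kmeans(k: int, n_samples: int) -> dict:
    """Hypothetical stand-in for the expensive 5-9 s clustering call."""
    return {"k": k, "n_samples": n_samples, "computed_at": time.time()}


@lru_cache(maxsize=8)
def _cached(k: int, n_samples: int, bucket: int) -> dict:
    # `bucket` only exists to be part of the cache key:
    # a new time bucket means a cache miss, hence a recomputation.
    return run_kmeans(k, n_samples)


def compute_clusters(k: int = 14, n_samples: int = 3000,
                     now: Optional[float] = None) -> dict:
    """Return a cached result that is less than TTL_SECONDS old."""
    ts = time.time() if now is None else now
    return _cached(k, n_samples, int(ts // TTL_SECONDS))
```

Entries expire at fixed bucket boundaries rather than exactly 5 minutes after they were computed, which is usually acceptable for a dashboard and keeps the code dependency-free.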
### Threshold calibration
The thresholds in `name_cluster()` and `risk_score_from_centroid()` are calibrated on observed data:
- `anomaly_score` in production: range 0.2–0.35 (not 0–1 as expected)
- Displayed normalized score: `min(1, score / 0.5)`, which stretches the useful range
- UA-CH mismatch = 1.0 on bot clusters is a **very strong** signal (cloud providers simulating a browser)

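The display normalization above can be sketched as follows. The function name `display_score` and the `ceiling` parameter are assumptions for illustration; the `abs()` call follows the project's convention that `anomaly_score` may be negative.

```python
def display_score(anomaly_score: float, ceiling: float = 0.5) -> float:
    """Stretch the observed 0.2-0.35 range toward [0, 1] for display."""
    # abs(): raw scores can be negative; min(): clamp anything above the ceiling
    return min(1.0, abs(anomaly_score) / ceiling)
```

With the default ceiling, the production range 0.2–0.35 maps to 0.4–0.7 on screen instead of being squeezed into the bottom third of the bar.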
### Missing data in the LEFT JOIN
Some IPs do not appear in `ml_detected_anomalies` (score=0, fuzz=0). These are legitimate IPs that the ML model did not flag. They naturally form the "Trafic Légitime" clusters.

### Fuzzing_index = 100% in many clusters
After analysis: the log-normalized `fuzzing_index` often exceeds the 100% threshold because the raw values vary widely (0 to 229+). This is not a bug; it reflects modern web traffic (many requests with varied paths).
672
services/dashboard/README.md
Normal file
@ -0,0 +1,672 @@
|
||||
# 🛡️ Bot Detector Dashboard

Interactive web dashboard for visualizing and investigating the classification decisions of the Bot Detector AI.

**Version:** 2.0.0 - Multi-Signal TCP Fingerprinting + Multi-Metric IP Clustering

## 🚀 Quick Start

### Prerequisites

- Docker and Docker Compose
- The `clickhouse` service already deployed
- Data in the `ml_detected_anomalies` table
- Data in the `http_logs` table (for user agents)

> **Note:** The dashboard can run independently of `bot_detector_ai`. It reads the detections already stored in ClickHouse.

### Launch

```bash
# 1. Make sure .env exists (copy the template only if it is missing)
[ -f .env ] || cp .env.example .env

# 2. Start the dashboard (Docker Compose v2)
docker compose up -d dashboard_web

# Or with the legacy syntax
docker-compose up -d dashboard_web

# 3. Open the dashboard
# http://localhost:8000
```

### Stop

```bash
docker compose stop dashboard_web
```

### Check status

```bash
# List running services
docker compose ps

# Follow the logs in real time
docker compose logs -f dashboard_web
```

## 📊 Features

### Main Dashboard
- **Real-time metrics**: total detections, threats, known bots, unique IPs
- **D-1 baseline comparison**: ▲▼ change versus the previous day (detections, unique IPs, CRITICAL)
- **Breakdown by threat level**: CRITICAL/HIGH/MEDIUM/LOW view
- **Timeline**: chart of detections over 24 h
- **Clustered incidents**: automatic grouping by /24 subnet
- **Top Active Threats**: top 10 most dangerous IPs

### 🧬 TCP Spoofing & OS Fingerprinting (improved in v2.0)
- **Multi-signal detection**: initial TTL + MSS + scale + TCP window (p0f-style)
- **20 OS signatures**: Linux, Windows, macOS, Android, iOS, Masscan, ZMap, Shodan, Googlebot…
- **Hop-count estimate**: initial TTL (rounded) − observed TTL
- **Network detection**: MSS → Ethernet (1460) / PPPoE (1452) / VPN (1420) / Tunnel (<1420)
- **Confidence 0–100%**: weighted score (TTL 40% + MSS 30% + window 20% + scale 10%)
- **Bot-tool badge**: Masscan detected at 97% (win=5808, mss=1452, scale=4)
- **MSS distribution**: histogram of observed MSS values per cluster

### 🔬 Multi-Metric IP Clustering (new in v2.0)
- **URL:** `/clustering`
- **Algorithm:** K-means with k-means++ initialization (Arthur & Vassilvitskii, 2007), 3 runs
- **21 features normalized to [0,1]:**
  - TCP stack: initial TTL, MSS, scale, TCP window
  - ML anomaly: score, velocity, fuzzing, headless, POST ratio, zero IP-ID
  - TLS/protocol: ALPN mismatch, missing ALPN, H2 efficiency (multiplexing)
  - Browser: modern-browser score, header order, UA-CH mismatch
  - Temporal: entropy, JA4 diversity, rotating UA
- **2D positioning:** PCA by power iteration + Hotelling deflation
- **Automatic naming:** Masscan / Bot UA Rotatif / Bot Fuzzer / Anomalie ML / Linux / Windows / VPN

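To make the k-means++ initialization step concrete, here is a minimal pure-Python sketch of D²-weighted seeding. It illustrates the published technique and is not the code of `clustering_engine.py`; `kmeanspp_init` and `sq_dist` are hypothetical names.

```python
import random


def sq_dist(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_init(points: list[list[float]], k: int,
                  rng: random.Random) -> list[list[float]]:
    """Pick k initial centroids with D^2 weighting (k-means++ seeding)."""
    centroids = [rng.choice(points)]
    # d2[i] = squared distance from point i to its nearest chosen centroid
    d2 = [sq_dist(p, centroids[0]) for p in points]
    while len(centroids) < k:
        total = sum(d2)
        if total == 0.0:
            # Every point coincides with a centroid: fall back to uniform choice
            centroids.append(rng.choice(points))
            continue
        # Sample the next centroid with probability proportional to d2
        r = rng.random() * total
        acc = 0.0
        chosen = len(points) - 1
        for i, w in enumerate(d2):
            acc += w
            if acc > r:
                chosen = i
                break
        centroids.append(points[chosen])
        # Update nearest-centroid distances: O(n) per new centroid, O(k*n) overall
        d2 = [min(old, sq_dist(p, points[chosen])) for old, p in zip(d2, points)]
    return centroids
```

Running the seeding several times with different `rng` states and keeping the run with the lowest inertia corresponds to the "3 runs" mentioned above.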
**Dashboard view (default):**
- Grid of grouped cards: confirmed bots → suspects → legitimate
- Each card: label, IP count, hits, CRITIQUE/ÉLEVÉ/MODÉRÉ/SAIN badge
- 4 mini-bars: anomaly, UA-CH mismatch, fuzzing, rotating UA
- TCP stack (TTL, MSS, Scale), top countries, ASN

**Relationship graph view:**
- ReactFlow card nodes (220px, readable text)
- One column per threat level: Bots | Suspects | Legitimate
- Edges colored by similarity (orange=strong, animated=very strong)
- Built-in legend, minimap, zoom controls

**Detail sidebar:**
- Behavioral RadarChart (10 axes: anomaly, UA-CH, fuzzing, headless…)
- All metrics with colored progress bars
- IP list with threat/country/ASN badges
- Export: **Copy IPs** + **⬇ CSV**

### /24 Subnet Investigation
- **URL:** `/entities/subnet/x.x.x.x_24`
- Global stats, IP table, per-IP actions

### IP Investigation + Reputation
- **URL:** `/investigation/:ip`
- Multi-source summary (ML + bruteforce + TCP + JA4 + timeline)
- Risk score 0–100, reputation from IP-API + IPinfo

### Investigation (Variability)
- User agents, JA4 fingerprints, countries, ASNs, hosts, threat levels
- Automatic insights, chainable navigation

## 🏗️ Architecture

```
┌─────────────────────────────────────────────────────────┐
│                     Docker Compose                      │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────┐  │
│  │ ClickHouse  │  │ bot_detector│  │  dashboard_web  │  │
│  │ :8123       │  │ (existing)  │  │ :8000 (web+API) │  │
│  │ :9000       │  │             │  │ network=host    │  │
│  └──────┬──────┘  └──────┬──────┘  └────────┬────────┘  │
│         └────────────────┴──────────────────┘           │
└─────────────────────────────────────────────────────────┘
```

> The container uses `network_mode: "host"`: the built frontend is served by FastAPI
> on **port 8000 only** (no port 3000 in production).

### Components

| Component | Technology | Description |
|-----------|------------|-------------|
| **Frontend** | React 18 + TypeScript 5 + Vite 5 + Tailwind CSS 3 | User interface (SPA) |
| **Backend API** | FastAPI 0.111 + Python 3.11 | REST API + static SPA server |
| **Database** | ClickHouse (existing), port 8123 | Main database |
| **Clustering** | Pure-Python K-means++ + power-iteration PCA | Embedded algorithms, no ML dependency |

## 📁 Structure

```
dashboard/
├── Dockerfile                           # Multi-stage: node:20-alpine → python:3.11-slim
├── docker-compose.yaml
├── requirements.txt
├── backend/
│   ├── main.py                          # FastAPI: CORS, routers, SPA catch-all (must be LAST)
│   ├── config.py                        # pydantic-settings, reads .env
│   ├── database.py                      # ClickHouseClient singleton (db)
│   ├── models.py                        # Pydantic v2 models
│   ├── routes/
│   │   ├── metrics.py                   # GET /api/metrics, /api/metrics/baseline
│   │   ├── detections.py                # GET /api/detections
│   │   ├── variability.py               # GET /api/variability
│   │   ├── attributes.py                # GET /api/attributes
│   │   ├── incidents.py                 # GET /api/incidents/clusters
│   │   ├── entities.py                  # GET /api/entities
│   │   ├── analysis.py                  # GET/POST /api/analysis: SOC classifications
│   │   ├── reputation.py                # GET /api/reputation/ip/{ip}
│   │   ├── tcp_spoofing.py              # GET /api/tcp-spoofing: multi-signal OS fingerprinting
│   │   ├── clustering.py                # GET /api/clustering/clusters + /cluster/{id}/ips
│   │   └── investigation_summary.py     # GET /api/investigation/{ip}/summary
│   └── services/
│       ├── tcp_fingerprint.py           # 20 OS signatures, scoring, hop count, network path
│       ├── clustering_engine.py         # K-means++, PCA-2D, naming, risk score (pure Python)
│       └── reputation_ip.py             # httpx → ip-api.com + ipinfo.io (async, no API key)
└── frontend/
    ├── package.json
    ├── vite.config.ts                   # Proxies /api → :8000 in dev
    └── src/
        ├── App.tsx                      # BrowserRouter + Sidebar + TopHeader + Routes
        ├── ThemeContext.tsx             # dark/light/auto, localStorage: soc_theme
        ├── api/client.ts                # Axios baseURL=/api + all TypeScript interfaces
        ├── components/
        │   ├── ClusteringView.tsx       # K-means++ clustering, 2 views
        │   ├── TcpSpoofingView.tsx      # TCP OS fingerprinting
        │   ├── InvestigationView.tsx    # Full IP investigation
        │   └── ...                      # Other views
        ├── hooks/                       # useMetrics, useDetections, useVariability (polling)
        └── utils/STIXExporter.ts
```

## 🔌 API

### Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/metrics` | Global metrics |
| GET | `/api/metrics/baseline` | D-1 comparison (detections, IPs, CRITICAL) |
| GET | `/api/metrics/threats` | Distribution by threat level |
| GET | `/api/detections` | Paginated list of detections |
| GET | `/api/detections/{id}` | Details of one detection |
| GET | `/api/variability/{type}/{value}` | Variability of an attribute |
| GET | `/api/attributes/{type}` | Unique values of an attribute |
| GET | `/api/incidents/clusters` | Incidents clustered by /24 subnet |
| GET | `/api/entities/subnet/{subnet}` | Subnet investigation (e.g. `141.98.11.0_24`) |
| GET | `/api/entities/{type}/{value}` | Entity investigation (IP, JA4, UA…) |
| GET | `/api/reputation/ip/{ip}` | IP reputation (IP-API + IPinfo) |
| GET | `/api/investigation/{ip}/summary` | Multi-source IP summary (ML + TCP + JA4) |
| GET | `/api/analysis/{ip}/subnet` | Subnet / ASN analysis |
| GET | `/api/analysis/{ip}/recommendation` | Classification recommendation |
| POST | `/api/analysis/classifications` | Save a SOC classification |
| GET | `/api/tcp-spoofing/overview` | TCP spoofing + OS overview |
| GET | `/api/tcp-spoofing/list` | List of TCP spoofing detections |
| GET | `/api/tcp-spoofing/matrix` | Declared OS vs. actual OS matrix |
| GET | `/api/clustering/clusters` | K-means++ clustering (`?k=14&n_samples=3000`) |
| GET | `/api/clustering/cluster/{id}/ips` | IPs of one cluster (drill-down) |
| GET | `/health` | Health check |

### Examples

```bash
# Health check
curl http://localhost:8000/health

# Global metrics + baseline
curl http://localhost:8000/api/metrics | jq '.summary'
curl http://localhost:8000/api/metrics/baseline | jq

# CRITICAL detections
curl "http://localhost:8000/api/detections?threat_level=CRITICAL&page=1" | jq '.items | length'

# TCP spoofing overview
curl http://localhost:8000/api/tcp-spoofing/overview | jq

# IP clustering (14 clusters over 3000 samples)
curl "http://localhost:8000/api/clustering/clusters?k=14&n_samples=3000" | jq '.stats'

# Drill down into one cluster
curl "http://localhost:8000/api/clustering/cluster/c0_k14/ips?limit=20" | jq '.ips[].ip'

# IP reputation
curl http://localhost:8000/api/reputation/ip/162.55.94.175 | jq
```

## ⚙️ Configuration

### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `CLICKHOUSE_HOST` | `clickhouse` | ClickHouse host |
| `CLICKHOUSE_PORT` | `8123` | ClickHouse HTTP port |
| `CLICKHOUSE_DB` | `mabase_prod` | Database |
| `CLICKHOUSE_USER` | `admin` | User |
| `CLICKHOUSE_PASSWORD` | (empty) | Password |
| `API_HOST` | `0.0.0.0` | Uvicorn bind address |
| `API_PORT` | `8000` | API + frontend port |
| `CORS_ORIGINS` | `["http://localhost:3000", ...]` | Allowed CORS origins |

These variables are read from the `.env` file at the project root.

> ⚠️ The `.env` file contains real credentials: never commit it.

## 🔍 Investigation Workflows

### Example 1: Identify a Masscan bot

1. **🔬 Clustering IPs** → the "🤖 Masscan / Scanner IP" cluster shows up in red
2. **Click the card** → sidebar: TTL=52, MSS=1452, Scale=4, the Masscan pattern
3. **Copy the IPs** → list ready for blocking
4. **Export CSV** → import into the SIEM or firewall

### Example 2: Analyze rotating-UA bots (cloud)

1. **Clustering** → "🤖 Bot UA Rotatif + CH Mismatch" cluster (risk 50%)
2. **RadarChart** → UA-CH=100%, rotating UA=100%, anomaly=59%
3. **Top ASN** → Microsoft, Google, Akamai (cloud providers)
4. **🧬 TCP Spoofing** → confirm: these IPs declare a Windows UA but have a Linux TTL
5. **IP investigation** → full detail with a 24 h timeline

### Example 3: Detect OS spoofing

1. **🧬 TCP Spoofing** → list of IPs with an OS mismatch
2. **UA×OS matrix** → Android User-Agent but Windows TCP stack = spoof
3. **Confidence 85%** → MSS=1460 (Ethernet), scale=7, TTL≈64 → actually Linux
4. **Action** → classify as a bot behind a proxy IP

### Example 4: Investigate a suspicious IP

1. **🎯 Detections** → IP classified 🔴 CRITICAL
2. **Click the IP** → summary: ML + TCP + JA4 + bruteforce + timeline
3. **Risk score**: 85/100
4. **User agents** → 3 different UAs in 24 h (rotation)
5. **TCP** → initial TTL 128 (Windows) but Linux UA → spoof
6. **Action** → immediate blacklist

## 🧬 Technical services (v2.0)

### `backend/services/tcp_fingerprint.py`

Multi-signal detection of the actual OS based on the TCP stack:

```python
from backend.services.tcp_fingerprint import fingerprint_os, detect_spoof

result = fingerprint_os(ttl=52, win=5808, scale=4, mss=1452)
# → OSFingerprint(os_family="Masscan/Scanner", confidence=0.97, is_bot_tool=True)

spoof = detect_spoof(declared_ua="Chrome/Windows", fingerprint=result)
# → SpoofResult(is_spoof=True, reason="UA Windows mais stack Masscan", risk_score=30)
```

**Scoring weights:** initial TTL 40% + MSS 30% + window 20% + scale 10%

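As an illustration of how these weights could combine, here is a hedged sketch of a weighted confidence score. Only the four weights come from the text; the function name `confidence` and the idea of a per-signal match score in [0, 1] are assumptions, and the real `fingerprint_os` may compute its score differently.

```python
# Weights as documented above (they sum to 1.0)
WEIGHTS = {"ttl": 0.40, "mss": 0.30, "window": 0.20, "scale": 0.10}


def confidence(matches: dict[str, float]) -> float:
    """Weighted sum of per-signal match scores, each expected in [0, 1].

    `matches` maps a signal name to how well the observed value fits a
    signature (hypothetical input); missing signals count as 0.
    """
    return sum(w * matches.get(name, 0.0) for name, w in WEIGHTS.items())
```

For example, a signature whose TTL and MSS match exactly but whose window and scale do not would score 0.40 + 0.30 = 0.70.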
**Hop-count estimate:**
- Observed TTL 52 → rounded initial TTL = 64 → hops = 64 − 52 = **12**
- Observed TTL 119 → initial TTL = 128 → hops = 128 − 119 = **9**

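The rounding rule can be sketched as below. The candidate list (64, 128, 255) is an assumption based on common OS defaults; the actual service may use a different set, and `estimate_hops` is a hypothetical name.

```python
# Common initial TTL values (assumption: the real signature set may differ)
COMMON_INITIAL_TTLS = (64, 128, 255)


def estimate_hops(observed_ttl: int) -> tuple[int, int]:
    """Return (initial_ttl, hops) by rounding up to the nearest common initial TTL."""
    if not 0 < observed_ttl <= 255:
        raise ValueError("TTL must be in 1..255")
    initial = next(t for t in COMMON_INITIAL_TTLS if observed_ttl <= t)
    return initial, initial - observed_ttl
```

Calling `estimate_hops(52)` gives `(64, 12)` and `estimate_hops(119)` gives `(128, 9)`, matching the two examples above.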
**MSS → network path:**
| MSS | Detected network |
|-----|------------------|
| 1460 | Standard Ethernet |
| 1452 | PPPoE / DSL |
| 1420–1451 | Likely VPN |
| < 1420 | Tunnel / double encapsulation |

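A minimal sketch of this lookup, assuming exact matches (1460, 1452) take priority over the VPN range. `network_path` is a hypothetical name, and the `"unknown"` fallback for values the table does not cover is an assumption.

```python
def network_path(mss: int) -> str:
    """Map an observed TCP MSS to a likely network path, following the table."""
    if mss == 1460:
        return "Standard Ethernet"
    if mss == 1452:
        return "PPPoE / DSL"
    if 1420 <= mss < 1452:
        return "Likely VPN"
    if mss < 1420:
        return "Tunnel / double encapsulation"
    # 1453-1459 and values above 1460 are not covered by the table (assumption)
    return "unknown"
```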
### `backend/services/clustering_engine.py`

K-means++ and 2D PCA embedded in pure Python (no numpy/sklearn):

```
K-means++ init      : O(k·n) distances, n_init=3 runs → best inertia kept
Power iteration     : X^T(Xv) trick → O(n·d) per iteration, no n×n matrix
Hotelling deflation : removes PC1 from X before computing PC2
```

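The power-iteration and deflation steps summarized above can be sketched in pure Python as follows. This illustrates the technique under simple assumptions (fixed iteration count, data centered inside the function); `top_component` and `pca_2d` are hypothetical names, not the functions of `clustering_engine.py`.

```python
import math
import random


def top_component(X: list[list[float]], iters: int = 100, seed: int = 0) -> list[float]:
    """Leading principal direction of centered rows X via power iteration."""
    d = len(X[0])
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]
    n0 = math.sqrt(sum(c * c for c in v))
    v = [c / n0 for c in v]
    for _ in range(iters):
        # w = X^T (X v): two O(n*d) passes, no covariance matrix is formed
        xv = [sum(row[j] * v[j] for j in range(d)) for row in X]
        w = [sum(X[i][j] * xv[i] for i in range(len(X))) for j in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:  # no variance left in X
            break
        v = [c / norm for c in w]
    return v


def pca_2d(data: list[list[float]]) -> list[tuple[float, float]]:
    """Project rows onto the first two principal components."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    X = [[row[j] - means[j] for j in range(d)] for row in data]
    pc1 = top_component(X)
    s1 = [sum(row[j] * pc1[j] for j in range(d)) for row in X]
    # Hotelling deflation: subtract each row's PC1 contribution
    X2 = [[row[j] - s * pc1[j] for j in range(d)] for row, s in zip(X, s1)]
    pc2 = top_component(X2, seed=1)
    s2 = [sum(row[j] * pc2[j] for j in range(d)) for row in X2]
    return list(zip(s1, s2))
```

A production version would typically stop on convergence of `v` rather than after a fixed number of iterations.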
**21 features normalized to [0,1]**: see `FEATURES` in the file.

**Automatic naming**, in decreasing priority:
1. Masscan pattern (mss 1440–1460, scale 3–5, TTL<60)
2. Aggressive fuzzing (normalized fuzzing_index > 0.35 ≈ raw value > 100)
3. Rotating UA and UA-CH mismatch at the same time
4. UA-CH mismatch alone > 80%
5. ML anomaly score > 20% + a behavioral signal
6. Network / OS classification by TTL/MSS

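The priority order above can be sketched as a chain of early returns. This is not the real `name_cluster()`: the dictionary keys, the 0.5 cut-offs for "rotating UA" and the "behavioral signal", the `"UA-CH Mismatch"` label, and the TTL-based fallback are all assumptions made for illustration. Only the Masscan ranges and the 0.35, 80% and 20% thresholds come from the list.

```python
def name_cluster_sketch(c: dict[str, float]) -> str:
    """Return the first matching label, checked in priority order.

    `c` holds raw TCP values (ttl, mss, scale) and normalized [0, 1]
    behavioral values (hypothetical key names).
    """
    # 1. Masscan pattern
    if 1440 <= c["mss"] <= 1460 and 3 <= c["scale"] <= 5 and c["ttl"] < 60:
        return "Masscan"
    # 2. Aggressive fuzzing
    if c["fuzzing"] > 0.35:
        return "Bot Fuzzer"
    # 3. Rotating UA and UA-CH mismatch together (0.5 cut-offs are assumed)
    if c["ua_rotating"] > 0.5 and c["ua_ch_mismatch"] > 0.5:
        return "Bot UA Rotatif"
    # 4. UA-CH mismatch alone
    if c["ua_ch_mismatch"] > 0.80:
        return "UA-CH Mismatch"
    # 5. ML anomaly plus a behavioral signal (headless > 0.5 is assumed)
    if c["anomaly"] > 0.20 and c["headless"] > 0.5:
        return "Anomalie ML"
    # 6. Fallback: coarse OS guess from the observed TTL (assumed convention)
    return "Linux" if c["ttl"] <= 64 else "Windows"
```

Ordering the checks from most to least specific means a centroid that matches several rules always receives the most severe label.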
## 🗄️ ClickHouse tables used

| Table / View | Routes |
|---|---|
| `mabase_prod.ml_detected_anomalies` | metrics, detections, variability, analysis, clustering |
| `mabase_prod.agg_host_ip_ja4_1h` | tcp_spoofing, clustering, investigation_summary |
| `mabase_prod.view_dashboard_entities` | entities (UA, JA4, paths, query params) |
| `mabase_prod.classifications` | analysis (manual SOC classifications) |
| `mabase_prod.audit_logs` | audit (optional; silently skipped if absent) |

**SQL conventions:**
- IPs are stored as IPv4-mapped IPv6; strip the prefix with `replaceRegexpAll(toString(src_ip), '^::ffff:', '')`
- `anomaly_score` can be negative: always use `abs()`
- `fuzzing_index` can exceed 200: normalize with `log1p`
- `multiplexing_efficiency` can exceed 1: normalize with `log1p`
- SQL parameters: `%(name)s` syntax (values passed as a dict to the ClickHouse client)
- **The SPA catch-all MUST be the last router in `main.py`**

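The conventions above can be mirrored on the Python side as in the sketch below. The helper names and the `cap` parameter are assumptions; the query string only illustrates the `%(name)s` placeholder syntax and is not taken from the codebase.

```python
import math
import re

_V4_MAPPED = re.compile(r"^::ffff:")


def clean_ip(src_ip: str) -> str:
    """Strip the IPv4-mapped IPv6 prefix, like the replaceRegexpAll() call."""
    return _V4_MAPPED.sub("", src_ip)


def norm_log1p(value: float, cap: float) -> float:
    """Map a non-negative raw value to [0, 1] with log1p, saturating at `cap`.

    `cap` is a hypothetical tuning parameter, not a value from the project.
    """
    return min(1.0, math.log1p(max(0.0, value)) / math.log1p(cap))


# Parameterized query using the %(name)s placeholder syntax (illustrative only)
QUERY = (
    "SELECT replaceRegexpAll(toString(src_ip), '^::ffff:', '') AS ip, "
    "abs(anomaly_score) AS score "
    "FROM mabase_prod.ml_detected_anomalies "
    "WHERE threat_level = %(level)s LIMIT %(limit)s"
)
PARAMS = {"level": "CRITICAL", "limit": 10}
```

Keeping values in `PARAMS` instead of formatting them into the SQL string leaves escaping to the client library.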
## 🎨 Theme

The dashboard uses a **dark theme** tuned for SOC work (dark by default; light and auto are also available):

- **Semantic CSS tokens**: `bg-background`, `bg-background-card`, `text-text-primary`, `text-text-secondary`…
- **Threat taxonomy**: red CRITICAL / orange HIGH / yellow MEDIUM / green LOW
- **Persistence**: `localStorage` key `soc_theme`
- **Never use** raw Tailwind classes (`slate-800`); always use the semantic tokens

## 📝 Logs

The dashboard logs are available through Docker:

```bash
# Container logs
docker logs dashboard_web

# Follow the logs in real time
docker logs -f dashboard_web
```

## 🧪 Tests and Validation

### Quick test script

Create a file named `test_dashboard.sh`:

```bash
#!/bin/bash
echo "=== Bot Detector Dashboard test ==="

# 1. Health check
echo -n "1. Health check... "
curl -s http://localhost:8000/health > /dev/null && echo "✅ OK" || echo "❌ FAILED"

# 2. API Metrics
echo -n "2. API Metrics... "
curl -s http://localhost:8000/api/metrics | jq -e '.summary' > /dev/null && echo "✅ OK" || echo "❌ FAILED"

# 3. API Detections
echo -n "3. API Detections... "
curl -s http://localhost:8000/api/detections | jq -e '.items' > /dev/null && echo "✅ OK" || echo "❌ FAILED"

# 4. Frontend
echo -n "4. Frontend HTML... "
curl -s http://localhost:8000 | grep -q "Bot Detector" && echo "✅ OK" || echo "❌ FAILED"

echo "=== Tests finished ==="
```

Make it executable and run it:

```bash
chmod +x test_dashboard.sh
./test_dashboard.sh
```

### Manual API tests

```bash
# 1. Health check
curl http://localhost:8000/health

# 2. Global metrics
curl http://localhost:8000/api/metrics | jq

# 3. List of detections (page 1, 25 items)
curl "http://localhost:8000/api/detections?page=1&page_size=25" | jq

# 4. Filter by CRITICAL threat level
curl "http://localhost:8000/api/detections?threat_level=CRITICAL" | jq '.items[].src_ip'

# 5. Distribution by threat level
curl http://localhost:8000/api/metrics/threats | jq

# 6. Unique IPs (top 10)
curl "http://localhost:8000/api/attributes/ip?limit=10" | jq

# 7. Variability of an IP (replace with a real IP)
curl http://localhost:8000/api/variability/ip/192.168.1.100 | jq

# 8. Variability of a country
curl http://localhost:8000/api/variability/country/FR | jq

# 9. Variability of an ASN
curl http://localhost:8000/api/variability/asn/16276 | jq
```

### Frontend test

```bash
# Check that the HTML is served
curl -s http://localhost:8000 | head -20

# Or open it in the browser
# http://localhost:8000
```

### User test scenarios

1. **Basic navigation**
   - Open http://localhost:8000
   - Check that the metrics are displayed
   - Click "📋 Détections"

2. **Search and filters**
   - Search for an IP: `192.168`
   - Filter by threat level: CRITICAL
   - Change page

3. **Investigation (variability)**
   - Click an IP in the table
   - Check the "User-Agents" section (several values?)
   - Click a User-Agent to investigate it
   - Use the breadcrumb to go back

4. **Insights**
   - Find an IP with several User-Agents
   - Check that the "Possible rotation/obfuscation" insight is displayed

### Check the ClickHouse data

```bash
# Count detections (last 24 h)
docker compose exec clickhouse clickhouse-client -d mabase_prod -q \
  "SELECT count() FROM ml_detected_anomalies WHERE detected_at >= now() - INTERVAL 24 HOUR"

# Look at a sample
docker compose exec clickhouse clickhouse-client -d mabase_prod -q \
  "SELECT src_ip, threat_level, model_name, detected_at FROM ml_detected_anomalies ORDER BY detected_at DESC LIMIT 5"

# Check the dashboard views
docker compose exec clickhouse clickhouse-client -d mabase_prod -q \
  "SELECT * FROM view_dashboard_summary"
```

---

## 🐛 Troubleshooting

### Quick diagnosis

```bash
# 1. Check that the services are running
docker compose ps

# 2. Check the dashboard logs
docker compose logs dashboard_web | tail -50

# 3. Test the ClickHouse connection from the dashboard container
docker compose exec dashboard_web curl -v http://clickhouse:8123/ping
```

### The dashboard does not start

```bash
# Check the logs
docker compose logs dashboard_web

# Common error: port already in use
# Fix: change the port in docker-compose.yml

# Common error: image not built
docker compose build dashboard_web
docker compose up -d dashboard_web
```

### No data displayed (empty dashboard)

```bash
# 1. Check that ClickHouse contains data
docker compose exec clickhouse clickhouse-client -d mabase_prod -q \
  "SELECT count() FROM ml_detected_anomalies WHERE detected_at >= now() - INTERVAL 24 HOUR"

# If the result is 0:
# - Start bot_detector_ai to generate data
docker compose up -d bot_detector_ai
docker compose logs -f bot_detector_ai

# - Or import data manually
```

### "ClickHouse connection failed" error

```bash
# 1. Check that ClickHouse is running
docker compose ps clickhouse

# 2. Test the connection
docker compose exec clickhouse clickhouse-client -q "SELECT 1"

# 3. Check the credentials in .env
grep CLICKHOUSE .env

# 4. Restart the dashboard
docker compose restart dashboard_web

# 5. Check the error logs
docker compose logs dashboard_web | grep -i error
```

### 404 error on API routes

```bash
# Check that the API responds
curl http://localhost:8000/health
curl http://localhost:8000/api/metrics

# If you get a 404, restart the dashboard
docker compose restart dashboard_web
```

### Port 3000 already in use

```bash
# Option 1: change the port in docker-compose.yml
# Replace: - "3000:8000"
# With:    - "8080:8000"

# Option 2: find and kill the process
lsof -i :3000
kill <PID>

# Then restart
docker compose up -d dashboard_web
```

### Frontend does not load (blank page)

```bash
# 1. Check the browser console (F12)
# 2. Check that the frontend build exists
docker compose exec dashboard_web ls -la /app/frontend/dist

# 3. If it is empty, rebuild the image
docker compose build --no-cache dashboard_web
docker compose up -d dashboard_web
```

### Common error logs

| Error | Cause | Fix |
|-------|-------|-----|
| `Connection refused` | ClickHouse not started | `docker compose up -d clickhouse` |
| `Authentication failed` | Wrong credentials | Check `.env` |
| `Table doesn't exist` | Views not created | Run `deploy_views.sql` |
| `No data available` | No data | Start `bot_detector_ai` |

---

## 🔒 Security

- **No authentication**: the dashboard is designed for local use
- **Restricted CORS**: localhost:3000 only
- **Rate limiting**: 100 requests/minute
- **Credentials**: via environment variables (never hard-coded)

## 📊 Performance

- **Load time**: < 2 s (with data)
- **ClickHouse queries**: optimized with aggregations
- **Auto-refresh**: every 30 seconds (metrics)

## 🧪 Development

### Local build (without Docker)

```bash
# Backend
cd dashboard
pip install -r requirements.txt
python -m uvicorn backend.main:app --reload --host 0.0.0.0 --port 8000

# Frontend (in another terminal)
cd dashboard/frontend
npm install
npm run dev  # http://localhost:5173
```

### Interactive API documentation

The API ships with interactive Swagger documentation:

```bash
# Open in the browser
# http://localhost:3000/docs

# Or directly on the API port
# http://localhost:8000/docs
```

### Unit tests (planned)

```bash
# Backend (pytest)
cd dashboard
pytest backend/tests/

# Frontend (jest)
cd dashboard/frontend
npm test
```

## 📄 License

Same license as the main Bot Detector project.

---

## 📞 Support

For any question or problem:

1. Check the **🐛 Troubleshooting** section above
2. Read the logs: `docker compose logs dashboard_web`
3. Check that ClickHouse contains data
4. Open an issue on the repository
57
services/dashboard/ROUTES_NAVIGATION_PROGRESS.md
Normal file
@ -0,0 +1,57 @@
|
||||
# Execution plan: Routes & Navigation

## Context

- Application-level authentication is **out of scope** (handled by `htaccess`).
- Goal: make routes and navigation consistent, with no broken links.

## Steps and progress

| Step | Description | Status | Notes |
|---|---|---|---|
| 1 | Prepare this tracking document | ✅ Done | Document created and used as the source of progress. |
| 2 | Run a baseline (existing checks) | ✅ Done | `docker compose build dashboard_web` executed (OK). |
| 3 | Fix the declared routes (aliases + missing routes) | ✅ Done | Added `/incidents`, `/investigate`, `/investigate/:type/:value`, `/bulk-classify`, plus route-param wrappers for the tools. |
| 4 | Fix the navigation (links/buttons/quick search) | ✅ Done | Top navigation extended, quick actions fixed, `window.location.href` removed. |
| 5 | Validate after the changes (build/checks) | ✅ Done | `docker compose build dashboard_web` OK after the changes. |
| 6 | Finalize this document with the results | ✅ Done | Summary and final status completed. |
| 7 | Rewrite the correlation graph | ✅ Done | Custom node types, radial layout, fitView, fetch/filter separation, error handling, 700px height. |

## Progress log

### Step 1: Prepare the document
- Status: ✅ Done
- Action: created the tracking document with steps and statuses.

### Step 2: Docker baseline
- Status: ✅ Done
- Action: ran `docker compose build dashboard_web`.
- Result: build OK (exit code 0), non-blocking warning about the obsolete `version` key in compose.

### Step 3: Route fixes
- Status: ✅ Done
- Actions:
  - added the `/incidents` alias route to the incidents view;
  - added the `/investigate` and `/investigate/:type/:value` routes with smart redirection;
  - added the `/bulk-classify` route with an integration wrapper;
  - replaced the uses of `window.location.pathname` with route wrappers based on `useParams`.

### Step 4: Navigation fixes
- Status: ✅ Done
- Actions:
  - added a `Détections` navigation tab;
  - fixed menu activation (handling of aliases/sub-routes);
  - replaced `window.location.href` in `DetectionsList` with `navigate(...)`;
  - pointed the "Investigation avancée" quick action to `/detections`.

### Step 5: Docker validation after the changes
- Status: ✅ Done
- Action: ran `docker compose build dashboard_web`.
- Result: build OK (exit code 0), non-blocking warning about the obsolete compose `version` key.

### Step 6: Wrap-up
- Status: ✅ Done
- Overall result:
  - invalid routes covered through aliases/wrappers;
  - consistent internal SPA navigation;
  - Docker build validated before and after.
1
services/dashboard/backend/__init__.py
Normal file
@ -0,0 +1 @@
|
||||
# Backend package
|
||||
27
services/dashboard/backend/config.py
Normal file
@ -0,0 +1,27 @@
"""
Bot Detector Dashboard configuration
"""
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    # ClickHouse
    CLICKHOUSE_HOST: str = "clickhouse"
    CLICKHOUSE_PORT: int = 8123
    CLICKHOUSE_DB: str = "mabase_prod"
    CLICKHOUSE_USER: str = "admin"
    CLICKHOUSE_PASSWORD: str = ""

    # API
    API_HOST: str = "0.0.0.0"
    API_PORT: int = 8000

    # CORS
    CORS_ORIGINS: list = ["http://localhost:3000", "http://127.0.0.1:3000"]

    class Config:
        env_file = ".env"
        case_sensitive = True


settings = Settings()
Some files were not shown because too many files have changed in this diff.