chore: remove obsolete services (sentinel, correlator, mod-reqin-log)

Replaced by the ja4ebpf agent (eBPF CO-RE). Full cleanup:

Removed:
- old/ (archive of the previous architecture)
- services/correlator/ (logcorrelator, Go)
- services/sentinel/ (pcap capture, Go)
- services/mod-reqin-log/ (Apache module, C)
- shared/go/ja4common/ (shared Go library, no longer imported by ja4ebpf)
- tests/integration/platform/ (correlator+sentinel+httpd test)
- tests/integration/docker-compose.yml (compose file for the old architecture)
- tests/integration/run-tests.sh (correlator/sentinel runner)
- tests/integration/verify_mvs.py (orphaned script)

Cleaned up:
- go.work: drop ./shared/go/ja4common
- services/ja4ebpf/go.mod: drop the ja4common replace directive (never imported)
- services/ja4ebpf/Dockerfile*: drop the unneeded ja4common COPY steps
- Makefile: drop test-ja4common-python, test-integration*, and other obsolete targets
- tests/integration/README.md: rewritten for the ja4ebpf architecture

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
toto
2026-04-12 01:48:14 +02:00
parent dc6ffd6474
commit 9734e21fe3
252 changed files with 34 additions and 67348 deletions


@ -1,98 +1,42 @@
# Full-stack integration tests: ja4-platform
# Integration tests: ja4-platform
## Architecture
## Test architecture
```
┌─────────────────────────────────────────────────────┐
│ platform (Rocky Linux 9) │
│ │
│ ┌──────────┐ http.socket ┌────────────┐ │
│ │ Apache │───────────────→│ │ │
│ │+ mod-reqin│ │ correlator │──→ ClickHouse
│ └──────────┘ │ │ │
│ ┌──────────┐ network.socket │ │ │
│ │ sentinel │───────────────→│ │ │
│ │(TLS pcap) │ └────────────┘ │
│ └──────────┘ │
│ cap_add: NET_RAW, NET_ADMIN │
└─────────────────────────────────────────────────────┘
↑ HTTPS │
test traffic ja4_logs.http_logs_raw
┌──────────────────┐
│ ClickHouse │
│ ja4_logs │
│ ja4_processing │
└──────────────────┘
↑ ↑
┌──────┘ └──────┐
┌──────────────┐ ┌──────────────┐
│ bot-detector │ │ dashboard │
│ (ML/Python) │ │ (FastAPI) │
└──────────────┘ └──────────────┘
```
Each stack starts two Docker containers:
- **platform**: web server (Apache/nginx/varnish/hitch) + **ja4ebpf** agent (eBPF CO-RE)
- **clickhouse**: local ClickHouse database with the full schema
## Usage
The traffic generator sends 300 HTTPS requests and checks that the full chain
works: TC ingress (L3/L4) → SSL uprobe (L7) → `http_logs_raw` → MV → `http_logs`.
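The chain described above can be checked stage by stage, so that a failure points at the first hop that saw no data. A minimal sketch, assuming per-stage event counts have already been collected by some other means; the stage names and the `first_broken_stage` helper are hypothetical and not part of the repository:

```python
from typing import Dict, Optional

# Hypothetical stage names, in the order given by the chain above.
STAGES = ["tc_ingress", "ssl_uprobe", "http_logs_raw", "http_logs"]


def first_broken_stage(counts: Dict[str, int]) -> Optional[str]:
    """Return the first stage that saw no events, or None if all stages saw data."""
    for stage in STAGES:
        if counts.get(stage, 0) <= 0:
            return stage
    return None
```

If `http_logs_raw` has rows but `http_logs` does not, the helper returns `"http_logs"`, which would point at the materialized view rather than at the agent.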
## Available stacks
| Stack | TLS server | Backend |
|-------|-------------|---------|
| `apache` | Apache httpd + mod_ssl + mod_http2 | — |
| `nginx` | nginx + HTTP/2 | — |
| `nginx-varnish` | nginx (TLS) | Varnish (`-p feature=+http2`) |
| `hitch-varnish` | hitch (TLS offload, ALPN h2) | Varnish (`-p feature=+http2`) |
## Commands
```bash
# Run the tests (build + start + test + teardown)
./run-tests.sh
# Single stack (Rocky Linux 9)
make test-apache
make test-nginx
make test-nginx-varnish
make test-hitch-varnish
# Keep the stack running after the tests (debug)
./run-tests.sh --no-down
# All stacks (Rocky Linux 9)
make test-all-stacks
# Build only (no tests)
./run-tests.sh --build-only
# Or from the monorepo root:
make test-integration
# Multi-distro matrix (el8 / el9 / el10)
make test-matrix
make test-matrix MATRIX_STACKS=nginx,nginx-varnish MATRIX_DISTROS=el9
```
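The matrix target accepts comma-separated `MATRIX_STACKS` and `MATRIX_DISTROS` filters. As a rough illustration of how such filters expand into individual runs, here is a sketch that assumes an empty filter means "everything". The Makefile's actual defaults and expansion logic are not shown in this diff, and `expand_matrix` is a hypothetical helper:

```python
from itertools import product
from typing import List, Tuple

# Stack and distro names as they appear in the README.
DEFAULT_STACKS = ["apache", "nginx", "nginx-varnish", "hitch-varnish"]
DEFAULT_DISTROS = ["el8", "el9", "el10"]


def expand_matrix(stacks: str = "", distros: str = "") -> List[Tuple[str, str]]:
    """Expand comma-separated filters into (stack, distro) pairs.

    Assumption: an empty filter selects every known value.
    """
    chosen_stacks = [s for s in stacks.split(",") if s] or DEFAULT_STACKS
    chosen_distros = [d for d in distros.split(",") if d] or DEFAULT_DISTROS
    return list(product(chosen_stacks, chosen_distros))
```

Under that assumption, the filtered invocation shown above would yield two runs, and an unfiltered one would yield twelve.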
## Containers
## Compatibility matrix
| Container | Image | Role |
|-----------|-------|------|
| `clickhouse` | clickhouse/clickhouse-server:24.8 | Database, schema auto-init |
| `platform` | Rocky Linux 9 (build custom) | Apache HTTPS + mod-reqin-log + sentinel + correlator |
| `bot-detector` | Python 3.11 | ML anomaly detection |
| `dashboard` | Python 3.11 / FastAPI | SOC API |
## Network capabilities
The `platform` container needs:
- `NET_RAW`: for network packet capture (sentinel/pcap)
- `NET_ADMIN`: for network interface configuration
These capabilities are declared in `docker-compose.yml`:
```yaml
platform:
  cap_add:
    - NET_RAW
    - NET_ADMIN
```
## Test phases
1. **ClickHouse schema**: checks both databases, the key tables, and the users
2. **Traffic generation**: 50+ HTTPS requests to Apache
3. **Data pipeline**: checks the raw and parsed logs in ClickHouse
4. **Dashboard API**: checks /health and /api/metrics
5. **Bot-detector**: checks that the process is running
6. **Sentinel**: checks network capture
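The phases above amount to a sequence of independent checks whose results are tallied into pass and fail counts. A minimal sketch of that pattern, with hypothetical callables standing in for the real checks (the `run_phases` name is illustrative only):

```python
from typing import Callable, List, Tuple


def run_phases(phases: List[Tuple[str, Callable[[], bool]]]) -> Tuple[int, int]:
    """Run each named check in order and return (passed, failed) counts."""
    passed = failed = 0
    for name, check in phases:
        try:
            ok = bool(check())
        except Exception as exc:  # a crashing check counts as a failure
            print(f"[test] {name}: error: {exc}")
            ok = False
        if ok:
            passed += 1
            print(f"[test] PASS {name}")
        else:
            failed += 1
            print(f"[test] FAIL {name}")
    return passed, failed
```

Running every phase even after a failure keeps the report complete, which helps when one broken component masks several downstream symptoms.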
## Debug
```bash
# Platform logs (Apache + correlator + sentinel)
docker compose logs platform
# Correlated logs
docker compose exec platform cat /var/log/logcorrelator/correlated.log
# Direct ClickHouse query
docker compose exec clickhouse clickhouse-client \
-q "SELECT time, src_ip, method, host, path FROM ja4_logs.http_logs ORDER BY time DESC LIMIT 10"
# Shell in the platform container
docker compose exec platform bash
```
| Stack | el8 (AlmaLinux 8) | el9 (Rocky Linux 9) | el10 (AlmaLinux 10) |
|-------|:-----------------:|:-------------------:|:-------------------:|
| apache | ✓ | ✓ | ✓ |


@ -1,169 +0,0 @@
# =============================================================================
# ja4-platform — Full-stack integration test
#
# Compose:
# clickhouse — ClickHouse server (schema init via entrypoint)
# platform — Rocky Linux 9: Apache + mod-reqin-log + sentinel + correlator
# bot-detector — ML anomaly detection (Python)
# dashboard — SOC dashboard API (FastAPI)
#
# Usage:
# cd tests/integration && ./run-tests.sh
# =============================================================================
services:
# ---------------------------------------------------------------------------
# ClickHouse — schema auto-init from shared/clickhouse/*.sql
# ---------------------------------------------------------------------------
clickhouse:
image: clickhouse/clickhouse-server:24.8
hostname: clickhouse
environment:
CLICKHOUSE_DB: ja4_processing
CLICKHOUSE_USER: default
CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT: 1
volumes:
# Init script: copies, patches credentials, and executes SQL files
- ./platform/clickhouse-init.sh:/docker-entrypoint-initdb.d/00_init.sh
# SQL sources (read-only, patched by init script before execution)
- ../../shared/clickhouse/00_database.sql:/initdb-src/00_database.sql:ro
- ../../shared/clickhouse/01_raw_tables.sql:/initdb-src/01_raw_tables.sql:ro
- ../../shared/clickhouse/02_dictionaries.sql:/initdb-src/02_dictionaries.sql:ro
- ../../shared/clickhouse/03_anubis_tables.sql:/initdb-src/03_anubis_tables.sql:ro
- ../../shared/clickhouse/04_mv_http_logs.sql:/initdb-src/04_mv_http_logs.sql:ro
- ../../shared/clickhouse/05_aggregation_tables.sql:/initdb-src/05_aggregation_tables.sql:ro
- ../../shared/clickhouse/06_ml_tables.sql:/initdb-src/06_ml_tables.sql:ro
- ../../shared/clickhouse/07_ai_features_view.sql:/initdb-src/07_ai_features_view.sql:ro
- ../../shared/clickhouse/08_users.sql:/initdb-src/08_users.sql:ro
- ../../shared/clickhouse/09_audit_table.sql:/initdb-src/09_audit_table.sql:ro
- ../../shared/clickhouse/10_perf_indexes.sql:/initdb-src/10_perf_indexes.sql:ro
- ../../shared/clickhouse/11_views.sql:/initdb-src/11_views.sql:ro
- ../../shared/clickhouse/12_thesis_features.sql:/initdb-src/12_thesis_features.sql:ro
# Reference CSV files (dictionaries / browser signatures)
- ../../shared/data/browser_h2.csv:/initdb-src/browser_h2.csv:ro
# Empty CSV stubs (dictionaries expect these files)
- ./platform/csv-stubs:/var/lib/clickhouse/user_files
ports:
- "9000:9000"
- "8123:8123"
healthcheck:
test: ["CMD", "clickhouse-client", "--query", "SELECT 1"]
interval: 5s
timeout: 3s
retries: 30
networks:
- ja4net
# ---------------------------------------------------------------------------
# Platform — Rocky Linux 9: Apache (HTTPS) + mod-reqin-log + sentinel + correlator
# ---------------------------------------------------------------------------
platform:
build:
context: ../..
dockerfile: tests/integration/platform/Dockerfile
hostname: platform
cap_add:
- NET_RAW
- NET_ADMIN
environment:
LOGCORRELATOR_CLICKHOUSE_DSN: "clickhouse://default:@clickhouse:9000/ja4_logs"
depends_on:
clickhouse:
condition: service_healthy
ports:
- "443:443"
- "80:80"
healthcheck:
test: ["CMD", "curl", "-sfk", "https://localhost/health"]
interval: 5s
timeout: 3s
retries: 30
networks:
- ja4net
# ---------------------------------------------------------------------------
# Bot-detector — ML anomaly detection
# ---------------------------------------------------------------------------
bot-detector:
build:
context: ../..
dockerfile: services/bot-detector/bot_detector/Dockerfile
hostname: bot-detector
environment:
CLICKHOUSE_HOST: clickhouse
CLICKHOUSE_PORT: 8123
CLICKHOUSE_DB_PROCESSING: ja4_processing
CLICKHOUSE_DB_LOGS: ja4_logs
CLICKHOUSE_USER: default
CLICKHOUSE_PASSWORD: ""
CYCLE_INTERVAL_SEC: 30
RETRAIN_INTERVAL_HOURS: 1
ANOMALY_THRESHOLD: "-0.05"
HEALTH_PORT: 8080
depends_on:
clickhouse:
condition: service_healthy
platform:
condition: service_healthy
healthcheck:
test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8080/')"]
interval: 10s
timeout: 5s
retries: 10
networks:
- ja4net
# ---------------------------------------------------------------------------
# Dashboard — FastAPI SOC UI
# ---------------------------------------------------------------------------
dashboard:
build:
context: ../..
dockerfile: services/dashboard/Dockerfile
hostname: dashboard
environment:
CLICKHOUSE_HOST: clickhouse
CLICKHOUSE_PORT: 8123
CLICKHOUSE_DB: ja4_processing
CLICKHOUSE_DB_PROCESSING: ja4_processing
CLICKHOUSE_DB_LOGS: ja4_logs
CLICKHOUSE_USER: default
CLICKHOUSE_PASSWORD: ""
API_HOST: 0.0.0.0
API_PORT: 8000
CORS_ORIGINS: '["*"]'
depends_on:
clickhouse:
condition: service_healthy
ports:
- "8000:8000"
healthcheck:
test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"]
interval: 5s
timeout: 3s
retries: 30
networks:
- ja4net
# ---------------------------------------------------------------------------
# Traffic generator — Python (stdlib only) sending varied HTTP/HTTPS requests
# to platform across the Docker network so sentinel (pcap on eth0) captures
# TLS ClientHello packets with real JA4/JA3 fingerprints.
# Multiple SSL contexts produce different TLS fingerprints per request.
# ---------------------------------------------------------------------------
traffic-gen:
build:
context: traffic-gen
hostname: traffic-gen
depends_on:
platform:
condition: service_healthy
volumes:
- ../../scripts/data:/app/data:ro
networks:
- ja4net
networks:
ja4net:
driver: bridge
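The `depends_on` entries with `condition: service_healthy` in this compose file define a dependency graph between the five services. A small sketch of one valid start order derived from that graph, using the standard-library `graphlib` module. Compose itself may start independent services concurrently, so this linear order is only an illustration, and `start_order` is a hypothetical helper:

```python
from graphlib import TopologicalSorter
from typing import Dict, List

# Each service maps to the services it waits for, as declared in the file above.
DEPENDS_ON: Dict[str, List[str]] = {
    "clickhouse": [],
    "platform": ["clickhouse"],
    "bot-detector": ["clickhouse", "platform"],
    "dashboard": ["clickhouse"],
    "traffic-gen": ["platform"],
}


def start_order(depends_on: Dict[str, List[str]]) -> List[str]:
    """Return one order in which every service starts after its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())
```

In any order it returns, `clickhouse` comes first and `platform` precedes both `bot-detector` and `traffic-gen`.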


@ -1,97 +0,0 @@
# =============================================================================
# Platform container — Rocky Linux 9
# Runs: Apache (HTTPS) + mod-reqin-log + sentinel + correlator
#
# Multi-stage:
# 1. go-builder — compile correlator (static, no CGO) on golang image
# 2. platform — Rocky Linux 9: builds sentinel (CGO+libpcap), mod-reqin-log,
# installs Apache, runs everything
#
# sentinel is compiled on Rocky so it links against the same libpcap as runtime.
# This mirrors RPM packaging where build and target are the same distro.
# =============================================================================
# ---------------------------------------------------------------------------
# Stage 1: Build correlator (static binary, no CGO — distro-independent)
# ---------------------------------------------------------------------------
FROM golang:1.24 AS go-builder
WORKDIR /src
COPY go.work go.work.sum* ./
COPY shared/go/ja4common/ shared/go/ja4common/
COPY services/correlator/ services/correlator/
COPY services/sentinel/ services/sentinel/
RUN cd services/correlator && \
CGO_ENABLED=0 go build -ldflags="-s -w" -o /out/correlator ./cmd/logcorrelator
# ---------------------------------------------------------------------------
# Stage 2: Rocky Linux 9 — build sentinel + mod-reqin-log, then run everything
# ---------------------------------------------------------------------------
FROM rockylinux:9
# Install build deps + runtime deps
RUN dnf install -y --allowerasing \
httpd httpd-devel mod_ssl \
apr-devel apr-util-devel \
gcc make redhat-rpm-config \
libpcap \
golang \
procps-ng curl \
&& dnf install -y --enablerepo=crb libpcap-devel \
&& dnf clean all
# -- Build sentinel on Rocky (CGO + libpcap from Rocky repos) ---------------
COPY go.work go.work.sum* /tmp/sentinel-build/
COPY shared/go/ja4common/ /tmp/sentinel-build/shared/go/ja4common/
COPY services/sentinel/ /tmp/sentinel-build/services/sentinel/
COPY services/correlator/ /tmp/sentinel-build/services/correlator/
RUN cd /tmp/sentinel-build/services/sentinel && \
CGO_ENABLED=1 go build -ldflags="-s -w" -o /usr/local/bin/sentinel ./cmd/ja4sentinel && \
rm -rf /tmp/sentinel-build /root/go
# -- Build mod-reqin-log from source -----------------------------------------
COPY services/mod-reqin-log/src/ /tmp/mod-reqin-log/src/
COPY services/mod-reqin-log/Makefile /tmp/mod-reqin-log/Makefile
RUN cd /tmp/mod-reqin-log && make all && \
cp modules/mod_reqin_log.so /usr/lib64/httpd/modules/ 2>/dev/null || \
cp build/.libs/mod_reqin_log.so /usr/lib64/httpd/modules/ && \
rm -rf /tmp/mod-reqin-log
# -- Copy correlator from builder (static binary, no deps) -------------------
COPY --from=go-builder /out/correlator /usr/local/bin/correlator
# -- Create runtime directories ----------------------------------------------
RUN mkdir -p /var/run/logcorrelator \
/var/log/logcorrelator \
/var/log/ja4sentinel \
/etc/logcorrelator \
/etc/ja4sentinel
# -- Correlator config -------------------------------------------------------
COPY tests/integration/platform/correlator.yml /etc/logcorrelator/correlator.yml
# -- Sentinel config ----------------------------------------------------------
COPY tests/integration/platform/sentinel.yml /etc/ja4sentinel/config.yml
# -- Apache config (HTTPS + mod-reqin-log) ------------------------------------
COPY tests/integration/platform/httpd-integration.conf /etc/httpd/conf.d/integration.conf
# -- Generate self-signed TLS certificate -------------------------------------
RUN openssl req -x509 -nodes -days 365 \
-subj "/CN=platform.test" \
-newkey rsa:2048 \
-keyout /etc/pki/tls/private/localhost.key \
-out /etc/pki/tls/certs/localhost.crt
# -- Simple health endpoint for Apache ---------------------------------------
RUN mkdir -p /var/www/html && \
echo '{"status":"ok"}' > /var/www/html/health
# -- Entrypoint (manages all processes) --------------------------------------
COPY tests/integration/platform/entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh
EXPOSE 80 443
CMD ["/entrypoint.sh"]


@ -1,48 +0,0 @@
#!/bin/bash
# =============================================================================
# clickhouse-init.sh — Pre-process shared SQL files for integration testing
#
# Copies SQL from /initdb-src/ to /tmp, patches credentials, then executes.
# =============================================================================
set -e
SRC_DIR="/initdb-src"
TMP_DIR="/tmp/initdb-patched"
USER_FILES="/var/lib/clickhouse/user_files"
mkdir -p "$TMP_DIR"
# Copy the reference CSV files into user_files (browser dictionaries)
for csv in "$SRC_DIR"/*.csv; do
[ -f "$csv" ] || continue
fname=$(basename "$csv")
if [ ! -f "$USER_FILES/$fname" ]; then
cp "$csv" "$USER_FILES/$fname"
echo "[init] CSV copied: $fname"
fi
done
for f in "$SRC_DIR"/*.sql; do
[ -f "$f" ] || continue
base=$(basename "$f")
echo "[init] Patching $base"
sed \
-e "s/USER 'admin'/USER 'default'/g" \
-e "s/PASSWORD 'CHANGE_ME'/PASSWORD ''/g" \
-e "s/PASSWORD 'ChangeMe'/PASSWORD ''/g" \
"$f" > "$TMP_DIR/$base"
done
for f in "$TMP_DIR"/*.sql; do
[ -f "$f" ] || continue
base=$(basename "$f")
echo "[init] Executing $base"
# 10_perf_indexes.sql uses ALTER TABLE ADD INDEX which may fail if index
# already exists — allow non-zero exit for migration/perf scripts
if [[ "$base" == 10_* ]]; then
clickhouse-client --multiquery < "$f" || echo "[init] WARNING: $base had errors (expected for duplicate indexes)"
else
clickhouse-client --multiquery < "$f"
fi
done
echo "[init] All SQL files executed successfully"
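The credential patching done by the `sed` call in this script is a set of plain string substitutions, since none of the three patterns contains regex metacharacters. A minimal Python equivalent of that one step, for illustration only (`patch_sql` is a hypothetical name; the script itself uses `sed`):

```python
# The three substitutions applied by the sed command in clickhouse-init.sh.
REPLACEMENTS = [
    ("USER 'admin'", "USER 'default'"),
    ("PASSWORD 'CHANGE_ME'", "PASSWORD ''"),
    ("PASSWORD 'ChangeMe'", "PASSWORD ''"),
]


def patch_sql(sql: str) -> str:
    """Replace production credentials with the passwordless default user."""
    for old, new in REPLACEMENTS:
        sql = sql.replace(old, new)
    return sql
```

The sketch leaves out the file copying and the `clickhouse-client --multiquery` execution that the script performs afterwards.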


@ -1,51 +0,0 @@
# Correlator config for integration tests
log:
level: DEBUG
inputs:
unix_sockets:
- name: http
source_type: A
path: /var/run/logcorrelator/http.socket
format: json
socket_permissions: "0666"
- name: network
source_type: B
path: /var/run/logcorrelator/network.socket
format: json
socket_permissions: "0666"
outputs:
clickhouse:
enabled: true
dsn: clickhouse://default:@clickhouse:9000/ja4_logs
table: http_logs_raw
batch_size: 10
flush_interval_ms: 500
max_buffer_size: 5000
drop_on_overflow: false
async_insert: true
timeout_ms: 5000
file:
enabled: true
path: /var/log/logcorrelator/correlated.log
stdout:
enabled: true
correlation:
time_window:
value: 10
unit: s
orphan_policy:
apache_always_emit: true
apache_emit_delay_ms: 1000
network_emit: false
matching:
mode: one_to_many
buffers:
max_http_items: 10000
max_network_items: 20000
ttl:
network_ttl_s: 120
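The settings above describe a time-window join between HTTP events (source A) and network events (source B): a 10 s window, `one_to_many` matching, HTTP orphans emitted, network orphans dropped. The sketch below illustrates that idea under strong assumptions. The join key, the `Event` type, and the "closest in time" tie-break are all hypothetical, since the config does not say how the correlator matches events, and the emit delay, buffers, and TTL are not modeled:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

WINDOW_S = 10.0  # time_window: value 10, unit s


@dataclass
class Event:
    key: str   # hypothetical join key, e.g. "src_ip:src_port"
    ts: float  # event time in seconds


def correlate(
    http_events: List[Event],
    network_events: List[Event],
    window_s: float = WINDOW_S,
) -> List[Tuple[Event, Optional[Event]]]:
    """Pair each HTTP event with a network event sharing its key, if any.

    one_to_many: the same network event may be paired with several HTTP events.
    HTTP events without a match are still emitted (paired with None);
    unmatched network events never appear in the output.
    """
    out = []
    for h in http_events:
        candidates = [
            n for n in network_events
            if n.key == h.key and abs(n.ts - h.ts) <= window_s
        ]
        # Assumption: prefer the network event closest in time.
        best = min(candidates, key=lambda n: abs(n.ts - h.ts), default=None)
        out.append((h, best))
    return out
```

With one TLS handshake at t=0 and requests on the same key at t=1, t=5, and t=30, the first two requests are enriched and the third is emitted as an orphan.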


@ -1,59 +0,0 @@
#!/usr/bin/env bash
# =============================================================================
# Platform entrypoint — starts correlator, Apache, sentinel in order
# =============================================================================
set -eo pipefail
log() { echo "[entrypoint] $(date +%H:%M:%S) $*"; }
CORRELATOR_PID=""
HTTPD_PID=""
SENTINEL_PID=""
cleanup() {
log "Shutting down..."
[ -n "$SENTINEL_PID" ] && kill "$SENTINEL_PID" 2>/dev/null || true
[ -n "$CORRELATOR_PID" ] && kill "$CORRELATOR_PID" 2>/dev/null || true
httpd -k stop 2>/dev/null || true
wait 2>/dev/null || true
log "All processes stopped."
}
trap cleanup EXIT SIGTERM SIGINT
# -- 1. Start correlator (creates Unix sockets) ------------------------------
log "Starting correlator..."
correlator -config /etc/logcorrelator/correlator.yml &
CORRELATOR_PID=$!
# Wait for correlator to create its sockets
for i in $(seq 1 30); do
if [ -S /var/run/logcorrelator/http.socket ] && [ -S /var/run/logcorrelator/network.socket ]; then
log "Correlator sockets ready."
break
fi
sleep 0.5
done
if [ ! -S /var/run/logcorrelator/http.socket ]; then
log "ERROR: correlator sockets not created after 15s"
exit 1
fi
# -- 2. Start Apache (with mod-reqin-log writing to http.socket) -------------
log "Starting Apache..."
httpd -DFOREGROUND &
HTTPD_PID=$!
sleep 2
# -- 3. Start sentinel (captures network traffic) ----------------------------
log "Starting sentinel..."
sentinel -config /etc/ja4sentinel/config.yml &
SENTINEL_PID=$!
log "All services started. PIDs: correlator=$CORRELATOR_PID httpd=$HTTPD_PID sentinel=$SENTINEL_PID"
# -- Wait for any process to exit (indicates failure) -------------------------
# Capture the real exit status; with "|| true" the value of $? would always be 0
EXIT_CODE=0
wait -n "$CORRELATOR_PID" "$HTTPD_PID" "$SENTINEL_PID" 2>/dev/null || EXIT_CODE=$?
log "A process exited with code $EXIT_CODE — triggering shutdown."
exit $EXIT_CODE
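The readiness loop in this entrypoint polls for the correlator's Unix sockets up to 30 times at 0.5 s intervals before starting Apache. The same polling pattern, sketched in Python for illustration (`is_socket` and `wait_for_sockets` are hypothetical names; unlike the script, which re-checks only `http.socket` after the loop, this version reports on every path):

```python
import os
import stat
import time
from typing import Iterable


def is_socket(path: str) -> bool:
    """True if the path exists and is a Unix socket (the shell test `[ -S path ]`)."""
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except OSError:
        return False


def wait_for_sockets(paths: Iterable[str], attempts: int = 30, delay: float = 0.5) -> bool:
    """Poll until every path is a socket, or give up after attempts * delay seconds."""
    paths = list(paths)
    for _ in range(attempts):
        if all(is_socket(p) for p in paths):
            return True
        time.sleep(delay)
    return False
```

With the defaults the wait is bounded at 15 s, which matches the error message in the script.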


@ -1,40 +0,0 @@
# Integration test Apache config — HTTPS + mod-reqin-log
# Load mod-reqin-log
LoadModule reqin_log_module modules/mod_reqin_log.so
# Enable HTTP/2 negotiation (mod_http2 loaded by default on Rocky 9)
Protocols h2 http/1.1
# mod_remoteip: trust X-Forwarded-For from Docker internal subnets.
# mod_reqin_log reads r->useragent_ip which mod_remoteip updates,
# so the XFF IP appears as src_ip in the correlated logs.
LoadModule remoteip_module modules/mod_remoteip.so
RemoteIPHeader X-Forwarded-For
RemoteIPInternalProxy 172.0.0.0/8
RemoteIPInternalProxy 192.168.0.0/16
RemoteIPInternalProxy 10.0.0.0/8
# Enable mod-reqin-log with correlator socket
JsonSockLogEnabled On
JsonSockLogSocket "/var/run/logcorrelator/http.socket"
JsonSockLogHeaders X-Request-Id User-Agent Referer X-Forwarded-For \
Sec-CH-UA Sec-CH-UA-Mobile Sec-CH-UA-Platform \
Sec-Fetch-Dest Sec-Fetch-Mode Sec-Fetch-Site \
Accept Accept-Language Accept-Encoding Content-Type
JsonSockLogMaxHeaders 25
JsonSockLogMaxHeaderValueLen 256
JsonSockLogReconnectInterval 5
JsonSockLogErrorReportInterval 5
JsonSockLogLevel DEBUG
# HTTP virtual host on port 80 (the HTTPS virtual host on port 443 is already configured by mod_ssl)
<VirtualHost *:80>
ServerName platform.test
DocumentRoot /var/www/html
# Simple test pages
<Location /health>
Require all granted
</Location>
</VirtualHost>
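The `mod_remoteip` directives above make Apache replace the connection address with an address taken from `X-Forwarded-For` when the request arrives from one of the trusted internal ranges. A simplified model of that resolution, assuming the header is walked right to left while the current hop is trusted. It ignores details of the real module such as malformed-header handling, and `effective_client_ip` is a hypothetical name:

```python
import ipaddress

# The three RemoteIPInternalProxy ranges from the config above.
TRUSTED = [
    ipaddress.ip_network(n)
    for n in ("172.0.0.0/8", "192.168.0.0/16", "10.0.0.0/8")
]


def is_trusted(ip: str) -> bool:
    """True if the address falls inside one of the trusted proxy ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in TRUSTED)


def effective_client_ip(peer_ip: str, xff: str) -> str:
    """Walk X-Forwarded-For right to left while the current hop is a trusted proxy."""
    client = peer_ip
    hops = [h.strip() for h in xff.split(",") if h.strip()]
    while hops and is_trusted(client):
        client = hops.pop()
    return client
```

A request from a Docker-internal peer with `X-Forwarded-For: 203.0.113.7` resolves to `203.0.113.7`, while the same header from an untrusted peer is ignored.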


@ -1,18 +0,0 @@
# Sentinel config for integration tests
core:
interface: eth0
listen_ports:
- 443
flow_timeout_sec: 30
packet_buffer_size: 1000
log_level: debug
outputs:
- type: unix_socket
enabled: true
async_buffer: 5000
params:
socket_path: /var/run/logcorrelator/network.socket
- type: stdout
enabled: true


@ -1,411 +0,0 @@
#!/usr/bin/env bash
# =============================================================================
# run-tests.sh — Full-stack integration test for ja4-platform
#
# Starts the entire pipeline in Docker Compose, generates traffic, and verifies
# data flows end-to-end: Apache → mod-reqin-log → correlator → ClickHouse
# sentinel ↗ ↓
# bot-detector → ML scores
# dashboard API ← query
#
# Usage:
# ./run-tests.sh # run tests (build + up + test + down)
# ./run-tests.sh --no-down # keep stack running after tests (for debugging)
# ./run-tests.sh --build-only # build images only, don't run tests
# =============================================================================
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
cd "$SCRIPT_DIR"
# Colors
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
CYAN='\033[0;36m'
NC='\033[0m'
KEEP_UP=false
BUILD_ONLY=false
TESTS_PASSED=0
TESTS_FAILED=0
for arg in "$@"; do
case "$arg" in
--no-down) KEEP_UP=true ;;
--build-only) BUILD_ONLY=true ;;
esac
done
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
log() { echo -e "${CYAN}[test]${NC} $(date +%H:%M:%S) $*"; }
pass() { echo -e "${GREEN}$*${NC}"; TESTS_PASSED=$((TESTS_PASSED + 1)); }
fail() { echo -e "${RED}$*${NC}"; TESTS_FAILED=$((TESTS_FAILED + 1)); }
warn() { echo -e "${YELLOW}$*${NC}"; }
cleanup() {
if [ "$KEEP_UP" = false ]; then
log "Tearing down stack..."
docker compose down -v --remove-orphans 2>/dev/null || true
else
log "Stack left running (--no-down). Stop with: docker compose down -v"
fi
}
trap cleanup EXIT
ch_query() {
docker compose exec -T clickhouse clickhouse-client --query "$1" 2>/dev/null
}
wait_for_service() {
local service="$1"
local max_wait="${2:-120}"
log "Waiting for $service to be healthy (max ${max_wait}s)..."
local elapsed=0
while [ $elapsed -lt "$max_wait" ]; do
local status
status=$(docker compose ps --format json "$service" 2>/dev/null | python3 -c "
import sys, json
for line in sys.stdin:
    d = json.loads(line)
    print(d.get('Health','unknown'))
" 2>/dev/null || echo "unknown")
if [ "$status" = "healthy" ]; then
log "$service is healthy (${elapsed}s)"
return 0
fi
sleep 2
elapsed=$((elapsed + 2))
done
log "ERROR: $service not healthy after ${max_wait}s"
docker compose logs --tail=30 "$service"
return 1
}
# =============================================================================
# Phase 1: Build
# =============================================================================
log "============================================"
log "Phase 1: Building images"
log "============================================"
docker compose build --parallel 2>&1 | tail -20
if [ "$BUILD_ONLY" = true ]; then
log "Build complete (--build-only). Exiting."
exit 0
fi
# =============================================================================
# Phase 2: Start stack (always fresh — destroy volumes to reset DB)
# =============================================================================
log "============================================"
log "Phase 2: Starting stack (fresh DB)"
log "============================================"
# Always destroy volumes so ClickHouse reinitializes schema from scratch.
# This guarantees test isolation across runs.
log "Resetting state (docker compose down -v)..."
docker compose down -v --remove-orphans 2>/dev/null || true
docker compose up -d
wait_for_service clickhouse 120
wait_for_service platform 120
wait_for_service dashboard 60
# =============================================================================
# Phase 3: Verify ClickHouse schema
# =============================================================================
log "============================================"
log "Phase 3: Verifying ClickHouse schema"
log "============================================"
# Check databases exist
DB_COUNT=$(ch_query "SELECT count() FROM system.databases WHERE name IN ('ja4_logs','ja4_processing')")
if [ "$DB_COUNT" = "2" ]; then
pass "Both databases created (ja4_logs, ja4_processing)"
else
fail "Expected 2 databases, got $DB_COUNT"
fi
# Check key tables
for table in "ja4_logs.http_logs_raw" "ja4_logs.http_logs" "ja4_processing.ml_detected_anomalies" "ja4_processing.agg_host_ip_ja4_1h"; do
db=$(echo "$table" | cut -d. -f1)
tbl=$(echo "$table" | cut -d. -f2)
EXISTS=$(ch_query "SELECT count() FROM system.tables WHERE database='$db' AND name='$tbl'")
if [ "$EXISTS" = "1" ]; then
pass "Table $table exists"
else
fail "Table $table missing"
fi
done
# Check users
for user in data_writer analyst; do
EXISTS=$(ch_query "SELECT count() FROM system.users WHERE name='$user'")
if [ "$EXISTS" = "1" ]; then
pass "User '$user' created"
else
fail "User '$user' missing"
fi
done
# =============================================================================
# Phase 4: Seed ClickHouse + Generate test traffic
# =============================================================================
log "============================================"
log "Phase 4a: Seeding ClickHouse with synthetic data"
log "============================================"
# The seeder inserts directly into http_logs_raw, triggering all MVs:
# http_logs_raw → mv_http_logs → http_logs → mv_agg_host_ip_ja4_1h → agg_host_ip_ja4_1h
# This pre-populates:
# - ~350,000 rows from 14,000 browser IPs (ISP ASN ranges → asn_label='isp')
# - ~100,000 rows from 3,000 scanner IPs (datacenter ASN → ML anomaly candidates)
# - ~30,000 rows from 2,000 legit bot IPs (from bot_ip.csv CIDRs)
# - ~20,000 rows from 1,000 AI bot IPs (datacenter ranges)
# After seeding, bot_detector has ≥500 human rows → can train and run.
log "Running seed_clickhouse.py..."
if docker compose exec -T traffic-gen python /app/seed_clickhouse.py \
--host clickhouse --port 8123 --user default --password ""; then
pass "ClickHouse seeded (500K rows: 350K browser + 100K scanner + 30K legit-bot + 20K AI-bot)"
else
warn "Seeder reported errors (pipeline verification will show impact)"
fi
log "============================================"
log "Phase 4b: Generating live test traffic via Apache"
log "============================================"
# Live traffic crosses the Docker network so sentinel can capture TLS handshakes.
# X-Forwarded-For is always set — mod_remoteip updates r->useragent_ip → diverse src_ips.
log "Starting traffic generator (500 requests, 10 workers)..."
if docker compose exec -T traffic-gen python /app/generate_traffic.py \
--host platform --http-port 80 --https-port 443 \
--requests 500 --workers 10; then
pass "Traffic generation complete (500 requests with diverse XFF IPs: browsers, bots)"
else
warn "Traffic generator reported some errors (>80% success still passes)"
fi
# Wait for correlator to flush all batches to ClickHouse
log "Waiting 20s for correlator to flush and bot-detector first cycle..."
sleep 20
# =============================================================================
# Phase 5: Verify data pipeline
# =============================================================================
log "============================================"
log "Phase 5: Verifying data pipeline"
log "============================================"
# 5a. Raw logs ingested
RAW_COUNT=$(ch_query "SELECT count() FROM ja4_logs.http_logs_raw")
if [ "$RAW_COUNT" -gt 0 ] 2>/dev/null; then
pass "Raw logs ingested: $RAW_COUNT rows in http_logs_raw (seeder + live traffic)"
else
fail "No raw logs in http_logs_raw (correlator → ClickHouse failed)"
# Debug
log "Correlator logs:"
docker compose logs --tail=30 platform 2>&1 | grep -i "correlator\|error\|clickhouse" | head -20
fi
# 5b. Parsed logs via materialized view
PARSED_COUNT=$(ch_query "SELECT count() FROM ja4_logs.http_logs")
if [ "$PARSED_COUNT" -gt 0 ] 2>/dev/null; then
pass "Parsed logs: $PARSED_COUNT rows in http_logs (MV working)"
else
warn "No parsed logs in http_logs (MV may need INSERT trigger, or dict loading failed)"
fi
# 5c. Check a sample parsed log has expected fields
if [ "$PARSED_COUNT" -gt 0 ] 2>/dev/null; then
# Verify variety of User-Agents (browsers + bots)
UA_TYPES=$(ch_query "SELECT count(DISTINCT header_user_agent) FROM ja4_logs.http_logs")
if [ "$UA_TYPES" -gt 5 ] 2>/dev/null; then
pass "Varied User-Agents: $UA_TYPES distinct UAs in logs"
else
warn "Low User-Agent variety: only $UA_TYPES distinct UAs"
fi
# Verify HTTP method variety
METHODS=$(ch_query "SELECT groupArray(method) FROM (SELECT DISTINCT method FROM ja4_logs.http_logs ORDER BY method)")
pass "HTTP methods captured: $METHODS"
fi
# 5d. TLS fingerprints captured (sentinel → correlator → ClickHouse)
if [ "$PARSED_COUNT" -gt 0 ] 2>/dev/null; then
JA4_COUNT=$(ch_query "SELECT count() FROM ja4_logs.http_logs WHERE ja4 != ''")
JA4_UNIQ=$(ch_query "SELECT count(DISTINCT ja4) FROM ja4_logs.http_logs WHERE ja4 != ''")
JA3_COUNT=$(ch_query "SELECT count() FROM ja4_logs.http_logs WHERE ja3 != ''")
JA3_UNIQ=$(ch_query "SELECT count(DISTINCT ja3_hash) FROM ja4_logs.http_logs WHERE ja3_hash != ''")
TLS_VERSIONS=$(ch_query "SELECT groupArray(tls_version) FROM (SELECT DISTINCT tls_version FROM ja4_logs.http_logs WHERE tls_version != '' ORDER BY tls_version)")
if [ "$JA4_COUNT" -gt 0 ] 2>/dev/null; then
pass "TLS capture: $JA4_COUNT rows with JA4 ($JA4_UNIQ unique fingerprints)"
SAMPLE=$(ch_query "SELECT ja4, tls_version FROM ja4_logs.http_logs WHERE ja4 != '' LIMIT 1 FORMAT TabSeparated")
log " JA4 sample: $SAMPLE"
else
warn "No JA4 fingerprints (sentinel may not see traffic on eth0)"
fi
if [ "$JA3_COUNT" -gt 0 ] 2>/dev/null; then
pass "TLS capture: $JA3_COUNT rows with JA3 ($JA3_UNIQ unique fingerprints)"
fi
if [ -n "$TLS_VERSIONS" ]; then
pass "TLS versions seen: $TLS_VERSIONS"
fi
fi
# 5e. Check correlator log file
CORR_LINES=$(docker compose exec -T platform sh -c 'wc -l < /var/log/logcorrelator/correlated.log' 2>/dev/null || echo 0)
if [ "$CORR_LINES" -gt 0 ] 2>/dev/null; then
pass "Correlator file output: $CORR_LINES lines in correlated.log"
else
warn "Correlator file output empty"
fi
# 5f. Verify seeder data reached agg table and AI features view
AGG_COUNT=$(ch_query "SELECT count() FROM ja4_processing.agg_host_ip_ja4_1h")
HUMAN_COUNT=$(ch_query "SELECT count() FROM ja4_processing.view_ai_features_1h WHERE asn_label='isp'")
BOT_LABEL_COUNT=$(ch_query "SELECT count() FROM ja4_processing.view_ai_features_1h WHERE bot_name != ''")
UNIQ_SRC_IPS=$(ch_query "SELECT count(DISTINCT src_ip) FROM ja4_processing.view_ai_features_1h")
UNIQ_JA4=$(ch_query "SELECT count(DISTINCT ja4) FROM ja4_processing.view_ai_features_1h")
if [ "$AGG_COUNT" -gt 0 ] 2>/dev/null; then
pass "Aggregation table populated: $AGG_COUNT sessions in agg_host_ip_ja4_1h"
else
fail "agg_host_ip_ja4_1h empty (MV chain broken)"
fi
if [ "$HUMAN_COUNT" -ge 500 ] 2>/dev/null; then
pass "Bot-detector baseline: $HUMAN_COUNT ISP sessions (≥500 threshold met)"
elif [ "$HUMAN_COUNT" -gt 0 ] 2>/dev/null; then
warn "ISP sessions below threshold: $HUMAN_COUNT < 500 (bot_detector will skip cycle)"
else
fail "No ISP sessions in view_ai_features_1h (asn_reputation CSV not loaded?)"
fi
if [ "$BOT_LABEL_COUNT" -gt 0 ] 2>/dev/null; then
pass "Known bots labeled: $BOT_LABEL_COUNT sessions with bot_name (bot_ip/bot_ja4 dicts working)"
else
warn "No known-bot labels in view_ai_features_1h (bot_ip.csv / bot_ja4.csv empty?)"
fi
log " Unique src_ips: $UNIQ_SRC_IPS | Unique JA4: $UNIQ_JA4"
# =============================================================================
# Phase 6: Verify dashboard API
# =============================================================================
log "============================================"
log "Phase 6: Verifying dashboard API"
log "============================================"
# Health check (dashboard has no curl, use python urllib)
HEALTH=$(docker compose exec -T dashboard python -c "
import urllib.request, json
r = urllib.request.urlopen('http://localhost:8000/health')
print(json.loads(r.read()).get('status',''))
" 2>/dev/null || echo "FAIL")
if [ "$HEALTH" = "healthy" ] || [ "$HEALTH" = "ok" ]; then
pass "Dashboard /health returns $HEALTH"
else
fail "Dashboard /health failed: $HEALTH"
fi
# Metrics endpoint
METRICS_STATUS=$(docker compose exec -T dashboard python -c "
import urllib.error
import urllib.request
try:
    r = urllib.request.urlopen('http://localhost:8000/api/metrics')
    print(r.status)
except urllib.error.HTTPError as e:
    print(e.code)
except Exception:
    print(0)
" 2>/dev/null || echo "000")
if [ "$METRICS_STATUS" = "200" ] || [ "$METRICS_STATUS" = "404" ]; then
pass "Dashboard /api/metrics responds (HTTP $METRICS_STATUS)"
else
fail "Dashboard /api/metrics failed (HTTP $METRICS_STATUS)"
fi
# =============================================================================
# Phase 7: Verify bot-detector
# =============================================================================
log "============================================"
log "Phase 7: Verifying bot-detector"
log "============================================"
BOT_STATUS=$(docker compose ps --format json bot-detector 2>/dev/null | python3 -c "
import sys, json
for line in sys.stdin:
    d = json.loads(line)
    print(d.get('State','unknown'))
" 2>/dev/null || echo "unknown")
if [ "$BOT_STATUS" = "running" ]; then
pass "Bot-detector is running"
else
warn "Bot-detector state: $BOT_STATUS"
fi
# Check if bot-detector successfully ran a detection cycle (not just SKIPPED_LOW_DATA)
BD_SCORES=$(ch_query "SELECT count() FROM ja4_processing.ml_all_scores" 2>/dev/null || echo 0)
BD_ANOMALIES=$(ch_query "SELECT count() FROM ja4_processing.ml_detected_anomalies" 2>/dev/null || echo 0)
if [ "$BD_SCORES" -gt 0 ] 2>/dev/null; then
pass "Bot-detector scored traffic: $BD_SCORES rows in ml_all_scores, $BD_ANOMALIES anomalies detected"
else
warn "ml_all_scores is empty — bot-detector may not have completed a cycle yet"
warn " (check: docker compose logs bot-detector | grep -E 'CYCLE|SKIP|train')"
fi
# =============================================================================
# Phase 8: Network capture verification (sentinel)
# =============================================================================
log "============================================"
log "Phase 8: Verifying sentinel capture"
log "============================================"
SENTINEL_RUNNING=$(docker compose exec -T platform pgrep -x sentinel > /dev/null 2>&1 && echo "yes" || echo "no")
if [ "$SENTINEL_RUNNING" = "yes" ]; then
pass "Sentinel process is running"
else
fail "Sentinel process not found"
docker compose logs --tail=10 platform 2>&1 | grep -i sentinel | head -5
fi
# Check sentinel log output
SENTINEL_LOG=$(docker compose exec -T platform cat /var/log/ja4sentinel/sentinel.log 2>/dev/null | head -5 || echo "")
if [ -n "$SENTINEL_LOG" ]; then
pass "Sentinel producing log output"
else
warn "No sentinel log file found (may be logging to stdout only)"
fi
# =============================================================================
# Summary
# =============================================================================
echo ""
log "============================================"
log "RESULTS"
log "============================================"
TOTAL=$((TESTS_PASSED + TESTS_FAILED))
echo -e " ${GREEN}Passed: $TESTS_PASSED${NC} / $TOTAL"
if [ "$TESTS_FAILED" -gt 0 ]; then
echo -e " ${RED}Failed: $TESTS_FAILED${NC} / $TOTAL"
fi
echo ""
if [ "$TESTS_FAILED" -gt 0 ]; then
log "Some tests failed. Use --no-down to keep the stack running for debugging."
log "Debug commands:"
log " docker compose logs platform"
log " docker compose exec platform cat /var/log/logcorrelator/correlated.log"
log " docker compose exec clickhouse clickhouse-client -q 'SELECT * FROM ja4_logs.http_logs_raw LIMIT 5'"
exit 1
else
log "All tests passed!"
exit 0
fi


@ -1,313 +0,0 @@
#!/usr/bin/env python3
"""
verify_mvs.py: checks that the ClickHouse materialized views are correctly populated.

Assertions performed:
  1. http_logs_raw         : row count >= 5,000 (mod_reqin_log + logcorrelator are working)
  2. http_logs             : row count >= 5,000 (MV mv_http_logs is working)
  3. agg_host_ip_ja4_1h    : row count > 0 (MV mv_agg_host_ip_ja4_1h is working)
  4. view_ai_features_1h   : row count > 0 (view fed by agg_host_ip_ja4_1h)
  5. ml_all_scores         : row count > 0 (bot_detector has run)
  6. ml_detected_anomalies : query succeeds (table is accessible)

Exit codes:
  0 = all tests pass
  1 = at least one test fails
"""
import os
import sys
import time
import clickhouse_connect
# --------------------------------------------------------------------------
# ClickHouse connection
# --------------------------------------------------------------------------
CLICKHOUSE_HOST = os.getenv("CLICKHOUSE_HOST", "clickhouse")
CLICKHOUSE_PORT = int(os.getenv("CLICKHOUSE_PORT", "8123"))
CLICKHOUSE_DB = os.getenv("CLICKHOUSE_DB", "ja4_processing")
CLICKHOUSE_USER = os.getenv("CLICKHOUSE_USER", "default")
CLICKHOUSE_PASS = os.getenv("CLICKHOUSE_PASSWORD", "")
# --------------------------------------------------------------------------
# Helpers
# --------------------------------------------------------------------------
PASS = "\033[92m✓\033[0m"
FAIL = "\033[91m✗\033[0m"
def _count(client, query: str) -> int:
    """Run a SELECT count(*) query and return the result."""
    result = client.query(query)
    return int(result.result_rows[0][0])


def _check(label: str, actual: int, op: str, expected: int) -> bool:
    """Check an assertion and print the result."""
    ok = (op == ">=" and actual >= expected) or \
         (op == ">" and actual > expected) or \
         (op == "==" and actual == expected)
    symbol = PASS if ok else FAIL
    print(f" {symbol} {label:40s} {actual:>8,d} (expected {op} {expected:,})")
    return ok


def _wait_for_clickhouse(host: str, port: int, retries: int = 30) -> "clickhouse_connect.driver.client.Client":
    """Wait until ClickHouse is available and return a connected client."""
    for attempt in range(retries):
        try:
            client = clickhouse_connect.get_client(
                host=host, port=port,
                database=CLICKHOUSE_DB,
                username=CLICKHOUSE_USER,
                password=CLICKHOUSE_PASS,
                connect_timeout=3,
            )
            client.ping()
            return client
        except Exception as exc:
            if attempt < retries - 1:
                print(f" [verifier] ClickHouse not ready ({exc}), attempt {attempt + 1}/{retries}...")
                time.sleep(3)
            else:
                raise


def main() -> None:
    """Entry point: runs all assertions."""
    print()
    print("=" * 65)
    print(" CLICKHOUSE MATERIALIZED VIEW VERIFICATION")
    print("=" * 65)

    # Connection
    print(f"\n[verifier] Connecting to ClickHouse ({CLICKHOUSE_HOST}:{CLICKHOUSE_PORT}/{CLICKHOUSE_DB})...")
    client = _wait_for_clickhouse(CLICKHOUSE_HOST, CLICKHOUSE_PORT)
    print(f" {PASS} Connection established\n")
    failures = 0

    # ------------------------------------------------------------------
    # 1. Raw data (ingestion mod_reqin_log → logcorrelator)
    # ------------------------------------------------------------------
    print("── 1. HTTP log ingestion ───────────────────────────────────")
    n = _count(client, f"SELECT count(*) FROM {CLICKHOUSE_DB}.http_logs_raw")
    if not _check("http_logs_raw count", n, ">=", 5000):
        failures += 1

    # ------------------------------------------------------------------
    # 2. MV mv_http_logs: JSON parsing → http_logs
    # ------------------------------------------------------------------
    print("\n── 2. Materialized view mv_http_logs ───────────────────────")
    n = _count(client, f"SELECT count(*) FROM {CLICKHOUSE_DB}.http_logs")
    if not _check("http_logs count", n, ">=", 5000):
        failures += 1

    # Check that key fields are non-empty
    n_methods = _count(client,
        f"SELECT count(*) FROM {CLICKHOUSE_DB}.http_logs WHERE method != ''")
    if not _check("http_logs rows with method", n_methods, ">=", 5000):
        failures += 1
    n_ips = _count(client,
        f"SELECT count(*) FROM {CLICKHOUSE_DB}.http_logs WHERE src_ip != toIPv4('0.0.0.0')")
    if not _check("http_logs rows with src_ip", n_ips, ">=", 5000):
        failures += 1

    # ------------------------------------------------------------------
    # 3. MV mv_agg_host_ip_ja4_1h: behavioral aggregation
    # ------------------------------------------------------------------
    print("\n── 3. Materialized view mv_agg_host_ip_ja4_1h ──────────────")
    n = _count(client, f"SELECT count(*) FROM {CLICKHOUSE_DB}.agg_host_ip_ja4_1h")
    if not _check("agg_host_ip_ja4_1h count", n, ">", 0):
        failures += 1

    # Check that the hit totals are consistent with the ingested volume
    total_hits = _count(client,
        f"SELECT sum(hits) FROM {CLICKHOUSE_DB}.agg_host_ip_ja4_1h")
    if not _check("agg_host_ip_ja4_1h total hits", total_hits, ">=", 5000):
        failures += 1

    # ------------------------------------------------------------------
    # 4. View view_ai_features_1h: input to bot_detector
    # ------------------------------------------------------------------
    print("\n── 4. View view_ai_features_1h ─────────────────────────────")
    n = _count(client, f"SELECT count(*) FROM {CLICKHOUSE_DB}.view_ai_features_1h")
    if not _check("view_ai_features_1h count", n, ">", 0):
        failures += 1

    # ------------------------------------------------------------------
    # 5. bot_detector results: ml_all_scores
    # ------------------------------------------------------------------
    print("\n── 5. bot_detector results (ml_all_scores) ─────────────────")
    n = _count(client, f"SELECT count(*) FROM {CLICKHOUSE_DB}.ml_all_scores")
    if not _check("ml_all_scores count", n, ">", 0):
        failures += 1

    # Score distribution
    result = client.query(
        f"SELECT threat_level, count(*) as n "
        f"FROM {CLICKHOUSE_DB}.ml_all_scores "
        f"GROUP BY threat_level ORDER BY n DESC"
    )
    print("\n threat_level distribution:")
    for row in result.result_rows:
        print(f" {row[0]:20s} : {row[1]:,}")

    # ------------------------------------------------------------------
    # 6. Table ml_detected_anomalies: detected anomalies
    # ------------------------------------------------------------------
    print("\n── 6. Detected anomalies (ml_detected_anomalies) ───────────")
    n = _count(client, f"SELECT count(*) FROM {CLICKHOUSE_DB}.ml_detected_anomalies")
    _check("ml_detected_anomalies count", n, ">=", 0)  # may be 0 if the threshold is not reached
    print(" (may be 0 if no session exceeds the anomaly threshold)")

    # ------------------------------------------------------------------
    # 7. Secondary MVs (informational only, errors do not count as failures)
    # ------------------------------------------------------------------
    print("\n── 7. Secondary views/tables ───────────────────────────────")
    for view in ["agg_header_fingerprint_1h", "view_ip_recurrence",
                 "view_form_bruteforce_detected", "view_host_ip_ja4_rotation"]:
        try:
            n = _count(client, f"SELECT count(*) FROM {CLICKHOUSE_DB}.{view}")
            print(f" {PASS} {view:40s} {n:>8,d} rows")
        except Exception as exc:
            print(f" \033[93m?\033[0m {view:40s} ERROR: {exc}")

    # ------------------------------------------------------------------
    # 8. Advanced aggregation tables (thesis §5)
    # ------------------------------------------------------------------
    print("\n── 8. Thesis §5 aggregation tables ─────────────────────────")
    thesis_tables = [
        ("agg_path_sequences_1h", "§5.1 Path Sequence Entropy"),
        ("agg_request_timing_1h", "§5.3 Request Cadence"),
        ("agg_ip_behavior_1h", "§5.5/§5.8 JA4 Drift + Cross-Domain"),
        ("agg_resource_cascade_1h", "§5.4 Resource Dependency Tree"),
    ]
    for table, desc in thesis_tables:
        try:
            n = _count(client, f"SELECT count(*) FROM {CLICKHOUSE_DB}.{table}")
            if not _check(f"{table} ({desc})", n, ">", 0):
                failures += 1
        except Exception as exc:
            print(f" {FAIL} {table:40s} ERROR: {exc}")
            failures += 1

    # ------------------------------------------------------------------
    # 9. View view_thesis_features_1h: advanced features
    # ------------------------------------------------------------------
    print("\n── 9. View view_thesis_features_1h (thesis §5) ─────────────")
    try:
        n = _count(client, f"SELECT count(*) FROM {CLICKHOUSE_DB}.view_thesis_features_1h")
        if not _check("view_thesis_features_1h count", n, ">", 0):
            failures += 1

        # Check the §5.1 columns
        result = client.query(
            f"SELECT avg(path_transition_entropy) AS avg_entropy "
            f"FROM {CLICKHOUSE_DB}.view_thesis_features_1h "
            f"WHERE path_transition_entropy >= 0"
        )
        avg_ent = float(result.result_rows[0][0]) if result.result_rows else -1
        ok = 0 <= avg_ent <= 1.0
        print(f" {PASS if ok else FAIL} §5.1 path_transition_entropy avg {avg_ent:.4f} (expected [0, 1])")
        if not ok:
            failures += 1

        # Check the §5.3 columns
        result = client.query(
            f"SELECT avg(cadence_cv) AS avg_cv, avg(burst_ratio) AS avg_burst "
            f"FROM {CLICKHOUSE_DB}.view_thesis_features_1h "
            f"WHERE cadence_cv IS NOT NULL"
        )
        if result.result_rows:
            avg_cv = float(result.result_rows[0][0])
            avg_burst = float(result.result_rows[0][1])
            print(f" {PASS} §5.3 cadence_cv avg {avg_cv:.4f}")
            print(f" {PASS} §5.3 burst_ratio avg {avg_burst:.4f}")
        else:
            print(" \033[93m?\033[0m §5.3 cadence features: no data")

        # Check the newer §5.3 columns (lag1_autocorrelation, benford_deviation)
        result = client.query(
            f"SELECT avg(lag1_autocorrelation) AS avg_lag1, avg(benford_deviation) AS avg_benford "
            f"FROM {CLICKHOUSE_DB}.view_thesis_features_1h "
            f"WHERE lag1_autocorrelation IS NOT NULL"
        )
        if result.result_rows:
            avg_lag1 = float(result.result_rows[0][0])
            avg_benford = float(result.result_rows[0][1])
            ok_lag1 = -1.0 <= avg_lag1 <= 1.0
            ok_benford = avg_benford >= 0
            print(f" {PASS if ok_lag1 else FAIL} §5.3 lag1_autocorrelation avg {avg_lag1:.4f} (expected [-1, 1])")
            print(f" {PASS if ok_benford else FAIL} §5.3 benford_deviation avg {avg_benford:.4f} (expected >= 0)")
            if not ok_lag1:
                failures += 1
            if not ok_benford:
                failures += 1
        else:
            print(" \033[93m?\033[0m §5.3 lag1/benford features: no data")

        # Check the §5.5 columns
        result = client.query(
            f"SELECT avg(ja4_drift_ratio) AS avg_drift, "
            f" avg(host_diversity) AS avg_hosts "
            f"FROM {CLICKHOUSE_DB}.view_thesis_features_1h "
            f"WHERE ja4_drift_ratio IS NOT NULL"
        )
        if result.result_rows:
            avg_drift = float(result.result_rows[0][0])
            avg_hosts = float(result.result_rows[0][1])
            ok_drift = 0 <= avg_drift <= 1.0
            print(f" {PASS if ok_drift else FAIL} §5.5 ja4_drift_ratio avg {avg_drift:.4f} (expected [0, 1])")
            print(f" {PASS} §5.8 host_diversity avg {avg_hosts:.2f}")
            if not ok_drift:
                failures += 1
        else:
            print(" \033[93m?\033[0m §5.5/§5.8 drift/cross-domain: no data")
    except Exception as exc:
        print(f" {FAIL} view_thesis_features_1h ERROR: {exc}")
        failures += 1

    # ------------------------------------------------------------------
    # 10. View view_resource_cascade_1h (thesis §5.4)
    # ------------------------------------------------------------------
    print("\n── 10. View view_resource_cascade_1h (thesis §5.4) ─────────")
    try:
        n = _count(client, f"SELECT count(*) FROM {CLICKHOUSE_DB}.view_resource_cascade_1h")
        _check("view_resource_cascade_1h count", n, ">=", 0)
        if n > 0:
            result = client.query(
                f"SELECT avg(root_to_first_asset_delay), avg(asset_load_stddev) "
                f"FROM {CLICKHOUSE_DB}.view_resource_cascade_1h "
                f"WHERE root_to_first_asset_delay >= 0"
            )
            if result.result_rows:
                avg_delay = float(result.result_rows[0][0])
                avg_stddev = float(result.result_rows[0][1])
                print(f" {PASS} §5.4 root_to_first_asset_delay avg {avg_delay:.2f}s")
                print(f" {PASS} §5.4 asset_load_stddev avg {avg_stddev:.2f}s")
        else:
            print(" (may be 0 if the test traffic has no document/asset mix)")
    except Exception as exc:
        print(f" {FAIL} view_resource_cascade_1h ERROR: {exc}")

    # ------------------------------------------------------------------
    # Summary
    # ------------------------------------------------------------------
    print()
    print("=" * 65)
    if failures == 0:
        print(f" {PASS} ALL TESTS PASS")
    else:
        print(f" {FAIL} {failures} TEST(S) FAILED")
    print("=" * 65)
    print()
    sys.exit(0 if failures == 0 else 1)


if __name__ == "__main__":
    main()