Option A — X-Forwarded-For + mod_remoteip:
- httpd-integration.conf: load mod_remoteip, trust all Docker RFC-1918
subnets (172/192.168/10). mod_reqin_log uses r->useragent_ip which
mod_remoteip updates from XFF → each request logged with distinct src_ip
- generate_traffic.py: XFF always set (was 30% only); human scenarios
use 91.121/78.41/90.x ranges, bot scenarios use 185.220/45.155/193.32;
pool of 1168 human IPs and 180 bot IPs; default --requests 500
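
A minimal sketch of the always-on XFF behaviour described above, assuming the generator draws a source address from a per-scenario prefix list. The prefixes come from this message; the helper names (`random_ip`, `xff_headers`) and the way prefixes are expanded into full addresses are hypothetical and not taken from generate_traffic.py.

```python
import random

# Prefixes quoted in the commit message. How the real script expands them
# into its 1168/180-address pools is not shown, so this is an assumption.
HUMAN_PREFIXES = ["91.121", "78.41", "90"]
BOT_PREFIXES = ["185.220", "45.155", "193.32"]


def random_ip(prefix: str, rng: random.Random) -> str:
    """Pad a dotted prefix with random octets until it has four octets."""
    octets = prefix.split(".")
    while len(octets) < 4:
        octets.append(str(rng.randint(1, 254)))
    return ".".join(octets)


def xff_headers(scenario: str, rng: random.Random) -> dict[str, str]:
    """Return an X-Forwarded-For header for every request (no 30% sampling)."""
    prefixes = BOT_PREFIXES if scenario == "bot" else HUMAN_PREFIXES
    return {"X-Forwarded-For": random_ip(rng.choice(prefixes), rng)}
```

With mod_remoteip trusting the Docker subnets, each such header should surface as a distinct src_ip in the logged request.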
Option D — Direct ClickHouse seeder (seed_clickhouse.py, stdlib only):
- Inserts ~4000 rows into http_logs_raw triggering full MV chain:
http_logs_raw → mv_http_logs → http_logs
→ mv_agg_host_ip_ja4_1h → agg_host_ip_ja4_1h
• 720 human sessions: IPs in OVH/SFR/Orange ASN ranges (16276/15557/3215)
→ dict_asn_reputation maps these to asn_label='human'
→ satisfies bot_detector human_baseline >= 500 threshold
• 150 scanner sessions: datacenter IPs, attack paths (/.env, wp-login,
SQLi, path traversal), scanner UAs, minimal TCP fingerprints
• 100 known-bot sessions: IPs matching bot_ip.csv entries
• 20 brute-force clusters: 20-50 POST /login per IP
All TCP/TLS metadata is profile-realistic (window, MSS, TTL, JA4, JA3)
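
As an illustration of the brute-force clusters listed above, the sketch below builds 20 clusters of 20 to 50 POST /login rows and serialises them as newline-delimited JSON. The field names (`src_ip`, `method`, `path`), the placeholder addresses, and the function names are assumptions; the actual column layout of http_logs_raw and the logic in seed_clickhouse.py are not shown here.

```python
import json
import random


def brute_force_rows(rng: random.Random, clusters: int = 20,
                     min_posts: int = 20, max_posts: int = 50) -> list[dict]:
    """Build one burst of POST /login rows per attacking IP."""
    rows = []
    for c in range(clusters):
        # Placeholder addresses from the TEST-NET-3 documentation range.
        src_ip = f"203.0.113.{c + 1}"
        for _ in range(rng.randint(min_posts, max_posts)):
            # Field names are hypothetical, chosen for illustration only.
            rows.append({"src_ip": src_ip, "method": "POST", "path": "/login"})
    return rows


def to_json_each_row(rows: list[dict]) -> str:
    """Serialise rows as one JSON object per line."""
    return "\n".join(json.dumps(row) for row in rows)
```

Using only `json` and `random` keeps the sketch in line with the stdlib-only constraint stated for the seeder.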
CSV stubs (mounted at /var/lib/clickhouse/user_files/):
- iplocate-ip-to-asn.csv: 13 CIDR→ASN mappings (OVH/SFR/Orange/Tor/Contabo)
- asn_reputation.csv: 13 ASN→label (8 'human', 3 'datacenter'/'hosting')
- bot_ip.csv: 14 known scanner/Tor IPs (Shodan, Censys, Tor exits)
- bot_ja4.csv: 5 bot JA4 fingerprints (curl, python-requests, masscan, zgrab)
run-tests.sh:
- Phase 4a: seeder runs before live traffic (ensures bot_detector baseline)
- Phase 4b: live traffic gen at 500 requests (up from 200)
- Phase 5f: new assertions — agg_host_ip_ja4_1h populated, ≥500 human
rows in view_ai_features_1h, known-bot labels present
- Phase 7: verifies ml_all_scores populated (bot_detector ran a cycle)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Full-stack integration tests: ja4-platform
Architecture
┌─────────────────────────────────────────────────────┐
│  platform (Rocky Linux 9)                           │
│                                                     │
│  ┌───────────┐  http.socket   ┌────────────┐        │
│  │ Apache    │───────────────→│            │        │
│  │+ mod-reqin│                │ correlator │──→ ClickHouse
│  └───────────┘                │            │        │
│  ┌───────────┐ network.socket │            │        │
│  │ sentinel  │───────────────→│            │        │
│  │(TLS pcap) │                └────────────┘        │
│  └───────────┘                                      │
│  cap_add: NET_RAW, NET_ADMIN                        │
└─────────────────────────────────────────────────────┘
         ↑ HTTPS                                    │
    test traffic                         ja4_logs.http_logs_raw
                                                    ↓
                                          ┌──────────────────┐
                                          │  ClickHouse      │
                                          │  ja4_logs        │
                                          │  ja4_processing  │
                                          └──────────────────┘
                                               ↑        ↑
                                        ┌──────┘        └──────┐
                                ┌──────────────┐        ┌──────────────┐
                                │ bot-detector │        │  dashboard   │
                                │ (ML/Python)  │        │  (FastAPI)   │
                                └──────────────┘        └──────────────┘
Usage
# Run the tests (build + start + test + teardown)
./run-tests.sh
# Keep the stack running after the tests (debug)
./run-tests.sh --no-down
# Build only (no tests)
./run-tests.sh --build-only
# Or from the monorepo root:
make test-integration
Containers
| Container | Image | Role |
|---|---|---|
| clickhouse | clickhouse/clickhouse-server:24.8 | Database, schema auto-init |
| platform | Rocky Linux 9 (custom build) | Apache HTTPS + mod-reqin-log + sentinel + correlator |
| bot-detector | Python 3.11 | ML anomaly detection |
| dashboard | Python 3.11 / FastAPI | SOC API |
Network capabilities
The platform container needs:
- NET_RAW: for network packet capture (sentinel/pcap)
- NET_ADMIN: for network interface configuration
These capabilities are declared in docker-compose.yml:
platform:
  cap_add:
    - NET_RAW
    - NET_ADMIN
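
To confirm from inside the container that both capabilities were actually granted, one option is to decode the effective capability mask that Linux exposes in /proc/self/status. The sketch below is not part of the test suite; the function names are hypothetical, while the bit positions (12 for CAP_NET_ADMIN, 13 for CAP_NET_RAW) are the values defined in the Linux capability headers.

```python
CAP_NET_ADMIN = 12  # bit index from linux/capability.h
CAP_NET_RAW = 13    # bit index from linux/capability.h


def parse_cap_eff(status_text: str) -> int:
    """Extract the effective capability bitmask from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("CapEff line not found")


def has_cap(mask: int, cap: int) -> bool:
    """Return True if the capability bit is set in the mask."""
    return bool(mask & (1 << cap))
```

Inside the platform container, the input would typically come from `open("/proc/self/status").read()`.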
Test phases
- ClickHouse schema: checks the 2 databases, key tables, and users
- Traffic generation: 50+ HTTPS requests to Apache
- Data pipeline: checks raw and parsed logs in ClickHouse
- Dashboard API: checks /health and /api/metrics
- Bot-detector: checks that the process is running
- Sentinel: checks network capture
Debug
# Platform logs (Apache + correlator + sentinel)
docker compose logs platform
# Correlated logs
docker compose exec platform cat /var/log/logcorrelator/correlated.log
# Direct ClickHouse query
docker compose exec clickhouse clickhouse-client \
  -q "SELECT time, src_ip, method, host, path FROM ja4_logs.http_logs ORDER BY time DESC LIMIT 10"
# Shell in the platform container
docker compose exec platform bash
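
When the same ClickHouse query has to be repeated from a script, the direct-query command above can be wrapped in a small helper. This is a sketch under assumptions: the function names are hypothetical, and `-T` (disable pseudo-TTY allocation in `docker compose exec`) is added here so the call also works without a terminal.

```python
import subprocess


def clickhouse_cmd(query: str) -> list[str]:
    """Build the docker compose command that runs a query in the clickhouse service."""
    return ["docker", "compose", "exec", "-T", "clickhouse",
            "clickhouse-client", "-q", query]


def run_query(query: str) -> str:
    """Run the query and return stdout; raises CalledProcessError on failure."""
    result = subprocess.run(clickhouse_cmd(query), capture_output=True,
                            text=True, check=True)
    return result.stdout
```

For example, `run_query("SELECT count() FROM ja4_logs.http_logs")` would return the row count as text, provided the stack is running.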