3 Commits

Author SHA1 Message Date
36b5065a0a feat(e2e): add multi-IP endpoint architecture with dedicated traffic VM
Replace single-service-per-endpoint with all-ips mode running nginx, apache,
and hitch+varnish simultaneously on 3 dedicated IPs per VM (eth1 alias IPs).
Add a dedicated traffic VM with curl-impersonate for realistic TLS fingerprints,
parallelized traffic generation, and paired SNI_HOSTS/TARGET_IPS lists for
per-VM per-service hostname identification (e.g. rocky9-nginx-platform.test).

Key changes:
- run-tests-vm.sh: add setup_all_ips(), IP-specific Listen/bind directives
  with reset-before-apply pattern, graceful service availability checks
- run-e2e-test.sh: traffic VM architecture, all-ips mode, eth1 network,
  paired IP/SNI lists, updated cleanup for alias IPs
- generate-traffic.sh: parallel background jobs, curl-impersonate detection,
  auto source interface detection via ip route get, Host header in HTTP traffic
- Vagrantfile: add traffic VM with provision-traffic.sh
- provision-traffic.sh: install curl-impersonate and httpx for traffic gen
- test-rpm.sh: multi-interface TC check, updated ja4ebpf config
- clickhouse-init.sh: load CSV stubs for Anubis/bot-networks dictionaries
- Remove obsolete correlator/sentinel/mod-reqin-log docs
- Add h2_settings_ack column to http_logs schema
- Upgrade Go toolchain to 1.25.0

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-16 14:25:24 +02:00
61addc8cfa feat: JA3 fingerprinting, SSL correlation fix, ML pipeline overhaul, E2E test infra
ja4ebpf:
- Add JA3 raw + MD5 hash fingerprinting (ComputeJA3 in TLS parser)
- Fix accept4 port double-swap bug (__builtin_bswap16 on already-host-order value)
- Fix scheme override bug in ClickHouse writer (HTTP block clearing HTTPS)
- Add HTTP/2 passive fingerprinting (Akamai H2 FP, SETTINGS, pseudo-header order)
- Enrich ClickHouse schema with IP/TCP metadata, H2 settings, Sec-* headers
- Ensure maximum data completeness: all available L3/L4, TLS, HTTP fields emitted

bot-detector:
- Replace logistic regression with MLP fusion classifier
- Replace KS drift detection with ADWIN online learning
- Replace NetworkX/Louvain with PyTorch Geometric GraphSAGE for fleet detection
- Replace autoencoder with RealNVP normalizing flow + SessionTransformer embeddings

infra:
- Add distributed E2E test infrastructure (4 VMs: endpoints + analysis)
- Add Vagrant provisioning for analysis VM, e2e Makefile targets, run scripts

docs:
- Restructure thesis into chapter files with corrected references
- Add E2E testing documentation
- Update architecture, schema, deployment, service docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-15 02:57:07 +02:00
f88b739992 feat(e2e): add distributed E2E test framework with parametric traffic generation
Add run-e2e-test.sh with CLI parameters (--hits, --http-ratio, --dns, --tls,
--src-ips, --keep-analysis, --up) for configurable traffic generation. Traffic
runs from VM endpoints with multiple source IPs (alias IPs on eth0) to produce
distinct sessions for the ML pipeline. Fix curl TLS flags (--tlsv1.2 instead
of --tls-v1-2), skip redundant local verification in distributed mode, and
fix dashboard is_available() cache that never retried after ClickHouse recovery.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-15 00:09:32 +02:00