Commit Graph

13 Commits

Author SHA1 Message Date
a02423fd18 feat: maximize data completeness across L3/L4/TLS/HTTP layers and add E2E test infra
Add SSL_write uprobe for HTTP response capture, HPACK decoder for HTTP/2
header extraction, and AcceptCache for reliable SSL/TC session correlation.
Populate all ClickHouse fields including tcp_meta_options, ip_meta_total_length,
syn_to_clienthello_ms, client_headers, TLS cipher suites/extensions, and
h2_enable_connect_protocol. Increase BPF capture buffers (HTTP 512B, TLS 1024B).
Add distributed E2E testing infrastructure with multi-VM Vagrant setup.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-15 03:34:33 +02:00
e25caa85da fix(ja4ebpf): remove double bswap16 on accept4 port
The manual byte assembly (sa_buf[2]<<8 | sa_buf[3]) already produces
a host-byte-order port value; __builtin_bswap16 was swapping it again,
causing SSL events to use wrong source ports and preventing TLS/HTTP
session correlation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-15 02:57:29 +02:00
61addc8cfa feat: JA3 fingerprinting, SSL correlation fix, ML pipeline overhaul, E2E test infra
ja4ebpf:
- Add JA3 raw + MD5 hash fingerprinting (ComputeJA3 in TLS parser)
- Fix accept4 port double-swap bug (__builtin_bswap16 on already-host-order value)
- Fix scheme override bug in ClickHouse writer (HTTP block clearing HTTPS)
- Add HTTP/2 passive fingerprinting (Akamai H2 FP, SETTINGS, pseudo-header order)
- Enrich ClickHouse schema with IP/TCP metadata, H2 settings, Sec-* headers
- Ensure maximum data completeness: all available L3/L4, TLS, HTTP fields emitted

bot-detector:
- Replace logistic regression with MLP fusion classifier
- Replace KS drift detection with ADWIN online learning
- Replace NetworkX/Louvain with PyTorch Geometric GraphSAGE for fleet detection
- Replace autoencoder with RealNVP normalizing flow + SessionTransformer embeddings

infra:
- Add distributed E2E test infrastructure (4 VMs: endpoints + analysis)
- Add Vagrant provisioning for analysis VM, e2e Makefile targets, run scripts

docs:
- Restructure thesis into chapter files with corrected references
- Add E2E testing documentation
- Update architecture, schema, deployment, service docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-15 02:57:07 +02:00
f88b739992 feat(e2e): add distributed E2E test framework with parametric traffic generation
Add run-e2e-test.sh with CLI parameters (--hits, --http-ratio, --dns, --tls,
--src-ips, --keep-analysis, --up) for configurable traffic generation. Traffic
runs from VM endpoints with multiple source IPs (alias IPs on eth0) to produce
distinct sessions for the ML pipeline. Fix curl TLS flags (--tlsv1.2 instead
of --tls-v1-2), skip redundant local verification in distributed mode, and
fix dashboard is_available() cache that never retried after ClickHouse recovery.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-15 00:09:32 +02:00
ac75ce2956 chore: remove regenerable data and build artifacts from git tracking
Add .gitignore rules for generated CSV data, eBPF compiled objects,
and vmlinux.h header. Remove 19 tracked files (~175 MB) that can be
regenerated from scripts (generate_*.py), bpftool, or bpf2go.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-13 02:31:27 +02:00
d75825278e feat: multi-distro VM tests, ja4ebpf eBPF improvements, bot-detector scoring
ja4ebpf:
- Refactor BPF TC capture with improved SYN offset handling and TCP option parsing
- Enhance TLS uprobe SSL hooking for better key extraction
- Add ClickHouse writer improvements for HTTP log materialized views
- Update RPM spec for Rocky Linux 8/9/10, fix systemd service
- Simplify loader with cleaner bpf2go integration

bot-detector:
- Add H2 SETTINGS per-parameter comparison in browser_matcher
- Enhance browser signatures and scoring pipeline
- Improve preprocessing and cycle detection

infra:
- Multi-distro Vagrantfile (centos8, rocky9, rocky10) with per-distro provisioning
- New Makefile targets: vm-up-all, test-vm-matrix, test-vm-centos8/rocky10
- Add debug helpers and run-test-from-host.sh for host-driven VM testing
- Update run-tests-vm.sh for cross-distro compatibility
- Remove accidental binary blob (\004)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-13 01:09:33 +02:00
957918c565 fix(ja4ebpf): Rocky Linux RPM builder, remove correlated field, fix thesis
- Dockerfile.package: migre go-builder de golang:bookworm (Debian) vers
  rockylinux:9, installe Go depuis le tarball officiel, remplace apt par
  dnf (clang llvm libbpf-devel bpftool)

- Suppression du champ 'correlated' de l'agent ja4ebpf : avec eBPF/XDP,
  la corrélation L3/L4↔L7 est toujours implicite par présence des champs.
  Supprimé de : session.go, manager.go, main.go (x5), clickhouse.go

- Thèse (6 corrections listées + cohérence correlated) :
  1. §3.5 + §3.9.1 : SSL_read retourne des octets bruts sans respecter les
     frontières H2 → buffer circulaire de réassemblage en Go userspace
  2. §3.1 : supprimé libpcap + CAP_NET_RAW, remplacé par définition uprobe
  3. §4 + §7 : compte exact 96 features en 8 familles (Famille 1–8),
     supprimé taxonomie F1–F11 obsolète, tous les totaux mis à jour
  4. §2.4 + §8 : remplacé 7 fausses URLs arXiv par [Référence à vérifier]
  5. §4 Famille 2 : ja4_drift_ratio → renvoi à Famille 8 (définition complète)
  6. §6.4 : ajouté limite 'Overhead de l'uprobe SSL_read'
  + §3.6 : supprimé correlated=0/1 du texte architectural

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-12 04:48:40 +02:00
b1218a2367 fix(ja4ebpf): fix TLS capture, SYN offsets, TCP option parsing
- Increase MAX_TLS_PAYLOAD from 512 to 2048 bytes to capture full
  TLS ClientHellos (modern browsers/curl send 1000-1543 byte ClientHellos)
- Fix ParseClientHello to tolerate XDP-truncated payloads: clamp
  recordLength and chLen to available data instead of returning error
- Fix cipher suites, compression, extensions truncation to use clamping
- Fix consumeSynEvents struct field offsets: dst_ip (4 bytes at offset 4)
  was not accounted for, causing all L3/L4 metadata to be read from
  wrong positions (TTL was actually dst_ip[0], windowSize was dst_port, etc.)
- Add parseTCPOptions() to extract MSS and Window Scale from raw TCP options
  (C code sets defaults of mss=0, window_scale=0xFF, expects Go to parse)
- Fix consumeAcceptEvents: skip zero-IP events to avoid phantom sessions
- Fix consumeSSLEvents: filter zero-IP/port events when proc fallback fails
- Add missing consumeHTTPPlainEvents goroutine (was defined but never called)
- Fix race condition: SYN consumer sets Correlated=true if TLS already present
- Update tls_hello_event struct offsets in Go consumer (payload_len now at
  offset 2054, was 518, due to payload array growing from 512 to 2048 bytes)
- Remove debug logging from consumers and GC

E2E verified: HTTP plain (port 80) and HTTPS (port 443) both produce
fully correlated sessions in ClickHouse with correct:
  - ip_meta_ttl=64, ip_meta_df=true, ip_meta_id
  - tcp_meta_window_size=64240, tcp_meta_window_scale=10, tcp_meta_mss=1460
  - ja4=t13i3010_1d37bd780c83_95d2a80e6515
  - tls_alpn=http/1.1
  - method=GET, path=/, header_order_signature=Host;User-Agent;Accept
  - correlated=1

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-12 04:16:44 +02:00
f85a10b012 feat: pipeline L7 HTTP complet + infrastructure tests VM
Correctifs pipeline L7 (uprobe SSL_read) :
- uprobe_ssl.c : ssl_set_fd ne retourne plus tôt quand fd_conn_map est
  vide (accept4 non disponible en Docker). Sauvegarde ssl_ptr→{fd,0,0}
  pour permettre le fallback /proc côté Go.
- main.go : consumeSSLEvents reécrit avec routeur magic-bytes complet :
  * HTTP/2 preface → extraction SETTINGS + conversion correlation.HTTP2Settings
  * HTTP/1.x requête → method, path, query, headers, header_order_sig
  * HTTP/1.x réponse → status_code
  * Fallback /proc/<tgid>/fd/<fd> quand src_ip=0 (accept4 absent)
- writer/clickhouse.go : export header_order_signature ajouté

Nouveaux packages :
- internal/parser/http1.go : parseur HTTP/1.x (IsHTTP1Request,
  ParseHTTP1Request, IsHTTP1Response, ParseHTTP1Response)
- internal/parser/http1_test.go : 11 tests unitaires (28 total passent)
- internal/procutil/proc_lookup.go : résolution fd→IP via /proc avec cache
  TTL 5s (FDCache). Supporte /proc/PID/net/tcp et tcp6, IPv4-mappé IPv6.

Infrastructure tests VM (tests/vm/) :
- Vagrantfile : VM Rocky Linux 9 KVM, 4 CPU / 4 GB RAM
- provision.sh : installation toolchain eBPF + Go + Docker + nginx
- run-tests-vm.sh : suite de test complète dans la VM (L3/L4+TLS+L7)
- README.md : guide d'installation et d'utilisation
- Makefile : cibles vm-up, vm-down, vm-ssh, test-vm-nginx, test-vm-all,
  vm-rebuild-ja4ebpf

Corrections stack Docker :
- Dockerfiles nginx/apache/nginx-varnish/hitch-varnish : suppression des
  références à shared/go/ja4common/ (répertoire supprimé)
- clickhouse-init.sh : restauré depuis git, seed anubis_ua_rules obsolète
  supprimé (table REGEXP_TREE supprimée du schéma)
- traffic-gen : ajout HTTP/1.0 (http.client) et HTTP/2 (httpx)
- verify_db.py : script de vérification 35 checks (L3/L4/TLS/L7/corrélation)
- run-stack-tests.sh : phase 6 verify_db ajoutée

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-12 02:37:00 +02:00
9734e21fe3 chore: suppression des services obsolètes (sentinel, correlator, mod-reqin-log)
Remplacés par l'agent ja4ebpf (eBPF CO-RE). Nettoyage complet :

Supprimé :
- old/ (archive de l'ancienne architecture)
- services/correlator/ (logcorrelator Go)
- services/sentinel/ (capture pcap Go)
- services/mod-reqin-log/ (module Apache C)
- shared/go/ja4common/ (lib Go partagée — plus importée par ja4ebpf)
- tests/integration/platform/ (test correlator+sentinel+httpd)
- tests/integration/docker-compose.yml (compose ancienne archi)
- tests/integration/run-tests.sh (runner correlator/sentinel)
- tests/integration/verify_mvs.py (script orphelin)

Nettoyé :
- go.work : retire ./shared/go/ja4common
- services/ja4ebpf/go.mod : retire replace ja4common (jamais importé)
- services/ja4ebpf/Dockerfile* : retire les COPY ja4common inutiles
- Makefile : retire test-ja4common-python, test-integration*, targets obsolètes
- tests/integration/README.md : réécrit pour l'architecture ja4ebpf

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-12 01:48:14 +02:00
dc6ffd6474 fix: tests intégration matrix — procps-ng, varnish h2, hitch ALPN, pgrep→ps
- Ajout de procps-ng dans les 4 Dockerfiles runtime (ps/pgrep disponibles)
- Remplacement de pgrep par ps -C dans tous les run-tests.sh
- Correction entrypoint nginx-varnish : pgrep nginx → cat nginx.pid (exit 127)
- Activation HTTP/2 dans Varnish : ajout de -p feature=+http2 dans les
  entrypoints nginx-varnish et hitch-varnish
- Restauration ALPN h2,http/1.1 dans hitch.conf (varnish supporte maintenant h2)
- Correction healthcheck hitch-varnish : curl sans --http1.1 (h2 fonctionnel)
- Correction requêtes phase_verify : http_logs_raw → http_logs, colonnes correctes
- Correction writer clickhouse.go : noms JSON alignés avec la MV (ip_meta_*, tls_sni…)
- Fix toStartOfSecond(DateTime) → toStartOfSecond(toDateTime64(col, 3))
- Retrait du SKIP el8/nginx-varnish (varnish s'installe bien sur AlmaLinux 8)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-12 01:29:01 +02:00
3b047b680a fix(ja4ebpf): split bpf2go generate into Ja4Tc + Ja4Ssl, fix RPM systemd-rpm-macros
- Use two separate //go:generate directives (Ja4Tc for tc_capture.c, Ja4Ssl
  for uprobe_ssl.c) to avoid duplicate LICENSE symbol and multi-file clang issue
- Update loader.go to hold tcObjs/sslObjs separately with correct field names:
  UprobeSslSetFd, UprobeSslReadEntry, UretprobeSslReadExit,
  KprobeAccept4Entry, KretprobeAccept4Exit
- Add systemd-rpm-macros to all three RPM build stages (el8/el9/el10)
  so that %{_unitdir} macro resolves correctly
- RPMs now build successfully for el8, el9, el10

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-11 23:21:11 +02:00
a1e4c1dad5 feat: add ja4ebpf service — eBPF-based TLS/TCP fingerprinting daemon
- TC ingress hook captures TCP SYN (L3/L4) and TLS ClientHello
- Uprobes on SSL_read/SSL_set_fd capture decrypted TLS data
- Kprobes on accept4 correlate socket FDs to client IP:port
- JA4 fingerprint computed from parsed TLS ClientHello
- HTTP/2 SETTINGS and WINDOW_UPDATE extracted from decrypted streams
- Session manager with sharded map (256 shards) and GC goroutine
- Slowloris detection: sessions with no requests after 10s threshold
- ClickHouse batch writer to ja4_logs.http_logs_raw (raw_json)
- All tests pass: 17 parser + 10 correlation tests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-11 22:43:26 +02:00