|
|
f85a10b012
|
feat: complete L7 HTTP pipeline + VM test infrastructure
L7 pipeline fixes (SSL_read uprobe):
- uprobe_ssl.c: ssl_set_fd no longer returns early when fd_conn_map is
empty (accept4 unavailable in Docker). It saves ssl_ptr→{fd,0,0}
to enable the /proc fallback on the Go side.
- main.go: consumeSSLEvents rewritten with a complete magic-bytes router:
* HTTP/2 preface → SETTINGS extraction + conversion to correlation.HTTP2Settings
* HTTP/1.x request → method, path, query, headers, header_order_sig
* HTTP/1.x response → status_code
* Fallback to /proc/<tgid>/fd/<fd> when src_ip=0 (accept4 absent)
- writer/clickhouse.go: added header_order_signature export
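The magic-bytes routing described above can be sketched roughly as follows. This is an illustrative Python sketch only; the actual router lives in Go in main.go, and the names classify_payload and parse_status_code, as well as the list of recognized methods, are assumptions rather than the project's API. The HTTP/2 client preface is the fixed 24-byte string defined by the HTTP/2 specification.

```python
# Hypothetical sketch of a magic-bytes router; not the project's Go code.
HTTP2_PREFACE = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"  # fixed client connection preface
# Assumed method list; the real parser may accept more or fewer methods.
HTTP1_METHODS = (b"GET ", b"POST ", b"HEAD ", b"PUT ",
                 b"PATCH ", b"DELETE ", b"OPTIONS ")


def classify_payload(data: bytes) -> str:
    """Classify a decrypted buffer by its leading bytes."""
    if data.startswith(HTTP2_PREFACE):
        return "http2_preface"
    if data.startswith(b"HTTP/1."):
        return "http1_response"
    if data.startswith(HTTP1_METHODS):
        return "http1_request"
    return "unknown"


def parse_status_code(data: bytes) -> int:
    """Extract the status code from a status line such as 'HTTP/1.1 200 OK'."""
    status_line = data.split(b"\r\n", 1)[0]
    parts = status_line.split(b" ", 2)
    return int(parts[1])
```

Checking the HTTP/2 preface first keeps the test unambiguous, since it also begins with an ASCII token followed by a space, like an HTTP/1.x request line.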
New packages:
- internal/parser/http1.go: HTTP/1.x parser (IsHTTP1Request,
ParseHTTP1Request, IsHTTP1Response, ParseHTTP1Response)
- internal/parser/http1_test.go: 11 unit tests (28 total pass)
- internal/procutil/proc_lookup.go: fd→IP resolution via /proc with a
5 s TTL cache (FDCache). Supports /proc/PID/net/tcp and tcp6, including IPv4-mapped IPv6.
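For reference, decoding an address field from /proc/PID/net/tcp or tcp6 could look like the sketch below. It is written in Python for illustration, not taken from proc_lookup.go; decode_proc_addr is a hypothetical name, and the byte swap assumes a little-endian host such as x86-64, where the kernel prints each 32-bit word of the address in host byte order.

```python
import ipaddress


def decode_proc_addr(field: str) -> tuple[str, int]:
    """Decode 'HEXADDR:HEXPORT' as found in /proc/net/tcp and tcp6.

    Assumes a little-endian host: each group of 8 hex digits is one
    32-bit word whose bytes must be reversed to get network order.
    """
    addr_hex, port_hex = field.split(":")
    raw = b"".join(
        bytes.fromhex(addr_hex[i:i + 8])[::-1]
        for i in range(0, len(addr_hex), 8)
    )
    ip = ipaddress.ip_address(raw)  # 4 bytes -> IPv4, 16 bytes -> IPv6
    # Collapse IPv4-mapped IPv6 (::ffff:a.b.c.d) to the plain IPv4 address.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    return str(ip), int(port_hex, 16)
```

The port needs no byte swap because the kernel prints it as a plain hexadecimal number.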
VM test infrastructure (tests/vm/):
- Vagrantfile: Rocky Linux 9 KVM VM, 4 CPUs / 4 GB RAM
- provision.sh: installs the eBPF toolchain + Go + Docker + nginx
- run-tests-vm.sh: full test suite inside the VM (L3/L4+TLS+L7)
- README.md: installation and usage guide
- Makefile: targets vm-up, vm-down, vm-ssh, test-vm-nginx, test-vm-all,
vm-rebuild-ja4ebpf
Docker stack fixes:
- Dockerfiles nginx/apache/nginx-varnish/hitch-varnish: removed
references to shared/go/ja4common/ (directory deleted)
- clickhouse-init.sh: restored from git; obsolete anubis_ua_rules seed
removed (REGEXP_TREE table dropped from the schema)
- traffic-gen: added HTTP/1.0 (http.client) and HTTP/2 (httpx)
- verify_db.py: verification script with 35 checks (L3/L4/TLS/L7/correlation)
- run-stack-tests.sh: added phase 6 verify_db
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
2026-04-12 02:37:00 +02:00 |
|
|
|
fc882dd3e7
|
feat(tests): realistic traffic seeder + IP diversity via mod_remoteip
Option A — X-Forwarded-For + mod_remoteip:
- httpd-integration.conf: load mod_remoteip, trust all Docker RFC-1918
subnets (172/192.168/10). mod_reqin_log uses r->useragent_ip which
mod_remoteip updates from XFF → each request logged with distinct src_ip
- generate_traffic.py: XFF always set (was 30% only); human scenarios
use 91.121/78.41/90.x ranges, bot scenarios use 185.220/45.155/193.32;
pool of 1168 human IPs and 180 bot IPs; default --requests 500
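As a rough illustration of the XFF assignment in Option A, the sketch below draws a source IP from a human or bot range and places it in the X-Forwarded-For header. It is not the code from generate_traffic.py: the function names are hypothetical, the prefix lengths (/16 and /8) are assumptions inferred from the abbreviated ranges above, and it samples addresses on the fly rather than from the fixed pools of 1168 and 180 IPs.

```python
import ipaddress
import random

# Assumed prefix lengths; the commit only names the leading octets.
HUMAN_PREFIXES = ("91.121.0.0/16", "78.41.0.0/16", "90.0.0.0/8")
BOT_PREFIXES = ("185.220.0.0/16", "45.155.0.0/16", "193.32.0.0/16")


def random_ip(prefixes, rng: random.Random) -> str:
    """Pick a random host address inside one of the given networks."""
    net = ipaddress.ip_network(rng.choice(prefixes))
    # Skip the network and broadcast addresses.
    offset = rng.randrange(1, net.num_addresses - 1)
    return str(net[offset])


def xff_header(is_bot: bool, rng: random.Random) -> dict:
    """Build the X-Forwarded-For header for one simulated client."""
    prefixes = BOT_PREFIXES if is_bot else HUMAN_PREFIXES
    return {"X-Forwarded-For": random_ip(prefixes, rng)}
```

Passing an explicit random.Random instance makes a run reproducible when seeded, which helps when comparing database contents across test runs.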
Option D — Direct ClickHouse seeder (seed_clickhouse.py, stdlib only):
- Inserts ~4000 rows into http_logs_raw triggering full MV chain:
http_logs_raw → mv_http_logs → http_logs
→ mv_agg_host_ip_ja4_1h → agg_host_ip_ja4_1h
• 720 human sessions: IPs in OVH/SFR/Orange ASN ranges (16276/15557/3215)
→ dict_asn_reputation maps these to asn_label='human'
→ satisfies bot_detector human_baseline >= 500 threshold
• 150 scanner sessions: datacenter IPs, attack paths (/.env, wp-login,
SQLi, path traversal), scanner UAs, minimal TCP fingerprints
• 100 known-bot sessions: IPs matching bot_ip.csv entries
• 20 brute-force clusters: 20-50 POST /login per IP
All TCP/TLS metadata is profile-realistic (window, MSS, TTL, JA4, JA3)
CSV stubs (mounted at /var/lib/clickhouse/user_files/):
- iplocate-ip-to-asn.csv: 13 CIDR→ASN mappings (OVH/SFR/Orange/Tor/Contabo)
- asn_reputation.csv: 13 ASN→label (8 'human', 3 'datacenter'/'hosting')
- bot_ip.csv: 14 known scanner/Tor IPs (Shodan, Censys, Tor exits)
- bot_ja4.csv: 5 bot JA4 fingerprints (curl, python-requests, masscan, zgrab)
run-tests.sh:
- Phase 4a: seeder runs before live traffic (ensures bot_detector baseline)
- Phase 4b: live traffic gen at 500 requests (up from 200)
- Phase 5f: new assertions — agg_host_ip_ja4_1h populated, ≥500 human
rows in view_ai_features_1h, known-bot labels present
- Phase 7: verifies ml_all_scores populated (bot_detector ran a cycle)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
2026-04-08 11:35:34 +02:00 |
|
|
|
12d60975da
|
feat: Python traffic generator with realistic varied HTTP/HTTPS traffic
- Replace curlimages/curl with Python stdlib traffic generator
- 200 requests, 10 workers, 16 scenario types:
browsers (Chrome/Firefox/Safari/Edge/mobile), bots (Googlebot/Bing/curl/wget),
GET/POST/HEAD/PUT/PATCH/DELETE/OPTIONS, HTTP + HTTPS
- Multiple SSL contexts (default, TLS1.2-only, TLS1.3-only, few_ciphers)
→ 4 distinct JA4/JA3 fingerprints per test run
- Realistic headers: Accept, Accept-Language, Sec-Fetch-*, Referer,
X-Forwarded-For, Cookie, Cache-Control
- JSON payloads, form data, CORS preflights
- DB always reset (down -v) at start of each test run
- Enhanced Phase 5 checks: distinct UAs, method variety, JA4/JA3 counts + uniqueness
Results: 199/200 OK, 24 distinct UAs, 7 HTTP methods, TLS 1.2+1.3, 4 JA4 fingerprints
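The four SSL contexts listed above could be built along these lines with the Python standard library. This is a sketch under assumptions, not the generator's actual code: the function name, dictionary keys, and the cipher string "ECDHE+AESGCM" are hypothetical choices, and a test client talking to a self-signed server would additionally need to relax certificate verification.

```python
import ssl


def build_contexts() -> dict:
    """Return client contexts that produce different ClientHello shapes."""
    contexts = {"default": ssl.create_default_context()}

    # TLS 1.2 only: pin both ends of the version range.
    tls12 = ssl.create_default_context()
    tls12.minimum_version = ssl.TLSVersion.TLSv1_2
    tls12.maximum_version = ssl.TLSVersion.TLSv1_2
    contexts["tls12_only"] = tls12

    # TLS 1.3 only: raise the floor, leave the ceiling at the default.
    tls13 = ssl.create_default_context()
    tls13.minimum_version = ssl.TLSVersion.TLSv1_3
    contexts["tls13_only"] = tls13

    # Reduced cipher list; set_ciphers affects TLS 1.2 and below only.
    few = ssl.create_default_context()
    few.set_ciphers("ECDHE+AESGCM")
    contexts["few_ciphers"] = few
    return contexts
```

Because JA3 and JA4 are derived from ClientHello contents such as offered versions and cipher suites, varying these settings is one way to obtain distinct fingerprints from a single client library.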
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
2026-04-07 21:14:55 +02:00 |
|