ja4-platform/tests/integration/platform/httpd-integration.conf
toto fc882dd3e7 feat(tests): realistic traffic seeder + IP diversity via mod_remoteip
Option A — X-Forwarded-For + mod_remoteip:
- httpd-integration.conf: load mod_remoteip, trust all Docker RFC-1918
  subnets (172/192.168/10). mod_reqin_log uses r->useragent_ip which
  mod_remoteip updates from XFF → each request logged with distinct src_ip
- generate_traffic.py: XFF always set (was 30% only); human scenarios
  use 91.121/78.41/90.x ranges, bot scenarios use 185.220/45.155/193.32;
  pool of 1168 human IPs and 180 bot IPs; default --requests 500
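
A minimal sketch of how the XFF pools described above could be built, not the actual generate_traffic.py code. The /16 prefix widths, the `build_pool` and `xff_headers` helper names, and the fixed seed are assumptions for illustration; only the leading octets and the pool sizes come from this commit message.

```python
import ipaddress
import random

# Leading octets come from the commit message; the /16 width is an assumption.
HUMAN_PREFIXES = ["91.121.0.0/16", "78.41.0.0/16", "90.0.0.0/16"]
BOT_PREFIXES = ["185.220.0.0/16", "45.155.0.0/16", "193.32.0.0/16"]


def build_pool(prefixes, size, rng):
    """Return `size` distinct host addresses spread across `prefixes`."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    pool = set()
    while len(pool) < size:
        net = rng.choice(nets)
        # Skip the network and broadcast addresses of each prefix.
        offset = rng.randrange(1, net.num_addresses - 1)
        pool.add(str(net[offset]))
    return sorted(pool)


def xff_headers(pool, rng):
    """Headers for one request: X-Forwarded-For is always set."""
    return {"X-Forwarded-For": rng.choice(pool)}


rng = random.Random(0)  # fixed seed so a test run is reproducible
human_pool = build_pool(HUMAN_PREFIXES, 1168, rng)
bot_pool = build_pool(BOT_PREFIXES, 180, rng)
```

Drawing every request's XFF value from a fixed pool keeps the number of distinct src_ip values bounded and repeatable across runs.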

Option D — Direct ClickHouse seeder (seed_clickhouse.py, stdlib only):
- Inserts ~4000 rows into http_logs_raw triggering full MV chain:
    http_logs_raw → mv_http_logs → http_logs
                 → mv_agg_host_ip_ja4_1h → agg_host_ip_ja4_1h
  • 720 human sessions: IPs in OVH/SFR/Orange ASN ranges (16276/15557/3215)
    → dict_asn_reputation maps these to asn_label='human'
    → satisfies bot_detector human_baseline >= 500 threshold
  • 150 scanner sessions: datacenter IPs, attack paths (/.env, wp-login,
    SQLi, path traversal), scanner UAs, minimal TCP fingerprints
  • 100 known-bot sessions: IPs matching bot_ip.csv entries
  • 20 brute-force clusters: 20-50 POST /login per IP
  All TCP/TLS metadata is profile-realistic (window, MSS, TTL, JA4, JA3)
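
A stdlib-only sketch of inserting rows through the ClickHouse HTTP interface, in the spirit of the seeder described above. It is not the actual seed_clickhouse.py: the column names (`src_ip`, `method`, `path`), the helper names, and the `localhost:8123` URL are assumptions, and the real http_logs_raw schema may differ.

```python
import json
import urllib.parse
import urllib.request


def build_insert(rows, table="http_logs_raw", base_url="http://localhost:8123/"):
    """Build a POST request that inserts `rows` in JSONEachRow format."""
    query = f"INSERT INTO {table} FORMAT JSONEachRow"
    url = base_url + "?" + urllib.parse.urlencode({"query": query})
    # JSONEachRow expects one JSON object per line in the request body.
    body = "\n".join(json.dumps(row) for row in rows).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


def brute_force_cluster(ip, attempts):
    """One brute-force cluster: repeated POST /login from a single IP."""
    # Column names here are hypothetical placeholders.
    return [{"src_ip": ip, "method": "POST", "path": "/login"}
            for _ in range(attempts)]


rows = brute_force_cluster("185.220.0.10", 20)
req = build_insert(rows)
# Sending is left out so the sketch runs without a server:
# urllib.request.urlopen(req).read()
```

Inserting into the raw table rather than the aggregates lets the materialized views fill the downstream tables, which is the chain this option relies on.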

CSV stubs (mounted at /var/lib/clickhouse/user_files/):
- iplocate-ip-to-asn.csv: 13 CIDR→ASN mappings (OVH/SFR/Orange/Tor/Contabo)
- asn_reputation.csv: 13 ASN→label (8 'human', 3 'datacenter'/'hosting')
- bot_ip.csv: 14 known scanner/Tor IPs (Shodan, Censys, Tor exits)
- bot_ja4.csv: 5 bot JA4 fingerprints (curl, python-requests, masscan, zgrab)

run-tests.sh:
- Phase 4a: seeder runs before live traffic (ensures bot_detector baseline)
- Phase 4b: live traffic gen at 500 requests (up from 200)
- Phase 5f: new assertions — agg_host_ip_ja4_1h populated, ≥500 human
  rows in view_ai_features_1h, known-bot labels present
- Phase 7: verifies ml_all_scores populated (bot_detector ran a cycle)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-08 11:35:34 +02:00


# Integration test Apache config — HTTPS + mod-reqin-log
# Load mod-reqin-log
LoadModule reqin_log_module modules/mod_reqin_log.so
# mod_remoteip: trust X-Forwarded-For from Docker internal subnets.
# mod_reqin_log reads r->useragent_ip which mod_remoteip updates,
# so the XFF IP appears as src_ip in the correlated logs.
LoadModule remoteip_module modules/mod_remoteip.so
RemoteIPHeader X-Forwarded-For
RemoteIPInternalProxy 172.16.0.0/12
RemoteIPInternalProxy 192.168.0.0/16
RemoteIPInternalProxy 10.0.0.0/8
# Enable mod-reqin-log with correlator socket
JsonSockLogEnabled On
JsonSockLogSocket "/var/run/logcorrelator/http.socket"
JsonSockLogHeaders X-Request-Id User-Agent Referer X-Forwarded-For \
    Sec-CH-UA Sec-CH-UA-Mobile Sec-CH-UA-Platform \
    Sec-Fetch-Dest Sec-Fetch-Mode Sec-Fetch-Site \
    Accept Accept-Language Accept-Encoding Content-Type
JsonSockLogMaxHeaders 25
JsonSockLogMaxHeaderValueLen 256
JsonSockLogReconnectInterval 5
JsonSockLogErrorReportInterval 5
JsonSockLogLevel DEBUG
# Plain-HTTP virtual host on port 80 (port 443 is already configured by mod_ssl)
<VirtualHost *:80>
    ServerName platform.test
    DocumentRoot /var/www/html
    # Simple test pages
    <Location /health>
        Require all granted
    </Location>
</VirtualHost>
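
The RemoteIPHeader and RemoteIPInternalProxy directives in this file decide which address ends up in r->useragent_ip. The sketch below approximates that logic in Python to show why a request from a Docker-internal peer is logged with its XFF address. It is a simplified model, not mod_remoteip's implementation: the function names are hypothetical, and the trusted list uses the RFC 1918 ranges named in the commit message.

```python
import ipaddress

# RFC 1918 private ranges, assumed to be the trusted internal proxies.
INTERNAL_PROXIES = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_trusted(addr):
    """True if `addr` falls inside one of the trusted proxy networks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in INTERNAL_PROXIES)


def effective_client_ip(peer_ip, xff):
    """Approximate the address a trusted-proxy walk would settle on."""
    client = peer_ip
    hops = [h.strip() for h in xff.split(",") if h.strip()] if xff else []
    # Walk the header right to left while the current address is trusted.
    while hops and is_trusted(client):
        candidate = hops.pop()
        try:
            ipaddress.ip_address(candidate)
        except ValueError:
            break  # stop at an entry that is not a valid IP address
        client = candidate
    return client


print(effective_client_ip("172.18.0.5", "91.121.10.20"))
print(effective_client_ip("8.8.8.8", "91.121.10.20"))
```

The first call models a request arriving from a container on a Docker bridge network, so the XFF value replaces the peer address. The second models an untrusted peer, whose header is ignored.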