Option A — X-Forwarded-For + mod_remoteip:
- httpd-integration.conf: load mod_remoteip and trust all Docker RFC 1918
subnets (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16). mod_reqin_log reads
r->useragent_ip, which mod_remoteip rewrites from XFF, so each request
is logged with a distinct src_ip
- generate_traffic.py: XFF is now always set (previously on only 30% of
requests); human scenarios use the 91.121/78.41/90.x ranges, bot
scenarios use 185.220/45.155/193.32; pool of 1168 human IPs and 180 bot
IPs; default --requests is 500
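
A minimal sketch of how an always-set XFF header could be chosen per scenario. The prefixes are the ranges listed above; `random_ip`, `xff_headers`, and the way host octets are drawn are hypothetical and not taken from generate_traffic.py, whose actual pool construction is not shown in this commit message.

```python
import random

# Prefixes named in this commit message. How the real script derives its
# pools (1168 human / 180 bot IPs) is an assumption left out of this sketch.
HUMAN_PREFIXES = ["91.121", "78.41", "90"]
BOT_PREFIXES = ["185.220", "45.155", "193.32"]


def random_ip(prefix: str, rng: random.Random) -> str:
    """Pad a dotted prefix with random octets to a full IPv4 address."""
    octets = prefix.split(".")
    while len(octets) < 4:
        octets.append(str(rng.randint(1, 254)))
    return ".".join(octets)


def xff_headers(scenario: str, rng: random.Random) -> dict:
    """Return request headers with X-Forwarded-For always present."""
    prefixes = BOT_PREFIXES if scenario == "bot" else HUMAN_PREFIXES
    return {"X-Forwarded-For": random_ip(rng.choice(prefixes), rng)}
```

Passing a seeded `random.Random` keeps a traffic run reproducible, which may help when comparing src_ip counts between test runs.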
Option D — Direct ClickHouse seeder (seed_clickhouse.py, stdlib only):
- Inserts ~4000 rows into http_logs_raw triggering full MV chain:
http_logs_raw → mv_http_logs → http_logs
→ mv_agg_host_ip_ja4_1h → agg_host_ip_ja4_1h
• 720 human sessions: IPs in OVH/SFR/Orange ASN ranges (16276/15557/3215)
→ dict_asn_reputation maps these to asn_label='human'
→ satisfies bot_detector human_baseline >= 500 threshold
• 150 scanner sessions: datacenter IPs, attack paths (/.env, wp-login,
SQLi, path traversal), scanner UAs, minimal TCP fingerprints
• 100 known-bot sessions: IPs matching bot_ip.csv entries
• 20 brute-force clusters: 20-50 POST /login per IP
All TCP/TLS metadata is profile-realistic (window, MSS, TTL, JA4, JA3)
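
A stdlib-only insert of this kind can be sketched against the ClickHouse HTTP interface roughly as follows. The row fields, the `build_insert_request` helper, and the `localhost:8123` URL are assumptions for illustration; the real http_logs_raw schema and the code in seed_clickhouse.py are not shown here.

```python
import json
import urllib.parse
import urllib.request


def build_insert_request(rows, table="http_logs_raw",
                         base_url="http://localhost:8123/"):
    """Build (but do not send) a ClickHouse HTTP INSERT in JSONEachRow format."""
    query = f"INSERT INTO {table} FORMAT JSONEachRow"
    url = base_url + "?" + urllib.parse.urlencode({"query": query})
    # JSONEachRow expects one JSON object per line in the request body.
    body = "\n".join(json.dumps(row) for row in rows).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


# Hypothetical row shape; the actual column names are an assumption.
rows = [{"src_ip": "91.121.10.20", "method": "GET", "path": "/"}]
req = build_insert_request(rows)
# urllib.request.urlopen(req) would send the batch to a running server.
```

Building the request separately from sending it keeps the payload inspectable without a live ClickHouse instance.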
CSV stubs (mounted at /var/lib/clickhouse/user_files/):
- iplocate-ip-to-asn.csv: 13 CIDR→ASN mappings (OVH/SFR/Orange/Tor/Contabo)
- asn_reputation.csv: 13 ASN→label (8 'human', 3 'datacenter'/'hosting')
- bot_ip.csv: 14 known scanner/Tor IPs (Shodan, Censys, Tor exits)
- bot_ja4.csv: 5 bot JA4 fingerprints (curl, python-requests, masscan, zgrab)
run-tests.sh:
- Phase 4a: seeder runs before live traffic (ensures bot_detector baseline)
- Phase 4b: live traffic gen at 500 requests (up from 200)
- Phase 5f: new assertions — agg_host_ip_ja4_1h populated, ≥500 human
rows in view_ai_features_1h, known-bot labels present
- Phase 7: verifies ml_all_scores populated (bot_detector ran a cycle)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
757 B
| network | asn | country_code | name | org | domain |
|---|---|---|---|---|---|
| 91.121.0.0/16 | 16276 | FR | OVH SAS | OVH | ovh.com |
| 78.41.0.0/16 | 15557 | FR | SFR SA | SFR | sfr.com |
| 90.0.0.0/8 | 3215 | FR | Orange SA | Orange | orange.fr |
| 212.0.0.0/8 | 5432 | DE | Deutsche Telekom AG | Telekom | telekom.de |
| 84.116.0.0/16 | 1136 | NL | KPN Internet BV | KPN | kpn.com |
| 77.108.0.0/16 | 2856 | GB | BT Group plc | BT | bt.com |
| 82.45.0.0/16 | 8913 | GB | Virgin Media | Virgin Media | virginmedia.com |
| 62.98.0.0/16 | 3352 | ES | Telefonica Spain | Telefonica | telefonica.es |
| 66.249.64.0/19 | 15169 | US | Google LLC | | google.com |
| 157.55.0.0/16 | 8075 | US | Microsoft Corporation | Bing | microsoft.com |
| 185.220.0.0/16 | 210644 | NL | Accelerated-IT Services | Tor Project | tor-project.org |
| 45.155.205.0/24 | 209083 | DE | Contabo GmbH | Contabo | contabo.de |
| 193.32.162.0/24 | 197695 | RU | Reg.ru Hosting | Reg.ru | reg.ru |
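
To illustrate what the CIDR→ASN stub encodes, here is a small sketch that resolves an IP to its ASN by most-specific matching network. In the pipeline this lookup presumably happens inside ClickHouse; the `load_networks` and `lookup_asn` helpers, and the comma-delimited layout of the file, are assumptions for illustration only.

```python
import csv
import io
import ipaddress

# Three rows copied from the stub; the real file holds 13 mappings.
CSV_TEXT = """network,asn,country_code,name,org,domain
91.121.0.0/16,16276,FR,OVH SAS,OVH,ovh.com
90.0.0.0/8,3215,FR,Orange SA,Orange,orange.fr
185.220.0.0/16,210644,NL,Accelerated-IT Services,Tor Project,tor-project.org
"""


def load_networks(text):
    """Parse CSV text into (network, asn) pairs, longest prefix first."""
    pairs = [(ipaddress.ip_network(row["network"]), int(row["asn"]))
             for row in csv.DictReader(io.StringIO(text))]
    return sorted(pairs, key=lambda p: p[0].prefixlen, reverse=True)


def lookup_asn(ip, networks):
    """Return the ASN of the most specific matching network, or None."""
    addr = ipaddress.ip_address(ip)
    for net, asn in networks:
        if addr in net:
            return asn
    return None


networks = load_networks(CSV_TEXT)
```

With these rows, a seeded human IP such as 91.121.5.6 resolves to 16276 (OVH), while an address outside every listed network yields `None`.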