feat(tests): realistic traffic seeder + IP diversity via mod_remoteip

Option A — X-Forwarded-For + mod_remoteip:
- httpd-integration.conf: load mod_remoteip, trust all Docker RFC-1918
  subnets (172/192.168/10). mod_reqin_log uses r->useragent_ip which
  mod_remoteip updates from XFF → each request logged with distinct src_ip
- generate_traffic.py: XFF always set (was 30% only); human scenarios
  use 91.121/78.41/90.x ranges, bot scenarios use 185.220/45.155/193.32;
  pool of 1168 human IPs and 180 bot IPs; default --requests 500
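
The XFF pool described above could be built roughly as follows. This is a minimal sketch, not the code in generate_traffic.py: the function names, the fixed seed, and the exact prefix widths are assumptions; only the leading octets and the pool sizes come from the commit message.

```python
import ipaddress
import random

# Prefixes named in the commit message; the /16, /8 and /24 widths are assumptions.
HUMAN_NETS = ["91.121.0.0/16", "78.41.0.0/16", "90.0.0.0/8"]
BOT_NETS = ["185.220.101.0/24", "45.155.205.0/24", "193.32.162.0/24"]


def build_pool(cidrs, size, rng):
    """Draw `size` distinct host addresses spread across the given networks."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    pool = set()
    while len(pool) < size:
        net = rng.choice(nets)
        # Skip the network and broadcast addresses.
        offset = rng.randrange(1, net.num_addresses - 1)
        pool.add(str(net[offset]))
    return sorted(pool)


def xff_headers(scenario, human_pool, bot_pool, rng):
    """Always set X-Forwarded-For, picking from the pool that matches the scenario."""
    pool = bot_pool if scenario == "bot" else human_pool
    return {"X-Forwarded-For": rng.choice(pool)}


rng = random.Random(42)  # fixed seed so runs are reproducible (assumption)
human_pool = build_pool(HUMAN_NETS, 1168, rng)
bot_pool = build_pool(BOT_NETS, 180, rng)
```

Each simulated request would then merge `xff_headers(...)` into its headers before being sent to httpd.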

Option D — Direct ClickHouse seeder (seed_clickhouse.py, stdlib only):
- Inserts ~4000 rows into http_logs_raw triggering full MV chain:
    http_logs_raw → mv_http_logs → http_logs
                 → mv_agg_host_ip_ja4_1h → agg_host_ip_ja4_1h
  • 720 human sessions: IPs in OVH/SFR/Orange ASN ranges (16276/15557/3215)
    → dict_asn_reputation maps these to asn_label='human'
    → satisfies bot_detector human_baseline >= 500 threshold
  • 150 scanner sessions: datacenter IPs, attack paths (/.env, wp-login,
    SQLi, path traversal), scanner UAs, minimal TCP fingerprints
  • 100 known-bot sessions: IPs matching bot_ip.csv entries
  • 20 brute-force clusters: 20-50 POST /login per IP
  All TCP/TLS metadata is profile-realistic (window, MSS, TTL, JA4, JA3)
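
A stdlib-only insert path like the one described above might look like this. It is a hedged sketch: it uses the ClickHouse HTTP interface with the JSONEachRow format, and the column names (`src_ip`, `method`, `path`), the `localhost:8123` URL, and the helper names are assumptions, not taken from seed_clickhouse.py or the real http_logs_raw schema.

```python
import json
import random
import urllib.parse
import urllib.request


def brute_force_rows(ip, rng):
    """One brute-force cluster: 20-50 POST /login requests from a single IP."""
    count = rng.randint(20, 50)
    # Column names here are hypothetical placeholders.
    return [{"src_ip": ip, "method": "POST", "path": "/login"} for _ in range(count)]


def to_json_each_row(rows):
    """Serialize rows as newline-delimited JSON, as JSONEachRow expects."""
    return "\n".join(json.dumps(r, separators=(",", ":")) for r in rows).encode()


def insert_rows(rows, base_url="http://localhost:8123/", table="http_logs_raw"):
    """POST the rows to ClickHouse over HTTP; the URL is an assumption."""
    query = f"INSERT INTO {table} FORMAT JSONEachRow"
    url = base_url + "?" + urllib.parse.urlencode({"query": query})
    req = urllib.request.Request(url, data=to_json_each_row(rows), method="POST")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

Because the materialized views fire on insert into http_logs_raw, a single `insert_rows(...)` call per batch would be enough to populate the downstream tables.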

CSV stubs (mounted at /var/lib/clickhouse/user_files/):
- iplocate-ip-to-asn.csv: 13 CIDR→ASN mappings (OVH/SFR/Orange/Tor/Contabo)
- asn_reputation.csv: 13 ASN→label rows (10 'human', 2 'datacenter', 1 'hosting')
- bot_ip.csv: 14 known scanner/Tor IPs (Shodan, Censys, Tor exits)
- bot_ja4.csv: 5 bot JA4 fingerprints (curl, python-requests, masscan, zgrab,
  headless Chrome)
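
To illustrate how the stubs chain together (IP → CIDR → ASN → label, plus the known-bot list), here is a small in-memory sketch using a few rows copied from the CSVs. The `classify` function and the dict layout are hypothetical; in the actual stack these lookups are presumably done by ClickHouse dictionaries such as dict_asn_reputation.

```python
import ipaddress

# A subset of rows from iplocate-ip-to-asn.csv, asn_reputation.csv and bot_ip.csv.
CIDR_TO_ASN = {
    "91.121.0.0/16": 16276,
    "185.220.0.0/16": 210644,
    "193.32.162.0/24": 197695,
}
ASN_TO_LABEL = {16276: "human", 210644: "datacenter", 197695: "hosting"}
BOT_IPS = {"185.220.101.34": "Tor_Exit_Node", "193.32.162.10": "Censys_Scanner"}


def classify(ip):
    """Resolve an IP to (asn, asn_label, bot_name) using longest-prefix match."""
    addr = ipaddress.ip_address(ip)
    asn, best_len = None, -1
    for cidr, candidate in CIDR_TO_ASN.items():
        net = ipaddress.ip_network(cidr)
        if addr in net and net.prefixlen > best_len:
            asn, best_len = candidate, net.prefixlen
    return asn, ASN_TO_LABEL.get(asn), BOT_IPS.get(ip)


print(classify("91.121.10.20"))
print(classify("185.220.101.34"))
```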

run-tests.sh:
- Phase 4a: seeder runs before live traffic (ensures bot_detector baseline)
- Phase 4b: live traffic gen at 500 requests (up from 200)
- Phase 5f: new assertions — agg_host_ip_ja4_1h populated, ≥500 human
  rows in view_ai_features_1h, known-bot labels present
- Phase 7: verifies ml_all_scores populated (bot_detector ran a cycle)
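
The Phase 5f and Phase 7 assertions boil down to "row count at or above a minimum". A sketch of that logic is below, in Python rather than the shell used by run-tests.sh. The SQL text, the `asn_label` filter on view_ai_features_1h, and the `run_scalar` callable are all assumptions made for illustration.

```python
# (name, query, minimum row count). The queries are hypothetical.
CHECKS = [
    ("agg_host_ip_ja4_1h populated",
     "SELECT count() FROM agg_host_ip_ja4_1h", 1),
    ("human baseline in view_ai_features_1h",
     "SELECT count() FROM view_ai_features_1h WHERE asn_label = 'human'", 500),
    ("ml_all_scores populated",
     "SELECT count() FROM ml_all_scores", 1),
]


def run_checks(run_scalar, checks=CHECKS):
    """Run each count query via `run_scalar(sql)` and collect failure messages."""
    failures = []
    for name, sql, minimum in checks:
        got = int(run_scalar(sql))
        if got < minimum:
            failures.append(f"{name}: expected >= {minimum}, got {got}")
    return failures
```

`run_scalar` can be any function that sends a query to ClickHouse and returns the single value in the result, which keeps the check logic testable without a live server.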

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
toto
2026-04-08 11:35:34 +02:00
parent f448dcb4b0
commit fc882dd3e7
8 changed files with 688 additions and 45 deletions


@@ -0,0 +1,14 @@
src_asn,label
16276,human
15557,human
3215,human
5432,human
1136,human
2856,human
8913,human
3352,human
15169,human
8075,human
210644,datacenter
209083,datacenter
197695,hosting


@@ -0,0 +1,14 @@
185.220.101.34/32,Tor_Exit_Node
185.220.101.47/32,Tor_Exit_Node
185.220.101.52/32,Tor_Exit_Node
185.220.101.73/32,Tor_Exit_Node
185.220.101.91/32,Tor_Exit_Node
185.220.100.253/32,Tor_Exit_Node
45.155.205.233/32,Shodan_Scanner
45.155.205.220/32,Shodan_Scanner
45.155.205.205/32,Shodan_Scanner
45.155.205.190/32,Shodan_Scanner
45.155.205.175/32,Shodan_Scanner
193.32.162.10/32,Censys_Scanner
193.32.162.11/32,Censys_Scanner
193.32.162.25/32,Censys_Scanner


@@ -0,0 +1,5 @@
t13d030500_ffd59bab1b39_6e7f7df63e98,curl_scanner
t13d020300_6b9b1b2c3d4e_ffd59bab1b39,python_requests_scanner
t10d170000_0a1b2c3d4e5f_1b2c3d4e5f60,Masscan
t12d050700_5a6b7c8d9e0f_1a2b3c4d5e6f,zgrab_scanner
t13d010100_aabbccddeeff_0011223344aa,Headless_Chrome_Automation


@@ -1 +1,14 @@
network,asn,country_code,name,org,domain
91.121.0.0/16,16276,FR,OVH SAS,OVH,ovh.com
78.41.0.0/16,15557,FR,SFR SA,SFR,sfr.com
90.0.0.0/8,3215,FR,Orange SA,Orange,orange.fr
212.0.0.0/8,5432,DE,Deutsche Telekom AG,Telekom,telekom.de
84.116.0.0/16,1136,NL,KPN Internet BV,KPN,kpn.com
77.108.0.0/16,2856,GB,BT Group plc,BT,bt.com
82.45.0.0/16,8913,GB,Virgin Media,Virgin Media,virginmedia.com
62.98.0.0/16,3352,ES,Telefonica Spain,Telefonica,telefonica.es
66.249.64.0/19,15169,US,Google LLC,Google,google.com
157.55.0.0/16,8075,US,Microsoft Corporation,Bing,microsoft.com
185.220.0.0/16,210644,NL,Accelerated-IT Services,Tor Project,tor-project.org
45.155.205.0/24,209083,DE,Contabo GmbH,Contabo,contabo.de
193.32.162.0/24,197695,RU,Reg.ru Hosting,Reg.ru,reg.ru


@@ -3,6 +3,15 @@
# Load mod-reqin-log
LoadModule reqin_log_module modules/mod_reqin_log.so
# mod_remoteip: trust X-Forwarded-For from Docker internal subnets.
# mod_reqin_log reads r->useragent_ip which mod_remoteip updates,
# so the XFF IP appears as src_ip in the correlated logs.
LoadModule remoteip_module modules/mod_remoteip.so
RemoteIPHeader X-Forwarded-For
RemoteIPInternalProxy 172.16.0.0/12
RemoteIPInternalProxy 192.168.0.0/16
RemoteIPInternalProxy 10.0.0.0/8
# Enable mod-reqin-log with correlator socket
JsonSockLogEnabled On
JsonSockLogSocket "/var/run/logcorrelator/http.socket"
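
The effect of the RemoteIPHeader / RemoteIPInternalProxy directives above can be modeled in a few lines. This is a simplified sketch of mod_remoteip's right-to-left X-Forwarded-For processing, assuming the trusted set is the three RFC 1918 ranges; it is not the module's actual code, and the function names are hypothetical.

```python
import ipaddress

# Assumed trusted proxy ranges (RFC 1918).
TRUSTED = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_trusted(ip):
    """True if `ip` falls inside one of the trusted proxy ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in TRUSTED)


def resolve_client_ip(peer_ip, xff):
    """Walk X-Forwarded-For from right to left while the current hop is trusted."""
    hops = [h.strip() for h in (xff or "").split(",") if h.strip()]
    client = peer_ip
    while hops and is_trusted(client):
        client = hops.pop()
    return client


# A request from a Docker-internal peer carrying a seeded XFF value:
print(resolve_client_ip("172.18.0.5", "91.121.4.7"))
# The same header from an untrusted peer is ignored:
print(resolve_client_ip("8.8.8.8", "91.121.4.7"))
```

In this model the first call yields the XFF address, which is the value mod_reqin_log would then see in r->useragent_ip and log as src_ip.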