Commit Graph

2 Commits

Author SHA1 Message Date
7b7b69dee3 Rewrite seed_clickhouse.py: 500K rows from 20K IPs with realistic traffic
- 350K browser rows (14K IPs) using real JA4s from browser_ja4.csv
- 100K scanner rows (3K IPs) with vuln/cred/scraper/DDoS sub-categories
- 30K legit bot rows (2K IPs) from real bot_ip.csv CIDRs
- 20K AI bot rows (1K IPs) for GPTBot, ClaudeBot, etc.

Key improvements:
- Load browser_ja4.csv at startup, match JA4 to browser family
- Load bot_ip.csv to generate IPs from real Googlebot/Bingbot CIDRs
- Hard-coded ISP /24 prefixes from real ASNs (Comcast, Orange, DT, etc.)
- Realistic navigation patterns with Referer chains and cookies
- Sec-CH-UA headers for Chromium browsers (modern_browser_score >= 50)
- Batch size increased to 2000, progress reporting every 10K rows
- New CLI args: --rows, --ips, --seed, --data-dir
- Bot JA4s are synthetic hashes guaranteed NOT in browser_ja4.csv

Also updated:
- Dockerfile: COPY *.py (was missing seed_clickhouse.py)
- docker-compose.yml: mount scripts/data as /app/data for CSV access
- run-tests.sh: updated seeder description comments

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-08 16:35:40 +02:00
12d60975da feat: Python traffic generator with realistic varied HTTP/HTTPS traffic
- Replace curlimages/curl with Python stdlib traffic generator
- 200 requests, 10 workers, 16 scenario types:
  browsers (Chrome/Firefox/Safari/Edge/mobile), bots (Googlebot/Bing/curl/wget),
  GET/POST/HEAD/PUT/PATCH/DELETE/OPTIONS, HTTP + HTTPS
- Multiple SSL contexts (default, TLS1.2-only, TLS1.3-only, few_ciphers)
  → 4 distinct JA4/JA3 fingerprints per test run
- Realistic headers: Accept, Accept-Language, Sec-Fetch-*, Referer,
  X-Forwarded-For, Cookie, Cache-Control
- JSON payloads, form data, CORS preflights
- DB always reset (down -v) at start of each test run
- Enhanced Phase 5 checks: distinct UAs, method variety, JA4/JA3 counts + uniqueness

Results: 199/200 OK, 24 distinct UAs, 7 HTTP methods, TLS 1.2+1.3, 4 JA4 fingerprints

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-07 21:14:55 +02:00