|
|
5c5bca71d1
|
feat: rewrite ASN classification with PeeringDB + expanded heuristics
Major improvements to generate_asn_data.py:
- Add PeeringDB network data source (34K networks with info_type)
- Add new categories: education, government, enterprise
- Rename 'human' label to 'isp' across all consumers
- Expand keyword heuristics (ISP, datacenter, hosting, CDN, education, gov)
- Add hard-coded lists for education, government, enterprise ASNs
- Support both --output-dir and --output-asn/--output-ipasn CLI interfaces
- Add --no-peeringdb flag for offline use
Results: unknown dropped from 86% to 57%, ISP coverage 21.8K ASNs,
education 3.1K, enterprise 5.7K, government 520.
Updated consumers:
- bot_detector.py: 'human' -> 'isp' for baseline selection
- dashboard api.py: 'human' -> 'isp' in SQL queries
- run-tests.sh: 'human' -> 'isp' in integration test assertions
- update-csv-data.sh: updated label description comment
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
2026-04-08 16:02:07 +02:00 |
|
|
|
7d09c614c3
|
feat: browser JA4 detection, Anubis bot rules, worldwide ASN data
- Add generate_browser_ja4.py: 1,186 browser JA4 fingerprints from FoxIO + ja4db.com
covering 11 families (Chromium, Firefox, Safari, Edge, Tor, Opera, Vivaldi...)
- Rewrite generate_bot_ip.py: Anubis YAML rules (Google, Bing, Apple, DuckDuck,
OpenAI, Perplexity bots) + Tor exit nodes + cloud scanner IPs (3,555 entries)
- Rewrite generate_asn_data.py: worldwide iptoasn.com data (78,049 ASNs, 714K CIDRs)
- Add dict_browser_ja4 ClickHouse dictionary + browser_family in AI features views
- Add /api/browsers dashboard endpoint
- Fix CSV quoting for fields containing commas (User-Agent strings)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
2026-04-08 15:27:37 +02:00 |
|
|
|
b6184e6529
|
feat: CSV generation scripts, API filter params, enriched CSV stubs
- scripts/generate_bot_ip.py: download Tor exit nodes + curate scanner IPs (1353 entries)
- scripts/generate_bot_ja4.py: 31 bot JA4 fingerprints across 16 families
- scripts/generate_asn_data.py: 38 ASNs + 96 IP-to-ASN prefixes
- scripts/update-csv-data.sh: master orchestrator with --install-stubs
- api.py: add asn_org/country_code/ja4/bot_name filters on detections+scores
- pages.py: add /network route
- csv-stubs: enriched with generated data (Tor nodes, scanner IPs, etc.)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
2026-04-08 15:05:43 +02:00 |
|