6d02f21c1e
feat: implement thesis §5 advanced detection techniques as ClickHouse MVs
...
New aggregation tables + materialized views:
- agg_path_sequences_1h + MV (§5.1 Path Sequence Entropy)
- agg_request_timing_1h + MV (§5.3 Request Cadence Fingerprint)
- agg_ip_behavior_1h + MV (§5.5 JA4 Drift + §5.8 Cross-Domain)
- agg_resource_cascade_1h + MV (§5.4 Resource Dependency Tree)
New analytical views:
- view_thesis_features_1h: unified view exposing all computable features
(path_transition_entropy, cadence_cv, burst_ratio, pause_ratio,
ja4_drift_ratio, host_diversity, host_sweep_speed,
host_coverage_uniformity)
- view_resource_cascade_1h: root_to_first_asset_delay, asset_load_stddev
Documented future techniques (not feasible as MV):
- §5.2 Bipartite Fleet Graph (needs Python networkx)
- §5.6 DNS Shadow Analysis (needs sentinel UDP/53 extension)
- §5.7 Compression Ratio Invariant (needs mod_reqin_log extension)
Updated: deploy_schema.sh, verify_mvs.py (sections 8-10)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com >
2026-04-08 01:42:52 +02:00
51b8eb57a8
feat: port v14 schema fixes, migration, MV verifier, thesis from ja4/
...
deploy_views.sql (v13 → v14):
- CRITICAL: ml_detected_anomalies ORDER BY (src_ip) → (src_ip, ja4, host, model_name)
ReplacingMergeTree was collapsing all detections to 1 row per IP on merge
- Add PARTITION BY toDate + ttl_only_drop_parts on all 4 data tables
- ml_all_scores TTL 3d → 7d; ml_detected_anomalies TTL 30d → 7d
- agg_host_ip_ja4_1h + agg_header_fingerprint_1h: add partition + TTL 7d
- view_ip_recurrence: add WHERE detected_at >= now() - 7 DAY (was full scan)
- Remove dead views: summary/timeseries/threat_dist/variability
- Add view_dashboard_entities (fixes HTTP 500 in clustering/incidents/fingerprints)
- Add view_dashboard_user_agents (fixes HTTP 500 in fingerprints/metrics)
- Add view_ai_features_24h (enables ENABLE_MULTIWINDOW in bot_detector)
- Mark max_requests_per_sec as DEPRECATED (always 0)
New files:
- correlator/sql/migrations/01_ttl_adjustments.sql: ALTER TABLE migration
- tests/integration/verify_mvs.py: MV pipeline verification assertions
- docs/THESIS_HTTP_Traffic_Detection.md: detection techniques thesis
All DB references use ja4_processing/ja4_logs (no mabase_prod).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com >
2026-04-07 23:51:56 +02:00
12d60975da
feat: Python traffic generator with realistic varied HTTP/HTTPS traffic
...
- Replace curlimages/curl with Python stdlib traffic generator
- 200 requests, 10 workers, 16 scenario types:
browsers (Chrome/Firefox/Safari/Edge/mobile), bots (Googlebot/Bing/curl/wget),
GET/POST/HEAD/PUT/PATCH/DELETE/OPTIONS, HTTP + HTTPS
- Multiple SSL contexts (default, TLS1.2-only, TLS1.3-only, few_ciphers)
→ 4 distinct JA4/JA3 fingerprints per test run
- Realistic headers: Accept, Accept-Language, Sec-Fetch-*, Referer,
X-Forwarded-For, Cookie, Cache-Control
- JSON payloads, form data, CORS preflights
- DB always reset (down -v) at start of each test run
- Enhanced Phase 5 checks: distinct UAs, method variety, JA4/JA3 counts + uniqueness
Results: 199/200 OK, 24 distinct UAs, 7 HTTP methods, TLS 1.2+1.3, 4 JA4 fingerprints
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com >
2026-04-07 21:14:55 +02:00
da8357f43d
fix: TLS capture end-to-end in integration tests
...
- Add traffic-gen container (curlimages/curl) to send HTTPS traffic
across Docker network so sentinel (pcap on eth0) captures ClientHello
- Seed anubis_ua_rules with catch-all rule (REGEXP_TREE needs ≥1 entry)
so MV mv_http_logs processes raw logs without errors
- Add JA4/JA3 fingerprint verification in Phase 5 tests
- Dashboard healthcheck via python urllib (no curl in image)
Results: 59 raw logs, 59 parsed, 53 with JA4+JA3 fingerprints (TLS 1.3)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com >
2026-04-07 21:04:14 +02:00
d4e7e674d8
feat: full-stack Docker Compose integration tests
...
- 4-container stack: ClickHouse, platform (Rocky 9), bot-detector, dashboard
- Platform builds sentinel on Rocky (CGO+libpcap native), correlator static
- mod-reqin-log compiled with apxs on Rocky (matching RPM build target)
- ClickHouse init script patches credentials for test env (sed-based)
- 8-phase test runner: schema, traffic gen, pipeline, dashboard API, bot-detector, sentinel
- All 13 checks pass, 3 non-blocking warnings (empty dicts, log paths)
SQL schema fixes discovered during integration:
- 02_dictionaries: IPv6CIDR → String (not a valid ClickHouse type)
- 03_anubis_tables: dict_anubis_ua missing has_ip/rule_id/category attrs
- 03_anubis_tables: dict_anubis_country FLAT() → COMPLEX_KEY_HASHED() (String key)
- 09_audit_table: CODEC before DEFAULT → DEFAULT before CODEC
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com >
2026-04-07 20:33:25 +02:00