toto 51b8eb57a8 feat: port v14 schema fixes, migration, MV verifier, thesis from ja4/
deploy_views.sql (v13 → v14):
- CRITICAL: ml_detected_anomalies ORDER BY (src_ip) → (src_ip, ja4, host, model_name)
  ReplacingMergeTree was collapsing all detections to 1 row per IP on merge
- Add PARTITION BY toDate + ttl_only_drop_parts on all 4 data tables
- ml_all_scores TTL 3d → 7d; ml_detected_anomalies TTL 30d → 7d
- agg_host_ip_ja4_1h + agg_header_fingerprint_1h: add partition + TTL 7d
- view_ip_recurrence: add WHERE detected_at >= now() - 7 DAY (was full scan)
- Remove dead views: summary/timeseries/threat_dist/variability
- Add view_dashboard_entities (fixes HTTP 500 in clustering/incidents/fingerprints)
- Add view_dashboard_user_agents (fixes HTTP 500 in fingerprints/metrics)
- Add view_ai_features_24h (enables ENABLE_MULTIWINDOW in bot_detector)
- Mark max_requests_per_sec as DEPRECATED (always 0)

New files:
- correlator/sql/migrations/01_ttl_adjustments.sql: ALTER TABLE migration
- tests/integration/verify_mvs.py: MV pipeline verification assertions
- docs/THESIS_HTTP_Traffic_Detection.md: detection techniques thesis

All DB references use ja4_processing/ja4_logs (no mabase_prod).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-07 23:51:56 +02:00
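The CRITICAL fix in this commit depends on how ClickHouse's ReplacingMergeTree deduplicates. When parts merge, rows that share the same sorting key (the ORDER BY tuple) collapse to one row, and without a version column that row is the last one inserted. The plain-Python sketch below imitates only that rule. It shows why ORDER BY (src_ip) left one detection per IP, while the widened key keeps one per (src_ip, ja4, host, model_name). The helper name and the sample rows (documentation-range IP, placeholder JA4 and model names) are hypothetical. Real merges run in the background at unspecified times, so unmerged duplicates can still appear in queries.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]


def simulate_replacing_merge(rows: Sequence[Row], order_by: Sequence[str]) -> List[Row]:
    """Hypothetical helper: mimic a fully merged ReplacingMergeTree with no
    version column, i.e. keep only the last-inserted row per sorting key."""
    survivors: Dict[Tuple[str, ...], Row] = {}
    for row in rows:  # rows are given in insertion order
        key = tuple(row[col] for col in order_by)
        survivors[key] = row  # a later row replaces an earlier one with the same key
    # Surviving rows are stored sorted by the ORDER BY key.
    return [survivors[key] for key in sorted(survivors)]


# Three distinct detections for one client IP (placeholder values).
detections = [
    {"src_ip": "203.0.113.7", "ja4": "ja4_A", "host": "shop.example", "model_name": "model_1h"},
    {"src_ip": "203.0.113.7", "ja4": "ja4_A", "host": "api.example", "model_name": "model_1h"},
    {"src_ip": "203.0.113.7", "ja4": "ja4_B", "host": "shop.example", "model_name": "model_24h"},
]

old_key = ["src_ip"]
new_key = ["src_ip", "ja4", "host", "model_name"]
print(len(simulate_replacing_merge(detections, old_key)))  # 1: two detections lost
print(len(simulate_replacing_merge(detections, new_key)))  # 3: all detections kept
```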

ja4-platform

ja4-platform is a monorepo security pipeline for TLS fingerprinting (JA4/JA3) and bot detection. It captures live network traffic, correlates TLS handshakes with HTTP requests, detects anomalous behavior using machine learning (Isolation Forest), and presents results through a SOC analyst dashboard — all backed by ClickHouse as the central data store.
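Isolation Forest scores a point by how quickly random axis-aligned splits isolate it. Anomalies sit far from the bulk of the data, so they end up on short tree paths. The bot-detector uses scikit-learn for this. The dependency-free sketch below only illustrates the standard scoring rule s(x) = 2^(-E[h(x)] / c(ψ)), where h(x) is the path length of x in one tree and c(ψ) normalises by the expected path length for a subsample of size ψ. The class name, its parameters, and the two-feature toy data are hypothetical and are not the service's real feature set.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence

Point = Sequence[float]
EULER_GAMMA = 0.5772156649


def c(n: int) -> float:
    """Expected path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


@dataclass
class Node:
    size: int
    feature: Optional[int] = None  # None marks a leaf
    split: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_tree(points: List[Point], depth: int, max_depth: int, rng: random.Random) -> Node:
    if depth >= max_depth or len(points) <= 1:
        return Node(size=len(points))
    feature = rng.randrange(len(points[0]))
    lo = min(p[feature] for p in points)
    hi = max(p[feature] for p in points)
    if lo == hi:  # a constant feature cannot be split
        return Node(size=len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[feature] < split]
    right = [p for p in points if p[feature] >= split]
    return Node(len(points), feature, split,
                build_tree(left, depth + 1, max_depth, rng),
                build_tree(right, depth + 1, max_depth, rng))


def path_length(node: Node, x: Point, depth: int = 0) -> float:
    if node.feature is None:
        # Leaves that still hold several points get their expected extra depth.
        return depth + c(node.size)
    child = node.left if x[node.feature] < node.split else node.right
    return path_length(child, x, depth + 1)


class ToyIsolationForest:
    def __init__(self, n_trees: int = 100, sample_size: int = 256, seed: int = 0):
        self.n_trees, self.sample_size, self.seed = n_trees, sample_size, seed

    def fit(self, data: List[Point]) -> "ToyIsolationForest":
        if len(data) < 2:
            raise ValueError("need at least two points to build isolation trees")
        rng = random.Random(self.seed)
        self.psi = min(self.sample_size, len(data))
        max_depth = max(1, math.ceil(math.log2(self.psi)))
        self.trees = [build_tree(rng.sample(data, self.psi), 0, max_depth, rng)
                      for _ in range(self.n_trees)]
        return self

    def score(self, x: Point) -> float:
        """Anomaly score in (0, 1]; values near 1 indicate likely anomalies."""
        mean_depth = sum(path_length(t, x) for t in self.trees) / len(self.trees)
        return 2.0 ** (-mean_depth / c(self.psi))


# Hypothetical per-IP features: 50 similar clients plus one outlier at the end.
traffic = [(1.0 + 0.01 * i, 2.0 - 0.01 * i) for i in range(50)] + [(40.0, 90.0)]
forest = ToyIsolationForest(n_trees=100, sample_size=64, seed=42).fit(traffic)
scores = [forest.score(p) for p in traffic]
print(max(range(len(traffic)), key=scores.__getitem__))  # index of the most anomalous point
```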

Pipeline Overview

  ┌─────────────────────────────────────────────────────────────────────────────┐
  │                            Linux Server (Apache)                            │
  │                                                                             │
  │  ┌──────────────────┐        ┌─────────────────────┐                        │
  │  │  mod-reqin-log   │───────▶│  UNIX socket (HTTP) │───────────┐            │
  │  │  (Apache module) │  JSON  │  /var/run/logcorr/  │           │            │
  │  │  C · httpd DSO   │        │  http.socket        │           │            │
  │  └──────────────────┘        └─────────────────────┘           │            │
  │                                                                ▼            │
  │  ┌──────────────────┐        ┌─────────────────────┐  ┌──────────────────┐  │
  │  │  sentinel        │───────▶│  UNIX socket (TLS)  │─▶│  correlator      │  │
  │  │  (TLS capture)   │  JSON  │  /var/run/logcorr/  │  │  (event join)    │  │
  │  │  Go · libpcap    │        │  network.socket     │  │  Go · hex. arch  │  │
  │  └──────────────────┘        └─────────────────────┘  └────────┬─────────┘  │
  │                                                                │            │
  └────────────────────────────────────────────────────────────────┼────────────┘
                                                                   │ INSERT
                                                                   ▼
                                                          ┌──────────────────┐
                                                          │  ClickHouse      │
                                                          │  ja4_processing  │
                                                          │  (all tables)    │
                                                          └────────┬─────────┘
                                                                   │ SELECT
                                              ┌────────────────────┴────────────────────┐
                                              ▼                                         ▼
                                    ┌───────────────────┐                     ┌───────────────────┐
                                    │  bot-detector     │                     │  dashboard        │
                                    │  (ML anomaly det) │                     │  (SOC web UI)     │
                                    │  Python · sklearn │                     │  FastAPI + React  │
                                    └───────────────────┘                     └───────────────────┘
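Both capture components hand their events to the correlator as JSON over UNIX sockets. The sketch below shows one plausible framing: one JSON object per line over a stream socket. The framing, the field names, and the send_event/read_events helpers are assumptions for illustration, not the actual wire format of sentinel or mod-reqin-log. A socketpair stands in for the real socket files so the example runs without privileges or a running correlator.

```python
import json
import socket
from typing import Any, Dict, Iterator


def send_event(sock: socket.socket, event: Dict[str, Any]) -> None:
    """Hypothetical producer side: write one event as a single JSON line."""
    line = json.dumps(event, separators=(",", ":")) + "\n"
    sock.sendall(line.encode("utf-8"))


def read_events(sock: socket.socket) -> Iterator[Dict[str, Any]]:
    """Hypothetical consumer side: yield events until the peer closes the socket."""
    buffer = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:  # peer closed; any unterminated partial line is dropped
            break
        buffer += chunk
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            if line.strip():
                yield json.loads(line)


# A connected pair stands in for /var/run/logcorr/network.socket.
producer, consumer = socket.socketpair()
send_event(producer, {"src_ip": "192.0.2.10", "src_port": 51234, "ja4": "ja4_A"})
send_event(producer, {"src_ip": "192.0.2.11", "src_port": 40100, "ja4": "ja4_B"})
producer.close()
events = list(read_events(consumer))
consumer.close()
print([e["src_port"] for e in events])
```

Against the real service, a producer would instead create socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) and connect() to the socket path, assuming the correlator is the listening side.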

Services

| Service | Language | Purpose | Interface |
|---------|----------|---------|-----------|
| sentinel | Go | Live TLS packet capture, JA4/JA3 fingerprint generation | UNIX socket (network.socket) |
| mod-reqin-log | C | Apache HTTPD module, HTTP request JSON logging | UNIX socket (http.socket) |
| correlator | Go | Joins HTTP + TLS events by src_ip:src_port + time window | ClickHouse INSERT, file, stdout |
| bot-detector | Python | Isolation Forest ML anomaly detection on aggregated traffic | ClickHouse read/write, HTTP :8080 |
| dashboard | Python/JS | SOC analyst web dashboard (FastAPI + React) | HTTP :8000 |
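The correlator's join key, src_ip:src_port plus a time window, works because a TLS handshake and the HTTP requests carried over that connection share the client's address and ephemeral port. The Python sketch below illustrates the idea only; the real correlator is written in Go. The event classes, field names, the 5-second window, and the "latest earlier handshake wins" rule are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

ConnKey = Tuple[str, int]


@dataclass
class TlsEvent:  # hypothetical shape of a sentinel record
    src_ip: str
    src_port: int
    ts: float  # seconds since the epoch
    ja4: str


@dataclass
class HttpEvent:  # hypothetical shape of a mod-reqin-log record
    src_ip: str
    src_port: int
    ts: float
    host: str


def correlate(tls_events: List[TlsEvent], http_events: List[HttpEvent],
              window_s: float = 5.0) -> List[Tuple[HttpEvent, Optional[TlsEvent]]]:
    """Pair each HTTP event with the latest TLS handshake on the same
    (src_ip, src_port) that occurred at most window_s seconds before it."""
    by_conn: Dict[ConnKey, List[TlsEvent]] = {}
    for t in tls_events:
        by_conn.setdefault((t.src_ip, t.src_port), []).append(t)

    joined = []
    for h in http_events:
        candidates = [t for t in by_conn.get((h.src_ip, h.src_port), [])
                      if 0.0 <= h.ts - t.ts <= window_s]
        best = max(candidates, key=lambda t: t.ts, default=None)
        joined.append((h, best))  # best is None when no handshake matches
    return joined


tls = [TlsEvent("198.51.100.4", 51234, 100.0, "ja4_A")]
http = [
    HttpEvent("198.51.100.4", 51234, 100.4, "shop.example"),  # same connection, in window
    HttpEvent("198.51.100.4", 51234, 160.0, "shop.example"),  # same port, too late
    HttpEvent("198.51.100.4", 51999, 100.5, "shop.example"),  # different source port
]
for h, t in correlate(tls, http):
    print(h.ts, t.ja4 if t else None)
```

A flat window is a simplification: on a keep-alive connection, later requests can arrive long after the handshake, so a production joiner may need to track connection lifetime rather than a fixed timeout.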

Shared Libraries

| Library | Language | Description |
|---------|----------|-------------|
| go/ja4common | Go | Logger, config loader, shutdown handler, IP filter |
| python/ja4_common | Python | ClickHouse client singleton, settings |
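python/ja4_common is described as providing a ClickHouse client singleton plus settings. A minimal sketch of that pattern follows, assuming settings come from environment variables. The Settings fields, the CLICKHOUSE_* variable names, and get_client are hypothetical. The client factory is injected so the sketch runs without a driver. In practice the factory would build a real client, for example with clickhouse-connect's get_client(host=..., port=..., database=...) if that is the driver in use.

```python
import os
import threading
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class Settings:
    host: str = "localhost"
    port: int = 8123  # ClickHouse HTTP interface default
    database: str = "ja4_processing"

    @classmethod
    def from_env(cls) -> "Settings":
        # Hypothetical variable names; unset variables fall back to the defaults above.
        return cls(
            host=os.environ.get("CLICKHOUSE_HOST", cls.host),
            port=int(os.environ.get("CLICKHOUSE_PORT", cls.port)),
            database=os.environ.get("CLICKHOUSE_DB", cls.database),
        )


_client: Optional[Any] = None
_lock = threading.Lock()


def get_client(factory: Callable[[Settings], Any]) -> Any:
    """Return the process-wide client, creating it on first use.

    Double-checked locking keeps concurrent first calls from building two clients.
    """
    global _client
    if _client is None:
        with _lock:
            if _client is None:
                _client = factory(Settings.from_env())
    return _client


# Usage with a stand-in factory that records how often it is called.
calls = []


def fake_factory(settings: Settings) -> dict:
    calls.append(settings)
    return {"connected_to": settings.database}


first = get_client(fake_factory)
second = get_client(fake_factory)
print(first is second, len(calls))
```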

Quickstart

Prerequisites

  • Docker (with BuildKit) and Docker Compose
  • make
  • No native Go, Python, or C toolchains required — all builds run inside Docker

Build All Services

make build-all

Run All Tests

make test-all

Build RPM Packages

make rpm-all
# RPMs written to services/<service>/dist/

Documentation

| Document | Description |
|----------|-------------|
| Architecture | System architecture, data flow, component interactions |
| Development | Build, test, package, and extend the platform |
| Database Schema | Every ClickHouse table, view, dictionary, and materialized view |
| Database Migrations | Migration order, application, verification, and rollback |

Service Documentation

Shared Library Documentation

Go Workspace

The repository uses a Go workspace (go.work) to link the Go modules:

go 1.21

use (
    ./services/sentinel
    ./services/correlator
    ./shared/go/ja4common
)

License

See individual service directories for license information.
