toto f6e2d3c0ca feat(bot-detector): implement 8 state-of-art improvements
- EIF: Extended Isolation Forest via isotree (fallback to sklearn IF)
- Benford's Law deviation feature on inter-request timing
- Lag-1 autocorrelation feature for cadence analysis
- Validation gate: reject model if val_anomaly_rate > 20%
- Feature pruning: remove variance < 1e-6 features before training
- Quantile drift: replace N(μ,σ) synthetic with quantile interpolation
- Thread safety: Lock for _service_healthy/_consecutive_failures
- Score normalization: inverted to [0,1] where 1=most anomalous

SQL: add lag1_autocorrelation + benford_deviation to view_thesis_features_1h
Tests: 10 new test functions covering all improvements
Integration: verify_mvs.py checks new thesis feature columns

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-08 02:31:26 +02:00
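
The two timing features named in the commit notes above (lag-1 autocorrelation and Benford's Law deviation on inter-request timing) can be illustrated with a minimal, standard-library-only sketch. The function names, the use of mean absolute deviation as the Benford distance, and the handling of degenerate inputs are assumptions made for illustration; the repository's actual SQL and Python implementations may differ.

```python
import math
from collections import Counter

# Expected first-digit frequencies under Benford's Law: P(d) = log10(1 + 1/d)
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def inter_arrival(timestamps):
    """Gaps between consecutive request timestamps (seconds)."""
    return [b - a for a, b in zip(timestamps, timestamps[1:])]


def lag1_autocorrelation(values):
    """Lag-1 autocorrelation of a sequence; 0.0 when it is undefined."""
    n = len(values)
    if n < 2:
        return 0.0
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    if denom == 0:
        return 0.0  # constant series: assumed convention, not from the source
    num = sum((values[i] - mean) * (values[i + 1] - mean) for i in range(n - 1))
    return num / denom


def leading_digit(x):
    """First significant digit (1-9) of a positive number."""
    # Scientific notation always starts with the leading significant digit.
    return int(f"{x:.15e}"[0])


def benford_deviation(intervals):
    """Mean absolute deviation between observed and Benford first-digit
    frequencies (the distance measure is an assumption)."""
    digits = [leading_digit(x) for x in intervals if x > 0]
    if not digits:
        return 0.0
    counts = Counter(digits)
    total = len(digits)
    return sum(abs(counts.get(d, 0) / total - BENFORD[d]) for d in range(1, 10)) / 9
```

A client that fires requests on a fixed timer produces identical gaps, so all first digits coincide and the Benford deviation is large, while organic traffic tends to spread its gaps over several orders of magnitude.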

ja4-platform

ja4-platform is a monorepo security pipeline for TLS fingerprinting (JA4/JA3) and bot detection. It captures live network traffic, correlates TLS handshakes with HTTP requests, detects anomalous behavior with an Isolation Forest model, and presents the results in a SOC analyst dashboard. ClickHouse serves as the central data store.
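
As a rough illustration of the anomaly-scoring step, the sketch below covers three points from the commit notes at the top of this page: pruning near-constant features (variance below 1e-6), inverting raw scores into [0, 1] with 1 as most anomalous, and rejecting a model whose validation anomaly rate exceeds 20%. It assumes raw scores follow the convention of scikit-learn's `IsolationForest.score_samples`, where lower means more abnormal. The function names, the min-max scaling, the population-variance choice, and the 0.5 flagging threshold are hypothetical and not taken from the bot-detector code.

```python
from statistics import pvariance


def prune_low_variance(features, min_variance=1e-6):
    """Drop feature columns whose variance is below min_variance.

    `features` maps a feature name to its list of values.
    """
    return {
        name: column
        for name, column in features.items()
        if pvariance(column) >= min_variance
    }


def normalize_scores(raw_scores):
    """Map raw scores (lower = more anomalous) to [0, 1], 1 = most anomalous."""
    lo, hi = min(raw_scores), max(raw_scores)
    if hi == lo:
        return [0.0 for _ in raw_scores]  # no spread: nothing stands out
    return [(hi - s) / (hi - lo) for s in raw_scores]


def passes_validation_gate(normalized, threshold=0.5, max_anomaly_rate=0.20):
    """Accept the model only if at most 20% of validation rows are flagged.

    `threshold` is a hypothetical cut-off on the normalized score.
    """
    flagged = sum(1 for s in normalized if s >= threshold)
    return flagged / len(normalized) <= max_anomaly_rate
```

A gate of this kind guards against a model that labels a large share of ordinary validation traffic as anomalous, which usually points to a training problem, not to a real attack.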

Pipeline Overview

  ┌─────────────────────────────────────────────────────────────────────────────┐
  │                           Linux Server (Apache)                            │
  │                                                                            │
  │  ┌─────────────────┐        ┌─────────────────────┐                        │
  │  │  mod-reqin-log   │───────▶│  UNIX socket (HTTP) │──┐                    │
  │  │  (Apache module) │  JSON  │  /var/run/logcorr/   │  │                    │
  │  │  C · httpd DSO   │        │  http.socket          │  │                    │
  │  └─────────────────┘        └─────────────────────┘  │                    │
  │                                                       ▼                    │
  │  ┌─────────────────┐        ┌─────────────────────┐  ┌──────────────────┐  │
  │  │  sentinel        │───────▶│  UNIX socket (TLS)  │─▶│  correlator      │  │
  │  │  (TLS capture)   │  JSON  │  /var/run/logcorr/   │  │  (event join)    │  │
  │  │  Go · libpcap    │        │  network.socket      │  │  Go · hex. arch  │  │
  │  └─────────────────┘        └─────────────────────┘  └────────┬─────────┘  │
  │                                                                │            │
  └────────────────────────────────────────────────────────────────┼────────────┘
                                                                   │ INSERT
                                                                   ▼
                                                         ┌──────────────────┐
                                                         │   ClickHouse     │
                                                         │  ja4_processing  │
                                                         │   (all tables)   │
                                                         └────────┬─────────┘
                                                                   │ SELECT
                                              ┌────────────────────┼────────────────────┐
                                              ▼                                         ▼
                                    ┌──────────────────┐                      ┌──────────────────┐
                                    │  bot-detector     │                      │  dashboard        │
                                    │  (ML anomaly det) │                      │  (SOC web UI)     │
                                    │  Python · sklearn  │                      │  FastAPI + React  │
                                    └──────────────────┘                      └──────────────────┘

Services

| Service | Language | Purpose | Interface |
|---|---|---|---|
| sentinel | Go | Live TLS packet capture, JA4/JA3 fingerprint generation | UNIX socket (network.socket) |
| mod-reqin-log | C | Apache HTTPD module, HTTP request JSON logging | UNIX socket (http.socket) |
| correlator | Go | Joins HTTP + TLS events by src_ip:src_port + time window | ClickHouse INSERT, file, stdout |
| bot-detector | Python | Isolation Forest ML anomaly detection on aggregated traffic | ClickHouse read/write, HTTP :8080 |
| dashboard | Python/JS | SOC analyst web dashboard (FastAPI + React) | HTTP :8000 |
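
The correlator's join rule (match HTTP and TLS events on src_ip:src_port within a time window) can be sketched as follows. The real service is written in Go; this Python version only illustrates the idea. The `Event` type, the `payload` field, the 5-second default window, and the closest-match tie-breaking are assumptions, not the correlator's actual behavior.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    src_ip: str
    src_port: int
    timestamp: float  # seconds since epoch
    payload: dict = field(default_factory=dict)


def correlate(http_events, tls_events, window_seconds=5.0):
    """Pair each HTTP event with the closest TLS event that shares its
    src_ip:src_port and lies within window_seconds of it."""
    # Index TLS events by connection key for constant-time lookup.
    tls_by_key = {}
    for tls in tls_events:
        tls_by_key.setdefault((tls.src_ip, tls.src_port), []).append(tls)

    joined = []
    for http in http_events:
        candidates = [
            tls
            for tls in tls_by_key.get((http.src_ip, http.src_port), [])
            if abs(http.timestamp - tls.timestamp) <= window_seconds
        ]
        if not candidates:
            continue  # unmatched HTTP events are skipped in this sketch
        best = min(candidates, key=lambda tls: abs(http.timestamp - tls.timestamp))
        joined.append(
            {
                "src_ip": http.src_ip,
                "src_port": http.src_port,
                "timestamp": http.timestamp,
                **best.payload,
                **http.payload,
            }
        )
    return joined
```

Keying on the source port as well as the source IP keeps clients behind a shared NAT address apart, and the time window limits false matches when an ephemeral port is reused later.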

Shared Libraries

| Library | Language | Description |
|---|---|---|
| go/ja4common | Go | Logger, config loader, shutdown handler, IP filter |
| python/ja4_common | Python | ClickHouse client singleton, settings |

Quickstart

Prerequisites

  • Docker (with BuildKit) and Docker Compose
  • make
  • No native Go, Python, or C toolchains required — all builds run inside Docker

Build All Services

make build-all

Run All Tests

make test-all

Build RPM Packages

make rpm-all
# RPMs written to services/<service>/dist/

Documentation

| Document | Description |
|---|---|
| Architecture | System architecture, data flow, component interactions |
| Development | Build, test, package, and extend the platform |
| Database Schema | Every ClickHouse table, view, dictionary, and materialized view |
| Database Migrations | Migration order, application, verification, and rollback |

Service Documentation

Shared Library Documentation

Go Workspace

The repository uses a Go workspace (go.work) to link the Go modules:

go 1.21

use (
    ./services/sentinel
    ./services/correlator
    ./shared/go/ja4common
)

License

See individual service directories for license information.
