toto 0d1a6a81e0 docs: update thesis with EIF, autoencoders, ensemble architecture, quantile drift
- §2.4.2: Add Extended Isolation Forest theory (Hariri et al., TKDE 2021)
- §2.4.2b: New section on autoencoders for network anomaly detection
  (Kitsune, β-VAE, hybrid AE+IF studies)
- §2.4.2c: New section on hybrid supervised+unsupervised ensembles
  (triple-voice architecture: EIF + AE + XGBoost + meta-learner)
- §2.4.3: Enhanced drift detection with quantile digest and validation gate
- §6.2: Multi-level baseline contamination mitigation
- §7: Updated conclusion reflecting ensemble architecture
- §8: 10 new references (Hariri 2021, Mirsky 2018, Baptiste 2026, etc.)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-08 02:23:00 +02:00

ja4-platform

ja4-platform is a monorepo security pipeline for TLS fingerprinting (JA4/JA3) and bot detection. It captures live network traffic, correlates TLS handshakes with HTTP requests, detects anomalous behavior using machine learning (Isolation Forest), and presents results through a SOC analyst dashboard — all backed by ClickHouse as the central data store.

Pipeline Overview

  ┌─────────────────────────────────────────────────────────────────────────────┐
  │                           Linux Server (Apache)                            │
  │                                                                            │
  │  ┌─────────────────┐        ┌─────────────────────┐                        │
  │  │  mod-reqin-log   │───────▶│  UNIX socket (HTTP) │──┐                    │
  │  │  (Apache module) │  JSON  │  /var/run/logcorr/   │  │                    │
  │  │  C · httpd DSO   │        │  http.socket          │  │                    │
  │  └─────────────────┘        └─────────────────────┘  │                    │
  │                                                       ▼                    │
  │  ┌─────────────────┐        ┌─────────────────────┐  ┌──────────────────┐  │
  │  │  sentinel        │───────▶│  UNIX socket (TLS)  │─▶│  correlator      │  │
  │  │  (TLS capture)   │  JSON  │  /var/run/logcorr/   │  │  (event join)    │  │
  │  │  Go · libpcap    │        │  network.socket      │  │  Go · hex. arch  │  │
  │  └─────────────────┘        └─────────────────────┘  └────────┬─────────┘  │
  │                                                                │            │
  └────────────────────────────────────────────────────────────────┼────────────┘
                                                                   │ INSERT
                                                                   ▼
                                                         ┌──────────────────┐
                                                         │   ClickHouse     │
                                                         │   ja4_processing    │
                                                         │   (all tables)   │
                                                         └────────┬─────────┘
                                                                   │ SELECT
                                              ┌────────────────────┼────────────────────┐
                                              ▼                                         ▼
                                    ┌──────────────────┐                      ┌──────────────────┐
                                    │  bot-detector     │                      │  dashboard        │
                                    │  (ML anomaly det) │                      │  (SOC web UI)     │
                                    │  Python · sklearn  │                      │  FastAPI + React  │
                                    └──────────────────┘                      └──────────────────┘

Services

Service Language Purpose Interface
sentinel Go Live TLS packet capture, JA4/JA3 fingerprint generation UNIX socket (network.socket)
mod-reqin-log C Apache HTTPD module, HTTP request JSON logging UNIX socket (http.socket)
correlator Go Joins HTTP + TLS events by src_ip:src_port + time window ClickHouse INSERT, file, stdout
bot-detector Python Isolation Forest ML anomaly detection on aggregated traffic ClickHouse read/write, HTTP :8080
dashboard Python/JS SOC analyst web dashboard (FastAPI + React) HTTP :8000

Shared Libraries

Library Language Description
go/ja4common Go Logger, config loader, shutdown handler, IP filter
python/ja4_common Python ClickHouse client singleton, settings

Quickstart

Prerequisites

  • Docker (with BuildKit) and Docker Compose
  • make
  • No native Go, Python, or C toolchains required — all builds run inside Docker

Build All Services

make build-all

Run All Tests

make test-all

Build RPM Packages

make rpm-all
# RPMs written to services/<service>/dist/

Documentation

Document Description
Architecture System architecture, data flow, component interactions
Development Build, test, package, and extend the platform
Database Schema Every ClickHouse table, view, dictionary, and materialized view
Database Migrations Migration order, application, verification, and rollback

Service Documentation

Shared Library Documentation

Go Workspace

The repository uses a Go workspace (go.work) to link the Go modules:

go 1.21

use (
    ./services/sentinel
    ./services/correlator
    ./shared/go/ja4common
)

License

See individual service directories for license information.

Description
No description provided
Readme 22 MiB
Languages
Python 38.2%
HTML 24.8%
Go 16.1%
Shell 15.1%
C 3.5%
Other 2.3%