toto 3ae8c7d9c9 feat(bot-detector): upgrade to state-of-the-art detection pipeline
- Fix UnboundLocalError on global _consecutive_failures/_service_healthy
- Add SQL identifier validation for DB names at startup
- Replace Z-score drift detection with KS test (scipy.stats.ks_2samp)
- Replace DBSCAN with HDBSCAN (adaptive clustering, no epsilon needed)
- Fix NaN→0 blanket imputation with per-feature median/sentinel strategy
- Add 80/20 temporal train/validation split with offline metrics logging
- Integrate thesis §5 features from view_thesis_features_1h:
  path_transition_entropy, cadence_cv, burst/pause ratios,
  host_diversity, host_sweep_speed, host_coverage_uniformity,
  ja4_drift_ratio (Complet model only)
- Add SOC feedback loop: read classifications from audit_logs,
  reclassify FP IPs as human, exclude TP IPs from baseline
- Update dependencies: clickhouse-connect 0.8.12, scikit-learn 1.6.1,
  pandas 2.2.3, shap 0.47.2, add scipy>=1.14, hdbscan>=0.8.38

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-08 02:09:18 +02:00
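
The startup validation of database names mentioned in the commit above could look roughly like the following sketch. The function names `validate_identifier` and `build_count_query`, the identifier pattern, and the 64-character cap are assumptions made for illustration, not the repository's actual code.

```python
import re

# Assumed rule for this sketch: letters, digits and underscores only,
# and the name must not start with a digit.
_IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
_MAX_LEN = 64  # arbitrary cap chosen for this sketch


def validate_identifier(name: str) -> str:
    """Return name unchanged if it looks like a safe SQL identifier, else raise."""
    if len(name) > _MAX_LEN or not _IDENTIFIER_RE.fullmatch(name):
        raise ValueError(f"unsafe SQL identifier: {name!r}")
    return name


def build_count_query(database: str, table: str) -> str:
    """Interpolate identifiers only after validation.

    Values (as opposed to identifiers) should still go through query parameters.
    """
    db = validate_identifier(database)
    tbl = validate_identifier(table)
    return f"SELECT count() FROM {db}.{tbl}"


if __name__ == "__main__":
    print(build_count_query("ja4_processing", "audit_logs"))
```

Validating once at startup means a misconfigured database name fails fast instead of surfacing later as a malformed query.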

ja4-platform

ja4-platform is a monorepo security pipeline for TLS fingerprinting (JA4/JA3) and bot detection. It captures live network traffic, correlates TLS handshakes with HTTP requests, detects anomalous behavior using machine learning (Isolation Forest), and presents results through a SOC analyst dashboard — all backed by ClickHouse as the central data store.

Pipeline Overview

  ┌─────────────────────────────────────────────────────────────────────────────┐
  │                           Linux Server (Apache)                            │
  │                                                                            │
  │  ┌─────────────────┐        ┌─────────────────────┐                        │
  │  │  mod-reqin-log   │───────▶│  UNIX socket (HTTP) │──┐                    │
  │  │  (Apache module) │  JSON  │  /var/run/logcorr/   │  │                    │
  │  │  C · httpd DSO   │        │  http.socket          │  │                    │
  │  └─────────────────┘        └─────────────────────┘  │                    │
  │                                                       ▼                    │
  │  ┌─────────────────┐        ┌─────────────────────┐  ┌──────────────────┐  │
  │  │  sentinel        │───────▶│  UNIX socket (TLS)  │─▶│  correlator      │  │
  │  │  (TLS capture)   │  JSON  │  /var/run/logcorr/   │  │  (event join)    │  │
  │  │  Go · libpcap    │        │  network.socket      │  │  Go · hex. arch  │  │
  │  └─────────────────┘        └─────────────────────┘  └────────┬─────────┘  │
  │                                                                │            │
  └────────────────────────────────────────────────────────────────┼────────────┘
                                                                   │ INSERT
                                                                   ▼
                                                         ┌──────────────────┐
                                                         │   ClickHouse     │
                                                         │   ja4_processing    │
                                                         │   (all tables)   │
                                                         └────────┬─────────┘
                                                                   │ SELECT
                                              ┌────────────────────┼────────────────────┐
                                              ▼                                         ▼
                                    ┌──────────────────┐                      ┌──────────────────┐
                                    │  bot-detector     │                      │  dashboard        │
                                    │  (ML anomaly det) │                      │  (SOC web UI)     │
                                    │  Python · sklearn  │                      │  FastAPI + React  │
                                    └──────────────────┘                      └──────────────────┘

Services

| Service | Language | Purpose | Interface |
|---|---|---|---|
| sentinel | Go | Live TLS packet capture, JA4/JA3 fingerprint generation | UNIX socket (network.socket) |
| mod-reqin-log | C | Apache HTTPD module, HTTP request JSON logging | UNIX socket (http.socket) |
| correlator | Go | Joins HTTP + TLS events by src_ip:src_port + time window | ClickHouse INSERT, file, stdout |
| bot-detector | Python | Isolation Forest ML anomaly detection on aggregated traffic | ClickHouse read/write, HTTP :8080 |
| dashboard | Python/JS | SOC analyst web dashboard (FastAPI + React) | HTTP :8000 |
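
To make the correlator's join rule concrete, here is a minimal sketch of matching HTTP and TLS events on `src_ip:src_port` within a time window. The real correlator is written in Go; this Python version, the `Event` type, the `correlate` function, the 10-second default window, and the sample field names are all assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Event:
    src_ip: str
    src_port: int
    ts: float       # event time in seconds
    payload: dict   # remaining event fields (hypothetical names)


def correlate(http_events: List[Event], tls_events: List[Event],
              window_s: float = 10.0) -> List[dict]:
    """Pair each HTTP event with the closest-in-time TLS event that shares
    the same (src_ip, src_port) and lies within window_s seconds."""
    tls_by_key: Dict[Tuple[str, int], List[Event]] = {}
    for ev in tls_events:
        tls_by_key.setdefault((ev.src_ip, ev.src_port), []).append(ev)

    joined: List[dict] = []
    for http in http_events:
        candidates = [
            t for t in tls_by_key.get((http.src_ip, http.src_port), [])
            if abs(t.ts - http.ts) <= window_s
        ]
        if not candidates:
            continue  # unmatched HTTP event; this sketch simply drops it
        best = min(candidates, key=lambda t: abs(t.ts - http.ts))
        joined.append({**best.payload, **http.payload,
                       "src_ip": http.src_ip, "src_port": http.src_port})
    return joined


if __name__ == "__main__":
    tls = [Event("203.0.113.7", 51234, 100.0, {"ja4": "placeholder-fingerprint"})]
    http = [
        Event("203.0.113.7", 51234, 100.4, {"path": "/login"}),
        Event("203.0.113.7", 51235, 100.4, {"path": "/"}),  # different port: no match
    ]
    print(correlate(http, tls))
```

Keying on the source port as well as the source IP keeps concurrent connections from the same client apart, and the time window guards against a reused port being matched to a stale handshake.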

Shared Libraries

| Library | Language | Description |
|---|---|---|
| go/ja4common | Go | Logger, config loader, shutdown handler, IP filter |
| python/ja4_common | Python | ClickHouse client singleton, settings |
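
A client singleton of the kind listed for python/ja4_common is often built around a lazily initialized, lock-protected module global. The sketch below shows that general pattern; it is not the library's actual code, and the `get_client` signature with an injected `factory` is an assumption chosen to keep the example self-contained.

```python
import threading
from typing import Any, Callable, Optional

_client: Optional[Any] = None
_lock = threading.Lock()


def get_client(factory: Callable[[], Any]) -> Any:
    """Return the process-wide client, creating it with factory on first use.

    Double-checked locking: the fast path avoids taking the lock once the
    client exists, and the second check prevents two threads from both
    creating a client.
    """
    global _client
    if _client is None:
        with _lock:
            if _client is None:
                _client = factory()
    return _client


if __name__ == "__main__":
    # Stand-in factory; a real one would open a ClickHouse connection.
    first = get_client(lambda: object())
    second = get_client(lambda: object())
    print(first is second)
```

In a real service the factory would typically wrap a call such as `clickhouse_connect.get_client(host=..., database=...)`, with the connection settings read from the shared settings module.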

Quickstart

Prerequisites

  • Docker (with BuildKit) and Docker Compose
  • make
  • No native Go, Python, or C toolchains required — all builds run inside Docker

Build All Services

make build-all

Run All Tests

make test-all

Build RPM Packages

make rpm-all
# RPMs written to services/<service>/dist/

Documentation

| Document | Description |
|---|---|
| Architecture | System architecture, data flow, component interactions |
| Development | Build, test, package, and extend the platform |
| Database Schema | Every ClickHouse table, view, dictionary, and materialized view |
| Database Migrations | Migration order, application, verification, and rollback |

Service Documentation

Shared Library Documentation

Go Workspace

The repository uses a Go workspace (go.work) to link the Go modules:

go 1.21

use (
    ./services/sentinel
    ./services/correlator
    ./shared/go/ja4common
)

License

See individual service directories for license information.

Languages
Python 38.2%
HTML 24.8%
Go 16.1%
Shell 15.1%
C 3.5%
Other 2.3%