toto 2f2c5e03bb fix(sql): work around ClickHouse 24.8 scope bug in view_ai_features_1h
- Restructure 07_ai_features_view.sql: single anonymous inner subquery
  with explicit aliases on every column (a.xxx AS xxx, h.xxx AS xxx,
  h2.xxx AS xxx) to resolve the PARTITION BY src_ip ambiguity in the outer SELECT
- Remove the multiple CTEs (h2_agg, enriched) that triggered the bug
- Fix migration 04_http2_fields.sql: DEFAULT must come before CODEC (ClickHouse syntax)
- make init-stack: 0 errors across 13 SQL files

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-10 00:48:05 +02:00

ja4-platform

ja4-platform is a monorepo security pipeline for TLS fingerprinting (JA4/JA3) and bot detection. It captures live network traffic, correlates TLS handshakes with HTTP requests, applies triple-voice ML anomaly detection (Extended Isolation Forest + Autoencoder + XGBoost), and surfaces results through a SOC analyst dashboard. All data is stored in ClickHouse, split across two databases (ja4_logs and ja4_processing).

Pipeline Overview

  ┌──────────────────────────────────────────────────────────────────────────────┐
  │                          Linux Server (Apache)                               │
  │                                                                              │
  │  ┌─────────────────┐        UNIX socket (DGRAM)        ┌──────────────────┐  │
  │  │  mod-reqin-log   │──── http.socket ────────────────▶│                  │  │
  │  │  (Apache C11)    │        (source A)                 │   correlator     │  │
  │  └─────────────────┘                                   │   (Go · hex.     │  │
  │                                                         │    architecture) │  │
  │  ┌─────────────────┐        UNIX socket (DGRAM)        │                  │  │
  │  │  sentinel        │──── network.socket ─────────────▶│   Joins by       │  │
  │  │  (Go · libpcap)  │        (source B)                 │   src_ip:src_port│  │
  │  │  JA4/JA3 gen.    │                                   └────────┬─────────┘  │
  │  └─────────────────┘                                             │            │
  └──────────────────────────────────────────────────────────────────┼────────────┘
                                                                     │ INSERT
                                                                     ▼
                                          ┌──────────────────────────────────────┐
                                          │          ClickHouse 24.8             │
                                          │                                      │
                                          │  ja4_logs          ja4_processing    │
                                          │  ┌──────────┐     ┌──────────────┐  │
                                          │  │_raw → MV │────▶│ agg_* (×6)   │  │
                                          │  │→ http_logs│     │ ml_* (×2)    │  │
                                          │  └──────────┘     │ views, dicts │  │
                                          │                    └──────────────┘  │
                                          └─────────┬───────────────┬────────────┘
                                                    │               │
                                   ┌────────────────┘               └───────────────┐
                                   ▼                                                 ▼
                        ┌────────────────────┐                           ┌────────────────────┐
                        │  bot-detector       │                           │  dashboard          │
                        │  Python 3.11        │                           │  FastAPI + Jinja2   │
                        │  EIF + AE + XGBoost │                           │  htmx + Chart.js    │
                        │  HDBSCAN · SHAP     │                           │  55 routes · 14 pp  │
                        └────────────────────┘                           └────────────────────┘
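
The join performed by the correlator can be illustrated with a small Python sketch. The real correlator is written in Go; the function name, the time window, the ts and ja4 field names, and the choice to drop unmatched HTTP events are all assumptions made for illustration, not the service's actual behavior.

```python
from collections import defaultdict


def correlate(http_events, tls_events, window_s=5.0):
    """Pair each HTTP event with the closest-in-time TLS event on the same src_ip:src_port.

    Hypothetical sketch: each event is a dict with src_ip, src_port and ts (epoch seconds).
    """
    # Index TLS events by the join key shown in the diagram.
    tls_by_key = defaultdict(list)
    for ev in tls_events:
        tls_by_key[(ev["src_ip"], ev["src_port"])].append(ev)

    joined = []
    for http in http_events:
        candidates = tls_by_key.get((http["src_ip"], http["src_port"]), [])
        # Keep only TLS events inside the (assumed) time window.
        in_window = [t for t in candidates if abs(t["ts"] - http["ts"]) <= window_s]
        if not in_window:
            continue  # no TLS handshake seen for this connection
        best = min(in_window, key=lambda t: abs(t["ts"] - http["ts"]))
        joined.append({**http, "ja4": best.get("ja4")})
    return joined


http_events = [
    {"src_ip": "203.0.113.7", "src_port": 51000, "ts": 100.2, "path": "/login"},
    {"src_ip": "203.0.113.9", "src_port": 40000, "ts": 101.0, "path": "/"},
]
tls_events = [
    {"src_ip": "203.0.113.7", "src_port": 51000, "ts": 100.0, "ja4": "example-ja4"},
]
rows = correlate(http_events, tls_events)
```

Only the first HTTP event has a TLS event with the same source address and port, so rows holds a single enriched record.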

Services

Service        Language     Description                                                                    Interface
sentinel       Go 1.24.6    TLS/TCP packet capture via libpcap, JA4/JA3 fingerprint generation             UNIX socket → network.socket
mod-reqin-log  C11          Apache HTTPD module, HTTP request JSON logging                                 UNIX socket → http.socket
correlator     Go 1.24.6    Hexagonal architecture, correlates HTTP+TLS events by src_ip:src_port          ClickHouse INSERT (Native TCP)
bot-detector   Python 3.11  Triple-voice ML ensemble (EIF+AE+XGB), HDBSCAN campaigns, SHAP explainability  ClickHouse read/write, HTTP :8080
dashboard      Python 3.11  SOC analyst dashboard: 55 routes, 15 templates, 14 pages                       HTTP :8000
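
How three detector scores might be combined into one verdict can be sketched as follows. This is a hedged illustration, not the bot-detector's actual logic: the function name, the weights, the threshold and the two-out-of-three vote rule are assumptions, and the three inputs are assumed to be already normalized to [0, 1].

```python
def combine_voices(eif, ae, xgb, weights=(1.0, 1.0, 1.0), threshold=0.5):
    """Combine three anomaly scores in [0, 1] (hypothetical weights and threshold)."""
    scores = (eif, ae, xgb)
    # Weighted mean of the three voices.
    score = sum(w * s for w, s in zip(weights, scores)) / sum(weights)
    # Number of voices that individually reach the threshold.
    votes = sum(s >= threshold for s in scores)
    # Assumed rule: flag the sample when at least two voices agree.
    return {"score": score, "votes": votes, "is_anomaly": votes >= 2}


result = combine_voices(eif=0.9, ae=0.7, xgb=0.2)
```

Here two of the three voices reach the threshold, so the sample is flagged even though one model disagrees.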

Shared Libraries

Library            Language  Description
go/ja4common       Go        Logger, config loader, graceful shutdown handler, IP filter
python/ja4_common  Python    ClickHouseClient singleton, ClickHouseSettings (pydantic-settings)
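
The singleton-plus-settings pattern named for the Python library can be sketched with the standard library alone. The real library uses pydantic-settings and a ClickHouse driver; here a frozen dataclass and functools.lru_cache stand in for them, and the class names, function names, environment variable names and defaults are hypothetical.

```python
import os
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Settings:
    """Stand-in for ClickHouseSettings; field names are assumptions."""
    host: str
    port: int
    database: str


def load_settings(env=os.environ):
    # Hypothetical variable names; the real library may use different ones.
    return Settings(
        host=env.get("CLICKHOUSE_HOST", "localhost"),
        port=int(env.get("CLICKHOUSE_PORT", "9000")),
        database=env.get("CLICKHOUSE_DATABASE", "default"),
    )


class Client:
    """Stand-in for ClickHouseClient; it only stores its settings."""

    def __init__(self, settings):
        self.settings = settings


@lru_cache(maxsize=1)
def get_client():
    # Built on the first call; later calls return the same instance.
    return Client(load_settings())


assert get_client() is get_client()
```

Caching the factory rather than the class keeps the client easy to replace in tests, which is one common reason to prefer this form of singleton.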

Quickstart

Prerequisites

  • Docker (with BuildKit) and Docker Compose
  • make
  • No native Go, Python, or C toolchains required — all builds run inside Docker

Build All Services

make build-all

Run All Tests

make test-all

Build RPM Packages

make rpm-all
# RPMs written to services/<service>/dist/rpm/el{8,9,10}/

Scripts

Helper scripts are located in scripts/:

Script                   Description
init-stack.sh            Full ClickHouse stack initialization: deploys schema, loads CSV data, verifies all components
import-prod-data.sh      Imports pre-exported production data into the dev database with dynamic date shifting
reload-prod-logs.sh      Exports http_logs from production and re-imports into the dev database
update-csv-data.sh       Downloads and generates all CSV reference data (bot IPs, JA4 signatures, ASN reputation)
generate_bot_ip.py       Generates bot_ip.csv from known scanner/bot sources + Tor exit nodes
generate_bot_ja4.py      Generates bot_ja4.csv from known bot TLS fingerprints
generate_asn_data.py     Generates asn_reputation.csv (ASN→label mapping)
generate_browser_ja4.py  Generates browser JA4 reference data for legitimate browser detection

Corresponding Makefile targets:

make init-stack        # runs scripts/init-stack.sh
make import-prod-data  # runs scripts/import-prod-data.sh
make init-and-import   # init-stack + import-prod-data
make reload-prod-logs  # runs scripts/reload-prod-logs.sh
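
One plausible reading of the "dynamic date shifting" done by import-prod-data.sh is that every timestamp is moved forward by the same offset, so that the newest exported record lands at the current time. The sketch below shows that idea only; it is an assumption about the script's behavior, and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def shift_to_now(timestamps, now=None):
    """Shift all timestamps by one common offset so the latest equals `now`."""
    if not timestamps:
        return []
    now = now or datetime.now(timezone.utc)
    offset = now - max(timestamps)
    # A single offset preserves the gaps between records.
    return [ts + offset for ts in timestamps]


exported = [
    datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc),
    datetime(2026, 1, 1, 12, 30, tzinfo=timezone.utc),
]
target = datetime(2026, 4, 10, 0, 0, tzinfo=timezone.utc)
shifted = shift_to_now(exported, now=target)

assert shifted[-1] == target
assert shifted[1] - shifted[0] == timedelta(minutes=30)
```

Shifting by a single offset keeps relative timing intact, which matters when time-windowed aggregates are computed over the imported data.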

Integration Tests

Full-stack integration tests run against Docker Compose with a real ClickHouse instance:

make test-integration          # 8 phases: build → start → schema → traffic → pipeline → dashboard → bot-detector → sentinel
make test-integration-keep     # same, but leaves the stack running afterwards
make test-integration-down     # tear down integration stack

The integration test suite is located in tests/integration/ and resets the database between runs.

Documentation

Document             Description
Architecture         System architecture, data flow, component interactions
Deployment           Step-by-step production deployment guide
Development          Build, test, package, and extend the platform
Database Schema      Every ClickHouse table, view, dictionary, and materialized view
Database Migrations  Migration order, application, verification, and rollback
Commenting Standard  Code commenting conventions (French comments, English identifiers)
Thesis Reference     Academic reference: HTTP traffic detection techniques
Audit vs Thesis      Comparison between platform implementation and thesis techniques

Service Documentation

  • Sentinel — TLS/TCP capture daemon (Go + libpcap)
  • mod-reqin-log — Apache HTTP logging module (C11)
  • Correlator — HTTP/TLS event correlation engine (Go)
  • Bot Detector — Triple-voice ML anomaly detection (Python)
  • Dashboard — SOC analyst dashboard and API (FastAPI)

Shared Library Documentation

  • go-ja4common — Go shared library (logger, config, shutdown, ipfilter)
  • python-ja4common — Python shared library (ClickHouse client, settings)

Go Workspace

The repository uses a Go workspace (go.work) to link the Go modules:

go 1.24.6

use (
    ./services/sentinel
    ./services/correlator
    ./shared/go/ja4common
)

Both Go services have a replace directive in their go.mod pointing to ../../shared/go/ja4common. The workspace takes precedence for local development; the replace is needed for Docker builds where go.work is not available.

License

See individual service directories for license information.
