toto f704541f83 feat(h2): direct per-parameter SETTINGS comparison in browser_matcher
- Rewrote _d1_h2_settings() with 3-signal weighted formula:
  direct_score×0.60 + dict_match×0.30 + ja4_coherence×0.10
  when individual SETTINGS cols are available in the DataFrame
- Added _H2_SETTINGS_COLS dict (IDs 1,2,3,4,5,6,8 → column names)
- Fallback to dict_match×0.80 + ja4_coherence×0.20 for backward compat
- Fix view_ai_features_1h: pass 7 individual SETTINGS columns through
  base_data CTE (h2_header_table_size, h2_enable_push,
  h2_max_concurrent_streams, h2_initial_window_size, h2_max_frame_size,
  h2_max_header_list_size, h2_enable_connect_protocol)
- Remove non-existent h2_dict_confidence reference from view SQL
  (dict_browser_h2 only exposes browser_family attribute)
- Add 7 new pytest cases: exact match, one wrong setting, forbidden key
  penalty, unknown fingerprint with correct settings, fallback path,
  CDN proxy neutralisation, full Chrome simulation
- 53/53 bot-detector tests pass
- Update thesis §3.9.2: document direct comparison algorithm + fallback

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-11 03:05:36 +02:00

ja4-platform

ja4-platform is a monorepo security pipeline for TLS fingerprinting (JA4/JA3) and bot detection. It captures live network traffic, correlates TLS handshakes with HTTP requests, applies triple-voice ML anomaly detection (Extended Isolation Forest + Autoencoder + XGBoost), and surfaces results through a SOC analyst dashboard — all backed by ClickHouse with a dual-database architecture.

Pipeline Overview

  ┌──────────────────────────────────────────────────────────────────────────────┐
  │                          Linux Server (Apache)                               │
  │                                                                              │
  │  ┌─────────────────┐        UNIX socket (DGRAM)        ┌──────────────────┐  │
  │  │  mod-reqin-log   │──── http.socket ────────────────▶│                  │  │
  │  │  (Apache C11)    │        (source A)                 │   correlator     │  │
  │  └─────────────────┘                                   │   (Go · hex.     │  │
  │                                                         │    architecture) │  │
  │  ┌─────────────────┐        UNIX socket (DGRAM)        │                  │  │
  │  │  sentinel        │──── network.socket ─────────────▶│   Joins by       │  │
  │  │  (Go · libpcap)  │        (source B)                 │   src_ip:src_port│  │
  │  │  JA4/JA3 gen.    │                                   └────────┬─────────┘  │
  │  └─────────────────┘                                             │            │
  └──────────────────────────────────────────────────────────────────┼────────────┘
                                                                     │ INSERT
                                                                     ▼
                                          ┌──────────────────────────────────────┐
                                          │          ClickHouse 24.8             │
                                          │                                      │
                                          │  ja4_logs          ja4_processing    │
                                          │  ┌──────────┐     ┌──────────────┐  │
                                          │  │_raw → MV │────▶│ agg_* (×6)   │  │
                                          │  │→ http_logs│     │ ml_* (×2)    │  │
                                          │  └──────────┘     │ views, dicts │  │
                                          │                    └──────────────┘  │
                                          └─────────┬───────────────┬────────────┘
                                                    │               │
                                   ┌────────────────┘               └───────────────┐
                                   ▼                                                 ▼
                        ┌────────────────────┐                           ┌────────────────────┐
                        │  bot-detector       │                           │  dashboard          │
                        │  Python 3.11        │                           │  FastAPI + Jinja2   │
                        │  EIF + AE + XGBoost │                           │  htmx + Chart.js    │
                        │  HDBSCAN · SHAP     │                           │  55 routes · 14 pp  │
                        └────────────────────┘                           └────────────────────┘

Services

| Service | Language | Description | Interface |
| --- | --- | --- | --- |
| sentinel | Go 1.24.6 | TLS/TCP packet capture via libpcap, JA4/JA3 fingerprint generation | UNIX socket → network.socket |
| mod-reqin-log | C11 | Apache HTTPD module, HTTP request JSON logging | UNIX socket → http.socket |
| correlator | Go 1.24.6 | Hexagonal architecture, correlates HTTP+TLS events by src_ip:src_port | ClickHouse INSERT (Native TCP) |
| bot-detector | Python 3.11 | Triple-voice ML ensemble (EIF+AE+XGB), HDBSCAN campaigns, SHAP explainability | ClickHouse read/write, HTTP :8080 |
| dashboard | Python 3.11 | SOC analyst dashboard: 55 routes, 15 templates, 14 pages | HTTP :8000 |
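
The correlator is described as joining HTTP and TLS events by src_ip:src_port. The sketch below illustrates one way such a join could work; it is not the correlator's implementation. The types `HTTPEvent`, `TLSEvent`, `Correlated`, and `Correlator`, their fields, and the placeholder JA4 value are all hypothetical, and the sketch omits the expiry of stale entries that a long-running service would need.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// Hypothetical, trimmed-down event shapes for illustration.
type HTTPEvent struct {
	SrcIP   string
	SrcPort int
	Method  string
	Path    string
}

type TLSEvent struct {
	SrcIP   string
	SrcPort int
	JA4     string
}

// Correlated pairs one HTTP request with the fingerprint of its connection.
type Correlated struct {
	HTTP HTTPEvent
	JA4  string
}

// joinKey builds the "src_ip:src_port" key shared by both event sources.
func joinKey(ip string, port int) string {
	return net.JoinHostPort(ip, strconv.Itoa(port))
}

// Correlator remembers TLS events per connection and buffers HTTP events
// that arrive before their TLS counterpart.
type Correlator struct {
	tls     map[string]TLSEvent
	pending map[string][]HTTPEvent
}

func NewCorrelator() *Correlator {
	return &Correlator{tls: map[string]TLSEvent{}, pending: map[string][]HTTPEvent{}}
}

// OnTLS records the handshake and releases any HTTP events waiting on it.
func (c *Correlator) OnTLS(e TLSEvent) []Correlated {
	k := joinKey(e.SrcIP, e.SrcPort)
	c.tls[k] = e
	waiting := c.pending[k]
	delete(c.pending, k)
	out := make([]Correlated, 0, len(waiting))
	for _, h := range waiting {
		out = append(out, Correlated{HTTP: h, JA4: e.JA4})
	}
	return out
}

// OnHTTP emits a match right away if the handshake is known, else buffers.
func (c *Correlator) OnHTTP(e HTTPEvent) (Correlated, bool) {
	k := joinKey(e.SrcIP, e.SrcPort)
	if t, ok := c.tls[k]; ok {
		return Correlated{HTTP: e, JA4: t.JA4}, true
	}
	c.pending[k] = append(c.pending[k], e)
	return Correlated{}, false
}

func main() {
	c := NewCorrelator()
	req := HTTPEvent{SrcIP: "192.0.2.10", SrcPort: 51234, Method: "GET", Path: "/"}
	if _, ok := c.OnHTTP(req); !ok {
		fmt.Println("HTTP event buffered, waiting for TLS")
	}
	hello := TLSEvent{SrcIP: "192.0.2.10", SrcPort: 51234, JA4: "ja4-placeholder"}
	for _, m := range c.OnTLS(hello) {
		fmt.Printf("%s %s -> %s\n", m.HTTP.Method, m.HTTP.Path, m.JA4)
	}
}
```

Keeping the TLS entry after the first match lets later requests on the same keep-alive connection resolve immediately, because one handshake can precede many HTTP requests.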

Shared Libraries

| Library | Language | Description |
| --- | --- | --- |
| go/ja4common | Go | Logger, config loader, graceful shutdown handler, IP filter |
| python/ja4_common | Python | ClickHouseClient singleton, ClickHouseSettings (pydantic-settings) |

Quickstart

Prerequisites

  • Docker (with BuildKit) and Docker Compose
  • make
  • No native Go, Python, or C toolchains required — all builds run inside Docker

Build All Services

make build-all

Run All Tests

make test-all

Build RPM Packages

make rpm-all
# RPMs written to services/<service>/dist/rpm/el{8,9,10}/

Scripts

Helper scripts are located in scripts/:

| Script | Description |
| --- | --- |
| init-stack.sh | Full ClickHouse stack initialization: deploys schema, loads CSV data, verifies all components |
| import-prod-data.sh | Imports pre-exported production data into the dev database with dynamic date shifting |
| reload-prod-logs.sh | Exports http_logs from production and re-imports into the dev database |
| update-csv-data.sh | Downloads and generates all CSV reference data (bot IPs, JA4 signatures, ASN reputation) |
| generate_bot_ip.py | Generates bot_ip.csv from known scanner/bot sources + Tor exit nodes |
| generate_bot_ja4.py | Generates bot_ja4.csv from known bot TLS fingerprints |
| generate_asn_data.py | Generates asn_reputation.csv (ASN→label mapping) |
| generate_browser_ja4.py | Generates browser JA4 reference data for legitimate browser detection |

Corresponding Makefile targets:

make init-stack        # runs scripts/init-stack.sh
make import-prod-data  # runs scripts/import-prod-data.sh
make init-and-import   # init-stack + import-prod-data
make reload-prod-logs  # runs scripts/reload-prod-logs.sh

Integration Tests

Full-stack integration tests run against Docker Compose with a real ClickHouse instance:

make test-integration          # 8 phases: build → start → schema → traffic → pipeline → dashboard → bot-detector → sentinel
make test-integration-keep     # same but leaves stack running after
make test-integration-down     # tear down integration stack

The integration test suite is located in tests/integration/ and resets the database between runs.

Documentation

| Document | Description |
| --- | --- |
| Architecture | System architecture, data flow, component interactions |
| Deployment | Step-by-step production deployment guide |
| Development | Build, test, package, and extend the platform |
| Database Schema | Every ClickHouse table, view, dictionary, and materialized view |
| Database Migrations | Migration order, application, verification, and rollback |
| Commenting Standard | Code commenting conventions (French comments, English identifiers) |
| Thesis Reference | Academic reference: HTTP traffic detection techniques |
| Audit vs Thesis | Comparison between platform implementation and thesis techniques |

Service Documentation

  • Sentinel — TLS/TCP capture daemon (Go + libpcap)
  • mod-reqin-log — Apache HTTP logging module (C11)
  • Correlator — HTTP/TLS event correlation engine (Go)
  • Bot Detector — Triple-voice ML anomaly detection (Python)
  • Dashboard — SOC analyst dashboard and API (FastAPI)

Shared Library Documentation

  • go-ja4common — Go shared library (logger, config, shutdown, ipfilter)
  • python-ja4common — Python shared library (ClickHouse client, settings)

Go Workspace

The repository uses a Go workspace (go.work) to link the Go modules:

go 1.24.6

use (
    ./services/sentinel
    ./services/correlator
    ./shared/go/ja4common
)

Both Go services have a replace directive in their go.mod pointing to ../../shared/go/ja4common. The workspace takes precedence for local development; the replace is needed for Docker builds where go.work is not available.
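
For illustration, a service's go.mod could carry the replace directive roughly as follows. This is a sketch under assumptions: the module paths under `example.com` and the `v0.0.0` version are hypothetical placeholders, and only the relative path `../../shared/go/ja4common` and the Go version come from this README.

```
// services/correlator/go.mod (module paths are hypothetical placeholders)
module example.com/ja4-platform/services/correlator

go 1.24.6

require example.com/ja4-platform/shared/go/ja4common v0.0.0

// Relative to the service directory; used when go.work is absent.
replace example.com/ja4-platform/shared/go/ja4common => ../../shared/go/ja4common
```

Running a build with `GOWORK=off` set is one way to check locally that the replace path resolves without the workspace file.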

License

See individual service directories for license information.
