f0c8fe81c6
feat(ja4ebpf): add multi-interface TC, LPM_TRIE ignore_src, unit tests, and fix bugs
...
- Add multi-interface TC attachment (default "any" = all UP interfaces)
- Add BPF LPM_TRIE map ignored_src for kernel-side CIDR filtering
- Add userspace ignore_src filtering for SSL/accept4 path via net.IPNet.Contains()
- Add AcceptCache for fd→SessionKey correlation with TTL and Close()
- Add 5 test files covering writer, procutil, dispatcher, accept_cache, and cmd
- Fix formatTCPOptions infinite loop on EOL (case 0 break→return)
- Fix pseudoOrderToShort panic on empty slice (negative cap)
- Fix AcceptCache goroutine leak (add done channel + Close())
- Update config.yml.example with interfaces, listen_ports, ignore_src
- Rewrite docs/services/ja4ebpf.md (was massively stale: XDP, RingBuffer, etc.)
- Fix stale XDP/RingBuffer references in docs/architecture.md, thesis, tls.go
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com >
2026-04-16 01:49:26 +02:00
f88b739992
feat(e2e): add distributed E2E test framework with parametric traffic generation
...
Add run-e2e-test.sh with CLI parameters (--hits, --http-ratio, --dns, --tls,
--src-ips, --keep-analysis, --up) for configurable traffic generation. Traffic
runs from VM endpoints with multiple source IPs (alias IPs on eth0) to produce
distinct sessions for the ML pipeline. Fix curl TLS flags (--tlsv1.2 instead
of --tls-v1-2), skip redundant local verification in distributed mode, and
fix dashboard is_available() cache that never retried after ClickHouse recovery.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com >
2026-04-15 00:09:32 +02:00
c1821dcbc4
feat(ml): replace Autoencoder with RealNVP Normalizing Flow and add SessionTransformer embeddings
...
Replace TrafficAutoEncoder (MSE reconstruction scoring) with TrafficNormalizingFlow
(RealNVP via FrEIA, 4 affine coupling blocks, anomaly score = -log p(x)) for
mathematically rigorous density estimation. Add SessionTransformer module producing
32-dimensional sequence embeddings from raw HTTP request sequences (path, method,
timing) via a lightweight TransformerEncoder, replacing path_transition_entropy and
cadence_cv features. Update thesis documentation sections 2.4.2b and 3.8 accordingly.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com >
2026-04-13 15:11:21 +02:00
6e5eb38efd
docs: update thesis and docs with Cleanlab label filtering integration
...
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com >
2026-04-13 02:19:46 +02:00
c60ce97f23
feat(bot-detector): add dynamic browser profiling engine with HDBSCAN clustering
...
Implement offline profile building (profile_builder.py) and real-time
dynamic scoring (browser_matcher_dynamic.py) using HDBSCAN-based browser
fingerprint clustering. Add ClickHouse materialized view (13_h2_profiling.sql)
for h2_profile_stats aggregation. Update thesis and project documentation
to cover the new dynamic profiling architecture.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com >
2026-04-13 02:06:00 +02:00
3b047b680a
fix(ja4ebpf): split bpf2go generate into Ja4Tc + Ja4Ssl, fix RPM systemd-rpm-macros
...
- Use two separate //go:generate directives (Ja4Tc for tc_capture.c, Ja4Ssl
for uprobe_ssl.c) to avoid duplicate LICENSE symbol and multi-file clang issue
- Update loader.go to hold tcObjs/sslObjs separately with correct field names:
UprobeSslSetFd, UprobeSslReadEntry, UretprobeSslReadExit,
KprobeAccept4Entry, KretprobeAccept4Exit
- Add systemd-rpm-macros to all three RPM build stages (el8/el9/el10)
so that %{_unitdir} macro resolves correctly
- RPMs now build successfully for el8, el9, el10
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com >
2026-04-11 23:21:11 +02:00
85d3b95b7b
feat: HTTP/2 passive fingerprinting with individual SETTINGS fields
...
Complete implementation of HTTP/2 passive fingerprinting per thesis §2.5.3:
mod-reqin-log (C module):
- Replace connection-level filter with ap_hook_process_connection (APR_HOOK_FIRST)
to capture H2 preface before mod_http2 takes over the connection
- AP_MODE_SPECULATIVE read of 512 bytes from c->input_filters
- Parse SETTINGS, WINDOW_UPDATE, PRIORITY flags, pseudo-header order
- Output individual SETTINGS params as separate JSON fields (IDs 1-6, 8)
- Read H2 notes from c1 (master connection) for mod_http2 secondary conns
- Fix header_order_signature JSON length bug (26→strlen)
ClickHouse schema:
- Add 8 new columns to http_logs: h2_has_priority, h2_header_table_size,
h2_enable_push, h2_max_concurrent_streams, h2_initial_window_size,
h2_max_frame_size, h2_max_header_list_size, h2_enable_connect_protocol
- Use Int32/Int64 with DEFAULT -1 to distinguish absent vs zero
- Update mv_http_logs to extract individual fields via JSONHas/JSONExtractInt
- Migration 04_http2_fields.sql updated for existing deployments
Correlator:
- Accept both timestamp_ns and timestamp field names (backward compat)
Integration:
- Enable HTTP/2 in Apache: Protocols h2 http/1.1 in httpd-integration.conf
Validated end-to-end via Playwright: H2 curl traffic → mod-reqin-log →
correlator → ClickHouse with all 12 H2 columns populated correctly.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com >
2026-04-11 02:33:45 +02:00
51dd376f7a
docs: mise à jour complète — 7/8 techniques, 85 features, 12 modules
...
Reflète l'état réel du système après les étapes 1-9 du roadmap :
- §5.2 (fleet_detector NetworkX/Louvain) et §5.8 (Jaccard cross-domain) : ✅
- MetaLearner (régression logistique, fallback poids fixes) : documenté
- ExIFFI (profondeur isolation EIF) + erreur AE par feature : documenté
- KL divergence en complément du KS, drift adversarial : documenté
- HTTP/2 fingerprinting (h2_fingerprint, dict_browser_h2, axis_h2_coherence) : documenté
- Métriques de cycle (metrics.py, ml_performance_metrics, alertes) : documenté
- Browser confidence : 5 axes → 6 axes (axis_h2_coherence)
- 85 features (73 FEATURES + 12 FEATURES_COMPLET), 12 modules, 53 routes dashboard
- Conformité thèse : 99.4% (était 97.9%), §5 : 87.5% (était 62.5%)
- Tables nouvelles : fleet_detections, ml_performance_metrics, soc_feedback
- Dictionnaires : 8 (dict_browser_h2 ajouté)
- Dashboard : 16 pages + 37 API routes (fleet, health ajoutés)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com >
2026-04-10 01:31:20 +02:00
c96c41fb45
docs: réécriture complète de la documentation des services en français
...
- bot-detector.md : architecture 11 modules, 77/65 features,
ensemble triple voix (EIF+AE+XGBoost), browser 5 axes, HDBSCAN,
toutes les variables d'environnement vérifiées depuis le code source
- dashboard.md : corrigé stack (Jinja2+htmx, pas React+Vite),
14 pages + 35 API routes + health, dual-database, IPv4/IPv6
- python-ja4common.md : ajouté CLICKHOUSE_DB_PROCESSING/LOGS,
schéma dual-database, note dashboard n'utilise pas ja4_common
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com >
2026-04-09 22:04:58 +02:00
9f3e0621e5
feat: split ClickHouse into dual configurable databases (ja4_logs / ja4_processing)
...
Architecture:
- ja4_logs: raw log ingestion (http_logs_raw, http_logs, mv_http_logs)
- ja4_processing: analytics, aggregation, ML, dictionaries, audit
Configuration (env vars):
- CLICKHOUSE_DB_LOGS (default: ja4_logs)
- CLICKHOUSE_DB_PROCESSING (default: ja4_processing)
Changes:
- SQL migrations (10 files): all mabase_prod refs → ja4_logs or ja4_processing
with correct cross-database references (MVs, views, dicts)
- deploy_schema.sh: substitutes DB names from env vars at deploy time
- Python shared settings: added CLICKHOUSE_DB_LOGS + CLICKHOUSE_DB_PROCESSING
- Dashboard routes (19 files): replaced ~80 hardcoded mabase_prod refs
with settings.CLICKHOUSE_DB_LOGS / settings.CLICKHOUSE_DB_PROCESSING
- Bot-detector: DB → CLICKHOUSE_DB_PROCESSING, fetch_rules.py configurable
- Correlator: DSN example updated to ja4_logs
- Docker-compose + .env files: new env vars with defaults
- All documentation updated (14 markdown files)
All tests pass: sentinel 10/10, correlator 67.1%, bot-detector 11, dashboard 20, ja4_common 18
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com >
2026-04-07 19:10:35 +02:00
d469e39da7
feat: ja4-platform monorepo — 5 services unified, tests & RPM builds standardized
...
Services:
- ja4sentinel: TLS/JA4 fingerprint capture daemon (Go, libpcap)
- logcorrelator: JA4 log correlation engine (Go, ClickHouse)
- mod_reqin_log: Apache module (C, JSON request logging)
- bot_detector: ML bot detection pipeline (Python)
- dashboard: FastAPI/Streamlit analytics UI (Python)
Shared libraries:
- shared/go/ja4common: logger, config, shutdown, ipfilter (Go module)
- shared/python/ja4_common: ClickHouseClient, ClickHouseSettings (Python package)
- shared/clickhouse/: canonical SQL migrations (10 files)
Build & packaging:
- Unified 3-stage Dockerfile.package for Go RPMs (el8/el9/el10)
- go.work workspace linking sentinel, correlator, ja4common
- Makefile with test-all, build-all, rpm-* targets
Fixes applied:
- go.work: 1.21 → 1.24.6 (required by sentinel)
- correlator Dockerfiles: golang:1.21 → golang:1.24
- replace directives in go.mod for ja4common local path
- pyproject.toml: setuptools.backends → setuptools.build_meta
- Removed static libpcap linking (unavailable on Rocky 9)
- Fixed data races in output/writers_test.go (sync.Mutex + atomic.Int32)
- Rewrote corrupted test files (logger_test.go × 2)
Test coverage:
- correlator: 67.1% total (unixsocket 80.5%, config 91.7%, app 83.3%, multi 87.7%, stdout 100%)
- sentinel: all 10 packages pass (api, capture, config, fingerprint, ipfilter, logging, output, tlsparse)
Documentation:
- README.md + docs/ (architecture, development, 5 services, shared libs, DB schema & migrations)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com >
2026-04-07 16:42:59 +02:00