feat: multi-distro VM tests, ja4ebpf eBPF improvements, bot-detector scoring

ja4ebpf:
- Refactor BPF TC capture with improved SYN offset handling and TCP option parsing
- Enhance TLS uprobe SSL hooking for better key extraction
- Add ClickHouse writer improvements for HTTP log materialized views
- Update RPM spec for Rocky Linux 8/9/10, fix systemd service
- Simplify loader with cleaner bpf2go integration

bot-detector:
- Add H2 SETTINGS per-parameter comparison in browser_matcher
- Enhance browser signatures and scoring pipeline
- Improve preprocessing and cycle detection

infra:
- Multi-distro Vagrantfile (centos8, rocky9, rocky10) with per-distro provisioning
- New Makefile targets: vm-up-all, test-vm-matrix, test-vm-centos8/rocky10
- Add debug helpers and run-test-from-host.sh for host-driven VM testing
- Update run-tests-vm.sh for cross-distro compatibility
- Remove accidental binary blob (\004)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
Jacquin Antoine
2026-04-13 01:09:33 +02:00
parent d81463a589
commit d75825278e
32 changed files with 2148 additions and 890 deletions

View File

@ -44,6 +44,8 @@ FEATURES = [
'host_diversity', 'host_sweep_speed', 'host_coverage_uniformity',
# §5.8b — Similarité Jaccard cross-domaine (chemins partagés entre hosts)
'cross_domain_path_similarity',
# §5.4 — Resource Dependency Tree (cascade de chargement)
'root_to_first_asset_delay', 'asset_load_stddev',
# P0+P1 : features sous-exploitées (SQL existant ou ajouté)
'is_fake_navigation',
'true_window_size', 'window_mss_ratio',
@ -59,6 +61,9 @@ FEATURES = [
'h2_order_chromesafari', 'h2_order_firefox',
# §3 — Score de cohérence de fingerprint cross-layer
'fingerprint_coherence_score',
# §3.9.4 — Browser matcher scores (passif H2)
'browser_match_chrome', 'browser_match_firefox', 'browser_match_safari',
'browser_match_max',
]
# Features supplémentaires pour le modèle Complet (données TCP/TLS requises)
@ -103,6 +108,11 @@ def preprocess_df(df: pd.DataFrame) -> pd.DataFrame:
# browser_confidence jusqu'à la validation complète.
if BROWSER_MATCHER_ENABLED:
df = run_browser_matcher(df)
else:
# Colonnes par défaut quand le matcher est désactivé
for col in ['browser_match_chrome', 'browser_match_firefox', 'browser_match_safari',
'browser_match_max', 'browser_family_detected']:
df[col] = 0.0 if col != 'browser_family_detected' else ''
# Rétro-compatibilité
df['is_known_browser'] = browser_axes['axis_ja4_known'].astype(int)