feat: HTTP/2 passive fingerprinting with individual SETTINGS fields

Complete implementation of HTTP/2 passive fingerprinting per thesis §2.5.3:

mod-reqin-log (C module):
- Replace connection-level filter with ap_hook_process_connection (APR_HOOK_FIRST)
  to capture H2 preface before mod_http2 takes over the connection
- AP_MODE_SPECULATIVE read of 512 bytes from c->input_filters
- Parse SETTINGS, WINDOW_UPDATE, PRIORITY flags, pseudo-header order
- Output individual SETTINGS params as separate JSON fields (IDs 1-6, 8)
- Read H2 notes from c1 (master connection) for mod_http2 secondary conns
- Fix header_order_signature JSON length bug (26→strlen)
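
The SETTINGS parsing listed above can be sketched as follows. This is a minimal standalone illustration, not the module's code: it assumes the speculative read has already produced a byte buffer and calls no Apache APIs, and the names `h2_settings_t` and `h2_parse_settings` are hypothetical. It relies only on the RFC 9113 wire format: a 24-byte client preface, a 9-byte frame header (24-bit length, type, flags, stream ID), then SETTINGS entries of a 16-bit ID and a 32-bit value each.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define H2_PREFACE        "PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"
#define H2_PREFACE_LEN    24
#define H2_FRAME_HDR_LEN  9
#define H2_FRAME_SETTINGS 0x04
#define H2_MAX_SETTING_ID 8

/* Hypothetical container: -1 marks a parameter the client did not send. */
typedef struct {
    int64_t value[H2_MAX_SETTING_ID + 1]; /* indexed by SETTINGS ID */
} h2_settings_t;

/* Returns 0 on success, -1 if buf does not hold the client preface
 * followed by a complete SETTINGS frame. */
static int h2_parse_settings(const unsigned char *buf, size_t len,
                             h2_settings_t *out)
{
    size_t i, off, payload_len;

    for (i = 0; i <= H2_MAX_SETTING_ID; i++)
        out->value[i] = -1;

    if (len < H2_PREFACE_LEN + H2_FRAME_HDR_LEN ||
        memcmp(buf, H2_PREFACE, H2_PREFACE_LEN) != 0)
        return -1;

    off = H2_PREFACE_LEN;
    payload_len = ((size_t)buf[off] << 16) |
                  ((size_t)buf[off + 1] << 8) |
                  (size_t)buf[off + 2];
    if (buf[off + 3] != H2_FRAME_SETTINGS)
        return -1;

    off += H2_FRAME_HDR_LEN;
    /* Each entry is 6 bytes; reject frames cut off by the read size. */
    if (payload_len % 6 != 0 || payload_len > len - off)
        return -1;

    for (i = 0; i + 6 <= payload_len; i += 6) {
        const unsigned char *p = buf + off + i;
        unsigned id = ((unsigned)p[0] << 8) | p[1];
        uint32_t val = ((uint32_t)p[2] << 24) | ((uint32_t)p[3] << 16) |
                       ((uint32_t)p[4] << 8) | (uint32_t)p[5];

        /* Unknown IDs are ignored; ID 7 is unassigned and stays -1. */
        if (id >= 1 && id <= H2_MAX_SETTING_ID)
            out->value[id] = (int64_t)val;
    }
    return 0;
}
```

Storing values in `int64_t` keeps the full unsigned 32-bit range available while leaving -1 free as the "absent" marker.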

ClickHouse schema:
- Add 8 new columns to http_logs: h2_has_priority, h2_header_table_size,
  h2_enable_push, h2_max_concurrent_streams, h2_initial_window_size,
  h2_max_frame_size, h2_max_header_list_size, h2_enable_connect_protocol
- Use Int32/Int64 with DEFAULT -1 to distinguish absent vs zero
- Update mv_http_logs to extract individual fields via JSONHas/JSONExtractInt
- Migration 04_http2_fields.sql updated for existing deployments
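
One way the "absent vs zero" distinction could be carried from the C module to the schema is sketched below. It assumes, as the use of JSONHas suggests but the commit does not spell out, that an unseen setting is simply omitted from the JSON so the column falls back to DEFAULT -1. The helper name `json_append_setting` and the JSON key names are hypothetical.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: appends ,"key":value to a JSON object under
 * construction, or nothing at all when value is negative (setting absent).
 * *used is the current string length in dst; cap is the buffer size.
 * Returns 0 on success, -1 if the field does not fit. */
static int json_append_setting(char *dst, size_t cap, size_t *used,
                               const char *key, int64_t value)
{
    int n;

    if (value < 0)
        return 0; /* absent: emit nothing, the column keeps its default */
    if (*used >= cap)
        return -1;

    n = snprintf(dst + *used, cap - *used, ",\"%s\":%" PRId64, key, value);
    if (n < 0 || (size_t)n >= cap - *used) {
        dst[*used] = '\0'; /* drop the partially written field */
        return -1;
    }
    *used += (size_t)n;
    return 0;
}
```

With this shape, a client that sends `SETTINGS_ENABLE_PUSH = 0` produces `"h2_enable_push":0`, while a client that never sends it produces no key, which is the case the -1 default is meant to capture.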

Correlator:
- Accept both timestamp_ns and timestamp field names (backward compat)

Integration:
- Enable HTTP/2 in Apache: Protocols h2 http/1.1 in httpd-integration.conf

Validated end-to-end via Playwright: H2 curl traffic → mod-reqin-log →
correlator → ClickHouse with all 12 H2 columns populated correctly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
toto
2026-04-11 02:33:45 +02:00
parent bd81331411
commit 85d3b95b7b
25 changed files with 649 additions and 160 deletions


@@ -144,7 +144,10 @@ CREATE TABLE IF NOT EXISTS ja4_processing.agg_host_ip_ja4_1h
 )
 ENGINE = AggregatingMergeTree()
 ORDER BY (window_start, src_ip, ja4, host)
-SETTINGS deduplicate_merge_projection_mode = 'drop';
+TTL window_start + INTERVAL 7 DAY
+SETTINGS
+    deduplicate_merge_projection_mode = 'drop',
+    ttl_only_drop_parts = 1;
 -- -----------------------------------------------------------------------------
@@ -177,7 +180,15 @@ SELECT
 sum(IF(match(src.path, '(?i)\.(png|jpg|jpeg|gif|css|js|ico|woff2|svg|eot)$'), 1, 0)) AS count_assets,
 sum(IF(position(src.client_headers, 'Referer') = 0, 1, 0)) AS count_no_referer,
 uniqState(src.header_user_agent) AS uniq_ua,
-0 AS max_requests_per_sec, -- TODO(P0): compute via a per-second subquery (not possible in a single GROUP BY)
+toUInt32(if(count() > 0,
+    arrayMax(
+        arrayMap(
+            s -> toUInt64(countEqual(groupArray(toStartOfSecond(src.time)), s)),
+            arrayDistinct(groupArray(toStartOfSecond(src.time)))
+        )
+    ),
+    0
+)) AS max_requests_per_sec,
 varPopState(toFloat64(length(replaceAll(src.path, '/', '//')) - length(src.path))) AS url_depth_variance,
 sum(IF(src.ip_meta_total_length < 60 OR src.ip_meta_total_length > 1500, 1, 0)) AS count_anomalous_payload,
 uniqState(src.ja3) AS uniq_ja3,
@@ -224,7 +235,9 @@ CREATE TABLE IF NOT EXISTS ja4_processing.agg_header_fingerprint_1h
 sec_fetch_dest SimpleAggregateFunction(any, String)
 )
 ENGINE = AggregatingMergeTree()
-ORDER BY (window_start, src_ip);
+ORDER BY (window_start, src_ip)
+TTL window_start + INTERVAL 7 DAY
+SETTINGS ttl_only_drop_parts = 1;
 DROP VIEW IF EXISTS ja4_processing.mv_agg_header_fingerprint_1h;
@@ -249,3 +262,36 @@ SELECT
 any(src.header_sec_fetch_dest) AS sec_fetch_dest
 FROM ja4_logs.http_logs AS src
 GROUP BY window_start, src.src_ip;
+-- -----------------------------------------------------------------------------
+-- unknown_h2_fingerprints: review queue for unknown H2 signatures (§3.9.5)
+--
+-- Sessions whose H2 fingerprint matches no known family
+-- (browser_match_max < 0.45) but that exhibit browser-like behavior
+-- (browser_confidence ≥ 0.55, Sec-Fetch-* present, TLS 1.3).
+-- Used to progressively enrich browser_signatures.
+-- -----------------------------------------------------------------------------
+CREATE TABLE IF NOT EXISTS ja4_processing.unknown_h2_fingerprints
+(
+    observed_at DateTime DEFAULT now(),
+    src_ip IPv6,
+    ja4 String CODEC(ZSTD(3)),
+    h2_fingerprint String CODEC(ZSTD(3)),
+    h2_settings_fp String CODEC(ZSTD(3)),
+    h2_window_update UInt32,
+    h2_pseudo_order LowCardinality(String),
+    h2_has_priority UInt8,
+    browser_confidence_score Float32,
+    header_user_agent String CODEC(ZSTD(3)),
+    tls_version LowCardinality(String),
+    hit_count UInt64 DEFAULT 1,
+    INDEX idx_observed_at observed_at TYPE minmax GRANULARITY 4
+)
+ENGINE = ReplacingMergeTree(observed_at)
+ORDER BY (h2_fingerprint, ja4, src_ip)
+TTL observed_at + INTERVAL 30 DAY
+SETTINGS
+    index_granularity = 8192,
+    ttl_only_drop_parts = 1;