Complete implementation of HTTP/2 passive fingerprinting per thesis §2.5.3:

mod-reqin-log (C module):
- Replace connection-level filter with ap_hook_process_connection
  (APR_HOOK_FIRST) to capture H2 preface before mod_http2 takes over
  the connection
- AP_MODE_SPECULATIVE read of 512 bytes from c->input_filters
- Parse SETTINGS, WINDOW_UPDATE, PRIORITY flags, pseudo-header order
- Output individual SETTINGS params as separate JSON fields (IDs 1-6, 8)
- Read H2 notes from c1 (master connection) for mod_http2 secondary conns
- Fix header_order_signature JSON length bug (26→strlen)

ClickHouse schema:
- Add 8 new columns to http_logs: h2_has_priority, h2_header_table_size,
  h2_enable_push, h2_max_concurrent_streams, h2_initial_window_size,
  h2_max_frame_size, h2_max_header_list_size, h2_enable_connect_protocol
- Use Int32/Int64 with DEFAULT -1 to distinguish absent vs zero
- Update mv_http_logs to extract individual fields via
  JSONHas/JSONExtractInt
- Migration 04_http2_fields.sql updated for existing deployments

Correlator:
- Accept both timestamp_ns and timestamp field names (backward compat)

Integration:
- Enable HTTP/2 in Apache: Protocols h2 http/1.1 in
  httpd-integration.conf

Validated end-to-end via Playwright: H2 curl traffic → mod-reqin-log →
correlator → ClickHouse with all 12 H2 columns populated correctly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
17 lines
558 B
SQL
-- =============================================================================
-- 01_raw_tables.sql — Raw ingest table (direct target for logcorrelator inserts)
-- =============================================================================

CREATE TABLE IF NOT EXISTS ja4_logs.http_logs_raw
(
    `raw_json` String CODEC(ZSTD(3)),
    `ingest_time` DateTime DEFAULT now()
)
ENGINE = MergeTree
PARTITION BY toDate(ingest_time)
ORDER BY ingest_time
TTL ingest_time + INTERVAL 2 HOUR
SETTINGS
    index_granularity = 8192,
    ttl_only_drop_parts = 1;
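
The commit message describes extracting individual HTTP/2 fields from the raw JSON with JSONHas/JSONExtractInt and using -1 to mark a parameter that was absent rather than zero. The actual mv_http_logs definition is not shown here, so the query below is only a minimal sketch of that pattern against the raw table. It assumes the JSON keys in raw_json carry the same names as the columns listed in the commit message, which may not match the real payload.

```sql
-- Sketch only: the JSON key names are assumed to equal the column names
-- from the commit message; the real mv_http_logs may differ.
SELECT
    ingest_time,
    -- -1 means "key absent", so a genuine 0 (e.g. ENABLE_PUSH = 0) stays distinguishable.
    if(JSONHas(raw_json, 'h2_enable_push'),
       JSONExtractInt(raw_json, 'h2_enable_push'), -1) AS h2_enable_push,
    if(JSONHas(raw_json, 'h2_initial_window_size'),
       JSONExtractInt(raw_json, 'h2_initial_window_size'), -1) AS h2_initial_window_size,
    if(JSONHas(raw_json, 'h2_max_frame_size'),
       JSONExtractInt(raw_json, 'h2_max_frame_size'), -1) AS h2_max_frame_size
FROM ja4_logs.http_logs_raw
ORDER BY ingest_time DESC
LIMIT 10;
```

The JSONHas check is what makes the sentinel work: JSONExtractInt alone returns 0 for a missing key, so without the guard an absent parameter would be indistinguishable from one explicitly set to zero.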