docs: update ClickHouse schema (http_logs_raw + http_logs)

- README.md: document both tables (raw + enriched)
- architecture.yml: describe the complete schema with materialized columns
- http_logs_raw table: raw JSON ingestion (single raw_json column)
- http_logs table: field extraction via DEFAULT JSONExtract*

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
toto
2026-03-03 11:53:13 +01:00
parent 560ee59d85
commit 58b23ccc1e
2 changed files with 203 additions and 36 deletions
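As the commit message states, http_logs_raw ingests the raw JSON through a single raw_json column, and the diff types that column as String. Each JSONEachRow row therefore carries the serialized log as an escaped string value, which means the log is JSON-encoded twice. Here is a minimal Python sketch of building such an insert body, assuming the correlated log is available as a flat dict. The helper to_json_each_row and the sample field values are hypothetical and are not taken from logcorrelator.

```python
import json
from typing import Any, Iterable, Mapping


def to_json_each_row(logs: Iterable[Mapping[str, Any]]) -> str:
    """Build a JSONEachRow body for INSERT INTO http_logs_raw (raw_json).

    Hypothetical helper: each log is first serialized into a compact JSON
    string (the raw_json value), then the row object {"raw_json": "..."}
    is serialized again, one row per line.
    """
    lines = []
    for log in logs:
        raw_json = json.dumps(log, separators=(",", ":"), ensure_ascii=False)
        lines.append(json.dumps({"raw_json": raw_json}, ensure_ascii=False))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical sample log; real field names come from the correlator.
    sample = {"src_ip": "192.0.2.10", "src_port": 51234, "correlated": True}
    print("INSERT INTO http_logs_raw (raw_json) FORMAT JSONEachRow")
    print(to_json_each_row([sample]), end="")
```

To read a row back, json.loads on the line returns a dict whose raw_json value must itself go through json.loads to recover the original log.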


@@ -160,7 +160,7 @@ config:
   clickhouse:
     enabled: true
     dsn: clickhouse://user:pass@localhost:9000/db
-    table: correlated_logs_http_network
+    table: http_logs_raw
     batch_size: 500
     flush_interval_ms: 200
     max_buffer_size: 5000
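The sink settings in this hunk bound the insert pipeline with batch_size, flush_interval_ms and max_buffer_size, and the outputs section describes the same sink as asynchronous batch inserts that drop events on saturation. One plausible reading of these parameters is sketched below in synchronous Python: flush when batch_size events are pending or when flush_interval_ms has elapsed since the last flush, and reject new events once max_buffer_size are buffered. The class BatchBuffer and its methods are hypothetical. The actual logcorrelator implementation is not part of this diff and may differ; for instance, it could drop the oldest events rather than the newest.

```python
from collections import deque
from typing import Any, Deque, List


class BatchBuffer:
    """Hypothetical bounded buffer mirroring batch_size / flush_interval_ms /
    max_buffer_size. An assumed interpretation, not logcorrelator's code."""

    def __init__(self, batch_size: int = 500, flush_interval_ms: int = 200,
                 max_buffer_size: int = 5000) -> None:
        self.batch_size = batch_size
        self.flush_interval_ms = flush_interval_ms
        self.max_buffer_size = max_buffer_size
        self._pending: Deque[Any] = deque()
        self._last_flush_ms = 0
        self.dropped = 0

    def add(self, event: Any) -> bool:
        """Enqueue an event; drop it and return False if the buffer is full."""
        if len(self._pending) >= self.max_buffer_size:
            self.dropped += 1
            return False
        self._pending.append(event)
        return True

    def should_flush(self, now_ms: int) -> bool:
        """True when a full batch is ready or the flush interval has elapsed."""
        if not self._pending:
            return False
        return (len(self._pending) >= self.batch_size
                or now_ms - self._last_flush_ms >= self.flush_interval_ms)

    def take_batch(self, now_ms: int) -> List[Any]:
        """Remove and return up to batch_size events for a single INSERT."""
        n = min(self.batch_size, len(self._pending))
        batch = [self._pending.popleft() for _ in range(n)]
        self._last_flush_ms = now_ms
        return batch
```

In an asynchronous sink, a background worker would poll should_flush and send each take_batch result as one INSERT. The sketch passes time as an explicit now_ms argument so its behavior stays deterministic.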
@@ -246,10 +246,10 @@ outputs:
   clickhouse:
     enabled: true
     description: >
-      Main sink for archiving and near-real-time analysis. Asynchronous
+      Main sink for archiving and near-real-time analysis. Asynchronous
       batch inserts, events dropped on saturation.
     dsn: clickhouse://user:pass@host:9000/db
-    table: correlated_logs_http_network
+    table: http_logs_raw
     batch_size: 500
     flush_interval_ms: 200
     max_buffer_size: 5000
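The commit message says http_logs extracts its fields with DEFAULT JSONExtract* expressions, and the schema hunk lists them column by column. The Python sketch below emulates a handful of those expressions to show what an enriched row looks like. It also shows that missing keys yield '' or 0 rather than NULL, which differs from the removed NULL-on-missing strategy. This is an approximation under stated assumptions. 'time' is taken to be ISO 8601 and converted in UTC, whereas parseDateTimeBestEffort accepts more formats and toDate follows the DateTime's time zone. The names enrich and extract_* are hypothetical. In ClickHouse, a DEFAULT column is computed only when the INSERT omits it and is returned by SELECT *, while a MATERIALIZED column is always computed and hidden from SELECT *. The "materialized columns" wording and the DEFAULT expressions therefore refer to different mechanisms.

```python
import ipaddress
import json
from datetime import date, datetime, timezone
from typing import Any, Dict


def extract_string(doc: Dict[str, Any], key: str) -> str:
    # Like JSONExtractString: '' when the key is missing or not a string.
    value = doc.get(key)
    return value if isinstance(value, str) else ""


def extract_uint(doc: Dict[str, Any], key: str) -> int:
    # Approximates JSONExtractUInt: 0 when the key is missing or is not a
    # non-negative integer (other edge cases are not modelled here).
    value = doc.get(key)
    if isinstance(value, int) and not isinstance(value, bool) and value >= 0:
        return value
    return 0


def extract_bool(doc: Dict[str, Any], key: str) -> int:
    # JSONExtractBool stored in a UInt8 column: 1 for true, otherwise 0.
    return 1 if doc.get(key) is True else 0


def enrich(raw_json: str) -> Dict[str, Any]:
    """Emulate a subset of the http_logs DEFAULT expressions for one raw row."""
    doc = json.loads(raw_json)
    time_str = extract_string(doc, "time")
    # Assumption: 'time' is ISO 8601; fromisoformat raises ValueError otherwise.
    parsed = datetime.fromisoformat(time_str.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive = UTC
    ts = parsed.astimezone(timezone.utc)
    log_date: date = ts.date()  # toDate(time), evaluated in UTC here
    return {
        "time": ts,
        "log_date": log_date,
        "partition": log_date.year * 100 + log_date.month,  # toYYYYMM(log_date)
        # IPv4Address raises on an empty or malformed address.
        "src_ip": ipaddress.IPv4Address(extract_string(doc, "src_ip")),
        "src_port": extract_uint(doc, "src_port"),
        "dst_ip": ipaddress.IPv4Address(extract_string(doc, "dst_ip")),
        "dst_port": extract_uint(doc, "dst_port"),
        "correlated": extract_bool(doc, "correlated"),
        "method": extract_string(doc, "method"),
        "tls_sni": extract_string(doc, "tls_sni"),
    }


if __name__ == "__main__":
    raw = json.dumps({
        "time": "2026-03-03T10:53:13Z", "src_ip": "192.0.2.10", "src_port": 51234,
        "dst_ip": "198.51.100.5", "dst_port": 443, "correlated": True, "method": "GET",
    })
    print(enrich(raw))
```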
@@ -402,28 +402,132 @@ schema:
   clickhouse_schema:
     strategy: external_ddls
     description: >
-      The ClickHouse table is managed outside the service. logcorrelator fills
-      the known columns and sets NULL when a field is missing. All merged fields
-      are exposed in a JSON column (fields).
-    base_columns:
-      - name: timestamp
-        type: DateTime64(9)
-      - name: src_ip
-        type: String
-      - name: src_port
-        type: UInt32
-      - name: dst_ip
-        type: String
-      - name: dst_port
-        type: UInt32
-      - name: correlated
-        type: UInt8
-      - name: orphan_side
-        type: String
-      - name: fields
-        type: JSON
-    dynamic_fields:
-      mode: map_or_additional_columns
+      The ClickHouse table is managed outside the service. Two tables are used:
+      http_logs_raw (ingestion table holding the raw JSON) and http_logs (enriched table
+      with field extraction via materialized columns).
+    tables:
+      - name: http_logs_raw
+        description: >
+          Raw ingestion table. A single raw_json column holds the complete correlated log
+          serialized as JSON. The service inserts via INSERT INTO http_logs_raw (raw_json).
+        engine: MergeTree
+        order_by: tuple()
+        columns:
+          - name: raw_json
+            type: String
+        insert_format: >
+          INSERT INTO http_logs_raw (raw_json) FORMAT JSONEachRow
+          {"raw_json":"{...correlated log serialized as JSON...}"}
+      - name: http_logs
+        description: >
+          Enriched table with fields extracted from the raw JSON via DEFAULT expressions.
+          Partitioned by month, optimized for analytical queries.
+        engine: MergeTree
+        partition_by: toYYYYMM(log_date)
+        order_by: (log_date, dst_ip, src_ip, time)
+        columns:
+          - name: raw_json
+            type: String
+          - name: time_str
+            type: String
+            default: JSONExtractString(raw_json, 'time')
+          - name: timestamp_str
+            type: String
+            default: JSONExtractString(raw_json, 'timestamp')
+          - name: time
+            type: DateTime
+            default: parseDateTimeBestEffort(time_str)
+          - name: log_date
+            type: Date
+            default: toDate(time)
+          - name: src_ip
+            type: IPv4
+            default: toIPv4(JSONExtractString(raw_json, 'src_ip'))
+          - name: src_port
+            type: UInt16
+            default: toUInt16(JSONExtractUInt(raw_json, 'src_port'))
+          - name: dst_ip
+            type: IPv4
+            default: toIPv4(JSONExtractString(raw_json, 'dst_ip'))
+          - name: dst_port
+            type: UInt16
+            default: toUInt16(JSONExtractUInt(raw_json, 'dst_port'))
+          - name: correlated
+            type: UInt8
+            default: JSONExtractBool(raw_json, 'correlated')
+          - name: keepalives
+            type: UInt16
+            default: toUInt16(JSONExtractUInt(raw_json, 'keepalives'))
+          - name: method
+            type: LowCardinality(String)
+            default: JSONExtractString(raw_json, 'method')
+          - name: scheme
+            type: LowCardinality(String)
+            default: JSONExtractString(raw_json, 'scheme')
+          - name: host
+            type: LowCardinality(String)
+            default: JSONExtractString(raw_json, 'host')
+          - name: path
+            type: String
+            default: JSONExtractString(raw_json, 'path')
+          - name: query
+            type: String
+            default: JSONExtractString(raw_json, 'query')
+          - name: http_version
+            type: LowCardinality(String)
+            default: JSONExtractString(raw_json, 'http_version')
+          - name: orphan_side
+            type: LowCardinality(String)
+            default: JSONExtractString(raw_json, 'orphan_side')
+          - name: a_timestamp
+            type: UInt64
+            default: JSONExtractUInt(raw_json, 'a_timestamp')
+          - name: b_timestamp
+            type: UInt64
+            default: JSONExtractUInt(raw_json, 'b_timestamp')
+          - name: conn_id
+            type: String
+            default: JSONExtractString(raw_json, 'conn_id')
+          - name: ip_meta_df
+            type: UInt8
+            default: JSONExtractBool(raw_json, 'ip_meta_df')
+          - name: ip_meta_id
+            type: UInt32
+            default: JSONExtractUInt(raw_json, 'ip_meta_id')
+          - name: ip_meta_total_length
+            type: UInt32
+            default: JSONExtractUInt(raw_json, 'ip_meta_total_length')
+          - name: ip_meta_ttl
+            type: UInt8
+            default: JSONExtractUInt(raw_json, 'ip_meta_ttl')
+          - name: tcp_meta_options
+            type: LowCardinality(String)
+            default: JSONExtractString(raw_json, 'tcp_meta_options')
+          - name: tcp_meta_window_size
+            type: UInt32
+            default: JSONExtractUInt(raw_json, 'tcp_meta_window_size')
+          - name: syn_to_clienthello_ms
+            type: Int32
+            default: toInt32(JSONExtractInt(raw_json, 'syn_to_clienthello_ms'))
+          - name: tls_version
+            type: LowCardinality(String)
+            default: JSONExtractString(raw_json, 'tls_version')
+          - name: tls_sni
+            type: LowCardinality(String)
+            default: JSONExtractString(raw_json, 'tls_sni')
+          - name: ja3
+            type: String
+            default: JSONExtractString(raw_json, 'ja3')
+          - name: ja3_hash
+            type: String
+            default: JSONExtractString(raw_json, 'ja3_hash')
+          - name: ja4
+            type: String
+            default: JSONExtractString(raw_json, 'ja4')
+          - name: extra
+            type: JSON
+            default: raw_json
 architecture:
   description: >