docs: update ClickHouse schema with TTL, MV and users

- README.md: add complete DDL with mabase_prod database
- Add TTL (1 day) on http_logs_raw table
- Add materialized view mv_http_logs for automatic data transfer
- Document users (data_writer, analyst) and grants
- Add migration script for existing data
- architecture.yml: add database, TTL settings, MV, users sections

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
toto
2026-03-03 13:39:47 +01:00
parent 51e1eb8d57
commit 60cd8d87e4
2 changed files with 104 additions and 28 deletions


@@ -194,34 +194,31 @@ All fields from sources A and B are merged at the same level. Les champs d
## ClickHouse Schema
The service uses two ClickHouse tables:
The service uses two ClickHouse tables in the `mabase_prod` database:
### Raw table (`http_logs_raw`)
Ingestion table that stores the raw correlated log as JSON:
### Full setup
```sql
CREATE TABLE http_logs_raw
-- 1. Create the database
CREATE DATABASE IF NOT EXISTS mabase_prod;
-- 2. Raw table with TTL (1-day retention)
DROP TABLE IF EXISTS mabase_prod.http_logs_raw;
CREATE TABLE mabase_prod.http_logs_raw
(
raw_json String
raw_json String,
ingest_time DateTime DEFAULT now()
)
ENGINE = MergeTree
ORDER BY tuple();
```
ORDER BY tuple()
TTL ingest_time + INTERVAL 1 DAY
SETTINGS ttl_only_drop_parts = 1;
**Insert format:** The service sends each correlated log, serialized as JSON, in the `raw_json` column:
-- 3. Parsed table
DROP TABLE IF EXISTS mabase_prod.http_logs;
```sql
INSERT INTO http_logs_raw (raw_json) FORMAT JSONEachRow
{"raw_json":"{\"timestamp\":\"2024-01-01T12:00:00Z\",\"src_ip\":\"192.168.1.1\",\"correlated\":true,...}"}
```
### Enriched table (`http_logs`)
Materialized view that extracts fields from the JSON for analysis:
```sql
CREATE TABLE http_logs
CREATE TABLE mabase_prod.http_logs
(
raw_json String,
@@ -246,7 +243,6 @@ CREATE TABLE http_logs
http_version LowCardinality(String) DEFAULT JSONExtractString(raw_json, 'http_version'),
orphan_side LowCardinality(String) DEFAULT JSONExtractString(raw_json, 'orphan_side'),
-- fields that are "almost always present"
a_timestamp UInt64 DEFAULT JSONExtractUInt(raw_json, 'a_timestamp'),
b_timestamp UInt64 DEFAULT JSONExtractUInt(raw_json, 'b_timestamp'),
conn_id String DEFAULT JSONExtractString(raw_json, 'conn_id'),
@@ -263,12 +259,54 @@ CREATE TABLE http_logs
ja3_hash String DEFAULT JSONExtractString(raw_json, 'ja3_hash'),
ja4 String DEFAULT JSONExtractString(raw_json, 'ja4'),
-- all other JSON fields (dynamic headers, etc.)
extra JSON DEFAULT raw_json
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(log_date)
ORDER BY (log_date, dst_ip, src_ip, time);
-- 4. Materialized view (raw → logs)
DROP VIEW IF EXISTS mabase_prod.mv_http_logs;
CREATE MATERIALIZED VIEW mabase_prod.mv_http_logs
TO mabase_prod.http_logs
AS
SELECT raw_json
FROM mabase_prod.http_logs_raw;
```
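To check that the setup above behaves as intended, a quick smoke test could look like the following. This is a minimal sketch and not part of the commit: the test payload and the `conn_id` value `smoke-test-1` are made up for illustration, and it assumes that the `http_logs` columns not shown in this hunk accept their defaults for such a small payload.

```sql
-- Hypothetical smoke test; the payload and the 'smoke-test-1' id are illustrative only.
INSERT INTO mabase_prod.http_logs_raw (raw_json)
VALUES ('{"conn_id":"smoke-test-1","src_ip":"192.168.1.1"}');

-- If the materialized view is active, the row should also appear in the parsed table.
SELECT count() AS copied_rows
FROM mabase_prod.http_logs
WHERE conn_id = 'smoke-test-1';

-- Show the stored DDL to confirm the TTL clause and the ttl_only_drop_parts setting.
SHOW CREATE TABLE mabase_prod.http_logs_raw;

-- Per-part TTL expiry bounds for the raw table.
SELECT name, rows, delete_ttl_info_min, delete_ttl_info_max
FROM system.parts
WHERE database = 'mabase_prod' AND table = 'http_logs_raw' AND active;
```

With `ttl_only_drop_parts = 1`, expired rows are removed only once every row in a part has expired, so individual rows can remain visible somewhat longer than one day.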
### Users and permissions
```sql
-- Create the users (replace the placeholder passwords)
CREATE USER IF NOT EXISTS data_writer IDENTIFIED WITH sha256_password BY 'TonMotDePasseInsert';
CREATE USER IF NOT EXISTS analyst IDENTIFIED WITH sha256_password BY 'TonMotDePasseAnalyst';
-- Privileges for data_writer (INSERT + SELECT for the MV)
GRANT INSERT(raw_json) ON mabase_prod.http_logs_raw TO data_writer;
GRANT SELECT(raw_json) ON mabase_prod.http_logs_raw TO data_writer;
-- Privileges for analyst (read-only on the parsed logs)
GRANT SELECT ON mabase_prod.http_logs TO analyst;
```
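The grants above can be reviewed after creation. A minimal sketch, assuming the statements are run by an account that is allowed to inspect other users' privileges:

```sql
-- List the privileges actually held by each account.
SHOW GRANTS FOR data_writer;
SHOW GRANTS FOR analyst;

-- Show how each user was defined (authentication type, host restrictions, ...).
SHOW CREATE USER data_writer;
SHOW CREATE USER analyst;
```

If the output for `analyst` shows anything beyond `SELECT` on `mabase_prod.http_logs`, the account has more access than this section intends.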
### Insert format
The service sends each correlated log, serialized as JSON, in the `raw_json` column:
```sql
INSERT INTO mabase_prod.http_logs_raw (raw_json) FORMAT JSONEachRow
{"raw_json":"{\"timestamp\":\"2024-01-01T12:00:00Z\",\"src_ip\":\"192.168.1.1\",\"correlated\":true,...}"}
```
### Migrating existing data
If you already have data in the old `http_logs_raw` table:
```sql
INSERT INTO mabase_prod.http_logs (raw_json)
SELECT raw_json
FROM mabase_prod.http_logs_raw;
```
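After running the migration, comparing row counts gives a rough sanity check. This is only a sketch: the aliases `raw_rows` and `parsed_rows` are illustrative, and the two numbers are not guaranteed to match, since the TTL may already have removed raw rows and the materialized view may already have copied some of them.

```sql
-- Compare the number of rows in the raw table and in the parsed table.
SELECT
    (SELECT count() FROM mabase_prod.http_logs_raw) AS raw_rows,
    (SELECT count() FROM mabase_prod.http_logs)     AS parsed_rows;
```

A `parsed_rows` value well above `raw_rows` may indicate that rows were inserted twice, once by the view and once by the manual migration.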
## Tests