docs: complete rewrite of the database documentation in French

Rewrite of the 3 ClickHouse database documentation files:
- docs/database/schema.md: full coverage of the 2 databases, 14+ tables, 7 dictionaries, 8 MVs, 8 views, TTL, partitions, engines and columns
- docs/database/migrations.md: 13 SQL files (10-12 added), updated prerequisites (ClickHouse 24.8+, 5 CSV files), deploy_schema.sh, init-stack.sh, complete verification and rollback
- shared/clickhouse/README.md: quick reference for the 13 files, deploy_schema.sh, dual-database pattern, prerequisites

Removal of obsolete references: dict_anubis_ua, dict_anubis_country, anubis_ua_rules, anubis_country_rules.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@ -1,256 +1,402 @@
|
||||
# Database Migrations
|
||||
# Database migrations
|
||||
|
||||
The ClickHouse schema for ja4-platform is managed through numbered SQL migration files in `shared/clickhouse/`. Migrations are idempotent (using `IF NOT EXISTS` / `IF EXISTS`) and must be applied in numeric order.
|
||||
The ClickHouse schema for ja4-platform is managed through 13 numbered SQL files in `shared/clickhouse/`. All migrations are **idempotent** (they use `IF NOT EXISTS` / `IF EXISTS` / `CREATE OR REPLACE`) and must be applied in numeric order.
|
||||
|
||||
## Migration Order
|
||||
The schema uses a **dual-database pattern**:
|
||||
|
||||
| File | Purpose |
|
||||
|------|---------|
|
||||
| `00_database.sql` | Creates the `ja4_processing` database |
|
||||
| `01_raw_tables.sql` | Creates `http_logs_raw` ingest table (MergeTree, 1-day TTL) |
|
||||
| `02_dictionaries.sql` | Creates ASN geo dictionary (`dict_iplocate_asn`), bot IP/JA4 reference tables, `ref_bot_networks` |
|
||||
| `03_anubis_tables.sql` | Creates Anubis crawler rule tables (`anubis_ua_rules`, `anubis_ip_rules`, `anubis_asn_rules`, `anubis_country_rules`) and their dictionaries (`dict_anubis_ua`, `dict_anubis_ip`, `dict_anubis_asn`, `dict_anubis_country`) |
|
||||
| `04_mv_http_logs.sql` | Creates the canonical `http_logs` table and `mv_http_logs` materialized view with full Anubis enrichment |
|
||||
| `05_aggregation_tables.sql` | Creates reputation dictionaries (`dict_bot_ip`, `dict_bot_ja4`, `dict_asn_reputation`), behavioral aggregation tables (`agg_host_ip_ja4_1h`, `agg_header_fingerprint_1h`), and their materialized views |
|
||||
| `06_ml_tables.sql` | Creates ML output tables (`ml_detected_anomalies`, `ml_all_scores`) and `view_ip_recurrence` |
|
||||
| `07_ai_features_view.sql` | Creates `view_ai_features_1h` — the 50+ feature view used by bot-detector |
|
||||
| `08_users.sql` | Creates ClickHouse users (`data_writer`, `analyst`) and grants permissions |
|
||||
| `09_audit_table.sql` | Creates `audit_logs` table for SOC dashboard audit trail |
|
||||
| Database | Environment variable | Default | Contents |
|----------|----------------------|---------|----------|
| Logs | `CLICKHOUSE_DB_LOGS` | `ja4_logs` | `http_logs_raw`, `http_logs`, `mv_http_logs` |
| Processing | `CLICKHOUSE_DB_PROCESSING` | `ja4_processing` | Aggregations, ML, views, dictionaries, audit |
|
||||
|
||||
## Prerequisites
|
||||
---
|
||||
|
||||
### 1. ClickHouse Server
|
||||
## Migration order
|
||||
|
||||
A running ClickHouse server (version 23.8+ recommended for `REGEXP_TREE` dictionary support).
|
||||
| File | Lines | Contents |
|------|-------|----------|
| `00_database.sql` | 5 | Creates the `ja4_logs` and `ja4_processing` databases |
| `01_raw_tables.sql` | 16 | `http_logs_raw` ingest table (MergeTree, 2h TTL) |
| `02_dictionaries.sql` | 57 | `dict_iplocate_asn` dictionary (IP_TRIE, CSV), tables `ref_bot_networks`, `bot_ip`, `bot_ja4` |
| `03_anubis_tables.sql` | 73 | Anubis rule tables (`anubis_ip_rules`, `anubis_asn_rules`) and dictionaries (`dict_anubis_ip`, `dict_anubis_asn`) |
| `04_mv_http_logs.sql` | 197 | `http_logs` table (MergeTree, 30-day TTL) + `mv_http_logs` materialized view (JSON parsing + Anubis enrichment, COALESCE IP→ASN) |
| `05_aggregation_tables.sql` | 234 | Reputation dictionaries (`dict_bot_ip`, `dict_bot_ja4`, `dict_browser_ja4`, `dict_asn_reputation`), 2 aggregation tables (`agg_host_ip_ja4_1h`, `agg_header_fingerprint_1h`) + 2 materialized views |
| `06_ml_tables.sql` | 144 | ML tables (`ml_detected_anomalies`, `ml_all_scores`) + `view_ip_recurrence` view |
| `07_ai_features_view.sql` | 156 | `view_ai_features_1h` view (65+ ML features built from the aggregations and dictionaries) |
| `08_users.sql` | 22 | `data_writer` and `analyst` users with permissions |
| `09_audit_table.sql` | 21 | `audit_logs` table for the SOC audit trail |
| `10_perf_indexes.sql` | 113 | Secondary indexes and performance projections (idempotent migration for existing instances) |
| `11_views.sql` | 216 | Dashboard views (`view_dashboard_entities`, `view_dashboard_user_agents`, `view_form_bruteforce_detected`, `view_host_ip_ja4_rotation`, `view_resource_cascade_1h`) |
| `12_thesis_features.sql` | 580 | 4 thesis aggregation tables (`agg_path_sequences_1h`, `agg_request_timing_1h`, `agg_ip_behavior_1h`, `agg_resource_cascade_1h`) + 4 MVs + `view_thesis_features_1h` view |
|
||||
|
||||
### 2. CSV Data Files
|
||||
---
|
||||
|
||||
Place the following files in `/var/lib/clickhouse/user_files/`:
|
||||
## Prerequisites
|
||||
|
||||
| File | Source | Description |
|
||||
|------|--------|-------------|
|
||||
| `iplocate-ip-to-asn.csv` | [IPLocate](https://iplocate.io) | IP-to-ASN mapping with country, org, domain |
|
||||
| `bot_ip.csv` | Custom | Known bot IP prefixes (CIDR format) |
|
||||
| `bot_ja4.csv` | Custom | Known bot JA4 fingerprints |
|
||||
| `asn_reputation.csv` | Custom | ASN reputation labels (`human`, `bot`, `unknown`) |
|
||||
### 1. ClickHouse server
|
||||
|
||||
### 3. Anubis Passwords
|
||||
A running ClickHouse server, **version 24.8+** required (support for AggregatingMergeTree projections with `deduplicate_merge_projection_mode`).
|
||||
|
||||
Migration `03_anubis_tables.sql` contains placeholder passwords (`CHANGE_ME`) for the Anubis dictionaries. Replace these with the actual ClickHouse admin password before applying:
|
||||
### 2. CSV data files

Place the following files in `/var/lib/clickhouse/user_files/`:

| File | Source | Description | Approx. entries |
|------|--------|-------------|-----------------|
| `iplocate-ip-to-asn.csv` | [IPLocate](https://iplocate.io) | IP→ASN mapping with country, org, domain | ~714K |
| `bot_ip.csv` | Custom | Known bot IP prefixes (CIDR format) | ~3.5K |
| `bot_ja4.csv` | Custom | Known bot JA4 fingerprints | ~31 |
| `browser_ja4.csv` | Custom | Legitimate browser JA4 fingerprints | ~1.2K |
| `asn_reputation.csv` | Custom | ASN reputation labels (`human`, `bot`, `unknown`) | ~82K |
|
||||
|
||||
### 3. Anubis passwords

The `03_anubis_tables.sql` file contains default passwords (`CHANGE_ME`) for the ClickHouse-backed Anubis dictionaries. Replace them before applying:
|
||||
|
||||
```bash
|
||||
sed -i "s/CHANGE_ME/your_actual_password/g" 03_anubis_tables.sql
|
||||
sed -i "s/CHANGE_ME/mot_de_passe_réel/g" 03_anubis_tables.sql
|
||||
```
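Because the password ends up inside a `sed` replacement, a value containing `/`, `&` or `\` would either break the command or be altered silently. The sketch below performs a literal substitution with `awk` instead. `replace_placeholder` is a hypothetical helper name, not something shipped in the repository, and it assumes the placeholder is exactly `CHANGE_ME`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): print FILE with every
# literal CHANGE_ME replaced by PASSWORD, with no regex or sed escaping involved.
replace_placeholder() {
    local file="$1" password="$2"
    # The password travels through the environment so awk never
    # interprets backslash sequences in it.
    PW="$password" awk '
        BEGIN { placeholder = "CHANGE_ME"; pw = ENVIRON["PW"] }
        {
            out = ""
            # index() matches the placeholder literally; the loop handles
            # several occurrences on the same line.
            while ((i = index($0, placeholder)) > 0) {
                out = out substr($0, 1, i - 1) pw
                $0 = substr($0, i + length(placeholder))
            }
            print out $0
        }' "$file"
}

# Usage: write to a temporary file, then replace the original.
# replace_placeholder 03_anubis_tables.sql 'p/ss&word' > 03.tmp && mv 03.tmp 03_anubis_tables.sql
```

A password containing a single quote would still need escaping for the SQL string literal itself; this sketch does not handle that case.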
|
||||
|
||||
## How to Apply
|
||||
---
|
||||
|
||||
### Full Initial Setup
|
||||
## How to apply
|
||||
|
||||
Apply all migrations in order:
|
||||
### Recommended method: deploy_schema.sh

The `deploy_schema.sh` script applies the 13 files in order, automatically substituting the database names:
|
||||
|
||||
```bash
|
||||
cd shared/clickhouse/
|
||||
|
||||
clickhouse-client --multiquery < 00_database.sql
|
||||
clickhouse-client --multiquery < 01_raw_tables.sql
|
||||
clickhouse-client --multiquery < 02_dictionaries.sql
|
||||
clickhouse-client --multiquery < 03_anubis_tables.sql
|
||||
clickhouse-client --multiquery < 04_mv_http_logs.sql
|
||||
clickhouse-client --multiquery < 05_aggregation_tables.sql
|
||||
clickhouse-client --multiquery < 06_ml_tables.sql
|
||||
clickhouse-client --multiquery < 07_ai_features_view.sql
|
||||
clickhouse-client --multiquery < 08_users.sql
|
||||
clickhouse-client --multiquery < 09_audit_table.sql
|
||||
# With the default database names (ja4_logs / ja4_processing)
./deploy_schema.sh

# With custom names
CLICKHOUSE_DB_LOGS=my_logs \
CLICKHOUSE_DB_PROCESSING=my_proc \
CLICKHOUSE_HOST=clickhouse-server \
CLICKHOUSE_USER=admin \
CLICKHOUSE_PASSWORD='secret' \
./deploy_schema.sh
|
||||
```
|
||||
|
||||
### With Authentication
|
||||
Supported environment variables:

| Variable | Default | Description |
|----------|---------|-------------|
| `CLICKHOUSE_DB_LOGS` | `ja4_logs` | Name of the logs database |
| `CLICKHOUSE_DB_PROCESSING` | `ja4_processing` | Name of the processing database |
| `CLICKHOUSE_HOST` | `localhost` | ClickHouse host |
| `CLICKHOUSE_PORT` | `9000` | ClickHouse native port |
| `CLICKHOUSE_USER` | `default` | ClickHouse user |
| `CLICKHOUSE_PASSWORD` | (empty) | ClickHouse password |
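
How the script substitutes the database names is not shown here. One plausible approach, sketched below purely as an assumption, is to rewrite the default names with `sed` before piping each file to `clickhouse-client`. `render_migration` is a hypothetical function name, and the real `deploy_schema.sh` may work differently.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the actual deploy_schema.sh logic.
# Prints a migration file with the default database names replaced by the
# values of CLICKHOUSE_DB_LOGS / CLICKHOUSE_DB_PROCESSING (defaults kept if unset).
render_migration() {
    local file="$1"
    local logs_db="${CLICKHOUSE_DB_LOGS:-ja4_logs}"
    local proc_db="${CLICKHOUSE_DB_PROCESSING:-ja4_processing}"
    # Assumption: database names contain only letters, digits and underscores,
    # so they are safe to place inside a sed replacement.
    sed -e "s/ja4_logs/${logs_db}/g" \
        -e "s/ja4_processing/${proc_db}/g" "$file"
}

# Usage (assumed): render_migration 01_raw_tables.sql | clickhouse-client --multiquery
```

The two substitutions run one after the other, so a custom logs name that itself contains `ja4_processing` would be rewritten a second time; a sketch like this only suits names that do not overlap with the defaults.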
|
||||
|
||||
### Alternative method: init-stack.sh

The `scripts/init-stack.sh` script performs a complete initialization, including the schema, migrations, validation and cleanup:
|
||||
|
||||
```bash
|
||||
clickhouse-client --user admin --password 'your_password' --multiquery < 00_database.sql
|
||||
# ... repeat for each file
|
||||
./scripts/init-stack.sh
|
||||
```
|
||||
|
||||
### One-Liner (All at Once)
|
||||
### Manual application
|
||||
|
||||
```bash
|
||||
cd shared/clickhouse/
|
||||
for f in 0*.sql; do
|
||||
echo "Applying $f..."
|
||||
|
||||
for f in 0*.sql 1*.sql; do
|
||||
echo "Application de $f..."
|
||||
clickhouse-client --multiquery < "$f"
|
||||
done
|
||||
```
|
||||
|
||||
## How to Verify
|
||||
With authentication:
|
||||
|
||||
After applying all migrations, run these queries to verify each migration was successful:
|
||||
|
||||
### 00 — Database
|
||||
|
||||
```sql
|
||||
SHOW DATABASES LIKE 'ja4_processing';
|
||||
-- Expected: ja4_processing
|
||||
```bash
|
||||
clickhouse-client --user admin --password 'secret' --multiquery < 00_database.sql
|
||||
# ... repeat for each file
|
||||
```
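
The manual loop keeps going after a failed file, which can leave later migrations applied on top of a broken earlier one. A minimal sketch of a fail-fast variant follows. `apply_migrations` and the `CLICKHOUSE_CLIENT` override are hypothetical names introduced for illustration, and the `NN_name.sql` file pattern is assumed from the file list above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply numbered migrations in order and stop at the
# first failure. CLICKHOUSE_CLIENT may be overridden, e.g. to add
# credentials or to substitute a no-op command for a dry run.
apply_migrations() {
    local dir="$1" f
    # Glob results are sorted, so 00_... runs before 01_... up to 12_...
    for f in "$dir"/[0-9][0-9]_*.sql; do
        [ -e "$f" ] || { echo "no migration files in $dir" >&2; return 1; }
        echo "Applying $(basename "$f")..."
        ${CLICKHOUSE_CLIENT:-clickhouse-client --multiquery} < "$f" || {
            echo "failed: $f" >&2
            return 1
        }
    done
}

# Usage: apply_migrations shared/clickhouse
```

The override is split on whitespace, so a password containing spaces would need a different mechanism (for example a client configuration file).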
|
||||
|
||||
### 01 — Raw Tables
|
||||
---
|
||||
|
||||
## How to verify

After applying all migrations, run these queries to validate each step.
|
||||
|
||||
### 00 — Databases

```sql
SHOW DATABASES LIKE 'ja4%';
-- Expected: ja4_logs, ja4_processing
```
|
||||
|
||||
### 01 — Raw table
|
||||
|
||||
```sql
|
||||
EXISTS ja4_logs.http_logs_raw;
|
||||
-- Expected: 1
|
||||
-- Expected: 1
|
||||
```
|
||||
|
||||
### 02 — Dictionaries
|
||||
### 02 — ASN dictionary + reference tables
|
||||
|
||||
```sql
|
||||
SELECT dictGetOrDefault('ja4_processing.dict_iplocate_asn', 'country_code',
|
||||
toIPv6(toIPv4('8.8.8.8')), 'MISSING');
|
||||
-- Expected: US (if CSV loaded) or MISSING
|
||||
-- Expected: US (if the CSV is loaded) or MISSING

EXISTS ja4_processing.ref_bot_networks;
-- Expected: 1
|
||||
```
|
||||
|
||||
### 03 — Anubis Tables
|
||||
### 03 — Anubis tables
|
||||
|
||||
```sql
|
||||
EXISTS ja4_processing.anubis_ua_rules;
|
||||
EXISTS ja4_processing.anubis_ip_rules;
|
||||
EXISTS ja4_processing.anubis_asn_rules;
|
||||
EXISTS ja4_processing.anubis_country_rules;
|
||||
-- Expected: 1 for each
|
||||
-- Expected: 1 for each
|
||||
```
|
||||
|
||||
### 04 — MV + http_logs
|
||||
### 04 — http_logs + materialized view
|
||||
|
||||
```sql
|
||||
EXISTS ja4_logs.http_logs;
|
||||
SELECT name FROM system.tables WHERE database = 'ja4_logs' AND name = 'mv_http_logs';
|
||||
-- Expected: mv_http_logs
|
||||
-- Expected: mv_http_logs
|
||||
```
|
||||
|
||||
### 05 — Aggregation Tables
|
||||
### 05 — Aggregation tables + reputation dictionaries
|
||||
|
||||
```sql
|
||||
EXISTS ja4_processing.agg_host_ip_ja4_1h;
|
||||
EXISTS ja4_processing.agg_header_fingerprint_1h;
|
||||
SELECT name FROM system.dictionaries WHERE database = 'ja4_processing' AND name = 'dict_bot_ip';
|
||||
-- Expected: dict_bot_ip
|
||||
SELECT name FROM system.dictionaries
WHERE database = 'ja4_processing' AND name IN ('dict_bot_ip', 'dict_bot_ja4', 'dict_browser_ja4', 'dict_asn_reputation');
-- Expected: 4 rows
|
||||
```
|
||||
|
||||
### 06 — ML Tables
|
||||
### 06 — ML tables
|
||||
|
||||
```sql
|
||||
EXISTS ja4_processing.ml_detected_anomalies;
|
||||
EXISTS ja4_processing.ml_all_scores;
|
||||
SELECT name FROM system.tables WHERE database = 'ja4_processing' AND name LIKE 'view_ip%';
|
||||
-- Expected: view_ip_recurrence
|
||||
SELECT name FROM system.tables WHERE database = 'ja4_processing' AND name = 'view_ip_recurrence';
-- Expected: view_ip_recurrence
|
||||
```
|
||||
|
||||
### 07 — AI Features View
|
||||
### 07 — AI features view
|
||||
|
||||
```sql
|
||||
SELECT name FROM system.tables WHERE database = 'ja4_processing' AND name = 'view_ai_features_1h';
|
||||
-- Expected: view_ai_features_1h
|
||||
-- Expected: view_ai_features_1h
|
||||
```
|
||||
|
||||
### 08 — Users
|
||||
### 08 — Users
|
||||
|
||||
```sql
|
||||
SHOW GRANTS FOR data_writer;
|
||||
-- Expected: GRANT INSERT, SELECT ON ja4_logs.http_logs_raw TO data_writer
|
||||
-- Expected: GRANT INSERT, SELECT ON ja4_logs.http_logs_raw TO data_writer
|
||||
SHOW GRANTS FOR analyst;
|
||||
-- Expected: GRANT SELECT ON multiple tables
|
||||
-- Expected: GRANT SELECT on 6 tables/views
|
||||
```
|
||||
|
||||
### 09 — Audit Table
|
||||
### 09 — Audit table
|
||||
|
||||
```sql
|
||||
EXISTS ja4_processing.audit_logs;
|
||||
-- Expected: 1
|
||||
-- Expected: 1
|
||||
```
|
||||
|
||||
### Full Verification Query
|
||||
### 10 — Performance indexes
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
count() AS total_tables
|
||||
SELECT name FROM system.data_skipping_indices
WHERE table = 'ml_detected_anomalies' AND database = 'ja4_processing';
-- Expected: idx_detected_at, idx_threat_level, idx_bot_name
|
||||
```
|
||||
|
||||
### 11 — Dashboard views

```sql
SELECT name FROM system.tables
WHERE database = 'ja4_processing'
  AND name LIKE 'view_%'
  AND engine = 'View';
-- Expected: ≥ 7 views (view_ip_recurrence, view_ai_features_1h,
--   view_form_bruteforce_detected, view_host_ip_ja4_rotation,
--   view_dashboard_user_agents, view_dashboard_entities, view_resource_cascade_1h)
```
|
||||
|
||||
### 12 — Thesis tables and view

```sql
EXISTS ja4_processing.agg_path_sequences_1h;
EXISTS ja4_processing.agg_request_timing_1h;
EXISTS ja4_processing.agg_ip_behavior_1h;
EXISTS ja4_processing.agg_resource_cascade_1h;
SELECT name FROM system.tables WHERE database = 'ja4_processing' AND name = 'view_thesis_features_1h';
-- Expected: 1 for each EXISTS, then view_thesis_features_1h
```
|
||||
|
||||
### Full verification

```sql
-- Tables in ja4_logs
SELECT count() AS tables_logs
FROM system.tables
WHERE database = 'ja4_logs'
  AND name IN ('http_logs_raw', 'http_logs', 'mv_http_logs');
-- Expected: 3

-- Tables in ja4_processing
SELECT count() AS tables_processing
FROM system.tables
WHERE database = 'ja4_processing'
  AND name IN (
|
||||
'http_logs_raw', 'http_logs', 'agg_host_ip_ja4_1h', 'agg_header_fingerprint_1h',
|
||||
'ml_detected_anomalies', 'ml_all_scores', 'ref_bot_networks',
|
||||
'anubis_ua_rules', 'anubis_ip_rules', 'anubis_asn_rules', 'anubis_country_rules',
|
||||
'audit_logs', 'bot_ip', 'bot_ja4'
|
||||
'ref_bot_networks', 'bot_ip', 'bot_ja4',
|
||||
'anubis_ip_rules', 'anubis_asn_rules',
|
||||
'agg_host_ip_ja4_1h', 'agg_header_fingerprint_1h',
|
||||
'agg_path_sequences_1h', 'agg_request_timing_1h',
|
||||
'agg_ip_behavior_1h', 'agg_resource_cascade_1h',
|
||||
'ml_detected_anomalies', 'ml_all_scores', 'audit_logs'
|
||||
);
|
||||
-- Expected: 14
|
||||
-- Expected: 14

-- Dictionaries
SELECT count() AS dicts
FROM system.dictionaries
WHERE database = 'ja4_processing';
-- Expected: 7

-- Materialized views in ja4_logs
SELECT count() AS mvs_logs
FROM system.tables
WHERE database = 'ja4_logs' AND engine = 'MaterializedView';
-- Expected: 1

-- Materialized views in ja4_processing
SELECT count() AS mvs_proc
FROM system.tables
WHERE database = 'ja4_processing' AND engine = 'MaterializedView';
-- Expected: 6
|
||||
```
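
The count checks above can be scripted so that a mismatch is reported explicitly instead of being read off by eye. The following is a minimal sketch. `check_count` and the `CLICKHOUSE_CLIENT` override are hypothetical names, and the expected values are the ones stated in the queries above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a query that returns a single number and compare
# it with the expected value. CLICKHOUSE_CLIENT can be overridden, e.g. to
# add credentials.
check_count() {
    local label="$1" expected="$2" query="$3" actual
    # Separate assignment so a client failure is not masked by `local`.
    actual=$(${CLICKHOUSE_CLIENT:-clickhouse-client} --query "$query") || return 2
    if [ "$actual" = "$expected" ]; then
        echo "OK   $label = $actual"
    else
        echo "FAIL $label: expected $expected, got $actual"
        return 1
    fi
}

# Usage:
# check_count dictionaries 7 "SELECT count() FROM system.dictionaries WHERE database = 'ja4_processing'"
# check_count mvs_logs 1 "SELECT count() FROM system.tables WHERE database = 'ja4_logs' AND engine = 'MaterializedView'"
```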
|
||||
|
||||
## Rollback Notes
|
||||
---
|
||||
|
||||
### General Approach
|
||||
## Rollback
|
||||
|
||||
ClickHouse does not support transactional DDL. To roll back a migration:
|
||||
### General approach
|
||||
|
||||
1. **Tables**: `DROP TABLE IF EXISTS ja4_processing.<table_name>`
|
||||
2. **Materialized Views**: `DROP VIEW IF EXISTS ja4_processing.<mv_name>` (drop MV before its target table)
|
||||
3. **Dictionaries**: `DROP DICTIONARY IF EXISTS ja4_processing.<dict_name>`
|
||||
4. **Views**: `DROP VIEW IF EXISTS ja4_processing.<view_name>`
|
||||
5. **Users**: `DROP USER IF EXISTS <username>`
|
||||
ClickHouse does not support transactional DDL. To roll back a migration:
|
||||
|
||||
### Rollback Order (Reverse of Apply)
|
||||
1. **Materialized views**: drop the MV **before** its target table
2. **Dictionaries**: drop the views/MVs that use a dictionary **before** dropping the dictionary itself
3. **Tables**: `DROP TABLE IF EXISTS`
4. **Views**: `DROP VIEW IF EXISTS`
5. **Users**: `DROP USER IF EXISTS`
|
||||
|
||||
### Rollback order (reverse of apply)
|
||||
|
||||
```sql
|
||||
-- 09: Audit
|
||||
-- 12: Thesis tables and view
DROP VIEW IF EXISTS ja4_processing.view_thesis_features_1h;
DROP VIEW IF EXISTS ja4_processing.view_resource_cascade_1h;
DROP VIEW IF EXISTS ja4_processing.mv_agg_resource_cascade_1h;
DROP VIEW IF EXISTS ja4_processing.mv_agg_ip_behavior_1h;
DROP VIEW IF EXISTS ja4_processing.mv_agg_request_timing_1h;
DROP VIEW IF EXISTS ja4_processing.mv_agg_path_sequences_1h;
DROP TABLE IF EXISTS ja4_processing.agg_resource_cascade_1h;
DROP TABLE IF EXISTS ja4_processing.agg_ip_behavior_1h;
DROP TABLE IF EXISTS ja4_processing.agg_request_timing_1h;
DROP TABLE IF EXISTS ja4_processing.agg_path_sequences_1h;

-- 11: Dashboard views
DROP VIEW IF EXISTS ja4_processing.view_dashboard_entities;
DROP VIEW IF EXISTS ja4_processing.view_dashboard_user_agents;
DROP VIEW IF EXISTS ja4_processing.view_host_ip_ja4_rotation;
DROP VIEW IF EXISTS ja4_processing.view_form_bruteforce_detected;

-- 10: Performance indexes (no rollback needed: idempotent)

-- 09: Audit table
|
||||
DROP TABLE IF EXISTS ja4_processing.audit_logs;
|
||||
|
||||
-- 08: Users
|
||||
-- 08: Users
|
||||
DROP USER IF EXISTS data_writer;
|
||||
DROP USER IF EXISTS analyst;
|
||||
|
||||
-- 07: AI Features View
|
||||
-- 07: AI features view
|
||||
DROP VIEW IF EXISTS ja4_processing.view_ai_features_1h;
|
||||
|
||||
-- 06: ML Tables
|
||||
-- 06: ML tables
|
||||
DROP VIEW IF EXISTS ja4_processing.view_ip_recurrence;
|
||||
DROP TABLE IF EXISTS ja4_processing.ml_all_scores;
|
||||
DROP TABLE IF EXISTS ja4_processing.ml_detected_anomalies;
|
||||
|
||||
-- 05: Aggregation
|
||||
-- 05: Aggregations + reputation dictionaries
|
||||
DROP VIEW IF EXISTS ja4_processing.mv_agg_header_fingerprint_1h;
|
||||
DROP VIEW IF EXISTS ja4_processing.mv_agg_host_ip_ja4_1h;
|
||||
DROP TABLE IF EXISTS ja4_processing.agg_header_fingerprint_1h;
|
||||
DROP TABLE IF EXISTS ja4_processing.agg_host_ip_ja4_1h;
|
||||
DROP DICTIONARY IF EXISTS ja4_processing.dict_asn_reputation;
|
||||
DROP DICTIONARY IF EXISTS ja4_processing.dict_browser_ja4;
|
||||
DROP DICTIONARY IF EXISTS ja4_processing.dict_bot_ja4;
|
||||
DROP DICTIONARY IF EXISTS ja4_processing.dict_bot_ip;
|
||||
|
||||
-- 04: MV + http_logs
|
||||
-- 04: MV + http_logs
|
||||
DROP VIEW IF EXISTS ja4_logs.mv_http_logs;
|
||||
DROP TABLE IF EXISTS ja4_logs.http_logs;
|
||||
|
||||
-- 03: Anubis
|
||||
DROP DICTIONARY IF EXISTS ja4_processing.dict_anubis_country;
|
||||
-- 03: Anubis
|
||||
DROP DICTIONARY IF EXISTS ja4_processing.dict_anubis_asn;
|
||||
DROP DICTIONARY IF EXISTS ja4_processing.dict_anubis_ip;
|
||||
DROP DICTIONARY IF EXISTS ja4_processing.dict_anubis_ua;
|
||||
DROP TABLE IF EXISTS ja4_processing.anubis_country_rules;
|
||||
DROP TABLE IF EXISTS ja4_processing.anubis_asn_rules;
|
||||
DROP TABLE IF EXISTS ja4_processing.anubis_ip_rules;
|
||||
DROP TABLE IF EXISTS ja4_processing.anubis_ua_rules;
|
||||
|
||||
-- 02: Dictionaries
|
||||
-- 02: ASN dictionary + reference tables
|
||||
DROP DICTIONARY IF EXISTS ja4_processing.dict_iplocate_asn;
|
||||
DROP TABLE IF EXISTS ja4_processing.bot_ja4;
|
||||
DROP TABLE IF EXISTS ja4_processing.bot_ip;
|
||||
DROP TABLE IF EXISTS ja4_processing.ref_bot_networks;
|
||||
|
||||
-- 01: Raw Tables
|
||||
-- 01: Raw table
|
||||
DROP TABLE IF EXISTS ja4_logs.http_logs_raw;
|
||||
|
||||
-- 00: Database
|
||||
-- 00: Databases
|
||||
DROP DATABASE IF EXISTS ja4_processing;
|
||||
DROP DATABASE IF EXISTS ja4_logs;
|
||||
```
|
||||
|
||||
### Important Notes
|
||||
### Important notes
|
||||
|
||||
- **Data loss**: Dropping tables destroys all data. Always back up before rollback.
|
||||
- **MV dependency**: Materialized views must be dropped before their target tables.
|
||||
- **Dictionary dependency**: Views/MVs using dictionaries will fail if dictionaries are dropped while they still reference them.
|
||||
- **Idempotent re-apply**: After rollback, migrations can be safely re-applied since they use `IF NOT EXISTS`.
|
||||
- **`04_mv_http_logs.sql`** is the canonical version of the MV, superseding any base version in `services/correlator/sql/init.sql`.
|
||||
- **Data loss**: dropping a table destroys all of its data. Always back up before a rollback.
- **MV dependency**: materialized views must be dropped **before** their target table.
- **Dictionary dependency**: views/MVs that call `dictGet()` will fail if the referenced dictionary is dropped.
- **Idempotent re-apply**: after a rollback, the migrations can be safely re-applied thanks to the `IF NOT EXISTS` clauses.
- **`04_mv_http_logs.sql`** is the canonical version of the materialized view, superseding any earlier version in `services/correlator/sql/init.sql`.
|
||||
|
||||
---
|
||||
|
||||
## Post-deployment migrations

The `services/correlator/sql/migrations/` directory contains `ALTER TABLE` statements for existing deployments. Apply them manually:
|
||||
|
||||
```bash
|
||||
clickhouse-client --multiquery < services/correlator/sql/migrations/<file>.sql
|
||||
```
|
||||
|
||||
These migrations are separate from the base schema and are only needed to update instances already in production.
|
||||
|
||||