# ClickHouse Migrations — ja4-platform

## Migration Order

Apply these files in numeric order against the ClickHouse server:

```bash
clickhouse-client --multiquery < 00_database.sql
clickhouse-client --multiquery < 01_raw_tables.sql
clickhouse-client --multiquery < 02_dictionaries.sql
clickhouse-client --multiquery < 03_anubis_tables.sql
clickhouse-client --multiquery < 04_mv_http_logs.sql
clickhouse-client --multiquery < 05_aggregation_tables.sql
clickhouse-client --multiquery < 06_ml_tables.sql
clickhouse-client --multiquery < 07_ai_features_view.sql
clickhouse-client --multiquery < 08_users.sql
clickhouse-client --multiquery < 09_audit_table.sql
```

## File Descriptions

| File | Contents |
|------|----------|
| `00_database.sql` | CREATE DATABASE |
| `01_raw_tables.sql` | `http_logs_raw` ingest table |
| `02_dictionaries.sql` | ASN geo dict, bot IP/JA4/network reference tables |
| `03_anubis_tables.sql` | Anubis crawler rule tables and dictionaries (UA, IP, ASN, country) |
| `04_mv_http_logs.sql` | Canonical `http_logs` target table + `mv_http_logs` materialized view with full Anubis enrichment |
| `05_aggregation_tables.sql` | `agg_host_ip_ja4_1h`, `agg_header_fingerprint_1h` + their MVs |
| `06_ml_tables.sql` | `ml_detected_anomalies`, `ml_all_scores` |
| `07_ai_features_view.sql` | `view_ai_features_1h` with Anubis enrichment |
| `08_users.sql` | ClickHouse users and grants |
| `09_audit_table.sql` | `audit_logs` table for SOC dashboard audit trail |

## Prerequisites

Place CSV data files in `/var/lib/clickhouse/user_files/`:

- `iplocate-ip-to-asn.csv` — IP-to-ASN mapping (from IPLocate)
- `bot_ip.csv` — Known bot IP prefixes
- `bot_ja4.csv` — Known bot JA4 fingerprints
- `asn_reputation.csv` — ASN reputation labels

## Notes

- `04_mv_http_logs.sql` is the **canonical** version of the MV, superseding the base version in `services/correlator/sql/init.sql`. It includes full Anubis enrichment.
- All migrations are idempotent (use `IF NOT EXISTS` / `IF EXISTS`).
- Anubis dictionary passwords in `03_anubis_tables.sql` **must** be changed before production use.
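
The prerequisite check and the ordered apply described above can be combined into one script. The following is a minimal sketch, not part of the repository: the function names and the `USER_FILES_DIR`, `MIGRATIONS_DIR`, and `CLICKHOUSE_CLIENT` variables are hypothetical, and it assumes `clickhouse-client` can reach the server with its default connection settings. It relies on the two-digit file prefixes so that the shell's sorted glob expansion matches the numeric order.

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify CSV prerequisites, then apply migrations in order.

# Assumed defaults; override via environment variables if your layout differs.
USER_FILES_DIR="${USER_FILES_DIR:-/var/lib/clickhouse/user_files}"
MIGRATIONS_DIR="${MIGRATIONS_DIR:-.}"
CLICKHOUSE_CLIENT="${CLICKHOUSE_CLIENT:-clickhouse-client}"

# CSV files listed under Prerequisites.
required_csvs=(
  iplocate-ip-to-asn.csv
  bot_ip.csv
  bot_ja4.csv
  asn_reputation.csv
)

# Returns non-zero and reports every missing CSV file.
check_prerequisites() {
  local f missing=0
  for f in "${required_csvs[@]}"; do
    if [[ ! -f "$USER_FILES_DIR/$f" ]]; then
      echo "missing CSV: $USER_FILES_DIR/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Applies NN_*.sql files in sorted order and stops at the first failure.
apply_migrations() {
  local file
  for file in "$MIGRATIONS_DIR"/[0-9][0-9]_*.sql; do
    # An unmatched glob stays literal, so this also catches an empty directory.
    [[ -f "$file" ]] || { echo "no migration files in $MIGRATIONS_DIR" >&2; return 1; }
    echo "applying $(basename "$file")"
    "$CLICKHOUSE_CLIENT" --multiquery < "$file" || {
      echo "failed: $file" >&2
      return 1
    }
  done
}

# Example invocation:
#   check_prerequisites && apply_migrations
```

Because the migrations are idempotent, rerunning the script after a failed step should be safe: files that were already applied become no-ops.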