# Database Migrations

The ClickHouse schema for ja4-platform is managed through numbered SQL migration files in `shared/clickhouse/`. Migrations are idempotent (using `IF NOT EXISTS` / `IF EXISTS`) and must be applied in numeric order.

## Migration Order

| File | Purpose |
|------|---------|
| `00_database.sql` | Creates the `mabase_prod` database |
| `01_raw_tables.sql` | Creates `http_logs_raw` ingest table (MergeTree, 1-day TTL) |
| `02_dictionaries.sql` | Creates ASN geo dictionary (`dict_iplocate_asn`), bot IP/JA4 reference tables, `ref_bot_networks` |
| `03_anubis_tables.sql` | Creates Anubis crawler rule tables (`anubis_ua_rules`, `anubis_ip_rules`, `anubis_asn_rules`, `anubis_country_rules`) and their dictionaries (`dict_anubis_ua`, `dict_anubis_ip`, `dict_anubis_asn`, `dict_anubis_country`) |
| `04_mv_http_logs.sql` | Creates the canonical `http_logs` table and `mv_http_logs` materialized view with full Anubis enrichment |
| `05_aggregation_tables.sql` | Creates reputation dictionaries (`dict_bot_ip`, `dict_bot_ja4`, `dict_asn_reputation`), behavioral aggregation tables (`agg_host_ip_ja4_1h`, `agg_header_fingerprint_1h`), and their materialized views |
| `06_ml_tables.sql` | Creates ML output tables (`ml_detected_anomalies`, `ml_all_scores`) and `view_ip_recurrence` |
| `07_ai_features_view.sql` | Creates `view_ai_features_1h` — the 50+ feature view used by bot-detector |
| `08_users.sql` | Creates ClickHouse users (`data_writer`, `analyst`) and grants permissions |
| `09_audit_table.sql` | Creates `audit_logs` table for SOC dashboard audit trail |

## Prerequisites

### 1. ClickHouse Server

A running ClickHouse server (version 23.8+ recommended for `REGEXP_TREE` dictionary support).

### 2. CSV Data Files

Place the following files in `/var/lib/clickhouse/user_files/`:

| File | Source | Description |
|------|--------|-------------|
| `iplocate-ip-to-asn.csv` | [IPLocate](https://iplocate.io) | IP-to-ASN mapping with country, org, domain |
| `bot_ip.csv` | Custom | Known bot IP prefixes (CIDR format) |
| `bot_ja4.csv` | Custom | Known bot JA4 fingerprints |
| `asn_reputation.csv` | Custom | ASN reputation labels (`human`, `bot`, `unknown`) |

### 3. Anubis Passwords

Migration `03_anubis_tables.sql` contains placeholder passwords (`CHANGE_ME`) for the Anubis dictionaries. Replace these with the actual ClickHouse admin password before applying:

```bash
sed -i "s/CHANGE_ME/your_actual_password/g" 03_anubis_tables.sql
```

## How to Apply

### Full Initial Setup

Apply all migrations in order:

```bash
cd shared/clickhouse/
clickhouse-client --multiquery < 00_database.sql
clickhouse-client --multiquery < 01_raw_tables.sql
clickhouse-client --multiquery < 02_dictionaries.sql
clickhouse-client --multiquery < 03_anubis_tables.sql
clickhouse-client --multiquery < 04_mv_http_logs.sql
clickhouse-client --multiquery < 05_aggregation_tables.sql
clickhouse-client --multiquery < 06_ml_tables.sql
clickhouse-client --multiquery < 07_ai_features_view.sql
clickhouse-client --multiquery < 08_users.sql
clickhouse-client --multiquery < 09_audit_table.sql
```

### With Authentication

```bash
clickhouse-client --user admin --password 'your_password' --multiquery < 00_database.sql
# ... repeat for each file
```

### One-Liner (All at Once)

```bash
cd shared/clickhouse/
for f in 0*.sql; do
  echo "Applying $f..."
  clickhouse-client --multiquery < "$f"
done
```

## How to Verify

After applying all migrations, run these queries to verify each migration was successful:

### 00 — Database

```sql
SHOW DATABASES LIKE 'mabase_prod';
-- Expected: mabase_prod
```

### 01 — Raw Tables

```sql
EXISTS mabase_prod.http_logs_raw;
-- Expected: 1
```

### 02 — Dictionaries

```sql
SELECT dictGetOrDefault('mabase_prod.dict_iplocate_asn', 'country_code', toIPv6(toIPv4('8.8.8.8')), 'MISSING');
-- Expected: US (if CSV loaded) or MISSING
```

### 03 — Anubis Tables

```sql
EXISTS mabase_prod.anubis_ua_rules;
EXISTS mabase_prod.anubis_ip_rules;
EXISTS mabase_prod.anubis_asn_rules;
EXISTS mabase_prod.anubis_country_rules;
-- Expected: 1 for each
```

### 04 — MV + http_logs

```sql
EXISTS mabase_prod.http_logs;
SELECT name FROM system.tables
WHERE database = 'mabase_prod' AND name = 'mv_http_logs';
-- Expected: mv_http_logs
```

### 05 — Aggregation Tables

```sql
EXISTS mabase_prod.agg_host_ip_ja4_1h;
EXISTS mabase_prod.agg_header_fingerprint_1h;
SELECT name FROM system.dictionaries
WHERE database = 'mabase_prod' AND name = 'dict_bot_ip';
-- Expected: dict_bot_ip
```

### 06 — ML Tables

```sql
EXISTS mabase_prod.ml_detected_anomalies;
EXISTS mabase_prod.ml_all_scores;
SELECT name FROM system.tables
WHERE database = 'mabase_prod' AND name LIKE 'view_ip%';
-- Expected: view_ip_recurrence
```

### 07 — AI Features View

```sql
SELECT name FROM system.tables
WHERE database = 'mabase_prod' AND name = 'view_ai_features_1h';
-- Expected: view_ai_features_1h
```

### 08 — Users

```sql
SHOW GRANTS FOR data_writer;
-- Expected: GRANT INSERT, SELECT ON mabase_prod.http_logs_raw TO data_writer
SHOW GRANTS FOR analyst;
-- Expected: GRANT SELECT ON multiple tables
```

### 09 — Audit Table

```sql
EXISTS mabase_prod.audit_logs;
-- Expected: 1
```

### Full Verification Query

```sql
SELECT count() AS total_tables
FROM system.tables
WHERE database = 'mabase_prod'
  AND name IN (
    'http_logs_raw',
    'http_logs',
    'agg_host_ip_ja4_1h',
    'agg_header_fingerprint_1h',
    'ml_detected_anomalies',
    'ml_all_scores',
    'ref_bot_networks',
    'anubis_ua_rules',
    'anubis_ip_rules',
    'anubis_asn_rules',
    'anubis_country_rules',
    'audit_logs',
    'bot_ip',
    'bot_ja4'
  );
-- Expected: 14
```

## Rollback Notes

### General Approach

ClickHouse does not support transactional DDL. To roll back a migration:

1. **Tables**: `DROP TABLE IF EXISTS mabase_prod.`
2. **Materialized Views**: `DROP VIEW IF EXISTS mabase_prod.` (drop MV before its target table)
3. **Dictionaries**: `DROP DICTIONARY IF EXISTS mabase_prod.`
4. **Views**: `DROP VIEW IF EXISTS mabase_prod.`
5. **Users**: `DROP USER IF EXISTS `

### Rollback Order (Reverse of Apply)

```sql
-- 09: Audit
DROP TABLE IF EXISTS mabase_prod.audit_logs;

-- 08: Users
DROP USER IF EXISTS data_writer;
DROP USER IF EXISTS analyst;

-- 07: AI Features View
DROP VIEW IF EXISTS mabase_prod.view_ai_features_1h;

-- 06: ML Tables
DROP VIEW IF EXISTS mabase_prod.view_ip_recurrence;
DROP TABLE IF EXISTS mabase_prod.ml_all_scores;
DROP TABLE IF EXISTS mabase_prod.ml_detected_anomalies;

-- 05: Aggregation
DROP VIEW IF EXISTS mabase_prod.mv_agg_header_fingerprint_1h;
DROP VIEW IF EXISTS mabase_prod.mv_agg_host_ip_ja4_1h;
DROP TABLE IF EXISTS mabase_prod.agg_header_fingerprint_1h;
DROP TABLE IF EXISTS mabase_prod.agg_host_ip_ja4_1h;
DROP DICTIONARY IF EXISTS mabase_prod.dict_asn_reputation;
DROP DICTIONARY IF EXISTS mabase_prod.dict_bot_ja4;
DROP DICTIONARY IF EXISTS mabase_prod.dict_bot_ip;

-- 04: MV + http_logs
DROP VIEW IF EXISTS mabase_prod.mv_http_logs;
DROP TABLE IF EXISTS mabase_prod.http_logs;

-- 03: Anubis
DROP DICTIONARY IF EXISTS mabase_prod.dict_anubis_country;
DROP DICTIONARY IF EXISTS mabase_prod.dict_anubis_asn;
DROP DICTIONARY IF EXISTS mabase_prod.dict_anubis_ip;
DROP DICTIONARY IF EXISTS mabase_prod.dict_anubis_ua;
DROP TABLE IF EXISTS mabase_prod.anubis_country_rules;
DROP TABLE IF EXISTS mabase_prod.anubis_asn_rules;
DROP TABLE IF EXISTS mabase_prod.anubis_ip_rules;
DROP TABLE IF EXISTS mabase_prod.anubis_ua_rules;

-- 02: Dictionaries
DROP DICTIONARY IF EXISTS mabase_prod.dict_iplocate_asn;
DROP TABLE IF EXISTS mabase_prod.bot_ja4;
DROP TABLE IF EXISTS mabase_prod.bot_ip;
DROP TABLE IF EXISTS mabase_prod.ref_bot_networks;

-- 01: Raw Tables
DROP TABLE IF EXISTS mabase_prod.http_logs_raw;

-- 00: Database
DROP DATABASE IF EXISTS mabase_prod;
```

### Important Notes

- **Data loss**: Dropping tables destroys all data. Always back up before rollback.
- **MV dependency**: Materialized views must be dropped before their target tables.
- **Dictionary dependency**: Views/MVs using dictionaries will fail if dictionaries are dropped while they still reference them.
- **Idempotent re-apply**: After rollback, migrations can be safely re-applied since they use `IF NOT EXISTS`.
- **`04_mv_http_logs.sql`** is the canonical version of the MV, superseding any base version in `services/correlator/sql/init.sql`.
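
Because the migrations must run in numeric order and later files depend on objects created by earlier ones, a re-apply after rollback is safer if it stops at the first failing file. The following is a minimal sketch of such a wrapper, not a script that ships with the repository: the function name `apply_migrations` and the `CH_CLIENT` variable are hypothetical, and it assumes `clickhouse-client` exits with a non-zero status when a statement fails.

```bash
#!/usr/bin/env bash
# Sketch: re-apply every migration in numeric order, stopping at the first failure.
# CH_CLIENT and apply_migrations are illustrative names, not part of the repository.

# Command used to run each file. Override it to add flags such as --user, or to
# point at a stub for a dry run. It is word-split on purpose, so values that
# contain spaces (for example some passwords) will not survive intact.
CH_CLIENT="${CH_CLIENT:-clickhouse-client}"

apply_migrations() {
  local dir="${1:-shared/clickhouse}"
  local f
  # Glob results are sorted, so 00_ through 09_ run in numeric order.
  for f in "$dir"/0*.sql; do
    # An unmatched glob stays literal; treat that as "no migrations found".
    [ -e "$f" ] || { echo "No migration files in $dir" >&2; return 1; }
    echo "Applying $(basename "$f")..."
    # Stop immediately so later migrations never run against a half-built schema.
    if ! $CH_CLIENT --multiquery < "$f"; then
      echo "FAILED: $(basename "$f")" >&2
      return 1
    fi
  done
}
```

After sourcing the file, `apply_migrations shared/clickhouse` applies the default set, and `CH_CLIENT="clickhouse-client --user admin" apply_migrations` is one way to pass extra flags. Since every migration uses `IF NOT EXISTS`, rerunning after a failure repeats the files that already succeeded without harm.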