toto 8180f4af04 refactor(anubis): simplify to IP/CIDR + ASN only, remove UA and Country rules
- Remove UA regex extraction (extract_ua_regex, _extract_ua_from_all/any)
- Remove Country rule collection from parse_bot_policies_inline
- Simplify fetch_rules.py: collect_all_rules returns (ip_rules, asn_rules)
- Remove insert_ua_rules and insert_country_rules functions
- reload_dicts now only reloads dict_anubis_ip + dict_anubis_asn
- Simplify CASE blocks in 04_mv_http_logs.sql, 07_ai_features_view.sql,
  view_ai_features_anubis.sql, mv_http_logs.sql: IP > ASN (was 5-level
  UA+IP > UA > IP > ASN > Country cascade)
- Remove dict_anubis_country + dict_anubis_ua from 03_anubis_tables.sql
  (UA table kept as stub for REGEXP_TREE catch-all compatibility)
- Remove anubis_country_rules table from schema
- Remove Anubis UA and Country tabs from dashboard reflists page
- Remove anubis_ua_rules/country_rules from API reflist queries
- deploy_schema.sql simplified from 339 to 122 lines
- 764 lines removed across 9 files

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-09 15:25:33 +02:00

ClickHouse Migrations — ja4-platform

Migration Order

Apply these files in numeric order against the ClickHouse server:

clickhouse-client --multiquery < 00_database.sql
clickhouse-client --multiquery < 01_raw_tables.sql
clickhouse-client --multiquery < 02_dictionaries.sql
clickhouse-client --multiquery < 03_anubis_tables.sql
clickhouse-client --multiquery < 04_mv_http_logs.sql
clickhouse-client --multiquery < 05_aggregation_tables.sql
clickhouse-client --multiquery < 06_ml_tables.sql
clickhouse-client --multiquery < 07_ai_features_view.sql
clickhouse-client --multiquery < 08_users.sql
clickhouse-client --multiquery < 09_audit_table.sql
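The ordered commands above can also be scripted. The following is a minimal sketch, not part of the repository. The function name apply_migrations and the CLICKHOUSE_CLIENT override variable are hypothetical. The script relies on the zero-padded two-digit prefixes so that shell glob order matches numeric order, and it stops at the first file that fails.

```bash
#!/usr/bin/env bash
# Sketch only: apply every NN_*.sql file in a directory in numeric order.
# Hypothetical: apply_migrations and CLICKHOUSE_CLIENT (defaults to clickhouse-client).
apply_migrations() {
  local dir="${1:-.}"
  local client="${CLICKHOUSE_CLIENT:-clickhouse-client}"
  local f found=0
  # Bash sorts glob results; zero-padded prefixes (00_ .. 09_) sort numerically.
  # Files without a two-digit prefix are not matched.
  for f in "$dir"/[0-9][0-9]_*.sql; do
    [ -e "$f" ] || break   # pattern matched nothing: it stays literal
    found=1
    echo "applying $(basename "$f")" >&2
    # Stop on the first failure so later files never run against a partial schema.
    "$client" --multiquery < "$f" || { echo "failed: $f" >&2; return 1; }
  done
  [ "$found" -eq 1 ] || { echo "no migrations found in $dir" >&2; return 1; }
}
```

For example, running `apply_migrations .` from the directory that holds the SQL files reproduces the command sequence above. Connection flags such as --host or --password would have to come from a wrapper script named in CLICKHOUSE_CLIENT.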

File Descriptions

| File | Contents |
|---|---|
| 00_database.sql | CREATE DATABASE statement |
| 01_raw_tables.sql | http_logs_raw ingest table |
| 02_dictionaries.sql | ASN geo dictionary; bot IP, JA4, and network reference tables |
| 03_anubis_tables.sql | Anubis crawler rule tables and dictionaries for IP/CIDR and ASN rules (the UA table remains only as a stub for REGEXP_TREE catch-all compatibility) |
| 04_mv_http_logs.sql | Canonical http_logs target table and the mv_http_logs materialized view, with Anubis enrichment (IP rules take precedence over ASN rules) |
| 05_aggregation_tables.sql | agg_host_ip_ja4_1h and agg_header_fingerprint_1h, plus their materialized views |
| 06_ml_tables.sql | ml_detected_anomalies, ml_all_scores |
| 07_ai_features_view.sql | view_ai_features_1h, with Anubis enrichment (IP, then ASN) |
| 08_users.sql | ClickHouse users and grants |
| 09_audit_table.sql | audit_logs table for the SOC dashboard audit trail |
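After the migrations run, one way to confirm that the tables and views listed above exist is to query system.tables. This is a sketch under assumptions: the function name check_objects and the CLICKHOUSE_CLIENT variable are hypothetical, the expected names come only from the file descriptions, and the dictionaries and the unnamed aggregation materialized views are not checked.

```bash
#!/usr/bin/env bash
# Sketch only: verify that the objects named in the file descriptions exist.
# Hypothetical: check_objects and CLICKHOUSE_CLIENT (defaults to clickhouse-client).
check_objects() {
  local client="${CLICKHOUSE_CLIENT:-clickhouse-client}"
  local -a expected
  # Names from the file descriptions; dictionaries and aggregation MVs are omitted.
  expected=(http_logs_raw http_logs mv_http_logs
            agg_host_ip_ja4_1h agg_header_fingerprint_1h
            ml_detected_anomalies ml_all_scores
            view_ai_features_1h audit_logs)
  local present name
  local -a missing=()
  # In batch mode clickhouse-client prints TabSeparated output: one name per line.
  present=$("$client" --query "SELECT DISTINCT name FROM system.tables WHERE database NOT IN ('system', 'information_schema', 'INFORMATION_SCHEMA')") || return 1
  for name in "${expected[@]}"; do
    # Exact, whole-line match so e.g. http_logs does not match http_logs_raw.
    grep -qxF "$name" <<<"$present" || missing+=("$name")
  done
  if [ "${#missing[@]}" -gt 0 ]; then
    echo "missing objects: ${missing[*]}" >&2
    return 1
  fi
  echo "all ${#expected[@]} expected objects present" >&2
}
```

Run it with the same connection settings used for the migrations; a non-zero exit status lists which objects were not found.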

Prerequisites

Place CSV data files in /var/lib/clickhouse/user_files/:

  • iplocate-ip-to-asn.csv — IP-to-ASN mapping (from IPLocate)
  • bot_ip.csv — Known bot IP prefixes
  • bot_ja4.csv — Known bot JA4 fingerprints
  • asn_reputation.csv — ASN reputation labels
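A quick pre-flight check can catch a missing file before the migrations that load this data are applied. This is a minimal sketch; the function name check_user_files is hypothetical, and it only verifies that each file exists and is non-empty, not that its contents are well formed or readable by the clickhouse user.

```bash
#!/usr/bin/env bash
# Sketch only: confirm the CSV prerequisites are present before migrating.
# Hypothetical: check_user_files; the directory defaults to the path named above.
check_user_files() {
  local dir="${1:-/var/lib/clickhouse/user_files}"
  local f missing=0
  for f in iplocate-ip-to-asn.csv bot_ip.csv bot_ja4.csv asn_reputation.csv; do
    # -s: the file exists and has a size greater than zero.
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Every missing file is reported before the function returns, so a single run shows all of the gaps at once.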

Notes

  • 04_mv_http_logs.sql is the canonical definition of mv_http_logs and supersedes the base version in services/correlator/sql/init.sql. It adds Anubis enrichment, matching IP/CIDR rules first and falling back to ASN rules.
  • All migrations are idempotent: they use IF NOT EXISTS / IF EXISTS, so re-applying a file is safe.
  • Anubis dictionary passwords in 03_anubis_tables.sql must be changed before production use.