Production bug: view_form_bruteforce_detected, view_host_ip_ja4_rotation,
view_dashboard_entities, and view_dashboard_user_agents were referenced in
13 dashboard endpoints but did not exist anywhere in the schema.
All of these endpoints returned HTTP 500 in production.
shared/clickhouse/11_views.sql (new):
  view_form_bruteforce_detected
    Source: agg_host_ip_ja4_1h (24h)
    Logic: GROUP BY (src_ip, host) HAVING count_post >= 10
    Used by: bruteforce.py (3 endpoints), investigation_summary.py
  view_host_ip_ja4_rotation
    Source: agg_host_ip_ja4_1h (24h)
    Logic: uniqExact(ja4) per src_ip, HAVING >= 2 (fingerprint rotation)
    Used by: rotation.py (3 endpoints), investigation_summary.py
  view_dashboard_entities
    Source: http_logs (7 days), UNION ALL of 5 branches (ip/ja4/country/asn/host)
    Columns: entity_type, entity_value, src_ip, ja4, host, log_date,
             client_headers Array(String), asns Array, countries Array,
             user_agents Array
    Used by: entities.py (5 endpoints), clustering.py
  view_dashboard_user_agents
    Source: http_logs (7 days), GROUP BY (src_ip, ja4, hour)
    Columns: src_ip, ja4, hour, log_date, user_agents Array(String), requests
    Used by: variability.py (4 endpoints), fingerprints.py (5 endpoints),
             attributes.py (2 endpoints)
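The two aggregation-based views can be read as simple grouping rules. The sketch below mirrors that logic in plain Python over a few in-memory rows; it is an illustration only, not the SQL in 11_views.sql. The sample rows are hypothetical, and the reading that count_post is summed per (src_ip, host) group is an assumption.

```python
from collections import defaultdict

# Hypothetical sample rows shaped like agg_host_ip_ja4_1h (columns assumed).
ROWS = [
    {"src_ip": "203.0.113.7", "host": "shop.example", "ja4": "ja4_a", "count_post": 6},
    {"src_ip": "203.0.113.7", "host": "shop.example", "ja4": "ja4_b", "count_post": 5},
    {"src_ip": "198.51.100.2", "host": "shop.example", "ja4": "ja4_a", "count_post": 1},
]


def bruteforce_candidates(rows, min_posts=10):
    """GROUP BY (src_ip, host), keep groups whose summed count_post >= min_posts.

    Summing count_post is an assumption about the view's HAVING clause.
    """
    totals = defaultdict(int)
    for row in rows:
        totals[(row["src_ip"], row["host"])] += row["count_post"]
    return {key: total for key, total in totals.items() if total >= min_posts}


def ja4_rotation(rows, min_distinct=2):
    """Count distinct ja4 values per src_ip, keep IPs with >= min_distinct."""
    seen = defaultdict(set)
    for row in rows:
        seen[row["src_ip"]].add(row["ja4"])
    return {ip: len(ja4s) for ip, ja4s in seen.items() if len(ja4s) >= min_distinct}
```

With these rows, only 203.0.113.7 crosses either threshold: its POST counts add up to 11 on one host, and it presents two distinct JA4 fingerprints.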
deploy_schema.sh: added 10_perf_indexes.sql and 11_views.sql to the file list
routes/variability.py + fingerprints.py:
  Fixed 9 queries that used view_dashboard_user_agents without a database
  prefix; they now use {settings.CLICKHOUSE_DB_PROCESSING}.view_*
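As a rough illustration of the prefix fix, the sketch below builds a fully qualified view name from a settings object. Only the attribute name CLICKHOUSE_DB_PROCESSING comes from this commit; the Settings class, the placeholder database name, the qualified() helper, and the query text are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    # Hypothetical stand-in for the application's settings object;
    # "processing_db" is a placeholder, not the real database name.
    CLICKHOUSE_DB_PROCESSING: str = "processing_db"


settings = Settings()


def qualified(view: str) -> str:
    """Return the view name prefixed with the processing database."""
    return f"{settings.CLICKHOUSE_DB_PROCESSING}.{view}"


# Before the fix the query named the bare view, which only resolves when the
# connection's default database happens to contain it.
QUERY = (
    "SELECT src_ip, ja4, hour, requests "
    f"FROM {qualified('view_dashboard_user_agents')} "
    "WHERE src_ip = %(ip)s"
)
```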
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ClickHouse Migrations — ja4-platform
Migration Order
Apply these files in numeric order against the ClickHouse server:
clickhouse-client --multiquery < 00_database.sql
clickhouse-client --multiquery < 01_raw_tables.sql
clickhouse-client --multiquery < 02_dictionaries.sql
clickhouse-client --multiquery < 03_anubis_tables.sql
clickhouse-client --multiquery < 04_mv_http_logs.sql
clickhouse-client --multiquery < 05_aggregation_tables.sql
clickhouse-client --multiquery < 06_ml_tables.sql
clickhouse-client --multiquery < 07_ai_features_view.sql
clickhouse-client --multiquery < 08_users.sql
clickhouse-client --multiquery < 09_audit_table.sql
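The same ordering can be scripted rather than typed by hand. The following is a minimal sketch, assuming the migration files share a directory and follow the NN_name.sql naming shown above; ordered_migrations() and apply_all() are hypothetical helpers, not part of the repository.

```python
import re
import subprocess
from pathlib import Path


def ordered_migrations(names):
    """Return .sql file names sorted by their leading numeric prefix."""
    numbered = []
    for name in names:
        match = re.match(r"(\d+)_.*\.sql$", name)
        if match:  # skip files that do not look like NN_name.sql
            numbered.append((int(match.group(1)), name))
    return [name for _, name in sorted(numbered)]


def apply_all(directory):
    """Pipe each migration into clickhouse-client, stopping on the first failure."""
    directory = Path(directory)
    for name in ordered_migrations(p.name for p in directory.iterdir()):
        with open(directory / name, "rb") as sql_file:
            # Equivalent to: clickhouse-client --multiquery < NAME
            subprocess.run(
                ["clickhouse-client", "--multiquery"],
                stdin=sql_file,
                check=True,
            )
```

Sorting on the parsed integer rather than the raw string keeps the order correct if a prefix ever loses its zero padding.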
File Descriptions
| File | Contents |
|---|---|
| `00_database.sql` | `CREATE DATABASE` |
| `01_raw_tables.sql` | `http_logs_raw` ingest table |
| `02_dictionaries.sql` | ASN geo dict, bot IP/JA4/network reference tables |
| `03_anubis_tables.sql` | Anubis crawler rule tables and dictionaries (UA, IP, ASN, country) |
| `04_mv_http_logs.sql` | Canonical `http_logs` target table + `mv_http_logs` materialized view with full Anubis enrichment |
| `05_aggregation_tables.sql` | `agg_host_ip_ja4_1h`, `agg_header_fingerprint_1h` + their MVs |
| `06_ml_tables.sql` | `ml_detected_anomalies`, `ml_all_scores` |
| `07_ai_features_view.sql` | `view_ai_features_1h` with Anubis enrichment |
| `08_users.sql` | ClickHouse users and grants |
| `09_audit_table.sql` | `audit_logs` table for SOC dashboard audit trail |
Prerequisites
Place CSV data files in /var/lib/clickhouse/user_files/:
- `iplocate-ip-to-asn.csv`: IP-to-ASN mapping (from IPLocate)
- `bot_ip.csv`: Known bot IP prefixes
- `bot_ja4.csv`: Known bot JA4 fingerprints
- `asn_reputation.csv`: ASN reputation labels
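A quick pre-flight check can confirm the CSV files are in place before the dictionary migrations run. This is a minimal sketch; missing_csv_files() is a hypothetical helper, and the file names and directory are the ones listed above.

```python
from pathlib import Path

REQUIRED_CSV = (
    "iplocate-ip-to-asn.csv",
    "bot_ip.csv",
    "bot_ja4.csv",
    "asn_reputation.csv",
)


def missing_csv_files(user_files_dir="/var/lib/clickhouse/user_files"):
    """Return the required CSV names that are absent from user_files_dir."""
    base = Path(user_files_dir)
    return [name for name in REQUIRED_CSV if not (base / name).is_file()]
```

An empty list means every prerequisite file was found.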
Notes
- `04_mv_http_logs.sql` is the canonical version of the MV, superseding the base version in `services/correlator/sql/init.sql`. It includes full Anubis enrichment.
- All migrations are idempotent (use `IF NOT EXISTS`/`IF EXISTS`).
- Anubis dictionary passwords in `03_anubis_tables.sql` must be changed before production use.