ja4-platform/services/correlator/config.example.yml
toto 9f3e0621e5 feat: split ClickHouse into dual configurable databases (ja4_logs / ja4_processing)
Architecture:
- ja4_logs: raw log ingestion (http_logs_raw, http_logs, mv_http_logs)
- ja4_processing: analytics, aggregation, ML, dictionaries, audit

Configuration (env vars):
- CLICKHOUSE_DB_LOGS (default: ja4_logs)
- CLICKHOUSE_DB_PROCESSING (default: ja4_processing)
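
A minimal sketch of how these two variables might be read with their defaults from Python. Only the variable names and default values come from the commit message; the `ClickHouseSettings` class and the `load_settings` helper are hypothetical names used for illustration, not the project's actual shared settings module.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ClickHouseSettings:
    # Hypothetical container; the field names mirror the env vars above.
    CLICKHOUSE_DB_LOGS: str
    CLICKHOUSE_DB_PROCESSING: str


def load_settings(env=None) -> ClickHouseSettings:
    """Read both database names, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return ClickHouseSettings(
        CLICKHOUSE_DB_LOGS=env.get("CLICKHOUSE_DB_LOGS", "ja4_logs"),
        CLICKHOUSE_DB_PROCESSING=env.get("CLICKHOUSE_DB_PROCESSING", "ja4_processing"),
    )


# With no overrides set, queries are qualified with the default names.
settings = load_settings({})
query = f"SELECT count() FROM {settings.CLICKHOUSE_DB_LOGS}.http_logs"
```

Qualifying every table with the configured database name, rather than relying on the connection's default database, is what lets a single connection reach tables in both databases.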

Changes:
- SQL migrations (10 files): all mabase_prod refs → ja4_logs or ja4_processing
  with correct cross-database references (MVs, views, dicts)
- deploy_schema.sh: substitutes DB names from env vars at deploy time
- Python shared settings: added CLICKHOUSE_DB_LOGS + CLICKHOUSE_DB_PROCESSING
- Dashboard routes (19 files): replaced ~80 hardcoded mabase_prod refs
  with settings.CLICKHOUSE_DB_LOGS / settings.CLICKHOUSE_DB_PROCESSING
- Bot-detector: DB → CLICKHOUSE_DB_PROCESSING, fetch_rules.py configurable
- Correlator: DSN example updated to ja4_logs
- Docker-compose + .env files: new env vars with defaults
- All documentation updated (14 markdown files)
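
For the correlator, the target database is the path component of the ClickHouse DSN. The sketch below shows one way to inspect or rewrite that component with the Python standard library. The helper names `dsn_database` and `with_database` are hypothetical and not part of the correlator, and the example DSN uses the placeholder credentials from the example config.

```python
from urllib.parse import urlsplit


def dsn_database(dsn: str) -> str:
    """Return the database name, i.e. the path component of the DSN."""
    return urlsplit(dsn).path.lstrip("/")


def with_database(dsn: str, database: str) -> str:
    """Return the same DSN pointing at a different database."""
    parts = urlsplit(dsn)
    return parts._replace(path="/" + database).geturl()


old_dsn = "clickhouse://user:pass@localhost:9000/mabase_prod"
new_dsn = with_database(old_dsn, "ja4_logs")
```

A check such as `dsn_database(new_dsn) == "ja4_logs"` can help catch a deployment that still points at the old single database.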

All tests pass: sentinel 10/10, correlator 67.1%, bot-detector 11, dashboard 20, ja4_common 18

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-07 19:10:35 +02:00

93 lines
2.7 KiB
YAML

# logcorrelator configuration file
# Format: YAML
# Logging configuration
log:
  level: INFO # DEBUG, INFO, WARN, ERROR
inputs:
  unix_sockets:
    - name: http
      source_type: A
      path: /var/run/logcorrelator/http.socket
      format: json
      socket_permissions: "0666" # world read/write
    - name: network
      source_type: B
      path: /var/run/logcorrelator/network.socket
      format: json
      socket_permissions: "0666"
outputs:
  file:
    enabled: true
    path: /var/log/logcorrelator/correlated.log
  clickhouse:
    enabled: false
    dsn: clickhouse://user:pass@localhost:9000/ja4_logs
    table: http_logs_raw
    batch_size: 500
    flush_interval_ms: 200
    max_buffer_size: 5000
    drop_on_overflow: true
    async_insert: true
    timeout_ms: 1000
  stdout:
    enabled: false
correlation:
  # Time window for correlation (A and B must be within this window)
  # Increased to 10s to support HTTP Keep-Alive scenarios
  time_window:
    value: 10
    unit: s
  # Orphan policy: what to do when no match is found
  orphan_policy:
    apache_always_emit: true # Always emit A events, even without B match
    apache_emit_delay_ms: 500 # Wait 500ms before emitting as orphan (allows B to arrive)
    network_emit: false # Never emit B events alone
  # Matching mode: one_to_one or one_to_many (Keep-Alive)
  matching:
    mode: one_to_many
  # Buffer limits (max events in memory)
  buffers:
    max_http_items: 10000
    max_network_items: 20000
  # TTL for network events (source B)
  # Increased to 120s to support long-lived HTTP Keep-Alive sessions
  ttl:
    network_ttl_s: 120
  # Exclude specific source IPs or CIDR ranges from correlation
  # Events from these IPs will be silently dropped (not correlated, not emitted)
  # Useful for excluding health checks, internal traffic, or known bad actors
  exclude_source_ips:
    - 10.0.0.1 # Single IP
    - 192.168.1.100 # Another single IP
    - 172.16.0.0/12 # CIDR range (private network)
    - 10.10.10.0/24 # Another CIDR range
  # Restrict correlation to specific destination ports (optional)
  # If non-empty, only events whose dst_port matches one of these values will be correlated
  # Events on other ports are silently ignored (not correlated, not emitted as orphans)
  # Useful to focus on HTTP/HTTPS traffic only and ignore unrelated connections
  # include_dest_ports:
  #   - 80 # HTTP
  #   - 443 # HTTPS
  #   - 8080 # HTTP alt
  #   - 8443 # HTTPS alt
# Metrics server configuration (optional, for debugging/monitoring)
metrics:
  enabled: false
  addr: ":8080" # Address to listen on (e.g., ":8080", "localhost:8080")
  # Endpoints:
  #   GET /metrics - Returns correlation metrics as JSON
  #   GET /health - Health check endpoint
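
The filtering rules described by `exclude_source_ips` and `include_dest_ports` can be illustrated with a short Python sketch. This is an assumption about the semantics stated in the comments above, not the correlator's actual implementation; the function names `parse_networks` and `should_process` are hypothetical.

```python
import ipaddress

# Values copied from the example configuration above.
EXCLUDE_SOURCE_IPS = ["10.0.0.1", "192.168.1.100", "172.16.0.0/12", "10.10.10.0/24"]
INCLUDE_DEST_PORTS = [80, 443, 8080, 8443]


def parse_networks(entries):
    # A bare IP becomes a single-host network, so both forms share one check.
    return [ipaddress.ip_network(entry, strict=False) for entry in entries]


def should_process(src_ip, dst_port, excluded, include_ports):
    """Return False for events the config says to drop silently."""
    addr = ipaddress.ip_address(src_ip)
    if any(addr in net for net in excluded):
        return False  # source IP is excluded
    if include_ports and dst_port not in include_ports:
        return False  # port filter is active and this port is not listed
    return True


excluded = parse_networks(EXCLUDE_SOURCE_IPS)
```

An empty `include_ports` list disables the port filter, which matches the "if non-empty" wording in the configuration comments.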