logcorrelator
HTTP and network log correlation service written in Go.
Description
logcorrelator receives two JSON log streams over Unix datagram sockets (SOCK_DGRAM):
- Source A: application HTTP logs (Apache, reverse proxy)
- Source B: network logs (IP/TCP metadata, JA3/JA4, etc.)
It correlates events on src_ip + src_port within a configurable time window, and writes correlated logs to:
- A local file (JSON lines)
- ClickHouse (for analysis and archiving)
The service's operational logs (startup, errors, metrics) are written to stderr and collected by journald. No correlated data ever appears on stdout.
Architecture
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   Source A      │────▶│                  │────▶│   File Sink     │
│  HTTP/Apache    │     │   Correlation    │     │  (JSON lines)   │
│  (Unix DGRAM)   │     │    Service       │     └─────────────────┘
└─────────────────┘     │                  │
                        │  - Buffers       │     ┌─────────────────┐
┌─────────────────┐     │  - Time Window   │────▶│   ClickHouse    │
│   Source B      │────▶│  - Orphan Policy │     │      Sink       │
│  Network/JA4    │     │  - Keep-Alive    │     └─────────────────┘
│  (Unix DGRAM)   │     └──────────────────┘
└─────────────────┘
Hexagonal architecture: pure domain (internal/domain), abstract ports (internal/ports), adapters (internal/adapters), orchestration (internal/app).
Build (100% Docker)
All builds, tests and RPM packaging run inside containers:
# Full build with tests (builder stage)
make docker-build-dev
# RPM packaging (el8, el9, el10)
make package-rpm
# Fast build without tests
make docker-build-dev-no-test
# Run tests locally (requires Go 1.21+)
make test
Prerequisites
- Docker 20.10+
Installation
RPM packages
# Generate the packages
make package-rpm
# Install (Rocky Linux / AlmaLinux)
sudo dnf install -y dist/rpm/el8/logcorrelator-1.1.12-1.el8.x86_64.rpm
sudo dnf install -y dist/rpm/el9/logcorrelator-1.1.12-1.el9.x86_64.rpm
sudo dnf install -y dist/rpm/el10/logcorrelator-1.1.12-1.el10.x86_64.rpm
# Start
sudo systemctl enable --now logcorrelator
sudo systemctl status logcorrelator
Manual build
# Local binary (requires Go 1.21+)
go build -o logcorrelator ./cmd/logcorrelator
./logcorrelator -config config.example.yml
Configuration
YAML file. See config.example.yml for a complete example.
log:
  level: INFO                   # DEBUG, INFO, WARN, ERROR
inputs:
  unix_sockets:
    - name: http
      source_type: A            # HTTP source
      path: /var/run/logcorrelator/http.socket
      format: json
      socket_permissions: "0666"
    - name: network
      source_type: B            # Network source
      path: /var/run/logcorrelator/network.socket
      format: json
      socket_permissions: "0666"
outputs:
  file:
    path: /var/log/logcorrelator/correlated.log
  clickhouse:
    enabled: false
    dsn: clickhouse://user:pass@localhost:9000/db
    table: http_logs_raw
    batch_size: 500
    flush_interval_ms: 200
    max_buffer_size: 5000
    drop_on_overflow: true
    timeout_ms: 1000
  stdout:
    enabled: false              # no-op for data; operational logs always go to stderr
correlation:
  time_window:
    value: 10
    unit: s
  orphan_policy:
    apache_always_emit: true
    apache_emit_delay_ms: 500   # delay before emitting an A orphan (ms)
    network_emit: false
  matching:
    mode: one_to_many           # Keep-Alive: one B can correlate several successive A events
  buffers:
    max_http_items: 10000
    max_network_items: 20000
  ttl:
    network_ttl_s: 120          # TTL reset on each correlation (Keep-Alive)
  # Exclude source IPs (single IPs or CIDR ranges)
  exclude_source_ips:
    - 10.0.0.1
    - 172.16.0.0/12
  # Restrict correlation to specific destination ports (optional)
  # If the list is empty, all ports are correlated
  include_dest_ports:
    - 80
    - 443
metrics:
  enabled: false
  addr: ":8080"
ClickHouse DSN format
clickhouse://username:password@host:port/database
Ports: 9000 (native, recommended) or 8123 (HTTP).
Log formats
Source A (HTTP)
{
  "src_ip": "192.168.1.1", "src_port": 8080,
  "dst_ip": "10.0.0.1", "dst_port": 443,
  "timestamp": 1704110400000000000,
  "method": "GET", "path": "/api/test"
}
Source B (Network)
{
  "src_ip": "192.168.1.1", "src_port": 8080,
  "dst_ip": "10.0.0.1", "dst_port": 443,
  "ja3": "abc123", "ja4": "xyz789"
}
Correlated log (output)
Flat JSON structure: all A and B fields are merged at the root.
{
  "timestamp": "2024-01-01T12:00:00Z",
  "src_ip": "192.168.1.1", "src_port": 8080,
  "dst_ip": "10.0.0.1", "dst_port": 443,
  "correlated": true,
  "method": "GET", "path": "/api/test",
  "ja3": "abc123", "ja4": "xyz789"
}
If a field name collides between A and B, both values are kept, prefixed with a_ and b_.
A orphans (no matching B) are emitted with "correlated": false, "orphan_side": "A".
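The merge-with-prefix rule can be sketched in Go. This is a sketch of the documented behavior, not the service's actual code; in particular, keeping a single unprefixed field when both sides carry an identical value is an assumption here:

```go
package main

import "fmt"

// mergeFlat merges the fields of an A event and a B event into one
// flat record. On a key collision both values are kept, prefixed
// with "a_" and "b_". Equal values are kept once, unprefixed
// (an assumption of this sketch).
func mergeFlat(a, b map[string]any) map[string]any {
	out := make(map[string]any, len(a)+len(b))
	for k, v := range a {
		out[k] = v
	}
	for k, v := range b {
		if prev, clash := out[k]; clash && prev != v {
			delete(out, k)
			out["a_"+k] = prev
			out["b_"+k] = v
		} else {
			out[k] = v
		}
	}
	return out
}

func main() {
	a := map[string]any{"src_ip": "192.168.1.1", "method": "GET", "ts": 1}
	b := map[string]any{"src_ip": "192.168.1.1", "ja4": "xyz789", "ts": 2}
	// src_ip agrees (kept once); ts collides (kept as a_ts and b_ts).
	fmt.Println(mergeFlat(a, b))
}
```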
ClickHouse schema
Full setup
-- Database
CREATE DATABASE IF NOT EXISTS mabase_prod;
-- Raw table (target of the service's inserts)
CREATE TABLE mabase_prod.http_logs_raw
(
raw_json String,
ingest_time DateTime DEFAULT now()
)
ENGINE = MergeTree
PARTITION BY toDate(ingest_time)
ORDER BY ingest_time;
-- Parsed table
CREATE TABLE mabase_prod.http_logs
(
time DateTime,
log_date Date DEFAULT toDate(time),
src_ip IPv4, src_port UInt16,
dst_ip IPv4, dst_port UInt16,
method LowCardinality(String),
scheme LowCardinality(String),
host LowCardinality(String),
path String, query String,
http_version LowCardinality(String),
orphan_side LowCardinality(String),
correlated UInt8,
tls_version LowCardinality(String),
tls_sni LowCardinality(String),
ja3 String, ja3_hash String, ja4 String,
header_user_agent String, header_accept String,
header_accept_encoding String, header_accept_language String,
header_x_forwarded_for String,
header_sec_ch_ua String, header_sec_ch_ua_mobile String,
header_sec_ch_ua_platform String,
header_sec_fetch_dest String, header_sec_fetch_mode String, header_sec_fetch_site String,
ip_meta_ttl UInt8, ip_meta_df UInt8,
tcp_meta_window_size UInt32, tcp_meta_options LowCardinality(String),
syn_to_clienthello_ms Int32
)
ENGINE = MergeTree
PARTITION BY log_date
ORDER BY (time, src_ip, dst_ip, ja4);
-- Materialized view RAW → http_logs
CREATE MATERIALIZED VIEW mabase_prod.mv_http_logs
TO mabase_prod.http_logs AS
SELECT
parseDateTimeBestEffort(coalesce(JSONExtractString(raw_json,'time'),'1970-01-01T00:00:00Z')) AS time,
toDate(time) AS log_date,
toIPv4(coalesce(JSONExtractString(raw_json,'src_ip'),'0.0.0.0')) AS src_ip,
toUInt16(JSONExtractUInt(raw_json,'src_port')) AS src_port,
toIPv4(coalesce(JSONExtractString(raw_json,'dst_ip'),'0.0.0.0')) AS dst_ip,
toUInt16(JSONExtractUInt(raw_json,'dst_port')) AS dst_port,
coalesce(JSONExtractString(raw_json,'method'),'') AS method,
coalesce(JSONExtractString(raw_json,'scheme'),'') AS scheme,
coalesce(JSONExtractString(raw_json,'host'),'') AS host,
coalesce(JSONExtractString(raw_json,'path'),'') AS path,
coalesce(JSONExtractString(raw_json,'query'),'') AS query,
coalesce(JSONExtractString(raw_json,'http_version'),'') AS http_version,
coalesce(JSONExtractString(raw_json,'orphan_side'),'') AS orphan_side,
toUInt8(JSONExtractBool(raw_json,'correlated')) AS correlated,
coalesce(JSONExtractString(raw_json,'tls_version'),'') AS tls_version,
coalesce(JSONExtractString(raw_json,'tls_sni'),'') AS tls_sni,
coalesce(JSONExtractString(raw_json,'ja3'),'') AS ja3,
coalesce(JSONExtractString(raw_json,'ja3_hash'),'') AS ja3_hash,
coalesce(JSONExtractString(raw_json,'ja4'),'') AS ja4,
coalesce(JSONExtractString(raw_json,'header_User-Agent'),'') AS header_user_agent,
coalesce(JSONExtractString(raw_json,'header_Accept'),'') AS header_accept,
coalesce(JSONExtractString(raw_json,'header_Accept-Encoding'),'') AS header_accept_encoding,
coalesce(JSONExtractString(raw_json,'header_Accept-Language'),'') AS header_accept_language,
coalesce(JSONExtractString(raw_json,'header_X-Forwarded-For'),'') AS header_x_forwarded_for,
coalesce(JSONExtractString(raw_json,'header_Sec-CH-UA'),'') AS header_sec_ch_ua,
coalesce(JSONExtractString(raw_json,'header_Sec-CH-UA-Mobile'),'') AS header_sec_ch_ua_mobile,
coalesce(JSONExtractString(raw_json,'header_Sec-CH-UA-Platform'),'') AS header_sec_ch_ua_platform,
coalesce(JSONExtractString(raw_json,'header_Sec-Fetch-Dest'),'') AS header_sec_fetch_dest,
coalesce(JSONExtractString(raw_json,'header_Sec-Fetch-Mode'),'') AS header_sec_fetch_mode,
coalesce(JSONExtractString(raw_json,'header_Sec-Fetch-Site'),'') AS header_sec_fetch_site,
toUInt8(JSONExtractUInt(raw_json,'ip_meta_ttl')) AS ip_meta_ttl,
toUInt8(JSONExtractBool(raw_json,'ip_meta_df')) AS ip_meta_df,
toUInt32(JSONExtractUInt(raw_json,'tcp_meta_window_size')) AS tcp_meta_window_size,
coalesce(JSONExtractString(raw_json,'tcp_meta_options'),'') AS tcp_meta_options,
toInt32(JSONExtractInt(raw_json,'syn_to_clienthello_ms')) AS syn_to_clienthello_ms
FROM mabase_prod.http_logs_raw;
Users and permissions
CREATE USER IF NOT EXISTS data_writer IDENTIFIED WITH plaintext_password BY 'MotDePasse';
CREATE USER IF NOT EXISTS analyst IDENTIFIED WITH plaintext_password BY 'MotDePasseAnalyst';
GRANT INSERT(raw_json) ON mabase_prod.http_logs_raw TO data_writer;
GRANT SELECT(raw_json) ON mabase_prod.http_logs_raw TO data_writer;
GRANT SELECT ON mabase_prod.http_logs TO analyst;
Verifying ingestion
-- Raw data received
SELECT count(*), min(ingest_time), max(ingest_time) FROM http_logs_raw;
-- Data parsed by the materialized view
SELECT count(*), min(time), max(time) FROM http_logs;
-- Latest logs
SELECT time, src_ip, dst_ip, method, host, path, ja4
FROM http_logs ORDER BY time DESC LIMIT 10;
Signals

| Signal | Behavior |
|---|---|
| SIGINT / SIGTERM | Graceful shutdown (drain buffers, flush sinks) |
| SIGHUP | Reopen output files (log rotation) |
Internal logs
Operational logs go to stderr:
# systemd
journalctl -u logcorrelator -f
# Docker
docker logs -f logcorrelator
Project layout
cmd/logcorrelator/          # Entry point
internal/
  adapters/
    inbound/unixsocket/     # SOCK_DGRAM reads → NormalizedEvent
    outbound/
      clickhouse/           # ClickHouse sink (batching, retries, full logging)
      file/                 # File sink (JSON lines, SIGHUP reopen)
      multi/                # Fan-out to multiple sinks
      stdout/               # No-op for data (operational logs on stderr)
  app/                      # Orchestrator (sources → correlation → sinks)
  config/                   # YAML loading/validation
  domain/                   # CorrelationService, NormalizedEvent, CorrelatedLog
  observability/            # Logger, metrics, /metrics and /health HTTP server
  ports/                    # EventSource, CorrelatedLogSink, CorrelationProcessor interfaces
config.example.yml          # Example configuration
Dockerfile                  # Multi-stage build (builder, runtime, dev)
Dockerfile.package          # Multi-distro RPM packaging (el8, el9, el10)
Makefile                    # Build targets
architecture.yml            # Architecture specification
logcorrelator.service       # systemd unit
Debugging
DEBUG logs
log:
  level: DEBUG
Example log lines:
[unixsocket:http] DEBUG event received: source=A src_ip=192.168.1.1 src_port=8080
[correlation] DEBUG processing A event: key=192.168.1.1:8080
[correlation] DEBUG correlation found: A(src_ip=... src_port=... ts=...) + B(...)
[correlation] DEBUG A event has no matching B key in buffer: key=...
[correlation] DEBUG event excluded by IP filter: source=A src_ip=10.0.0.1 src_port=8080
[correlation] DEBUG event excluded by dest port filter: source=A dst_port=22
[correlation] DEBUG TTL reset for B event (Keep-Alive): key=... new_ttl=120s
[clickhouse] DEBUG batch sent: rows=42 table=http_logs_raw
Metrics server
metrics:
  enabled: true
  addr: ":8080"
GET /health → {"status":"healthy"}
GET /metrics:
{
  "events_received_a": 1542, "events_received_b": 1498,
  "correlations_success": 1450, "correlations_failed": 92,
  "failed_no_match_key": 45, "failed_time_window": 23,
  "failed_buffer_eviction": 5, "failed_ttl_expired": 12,
  "failed_ip_excluded": 7, "failed_dest_port_filtered": 3,
  "buffer_a_size": 23, "buffer_b_size": 18,
  "orphans_emitted_a": 92, "orphans_pending_a": 4,
  "keepalive_resets": 892
}
Diagnosing via metrics

| High metric | Cause | Fix |
|---|---|---|
| failed_no_match_key | A and B do not share the same src_ip:src_port | Check both sources |
| failed_time_window | Timestamps too far apart | Increase time_window.value or check NTP |
| failed_ttl_expired | B expires before correlation | Increase ttl.network_ttl_s |
| failed_buffer_eviction | Buffers too small | Increase buffers.max_http_items / max_network_items |
| failed_ip_excluded | Traffic from excluded IPs | Normal if expected |
| failed_dest_port_filtered | Traffic on unlisted ports | Check include_dest_ports |
| orphans_emitted_a (high) | Many A events without a B | Check that source B is sending events |
Source IP filtering
correlation:
  exclude_source_ips:
    - 10.0.0.1        # single IP (health checks)
    - 172.16.0.0/12   # CIDR range
Events from these IPs are silently dropped (neither correlated nor emitted as orphans). The failed_ip_excluded metric counts the exclusions.
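The exclusion check can be sketched with the standard library's net/netip package; the function name is illustrative, not the service's actual code:

```go
package main

import (
	"fmt"
	"net/netip"
)

// excluded reports whether src matches any exclude_source_ips entry;
// entries may be single IPs or CIDR ranges. A sketch of the documented
// behavior, not the actual implementation.
func excluded(src string, rules []string) bool {
	ip, err := netip.ParseAddr(src)
	if err != nil {
		return false
	}
	for _, r := range rules {
		if p, err := netip.ParsePrefix(r); err == nil {
			if p.Contains(ip) {
				return true
			}
		} else if a, err := netip.ParseAddr(r); err == nil && a == ip {
			return true
		}
	}
	return false
}

func main() {
	rules := []string{"10.0.0.1", "172.16.0.0/12"}
	fmt.Println(excluded("10.0.0.1", rules))    // exact IP match
	fmt.Println(excluded("172.20.5.9", rules))  // inside the CIDR range
	fmt.Println(excluded("192.168.1.1", rules)) // not excluded
}
```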
Destination port filtering
correlation:
  include_dest_ports:
    - 80     # HTTP
    - 443    # HTTPS
    - 8080
    - 8443
If the list is non-empty, only events whose dst_port is in the list take part in correlation; all others are silently dropped. An empty list correlates all ports (the default behavior). The failed_dest_port_filtered metric counts the exclusions.
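The empty-list-allows-all semantics can be sketched in Go; the function name is illustrative and the real service also increments failed_dest_port_filtered on a drop:

```go
package main

import "fmt"

// portAllowed applies the documented include_dest_ports semantics:
// an empty list allows every port; otherwise dst_port must be listed.
func portAllowed(includeDestPorts []int, dstPort int) bool {
	if len(includeDestPorts) == 0 {
		return true // empty list = correlate all ports (default)
	}
	for _, p := range includeDestPorts {
		if p == dstPort {
			return true
		}
	}
	return false
}

func main() {
	cfg := []int{80, 443, 8080, 8443}
	fmt.Println(portAllowed(cfg, 443)) // listed: correlated
	fmt.Println(portAllowed(cfg, 22))  // not listed: silently dropped
	fmt.Println(portAllowed(nil, 22))  // empty list: everything allowed
}
```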
Test scripts
# Bash script (simple)
./scripts/test-correlation.sh -c 10 -v
# Python script (full scenarios: basic, time window, keepalive, different IPs)
pip install requests
python3 scripts/test-correlation-advanced.py --all
Troubleshooting
ClickHouse: insert errors
- No such column: check that the table uses the single raw_json column (no separate parsed columns)
- ACCESS_DENIED: GRANT INSERT(raw_json) ON mabase_prod.http_logs_raw TO data_writer;
- Flush errors are logged at ERROR level in the service logs
Empty materialized view
If http_logs_raw has data but http_logs stays empty:
SHOW CREATE TABLE mv_http_logs;
GRANT SELECT(raw_json) ON mabase_prod.http_logs_raw TO data_writer;
Unix sockets: permission denied
Check that socket_permissions: "0666" is configured and that /var/run/logcorrelator is owned by the logcorrelator user.
systemd service fails to start
journalctl -u logcorrelator -n 50 --no-pager
/usr/bin/logcorrelator -config /etc/logcorrelator/logcorrelator.yml
License
MIT