cleanNetworkBufferByTTL was deleting pendingOrphans without emitting them,
causing silent data loss when a B event (network connection) expired while
A events were still waiting in the 500ms orphan delay buffer.
Fix: cleanNetworkBufferByTTL now returns []CorrelatedLog for forced orphans;
cleanExpired propagates them; ProcessEvent includes them in returned results.
TestBTTLExpiry_PurgesPendingOrphans extended to assert the orphan is actually
returned in ProcessEvent results (not just removed from internal state).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Bug #1 - processSourceA: use bEventHasValidTTL in one_to_many mode
instead of eventsMatch, which compared the original timestamps. After ~10s
all A requests became orphans even though the Keep-Alive session was still active.
Bug #4 - checkPendingOrphansForCorrelation: same fix; in one_to_many mode an
identical key means the same connection, so timestamps need not be compared.
Bug #3 - cleanNetworkBufferByTTL: B expiry now triggers immediate emission
of the associated pending orphans (they can never correlate again).
Bug #2 - Orchestrator: a 250ms ticker goroutine calls EmitPendingOrphans()
to drain orphans independently of the incoming event flow.
EmitPendingOrphans() exposes the method as public and thread-safe.
Tests: 4 new regression tests (one per bug).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Unix socket / systemd:
- RuntimeDirectory=logcorrelator in logcorrelator.service: systemd
recreates /run/logcorrelator owned by logcorrelator:logcorrelator on every
start/restart, eliminating the root:root ownership problem
- Add packaging/rpm/logcorrelator-tmpfiles.conf to recreate the
directory at boot via systemd-tmpfiles (boot-time safety layer)
- Remove /var/run/logcorrelator from the RPM %files and %post
- Dockerfile.package: copy logcorrelator-tmpfiles.conf into SOURCES/
Correlation - bugs:
- CRITICAL fix, emitPendingOrphans: slice corruption when several orphans
for the same key expire simultaneously (aliasing of the underlying
array, orphans emitted twice and persistent ghost entries)
- HIGH fix, rotateOldestA: event silently lost even with
ApacheAlwaysEmit=true; now returns a *CorrelatedLog that is propagated through
ProcessEvent
- MEDIUM fix, processSourceB (pending orphan path): in one_to_many mode, the
B event was not buffered after correlating with a pending orphan A,
breaking Keep-Alive for requests A2+ on the same connection
- LOW fix: remove the dead timer *time.Timer field from pendingOrphan
Correlation - observability:
- Add keepalive_seq (1-based) to NormalizedEvent: the request number
within the Keep-Alive connection, incremented by processSourceA
- All orphan logs now include keepalive_seq=N
- keepAliveSeqA is cleaned up automatically when the B TTL expires
Tests: 4 new regression tests (32 tests in total)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- feat: new config directive include_dest_ports ([]int) in correlation section
- feat: if non-empty, only events with a matching dst_port are correlated
- feat: filtered events are silently ignored (not correlated, not emitted as orphan)
- feat: new metric failed_dest_port_filtered tracked in ProcessEvent
- feat: DEBUG log 'event excluded by dest port filter: source=A dst_port=22'
- test: TestCorrelationService_IncludeDestPorts_AllowedPort
- test: TestCorrelationService_IncludeDestPorts_FilteredPort
- test: TestCorrelationService_IncludeDestPorts_EmptyAllowsAll
- docs(readme): full rewrite to match current code (v1.1.12)
- docs(readme): add include_dest_ports section, fix version refs, clean outdated sections
- docs(arch): add dest_port_filtering section, failed_dest_port_filtered metric, debug log example
- fix(config.example): remove obsolete stdout.level field
- chore: bump version to 1.1.12
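
The include_dest_ports behaviour described above (an empty list allows everything, otherwise only listed ports are correlated) can be sketched like this. The `portFilter` type and its methods are hypothetical names for the example; the real check lives in ProcessEvent and may be structured differently.

```go
package main

import "fmt"

// portFilter is a hypothetical helper holding the include_dest_ports set.
type portFilter struct {
	allowed map[int]struct{}
}

// newPortFilter builds a lookup set from the configured port list.
func newPortFilter(ports []int) portFilter {
	set := make(map[int]struct{}, len(ports))
	for _, p := range ports {
		set[p] = struct{}{}
	}
	return portFilter{allowed: set}
}

// allows reports whether an event with this dst_port should be correlated.
// An empty configuration means no filtering at all.
func (f portFilter) allows(dstPort int) bool {
	if len(f.allowed) == 0 {
		return true
	}
	_, ok := f.allowed[dstPort]
	return ok
}

func main() {
	f := newPortFilter([]int{80, 443})
	for _, port := range []int{443, 22} {
		if !f.allows(port) {
			// The real service logs this at DEBUG and increments
			// failed_dest_port_filtered; here we only print.
			fmt.Printf("event excluded by dest port filter: dst_port=%d\n", port)
			continue
		}
		fmt.Printf("event accepted: dst_port=%d\n", port)
	}
}
```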
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- feat(observability): metrics server with /metrics and /health endpoints
- feat(observability): correlation metrics (events, success/failed, reasons, buffers)
- feat(correlation): IP exclusion filter (exact IPs and CIDR ranges)
- feat(correlation): pending orphan delay for late-arriving B events
- fix(stdout): sink is now a no-op for data; JSON must never appear on stdout
- fix(clickhouse): flush errors were silently discarded; they are now logged
- fix(clickhouse): buffer overflow with DropOnOverflow now logged at WARN
- fix(clickhouse): retry attempts logged at WARN with attempt/delay/error context
- feat(clickhouse): connection success logged at INFO, batch sends at DEBUG
- feat(clickhouse): SetLogger() for external logger injection
- test(stdout): assert stdout remains empty for correlated and orphan logs
- chore(rpm): bump version to 1.1.11, update changelog
- docs: README and architecture.yml updated
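
For the IP exclusion filter listed above (exact IPs and CIDR ranges), a minimal sketch using the standard net/netip package could look like the following. The `ipExcluder` type and its methods are assumptions for illustration; the commit does not describe how the filter is implemented.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// ipExcluder is a hypothetical helper holding exact addresses and prefixes.
type ipExcluder struct {
	addrs    map[netip.Addr]struct{}
	prefixes []netip.Prefix
}

// newIPExcluder parses entries such as "192.0.2.10" or "10.0.0.0/8".
func newIPExcluder(entries []string) (*ipExcluder, error) {
	ex := &ipExcluder{addrs: make(map[netip.Addr]struct{})}
	for _, e := range entries {
		if strings.Contains(e, "/") {
			p, err := netip.ParsePrefix(e)
			if err != nil {
				return nil, fmt.Errorf("invalid CIDR %q: %w", e, err)
			}
			ex.prefixes = append(ex.prefixes, p.Masked())
			continue
		}
		a, err := netip.ParseAddr(e)
		if err != nil {
			return nil, fmt.Errorf("invalid IP %q: %w", e, err)
		}
		ex.addrs[a.Unmap()] = struct{}{}
	}
	return ex, nil
}

// excluded reports whether ip matches an exact entry or falls in a range.
// Unparseable input is treated as not excluded.
func (ex *ipExcluder) excluded(ip string) bool {
	a, err := netip.ParseAddr(ip)
	if err != nil {
		return false
	}
	a = a.Unmap() // treat IPv4-mapped IPv6 addresses as plain IPv4
	if _, ok := ex.addrs[a]; ok {
		return true
	}
	for _, p := range ex.prefixes {
		if p.Contains(a) {
			return true
		}
	}
	return false
}

func main() {
	ex, err := newIPExcluder([]string{"192.0.2.10", "10.0.0.0/8"})
	if err != nil {
		panic(err)
	}
	fmt.Println(ex.excluded("10.1.2.3"), ex.excluded("198.51.100.7"))
}
```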
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Three critical bugs fixed in correlation service:
Bug 1: Premature A event cleanup (CRITICAL)
- cleanExpired() was using system time instead of B event TTL
- A events now only removed when no valid B exists AND A age > TimeWindow
- New cleanBufferAByBTTL() method respects B event TTL
Bug 2: Flush emitting all A as orphans without correlation attempt
- Flush() now tries to correlate remaining A with remaining B first
- Only emits A as orphan if no matching B found
- Preserves correlation during shutdown
Bug 3: Buffer full causing immediate orphan emission
- Implemented FIFO rotation instead of immediate emission
- Oldest A event removed when buffer full, new event buffered
- New rotateOldestA() and rotateOldestB() helper methods
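
The FIFO rotation for Bug 3 can be sketched as below. This is an illustration under assumptions: `fifoBuffer`, `event`, and `push` are hypothetical names, and returning the evicted event to the caller is a design choice of the sketch, not something this commit states about rotateOldestA().

```go
package main

import "fmt"

// event is a simplified stand-in for a buffered A or B event.
type event struct{ ID string }

// fifoBuffer is a hypothetical bounded buffer with oldest-first eviction.
type fifoBuffer struct {
	max    int
	events []event
}

// push buffers ev. When the buffer is full, the oldest event is removed
// first and returned so the caller can decide what to do with it.
func (b *fifoBuffer) push(ev event) (evicted *event) {
	if b.max > 0 && len(b.events) >= b.max {
		oldest := b.events[0]
		b.events = b.events[1:]
		evicted = &oldest
	}
	b.events = append(b.events, ev)
	return evicted
}

func main() {
	buf := &fifoBuffer{max: 2}
	for _, id := range []string{"a1", "a2", "a3"} {
		if old := buf.push(event{ID: id}); old != nil {
			fmt.Println("rotated out:", old.ID)
		}
	}
	fmt.Println("buffered:", len(buf.events))
}
```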
New tests added:
- TestCorrelationService_ALateThanB_WithinTimeWindow
- TestCorrelationService_ALateThanB_AExpiredTooSoon
- TestCorrelationService_Flush_CorrelatesRemainingEvents
- TestCorrelationService_BufferFull_RotatesOldestA
- TestCorrelationService_CleanA_RespectsBTTL
All 24 tests pass.
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- Fix cleanExpired() to use TTL map instead of event timestamp for B events
- Increase default correlation time window from 1s to 10s
- Increase default network TTL from 30s to 120s for long sessions
- Use payload timestamp for network events when available (fallback to now)
- Add comprehensive Keep-Alive tests (TTL reset, long session scenarios)
- Bump version to 1.1.7
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- Table schema has only one column: raw_json (String)
- Serialize entire CorrelatedLog as JSON string
- Use INSERT INTO table (raw_json) with single Append() argument
- Fix "No such column timestamp" errors
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- Create /var/run/logcorrelator/ if missing before binding sockets
- Fixes issue with tmpfs /var/run being cleared on reboot
- Add filepath import for directory handling
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- Add raw JSON payload to parse error warnings
- Helps diagnose malformed JSON from senders
- Version bumped to 1.1.4
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- Install logcorrelator.yml.example to /etc/logcorrelator/ instead of /usr/share/logcorrelator/
- Change default socket permissions from 0660 to 0666 (world read/write)
- Bump version to 1.1.2
- Remove CHANGELOG.md
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Major features:
- One-to-many correlation mode (Keep-Alive) for HTTP connections
- Dynamic TTL for network events with reset on each correlation
- Separate configurable buffer sizes for HTTP and network events
- SIGHUP signal handling for log rotation without service restart
- FileSink.Reopen() method for log file rotation
- logrotate configuration included in RPM
- ExecReload added to systemd service
Configuration changes:
- New YAML structure with nested sections (time_window, orphan_policy, matching, buffers, ttl)
- Backward compatibility maintained for deprecated fields
Packaging:
- RPM version 1.1.0 with logrotate config
- Updated spec file and changelog
- All distributions: el8, el9, el10
Tests:
- New tests for Keep-Alive mode and TTL reset
- Updated mocks with Reopen() interface method
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- Remove CorrelationKeyFull() alias, use CorrelationKey() everywhere
- Remove duplicate TimeProvider interface from ports/source.go
- Remove unused time import from ports/source.go
- Update README.md: replace ./build.sh and ./test.sh with make commands
- Update RPM package names in README to match current version (1.0.3)
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- Create cmd/logcorrelator/main.go as the application entry point
- Loads configuration from YAML file
- Initializes Unix socket sources, file/ClickHouse/stdout sinks
- Sets up correlation service and orchestrator
- Handles graceful shutdown on SIGINT/SIGTERM
- Supports -version flag to print version
- Add internal/adapters/outbound/stdout/sink.go
- Implements CorrelatedLogSink interface for stdout output
- Writes JSON lines to standard output
- Fix .gitignore to use /logcorrelator instead of logcorrelator
- Prevents cmd/logcorrelator directory from being ignored
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- breaking: remove apache and network subdivisions from JSON output
- feat: all log fields now merged into single-level JSON structure
- feat: custom MarshalJSON() implementation for flat output
- chore: update ClickHouse schema to use single fields JSON column
- docs: update CHANGELOG.md and README.md with v1.0.3 changes
- build: bump version to 1.0.3 in build.sh and RPM spec
Migration notes:
- Existing ClickHouse tables need schema migration to use fields JSON column
- Replace apache JSON and network JSON columns with fields JSON column
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>