Bug #1 - processSourceA: use bEventHasValidTTL in one_to_many mode
instead of eventsMatch, which compared the original timestamps. After ~10s
every A request became an orphan even though the KA (Keep-Alive) session was still active.
Bug #4 - checkPendingOrphansForCorrelation: same fix; in one_to_many mode an
identical key means the same connection, so timestamps need not be compared.
Bug #3 - cleanNetworkBufferByTTL: when a B event expires, its associated
pending orphans are emitted immediately (they can never correlate anymore).
Bug #2 - Orchestrator: a 250ms ticker goroutine calls EmitPendingOrphans()
to drain orphans independently of the incoming event stream.
EmitPendingOrphans() exposes the method as a public, thread-safe entry point.
Tests: 4 new non-regression tests (one per bug).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Unix socket / systemd:
- RuntimeDirectory=logcorrelator in logcorrelator.service: systemd
recreates /run/logcorrelator owned by logcorrelator:logcorrelator on every
start/restart, which eliminates the root:root ownership problem
- Add packaging/rpm/logcorrelator-tmpfiles.conf to recreate the
directory at boot via systemd-tmpfiles (boot-time safety net)
- Remove /var/run/logcorrelator from the RPM %files and %post sections
- Dockerfile.package: copy logcorrelator-tmpfiles.conf into SOURCES/
Correlation - bugs:
- Fix CRITICAL emitPendingOrphans: slice corruption when several orphans
expire simultaneously for the same key (aliasing of the underlying
array caused orphans to be emitted twice and ghost orphans to persist)
- Fix HIGH rotateOldestA: event silently lost even with
ApacheAlwaysEmit=true; it now returns a *CorrelatedLog that is propagated
through ProcessEvent
- Fix MEDIUM processSourceB (pending orphan path): in one_to_many mode, the
B event was not buffered after correlating with a pending orphan A,
breaking Keep-Alive for requests A2+ on the same connection
- Fix LOW: remove the dead field timer *time.Timer from pendingOrphan
Correlation - observability:
- Add keepalive_seq (1-based) to NormalizedEvent: the request number
within the Keep-Alive connection, incremented by processSourceA
- All orphan logs now include keepalive_seq=N
- keepAliveSeqA is cleaned up automatically when the B TTL expires
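A sketch of a per-connection request counter matching the keepalive_seq behavior described above, assuming it is keyed by the same connection key as the B buffer. The keepAliveCounter type and its next/forget methods are hypothetical.

```go
package main

import "sync"

// keepAliveCounter (hypothetical) hands out 1-based sequence numbers per
// connection key and is reset when the key's B TTL expires.
type keepAliveCounter struct {
	mu  sync.Mutex
	seq map[string]int
}

func newKeepAliveCounter() *keepAliveCounter {
	return &keepAliveCounter{seq: make(map[string]int)}
}

// next returns the keepalive_seq for a new A event on key (1 for the first).
func (c *keepAliveCounter) next(key string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.seq[key]++
	return c.seq[key]
}

// forget is called when the B TTL for key expires, so a later connection
// reusing the same key starts again at 1.
func (c *keepAliveCounter) forget(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.seq, key)
}
```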
Tests: 4 new non-regression tests (32 tests in total)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- feat: new config directive include_dest_ports ([]int) in the correlation section
- feat: if non-empty, only events with a matching dst_port are correlated
- feat: filtered events are silently ignored (not correlated, not emitted as orphan)
- feat: new metric failed_dest_port_filtered tracked in ProcessEvent
- feat: DEBUG log 'event excluded by dest port filter: source=A dst_port=22'
- test: TestCorrelationService_IncludeDestPorts_AllowedPort
- test: TestCorrelationService_IncludeDestPorts_FilteredPort
- test: TestCorrelationService_IncludeDestPorts_EmptyAllowsAll
- docs(readme): full rewrite to match current code (v1.1.12)
- docs(readme): add include_dest_ports section, fix version refs, clean outdated sections
- docs(arch): add dest_port_filtering section, failed_dest_port_filtered metric, debug log example
- fix(config.example): remove obsolete stdout.level field
- chore: bump version to 1.1.12
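A sketch of the include_dest_ports rule (empty list allows everything, otherwise only listed ports are correlated, and rejected events are only counted). The destPortFilter type and the plain filtered counter standing in for the failed_dest_port_filtered metric are hypothetical; this sketch is not safe for concurrent use.

```go
package main

// destPortFilter (hypothetical) implements the include_dest_ports check.
type destPortFilter struct {
	allowed  map[int]struct{}
	filtered uint64 // stands in for the failed_dest_port_filtered metric
}

func newDestPortFilter(ports []int) *destPortFilter {
	f := &destPortFilter{allowed: make(map[int]struct{}, len(ports))}
	for _, p := range ports {
		f.allowed[p] = struct{}{}
	}
	return f
}

// allow reports whether an event with dstPort should be correlated.
// Rejected events are counted but neither correlated nor emitted as orphans.
func (f *destPortFilter) allow(dstPort int) bool {
	if len(f.allowed) == 0 {
		return true
	}
	if _, ok := f.allowed[dstPort]; ok {
		return true
	}
	f.filtered++
	return false
}
```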
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- feat(observability): metrics server with /metrics and /health endpoints
- feat(observability): correlation metrics (events, success/failed, reasons, buffers)
- feat(correlation): IP exclusion filter (exact IPs and CIDR ranges)
- feat(correlation): pending orphan delay for late-arriving B events
- fix(stdout): sink is now a no-op for data; JSON must never appear on stdout
- fix(clickhouse): flush errors were silently discarded; they are now logged
- fix(clickhouse): buffer overflow with DropOnOverflow now logged at WARN
- fix(clickhouse): retry attempts logged at WARN with attempt/delay/error context
- feat(clickhouse): connection success logged at INFO, batch sends at DEBUG
- feat(clickhouse): SetLogger() for external logger injection
- test(stdout): assert stdout remains empty for correlated and orphan logs
- chore(rpm): bump version to 1.1.11, update changelog
- docs: README and architecture.yml updated
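A sketch of an IP exclusion filter accepting both exact addresses and CIDR ranges, built on the standard net/netip package. The ipExcluder type and its methods are hypothetical; exact addresses are stored as single-host prefixes (/32 or /128) so one Contains check covers both cases.

```go
package main

import (
	"fmt"
	"net/netip"
)

// ipExcluder (hypothetical) holds exact IPs and CIDR ranges as prefixes.
type ipExcluder struct {
	prefixes []netip.Prefix
}

func newIPExcluder(entries []string) (*ipExcluder, error) {
	ex := &ipExcluder{}
	for _, e := range entries {
		if p, err := netip.ParsePrefix(e); err == nil {
			ex.prefixes = append(ex.prefixes, p.Masked())
			continue
		}
		addr, err := netip.ParseAddr(e)
		if err != nil {
			return nil, fmt.Errorf("invalid IP or CIDR %q: %w", e, err)
		}
		// An exact address becomes a single-host prefix (/32 or /128).
		ex.prefixes = append(ex.prefixes, netip.PrefixFrom(addr, addr.BitLen()))
	}
	return ex, nil
}

// excluded reports whether ip matches any configured address or range.
func (ex *ipExcluder) excluded(ip string) bool {
	addr, err := netip.ParseAddr(ip)
	if err != nil {
		return false
	}
	for _, p := range ex.prefixes {
		if p.Contains(addr) {
			return true
		}
	}
	return false
}
```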
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
rpmbuild creates architecture-specific subdirectories (x86_64/)
by default. Updated COPY commands to include this path.
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
RPMs are already written to /packages/rpm/${DIST_NAME}/ by rpmbuild
when the --rpmdir flag is used, so there is no need to copy them from the RPMS directory.
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Simplified rpmbuild process:
- Copy files directly to the BUILD directory (no tar archive)
- Use --noclean flag to preserve BUILD contents
- Use %{_builddir} macro in spec file instead of %{_sourcedir}
This avoids the complexity of source archive creation/extraction
and fixes the 'No such file or directory' error.
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
The source archive contains files directly (usr/, etc/, var/)
not in a tmp/pkgroot subdirectory.
Fixed paths in %install section:
- Before: %{_sourcedir}/../tmp/pkgroot/usr/bin/logcorrelator
- After: %{_sourcedir}/usr/bin/logcorrelator
This fixes the rpmbuild error:
install: cannot stat '/root/rpmbuild/SOURCES/../tmp/pkgroot/usr/bin/logcorrelator': No such file or directory
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Build optimizations implemented:
1. Makefile: Remove --no-cache flag
- Docker builds now use layer cache (incremental builds)
- Added DOCKER_BUILDKIT=1 for better performance
- Added buildx support for parallel builds
- New targets: docker-build-dev-no-test, package-rpm-sequential
2. Dockerfile: Add SKIP_TESTS argument
- SKIP_TESTS=true for faster production builds
- Tests still run in CI by default
- Added BuildKit cache mounts for:
- /go/pkg/mod (Go modules)
- /var/cache/apt (APT cache)
- /var/lib/apt/lists (APT lists)
3. Dockerfile.package: Factorize common RPM tools
- New stage: rpm-common-tools (shared across el8/el9/el10)
- fpm installed once, reused 3 times
- Common build script: /build-rpm.sh
- Reduced duplication from 300 lines to 60 lines per stage
4. Parallel RPM builds with buildx
- make package-rpm now uses buildx for parallel builds
- el8, el9, el10 built simultaneously
- Fallback: make package-rpm-sequential (if buildx fails)
Expected performance gains:
- Incremental build (code change only): 15-25 min → 3-5 min (-80%)
- Full build (no cache): 15-25 min → 8-12 min (-50%)
- RPM builds (parallel): 9 min → 4 min (-55%)
- Total typical workflow: ~20 min → ~5-7 min (-65%)
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Three critical bugs fixed in correlation service:
Bug 1: Premature A event cleanup (CRITICAL)
- cleanExpired() was using system time instead of B event TTL
- A events now only removed when no valid B exists AND A age > TimeWindow
- New cleanBufferAByBTTL() method respects B event TTL
Bug 2: Flush emitting all A as orphans without correlation attempt
- Flush() now tries to correlate remaining A with remaining B first
- Only emits A as orphan if no matching B found
- Preserves correlation during shutdown
Bug 3: Buffer full causing immediate orphan emission
- Implemented FIFO rotation instead of immediate emission
- Oldest A event removed when buffer full, new event buffered
- New rotateOldestA() and rotateOldestB() helper methods
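A sketch of bounded FIFO rotation: when the buffer is full, the oldest event is evicted and handed back to the caller (so it can still be emitted, e.g. as an orphan) instead of being dropped. The event and fifoBuffer types and the push method are hypothetical; max <= 0 is treated as unbounded in this sketch.

```go
package main

// event is a hypothetical, simplified buffered event.
type event struct{ id int }

// fifoBuffer is a bounded FIFO; max <= 0 means unbounded in this sketch.
type fifoBuffer struct {
	items []event
	max   int
}

// push buffers ev and returns the evicted oldest event, if any, so the
// caller can still emit it rather than silently losing it.
func (b *fifoBuffer) push(ev event) (evicted *event) {
	if b.max > 0 && len(b.items) >= b.max {
		oldest := b.items[0]
		b.items = b.items[1:]
		evicted = &oldest
	}
	b.items = append(b.items, ev)
	return evicted
}
```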
New tests added:
- TestCorrelationService_ALateThanB_WithinTimeWindow
- TestCorrelationService_ALateThanB_AExpiredTooSoon
- TestCorrelationService_Flush_CorrelatesRemainingEvents
- TestCorrelationService_BufferFull_RotatesOldestA
- TestCorrelationService_CleanA_RespectsBTTL
All 24 tests pass.
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- Fix cleanExpired() to use TTL map instead of event timestamp for B events
- Increase default correlation time window from 1s to 10s
- Increase default network TTL from 30s to 120s for long sessions
- Use payload timestamp for network events when available (fallback to now)
- Add comprehensive Keep-Alive tests (TTL reset, long session scenarios)
- Bump version to 1.1.7
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- Update RPM version numbers to 1.1.6
- Fix config file name (.yml not .conf)
- Add complete configuration example with current schema
- Add ClickHouse DSN format documentation
- Add Troubleshooting section (ClickHouse, MV, sockets, systemd)
- Update project structure with accurate file names
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- http_logs_raw: partition by toDate(ingest_time), order by ingest_time
- http_logs: explicit columns (no DEFAULT), extracted by MV
- mv_http_logs: full SELECT with JSONExtract* + coalesce for all fields
- Add 17 HTTP header fields (User-Agent, Accept, Sec-CH-UA, etc.)
- New ORDER BY: (time, src_ip, dst_ip, ja4)
- architecture.yml: match new schema with MV query details
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- Table schema has only one column: raw_json (String)
- Serialize entire CorrelatedLog as JSON string
- Use INSERT INTO table (raw_json) with single Append() argument
- Fix "No such column timestamp" errors
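A sketch of the single-column insert path: the whole log is marshalled to one JSON string and passed as the only Append() argument. The CorrelatedLog fields shown are hypothetical, and the appender interface assumes a driver batch exposing Append(v ...any) error (the shape of clickhouse-go v2 batches); memBatch is an in-memory stand-in for illustration only.

```go
package main

import "encoding/json"

// CorrelatedLog is a simplified, hypothetical stand-in for the real type.
type CorrelatedLog struct {
	SrcIP      string `json:"src_ip"`
	DstIP      string `json:"dst_ip"`
	Correlated bool   `json:"correlated"`
}

// appender abstracts a batch prepared with "INSERT INTO table (raw_json)".
type appender interface {
	Append(v ...any) error
}

// appendRawJSON serializes the whole log and appends it as the single
// raw_json column value.
func appendRawJSON(b appender, entry CorrelatedLog) error {
	data, err := json.Marshal(entry)
	if err != nil {
		return err
	}
	return b.Append(string(data))
}

// memBatch is an in-memory appender used here for illustration only.
type memBatch struct{ rows [][]any }

func (m *memBatch) Append(v ...any) error {
	m.rows = append(m.rows, v)
	return nil
}
```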
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- RPM %post now sets chmod 755 on /var/run/logcorrelator
- Allows service to create sockets after reboot
- Version bumped to 1.1.5
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- Create /var/run/logcorrelator/ if missing before binding sockets
- Fixes issue with tmpfs /var/run being cleared on reboot
- Add filepath import for directory handling
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- Add raw JSON payload to parse error warnings
- Helps diagnose malformed JSON from senders
- Version bumped to 1.1.4
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- README.md: use http.socket instead of apache.sock
- architecture.yml: use http.socket instead of apache.sock
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- Install logcorrelator.yml.example to /etc/logcorrelator/ instead of /usr/share/logcorrelator/
- Change default socket permissions from 0660 to 0666 (world read/write)
- Bump version to 1.1.2
- Remove CHANGELOG.md
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Move example config from /usr/share/logcorrelator/ to /etc/logcorrelator/
for easier access and consistency with main config file.
Bump version to 1.1.1
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Major features:
- One-to-many correlation mode (Keep-Alive) for HTTP connections
- Dynamic TTL for network events with reset on each correlation
- Separate configurable buffer sizes for HTTP and network events
- SIGHUP signal handling for log rotation without service restart
- FileSink.Reopen() method for log file rotation
- logrotate configuration included in RPM
- ExecReload added to systemd service
Configuration changes:
- New YAML structure with nested sections (time_window, orphan_policy, matching, buffers, ttl)
- Backward compatibility maintained for deprecated fields
Packaging:
- RPM version 1.1.0 with logrotate config
- Updated spec file and changelog
- All distributions: el8, el9, el10
Tests:
- New tests for Keep-Alive mode and TTL reset
- Updated mocks with Reopen() interface method
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- Define %global spec_version before Version: field
- Use %{spec_version} in Version: field for proper macro expansion
- Makes version management easier for RPM builds
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- Add 'systemctl start logcorrelator.service' to the post script
- Update both packaging/rpm/post and logcorrelator.spec
- Service is now enabled AND started automatically on install
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>