feat(scripts): add reload-prod-logs.sh for prod→dev data sync

Exports http_logs from prod ClickHouse via HTTP API, imports into dev
with dynamic date shifting (max(time) → now() by default).

Features:
- Batch export in Native format (200K rows/batch, ~10s each)
- Auto date shift: prod max(time) aligned to current time
- --shift N: manual override (seconds)
- --days N: filter to last N days only
- --cron: silent mode for scheduled runs
- Staging table approach: export → staging → INSERT SELECT with shift → cleanup
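
The shift and batching logic listed above can be sketched as a few shell helpers. This is a minimal illustration, not the code in reload-prod-logs.sh: the function names, the staging table name passed in the usage note, and the use of `addSeconds` with `SELECT * REPLACE` are assumptions. Only `http_logs`, the `time` column, and the 200K batch size come from this commit message.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names are assumptions, not taken from the real script.

# compute_shift PROD_MAX_EPOCH NOW_EPOCH [MANUAL_SHIFT]
# Prints the number of seconds to add to every timestamp.
# With no manual override, prod max(time) is aligned to the current time.
compute_shift() {
  local prod_max=$1 now=$2 manual=${3:-}
  if [ -n "$manual" ]; then
    echo "$manual"
  else
    echo $(( now - prod_max ))
  fi
}

# batch_count TOTAL_ROWS BATCH_SIZE
# Prints how many export batches are needed (ceiling division).
batch_count() {
  local total=$1 batch=$2
  echo $(( (total + batch - 1) / batch ))
}

# build_shift_query STAGING_TABLE TARGET_TABLE SHIFT_SECONDS
# Prints the INSERT SELECT that copies staging rows into the target table
# while shifting the `time` column (assumed ClickHouse syntax).
build_shift_query() {
  local staging=$1 target=$2 shift_s=$3
  printf 'INSERT INTO %s SELECT * REPLACE (addSeconds(time, %s) AS time) FROM %s' \
    "$target" "$shift_s" "$staging"
}
```

For example, `batch_count 3054122 200000` prints 16, which at ~10s per batch is in line with the ~3 minute run reported below. `build_shift_query http_logs_staging http_logs "$(compute_shift "$prod_max" "$(date +%s)")"` would produce the statement to send to dev, where `http_logs_staging` and `prod_max` are placeholders.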

Tested: 3,054,122 rows imported in ~3 minutes, dates 2026-04-03→2026-04-09.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
toto
2026-04-09 15:41:38 +02:00
parent 8180f4af04
commit d8ca804a55
2 changed files with 227 additions and 1 deletions

@@ -11,7 +11,8 @@
 build-correlator test-correlator rpm-correlator \
 build-bot-detector test-bot-detector \
 build-dashboard test-dashboard \
-test-ja4common-python
+test-ja4common-python \
+reload-prod-logs
 # --- Root -------------------------------------------------------------------
@@ -139,3 +140,7 @@ test-integration-keep:
 test-integration-down:
 	cd tests/integration && docker compose down -v --remove-orphans
+# ── Dev data ─────────────────────────────────────────────────────────────────
+reload-prod-logs:
+	./scripts/reload-prod-logs.sh