ja4-platform/docs/shared/python-ja4common.md
toto 9f3e0621e5 feat: split ClickHouse into dual configurable databases (ja4_logs / ja4_processing)
Architecture:
- ja4_logs: raw log ingestion (http_logs_raw, http_logs, mv_http_logs)
- ja4_processing: analytics, aggregation, ML, dictionaries, audit

Configuration (env vars):
- CLICKHOUSE_DB_LOGS (default: ja4_logs)
- CLICKHOUSE_DB_PROCESSING (default: ja4_processing)

Changes:
- SQL migrations (10 files): all mabase_prod refs → ja4_logs or ja4_processing
  with correct cross-database references (MVs, views, dicts)
- deploy_schema.sh: substitutes DB names from env vars at deploy time
- Python shared settings: added CLICKHOUSE_DB_LOGS + CLICKHOUSE_DB_PROCESSING
- Dashboard routes (19 files): replaced ~80 hardcoded mabase_prod refs
  with settings.CLICKHOUSE_DB_LOGS / settings.CLICKHOUSE_DB_PROCESSING
- Bot-detector: DB → CLICKHOUSE_DB_PROCESSING, fetch_rules.py configurable
- Correlator: DSN example updated to ja4_logs
- Docker-compose + .env files: new env vars with defaults
- All documentation updated (14 markdown files)

All tests pass: sentinel 10/10, correlator 67.1%, bot-detector 11, dashboard 20, ja4_common 18

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-07 19:10:35 +02:00

python-ja4common

ja4_common is the shared Python library for the ja4-platform, providing a unified ClickHouse client singleton and configuration settings. It is used by bot-detector and dashboard.

Package name: ja4-common

Python version: ≥ 3.11

Dependencies:

  • clickhouse-connect >= 0.8.0
  • pydantic-settings >= 2.1.0

ClickHouseSettings

Pydantic-settings model that reads configuration from environment variables and .env files.

Fields

| Field | Type | Default | Env Variable | Description |
|---|---|---|---|---|
| CLICKHOUSE_HOST | str | "clickhouse" | CLICKHOUSE_HOST | ClickHouse server hostname |
| CLICKHOUSE_PORT | int | 8123 | CLICKHOUSE_PORT | ClickHouse HTTP API port |
| CLICKHOUSE_DB | str | "ja4_processing" | CLICKHOUSE_DB | Database name |
| CLICKHOUSE_USER | str | "admin" | CLICKHOUSE_USER | Username for authentication |
| CLICKHOUSE_PASSWORD | str | "" | CLICKHOUSE_PASSWORD | Password for authentication |

Configuration Sources

Settings are loaded in order of precedence:

  1. Environment variables (highest priority)
  2. .env file in the current working directory
  3. Default values (lowest priority)

Environment variable names are case-sensitive (e.g., CLICKHOUSE_HOST, not clickhouse_host).
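
The precedence order above can be illustrated without pydantic-settings. The following is a minimal standard-library sketch, not the library's actual loading code: `parse_dotenv` and `resolve` are hypothetical helpers introduced here, and the defaults are copied from the Fields table.

```python
# Defaults copied from the Fields table; values kept as strings for simplicity.
DEFAULTS = {
    "CLICKHOUSE_HOST": "clickhouse",
    "CLICKHOUSE_PORT": "8123",
    "CLICKHOUSE_DB": "ja4_processing",
    "CLICKHOUSE_USER": "admin",
    "CLICKHOUSE_PASSWORD": "",
}


def parse_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def resolve(environ: dict[str, str], dotenv_text: str = "") -> dict[str, str]:
    """Hypothetical helper: apply sources lowest priority first so later ones win."""
    resolved = dict(DEFAULTS)                                  # 3. defaults
    dotenv = parse_dotenv(dotenv_text)
    resolved.update({k: v for k, v in dotenv.items() if k in DEFAULTS})   # 2. .env
    resolved.update({k: v for k, v in environ.items() if k in DEFAULTS})  # 1. env vars
    return resolved


# HOST is set in both sources, USER only in .env, PORT in neither.
dotenv_text = "CLICKHOUSE_HOST=from-dotenv\nCLICKHOUSE_USER=data_writer\n"
environ = {"CLICKHOUSE_HOST": "from-env"}
resolved = resolve(environ, dotenv_text)
```

Here `CLICKHOUSE_HOST` comes from the environment, `CLICKHOUSE_USER` from the .env text, and `CLICKHOUSE_PORT` falls back to its default. Keys are matched exactly, in line with the case-sensitivity note above.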

Usage

from ja4_common.settings import settings

print(settings.CLICKHOUSE_HOST)  # "clickhouse" or from env
print(settings.CLICKHOUSE_PORT)  # 8123 or from env

ClickHouseClient

Wraps a clickhouse_connect client, reconnecting automatically when the connection is lost, and exposes the small set of methods listed below.

Methods

| Method | Signature | Description |
|---|---|---|
| connect | connect() -> Client | Returns the underlying clickhouse_connect client, creating or reconnecting as needed |
| query | query(query: str, params: dict = None) | Execute a SELECT query, returns result set |
| command | command(query: str, params: dict = None) | Execute a DDL/DML command (CREATE, INSERT, etc.) |
| insert | insert(table: str, data, column_names=None) | Bulk insert data into a table |
| close | close() | Close the connection and release resources |

Auto-Reconnection

The connect() method creates a new underlying client when none exists yet or when the health check on the current one fails:

def connect(self):
    if self._client is None or not self._ping():
        self._client = clickhouse_connect.get_client(
            host=settings.CLICKHOUSE_HOST,
            port=settings.CLICKHOUSE_PORT,
            database=settings.CLICKHOUSE_DB,
            user=settings.CLICKHOUSE_USER,
            password=settings.CLICKHOUSE_PASSWORD,
            connect_timeout=10,
        )
    return self._client
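
The excerpt does not show `_ping()`. The sketch below shows one way the ping-then-reconnect pattern could behave, using stand-in classes so it runs without a ClickHouse server. `FakeConnection`, `ReconnectingClient`, and `connections_made` are hypothetical names for illustration, and treating any exception during the health check as a lost connection is an assumption, not confirmed behavior of ja4_common.

```python
class FakeConnection:
    """Stand-in for a clickhouse_connect client; not the real class."""

    def __init__(self) -> None:
        self.alive = True

    def ping(self) -> bool:
        return self.alive


class ReconnectingClient:
    """Illustrative wrapper that mirrors the connect() logic shown above."""

    def __init__(self, factory) -> None:
        self._factory = factory      # callable that builds a new connection
        self._client = None
        self.connections_made = 0    # counter added only for this demo

    def _ping(self) -> bool:
        # Assumption: a failed or raising health check means the connection is dead.
        try:
            return bool(self._client.ping())
        except Exception:
            return False

    def connect(self):
        if self._client is None or not self._ping():
            self._client = self._factory()
            self.connections_made += 1
        return self._client


wrapper = ReconnectingClient(FakeConnection)
first = wrapper.connect()        # no client yet, so one is created
same = wrapper.connect()         # healthy connection is reused
first.alive = False              # simulate a dropped connection
replacement = wrapper.connect()  # failed ping triggers a new connection
```

One consequence of this pattern is that every call to connect() pays for a health check, which trades a small per-call cost for not having to handle stale connections at each call site.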

Usage Example

from datetime import datetime

from ja4_common.clickhouse import get_client

client = get_client()

# SELECT query
result = client.query("SELECT count() FROM http_logs WHERE src_ip = {ip:String}", {"ip": "203.0.113.42"})
print(result.result_rows)

# INSERT
client.insert("audit_logs", [[datetime.now(), "analyst1", "investigate", "ip", "203.0.113.42"]], 
              column_names=["timestamp", "user_name", "action", "entity_type", "entity_id"])

# Command
client.command("OPTIMIZE TABLE http_logs FINAL")

get_client() Singleton

The get_client() function provides a module-level singleton ClickHouseClient:

from ja4_common.clickhouse import get_client

# First call creates the client
client1 = get_client()

# Subsequent calls return the same instance
client2 = get_client()
assert client1 is client2

Implementation

from typing import Optional

_client: Optional[ClickHouseClient] = None

def get_client() -> ClickHouseClient:
    global _client
    if _client is None:
        _client = ClickHouseClient()
    return _client

Using from a New Service

1. Add Dependency

In your service's requirements.txt:

ja4-common @ file:///app/shared/python/ja4_common

Or in pyproject.toml:

[project]
dependencies = [
    "ja4-common",
]

2. Docker Setup

# Copy shared library
COPY shared/python/ja4_common /app/shared/python/ja4_common
RUN pip install /app/shared/python/ja4_common

# Copy service code
COPY services/my-service /app/services/my-service

3. Use in Code

from ja4_common.clickhouse import get_client
from ja4_common.settings import settings

# Access settings
print(f"Connecting to {settings.CLICKHOUSE_HOST}:{settings.CLICKHOUSE_PORT}")

# Use client
db = get_client()
result = db.query("SELECT count() FROM ml_detected_anomalies")

4. Environment Configuration

Create a .env file or set environment variables:

CLICKHOUSE_HOST=clickhouse.example.com
CLICKHOUSE_PORT=8123
CLICKHOUSE_DB=ja4_processing
CLICKHOUSE_USER=data_writer
CLICKHOUSE_PASSWORD=secret

Testing: Mocking the Client

Using unittest.mock

from unittest.mock import MagicMock, patch
from ja4_common.clickhouse import ClickHouseClient

def test_my_service():
    mock_client = MagicMock(spec=ClickHouseClient)
    mock_client.query.return_value = MagicMock(result_rows=[(42,)])
    
    with patch("ja4_common.clickhouse._client", mock_client):
        from ja4_common.clickhouse import get_client
        client = get_client()
        result = client.query("SELECT count() FROM http_logs")
        assert result.result_rows == [(42,)]

Overriding Settings in Tests

from ja4_common.settings import ClickHouseSettings

# Create custom settings for tests
test_settings = ClickHouseSettings(
    CLICKHOUSE_HOST="localhost",
    CLICKHOUSE_PORT=8123,
    CLICKHOUSE_DB="test_db",
    CLICKHOUSE_USER="test_user",
    CLICKHOUSE_PASSWORD="test_pass",
)
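
Constructing a separate ClickHouseSettings instance does not change what code importing `settings` from ja4_common.settings sees, since that code reads the module-level object. One possible approach, sketched here with a stand-in dataclass so it runs without ja4_common installed, is to patch attributes on the shared object for the duration of a test. `StandInSettings` and `describe_target` are hypothetical; whether the real pydantic-settings model allows attribute patching depends on its configuration (for example, it would fail on a frozen model).

```python
from dataclasses import dataclass
from unittest.mock import patch


@dataclass
class StandInSettings:
    """Hypothetical stand-in for the shared settings object."""

    CLICKHOUSE_HOST: str = "clickhouse"
    CLICKHOUSE_DB: str = "ja4_processing"


# Plays the role of the module-level `settings` instance.
settings = StandInSettings()


def describe_target() -> str:
    """Example code under test that reads the shared settings object."""
    return f"{settings.CLICKHOUSE_HOST}/{settings.CLICKHOUSE_DB}"


# Override one field only inside the block; patch.object restores it on exit.
with patch.object(settings, "CLICKHOUSE_DB", "test_db"):
    inside = describe_target()

outside = describe_target()
```

Inside the `with` block the code under test sees `test_db`; afterwards the original value is restored, so tests do not leak configuration into each other.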

Source Files

| File | Description |
|---|---|
| ja4_common/settings.py | ClickHouseSettings pydantic-settings model |
| ja4_common/clickhouse.py | ClickHouseClient class and get_client() singleton |
| pyproject.toml | Package metadata and dependencies |