# python-ja4common

`ja4_common` is the shared Python library for the ja4-platform, providing a unified ClickHouse client singleton and configuration settings. It is used by [bot-detector](../services/bot-detector.md) and [dashboard](../services/dashboard.md).

**Package name**: `ja4-common`

**Python version**: ≥ 3.11

**Dependencies**:

- `clickhouse-connect >= 0.8.0`
- `pydantic-settings >= 2.1.0`

## ClickHouseSettings

Pydantic-settings model that reads configuration from environment variables and `.env` files.

### Fields

| Field | Type | Default | Env Variable | Description |
|-------|------|---------|-------------|-------------|
| `CLICKHOUSE_HOST` | str | `"clickhouse"` | `CLICKHOUSE_HOST` | ClickHouse server hostname |
| `CLICKHOUSE_PORT` | int | `8123` | `CLICKHOUSE_PORT` | ClickHouse HTTP API port |
| `CLICKHOUSE_DB` | str | `"mabase_prod"` | `CLICKHOUSE_DB` | Database name |
| `CLICKHOUSE_USER` | str | `"admin"` | `CLICKHOUSE_USER` | Username for authentication |
| `CLICKHOUSE_PASSWORD` | str | `""` | `CLICKHOUSE_PASSWORD` | Password for authentication |

### Configuration Sources

Settings are loaded in order of precedence:

1. **Environment variables** (highest priority)
2. **`.env` file** in the current working directory
3. **Default values** (lowest priority)

Environment variable names are **case-sensitive** (e.g., `CLICKHOUSE_HOST`, not `clickhouse_host`).

### Usage

```python
from ja4_common.settings import settings

print(settings.CLICKHOUSE_HOST)  # "clickhouse" or from env
print(settings.CLICKHOUSE_PORT)  # 8123 or from env
```

## ClickHouseClient

Wraps `clickhouse_connect` with auto-reconnection and a clean API.
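
As a rough illustration of this wrapper pattern, the sketch below shows how public methods might route through `connect()` so that a dropped connection is replaced transparently. It is not the actual `ja4_common` code: `ReconnectingClient`, `FakeBackend`, and the `factory` argument are hypothetical names, and the body of `_ping()` is an assumption, since this document references `_ping()` without showing it. The fake backend replaces a real `clickhouse_connect` client so the example runs without a server.

```python
from typing import Any, Callable, Optional


class FakeBackend:
    """Hypothetical stand-in for a clickhouse_connect client (illustration only)."""

    def __init__(self) -> None:
        self.alive = True

    def ping(self) -> bool:
        return self.alive

    def query(self, query: str, parameters: Optional[dict] = None) -> list:
        return [(1,)]


class ReconnectingClient:
    """Sketch of the wrapper pattern; not the actual ja4_common implementation."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory  # assumed hook that builds a fresh backend client
        self._client: Optional[Any] = None

    def _ping(self) -> bool:
        # Any error while pinging is treated as a lost connection.
        try:
            return bool(self._client.ping())
        except Exception:
            return False

    def connect(self) -> Any:
        # Create the backend on first use, or replace it when the ping fails.
        if self._client is None or not self._ping():
            self._client = self._factory()
        return self._client

    def query(self, query: str, params: Optional[dict] = None) -> Any:
        # Public methods go through connect(), so callers never hold a stale handle.
        return self.connect().query(query, parameters=params)

    def close(self) -> None:
        # Drop the backend; the next connect() call builds a new one.
        self._client = None


wrapper = ReconnectingClient(FakeBackend)
first = wrapper.connect()
first.alive = False          # simulate a dropped connection
second = wrapper.connect()   # a new backend is created transparently
print(second is not first)   # True
```

Routing every call through `connect()` keeps the reconnection logic in one place, at the cost of one ping per call in this simplified version.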
### Methods

| Method | Signature | Description |
|--------|-----------|-------------|
| `connect` | `connect() -> Client` | Return the underlying `clickhouse_connect` client, creating or reconnecting as needed |
| `query` | `query(query: str, params: dict = None)` | Execute a SELECT query and return the result set |
| `command` | `command(query: str, params: dict = None)` | Execute a DDL/DML command (CREATE, INSERT, etc.) |
| `insert` | `insert(table: str, data, column_names=None)` | Bulk insert data into a table |
| `close` | `close()` | Close the connection and release resources |

### Auto-Reconnection

The `connect()` method automatically reconnects if the current connection is lost:

```python
def connect(self):
    # Create a client on first use, or replace it when the ping fails
    if self._client is None or not self._ping():
        self._client = clickhouse_connect.get_client(
            host=settings.CLICKHOUSE_HOST,
            port=settings.CLICKHOUSE_PORT,
            database=settings.CLICKHOUSE_DB,
            user=settings.CLICKHOUSE_USER,
            password=settings.CLICKHOUSE_PASSWORD,
            connect_timeout=10,
        )
    return self._client
```

### Usage Example

```python
from datetime import datetime

from ja4_common.clickhouse import get_client

client = get_client()

# SELECT query
result = client.query(
    "SELECT count() FROM http_logs WHERE src_ip = {ip:String}",
    {"ip": "203.0.113.42"},
)
print(result.result_rows)

# INSERT
client.insert(
    "audit_logs",
    [[datetime.now(), "analyst1", "investigate", "ip", "203.0.113.42"]],
    column_names=["timestamp", "user_name", "action", "entity_type", "entity_id"],
)

# Command
client.command("OPTIMIZE TABLE http_logs FINAL")
```

## get_client() Singleton

The `get_client()` function provides a module-level singleton `ClickHouseClient`:

```python
from ja4_common.clickhouse import get_client

# First call creates the client
client1 = get_client()

# Subsequent calls return the same instance
client2 = get_client()
assert client1 is client2
```

### Implementation

```python
from typing import Optional

_client: Optional[ClickHouseClient] = None

def get_client() -> ClickHouseClient:
    global _client
    if _client is None:
        _client = ClickHouseClient()
    return _client
```

## Using from a New Service

### 1. Add Dependency

In your service's `requirements.txt`:

```
ja4-common @ file:///app/shared/python/ja4_common
```

Or in `pyproject.toml`:

```toml
[project]
dependencies = [
    "ja4-common",
]
```

### 2. Docker Setup

```dockerfile
# Copy shared library
COPY shared/python/ja4_common /app/shared/python/ja4_common
RUN pip install /app/shared/python/ja4_common

# Copy service code
COPY services/my-service /app/services/my-service
```

### 3. Use in Code

```python
from ja4_common.clickhouse import get_client
from ja4_common.settings import settings

# Access settings
print(f"Connecting to {settings.CLICKHOUSE_HOST}:{settings.CLICKHOUSE_PORT}")

# Use client
db = get_client()
result = db.query("SELECT count() FROM ml_detected_anomalies")
```

### 4. Environment Configuration

Create a `.env` file or set environment variables:

```bash
CLICKHOUSE_HOST=clickhouse.example.com
CLICKHOUSE_PORT=8123
CLICKHOUSE_DB=mabase_prod
CLICKHOUSE_USER=data_writer
CLICKHOUSE_PASSWORD=secret
```

## Testing: Mocking the Client

### Using unittest.mock

```python
from unittest.mock import MagicMock, patch

from ja4_common.clickhouse import ClickHouseClient

def test_my_service():
    mock_client = MagicMock(spec=ClickHouseClient)
    mock_client.query.return_value = MagicMock(result_rows=[(42,)])

    # Replace the module-level singleton so get_client() returns the mock
    with patch("ja4_common.clickhouse._client", mock_client):
        from ja4_common.clickhouse import get_client

        client = get_client()
        result = client.query("SELECT count() FROM http_logs")
        assert result.result_rows == [(42,)]
```

### Overriding Settings in Tests

```python
from ja4_common.settings import ClickHouseSettings

# Create custom settings for tests
test_settings = ClickHouseSettings(
    CLICKHOUSE_HOST="localhost",
    CLICKHOUSE_PORT=8123,
    CLICKHOUSE_DB="test_db",
    CLICKHOUSE_USER="test_user",
    CLICKHOUSE_PASSWORD="test_pass",
)
```

## Source Files

| File | Description |
|------|-------------|
| `ja4_common/settings.py` | `ClickHouseSettings` pydantic-settings model |
| `ja4_common/clickhouse.py` | `ClickHouseClient` class and `get_client()` singleton |
| `pyproject.toml` | Package metadata and dependencies |