feat: ja4-platform monorepo — 5 services unified, tests & RPM builds standardized

Services:
- ja4sentinel: TLS/JA4 fingerprint capture daemon (Go, libpcap)
- logcorrelator: JA4 log correlation engine (Go, ClickHouse)
- mod_reqin_log: Apache module (C, JSON request logging)
- bot_detector: ML bot detection pipeline (Python)
- dashboard: FastAPI/Streamlit analytics UI (Python)

Shared libraries:
- shared/go/ja4common: logger, config, shutdown, ipfilter (Go module)
- shared/python/ja4_common: ClickHouseClient, ClickHouseSettings (Python package)
- shared/clickhouse/: canonical SQL migrations (10 files)

Build & packaging:
- Unified 3-stage Dockerfile.package for Go RPMs (el8/el9/el10)
- go.work workspace linking sentinel, correlator, ja4common
- Makefile with test-all, build-all, rpm-* targets

Fixes applied:
- go.work: 1.21 → 1.24.6 (required by sentinel)
- correlator Dockerfiles: golang:1.21 → golang:1.24
- replace directives in go.mod for ja4common local path
- pyproject.toml: setuptools.backends → setuptools.build_meta
- Removed static libpcap linking (unavailable on Rocky 9)
- Fixed data races in output/writers_test.go (sync.Mutex + atomic.Int32)
- Rewrote corrupted test files (logger_test.go × 2)

Test coverage:
- correlator: 67.1% total (unixsocket 80.5%, config 91.7%, app 83.3%, multi 87.7%, stdout 100%)
- sentinel: all 10 packages pass (api, capture, config, fingerprint, ipfilter, logging, output, tlsparse)

Documentation:
- README.md + docs/ (architecture, development, 5 services, shared libs, DB schema & migrations)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
toto
2026-04-07 16:42:59 +02:00
commit d469e39da7
278 changed files with 1621301 additions and 0 deletions


@@ -0,0 +1,322 @@
"""
Data models for the API
"""

from pydantic import BaseModel, Field, ConfigDict
from typing import Optional, List, Dict, Any
from datetime import datetime
from enum import Enum


class ThreatLevel(str, Enum):
    CRITICAL = "CRITICAL"
    HIGH = "HIGH"
    MEDIUM = "MEDIUM"
    LOW = "LOW"


# ─────────────────────────────────────────────────────────────────────────────
# METRICS
# ─────────────────────────────────────────────────────────────────────────────
class MetricsSummary(BaseModel):
    total_detections: int
    critical_count: int
    high_count: int
    medium_count: int
    low_count: int
    known_bots_count: int
    anomalies_count: int
    unique_ips: int


class TimeSeriesPoint(BaseModel):
    hour: datetime
    total: int
    critical: int
    high: int
    medium: int
    low: int


class MetricsResponse(BaseModel):
    summary: MetricsSummary
    timeseries: List[TimeSeriesPoint]
    threat_distribution: Dict[str, int]
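`TimeSeriesPoint` and `threat_distribution` imply that detections are bucketed per hour and counted by threat level. This file does not show where that aggregation happens, and it may well be done in ClickHouse. The following is a minimal in-memory sketch. It assumes each detection is available as a `(detected_at, threat_level)` pair and that levels use the `ThreatLevel` strings. The helper name `hourly_timeseries` is hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime
from typing import Iterable


def hourly_timeseries(
    detections: Iterable[tuple[datetime, str]],
) -> tuple[list[dict], dict[str, int]]:
    """Bucket (detected_at, threat_level) pairs by hour (hypothetical helper).

    Returns a list of dicts shaped like TimeSeriesPoint, plus a
    threat_distribution mapping shaped like MetricsResponse.threat_distribution.
    """
    buckets: dict[datetime, Counter] = defaultdict(Counter)
    distribution: Counter = Counter()
    for detected_at, level in detections:
        # Truncate the timestamp to the start of its hour.
        hour = detected_at.replace(minute=0, second=0, microsecond=0)
        buckets[hour][level] += 1
        distribution[level] += 1

    points = []
    for hour in sorted(buckets):
        counts = buckets[hour]
        # Missing levels read as 0 from a Counter.
        points.append({
            "hour": hour,
            "total": sum(counts.values()),
            "critical": counts["CRITICAL"],
            "high": counts["HIGH"],
            "medium": counts["MEDIUM"],
            "low": counts["LOW"],
        })
    return points, dict(distribution)
```

Each returned point dict could then be passed as keyword arguments to `TimeSeriesPoint`.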


# ─────────────────────────────────────────────────────────────────────────────
# DETECTIONS
# ─────────────────────────────────────────────────────────────────────────────
class Detection(BaseModel):
    detected_at: datetime
    src_ip: str
    ja4: str
    host: str
    bot_name: str
    anomaly_score: float
    threat_level: str
    model_name: str
    recurrence: int
    asn_number: str
    asn_org: str
    asn_detail: str
    asn_domain: str
    country_code: str
    asn_label: str
    hits: int
    hit_velocity: float
    fuzzing_index: float
    post_ratio: float
    reason: str
    client_headers: str = ""
    asn_score: Optional[float] = None
    asn_rep_label: str = ""
    first_seen: Optional[datetime] = None
    last_seen: Optional[datetime] = None
    unique_ja4s: Optional[List[str]] = None
    unique_hosts: Optional[List[str]] = None
    anubis_bot_name: str = ""
    anubis_bot_action: str = ""
    anubis_bot_category: str = ""


class DetectionsListResponse(BaseModel):
    items: List[Detection]
    total: int
    page: int
    page_size: int
    total_pages: int
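`DetectionsListResponse` carries `page`, `page_size`, and `total_pages`, but this file does not show how they are derived. The following is a minimal sketch that assumes 1-based page numbers and a row offset that a handler would feed into a LIMIT/OFFSET query. Both helper names, `paginate` and `detections_page`, are hypothetical.

```python
def paginate(total: int, page: int, page_size: int) -> tuple[int, int]:
    """Return (offset, total_pages) for a 1-based page number (hypothetical helper)."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must both be >= 1")
    # Integer ceiling division: a partial last page still counts as a page.
    total_pages = (total + page_size - 1) // page_size
    offset = (page - 1) * page_size
    return offset, total_pages


def detections_page(items: list, total: int, page: int, page_size: int) -> dict:
    """Build a dict shaped like DetectionsListResponse (hypothetical helper)."""
    _, total_pages = paginate(total, page, page_size)
    return {
        "items": items,
        "total": total,
        "page": page,
        "page_size": page_size,
        "total_pages": total_pages,
    }
```

Integer ceiling division avoids float rounding on large totals. It also yields `total_pages == 0` for an empty result set, which is itself an assumption about the intended behaviour.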


# ─────────────────────────────────────────────────────────────────────────────
# VARIABILITY
# ─────────────────────────────────────────────────────────────────────────────
class AttributeValue(BaseModel):
    value: str
    count: int
    percentage: float
    first_seen: Optional[datetime] = None
    last_seen: Optional[datetime] = None
    threat_levels: Optional[Dict[str, int]] = None
    unique_ips: Optional[int] = None
    primary_threat: Optional[str] = None


class VariabilityAttributes(BaseModel):
    user_agents: List[AttributeValue] = Field(default_factory=list)
    ja4: List[AttributeValue] = Field(default_factory=list)
    countries: List[AttributeValue] = Field(default_factory=list)
    asns: List[AttributeValue] = Field(default_factory=list)
    hosts: List[AttributeValue] = Field(default_factory=list)
    threat_levels: List[AttributeValue] = Field(default_factory=list)
    model_names: List[AttributeValue] = Field(default_factory=list)


class Insight(BaseModel):
    type: str  # "warning", "info", "success"
    message: str


class VariabilityResponse(BaseModel):
    type: str
    value: str
    total_detections: int
    unique_ips: int
    date_range: Dict[str, datetime]
    attributes: VariabilityAttributes
    insights: List[Insight] = Field(default_factory=list)
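Each `AttributeValue` pairs a `count` with a `percentage`. The model does not state the scale. The following is a minimal sketch that assumes a 0-100 scale relative to all observations, and shows how raw values might be turned into the top-N entries of a `VariabilityAttributes` list. The helper name `attribute_values` is hypothetical.

```python
from collections import Counter
from typing import Iterable


def attribute_values(values: Iterable[str], top_n: int = 10) -> list[dict]:
    """Count raw values and return the top_n as AttributeValue-shaped dicts.

    Hypothetical helper; assumes `percentage` is on a 0-100 scale.
    """
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return []
    # Percentages are computed against the full total, not just the top_n,
    # so a truncated list may sum to less than 100.
    return [
        {"value": value, "count": count, "percentage": round(100.0 * count / total, 2)}
        for value, count in counts.most_common(top_n)
    ]
```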


# ─────────────────────────────────────────────────────────────────────────────
# UNIQUE ATTRIBUTES
# ─────────────────────────────────────────────────────────────────────────────
class AttributeListItem(BaseModel):
    value: str
    count: int


class AttributeListResponse(BaseModel):
    type: str
    items: List[AttributeListItem]
    total: int


# ─────────────────────────────────────────────────────────────────────────────
# USER-AGENTS
# ─────────────────────────────────────────────────────────────────────────────
class UserAgentValue(BaseModel):
    value: str
    count: int
    percentage: float
    first_seen: Optional[datetime] = None
    last_seen: Optional[datetime] = None


class UserAgentsResponse(BaseModel):
    type: str
    value: str
    user_agents: List[UserAgentValue]
    total: int
    showing: int


# ─────────────────────────────────────────────────────────────────────────────
# CLASSIFICATIONS (SOC / ML)
# ─────────────────────────────────────────────────────────────────────────────
class ClassificationLabel(str, Enum):
    LEGITIMATE = "legitimate"
    SUSPICIOUS = "suspicious"
    MALICIOUS = "malicious"


class ClassificationBase(BaseModel):
    ip: Optional[str] = None
    ja4: Optional[str] = None
    label: ClassificationLabel
    tags: List[str] = Field(default_factory=list)
    comment: str = ""
    confidence: float = Field(ge=0.0, le=1.0, default=0.5)
    analyst: str = "unknown"


class ClassificationCreate(ClassificationBase):
    """Payload for creating a classification"""
    features: dict = Field(default_factory=dict)


class Classification(ClassificationBase):
    """Full classification with metadata"""
    model_config = ConfigDict(from_attributes=True)

    created_at: datetime
    features: dict = Field(default_factory=dict)


class ClassificationsListResponse(BaseModel):
    items: List[Classification]
    total: int


# ─────────────────────────────────────────────────────────────────────────────
# ANALYSIS (CORRELATION)
# ─────────────────────────────────────────────────────────────────────────────
class SubnetAnalysis(BaseModel):
    """Subnet/ASN analysis"""
    ip: str
    subnet: str
    ips_in_subnet: List[str]
    total_in_subnet: int
    asn_number: str
    asn_org: str
    total_in_asn: int
    alert: bool  # True if > 10 IPs in the subnet


class CountryData(BaseModel):
    """Per-country data"""
    code: str
    name: str
    count: int
    percentage: float


class CountryAnalysis(BaseModel):
    """Country analysis"""
    top_countries: List[CountryData]
    baseline: dict  # Usual countries
    alert_country: Optional[str] = None  # Over-represented country


class JA4SubnetData(BaseModel):
    """Subnet for a JA4"""
    subnet: str
    count: int


class JA4Analysis(BaseModel):
    """JA4 analysis"""
    ja4: str
    shared_ips_count: int
    top_subnets: List[JA4SubnetData]
    other_ja4_for_ip: List[str]


class UserAgentData(BaseModel):
    """Data for a User-Agent"""
    value: str
    count: int
    percentage: float
    classification: str  # "normal", "bot", "script"


class UserAgentAnalysis(BaseModel):
    """User-Agent analysis"""
    ip_user_agents: List[UserAgentData]
    ja4_user_agents: List[UserAgentData]
    bot_percentage: float
    alert: bool  # True if > 20% bots/scripts


class CorrelationIndicators(BaseModel):
    """Correlation indicators"""
    subnet_ips_count: int
    asn_ips_count: int
    country_percentage: float
    ja4_shared_ips: int
    user_agents_count: int
    bot_ua_percentage: float


class ClassificationRecommendation(BaseModel):
    """Classification recommendation"""
    label: ClassificationLabel
    confidence: float
    indicators: CorrelationIndicators
    suggested_tags: List[str]
    reason: str
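The comments on `SubnetAnalysis.alert` (more than 10 IPs in the subnet) and `UserAgentAnalysis.alert` (more than 20% bots/scripts) state the alert thresholds but not how they are computed. The following is a minimal sketch built on the standard `ipaddress` module. The subnet prefix lengths (/24 for IPv4, /64 for IPv6), the 0-100 percentage scale, and the rule that both `"bot"` and `"script"` count as flagged are assumptions. The helper names `subnet_analysis` and `bot_ua_analysis` are hypothetical.

```python
import ipaddress
from typing import Iterable

SUBNET_ALERT_THRESHOLD = 10   # SubnetAnalysis.alert: "True if > 10 IPs in the subnet"
BOT_UA_ALERT_PERCENT = 20.0   # UserAgentAnalysis.alert: "True if > 20% bots/scripts"


def subnet_analysis(ip: str, seen_ips: Iterable[str]) -> tuple[str, list[str], bool]:
    """Return (subnet, ips_in_subnet, alert) for `ip` (hypothetical helper).

    Assumes a /24 subnet for IPv4 and a /64 subnet for IPv6.
    """
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 64
    subnet = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    # `in` is simply False for addresses of the other IP version.
    members = sorted(
        {x for x in seen_ips if ipaddress.ip_address(x) in subnet},
        key=ipaddress.ip_address,  # numeric rather than lexicographic order
    )
    return str(subnet), members, len(members) > SUBNET_ALERT_THRESHOLD


def bot_ua_analysis(ua_counts: Iterable[tuple[str, int]]) -> tuple[float, bool]:
    """Return (bot_percentage, alert) from (classification, count) pairs.

    Classifications follow UserAgentData: "normal", "bot", "script".
    """
    rows = list(ua_counts)
    total = sum(count for _, count in rows)
    flagged = sum(count for cls, count in rows if cls in ("bot", "script"))
    percentage = 100.0 * flagged / total if total else 0.0
    return percentage, percentage > BOT_UA_ALERT_PERCENT
```

Both thresholds are strict comparisons, matching the "> 10" and "> 20%" wording in the model comments. Exactly 10 IPs, or exactly 20% bots, therefore does not raise an alert.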


# ─────────────────────────────────────────────────────────────────────────────
# ENTITIES (UNIFIED VIEW)
# ─────────────────────────────────────────────────────────────────────────────
class EntityStats(BaseModel):
    """Statistics for an entity"""
    entity_type: str
    entity_value: str
    total_requests: int
    unique_ips: int
    first_seen: datetime
    last_seen: datetime


class EntityRelatedAttributes(BaseModel):
    """Attributes associated with an entity"""
    ips: List[str] = Field(default_factory=list)
    ja4s: List[str] = Field(default_factory=list)
    hosts: List[str] = Field(default_factory=list)
    asns: List[str] = Field(default_factory=list)
    countries: List[str] = Field(default_factory=list)


class EntityAttributeValue(BaseModel):
    """Attribute value with count and percentage (for entities)"""
    value: str
    count: int
    percentage: float


class EntityInvestigation(BaseModel):
    """Full investigation for an entity"""
    stats: EntityStats
    related: EntityRelatedAttributes
    user_agents: List[EntityAttributeValue] = Field(default_factory=list)
    client_headers: List[EntityAttributeValue] = Field(default_factory=list)
    paths: List[EntityAttributeValue] = Field(default_factory=list)
    query_params: List[EntityAttributeValue] = Field(default_factory=list)