Feat: HTTP threat detection via ClickHouse views + simplified shutdown

New detection views (sql/views.sql):
- Host identification by IP/JA4 (view_host_identification, view_host_ja4_anomalies)
- POST brute-force and variable query parameter detection
- Header fingerprinting (order, missing modern headers, Sec-CH-UA)
- ALPN mismatch detection (h2 declared but HTTP/1.1 spoken)
- Rate limiting & burst detection (50 req/min, 20 req/10s)
- Path enumeration/scanning (sensitive paths)
- Payload attacks (SQLi, XSS, path traversal)
- JA4 botnet detection (same fingerprint on 20+ IPs)
- Correlation quality (orphan ratio >80%)

ClickHouse (sql/init.sql):
- ZSTD(3) compression on text fields (path, query, headers, ja3/ja4)
- Automatic TTL: 1 day (raw) + 7 days (http_logs)
- Setting ttl_only_drop_parts = 1

Simplified shutdown (internal/app/orchestrator.go):
- Removed ShutdownTimeout and the flush/wait logic
- Stop() = cancel() + Close() only
- systemd TimeoutStopSec handles forced termination if needed

File output toggle (internal/config/*.go):
- Added an Enabled field to FileOutputConfig
- The file sink is only created if enabled && path != ''
- Tests: TestValidate_FileOutputDisabled, TestLoadConfig_FileOutputDisabled

RPM packaging (packaging/rpm/logcorrelator.spec):
- Changelog 1.1.18 → 1.1.22
- Removed logcorrelator-tmpfiles.conf (redundant with RuntimeDirectory=)

Cleanup:
- idees.txt → idees/ (directory)
- Removed 91.224.92.185.txt (sample logs)

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

91.224.92.185.txt (6235 changed lines)
File diff suppressed because it is too large

Makefile (2 changed lines)
@@ -20,7 +20,7 @@ BINARY_NAME=logcorrelator
 DIST_DIR=dist
 
 # Package version
-PKG_VERSION ?= 1.1.17
+PKG_VERSION ?= 1.1.22
 
 # Enable BuildKit for better performance
 export DOCKER_BUILDKIT=1

@@ -345,7 +345,8 @@ outputs:
 enabled: true
 description: >
 Local file sink. One JSON object per line. Rotation is handled by logrotate;
-the file is reopened on SIGHUP.
+the file is reopened on SIGHUP. Setting `enabled: false` disables
+file writing entirely (the sink is not created).
 path: /var/log/logcorrelator/correlated.log
 format: json_lines
 rotate_managed_by: external_logrotate

@@ -63,7 +63,7 @@ func main() {
 	// Create sinks
 	sinks := make([]ports.CorrelatedLogSink, 0)
 
-	if cfg.Outputs.File.Path != "" {
+	if cfg.Outputs.File.Enabled && cfg.Outputs.File.Path != "" {
 		fileSink, err := file.NewFileSink(file.Config{
 			Path: cfg.Outputs.File.Path,
 		})

@@ -90,4 +90,3 @@ metrics:
 # Endpoints:
 # GET /metrics - Returns correlation metrics as JSON
 # GET /health - Health check endpoint
-

idees/champs.md (111 changed lines)
Normal file
@@ -0,0 +1,111 @@
time
log_date

src_ip
- Source IP of the connection
src_port
- Source port of the connection
dst_ip
- Destination IP of the connection
dst_port
- Destination port of the connection

src_asn
- AS number of the source IP
src_country_code
- Country code of the source IP
src_as_name
- Name of the source IP's AS
src_org
- Organization of the source AS
src_domain
- Domain of the source IP's AS

method
- HTTP method [GET, POST, ...]
scheme
- Connection type [http, https]
host
- Hostname requested in the URL
path
- Path requested in the URL
query
- Query string requested in the URL
http_version
- HTTP protocol version used

orphan_side
- Indicates whether the HTTP log could be enriched with the ip_, tcp_, ja3_ and ja4_ information
- "A" means only the HTTP log is present, without enrichment
correlated
- Whether the correlation algorithm succeeded in matching the HTTP log with the TCP parameters (TCP + JA4/JA3)
keepalives
- Sequence number of the request within a keep-alive HTTP connection
a_timestamp
b_timestamp
conn_id

ip_meta_df
- IP Don't Fragment (DF) flag
ip_meta_id
- IP packet ID (Identification field)
ip_meta_total_length
- Total length of the IP packet (header + payload), from the IP header
ip_meta_ttl
- TTL of the IP packet as seen by the destination server

tcp_meta_options
- TCP options of the packet as seen by the destination server
tcp_meta_window_size
- TCP window size as seen by the destination server
tcp_meta_mss
- TCP MSS as seen by the destination server
tcp_meta_window_scale
- TCP window scale as seen by the destination server
syn_to_clienthello_ms
- Time in ms between the first SYN packet and the TLS ClientHello

tls_version
- TLS version negotiated with the destination server
tls_sni
- SNI: the domain name requested for the TLS certificate
tls_alpn
- ALPN announced during the TLS handshake
ja3
- Raw JA3 string: the ClientHello values (TLS version, cipher suites, extensions, elliptic curves, point formats) from which the JA3 hash is computed
ja3_hash
- JA3 hash
ja4
- JA4 fingerprint

client_headers
- Headers sent by the HTTP client, as a comma-separated list: Header,Header2,Header3,...

header_user_agent
- HTTP User-Agent header
header_accept
- HTTP Accept header
header_accept_encoding
- HTTP Accept-Encoding header
header_accept_language
- HTTP Accept-Language header
header_content_type
- HTTP Content-Type header
header_x_request_id
- HTTP X-Request-ID header
header_x_trace_id
- HTTP X-Trace-ID header
header_x_forwarded_for
- HTTP X-Forwarded-For header
header_sec_ch_ua
- HTTP Sec-CH-UA header
header_sec_ch_ua_mobile
- HTTP Sec-CH-UA-Mobile header
header_sec_ch_ua_platform
- HTTP Sec-CH-UA-Platform header
header_sec_fetch_dest
- HTTP Sec-Fetch-Dest header
header_sec_fetch_mode
- HTTP Sec-Fetch-Mode header
header_sec_fetch_site
- HTTP Sec-Fetch-Site header

idees/views.md (521 changed lines)
Normal file
@@ -0,0 +1,521 @@
# 🛡️ Technical Reference Manual: Antispam & Bot Detection Engine

This document details the detection algorithms implemented in the platform's ClickHouse views.

---

## 1. Transport Layer Analysis (L4): The "Physical Trace"

Before the URL is even examined, the engine inspects how the connection was established. This is the layer that is hardest for an attacker to falsify.

### A. TCP Stack Fingerprint (`tcp_fingerprint`)

* **How it works:** We use `cityHash64` to build an identifier from three handshake parameters set by the client's network stack: the **MSS** (Maximum Segment Size), the **Window Size** and the **Window Scale**.
* **What it detects:** The client's software stack. A bot running on an Alpine Linux image will typically have a different TCP signature from a user on iOS 17 or Windows 11.
* **Botnet detection:** If 500 different IPs share exactly the same `tcp_fingerprint` AND the same `ja4`, they are very likely a cluster of cloned bots, unless that pair matches a widespread browser/OS combination shared by many legitimate users.
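
The two checks above can be sketched in Go. This is a minimal illustration, not the ClickHouse view. `Conn`, `tcpFingerprint` and `suspectedClusters` are hypothetical names. FNV-1a from the standard library stands in for `cityHash64`, so the hash values will not match those produced by ClickHouse. To keep the sample data short, the sketch uses a threshold of 3 IPs instead of 500.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Conn holds the handshake fields used for fingerprinting (hypothetical struct).
type Conn struct {
	SrcIP       string
	JA4         string
	MSS         uint16
	WindowSize  uint16
	WindowScale uint8
}

// tcpFingerprint hashes MSS, window size and window scale.
// FNV-1a replaces ClickHouse's cityHash64, which the Go standard library lacks.
func tcpFingerprint(c Conn) uint64 {
	h := fnv.New64a()
	fmt.Fprintf(h, "%d|%d|%d", c.MSS, c.WindowSize, c.WindowScale)
	return h.Sum64()
}

// clusterKey is the (tcp_fingerprint, ja4) pair shared by cloned bots.
type clusterKey struct {
	TCPFP uint64
	JA4   string
}

// suspectedClusters returns every (tcp_fingerprint, ja4) pair seen from at
// least minIPs distinct source IPs, together with its distinct-IP count.
func suspectedClusters(conns []Conn, minIPs int) map[clusterKey]int {
	ipsByKey := make(map[clusterKey]map[string]bool)
	for _, c := range conns {
		k := clusterKey{tcpFingerprint(c), c.JA4}
		if ipsByKey[k] == nil {
			ipsByKey[k] = make(map[string]bool)
		}
		ipsByKey[k][c.SrcIP] = true
	}
	out := make(map[clusterKey]int)
	for k, ips := range ipsByKey {
		if len(ips) >= minIPs {
			out[k] = len(ips)
		}
	}
	return out
}

// sampleConns builds three IPs sharing one stack and JA4, plus one unrelated client (illustrative values).
func sampleConns() []Conn {
	var conns []Conn
	for i := 1; i <= 3; i++ {
		conns = append(conns, Conn{fmt.Sprintf("203.0.113.%d", i), "t13d_example_a", 1460, 64240, 7})
	}
	return append(conns, Conn{"198.51.100.9", "t13d_example_b", 1380, 65535, 6})
}

func main() {
	for k, n := range suspectedClusters(sampleConns(), 3) {
		fmt.Printf("ja4=%s tcp_fp=%016x distinct_ips=%d\n", k.JA4, k.TCPFP, n)
	}
}
```

Before applying this in practice, allow-list pairs that belong to common browsers. Otherwise every popular Chrome/Windows combination would be reported.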

### B. Jitter and Handshake Analysis

* **How it works:** We compute the population variance (`varPop`) of the delay between the `SYN` and the TLS `ClientHello`.
* **What it detects:** Robotic stability.
  * **Human:** Variable latency (4G, Wi-Fi, mobility). The variance is high.
  * **Datacenter bot:** Very stable latency (dedicated fiber). A variance close to 0 across connections points to automated traffic from cloud infrastructure.
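
The Go sketch below shows the jitter measure. It assumes the `syn_to_clienthello_ms` samples for one source have already been collected. `varPop` follows the definition of ClickHouse's `varPop`: population variance, dividing by n rather than n - 1. Because the inputs are in milliseconds, the result is in ms². The sample values are illustrative, not measurements.

```go
package main

import "fmt"

// varPop returns the population variance of xs (sum of squared deviations / n),
// the same definition as ClickHouse's varPop. It returns 0 for an empty slice.
func varPop(xs []float64) float64 {
	if len(xs) == 0 {
		return 0
	}
	var sum float64
	for _, x := range xs {
		sum += x
	}
	mean := sum / float64(len(xs))
	var sq float64
	for _, x := range xs {
		d := x - mean
		sq += d * d
	}
	return sq / float64(len(xs))
}

func main() {
	// syn_to_clienthello_ms samples (illustrative values).
	mobileUser := []float64{42, 95, 61, 130, 58}
	datacenterBot := []float64{12.1, 12.0, 12.2, 12.1, 12.0}
	fmt.Printf("mobile user:    varPop = %.2f ms²\n", varPop(mobileUser))
	fmt.Printf("datacenter bot: varPop = %.4f ms²\n", varPop(datacenterBot))
}
```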

---

## 2. Session Analysis (L5): The "TLS Passport"

The TLS handshake is a gold mine for identifying the client's TLS library (OpenSSL, Go-TLS, etc.).

### A. UA vs JA4 Inconsistency

* **How it works:** The engine cross-checks `header_user_agent` (declared by the client) against `ja4` (derived from the structure of the handshake).
* **What it detects:** **Browser spoofing**. A Python script can easily send `User-Agent: Mozilla/5.0...Chrome/120`. However, it cannot reproduce the exact set of TLS cipher suites and extensions of a real Chrome without significant engineering (for example with `utls`).
* **Scoring logic:** If UA = Chrome but JA4 != Signature_Chrome -> **+50 risk points**.
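
The scoring rule above can be sketched in Go. Everything here is hypothetical. `knownChromeJA4` stands in for whatever reference list of genuine Chrome fingerprints the engine maintains, and its entry is a placeholder, not a real JA4. The `Chrome/` substring test is deliberately naive: Chromium-based browsers such as Edge also include that token.

```go
package main

import (
	"fmt"
	"strings"
)

// knownChromeJA4 is a hypothetical allow-list of JA4 values seen from genuine Chrome builds.
// The entry is a placeholder, not a real fingerprint.
var knownChromeJA4 = map[string]bool{
	"t13d_chrome_placeholder": true,
}

// claimsChrome is a naive User-Agent check.
func claimsChrome(ua string) bool {
	return strings.Contains(ua, "Chrome/")
}

// uaJA4Risk returns +50 when the UA claims Chrome but the JA4 is not a known Chrome signature.
func uaJA4Risk(ua, ja4 string) int {
	if claimsChrome(ua) && !knownChromeJA4[ja4] {
		return 50
	}
	return 0
}

// chromeUA is an example Chrome 120 User-Agent string.
const chromeUA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"

func main() {
	fmt.Println(uaJA4Risk(chromeUA, "t13d_chrome_placeholder")) // UA and TLS agree
	fmt.Println(uaJA4Risk(chromeUA, "t13d_python_placeholder")) // UA claims Chrome, TLS does not
}
```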

### B. Host vs SNI Mismatch

* **How it works:** Compares the `tls_sni` field, sent in clear text during the handshake, with the `Host` header, sent later inside the encrypted request.
* **What it detects:** **Domain fronting** or tunneling attacks. A bot can request a certificate for `domaine-innocent.com` (SNI) while actually targeting `api-critique.com` (Host).
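
A Go sketch of the Host vs SNI comparison follows. It assumes both values are available for each request (`tls_sni` and the `host` field). The normalization steps are assumptions of this sketch: lowercasing, dropping a port and dropping a trailing dot. So is the choice to ignore an empty SNI, which clients omit when they connect by IP address.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// normalizeHost lowercases a host name and strips an optional port and trailing dot.
func normalizeHost(h string) string {
	if host, _, err := net.SplitHostPort(h); err == nil {
		h = host
	}
	return strings.TrimSuffix(strings.ToLower(h), ".")
}

// sniHostMismatch reports whether the TLS SNI and the HTTP Host header name different hosts.
// An empty SNI is not treated as a mismatch here.
func sniHostMismatch(sni, host string) bool {
	if sni == "" {
		return false
	}
	return normalizeHost(sni) != normalizeHost(host)
}

func main() {
	fmt.Println(sniHostMismatch("domaine-innocent.com", "api-critique.com"))
	fmt.Println(sniHostMismatch("www.example.com", "WWW.example.com:443"))
}
```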

---

## 3. Application Analysis (L7): "HTTP Behavior"

Once the tunnel is established, the engine analyzes the structure of the HTTP request.

### A. Header Order Fingerprint (`http_fp`)

* **How it works:** We hash the ordered list of header names (`Accept`, `User-Agent`, `Connection`, etc.).
* **What it detects:** The signature of the client's HTTP implementation. Each browser family (Firefox, Safari, Chromium) sends its headers in a stable, characteristic order.
* **Detection:** A client that sends headers in an unusual order, or sends only a minimal set (fewer than 6 headers), is flagged as suspicious.
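
A Go sketch of both checks follows. It assumes the header names arrive as the comma-separated `client_headers` string used elsewhere in this document. MD5 matches the `header_order_hash` definition given for `agg_header_fingerprint_1h`. The function names are hypothetical. `headerCount` returns 0 for an empty list, whereas a plain "commas + 1" count would return 1.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"strings"
)

// minHeaders is the "fewer than 6 headers" threshold.
const minHeaders = 6

// headerOrderHash returns the hex MD5 of the header list. Order matters, so
// "A,B" and "B,A" give different fingerprints.
func headerOrderHash(clientHeaders string) string {
	sum := md5.Sum([]byte(clientHeaders))
	return hex.EncodeToString(sum[:])
}

// headerCount counts the names in the comma-separated list (0 for an empty list).
func headerCount(clientHeaders string) int {
	if strings.TrimSpace(clientHeaders) == "" {
		return 0
	}
	return len(strings.Split(clientHeaders, ","))
}

// isMinimalist flags clients that send fewer than minHeaders headers.
func isMinimalist(clientHeaders string) bool {
	return headerCount(clientHeaders) < minHeaders
}

func main() {
	curlLike := "Host,User-Agent,Accept"
	fmt.Println(headerOrderHash(curlLike), headerCount(curlLike), isMinimalist(curlLike))
}
```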

### B. Payload and Entropy Analysis

* **How it works:** Regex pattern matching on `query` and `path` to detect SQLi, XSS and path traversal.
* **Complexity:** We also detect multiple encoding layers (e.g. `%2520`) that try to slip past simple firewalls.
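
A Go sketch of multiple-encoding detection follows. The approach is to percent-decode repeatedly until the string stops changing, then count the passes. A depth of 2 or more is a reasonable evasion signal; `%2520`, for example, decodes to `%20` and then to a space. The function name and the pass cap are assumptions of this sketch, not taken from the views.

```go
package main

import (
	"fmt"
	"net/url"
)

// encodingDepth percent-decodes s repeatedly until it stops changing (or fails to
// decode) and returns the final string and the number of passes that changed it.
func encodingDepth(s string, maxPasses int) (decoded string, depth int) {
	decoded = s
	for depth < maxPasses {
		next, err := url.PathUnescape(decoded)
		if err != nil || next == decoded {
			break
		}
		decoded = next
		depth++
	}
	return decoded, depth
}

func main() {
	for _, p := range []string{"/img/logo.png", "/a%20b", "/..%252f..%252fetc/passwd"} {
		d, n := encodingDepth(p, 5)
		fmt.Printf("%-28s depth=%d decoded=%s\n", p, n, d)
	}
}
```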

---

## 4. Temporal Correlation & Baseline: The "Statistical Neighborhood"

The final score depends on the history of the TLS signature.

### A. Novelty Penalty (`agg_novelty`)

* **Logic:** A signature (JA4 + FP) seen for the first time today is "cold".
* **Handling:** A penalty applies if `first_seen` is less than 2 hours old. A botnet that has just started a signature-rotation campaign is therefore penalized immediately for its lack of history.
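
A Go sketch of the novelty rule follows. It assumes `first_seen` is available for each (JA4 + FP) signature. This document does not give the penalty amount, so it is passed in as a parameter; the `20` in `main` is arbitrary. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// noveltyWindow is the 2-hour "cold signature" window described above.
const noveltyWindow = 2 * time.Hour

// isCold reports whether a signature first seen `age` ago is still inside the novelty window.
func isCold(age time.Duration) bool {
	return age < noveltyWindow
}

// noveltyPenalty returns penalty for a cold signature and 0 otherwise.
func noveltyPenalty(firstSeen, now time.Time, penalty int) int {
	if isCold(now.Sub(firstSeen)) {
		return penalty
	}
	return 0
}

func main() {
	now := time.Now()
	fmt.Println(noveltyPenalty(now.Add(-30*time.Minute), now, 20)) // first seen 30 min ago
	fmt.Println(noveltyPenalty(now.Add(-72*time.Hour), now, 20))   // first seen 3 days ago
}
```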

### B. Baseline Exceedance (`tbl_baseline_ja4_7d`)

* **How it works:** Current `hits` are compared with the historical 99th percentile (`p99`) of that specific signature.
* **Example:** Suppose the "Chrome 122" JA4 usually makes 10 requests/min/IP on your site and one IP suddenly makes 300. The score spikes even if each request is technically flawless.
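
A Go sketch of the baseline comparison follows. It assumes the signature's per-IP request rates over the last 7 days are available. The sketch uses the nearest-rank percentile. ClickHouse's `quantile` functions are approximate by default, so the p99 they return can differ slightly. Names and sample values are hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentileNearestRank returns the p-th percentile (0 < p <= 100) of samples
// using the nearest-rank method. It returns 0 for an empty slice.
func percentileNearestRank(samples []float64, p float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	s := append([]float64(nil), samples...) // copy so the caller's slice is not reordered
	sort.Float64s(s)
	rank := int(math.Ceil(p * float64(len(s)) / 100))
	if rank < 1 {
		rank = 1
	}
	return s[rank-1]
}

// exceedsBaseline reports whether the current rate is above the signature's historical p99.
func exceedsBaseline(history []float64, current float64) (bool, float64) {
	p99 := percentileNearestRank(history, 99)
	return current > p99, p99
}

func main() {
	// Illustrative history: requests/min/IP samples between 5 and 12.
	history := make([]float64, 100)
	for i := range history {
		history[i] = float64(5 + i%8)
	}
	over, p99 := exceedsBaseline(history, 300)
	fmt.Printf("p99 = %.0f req/min, current = 300, exceeds = %v\n", p99, over)
}
```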

---

## 5. Scoring Summary (The Verdict)

| Algorithm | Signal | Score Impact |
| :--- | :--- | :--- |
| **Fingerprint Mismatch** | UA vs TLS (spoofing) | **High (50)** |
| **L4 Anomaly** | SYN→ClientHello latency variance < 0.5 ms² | **Medium (30)** |
| **Path Sensitivity** | Hit on `/admin` or `/config` | **High (40)** |
| **Payload Security** | Injection characters (SQL/XSS) | **Critical (60)** |
| **Mass Distribution** | 1 JA4 on > 50 different IPs | **Medium (30)** |
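
The table does not say how the impacts combine. The Go sketch below assumes they are simply added, as the "+50 risk points" wording in section 2 suggests. The `Signals` struct and the `riskScore` function are hypothetical.

```go
package main

import "fmt"

// Signals records which rows of the scoring table fired for a client (hypothetical struct).
type Signals struct {
	FingerprintMismatch bool // UA vs TLS spoofing
	L4Anomaly           bool // near-constant SYN→ClientHello latency
	SensitivePath       bool // hit on /admin or /config
	PayloadInjection    bool // SQL/XSS injection characters
	MassDistribution    bool // one JA4 on more than 50 IPs
}

// riskScore sums the impact of every signal that fired, using the table's weights.
func riskScore(s Signals) int {
	score := 0
	if s.FingerprintMismatch {
		score += 50
	}
	if s.L4Anomaly {
		score += 30
	}
	if s.SensitivePath {
		score += 40
	}
	if s.PayloadInjection {
		score += 60
	}
	if s.MassDistribution {
		score += 30
	}
	return score
}

func main() {
	fmt.Println(riskScore(Signals{FingerprintMismatch: true, PayloadInjection: true}))
}
```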

---

## 6. Host Identification by IP and JA4 (sql/hosts.sql)

This section describes the aggregation and detection views that identify which hosts are associated with which signatures (IP + JA4).

### A. Base Aggregates

| Table | Granularity | Description |
|-------|-------------|-------------|
| `agg_host_ip_ja4_1h` | hour | Hits, unique paths, query params and methods per (IP, JA4, host) |
| `agg_host_ip_ja4_24h` | day | Daily rollup for long-term history |

### B. Identification Views

**`view_host_identification`** - Top hosts per signature
```sql
-- Which host is associated with this IP/JA4?
SELECT src_ip, ja4, host, total_hits, unique_paths, user_agent
FROM mabase_prod.view_host_identification
WHERE src_ip = '1.2.3.4'
ORDER BY total_hits DESC;
```

**`view_host_ja4_anomalies`** - JA4 shared by several hosts (botnet)
```sql
-- Is this JA4 used on several different hosts?
-- (WHERE, not HAVING: ClickHouse only accepts HAVING when the query aggregates)
SELECT ja4, hosts, unique_hosts, unique_ips
FROM mabase_prod.view_host_ja4_anomalies
WHERE unique_hosts >= 3;
-- Interpretation: 1 JA4 on 3+ hosts = probable cloned botnet
```

**`view_host_ip_ja4_rotation`** - IP rotating through fingerprints
```sql
-- Does this IP change JA4 frequently?
SELECT src_ip, ja4s, unique_ja4s
FROM mabase_prod.view_host_ip_ja4_rotation
WHERE unique_ja4s >= 5;
-- Interpretation: 1 IP with 5+ different JA4s = fingerprint spoofing
```

---

## 7. Brute-Force Detection (sql/hosts.sql)

### A. POST Brute Force (sensitive endpoints)

**Table:** `agg_bruteforce_post_5m` - 5-minute windows

**View:** `view_bruteforce_post_detected`
```sql
-- Detect brute-force attempts on login endpoints
SELECT window, src_ip, ja4, host, path, attempts, attempts_per_minute
FROM mabase_prod.view_bruteforce_post_detected
WHERE host = 'api.example.com'
ORDER BY attempts DESC;

-- Threshold: ≥10 POSTs in 5 minutes on sensitive endpoints
-- Targeted endpoints: login, auth, signin, password, admin, wp-login, etc.
```
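
To reproduce the threshold outside ClickHouse, here is a Go sketch of the same logic. It counts POST requests on sensitive paths per (5-minute window, IP, host, path) and keeps the windows with at least 10 attempts. The `Request` type, the sample data and the substring match on endpoint keywords are assumptions; a substring match will also catch paths such as `/author`. `Truncate` on UTC timestamps plays the role of a 5-minute bucketing function such as ClickHouse's `toStartOfInterval`.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// Request is a minimal, hypothetical view of a correlated log line.
type Request struct {
	Time   time.Time
	SrcIP  string
	Host   string
	Path   string
	Method string
}

// sensitiveKeywords is a subset of the endpoint list in the query comment.
var sensitiveKeywords = []string{"login", "auth", "signin", "password", "admin", "wp-login"}

func isSensitiveEndpoint(path string) bool {
	p := strings.ToLower(path)
	for _, kw := range sensitiveKeywords {
		if strings.Contains(p, kw) {
			return true
		}
	}
	return false
}

// bfKey identifies one 5-minute window for one (IP, host, path).
type bfKey struct {
	Window time.Time
	SrcIP  string
	Host   string
	Path   string
}

// bruteForcePOST counts POSTs on sensitive endpoints per 5-minute window and
// keeps only the windows with at least minAttempts attempts.
func bruteForcePOST(reqs []Request, minAttempts int) map[bfKey]int {
	counts := make(map[bfKey]int)
	for _, r := range reqs {
		if r.Method != "POST" || !isSensitiveEndpoint(r.Path) {
			continue
		}
		counts[bfKey{r.Time.UTC().Truncate(5 * time.Minute), r.SrcIP, r.Host, r.Path}]++
	}
	for k, n := range counts {
		if n < minAttempts {
			delete(counts, k) // deleting during range is safe in Go
		}
	}
	return counts
}

// sampleRequests returns 12 login POSTs from one IP within two minutes, plus one GET.
func sampleRequests() []Request {
	base := time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC)
	var reqs []Request
	for i := 0; i < 12; i++ {
		reqs = append(reqs, Request{base.Add(time.Duration(i) * 10 * time.Second), "203.0.113.7", "api.example.com", "/api/login", "POST"})
	}
	return append(reqs, Request{base, "198.51.100.2", "api.example.com", "/", "GET"})
}

func main() {
	for k, n := range bruteForcePOST(sampleRequests(), 10) {
		fmt.Printf("%s %s %s%s attempts=%d\n", k.Window.Format(time.RFC3339), k.SrcIP, k.Host, k.Path, n)
	}
}
```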

### B. Form Brute Force (variable query parameters)

**Table:** `agg_form_bruteforce_5m`

**View:** `view_form_bruteforce_detected`
```sql
-- Detect requests whose query parameters vary heavily
SELECT window, src_ip, ja4, host, path, requests, unique_query_patterns
FROM mabase_prod.view_form_bruteforce_detected
WHERE requests >= 20 AND unique_query_patterns >= 10;

-- Interpretation: 20+ requests with 10+ different query patterns
-- = fuzzing or parameter brute-force attempt
```

---

## 8. Header Fingerprinting (sql/hosts.sql)

The `client_headers` field contains the comma-separated list of header names present in the request.

Example: `"Accept,Accept-Encoding,Sec-CH-UA,Sec-Fetch-Dest,User-Agent"`

### A. Header Order Signature

**Table:** `agg_header_fingerprint_1h`

| Field | Description |
|-------|-------------|
| `header_count` | Total number of headers (commas + 1) |
| `has_*` | Flags for each modern header (Sec-CH-UA, Sec-Fetch-*, etc.) |
| `header_order_hash` | MD5(client_headers) = signature of the header order |
| `modern_browser_score` | 0-100 score based on the modern headers present |
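
A Go sketch of the `has_*` flags and of one possible `modern_browser_score` follows. The real scoring formula is not given here. The sketch therefore assumes equal weights over the six client-hint and fetch-metadata headers recorded in the logs (`Sec-CH-UA*`, `Sec-Fetch-*`). Header names are compared case-insensitively, as HTTP requires. Function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// modernHeaders lists the client-hint and fetch-metadata headers recorded in the logs.
var modernHeaders = []string{
	"Sec-CH-UA", "Sec-CH-UA-Mobile", "Sec-CH-UA-Platform",
	"Sec-Fetch-Dest", "Sec-Fetch-Mode", "Sec-Fetch-Site",
}

// modernFlags returns one has_* style flag per modern header, matching names case-insensitively.
func modernFlags(clientHeaders string) map[string]bool {
	present := make(map[string]bool)
	for _, h := range strings.Split(clientHeaders, ",") {
		present[strings.ToLower(strings.TrimSpace(h))] = true
	}
	flags := make(map[string]bool, len(modernHeaders))
	for _, m := range modernHeaders {
		flags[m] = present[strings.ToLower(m)]
	}
	return flags
}

// modernBrowserScore is a hypothetical equal-weight score: 100 * present / total (integer).
func modernBrowserScore(clientHeaders string) int {
	n := 0
	for _, ok := range modernFlags(clientHeaders) {
		if ok {
			n++
		}
	}
	return 100 * n / len(modernHeaders)
}

func main() {
	browserLike := "Sec-CH-UA,Sec-CH-UA-Mobile,Sec-CH-UA-Platform,User-Agent,Accept,Sec-Fetch-Site,Sec-Fetch-Mode,Sec-Fetch-Dest,Accept-Encoding" // illustrative
	curlLike := "Host,User-Agent,Accept"
	fmt.Println(modernBrowserScore(browserLike), modernBrowserScore(curlLike))
}
```

Under this weighting, the example value above (`Accept,Accept-Encoding,Sec-CH-UA,Sec-Fetch-Dest,User-Agent`) scores 33, below the threshold of 70 used by `view_header_missing_modern_headers`.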

### B. Detection Views

**`view_header_missing_modern_headers`** - Missing modern headers
```sql
-- "Modern" browsers with missing headers
SELECT src_ip, ja4, header_user_agent, modern_browser_score, header_count
FROM mabase_prod.view_header_missing_modern_headers
WHERE header_user_agent ILIKE '%Chrome%';

-- Threshold: score < 70 for Chrome/Firefox = suspicious
-- A real Chrome sends Sec-CH-UA, Sec-Fetch-*, etc. automatically (over HTTPS)
```

**`view_header_ua_order_mismatch`** - Spoofing detected
```sql
-- Same User-Agent with different header orders
SELECT header_user_agent, ja4, unique_hashes, unique_ips
FROM mabase_prod.view_header_ua_order_mismatch
WHERE unique_hashes > 1;

-- Interpretation: 1 UA with 2+ header orders = spoofing or custom tool
```

**`view_header_minimalist_count`** - Minimalist bot
```sql
-- Clients sending too few headers
SELECT src_ip, ja4, header_count, header_user_agent
FROM mabase_prod.view_header_minimalist_count
WHERE header_count < 6;

-- Threshold: < 6 headers = scripted bot (curl, Python requests, etc.)
```

**`view_header_sec_ch_missing`** - Chrome inconsistency
```sql
-- Chrome without Sec-CH-UA (not expected from a recent Chrome over HTTPS)
SELECT src_ip, ja4, header_user_agent
FROM mabase_prod.view_header_sec_ch_missing
WHERE header_user_agent ILIKE '%Chrome/%';
```

**`view_header_known_bot_signature`** - Botnet signature
```sql
-- Same header order on 10+ different IPs
SELECT header_order_hash, header_user_agent, unique_ips, total_hits
FROM mabase_prod.view_header_known_bot_signature
WHERE unique_ips >= 10;

-- Interpretation: 1 signature on 10+ IPs = cluster of cloned bots
```

---

## 9. ALPN Mismatch Detection (sql/hosts.sql)

### Principle

ALPN (Application-Layer Protocol Negotiation) is a TLS extension that negotiates the HTTP protocol **before** the request is sent.

| Declared ALPN | Actual HTTP | Interpretation |
|---------------|-------------|----------------|
| `h2` | `HTTP/2` | ✅ Normal |
| `h2` | `HTTP/1.1` | ❌ Misconfigured bot |
| `http/1.1` | `HTTP/1.1` | ✅ Normal |

### Detection View

**`view_alpn_mismatch_detected`**
```sql
-- Clients declaring h2 but speaking HTTP/1.1
SELECT src_ip, ja4, declared_alpn, actual_http_version, mismatches, mismatch_pct
FROM mabase_prod.view_alpn_mismatch_detected
WHERE mismatch_pct >= 80;

-- Threshold: ≥5 requests with ≥80% mismatch
-- Cause: misconfigured curl, Python requests, bots spoofing ALPN
```
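
A Go sketch of the mismatch percentage and of the two thresholds follows: at least 5 requests, and at least 80% mismatches. The `Obs` type and the function names are hypothetical. One caveat when tuning the rule: ALPN is negotiated, so a client that offers `h2` legitimately falls back to HTTP/1.1 when the server selects `http/1.1`. The rule is therefore only meaningful where the server accepts HTTP/2.

```go
package main

import "fmt"

// Obs is one request: the ALPN value logged from the TLS handshake and the HTTP version actually used.
type Obs struct {
	ALPN        string // e.g. "h2", "http/1.1"
	HTTPVersion string // e.g. "HTTP/2", "HTTP/1.1"
}

// isMismatch flags the case from the table: h2 declared, HTTP/1.1 spoken.
func isMismatch(o Obs) bool {
	return o.ALPN == "h2" && o.HTTPVersion == "HTTP/1.1"
}

// alpnSuspect returns whether the client reaches both thresholds, plus its mismatch_pct.
func alpnSuspect(obs []Obs, minReq int, minPct float64) (bool, float64) {
	if len(obs) == 0 {
		return false, 0
	}
	mismatches := 0
	for _, o := range obs {
		if isMismatch(o) {
			mismatches++
		}
	}
	pct := 100 * float64(mismatches) / float64(len(obs))
	return len(obs) >= minReq && pct >= minPct, pct
}

// sampleObs returns four mismatching requests followed by one consistent HTTP/2 request.
func sampleObs() []Obs {
	return []Obs{
		{"h2", "HTTP/1.1"}, {"h2", "HTTP/1.1"}, {"h2", "HTTP/1.1"}, {"h2", "HTTP/1.1"},
		{"h2", "HTTP/2"},
	}
}

func main() {
	ok, pct := alpnSuspect(sampleObs(), 5, 80)
	fmt.Printf("suspect=%v mismatch_pct=%.1f\n", ok, pct)
}
```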

---

## 10. Rate Limiting & Burst Detection (sql/hosts.sql)

### A. Rate Limiting (1 minute)

**Table:** `agg_rate_limit_1m`

**View:** `view_rate_limit_exceeded`
```sql
-- IPs exceeding 50 requests/minute
SELECT minute, src_ip, ja4, requests_per_min, unique_paths
FROM mabase_prod.view_rate_limit_exceeded
ORDER BY requests_per_min DESC;

-- Threshold: > 50 req/min = automated traffic
-- A human user rarely sustains 50+ req/min consistently
```

### B. Burst Detection (10 seconds)

**Table:** `agg_burst_10s`

**View:** `view_burst_detected`
```sql
-- Sudden traffic spikes
SELECT window, src_ip, ja4, burst_count
FROM mabase_prod.view_burst_detected
WHERE burst_count > 20;

-- Threshold: > 20 requests in 10 seconds = suspicious burst
-- Useful for detecting attacks that come in waves
```
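
Both thresholds reduce to counting requests per source IP in tumbling windows. The Go sketch below assumes windows aligned on UTC clock boundaries, like ClickHouse's `toStartOfMinute` or `toStartOfInterval`; the exact bucketing used by the views is not shown here. `Hit`, `overThreshold` and the sample traffic are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// Hit is one request from a source IP (hypothetical minimal record).
type Hit struct {
	T  time.Time
	IP string
}

const (
	rateWindow  = time.Minute      // as in agg_rate_limit_1m
	burstWindow = 10 * time.Second // as in agg_burst_10s
)

// winKey identifies one tumbling window for one source IP.
type winKey struct {
	Start time.Time
	SrcIP string
}

// overThreshold counts hits per (window, IP) and keeps the windows whose count is strictly above limit.
func overThreshold(hits []Hit, win time.Duration, limit int) map[winKey]int {
	counts := make(map[winKey]int)
	for _, h := range hits {
		counts[winKey{h.T.UTC().Truncate(win), h.IP}]++
	}
	for k, n := range counts {
		if n <= limit {
			delete(counts, k)
		}
	}
	return counts
}

// sampleHits returns a steady 1 req/s client over one minute and a 25-request burst within 2.5 s.
func sampleHits() []Hit {
	base := time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC)
	var hits []Hit
	for i := 0; i < 60; i++ {
		hits = append(hits, Hit{base.Add(time.Duration(i) * time.Second), "198.51.100.7"})
	}
	for i := 0; i < 25; i++ {
		hits = append(hits, Hit{base.Add(30*time.Second + time.Duration(i)*100*time.Millisecond), "203.0.113.5"})
	}
	return hits
}

func main() {
	for k, n := range overThreshold(sampleHits(), rateWindow, 50) {
		fmt.Printf("rate limit: %s %s %d req/min\n", k.Start.Format(time.RFC3339), k.SrcIP, n)
	}
	for k, n := range overThreshold(sampleHits(), burstWindow, 20) {
		fmt.Printf("burst: %s %s %d req/10s\n", k.Start.Format(time.RFC3339), k.SrcIP, n)
	}
}
```

In the sample, the steady client trips only the per-minute limit (60 > 50, but 10 per 10 s). The short burst trips only the 10-second rule (25 > 20, but 25 per minute). This is why the two windows complement each other.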

---

## 11. Path Enumeration / Scanning (sql/hosts.sql)

### Detection View

**`view_path_scan_detected`**
```sql
-- Detection of sensitive-path scanning
SELECT window, src_ip, ja4, host, sensitive_hits, sensitive_ratio
FROM mabase_prod.view_path_scan_detected
WHERE sensitive_hits >= 5;

-- Monitored paths: admin, backup, config, .env, .git, wp-admin,
-- phpinfo, test, debug, log, sql, dump, passwd, shadow, htaccess, etc.

-- Threshold: ≥5 sensitive paths in 5 minutes = scanning
```

### Sample Result

| src_ip | ja4 | host | sensitive_hits | sensitive_ratio |
|--------|-----|------|----------------|-----------------|
| 1.2.3.4 | t13d... | api.example.com | 47 | 94.00 |
| 5.6.7.8 | t13d... | www.example.com | 12 | 80.00 |

**Interpretation:** These IPs systematically probe sensitive paths, which is typical of tools such as Nikto, Dirb or Gobuster.
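
A Go sketch of the scan metrics follows: `sensitive_hits`, and `sensitive_ratio` as a percentage. How the view matches paths is not shown, so the matching here is an assumption of the sketch. It compares whole path segments against the keywords listed above, after stripping a leading dot and any file extension. As a result, `/.env` and `/phpinfo.php` match, while `/blog` does not count as `log`. Names and sample paths are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// sensitiveSegments mirrors the monitored keywords listed in the query comment.
var sensitiveSegments = map[string]bool{
	"admin": true, "backup": true, "config": true, "env": true, "git": true,
	"wp-admin": true, "phpinfo": true, "test": true, "debug": true, "log": true,
	"sql": true, "dump": true, "passwd": true, "shadow": true, "htaccess": true,
}

// isSensitivePath reports whether any path segment, normalized as ".env" -> "env"
// and "phpinfo.php" -> "phpinfo", is a monitored keyword.
func isSensitivePath(path string) bool {
	for _, seg := range strings.Split(strings.ToLower(path), "/") {
		seg = strings.TrimPrefix(seg, ".")
		if i := strings.Index(seg, "."); i > 0 {
			seg = seg[:i]
		}
		if sensitiveSegments[seg] {
			return true
		}
	}
	return false
}

// scanStats returns the number of sensitive paths and their share of all paths, in percent.
func scanStats(paths []string) (hits int, ratio float64) {
	if len(paths) == 0 {
		return 0, 0
	}
	for _, p := range paths {
		if isSensitivePath(p) {
			hits++
		}
	}
	return hits, 100 * float64(hits) / float64(len(paths))
}

func main() {
	paths := []string{"/.env", "/.git/config", "/wp-admin/", "/phpinfo.php", "/backup/dump.sql", "/index.html"}
	hits, ratio := scanStats(paths)
	fmt.Printf("sensitive_hits=%d sensitive_ratio=%.2f scanning=%v\n", hits, ratio, hits >= 5)
}
```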

---

## 12. Payload Attack Detection (sql/hosts.sql)

### A. Detected Attack Types

| Type | Detected Patterns |
|------|-------------------|
| **SQL Injection** | `UNION SELECT`, `OR 1=1`, `DROP TABLE`, `; --`, `/* */`, `WAITFOR DELAY`, `SLEEP()` |
| **XSS** | `<script>`, `javascript:`, `onerror=`, `onload=`, `<img src=data:`, `<svg onload>` |
| **Path Traversal** | `../`, `..\\`, `%2e%2e%2f`, `%252e%252e`, `%%32%65%%32%65` |

### B. Detection View

**`view_payload_attacks_detected`**
```sql
-- All injection attempts
SELECT window, src_ip, ja4, host, path,
       sqli_attempts, xss_attempts, traversal_attempts
FROM mabase_prod.view_payload_attacks_detected
ORDER BY sqli_attempts DESC, xss_attempts DESC, traversal_attempts DESC;

-- Threshold: ≥1 attempt = alert (zero tolerance)
```
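
A Go sketch of the pattern matching in the table follows, using the standard `regexp` package (RE2 syntax). The expressions are illustrative approximations of the listed patterns, not the ones used by the view. They are applied to the raw path and query and to one percent-decoded copy of each, so that `%27+UNION+SELECT` is caught. As with any regex rule under a zero-tolerance threshold, expect some false positives, for example a search for "drop table".

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

var (
	sqliRe      = regexp.MustCompile(`(?i)union\s+select|\bor\s+1\s*=\s*1\b|drop\s+table|;\s*--|/\*.*?\*/|waitfor\s+delay|sleep\s*\(`)
	xssRe       = regexp.MustCompile(`(?i)<script|javascript:|on(error|load)\s*=|<img[^>]*src\s*=\s*["']?data:|<svg[^>]*onload`)
	traversalRe = regexp.MustCompile(`(?i)\.\./|\.\.\\|%2e%2e%2f|%252e%252e`)
)

// classify reports which attack families match the path or query, raw or decoded once.
func classify(path, rawQuery string) (sqli, xss, traversal bool) {
	candidates := []string{path, rawQuery}
	if q, err := url.QueryUnescape(rawQuery); err == nil {
		candidates = append(candidates, q) // '+' becomes a space here
	}
	if p, err := url.PathUnescape(path); err == nil {
		candidates = append(candidates, p)
	}
	for _, s := range candidates {
		sqli = sqli || sqliRe.MatchString(s)
		xss = xss || xssRe.MatchString(s)
		traversal = traversal || traversalRe.MatchString(s)
	}
	return sqli, xss, traversal
}

func main() {
	s, x, t := classify("/search", "q=1%27+UNION+SELECT+password+FROM+users")
	fmt.Printf("sqli=%v xss=%v traversal=%v\n", s, x, t)
}
```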

---

## 13. JA4 Botnet Detection (sql/hosts.sql)

### Principle

A browser's TLS fingerprint reflects its version and configuration, so every user of the same browser build shares it. A bot deployed on 100 machines will likewise present the **same JA4** on all of them. A JA4 spread across many IPs is therefore a strong signal mainly when it does not match a common browser (see the false positives in section 15).

### Detection View

**`view_ja4_botnet_suspected`**
```sql
-- JA4 shared by 20+ different IPs
SELECT ja4, ja3_hash, unique_ips, unique_asns, unique_countries, total_hits
FROM mabase_prod.view_ja4_botnet_suspected
WHERE unique_ips >= 20;

-- Threshold: ≥20 IPs with the same JA4 = cloned botnet
```

### Sample Result

| ja4 | ja3_hash | unique_ips | unique_asns | unique_countries |
|-----|----------|------------|-------------|------------------|
| t13d1512... | a3b5c7... | 147 | 12 | 8 |
| t13d0918... | f1e2d3... | 52 | 3 | 2 |

**Interpretation:** 147 different IPs with the same fingerprint = cluster of cloned bots.

---

## 14. Correlation Quality (sql/hosts.sql)

### Principle

This check measures the ratio of uncorrelated (orphan) events. Legitimate traffic shows good HTTP/TCP correlation.

### Detection View

**`view_high_orphan_ratio`**
```sql
-- Traffic with >80% uncorrelated events
SELECT hour, src_ip, ja4, host, correlated, orphans, orphan_pct
FROM mabase_prod.view_high_orphan_ratio
ORDER BY orphan_pct DESC;

-- Threshold: orphan_pct > 80% = suspicious traffic
-- May indicate artificially generated traffic
```
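
A Go sketch of `orphan_pct` and the > 80% rule follows. It assumes each event carries the boolean outcome of the `correlated` field listed in `idees/champs.md`; that file does not state the field's exact type. `Event` and the function names are hypothetical.

```go
package main

import "fmt"

// Event is a hypothetical per-request record; Correlated mirrors the `correlated` field.
type Event struct {
	SrcIP      string
	Correlated bool
}

// orphanPct returns the share of events that were not correlated, in percent.
func orphanPct(events []Event) float64 {
	if len(events) == 0 {
		return 0
	}
	orphans := 0
	for _, e := range events {
		if !e.Correlated {
			orphans++
		}
	}
	return 100 * float64(orphans) / float64(len(events))
}

// highOrphanRatio applies the strict > 80% threshold.
func highOrphanRatio(events []Event) bool {
	return orphanPct(events) > 80
}

func main() {
	var events []Event
	for i := 0; i < 10; i++ {
		events = append(events, Event{SrcIP: "203.0.113.20", Correlated: i == 0})
	}
	fmt.Printf("orphan_pct=%.1f suspicious=%v\n", orphanPct(events), highOrphanRatio(events))
}
```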

---

## 15. Maintenance and False Positives

### Known Exceptions

| Source | False Positive | Solution |
|--------|----------------|----------|
| **Googlebot/Bingbot** | Aggressive but legitimate crawling | Filter by ASN + reverse DNS |
| **Internal monitoring** | High request rate | Allowlist by IP/ASN |
| **CDN/Proxy** | Shared JA4 (clients behind a proxy) | Check the ASN (Cloudflare, Akamai) |
| **Older browsers** | Missing modern headers | Check the UA version |

### Score Reset

Aggregates are purged automatically by TTL:
- `agg_*_1h`: 7-day TTL
- `agg_*_5m`: 1-day TTL
- `agg_*_1m`: 1-day TTL

An IP blocked by mistake returns to a normal score once the TTL expires.

---

## 16. Detection Views Summary

| View | Detection | Threshold | Impact |
|-----|-----------|-----------|--------|
| `view_bruteforce_post_detected` | POST on sensitive endpoints | ≥10 in 5 min | 🔴 High |
| `view_form_bruteforce_detected` | Variable query params | ≥20 req, ≥10 patterns | 🔴 High |
| `view_header_missing_modern_headers` | Missing modern headers | score < 70 | 🔴 High |
| `view_header_ua_order_mismatch` | UA spoofing (header order) | >1 hash | 🔴 High |
| `view_header_minimalist_count` | Minimalist bot | < 6 headers | 🔴 High |
| `view_header_sec_ch_missing` | Chrome without Sec-CH | absent | 🟡 Medium |
| `view_header_known_bot_signature` | Known signature (botnet) | 10+ IPs | 🔴 High |
| `view_alpn_mismatch_detected` | h2 declared, HTTP/1.1 spoken | ≥80% mismatch | 🔴 High |
| `view_rate_limit_exceeded` | Rate limit exceeded | >50 req/min | 🔴 High |
| `view_burst_detected` | Sudden burst | >20 req/10s | 🟡 Medium |
| `view_path_scan_detected` | Path scanning | ≥5 sensitive | 🔴 High |
| `view_payload_attacks_detected` | SQLi/XSS/path traversal payloads | ≥1 attempt | 🔴 Critical |
| `view_ja4_botnet_suspected` | Shared JA4 (botnet) | ≥20 IPs | 🔴 High |
| `view_high_orphan_ratio` | Uncorrelated traffic | >80% orphans | 🟡 Medium |
| `view_host_ja4_anomalies` | JA4 on several hosts | ≥3 hosts | 🟡 Medium |
| `view_host_ip_ja4_rotation` | IP rotating JA4s | ≥5 JA4s | 🟡 Medium |

---

## 17. Investigation Query Examples

### Top 10 most suspicious IPs (cumulative score)
```sql
WITH threats AS (
    SELECT src_ip, ja4, 'bruteforce' AS type, sum(attempts) AS score
    FROM mabase_prod.view_bruteforce_post_detected GROUP BY src_ip, ja4
    UNION ALL
    SELECT src_ip, ja4, 'path_scan', sum(sensitive_hits)
    FROM mabase_prod.view_path_scan_detected GROUP BY src_ip, ja4
    UNION ALL
    SELECT src_ip, ja4, 'payload', sum(sqli_attempts + xss_attempts)
    FROM mabase_prod.view_payload_attacks_detected GROUP BY src_ip, ja4
)
SELECT src_ip, ja4, sum(score) AS total_score, groupArray(type) AS threat_types
FROM threats
GROUP BY src_ip, ja4
ORDER BY total_score DESC
LIMIT 10;
```

### History of a suspicious IP
```sql
SELECT
    hour,
    host,
    countMerge(hits) AS requests,
    uniqMerge(uniq_paths) AS unique_paths
FROM mabase_prod.agg_host_ip_ja4_1h
WHERE src_ip = '1.2.3.4'
  AND hour >= now() - INTERVAL 24 HOUR
GROUP BY hour, host
ORDER BY hour DESC;
```

### JA4 → User-Agent → Hosts correlation
```sql
SELECT
    ja4,
    any(first_ua) AS user_agent,
    groupArray(DISTINCT host) AS hosts,
    -- countMerge already aggregates per ja4; wrapping it in sum() nests aggregates, which ClickHouse rejects
    countMerge(hits) AS total_requests
FROM mabase_prod.agg_host_ip_ja4_1h
WHERE hour >= now() - INTERVAL 1 HOUR
GROUP BY ja4
ORDER BY total_requests DESC
LIMIT 20;
```

---

## 18. Installation and Maintenance

### Installation
```bash
# Run after init.sql
clickhouse-client --multiquery < sql/hosts.sql
```

### Verification
```sql
-- Count records
SELECT count(*) FROM mabase_prod.agg_host_ip_ja4_1h;
SELECT count(*) FROM mabase_prod.agg_header_fingerprint_1h;

-- Test the views
SELECT * FROM mabase_prod.view_host_identification LIMIT 10;
SELECT * FROM mabase_prod.view_bruteforce_post_detected LIMIT 10;
SELECT * FROM mabase_prod.view_payload_attacks_detected LIMIT 10;
```

### Monitoring
```sql
-- Most active views: alert count per view (this query applies no time filter itself).
-- The UNION ALL is wrapped in a subquery so that ORDER BY sorts the combined result.
SELECT view_name, alerts
FROM
(
    SELECT 'bruteforce_post' AS view_name, count() AS alerts
    FROM mabase_prod.view_bruteforce_post_detected
    UNION ALL
    SELECT 'path_scan', count() FROM mabase_prod.view_path_scan_detected
    UNION ALL
    SELECT 'payload_attacks', count() FROM mabase_prod.view_payload_attacks_detected
    UNION ALL
    SELECT 'ja4_botnet', count() FROM mabase_prod.view_ja4_botnet_suspected
)
ORDER BY alerts DESC;
```

@@ -13,8 +13,6 @@ import (
 const (
 	// DefaultEventChannelBufferSize is the default size for event channels
 	DefaultEventChannelBufferSize = 1000
-	// ShutdownTimeout is the maximum time to wait for graceful shutdown
-	ShutdownTimeout = 30 * time.Second
 	// OrphanTickInterval is how often the orchestrator drains pending orphans.
 	// Set to half the default emit delay (500ms/2) so orphans are emitted promptly
 	// even when no new events arrive.
@@ -143,46 +141,17 @@ func (o *Orchestrator) processEvents(eventChan <-chan *domain.NormalizedEvent) {
 }
 
 // Stop gracefully stops the orchestrator.
-// It stops all sources first, then flushes remaining events, then closes sinks.
+// It stops all sources and closes sinks immediately without waiting for queue drainage.
+// systemd TimeoutStopSec handles forced termination if needed.
 func (o *Orchestrator) Stop() error {
 	if !o.running.CompareAndSwap(true, false) {
 		return nil // Not running
 	}
 
-	// Create shutdown context with timeout
-	shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), ShutdownTimeout)
-	defer shutdownCancel()
-
-	// First, cancel the main context to stop accepting new events
+	// Cancel context to stop accepting new events immediately
 	o.cancel()
 
-	// Wait for source goroutines to finish
-	// Use a separate goroutine with timeout to prevent deadlock
-	done := make(chan struct{})
-	go func() {
-		o.wg.Wait()
-		close(done)
-	}()
-
-	select {
-	case <-done:
-		// Sources stopped cleanly
-	case <-shutdownCtx.Done():
-		// Timeout waiting for sources
-	}
-
-	// Flush remaining events from correlation service
-	flushedLogs := o.correlationSvc.Flush()
-	for _, log := range flushedLogs {
-		if err := o.config.Sink.Write(shutdownCtx, log); err != nil {
-			// Log error but continue
-		}
-	}
-
-	// Flush and close sink with timeout
-	if err := o.config.Sink.Flush(shutdownCtx); err != nil {
-		// Log error
-	}
+	// Close sink (flush skipped - in-flight events are dropped)
 	if err := o.config.Sink.Close(); err != nil {
 		// Log error
 	}
@@ -69,6 +69,7 @@ type OutputsConfig struct {
 
 // FileOutputConfig holds file sink configuration.
 type FileOutputConfig struct {
+	Enabled bool `yaml:"enabled"`
 	Path string `yaml:"path"`
 }
 
@@ -182,6 +183,7 @@ func defaultConfig() *Config {
 		},
 		Outputs: OutputsConfig{
 			File: FileOutputConfig{
+				Enabled: true,
 				Path: "/var/log/logcorrelator/correlated.log",
 			},
 			ClickHouse: ClickHouseOutputConfig{
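The new default `Enabled: true` matters because of how a partial `outputs.file` section is merged. Assuming `Load` decodes the YAML on top of `defaultConfig()` (the loader is not part of this diff), a file that sets only `path` keeps `Enabled: true`, while an explicit `enabled: false` overrides it. The sketch demonstrates that overlay behaviour with the standard library's `encoding/json`, used here as a stand-in for the YAML decoder because `json.Unmarshal` likewise leaves struct fields that are absent from the input untouched; the types are reduced, hypothetical copies.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// FileOutputConfig is a reduced copy of the struct from the diff
// (JSON tags instead of YAML tags, for this stand-in only).
type FileOutputConfig struct {
	Enabled bool   `json:"enabled"`
	Path    string `json:"path"`
}

// defaultFileOutput mirrors the defaults added in defaultConfig().
func defaultFileOutput() FileOutputConfig {
	return FileOutputConfig{Enabled: true, Path: "/var/log/logcorrelator/correlated.log"}
}

// overlay decodes raw on top of the defaults; keys missing from raw
// keep their default values.
func overlay(raw string) (FileOutputConfig, error) {
	cfg := defaultFileOutput()
	err := json.Unmarshal([]byte(raw), &cfg)
	return cfg, err
}

func main() {
	onlyPath, _ := overlay(`{"path": "/tmp/out.log"}`)
	fmt.Println(onlyPath.Enabled, onlyPath.Path) // true /tmp/out.log

	disabled, _ := overlay(`{"enabled": false}`)
	fmt.Println(disabled.Enabled, disabled.Path) // false /var/log/logcorrelator/correlated.log
}
```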
@@ -232,7 +234,7 @@ func (c *Config) Validate() error {
 
 	// At least one output must be enabled
 	hasOutput := false
-	if c.Outputs.File.Path != "" {
+	if c.Outputs.File.Enabled && c.Outputs.File.Path != "" {
 		hasOutput = true
 	}
 	if c.Outputs.ClickHouse.Enabled {
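With this change, a configured `path` alone no longer counts as an active output. The sketch isolates that rule. The flattened `outputs` type with plain booleans for ClickHouse and stdout is a hypothetical simplification (the real `Validate()` also checks the DSN, table, batch sizes and more), and stdout is counted as an output because `TestValidate_FileOutputDisabled` expects a stdout-only configuration to validate.

```go
package main

import (
	"errors"
	"fmt"
)

// FileOutputConfig mirrors the struct from the diff (YAML tags omitted).
type FileOutputConfig struct {
	Enabled bool
	Path    string
}

// outputs is a hypothetical, flattened view of OutputsConfig.
type outputs struct {
	File              FileOutputConfig
	ClickHouseEnabled bool
	StdoutEnabled     bool
}

var errNoOutput = errors.New("at least one output must be enabled")

// fileSinkActive applies the new rule: the file sink exists only when
// it is enabled AND has a non-empty path.
func fileSinkActive(f FileOutputConfig) bool {
	return f.Enabled && f.Path != ""
}

// validateOutputs accepts the config if any single output is active.
func validateOutputs(o outputs) error {
	if fileSinkActive(o.File) || o.ClickHouseEnabled || o.StdoutEnabled {
		return nil
	}
	return errNoOutput
}

func main() {
	// File disabled but path kept, stdout on: valid.
	fmt.Println(validateOutputs(outputs{
		File:          FileOutputConfig{Enabled: false, Path: "/var/log/test.log"},
		StdoutEnabled: true,
	}))
	// File enabled without a path and nothing else: rejected.
	fmt.Println(validateOutputs(outputs{File: FileOutputConfig{Enabled: true}}))
}
```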
@@ -47,6 +47,9 @@ correlation:
 	if cfg.Outputs.File.Path != "/var/log/logcorrelator/correlated.log" {
 		t.Errorf("expected file path /var/log/logcorrelator/correlated.log, got %s", cfg.Outputs.File.Path)
 	}
+	if !cfg.Outputs.File.Enabled {
+		t.Error("expected file output to be enabled by default when path is set")
+	}
 }
 
 func TestLoad_InvalidPath(t *testing.T) {
@@ -110,7 +113,7 @@ func TestValidate_MinimumInputs(t *testing.T) {
 			},
 		},
 		Outputs: OutputsConfig{
-			File: FileOutputConfig{Path: "/var/log/test.log"},
+			File: FileOutputConfig{Enabled: true, Path: "/var/log/test.log"},
 		},
 	}
 
@@ -129,7 +132,7 @@ func TestValidate_AtLeastOneOutput(t *testing.T) {
 			},
 		},
 		Outputs: OutputsConfig{
-			File: FileOutputConfig{},
+			File: FileOutputConfig{Enabled: false},
 			ClickHouse: ClickHouseOutputConfig{Enabled: false},
 			Stdout: StdoutOutputConfig{Enabled: false},
 		},
@@ -189,7 +192,7 @@ func TestValidate_DuplicateNames(t *testing.T) {
 			},
 		},
 		Outputs: OutputsConfig{
-			File: FileOutputConfig{Path: "/var/log/test.log"},
+			File: FileOutputConfig{Enabled: true, Path: "/var/log/test.log"},
 		},
 	}
 
@@ -208,7 +211,7 @@ func TestValidate_DuplicatePaths(t *testing.T) {
 			},
 		},
 		Outputs: OutputsConfig{
-			File: FileOutputConfig{Path: "/var/log/test.log"},
+			File: FileOutputConfig{Enabled: true, Path: "/var/log/test.log"},
 		},
 	}
 
@@ -227,7 +230,7 @@ func TestValidate_EmptyName(t *testing.T) {
 			},
 		},
 		Outputs: OutputsConfig{
-			File: FileOutputConfig{Path: "/var/log/test.log"},
+			File: FileOutputConfig{Enabled: true, Path: "/var/log/test.log"},
 		},
 	}
 
@@ -246,7 +249,7 @@ func TestValidate_EmptyPath(t *testing.T) {
 			},
 		},
 		Outputs: OutputsConfig{
-			File: FileOutputConfig{Path: "/var/log/test.log"},
+			File: FileOutputConfig{Enabled: true, Path: "/var/log/test.log"},
 		},
 	}
 
@@ -265,7 +268,7 @@ func TestValidate_EmptyFilePath(t *testing.T) {
 			},
 		},
 		Outputs: OutputsConfig{
-			File: FileOutputConfig{Path: ""},
+			File: FileOutputConfig{Enabled: true, Path: ""},
 		},
 	}
 
@@ -284,7 +287,7 @@ func TestValidate_ClickHouseMissingDSN(t *testing.T) {
 			},
 		},
 		Outputs: OutputsConfig{
-			File: FileOutputConfig{Path: ""},
+			File: FileOutputConfig{Enabled: true, Path: ""},
 			ClickHouse: ClickHouseOutputConfig{
 				Enabled: true,
 				DSN: "",
@@ -308,7 +311,7 @@ func TestValidate_ClickHouseMissingTable(t *testing.T) {
 			},
 		},
 		Outputs: OutputsConfig{
-			File: FileOutputConfig{Path: ""},
+			File: FileOutputConfig{Enabled: true, Path: ""},
 			ClickHouse: ClickHouseOutputConfig{
 				Enabled: true,
 				DSN: "clickhouse://localhost:9000/db",
@@ -332,7 +335,7 @@ func TestValidate_ClickHouseInvalidBatchSize(t *testing.T) {
 			},
 		},
 		Outputs: OutputsConfig{
-			File: FileOutputConfig{Path: ""},
+			File: FileOutputConfig{Enabled: true, Path: ""},
 			ClickHouse: ClickHouseOutputConfig{
 				Enabled: true,
 				DSN: "clickhouse://localhost:9000/db",
@@ -357,7 +360,7 @@ func TestValidate_ClickHouseInvalidMaxBufferSize(t *testing.T) {
 			},
 		},
 		Outputs: OutputsConfig{
-			File: FileOutputConfig{Path: ""},
+			File: FileOutputConfig{Enabled: true, Path: ""},
 			ClickHouse: ClickHouseOutputConfig{
 				Enabled: true,
 				DSN: "clickhouse://localhost:9000/db",
@@ -383,7 +386,7 @@ func TestValidate_ClickHouseInvalidTimeout(t *testing.T) {
 			},
 		},
 		Outputs: OutputsConfig{
-			File: FileOutputConfig{Path: ""},
+			File: FileOutputConfig{Enabled: true, Path: ""},
 			ClickHouse: ClickHouseOutputConfig{
 				Enabled: true,
 				DSN: "clickhouse://localhost:9000/db",
@@ -409,7 +412,7 @@ func TestValidate_EmptyCorrelationKey(t *testing.T) {
 			},
 		},
 		Outputs: OutputsConfig{
-			File: FileOutputConfig{Path: "/var/log/test.log"},
+			File: FileOutputConfig{Enabled: true, Path: "/var/log/test.log"},
 		},
 		Correlation: CorrelationConfig{
 			TimeWindowS: 0,
@@ -938,3 +941,70 @@ correlation:
 		t.Error("expected error for ClickHouse enabled without table")
 	}
 }
+
+func TestValidate_FileOutputDisabled(t *testing.T) {
+	cfg := &Config{
+		Inputs: InputsConfig{
+			UnixSockets: []UnixSocketConfig{
+				{Name: "a", Path: "/tmp/a.sock"},
+				{Name: "b", Path: "/tmp/b.sock"},
+			},
+		},
+		Outputs: OutputsConfig{
+			File: FileOutputConfig{Enabled: false, Path: "/var/log/test.log"},
+			ClickHouse: ClickHouseOutputConfig{Enabled: false},
+			Stdout: StdoutOutputConfig{Enabled: true},
+		},
+		Correlation: CorrelationConfig{
+			TimeWindowS: 1,
+		},
+	}
+
+	err := cfg.Validate()
+	if err != nil {
+		t.Errorf("expected no error when file is disabled but stdout is enabled, got: %v", err)
+	}
+}
+
+func TestLoadConfig_FileOutputDisabled(t *testing.T) {
+	content := `
+inputs:
+  unix_sockets:
+    - name: http
+      path: /var/run/logcorrelator/http.socket
+    - name: network
+      path: /var/run/logcorrelator/network.socket
+
+outputs:
+  file:
+    enabled: false
+    path: /var/log/logcorrelator/correlated.log
+  stdout:
+    enabled: true
+
+correlation:
+  time_window_s: 1
+  emit_orphans: true
+`
+
+	tmpDir := t.TempDir()
+	configPath := filepath.Join(tmpDir, "config.yml")
+	if err := os.WriteFile(configPath, []byte(content), 0644); err != nil {
+		t.Fatalf("failed to write config: %v", err)
+	}
+
+	cfg, err := Load(configPath)
+	if err != nil {
+		t.Fatalf("unexpected error: %v", err)
+	}
+
+	if cfg.Outputs.File.Enabled {
+		t.Error("expected file output to be disabled")
+	}
+	if cfg.Outputs.File.Path != "/var/log/logcorrelator/correlated.log" {
+		t.Errorf("expected file path to be preserved, got %s", cfg.Outputs.File.Path)
+	}
+	if !cfg.Outputs.Stdout.Enabled {
+		t.Error("expected stdout output to be enabled")
+	}
+}
@@ -1,5 +0,0 @@
-# systemd-tmpfiles config for logcorrelator
-# Recreates /run/logcorrelator with the correct owner at every boot,
-# even if /var/run is a tmpfs cleared on reboot.
-# Format: type path mode user group age
-d /run/logcorrelator 0755 logcorrelator logcorrelator -
@@ -61,9 +61,6 @@ install -m 0644 %{_builddir}/etc/systemd/system/logcorrelator.service %{buildroo
 # Install logrotate config
 install -m 0644 %{_builddir}/etc/logrotate.d/logcorrelator %{buildroot}/etc/logrotate.d/logcorrelator
 
-# Install tmpfiles.d config (recreates /run/logcorrelator at boot with the correct owner)
-install -m 0644 %{_sourcedir}/logcorrelator-tmpfiles.conf %{buildroot}/usr/lib/tmpfiles.d/logcorrelator.conf
-
 %post
 # Create logcorrelator user and group
 if ! getent group logcorrelator >/dev/null 2>&1; then
@@ -101,11 +98,9 @@ if [ ! -f /etc/logcorrelator/logcorrelator.yml ]; then
     chmod 640 /etc/logcorrelator/logcorrelator.yml
 fi
 
-# Reload systemd and apply tmpfiles
+# Reload systemd and start service
 if [ -x /bin/systemctl ]; then
     systemctl daemon-reload
-    # Create /run/logcorrelator immediately with the correct owner
-    systemd-tmpfiles --create /usr/lib/tmpfiles.d/logcorrelator.conf 2>/dev/null || true
     systemctl enable logcorrelator.service
     systemctl start logcorrelator.service
 fi
@@ -141,10 +136,55 @@ exit 0
 /var/log/logcorrelator
 /var/lib/logcorrelator
 /etc/systemd/system/logcorrelator.service
-/usr/lib/tmpfiles.d/logcorrelator.conf
 %config(noreplace) /etc/logrotate.d/logcorrelator
 
 %changelog
+* Wed Mar 11 2026 logcorrelator <dev@example.com> - 1.1.22-1
+- Feat(outputs): file output enabled/disabled toggle
+Added an enabled: true/false field under outputs.file in the configuration.
+The file sink is only created if enabled: true AND path is set.
+Makes it possible to disable the file output entirely while keeping stdout/clickhouse.
+Tests: TestValidate_FileOutputDisabled, TestLoadConfig_FileOutputDisabled
+
+- Fix(systemd): immediate stop without draining the queue
+orchestrator.Stop() no longer flushes the buffers (in-flight events are lost).
+Removed ShutdownTimeout and the flush/wait logic.
+systemd TimeoutStopSec=30 handles forced termination if needed.
+Simplification: cancel() + Close() only.
+
+- Feat(sql): TTL and ZSTD compression on ClickHouse tables
+http_logs_raw: 1-day TTL, ZSTD compression on raw_json
+http_logs: 7-day TTL, ZSTD compression on large text fields
+Setting ttl_only_drop_parts = 1 to optimize deletions
+
+* Mon Mar 09 2026 logcorrelator <dev@example.com> - 1.1.21-1
+- Update: ClickHouse views and SQL schema
+Added bots.sql for bot identification (User-Agent parsing)
+Added tables.sql for the reference tables
+Updated mv1.sql (materialized view) with the new correlation structure
+views.md documentation expanded with query examples and the full schema
+
+* Mon Mar 09 2026 logcorrelator <dev@example.com> - 1.1.20-1
+- Fix(rpm): removed redundant systemd-tmpfiles.conf
+RuntimeDirectory=logcorrelator in the systemd service already manages /run/logcorrelator
+automatically. The systemd-tmpfiles --create command caused errors on
+systems with an existing /var/lib/mysql (a file instead of a directory).
+Removed /usr/lib/tmpfiles.d/logcorrelator.conf and systemd-tmpfiles --create.
+
+* Mon Mar 09 2026 logcorrelator <dev@example.com> - 1.1.19-1
+- Fix(systemd): immediate stop/restart without waiting for the queue to drain
+Stopping the service no longer flushes the buffers (in-flight events are lost).
+systemd TimeoutStopSec=30 already handles forced termination if needed.
+Simplified orchestrator.Stop(): cancel() + Close() only.
+Removed the now-unneeded ShutdownTimeout.
+
+* Mon Mar 09 2026 logcorrelator <dev@example.com> - 1.1.18-1
+- Fix(outputs): file output enabled: false did not stop writing to the file
+The Enabled field was missing from FileOutputConfig. The file sink was created
+even with enabled: false as long as path was set. The condition now
+explicitly checks enabled && path != "" in main.go and Validate().
+Test: TestValidate_FileOutputDisabled and TestLoadConfig_FileOutputDisabled added.
+
 * Fri Mar 06 2026 logcorrelator <dev@example.com> - 1.1.17-1
 - Fix(correlation): keepalives field not populated in ClickHouse
 The KeepAliveSeq field of NormalizedEvent was not transferred into the Fields
57
sql/init.sql
@@ -19,19 +19,22 @@ CREATE DATABASE IF NOT EXISTS mabase_prod;
 -- -----------------------------------------------------------------------------
 CREATE TABLE IF NOT EXISTS mabase_prod.http_logs_raw
 (
-    `raw_json` String,
+    `raw_json` String CODEC(ZSTD(3)),
     `ingest_time` DateTime DEFAULT now()
 )
 ENGINE = MergeTree
 PARTITION BY toDate(ingest_time)
 ORDER BY ingest_time
-SETTINGS index_granularity = 8192;
+TTL ingest_time + INTERVAL 1 DAY
+SETTINGS
+    index_granularity = 8192,
+    ttl_only_drop_parts = 1;
 
 -- -----------------------------------------------------------------------------
 -- Parsed table: populated automatically by the materialized view
 -- -----------------------------------------------------------------------------
 
-CREATE TABLE IF NOT EXISTS mabase_prod.http_logs
+CREATE TABLE mabase_prod.http_logs
 (
     -- Time
     `time` DateTime,
@@ -54,8 +57,8 @@ CREATE TABLE IF NOT EXISTS mabase_prod.http_logs
     `method` LowCardinality(String),
     `scheme` LowCardinality(String),
     `host` LowCardinality(String),
-    `path` String,
-    `query` String,
+    `path` String CODEC(ZSTD(3)),
+    `query` String CODEC(ZSTD(3)),
     `http_version` LowCardinality(String),
 
     -- Correlation
@@ -64,7 +67,7 @@ CREATE TABLE IF NOT EXISTS mabase_prod.http_logs
     `keepalives` UInt16,
     `a_timestamp` UInt64,
     `b_timestamp` UInt64,
-    `conn_id` String,
+    `conn_id` String CODEC(ZSTD(3)),
 
     -- IP metadata
     `ip_meta_df` UInt8,
@@ -83,32 +86,34 @@ CREATE TABLE IF NOT EXISTS mabase_prod.http_logs
     `tls_version` LowCardinality(String),
     `tls_sni` LowCardinality(String),
     `tls_alpn` LowCardinality(String),
-    `ja3` String,
-    `ja3_hash` String,
-    `ja4` String,
+    `ja3` String CODEC(ZSTD(3)),
+    `ja3_hash` String CODEC(ZSTD(3)),
+    `ja4` String CODEC(ZSTD(3)),
 
     -- HTTP headers
-    `client_headers` String,
-    `header_user_agent` String,
-    `header_accept` String,
-    `header_accept_encoding` String,
-    `header_accept_language` String,
-    `header_content_type` String,
-    `header_x_request_id` String,
-    `header_x_trace_id` String,
-    `header_x_forwarded_for` String,
-    `header_sec_ch_ua` String,
-    `header_sec_ch_ua_mobile` String,
-    `header_sec_ch_ua_platform` String,
-    `header_sec_fetch_dest` String,
-    `header_sec_fetch_mode` String,
-    `header_sec_fetch_site` String
+    `client_headers` String CODEC(ZSTD(3)),
+    `header_user_agent` String CODEC(ZSTD(3)),
+    `header_accept` String CODEC(ZSTD(3)),
+    `header_accept_encoding` String CODEC(ZSTD(3)),
+    `header_accept_language` String CODEC(ZSTD(3)),
+    `header_content_type` String CODEC(ZSTD(3)),
+    `header_x_request_id` String CODEC(ZSTD(3)),
+    `header_x_trace_id` String CODEC(ZSTD(3)),
+    `header_x_forwarded_for` String CODEC(ZSTD(3)),
+    `header_sec_ch_ua` String CODEC(ZSTD(3)),
+    `header_sec_ch_ua_mobile` String CODEC(ZSTD(3)),
+    `header_sec_ch_ua_platform` String CODEC(ZSTD(3)),
+    `header_sec_fetch_dest` String CODEC(ZSTD(3)),
+    `header_sec_fetch_mode` String CODEC(ZSTD(3)),
+    `header_sec_fetch_site` String CODEC(ZSTD(3))
 )
 ENGINE = MergeTree
 PARTITION BY log_date
 ORDER BY (time, src_ip, dst_ip, ja4)
-SETTINGS index_granularity = 8192;
+TTL log_date + INTERVAL 7 DAY
+SETTINGS
+    index_granularity = 8192,
+    ttl_only_drop_parts = 1;
 
 -- -----------------------------------------------------------------------------
 -- Materialized view: parses the JSON from http_logs_raw into http_logs
154
sql/mv1.sql
@@ -1,154 +0,0 @@
--- ============================================================================
--- PROJECT : HTTP Threat Detection Engine (Full Spectrum)
--- DESCRIPTION : Complete configuration of the aggregation tables and scoring.
--- COVERS : UA/TLS spoofing, TCP fingerprinting, behavioral anomalies.
--- DATE : 2026-03-08
--- ============================================================================
-
--- ----------------------------------------------------------------------------
--- 1. CLEANUP (reverse dependency order)
--- ----------------------------------------------------------------------------
-DROP VIEW IF EXISTS mabase_prod.live_threat_scores;
-DROP VIEW IF EXISTS mabase_prod.mv_baseline_update;
-DROP VIEW IF EXISTS mabase_prod.mv_novelty;
-DROP VIEW IF EXISTS mabase_prod.mv_traffic_1d;
-DROP VIEW IF EXISTS mabase_prod.mv_traffic_1h;
-DROP VIEW IF EXISTS mabase_prod.mv_traffic_1m;
-
-DROP TABLE IF EXISTS mabase_prod.agg_traffic_1d;
-DROP TABLE IF EXISTS mabase_prod.agg_traffic_1h;
-DROP TABLE IF EXISTS mabase_prod.agg_traffic_1m;
-
--- ----------------------------------------------------------------------------
--- 2. DESTINATION TABLES (STORAGE)
--- ----------------------------------------------------------------------------
-
-CREATE TABLE mabase_prod.agg_traffic_1m (
-    minute DateTime,
-    host LowCardinality(String),
-    src_ip IPv4,
-    src_asn UInt32,
-    src_country_code LowCardinality(String),
-    ja4 String,
-    ja3_hash String,
-    header_user_agent String,
-
-    -- Base metrics
-    hits AggregateFunction(count, UInt64),
-    uniq_paths AggregateFunction(uniq, String),
-
-    -- Layer 4: TCP & handshake
-    avg_syn_to_clienthello_ms AggregateFunction(avg, Int32),
-    var_syn_to_clienthello_ms AggregateFunction(varPop, Int32),
-    tcp_fingerprint AggregateFunction(uniq, UInt64), -- MSS + Window + Scale
-
-    -- Layer 7: HTTP fingerprinting
-    avg_headers_count AggregateFunction(avg, Float64),
-    host_sni_mismatch AggregateFunction(countIf, UInt8),
-
-    -- Spoofing & inconsistency detection
-    spoofing_ua_tls AggregateFunction(countIf, UInt8),
-    spoofing_ua_alpn AggregateFunction(countIf, UInt8),
-    spoofing_os_ttl AggregateFunction(countIf, UInt8),
-    missing_human_headers AggregateFunction(countIf, UInt8),
-
-    -- Behavior & payloads
-    sensitive_path_hits AggregateFunction(countIf, UInt8),
-    suspicious_methods AggregateFunction(countIf, UInt8),
-    suspicious_queries AggregateFunction(countIf, UInt8)
-) ENGINE = AggregatingMergeTree()
-PARTITION BY toYYYYMM(minute)
-ORDER BY (host, ja4, src_ip, minute);
-
--- 1h and 1d tables (simplified for long-term storage)
-CREATE TABLE mabase_prod.agg_traffic_1h (
-    hour DateTime,
-    host LowCardinality(String),
-    src_country_code LowCardinality(String),
-    ja4 String,
-    hits AggregateFunction(count, UInt64),
-    uniq_ips AggregateFunction(uniq, IPv4)
-) ENGINE = AggregatingMergeTree() ORDER BY (host, ja4, hour);
-
-CREATE TABLE mabase_prod.agg_traffic_1d (
-    day Date,
-    host LowCardinality(String),
-    ja4 String,
-    hits AggregateFunction(count, UInt64),
-    uniq_ips AggregateFunction(uniq, IPv4)
-) ENGINE = AggregatingMergeTree() ORDER BY (host, ja4, day);
-
--- ----------------------------------------------------------------------------
--- 3. MATERIALIZED VIEWS (COMPUTE ENGINE)
--- ----------------------------------------------------------------------------
-
-
-
-CREATE MATERIALIZED VIEW mabase_prod.mv_traffic_1m TO mabase_prod.agg_traffic_1m
-AS SELECT
-    toStartOfMinute(time) AS minute,
-    host, src_ip, src_asn, src_country_code, ja4, ja3_hash, header_user_agent,
-    countState() AS hits,
-    uniqState(path) AS uniq_paths,
-    avgState(syn_to_clienthello_ms) AS avg_syn_to_clienthello_ms,
-    varPopState(syn_to_clienthello_ms) AS var_syn_to_clienthello_ms,
-    -- TCP Fingerprint Hash
-    uniqState(cityHash64(toString(tcp_meta_mss), toString(tcp_meta_window_size), toString(tcp_meta_window_scale))) AS tcp_fingerprint,
-    -- HTTP Metrics
-    avgState(toFloat64(length(client_headers) - length(replaceAll(client_headers, ',', '')) + 1)) AS avg_headers_count,
-    countIfState(host != tls_sni AND tls_sni != '') AS host_sni_mismatch,
-    -- Spoofing Logic
-    countIfState((header_user_agent ILIKE '%Chrome%') AND (ja4 NOT ILIKE 't13d%')) AS spoofing_ua_tls,
-    countIfState((header_user_agent ILIKE '%Chrome%') AND (tls_alpn NOT ILIKE '%h2%')) AS spoofing_ua_alpn,
-    countIfState((header_user_agent ILIKE '%Windows%') AND (ip_meta_ttl <= 64)) AS spoofing_os_ttl,
-    countIfState((header_user_agent ILIKE '%Mozilla%') AND (header_sec_ch_ua = '')) AS missing_human_headers,
-    -- Behavior & Payloads
-    countIfState(match(path, 'login|auth|admin|password|config|wp-admin|api/v[0-9]/auth')) AS sensitive_path_hits,
-    countIfState(method IN ('PUT', 'DELETE', 'OPTIONS', 'TRACE')) AS suspicious_methods,
-    countIfState((length(query) > 250) OR match(query, '(<script|union|select|etc/passwd|%00)')) AS suspicious_queries
-FROM mabase_prod.http_logs
-GROUP BY minute, host, src_ip, src_asn, src_country_code, ja4, ja3_hash, header_user_agent;
-
--- Cascading to 1h
-CREATE MATERIALIZED VIEW mabase_prod.mv_traffic_1h TO mabase_prod.agg_traffic_1h
-AS SELECT toStartOfHour(minute) AS hour, host, src_country_code, ja4, countMergeState(hits) AS hits, uniqState(src_ip) AS uniq_ips
-FROM mabase_prod.agg_traffic_1m GROUP BY hour, host, src_country_code, ja4;
-
--- ----------------------------------------------------------------------------
--- 4. FINAL SCORING VIEW (VERDICT)
--- ----------------------------------------------------------------------------
-
-CREATE VIEW mabase_prod.live_threat_scores AS
-SELECT
-    T1.src_ip,
-    T1.ja4,
-    T1.src_asn,
-    T1.src_country_code,
-    (
-        -- 1. Signature inconsistencies (high weight: 40-50)
-        if(countMerge(T1.spoofing_ua_tls) > 0, 50, 0) +
-        if(countMerge(T1.spoofing_os_ttl) > 0, 40, 0) +
-        if(countMerge(T1.host_sni_mismatch) > 0, 45, 0) +
-        if(countMerge(T1.missing_human_headers) > 0, 30, 0) +
-
-        -- 2. Network anomalies (medium weight: 20-30)
-        if(varPopMerge(T1.var_syn_to_clienthello_ms) < 0.5 AND countMerge(T1.hits) > 5, 30, 0) +
-        if(avgMerge(T1.avg_headers_count) < 6, 25, 0) +
-
-        -- 3. Behavior (variable weight)
-        if(countMerge(T1.sensitive_path_hits) > 5, 40, 0) +
-        if(countMerge(T1.suspicious_queries) > 0, 60, 0) +
-        if(uniqMerge(T1.uniq_paths) > 50, 40, 0) + -- Sweep (scanner)
-
-        -- 4. Volume vs baseline
-        if(countMerge(T1.hits) > (B.p99_hits_per_hour * 3), 50, 0)
-
-    ) AS final_threat_score,
-    countMerge(T1.hits) AS request_count,
-    B.p99_hits_per_hour AS baseline
-FROM mabase_prod.agg_traffic_1m AS T1
-LEFT JOIN mabase_prod.tbl_baseline_ja4_7d AS B ON T1.ja4 = B.ja4
-WHERE T1.minute >= now() - INTERVAL 5 MINUTE
-GROUP BY T1.src_ip, T1.ja4, T1.src_asn, T1.src_country_code, B.p99_hits_per_hour
-HAVING final_threat_score > 0
-ORDER BY final_threat_score DESC;
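The removed `live_threat_scores` view is an additive rule score: each condition that fires adds a fixed weight. Restating it as a pure Go function makes the weights and thresholds easy to read and unit-test. The `windowStats` type and its field names are hypothetical; each field stands for the merged value of the matching aggregate column over the view's 5-minute window. Two properties of the SQL carry over into the sketch: with ClickHouse's default `join_use_nulls = 0`, a `ja4` missing from `tbl_baseline_ja4_7d` yields a baseline of 0 rather than NULL, so the volume rule fires for any traffic; and that rule compares a 5-minute hit count against three times an hourly p99.

```go
package main

import "fmt"

// windowStats holds hypothetical merged values for one (src_ip, ja4) group.
type windowStats struct {
	SpoofingUATLS       uint64
	SpoofingOSTTL       uint64
	HostSNIMismatch     uint64
	MissingHumanHeaders uint64
	VarSynToHelloMs     float64
	AvgHeadersCount     float64
	SensitivePathHits   uint64
	SuspiciousQueries   uint64
	UniqPaths           uint64
	Hits                uint64
	BaselineP99PerHour  float64 // 0 when the LEFT JOIN finds no baseline row
}

// threatScore reproduces the weights and thresholds of live_threat_scores.
func threatScore(s windowStats) int {
	score := 0
	add := func(fired bool, weight int) {
		if fired {
			score += weight
		}
	}
	// 1. Signature inconsistencies
	add(s.SpoofingUATLS > 0, 50)
	add(s.SpoofingOSTTL > 0, 40)
	add(s.HostSNIMismatch > 0, 45)
	add(s.MissingHumanHeaders > 0, 30)
	// 2. Network anomalies: near-constant handshake timing, few headers
	add(s.VarSynToHelloMs < 0.5 && s.Hits > 5, 30)
	add(s.AvgHeadersCount < 6, 25)
	// 3. Behavior
	add(s.SensitivePathHits > 5, 40)
	add(s.SuspiciousQueries > 0, 60)
	add(s.UniqPaths > 50, 40)
	// 4. Volume vs baseline
	add(float64(s.Hits) > s.BaselineP99PerHour*3, 50)
	return score
}

func main() {
	benign := windowStats{VarSynToHelloMs: 2, AvgHeadersCount: 8, Hits: 10, BaselineP99PerHour: 100}
	fmt.Println(threatScore(benign)) // 0

	spoofed := benign
	spoofed.SpoofingUATLS = 3
	spoofed.SuspiciousQueries = 1
	fmt.Println(threatScore(spoofed)) // 110

	noBaseline := benign
	noBaseline.BaselineP99PerHour = 0
	fmt.Println(threatScore(noBaseline)) // 50
}
```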
251
sql/views.sql
Normal file
@ -0,0 +1,251 @@
|
|||||||
|
-- ============================================================================
-- DEPLOYMENT SCRIPT FOR THE BOT & SPAM DETECTION VIEWS (CLICKHOUSE)
-- ============================================================================

-- ----------------------------------------------------------------------------
-- 1. STRICT CLEANUP
-- ----------------------------------------------------------------------------
DROP TABLE IF EXISTS mabase_prod.ml_detected_anomalies;

DROP VIEW IF EXISTS mabase_prod.view_ai_features_1h;
DROP VIEW IF EXISTS mabase_prod.view_host_ip_ja4_rotation;
DROP VIEW IF EXISTS mabase_prod.view_host_ja4_anomalies;
DROP VIEW IF EXISTS mabase_prod.view_form_bruteforce_detected;
DROP VIEW IF EXISTS mabase_prod.view_alpn_mismatch_detected;
DROP VIEW IF EXISTS mabase_prod.view_tcp_spoofing_detected;

DROP VIEW IF EXISTS mabase_prod.mv_agg_host_ip_ja4_1h;
DROP TABLE IF EXISTS mabase_prod.agg_host_ip_ja4_1h;

DROP VIEW IF EXISTS mabase_prod.mv_agg_header_fingerprint_1h;
DROP TABLE IF EXISTS mabase_prod.agg_header_fingerprint_1h;

-- ----------------------------------------------------------------------------
-- 2. AGGREGATION TABLES AND MATERIALIZED VIEWS (REAL TIME)
-- ----------------------------------------------------------------------------

CREATE TABLE mabase_prod.agg_host_ip_ja4_1h (
    window_start DateTime,
    src_ip String,
    ja4 String,
    host String,
    first_seen SimpleAggregateFunction(min, DateTime),
    last_seen SimpleAggregateFunction(max, DateTime),
    hits SimpleAggregateFunction(sum, UInt64),
    count_post SimpleAggregateFunction(sum, UInt64),
    uniq_paths AggregateFunction(uniq, String),
    uniq_query_params AggregateFunction(uniq, String),
    src_country_code SimpleAggregateFunction(any, String),
    tcp_fingerprint SimpleAggregateFunction(any, String),
    tcp_jitter_variance AggregateFunction(varPop, Float64),
    tcp_window_size SimpleAggregateFunction(any, UInt32),
    tcp_window_scale SimpleAggregateFunction(any, UInt32),
    tcp_mss SimpleAggregateFunction(any, UInt32),
    tcp_ttl SimpleAggregateFunction(any, UInt32),
    http_version SimpleAggregateFunction(any, String),
    first_ua SimpleAggregateFunction(any, String)
) ENGINE = AggregatingMergeTree()
ORDER BY (window_start, src_ip, ja4, host)
TTL window_start + INTERVAL 7 DAY;

CREATE MATERIALIZED VIEW mabase_prod.mv_agg_host_ip_ja4_1h
TO mabase_prod.agg_host_ip_ja4_1h AS
SELECT
    toStartOfHour(time) AS window_start,
    src_ip,
    ja4,
    host,
    min(time) AS first_seen,
    max(time) AS last_seen,
    count() AS hits,
    sum(IF(method = 'POST', 1, 0)) AS count_post,
    uniqState(path) AS uniq_paths,
    uniqState(query) AS uniq_query_params,
    any(src_country_code) AS src_country_code,
    -- Hash the fields as separate arguments: concatenating the numbers without a separator
    -- lets different tuples collide (65535 + 1460 and 6553 + 51460 both give '655351460').
    any(toString(cityHash64(tcp_meta_window_size, tcp_meta_mss, tcp_meta_window_scale, tcp_meta_options))) AS tcp_fingerprint,
    varPopState(toFloat64(syn_to_clienthello_ms)) AS tcp_jitter_variance,
    any(tcp_meta_window_size) AS tcp_window_size,
    any(tcp_meta_window_scale) AS tcp_window_scale,
    any(tcp_meta_mss) AS tcp_mss,
    any(ip_meta_ttl) AS tcp_ttl,
    any(http_version) AS http_version,
    any(header_user_agent) AS first_ua
FROM mabase_prod.http_logs
GROUP BY window_start, src_ip, ja4, host;

CREATE TABLE mabase_prod.agg_header_fingerprint_1h (
    window_start DateTime,
    src_ip String,
    header_order_hash SimpleAggregateFunction(any, String),
    modern_browser_score SimpleAggregateFunction(max, UInt8),
    sec_fetch_mode SimpleAggregateFunction(any, String),
    sec_fetch_dest SimpleAggregateFunction(any, String),
    count_site_none SimpleAggregateFunction(sum, UInt64)
) ENGINE = AggregatingMergeTree()
ORDER BY (window_start, src_ip)
TTL window_start + INTERVAL 7 DAY;

CREATE MATERIALIZED VIEW mabase_prod.mv_agg_header_fingerprint_1h
TO mabase_prod.agg_header_fingerprint_1h AS
SELECT
    toStartOfHour(time) AS window_start,
    src_ip,
    any(toString(cityHash64(client_headers))) AS header_order_hash,
    max(toUInt8(if(length(header_sec_ch_ua) > 0, 100, if(length(header_user_agent) > 0, 50, 0)))) AS modern_browser_score,
    any(header_sec_fetch_mode) AS sec_fetch_mode,
    any(header_sec_fetch_dest) AS sec_fetch_dest,
    sum(IF(header_sec_fetch_site = 'none', 1, 0)) AS count_site_none
FROM mabase_prod.http_logs
GROUP BY window_start, src_ip;

-- ----------------------------------------------------------------------------
-- 3. DESTINATION TABLE FOR MACHINE LEARNING
-- ----------------------------------------------------------------------------
CREATE TABLE mabase_prod.ml_detected_anomalies (
    detected_at DateTime,
    src_ip String,
    ja4 String,
    host String,
    anomaly_score Float32,
    reason String
) ENGINE = MergeTree()
ORDER BY (detected_at, src_ip, ja4)
TTL detected_at + INTERVAL 30 DAY;

-- ----------------------------------------------------------------------------
-- 4. FEATURE ENGINEERING VIEW FOR THE ISOLATION FOREST (RESOLVED)
-- ----------------------------------------------------------------------------
-- Each side is pre-aggregated in a subquery (explicit GROUP BY) before the join,
-- to avoid aggregate-state errors and a Cartesian product.
CREATE VIEW mabase_prod.view_ai_features_1h AS
SELECT
    a.src_ip,
    a.ja4,
    a.host,
    a.hits,
    a.uniq_paths,
    a.uniq_query_params,
    a.count_post,

    -- L4/L7 correlation flag
    IF(length(a.ja4) > 0 AND length(a.tcp_fingerprint) > 0, 1, 0) AS correlated,

    -- BEHAVIOURAL DIMENSIONS
    (a.count_post / (a.hits + 1)) AS post_ratio,
    (a.uniq_query_params / (a.uniq_paths + 1)) AS fuzzing_index,
    (a.hits / (dateDiff('second', a.first_seen, a.last_seen) + 1)) AS hit_velocity,

    -- TCP / L4 DIMENSIONS
    COALESCE(a.tcp_jitter_variance, 0) AS tcp_jitter_variance,
    count() OVER (PARTITION BY a.tcp_fingerprint) AS tcp_shared_count,
    a.tcp_window_size * exp2(a.tcp_window_scale) AS true_window_size,
    IF(a.tcp_mss > 0, a.tcp_window_size / a.tcp_mss, 0) AS window_mss_ratio,

    -- TLS / L5 DIMENSIONS (MISMATCH)
    -- The JA4 ALPN code is characters 9-10 of the first segment ('h2' in 't13d1516h2_...'),
    -- and ClickHouse substring() is 1-based.
    IF(substring(a.ja4, 9, 2) = 'h2' AND a.http_version != '2', 1, 0) AS alpn_http_mismatch,
    IF(substring(a.ja4, 9, 2) = '00', 1, 0) AS is_alpn_missing,

    -- HTTP / L7 DIMENSIONS
    COALESCE(h.modern_browser_score, 0) AS modern_browser_score,
    IF(h.sec_fetch_mode = 'navigate' AND h.sec_fetch_dest != 'document', 1, 0) AS is_fake_navigation,
    (h.count_site_none / (a.hits + 1)) AS site_none_ratio

FROM (
    -- Consolidate the host rows (fixes the missing GROUP BY)
    SELECT
        window_start, src_ip, ja4, host,
        sum(hits) AS hits,
        uniqMerge(uniq_paths) AS uniq_paths,
        uniqMerge(uniq_query_params) AS uniq_query_params,
        sum(count_post) AS count_post,
        min(first_seen) AS first_seen,
        max(last_seen) AS last_seen,
        any(tcp_fingerprint) AS tcp_fingerprint,
        varPopMerge(tcp_jitter_variance) AS tcp_jitter_variance,
        any(tcp_window_size) AS tcp_window_size,
        any(tcp_window_scale) AS tcp_window_scale,
        any(tcp_mss) AS tcp_mss,
        any(http_version) AS http_version
    FROM mabase_prod.agg_host_ip_ja4_1h
    WHERE window_start >= toStartOfHour(now() - INTERVAL 2 HOUR)
    GROUP BY window_start, src_ip, ja4, host
) a
LEFT JOIN (
    -- Consolidate the headers
    SELECT
        window_start, src_ip,
        max(modern_browser_score) AS modern_browser_score,
        any(sec_fetch_mode) AS sec_fetch_mode,
        any(sec_fetch_dest) AS sec_fetch_dest,
        sum(count_site_none) AS count_site_none
    FROM mabase_prod.agg_header_fingerprint_1h
    WHERE window_start >= toStartOfHour(now() - INTERVAL 2 HOUR)
    GROUP BY window_start, src_ip
) h
ON a.src_ip = h.src_ip AND a.window_start = h.window_start;

-- ----------------------------------------------------------------------------
-- 5. STATIC HEURISTIC DETECTION VIEWS (RESOLVED)
-- ----------------------------------------------------------------------------

CREATE VIEW mabase_prod.view_host_ip_ja4_rotation AS
SELECT
    src_ip,
    uniqExact(ja4) AS distinct_ja4_count,
    sum(hits) AS total_hits
FROM mabase_prod.agg_host_ip_ja4_1h
WHERE window_start >= toStartOfHour(now() - INTERVAL 1 HOUR)
GROUP BY src_ip
HAVING distinct_ja4_count >= 5 AND total_hits > 100;

CREATE VIEW mabase_prod.view_host_ja4_anomalies AS
SELECT
    ja4,
    uniqExact(src_ip) AS unique_ips,
    uniqExact(src_country_code) AS unique_countries,
    uniqExact(host) AS targeted_hosts
FROM mabase_prod.agg_host_ip_ja4_1h
WHERE window_start >= toStartOfHour(now() - INTERVAL 1 HOUR)
GROUP BY ja4
HAVING unique_ips >= 20 AND targeted_hosts >= 3;

-- GROUP BY added
CREATE VIEW mabase_prod.view_form_bruteforce_detected AS
SELECT
    src_ip, ja4, host,
    sum(hits) AS hits,
    uniqMerge(uniq_query_params) AS query_params_count
FROM mabase_prod.agg_host_ip_ja4_1h
WHERE window_start >= toStartOfHour(now() - INTERVAL 1 HOUR)
GROUP BY src_ip, ja4, host
HAVING query_params_count >= 10 AND hits >= 20;

-- GROUP BY added
CREATE VIEW mabase_prod.view_alpn_mismatch_detected AS
SELECT
    src_ip, ja4, host,
    sum(hits) AS hits,
    any(http_version) AS http_version
FROM mabase_prod.agg_host_ip_ja4_1h
WHERE window_start >= toStartOfHour(now() - INTERVAL 1 HOUR)
    -- ALPN code = JA4 characters 9-10 (substring() is 1-based)
    AND substring(ja4, 9, 2) IN ('h2', 'h3')
GROUP BY src_ip, ja4, host
HAVING http_version = '1.1' AND hits >= 10;

-- GROUP BY added
CREATE VIEW mabase_prod.view_tcp_spoofing_detected AS
SELECT
    src_ip, ja4,
    any(tcp_ttl) AS tcp_ttl,
    any(tcp_window_size) AS tcp_window_size,
    any(first_ua) AS first_ua
FROM mabase_prod.agg_host_ip_ja4_1h
WHERE window_start >= toStartOfHour(now() - INTERVAL 1 HOUR)
GROUP BY src_ip, ja4
-- Windows starts at TTL 128, so an observed TTL <= 64 contradicts a Windows UA.
-- iOS starts at 64 like Linux, so an iPhone UA is not a contradiction and is not checked.
HAVING tcp_ttl <= 64
    AND first_ua ILIKE '%Windows%';

84
views.md
@ -1,84 +0,0 @@
# 🛡️ Technical Reference Manual: Antispam & Bot Detection Engine

This document describes the detection algorithms implemented in the platform's ClickHouse views.

---

## 1. Transport Layer (L4) Analysis: The "Physical Trace"

Before the URL is even examined, the engine inspects how the connection was established. This is the hardest layer for an attacker to forge.

### A. TCP Stack Fingerprint (`tcp_fingerprint`)

* **How it works:** we use `cityHash64` to derive an identifier from handshake parameters set by the client's TCP stack: the **MSS** (Maximum Segment Size), the **Window Size** and the **Window Scale**. The `mv_agg_host_ip_ja4_1h` view also hashes the TCP options string (`tcp_meta_options`).
* **What it detects:** the software stack. A bot running on an Alpine Linux image typically has a different TCP signature from a user on iOS 17 or Windows 11.
* **Botnet detection:** if 500 different IPs share exactly the same `tcp_fingerprint` AND the same `ja4`, it is almost certainly a cluster of cloned bots.

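A minimal Python sketch of the fingerprint and the clone check, assuming per-connection records carrying the fields hashed by `mv_agg_host_ip_ja4_1h` (the `Handshake` record is hypothetical). Python has no `cityHash64`, so `hashlib.blake2b` with an 8-byte digest stands in for it, and the resulting values will not match those stored in ClickHouse. The 500-IP threshold is the example figure from the text.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class Handshake(NamedTuple):
    """Hypothetical per-connection record; names mirror the tcp_meta_* columns."""
    src_ip: str
    ja4: str
    window_size: int
    mss: int
    window_scale: int
    options: str


def tcp_fingerprint(h: Handshake) -> str:
    """Hash the stack parameters; the separator keeps (6553, 51460) != (65535, 1460)."""
    raw = "|".join([str(h.window_size), str(h.mss), str(h.window_scale), h.options])
    # blake2b (8-byte digest) is a stand-in for ClickHouse's 64-bit cityHash64.
    return hashlib.blake2b(raw.encode(), digest_size=8).hexdigest()


def cloned_clusters(records: Iterable[Handshake],
                    min_ips: int = 500) -> Dict[Tuple[str, str], Set[str]]:
    """Return the (tcp_fingerprint, ja4) pairs shared by at least min_ips distinct IPs."""
    ips_by_pair: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for h in records:
        ips_by_pair[(tcp_fingerprint(h), h.ja4)].add(h.src_ip)
    return {pair: ips for pair, ips in ips_by_pair.items() if len(ips) >= min_ips}
```

Grouping on the pair rather than on `tcp_fingerprint` alone matters. Many legitimate users share a common OS stack, so the TCP fingerprint by itself is not conclusive.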
### B. Handshake Jitter Analysis

* **How it works:** we compute the population variance (`varPop`) of the delay between the `SYN` and the TLS `ClientHello`.
* **What it detects:** robotic stability.
  * **Human:** variable latency (4G, Wi-Fi, movement), so the variance is high.
  * **Datacenter bot:** ultra-stable latency (dedicated fibre). A variance close to 0 points to an automated connection from cloud infrastructure; `live_threat_scores` adds 30 points when it falls below 0.5 ms² over more than 5 requests.

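The rule applied by `live_threat_scores` can be sketched in Python as follows, assuming the delays are in milliseconds as the `syn_to_clienthello_ms` column name suggests. Because the threshold applies to a variance, its unit is ms². In this sketch, the number of samples stands in for the hit count.

```python
from statistics import pvariance
from typing import Sequence


def jitter_points(syn_to_clienthello_ms: Sequence[float],
                  max_variance_ms2: float = 0.5,
                  min_hits: int = 5) -> int:
    """Return 30 risk points when the SYN -> ClientHello delay is suspiciously stable.

    Mirrors `varPopMerge(...) < 0.5 AND countMerge(hits) > 5` in live_threat_scores.
    """
    if len(syn_to_clienthello_ms) <= min_hits:
        return 0  # too few samples to judge stability
    # pvariance is the population variance, i.e. the same statistic as ClickHouse's varPop.
    return 30 if pvariance(syn_to_clienthello_ms) < max_variance_ms2 else 0
```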
---

## 2. Session Layer (L5) Analysis: The "TLS Passport"

The TLS handshake is a goldmine for identifying the client's TLS library (OpenSSL, Go's `crypto/tls`, etc.).

### A. UA vs JA4 Inconsistency

* **How it works:** the engine cross-checks the `header_user_agent` (declared) against the `ja4` (structural).
* **What it detects:** **browser spoofing**. A Python script can easily send `User-Agent: Mozilla/5.0...Chrome/120`, but without substantial engineering (such as `utls`) it cannot reproduce the exact set of cipher suites and TLS extensions of a real Chrome, which is what JA4 captures.
* **Scoring logic:** if UA = Chrome but JA4 != Chrome_signature -> **+50 risk points**.

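A Python sketch of the cross-check, assuming a hypothetical allow-list of JA4 values per browser family. The JA4 strings below are placeholders, not real Chrome or Firefox fingerprints, and `claimed_family` is a deliberately crude, hypothetical UA classifier.

```python
from typing import Dict, Optional, Set

# Hypothetical allow-list: browser family -> JA4 values seen from genuine clients.
# Placeholders only; a real deployment would build this from its own traffic.
KNOWN_JA4: Dict[str, Set[str]] = {
    "chrome": {"t13d1516h2_aaaaaaaaaaaa_bbbbbbbbbbbb"},
    "firefox": {"t13d1715h2_cccccccccccc_dddddddddddd"},
}


def claimed_family(user_agent: str) -> Optional[str]:
    """Very rough UA classification (hypothetical helper)."""
    if "Firefox/" in user_agent:
        return "firefox"
    if "Chrome/" in user_agent:
        return "chrome"
    return None


def ua_tls_spoofing_points(user_agent: str, ja4: str) -> int:
    """+50 when the UA claims a known browser but the JA4 is not one of its signatures."""
    family = claimed_family(user_agent)
    if family is None:
        return 0  # no browser claim to verify
    return 0 if ja4 in KNOWN_JA4[family] else 50
```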
### B. Host vs SNI Mismatch

* **How it works:** compares the `tls_sni` field (sent in cleartext during the handshake) with the `Host` header (sent later, inside the encrypted request).
* **What it detects:** **domain fronting** or tunnelling attacks. A bot can open the TLS session with `domaine-innocent.com` in the SNI and then target `api-critique.com` in the `Host` header.
* **Scoring logic:** `live_threat_scores` adds **45 points** when the two differ.

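A Python sketch of the comparison. The normalisation (case folding, dropping the port and a trailing dot) is an assumption, since the original does not say how the two values are compared. The 45-point weight is the one used in `live_threat_scores`.

```python
from typing import Optional


def normalize_host(value: str) -> str:
    """Lower-case, drop an optional :port and a trailing dot (hypothetical normaliser)."""
    host = value.strip().lower()
    if host.startswith("["):          # IPv6 literal such as [2001:db8::1]:443
        host = host.split("]", 1)[0] + "]"
    elif host.count(":") == 1:        # name:port
        host = host.split(":", 1)[0]
    return host.rstrip(".")


def host_sni_points(tls_sni: Optional[str], host_header: str) -> int:
    """+45 when the SNI sent in the handshake differs from the HTTP Host header."""
    if not tls_sni:
        return 0  # no SNI (e.g. direct IP access): nothing to compare
    return 0 if normalize_host(tls_sni) == normalize_host(host_header) else 45
```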
---

## 3. Application Layer (L7) Analysis: "HTTP Behaviour"

Once the tunnel is established, the engine analyses the structure of the HTTP request.

### A. Header Order Fingerprint (`http_fp`)

* **How it works:** we hash the ordered list of header names (`Accept`, `User-Agent`, `Connection`, etc.).
* **What it detects:** the browser engine's signature. Each browser family (Firefox, Safari, Chromium) sends its headers in a characteristic, stable order.
* **Detection:** a client that sends its headers in an unusual order, or sends very few of them (fewer than 6), is flagged as suspicious.

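A Python sketch of the order fingerprint and the two checks above. It assumes header names arrive in wire order and that `known_orders` is a hypothetical set of hashes collected from genuine browsers. `hashlib.blake2b` stands in for the `cityHash64(client_headers)` used by `mv_agg_header_fingerprint_1h`.

```python
import hashlib
from typing import List, Set


def header_order_hash(header_names: List[str]) -> str:
    """Hash the header names in wire order (values ignored, names lower-cased)."""
    joined = ",".join(name.lower() for name in header_names)
    # blake2b stands in for cityHash64; only equality between hashes matters here.
    return hashlib.blake2b(joined.encode(), digest_size=8).hexdigest()


def is_suspicious(header_names: List[str], known_orders: Set[str], min_headers: int = 6) -> bool:
    """Flag clients with too few headers, or with an order never seen from a real browser."""
    if len(header_names) < min_headers:
        return True
    return header_order_hash(header_names) not in known_orders
```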
### B. Payload and Entropy Analysis

* **How it works:** regex pattern matching on `query` and `path` (SQLi, XSS and path traversal detection).
* **Encoding evasion:** we also detect multiple encodings (e.g. `%2520`, a double-encoded space) that try to slip past simple firewalls.

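Inputs such as `%2520` are caught by decoding repeatedly until the value stops changing. Below is a Python sketch with deliberately small, illustrative regexes; the engine's real patterns are not shown in this document. The 5-round cap is a hypothetical safeguard.

```python
import re
from typing import Set
from urllib.parse import unquote

# Illustrative patterns only; production rule sets are far broader.
PATTERNS = {
    "sqli": re.compile(r"('|\")\s*(or|and)\s+\d+\s*=\s*\d+|union\s+select|;\s*drop\s+table", re.I),
    "xss": re.compile(r"<\s*script|javascript:|on(error|load)\s*=", re.I),
    "path_traversal": re.compile(r"\.\./|\.\.\\"),
}


def fully_decode(value: str, max_rounds: int = 5) -> str:
    """URL-decode until stable, so %2527 -> %27 -> ' is caught (bounded loop)."""
    for _ in range(max_rounds):
        decoded = unquote(value)
        if decoded == value:
            break
        value = decoded
    return value


def payload_findings(path: str, query: str) -> Set[str]:
    """Return the names of the attack patterns found in the decoded path and query."""
    text = fully_decode(path + "?" + query)
    return {name for name, rx in PATTERNS.items() if rx.search(text)}
```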
---

## 4. Temporal Correlation & Baseline: The "Statistical Neighbourhood"

The final score depends on the history of the TLS signature.

### A. Novelty Penalty (`agg_novelty`)

* **Logic:** a signature (JA4 + FP) seen for the first time today is "cold".
* **Handling:** a penalty is applied if its `first_seen` is less than 2 hours old. A botnet that has just launched a signature-rotation campaign is penalised immediately for its lack of history.

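A Python sketch of the penalty, assuming a lookup of `first_seen` timestamps keyed by (JA4, FP). The document does not give the size of the malus, so `NOVELTY_PENALTY = 20` is a placeholder. Signatures with no history at all are treated as new.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional, Tuple

NOVELTY_WINDOW = timedelta(hours=2)
NOVELTY_PENALTY = 20  # hypothetical value: the document does not specify the malus


def novelty_points(signature: Tuple[str, str],
                   first_seen: Dict[Tuple[str, str], datetime],
                   now: Optional[datetime] = None) -> int:
    """Penalise a (ja4, fp) signature first seen less than 2 hours ago, or never seen."""
    now = now or datetime.now(timezone.utc)
    seen = first_seen.get(signature)
    if seen is None or now - seen < NOVELTY_WINDOW:
        return NOVELTY_PENALTY
    return 0
```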
### B. Baseline Overrun (`tbl_baseline_ja4_7d`)

* **How it works:** current `hits` are compared with the historical 99th percentile (`p99`) of that exact signature. In `live_threat_scores`, exceeding 3 × `p99_hits_per_hour` adds 50 points.
* **Example:** if the JA4 of "Chrome 122" normally makes 10 requests/min/IP on your site and one IP suddenly makes 300, the score spikes even though the request itself is technically perfect.

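A Python sketch of the comparison using a nearest-rank 99th percentile. ClickHouse's `quantile` function is approximate, so its p99 may differ slightly. The sketch does not assume a time unit: `current_hits` and `history` only need to cover windows of the same length.

```python
import math
from typing import Sequence


def p99(values: Sequence[int]) -> float:
    """Nearest-rank 99th percentile of a non-empty sample."""
    ordered = sorted(values)
    rank = math.ceil(0.99 * len(ordered))  # 1-based nearest rank
    return float(ordered[max(rank, 1) - 1])


def baseline_points(current_hits: int, history: Sequence[int], factor: float = 3.0) -> int:
    """+50 when current hits exceed factor x the historical p99 (as in live_threat_scores)."""
    if not history:
        return 0  # no baseline for this signature: leave it to the novelty penalty
    return 50 if current_hits > factor * p99(history) else 0
```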
---

## 5. Scoring Summary (The Verdict)

| Algorithm | Signal | Score impact |
| :--- | :--- | :--- |
| **Fingerprint Mismatch** | UA vs TLS (spoofing) | **High (50)** |
| **L4 Anomaly** | SYN-to-ClientHello latency variance < 0.5 ms² | **Medium (30)** |
| **Path Sensitivity** | Hits on `/admin` or `/config` | **High (40)** |
| **Payload Security** | Injection characters (SQL/XSS) | **Critical (60)** |
| **Mass Distribution** | 1 JA4 on > 50 different IPs | **Medium (30)** |

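The table above lists a subset of the weights in `live_threat_scores`. Below is a Python sketch of the full additive score from that view, assuming the per-window aggregates are already in a hypothetical `WindowStats` record. The "Mass Distribution" row is not part of that view and is left out. This sketch also skips the volume rule when the signature has no baseline.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Hypothetical aggregates for one (src_ip, ja4) over the last 5 minutes."""
    spoofing_ua_tls: int = 0
    spoofing_os_ttl: int = 0
    host_sni_mismatch: int = 0
    missing_human_headers: int = 0
    jitter_variance_ms2: float = float("inf")  # no samples -> never "too stable"
    avg_headers_count: float = 10.0
    sensitive_path_hits: int = 0
    suspicious_queries: int = 0
    uniq_paths: int = 0
    hits: int = 0
    baseline_p99: float = 0.0  # 0 means "no 7-day baseline for this JA4"


def final_threat_score(s: WindowStats) -> int:
    """Additive score with the weights and thresholds of live_threat_scores."""
    rules = [
        (s.spoofing_ua_tls > 0, 50),
        (s.spoofing_os_ttl > 0, 40),
        (s.host_sni_mismatch > 0, 45),
        (s.missing_human_headers > 0, 30),
        (s.jitter_variance_ms2 < 0.5 and s.hits > 5, 30),
        (s.avg_headers_count < 6, 25),
        (s.sensitive_path_hits > 5, 40),
        (s.suspicious_queries > 0, 60),
        (s.uniq_paths > 50, 40),
        (s.baseline_p99 > 0 and s.hits > 3 * s.baseline_p99, 50),
    ]
    return sum(points for triggered, points in rules if triggered)
```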
---

## 6. Maintenance and False Positives

* **Exceptions:** legitimate crawlers (Googlebot, Bing) are filtered out by ASN and reverse DNS before scoring, so that the site does not get de-indexed.
* **Reset:** `final_threat_score` is volatile (computed over the last 5 minutes). An IP blocked by mistake returns to a normal score as soon as it stops behaving atypically.

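The document does not detail the reverse DNS check. A common approach, and the one Google documents for verifying Googlebot, is forward-confirmed reverse DNS: the PTR name must end in a crawler domain *and* resolve back to the same IP. Below is a Python sketch in which the lookups are injectable, so it can be tested offline. `gethostbyname_ex` only returns IPv4 addresses, and the ASN filter is not shown.

```python
import socket
from typing import Callable, Iterable, List

# Hostname suffixes published for Googlebot and Bingbot (extend as needed).
CRAWLER_SUFFIXES = (".googlebot.com", ".google.com", ".search.msn.com")


def default_reverse(ip: str) -> str:
    return socket.gethostbyaddr(ip)[0]


def default_forward(hostname: str) -> List[str]:
    return socket.gethostbyname_ex(hostname)[2]  # IPv4 addresses only


def is_verified_crawler(ip: str,
                        suffixes: Iterable[str] = CRAWLER_SUFFIXES,
                        reverse: Callable[[str], str] = default_reverse,
                        forward: Callable[[str], List[str]] = default_forward) -> bool:
    """Forward-confirmed reverse DNS: PTR in a crawler domain AND resolving back to ip."""
    try:
        hostname = reverse(ip).rstrip(".").lower()
    except OSError:  # socket.herror / gaierror are OSError subclasses
        return False
    if not hostname.endswith(tuple(suffixes)):
        return False
    try:
        addresses = forward(hostname)
    except OSError:
        return False
    return ip in addresses
```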