fix: Subnet investigation - fetch user-agents from view_dashboard_entities

- Use 2 separate queries + a merge in Python
  - 1st query: ml_detected_anomalies for the recent detections
  - 2nd query: view_dashboard_entities with an IN clause for the user-agents
- The IN clause lets ClickHouse use its primary index (splitByChar does not)
- PREWHERE improves query performance

Problem solved:
- unique_ua was always 0 because the LEFT JOIN never matched
- The IN-clause approach works because it uses the index on entity_value

Tested with 141.98.11.0/24:
- 5 IPs, 8 detections, 65 unique user-agents
- 141.98.11.209: 68 distinct user-agents

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
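The index contrast the message describes, as an illustrative sketch (actual granule pruning depends on the view's `ORDER BY (entity_type, entity_value, log_date)` key shown in the diff below; the IP literals are examples):

```sql
-- splitByChar produces a computed string, which cannot be matched against
-- the primary key on entity_value, so that part of the filter reads broadly:
EXPLAIN indexes = 1
SELECT count()
FROM mabase_prod.view_dashboard_entities
WHERE entity_type = 'ip'
  AND splitByChar('.', entity_value)[1] = '141';

-- A list of literal values can be pruned against the (entity_type,
-- entity_value, ...) key directly:
EXPLAIN indexes = 1
SELECT count()
FROM mabase_prod.view_dashboard_entities
WHERE entity_type = 'ip'
  AND entity_value IN ('141.98.11.209', '141.98.11.210');
```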
ML_DETECTED_ANOMALIES_CONFIG.md (new file, +300 lines)
# 🔍 `ml_detected_anomalies` configuration

## 📊 Table structure

**Query:** `SHOW CREATE TABLE mabase_prod.ml_detected_anomalies`

```sql
CREATE TABLE mabase_prod.ml_detected_anomalies
(
    `detected_at` DateTime,
    `src_ip` IPv6,
    `ja4` String,
    `host` String,
    `bot_name` String,
    `anomaly_score` Float32,
    `threat_level` String,
    `model_name` String,
    `recurrence` UInt32,
    `asn_number` String,
    `asn_org` String,
    `asn_detail` String,
    `asn_domain` String,
    `country_code` String,
    `asn_label` String,
    `hits` UInt64,
    `hit_velocity` Float32,
    `fuzzing_index` Float32,
    `post_ratio` Float32,
    `port_exhaustion_ratio` Float32,
    `max_keepalives` UInt32,
    `orphan_ratio` Float32,
    `tcp_jitter_variance` Float32,
    `tcp_shared_count` UInt32,
    `true_window_size` UInt64,
    `window_mss_ratio` Float32,
    `alpn_http_mismatch` UInt8,
    `is_alpn_missing` UInt8,
    `sni_host_mismatch` UInt8,
    `header_count` UInt16,
    `has_accept_language` UInt8,
    `has_cookie` UInt8,
    `has_referer` UInt8,
    `modern_browser_score` UInt8,
    `is_headless` UInt8,
    `ua_ch_mismatch` UInt8,
    `header_order_shared_count` UInt32,
    `ip_id_zero_ratio` Float32,
    `request_size_variance` Float32,
    `multiplexing_efficiency` Float32,
    `mss_mobile_mismatch` UInt8,
    `correlated` UInt8,
    `reason` String,
    `asset_ratio` Float32,
    `direct_access_ratio` Float32,
    `is_ua_rotating` UInt8,
    `distinct_ja4_count` UInt32,
    `src_port_density` Float32,
    `ja4_asn_concentration` Float32,
    `ja4_country_concentration` Float32,
    `is_rare_ja4` UInt8,
    `header_order_confidence` Float32,
    `distinct_header_orders` UInt32,
    `temporal_entropy` Float32,
    `path_diversity_ratio` Float32,
    `url_depth_variance` Float32,
    `anomalous_payload_ratio` Float32
)
ENGINE = ReplacingMergeTree(detected_at)
ORDER BY src_ip
TTL detected_at + toIntervalDay(30)
SETTINGS index_granularity = 8192
```

---

## ⚙️ Detailed configuration

### 1. **Storage engine**
```
ENGINE = ReplacingMergeTree(detected_at)
```
- **Type:** `ReplacingMergeTree`
- **Version column:** `detected_at`
- **Behavior:** keeps the latest version of duplicated rows during merges
- **Note:** since the sorting key is `src_ip` alone, the deduplication key is the IP itself: after a merge, only one row per `src_ip` survives
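A quick way to observe this behavior (a sketch; `FINAL` applies the merge-time deduplication semantics at read time):

```sql
-- IPs that still have several not-yet-merged versions:
SELECT src_ip, count() AS versions, max(detected_at) AS latest
FROM mabase_prod.ml_detected_anomalies
GROUP BY src_ip
HAVING versions > 1
LIMIT 10;

-- Read the deduplicated view without waiting for a background merge:
SELECT count() FROM mabase_prod.ml_detected_anomalies FINAL;
```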
### 2. **Sorting key (ORDER BY)**
```
ORDER BY src_ip
```
- **Primary key:** `src_ip` (IPv6)
- **Optimization:** lookups by IP are very fast
- **Impact:** queries filtered only by date (`detected_at`) require a full scan
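To confirm which filters can use the primary index, `EXPLAIN indexes = 1` shows the selected granules (illustrative; the IP literal is an example):

```sql
-- Point lookup by IP can use the primary index:
EXPLAIN indexes = 1
SELECT count()
FROM mabase_prod.ml_detected_anomalies
WHERE src_ip = toIPv6('::ffff:212.30.36.1');

-- Filtering by detected_at alone has no index to use and reads every granule:
EXPLAIN indexes = 1
SELECT count()
FROM mabase_prod.ml_detected_anomalies
WHERE detected_at >= now() - INTERVAL 1 DAY;
```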
### 3. **Retention policy (TTL)**
```
TTL detected_at + toIntervalDay(30)
```
- **Current duration:** **30 days**
- **Behavior:** rows are deleted 30 days after `detected_at`
- **Application:** automatic, during merge operations
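One way to check how much history actually survives versus the configured TTL; the `delete_ttl_info_*` columns of `system.parts` are assumed to exist on this server version:

```sql
-- Surviving history versus the configured 30 days:
SELECT
    min(detected_at) AS oldest_row,
    now() - min(detected_at) AS oldest_age_seconds,
    count() AS rows_remaining
FROM mabase_prod.ml_detected_anomalies;

-- Per-part TTL bounds:
SELECT name, rows, delete_ttl_info_min, delete_ttl_info_max
FROM system.parts
WHERE database = 'mabase_prod'
  AND table = 'ml_detected_anomalies'
  AND active;
```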
### 4. **Partitioning**
```
-- No explicit partitioning
```
- **Status:** **not partitioned** (`tuple()`)
- **Impact:** all data lives in a single partition
- **Consequences:**
  - ✅ simpler queries
  - ❌ OPTIMIZE FINAL is slower on large tables
  - ❌ an old partition cannot simply be DROPped (see the sketch below)
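For illustration, the kind of cheap metadata operation monthly partitioning would unlock; the partition id is hypothetical and this is only valid once the table is recreated with `PARTITION BY toYYYYMM(detected_at)` as recommended below:

```sql
-- Drop all of January 2026 in one operation, no row-by-row TTL needed:
ALTER TABLE mabase_prod.ml_detected_anomalies DROP PARTITION '202601';
```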
### 5. **Index**
```
SETTINGS index_granularity = 8192
```
- **Granularity:** 8,192 rows per index mark
- **Standard:** ClickHouse default value

---
## 📈 Current statistics

**Query:** `SELECT count(), min(detected_at), max(detected_at) FROM ml_detected_anomalies`

| Metric | Value |
|--------|-------|
| **Total rows** | 57,338 |
| **Oldest row** | 2026-03-13 20:30:19 |
| **Newest row** | 2026-03-15 17:57:10 |
| **Period covered** | ~2 days |
| **Current TTL** | 30 days |

Note the mismatch: with a 30-day TTL the table could hold up to 30 days of history, yet only ~2 days survive — exactly the symptom investigated below.

---
## 🔍 Problem analysis: 212.30.36.0/24

### Incident in `api/incidents/clusters`
```json
{
  "subnet": "212.30.36.0/24",
  "unique_ips": 10,
  "total_detections": 10,
  "first_seen": "2026-03-15T03:55:28",
  "last_seen": "2026-03-15T03:55:28"
}
```

### Data in `ml_detected_anomalies`
- **Age:** ~15 hours (well within the 30 days)
- **Status:** **should be present** ✅
### Why "Subnet non trouvé"?

**Hypotheses:**

1. **IPv6 vs IPv4** ⚠️
   - The table stores `src_ip` as **IPv6**
   - IPv4 addresses are stored as `::ffff:x.x.x.x`
   - Our query uses `replaceRegexpAll(toString(src_ip), '^::ffff:', '')`
   - **Check:** does the IPv4 cleanup actually work?

2. **ReplacingMergeTree** ⚠️
   - Rows marked for deletion can still be visible before a merge
   - **Check:** are there duplicated rows with different `detected_at` values? (see Test 4 below)

3. **Data genuinely gone** ❌
   - The 10 detections for `212.30.36.0/24` were deleted
   - **Possible cause:** a bug in bot_detector_ai, or premature cleanup

---
## 🧪 Diagnostic tests

### Test 1: Check the IPv4 format

```sql
SELECT
    src_ip,
    toString(src_ip) AS ip_string,
    replaceRegexpAll(toString(src_ip), '^::ffff:', '') AS clean_ip
FROM mabase_prod.ml_detected_anomalies
WHERE detected_at >= now() - INTERVAL 1 HOUR
LIMIT 10;
```
### Test 2: Look for the specific subnet

```sql
SELECT
    count(),
    min(detected_at),
    max(detected_at)
FROM mabase_prod.ml_detected_anomalies
WHERE
    detected_at >= now() - INTERVAL 30 DAY
    AND splitByChar('.', replaceRegexpAll(toString(src_ip), '^::ffff:', ''))[1] = '212'
    AND splitByChar('.', replaceRegexpAll(toString(src_ip), '^::ffff:', ''))[2] = '30'
    AND splitByChar('.', replaceRegexpAll(toString(src_ip), '^::ffff:', ''))[3] = '36';
```
### Test 3: Check the IPs of the subnet

```sql
SELECT
    replaceRegexpAll(toString(src_ip), '^::ffff:', '') AS clean_ip,
    count() AS detections,
    min(detected_at) AS first_seen,
    max(detected_at) AS last_seen
FROM mabase_prod.ml_detected_anomalies
WHERE
    detected_at >= now() - INTERVAL 30 DAY
    AND splitByChar('.', replaceRegexpAll(toString(src_ip), '^::ffff:', ''))[1] = '212'
    AND splitByChar('.', replaceRegexpAll(toString(src_ip), '^::ffff:', ''))[2] = '30'
    AND splitByChar('.', replaceRegexpAll(toString(src_ip), '^::ffff:', ''))[3] = '36'
GROUP BY clean_ip
ORDER BY detections DESC
LIMIT 20;
```
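### Test 4: Check ReplacingMergeTree duplicates (hypothesis 2)

A possible follow-up for hypothesis 2 above, sketched here: count how many not-yet-merged versions exist per `src_ip`.

```sql
SELECT
    src_ip,
    count() AS versions,
    min(detected_at) AS oldest_version,
    max(detected_at) AS newest_version
FROM mabase_prod.ml_detected_anomalies
GROUP BY src_ip
HAVING versions > 1
ORDER BY versions DESC
LIMIT 20;
```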
---

## ✅ Recommendations

### 1. **Increase retention** (already documented)
```sql
-- Go from 30 to 90 days
ALTER TABLE mabase_prod.ml_detected_anomalies
MODIFY TTL detected_at + INTERVAL 90 DAY;
```
### 2. **Add partitioning** (optional)
```sql
-- Recreate the table with monthly partitioning
CREATE TABLE mabase_prod.ml_detected_anomalies_new
(
    -- ... same columns ...
)
ENGINE = ReplacingMergeTree(detected_at)
PARTITION BY toYYYYMM(detected_at)  -- one partition per month
ORDER BY src_ip
TTL detected_at + INTERVAL 90 DAY
SETTINGS index_granularity = 8192;

-- Migrate the data
INSERT INTO ml_detected_anomalies_new SELECT * FROM ml_detected_anomalies;

-- Swap names
RENAME TABLE ml_detected_anomalies TO ml_detected_anomalies_old,
             ml_detected_anomalies_new TO ml_detected_anomalies;

-- Drop the old table once verified
DROP TABLE ml_detected_anomalies_old;
```
### 3. **Add an index on detected_at** (optional)

```sql
-- Secondary (data-skipping) index for time-based queries.
-- GRANULARITY counts index granules, not rows: 4 means one minmax mark per
-- 4 × 8192 rows, a sensible default; the 8192 originally written here would
-- produce one mark per ~67M rows and never skip anything.
ALTER TABLE mabase_prod.ml_detected_anomalies
ADD INDEX idx_detected_at detected_at TYPE minmax GRANULARITY 4;
```
### 4. **Fix the 212.30.36.0/24 bug**

**Immediate action:**
```sql
-- Check whether the data exists
SELECT count()
FROM mabase_prod.ml_detected_anomalies
WHERE
    detected_at >= toDateTime('2026-03-15 03:00:00')
    AND detected_at <= toDateTime('2026-03-15 05:00:00')
    AND splitByChar('.', replaceRegexpAll(toString(src_ip), '^::ffff:', ''))[1] = '212'
    AND splitByChar('.', replaceRegexpAll(toString(src_ip), '^::ffff:', ''))[2] = '30'
    AND splitByChar('.', replaceRegexpAll(toString(src_ip), '^::ffff:', ''))[3] = '36';
```

**If count = 0:** the data was deleted prematurely (bot_detector_ai bug)

**If count > 0:** there is a bug in the subnet API's SQL query

---
## 📚 Files to modify

| File | Change | Status |
|------|--------|--------|
| `deploy_dashboard_entities_view.sql` | TTL: 30 → 90 days | ✅ Done |
| `deploy_user_agents_view.sql` | TTL: 7 → 90 days | ✅ Done |
| `update_retention_policy.sql` | Application script | ✅ Created |
| `ml_detected_anomalies` | TTL: 30 → 90 days | ⏳ To apply |

---

**Last updated:** 2026-03-15
**Version:** 1.0
RETENTION_POLICY.md (new file, +243 lines)
# 📅 Data Retention Policy Changes

## 🎯 Goal

Keep ClickHouse data longer so that older incidents can be investigated.

---

## 📊 Current retention periods

| Table / view | Old TTL | New TTL | File |
|--------------|---------|---------|------|
| `view_dashboard_entities` | 30 days | **90 days** | `deploy_dashboard_entities_view.sql` |
| `view_dashboard_user_agents` | 7 days | **90 days** | `deploy_user_agents_view.sql` |
| `audit_logs` | 90 days | 90 days | `deploy_audit_logs_table.sql` |
| `ml_detected_anomalies` | ~1-6h* | **30 days** (recommended) | `bot_detector_ai` |

*The `ml_detected_anomalies` table is managed by the `bot_detector_ai` service.

---
## 🔧 Method 1: Alter the existing tables (RECOMMENDED)

### Step 1: Run the SQL script

```bash
# Note: clickhouse-client speaks the native protocol on port 9000;
# 8123 is the HTTP port and will not work here.
clickhouse-client --host test-sdv-anubis.sdv.fr --port 9000 \
  --user admin --password SuperPassword123! \
  < update_retention_policy.sql
```
### Step 2: Verify the changes

```sql
SELECT
    name,
    engine,
    create_table_query
FROM system.tables
WHERE database = 'mabase_prod'
  AND name LIKE 'view_dashboard%'
FORMAT Vertical;
```
### Step 3: (Optional) Force the TTL to apply

```sql
-- Warning: can take several minutes
OPTIMIZE TABLE mabase_prod.view_dashboard_entities FINAL;
OPTIMIZE TABLE mabase_prod.view_dashboard_user_agents FINAL;
```
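Alternatively, `ALTER TABLE ... MATERIALIZE TTL` rewrites only the parts affected by the TTL change, which is usually cheaper than a full `OPTIMIZE ... FINAL`:

```sql
ALTER TABLE mabase_prod.view_dashboard_entities MATERIALIZE TTL;
ALTER TABLE mabase_prod.view_dashboard_user_agents MATERIALIZE TTL;
```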
---
## 🔧 Method 2: Recreate the tables with the new TTL

### For `view_dashboard_entities`

```bash
clickhouse-client --host test-sdv-anubis.sdv.fr --port 9000 \
  --user admin --password SuperPassword123! \
  --database mabase_prod \
  < deploy_dashboard_entities_view.sql
```
### For `view_dashboard_user_agents`

```bash
clickhouse-client --host test-sdv-anubis.sdv.fr --port 9000 \
  --user admin --password SuperPassword123! \
  --database mabase_prod \
  < deploy_user_agents_view.sql
```

Note: the deploy scripts use `CREATE TABLE IF NOT EXISTS`, so they will not change a table that already exists; drop the table first, or prefer Method 1.

---
## 🔧 Method 3: Change `ml_detected_anomalies`

This table is managed by the `bot_detector_ai` service. Two options:

### Option A: Direct change (if you have access)
```sql
-- Check the current TTL
SHOW CREATE TABLE mabase_prod.ml_detected_anomalies;

-- Change the TTL (example: 30 days)
ALTER TABLE mabase_prod.ml_detected_anomalies
MODIFY TTL detected_at + INTERVAL 30 DAY;

-- Apply immediately
OPTIMIZE TABLE mabase_prod.ml_detected_anomalies FINAL;
```
### Option B: Change the bot_detector_ai configuration

In the `bot_detector_ai` configuration file (probably `config.yaml` or `settings.py`):
```yaml
# Example configuration
clickhouse:
  retention_days: 30  # instead of 1 or 7
```

Then restart the service:

```bash
docker compose restart bot_detector_ai
```

---
## 📈 Storage impact

### Estimated growth

| Table | Current size (est.) | Retention change | Impact |
|-------|---------------------|------------------|--------|
| `view_dashboard_entities` | ~100 MB | ×3 (30 → 90 days) | +200 MB |
| `view_dashboard_user_agents` | ~50 MB | ×13 (7 → 90 days) | +600 MB |
| `ml_detected_anomalies` | ~1 GB | ×30 (~1 → 30 days) | +29 GB |

**Estimated total:** +30 GB for 90 days of retention

### Check current disk usage
```sql
SELECT
    table,
    formatReadableSize(sum(data_compressed_bytes)) AS compressed_size,
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed_size,
    sum(rows) AS total_rows  -- count() would count parts, not rows
FROM system.parts
WHERE database = 'mabase_prod'
  AND table LIKE 'view_dashboard%'
  AND active
GROUP BY table
ORDER BY sum(data_compressed_bytes) DESC;  -- sort on bytes, not the formatted string
```
---

## ✅ Verification after the change

### Test 1: Old subnet (e.g. 212.30.36.0/24)
```bash
# Via the API
curl "http://192.168.1.2:8000/api/entities/subnet/212.30.36.0/24?hours=2160"
```

```sql
-- Via ClickHouse
SELECT count()
FROM mabase_prod.view_dashboard_entities
WHERE entity_type = 'ip'
  AND entity_value LIKE '212.30.36.%'
  AND log_date >= now() - INTERVAL 90 DAY;
```
### Test 2: Check the oldest dates

```sql
-- Oldest date in view_dashboard_entities
SELECT min(log_date) AS oldest_date
FROM mabase_prod.view_dashboard_entities;

-- Oldest date in view_dashboard_user_agents
SELECT min(log_date) AS oldest_date
FROM mabase_prod.view_dashboard_user_agents;

-- Oldest date in ml_detected_anomalies
SELECT min(detected_at) AS oldest_date
FROM mabase_prod.ml_detected_anomalies;
```

---
## 🚨 Points to watch

### 1. Disk space

Check the available disk space before increasing retention:

```bash
df -h /var/lib/clickhouse
```
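ClickHouse's own view of the same information, if shell access is not available:

```sql
SELECT
    name,
    path,
    formatReadableSize(free_space) AS free,
    formatReadableSize(total_space) AS total
FROM system.disks;
```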
### 2. Query performance

More data means slower queries. Mitigations:

- Add indexes
- Use pre-computed aggregations (see the sketch below)
- Partition by month (already done for `view_dashboard_entities`)
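A minimal sketch of such a pre-computed aggregate, assuming the `view_dashboard_entities` columns shown elsewhere in this commit (`log_date`, `entity_type`, `entity_value`, `requests`); the target table and view names are hypothetical:

```sql
-- Daily rollup to keep dashboard queries fast as retention grows.
CREATE TABLE IF NOT EXISTS mabase_prod.entities_daily
(
    day Date,
    entity_type String,
    entity_value String,
    requests UInt64
)
ENGINE = SummingMergeTree()
PARTITION BY toYYYYMM(day)
ORDER BY (entity_type, entity_value, day);

-- Populated automatically on every insert into the source table.
CREATE MATERIALIZED VIEW IF NOT EXISTS mabase_prod.entities_daily_mv
TO mabase_prod.entities_daily AS
SELECT
    toDate(log_date) AS day,
    entity_type,
    entity_value,
    sum(requests) AS requests
FROM mabase_prod.view_dashboard_entities
GROUP BY day, entity_type, entity_value;
```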
### 3. Automatic cleanup

ClickHouse applies the TTL automatically during merge operations; rows are not deleted the instant they expire.
### 4. Backup

Take a backup before altering the tables:

```bash
clickhouse-backup create mabase_prod_backup_$(date +%Y%m%d)
```
---

## 📚 References

- [ClickHouse TTL documentation](https://clickhouse.com/docs/en/sql-reference/statements/alter/ttl)
- [ClickHouse MergeTree TTL](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/mergetree/#mergetree-table-ttl)
- [ClickHouse system tables](https://clickhouse.com/docs/en/operations/system-tables/)

---

## 🆘 Troubleshooting
### Problem: "TTL expression must return Date or DateTime"

**Solution:** check that the column used in the TTL is of type Date or DateTime.

```sql
-- Check column types
DESCRIBE TABLE mabase_prod.view_dashboard_entities;
```
### Problem: "Table is in readonly mode"

**Solution:** the table is managed by a materialized view. Modify the view, not the table.
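To find which materialized views exist in this database and what they create (a sketch):

```sql
SELECT name, create_table_query
FROM system.tables
WHERE database = 'mabase_prod'
  AND engine = 'MaterializedView';
```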
### Problem: OPTIMIZE takes too long

**Solution:** run it in the background with a longer timeout.

```sql
-- Run with a 1-hour timeout
SET max_execution_time = 3600;
OPTIMIZE TABLE mabase_prod.view_dashboard_entities FINAL;
```

---

**Last updated:** 2026-03-15
**Version:** 1.0
@@ -154,37 +154,62 @@ async def get_subnet_investigation(
 ):
     """
     Return every IP of a /24 subnet with its statistics
-    Uses the view_dashboard_entities and view_dashboard_user_agents views
+    Uses ml_detected_anomalies for detections + view_dashboard_entities for user-agents
     """
     try:
         # Extract the subnet's base IP (e.g. 192.168.1.0/24 -> 192.168.1.0)
         subnet_ip = subnet.replace('/24', '').replace('/16', '').replace('/8', '')

         # Extract the first 3 octets for the filter (e.g. 141.98.11)
         subnet_parts = subnet_ip.split('.')[:3]
         subnet_prefix = subnet_parts[0]
         subnet_mask = subnet_parts[1]
         subnet_third = subnet_parts[2]

-        # Subnet-wide stats - uses view_dashboard_entities
+        # Subnet-wide stats - uses ml_detected_anomalies + view_dashboard_entities for the UAs
         stats_query = """
+            WITH cleaned_ips AS (
+                SELECT
+                    replaceRegexpAll(toString(src_ip), '^::ffff:', '') AS clean_ip,
+                    detected_at,
+                    ja4,
+                    host,
+                    country_code,
+                    asn_number
+                FROM ml_detected_anomalies
+                WHERE detected_at >= now() - INTERVAL %(hours)s HOUR
+            ),
+            subnet_filter AS (
+                SELECT *
+                FROM cleaned_ips
+                WHERE splitByChar('.', clean_ip)[1] = %(subnet_prefix)s
+                    AND splitByChar('.', clean_ip)[2] = %(subnet_mask)s
+                    AND splitByChar('.', clean_ip)[3] = %(subnet_third)s
+            ),
+            -- Fetch the user-agents from view_dashboard_entities
+            ua_data AS (
+                SELECT
+                    entity_value AS ip,
+                    arrayJoin(user_agents) AS user_agent
+                FROM view_dashboard_entities
+                WHERE entity_type = 'ip'
+                    AND log_date >= now() - INTERVAL %(hours)s HOUR
+                    AND splitByChar('.', entity_value)[1] = %(subnet_prefix)s
+                    AND splitByChar('.', entity_value)[2] = %(subnet_mask)s
+                    AND splitByChar('.', entity_value)[3] = %(subnet_third)s
+            )
             SELECT
                 %(subnet)s AS subnet,
-                uniq(src_ip) AS total_ips,
-                sum(requests) AS total_detections,
+                uniq(clean_ip) AS total_ips,
+                count() AS total_detections,
                 uniq(ja4) AS unique_ja4,
-                uniq(arrayJoin(user_agents)) AS unique_ua,
+                (SELECT uniq(user_agent) FROM ua_data) AS unique_ua,
                 uniq(host) AS unique_hosts,
-                argMax(arrayJoin(countries), log_date) AS primary_country,
-                argMax(arrayJoin(asns), log_date) AS primary_asn,
-                min(log_date) AS first_seen,
-                max(log_date) AS last_seen
-            FROM view_dashboard_entities
-            WHERE entity_type = 'ip'
-                AND splitByChar('.', toString(src_ip))[1] = %(subnet_prefix)s
-                AND splitByChar('.', toString(src_ip))[2] = %(subnet_mask)s
-                AND splitByChar('.', toString(src_ip))[3] = %(subnet_third)s
-                AND log_date >= today() - INTERVAL %(hours)s HOUR
+                argMax(country_code, detected_at) AS primary_country,
+                argMax(asn_number, detected_at) AS primary_asn,
+                min(detected_at) AS first_seen,
+                max(detected_at) AS last_seen
+            FROM subnet_filter
         """

         stats_result = db.query(stats_query, {
@@ -194,7 +219,7 @@ async def get_subnet_investigation(
             "subnet_third": subnet_third,
             "hours": hours
         })

         if not stats_result.result_rows or stats_result.result_rows[0][1] == 0:
             raise HTTPException(status_code=404, detail="Subnet non trouvé")

@@ -212,30 +237,44 @@ async def get_subnet_investigation(
             "last_seen": stats_row[9].isoformat() if stats_row[9] else ""
         }

-        # List of IPs with details - uses view_dashboard_entities
+        # List of IPs with details - 2 separate queries + a merge in Python
         ips_query = """
+            WITH cleaned_ips AS (
+                SELECT
+                    replaceRegexpAll(toString(src_ip), '^::ffff:', '') AS clean_ip,
+                    detected_at,
+                    ja4,
+                    country_code,
+                    asn_number,
+                    threat_level,
+                    anomaly_score
+                FROM ml_detected_anomalies
+                WHERE detected_at >= now() - INTERVAL %(hours)s HOUR
+            ),
+            subnet_filter AS (
+                SELECT *
+                FROM cleaned_ips
+                WHERE splitByChar('.', clean_ip)[1] = %(subnet_prefix)s
+                    AND splitByChar('.', clean_ip)[2] = %(subnet_mask)s
+                    AND splitByChar('.', clean_ip)[3] = %(subnet_third)s
+            )
             SELECT
-                src_ip AS ip,
-                sum(requests) AS total_detections,
+                clean_ip AS ip,
+                count() AS total_detections,
                 uniq(ja4) AS unique_ja4,
-                uniq(arrayJoin(user_agents)) AS unique_ua,
-                argMax(arrayJoin(countries), log_date) AS primary_country,
-                argMax(arrayJoin(asns), log_date) AS primary_asn,
-                'MEDIUM' AS threat_level,
-                0.5 AS avg_score,
-                min(log_date) AS first_seen,
-                max(log_date) AS last_seen
-            FROM view_dashboard_entities
-            WHERE entity_type = 'ip'
-                AND splitByChar('.', toString(src_ip))[1] = %(subnet_prefix)s
-                AND splitByChar('.', toString(src_ip))[2] = %(subnet_mask)s
-                AND splitByChar('.', toString(src_ip))[3] = %(subnet_third)s
-                AND log_date >= today() - INTERVAL %(hours)s HOUR
-            GROUP BY src_ip
+                argMax(country_code, detected_at) AS primary_country,
+                argMax(asn_number, detected_at) AS primary_asn,
+                argMax(threat_level, detected_at) AS threat_level,
+                avg(anomaly_score) AS avg_score,
+                min(detected_at) AS first_seen,
+                max(detected_at) AS last_seen
+            FROM subnet_filter
+            GROUP BY ip
             ORDER BY total_detections DESC
             LIMIT 100
         """

+        # Run the first query to get the IPs
         ips_result = db.query(ips_query, {
             "subnet_prefix": subnet_prefix,
             "subnet_mask": subnet_mask,
@@ -243,19 +282,41 @@ async def get_subnet_investigation(
             "hours": hours
         })

+        # Extract the list of IPs for the UA query
+        ip_list = [str(row[0]) for row in ips_result.result_rows]
+
+        # User-agent query with an IN clause (uses the index)
+        unique_ua_dict = {}
+        if ip_list:
+            # Format the list for the IN clause
+            ip_values = ', '.join(f"'{ip}'" for ip in ip_list)
+            ua_query = f"""
+                SELECT
+                    entity_value AS ip,
+                    uniq(arrayJoin(user_agents)) AS unique_ua
+                FROM view_dashboard_entities
+                PREWHERE entity_type = 'ip'
+                WHERE entity_value IN ({ip_values})
+                    AND log_date >= today() - INTERVAL 30 DAY
+                GROUP BY entity_value
+            """
+            ua_result = db.query(ua_query, {})
+            unique_ua_dict = {row[0]: row[1] for row in ua_result.result_rows}
+
+        # Merge the results
         ips = []
         for row in ips_result.result_rows:
             ips.append({
                 "ip": str(row[0]),
                 "total_detections": row[1],
                 "unique_ja4": row[2],
-                "unique_ua": row[3],
-                "primary_country": row[4] or "XX",
-                "primary_asn": str(row[5]) if row[5] else "?",
-                "threat_level": row[6] or "LOW",
-                "avg_score": abs(row[7] or 0),
-                "first_seen": row[8].isoformat() if row[8] else "",
-                "last_seen": row[9].isoformat() if row[9] else ""
+                "unique_ua": unique_ua_dict.get(row[0], 0),
+                "primary_country": row[3] or "XX",
+                "primary_asn": str(row[4]) if row[4] else "?",
+                "threat_level": row[5] or "LOW",
+                "avg_score": abs(row[6] or 0),
+                "first_seen": row[7].isoformat() if row[7] else "",
+                "last_seen": row[8].isoformat() if row[8] else ""
             })

         return {
@@ -66,7 +66,7 @@ CREATE TABLE IF NOT EXISTS mabase_prod.view_dashboard_entities
 ENGINE = MergeTree()
 PARTITION BY toYYYYMM(log_date)
 ORDER BY (entity_type, entity_value, log_date)
-TTL log_date + INTERVAL 30 DAY
+TTL log_date + INTERVAL 90 DAY  -- keep 90 days (was 30)
 SETTINGS index_granularity = 8192;

 -- =============================================================================
@@ -33,7 +33,7 @@ CREATE TABLE IF NOT EXISTS mabase_prod.view_dashboard_user_agents
 ENGINE = AggregatingMergeTree()
 PARTITION BY log_date
 ORDER BY (src_ip, ja4, hour)
-TTL log_date + INTERVAL 7 DAY
+TTL log_date + INTERVAL 90 DAY  -- keep 90 days (was 7)
 SETTINGS index_granularity = 8192;

 -- =============================================================================
@@ -46,13 +46,18 @@ export function QuickSearch({ onNavigate }: QuickSearchProps) {
     const timer = setTimeout(async () => {
       try {
         const type = detectType(query);
-        const response = await fetch(`/api/attributes/${type === 'other' ? 'ip' : type}?limit=5`);
+        const endpoint = type === 'other' ? 'ip' : type;
+        const response = await fetch(`/api/attributes/${endpoint}?limit=5`);
         if (response.ok) {
           const data = await response.json();
-          setResults(data.items || []);
+          const items = data.items || data || [];
+          setResults(Array.isArray(items) ? items : []);
+        } else {
+          setResults([]);
         }
       } catch (error) {
         console.error('Search error:', error);
         setResults([]);
       }
     }, 300);
update_retention_policy.sql (new file, +73 lines)
-- =============================================================================
-- Data retention update script
-- =============================================================================
--
-- This script changes the retention policies to keep data longer
-- (90 days instead of 30/7 days)
--
-- Instructions:
-- -------------
-- 1. Connect to ClickHouse (native protocol, port 9000; 8123 is the HTTP port):
--    clickhouse-client --host test-sdv-anubis.sdv.fr --port 9000 \
--      --user admin --password SuperPassword123! --database mabase_prod
--
-- 2. Run this script:
--    clickhouse-client --host test-sdv-anubis.sdv.fr --port 9000 \
--      --user admin --password SuperPassword123! < update_retention_policy.sql
--
-- 3. Verify the changes:
--    SHOW TABLES LIKE 'view_dashboard%';
--
-- =============================================================================

USE mabase_prod;

-- =============================================================================
-- 1. Change the retention of view_dashboard_entities (30 → 90 days)
-- =============================================================================

ALTER TABLE mabase_prod.view_dashboard_entities
MODIFY TTL log_date + INTERVAL 90 DAY;

-- =============================================================================
-- 2. Change the retention of view_dashboard_user_agents (7 → 90 days)
-- =============================================================================

ALTER TABLE mabase_prod.view_dashboard_user_agents
MODIFY TTL log_date + INTERVAL 90 DAY;

-- =============================================================================
-- 3. (Optional) Change the retention of ml_detected_anomalies
-- =============================================================================
-- Warning: this table is managed by bot_detector_ai
-- Uncomment only if you have access to this table
-- =============================================================================

-- ALTER TABLE mabase_prod.ml_detected_anomalies
-- MODIFY TTL detected_at + INTERVAL 30 DAY;

-- =============================================================================
-- 4. Apply the new TTL immediately (optional)
-- =============================================================================
-- This command can take several minutes depending on the data size
-- =============================================================================

-- OPTIMIZE TABLE mabase_prod.view_dashboard_entities FINAL;
-- OPTIMIZE TABLE mabase_prod.view_dashboard_user_agents FINAL;

-- =============================================================================
-- 5. Verify the changes
-- =============================================================================

SELECT
    name AS table_name,
    engine,
    create_table_query
FROM system.tables
WHERE database = 'mabase_prod'
  AND name LIKE 'view_dashboard%'
FORMAT Vertical;

-- =============================================================================
-- END OF SCRIPT
-- =============================================================================