diff --git a/ML_DETECTED_ANOMALIES_CONFIG.md b/ML_DETECTED_ANOMALIES_CONFIG.md
new file mode 100644
index 0000000..3a2a2b0
--- /dev/null
+++ b/ML_DETECTED_ANOMALIES_CONFIG.md
@@ -0,0 +1,300 @@
# 🔍 Configuration of `ml_detected_anomalies`

## 📊 Table structure

**Query:** `SHOW CREATE TABLE mabase_prod.ml_detected_anomalies`

```sql
CREATE TABLE mabase_prod.ml_detected_anomalies
(
    `detected_at` DateTime,
    `src_ip` IPv6,
    `ja4` String,
    `host` String,
    `bot_name` String,
    `anomaly_score` Float32,
    `threat_level` String,
    `model_name` String,
    `recurrence` UInt32,
    `asn_number` String,
    `asn_org` String,
    `asn_detail` String,
    `asn_domain` String,
    `country_code` String,
    `asn_label` String,
    `hits` UInt64,
    `hit_velocity` Float32,
    `fuzzing_index` Float32,
    `post_ratio` Float32,
    `port_exhaustion_ratio` Float32,
    `max_keepalives` UInt32,
    `orphan_ratio` Float32,
    `tcp_jitter_variance` Float32,
    `tcp_shared_count` UInt32,
    `true_window_size` UInt64,
    `window_mss_ratio` Float32,
    `alpn_http_mismatch` UInt8,
    `is_alpn_missing` UInt8,
    `sni_host_mismatch` UInt8,
    `header_count` UInt16,
    `has_accept_language` UInt8,
    `has_cookie` UInt8,
    `has_referer` UInt8,
    `modern_browser_score` UInt8,
    `is_headless` UInt8,
    `ua_ch_mismatch` UInt8,
    `header_order_shared_count` UInt32,
    `ip_id_zero_ratio` Float32,
    `request_size_variance` Float32,
    `multiplexing_efficiency` Float32,
    `mss_mobile_mismatch` UInt8,
    `correlated` UInt8,
    `reason` String,
    `asset_ratio` Float32,
    `direct_access_ratio` Float32,
    `is_ua_rotating` UInt8,
    `distinct_ja4_count` UInt32,
    `src_port_density` Float32,
    `ja4_asn_concentration` Float32,
    `ja4_country_concentration` Float32,
    `is_rare_ja4` UInt8,
    `header_order_confidence` Float32,
    `distinct_header_orders` UInt32,
    `temporal_entropy` Float32,
    `path_diversity_ratio` Float32,
    `url_depth_variance` Float32,
    `anomalous_payload_ratio` Float32
)
ENGINE = 
ReplacingMergeTree(detected_at)
ORDER BY src_ip
TTL detected_at + toIntervalDay(30)
SETTINGS index_granularity = 8192
```

---

## ⚙️ Detailed configuration

### 1. **Storage engine**
```
ENGINE = ReplacingMergeTree(detected_at)
```
- **Type:** `ReplacingMergeTree`
- **Version column:** `detected_at`
- **Behavior:** keeps the latest version of duplicated rows when parts are merged

### 2. **Sorting key (ORDER BY)**
```
ORDER BY src_ip
```
- **Primary key:** `src_ip` (IPv6)
- **Optimization:** per-IP lookups are very fast
- **Impact:** queries filtered on `detected_at` require a full scan

### 3. **Retention policy (TTL)**
```
TTL detected_at + toIntervalDay(30)
```
- **Current duration:** **30 days**
- **Behavior:** rows are deleted 30 days after their `detected_at` value
- **Application:** automatic, during merge operations

### 4. **Partitioning**
```
-- No explicit partitioning
```
- **Status:** **not partitioned** (`tuple()`)
- **Impact:** all data lives in a single partition
- **Consequences:**
  - ✅ simpler queries
  - ❌ `OPTIMIZE ... FINAL` is slower on large tables
  - ❌ old partitions cannot be removed with `DROP PARTITION`

### 5. 
**Index**
```
SETTINGS index_granularity = 8192
```
- **Granularity:** 8192 rows per index mark
- **Standard:** ClickHouse default value

---

## 📈 Current statistics

**Query:** `SELECT count(), min(detected_at), max(detected_at) FROM ml_detected_anomalies`

| Metric | Value |
|--------|-------|
| **Total rows** | 57,338 |
| **Oldest row** | 2026-03-13 20:30:19 |
| **Newest row** | 2026-03-15 17:57:10 |
| **Period covered** | ~2 days |
| **Current TTL** | 30 days |

---

## 🔍 Analysis of the problem: 212.30.36.0/24

### Incident in `api/incidents/clusters`
```json
{
  "subnet": "212.30.36.0/24",
  "unique_ips": 10,
  "total_detections": 10,
  "first_seen": "2026-03-15T03:55:28",
  "last_seen": "2026-03-15T03:55:28"
}
```

### Data in `ml_detected_anomalies`
- **Age:** ~15 hours (well within the 30 days)
- **Status:** **should be present** ✅

### Why "Subnet non trouvé"?

**Hypotheses:**

1. **IPv6 vs IPv4** ⚠️
   - The table stores `src_ip` as **IPv6**
   - IPv4 addresses are stored as `::ffff:x.x.x.x`
   - Our query uses `replaceRegexpAll(toString(src_ip), '^::ffff:', '')`
   - **Check:** does the IPv4 cleanup actually work?

2. **ReplacingMergeTree** ⚠️
   - Duplicate rows are only collapsed at merge time, so several versions of a row with different `detected_at` values can coexist
   - **Check:** are there duplicated rows with different `detected_at` values?

3. 
**Data actually missing** ❌
   - The 10 detections for `212.30.36.0/24` were deleted
   - **Possible cause:** a bug in bot_detector_ai, or premature cleanup

---

## 🧪 Diagnostic tests

### Test 1: Check the IPv4 format

```sql
SELECT
    src_ip,
    toString(src_ip) AS ip_string,
    replaceRegexpAll(toString(src_ip), '^::ffff:', '') AS clean_ip
FROM mabase_prod.ml_detected_anomalies
WHERE detected_at >= now() - INTERVAL 1 HOUR
LIMIT 10;
```

### Test 2: Search for the specific subnet

```sql
SELECT
    count(),
    min(detected_at),
    max(detected_at)
FROM mabase_prod.ml_detected_anomalies
WHERE
    detected_at >= now() - INTERVAL 30 DAY
    AND splitByChar('.', replaceRegexpAll(toString(src_ip), '^::ffff:', ''))[1] = '212'
    AND splitByChar('.', replaceRegexpAll(toString(src_ip), '^::ffff:', ''))[2] = '30'
    AND splitByChar('.', replaceRegexpAll(toString(src_ip), '^::ffff:', ''))[3] = '36';
```

### Test 3: Check the subnet's IPs

```sql
SELECT
    replaceRegexpAll(toString(src_ip), '^::ffff:', '') AS clean_ip,
    count() AS detections,
    min(detected_at) AS first_seen,
    max(detected_at) AS last_seen
FROM mabase_prod.ml_detected_anomalies
WHERE
    detected_at >= now() - INTERVAL 30 DAY
    AND splitByChar('.', replaceRegexpAll(toString(src_ip), '^::ffff:', ''))[1] = '212'
    AND splitByChar('.', replaceRegexpAll(toString(src_ip), '^::ffff:', ''))[2] = '30'
    AND splitByChar('.', replaceRegexpAll(toString(src_ip), '^::ffff:', ''))[3] = '36'
GROUP BY clean_ip
ORDER BY detections DESC
LIMIT 20;
```

---

## ✅ Recommendations

### 1. **Increase retention** (already documented)

```sql
-- Go from 30 to 90 days
ALTER TABLE mabase_prod.ml_detected_anomalies
MODIFY TTL detected_at + INTERVAL 90 DAY;
```

### 2. **Add partitioning** (optional)

```sql
-- Recreate the table with monthly partitioning
CREATE TABLE mabase_prod.ml_detected_anomalies_new
(
    -- ... same columns (paste the full list from SHOW CREATE TABLE;
    --     this ellipsis is a placeholder, not valid SQL) ...
)
ENGINE = ReplacingMergeTree(detected_at)
PARTITION BY toYYYYMM(detected_at)  -- partition by month
ORDER BY src_ip
TTL detected_at + INTERVAL 90 DAY
SETTINGS index_granularity = 8192;

-- Migrate the data
INSERT INTO ml_detected_anomalies_new SELECT * FROM ml_detected_anomalies;

-- Swap the names
RENAME TABLE ml_detected_anomalies TO ml_detected_anomalies_old,
             ml_detected_anomalies_new TO ml_detected_anomalies;

-- Drop the old table after verification
DROP TABLE ml_detected_anomalies_old;
```

### 3. **Add an index on detected_at** (optional)

```sql
-- Add a secondary (data-skipping) index for time-based queries.
-- Note: for a skip index, GRANULARITY counts index_granularity blocks,
-- not rows — small values (1-4) are typical; 8192 would make the index
-- cover ~67M rows per granule and skip almost nothing.
ALTER TABLE mabase_prod.ml_detected_anomalies
ADD INDEX idx_detected_at detected_at TYPE minmax GRANULARITY 4;
```

### 4. **Fix the 212.30.36.0/24 bug**

**Immediate action:**

```sql
-- Check whether the data exists
SELECT count()
FROM mabase_prod.ml_detected_anomalies
WHERE
    detected_at >= toDateTime('2026-03-15 03:00:00')
    AND detected_at <= toDateTime('2026-03-15 05:00:00')
    AND splitByChar('.', replaceRegexpAll(toString(src_ip), '^::ffff:', ''))[1] = '212'
    AND splitByChar('.', replaceRegexpAll(toString(src_ip), '^::ffff:', ''))[2] = '30'
    AND splitByChar('.', replaceRegexpAll(toString(src_ip), '^::ffff:', ''))[3] = '36';
```

**If count = 0:** the rows were deleted prematurely (bot_detector_ai bug)

**If count > 0:** the bug is in the subnet API's SQL query

---

## 📚 Files to modify

| File | Change | Status |
|---------|--------------|--------|
| `deploy_dashboard_entities_view.sql` | TTL: 30 → 90 days | ✅ Done |
| `deploy_user_agents_view.sql` | TTL: 7 → 90 days | ✅ Done |
| `update_retention_policy.sql` | Application script | ✅ Created |
| `ml_detected_anomalies` | TTL: 30 → 90 days | ⏳ To apply |

---

**Last updated:** 2026-03-15
**Version:** 1.0
diff --git a/RETENTION_POLICY.md b/RETENTION_POLICY.md
new 
file mode 100644
index 0000000..3601f0c
--- /dev/null
+++ b/RETENTION_POLICY.md
@@ -0,0 +1,243 @@
# 📅 Changing the Data Retention Policy

## 🎯 Goal

Extend how long data is kept in ClickHouse so that older incidents can be investigated.

---

## 📊 Current retention periods

| Table / view | Old TTL | New TTL | File |
|-------------|------------|-------------|---------|
| `view_dashboard_entities` | 30 days | **90 days** | `deploy_dashboard_entities_view.sql` |
| `view_dashboard_user_agents` | 7 days | **90 days** | `deploy_user_agents_view.sql` |
| `audit_logs` | 90 days | 90 days | `deploy_audit_logs_table.sql` |
| `ml_detected_anomalies` | ~1-6 h* | **30 days** (recommended) | `bot_detector_ai` |

*The `ml_detected_anomalies` table is managed by the `bot_detector_ai` service.

---

## 🔧 Method 1: Apply to the existing tables (RECOMMENDED)

### Step 1: Run the SQL script

```bash
# NB: 8123 is normally ClickHouse's HTTP port; the native clickhouse-client
# usually connects on 9000 — adjust --port if the connection is refused.
clickhouse-client --host test-sdv-anubis.sdv.fr --port 8123 \
    --user admin --password SuperPassword123! \
    < update_retention_policy.sql
```

### Step 2: Verify the changes

```sql
SELECT
    name,
    engine,
    create_table_query
FROM system.tables
WHERE database = 'mabase_prod'
  AND name LIKE 'view_dashboard%'
FORMAT Vertical;
```

### Step 3: (Optional) Force the TTL to apply

```sql
-- Warning: can take several minutes
OPTIMIZE TABLE mabase_prod.view_dashboard_entities FINAL;
OPTIMIZE TABLE mabase_prod.view_dashboard_user_agents FINAL;
```

---

## 🔧 Method 2: Recreate the tables with the new TTL

### For `view_dashboard_entities`

```bash
clickhouse-client --host test-sdv-anubis.sdv.fr --port 8123 \
    --user admin --password SuperPassword123! 
\
    --database mabase_prod \
    < deploy_dashboard_entities_view.sql
```

### For `view_dashboard_user_agents`

```bash
clickhouse-client --host test-sdv-anubis.sdv.fr --port 8123 \
    --user admin --password SuperPassword123! \
    --database mabase_prod \
    < deploy_user_agents_view.sql
```

---

## 🔧 Method 3: Change `ml_detected_anomalies`

This table is managed by the `bot_detector_ai` service. Two options:

### Option A: Direct change (if you have access)

```sql
-- Check the current TTL
SHOW CREATE TABLE mabase_prod.ml_detected_anomalies;

-- Change the TTL (example: 30 days)
ALTER TABLE mabase_prod.ml_detected_anomalies
MODIFY TTL detected_at + INTERVAL 30 DAY;

-- Apply immediately
OPTIMIZE TABLE mabase_prod.ml_detected_anomalies FINAL;
```

### Option B: Change the bot_detector_ai configuration

In the `bot_detector_ai` configuration file (probably `config.yaml` or `settings.py`):

```yaml
# Example configuration
clickhouse:
  retention_days: 30  # instead of 1 or 7
```

Then restart the service:

```bash
docker compose restart bot_detector_ai
```

---

## 📈 Storage impact

### Estimated growth

| Table | Current volume (est.) | Growth factor | Impact |
|-------|--------------|---------------------|--------|
| `view_dashboard_entities` | ~100 MB (30 days) | ×3 | +200 MB |
| `view_dashboard_user_agents` | ~50 MB (7 days) | ×13 | +600 MB |
| `ml_detected_anomalies` | ~1 GB (~1 day) | ×30 | +29 GB |

**Estimated total:** +30 GB for 90 days of retention

### Check the current disk usage

```sql
SELECT
    table,
    formatReadableSize(sum(data_compressed_bytes)) AS compressed_size,
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed_size,
    count() AS rows
FROM system.parts
WHERE database = 'mabase_prod'
  AND table LIKE 'view_dashboard%'
GROUP BY table
ORDER BY compressed_size DESC;
```

---

## ✅ Verification after the change

### Test 1: 
Old subnet (e.g. 212.30.36.0/24)

```bash
# Via the API
curl "http://192.168.1.2:8000/api/entities/subnet/212.30.36.0/24?hours=2160"
```

```sql
-- Via ClickHouse
SELECT count()
FROM mabase_prod.view_dashboard_entities
WHERE entity_type = 'ip'
  AND entity_value LIKE '212.30.36.%'
  AND log_date >= now() - INTERVAL 90 DAY;
```

### Test 2: Check the minimum dates

```sql
-- Oldest date in view_dashboard_entities
SELECT min(log_date) AS oldest_date
FROM mabase_prod.view_dashboard_entities;

-- Oldest date in view_dashboard_user_agents
SELECT min(log_date) AS oldest_date
FROM mabase_prod.view_dashboard_user_agents;

-- Oldest date in ml_detected_anomalies
SELECT min(detected_at) AS oldest_date
FROM mabase_prod.ml_detected_anomalies;
```

---

## 🚨 Points of attention

### 1. Disk space

Check the available disk space before extending retention:

```bash
df -h /var/lib/clickhouse
```

### 2. Query performance

More data = slower queries. Mitigations:
- add indexes
- use pre-computed aggregations
- partition by month (already done)

### 3. Automatic cleanup

ClickHouse applies the TTL automatically during merge operations. Data is not deleted instantly.

### 4. 
Backup

Take a backup before altering the tables:

```bash
clickhouse-backup create mabase_prod_backup_$(date +%Y%m%d)
```

---

## 📚 References

- [ClickHouse TTL Documentation](https://clickhouse.com/docs/en/sql-reference/statements/alter/ttl)
- [ClickHouse MergeTree TTL](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/mergetree/#mergetree-table-ttl)
- [ClickHouse System Tables](https://clickhouse.com/docs/en/operations/system-tables/)

---

## 🆘 Troubleshooting

### Problem: "TTL expression must return Date or DateTime"

**Solution:** check that the column used in the TTL expression is of type Date or DateTime.

```sql
-- Check the column types
DESCRIBE TABLE mabase_prod.view_dashboard_entities;
```

### Problem: "Table is in readonly mode"

**Solution:** the table is fed by a materialized view. Alter the view, not the table.

### Problem: OPTIMIZE takes too long

**Solution:** run it with a longer timeout.

```sql
-- Run with a 1-hour timeout
SET max_execution_time = 3600;
OPTIMIZE TABLE mabase_prod.view_dashboard_entities FINAL;
```

---

**Last updated:** 2026-03-15
**Version:** 1.0
diff --git a/backend/routes/entities.py b/backend/routes/entities.py
index e7568b4..2b717dd 100644
--- a/backend/routes/entities.py
+++ b/backend/routes/entities.py
@@ -154,37 +154,62 @@ async def get_subnet_investigation(
 ):
     """
     Retrieves all the IPs of a /24 subnet with their statistics
-    Uses the view_dashboard_entities and view_dashboard_user_agents views
+    Uses ml_detected_anomalies for detections + view_dashboard_entities for user agents
     """
     try:
         # Extract the base IP of the subnet (e.g. 192.168.1.0/24 -> 192.168.1.0)
         subnet_ip = subnet.replace('/24', '').replace('/16', '').replace('/8', '')
-
+
         # Extract the first 3 octets for the filter (e.g. 141.98.11)
         subnet_parts = subnet_ip.split('.')[:3]
         subnet_prefix = 
subnet_parts[0]
        subnet_mask = subnet_parts[1]
        subnet_third = subnet_parts[2]

-        # Global subnet stats - uses view_dashboard_entities
+        # Global subnet stats - uses ml_detected_anomalies + view_dashboard_entities for UAs
         stats_query = """
+            WITH cleaned_ips AS (
+                SELECT
+                    replaceRegexpAll(toString(src_ip), '^::ffff:', '') AS clean_ip,
+                    detected_at,
+                    ja4,
+                    host,
+                    country_code,
+                    asn_number
+                FROM ml_detected_anomalies
+                WHERE detected_at >= now() - INTERVAL %(hours)s HOUR
+            ),
+            subnet_filter AS (
+                SELECT *
+                FROM cleaned_ips
+                WHERE splitByChar('.', clean_ip)[1] = %(subnet_prefix)s
+                  AND splitByChar('.', clean_ip)[2] = %(subnet_mask)s
+                  AND splitByChar('.', clean_ip)[3] = %(subnet_third)s
+            ),
+            -- Fetch the user agents from view_dashboard_entities
+            ua_data AS (
+                SELECT
+                    entity_value AS ip,
+                    arrayJoin(user_agents) AS user_agent
+                FROM view_dashboard_entities
+                WHERE entity_type = 'ip'
+                  AND log_date >= now() - INTERVAL %(hours)s HOUR
+                  AND splitByChar('.', entity_value)[1] = %(subnet_prefix)s
+                  AND splitByChar('.', entity_value)[2] = %(subnet_mask)s
+                  AND splitByChar('.', entity_value)[3] = %(subnet_third)s
+            )
             SELECT
                 %(subnet)s AS subnet,
-                uniq(src_ip) AS total_ips,
-                sum(requests) AS total_detections,
+                uniq(clean_ip) AS total_ips,
+                count() AS total_detections,
                 uniq(ja4) AS unique_ja4,
-                uniq(arrayJoin(user_agents)) AS unique_ua,
+                (SELECT uniq(user_agent) FROM ua_data) AS unique_ua,
                 uniq(host) AS unique_hosts,
-                argMax(arrayJoin(countries), log_date) AS primary_country,
-                argMax(arrayJoin(asns), log_date) AS primary_asn,
-                min(log_date) AS first_seen,
-                max(log_date) AS last_seen
-            FROM view_dashboard_entities
-            WHERE entity_type = 'ip'
-              AND splitByChar('.', toString(src_ip))[1] = %(subnet_prefix)s
-              AND splitByChar('.', toString(src_ip))[2] = %(subnet_mask)s
-              AND splitByChar('.', toString(src_ip))[3] = %(subnet_third)s
-              AND log_date >= today() - INTERVAL %(hours)s HOUR
+                argMax(country_code, detected_at) AS primary_country, 
+                argMax(asn_number, detected_at) AS primary_asn,
+                min(detected_at) AS first_seen,
+                max(detected_at) AS last_seen
+            FROM subnet_filter
         """

         stats_result = db.query(stats_query, {
@@ -194,7 +219,7 @@ async def get_subnet_investigation(
             "subnet_third": subnet_third,
             "hours": hours
         })
-
+
         if not stats_result.result_rows or stats_result.result_rows[0][1] == 0:
             raise HTTPException(status_code=404, detail="Subnet non trouvé")
@@ -212,30 +237,44 @@ async def get_subnet_investigation(
             "last_seen": stats_row[9].isoformat() if stats_row[9] else ""
         }

-        # IP list with details - uses view_dashboard_entities
+        # IP list with details - 2 separate queries + merge in Python
         ips_query = """
+            WITH cleaned_ips AS (
+                SELECT
+                    replaceRegexpAll(toString(src_ip), '^::ffff:', '') AS clean_ip,
+                    detected_at,
+                    ja4,
+                    country_code,
+                    asn_number,
+                    threat_level,
+                    anomaly_score
+                FROM ml_detected_anomalies
+                WHERE detected_at >= now() - INTERVAL %(hours)s HOUR
+            ),
+            subnet_filter AS (
+                SELECT *
+                FROM cleaned_ips
+                WHERE splitByChar('.', clean_ip)[1] = %(subnet_prefix)s
+                  AND splitByChar('.', clean_ip)[2] = %(subnet_mask)s
+                  AND splitByChar('.', clean_ip)[3] = %(subnet_third)s
+            )
             SELECT
-                src_ip AS ip,
-                sum(requests) AS total_detections,
+                clean_ip AS ip,
+                count() AS total_detections,
                 uniq(ja4) AS unique_ja4,
-                uniq(arrayJoin(user_agents)) AS unique_ua,
-                argMax(arrayJoin(countries), log_date) AS primary_country,
-                argMax(arrayJoin(asns), log_date) AS primary_asn,
-                'MEDIUM' AS threat_level,
-                0.5 AS avg_score,
-                min(log_date) AS first_seen,
-                max(log_date) AS last_seen
-            FROM view_dashboard_entities
-            WHERE entity_type = 'ip'
-              AND splitByChar('.', toString(src_ip))[1] = %(subnet_prefix)s
-              AND splitByChar('.', toString(src_ip))[2] = %(subnet_mask)s
-              AND splitByChar('.', toString(src_ip))[3] = %(subnet_third)s
-              AND log_date >= today() - INTERVAL %(hours)s HOUR
-            GROUP BY src_ip
+                argMax(country_code, detected_at) AS primary_country,
+                argMax(asn_number, 
detected_at) AS primary_asn,
+                argMax(threat_level, detected_at) AS threat_level,
+                avg(anomaly_score) AS avg_score,
+                min(detected_at) AS first_seen,
+                max(detected_at) AS last_seen
+            FROM subnet_filter
+            GROUP BY ip
             ORDER BY total_detections DESC
             LIMIT 100
         """

+        # Run the first query to get the IPs
         ips_result = db.query(ips_query, {
             "subnet_prefix": subnet_prefix,
             "subnet_mask": subnet_mask,
@@ -243,19 +282,41 @@ async def get_subnet_investigation(
             "hours": hours
         })

+        # Extract the IP list for the UA query
+        ip_list = [str(row[0]) for row in ips_result.result_rows]
+
+        # User-agent query with an IN clause (uses the index)
+        unique_ua_dict = {}
+        if ip_list:
+            # Format the list for the IN clause (values come from our own
+            # query above, but server-side parameter binding would be safer)
+            ip_values = ', '.join(f"'{ip}'" for ip in ip_list)
+            ua_query = f"""
+                SELECT
+                    entity_value AS ip,
+                    uniq(arrayJoin(user_agents)) AS unique_ua
+                FROM view_dashboard_entities
+                PREWHERE entity_type = 'ip'
+                WHERE entity_value IN ({ip_values})
+                  AND log_date >= today() - INTERVAL 30 DAY
+                GROUP BY entity_value
+            """
+            ua_result = db.query(ua_query, {})
+            unique_ua_dict = {row[0]: row[1] for row in ua_result.result_rows}
+
+        # Merge the results
         ips = []
         for row in ips_result.result_rows:
             ips.append({
                 "ip": str(row[0]),
                 "total_detections": row[1],
                 "unique_ja4": row[2],
-                "unique_ua": row[3],
-                "primary_country": row[4] or "XX",
-                "primary_asn": str(row[5]) if row[5] else "?",
-                "threat_level": row[6] or "LOW",
-                "avg_score": abs(row[7] or 0),
-                "first_seen": row[8].isoformat() if row[8] else "",
-                "last_seen": row[9].isoformat() if row[9] else ""
+                "unique_ua": unique_ua_dict.get(row[0], 0),
+                "primary_country": row[3] or "XX",
+                "primary_asn": str(row[4]) if row[4] else "?",
+                "threat_level": row[5] or "LOW",
+                "avg_score": abs(row[6] or 0),
+                "first_seen": row[7].isoformat() if row[7] else "",
+                "last_seen": row[8].isoformat() if row[8] else ""
             })

         return {
diff --git a/deploy_dashboard_entities_view.sql 
b/deploy_dashboard_entities_view.sql
index 61b9667..dbbb2b0 100644
--- a/deploy_dashboard_entities_view.sql
+++ b/deploy_dashboard_entities_view.sql
@@ -66,7 +66,7 @@ CREATE TABLE IF NOT EXISTS mabase_prod.view_dashboard_entities
 ENGINE = MergeTree()
 PARTITION BY toYYYYMM(log_date)
 ORDER BY (entity_type, entity_value, log_date)
-TTL log_date + INTERVAL 30 DAY
+TTL log_date + INTERVAL 90 DAY  -- Keep 90 days (instead of 30)
 SETTINGS index_granularity = 8192;

 -- =============================================================================
diff --git a/deploy_user_agents_view.sql b/deploy_user_agents_view.sql
index 01fdfc9..6bc221c 100644
--- a/deploy_user_agents_view.sql
+++ b/deploy_user_agents_view.sql
@@ -33,7 +33,7 @@ CREATE TABLE IF NOT EXISTS mabase_prod.view_dashboard_user_agents
 ENGINE = AggregatingMergeTree()
 PARTITION BY log_date
 ORDER BY (src_ip, ja4, hour)
-TTL log_date + INTERVAL 7 DAY
+TTL log_date + INTERVAL 90 DAY  -- Keep 90 days (instead of 7)
 SETTINGS index_granularity = 8192;

 -- =============================================================================
diff --git a/frontend/src/components/QuickSearch.tsx b/frontend/src/components/QuickSearch.tsx
index 3853d65..e6af991 100644
--- a/frontend/src/components/QuickSearch.tsx
+++ b/frontend/src/components/QuickSearch.tsx
@@ -46,13 +46,18 @@ export function QuickSearch({ onNavigate }: QuickSearchProps) {
     const timer = setTimeout(async () => {
       try {
         const type = detectType(query);
-        const response = await fetch(`/api/attributes/${type === 'other' ? 'ip' : type}?limit=5`);
+        const endpoint = type === 'other' ? 'ip' : type;
+        const response = await fetch(`/api/attributes/${endpoint}?limit=5`);
         if (response.ok) {
           const data = await response.json();
-          setResults(data.items || []);
+          const items = data.items || data || [];
+          setResults(Array.isArray(items) ? 
items : []);
+        } else {
+          setResults([]);
         }
       } catch (error) {
         console.error('Search error:', error);
+        setResults([]);
       }
     }, 300);
diff --git a/update_retention_policy.sql b/update_retention_policy.sql
new file mode 100644
index 0000000..a1f77f8
--- /dev/null
+++ b/update_retention_policy.sql
@@ -0,0 +1,73 @@
-- =============================================================================
-- Data retention update script
-- =============================================================================
--
-- This script updates the retention policies so that data is kept
-- longer (90 days instead of 30/7 days)
--
-- Instructions:
-- -------------
-- 1. Connect to ClickHouse:
--    clickhouse-client --host test-sdv-anubis.sdv.fr --port 8123 \
--        --user admin --password SuperPassword123! --database mabase_prod
--
-- 2. Run this script:
--    clickhouse-client --host test-sdv-anubis.sdv.fr --port 8123 \
--        --user admin --password SuperPassword123! < update_retention_policy.sql
--
-- 3. Verify the changes:
--    SHOW TABLES LIKE 'view_dashboard%';
--
-- =============================================================================

USE mabase_prod;

-- =============================================================================
-- 1. Update the retention of view_dashboard_entities (30 → 90 days)
-- =============================================================================

ALTER TABLE mabase_prod.view_dashboard_entities
MODIFY TTL log_date + INTERVAL 90 DAY;

-- =============================================================================
-- 2. Update the retention of view_dashboard_user_agents (7 → 90 days)
-- =============================================================================

ALTER TABLE mabase_prod.view_dashboard_user_agents
MODIFY TTL log_date + INTERVAL 90 DAY;

-- =============================================================================
-- 3. 
(Optional) Update the retention of ml_detected_anomalies
-- =============================================================================
-- Warning: this table is managed by bot_detector_ai
-- Uncomment only if you have access to this table
-- =============================================================================

-- ALTER TABLE mabase_prod.ml_detected_anomalies
-- MODIFY TTL detected_at + INTERVAL 30 DAY;

-- =============================================================================
-- 4. Apply the new TTL immediately (optional)
-- =============================================================================
-- This command can take several minutes depending on data volume
-- =============================================================================

-- OPTIMIZE TABLE mabase_prod.view_dashboard_entities FINAL;
-- OPTIMIZE TABLE mabase_prod.view_dashboard_user_agents FINAL;

-- =============================================================================
-- 5. Verify the changes
-- =============================================================================

SELECT
    name AS table_name,
    engine,
    create_table_query
FROM system.tables
WHERE database = 'mabase_prod'
  AND name LIKE 'view_dashboard%'
FORMAT Vertical;

-- =============================================================================
-- END OF SCRIPT
-- =============================================================================
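
The verification step above only prints the new table definitions; whether the longer retention is actually in effect only shows up over time, as the oldest surviving row drifts past the old cutoff. A monitoring sketch that could be appended to the script (it assumes both views expose `log_date`, as in the definitions earlier in this patch):

```sql
-- =============================================================================
-- 6. (Optional) Track the oldest retained row per table
-- =============================================================================
-- As merges apply the new TTL, min() should eventually sit near
-- now() - 90 days instead of now() - 30 days / now() - 7 days.

SELECT 'view_dashboard_entities' AS table_name,
       min(log_date) AS oldest_row
FROM mabase_prod.view_dashboard_entities
UNION ALL
SELECT 'view_dashboard_user_agents' AS table_name,
       min(log_date) AS oldest_row
FROM mabase_prod.view_dashboard_user_agents;
```

Re-running this query a few days after the change (and after the optional `OPTIMIZE ... FINAL`) gives a direct before/after comparison without inspecting `system.parts`.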