diff --git a/docs/detection.md b/docs/detection.md
new file mode 100644
index 0000000..971dc59
--- /dev/null
+++ b/docs/detection.md
@@ -0,0 +1,224 @@
+# Detection architecture: logcorrelator
+
+## Overview
+
+The detection system is made up of **three layers** chained into a pipeline:
+
+```
+Captured HTTP/TLS traffic
+         │
+         ▼
+┌───────────────────┐
+│  ClickHouse       │  Storage, aggregation, heuristic views
+│  (SQL pipeline)   │
+└────────┬──────────┘
+         │
+         ▼
+┌───────────────────┐
+│  bot_detector.py  │  AI model (Isolation Forest, 5-min cycle)
+│  (Python / ML)    │
+└────────┬──────────┘
+         │
+         ▼
+┌───────────────────┐
+│  ml_detected_     │  Results table (ReplacingMergeTree)
+│  anomalies        │
+└───────────────────┘
+```
+
+---
+
+## 1. Log ingestion (`http_logs_raw` → `http_logs`)
+
+Raw logs arrive as JSON in the `http_logs_raw` table. A **materialized view** (`mv_http_logs`) parses them in real time and feeds the `http_logs` table, which contains the following structured fields:
+
+| Category | Key fields |
+|---|---|
+| Network | `src_ip`, `src_port`, `dst_ip`, `dst_port` |
+| Enrichment | `src_asn`, `src_country_code`, `src_as_name` (via the IPLocate dictionary) |
+| HTTP | `method`, `host`, `path`, `query`, `http_version` |
+| Correlation | `correlated`, `orphan_side`, `conn_id`, `keepalives` |
+| IP metadata | `ip_meta_ttl`, `ip_meta_id`, `ip_meta_df`, `ip_meta_total_length` |
+| TCP metadata | `tcp_meta_window_size`, `tcp_meta_mss`, `tcp_meta_window_scale`, `tcp_meta_options` |
+| TLS / Fingerprint | `tls_version`, `tls_sni`, `tls_alpn`, `ja3`, `ja3_hash`, `ja4` |
+| HTTP headers | `header_user_agent`, `header_sec_ch_ua*`, `header_sec_fetch_*`, … |
+
+IP enrichment is performed through the `dict_iplocate_asn` dictionary (a CSV file loaded in memory and reloaded every 1 to 2 hours).
+
+---
+
+## 2. Behavioral aggregation (hourly window)
+
+Two `AggregatingMergeTree` aggregation tables are fed continuously by materialized views.
+
+### 2.1 `agg_host_ip_ja4_1h`: network and application behavior
+
+Aggregates per hour by the key **(window_start, src_ip, ja4, host)**:
+
+| Aggregated metric | Meaning |
+|---|---|
+| `hits` | Total number of requests |
+| `count_post` | POST requests |
+| `uniq_paths` | Distinct paths visited |
+| `uniq_query_params` | Distinct query parameters |
+| `unique_src_ports` | Distinct source ports |
+| `unique_conn_id` | Distinct TCP connections |
+| `max_keepalives` | Maximum reuse of a single connection |
+| `orphan_count` | Requests without full TCP correlation |
+| `ip_id_zero_count` | Packets with IP ID = 0 (potential spoofing) |
+| `tcp_fp_raw` | Hash of the TCP fingerprint (window, MSS, scale, options) |
+| `tcp_jitter_variance` | Variance of the SYN→ClientHello delay (TLS jitter) |
+| `total_ip_length_var` | Variance of the IP packet size |
+| `mss_1460_count` | Requests with MSS = 1460 (Ethernet/desktop signature) |
+
+### 2.2 `agg_header_fingerprint_1h`: HTTP header fingerprint
+
+Aggregates by **(window_start, src_ip)**:
+
+| Metric | Meaning |
+|---|---|
+| `header_order_hash` | Hash of the header order (JA4H fingerprint) |
+| `header_count` | Number of distinct headers |
+| `has_accept_language` | Presence of `Accept-Language` |
+| `has_cookie` | Presence of `Cookie` |
+| `has_referer` | Presence of `Referer` |
+| `modern_browser_score` | Score of 0/50/100 depending on the presence of the UA and `Sec-CH-UA` |
+| `ua_ch_mismatch` | Mismatch between `User-Agent` and `Sec-CH-UA-Platform` |
+| `sec_fetch_mode/dest` | Declared navigation context |
+
+---
+
+## 3. Exclusions (allowlists)
+
+Before any analysis, three tables are used to **exclude known legitimate bots**:
+
+- `bot_ip` (file `bot_ip.csv`): IPs to ignore (crawlers, monitoring…)
+- `bot_ja4` (file `bot_ja4.csv`): JA4 fingerprints to ignore
+- `ref_bot_networks`: categorized IPv4/IPv6 CIDR networks (legitimate or malicious)
+
+These exclusions are applied in the `view_ai_features_1h` view.
+
+---
+
+## 4. AI view: `view_ai_features_1h`
+
+This view, consolidated **over a rolling 24 hours**, computes the **28 features** passed to the ML model. It joins the two aggregation tables and derives the following metrics:
+
+| Feature | Computation | Detected signal |
+|---|---|---|
+| `hit_velocity` | `hits / duration_in_seconds` | Abnormally high request volume |
+| `fuzzing_index` | `uniq_query_params / uniq_paths` | Parameter exploration (fuzzing) |
+| `post_ratio` | `count_post / hits` | Mass form submission |
+| `port_exhaustion_ratio` | `unique_src_ports / hits` | Port rotation (scanning) |
+| `orphan_ratio` | `orphan_count / hits` | Requests without a complete handshake |
+| `ip_id_zero_ratio` | `ip_id_zero_count / hits` | IP address spoofing |
+| `multiplexing_efficiency` | `hits / unique_conn_id` | Connection reuse (H2/H3) |
+| `true_window_size` | `tcp_win * 2^tcp_scale` | Actual TCP window size |
+| `window_mss_ratio` | `tcp_win / tcp_mss` | TCP stack consistency |
+| `tcp_jitter_variance` | Variance of SYN→ClientHello | Irregular TLS timing |
+| `alpn_http_mismatch` | ALPN=h2 but HTTP/1.1 | Misleading TLS negotiation |
+| `is_alpn_missing` | ALPN absent or `00` | Non-standard client |
+| `sni_host_mismatch` | SNI ≠ Host header | Transparent proxy / bot |
+| `mss_mobile_mismatch` | MSS=1460 + high browser score | Mobile client simulated from a desktop |
+| `is_fake_navigation` | `sec_fetch_mode=navigate` but `sec_fetch_dest≠document` | Simulated navigation |
+| `tcp_shared_count` | Number of IPs sharing the same TCP fingerprint | Shared infrastructure / botnet |
+| `header_order_shared_count` | Number of IPs sharing the same header order | Common automated tool |
+
+---
+
+## 5. AI model: Isolation Forest (`bot_detector.py`)
+
+### Execution cycle
+
+The service runs in a loop with a **5-minute cycle**:
+
+```
+fetch_and_analyze()
+  │
+  ├─ Query SELECT * FROM view_ai_features_1h
+  │
+  ├─ Data cleaning (fillna)
+  │
+  ├─ Dual-Model routing:
+  │     ├─ [Full]        correlated=1 → 23 features (network + TLS + headers)
+  │     └─ [Application] correlated=0 → 19 features (headers + behavior)
+  │
+  └─ INSERT INTO ml_detected_anomalies
+```
+
+### Model parameters
+
+| Parameter | Value | Meaning |
+|---|---|---|
+| `n_estimators` | 200 | Number of isolation trees |
+| `contamination` | 0.2% | Expected proportion of bots in the traffic |
+| `score threshold` | < -0.05 | Score below which a session is flagged as an anomaly |
+| `minimum volume` | 500 sessions | Below this, the model is skipped (too little data) |
+
+### Dual-Model routing
+
+Traffic is **split into two populations** based on the `correlated` field:
+
+- **Full model** (`correlated=1`): TCP↔HTTP correlation is available → the network features (TTL, TLS jitter, ALPN, SNI) are reliable and added to the analysis.
+- **Application model** (`correlated=0`): only the HTTP layer is available → the analysis focuses on application behavior (headers, paths, POST ratio…).
+
+---
+
+## 6. Static heuristic views
+
+Alongside the AI model, five SQL views provide **deterministic detections** without ML, over a 24h window:
+
+| View | Detection rule |
+|---|---|
+| `view_host_ip_ja4_rotation` | IP with ≥ 5 distinct JA4 fingerprints and > 100 requests → identity rotation |
+| `view_host_ja4_anomalies` | JA4 fingerprint seen from ≥ 20 IPs across ≥ 3 hosts → distributed scanning tool |
+| `view_form_bruteforce_detected` | ≥ 10 distinct query params and ≥ 20 hits → form brute force |
+| `view_alpn_mismatch_detected` | HTTP/1.1 with ALPN h2 or h3 and ≥ 10 hits → fraudulent TLS negotiation |
+| `view_tcp_spoofing_detected` | TTL ≤ 64 with a Windows or iPhone User-Agent → inconsistent OS fingerprint |
+
+---
+
+## 7. Results: `ml_detected_anomalies`
+
+Detected anomalies are stored in a `ReplacingMergeTree(detected_at)` table with a **30-day TTL**. The ordering key `(src_ip, ja4, host)` means each triplet keeps only its **most recent detection** once background merges have run (automatic deduplication); until then, duplicates can coexist unless the query uses `FINAL`.
+
+Each record contains:
+- The scores and features that led to the detection
+- The `reason` field: human-readable text with the score, velocity, and fuzzing index
+- The `is_headless` field: derived from the `sec_fetch_mode` inconsistency
+
+---
+
+## 8. Full flow diagram
+
+```
+            ┌─────────────────────────────────────┐
+            │        http_logs_raw (JSON)         │
+            └──────────────┬──────────────────────┘
+                           │  mv_http_logs (MV)
+                           ▼
+            ┌─────────────────────────────────────┐
+            │         http_logs (parsed)          │
+            └────────┬──────────────┬─────────────┘
+                     │              │
+  mv_agg_host_ip_ja4 │              │ mv_agg_header_fingerprint
+                     ▼              ▼
+            ┌──────────────────┐   ┌──────────────────────────┐
+            │ agg_host_ip_ja4  │   │ agg_header_fingerprint   │
+            │ _1h              │   │ _1h                      │
+            └────────┬─────────┘   └──────────┬───────────────┘
+                     │                        │
+                     └──────────┬─────────────┘
+                                │  view_ai_features_1h (JOIN + computations)
+                                ▼
+            ┌─────────────────────────────────────┐
+            │ bot_detector.py (Isolation Forest)  │
+            │ Cycle: 5 min | Window: 24h          │
+            └──────────────┬──────────────────────┘
+                           │
+                           ▼
+            ┌─────────────────────────────────────┐
+            │ ml_detected_anomalies               │
+            │ (ReplacingMergeTree, TTL 30d)       │
+            └─────────────────────────────────────┘
+```
diff --git a/sql/views.sql b/sql/views.sql
deleted file mode 100644
index 6edda1b..0000000
--- a/sql/views.sql
+++ /dev/null
@@ -1,153 +0,0 @@
--- 1. FULL CLEANUP
-DROP TABLE IF EXISTS mabase_prod.ml_detected_anomalies;
-DROP VIEW IF EXISTS mabase_prod.view_ai_features_1h;
-DROP VIEW IF EXISTS mabase_prod.view_host_ip_ja4_rotation;
-DROP VIEW IF EXISTS mabase_prod.view_host_ja4_anomalies;
-DROP VIEW IF EXISTS mabase_prod.view_form_bruteforce_detected;
-DROP VIEW IF EXISTS mabase_prod.view_alpn_mismatch_detected;
-DROP VIEW IF EXISTS mabase_prod.view_tcp_spoofing_detected;
-DROP VIEW IF EXISTS mabase_prod.mv_agg_host_ip_ja4_1h;
-DROP TABLE IF EXISTS mabase_prod.agg_host_ip_ja4_1h;
-DROP VIEW IF EXISTS mabase_prod.mv_agg_header_fingerprint_1h;
-DROP TABLE IF EXISTS mabase_prod.agg_header_fingerprint_1h;
-
--- 2. EXCLUSION TABLES
-CREATE TABLE IF NOT EXISTS mabase_prod.bot_ip (ip String) ENGINE = File(CSV, 'bot_ip.csv');
-CREATE TABLE IF NOT EXISTS mabase_prod.bot_ja4 (ja4 String) ENGINE = File(CSV, 'bot_ja4.csv');
-
--- 3. BEHAVIORAL AGGREGATION (26 DIMENSIONS)
-CREATE TABLE mabase_prod.agg_host_ip_ja4_1h
-(
-    window_start DateTime, src_ip String, ja4 String, host String,
-    first_seen SimpleAggregateFunction(min, DateTime), last_seen SimpleAggregateFunction(max, DateTime),
-    hits SimpleAggregateFunction(sum, UInt64), count_post SimpleAggregateFunction(sum, UInt64),
-    uniq_paths AggregateFunction(uniq, String), uniq_query_params AggregateFunction(uniq, String),
-    src_country_code SimpleAggregateFunction(any, String), tcp_fp_raw SimpleAggregateFunction(any, String),
-    tcp_jitter_variance AggregateFunction(varPop, Float64), tcp_win_raw SimpleAggregateFunction(any, UInt32),
-    tcp_scale_raw SimpleAggregateFunction(any, UInt32), tcp_mss_raw SimpleAggregateFunction(any, UInt32),
-    tcp_ttl_raw SimpleAggregateFunction(any, UInt32), http_ver_raw SimpleAggregateFunction(any, String),
-    tls_alpn_raw SimpleAggregateFunction(any, String), tls_sni_raw SimpleAggregateFunction(any, String),
-    first_ua SimpleAggregateFunction(any, String), correlated_raw SimpleAggregateFunction(max, UInt8),
-    unique_src_ports AggregateFunction(uniq, UInt16), unique_conn_id AggregateFunction(uniq, String),
-    max_keepalives SimpleAggregateFunction(max, UInt32), orphan_count SimpleAggregateFunction(sum, UInt64),
-    ip_id_zero_count SimpleAggregateFunction(sum, UInt64), total_ip_length_var AggregateFunction(varPop, Float64),
-    mss_1460_count SimpleAggregateFunction(sum, UInt64)
-) ENGINE = AggregatingMergeTree() ORDER BY (window_start, src_ip, ja4, host);
-
-CREATE MATERIALIZED VIEW mabase_prod.mv_agg_host_ip_ja4_1h TO mabase_prod.agg_host_ip_ja4_1h AS
-SELECT
-    toStartOfHour(src.time) AS window_start, src.src_ip, src.ja4, src.host,
-    min(src.time) AS first_seen, max(src.time) AS last_seen, count() AS hits,
-    sum(IF(src.method = 'POST', 1, 0)) AS count_post, uniqState(src.path) AS uniq_paths,
-    uniqState(src.query) AS uniq_query_params, any(src.src_country_code) AS src_country_code,
-    any(toString(cityHash64(concat(toString(src.tcp_meta_window_size), toString(src.tcp_meta_mss), toString(src.tcp_meta_window_scale), src.tcp_meta_options)))) AS tcp_fp_raw,
-    varPopState(toFloat64(src.syn_to_clienthello_ms)) AS tcp_jitter_variance,
-    any(src.tcp_meta_window_size) AS tcp_win_raw, any(src.tcp_meta_window_scale) AS tcp_scale_raw,
-    any(src.tcp_meta_mss) AS tcp_mss_raw, any(src.ip_meta_ttl) AS tcp_ttl_raw,
-    any(src.http_version) AS http_ver_raw, any(src.tls_alpn) AS tls_alpn_raw,
-    any(src.tls_sni) AS tls_sni_raw, any(src.header_user_agent) AS first_ua,
-    max(toUInt8(src.correlated)) AS correlated_raw, uniqState(toUInt16(src.src_port)) AS unique_src_ports,
-    uniqState(src.conn_id) AS unique_conn_id, max(toUInt32(src.keepalives)) AS max_keepalives,
-    sum(IF(src.orphan_side = 'A' OR toUInt8(src.correlated) = 0, 1, 0)) AS orphan_count,
-    sum(IF(src.ip_meta_id == 0, 1, 0)) AS ip_id_zero_count,
-    varPopState(toFloat64(src.ip_meta_total_length)) AS total_ip_length_var,
-    sum(IF(src.tcp_meta_mss == 1460, 1, 0)) AS mss_1460_count
-FROM mabase_prod.http_logs AS src
-GROUP BY window_start, src_ip, ja4, host;
-
--- 4. HEADER AGGREGATION (JA4H)
-CREATE TABLE mabase_prod.agg_header_fingerprint_1h
-(
-    window_start DateTime, src_ip String, header_order_hash SimpleAggregateFunction(any, String),
-    header_count SimpleAggregateFunction(max, UInt16), has_accept_language SimpleAggregateFunction(max, UInt8),
-    has_cookie SimpleAggregateFunction(max, UInt8), has_referer SimpleAggregateFunction(max, UInt8),
-    modern_browser_score SimpleAggregateFunction(max, UInt8), ua_ch_mismatch SimpleAggregateFunction(max, UInt8),
-    sec_fetch_mode SimpleAggregateFunction(any, String), sec_fetch_dest SimpleAggregateFunction(any, String)
-) ENGINE = AggregatingMergeTree() ORDER BY (window_start, src_ip);
-
-CREATE MATERIALIZED VIEW mabase_prod.mv_agg_header_fingerprint_1h TO mabase_prod.agg_header_fingerprint_1h AS
-SELECT
-    toStartOfHour(src.time) AS window_start, src.src_ip, any(toString(cityHash64(src.client_headers))) AS header_order_hash,
-    max(toUInt16(length(src.client_headers) - length(replaceAll(src.client_headers, ',', '')) + 1)) AS header_count,
-    max(toUInt8(if(position(src.client_headers, 'Accept-Language') > 0, 1, 0))) AS has_accept_language,
-    max(toUInt8(if(position(src.client_headers, 'Cookie') > 0, 1, 0))) AS has_cookie,
-    max(toUInt8(if(position(src.client_headers, 'Referer') > 0, 1, 0))) AS has_referer,
-    max(toUInt8(if(length(src.header_sec_ch_ua) > 0, 100, if(length(src.header_user_agent) > 0, 50, 0)))) AS modern_browser_score,
-    max(toUInt8(if((position(src.header_user_agent, 'Windows') > 0 AND position(src.header_sec_ch_ua_platform, 'Windows') == 0) OR (position(src.header_user_agent, 'iPhone') > 0 AND position(src.header_sec_ch_ua_platform, 'iOS') == 0), 1, 0))) AS ua_ch_mismatch,
-    any(src.header_sec_fetch_mode) AS sec_fetch_mode, any(src.header_sec_fetch_dest) AS sec_fetch_dest
-FROM mabase_prod.http_logs AS src
-GROUP BY window_start, src.src_ip;
-
--- 5. DEDUPLICATED RESULTS TABLE
-CREATE TABLE mabase_prod.ml_detected_anomalies
-(
-    detected_at DateTime, src_ip String, ja4 String, host String, anomaly_score Float32,
-    hits UInt64, hit_velocity Float32, fuzzing_index Float32, post_ratio Float32,
-    port_exhaustion_ratio Float32, max_keepalives UInt32, orphan_ratio Float32,
-    tcp_jitter_variance Float32, tcp_shared_count UInt32, true_window_size UInt64,
-    window_mss_ratio Float32, alpn_http_mismatch UInt8, is_alpn_missing UInt8,
-    sni_host_mismatch UInt8, header_count UInt16, has_accept_language UInt8,
-    has_cookie UInt8, has_referer UInt8, modern_browser_score UInt8,
-    is_headless UInt8, ua_ch_mismatch UInt8, header_order_shared_count UInt32,
-    ip_id_zero_ratio Float32, request_size_variance Float32, multiplexing_efficiency Float32,
-    mss_mobile_mismatch UInt8, reason String
-) ENGINE = ReplacingMergeTree(detected_at) ORDER BY (src_ip, ja4, host) TTL detected_at + INTERVAL 30 DAY;
-
--- 6. AI VIEW (24H + EXCLUSIONS + EVERYTHING MERGED IN SUBQUERY)
-CREATE OR REPLACE VIEW mabase_prod.view_ai_features_1h AS
-SELECT
-    a.*, h.*,
-    (a.count_post / (a.hits + 1)) AS post_ratio, (a.uniq_query_params / (a.uniq_paths + 1)) AS fuzzing_index,
-    (a.hits / (dateDiff('second', a.first_seen, a.last_seen) + 1)) AS hit_velocity,
-    (a.unique_src_ports / (a.hits + 1)) AS port_exhaustion_ratio, (a.orphan_count / (a.hits + 1)) AS orphan_ratio,
-    (a.ip_id_zero_count / (a.hits + 1)) AS ip_id_zero_ratio, (a.hits / (a.unique_conn_id + 1)) AS multiplexing_efficiency,
-    IF(a.mss_1460_count > (a.hits * 0.8) AND h.modern_browser_score > 70, 1, 0) AS mss_mobile_mismatch,
-    count() OVER (PARTITION BY a.tcp_fingerprint) AS tcp_shared_count,
-    count() OVER (PARTITION BY h.header_order_hash) AS header_order_shared_count
-FROM (
-    SELECT window_start, src_ip, ja4, host, sum(hits) AS hits, uniqMerge(uniq_paths) AS uniq_paths,
-        uniqMerge(uniq_query_params) AS uniq_query_params, sum(count_post) AS count_post,
-        min(first_seen) AS first_seen, max(last_seen) AS last_seen, any(tcp_fp_raw) AS tcp_fingerprint,
-        varPopMerge(tcp_jitter_variance) AS tcp_jitter_variance, varPopMerge(total_ip_length_var) AS request_size_variance,
-        any(tcp_win_raw * exp2(tcp_scale_raw)) AS true_window_size,
-        IF(any(tcp_mss_raw) > 0, any(tcp_win_raw) / any(tcp_mss_raw), 0) AS window_mss_ratio,
-        any(http_ver_raw) AS http_version, any(tls_alpn_raw) AS tls_alpn, any(tls_sni_raw) AS tls_sni,
-        max(correlated_raw) AS correlated, uniqMerge(unique_src_ports) AS unique_src_ports,
-        uniqMerge(unique_conn_id) AS unique_conn_id, max(max_keepalives) AS max_keepalives,
-        sum(orphan_count) AS orphan_count, sum(ip_id_zero_count) AS ip_id_zero_count, sum(mss_1460_count) AS mss_1460_count
-    FROM mabase_prod.agg_host_ip_ja4_1h
-    WHERE window_start >= now() - INTERVAL 24 HOUR
-      AND src_ip NOT IN (SELECT ip FROM mabase_prod.bot_ip)
-      AND ja4 NOT IN (SELECT ja4 FROM mabase_prod.bot_ja4)
-    GROUP BY window_start, src_ip, ja4, host
-) a
-LEFT JOIN (
-    SELECT window_start, src_ip, any(header_order_hash) AS header_order_hash, max(header_count) AS header_count,
-        max(has_accept_language) AS has_accept_language, max(has_cookie) AS has_cookie,
-        max(has_referer) AS has_referer, max(modern_browser_score) AS modern_browser_score,
-        max(ua_ch_mismatch) AS ua_ch_mismatch, any(sec_fetch_mode) AS sec_fetch_mode, any(sec_fetch_dest) AS sec_fetch_dest
-    FROM mabase_prod.agg_header_fingerprint_1h
-    WHERE window_start >= now() - INTERVAL 24 HOUR
-    GROUP BY window_start, src_ip
-) h ON a.src_ip = h.src_ip AND a.window_start = h.window_start;
-
--- 7. HEURISTIC VIEWS RESTORED
-CREATE OR REPLACE VIEW mabase_prod.view_host_ip_ja4_rotation AS
-SELECT src_ip, uniqExact(ja4) AS distinct_ja4_count, sum(hits) AS total_hits FROM mabase_prod.agg_host_ip_ja4_1h
-WHERE window_start >= now() - INTERVAL 24 HOUR GROUP BY src_ip HAVING distinct_ja4_count >= 5 AND total_hits > 100;
-
-CREATE OR REPLACE VIEW mabase_prod.view_host_ja4_anomalies AS
-SELECT ja4, uniqExact(src_ip) AS unique_ips, uniqExact(src_country_code) AS unique_countries, uniqExact(host) AS targeted_hosts
-FROM mabase_prod.agg_host_ip_ja4_1h WHERE window_start >= now() - INTERVAL 24 HOUR GROUP BY ja4 HAVING unique_ips >= 20 AND targeted_hosts >= 3;
-
-CREATE OR REPLACE VIEW mabase_prod.view_form_bruteforce_detected AS
-SELECT src_ip, ja4, host, sum(hits) AS hits, uniqMerge(uniq_query_params) AS query_params_count FROM mabase_prod.agg_host_ip_ja4_1h
-WHERE window_start >= now() - INTERVAL 24 HOUR GROUP BY src_ip, ja4, host HAVING query_params_count >= 10 AND hits >= 20;
-
-CREATE OR REPLACE VIEW mabase_prod.view_alpn_mismatch_detected AS
-SELECT src_ip, ja4, host, sum(hits) AS hits, any(http_ver_raw) AS http_version, any(tls_alpn_raw) AS tls_alpn FROM mabase_prod.agg_host_ip_ja4_1h
-WHERE window_start >= now() - INTERVAL 24 HOUR GROUP BY src_ip, ja4, host HAVING http_version = '1.1' AND tls_alpn IN ('h2', 'h3') AND hits >= 10;
-
-CREATE OR REPLACE VIEW mabase_prod.view_tcp_spoofing_detected AS
-SELECT src_ip, ja4, any(tcp_ttl_raw) AS tcp_ttl, any(tcp_win_raw) AS tcp_window_size, any(first_ua) AS first_ua FROM mabase_prod.agg_host_ip_ja4_1h
-WHERE window_start >= now() - INTERVAL 24 HOUR GROUP BY src_ip, ja4 HAVING tcp_ttl <= 64 AND (first_ua ILIKE '%Windows%' OR first_ua ILIKE '%iPhone%');
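
Reviewer note: the ratio features that `view_ai_features_1h` derives from the hourly aggregates can be sketched in Python to make the formulas easier to check against the documentation table. This is an illustrative reimplementation, not code from `bot_detector.py`. The `HourlyAggregate` dataclass, the `derive_features` function, and the sample values are hypothetical. The `+ 1` in each denominator mirrors the SQL view in this diff, whereas the table in `docs/detection.md` shows the simplified formulas without it.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class HourlyAggregate:
    """Hypothetical container for one merged row of agg_host_ip_ja4_1h."""
    hits: int
    count_post: int
    uniq_paths: int
    uniq_query_params: int
    unique_src_ports: int
    unique_conn_id: int
    orphan_count: int
    ip_id_zero_count: int
    duration_seconds: int  # dateDiff('second', first_seen, last_seen)


def derive_features(row: HourlyAggregate) -> Dict[str, float]:
    """Compute the ratio features; each denominator gets +1, as in the SQL view."""
    return {
        "post_ratio": row.count_post / (row.hits + 1),
        "fuzzing_index": row.uniq_query_params / (row.uniq_paths + 1),
        "hit_velocity": row.hits / (row.duration_seconds + 1),
        "port_exhaustion_ratio": row.unique_src_ports / (row.hits + 1),
        "orphan_ratio": row.orphan_count / (row.hits + 1),
        "ip_id_zero_ratio": row.ip_id_zero_count / (row.hits + 1),
        "multiplexing_efficiency": row.hits / (row.unique_conn_id + 1),
    }


# Made-up sample row, chosen so the denominators are round numbers.
example = HourlyAggregate(
    hits=99, count_post=50, uniq_paths=4, uniq_query_params=20,
    unique_src_ports=10, unique_conn_id=9, orphan_count=0,
    ip_id_zero_count=25, duration_seconds=9,
)
features = derive_features(example)
```

The `+ 1` avoids division by zero for single-hit or zero-duration windows, at the cost of slightly understating every ratio; for example, a client that only sends POST requests never reaches a `post_ratio` of exactly 1.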