8180f4af04
refactor(anubis): simplify to IP/CIDR + ASN only, remove UA and Country rules
- Remove UA regex extraction (extract_ua_regex, _extract_ua_from_all/any)
- Remove Country rule collection from parse_bot_policies_inline
- Simplify fetch_rules.py: collect_all_rules returns (ip_rules, asn_rules)
- Remove insert_ua_rules and insert_country_rules functions
- reload_dicts now only reloads dict_anubis_ip + dict_anubis_asn
- Simplify CASE blocks in 04_mv_http_logs.sql, 07_ai_features_view.sql,
view_ai_features_anubis.sql, mv_http_logs.sql: IP > ASN (was 5-level
UA+IP > UA > IP > ASN > Country cascade)
- Remove dict_anubis_country + dict_anubis_ua from 03_anubis_tables.sql
(UA table kept as stub for REGEXP_TREE catch-all compatibility)
- Remove anubis_country_rules table from schema
- Remove Anubis UA and Country tabs from dashboard reflists page
- Remove anubis_ua_rules/country_rules from API reflist queries
- deploy_schema.sql simplified from 339 to 122 lines
- 764 lines removed across 9 files
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com >
2026-04-09 15:25:33 +02:00
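The two-level IP > ASN precedence from 8180f4af04 can be illustrated outside SQL. The Python below is a minimal sketch, not the project's CASE expression. The rule tables, the `resolve_anubis_action` name, and the `ALLOW`/`DENY` values are hypothetical, and the addresses and ASNs come from documentation ranges. It also assumes that, among several matching CIDRs, the longest prefix should win.

```python
import ipaddress
from typing import Optional

# Hypothetical rule tables standing in for dict_anubis_ip and dict_anubis_asn.
IP_RULES = {
    ipaddress.ip_network("203.0.113.0/24"): "DENY",
    ipaddress.ip_network("203.0.113.128/25"): "ALLOW",  # more specific than the /24
}
ASN_RULES = {64496: "DENY", 64497: "ALLOW"}


def resolve_anubis_action(ip: str, asn: Optional[int]) -> Optional[str]:
    """Return the rule action for a request: an IP/CIDR match beats an ASN match."""
    addr = ipaddress.ip_address(ip)
    # Level 1: IP/CIDR rules; `in` is simply False across IPv4/IPv6.
    matches = [net for net in IP_RULES if addr in net]
    if matches:
        most_specific = max(matches, key=lambda net: net.prefixlen)
        return IP_RULES[most_specific]
    # Level 2: ASN rules, consulted only when no CIDR matched.
    if asn is not None and asn in ASN_RULES:
        return ASN_RULES[asn]
    return None  # no Anubis rule applies
```

With these tables, `resolve_anubis_action("203.0.113.7", 64497)` returns `"DENY"`: the CIDR rule takes precedence even though the ASN rule says `ALLOW`.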
98abbc80c7
feat(dashboard): Reference Lists page (CSV/dictionary visualization)
New /reflists page to browse the 9 ClickHouse dictionaries:
- bot_ip (3.5K entries): IP/CIDR ranges of known bots
- bot_ja4 (31): bot JA4 fingerprints
- browser_ja4 (1.2K): browser JA4 fingerprints → family, TLS library
- asn_reputation (82.5K): ASN → reputation (isp, datacenter, cdn…)
- iplocate_asn (714K): IP geolocation → ASN, country, name
- anubis_ua_rules, anubis_ip_rules, anubis_asn_rules, anubis_country_rules
Features:
- 9 navigation tabs, one per list
- Text search, with filtering done in ClickHouse
- Pagination (200 entries/page)
- Per-column sorting (ASC/DESC)
- Per-category breakdown chart (ECharts)
- Dictionary KPIs at the top of the page
- Documentation tooltips
API: /api/dictionaries, /api/reflist/{name}, /api/reflist/{name}/stats
Helpers: esc() (HTML escape) added to base.html
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com >
2026-04-09 14:56:54 +02:00
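Search, 200-entry pagination, and column sorting for /api/reflist/{name} can be combined safely by whitelisting identifiers and binding only values. This is a sketch of one way to build such a query, not the dashboard's code: `REFLISTS`, its table and column names, and `build_reflist_query` are hypothetical. The `{pattern:String}` placeholder uses ClickHouse's server-side query-parameter syntax; LIMIT and OFFSET are computed as Python integers.

```python
from typing import Any, Dict, Optional, Tuple

PAGE_SIZE = 200  # entries per page on /reflists

# Hypothetical whitelist: list name -> source table, selectable columns, search column.
REFLISTS: Dict[str, Dict[str, Any]] = {
    "bot_ja4": {"table": "bot_ja4", "columns": ("ja4", "bot_name"), "search": "bot_name"},
    "asn_reputation": {"table": "asn_reputation", "columns": ("asn", "label"), "search": "label"},
}


def build_reflist_query(
    name: str,
    search: str = "",
    sort: Optional[str] = None,
    order: str = "ASC",
    page: int = 1,
) -> Tuple[str, Dict[str, Any]]:
    """Build a paginated, searchable, sortable query plus its bound parameters."""
    spec = REFLISTS.get(name)
    if spec is None:
        raise KeyError(f"unknown reference list: {name}")
    columns = spec["columns"]
    # Identifiers cannot be bound as parameters, so they come only from the whitelist.
    sort_col = sort if sort in columns else columns[0]
    direction = "DESC" if order.upper() == "DESC" else "ASC"
    offset = (max(int(page), 1) - 1) * PAGE_SIZE  # integers, safe to interpolate
    params: Dict[str, Any] = {}
    where = ""
    if search:
        # The search value is bound server-side; {{ }} in the f-string emits single braces.
        where = f" WHERE {spec['search']} ILIKE {{pattern:String}}"
        params["pattern"] = f"%{search}%"
    sql = (
        f"SELECT {', '.join(columns)} FROM {spec['table']}{where}"
        f" ORDER BY {sort_col} {direction}"
        f" LIMIT {PAGE_SIZE} OFFSET {offset}"
    )
    return sql, params
```

Because only the value is bound, `%` and `_` typed by the user still act as ILIKE wildcards; escaping them would be a further refinement.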
702c0d5edb
feat(dashboard): add JA4 fingerprint and cluster investigation pages
- /ja4/{fingerprint} page: 8 KPIs, timeline, threat pie, IP scores
table, ASN/geo charts, HTTP logs, AI features — full JA4 investigation
- /cluster/{cid} page: 8 KPIs, timeline, threat/JA4/ASN/host charts,
member table with bulk classify — full campaign investigation
- /api/ja4/{fingerprint} and /api/cluster/{cid} API endpoints
- fmtJA4 links now navigate to /ja4/ investigation page
- campaigns.html: 'Ouvrir' (Open) button links to the /cluster/{cid} full page
- Fix: double-brace {{param}} in non-f-string queries → single {param}
(was causing HTTP 500 on all parameterized ClickHouse queries)
- 50 routes total, all tests pass, 0 JS console errors
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com >
2026-04-09 14:05:52 +02:00
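The double-brace bug fixed in 702c0d5edb comes from Python's brace escaping: `{{` and `}}` collapse to single braces only inside f-strings (and `str.format`). In a plain string literal they stay doubled, so ClickHouse receives `{{ip:String}}` rather than a `{ip:String}` query parameter. The `src_ip` column is hypothetical, and the `placeholders` regex is a simplified check that only recognizes single-word types.

```python
import re
from typing import List

# ClickHouse query-parameter placeholder {name:Type} with single braces.
# Simplified: only single-word types such as String or UInt32 are recognized.
PLACEHOLDER = re.compile(r"(?<!\{)\{(\w+):(\w+)\}(?!\})")


def placeholders(sql: str) -> List[str]:
    """Return the parameter names ClickHouse would treat as placeholders."""
    return [name for name, _type in PLACEHOLDER.findall(sql)]


table = "http_logs"  # src_ip below is a hypothetical column name

# f-string: "{{" and "}}" are escapes and collapse to single braces.
in_fstring = f"SELECT count() FROM {table} WHERE src_ip = {{ip:String}}"
# Plain string: braces are literal, so the doubled form reaches ClickHouse as-is.
plain_buggy = "SELECT count() FROM http_logs WHERE src_ip = {{ip:String}}"
plain_fixed = "SELECT count() FROM http_logs WHERE src_ip = {ip:String}"
```

The rule of thumb: write `{{param:Type}}` only inside f-strings, and `{param:Type}` everywhere else.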
70188b508c
fix(dashboard): eliminate @apply CSS, fix status column, fix click propagation
Playwright testing revealed 3 critical bugs:
1. Tailwind CDN @apply with custom brand-* colors produces empty CSS
rules, breaking ALL design components (kpi-card, data-table, badges,
filter-btn, section-card, nav-item). Fix: replace all @apply
directives with equivalent raw CSS values.
2. Traffic API and IP detail API reference non-existent 'status' column
in http_logs table → HTTP 500 on /traffic and /ip/{ip}. Fix: remove
status from SELECT, sort whitelist, filters, and templates.
3. Nested <a> links (fmtJA4, fmtASN, fmtCountry, fmtBotName) inside
clickable <tr onclick> capture clicks, preventing row navigation to
/ip/ detail. Fix: add event.stopPropagation() to all formatter links.
Verified with Playwright: 10 pages × 0 JS errors, all tooltips hidden
by default, sidebar toggle works, keyboard shortcuts (Alt+1-9, Alt+B),
classification form saves to DB, campaign detail panel opens on click.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com >
2026-04-09 13:54:38 +02:00
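The click-propagation fix relies on DOM event bubbling: a click on a nested `<a>` is delivered to the link first and then to the enclosing `<tr>`, unless a handler calls `stopPropagation()`. The Python below is a toy model of the bubble phase only, not browser code; `Node`, `Event`, `dispatch_click`, and `build_row` are hypothetical. As in the DOM, stopping propagation still lets the remaining handlers on the current node run.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Event:
    propagation_stopped: bool = False

    def stop_propagation(self) -> None:
        self.propagation_stopped = True


@dataclass
class Node:
    name: str
    parent: Optional["Node"] = None
    handlers: List[Callable[[Event], None]] = field(default_factory=list)


def dispatch_click(target: Node) -> Event:
    """Deliver a click to target, then bubble it up through its ancestors."""
    event = Event()
    node: Optional[Node] = target
    while node is not None:
        for handler in node.handlers:  # all handlers on this node still run
            handler(event)
        if event.propagation_stopped:  # the parent <tr> never sees the click
            break
        node = node.parent
    return event


def build_row(stop_on_link: bool, log: List[str]) -> Node:
    """Return an <a> nested in a clickable <tr> whose handler navigates to /ip/."""
    row = Node("tr", handlers=[lambda e: log.append("row: navigate to /ip/")])

    def on_link(e: Event) -> None:
        log.append("link: follow href")
        if stop_on_link:
            e.stop_propagation()  # models the call added to the formatter links

    return Node("a", parent=row, handlers=[on_link])
```

Without the call, a link click triggers both the link and the row navigation; with it, only the link fires, while clicks elsewhere on the row still reach the row handler.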
63ba6d203c
feat(dashboard): complete SOC dashboard with full monitoring and workflows
- models.html: Full rewrite — 6 KPIs, scoring volume timeline, anomaly rate
chart, threat breakdown per model, enhanced model cards with validation gate
- classify.html: SOC workflow — suggested unclassified IPs, quick-classify
buttons, classification stats pie, pre-fill from URL params
- traffic.html: Clickable rows → ip_detail, column sorting, status column,
search filter, doc tooltips on all chart sections
- scores.html: Search input, clickable rows → ip_detail, LEGITIMATE_BROWSER
filter button, doc tooltips on distribution + scatter charts
- ip_detail.html: Resource cascade section (headless browser detection),
status column in HTTP logs table
- detections.html: Doc tooltips on threat/reason/ASN chart sections
- features.html: Doc tooltips on radar/importance/scatter sections
- api.py: 4 new endpoints — /api/models/timeline, /api/models/threats,
/api/classify/stats, /api/classify/suggested. Traffic API: status + search.
46 routes total. All tests pass (dashboard + bot-detector 36/36).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com >
2026-04-09 01:25:01 +02:00
396baa90d2
feat(dashboard): HDBSCAN cluster visualization
- Dedicated /campaigns page with 4 chart views:
· Scatter plot (score vs. velocity, bubbles colored by campaign)
· Force-directed network graph (IPs linked by a shared JA4)
· Campaign card grid (KPIs, ASN, country, JA4)
· Detail panel (behavioral radar, hourly timeline, member table)
- 4 new API endpoints:
· GET /api/campaigns (fix: campaign_id >= 0 instead of != '')
· GET /api/campaigns/graph (nodes + edges)
· GET /api/campaigns/scatter (score/velocity per IP)
· GET /api/campaigns/{cid} (detail + profile + timeline)
- Sidebar: Campaigns link added under Surveillance
- Overview: campaigns are clickable and link to /campaigns
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com >
2026-04-09 01:11:16 +02:00
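The force-directed graph links IPs that share a JA4 fingerprint. Below is a minimal sketch of how /api/campaigns/graph might derive its nodes and edges, assuming an input mapping of IP to observed JA4 values. `build_ja4_graph`, the field names, and the edge weight (number of shared fingerprints) are assumptions.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def build_ja4_graph(ip_ja4: Dict[str, Iterable[str]]) -> Dict[str, List[dict]]:
    """Nodes are IPs; an edge joins two IPs that share at least one JA4."""
    ips_by_ja4: Dict[str, Set[str]] = defaultdict(set)
    for ip, fingerprints in ip_ja4.items():
        for fp in set(fingerprints):
            ips_by_ja4[fp].add(ip)

    # Edge weight = number of distinct fingerprints the two IPs share.
    weights: Dict[Tuple[str, str], int] = defaultdict(int)
    for ips in ips_by_ja4.values():
        for a, b in combinations(sorted(ips), 2):
            weights[(a, b)] += 1

    nodes = [{"id": ip} for ip in sorted(ip_ja4)]
    edges = [{"source": a, "target": b, "weight": w} for (a, b), w in sorted(weights.items())]
    return {"nodes": nodes, "edges": edges}
```

A fingerprint shared by many IPs (for example a common browser build) produces a dense clique with n(n-1)/2 edges, so a production version would likely cap or skip very common JA4 values.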
2d04288e95
feat(dashboard): SOC workflow overhaul — sidebar nav, doc tooltips, full-width layout
- base.html: collapsible sidebar navigation, doc tooltip system, JS helpers
(fmtNum, fmtPct, fmtDuration, ecGrid, buildTable, docHTML)
- overview.html: SOC command center with stacked timeline, live alerts,
campaigns panel, browser donut, 6 KPIs
- detections.html: threat color dots, raw score column, click-to-navigate rows
- network.html: JA4 rotation, brute-force, persistent threats tables, 6 KPIs
- ip_detail.html: ASN/country KPIs, AE/XGB/campaign columns, enriched features
- scores/traffic/features/models/classify: page_title blocks + doc tooltips
- api.py: 9 new endpoints (campaigns, brute-force, ja4-rotation, recurrence,
cascade, alerts, timeline-detail, ua-rotation)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com >
2026-04-09 00:29:34 +02:00
db306fb9da
fix: P0 audit bugs — bot-detector + dashboard + SQL
Bot-detector:
- B1.1: campaign_id and raw_anomaly_score now inserted into ml_detected_anomalies
- B1.4/B1.5: log_decision argument order fixed (cycle_id, name)
- B1.7: AE broadcast error: the model now returns its feature list, and scoring
uses the model's features instead of the current cycle's (prevents a dimension mismatch)
- B1.8: Anubis ALLOW bots now get bot_name from anubis_bot_name
Dashboard:
- C1.1: XSS in ip_detail.html — {{ ip | tojson }} instead of raw string
- C1.2: Stored XSS via innerHTML — added escapeHtml() helper, all user-facing
formatters (fmtIP, fmtASN, fmtCountry, fmtJA4, fmtBotName, fmtLabel) sanitized
- C2.1: status filter now correctly filters http_version column
- C2.2: heatmap uses toDayOfWeek() - 1 so day indices are 0-based, as the JS side expects
SQL:
- B1.3: view_ip_recurrence worst_score uses max(), not min() (0 = normal, 1 = anomalous)
- B1.6: view_resource_cascade_1h joined into view_thesis_features_1h (§5.4)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com >
2026-04-08 23:33:00 +02:00
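The C1.2 fix escapes user-controlled strings before they are inserted with innerHTML. The dashboard's `escapeHtml()` is JavaScript; the same rule is shown here with Python's standard `html.escape`, which replaces `&`, `<`, `>` and, with `quote=True`, both quote characters. `fmt_bot_name` and its markup are hypothetical.

```python
import html


def fmt_bot_name(name: str) -> str:
    """Return an HTML fragment for a bot name with the untrusted text escaped."""
    # html.escape converts & < > and, with quote=True, " and ' to entities.
    safe = html.escape(name, quote=True)
    return f'<span class="bot-name">{safe}</span>'
```

Escaping at the formatter level means every caller gets safe output, which is why the commit sanitizes all the fmt* helpers rather than individual pages.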
5c5bca71d1
feat: rewrite ASN classification with PeeringDB + expanded heuristics
Major improvements to generate_asn_data.py:
- Add PeeringDB network data source (34K networks with info_type)
- Add new categories: education, government, enterprise
- Rename 'human' label to 'isp' across all consumers
- Expand keyword heuristics (ISP, datacenter, hosting, CDN, education, gov)
- Add hard-coded lists for education, government, enterprise ASNs
- Support both --output-dir and --output-asn/--output-ipasn CLI interfaces
- Add --no-peeringdb flag for offline use
Results: unknown dropped from 86% to 57%, ISP coverage 21.8K ASNs,
education 3.1K, enterprise 5.7K, government 520.
Updated consumers:
- bot_detector.py: 'human' -> 'isp' for baseline selection
- dashboard api.py: 'human' -> 'isp' in SQL queries
- run-tests.sh: 'human' -> 'isp' in integration test assertions
- update-csv-data.sh: updated label description comment
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com >
2026-04-08 16:02:07 +02:00
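The layered classification in generate_asn_data.py can be pictured as a precedence chain: hard-coded lists, then the PeeringDB `info_type`, then keywords in the ASN name, else `unknown`. The sketch below assumes that order. The keyword lists, the `HARDCODED` entries, and `classify_asn` are hypothetical, and the `info_type` strings are PeeringDB network-type values as we understand them.

```python
from typing import Optional

# Hypothetical hard-coded overrides (documentation ASNs used as placeholders).
HARDCODED = {64496: "education", 64511: "government"}

# PeeringDB network info_type -> label (mapping is an assumption).
PEERINGDB_TYPE_TO_LABEL = {
    "Cable/DSL/ISP": "isp",
    "Educational/Research": "education",
    "Government": "government",
    "Enterprise": "enterprise",
}

# Keyword heuristics on the ASN name, checked in order; first hit wins.
KEYWORDS = (
    ("cdn", ("cdn", "akamai", "fastly")),
    ("datacenter", ("hosting", "datacenter", "data center", "cloud", "vps")),
    ("education", ("university", "college", "academic")),
    ("government", ("government", "ministry")),
    ("isp", ("telecom", "broadband", "cable")),
)


def classify_asn(asn: int, name: str, pdb_info_type: Optional[str] = None) -> str:
    """Return a label using hard-coded lists, then PeeringDB, then keywords."""
    if asn in HARDCODED:
        return HARDCODED[asn]
    if pdb_info_type in PEERINGDB_TYPE_TO_LABEL:
        return PEERINGDB_TYPE_TO_LABEL[pdb_info_type]
    lowered = name.lower()
    for label, words in KEYWORDS:
        if any(word in lowered for word in words):
            return label
    return "unknown"
```

Putting the curated sources first keeps the fuzzier substring heuristics from overriding an explicit classification.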
9a48fb9d29
feat: LEGITIMATE_BROWSER classification from JA4 + behavioral consistency
Add browser legitimacy classification (A9) to the bot detection pipeline:
- New features: is_known_browser (binary) and browser_consistency_score [0..5]
combining 5 signals: JA4 browser match, modern_browser_score, Accept-Language,
cookies, Sec-Fetch-* presence
- Post-scoring: sessions with known browser JA4 + consistency >= 4/5 + NORMAL/LOW
threat level are reclassified as LEGITIMATE_BROWSER
- Spoofing detection: inconsistent behavior (known JA4 but low consistency) stays
in normal anomaly scoring — prevents evasion via JA4 spoofing
- XGBoost treats LEGITIMATE_BROWSER as non-threat (negative label)
- ClickHouse: browser_family column added to ml_detected_anomalies and ml_all_scores
- Dashboard: browser_family filter/sort on detections and scores endpoints,
legitimate_browsers count and browser_stats in overview
- 6 new unit tests covering classification threshold, spoofing, exclusion logic
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com >
2026-04-08 15:46:22 +02:00
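The LEGITIMATE_BROWSER rule from 9a48fb9d29 can be expressed as a small post-scoring function. The five signals and the >= 4/5 threshold come from the commit message; the `Session` fields, the 0.5 cut-off applied to `modern_browser_score`, and the threat-level strings other than NORMAL and LOW are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Session:
    ja4_is_known_browser: bool   # JA4 found in the browser JA4 dictionary
    modern_browser_score: float  # assumed to lie in [0, 1]
    has_accept_language: bool
    has_cookies: bool
    has_sec_fetch: bool          # any Sec-Fetch-* header present
    threat_level: str            # e.g. "NORMAL", "LOW", "HIGH"


def browser_consistency_score(s: Session) -> int:
    """Count how many of the five browser signals agree (0..5)."""
    return sum((
        s.ja4_is_known_browser,
        s.modern_browser_score >= 0.5,  # cut-off is an assumption
        s.has_accept_language,
        s.has_cookies,
        s.has_sec_fetch,
    ))


def post_score_label(s: Session) -> str:
    """Reclassify only consistent known browsers with a low threat level."""
    if (s.ja4_is_known_browser
            and browser_consistency_score(s) >= 4
            and s.threat_level in ("NORMAL", "LOW")):
        return "LEGITIMATE_BROWSER"
    # Known JA4 with low consistency (possible spoofing) keeps its anomaly-based level.
    return s.threat_level
```

A spoofed JA4 alone contributes only one point, so a client must also present the other browser traits to escape anomaly scoring.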
7d09c614c3
feat: browser JA4 detection, Anubis bot rules, worldwide ASN data
- Add generate_browser_ja4.py: 1,186 browser JA4 fingerprints from FoxIO + ja4db.com
covering 11 families (Chromium, Firefox, Safari, Edge, Tor, Opera, Vivaldi...)
- Rewrite generate_bot_ip.py: Anubis YAML rules (Google, Bing, Apple, DuckDuck,
OpenAI, Perplexity bots) + Tor exit nodes + cloud scanner IPs (3,555 entries)
- Rewrite generate_asn_data.py: worldwide iptoasn.com data (78,049 ASNs, 714K CIDRs)
- Add dict_browser_ja4 ClickHouse dictionary + browser_family in AI features views
- Add /api/browsers dashboard endpoint
- Fix CSV quoting for fields containing commas (User-Agent strings)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com >
2026-04-08 15:27:37 +02:00
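The CSV quoting fix matters because User-Agent strings such as `(KHTML, like Gecko)` contain commas. Python's `csv` module quotes such fields automatically under `QUOTE_MINIMAL`, and ClickHouse's CSV input format accepts double-quoted fields. The column layout and `write_browser_rows` are hypothetical.

```python
import csv
import io
from typing import Iterable, Sequence


def write_browser_rows(rows: Iterable[Sequence[str]]) -> str:
    """Serialize rows to CSV, quoting any field that contains a comma or quote."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerow(["ja4", "family", "user_agent"])  # hypothetical header
    writer.writerows(rows)
    return buf.getvalue()
```

Joining fields with `","` by hand would split `Mozilla/5.0 (KHTML, like Gecko)` into two columns; the writer wraps it in double quotes instead.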
b6184e6529
feat: CSV generation scripts, API filter params, enriched CSV stubs
- scripts/generate_bot_ip.py: download Tor exit nodes + curate scanner IPs (1353 entries)
- scripts/generate_bot_ja4.py: 31 bot JA4 fingerprints across 16 families
- scripts/generate_asn_data.py: 38 ASNs + 96 IP-to-ASN prefixes
- scripts/update-csv-data.sh: master orchestrator with --install-stubs
- api.py: add asn_org/country_code/ja4/bot_name filters on detections+scores
- pages.py: add /network route
- csv-stubs: enriched with generated data (Tor nodes, scanner IPs, etc.)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com >
2026-04-08 15:05:43 +02:00
b735bab5a5
feat(dashboard): rebuild SOC dashboard + fix ClickHouse SQL
Complete rewrite of the SOC dashboard using FastAPI + Jinja2 + htmx + Chart.js + Tailwind CSS.
Replaces the old React/Vite frontend with server-rendered templates.
Dashboard pages:
- Overview: KPIs, timeline chart, threat distribution, top IPs
- Detections: paginated/filterable anomaly table
- Scores: ml_all_scores with AE error & XGB prob columns
- Traffic: HTTP logs with method/host filters
- IP Investigation: full deep-dive (scores, features, HTTP logs, classify)
- Classification: SOC feedback form + history
- Features: AI + thesis feature stats
- Models: scoring stats + model metadata
API: 9 JSON endpoints with parameterized queries, sort whitelists
SQL fixes:
- 05_aggregation_tables: add deduplicate_merge_projection_mode
- 11_views: fix nested aggregate (argMax inside sum)
- 12_thesis_features: remove invalid 'let' bindings, fix groupArrayIf type
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com >
2026-04-08 03:21:05 +02:00