toto 85d3b95b7b feat: HTTP/2 passive fingerprinting with individual SETTINGS fields
Complete implementation of HTTP/2 passive fingerprinting per thesis §2.5.3:

mod-reqin-log (C module):
- Replace connection-level filter with ap_hook_process_connection (APR_HOOK_FIRST)
  to capture H2 preface before mod_http2 takes over the connection
- AP_MODE_SPECULATIVE read of 512 bytes from c->input_filters
- Parse SETTINGS, WINDOW_UPDATE, PRIORITY flags, pseudo-header order
- Output individual SETTINGS params as separate JSON fields (IDs 1-6, 8)
- Read H2 notes from c1 (master connection) for mod_http2 secondary conns
- Fix header_order_signature JSON length bug (26→strlen)

ClickHouse schema:
- Add 8 new columns to http_logs: h2_has_priority, h2_header_table_size,
  h2_enable_push, h2_max_concurrent_streams, h2_initial_window_size,
  h2_max_frame_size, h2_max_header_list_size, h2_enable_connect_protocol
- Use Int32/Int64 with DEFAULT -1 to distinguish absent vs zero
- Update mv_http_logs to extract individual fields via JSONHas/JSONExtractInt
- Migration 04_http2_fields.sql updated for existing deployments

Correlator:
- Accept both timestamp_ns and timestamp field names (backward compat)

Integration:
- Enable HTTP/2 in Apache: Protocols h2 http/1.1 in httpd-integration.conf

Validated end-to-end via Playwright: H2 curl traffic → mod-reqin-log →
correlator → ClickHouse with all 12 H2 columns populated correctly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-11 02:33:45 +02:00
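
The SETTINGS extraction described in the commit message can be illustrated with a short sketch. This is not the module's C code: it is a hypothetical JavaScript (Node.js) helper, `parseClientSettings`, that assumes the captured bytes begin with the 24-byte HTTP/2 client preface followed by a SETTINGS frame, and that reuses the ClickHouse column names as output keys with -1 meaning "absent".

```javascript
// Hypothetical illustration only; the real parser lives in mod-reqin-log (C).
const H2_PREFACE = Buffer.from('PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n', 'latin1'); // 24 bytes

// SETTINGS identifiers 1-6 (RFC 9113) and 8 (RFC 8441), mapped to column names.
const SETTINGS_NAMES = {
  1: 'h2_header_table_size',
  2: 'h2_enable_push',
  3: 'h2_max_concurrent_streams',
  4: 'h2_initial_window_size',
  5: 'h2_max_frame_size',
  6: 'h2_max_header_list_size',
  8: 'h2_enable_connect_protocol',
};

function parseClientSettings(buf) {
  // Need at least the preface plus one 9-byte frame header.
  if (buf.length < H2_PREFACE.length + 9) return null;
  if (!buf.subarray(0, H2_PREFACE.length).equals(H2_PREFACE)) return null;

  // Every field starts as -1 so that "absent" differs from an explicit 0.
  const out = {};
  for (const name of Object.values(SETTINGS_NAMES)) out[name] = -1;

  let off = H2_PREFACE.length;
  const length = buf.readUIntBE(off, 3); // 24-bit payload length
  const type = buf.readUInt8(off + 3);   // frame type, 0x4 = SETTINGS
  if (type !== 0x4) return null;
  off += 9;                              // skip the frame header

  // Each setting is a 16-bit identifier followed by a 32-bit value.
  const end = Math.min(off + length, buf.length);
  for (; off + 6 <= end; off += 6) {
    const name = SETTINGS_NAMES[buf.readUInt16BE(off)];
    if (name) out[name] = buf.readUInt32BE(off + 2);
  }
  return out;
}
```

A client that sends `ENABLE_PUSH = 0` yields `h2_enable_push: 0`, while a client that omits the setting yields `h2_enable_push: -1`, which is the distinction the `DEFAULT -1` columns are meant to preserve.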


{% extends "base.html" %}
{% block title %}JA4 SOC — Features ML{% endblock %}
{% block page_title %}
Features ML
<span class="relative inline-block ml-1"><button onclick="docToggle(this)" class="doc-btn"></button><div class="doc-panel">
<h4>Feature exploration</h4>
<p>Visualize the 72 extracted ML features: behavioral (velocity, fuzzing), network (port_density, JA4), and thesis §5 (entropy, cadence, drift).</p>
<p><strong>Radar:</strong> Compares ISP (human) and datacenter (bot) profiles. <strong>Scatter:</strong> Visually identify anomalous clusters.</p>
<p class="doc-source">Source: view_ai_features_1h, view_thesis_features_1h</p>
</div></span>
{% endblock %}
{% block content %}
<div class="space-y-6">
<!-- Row 1: Radar + Feature Importance -->
<div class="grid grid-cols-1 lg:grid-cols-2 gap-4">
<div class="section-card">
<div class="section-header"><span class="section-title">Human vs Bot Profile (Radar)
<span class="relative inline-block"><button onclick="docToggle(this)" class="doc-btn"></button><div class="doc-panel">
<h4>ISP vs datacenter comparison</h4>
<p>Average profile of ISP sessions (human) vs datacenter sessions (potential bots). The axes are the normalized ML features.</p>
<p><strong>Interpretation:</strong> The further the red area extends beyond the green one, the more discriminative the feature. hit_velocity, fuzzing_index, and post_ratio are typically the most discriminative.</p>
<p class="doc-source">Source: view_ai_features_1h GROUP BY asn_label</p>
</div></span>
</span></div>
<div class="section-body"><div id="chart-radar" style="height:360px"></div></div>
</div>
<div class="section-card">
<div class="section-header"><span class="section-title" id="importance-title">Feature importance (SHAP/ExIFFI)
<span class="relative inline-block"><button onclick="docToggle(this)" class="doc-btn"></button><div class="doc-panel">
<h4>Feature importance</h4>
<p>Mean feature importance from SHAP (XGBoost) or ExIFFI (EIF). Each bar shows a feature's mean absolute contribution to recent anomaly decisions.</p>
<p><strong>Fallback:</strong> If no SHAP/ExIFFI data is available, the between-class variance (a statistical proxy) is shown instead.</p>
<p class="doc-source">Source: ml_detected_anomalies.reason (SHAP/ExIFFI) or view_ai_features_1h (variance)</p>
</div></span>
</span></div>
<div class="section-body"><div id="chart-importance" style="height:360px"></div></div>
</div>
</div>
<!-- Row 2: Scatter full-width -->
<div class="section-card">
<div class="section-header"><span class="section-title">Scatter — Hit Velocity vs Fuzzing Index
<span class="relative inline-block"><button onclick="docToggle(this)" class="doc-btn"></button><div class="doc-panel">
<h4>Two-dimensional scatter</h4>
<p>Each point is one IP session. X = request rate (hit_velocity), Y = path diversity (fuzzing_index). Clusters that sit apart from the main group indicate anomalies.</p>
<p><strong>Action:</strong> Click a point to open the IP detail page.</p>
<p class="doc-source">Source: view_ai_features_1h</p>
</div></span>
</span></div>
<div class="section-body"><div id="chart-scatter" style="height:420px"></div></div>
</div>
<!-- Row 3: Distribution histograms (3-col grid) -->
<div class="grid grid-cols-1 md:grid-cols-2 xl:grid-cols-3 gap-4">
<div class="bg-gray-900 rounded-xl p-5 border border-gray-800">
<h3 class="text-xs font-medium text-gray-400 mb-2">hit_velocity</h3>
<div id="dist-hit_velocity" style="height:200px"></div>
</div>
<div class="bg-gray-900 rounded-xl p-5 border border-gray-800">
<h3 class="text-xs font-medium text-gray-400 mb-2">fuzzing_index</h3>
<div id="dist-fuzzing_index" style="height:200px"></div>
</div>
<div class="bg-gray-900 rounded-xl p-5 border border-gray-800">
<h3 class="text-xs font-medium text-gray-400 mb-2">post_ratio</h3>
<div id="dist-post_ratio" style="height:200px"></div>
</div>
<div class="bg-gray-900 rounded-xl p-5 border border-gray-800">
<h3 class="text-xs font-medium text-gray-400 mb-2">asset_ratio</h3>
<div id="dist-asset_ratio" style="height:200px"></div>
</div>
<div class="bg-gray-900 rounded-xl p-5 border border-gray-800">
<h3 class="text-xs font-medium text-gray-400 mb-2">temporal_entropy</h3>
<div id="dist-temporal_entropy" style="height:200px"></div>
</div>
<div class="bg-gray-900 rounded-xl p-5 border border-gray-800">
<h3 class="text-xs font-medium text-gray-400 mb-2">path_diversity_ratio</h3>
<div id="dist-path_diversity_ratio" style="height:200px"></div>
</div>
</div>
<!-- Row 4: Temporal heatmap full-width -->
<div class="bg-gray-900 rounded-xl p-5 border border-gray-800">
<h3 class="text-sm font-medium text-gray-400 mb-3">Temporal heatmap (day × hour)</h3>
<div id="chart-heatmap" style="height:320px"></div>
</div>
</div>
{% endblock %}
{% block scripts %}
<script>
const LABEL_COLORS = {human:'#22c55e', datacenter:'#ef4444', hosting:'#f97316', unknown:'#6b7280'};
const DAYS = ['Mon','Tue','Wed','Thu','Fri','Sat','Sun'];
const HOURS = Array.from({length:24}, (_,i) => String(i).padStart(2,'0')+'h');
let charts = {};
function initChart(id) {
const el = document.getElementById(id);
if (!el) return null;
if (charts[id]) charts[id].dispose();
charts[id] = echarts.init(el);
return charts[id];
}
async function loadAll() {
try {
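// Fail fast on HTTP errors instead of parsing an error page as JSON.
const getJson = url => fetch(url).then(r => {
if (!r.ok) throw new Error(`${url}: HTTP ${r.status}`);
return r.json();
});
const [feat, behav, heat] = await Promise.all([
getJson('/api/features'),
getJson('/api/behavior'),
getJson('/api/heatmap'),
]);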
// ── Radar: Human vs Bot profiles ──
const radarKeys = [
{key:'avg_velocity', label:'Velocity'},
{key:'avg_fuzz', label:'Fuzzing'},
{key:'avg_post', label:'POST ratio'},
{key:'avg_asset', label:'Asset ratio'},
{key:'avg_direct', label:'Direct access'},
{key:'avg_entropy', label:'Entropy'},
{key:'avg_path_div', label:'Path diversity'},
{key:'avg_browser', label:'Browser score'},
];
const hp = feat.human_profile || {};
const bp = feat.bot_profile || {};
const maxVals = radarKeys.map(f => Math.max(hp[f.key] || 0, bp[f.key] || 0) || 1);
const radarChart = initChart('chart-radar');
if (radarChart) {
radarChart.setOption(ecBase({
tooltip: ecTooltip(),
legend: {data:['Human','Bot'], top:10, textStyle:{color:EC_TEXT}},
radar: {
indicator: radarKeys.map(f => ({name:f.label, max:1})),
shape:'polygon',
splitArea:{areaStyle:{color:['rgba(99,102,241,0.05)','rgba(99,102,241,0.1)']}},
splitLine:{lineStyle:{color:EC_GRID}},
axisLine:{lineStyle:{color:EC_GRID}},
axisName:{color:EC_TEXT, fontSize:11},
},
series:[{
type:'radar',
data:[
{
name:'Human',
value: radarKeys.map((f,i) => (hp[f.key]||0) / maxVals[i]),
areaStyle:{color:'rgba(34,197,94,0.2)'},
lineStyle:{color:'#22c55e', width:2},
itemStyle:{color:'#22c55e'},
},
{
name:'Bot',
value: radarKeys.map((f,i) => (bp[f.key]||0) / maxVals[i]),
areaStyle:{color:'rgba(239,68,68,0.2)'},
lineStyle:{color:'#ef4444', width:2},
itemStyle:{color:'#ef4444'},
},
]
}]
}));
}
// ── Feature Importance (horizontal bar): SHAP/ExIFFI if available, variance otherwise ──
const shapData = feat.shap_importance || [];
// Copy before sorting so the API response object is not mutated.
const varianceData = (feat.feature_importance || []).slice().sort((a,b) => a.variance - b.variance);
const useShap = shapData.length > 0;
const fi = useShap
? shapData.slice().sort((a,b) => a.importance - b.importance)
: varianceData;
const impLabel = useShap ? 'SHAP/ExIFFI (mean |value|)' : 'Variance';
document.getElementById('importance-title').childNodes[0].textContent =
useShap ? 'Feature importance (SHAP/ExIFFI) ' : 'Feature importance (Variance) ';
const impChart = initChart('chart-importance');
if (impChart && fi.length) {
impChart.setOption(ecBase({
tooltip: ecTooltip({trigger:'axis', axisPointer:{type:'shadow'}}),
grid: {left:150, right:30, top:10, bottom:30},
yAxis: {
type:'category',
data: fi.map(f => f.name),
axisLine:{lineStyle:{color:EC_GRID}},
axisLabel:{color:EC_TEXT, fontSize:11, width:140, overflow:'truncate'},
},
xAxis: {
type:'value',
splitLine:{lineStyle:{color:EC_GRID, type:'dashed'}},
axisLabel:{color:EC_TEXT},
name: impLabel, nameTextStyle:{color:EC_TEXT},
},
series:[{
type:'bar', data: fi.map(f => useShap ? f.importance : f.variance), barWidth:'60%',
itemStyle:{color: new echarts.graphic.LinearGradient(0,0,1,0,[
{offset:0, color: useShap ? '#f59e0b' : '#6366f1'},
{offset:1, color: useShap ? '#ef4444' : '#8b5cf6'}
])},
label:{show:true, position:'right', color:EC_TEXT, fontSize:10, formatter:p => p.value.toFixed(4)},
}]
}));
}
// ── Scatter: hit_velocity vs fuzzing_index ──
const scatter = behav.scatter || [];
const scatterChart = initChart('chart-scatter');
if (scatterChart && scatter.length) {
const groups = {};
scatter.forEach(p => {
const lbl = p.asn_label || 'unknown';
if (!groups[lbl]) groups[lbl] = [];
groups[lbl].push(p);
});
const series = Object.entries(groups).map(([label, pts]) => ({
name: label,
type: 'scatter',
data: pts.map(p => [
p.hit_velocity || 0,
p.fuzzing_index || 0,
Math.max(4, Math.min(30, Math.sqrt(p.hits || 1) * 2)),
p.ip, p.bot_name,
]),
symbolSize: (val) => val[2],
itemStyle: {color: LABEL_COLORS[label] || '#6b7280', opacity:0.75},
}));
scatterChart.setOption(ecBase({
tooltip: ecTooltip({
trigger:'item',
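formatter: p => {
const d = p.data;
// The tooltip string is rendered as HTML, so escape API-supplied strings.
const esc = s => String(s ?? '').replace(/[&<>"']/g, ch => `&#${ch.charCodeAt(0)};`);
return `<b>${esc(d[3])}</b><br>Label: ${esc(p.seriesName)}<br>` +
`Velocity: ${d[0].toFixed(3)}<br>Fuzzing: ${d[1].toFixed(3)}` +
(d[4] ? `<br>Bot: ${esc(d[4])}` : '');
}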
}),
legend: {data: Object.keys(groups), top:5, textStyle:{color:EC_TEXT}},
grid: {left:60, right:30, top:40, bottom:40},
xAxis: {
type:'value', name:'Hit Velocity', nameTextStyle:{color:EC_TEXT},
splitLine:{lineStyle:{color:EC_GRID, type:'dashed'}}, axisLabel:{color:EC_TEXT},
},
yAxis: {
type:'value', name:'Fuzzing Index', nameTextStyle:{color:EC_TEXT},
splitLine:{lineStyle:{color:EC_GRID, type:'dashed'}}, axisLabel:{color:EC_TEXT},
},
series,
}));
scatterChart.on('click', params => {
const ip = params.data?.[3];
if (ip) window.location.href = '/ip/' + encodeURIComponent(ip);
});
}
// ── Distribution histograms ──
const distKeys = ['hit_velocity','fuzzing_index','post_ratio','asset_ratio','temporal_entropy','path_diversity_ratio'];
const dists = behav.distributions || {};
distKeys.forEach(key => {
const data = dists[key] || [];
const ch = initChart('dist-' + key);
if (!ch || !data.length) return;
ch.setOption(ecBase({
tooltip: ecTooltip({trigger:'axis', axisPointer:{type:'shadow'}}),
grid: {left:45, right:10, top:8, bottom:25},
xAxis: {
type:'category', data: data.map(d => d.bucket),
axisLabel:{color:EC_TEXT, fontSize:9, rotate:30}, axisLine:{lineStyle:{color:EC_GRID}},
},
yAxis: {
type:'value', splitLine:{lineStyle:{color:EC_GRID, type:'dashed'}},
axisLabel:{color:EC_TEXT, fontSize:9},
},
series:[{
type:'bar', data: data.map(d => d.cnt), barWidth:'70%',
itemStyle:{color:'#6366f1', borderRadius:[2,2,0,0]},
}]
}));
});
// ── Temporal heatmap ──
const cells = heat.cells || [];
const heatChart = initChart('chart-heatmap');
if (heatChart && cells.length) {
const maxCnt = Math.max(...cells.map(c => c.cnt), 1);
heatChart.setOption(ecBase({
tooltip: ecTooltip({
formatter: p => `${DAYS[p.data[1]]} ${HOURS[p.data[0]]}<br>Requests: <b>${p.data[2]}</b>`
}),
grid: {left:60, right:40, top:10, bottom:30},
xAxis: {
type:'category', data:HOURS, splitArea:{show:true},
axisLabel:{color:EC_TEXT, fontSize:10}, axisLine:{lineStyle:{color:EC_GRID}},
},
yAxis: {
type:'category', data:DAYS, splitArea:{show:true},
axisLabel:{color:EC_TEXT, fontSize:11}, axisLine:{lineStyle:{color:EC_GRID}},
},
visualMap: {
min:0, max:maxCnt, calculable:true, orient:'vertical', right:0, top:'center',
inRange:{color:['#1e1b4b','#4338ca','#6366f1','#a78bfa','#f97316','#ef4444']},
textStyle:{color:EC_TEXT}, borderColor:'transparent',
},
series:[{
type:'heatmap',
data: cells.map(c => [c.hour, c.dow, c.cnt]),
label:{show:false},
emphasis:{itemStyle:{shadowBlur:6, shadowColor:'rgba(0,0,0,0.4)'}},
}]
}));
}
} catch(e) { console.error('Features load error:', e); }
}
loadAll();
setInterval(loadAll, 60000);
window.addEventListener('resize', () => Object.values(charts).forEach(c => c?.resize()));
</script>
{% endblock %}
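
The radar chart in this template scales each axis by the larger of the two profile values, so both polygons fit in the range 0 to 1. The standalone sketch below isolates that step for clarity. The function name `normalizeProfiles` is hypothetical and does not exist in the template, and the sketch assumes non-negative feature values.

```javascript
// Hypothetical helper mirroring the maxVals logic used for the radar series.
function normalizeProfiles(keys, human, bot) {
  // Per-axis maximum across both profiles; fall back to 1 to avoid dividing by zero.
  const maxVals = keys.map(k => Math.max(human[k] || 0, bot[k] || 0) || 1);
  return {
    human: keys.map((k, i) => (human[k] || 0) / maxVals[i]),
    bot: keys.map((k, i) => (bot[k] || 0) / maxVals[i]),
  };
}

// Example with two axes: the larger value on each axis maps to 1.
const scaled = normalizeProfiles(
  ['avg_velocity', 'avg_fuzz'],
  {avg_velocity: 2, avg_fuzz: 0},
  {avg_velocity: 4, avg_fuzz: 0},
);
console.log(scaled.human, scaled.bot);
```

Because each axis is scaled independently, the chart shows the relative gap between the two profiles per feature, not absolute magnitudes across features.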