Commit b836e15

Merge pull request #7 from nyxCore-Systems/develop
feat: v2.0.0 — ExplainMatch + model provenance
2 parents 89c6e8c + 5c478d5 commit b836e15

9 files changed

Lines changed: 389 additions & 48 deletions

CHANGELOG.md

Lines changed: 14 additions & 0 deletions
@@ -4,6 +4,20 @@ All notable changes to this project are documented here.

 ---

+## [2.0.0] — 2026-04-13
+
+### Added
+
+**v2.0 — Semantic explainability + model provenance**
+
+- **`ExplainMatch { query, result_uri, top_k, chunk_lines, model }`** — explain *why* `result_uri` ranked as a strong semantic match. Chunks `result_uri`'s source into `chunk_lines`-line windows, embeds each in one batch call, cosine-scores each against the query embedding (cached for URI queries; embedded on the fly for text queries), and returns the top-`top_k` chunks with `(start_line, end_line, chunk_text, score)`. Turns "this file is relevant" into "these specific lines are relevant." Not permitted inside `BatchQuery` (requires HTTP). Returns `ExplainMatchResult { chunks: Vec<ExplanationChunk>, query_model }`.
+- **Model provenance** — every `set_file_embedding` and `set_symbol_embedding` now records the model name that produced the vector. The name is supplied by the `EmbeddingBatch` handler from `embed_texts`'s return value, so it reflects the model actually used (not just what was configured). `QueryFileStatus` now returns `embedding_model: Option<String>`. `QueryIndexStatus` now returns `mixed_models: bool` and `models_in_index: Vec<String>` — clients can warn users when a model upgrade left the index with mixed-model vectors, making cosine scores unreliable across the boundary.
+- **New wire types**: `ExplanationChunk { start_line, end_line, chunk_text, score }`.
+- **1 new MCP tool**: `lip_explain_match`.
+- **MCP updates**: `lip_file_status` response now includes `embedding_model`; `lip_index_status` response now includes `mixed_models` flag and `models_in_index` list with a `⚠ MIXED MODELS` warning in text output.
+
+---
+
 ## [1.9.0] — 2026-04-13

 ### Added
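
The chunking step described in the `ExplainMatch` entry can be sketched in isolation. This is a minimal illustration, not the daemon's code: `chunk_windows` is a hypothetical helper name, and it assumes the same conventions as the handler in this commit (0-based, inclusive line indices and a fallback window of 20 lines when `chunk_lines` is 0).

```rust
/// Split `source` into consecutive windows of `chunk_lines` lines.
/// Returns `(start_line, end_line, chunk_text)` with 0-based, inclusive
/// line indices. `chunk_windows` is a hypothetical name used for illustration.
fn chunk_windows(source: &str, chunk_lines: usize) -> Vec<(u32, u32, String)> {
    // `slice::chunks` panics on a size of 0, so fall back to 20 lines,
    // the same default the handler applies.
    let size = if chunk_lines == 0 { 20 } else { chunk_lines };
    let lines: Vec<&str> = source.lines().collect();
    lines
        .chunks(size)
        .enumerate()
        .map(|(i, window)| {
            let start = i * size;
            // The last window may be shorter than `size`.
            let end = start + window.len() - 1;
            (start as u32, end as u32, window.join("\n"))
        })
        .collect()
}
```

For a five-line source and a window of 2, this yields the windows `(0, 1)`, `(2, 3)`, and `(4, 4)`. A client that displays 1-based line numbers would need to add 1 to both indices.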

Cargo.lock

Lines changed: 3 additions & 3 deletions
Some generated files are not rendered by default.

Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ members = [
 ]

 [workspace.package]
-version = "1.9.0"
+version = "2.0.0"
 edition = "2021"
 authors = ["Lisa Welsch <lisa@tastehub.io>"]
 license = "MIT"

README.md

Lines changed: 2 additions & 1 deletion
@@ -236,6 +236,7 @@ lip query export-embeddings file:///src/auth.rs file:///src/session.rs --output
 | `lip_prune_deleted` | Remove index entries for files no longer on disk (v1.8) |
 | `lip_get_centroid` | Server-side embedding centroid of a file set (v1.9) |
 | `lip_stale_embeddings` | Files whose embedding is older than their current mtime (v1.9) |
+| `lip_explain_match` | Why a result matched: top-scoring chunks of `result_uri` against a query (v2.0) |

 **Recommended agent workflow before modifying code:**
 1. `lip_workspace_symbols` — find URIs for all symbols you plan to touch
@@ -377,7 +378,7 @@ Requires Rust 1.78+. No system `protoc` required.

 ## Status

-v1.9 Connective tissue: `filter` glob + `min_score` threshold on all nearest-neighbour calls, `GetCentroid` (server-side mean embedding, no raw vectors shipped), `QueryStaleEmbeddings` (mtime vs indexed-at freshness probe). v1.8: `FindBoundaries`, `SemanticDiff`, `QueryNearestInStore` (cross-repo federation), `QueryNoveltyScore`, `ExtractTerminology`, `PruneDeleted`. v1.7: 6 semantic retrieval primitives. v1.6: `ReindexFiles`, `Similarity`, `QueryExpansion`, `Cluster`, `ExportEmbeddings`. Wire format is JSON; FlatBuffers IPC is planned for v2.0 (see roadmap).
+v2.0 `ExplainMatch` (chunk-level explanation: which lines in a result file drove the match), model provenance (`FileStatus` exposes the embedding model per file; `IndexStatus` warns when the index contains mixed-model vectors). v1.9: `filter` glob + `min_score` on all NN calls, `GetCentroid`, `QueryStaleEmbeddings`. v1.8: `FindBoundaries`, `SemanticDiff`, `QueryNearestInStore` (cross-repo federation), `QueryNoveltyScore`, `ExtractTerminology`, `PruneDeleted`. v1.7: 6 semantic retrieval primitives. v1.6: `ReindexFiles`, `Similarity`, `QueryExpansion`, `Cluster`, `ExportEmbeddings`. Wire format is JSON.

 ---
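
The "top-scoring chunks" ranking behind `lip_explain_match` comes down to cosine similarity followed by a sort and truncate. The sketch below shows that idea on plain vectors. It is an illustration under assumptions, not the project's implementation: `cosine` and `top_k_chunks` are hypothetical names, and it sorts with `f32::total_cmp` where the handler in this commit uses `partial_cmp` with an `Equal` fallback.

```rust
/// Cosine similarity of two vectors. Returns `None` when the lengths
/// differ or either vector has zero norm, the same cases the handler skips.
fn cosine(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

/// Rank chunk vectors against `query` and keep the best `k`.
/// Returns `(chunk_index, score)` pairs, highest score first.
/// Hypothetical helper: chunks that cannot be scored are dropped.
fn top_k_chunks(query: &[f32], chunks: &[Vec<f32>], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = chunks
        .iter()
        .enumerate()
        .filter_map(|(i, v)| cosine(query, v).map(|s| (i, s)))
        .collect();
    // Descending by score; `total_cmp` gives a total order over f32.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

With the query `[1.0, 0.0]` and chunk vectors `[0.0, 1.0]`, `[1.0, 0.0]`, `[1.0, 1.0]`, `[0.0, 0.0]`, asking for the top 2 returns chunk 1 (score 1.0) followed by chunk 2 (score about 0.707). The all-zero chunk is dropped because its norm is zero.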
bindings/rust/src/daemon/session.rs

Lines changed: 155 additions & 10 deletions
@@ -506,9 +506,9 @@ impl Session {
                     let mut db = self.db.lock().await;
                     for (uri, vec) in miss_uris.iter().zip(new_vecs.iter()) {
                         if uri.starts_with("lip://") {
-                            db.set_symbol_embedding(uri, vec.clone());
+                            db.set_symbol_embedding(uri, vec.clone(), &used_model);
                         } else {
-                            db.set_file_embedding(uri, vec.clone());
+                            db.set_file_embedding(uri, vec.clone(), &used_model);
                         }
                     }
                 }
@@ -564,22 +564,28 @@ impl Session {
                     .as_ref()
                     .as_ref()
                     .map(|c| c.default_model().to_owned());
+                let models_in_index = db.file_embedding_model_names();
+                let mixed_models = models_in_index.len() > 1;
                 ServerMessage::IndexStatusResult {
                     indexed_files,
                     pending_embedding_files: pending,
                     last_updated_ms: last_ms,
                     embedding_model,
+                    mixed_models,
+                    models_in_index,
                 }
             }

             ClientMessage::QueryFileStatus { uri } => {
                 let db = self.db.lock().await;
                 let (indexed, has_embedding, age_seconds) = db.file_status(&uri);
+                let embedding_model = db.file_embedding_model(&uri).map(str::to_owned);
                 ServerMessage::FileStatusResult {
                     uri,
                     indexed,
                     has_embedding,
                     age_seconds,
+                    embedding_model,
                 }
             }

@@ -706,19 +712,20 @@ impl Session {
                     };
                     // Embed — no db lock held during HTTP call.
                     let texts = vec![embed_text];
-                    let (mut vecs, _) = match client.embed_texts(&texts, model.as_deref()).await {
-                        Ok(r) => r,
-                        Err(e) => {
-                            return ServerMessage::Error {
-                                message: format!("embedding failed: {e}"),
+                    let (mut vecs, sym_model) =
+                        match client.embed_texts(&texts, model.as_deref()).await {
+                            Ok(r) => r,
+                            Err(e) => {
+                                return ServerMessage::Error {
+                                    message: format!("embedding failed: {e}"),
+                                }
                             }
-                        }
-                    };
+                        };
                     let v = vecs.pop().unwrap_or_default();
                     // Cache the computed vector for future calls.
                     {
                         let mut db = self.db.lock().await;
-                        db.set_symbol_embedding(&symbol_uri, v.clone());
+                        db.set_symbol_embedding(&symbol_uri, v.clone(), &sym_model);
                     }
                     v
                 };
@@ -1362,6 +1369,134 @@ impl Session {
                 }
                 ServerMessage::StaleEmbeddingsResult { uris: stale }
             }
+
+            // ── v2.0: ExplainMatch ────────────────────────────────────────
+            ClientMessage::ExplainMatch {
+                query,
+                result_uri,
+                top_k,
+                chunk_lines,
+                model,
+            } => {
+                let Some(client) = self.embedding_client.as_ref().as_ref() else {
+                    return ServerMessage::Error {
+                        message: "embedding not configured — set LIP_EMBEDDING_URL".into(),
+                    };
+                };
+                let effective_top_k = if top_k == 0 { 5 } else { top_k };
+                let chunk_size = if chunk_lines == 0 { 20 } else { chunk_lines };
+
+                // Resolve the query embedding.
+                let (query_vec, query_model) = {
+                    let db = self.db.lock().await;
+                    if let Some(v) = db.get_file_embedding(&query) {
+                        let m = db
+                            .file_embedding_model(&query)
+                            .unwrap_or_else(|| client.default_model())
+                            .to_owned();
+                        (v.clone(), m)
+                    } else {
+                        drop(db);
+                        // Not a cached URI — treat as free-text query.
+                        let texts = vec![query];
+                        match client.embed_texts(&texts, model.as_deref()).await {
+                            Ok((mut vecs, m)) => (vecs.pop().unwrap_or_default(), m),
+                            Err(e) => {
+                                return ServerMessage::Error {
+                                    message: format!("embedding failed: {e}"),
+                                }
+                            }
+                        }
+                    }
+                };
+
+                if query_vec.is_empty() {
+                    return ServerMessage::Error {
+                        message: "could not obtain query embedding".into(),
+                    };
+                }
+
+                // Read source text for result_uri.
+                let source = {
+                    let db = self.db.lock().await;
+                    db.file_source_text(&result_uri).unwrap_or_default()
+                };
+                if source.is_empty() {
+                    return ServerMessage::ExplainMatchResult {
+                        chunks: vec![],
+                        query_model,
+                    };
+                }
+
+                // Chunk the source.
+                let lines: Vec<&str> = source.lines().collect();
+                let raw_chunks: Vec<(u32, u32, String)> = lines
+                    .chunks(chunk_size)
+                    .enumerate()
+                    .map(|(i, chunk_lines_slice)| {
+                        let start = (i * chunk_size) as u32;
+                        let end = (start as usize + chunk_lines_slice.len() - 1) as u32;
+                        (start, end, chunk_lines_slice.join("\n"))
+                    })
+                    .collect();
+
+                if raw_chunks.is_empty() {
+                    return ServerMessage::ExplainMatchResult {
+                        chunks: vec![],
+                        query_model,
+                    };
+                }
+
+                // Embed all chunks in one call.
+                let chunk_texts: Vec<String> =
+                    raw_chunks.iter().map(|(_, _, t)| t.clone()).collect();
+                let (chunk_vecs, chunk_model) =
+                    match client.embed_texts(&chunk_texts, model.as_deref()).await {
+                        Ok(r) => r,
+                        Err(e) => {
+                            return ServerMessage::Error {
+                                message: format!("embedding failed: {e}"),
+                            }
+                        }
+                    };
+                let _ = chunk_model; // we report query_model, not per-chunk model
+
+                // Score each chunk against the query vector.
+                let q_norm: f32 = query_vec.iter().map(|x| x * x).sum::<f32>().sqrt();
+                let mut scored: Vec<crate::query_graph::types::ExplanationChunk> = raw_chunks
+                    .into_iter()
+                    .zip(chunk_vecs.into_iter())
+                    .filter_map(|((start_line, end_line, chunk_text), vec)| {
+                        if vec.len() != query_vec.len() || q_norm == 0.0 {
+                            return None;
+                        }
+                        let v_norm: f32 = vec.iter().map(|x| x * x).sum::<f32>().sqrt();
+                        if v_norm == 0.0 {
+                            return None;
+                        }
+                        let dot: f32 = query_vec.iter().zip(vec.iter()).map(|(a, b)| a * b).sum();
+                        let score = dot / (q_norm * v_norm);
+                        Some(crate::query_graph::types::ExplanationChunk {
+                            start_line,
+                            end_line,
+                            chunk_text,
+                            score,
+                        })
+                    })
+                    .collect();
+
+                scored.sort_by(|a, b| {
+                    b.score
+                        .partial_cmp(&a.score)
+                        .unwrap_or(std::cmp::Ordering::Equal)
+                });
+                scored.truncate(effective_top_k);
+
+                ServerMessage::ExplainMatchResult {
+                    chunks: scored,
+                    query_model,
+                }
+            }
         }
     }
 }
@@ -1523,21 +1658,27 @@ fn process_query_sync(
         // Status queries are read-only and safe inside a batch.
         ClientMessage::QueryIndexStatus => {
             let (indexed_files, pending, last_ms) = db.index_status();
+            let models_in_index = db.file_embedding_model_names();
+            let mixed_models = models_in_index.len() > 1;
             ok(ServerMessage::IndexStatusResult {
                 indexed_files,
                 pending_embedding_files: pending,
                 last_updated_ms: last_ms,
                 embedding_model: None, // no client reference available in sync context
+                mixed_models,
+                models_in_index,
             })
         }

         ClientMessage::QueryFileStatus { uri } => {
             let (indexed, has_embedding, age_seconds) = db.file_status(&uri);
+            let embedding_model = db.file_embedding_model(&uri).map(str::to_owned);
             ok(ServerMessage::FileStatusResult {
                 uri,
                 indexed,
                 has_embedding,
                 age_seconds,
+                embedding_model,
             })
         }

@@ -1899,6 +2040,10 @@ fn process_query_sync(
         ClientMessage::QueryStaleEmbeddings { .. } => {
             err("QueryStaleEmbeddings requires filesystem I/O; not permitted in BatchQuery")
         }
+
+        ClientMessage::ExplainMatch { .. } => {
+            err("ExplainMatch requires async HTTP; not permitted in BatchQuery")
+        }
     }
 }
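
Both `QueryIndexStatus` handlers in this file derive `mixed_models` from `models_in_index.len() > 1`. The body of `db.file_embedding_model_names()` is not part of this diff, so the sketch below only shows one plausible way to produce such a list from per-file model names. `distinct_models` and `is_mixed` are hypothetical helpers, and the assumption that files without a recorded model are skipped is mine, not the commit's.

```rust
use std::collections::BTreeSet;

/// Collect the distinct model names recorded across per-file embeddings.
/// `None` stands for a file with no recorded model and is skipped
/// (an assumption; the real `file_embedding_model_names` is not shown here).
fn distinct_models<'a, I>(per_file: I) -> Vec<String>
where
    I: IntoIterator<Item = Option<&'a str>>,
{
    // A BTreeSet deduplicates and yields names in sorted order.
    let names: BTreeSet<&str> = per_file.into_iter().flatten().collect();
    names.into_iter().map(str::to_owned).collect()
}

/// Same rule as the handlers: more than one distinct model means mixed.
fn is_mixed(models_in_index: &[String]) -> bool {
    models_in_index.len() > 1
}
```

Given per-file values `Some("model-a")`, `None`, `Some("model-b")`, `Some("model-a")` (placeholder names), `distinct_models` returns `["model-a", "model-b"]` and `is_mixed` reports `true`. A client seeing that result could prompt the user to re-embed before trusting cross-file cosine scores.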