Merged
45 changes: 44 additions & 1 deletion docs/multi-repo.md
@@ -115,7 +115,50 @@ Agents can manage repos at runtime without CLI access:
 | `set_active_project` | Switch project scope for all subsequent queries |
 | `get_active_project` | Return current project name and repo list |
 
-All query tools (`search_symbols`, `get_symbol`, `find_usages`, `get_file_summary`, `get_call_chain`, `smart_context`) accept optional `repo`, `project`, and `ref` parameters for scoping. When an active project is set, it applies as the default scope.
+Locate, reach, and analyze query tools uniformly accept `repo`, `project`, `workspace`, and `scope` parameters for scoping (plus `ref` where reference tags apply). All are clamped to the session workspace — the hard isolation boundary. Default breadth now follows **tool intent** when `scope.intent_defaults` is enabled (the default); see [Tool scoping by intent](#tool-scoping-by-intent) below.
+
+For `analyze`, the overrides genuinely narrow its **graph-node** kinds — `dead_code`, `hotspots`, `cycles`, `health_score`, `todos`, `stale_code`, `ownership`, `coverage_gaps`, `coverage_summary`, `impact`, `bottlenecks`, `role`, `k8s_resources`, `images`, `kustomize`, `dbt_models`, `external_calls`, and the like — and, since v1, its **edge-walk / graph-algorithm / framework / file-AST-scan** kinds too (`channel_ops`, `pubsub`, `routes`, `models`, `pagerank`, `kcore`, `edge_audit`, `tests_as_edges`, `sast`, `review`, …), which prune their rows / re-tally their counts against the same workspace + repo allow-set. The narrowing also resolves the two kind-specific collisions: `kind=cross_repo` keeps `repo` as its boundary filter and `kind=cycles` keeps `scope` as a file-path / package prefix (both are stripped from the uniform scope-resolution view). **v1 caveat:** the remaining long-tail kinds — community detection (`clusters`, `concepts`, `suggest_boundaries`), git/disk-mining (`blame`, `coverage`, `fixes_history`, `retrieval_log`, `temporal_verify`), per-id (`would_create_cycle`, `def_use`), `synthesizers` / `resolution_outcomes`, and `sql_rebuild` — remain workspace-bound but are **not** repo-narrowed. Passing a narrowing arg on such a kind stamps a `scope_note` on the response disclosing the no-op.
+
+## Tool scoping by intent
+
+Tools are split by intent — each group has a different default scope:
+
+| Intent | Tools | Default scope |
+|--------|-------|---------------|
+| **Locate** ("where is X defined") | `search_symbols`, `search_text`, `find_files` | current repo |
+| **Reach** ("who consumes X") | `find_usages`, `get_callers`, `get_call_chain`, `contracts` | workspace |
+| **Analyze** | `analyze`, `review`, `sast` | workspace (graph-node + edge-walk / algorithm / framework / scan kinds narrow to `repo`/`project`/`scope`; community / git-mining / per-id kinds stay workspace-bound — see the caveat above) |
+
+Other query tools (`get_symbol`, `get_file_summary`, `smart_context`, etc.) keep their existing per-tool scope classification; the intent defaults above apply to the locate/reach/analyze groups listed in the table.
+
+### `scope.intent_defaults` config flag
+
+- Controls the intent-based default scoping described above
+- **Defaults ON** (enabled out of the box — this is the new behavior after upgrade)
+- **Narrow-only invariant:** the intent defaults only ever *narrow* within the session workspace (the hard isolation boundary); they never widen past it, and an explicit `repo` / `project` / `workspace` / `scope` arg always overrides the default
+- Opt out: set `scope.intent_defaults: false` in `.gortex.yaml`, or set env var `GORTEX_SCOPE_INTENT_DEFAULTS=0`
+
+**⚠ Upgrade note (behavior change):** When upgrading to this version:
+
+- Locate tools narrow their default: project → repo (you now need `repo:"*"` to search the whole workspace)
+- Reach tools widen their default: project → workspace (cross-repo callers surface automatically)
+- Restore the old behavior with `scope.intent_defaults: false` or `GORTEX_SCOPE_INTENT_DEFAULTS=0`
+
+### Widen sentinels
+
+When intent defaults are on, you can still widen or narrow explicitly:
+
+- `repo:"*"` — widen a locate tool back to the whole workspace
+- `project:<name>` — select the middle rung (explicit project scope)
+- `scope:<name>` — select a named saved scope
+
+### Uniform parameter set
+
+Every locate/reach/analyze tool now uniformly accepts `repo`, `project`, `workspace`, and `scope` parameters. All are clamped to the session workspace (the hard isolation boundary). For `analyze` this narrows the graph-node, edge-walk, graph-algorithm, framework, and file/AST-scan kinds; the remaining community / git-mining / per-id / synthesizer kinds are workspace-bound but not repo-narrowed in v1 (see the [MCP tools](#mcp-tools) caveat above).
+
+### Response metadata
+
+Scoped tool responses carry a `scope_applied` meta field plus a one-line widen hint naming an explicit override that re-broadens the result (e.g. `repo:"*"` for the whole workspace, or `project:<name>` / `scope:<name>` to re-scope to a deliberate rung). `analyze` additionally stamps a `scope_note` when a narrowing arg is passed to a kind that does not repo-narrow its rows in v1, so the no-op is self-documenting rather than silent.
 
 ## How it works
 
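Reviewer sketch: the default-scope rules in the docs change above can be summarised as a small decision function. This is only an illustration under stated assumptions: `Intent`, `resolveScope`, and the string labels it returns are hypothetical names that do not appear in the PR, the analyze group is left out because its pre-upgrade default is not stated, and the sketch assumes `repo:"*"` widens regardless of the flag (the docs only describe it with intent defaults on).

```go
package main

import "fmt"

// Intent is a hypothetical label for two of the tool groups in the docs table.
type Intent int

const (
	Locate Intent = iota // "where is X defined"
	Reach                // "who consumes X"
)

// resolveScope is an illustrative helper, not part of the PR. It returns a
// label for the scope a call ends up with, given the tool intent, the
// scope.intent_defaults flag, and an optional explicit repo argument.
func resolveScope(intent Intent, intentDefaults bool, repoArg string) string {
	switch {
	case repoArg == "*":
		// Widen sentinel: back to the whole workspace (assumed flag-independent).
		return "workspace"
	case repoArg != "":
		// An explicit argument always overrides the default.
		return "repo:" + repoArg
	case !intentDefaults:
		// Pre-upgrade default, per the upgrade note.
		return "project"
	case intent == Locate:
		return "repo"
	default:
		return "workspace"
	}
}

func main() {
	fmt.Println(resolveScope(Locate, true, ""))    // repo
	fmt.Println(resolveScope(Reach, true, ""))     // workspace
	fmt.Println(resolveScope(Locate, true, "*"))   // workspace
	fmt.Println(resolveScope(Locate, false, ""))   // project
	fmt.Println(resolveScope(Reach, true, "api"))  // repo:api
}
```

Every branch stays inside the session workspace, which is the narrow-only invariant the docs describe.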
Expand Down
22 changes: 22 additions & 0 deletions internal/config/config.go
@@ -470,6 +470,7 @@ type Config struct {
 	Index IndexConfig `mapstructure:"index" yaml:"index,omitempty"`
 	Watch WatchConfig `mapstructure:"watch" yaml:"watch,omitempty"`
 	Query QueryConfig `mapstructure:"query" yaml:"query,omitempty"`
+	Scope ScopeConfig `mapstructure:"scope" yaml:"scope,omitempty"`
 	Search SearchConfig `mapstructure:"search" yaml:"search,omitempty"`
 	// Embedding configures the semantic-search vector channel: the
 	// embedding provider plus the chunking / concurrency knobs the
@@ -1261,6 +1262,23 @@ type QueryConfig struct {
 	MaxDepth int `mapstructure:"max_depth" yaml:"max_depth,omitempty"`
 }
 
+type ScopeConfig struct {
+	IntentDefaults bool `mapstructure:"intent_defaults" yaml:"intent_defaults,omitempty"`
+}
+
+// MergeEnv overlays the scope-specific environment knobs on top of
+// file/default config values. Invalid values are ignored so a typo does not
+// silently disable scoped queries.
+func (c ScopeConfig) MergeEnv() ScopeConfig {
+	switch strings.ToLower(strings.TrimSpace(os.Getenv("GORTEX_SCOPE_INTENT_DEFAULTS"))) {
+	case "0", "false", "no":
+		c.IntentDefaults = false
+	case "1", "true", "yes":
+		c.IntentDefaults = true
+	}
+	return c
+}
+
 // SearchConfig configures the I13 11-signal rerank pipeline that
 // orders `search_symbols` / `winnow_symbols` results. The Weights
 // map is keyed by canonical signal name (rerank.SignalBM25,
@@ -1667,6 +1685,9 @@ func Default() *Config {
 			DefaultDepth: 3,
 			MaxDepth: 10,
 		},
+		Scope: ScopeConfig{
+			IntentDefaults: true,
+		},
 		MCP: MCPConfig{
 			Transport: "stdio",
 			Port: 8765,
@@ -1737,6 +1758,7 @@ func Load(configPath string) (*Config, error) {
 	if err := v.Unmarshal(cfg); err != nil {
 		return nil, err
 	}
+	cfg.Scope = cfg.Scope.MergeEnv()
 
 	if err := cfg.validateWorkspaceSchema(); err != nil {
 		return nil, err
152 changes: 106 additions & 46 deletions internal/indexer/multi.go
@@ -7,6 +7,7 @@ import (
 	"os"
 	"path/filepath"
 	"runtime"
+	"sort"
 	"strings"
 	"sync"
 	"time"
@@ -1896,6 +1897,68 @@ func (mi *MultiIndexer) GetIndexer(repoPrefix string) *Indexer {
 	return mi.indexers[repoPrefix]
 }
 
+type grepRepoJob struct {
+	prefix string
+	idx *Indexer
+}
+
+func (mi *MultiIndexer) grepRepoJobs(repoAllow map[string]bool) []grepRepoJob {
+	if mi == nil {
+		return nil
+	}
+	mi.mu.RLock()
+	defer mi.mu.RUnlock()
+	capHint := len(mi.indexers)
+	if repoAllow != nil {
+		capHint = len(repoAllow)
+	}
+	jobs := make([]grepRepoJob, 0, capHint)
+	for prefix, idx := range mi.indexers {
+		if idx == nil {
+			continue
+		}
+		if repoAllow != nil && !repoAllow[prefix] {
+			continue
+		}
+		jobs = append(jobs, grepRepoJob{prefix: prefix, idx: idx})
+	}
+	sort.Slice(jobs, func(i, j int) bool { return jobs[i].prefix < jobs[j].prefix })
+	return jobs
+}
+
+func singleAllowedRepo(repoAllow map[string]bool) (string, bool) {
+	if repoAllow == nil {
+		return "", false
+	}
+	var only string
+	count := 0
+	for prefix, allowed := range repoAllow {
+		if !allowed {
+			continue
+		}
+		only = prefix
+		count++
+	}
+	return only, count == 1
+}
+
+func stampGrepMatchPaths(prefix string, hits []trigram.Match) []trigram.Match {
+	if prefix == "" {
+		return hits
+	}
+	for i := range hits {
+		hits[i].Path = prefix + "/" + hits[i].Path
+	}
+	return hits
+}
+
+func capGrepMatches(matches []trigram.Match, limit int) []trigram.Match {
+	if limit > 0 && len(matches) > limit {
+		return matches[:limit]
+	}
+	return matches
+}
+
 // GrepText fans out a trigram-accelerated literal search across every
 // tracked per-repo Indexer and returns the union, capped at limit.
 // Match paths are re-stamped from repo-root-relative to repo-prefixed
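Reviewer sketch: two properties of the new helpers are easy to miss when reading the hunk. `singleAllowedRepo` ignores entries mapped to `false`, and `stampGrepMatchPaths` rewrites the slice it is given in place. The program below copies both helpers against a stand-in `Match` type; the real `trigram.Match` is not shown in this diff, so only its `Path` field is modelled here as an assumption.

```go
package main

import "fmt"

// Match is a stand-in for trigram.Match; only the Path field is modelled.
type Match struct {
	Path string
}

// singleAllowedRepo is copied from the hunk above.
func singleAllowedRepo(repoAllow map[string]bool) (string, bool) {
	if repoAllow == nil {
		return "", false
	}
	var only string
	count := 0
	for prefix, allowed := range repoAllow {
		if !allowed {
			continue
		}
		only = prefix
		count++
	}
	return only, count == 1
}

// stampGrepMatchPaths is copied from the hunk above, using the stand-in type.
func stampGrepMatchPaths(prefix string, hits []Match) []Match {
	if prefix == "" {
		return hits
	}
	for i := range hits {
		hits[i].Path = prefix + "/" + hits[i].Path
	}
	return hits
}

func main() {
	// An entry mapped to false does not count as allowed.
	only, ok := singleAllowedRepo(map[string]bool{"api": true, "web": false})
	fmt.Println(only, ok) // api true

	// Two allowed repos: the single-repo fast path does not apply.
	_, ok = singleAllowedRepo(map[string]bool{"api": true, "web": true})
	fmt.Println(ok) // false

	// The helper mutates its argument, so hits and stamped share storage.
	hits := []Match{{Path: "cmd/main.go"}}
	stamped := stampGrepMatchPaths("api", hits)
	fmt.Println(stamped[0].Path, hits[0].Path) // api/cmd/main.go api/cmd/main.go
}
```

Because the stamping is in place, a caller that kept a reference to the per-repo result would see prefixed paths too. In the hunks shown, the per-repo slice is not reused after stamping.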
@@ -1906,45 +1969,40 @@ func (mi *MultiIndexer) GetIndexer(repoPrefix string) *Indexer {
 // single-Indexer path (Indexer.GrepText) is used by callers without a
 // MultiIndexer.
 func (mi *MultiIndexer) GrepText(query string, limit int) []trigram.Match {
+	return capGrepMatches(mi.GrepTextForRepos(query, nil, limit), limit)
+}
+
+// GrepTextForRepos is the scoped variant of GrepText. When repoAllow is
+// non-nil, only those repo prefixes are searched. perRepoLimit caps each
+// searched repo independently; the returned union is intentionally not
+// globally capped so callers can apply path / graph-scope filters first.
+func (mi *MultiIndexer) GrepTextForRepos(query string, repoAllow map[string]bool, perRepoLimit int) []trigram.Match {
 	if mi == nil || query == "" {
 		return nil
 	}
-	mi.mu.RLock()
-	type job struct {
-		prefix string
-		idx *Indexer
-	}
-	jobs := make([]job, 0, len(mi.indexers))
-	for prefix, idx := range mi.indexers {
+	if prefix, ok := singleAllowedRepo(repoAllow); ok {
+		idx := mi.GetIndexer(prefix)
 		if idx == nil {
-			continue
+			return nil
 		}
-		jobs = append(jobs, job{prefix: prefix, idx: idx})
+		return stampGrepMatchPaths(prefix, idx.GrepText(query, perRepoLimit))
 	}
-	mi.mu.RUnlock()
 
-	// Per-repo cap mirrors the overall limit when set; the merge below
-	// applies the final cap so a small repo contributing 100 matches
-	// doesn't starve a larger one. Zero / negative means no per-repo
-	// cap (let each searcher return everything).
-	perCap := limit
+	// Per-repo cap mirrors the caller's page size when set. The caller
+	// applies the final cap after any path / graph-scope filters, so a
+	// repo outside those filters cannot consume the page first. Zero /
+	// negative means no per-repo cap (let each searcher return everything).
+	jobs := mi.grepRepoJobs(repoAllow)
 	out := make([]trigram.Match, 0, len(jobs)*8)
 	for _, j := range jobs {
-		hits := j.idx.GrepText(query, perCap)
+		hits := j.idx.GrepText(query, perRepoLimit)
 		if len(hits) == 0 {
 			continue
 		}
-		for i := range hits {
-			// Trigram emits forward-slash repo-relative paths. Stamp
-			// the repo prefix so downstream tools (resolveGraphPath,
-			// path-prefix filters) see the same shape they get from
-			// the graph nodes.
-			hits[i].Path = j.prefix + "/" + hits[i].Path
-		}
-		out = append(out, hits...)
-	}
-	if limit > 0 && len(out) > limit {
-		out = out[:limit]
+		// Trigram emits forward-slash repo-relative paths. Stamp the repo
+		// prefix so downstream tools (resolveGraphPath, path-prefix filters)
+		// see the same shape they get from the graph nodes.
+		out = append(out, stampGrepMatchPaths(j.prefix, hits)...)
 	}
 	return out
 }
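Reviewer sketch: the new comment says the union is left uncapped so callers can filter before applying the page limit. The toy program below shows why the order matters. `Match`, `capMatches`, and `filterByPrefix` are hypothetical stand-ins written for this note; the sample paths are made up and the real caller-side filters are not part of this diff.

```go
package main

import (
	"fmt"
	"strings"
)

// Match is a stand-in for trigram.Match; only Path is modelled.
type Match struct {
	Path string
}

// capMatches mirrors the shape of capGrepMatches in the PR.
func capMatches(matches []Match, limit int) []Match {
	if limit > 0 && len(matches) > limit {
		return matches[:limit]
	}
	return matches
}

// filterByPrefix is a hypothetical caller-side path filter.
func filterByPrefix(matches []Match, prefix string) []Match {
	out := make([]Match, 0, len(matches))
	for _, m := range matches {
		if strings.HasPrefix(m.Path, prefix) {
			out = append(out, m)
		}
	}
	return out
}

func main() {
	// Union in repo-prefix order, as grepRepoJobs sorts jobs by prefix.
	union := []Match{
		{Path: "api/a.go"}, {Path: "api/b.go"},
		{Path: "web/x.go"}, {Path: "web/y.go"},
	}
	const limit = 2

	// Old order: cap the union, then filter. The "api" hits fill the page.
	capFirst := filterByPrefix(capMatches(union, limit), "web/")

	// New order: filter first, then cap.
	filterFirst := capMatches(filterByPrefix(union, "web/"), limit)

	fmt.Println(len(capFirst), len(filterFirst)) // 0 2
}
```

With the cap applied first, a repo that sorts earlier can consume the whole page and the filtered result comes back empty even though matching hits exist.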
@@ -1958,27 +2016,35 @@ func (mi *MultiIndexer) GrepText(query string, limit int) []trigram.Match {
 // compile in any indexer is reported once; per-indexer errors after
 // the first compile are otherwise treated as no-match.
 func (mi *MultiIndexer) GrepRegexp(pattern, pathPrefix string, limit int) ([]trigram.Match, error) {
+	hits, err := mi.GrepRegexpForRepos(pattern, pathPrefix, nil, limit)
+	if err != nil {
+		return nil, err
+	}
+	return capGrepMatches(hits, limit), nil
+}
+
+// GrepRegexpForRepos is the scoped variant of GrepRegexp. repoAllow and
+// perRepoLimit have the same semantics as GrepTextForRepos.
+func (mi *MultiIndexer) GrepRegexpForRepos(pattern, pathPrefix string, repoAllow map[string]bool, perRepoLimit int) ([]trigram.Match, error) {
 	if mi == nil || pattern == "" {
 		return nil, nil
 	}
-	mi.mu.RLock()
-	type job struct {
-		prefix string
-		idx *Indexer
-	}
-	jobs := make([]job, 0, len(mi.indexers))
-	for prefix, idx := range mi.indexers {
+	if prefix, ok := singleAllowedRepo(repoAllow); ok {
+		idx := mi.GetIndexer(prefix)
 		if idx == nil {
-			continue
+			return nil, nil
 		}
+		hits, err := idx.GrepRegexp(pattern, pathPrefix, perRepoLimit)
+		if err != nil {
+			return nil, err
+		}
-		jobs = append(jobs, job{prefix: prefix, idx: idx})
+		return stampGrepMatchPaths(prefix, hits), nil
 	}
-	mi.mu.RUnlock()
 
-	perCap := limit
+	jobs := mi.grepRepoJobs(repoAllow)
 	out := make([]trigram.Match, 0, len(jobs)*8)
 	for _, j := range jobs {
-		hits, err := j.idx.GrepRegexp(pattern, pathPrefix, perCap)
+		hits, err := j.idx.GrepRegexp(pattern, pathPrefix, perRepoLimit)
 		if err != nil {
 			// First compile error short-circuits — the pattern is the
 			// caller's fault and won't compile in any other indexer
@@ -1988,13 +2054,7 @@ func (mi *MultiIndexer) GrepRegexp(pattern, pathPrefix string, limit int) ([]tri
 		if len(hits) == 0 {
 			continue
 		}
-		for i := range hits {
-			hits[i].Path = j.prefix + "/" + hits[i].Path
-		}
-		out = append(out, hits...)
-	}
-	if limit > 0 && len(out) > limit {
-		out = out[:limit]
+		out = append(out, stampGrepMatchPaths(j.prefix, hits)...)
 	}
 	return out, nil
 }