Development¶

Daily Validation Loop¶

cargo build
cargo test
cargo clippy --all-targets --all-features -- -D warnings

One-command validation workflow¶

CGREP_BIN=cgrep bash scripts/validate_all.sh

This single workflow verifies: - core indexing/search - incremental update path (--print-diff) - agent planning flow (agent plan) - status/search stats payload checks (json2 --compact) - doctor flow (scripts/doctor.sh) when repository-integrations files are present - docs local-link sanity checks (README + docs hub files)

Docs Site Validation (GitHub Pages)¶

Local preview:

mkdocs serve

Strict build parity with CI (docs-pages workflow):

mkdocs build --strict

Performance Gate¶

python3 scripts/index_perf_gate.py \
  --baseline-bin /path/to/baseline/cgrep \
  --candidate-bin /path/to/candidate/cgrep \
  --runs 3 \
  --warmup 1 \
  --files 1200

python3 scripts/agent_plan_perf_gate.py \
  --baseline-bin /path/to/baseline/cgrep \
  --candidate-bin /path/to/candidate/cgrep \
  --runs 5 \
  --warmup 2 \
  --files 800

Run this after search/indexing-related changes. Performance gate tracks latency p50/p95 for: - fresh worktree index latency with --reuse off - first keyword search latency after --reuse off - incremental index update latency after a small tracked-file change (--reuse off) - fresh worktree index latency with --reuse strict - fresh worktree index latency with --reuse auto - first keyword search latency after --reuse strict - first keyword search latency after --reuse auto - agent plan latency for simple identifier-like query - agent plan latency for complex phrase-like query - agent plan end-to-end latency on expand-heavy query

Methodology: - --warmup executes non-reported warmup runs per metric. - --runs executes measured runs per metric. - p50 uses median. - p95 uses nearest-rank percentile over measured runs.

CI thresholds (median): - search regression > 5%: fail - cold index build regression > 10%: fail - incremental/reuse update regression > 10%: fail - agent plan regression > 10%: fail - small absolute deltas (<= 3ms) are treated as noise for agent-plan perf checks

Tag-Triggered Release CI¶

release workflow (.github/workflows/release.yml) runs when a release tag is pushed. Accepted tag forms: - vMAJOR.MINOR.PATCH (for example v1.5.2) - MAJOR.MINOR.PATCH (for example 1.5.2)

git tag v1.5.2
git push origin v1.5.2

or

git tag 1.5.2
git push origin 1.5.2

Manual fallback (workflow_dispatch) uses the selected commit and publishes with the provided tag input.

Release-Ready Checklist¶

Build passes (cargo build)
Tests pass (cargo test)
Clippy clean (-D warnings)
Validation workflow passes (scripts/validate_all.sh)
Performance gates pass (scripts/index_perf_gate.py, scripts/agent_plan_perf_gate.py)
Docs updated for CLI/behavior changes

Benchmark: Agent Token Efficiency (PyTorch)¶

python3 scripts/benchmark_agent_token_efficiency.py --repo /path/to/pytorch

Tier tuning:

python3 scripts/benchmark_agent_token_efficiency.py \
  --repo /path/to/pytorch \
  --baseline-file-tiers 2,4,6,8,12 \
  --cgrep-expand-tiers 1,2,4,6,8

Outputs: - docs/benchmarks/pytorch-agent-token-efficiency.md - local/benchmarks/pytorch-agent-token-efficiency.json (local-only)

Benchmark: Codex Real-Agent Efficiency (PyTorch)¶

python3 scripts/benchmark_codex_agent_efficiency.py \
  --repo /path/to/pytorch \
  --cgrep-bin /path/to/cgrep \
  --model gpt-5-codex \
  --reasoning-effort medium \
  --runs 2

Tracks: - input_tokens, cached_input_tokens, output_tokens - billable_tokens = input - cached_input + output - success/failure under command-policy constraints - scenario set includes: autograd, TensorIterator, PythonArgParser, DispatchKeySet, CUDAGraph, addmm - prompt includes scenario-specific starter hints: - baseline: focused rg starter (grep_pattern) - cgrep: high-signal search/definition starters (cgrep_commands) + compact/scoped recommendation

Outputs: - docs/benchmarks/pytorch-codex-agent-efficiency.md - local/benchmarks/pytorch-codex-agent-efficiency.json (local-only)

Codex single-run variance can be high; prefer multi-run medians (--runs >= 2) for release decisions.

Benchmark: Search Option Performance (PyTorch)¶

python3 scripts/benchmark_search_option_performance.py \
  --repo /path/to/pytorch \
  --cgrep-bin /path/to/cgrep \
  --runs 5 \
  --warmup 1

Covers practical search option/scenario pairs, including: - scoped keyword search - --type, --glob, -C, -B, -P - payload-focused flags (--path-alias, --dedupe-context, --suppress-boilerplate) - scan-mode comparisons (--no-index, --regex --no-index)

Outputs: - docs/benchmarks/pytorch-search-options-performance.md - local/benchmarks/pytorch-search-options-performance.json (local-only)