Open Source · MIT License

Sonde

An MCP-native topic registry for GitHub, arXiv, Hugging Face, RSS, and web monitors. Declare what you're watching, simulate before you collect, and give agents governed access to fresh signal with full lineage.

COLLECTION_FLOW

01 Declare topic intent in YAML

02 Lint, dedupe, simulate

03 Run to produce a versioned manifest

04 Trace every artifact to its origin

Not a Scraper. The Layer Above.

Most collection systems hide their actual intent in scattered YAML, scripts, seed URLs, RSS lists, API queries, and watchlists. Sonde turns that hidden layer into a governed topic pack that can live in Git and produce reproducible run manifests.

Every collected artifact answers: which topic and topic version produced this, which config hash, which adapter and source query, when it was collected, and what raw or normalized source record it came from.
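The lineage questions above map naturally onto a small record type. The following is a minimal sketch, not Sonde's actual schema: the field names, the `config_hash` helper, and the placeholder record path are all assumptions made for illustration. It hashes a canonical JSON form of the topic config so that key order does not change the hash.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


def config_hash(topic_config: dict) -> str:
    """Hash a topic config via canonical JSON so key order does not matter."""
    canonical = json.dumps(topic_config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class LineageRecord:
    # Hypothetical field names; each one answers a question from the text above.
    topic_id: str           # which topic produced this
    topic_version: str      # which topic version
    config_hash: str        # which config hash
    adapter: str            # which adapter
    source_query: str       # which source query
    collected_at: str       # when it was collected (ISO 8601, UTC)
    source_record_ref: str  # which raw or normalized record it came from


topic = {
    "id": "agent_security_model",
    "version": "1.0.0",
    "queries": ["agent security model", "AI agent permissioning"],
}

record = LineageRecord(
    topic_id=topic["id"],
    topic_version=topic["version"],
    config_hash=config_hash(topic),
    adapter="arxiv",
    source_query=topic["queries"][0],
    collected_at=datetime.now(timezone.utc).isoformat(),
    source_record_ref="raw/arxiv/example-record.json",  # placeholder, not a real path
)

print(json.dumps(asdict(record), indent=2))
```

Hashing the canonical form rather than the raw file means that reordering keys or reformatting the YAML would not register as a config change under this scheme.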

CLI Commands

sonde init

Scaffold a new topic configuration

sonde lint

Validate topic schemas and detect issues

sonde dedupe

Find duplicate or near-duplicate queries

sonde diff

Compare two config versions side-by-side

sonde simulate

Sample expected yield and noise for a topic

sonde run

Execute collection with manifest generation

sonde status

View run history and registry state

sonde export

Package topics as portable topic packs

sonde mcp

Start MCP server for agent access

sonde version

Print version information

Source Adapters

GitHub

Repository search API

arXiv

Atom API

Hugging Face

Hub API

RSS

Public feeds

Local JSONL

Offline fixtures

Declarative Topics

Topics are versioned YAML objects with explicit intent, queries, negative terms, source bindings, schedules, and ownership. Everything that drives collection lives in one inspectable place.

Lint catches schema errors. Dedupe catches query overlap. Diff shows what changed between versions. Simulate shows what would be collected before you commit.

topics:
  - id: "agent_security_model"
    intent: "Track emerging work on identity,
        permissioning, and threat models
        for AI agents."
    version: "1.0.0"
    priority: high
    queries:
      - "agent security model"
      - "AI agent permissioning"
    negative_terms:
      - "real estate agent"
    schedule:
      interval_minutes: 120
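
To make the roles of `queries` and `negative_terms` concrete, here is a rough sketch of how query overlap and noise filtering could work. It assumes token-level Jaccard similarity and plain substring matching; Sonde's real dedupe and filtering logic may differ, and the function names and the 0.5 threshold are hypothetical.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase a string and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Token-set similarity: size of intersection over size of union."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def overlapping_queries(queries: list[str], threshold: float = 0.5):
    """Return query pairs whose similarity meets the (assumed) threshold."""
    pairs = []
    for i, first in enumerate(queries):
        for second in queries[i + 1:]:
            score = jaccard(first, second)
            if score >= threshold:
                pairs.append((first, second, round(score, 2)))
    return pairs


def is_noise(title: str, negative_terms: list[str]) -> bool:
    """Flag a candidate whose title contains any negative term."""
    lowered = title.lower()
    return any(term.lower() in lowered for term in negative_terms)


queries = [
    "agent security model",
    "AI agent permissioning",
    "AI agent security model",  # extra query added here to trigger an overlap
]
negative_terms = ["real estate agent"]

print(overlapping_queries(queries))
print(is_noise("Top real estate agent tips", negative_terms))
print(is_noise("A threat model for AI agents", negative_terms))
```

In this sketch the third query shares three of four tokens with the first, so the pair is flagged, while the real-estate title is dropped by the negative term.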

MCP Surface

15 tools, 13 resources, and 6 prompts. Agents get governed access to the full topic lifecycle — from drafting and validation to collection and lineage inspection.

Tools

lint_topics

Validate a topic config

dedupe_topics

Find duplicate and overlapping topics

find_semantic_overlap

Detect semantic overlap between topics

diff_topics

Compare two topic configs

simulate_topic

Sample expected yield and noise

estimate_collection_cost

Estimate API requests, artifacts, storage

run_topic_dry_run

Execute a dry run and return the manifest

create_topic_draft

Create a new draft topic

update_topic_draft

Modify a topic and return the diff

deprecate_topic

Transition topic to deprecated

promote_topic

Promote a draft to active

rollback_topic_version

Roll back to a previous version

generate_aliases

Generate query aliases from intent

generate_negative_terms

Generate negative terms to reduce noise

summarize_topic_health

Report on yield, noise, staleness, and coverage

Resources

sonde://topics

All topics (summary)

sonde://topics/{id}

Full topic definition

sonde://topics/{id}/versions

Version history

sonde://topics/{id}/quality

Quality metrics

sonde://sources

All configured source IDs

sonde://runs

Recent collection runs

sonde://runs/{id}

Full run manifest

sonde://artifacts/{id}

Single artifact with lineage

sonde://lineage/artifact/{id}

Lineage chain for an artifact

sonde://diffs/{from}/{to}

Diff between topic versions

sonde://schema/topic

Topic JSON schema

sonde://schema/artifact

Artifact JSON schema
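
The resource URIs above follow a simple template pattern with `{placeholder}` segments. As an illustration of how a client might route a concrete URI to its template, here is a small sketch. The `resolve` and `compile_template` helpers are hypothetical and not part of Sonde or any MCP SDK; only the URI templates themselves come from the list above.

```python
import re

# URI templates copied from the resource list above.
RESOURCE_TEMPLATES = [
    "sonde://topics",
    "sonde://topics/{id}",
    "sonde://topics/{id}/versions",
    "sonde://topics/{id}/quality",
    "sonde://sources",
    "sonde://runs",
    "sonde://runs/{id}",
    "sonde://artifacts/{id}",
    "sonde://lineage/artifact/{id}",
    "sonde://diffs/{from}/{to}",
    "sonde://schema/topic",
    "sonde://schema/artifact",
]


def compile_template(template: str) -> re.Pattern:
    """Turn each {name} placeholder into a named group matching one path segment."""
    # re.split with a capture group alternates literal text and placeholder names.
    parts = re.split(r"\{(\w+)\}", template)
    pattern = "".join(
        re.escape(part) if index % 2 == 0 else f"(?P<{part}>[^/]+)"
        for index, part in enumerate(parts)
    )
    return re.compile(pattern)


COMPILED = [(template, compile_template(template)) for template in RESOURCE_TEMPLATES]


def resolve(uri: str):
    """Return (template, params) for the first matching template, else None."""
    for template, pattern in COMPILED:
        match = pattern.fullmatch(uri)
        if match:
            return template, match.groupdict()
    return None


print(resolve("sonde://topics/agent_security_model/versions"))
print(resolve("sonde://diffs/1.0.0/1.1.0"))
print(resolve("sonde://unknown"))
```

Because each placeholder matches exactly one path segment, a URI such as `sonde://topics/agent_security_model/versions` resolves only to the versions template and never to `sonde://topics/{id}`.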

Prompts

review_topic_quality

Review yield, noise, overlap, and versioning

create_collection_strategy

Design a collection strategy for a domain

expand_topic_aliases

Expand aliases and negative terms

deprecate_noisy_topic

Draft a deprecation decision

write_signal_report

Summarize recent signal for a topic

recommend_deprecations

Identify candidates for deprecation

Full Lineage

Versioned Manifests

Every run produces a manifest under .sonde/artifacts/manifests/ with config hash, topic version, and timestamps.

SQLite Registry

Local database at .sonde/sonde.db tracks all runs, their parameters, and outcomes.

Normalized Artifacts

Collected items are appended to .sonde/artifacts/normalized/artifacts.jsonl with full provenance.
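
The three pieces above (run registry, manifests, and the append-only artifact log) can be pictured with a short sketch. Everything below is an assumption made for illustration: the `runs` table layout, the artifact fields, and the helper names are hypothetical, and the example writes to a temporary directory with an in-memory database rather than to a real `.sonde/` folder.

```python
import json
import sqlite3
import tempfile
from pathlib import Path


def record_run(db, run_id, topic_id, topic_version, config_hash):
    """Store one run in a hypothetical 'runs' table."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS runs ("
        "run_id TEXT PRIMARY KEY, topic_id TEXT, "
        "topic_version TEXT, config_hash TEXT)"
    )
    db.execute(
        "INSERT INTO runs VALUES (?, ?, ?, ?)",
        (run_id, topic_id, topic_version, config_hash),
    )
    db.commit()


def append_artifacts(path: Path, run_id: str, items: list) -> None:
    """Append artifacts as JSON lines, stamping each with its run ID."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        for item in items:
            handle.write(json.dumps({**item, "run_id": run_id}) + "\n")


def trace(db, path: Path, artifact_id: str):
    """Find an artifact in the log and join it to the run that produced it."""
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            artifact = json.loads(line)
            if artifact["id"] == artifact_id:
                run = db.execute(
                    "SELECT topic_id, topic_version, config_hash "
                    "FROM runs WHERE run_id = ?",
                    (artifact["run_id"],),
                ).fetchone()
                return artifact, run
    return None


workdir = Path(tempfile.mkdtemp())  # stand-in for a project directory
jsonl = workdir / ".sonde" / "artifacts" / "normalized" / "artifacts.jsonl"
db = sqlite3.connect(":memory:")    # stand-in for .sonde/sonde.db

record_run(db, "run-001", "agent_security_model", "1.0.0", "abc123")
append_artifacts(jsonl, "run-001", [{"id": "art-1", "title": "Example paper"}])

artifact, run = trace(db, jsonl, "art-1")
print(artifact)
print(run)
```

An append-only log keeps earlier artifacts untouched between runs, and the run ID on each line is what lets a lookup walk back from an artifact to the topic version and config hash that produced it.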

Govern what you collect.

Open source. MIT licensed. Built by Adjective.