Snap: Visual Annotation for Agentic Coding
Challenge
Adjective built Snap to solve a problem we kept hitting during our own development work: AI coding agents are powerful, but they cannot see your screen. When a layout breaks, a component misaligns, or a color is wrong, you have to stop coding, take a screenshot, paste it into chat, and describe the problem in words. That friction kills flow state and wastes time. Snap eliminates it entirely: press a hotkey, circle the bug, type a label, hit enter. Your agent sees exactly what you see and fixes it.
Approach
Annotation Overlay (Tauri 2.x)
Built a native desktop application using Tauri 2.x (Rust backend, vanilla JS frontend) that captures the screen on a global hotkey and presents a fullscreen annotation canvas
Structured Metadata Layer
Every annotation save produces a matched JSON sidecar with structured data that agents can parse programmatically — not just a screenshot
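The sidecar idea can be sketched in a few lines of Python. Field names and the function below are illustrative assumptions, not Snap's actual schema:

```python
import json
import time
from pathlib import Path

def save_annotation(screenshot: Path, label: str, box: tuple[int, int, int, int]) -> Path:
    """Write a JSON sidecar next to the screenshot so an agent can read the
    annotation programmatically. Field names are illustrative, not Snap's
    actual format."""
    x, y, w, h = box
    sidecar = screenshot.with_suffix(".json")
    sidecar.write_text(json.dumps({
        "screenshot": screenshot.name,
        "label": label,
        "region": {"x": x, "y": y, "width": w, "height": h},
        "created_at": time.strftime("%Y-%m-%dT%H:%M:%S"),
    }, indent=2))
    return sidecar
```

An agent reading the sidecar gets the label and pixel coordinates directly, with no image processing required.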
MCP Server (Python / fastmcp)
Built an MCP server that exposes the annotation inbox to any AI agent via stdio transport, making Snap agent-native from day one
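A minimal sketch of the inbox-reading side (the directory layout, pairing-by-stem convention, and function name are assumptions, not Snap's actual tool surface). With fastmcp, a plain function like this is registered as a tool and served to agents over stdio:

```python
import json
from pathlib import Path

def list_annotations(inbox: Path) -> list[dict]:
    """Return parsed metadata for each screenshot/sidecar pair in the inbox.
    Pairing files by shared stem is an assumption about Snap's layout."""
    items = []
    for sidecar in sorted(inbox.glob("*.json")):
        screenshot = sidecar.with_suffix(".png")
        if screenshot.exists():  # skip sidecars whose screenshot is missing
            meta = json.loads(sidecar.read_text())
            meta["screenshot_path"] = str(screenshot)
            items.append(meta)
    return items

# With fastmcp, this would be exposed to any MCP-capable agent roughly as:
#
#   from fastmcp import FastMCP
#   mcp = FastMCP("snap")
#   mcp.tool(list_annotations)
#   mcp.run()  # stdio transport by default
```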
Cross-Platform Setup
Shipped setup scripts and documentation for Linux (X11, Wayland/GNOME, Sway, Hyprland) and macOS with automated MCP registration
Impact
UI bug resolution during vibe coding sessions dropped from 60-90 seconds (screenshot + describe + wait for fix) to under 10 seconds (hotkey + annotate + enter). Measured across 200+ visual issues during internal development.
Agents fix the correct element on the first attempt 90%+ of the time when given annotated screenshots with coordinates vs. ~30% accuracy with verbal descriptions alone. Structured metadata eliminates the "which button do you mean?" back-and-forth.
Complete round-trip from spotting a bug to having the agent working on the fix. No app-switching, no file uploads, no typing paragraphs of description. The developer stays in flow.
From zero to a working cross-platform annotation tool with MCP integration, shipped as open source under the MIT license.