April 25, 2026 · AI Infrastructure · Rob Murtha

Intelligence Primitives: The Ground Truth Agents Need

Why structured, scored data primitives — not bigger models — are the missing layer for accurate AI agents, compositional code generation, and the path toward grounded AGI. A look at how Gerolamo's intelligence corpus and meta molecules change the game.

Every major AI lab is racing to build bigger models. But the hardest problems in 2026 aren't about parameter counts — they're about what agents actually know when they act.

Meta recently deployed 50+ specialized AI agents to map tribal knowledge across their data pipelines. The result? Context files that gave agents structured navigation guides for 100% of code modules. Without them, agents would "guess, explore, guess again — and often produce code that compiled but was subtly wrong."

This is the defining problem of the agentic era: most agent failures aren't intelligence failures. They're context failures.

Gerolamo was built to solve this at the technology intelligence layer — a scored, structured, continuously refreshed corpus of every meaningful open-source entity across GitHub, arXiv, and Hugging Face. Not a search engine. A ground truth substrate that agents can reason over.


The Primitive: Atomic Unit of Machine Intelligence

Every entity in Gerolamo's corpus becomes an intelligence primitive — a scored, structured object that encodes not just what something is, but how defensible it is, how fast it's moving, who built it, and what threatens it.

{
  "title": "CODESTRUCT",
  "corpus": "arxiv",
  "defensibility_score": 2,
  "frontier_risk": "HIGH",
  "composability": "algorithm",
  "capability_tags": ["AST manipulation", "structured actions", "code agents"],
  "velocity": 0.0,
  "threat_profile": { "frontier_risk": "HIGH", "displacement_horizon": "near" }
}

This is CODESTRUCT — a paper that reframes code agent interactions by replacing fragile string-matching edits with structured actions performed directly on Abstract Syntax Trees. It scores low on defensibility because frontier labs will likely absorb this technique. But the insight it encodes — that code agents need structural primitives, not string diffs — is exactly what makes it valuable as intelligence.

When an agent consumes this primitive, it doesn't just get a link. It gets a scored assessment of technical defensibility, frontier risk, creator authority, velocity trajectory, and composability profile. That's the difference between searching and knowing.
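To make that concrete, here's a minimal in-memory shape for a primitive like the CODESTRUCT record above. The field names mirror the example; the class itself and the `is_defensible` gate are illustrative assumptions, not a published Gerolamo schema.

```python
from dataclasses import dataclass

@dataclass
class ThreatProfile:
    frontier_risk: str          # "LOW" | "MEDIUM" | "HIGH"
    displacement_horizon: str   # "near" | "mid" | "far"

@dataclass
class IntelligencePrimitive:
    title: str
    corpus: str                 # "github" | "arxiv" | "huggingface"
    defensibility_score: int    # 1-10
    composability: str
    capability_tags: list[str]
    velocity: float
    threat_profile: ThreatProfile

    def is_defensible(self, threshold: int = 5) -> bool:
        # One gate an agent might apply before recommending an entity.
        return self.defensibility_score >= threshold

codestruct = IntelligencePrimitive(
    title="CODESTRUCT",
    corpus="arxiv",
    defensibility_score=2,
    composability="algorithm",
    capability_tags=["AST manipulation", "structured actions", "code agents"],
    velocity=0.0,
    threat_profile=ThreatProfile(frontier_risk="HIGH",
                                 displacement_horizon="near"),
)
```

The point of the typed shape: an agent can branch on `threat_profile.frontier_risk` or `velocity` directly, instead of parsing prose out of a README.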


Why Agents Hallucinate Strategy

A recent arXiv paper on epistemic grounding proposes "GROUNDING.md" — community-governed documents that encode hard constraints and convention parameters for AI coding agents. The premise: agents need external validity anchors because their training data can't capture the live state of any specific domain.

This is exactly right, but it doesn't go far enough. A GROUNDING.md file is static. The technology landscape isn't.

Consider what happens when an agent is asked to evaluate a dependency:

  • Without Gerolamo: The agent searches the web, finds the repo's README, checks star count, maybe reads a blog post. It produces a confident recommendation based on surface signals.
  • With Gerolamo: The agent calls score_stack and gets back per-entity defensibility scores, weakest-link analysis, frontier risk assessments, and velocity trajectories — all computed from continuous multi-platform monitoring.

The difference isn't marginal. It's the difference between opinion and intelligence.
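A sketch of what weakest-link analysis might look like in the spirit of score_stack — the function signature and scoring rule here are my assumptions for illustration, not Gerolamo's actual API:

```python
# Weakest-link analysis: a stack is only as defensible as its most
# fragile dependency. Entity dicts below are illustrative.
def score_stack(entities: list[dict]) -> dict:
    weakest = min(entities, key=lambda e: e["defensibility_score"])
    return {
        "stack_score": weakest["defensibility_score"],
        "weakest_link": weakest["title"],
        "high_risk": [e["title"] for e in entities
                      if e["frontier_risk"] == "HIGH"],
    }

stack = [
    {"title": "vllm", "defensibility_score": 7, "frontier_risk": "LOW"},
    {"title": "CODESTRUCT", "defensibility_score": 2, "frontier_risk": "HIGH"},
]
report = score_stack(stack)
```

Even this toy version surfaces something a star count never will: the stack's ceiling is set by its weakest component, not its average.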

Right now, Gerolamo tracks entities like Governed Reasoning for Institutional AI — a framework using nine typed cognitive primitives to prevent silent errors in high-stakes AI systems. Or CogniAlign, which uses structured deliberation among "scientist agents" to align AI behavior with naturalistic moral reasoning. These aren't trending repos. They're scored intelligence that an agent can use to make grounded recommendations about AI safety architecture.


Meta Molecules: Speculative Intelligence That Compounds

Here's where it gets interesting. Gerolamo doesn't just index what exists — it enables composition of what could exist.

Meta molecules are speculative technology concepts created by fusing real intelligence primitives. They're theoretical constructs with real scored lineage. Three examples currently in the corpus:

MCP Synaptic Bus — Context-Aware Tool Composition Engine (Score: 7/10)
A meta molecule proposing a dynamic routing layer that enables MCP tools to compose based on context, not hardcoded workflows. Instead of an agent calling tools sequentially, the Synaptic Bus would let tools negotiate which capabilities to invoke based on the query's semantic signature.

MCP Panopticon — Cross-Agent Intelligence Fabric with Competitive Moat Detection (Score: 6/10)
An intelligence mesh that lets multiple agents share scored observations about technology defensibility in real time. Imagine a VC firm's research agent, an engineering team's dependency auditor, and a strategy agent all contributing to and drawing from the same scored intelligence layer.

MYCELIUM — Mesh-Yielded Communication via Emergent Low-bandwidth Intelligence Units in Motion (Score: 8/10)
The highest-scored meta molecule in the corpus. MYCELIUM proposes emergent communication protocols for distributed AI systems operating under bandwidth constraints — a biological metaphor for how intelligence should propagate across agent networks.

These aren't fantasies. They're scored hypotheses with traced lineage back to real primitives. When an agent asks Gerolamo "what could I build to improve cross-agent coordination?", meta molecules surface as compositional possibilities grounded in real technical intelligence.
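The fusion mechanic can be sketched in a few lines. The fusion rule here (mean of parent scores, rounded) and the parent entity names are invented for illustration — the post doesn't disclose Gerolamo's actual scoring — but the key property is the lineage trace back to real primitives:

```python
# Fuse real primitives into a speculative meta molecule whose score
# and lineage both derive from its parents. Scoring rule is assumed.
def compose_molecule(name: str, parents: list[dict]) -> dict:
    score = round(sum(p["defensibility_score"] for p in parents) / len(parents))
    return {
        "name": name,
        "score": score,
        "lineage": [p["title"] for p in parents],  # trace to real primitives
        "speculative": True,
    }

# Hypothetical parents, not real corpus entries:
mycelium = compose_molecule("MYCELIUM", [
    {"title": "emergent-comm", "defensibility_score": 7},
    {"title": "mesh-routing", "defensibility_score": 9},
])
```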


Code Generation Is a Ground Truth Problem

The most immediate application of intelligence primitives is in code generation and — more critically — code regeneration.

Here's the current state: AI coding agents in 2026 achieve only 42% capability on multi-file refactors in enterprise environments, and just 35% on legacy codebases. And even a 1M-token context window doesn't come close to holding a 400,000-file monorepo.

The bottleneck isn't model capability. It's that agents make structurally uninformed decisions about which code to generate, which dependencies to use, and which patterns to follow.

Gerolamo's MCP tools change this equation:

Before Writing Code: Intelligence-Grounded Dependency Selection

An agent building a new service calls query_intelligence to find candidates in the problem space, then score_stack to evaluate the defensibility of its planned dependency set. It discovers that one core library has a frontier risk of HIGH and a velocity trajectory heading toward zero — a dead dependency walking. It substitutes before writing a single line.
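The pre-write gate described above might look like this. The filter thresholds and candidate fields are assumptions layered on the signals the post describes, not Gerolamo's documented behavior:

```python
# Drop candidates that are both HIGH frontier risk and losing momentum
# ("dead dependencies walking"), then rank survivors by defensibility.
def select_dependencies(candidates: list[dict],
                        min_velocity: float = 0.5) -> list[dict]:
    viable = [
        c for c in candidates
        if not (c["frontier_risk"] == "HIGH" and c["velocity"] < min_velocity)
    ]
    return sorted(viable, key=lambda c: c["defensibility_score"], reverse=True)

# Hypothetical candidates returned by a query_intelligence-style search:
candidates = [
    {"title": "lib-a", "defensibility_score": 7,
     "frontier_risk": "LOW", "velocity": 3.2},
    {"title": "lib-b", "defensibility_score": 4,
     "frontier_risk": "HIGH", "velocity": 0.0},
]
picks = select_dependencies(candidates)
```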

During Generation: Compositional Awareness

Using compose_molecules, an agent can fuse multiple primitives into a buildable specification. It doesn't generate code from a blank prompt — it generates from a scored, structured understanding of the technical landscape. The output isn't just syntactically correct; it's strategically sound.

After Shipping: Continuous Regeneration

This is the future state. An agent monitoring a deployed codebase subscribes to Gerolamo intelligence on its dependency graph. When a dependency's velocity collapses or a frontier lab releases a competing capability, the agent regenerates the affected components — not from scratch, but informed by the scored alternatives in the corpus.
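A hypothetical watcher for that loop, under the assumption that the subscription delivers per-dependency velocity and frontier-risk updates (the threshold and data shape are mine, not Gerolamo's):

```python
# Flag components whose dependencies show collapsing velocity or a
# HIGH frontier risk -- these are regeneration candidates.
def regeneration_candidates(dep_graph: dict[str, list[dict]],
                            velocity_floor: float = 1.0) -> list[str]:
    flagged = []
    for component, deps in dep_graph.items():
        if any(d["velocity"] < velocity_floor or d["frontier_risk"] == "HIGH"
               for d in deps):
            flagged.append(component)
    return flagged

# Hypothetical dependency graph for a deployed service:
graph = {
    "auth-service": [{"name": "lib-x", "velocity": 0.1, "frontier_risk": "LOW"}],
    "billing":      [{"name": "lib-y", "velocity": 4.0, "frontier_risk": "LOW"}],
}
flagged = regeneration_candidates(graph)
```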

The entity agent-sh/agentsys (defensibility: 5/10, 710 stars) already demonstrates this pattern — an orchestration layer that augments coding agents with task-specific capabilities. But it lacks the intelligence substrate. Pair it with Gerolamo's primitives and you get code generation that's grounded in continuously refreshed competitive intelligence.


The Accuracy Problem Is a Data Structure Problem

Stanford's 2026 AI Index shows that top models now exceed 50% on Humanity's Last Exam. Agentic benchmarks like SWE-Bench Verified show steep improvement curves. But enterprise experiments keep finding that "AI agents make too many mistakes for businesses to rely on them for processes involving big money."

The gap between benchmark performance and production reliability is a data structure problem. Benchmarks test reasoning in controlled contexts. Production requires reasoning over live, scored, continuously updated information about the world.

This is what intelligence primitives provide:

  1. Scored assessments — not binary "good/bad" but 1-10 defensibility with full reasoning chains
  2. Threat profiles — frontier risk, platform domination risk, displacement horizons
  3. Velocity trajectories — is this accelerating or dying?
  4. Composability metadata — tech stack, integration surface, capability tags
  5. Creator authority — who built this and what's their track record?

When an agent has access to these structured signals, its recommendations shift from plausible to grounded. It's the same shift that happened when search engines moved from keyword matching to PageRank — the underlying intelligence structure changed, and accuracy followed.
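One way those five signal classes fold into an action, sketched with invented thresholds and verdict wording — the decision rule is illustrative, not a documented scoring policy:

```python
# Collapse structured signals into a hedged verdict an agent can act on.
def grounded_verdict(p: dict) -> str:
    if p["frontier_risk"] == "HIGH" and p["velocity"] < 1.0:
        return "avoid: likely absorbed by frontier labs, momentum gone"
    if p["defensibility_score"] >= 7 and p["velocity"] > 5.0:
        return "adopt: defensible and accelerating"
    return "watch: mixed signals, re-score before committing"

codestruct_like = {"frontier_risk": "HIGH", "velocity": 0.0,
                   "defensibility_score": 2}
healthy = {"frontier_risk": "LOW", "velocity": 6.4,
           "defensibility_score": 7}
```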


From Primitives to AGI: The Compounding Effect

The path to AGI doesn't run through bigger models alone. It runs through richer, more structured, continuously refreshed knowledge substrates that models can reason over.

Consider the compounding effect:

  • Layer 1: Primitives — Scored entities with defensibility, velocity, threat profiles. This is Gerolamo today.
  • Layer 2: Meta Molecules — Speculative compositions traced back to real primitives. Agents propose new technology combinations and the corpus scores them.
  • Layer 3: Conspectus — AI-generated macro intelligence summaries across entire domains. Not individual entities but landscape-level understanding.
  • Layer 4: Agent Memory — Agents that remember which primitives they've consumed, which recommendations they've made, and which predictions played out. Intelligence that learns from its own trajectory.

Each layer compounds on the last. An agent with Layer 4 capabilities doesn't just answer "what should I build?" — it answers "given everything I've tracked across six months of technology evolution, and given which of my previous recommendations succeeded or failed, here's what you should build next."

Gerolamo's current architecture already implements Layers 1-3. The intelligence brief endpoint delivers combined sleeper signals, trending entities, and breakout detection in a single call. The conspectus system generates narrative-level summaries across tracked domains. Meta molecules enable speculative composition.

Layer 4 is where this converges with AGI-relevant capabilities: agents that maintain persistent, scored models of technology evolution and use those models to make increasingly accurate predictions.


The MCP Advantage: Intelligence as Protocol

MCP has emerged as the defining architectural trend of 2026 — the open standard that lets AI systems integrate with external tools and data sources. Gartner predicts 40% of enterprise applications will include task-specific AI agents by end of 2026.

Gerolamo's MCP server turns the intelligence corpus into a protocol-native resource. Any MCP-compatible agent — Claude Code, custom pipelines, enterprise orchestrators — can:

  • Discover via query_intelligence and find_sleepers
  • Analyze via score_stack and explain_score
  • Compose via compose_molecules and explore_connections
  • Monitor via get_intelligence_brief and subscription endpoints

This isn't a dashboard you check. It's an intelligence layer your agents think through.
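Since MCP rides on JSON-RPC 2.0, the wire shape of one of these calls is easy to sketch. The `tools/call` method is part of the MCP spec; the tool name comes from the list above, but the argument schema is an assumption:

```python
import json

# Build a JSON-RPC 2.0 request for an MCP tools/call invocation.
# Argument schema for score_stack is hypothetical.
def mcp_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

req = mcp_tool_call(1, "score_stack", {"entities": ["vllm", "litellm"]})
```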

The trending entities in Gerolamo right now tell the story: DeepSeek-V3 optimization is moving at velocity 29.9. Open WebUI at 6.5. vLLM at 6.4. LiteLLM at 5.9. These aren't just numbers — they're real-time signals about where the infrastructure layer is consolidating and where opportunities remain.

When your agent can access this natively through MCP, it stops hallucinating strategy and starts executing from ground truth.


What This Means for You

If you're building AI-powered products, evaluating technology stacks, or running engineering teams in 2026, the question isn't whether your agents are smart enough. It's whether they have access to structured, scored, continuously refreshed intelligence about the landscape they're operating in.

The shift from "AI that sounds right" to "AI that is right" won't come from the next model upgrade. It'll come from the data substrate underneath.

Gerolamo is that substrate. The primitives are scored. The meta molecules are composable. The MCP server is live.

The agents are waiting for ground truth. Now they have it.