Cosmic Stack · cosmicstack.ai

about

A thesis-first lab for the next decade of software.

Cosmic Stack is a research and product lab. We believe the most consequential software of the next decade will be built by small teams of researchers shipping production agents — not committees writing 200-page roadmaps.

Founded 2025 · Independent · Remote-first

why now

The cost curve flipped, but the products haven't caught up.

Frontier inference is two orders of magnitude cheaper than it was 18 months ago. Agents that took $40 of compute to complete a single workflow now take $0.40. The infrastructure layer has been quietly rewritten: routing, caching, evals, persistent memory, tool sandboxing — all of it real, all of it shippable.

And yet most "AI products" are still thin chat wrappers trained on a quarterly OKR. The gap between what's technically possible and what's actually shipped has never been wider. That gap is the opportunity.

Cosmic Stack exists to close it — by building the agents, primitives, and runtimes ourselves, then handing them to other teams in the form of open source, incubated companies, and a managed cloud.

what we believe

Five claims we'll defend in public.

  1. The agent is the product, not the model. Model performance is converging. The defensible surface is the runtime — tools, memory, evals, recovery.
  2. Open source wins the platform layer. Every durable infrastructure category in the last 20 years ended in open source. Agents will be no different.
  3. Evals are the new product spec. If you can't measure it, you're shipping vibes. Evals come before features, not after.
  4. Local-first is non-negotiable for personal data. Finance, health, communication — sensitive surfaces deserve agents that compute on your device by default.
  5. Small teams that ship beat big teams that plan. Every product in this lab is owned end-to-end by a tiny team with full agency. We optimize for cycle time, not headcount.

how we operate

A lab, an incubator, an OSS org, and a cloud.

The four surfaces — Labs, Incubator, Open Source, Cloud — are not separate business lines. They are stages of the same pipeline.

  1. Labs is where ideas get prototyped against real evals. Most things die here, in public.
  2. Open Source is where the survivors get hardened and adopted — Mercury Agent is the flagship.
  3. Incubator picks up the open source primitives and turns them into companies with founders and customers.
  4. Cloud runs all of the above as a managed service for teams that want the agent stack without operating it.

The pipeline is the moat. Each stage feeds the next with production evidence, and each stage feeds the previous with problems worth researching.

principles

Four rules we'll be held to.

01

Ship in public, fail in public.

Every research direction becomes either a shipped product or a written postmortem. There is no third drawer where things quietly die.

02

Evals before features.

If a capability cannot be measured, it is not yet a capability. We write the eval first, then chase the score.
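As an illustration only, and not a description of Cosmic Stack's actual tooling, an eval-first workflow might look like the Python sketch below. Every name in it (EvalCase, run_eval, PASS_THRESHOLD, stub_agent, arithmetic_agent) is hypothetical, and exact-match scoring on toy arithmetic prompts is an assumption chosen to keep the example small. The point is the order of work: the cases and the pass bar exist before the capability does, so the first run fails by design.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    """One measurable expectation: a prompt and the exact answer we want."""
    prompt: str
    expected: str


def run_eval(agent: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Return the fraction of cases where the agent's output matches exactly."""
    if not cases:
        raise ValueError("an eval needs at least one case")
    passed = sum(1 for case in cases if agent(case.prompt).strip() == case.expected)
    return passed / len(cases)


# Step 1: write the eval first, before any capability exists.
CASES = [
    EvalCase("2 + 2", "4"),
    EvalCase("10 - 3", "7"),
    EvalCase("6 * 7", "42"),
]
PASS_THRESHOLD = 1.0  # hypothetical bar; a real eval would justify its threshold


def stub_agent(prompt: str) -> str:
    """Placeholder with no capability yet; it should fail the eval."""
    return ""


# Step 2: chase the score with an actual implementation.
def arithmetic_agent(prompt: str) -> str:
    """Toy stand-in for a real agent: evaluates '<int> <op> <int>' prompts."""
    left, op, right = prompt.split()
    a, b = int(left), int(right)
    results = {"+": a + b, "-": a - b, "*": a * b}
    return str(results[op])


if __name__ == "__main__":
    for name, agent in [("stub", stub_agent), ("arithmetic", arithmetic_agent)]:
        score = run_eval(agent, CASES)
        status = "ship" if score >= PASS_THRESHOLD else "not yet a capability"
        print(f"{name}: {score:.2f} ({status})")
```

In a setup like this, the threshold check would typically gate a release, so a feature that cannot move the score never ships.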

03

Open source by default.

Unless there is a clear customer reason to keep something private, it is published. The platform layer of AI belongs to no one company.

04

Small teams, full ownership.

Every product is owned by a tiny team with end-to-end authority. We pay the salary of a senior engineer, not the salary of a manager.

work with us

If any of this resonates, the door is open.