The moat is the system, not the model: what AI cybersecurity just taught every founder

The Compass Team
April 16, 2026

A small security lab called AISLE spent last week quietly making every "which LLM should I use" argument on Twitter look silly.
On April 7, Anthropic announced Mythos, a specialized AI model wrapped in a consortium called Project Glasswing: AWS, Apple, Google, JPMorgan, Microsoft, NVIDIA, Palo Alto Networks. The pitch: here's the model that autonomously finds thousands of zero-day vulnerabilities in software humans have been staring at for decades.
A 27-year-old bug in OpenBSD. A 16-year-old bug in FFmpeg. Multi-stage privilege escalation chains in the Linux kernel. Anthropic committed up to $100M in usage credits to the effort.
Nine days later, AISLE published a response that sat at the top of Hacker News with 1,275 points. They took the specific vulnerabilities Mythos showcased, isolated the code, and ran it through small open-weights models.
All eight of the small models they tested detected Mythos's flagship FreeBSD exploit. One of them had 3.6 billion active parameters and cost $0.11 per million tokens. On a basic security reasoning task, the tiny open models beat most of the frontier models from the major labs.
AISLE's post has one sentence that matters for every founder building anything with AI right now:
"The moat is the system into which deep security expertise is built, not the model itself."
Sit with it for a second.
Why this matters outside of security
Founders are in a worse spot than AISLE, and most of them don't realize it.
AISLE wins because they spent a year building the scaffolding around the model. A discovery pipeline, a validation loop, a patch generator, a relationship with maintainers that gets patches accepted. (They've shipped 180+ externally validated CVEs, including 15 in OpenSSL and 5 in curl.)
The model they use is interchangeable. They swap Claude, GPT, Qwen, Llama based on the task. What stays constant is the system.
Now look at how most founders use AI. You open Claude or ChatGPT, type a question into a blank input, and hope it knows enough about your life to give you an answer that isn't generic. The context you feed it is whatever you can type in 90 seconds before the next meeting.
The model has no idea what you were worried about last Tuesday, who your top 3 customers are, what you decided about pricing two months ago, or why you hired a designer before an engineer.
You're asking a $25-per-million-token frontier model to do strategic thinking on zero scaffolding. No wonder the output sounds like a McKinsey intern wrote it.
What every major lab shipped this week (by accident)
Go look at what Anthropic, OpenAI, and Alibaba released in the past 7 days. The pattern is obvious once you see it.
- Claude Code Routines: scheduled agents that run on triggers, with context injected from saved configs and connectors.
- OpenAI Agents SDK 2.0: native filesystem, sandboxed shell, MCP connectors, Skills as a packaging unit, AGENTS.md as standard. Codex is at 3M weekly active users. Enterprise revenue is already 40%+ of OpenAI's total and tracking for parity with consumer by the end of 2026.
- Qwen 3.6-35B-A3B: open-weights agentic coder, 3B active parameters, runs on a laptop.
Every one of those releases ships scaffolding around the model. Memory, tools, context, triggers, permissions, audit trails. The model layer is roughly stable. The plumbing is where every major lab is racing.
The bottleneck moved. If you're still treating "which LLM" as the interesting question, you're arguing about the wrong thing.
(The SkyPilot team drove the same point home with a side experiment this week. They bolted a "read papers and study competing forks first" phase onto a coding agent and got 5 llama.cpp optimizations in 3 hours for $29 across 4 VMs. The model stayed the same. The context pipeline changed. The output jumped.)
What founders have that's actually scarce
Here's the part that stings.
The scarce input for a founder's AI is you. Your situation, your decisions, your customer calls, your unresolved 2am worries, your pattern of mistakes, your weekly context.
A frontier model in 2026 is basically a commodity. You can rent one for a few dollars. You can download one for free. Six weeks from now a better one will ship on whatever benchmark you care about. The model is the cheapest part of the stack.
What's expensive is the structured record of your own thinking. There's exactly one of those per founder, and no external tool will build it for you.
This is the thing founders miss when they reach for a chat window on a blank page. They're treating the chat as the whole tool. The chat window is the cheapest layer in the stack. The expensive layer is the memory sitting behind it.
Without that, you're running your strategic thinking on whatever was in your head for the last 30 seconds.
That's what we keep writing about: AI collapsed the cost of execution. The new bottleneck is the quality of the thinking you feed it. AISLE just proved it on a subject where the data is falsifiable. Founders are operating on the same principle with worse tooling.
What a founder's system actually looks like
If the moat is the system, the next question is obvious. What's in it?
The same primitives AISLE uses, translated to a founder's work:
- Capture that actually captures. Voice notes, quick text, screenshots. The friction between a thought and the system has to drop to near-zero or you won't log the thought. We walked through why most founder note-taking setups fail at this first step.
- Structure the model can read. Tagged entries, dated, linked to people, projects, decisions. The formatting serves the model first so it can query across months. A pile of unstructured text is a worse input than 30 well-tagged notes.
- A weekly review that compounds. The loop where last week's notes become this week's context. We wrote the 45-minute framework for this.
- A decisions log. What you decided, when, why, what you expected, what actually happened. This is the single most valuable artifact in a founder's life, and almost no one keeps one.
- A model-agnostic query surface. Whatever LLM you're using today, you should be able to point it at your own corpus. When the better model ships next quarter, you swap. The system doesn't move.
Skip any of those and you end up like AISLE without the pipeline: a smart model sitting on top of nothing. You'll get smart-sounding generic output.
The solo founder AI stack piece walks through the specific tools that fit each slot. The stack matters less than the fact that you have one. A founder using mediocre tools inside a working system will make sharper decisions than a founder using Claude Opus 4.7 with no memory.
The uncomfortable shift
Most founders hate this framing because it moves the work.
When the story was "pick the best model," the work was shopping. Read a benchmark, subscribe to the top one, feel like you made a strategic decision. Done in 10 minutes.
When the story is "build the system around the model," the work is building. Capture habits. Weekly rhythms. Structured notes. A log of your decisions. A query layer that works across tools. That takes weeks of setup and years of practice. No signup flow will do it for you.
Founders love AI stories where the lift feels small. The AISLE result points the other way. The model did part of the job. AISLE did the other part, which took a year and 180 CVEs to get right. That's where their moat lives. Their output without their pipeline isn't very interesting.
Your output without your system is the same.
Where Compass fits
We built Compass because we kept watching founders do the model-shopping thing. Cancel Claude, subscribe to ChatGPT, cancel ChatGPT, try Gemini, cycle. Every model they tried was smart. Every output felt empty. The common thread was zero system underneath.
Compass is the system. Voice and text capture in 30 seconds. AI categorization that structures entries as you go. Weekly reviews that surface patterns across months of notes. A memory layer that every AI tool you touch can read from, so context compounds instead of evaporating.
The model you run on top is your call. It should be. Model prices drop 80% a year. What doesn't drop is the cost of rebuilding a year of your own thinking.
What to do on Monday
Two things.
First, open whatever AI chat tool you use most and count how many words of personal context you've fed it in the last 30 days. Most founders will find a number under 500. That's the gap. That's the reason the outputs feel generic. You're asking a frontier model to do strategic work on a rounding error's worth of context about you.
Second, pick one capture habit you can run for a week. A voice memo after every customer call. A 3-line evening log. A Friday decision-log entry. One habit, one week.
Then look at the corpus on day 8 and query it. Ask "what did I worry about this week," "what customers came up more than once," "what did I decide about pricing." If the answers are junk, the system is junk. Fix that before you fix the model.
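That day-8 query can be as dumb as formatting last week's notes into plain text and pasting it ahead of your question, whichever model you happen to be using. A minimal sketch, assuming simple (date, tags, text) tuples; all names and sample entries are hypothetical:

```python
from datetime import date, timedelta

def build_context(notes, today, days=7):
    """Format the last `days` of notes as plain text any LLM can take as context."""
    cutoff = today - timedelta(days=days)
    recent = sorted(n for n in notes if n[0] >= cutoff)  # tuples sort by date first
    return "\n".join(
        f"[{d.isoformat()}] {' '.join('#' + t for t in tags)}: {text}"
        for d, tags, text in recent
    )

notes = [
    (date(2026, 4, 10), ["pricing", "worry"], "Acme pushed back on annual pricing."),
    (date(2026, 4, 12), ["decision"], "Holding price through Q2."),
    (date(2026, 2, 1), ["hiring"], "Interviewed two designers."),  # old, filtered out
]

context = build_context(notes, today=date(2026, 4, 14))
# Prepend `context` to whatever prompt you send to whichever model you use this quarter.
```

If the answers you get back are junk, the problem is visible right there in the `context` string, before any model touches it.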
AISLE won by building a better system around the model they already had. The security industry ran the experiment in public, at $0.11 per million tokens, and the result sat at the top of Hacker News for a reason.
The model gets cheaper every quarter. The system you build around it compounds every quarter you have it.
Go build the system.
Compass is the AI note-taking app built for founders. Capture your thinking by voice, watch AI surface ideas, insights, and relationships, and make sharper decisions week over week. For founders who take their own thinking seriously.
Join the founding members →