I'm waiting for AI to mature. Very explicitly, and yes, mostly impatiently. I don't think we're even close to imagining the future landscape with AI, and pretending otherwise is neither honest nor useful to anyone. This post is my attempt to explain how I think about AI from a dev perspective on a longer horizon: five, maybe even ten years down the road. The tools we have right now are still a very long way from my baseline expectations, which my AI systems remind me of nearly constantly, like when I'm trying to force agent-like functionality out of ChatGPT. Spoiler: it's not designed to handle that.
While I'm waiting, though, I'm not disengaged. I'm definitely tinkering, sometimes randomly and sometimes just as an unsatisfied AI user who's not thrilled with the existing systems. I'm also busy figuring out what the next problems really look like by diving in and getting my hands dirty.
One of those big challenges is what I keep calling the "memory problem." I've designed a solution for my own personal agent to manage long-term memory. Yes, I'm aware that GitHub is inevitably going to beat me to a viable solution. Again. But I'm one of those people who will attempt to solve a problem first, get it wrong at least ten different times, and then do the research to fill in the knowledge gaps. Now I just have to muster up enough oomph to actually do it.
First Principles: LLM vs Agent
At some point, if you want any of this AI talk to make sense, you have to step back, align terminology, and separate concepts that keep getting blurred together. An LLM, often called a model, is the generative part of GenAI: it accepts input and generates output. That's it. An agent is the system managing context, memory, and various tools. The agent is responsible for what information the LLM even sees in the first place.
When those two ideas get collapsed into the same thing, everything downstream becomes confused. You can't reason clearly about limits, costs, or failure modes if you don't separate generation from data management. Until you draw that line, every other discussion ends up muddy.
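To make that line concrete, here's a minimal sketch in Python. The `LLMClient` and `Agent` names are mine, not any particular framework's API; the point is just that the model turns a prompt into text, while the agent decides what goes into that prompt at all.
```python
from dataclasses import dataclass, field

@dataclass
class LLMClient:
    """The model: text in, generated text out. Nothing else."""
    name: str = "hypothetical-model"

    def generate(self, prompt: str) -> str:
        # A real client would call an API here; this is just a stub.
        return f"[{self.name} completion for {len(prompt)} chars of input]"

@dataclass
class Agent:
    """The agent: owns instructions, memory, and tools, and decides
    what the LLM gets to see on every single call."""
    llm: LLMClient
    instructions: str = "You are a helpful coding assistant."
    memory: list[str] = field(default_factory=list)

    def ask(self, task: str) -> str:
        # Context management lives here, not inside the model.
        context = "\n".join([self.instructions, *self.memory[-5:], task])
        answer = self.llm.generate(context)
        self.memory.append(f"task: {task}\nanswer: {answer}")
        return answer

agent = Agent(LLMClient())
print(agent.ask("Explain the difference between an LLM and an agent."))
```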
Context Is the Bottleneck (and Everyone Knows It)
Once you make the distinction between LLM and agent, the real bottleneck becomes obvious. There is no good way to manage context today, let alone have the agent automate that job effectively. If you're not fully up to date on the lingo: context includes a whole set of things like instruction files, workspace structure, active files in your IDE, the AI chat history, available tools, and more.
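To put rough numbers on it, here's a back-of-the-napkin sketch. The bucket names and the four-characters-per-token estimate are my own simplification, not any vendor's accounting, but they show how much of the window is spoken for before you type a word:
```python
from dataclasses import dataclass

@dataclass
class ContextBudget:
    """Everything below competes for the same finite context window."""
    instruction_files: str   # copilot-instructions, AGENTS.md, and friends
    workspace_summary: str   # folder structure, active files in the IDE
    chat_history: str        # the running conversation
    tool_definitions: str    # every enabled tool/MCP schema

    def estimated_tokens(self) -> int:
        # Crude heuristic: roughly four characters per token.
        return sum(len(text) for text in vars(self).values()) // 4

budget = ContextBudget(
    instruction_files="x" * 8_000,
    workspace_summary="x" * 4_000,
    chat_history="x" * 60_000,
    tool_definitions="x" * 12_000,
)
print(budget.estimated_tokens(), "tokens used before you even ask a question")
```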
What we have now are very manual tools that do very little to solve the problem. We have to remember to tell the AI which parts currently matter, or at some point we have to clear the chat entirely and start over. If we don't do that deliberately, the AI slowly loses track of what we're supposed to be working on in the first place. At worst, the entire chat thread is poisoned and the AI becomes unable to function at all. Then you're forced to start fresh, always at the most inconvenient time.
And don't expect LLM context to scale, either. Hardware costs may go down eventually, but nowhere near fast enough to keep up with everything we keep throwing at it. So context is very finite, especially in GitHub Copilot, where context windows are smaller than usual anyway.
The agent will typically make space by compacting information. It asks the LLM to summarize the key points, then literally drops the original full-length novel from your active context and replaces it with the CliffsNotes version. The more summarization, the less accurate things get over time. So naturally you retry prompts while adding back the dropped details, and you end up making more calls for a single task overall. The model has to process more and more input just to get you back to the same answer you already had, not necessarily a better one.
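Mechanically, compaction looks something like the sketch below. `summarize` stands in for whatever "give me the CliffsNotes" call the agent actually makes, and the token math is deliberately crude; the important part is that the originals are gone once they're replaced.
```python
def estimate_tokens(text: str) -> int:
    # Crude heuristic: roughly four characters per token.
    return len(text) // 4

def compact(history: list[str], limit: int, summarize) -> list[str]:
    """When the conversation outgrows the window, fold the oldest
    messages into a summary and quietly drop the originals."""
    while sum(estimate_tokens(m) for m in history) > limit and len(history) > 2:
        oldest, rest = history[:2], history[2:]
        summary = summarize("\n".join(oldest))   # lossy by design
        history = [f"[summary] {summary}", *rest]
    return history

# Every pass trades detail for space; re-adding the dropped details in
# follow-up prompts is exactly how the call count creeps up.
```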
People know this is a problem. Tools like Toon exist specifically to minimize input impact for AI. We also have tools like Copilot's #runSubagent to help manage context within a single agent. These aren't true solutions, though; they're signals. They're stopgaps for problems people needed solved yesterday, while we wait for the next AI evolution to emerge.
Why Orchestration Is Inevitable
Even if you do everything "right" and manage context like a master AI sensei, agents eventually hit a limit. The list of must-have MCP servers keeps growing, and right now their definitions stay in the context window as long as they're enabled. Projects are starting to accumulate larger knowledge bases. Customization is becoming more and more explicit. The context an agent needs will keep growing exponentially, even though LLMs aren't increasing capacity at anywhere near the same speed.
The ultimate overflow state isn't hypothetical; it's inevitable. Once an agent accumulates enough memory, enough history, enough summarization, the LLM simply can't keep up coherently anymore. That isn't a failure in the system; it's a limit.
When you hit that limit, you can't just tweak prompts or optimize harder. You wouldn't try to squeeze more juice out of the same dry orange, either. The only real long-term solution is to split the system. You have to!
Smaller pieces of work are then sent to the LLM with only the relevant context, and that's when smarter agents will start to appear. This is where summarization stops and you retain the original intent at both the highest level and the lowest level. When we get here, AI generation stops being the problem; the new problem is coordinating all those tiny pieces of work and still accomplishing the larger goal without re-prompting anything already stated or defined elsewhere. Welcome to the world of true agent orchestration!
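Here's a toy sketch of what that coordination layer could look like. All of the names are hypothetical; the point is that each subtask carries only the slice of context it needs, some pieces run in parallel, and the orchestrator's real job is stitching results back into the larger goal.
```python
import concurrent.futures as futures
from dataclasses import dataclass

@dataclass
class Subtask:
    goal: str
    context: list[str]      # only what this piece of work needs
    parallel: bool = False

def run_subtask(task: Subtask, llm) -> str:
    prompt = "\n".join([*task.context, task.goal])
    return llm(prompt)

def orchestrate(plan: list[Subtask], llm) -> list[str]:
    """Coordinate small, focused calls instead of one giant context."""
    parallel = [t for t in plan if t.parallel]
    sequential = [t for t in plan if not t.parallel]
    results = []
    with futures.ThreadPoolExecutor() as pool:
        results += list(pool.map(lambda t: run_subtask(t, llm), parallel))
    for task in sequential:
        results.append(run_subtask(task, llm))
    return results

# Stub usage: a fake model that just echoes the goal it was handed.
fake_llm = lambda prompt: f"done: {prompt.splitlines()[-1]}"
plan = [
    Subtask("write the parser", ["spec section 2"], parallel=True),
    Subtask("write the CLI", ["spec section 3"], parallel=True),
    Subtask("wire them together", ["parser API", "CLI API"]),
]
print(orchestrate(plan, fake_llm))
```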
ProTip: If you want a sneak peek of what this looks like, check out Verdent.ai. Of all the solutions I've worked with, Verdent is the only one that's truly designed for agent orchestration. It also excels in VS Code and wins every coding competition I've put it in.
Orchestration as a System Property
Orchestration isn't just about sequencing work in a nicer way; it's about changing where responsibility lives. Yes, some things are always going to be sequential, but not everything needs to be. Some things can and should run in parallel, especially if you want speed and reliability included in future agentic systems.
Validation is a fundamental part of orchestration, not something bolted on afterward. A successful agent has to be able to verify its own work without relying on prior context. It has to come in like a third party, with no knowledge beyond the repo instructions. CodeQL, lint enforcement, Makefiles, and even extra tests become the ground truth the system must consistently check itself against.
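In practice, that kind of context-free check can be as unglamorous as shelling out to whatever the repo already enforces. A sketch, with placeholder commands you'd swap for your own lint, test, and build steps:
```python
import subprocess

# Ground-truth checks the agent runs with no memory of how the code was
# produced; the repository's own tooling is the only source of truth.
CHECKS = [
    ["ruff", "check", "."],   # placeholder: your linter of choice
    ["pytest", "-q"],         # placeholder: your test runner
    ["make", "build"],        # placeholder: whatever the Makefile enforces
]

def validate(repo_path: str) -> bool:
    for cmd in CHECKS:
        result = subprocess.run(cmd, cwd=repo_path, capture_output=True, text=True)
        if result.returncode != 0:
            print(f"FAILED: {' '.join(cmd)}\n{result.stdout}{result.stderr}")
            return False
    return True
```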
Multi-model opposition fits naturally here, too. Different models trained by different companies catch different things. The agent can then pick one model to implement and another to review. The point is that they disagree by default and then converge around a common goal. This is a pivotal moment in the future landscape, because the LLM is officially no longer the center of gravity; the agentic system is.
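A bare-bones version of that opposition loop might look like this. Both model callables are hypothetical stand-ins, and "LGTM" is just one arbitrary way to detect convergence:
```python
def implement_and_review(task: str, builder, reviewer, max_rounds: int = 3) -> str:
    """One model drafts, a different model critiques, and they loop
    until the reviewer signs off or the round budget runs out."""
    draft = builder(f"Implement: {task}")
    for _ in range(max_rounds):
        critique = reviewer(f"Review this change for '{task}':\n{draft}")
        if "LGTM" in critique:   # convergence, however you choose to signal it
            return draft
        draft = builder(f"Revise using this review:\n{critique}\n\nPrevious draft:\n{draft}")
    return draft

# Stub usage; in a real system builder and reviewer would be models from
# different vendors so they disagree by default.
print(implement_and_review(
    "add retry logic to the HTTP client",
    builder=lambda prompt: "def fetch_with_retry(): ...",
    reviewer=lambda prompt: "LGTM",
))
```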
ShoutOut: @marcosomma wrote a brilliant article on the concept of agent convergence a while back, and it's still one of my favorites. Worth the read if you missed it!
Add Another Layer of Abstraction
Now for my version of truth, which I know a lot of you are going to hate, so go ahead and brace for it. Once you're working in a smart, orchestration-driven flow, there's no reason you need to keep prompting from the IDE. Wait before you jump into the debate, though: I'm not saying the IDE becomes obsolete! It just stops being the primary interface for developer workflows, because you're consistently able to work at a higher level of abstraction. In this future, developers are directing systems that automatically generate, test, and validate the code several layers underneath them.
You're orchestrating agents that direct other agents. Some run sequentially. Others run in parallel. Documentation is generated automatically and added to the agent's working knowledge base. Tests run continuously alongside agents implementing new code. Integration testing matters. Systems testing matters more. Chaos testing morphs from an abstract concept into a baseline requirement. The code still exists, but it's no longer written by or for humans. AI slowly takes that over, which makes natural language the newest language you need to learn.
For the record, developers are most definitely still building and driving solutions. That will never change; we're the mad scientists thinking up wild potions you didn't know you needed! Besides, all the future advancements in the world won't give silicon the ability to invent new things. Humans create. AI helps. Period.
Trust, Then Speed (not the other way around)
When something breaks in any of my workflows, I don't correct the mistake in the code immediately. I start by correcting whatever instruction caused the mistake, and then I rerun it. Even when I'm busy, even when work is chaotic, and especially when I should have left it alone hours ago, I never fully disengage from this. I can't.
This is exactly why AI doesn't make you faster. Not yet, anyway. Not because it can't, but because the systems haven't caught up to where speed actually emerges. If you're learning to use AI correctly, it almost always makes you slower at first, not faster. The delay isn't failure. It's infrastructure lag.
Think of it like an investment. You're learning how the models behave and how instructions actually align with them. You're learning where the limits are, and then deliberately making the system work within those constraints. Speed comes later, after you trust that the system returns results that are validated, reviewed, and tested because you built it to behave that way.
AI evolution is a long game, and we're barely getting started. Right now, it still feels like grade school. We're teaching it what our world looks like, how we think, and where the boundaries are.
All the work done now, in this awkward middle state, is what makes that learning possible. Long runs of trial-and-error prompts, walls of instructions, documentation that later turns into knowledge bases: that's the curriculum. And by the time it's ready to graduate, it won't just be competent. It'll be a master. That's the moment you realize you trust AI, not because it's autonomous, but because you finally are.
I Worked Until It Worked
This post was written by me, with ChatGPT nearby like an overly talkative whiteboard: listening, interrupting, getting corrected, and occasionally making a genuinely good point. We argued about structure, laughed at the mic cutting out at the worst moments, and kept going anyway. The opinions are mine. The fact that it finally worked is the point.
