I'm waiting for AI to mature. Very explicitly, and yes, mostly impatiently. I don't think we're even close to imagining the future landscape with AI, and pretending otherwise is neither honest nor useful to anyone. This post is my attempt to explain how I think about AI from a dev perspective on a longer horizon: five, maybe even ten years down the road. The tools we have right now are still a very long way from my baseline expectations, which my AI systems remind me of nearly constantly, like when I'm trying to force agent-like functionality out of ChatGPT. Spoiler: it's not designed to handle that.
While I'm waiting, though, I'm not disengaged. I'm definitely tinkering, sometimes randomly and sometimes just as an unsatisfied AI user who's not thrilled with the existing systems. I'm also busy figuring out what the next problems really look like by diving in and getting my hands dirty.
One of those big challenges is what I keep calling the "memory problem." I've designed a solution for my own personal agent to manage long-term memory. Yes, I'm aware that GitHub is inevitably going to beat me to a viable solution. Again. But I'm one of those people who will attempt to solve a problem first, get it wrong at least ten different times, and then do the research to fill in the knowledge gaps. Now I just have to muster up enough oomph to actually do it.
First Principles: LLM vs Agent
At some point, if you want any of this AI talk to make sense, you have to step back, align terminology, and separate concepts that keep getting blurred together. An LLM, often called a model, is the generative part of GenAI: it accepts input and generates output. That's it. An agent is the system managing context, memory, and various tools. The agent is responsible for what information the LLM even sees in the first place.
When those two ideas get collapsed into the same thing, everything downstream becomes confused. You can't reason clearly about limits, costs, or failure modes if you don't separate generation from data management. Until you draw that line, every other discussion ends up muddy.
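To make that line concrete, here's a minimal sketch in Python. The `LLMClient` and `Agent` names are mine, not any particular framework's API; the point is just that the model turns a prompt into text, while the agent decides what goes into that prompt at all.
```python
from dataclasses import dataclass, field

@dataclass
class LLMClient:
    """The model: text in, generated text out. Nothing else."""
    name: str = "hypothetical-model"

    def generate(self, prompt: str) -> str:
        # A real client would call an API here; this is just a stub.
        return f"[{self.name} completion for {len(prompt)} chars of input]"

@dataclass
class Agent:
    """The agent: owns instructions, memory, and tools, and decides
    what the LLM gets to see on every single call."""
    llm: LLMClient
    instructions: str = "You are a helpful coding assistant."
    memory: list[str] = field(default_factory=list)

    def ask(self, task: str) -> str:
        # Context management lives here, not inside the model.
        context = "\n".join([self.instructions, *self.memory[-5:], task])
        answer = self.llm.generate(context)
        self.memory.append(f"task: {task}\nanswer: {answer}")
        return answer

agent = Agent(LLMClient())
print(agent.ask("Explain the difference between an LLM and an agent."))
```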
Context Is the Bottleneck (and Everyone Knows It)
Once you make the distinction between LLM and agent, the real bottleneck becomes obvious. There is no good way to manage context today, let alone have the agent automate that job effectively. If you're not fully up to date on the lingo: context includes a whole set of things like instruction files, workspace structure, active files in your IDE, the AI chat history, available tools, and more.
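To put rough numbers on it, here's a back-of-the-napkin sketch. The bucket names and the four-characters-per-token estimate are my own simplification, not any vendor's accounting, but they show how much of the window is spoken for before you type a word:
```python
from dataclasses import dataclass

@dataclass
class ContextBudget:
    """Everything below competes for the same finite context window."""
    instruction_files: str   # copilot-instructions, AGENTS.md, and friends
    workspace_summary: str   # folder structure, active files in the IDE
    chat_history: str        # the running conversation
    tool_definitions: str    # every enabled tool/MCP schema

    def estimated_tokens(self) -> int:
        # Crude heuristic: roughly four characters per token.
        return sum(len(text) for text in vars(self).values()) // 4

budget = ContextBudget(
    instruction_files="x" * 8_000,
    workspace_summary="x" * 4_000,
    chat_history="x" * 60_000,
    tool_definitions="x" * 12_000,
)
print(budget.estimated_tokens(), "tokens used before you even ask a question")
```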
What we have now are very manual tools that do very little to solve the problem. We have to remember to tell the AI which parts currently matter, or at some point we have to clear the chat entirely and start over. If we don't do that deliberately, the AI slowly loses track of what we're supposed to be working on in the first place. At worst, the entire chat thread is poisoned and the AI becomes unable to function at all. Then you're forced to start fresh, always at the most inconvenient time.
And don't expect LLM context to scale, either. Hardware costs may go down eventually, but nowhere near fast enough to keep up with everything we keep throwing at it. So context is very finite, especially in GitHub Copilot, where context windows are smaller than usual anyway.
The agent will typically make space by compacting information. It asks the LLM to summarize the key points, then literally drops the original full-length novel from your active context and replaces it with the CliffsNotes version. The more summarization, the less accurate things get over time. So naturally you retry prompts while adding back the dropped details, and you end up making more calls for a single task overall. The model has to process more and more input just to get you back to the same answer you already had, not necessarily a better one.
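Mechanically, compaction looks something like the sketch below. `summarize` stands in for whatever "give me the CliffsNotes" call the agent actually makes, and the token math is deliberately crude; the important part is that the originals are gone once they're replaced.
```python
def estimate_tokens(text: str) -> int:
    # Crude heuristic: roughly four characters per token.
    return len(text) // 4

def compact(history: list[str], limit: int, summarize) -> list[str]:
    """When the conversation outgrows the window, fold the oldest
    messages into a summary and quietly drop the originals."""
    while sum(estimate_tokens(m) for m in history) > limit and len(history) > 2:
        oldest, rest = history[:2], history[2:]
        summary = summarize("\n".join(oldest))   # lossy by design
        history = [f"[summary] {summary}", *rest]
    return history

# Every pass trades detail for space; re-adding the dropped details in
# follow-up prompts is exactly how the call count creeps up.
```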
People know this is a problem. Tools like Toon exist specifically to minimize input impact for AI. We also have tools like Copilot's #runSubagent to help manage context within a single agent. These aren't true solutions, though; they're signals. They're stopgaps for problems people needed solved yesterday, while we wait for the next AI evolution to emerge.
Why Orchestration Is Inevitable
Even if you do everything "right" and manage context like a master AI sensei, agents eventually hit a limit. The list of must-have MCP servers keeps growing, and right now their definitions stay in the context window as long as they're enabled. Projects are starting to accumulate larger knowledge bases. Customization is becoming more and more explicit. The context an agent needs will keep growing exponentially, even though LLMs aren't increasing capacity at anywhere near the same speed.
The ultimate overflow state isn't hypothetical; it's inevitable. Once an agent accumulates enough memory, enough history, enough summarization, the LLM simply can't keep up coherently anymore. That isn't a failure in the system; it's a limit.
When you hit that limit, you can't just tweak prompts or optimize harder. You wouldn't try to squeeze more juice out of the same dry orange, either. The only real long-term solution is to split the system. You have to!
Smaller pieces of work are then sent to the LLM with only the relevant context, and that's when smarter agents will start to appear. This is where summarization stops and you retain the original intent at both the highest level and the lowest level. When we get here, AI generation stops being the problem; the new problem is coordinating all those tiny pieces of work and still accomplishing the larger goal without re-prompting anything already stated or defined elsewhere. Welcome to the world of true agent orchestration!
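Here's a toy sketch of what that coordination layer could look like. All of the names are hypothetical; the point is that each subtask carries only the slice of context it needs, some pieces run in parallel, and the orchestrator's real job is stitching results back into the larger goal.
```python
import concurrent.futures as futures
from dataclasses import dataclass

@dataclass
class Subtask:
    goal: str
    context: list[str]      # only what this piece of work needs
    parallel: bool = False

def run_subtask(task: Subtask, llm) -> str:
    prompt = "\n".join([*task.context, task.goal])
    return llm(prompt)

def orchestrate(plan: list[Subtask], llm) -> list[str]:
    """Coordinate small, focused calls instead of one giant context."""
    parallel = [t for t in plan if t.parallel]
    sequential = [t for t in plan if not t.parallel]
    results = []
    with futures.ThreadPoolExecutor() as pool:
        results += list(pool.map(lambda t: run_subtask(t, llm), parallel))
    for task in sequential:
        results.append(run_subtask(task, llm))
    return results

# Stub usage: a fake model that just echoes the goal it was handed.
fake_llm = lambda prompt: f"done: {prompt.splitlines()[-1]}"
plan = [
    Subtask("write the parser", ["spec section 2"], parallel=True),
    Subtask("write the CLI", ["spec section 3"], parallel=True),
    Subtask("wire them together", ["parser API", "CLI API"]),
]
print(orchestrate(plan, fake_llm))
```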
ProTip: If you want a sneak peek of what this looks like, check out Verdent.ai. Of all the solutions I've worked with, Verdent is the only one that's truly designed for agent orchestration. It also excels in VS Code and wins every coding competition I've put it in.
Orchestration as a System Property
Orchestration isn't just about sequencing work in a nicer way; it's about changing where responsibility lives. Yes, some things are always going to be sequential, but not everything needs to be. Some things can and should run in parallel, especially if you want speed and reliability included in future agentic systems.
Validation is a fundamental part of orchestration, not something bolted on afterward. A successful agent has to be able to verify its own work without relying on prior context. It has to come in like a third party, with no knowledge beyond the repo instructions. CodeQL, lint enforcement, Makefiles, and even extra tests become the ground truth the system must consistently check itself against.
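In practice, that kind of context-free check can be as unglamorous as shelling out to whatever the repo already enforces. A sketch, with placeholder commands you'd swap for your own lint, test, and build steps:
```python
import subprocess

# Ground-truth checks the agent runs with no memory of how the code was
# produced; the repository's own tooling is the only source of truth.
CHECKS = [
    ["ruff", "check", "."],   # placeholder: your linter of choice
    ["pytest", "-q"],         # placeholder: your test runner
    ["make", "build"],        # placeholder: whatever the Makefile enforces
]

def validate(repo_path: str) -> bool:
    for cmd in CHECKS:
        result = subprocess.run(cmd, cwd=repo_path, capture_output=True, text=True)
        if result.returncode != 0:
            print(f"FAILED: {' '.join(cmd)}\n{result.stdout}{result.stderr}")
            return False
    return True
```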
Multi-model opposition fits naturally here, too. Different models trained by different companies catch different things. The agent can then pick one model to implement and another to review. The point is that they disagree by default and then converge around a common goal. This is a pivotal moment in the future landscape, because the LLM is officially no longer the center of gravity; the agentic system is.
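A bare-bones version of that opposition loop might look like this. Both model callables are hypothetical stand-ins, and "LGTM" is just one arbitrary way to detect convergence:
```python
def implement_and_review(task: str, builder, reviewer, max_rounds: int = 3) -> str:
    """One model drafts, a different model critiques, and they loop
    until the reviewer signs off or the round budget runs out."""
    draft = builder(f"Implement: {task}")
    for _ in range(max_rounds):
        critique = reviewer(f"Review this change for '{task}':\n{draft}")
        if "LGTM" in critique:   # convergence, however you choose to signal it
            return draft
        draft = builder(f"Revise using this review:\n{critique}\n\nPrevious draft:\n{draft}")
    return draft

# Stub usage; in a real system builder and reviewer would be models from
# different vendors so they disagree by default.
print(implement_and_review(
    "add retry logic to the HTTP client",
    builder=lambda prompt: "def fetch_with_retry(): ...",
    reviewer=lambda prompt: "LGTM",
))
```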
ShoutOut: @marcosomma wrote a brilliant article on the concept of agent convergence a while back, and it's still one of my favorites. Worth the read if you missed it!
Add Another Layer of Abstraction
Now for my version of truth, which I know a lot of you are going to hate, so go ahead and brace for it. Once you're working in a smart, orchestration-driven flow, there's no reason you need to keep prompting from the IDE. Wait before you jump into the debate, though: I'm not saying the IDE becomes obsolete! It just stops being the primary interface for developer workflows, because you're consistently able to work at a higher level of abstraction. In this future, developers are directing systems that automatically generate, test, and validate the code several layers underneath them.
You're orchestrating agents that direct other agents. Some run sequentially. Others run in parallel. Documentation is generated automatically and added to the agent's working knowledge base. Tests run continuously alongside agents implementing new code. Integration testing matters. Systems testing matters more. Chaos testing morphs from an abstract concept into a baseline requirement. The code still exists, but it's no longer written by or for humans. AI slowly takes that over, which makes natural language the newest language you need to learn.
For the record, developers are most definitely still building and driving solutions. That will never change; we're the mad scientists thinking up wild potions you didn't know you needed! Besides, all the future advancements in the world won't give silicon the ability to invent new things. Humans create. AI helps. Period.
Trust, Then Speed (not the other way around)
When something breaks in any of my workflows, I don't correct the mistake in the code immediately. I start by correcting whatever instruction caused the mistake, and then I rerun it. Even when I'm busy, even when work is chaotic, and especially when I should have left it alone hours ago, I never fully disengage from this. I can't.
This is exactly why AI doesn't make you faster. Not yet, anyway. Not because it can't, but because the systems haven't caught up to where speed actually emerges. If you're learning to use AI correctly, it almost always makes you slower at first, not faster. The delay isn't failure. It's infrastructure lag.
Think of it like an investment. You're learning how the models behave and how instructions actually align with them. You're learning where the limits are, and then deliberately making the system work within those constraints. Speed comes later, after you trust that the system returns results that are validated, reviewed, and tested because you built it to behave that way.
AI evolution is a long game, and we're barely getting started. Right now, it still feels like grade school. We're teaching it what our world looks like, how we think, and where the boundaries are.
All the work done now, in this awkward middle state, is what makes that learning possible. Long runs of trial-and-error prompts, walls of instructions, documentation that later turns into knowledge bases: that's the curriculum. And by the time it's ready to graduate, it won't just be competent. It'll be a master. That's the moment you realize you trust AI, not because it's autonomous, but because you finally are.
I Worked Until It Worked
This post was written by me, with ChatGPT nearby like an overly talkative whiteboard: listening, interrupting, getting corrected, and occasionally making a genuinely good point. We argued about structure, laughed at the mic cutting out at the worst moments, and kept going anyway. The opinions are mine. The fact that it finally worked is the point.
