Terminal Agents, Context Files, and Why Your AI Harness Is More Important Than Your AI Model

Written by | Jun 22, 2026

Justin Williams is an AgilityFeat Senior WebRTC Engineer. Want to work with great nearshore engineers like Justin? Get in touch.


I work primarily in the terminal, running agentic coding tools as my daily driver for everything from low-level systems programming to front end UI. The field moves fast and staying productive means staying flexible. The most important thing I’ve learned through all of it: the harness matters more than the model. A well-built harness can take an older or less capable model and get it performing almost in the same ballpark as a frontier one, and that changes everything about how you approach this work.

Here is how I think about agentic development, the tools I use, and how I keep it all flexible enough to actually work.

What Agentic Programming Actually Is

We are still in the early days of agentic programming, yet workflows are already evolving faster than most teams can keep up with. The idea is straightforward: autonomous AI agents plan, write, test, and modify code with minimal human intervention. Depending on the domain, this is either widely accepted or met with a lot of skepticism. There are good reasons on both sides.

Traditional AI coding assistants, which are barely one or two years old themselves, passively autocomplete lines or wait for a chat prompt. Agentic coding tools moved past that and run on an independent reason-and-act loop. You give the agent a high level goal. It then navigates through a multi-step process where it navigates repositories, runs terminal commands, analyzes errors, and self-corrects until the task is done.

How much agentic programming actually drives your workflow comes down to a few things: your token budget, how much you want to (or are okay with) offloading to agents, and how well suited the agent is to your domain. The nice thing about agentic programming is that there is truly not just one correct way to do it. It is very flexible. You decide how much or how little the agents build, plan, and review, and then course correct as you go.

Why Terminal Agents Are the Most Flexible AI Coding Setup

A terminal agent delivers agentic programming by giving an AI system the ability to interact directly with a development environment through a command line interface. It is my preferred approach, since the command line is where I feel most comfortable and productive. I use neovim as my text editor and IDE, and I lean heavily on scripts to support my workflows, everything from curl wrappers to test HTTP and web servers, to build scripts for testing and validating code. Command line tools see a lot of action in software development. There are a huge number of them available to customize your workflow and make it really efficient.

Terminal agents also keep me flexible, because it is easy to adopt any development strategy, from spec driven development to pointing the agent at one precise function in a file and driving it to do exactly what I ask. I can pivot between strategies depending on the task and the mood I am in. My main drivers are Anthropic’s Claude Code and OpenAI’s Codex CLI. I have also played with OpenCode, which was really nice, maybe my favorite overall, but I have not fully picked it up, mostly due to plain old subscription fatigue.

What Terminal Agents Can and Can’t Do

AI models have come a really long way, even just since last year. The frontier ones, particularly Codex/GPT and Claude/Opus, can get pretty much any task done from low level systems programming to front end UI. As strong as the agents are, it is still important to stay close to the code and understand it. Agents cannot hold a whole codebase in memory at once, they often lack context that was never written down (or typed), and they still forget to fully follow instructions and guardrails, with the odd hallucination thrown in once in a while. They are really good at smaller, targeted work. They can still do a great job at high level design, but they excel when you give them direct instructions, and their output almost always needs refinement and validation. One of the trickiest parts of working with agents is that the output always looks top notch, so it can confidently mask issues or just be flat out wrong, even if your prompt says “make no mistakes”.

Agentic Harness

If you want to understand terminal agents and get the most out of them (or any agentic setup), the harness is the crux of what makes them work well and produce reliable, valid output. The harness is mostly independent of the model. A good one can take an older or less capable model and get it performing almost in the same ballpark as a frontier model.

An agentic harness is the collection of tools, instructions, context, and workflows that feed into the model and handle what comes back out. It is what helps the agent decide whether it needs to refine its work, make a tool call, or stop and ask you for more information to get the job done. Think of it as the operating environment the agent works within.

How System Prompts Shape Agent Behavior

At the centre of every harness is the system prompt. This is the standing set of instructions the agent receives on every turn, before it ever sees your request. It defines how the agent behaves, what tools it knows about, how it is expected to use them, and the rules it should never break. You do not usually write the system prompt yourself, the tool ships with one, but it shapes everything the agent does, and it is worth knowing it is there.

This is also where the different terminal agents start to feel different, even when they are pointed at the same underlying model. Claude Code, Codex, and OpenCode each ship their own system prompt, their own set of built-in tools, and their own opinions about how an agent should work. Claude Code leans into a tightly tuned prompt and tool set built around Anthropic’s own models, which is part of why it feels so dialed in. Codex is built around OpenAI’s models and has spread across a lot of surfaces (the CLI, an IDE extension, a desktop app, cloud tasks), so its harness is shaped as much by where it runs as by the prompt itself. OpenCode takes the open, model-agnostic route, letting you bring almost any provider and even swap models mid session, so its harness has to stay more neutral and avoid assuming one specific model’s quirks.

These prompts are not static either. They have evolved a lot, and quickly. Early versions were short and generic. Today’s are far more detailed about planning, tool use, when to ask for clarification, and how to avoid common failure modes like editing the wrong file or charging ahead without a plan. Every release tends to tighten this up a little more, which is a big reason an agent can feel noticeably better from one month to the next even when the model under it has not changed. It is also a reminder that a lot of what makes these tools good lives in the harness, not just the model.

How I Use Context Files to Get Better Output from AI Agents

Context files like AGENTS.md or CLAUDE.md hold the top level instructions, rules, and information for an agent: how a project works, the high level guardrails and style guides, and where to find other context files for more specific or verbose detail. These top level files should be concise, ideally no more than around 500 lines. The agent reads them every session, so you want to be descriptive enough that the important information and guardrails are always followed, but not so verbose that you pollute the context with text that degrades the quality of the output.

A really helpful strategy here is to link out to other documents instead of duplicating information. That keeps things up to date, because there is only one place (ideally just one) where any given thing is defined. It also keeps the context small and lets the agent decide for itself when to go read more.

One thing I try to do is keep these files agnostic to the agent I am using. Claude Code reads CLAUDE.md, while AGENTS.md is used by Codex and most other agents for top level context. My CLAUDE.md has a few Claude specific rules in it, but for the most part it just tells Claude to go read AGENTS.md, which is where most of the information I want every agent to know actually lives. I also note in AGENTS.md that a CLAUDE.md file exists with Claude specific instructions, in case I am running Codex, since giving it that context can be helpful sometimes.

AI Agent Skills and Why Wrapping Them Is a Winning Strategy

Agent skills are similar to context files, in that they hold information for an agent to follow. At the end of the day, everything an agent takes in and gives out is just text. The difference is that skills are more specific and more repeatable, aimed at getting a particular task done. It is really helpful for a skill to carry domain knowledge for the project. Do not re-define knowledge the agent almost certainly already has in its training data, unless it feels really important to reiterate something.

Wrapping skills is a pretty solid strategy. For example, if you are using Claude Code, it is really beneficial to have a custom code review skill that supplies domain knowledge of the project being worked on, any guidelines the project follows, and maybe even the specific tools to run, the environments to test on, or the testing strategies to use. Then you can have that skill hand off to the existing Claude Code code review skill, so the agent layers general best practices, security audits, and the rest on top of your project specific review process.

How to Stay Flexible as AI Development Tools Keep Changing

If there is one thread running through all of this, it is flexibility. None of these strategies are fixed. You decide how much the agent plans versus how much you do, how much it writes versus how much you write, and how much of the review is automated versus handled by you. The right mix shifts with the task, the codebase, and, let us be honest, your mood that day.

The good part is that you can keep strengthening your setup over time, and it compounds. Tightening your context files helps. Trim anything the agent ignores and push detail down into linked documents to keep the top level lean. Building more skills as patterns repeat helps too, since anything you find yourself explaining more than once is a good candidate. Adding MCP servers gives the agent access to the systems you actually use so it is not working blind, and leaning on subagents, hooks, and custom commands (where your tool supports them) takes care of the repetitive parts of your loop. For bigger features, a more structured approach like spec driven development can pay off, while small targeted edits can be faster and possibly require less refinement. Adjusting your workflow as you go, trying new things, and being able to pivot easily keeps you ahead of the game, and keeps things interesting because you can always try the new shiny thing and go back to whatever worked before.

The Best AI Coding Tools and Terminal Agent, and Alternatives

As for the tools themselves, my day to day runs on Claude Code and Codex, with OpenCode as the open, model-agnostic option I definitely want to go back to. They all keep me thoroughly enjoying working in the terminal, and they do a great job of keeping me as productive as ever. I can learn new workflows, software, programming languages at the same pace I can develop new features and fix bugs. Using these do really feel like a huge productivity win: less context switching, all the time spent doing what I enjoy and excel at and a lot less time banging my head against a wall and fighting against redundant tasks.

The other options (outside of terminal agents) are also very good and worth knowing about.

Editor-Based and GUI Alternatives

If you like the terminal agents but want more visibility, there is a growing set of graphical front ends. OpenAI ships an official Codex desktop app that acts as a command center for running and supervising multiple agents at once, and there is a whole ecosystem of Claude Code GUIs and desktop wrappers that add things like parallel sessions, file trees, and visual diff review on top of the agent. If juggling several agents in raw terminals starts to feel like a mess, these are worth a look.

If you would rather the AI live inside an editor, Cursor is the obvious one to try. It started as a VS Code fork with great autocomplete and has evolved into a full agent-first IDE, with an Agent mode that reads the codebase, edits across files, runs commands, and iterates, plus background and parallel agents for longer running work. It used to be my daily driver, and was a great and productive experience. It is now fairly well polished and continues to evolve on its own end, while pushing the boundaries of agenting programming in its own world.

One for really strong flexibility is Pi. Pi is a minimal terminal coding harness that leans all the way into the idea that the harness is yours to shape. It keeps its core small and deliberately skips things like sub-agents and plan mode, then lets you add those back (or anything else) through extensions, skills, prompt templates, and themes, which you can bundle into packages and share over npm or git. It is model-agnostic too, so you are not tied to one provider.

Worth a mention too is Amazon’s Kiro, which takes a different philosophical stance. Instead of prompting your way to working software, Kiro pushes spec driven development, generating requirements and design documents before it writes any code. It is an interesting answer to the problem where agents produce code that runs but quietly drifts from your intent, and it appeals if you work in the AWS ecosystem or even just like more structure up front. Other names you will run into include Windsurf, Cline, and Warp’s agentic terminal, all circling the same idea from slightly different angles.

There is no single right answer here, and that is sort of the point. Pick the tool that matches how you like to work, invest in the harness around it, and stay ready to swap pieces out as the tools keep moving. I do feel like the terminal agents remain the most flexible out of all of these, but as I already said, the agents really just read and output text, and with enough imagination any workflow can be achieved, tested, and reinvented.

Putting This Into Practice

None of this requires a big process overhaul. Start with a single context file, tighten your system prompt, and build one skill around something you find yourself explaining repeatedly. The compounding effect is real and a better harness today means less correction tomorrow.

The harder part is having engineers who know how to build and maintain this kind of setup. At AgilityFeat and WebRTC.ventures we staff AI-enabled engineers like Justin who work this way every day.

Whether you need a single engineer or a full custom team under staff augmentation, a build-operate-transfer model to establish and eventually own a nearshore development team, or a standalone nearshore project or AI integration, we can help! Get in touch.


Further Reading:

About the author

About the author

Justin Williams

Justin Williams is a Senior WebRTC Engineer on a long-term staff augmentation through our WebRTC.ventures brand. Justin specializes in building, debugging, and monitoring scalable real-time video applications. Nearshoring from Ontario, Canada, his expertise spans WebRTC architectures, streaming media protocols, and cloud platforms including Vonage, Amazon IVS, and Amazon Chime.

Recent Blog Posts