
The Senior Engineer’s Job Just Changed

Code is free. Taste is not. After a year of building agent harnesses around my own work, I think the senior engineer’s job has quietly shifted from writing code to designing the system that produces it. The leverage moved up a layer, and most developers are still arguing about which model is best while the real work has gone somewhere else.

I’ve spent the past year wiring up Claude Code skills, planning templates, deploy lanes, and lint rules around my projects. The output went up. The hours did not. What changed wasn’t the model. The harness changed.

Where my thinking started

Last August I wrote about vibe coding and the two sides of AI in software. At the time I was already uncomfortable with the framing of “AI as autocomplete.” Something felt incomplete. I kept seeing two camps: people treating Claude like a fancy search engine, and people letting it write whole apps with no guardrails and shipping the slop.

Over the next twelve months I drifted into a third camp without naming it. I started writing skills. Then planning templates. Then a release gate. Then deploy templates. Then a memory system. Each piece solved a class of problem that had been costing me synchronous attention. None of them were impressive in isolation. Together they changed what I could ship in a week.

I didn’t have a word for what I was doing until I watched two talks back-to-back at AI Engineer 2026.

The two quotes that made it click

“Vibe coding is about raising the floor for everyone. Agentic engineering is about preserving the quality bar of what existed before in professional software.”

Andrej Karpathy

That’s the line. Vibe coding lets anyone ship a prototype. Agentic engineering is what professionals do once they’re inside the building. It’s a discipline, not a vibe. The agent is powerful and a little stochastic, and your job is to coordinate it without sacrificing quality.

The second quote, from Peter Steinberger talking about OpenCode:

“You still need to ask the right questions, otherwise that makes the difference between good code and slop.”

Peter Steinberger

Both quotes point to the same shift. The model is no longer the constraint. Your ability to specify, structure, and steer is the constraint. The senior engineer’s craft has moved from typing code to building the system that produces correct code.

What “harness” actually means

Ryan Lopopolo at OpenAI gave the most direct talk on this. He calls it harness engineering: the software, structures, and instructions you put around the agent so it does the full job without you reaching back into the loop. His framing was the cleanest I’ve heard:

“Code is free. We have an abundance of code to solve the problems that we come across. Each engineer today has access to five, fifty, or five thousand engineers worth of capacity. The only thing that needs to happen is to figure out how to productively deploy these resources.”

Ryan Lopopolo, OpenAI

If code is free, then the scarce resources are different now. Lopopolo names three: human time, human and model attention, and model context window. Every part of your setup either feeds those scarce resources or wastes them. A good harness feeds them. A bad one drains them.

This is why I think model choice has become a distraction. Swapping Claude for GPT inside a well-engineered harness barely changes the output. Swapping a well-engineered harness for “I’ll just paste into ChatGPT” changes everything.

The layered harness in practice

My own harness is layered. Each layer hands off to the next with structured artifacts the agent can read.

Layer 1: Skills (codified workflows)

Every workflow I do more than twice gets written down as a skill. Slash commands, frontmatter, scripts. The skill is the prompt I would have re-typed. Here’s the frontmatter from one of mine, which builds pre-hire demos for Codeable leads:

---
name: lab-demo
description: Interview-driven workflow to build a pre-hire Astro+Tailwind
  demo from a client's Figma design and deploy it to {client-slug}.lab.pluginslab.com
  on Cloudflare Pages. Triggers on phrases like "lab demo for task X",
  "build a demo of this design".
---

Once that exists, “build a demo for this client” stops being a 30-minute setup conversation and becomes one sentence. The agent reads the skill, runs the steps, and asks me only the questions the skill doesn’t already answer.

Layer 2: Planning documents

Every project gets four files: architecture.md, concept.md, plan.md, progress.md. The agent writes them with me at the start. It reads them at every step. They are the project’s memory across sessions, and they survive the model’s context window getting flushed.

This is the layer I think my fellow experts underestimate the most. They want the agent to “just code.” But the agent has no idea what success looks like in your business unless you wrote it down. The plan document is the agent’s job description.
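As a sketch, a minimal plan.md skeleton might look like this (the headings are mine, not a fixed format; shape it to your own projects):

```markdown
# Plan

## What success looks like
- <the business outcome, stated so the agent can check its work against it>

## Milestones
1. <milestone> — done when <acceptance criterion>
2. <milestone> — done when <acceptance criterion>

## Out of scope
- <things the agent should not touch>
```

The point is less the exact headings than that acceptance criteria live in a file the agent re-reads every session, instead of in your head.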

Layer 3: Release gates and deploy lanes

The boring layer. The one that actually catches mistakes. Every push to main on every active project runs the same release gate, which fails if I haven’t bumped the version or written a changelog entry:

# $SOURCE is the file that carries the version (package.json or similar);
# $CURRENT is its version at HEAD, $PREVIOUS the version one commit back.
if [[ "$CURRENT" == "$PREVIOUS" ]]; then
  echo "ERROR: version in $SOURCE not bumped (still $CURRENT)." >&2
  exit 1
fi

if [[ ! -f CHANGELOG.md ]]; then
  echo "ERROR: CHANGELOG.md missing at repo root" >&2
  exit 1
fi

Thirty lines of bash that durably eliminated a class of “oh, I forgot to bump” mistakes I used to make every other week. This is exactly Lopopolo’s “Garbage Collection Friday” pattern: take a class of recurring slop, eliminate it once, never spend attention on it again.
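A natural extension of the gate (a sketch; the function name is mine, not part of the original script) is to require that the bumped version actually appears in the changelog, not just that the file exists:

```shell
#!/usr/bin/env bash
# Sketch: fail the gate unless the changelog mentions the new version.
# check_changelog_has_version is a hypothetical helper, not from the
# original release gate.
check_changelog_has_version() {
  local version="$1" changelog="${2:-CHANGELOG.md}"
  grep -qF "$version" "$changelog"
}

# In the gate, after the bump check:
#   check_changelog_has_version "$CURRENT" || { echo "ERROR: $CURRENT not in CHANGELOG.md" >&2; exit 1; }
```

That closes the loophole where you bump the version, touch the changelog for an older release, and the gate still passes.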

Behind the gate sit two deploy lanes. Lane A is a deploy script pasted into Ploi (I use ploi.io to manage my DigitalOcean droplets, and I’d happily recommend it). Lane B is a GitHub Actions workflow that joins Tailscale and SSHes into the home server:

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with: { fetch-depth: 2 }
      - name: Validate release (version bump + changelog)
        run: ./scripts/validate-release.sh
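The Tailscale half of Lane B looks roughly like this. A sketch only: it assumes the tailscale/github-action with OAuth credentials, and the secret names, hostname, paths, and script name are all placeholders of mine (SSH key provisioning is omitted):

```yaml
jobs:
  deploy:
    needs: validate
    runs-on: ubuntu-latest
    steps:
      # Join the tailnet so the runner can reach the home server.
      - uses: tailscale/github-action@v2
        with:
          oauth-client-id: ${{ secrets.TS_OAUTH_CLIENT_ID }}
          oauth-secret: ${{ secrets.TS_OAUTH_SECRET }}
          tags: tag:ci
      # Hypothetical host and paths; substitute your own.
      - name: Deploy over SSH
        run: |
          ssh -o StrictHostKeyChecking=accept-new deploy@home-server \
            'cd /srv/app && git pull && ./scripts/deploy.sh'
```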

Two templates. Every custom app uses one of them. No bespoke deploy debugging anymore.

None of these layers are clever in isolation. Stacked, they let me run more projects than I used to be able to babysit, and the agent rarely falls off the rails because every layer hands it the next prompt with structured context.

The objections I keep hearing

Most Codeable experts I talk to push back with the same line: “I don’t have time to build all that, I just want to ship.” They’re still copy-pasting from ChatGPT. Two more serious objections come up regularly.

“This only works for solo devs and small teams”

The strongest version of the objection: at a 50-engineer company you can’t unilaterally rewrite the SDLC. Fair. You can’t.

But Ryan Lopopolo is running this exact playbook at OpenAI. His project ended up with 750 PNPM packages, persona-based review agents, and a dedicated weekly ritual (“Garbage Collection Friday”) where engineers find recurring failure modes and durably eliminate them. The pattern at scale isn’t “every engineer builds their own harness.” It’s “one engineer documents what good looks like, and a review agent enforces it for the rest of the team.” That’s leverage, not gatekeeping.

At scale the harness gets more important, not less. The codebase has to become legible to agents the same way it has to be legible to new hires.

“Models will absorb the harness anyway”

The bitter-lesson objection. In twelve months Claude or Codex will plan, lint, and deploy themselves, so your skills folder is a transitional artifact.

Lopopolo addresses this directly: “Context is a thing that I don’t think will ever be obsoleted. The models must be told the requirements of the task, which guardrails to pay attention to.” Your team’s non-functional requirements, your QA plan, your deploy lanes, the way your billing system handles refunds. None of that is in the model. None of it ever will be. It’s a fact about your business, not a property of the weights.

Karpathy says the same thing in a different register: the work moves up the stack. Today it’s lints and skills. Tomorrow it’s specs, acceptance criteria, and review prompts. The artifacts get more abstract, but the engineering work doesn’t disappear. As Karpathy put it, “You can outsource your thinking, but you can’t outsource your understanding.”

Where this doesn’t apply

Two cases where I think the harness argument breaks down.

Throwaway scripts and one-shot prototypes. Building a harness for a 50-line script is over-engineering. Karpathy’s MenuGen story is the reference: vibe code it, ship it, move on. If the artifact is disposable, the harness is overhead.

Beginners. If you’re new to programming, you don’t yet know what “good” looks like. You can’t write down quality bars for problems you’ve never solved. Vibe coding is the right entry point. The harness layer comes later, after you’ve shipped enough to develop opinions worth encoding.

What to do Monday morning

Two concrete moves.

1. Codify one workflow as a skill. Pick the thing you do every week. PR review, deploy, scaffolding a new project, writing a blog post. Open a new file. Write down the steps the way you’d explain them to a junior. Add a frontmatter block. That’s a skill. Stop re-typing the instructions.

2. Treat your repo as a prompt. Every file structure, lint rule, error message, and README is context you’re handing to an agent. Read your repo through that lens. The agent will read your error messages and try to act on them. If your linter says “unexpected any”, the agent doesn’t know what you want. If it says “don’t use any here, this type is derived from a Zod schema, replace with the inferred type”, the agent fixes the right thing the first time.
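One way to produce that kind of error message (a sketch, assuming the repo already uses @typescript-eslint’s parser; the message text is mine) is ESLint’s no-restricted-syntax rule, which lets you attach a custom message to a syntax pattern:

```json
{
  "rules": {
    "no-restricted-syntax": [
      "error",
      {
        "selector": "TSAnyKeyword",
        "message": "Don't use `any` here; this type is derived from a Zod schema, replace with the inferred type (z.infer)."
      }
    ]
  }
}
```

The linter becomes a prompt: every violation hands the agent the fix you actually want, not just a complaint.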

This is the mindset shift. Stop asking “which model is best.” Start asking “what context would the model need to do this without me.” For a more vertical-specific take on the same idea, see my piece on agentic development for WordPress developers.

We’re still engineers; what we build is different

The framing I keep returning to: we’re still engineers. The job hasn’t disappeared. It moved up a layer. We used to build software. Now we build the systems that build software.

That’s not a smaller job. It’s a more interesting one. System design, taste, knowing what to say no to, writing down what good looks like so a hundred agent runs all converge on the same quality bar. These are senior engineering skills. They’ve always been valuable. Now they’re the only things that compound.

A disclosure to close. I’m a Codeable expert and the founder of Pluginslab. I sell engineering work. Faster shipping is more revenue for me. So I’m not neutral on the claim that the harness layer is where the leverage is. Take this post as a signal of what I’m doing on my own projects, not as a sales pitch. The strongest version of the argument is the one you test against your own week. Pick a workflow. Codify it. See what changes.

Then come back and tell me what broke. The harness gets better when more people break it.

Watch the talks

These three talks from the AI Engineer conference shaped the argument in this post. Worth watching in full.