/software-development May 19, 2026 10 min read

Stop Writing WordPress Boilerplate

I built a starter kit so other WordPress devs do not have to figure out the harness layer from scratch. Here is what is in it and what I learned building it.

I spent the last few months building a harness for WordPress plugin development with Claude Code. Skills, hooks, planning files, two sub-agents, a CLI, a worked example. This week it shipped as v1.0.0 of wp-agentic-kit, and I’m writing this so other WordPress devs don’t have to figure out the harness layer from scratch. The kit is opinionated; this post is about what I learned that made me think those opinions were worth opinionating about.

I gave up on AI for WordPress around the end of 2024. The output was always one missing nonce or one wrong escape away from a CVE. I’d ask for a REST endpoint and get something with permission_callback set to __return_true on a mutating route. I’d ask for a Settings page and get capability checks that fired on the wrong hook. The code looked right and was almost never right. So I went back to typing.

A year later I think I was wrong, but not for the reason you’d expect. The fix wasn’t a better model. The fix was the system around the model. Code was always going to be free. The thing that wasn’t free was making sure the free code did what I needed.

Where my thinking started
The four ideas I ended up encoding
The moment I caught myself over-prescribing
One feature, end to end
Start in five minutes
What I notice now that the harness is in place
Where this doesn’t apply
What to do Monday morning
Disclosure and an invitation

Where my thinking started

A couple of weeks ago I wrote about how the senior engineer’s job just changed. That post was the general claim. This post is what happens when you sit down to apply that claim to a specific codebase. In my case, WordPress, because WordPress is where I make most of my living: Codeable engagements, Pluginslab clients, my own plugins.

I started writing pieces of the harness in January, mostly to scratch my own itches. A CLAUDE.md that explained the plugin conventions. A phpcs pre-commit hook because the agent kept forgetting to escape. A skill that asked the right questions when I wanted to add a Gutenberg block. Each piece solved one annoyance. None of them felt like a “project.”

Then I got invited to speak at WordCamp Portugal 2026 on agentic engineering. The talk forced me to assemble the pieces into something coherent enough to explain on stage. The kit is what fell out of that assembly. I wrote it more for the WordCamp audience than for myself, and the act of writing it for someone else changed which pieces stayed in and which got cut.

The four ideas I ended up encoding

Anthropic publishes an AI Fluency Framework with four ideas they call the four D’s. I didn’t start with these; I rediscovered them after the fact. Once I went back to look at what I’d been building, the kit mapped onto them almost too cleanly. So I leaned into the framing.

Delegation. Hand structured work to the agent. Not “write me a plugin,” but “follow this interview, ask one question at a time, then scaffold using these conventions.” Structure beats freestyle. The kit’s two skills (wordpress-scaffold, wordpress-feature) are the structured handoffs.

Description. Tell the agent what’s allowed and what you’re building. Not in prose, in lists. “Allowed escape functions: esc_html, esc_attr, esc_url” is binding. “Use modern escape functions” is interpretable, and the model interprets it differently every time it reads. The kit’s constitution.md is an enumerated list, not a paragraph.

Discernment. Get an independent read before signing off. Not a second agent that can also write code, a read-only sub-agent that audits the plan or the diff and reports findings. It can flag a missing nonce. It cannot helpfully “fix” it, which is the entire point. The kit ships two: plan-reviewer and security-reviewer, both with no Edit or Write tool by design.

Diligence. Make the safety nets non-skippable. A hook that runs phpcs on every commit cannot be talked out of it by a clever prompt. A hook that injects the active plan into context on every turn cannot be ignored under deadline pressure. Diligence is engineering, not memory. The kit ships five hooks; the load-bearing one is UserPromptSubmit, which re-anchors the agent on its plan every turn.

These four ideas are the bones of the starter kit. The kit makes each one concrete with files you can read, edit, and steal. I’ve written about why WordPress specifically benefits from this in Agentic Development: Why WordPress Developers Must Adopt It Now; the kit is what that argument looks like when it ships.

The moment I caught myself over-prescribing

Around v0.7 of the kit, I had a moment that changed the shape of it. I was extending the constitution to also list every npm package the agent could use. Every Composer dep. Every helper class. The list kept growing, and the kit kept feeling more brittle. Eventually I caught myself muttering at the screen:

Claude is already smart enough that if I tell him too much, it’s more likely to fail than to do its own thing.

That was the unlock. The constitution shouldn’t be one big allowlist. It should be two: a strict allowlist for the security-relevant choices (sanitizers, escapers, capability constants, forbidden constructs) and a default list for the dependencies. The strict list binds hard because the failure mode is a CVE. The default list is advisory because the failure mode is “you added a utility library someone might have questions about.” Same file, different contracts.

The same logic applied to the plan-freeze rule. My first version was: “Every plan ships as its own PR. Every feature waits for human review of the plan before any code gets written.” That was right for security-relevant features and absurd for typo fixes. The fix was a five-checkbox Freeze assessment on every plan: any box checked means freeze, zero boxes checked means proceed in-session. A typo and a new REST endpoint no longer pay the same ceremony.

I trimmed about a third of the prompt prose after that. Both skills went from around 180 lines to 54. The kit feels lighter now and the agent makes fewer apologetic detours to ask permission for trivia.

One feature, end to end

Here is what building a feature looks like once the kit is in place. You have a plugin with a settings page and a couple of blocks. A stakeholder asks for a Gutenberg block that shows the status of an order, fed by a REST endpoint they can hit from anywhere.

You open Claude Code and type:

i need a gutenberg block “order-status” that shows the current status of an order, and a REST endpoint to fetch the status by order id. dynamic render.

What happens, in order:

The agent reads CLAUDE.md, notices a plugin already exists, and invokes the wordpress-feature skill.
The skill writes spec.md: what’s being built, three to seven acceptance bullets, and an explicit Out of scope section (no list view, no historical timeline, no notifications). That out-of-scope line is the single highest-leverage sentence in the spec, because everything not named there is undefined and the model will assume it’s in scope.
The skill writes plan.md: phased steps with file paths in every step. Each step has the test that gets written first.
The plan ends with a Freeze assessment. New REST route ticks “touches security-relevant code” and “changes public API surface.” Two boxes checked, so the recommendation is freeze. The skill runs ./scripts/open-plan-pr.sh, which opens a PR with just spec.md and plan.md on it. The plan-reviewer sub-agent audits the PR (read-only). I review the plan on GitHub. Five minutes, focused on whether the approach is right, not whether the syntax is. I merge.
The agent writes the code on a separate branch. Hooks fire on every edit (lint), every prompt (re-inject plan), and every commit (full quality suite). Broken commit, blocked commit.
Before the feature PR merges, security-reviewer audits the diff. Read-only, severity-graded.
After merge, the feature directory moves to .claude/plans/archive/. Long-term memory for the next feature that touches similar code.

That’s the loop. The agent does the typing. I do the judging. Most of my time goes into reading the spec and the plan, not the diff.

Start in five minutes

You don’t need to read any of the above to start. Run:

npx create-wp-ai-plugin my-plugin

You’ll be asked five questions: plugin name, optional vendor prefix, description, author, target WordPress and PHP versions. Then the CLI pulls the kit, substitutes your identifiers across every file, renames pl-example.php to {your-slug}.php, installs PHP and Node deps, pulls fifteen WordPress-flavored skills from WordPress/agent-skills, and inits a git repo with a scaffold tag.

What you get is a working plugin: one example REST endpoint, a passing PHPUnit fixture, a CLAUDE.md that knows your conventions, the hooks that enforce them, the planning layer with a worked example, and three slash commands for the operations you’ll repeat: /plan-freeze, /audit-plan, /ship-feature.

Open the new directory in Claude Code. The harness is already loaded. Tell the agent what you want to build.

What I notice now that the harness is in place

Three things have shifted in my day-to-day since I started using a version of this kit on my own client work.

I no longer worry about security regressions. Not because the agent got smarter, but because phpcs runs on every commit and security-reviewer reads every diff before it merges. The hooks don’t ask permission. I haven’t shipped an unescaped variable in months, and I haven’t been the one catching it either.

I stop drifting between sessions. The UserPromptSubmit hook re-injects the active plan and progress on every turn. I open a project after a week away, type my first prompt, and the agent already knows where I left off, what’s next, and which step is blocked. The “what was I doing here?” tax disappeared.

I stop reading diffs line by line. Around the third or fourth feature with the harness in place, my attention shifted up the stack. I read the spec and the plan carefully. I skim the diff. The test results carry the weight that line-reading used to carry. That shift is what the kit is trying to give you.

Where this doesn’t apply

Two cases where I’d skip the harness, at least at first.

One-shot scripts and disposable plugins. Building this harness for a 50-line snippet to migrate a client’s old shortcode is over-engineering. Vibe-code it, ship it, move on. The harness pays off when you’ll touch the same codebase repeatedly, not when you’ll touch it once.

Devs who haven’t shipped a plugin yet. If you’ve never gone through the cycle of writing, activating, breaking, debugging, and uninstalling a plugin, you don’t yet have the opinions worth encoding in a constitution. Ship a few plugins the slow way. Find the conventions you actually believe in. Then come back and write them down.

What to do Monday morning

Three concrete moves, in order of cost.

1. Scaffold one throwaway plugin with the kit. Spend ten minutes. Read the worked example at .claude/plans/features/001-example-hello-rest/. The spec, the plan, the progress, the actual code. You’ll either think “yes, this is what I want” or “no, this is wrong for me.” Either answer is useful.

2. Add one rule to your own plugin’s CLAUDE.md. The rule should be a thing you’ve corrected the agent on twice. “Always use wp_kses_post for HTML output, never esc_html on rich content.” “Custom post types are registered in includes/class-cpt.php, not the bootstrap.” If you’ve corrected it twice, write it down once.

3. Install one hook. Pick the safety net you keep forgetting. Mine was phpcs on commit; yours might be running tests before push or refusing to commit if a feature flag is off. Five minutes in .claude/hooks/ and .claude/settings.json. Now it always runs.

You don’t need the whole kit to start. You need to start.

Disclosure and an invitation

A disclosure to close. I’m a Codeable expert and the founder of Pluginslab. I sell engineering work, mostly WordPress, often WooCommerce. Faster shipping with fewer regressions is good for my business. So I’m not neutral on the claim that the harness layer is where the leverage is. I built the kit and I’m telling you to use the kit. Take the post as a signal of what I do on my own projects, not as a sales pitch.

The kit is MIT-licensed and lives at pluginslab/wp-agentic-kit. If you adopt it and something breaks, tell me. If you adopt it and it changes how a Monday feels, also tell me. Both kinds of feedback make the next version better. The Portuguese version of the underlying thesis is in the WordCamp Portugal talk; if you’d rather watch than read, that’s the entry point.

The strongest version of the argument isn’t this post. It’s whatever changes after you spend a week shipping with a harness in place. Run the install. Try one feature. Then come back and tell me what broke.

Table of Contents