I Let Two AI Agents Fix the Same GitHub Issue: One Took 7 Tries, the Other Nailed It in One Shot

Yes, I know. Most of us already use AI coding tools. Cursor, Antigravity, you name it: they're all pretty good at autocompleting a function or generating quick boilerplate. But here's the thing: I tested Claude 4.7 Opus via Cursor against a local Qwen 3.5 running through Ollama, which was also armed with Skillware's new Issue Resolver skill. The results were genuinely surprising, and I did not expect them myself.

The setup

I grabbed a real GitHub issue. Not a trivial typo fix, but a moderately complex bug that touched several files, had downstream ripple effects, and required documentation and versioning updates that weren't obvious from the issue description alone. Normally you would have to dig deeper by hand, compare changes, and so on. It's the kind of thing where the fix itself is straightforward, but getting it right means touching half a dozen places that most people, and even AI IDE agents, would overlook.

Then I let both agents take a crack at it.

Claude Opus went first, and it did fine. Eventually.

Opus dove in exactly the way you'd expect a capable but eager junior dev to. It read the issue, found the most obvious place to patch, and implemented a minimal fix. Good. But then I noticed it hadn't updated the tests, so I had to send a follow-up prompt. Then it forgot the docs. Another prompt. Then I spotted a version bump that should've happened. Prompt again. Then a related config file that referenced the changed behavior. Another prompt, and you get the gist.

All in all, it took seven back-and-forth exchanges to get everything right. Each iteration was fast, and the code quality was solid, but I had to steer the ship manually after every step. Opus wasn't thinking holistically. It was treating the issue like a surgical strike on one file, blind to everything else that needed to change. That's not a Claude problem, but what happens when an agent jumps straight to implementation without pausing to think. Sure, it drew up an implementation plan and parsed the repo before starting work, but it still didn't sweep everything as expected. At least not without me micromanaging it like a corporate watchdog.

Then Qwen 3.5 with Skillware had a go

Same issue. Same repo. But this time, the agent was using Skillware's Issue Resolver, a new skill that forces a structured analysis phase before any code gets written. Instead of rushing to the fix, the agent first fetched the issue, read the repo's README, code of conduct, contributing guide, and so on, scanned the directory tree, and mapped every file that could possibly be affected.
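To make the "scan the tree and map affected files" step concrete, here is a minimal sketch of one way to do it. This is not Skillware's actual implementation: the function name `map_candidate_files`, the `SKIP_DIRS` set, and the plain keyword matching are all my own assumptions, chosen only to illustrate the idea.

```python
import os
from pathlib import Path

# Hypothetical list of directories that are never worth scanning.
SKIP_DIRS = {".git", "node_modules", "__pycache__", ".venv"}


def map_candidate_files(repo_root, keywords):
    """Return {relative_path: [matched keywords]} for readable text files.

    Illustrative only: a real analysis phase would likely also follow
    imports and references, not just search for keywords.
    """
    root = Path(repo_root)
    hits = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            path = Path(dirpath) / name
            try:
                text = path.read_text(encoding="utf-8").lower()
            except (UnicodeDecodeError, OSError):
                continue  # binary or unreadable file, ignore it
            matched = [k for k in keywords if k.lower() in text]
            if matched:
                hits[path.relative_to(root).as_posix()] = matched
    return dict(sorted(hits.items()))
```

Called with a symbol taken from the issue, something like `map_candidate_files(".", ["retry_limit"])`, this would surface the docs and config files that mention the changed behavior alongside the source file, which is exactly the category of file that got missed in the first run.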

Then it presented a plan. Here's what's broken. Here are the acceptance criteria. Here are the files that need to change, including tests, docs, config, and downstream modules you might not have thought about. Here are three ways to fix it, ranked by risk and complexity. I picked one, approved it, and the agent implemented the whole thing in one go, including local testing, linting, formatting, and all the complementary changes that would make it sail through a PR review. It also created a branch, made sure the commit listed no co-authors, and pushed it, all in the same run.

One shot. No nudges. No "oh wait, you forgot the..."

And here's the part that matters: this isn't about Qwen being better than Claude, or whatever model fight makes the headlines these days. Don't get me wrong, Opus is an incredible model. The difference was entirely in the workflow. Opus without Issue Resolver behaved like a brilliant developer who skipped the planning meeting. Qwen with Issue Resolver behaved like a senior engineer who thought through the whole problem before touching the keyboard, reviewed and tested the changes, and shipped only when everything checked out.

What the Issue Resolver actually does

The idea is simple. Most AI agents are optimized to produce: give them a prompt, and they generate code. But for non-trivial issues, producing code immediately is often the wrong move. What you actually need first is understanding, scoping, and a plan.

The Issue Resolver skill, part of the Skillware open-source framework, enforces exactly that in a five-stage fashion:

1. Fetch the issue, comments, and linked PRs.
2. Understand the repository by reading its actual README, CONTRIBUTING guide, and directory tree at runtime, with no hardcoded assumptions about what kind of project it is.
3. Analyze the problem, define what "done" looks like, map every file that needs to change, trace ripple effects into dependent modules, and rank up to three implementation options.
4. Present the plan and wait for your approval.
5. Implement, but only after you say go, and only within the bounds of the approved plan.
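The five stages above can be sketched as a small pipeline with an approval gate. To be clear, this is my own illustration and not Skillware's code or API: `resolve`, `Plan`, and both exception classes are hypothetical names, and the stage functions are passed in as plain callables so the sketch stays self-contained.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    summary: str
    files: list  # every file the approved plan is allowed to touch


class PlanNotApproved(Exception):
    """Raised when the reviewer rejects the plan."""


class OutOfScopeChange(Exception):
    """Raised when the implementation touched files the plan did not list."""


def resolve(issue_url, fetch, understand, analyze, approve, implement):
    """Run the five stages in order; nothing is implemented without approval."""
    issue = fetch(issue_url)            # 1. fetch the issue and its context
    context = understand(issue)         # 2. read the repo at runtime
    plan = analyze(issue, context)      # 3. scope the work into a plan
    if not approve(plan):               # 4. present the plan and wait
        raise PlanNotApproved(plan.summary)
    changed = implement(plan)           # 5. implement the approved plan
    # Guard: reject any change that falls outside the approved file list.
    stray = sorted(set(changed) - set(plan.files))
    if stray:
        raise OutOfScopeChange(", ".join(stray))
    return changed
```

The design choice worth noticing is that `implement` is simply unreachable until `approve` returns true, and the result is checked against the plan afterwards. That ordering, more than any model capability, is what separates the two runs described earlier.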

The output is a structured resolution plan you can actually review. Affected files. Downstream risks. A recommended approach with rationale. It turns a chaotic "fix this issue" prompt into a disciplined, reviewable workflow that catches the tests, docs, and version bumps before they become "oh right, forgot that" moments.
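For a feel of what such a plan might look like as data, here is a hedged sketch. The field names, the 1-to-5 scoring scale, and the ranking rule (lowest risk first, then lowest complexity) are assumptions of mine; the real skill's output format may well differ.

```python
from dataclasses import dataclass, field


@dataclass
class Option:
    name: str
    risk: int        # hypothetical scale: 1 (low) to 5 (high)
    complexity: int  # same hypothetical scale
    rationale: str


@dataclass
class ResolutionPlan:
    issue_title: str
    acceptance_criteria: list[str]
    affected_files: list[str]
    downstream_risks: list[str]
    options: list[Option] = field(default_factory=list)

    def ranked_options(self):
        """Return at most three options, lowest risk first, then lowest complexity."""
        return sorted(self.options, key=lambda o: (o.risk, o.complexity))[:3]

    def recommended(self):
        """Return the top-ranked option, or None if no options were proposed."""
        ranked = self.ranked_options()
        return ranked[0] if ranked else None
```

Keeping tests, docs, and version files in `affected_files` next to the source files is the whole point: the reviewer sees them in the plan before any code is written, rather than discovering the gap afterwards.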

Who this is for

If you're a developer, this is like having a teammate who does the legwork before the standup. You show up, they've already read the issue, explored the codebase, and handed you a breakdown of what needs to happen. You just make decisions.

If you're building AI agent workflows, it's a reusable planning scaffold. Instead of your agent raw-dogging every issue and hoping for the best, Issue Resolver gives it structure. The result is fewer iterations, fewer regressions, and output that reads like a human thought about it first.

Most importantly, it was made to work locally with local models, with no reliance on cloud inference or top-shelf LLM credits. You can pair it with a 3B model and get better results than the LLM darlings, not because the model is "smarter" or whatever, but because the skill knows exactly how to fetch issues and repos, how to parse them, and how to make sure changes won't affect files out of scope. And if they do, it is aware of that and knows how to behave. On top of that, it will test, evaluate, and make sure everything is sound before shipping.

The bigger picture

What this experiment really drove home for me is that process matters more than model size. We spend so much time comparing benchmarks and parameter counts, but the gap between "seven back-and-forths with manual steering" and "one shot, PR-ready" wasn't closed by a bigger model. It was closed by a better workflow. The Issue Resolver skill is just a structured way of saying: slow down, think first, then act.

Please feel free to contribute. It's 100% free and open source, and we are planning to enhance the skill with more features and pit stops for evals, e.g. considering commit messages, parsing previous relevant issues and PRs that are closed, and seamlessly managing git without assumptions or user input.

If you just want to try it yourself, the skill is available now at github.com/ARPAHLS/skillware, or simply via pip install skillware. Plug in a GitHub issue URL, optionally toss in your GITHUB_TOKEN and any project-specific instructions, and see how your favorite agent performs when it's forced to plan before it codes. I'd genuinely love to hear what happens. <3
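If you're curious what "plug in a GitHub issue URL and a token" boils down to at the HTTP level, here is a rough sketch using only the standard library and GitHub's public REST endpoint for issues. It does not use or represent Skillware's own API: `build_issue_request` is a hypothetical helper, and the URL in the usage note below is just a placeholder.

```python
import re
import urllib.request

# Matches URLs of the form https://github.com/<owner>/<repo>/issues/<number>
ISSUE_URL = re.compile(r"^https://github\.com/([^/]+)/([^/]+)/issues/(\d+)/?$")


def build_issue_request(issue_url, token=None):
    """Turn a GitHub issue page URL into a REST API request object.

    The token is optional: public repos can be read without one,
    at a much lower rate limit.
    """
    match = ISSUE_URL.match(issue_url)
    if not match:
        raise ValueError(f"not a GitHub issue URL: {issue_url}")
    owner, repo, number = match.groups()
    api_url = f"https://api.github.com/repos/{owner}/{repo}/issues/{number}"
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(api_url, headers=headers)
```

To actually fetch the issue, you would pass the result to `urllib.request.urlopen` and decode the JSON body, for example with `build_issue_request("https://github.com/octocat/hello-world/issues/42", os.environ.get("GITHUB_TOKEN"))` after importing `os`.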
