Should Makers Care About Code Quality?
Some observations from someone who is both a Maker and a Coder
The software world can be divided into two camps: those who see code as a craft, and those who see code as a means to an end. For the purposes of this post, let's call them Coders and Makers.
Coders vs Makers
At one extreme, Coders believe that code should be clean, as described in books like Robert C. Martin's (a.k.a. Uncle Bob's) Clean Code or Steve McConnell's Code Complete. At the micro-level, this means variable, class and function names should be self-documenting, comments should make intentions clear, and code re-use should be the default. At the macro-level, components should have clear separation of concerns and clean interfaces, and code should be thoroughly tested at every level.
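To make the micro-level part concrete, here is a tiny, contrived Python contrast (the function names and the subscription scenario are invented purely for illustration):

```python
from datetime import date

# Unclear: the terse name and unexplained arithmetic hide the intent.
def calc(d):
    return (d - date.today()).days

# Cleaner: self-documenting name, explicit parameters, intent-revealing comment.
def days_until_subscription_expires(expiry_date: date, today: date) -> int:
    # A negative result means the subscription has already lapsed.
    return (expiry_date - today).days
```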
At the other extreme, Makers pride themselves on shipping fast. They believe that clean code doesn't matter; what matters is that it works. Ship fast, ship often, worry about the code quality later (if ever). Software requirements are inherently poorly-defined and malleable, so rather than waste time building the wrong thing perfectly, build it imperfectly, get user feedback, and iterate towards the solution that users really want. If you don't, the Makers reason, you will be left in the dust by your competitors who do.
Now, of course, I've caricatured these viewpoints, and there are many possible shades in between. In my view, the person with the "best" viewpoint is the one who says "it depends", and adapts their approach according to context. If you're developing software for serious use cases, like banking or healthcare, then the risks are greater and quality matters more. On the other hand, if you're developing a prototype to see if there's market fit for a SaaS idea, then moving fast matters more.
...And Then Came the Agents
Okay. But this is a decades-old debate at this point. Why am I bringing this up now?
With Vibe coding becoming a thing, I think this debate is more important now than ever. It is becoming increasingly possible to leave coding entirely to LLM-powered agents such as Claude Code. And many of my programmer friends say they are doing exactly that: they hardly write a line of code themselves anymore. Their job now is to write the prompts that produce the code. Ideally they would review the code and correct the agent if it is going down the wrong path... but do you have the patience to carefully review thousands of lines of code, with changes across the codebase, generated within a few minutes of Claude pontificating and reticulating? I don't think most of us do. So we tend to read the summary of what the LLM tells us it did, or is planning to do, and if it sounds good, we let it implement the code. And we believe we're increasing our productivity, because there's no way we could have written that much ourselves in such a short amount of time.
Taken in isolation, most code written by the LLM looks reasonable. Every pull request gets an LGTM, so to speak. At the micro-level, LLMs mostly follow the good practice guidelines loved by Coders: well-named variables and functions, comments making intentions clear. But zoom out one level, and you quickly find that at the macro-level, things are a mess. Code re-use? It's more likely the LLM has defined two functions in different parts of the codebase that do the same thing, albeit implemented subtly differently. Clear separation of concerns? Unlikely, but maybe if you're lucky. Tests? Hopefully, if you asked for them. But do the tests have sensible assertions, or do they just run the code? Or maybe they check every minutia of the underlying code in hundreds of fine-grained test cases, losing the bigger picture. You'd better check. Dead or unreachable code? All over the place.
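As a contrived Python illustration of what I mean (all names here are invented), this is the kind of near-duplication and assertion-free "testing" that tends to accumulate:

```python
# utils/formatting.py
def format_amount(value):
    """Format a number as a dollar amount, e.g. 1234.5 -> '$1,234.50'."""
    return f"${value:,.2f}"

# reports/helpers.py -- added later, does almost the same thing, subtly differently
def render_currency(value, symbol="$"):
    """Format a number as currency, e.g. 1234.5 -> '$1,234.5' (no trailing zero)."""
    return symbol + format(round(value, 2), ",")

# tests/test_reports.py -- exercises the code but asserts nothing about the output
def test_render_currency_runs():
    render_currency(1234.5)
```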
Perhaps the problem lies with me: did I provide a detailed enough specification of what needs to be done? Indeed, I could spend time writing several pages of clear design specifications, and then have the LLM implement them in one shot. Done from scratch on a new codebase, I think this may turn out okay. But when let loose on an existing codebase, my experience has been that the LLM's lack of full visibility of the big picture quickly leads to a mess. At this point in time, LLM-based systems are highly capable of handling simple tasks like basic algorithms, but when presented with large projects spanning hundreds of files, they struggle to maintain order.
When developing BankStatement.to, I wrote almost all the code by hand, with some help from Copilot here and there for completing small sections. I carefully designed the interfaces, components and individual units to a level of polish that I was satisfied with. The codebase was relatively clean, well-tested and easy to modify. Then I let Claude Code inspect the codebase and asked it to implement a new feature, and within 15 minutes it had disabled CSRF protections (to get rid of those pesky "Forbidden" errors!) and implemented a dozen new functions, hundreds of lines each, that subtly recreated already-existing functionality for user authentication but with minor modifications. Clearly, this relationship was not meant to be, and I soon went back to my old methods.
Yet from the Maker's point of view, this is all fine, right? The application after Claude's modifications did what it was supposed to. After all, I didn't ask it to keep CSRF protections intact. And when I did ask it to restore them, it tried tirelessly until it had spent all my credits (but never quite succeeded, by the way).
So we're moving fast and breaking things, as Mark Zuckerberg (in)famously directed. But with LLMs, I fear we may be taking the "breaking things" part a bit too far. There is a hidden cost to messy code: development velocity decreases, because every change introduces a new bug or inconsistency somewhere else. We also risk serious security vulnerabilities. It's not just Claude's disdain for CSRF. In March 2025, Matt Palmer, an engineer at Replit, published a report finding that 170 out of 1,645 Lovable-created web apps suffered from a security flaw that easily allowed attackers to extract highly sensitive information. (It should be mentioned here that Replit is a competitor of Lovable, but the vulnerability he pointed out, it seems, was real.)
So even the Makers, at some point, will be forced to reckon with the reality that while they were able to move incredibly fast initially, the untidiness of the approach eventually slows them down, perhaps to the point where they have to start looking at the code themselves, or turn to someone with the experience to do it for them. And this is where the wisdom of "it depends" becomes so important. Maybe the fast approach was fine for the MVP (Minimum Viable Product) or MLP (Minimum Lovable Product, perhaps the more appropriate term in this setting), but be prepared to either start over or pay down that technical debt in the future, should your application gain some traction.
So what can we do?
Clearly, LLMs are extremely useful tools. And Vibe coding, still in its infancy today, may very well replace traditional "artisanal" coding in the long run as AI systems continue to improve. But we also can't (or at least, shouldn't) simply let them loose on our codebases today.
So here's an idea: what if we developed a way to measure the "cleanliness" of a codebase, both at the micro- and macro-scales, and then let the AI Agents tirelessly optimize, like only they can, until the codebase is certified as 100% "clean"? We can fight fire with fire and let the agents tidy up their own spaghetti code. Sure, it will cost us some more tokens, but it will work, right? ...Right? Well, maybe.
While it can definitely help to provide AI assistants with more tools (linters, code coverage, maybe even cyclomatic complexity metrics), I think the problem you'll run into is that all these tools tend to operate at the micro-level. Step back to the macro-level and it becomes increasingly difficult to tell whether code is complex because the domain it operates in is inherently complex, or because it was poorly designed. This is the distinction often described as essential versus accidental complexity.
But perhaps this issue, too, is not insurmountable. If we define clean code (at the macro level) as code that is easy to modify in new directions safely and correctly, then perhaps we can measure the cleanliness of a codebase. Well-designed code should be easy to understand, verify and modify, or at least as easy as the underlying domain allows. That has always been the case, and it is still true for coding agents.
So what if we created a measure that said, essentially: for a given change in functionality, how many lines, functions and files need to be modified to implement it, including tests? We could have an LLM generate 100 hypothetical changes in functionality that a codebase might require in the future, implement each one, and then measure the average of this number. We could even, somehow, express the number as a percentage of the size of the codebase. Then we could optimize to get this number as close as possible to 0% (though obviously never quite reaching it), and accept or reject alternative architectural designs for the codebase based on whether the number is lower or higher.
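To make the idea slightly more tangible, here is a rough Python sketch of the arithmetic involved. The agent-driven steps (generating the hypothetical feature requests and implementing them) are represented only as input data, and all names and numbers below are made up for illustration:

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class ChangeDiff:
    description: str      # the hypothetical feature request
    lines_touched: int    # lines added/modified/deleted, including tests
    files_touched: int

def changeability_score(diffs: list[ChangeDiff], total_codebase_lines: int) -> float:
    """Average lines touched per hypothetical change, as a % of codebase size.
    Lower is better; 0% is the unreachable ideal."""
    avg_lines = mean(d.lines_touched for d in diffs)
    return 100.0 * avg_lines / total_codebase_lines

# Example with made-up numbers: 3 hypothetical changes against a 20k-line codebase.
diffs = [
    ChangeDiff("add CSV export", 180, 4),
    ChangeDiff("support a second currency", 320, 9),
    ChangeDiff("rate-limit the login endpoint", 95, 2),
]
print(f"{changeability_score(diffs, total_codebase_lines=20_000):.2f}%")  # ~0.99%
```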
I don't know. Maybe that could work. But the computational cost (not to mention the token cost) would make such an approach prohibitively expensive today.
I think there may be some simpler tricks worth trying first, like giving LLMs a means to "visualize" the graph of components in a codebase, and asking them to ensure the dependency graph at various levels (files, classes, functions) remains as tree-like as possible, with no cycles. We could also highlight to the agent, using vectorized representations, spots where code duplication seems to be high, and ask it to refactor while maintaining the tree structure, keeping the interfaces between components as clean as possible, and ensuring all the tests continue to pass. I'm keen to try this, and would be interested to know if others have had success with such ideas already. Maybe some MCP tools for this already exist.
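As a very rough sketch of the first idea, here is what cycle detection over a module dependency graph could look like in Python. This assumes the networkx library; in a real tool the edges would be extracted from the code itself (e.g. by walking import statements with the ast module) rather than hard-coded as they are here:

```python
import networkx as nx  # assumes the networkx package is installed

# Hypothetical module-level dependency edges. In practice these would be
# extracted from import statements; they are hard-coded here for illustration.
edges = [
    ("app.views", "app.services"),
    ("app.services", "app.models"),
    ("app.models", "app.views"),  # a cycle we would ask the agent to break
]

graph = nx.DiGraph(edges)
cycles = list(nx.simple_cycles(graph))

if cycles:
    print("Dependency cycles to flag for the agent:", cycles)
else:
    print("Dependency graph is acyclic (tree-like at this level).")
```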
Conclusion
To answer the question in the title of the post, in case it needed answering: Yes, Makers should care about code quality. And likewise, Coders should care about real-world impact. But as to how much they should care... it depends. And sometimes maybe they don't need to care at all. The goal and context of a project should always be taken into account.
With LLMs now writing so much new code, it's tempting to believe that code quality is going to matter less and less, because, after all, that will be the LLM's problem to deal with now. But low code quality can have knock-on effects with real-world impact down the line, whether it's slower velocity, bugs or security vulnerabilities. And even if extreme Makers don't care about code quality directly, they should at least care about those things.
My experience with this has been much the same as described in Mo Bitar's blog post, After two years of vibecoding, I'm back to writing by hand. From where we stand today, it seems like LLMs just cannot deliver high quality at the macro-level yet. Given the right tools, or context far beyond what they are capable of today, maybe one day they will, and I'm keen to see how we can push the boundaries of this problem. I laid out some ideas in this post that I'd like to try in the near term. But for now, for anything even vaguely important, I'll continue being the architect of my code and only let the LLMs sweat the small stuff that I can easily verify, thank you very much.