Despite the significant risks, the efficiency advantages of using generative AI tools in programming are almost irresistible. However, a new approach to managing software development—one that keeps humans in the loop—is essential to mitigate the challenges posed by AI-generated code.
Rethinking Software Development with Generative AI
Despite the severe dangers, the efficiency boost from using generative AI tools for coding is almost too tempting to ignore. What is needed, though, is a fresh approach to managing software development, one that keeps human oversight in the loop.
There is ample evidence that AI-generated code contains errors fundamentally different from those that human programmers might make. Yet, many companies' plans to fix AI coding errors revolve around simply inserting experienced human programmers into the process.
Experienced programmers intuitively understand the types of mistakes or shortcuts a fellow coder might take. However, they need special training to detect the variety of errors that occur when software creates software.
AWS CEO Matt Garman accelerated this conversation when he predicted that most developers would no longer code by 2026.
Many tool providers in the development space believe this can be addressed by using AI applications to manage AI-generated code. Even financial giants like Morgan Stanley are considering using AI to manage AI.
As a practical matter, the only safe and remotely viable method is to train programming managers to understand the unique nature of generative AI coding errors. In fact, given how different AI errors are, it might be more beneficial to train newcomers—those who have not yet been steeped in detecting human coding errors—to manage AI coding tasks.
Part of the problem lies in human nature. People tend to magnify and misunderstand differences. When managers see an entity—be it human or AI—making a mistake they themselves would never make, they often conclude that the entity is less competent in coding.
But we might consider the analogy of self-driving cars. Statistically, these vehicles are safer than human-driven cars. Automated systems never tire, never drink, and never act recklessly.
Yet, self-driving cars aren’t flawless. The types of mistakes they make—like driving full-speed into a stationary truck waiting in traffic—cause humans to say, "I’d never make that mistake. I don’t trust them."
Yet just because autonomous cars make strange mistakes doesn’t mean they’re less safe than human drivers. Human nature simply struggles to reconcile these differences.
Coding management follows the same pattern. Generative AI models for coding can be highly effective, but when they go off course, they can go far off.
Dev Nag, CEO of the SaaS company QueryPal, has been working extensively with generative AI coding. He believes that many enterprise IT leaders are not yet prepared for the nuances of this new technology.
Nag remarked, "It makes a lot of strange mistakes, like it’s from another planet. The code behaves in ways a human developer wouldn’t. It’s like an alien intelligence, thinking in weird directions. AI finds pathological ways to game the system."
Tom Taulli, author of multiple books on AI programming, including this year’s AI-Augmented Programming: Better Planning, Coding, Testing, and Deployment, echoes this sentiment.
Taulli explained, “For example, you can get these LLMs (large language models) to create code, and sometimes they’ll fabricate a framework or an imagined library or module to accomplish what you want.”
He clarified that the LLMs aren’t actually creating a new framework, but pretending to do so.
"No human programmer would consider doing this," Taulli noted, "unless they were mad. They wouldn’t make up an imaginary library or module out of thin air."
When this happens, it’s usually easy to detect—if anyone is paying attention. “If I tried to install it, you’d quickly realize there’s nothing there. If there’s a hallucination, the IDE or compiler will throw an error.”
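To make that concrete, here is a minimal sketch in Python of the kind of sanity check Taulli’s point implies: before trusting AI-generated code, confirm that every dependency it imports actually resolves in the current environment. The module names below, including "fastreportlib", are invented for illustration; they are not real packages and not something Taulli specifically recommends.

```python
# Minimal sketch of a dependency sanity check: verify that every module an AI
# assistant imported can actually be resolved before the code is trusted.
# "fastreportlib" is an invented stand-in for a hallucinated package.
import importlib.util


def find_missing_modules(module_names):
    """Return the names that cannot be resolved in the current environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    suspect_imports = ["json", "fastreportlib"]
    missing = find_missing_modules(suspect_imports)
    if missing:
        print(f"Possibly hallucinated or uninstalled dependencies: {missing}")
    else:
        print("All imports resolve.")
```

A check like this is crude, but it catches the made-up-library failure mode before the code ever reaches a build pipeline, rather than at install or compile time.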
Turning over the entire coding process, including creative control of the executable, to a system that periodically hallucinates seems like a terrifying proposition.
The better way to harness the efficiency of generative AI in coding is to use it as a tool that helps programmers accomplish more. Despite what AWS’s Garman suggested, excluding humans altogether would be suicidal.
What if a generative AI coding tool lets its imagination wander and creates a backdoor, so it can later fix something without bothering a human? That same backdoor could also be exploited by attackers.
Companies are usually effective at testing applications, especially those they develop in-house, to ensure they do what they’re supposed to do. However, they often fail when testing if applications can do things they aren’t supposed to do. This requires a penetration-testing mindset.
In the world of generative AI coding, this penetration testing approach must become the default. It must also be overseen by supervisors well-trained in the quirky realm of AI errors.
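As a rough illustration of that mindset, the sketch below shows a negative test in Python: rather than confirming that the application does what it should, it asserts that a hypothetical, undocumented shortcut does not work. The check_access function and its maintenance_flag parameter are invented for this example and are not drawn from any real codebase.

```python
# Hypothetical AI-generated access check. The maintenance_flag parameter is
# exactly the kind of quiet, unrequested shortcut the article warns about.
def check_access(user_role, maintenance_flag=False):
    if maintenance_flag:
        return True  # backdoor-style bypass: access granted without any role check
    return user_role == "admin"


# Negative test: assert the application CANNOT do what it isn't supposed to do.
def test_no_hidden_bypass():
    assert check_access("guest", maintenance_flag=True) is False


if __name__ == "__main__":
    try:
        test_no_hidden_bypass()
        print("No hidden bypass found.")
    except AssertionError:
        print("Backdoor-style behavior detected; the generated code needs human review.")
```

Run against the hypothetical function above, the test fails, and that failure is precisely the signal a reviewer trained in AI-style errors should be looking for.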
Enterprise IT certainly seeks a more efficient coding future, where programmers take on more strategic roles, focusing on what applications should do and why, rather than painstakingly coding every line.
However, the efficiency and strategic benefits will come at a price—paying better-trained individuals who can ensure AI-generated code continues to operate as expected.