AI didn’t invent code generation. It just raised the level of abstraction
---
Balachandran Sivakumar - 19 Dec, 2025
Over the past 8 to 10 months, we have seen AI agents become “software engineers”: they are increasingly being used to write code, with requirements as inputs. And they do more than code generation; these agents can review code, give you suggestions, and refactor existing code.
It feels revolutionary, no doubt, and it is a very interesting time to be in the software industry. This paradigm shift in how software gets created opens up new avenues of research.
But the idea that machines generate software artifacts is far older than AI, and it predates several paradigms that we take for granted today.
The concept of code generation is almost as old as software itself.
I’ve been fortunate to work in environments where code generation wasn’t an experiment; it was the norm. At Motorola, executable models and MDE-based generation were used in production systems long before “AI coding” was a phrase. At McAfee, large portions of protocol-handling C++ code were generated deterministically from domain models expressed in XML. Reduced defect density in protocol code was a direct benefit of this approach.
If we were to look at the evolution of code-generation systems, we could split it into five phases. Starting with the compiler-construction tools of the 1970s (lex, yacc, etc.) and ending with the AI agents that generate code today, what has changed is the level of abstraction: it has gone up significantly. At the same time, the level of determinism and predictability has gone down.
First Generation: Build-time code generation
This generation focused on tooling rather than on the problem domain. The objective was to eliminate environmental boilerplate code. Tools of this generation include:
- Automake → Generate Makefiles
- Autoconf → Generate build tooling configuration
- lex/yacc → Generate lexers & parsers (compiler tools)
They used text-based templates as inputs and produced highly predictable, repeatable outputs. But their use was restricted to build time. This mattered because it proved that repetitive, mechanical code should never be written by humans.
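To make the template idea concrete, here is a minimal, hypothetical sketch. It is in Java purely for illustration (the real tools of this era were driven by text templates and M4 macros): a fixed template plus a list of field names yields the same boilerplate on every run.

```java
// Hypothetical first-generation-style generator: a fixed text template
// plus a tiny list of inputs, producing fully deterministic output.
import java.util.List;

public class BoilerplateGen {
    // Template for a C-style accessor pair; each %s is filled per field.
    private static final String TEMPLATE = """
            int get_%s(const config_t *c) { return c->%s; }
            void set_%s(config_t *c, int v) { c->%s = v; }
            """;

    public static void main(String[] args) {
        // The "model" is nothing more than a list of field names.
        for (String field : List.of("timeout", "retries", "port")) {
            System.out.printf(TEMPLATE, field, field, field, field);
        }
    }
}
```

Run it twice and you get byte-identical output; that determinism is the defining trait of this generation.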
Second Generation: Domain models
This was a huge leap from the first generation: it focused on generating actual production-quality code from machine-readable, declarative models.
Examples:
- XML → C/C++ bindings
- WSDL → SOAP clients
- ASN.1 → Protocol processing
For instance, at McAfee, protocol components (say, RFC 822, or just parts of it, like headers) were represented in XML. If the RFC changed, you would just change that model in the XML. At build time, the generator produced C++ bindings, with guards and checks for acceptable value ranges. LDAP and SNMP processing code was similarly generated from ASN.1 definitions.
This was a huge leap because the generators consumed actual domain data and generated code; they did not stop at populating templates. And it mattered because correctness moved from code reviews to model validation.
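The real generators at McAfee emitted C++. Purely as an illustrative sketch (shown in Java for consistency with the later examples, and with an invented model and field name), generated binding code for a field with a declared value range might look like this:

```java
// Hypothetical output of a second-generation, model-driven generator.
// Invented input model:  <field name="ttl" type="int" min="0" max="255"/>
// GENERATED CODE - DO NOT EDIT (regenerate from the model instead).
public class TtlField {
    private int value;

    public int get() { return value; }

    public void set(int v) {
        // The bounds come straight from the model, not from a human.
        if (v < 0 || v > 255) {
            throw new IllegalArgumentException("ttl out of range: " + v);
        }
        value = v;
    }
}
```

The range check is derived from the model, so a reviewer validates the model once instead of auditing every generated setter.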
Despite this, the behaviour was still fully hand-written. And the domain models were heavily structural, though declarative (i.e., not free-form declarative, but structural declarative). Some teams at McAfee (now Trellix) still use this approach heavily.
Third Generation: MDE & Executable Models
This was radical when it was introduced. Researchers took tools that were intended for design and architecture, built tooling around them, and came up with ways to generate behaviour from design documents. This led to the field of Model-Driven Engineering (MDE).
This was largely aided by modelling languages like UML and xUML, and by tools from vendors such as Telelogic and Rational.
This is where code generation hit its limits, in a manner of speaking. It is also where adoption was limited; it was never as widespread as the earlier generations.
Also, for the first time, we had executable semantics: you compiled the model, and you got your binaries.
It failed to scale socially and economically beyond constrained domains. In part, this is because behaviour is hard to model and implement. More importantly, unlike the earlier generations, this one was tool-dependent, so vendor lock-in played a huge part. Hobbyist programmers and free-software developers, who had led the previous generations, could not afford such tools.
Side note: This experience also highlights why long-lived modeling innovations tend to succeed best when they are open and interoperable.
Fourth Generation: Frameworks
This started in the first decade of this century, aligning with the growth of the web, Web 2.0, and the popularity of highly structured frameworks for web development. The salient features of this generation include:
- Implicit code generation
- The “model” was in the implementation language
- (back to) No vendor lock-in
Some common examples include:
- JPA Entities → SQL + ORM
- Spring Annotations → Dependency Injection
This again took code generation back to the masses, making it widely successful. The tooling for these frameworks was, as in the first generation, part of the language tooling. And unlike the first generation, the generated code and the model that drove it were in the same language (Java, Ruby, Python, etc.), as the JPA sketch below illustrates.
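To make the JPA case concrete, here is a minimal entity using the standard jakarta.persistence annotations. The class and field names are invented, and the exact SQL depends on the JPA provider and database dialect:

```java
import jakarta.persistence.Entity;
import jakarta.persistence.GeneratedValue;
import jakarta.persistence.Id;

// The annotated class itself is the model; the ORM derives the schema
// (roughly: CREATE TABLE Customer (id BIGINT ..., name VARCHAR ...))
// and the CRUD SQL from it.
@Entity
public class Customer {
    @Id
    @GeneratedValue
    private Long id;

    private String name;

    protected Customer() {}          // no-arg constructor required by JPA

    public Customer(String name) { this.name = name; }

    public Long getId() { return id; }
    public String getName() { return name; }
}
```

Note that the model and the behaviour it drives live in the same language, and even the same file, which is exactly what made this generation so accessible.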
Fifth Generation: Agentic & Autonomous
The age of LLMs, copilots, and autonomous agents
This is the era of the highest abstraction, and also of non-determinism.
The model is in human language, not even in the language of implementation. And agents that operate autonomously (there can be more than one, in what are known as multi-agent systems) write the code.
So, this is the second time we have had a radical leap:
- Input is intent, not schema
- Output is plausible, not guaranteed
- Validation shifts downstream
For the first time, we accept that generation is plausible, not guaranteed.
This generation is characterized by probabilistic synthesis, contextual reasoning, and iterative self-correction.
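As a rough sketch of what iterative self-correction means in practice, consider the loop below. LlmClient and Validator are hypothetical interfaces invented for this post, not any vendor’s real API; the point is the shape of the loop, not the names.

```java
// Sketch of a fifth-generation generate-validate-retry loop.
// LlmClient and Validator are hypothetical, illustration-only interfaces.
interface LlmClient {
    String complete(String prompt);   // returns a candidate piece of code
}

interface Validator {
    String check(String code);        // null if OK, else compiler/test errors
}

public class AgentLoop {
    public static String generate(LlmClient llm, Validator validator,
                                  String intent, int maxAttempts) {
        String prompt = intent;                      // input is intent, not schema
        for (int i = 0; i < maxAttempts; i++) {
            String code = llm.complete(prompt);      // probabilistic synthesis
            String errors = validator.check(code);   // validation happens downstream
            if (errors == null) {
                return code;                         // plausible AND validated
            }
            // Iterative self-correction: feed the failure back as context.
            prompt = intent + "\nThe previous attempt failed with:\n" + errors;
        }
        throw new IllegalStateException(
                "no candidate validated after " + maxAttempts + " attempts");
    }
}
```

Every earlier generation put correctness inside the generator; this one moves it into the loop around the generator.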
This generation aligns more with the MDE era than with the others, in that the input to code generation is intent (although expressed in text instead of UML). The input also models behaviour, just as in MDE.
We haven’t learned from the mistakes of the MDE era either. The agents are vendor-dependent: the “prompt” that works with one vendor’s agents might not work with another’s.
Summarizing it all
Code generation succeeds when abstraction boundaries are stable, and fails when they are not.