Introduction

Who am I?

Baldur Bjarnason
Been doing web dev for twenty-five years, client and server
PhD in Interactive Media
My superpower is researching complex problems

Why I started researching “AI” for programming

The software industry is really bad at software development

	SUCCESSFUL	CHALLENGED	FAILED	TOTAL
Grand	6%	51%	43%	100%
Large	11%	59%	30%	100%
Medium	12%	62%	26%	100%
Moderate	24%	64%	12%	100%
Small	61%	32%	7%	100%

The Chaos Report 2015, Standish Group

Weinberg’s Law:

If builders built buildings the way programmers wrote programs, then the first woodpecker that came along would destroy civilization.

I’ve been obsessed with software quality for a few years now
Wrote a book: “Out of the Software Crisis”
Generative AI is the most interesting new tool to appear in later years
Ended up diving so deep into the research on generative AI that I wrote a book “The Intelligence Illusion”
https://illusion.baldurbjarnason.com/

Is generative AI good or bad for software development?

What do you mean with “software development”?

The system of creating, delivering, maintaining, and using a software project, from idea to end-user.
That includes the value generated by the software project being used.

What do you mean with “generative AI”?

Language models, trained on text and code, used as autocomplete copilots or chatbots

Language models

A high-level crash course in language models

Text →

Tokens →

Algorithm →

Very much not a brain

Brains are constructed from nutrition, using genes and the chemical environment
Language models are constructed from the data

There is no actual training in “AI”, only construction.

A pregnant mother isn’t “training” the fetus.

A language model isn’t “trained” from the data.

But this is the term that the AI industry uses, so we’re stuck with it.

The prompt

Who was the first man on the moon?

The input prompt is just text. No control or parameters. It cannot be secured.

The response

On July 20, 1969, Neil Armstrong became the first human to step on the moon.

This happens to be a fact, but the language model only knows the text.
The language model is fabricating a mathematically plausible response, based on word distributions in the training data.

There are no facts in a language model or its output. Only memorised text.
A language model only fabricates.
It’s all “hallucinations”.
Occasionally those fabrications correlate with facts, because people often speak the truth.

A knowledge system?

To be able to answer a question factually and pass as a knowledge system, the model needs to memorise the answer
The model doesn’t know facts, only text
If you want facts from it, it needs to memorise the text

“Dr. AI?”

Vendors use human exams as benchmarks
These are specifically designed to test rote memorisation, because that’s hard for humans
Programming benchmarks also require memorisation. Otherwise, you’d only get pseudocode.

In all research to date, memorisation has been key to model performance in a wide range of benchmarks.

Performance drops when researchers try to “fix” memorisation.

This here would be a Chekhov’s gun

👉🏻 memorisation! pewpew

Biases

Language model text is the averagely plausible text for a given prompt
It is fixed on the past because it’s constructed from historical training data
“Racist grandpa who has learned to speak fluent LinkedIn”

Generated text is going to skew conservative. (Bigoted, prejudiced, but polite.)
Even when the cut-off date for the data set is recent, it’s still going to skew historical.
Language models will always skew towards the more common, middling, mediocre, and predictable.
All the worst of the web is in there.

Modify, summarise, and “reason”

Great at converting and modifying
Summarisation is so-so
“Reasoning”? 😬
Not actual reasoning

“AI” for programming is probably a bad idea

There was reason to be hopeful.
Programming languages look perfect for language models.
But, the more I studied the technology the more flaws I found.

1. Language models can’t be secured

Prompt injections are not a solved problem.
The training data set itself is a security hazard.
Integration into public-facing products can’t be done safely.

2. It encourages the worst of our management and development practices

A language model will never question, push back, doubt, hesitate, or waver.
Your managers are going to use it to flesh out and describe unworkable ideas at record speed.
It’ll let you implement the worst ideas ever in your code without protest.

Language models don’t deliver productivity improvements.

They increase the volume, unchecked by reason.

These tools will indeed make you go faster, but you’ll be accelerating in the wrong direction.

3. Its User Interfaces do not work, and we haven’t found one that does

Automation bias
Automation to help you think less…
…helps you think less.

Microsoft themselves have said that 40% of GitHub Copilot’s output is committed unchanged.

People overwhelmingly seem to trust the output of a language model.

4. It’s biased towards the stale and popular

Biases are functional problems for code.
It favours old frameworks and platforms over new.
Old APIs over new.
It won’t know about updates, deprecations, or security issues.

The software industry is built on change.

Language models are built on a static past.

5. No matter how the lawsuits go, the tech threatens the existence of free and open source software

So many lawsuits
Bad for free and open source software no matter what

They’re trained on open source code without payment or even acknowledgement.
Language models replace open source projects.
Demotivates maintainers and drain away both resources and users.

The open source ecosystem surrounding the web and node are likely to be hit the hardest.

6. Licence contamination

They only train on open source code.
Microsoft and Google have the biggest private repositories of code in the world.
They don’t seem to use any of it for training.
Code licenses need to be complied with…
Otherwise things can get bad.

Remember our gun on the mantelpiece?