Generative AI and software development

Introduction

Who am I?

  • Baldur Bjarnason
  • Been doing web dev for twenty-five years, client and server
  • PhD in Interactive Media
  • My superpower is researching complex problems

Why I started researching “AI” for programming

The software industry is really bad at software development

SUCCESSFUL CHALLENGED FAILED TOTAL
Grand 6% 51% 43% 100%
Large 11% 59% 30% 100%
Medium 12% 62% 26% 100%
Moderate 24% 64% 12% 100%
Small 61% 32% 7% 100%
The Chaos Report 2015, Standish Group

Weinberg’s Law:

If builders built buildings the way programmers wrote programs, then the first woodpecker that came along would destroy civilization.

  • I’ve been obsessed with software quality for a few years now
  • Wrote a book: “Out of the Software Crisis”
  • Generative AI is the most interesting new tool to appear in later years
  • Ended up diving so deep into the research on generative AI that I wrote a book “The Intelligence Illusion”
  • https://illusion.baldurbjarnason.com/

Is generative AI good or bad for software development?

What do you mean with “software development”?

  • The system of creating, delivering, maintaining, and using a software project, from idea to end-user.
  • That includes the value generated by the software project being used.

What do you mean with “generative AI”?

  • Language models, trained on text and code, used as autocomplete copilots or chatbots

Language models

A high-level crash course in language models

A neural network of sorts

Text →

Tokens →

Algorithm →

Very much not a brain

  • Brains are constructed from nutrition, using genes and the chemical environment
  • Language models are constructed from the data

There is no actual training in “AI”, only construction.

A pregnant mother isn’t “training” the fetus.

A language model isn’t “trained” from the data.

But this is the term that the AI industry uses, so we’re stuck with it.

The prompt

Who was the first man on the moon?

  • The input prompt is just text. No control or parameters. It cannot be secured.

The response

On July 20, 1969, Neil Armstrong became the first human to step on the moon.

  • This happens to be a fact, but the language model only knows the text.
  • The language model is fabricating a mathematically plausible response, based on word distributions in the training data.
  • There are no facts in a language model or its output. Only memorised text.
  • A language model only fabricates.
  • It’s all “hallucinations”.
  • Occasionally those fabrications correlate with facts, because people often speak the truth.

A knowledge system?

  • To be able to answer a question factually and pass as a knowledge system, the model needs to memorise the answer
  • The model doesn’t know facts, only text
  • If you want facts from it, it needs to memorise the text

“Dr. AI?”

  • Vendors use human exams as benchmarks
  • These are specifically designed to test rote memorisation, because that’s hard for humans
  • Programming benchmarks also require memorisation. Otherwise, you’d only get pseudocode.

In all research to date, memorisation has been key to model performance in a wide range of benchmarks.

Performance drops when researchers try to “fix” memorisation.

This here would be a Chekhov’s gun

👉🏻 memorisation! pewpew

Biases

  • Language model text is the averagely plausible text for a given prompt
  • It is fixed on the past because it’s constructed from historical training data
  • “Racist grandpa who has learned to speak fluent LinkedIn”
  • Generated text is going to skew conservative. (Bigoted, prejudiced, but polite.)
  • Even when the cut-off date for the data set is recent, it’s still going to skew historical.
  • Language models will always skew towards the more common, middling, mediocre, and predictable.
  • All the worst of the web is in there.

Modify, summarise, and “reason”

  • Great at converting and modifying
  • Summarisation is so-so
  • “Reasoning”? 😬
  • Not actual reasoning

“AI” for programming is probably a bad idea

  • There was reason to be hopeful.
  • Programming languages look perfect for language models.
  • But, the more I studied the technology the more flaws I found.

1. Language models can’t be secured

  • Prompt injections are not a solved problem.
  • The training data set itself is a security hazard.
  • Integration into public-facing products can’t be done safely.

2. It encourages the worst of our management and development practices

  • A language model will never question, push back, doubt, hesitate, or waver.
  • Your managers are going to use it to flesh out and describe unworkable ideas at record speed.
  • It’ll let you implement the worst ideas ever in your code without protest.

Language models don’t deliver productivity improvements.

They increase the volume, unchecked by reason.

These tools will indeed make you go faster, but you’ll be accelerating in the wrong direction.

3. Its User Interfaces do not work, and we haven’t found one that does

  • Automation bias
  • Automation to help you think less…
  • …helps you think less.

Microsoft themselves have said that 40% of GitHub Copilot’s output is committed unchanged.

People overwhelmingly seem to trust the output of a language model.

  • Biases are functional problems for code.
  • It favours old frameworks and platforms over new.
  • Old APIs over new.
  • It won’t know about updates, deprecations, or security issues.

The software industry is built on change.

Language models are built on a static past.

5. No matter how the lawsuits go, the tech threatens the existence of free and open source software

  • So many lawsuits
  • Bad for free and open source software no matter what
  1. They’re trained on open source code without payment or even acknowledgement.
  2. Language models replace open source projects.
  3. Demotivates maintainers and drain away both resources and users.

The open source ecosystem surrounding the web and node are likely to be hit the hardest.

6. Licence contamination

  • They only train on open source code.
  • Microsoft and Google have the biggest private repositories of code in the world.
  • They don’t seem to use any of it for training.
  • Code licenses need to be complied with…
  • Otherwise things can get bad.

Remember our gun on the mantelpiece?

👉🏻 memorisation! pewpew

Turns out blindly copying open source code is problematic.

Whodathunkit?

  • These models all memorise a lot.
  • A lot of it will get copied into the output.
  • GitHub’s own numbers say 1% of the time.
  • They let you block verbatim copies but language models excel at making slight changes to text.
  • It doesn’t work that well.
  • And it can degrade performance so people are gonna turn it off.

It won’t matter. I won’t get caught.

  • Fine. But your employer might.
  • During due diligence for an acquisition.
  • In discovery for a separate lawsuit.
  • During hacks and other security incidents.

Each one could cost the company a fortune.

Language models for software development are a lawsuit waiting to happen

Language model code generators, unless they are completely reinvented from scratch, are unsuitable for anything except for prototypes and throwaway projects.

So, obviously, everybody’s going to use them

  • All the bad stuff happens later. Unlikely to affect your bonuses or employment.
  • It’ll be years before the first licence contamination lawsuits happen.
  • Most employees will be long gone before anybody realises just how much of a bad idea it was.
  • But you’ll still get that nice “AI” bump in the stock market.

It’s nonsense without consequence.

The best is yet to come

In a few years’ time, once the effects of the “AI” bubble finally dissipates…

Somebody’s going to get paid to fix the crap it left behind. 🙂