Language model text is the averagely plausible text for a given
prompt
It is fixed on the past because it’s constructed from historical
training data
“Racist grandpa who has learned to speak fluent LinkedIn”
Generated text is going to skew conservative. (Bigoted,
prejudiced, but polite.)
Even when the cut-off date for the data set is recent, it’s still
going to skew historical.
Language models will always skew towards the more common,
middling, mediocre, and predictable.
All the worst of the web is in there.
Modify, summarise, and “reason”
Great at converting and modifying
Summarisation is so-so
“Reasoning”? 😬
Not actual reasoning
“AI” for programming is probably a bad idea
There was reason to be hopeful.
Programming languages look perfect for language models.
But, the more I studied the technology the more flaws I found.
1. Language models can’t be secured
Prompt injections are not a solved problem.
The training data set itself is a security hazard.
Integration into public-facing products can’t be done safely.
2. It encourages the worst of our management and development
practices
A language model will never question, push back, doubt, hesitate,
or waver.
Your managers are going to use it to flesh out and describe
unworkable ideas at record speed.
It’ll let you implement the worst ideas ever in your code
without protest.
Language models don’t deliver productivity improvements.
They increase the volume, unchecked by reason.
These tools will indeed make you go faster, but you’ll be
accelerating in the wrong direction.
3. Its User Interfaces do not work, and we haven’t found one that
does
Automation bias
Automation to help you think less…
…helps you think less.
Microsoft themselves have said that 40% of GitHub Copilot’s output
is committed unchanged.
People overwhelmingly seem to trust the output of a language
model.
4. It’s biased towards the stale and popular
Biases are functional problems for code.
It favours old frameworks and platforms over new.
Old APIs over new.
It won’t know about updates, deprecations, or security issues.
The software industry is built on change.
Language models are built on a static past.
5. No matter how the lawsuits go, the tech threatens the existence
of free and open source software
So many lawsuits
Bad for free and open source software no matter what
They’re trained on open source code without payment or even
acknowledgement.
Language models replace open source projects.
Demotivates maintainers and drain away both resources and users.
The open source ecosystem surrounding the web and node are likely to
be hit the hardest.
6. Licence contamination
They only train on open source code.
Microsoft and Google have the biggest private repositories of code
in the world.
They don’t seem to use any of it for training.
Code licenses need to be complied with…
Otherwise things can get bad.
Remember our gun on the mantelpiece?
👉🏻 memorisation! pewpew
Turns out blindly copying open source code is
problematic.
Whodathunkit?
These models all memorise a lot.
A lot of it will get copied into the output.
GitHub’s own numbers say 1% of the time.
They let you block verbatim copies but language models excel at
making slight changes to text.
It doesn’t work that well.
And it can degrade performance so people are gonna turn it off.
It won’t matter. I won’t get caught.
Fine. But your employer might.
During due diligence for an acquisition.
In discovery for a separate lawsuit.
During hacks and other security incidents.
Each one could cost the company a fortune.
Language models for software development are a lawsuit waiting to
happen
Language model code generators, unless they are completely
reinvented from scratch, are unsuitable for anything except for
prototypes and throwaway projects.
So, obviously, everybody’s going to use them
All the bad stuff happens later. Unlikely to affect your bonuses
or employment.
It’ll be years before the first licence
contamination lawsuits happen.
Most employees will be long gone before anybody realises just how
much of a bad idea it was.
But you’ll still get that nice “AI” bump in the stock market.
It’s nonsense without consequence.
The best is yet to come
In a few years’ time, once the effects of the “AI” bubble finally
dissipates…
Somebody’s going to get paid to fix the crap it left behind. 🙂