Aligning Sam Altman
If you can't align your CEO, you probably aren't going to align a superintelligence
If you listen to the people at OpenAI, their aim as a company is less “invent cool things” than “invent the final human technology, construct a benevolent Machine God, and usher in a new age of human flourishing”. Critical to this is the idea of being able to align this artificial intelligence to human interests.
The concept of “alignment” is so central to this mission that it is written into the company’s organisational structure. The fiduciary duty of OpenAI is to “humanity”; the Board is tasked to produce “safe AGI that is broadly beneficial”. This is all well and good, and from what I can tell the people at the company are very smart, well-intentioned, and genuinely concerned about safety. But.
How do you intend to align superintelligence when you can’t align Sam Altman?
There are several obvious answers to this, most of which revolve around technical challenges. Once you work out an AI might not be aligned, you can reprogram it. You can’t reprogram people without the aid of a CIA black site, and that sort of behaviour is frowned upon in most of the corporate world. But it is also a genuine question, because OpenAI’s governance structures seem to be bad at dealing with very persuasive things that hide their true intentions.
Just look at the sequence of events over the last few days. On the 17th of November, the Board announced that it had decided to remove Altman as the CEO of OpenAI, saying he was “not consistently candid in his communications”. Ilya Sutskever, the Chief Scientist, told staff “you can call” the removal a coup, but that “this was the Board doing its duty… which is to make sure that OpenAI builds AGI [Artificial General Intelligence] that benefits all of humanity”.
Now, what happened after it was determined that this very human intelligence was not aligned with the Board’s goals?
Sam Altman joined Microsoft to lead “a new advanced AI research team”.
More than 90% of OpenAI staff signed a letter threatening to resign unless the board changed course.
Ilya Sutskever took to Twitter to say: “I deeply regret my participation in the board’s actions”.
OpenAI started negotiating Sam Altman’s return.
Sam Altman was reinstated as CEO, his critics were pushed off the board, and his control over OpenAI secured.
Now, let’s draw out three more points. The first is Sam Altman’s own observation that AI will likely “be capable of superhuman persuasion well before it is superhuman at general intelligence”. The second is that Sam Altman, with his naturally occurring general intelligence, was able to completely outmanoeuvre the Board and render it and its noble duties irrelevant. The third is that OpenAI’s compensation structures give its staff massive incentives to accelerate development.
I don’t believe that dangerous AI will emerge with a puff of sulphurous smoke, cackling loudly, and then enslave everyone in sight through carefully rendered patterns of pixels. But I do suspect that it might prove quite effective at getting people to do things; even sub-GPT-4 systems have convinced people they’re sentient.
So let’s say that, rather than the Board clashing with Sam Altman, OpenAI finishes training GPT-6, 7, or 8, begins to test it, and suddenly finds the Machine God staring back. How might they handle it? How would the staff handle it?
There’s a version of that story which probably runs something like this.
The CEO would like to make more money, and insists the product is safe. The Chief Scientist vehemently disagrees. Lots of researchers, who have equity, would also like to make money, and some of them have talked to the very convincing system.
The board, alarmed, decide to boot the CEO.
The staff, outraged at the treatment of their leader, revolt.
Your large corporate sponsor, outraged at the treatment of their investment, revolts.
Oh.
If you look at the last week’s events as a test run: OpenAI’s board was worried that Sam Altman, armed with merely human-level intelligence and persuasion, was moving too fast, or behaving unethically, in the pursuit of superhuman intelligence and persuasion. The net result, from the Board’s point of view, is that he landed at one of the world’s largest companies, almost all of OpenAI’s staff revolted because they wanted to join him, the Board was utterly outmanoeuvred in the media, and the company it oversaw existed as little more than some IP and contracts that nobody wanted to fulfil, until it handed everything back to Sam and gave up its ability to restrain him. GG, as they say.
In fact, it’s arguably worse than that. It turns out OpenAI has given its staff strong financial incentives to accelerate rather than slow down, because all that (not quite) equity is worthless if you dissolve the company in an attempt to keep humanity safe. The Board isn’t really able to override the staff. And in its one encounter with a genuine emergent force (call it capitalism, or the modern corporation, or whatever else you want to call the systems of combined decision-makers and incentives that form a self-directing layer on top of human behaviour), it was beaten all ends up.
In a world where Altman wanted to develop something dangerous, even without his reinstatement he would now have Microsoft poised to hand him a huge amount of resources to race ahead, OpenAI’s technical staff desperate to join him, and a media narrative that casts the Board as power-hungry safetyists intent on holding back human development.
It also turns out that all the cooperation OpenAI relied on (the people and systems that were meant to act as safety valves, the corporate partner supposedly providing funding without question) can break down sharply under pressure. People back down or are persuaded; they don’t behave in the way you expect them to. If you add the pressure of imminent AGI to this, with uncertain timescales, the probability of sudden deviation goes through the roof.
Human governance structures are, even when very well-designed, filled with humans. And humans can be fooled by non-superintelligences about their motives and actions. I do wish OpenAI well. The people on the Board genuinely seem to care. But they might want to look at making sure that emergency brake handle is connected to something.