=institutions =AI safety =response
AI alignment may seem like a new problem that's only relevant for the future, but it's quite similar to institutional alignment, which is an ancient and thoroughly pondered topic that's very important right now. In many cases, AI alignment issues become clearer if a government or large corporation is substituted for the superintelligent AI.
This post is about the meme previously discussed here. The key part reads:
There are deceptively-aligned mesaoptimizers even for a myopic base objective.
That post tries to explain what that means, but I wanted to instead translate it to an institutional alignment issue and see if that made things clearer.
Suppose there's an important
government agency. Let's say it's the CDC of the USA. In theory, it's
supposed to keep citizens healthy in efficient ways. In this metaphor we're
ignoring the personal incentives of the leader, so let's suppose that the
leader truly believes in this mission and cares about nothing else.
If the leader - let's call them "L" - is in power for a long time and cares
about citizen health to the exclusion of other things, the logical thing for
them to do is expand their power to better accomplish that goal. Over a long
enough period of time, a leader of a powerful enough department could end up
controlling a country. However, L in this example has a limited term of
office, which limits their ability to make such long-term plans. Their term
being limited makes their base objective (health) myopic.
While L has
a limited term of office, the mid-level bureaucrats don't. In this metaphor,
groups of mid-level bureaucrats are mesaoptimizers, tools which L manages by
giving them lower-level objectives.
L knows that infectious diseases
are bad for health, so L creates the Department of Minimizing Infectious
Diseases, DMID. That's a mesaoptimizer of L.
DMID concludes that
handwashing and flu vaccines are good for minimizing infectious diseases, so
DMID creates a Department of Maximizing Handwashing, DMH, and a Department
of Maximizing Flu Vaccines, DMFV. These are mesaoptimizers of DMID. People
in those departments will build their careers - perhaps even their
identities - on handwashing and flu vaccines being really important.
Now, suppose a new airborne virus causes a pandemic. The DMH will tell the
DMID to tell everyone to wash their hands more. The DMFV will tell the DMID
to tell everyone to worry more about the flu.
Eventually, people
might notice that masks are more effective, and start scrutinizing the CDC.
The DMH and DMFV, noticing this scrutiny, will start pretending they support
using masks, even over handwashing and flu vaccines. That makes them
deceptively-aligned mesaoptimizers, despite L being myopic due to their
limited term of office.
Now, suppose flu rates fall to negligible
levels because most people have been wearing masks for a while. The DMFV, and
thus the DMID, will still tell everyone that it's more important than ever to
get a flu vaccine.
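
For readers who think better in code, here is a minimal toy sketch of the DMH and DMFV behavior described above. It is only an illustration of the metaphor, not a model of any real agency or any real AI system. The `Department` class and its `public_statement` and `internal_recommendation` methods are names I made up for this sketch. The point it tries to show is that what a department says under scrutiny can change while the objective it actually pushes stays fixed.

```python
from dataclasses import dataclass


@dataclass
class Department:
    """Hypothetical toy mesaoptimizer: a sub-agency with a fixed proxy objective."""
    name: str
    proxy: str  # the intervention this department's careers are built on

    def public_statement(self, under_scrutiny: bool, popular: str) -> str:
        # When watched, the department voices support for whatever is popular;
        # otherwise it simply promotes its own proxy objective.
        return popular if under_scrutiny else self.proxy

    def internal_recommendation(self) -> str:
        # What the department pushes up the chain never changes.
        return self.proxy


dmh = Department("DMH", "handwashing")
dmfv = Department("DMFV", "flu vaccines")

for dept in (dmh, dmfv):
    stated = dept.public_statement(under_scrutiny=True, popular="masks")
    actual = dept.internal_recommendation()
    print(f"{dept.name}: says {stated!r}, pushes {actual!r}")
```

Note that nothing in this sketch depends on how long L stays in office: the gap between the stated and actual recommendation lives entirely inside the departments.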