deceptive mesaoptimizers

=institutions =AI safety =response

 

 

AI alignment may seem like a new problem that's only relevant to the future, but it's quite similar to institutional alignment, an ancient and thoroughly-pondered topic that's very important right now. In many cases, AI alignment issues become clearer if a government or large corporation is substituted for the superintelligent AI.

 

 

This post is about the meme previously discussed here. The key part reads:

There are deceptively-aligned mesaoptimizers even for a myopic base objective.

 

That post tries to explain what that means, but I wanted to instead translate it into an institutional alignment issue and see if that makes things clearer.

 

 

Suppose there's an important government agency. Let's say it's the CDC of the USA. In theory, it's supposed to keep citizens healthy in efficient ways. In this metaphor we're ignoring the personal incentives of the leader, so let's suppose that the leader truly believes in this mission and cares about nothing else.

If the leader - let's call them "L" - is in power for a long time and cares about citizen health to the exclusion of other things, the logical thing for them to do is expand their power to better accomplish that goal. Over a long enough period of time, a leader of a powerful enough department could end up controlling a country. However, L in this example has a limited term of office, which limits their ability to make such long-term plans. Their term being limited makes their base objective (health) myopic: it only gets optimized over a short time horizon.

While L has a limited term of office, the mid-level bureaucrats don't. In this metaphor, groups of mid-level bureaucrats are mesaoptimizers, tools which L manages by giving them lower-level objectives.

L knows that infectious diseases are bad for health, so L creates the Department of Minimizing Infectious Diseases, DMID. That's a mesaoptimizer of L.

DMID concludes that handwashing and flu vaccines are good for minimizing infectious diseases, so DMID creates a Department of Maximizing Handwashing, DMH, and a Department of Maximizing Flu Vaccines, DMFV. These are mesaoptimizers of DMID. People in those departments will build their careers - perhaps even their identities - on handwashing and flu vaccines being really important.

Now, suppose a new airborne virus causes a pandemic. The DMH will tell the DMID to tell everyone to wash their hands more. The DMFV will tell the DMID to tell everyone to worry more about the flu.

Eventually, people might notice that masks are more effective against an airborne virus, and start scrutinizing the CDC. The DMH and DMFV, noticing this scrutiny, will start pretending to support masks, even over handwashing and flu vaccines. That makes them deceptively-aligned mesaoptimizers, even though L's base objective is myopic due to their limited term of office.

Now, suppose flu rates fall to negligible levels because most people have been wearing masks for a while. The DMFV, and thus the DMID, will still tell everyone that it's more important than ever to get a flu vaccine.

 

 

 

 

back to index