=biology =machine learning =neural networks
Any type of brain must have a way to transfer signals across distances, and it's desirable to transmit those signals quickly. The potential ways that biological systems could transmit signals quickly that come to mind for me are:

1) bioluminescence -> biological optical fiber -> photoreceptors
2) electron transfer to a conductive polymer
3) mechanically pulling a fiber (something like a hair with lubricin on its surface, inside a lubricating sheath) linked to a mechanoreceptor
4) ion channels triggered by other ions, causing a propagating electrostatic charge when triggered

Of those, (4) seems the easiest to evolve, and it's what developed on Earth.
A neuron firing this way by opening sodium channels is all-or-nothing, so information must be transmitted in the timing. Neurons have no global absolute clock, so information must be contained in "time since last spike" rather than the absolute time of spikes, unless a spike means that something has just happened.
For some current theories of neuron firing, see
this page.
From artificial neural network (ANN) research, we know that linear
representations of activations are worse at low resolution than some
nonlinear ones. I would expect a spike to typically represent an activation
of approximately:
Formula 1: a + b * e^(-c * time_since_last_spike)
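As a numeric sketch of Formula 1 (the constants a, b, c here are made-up illustrative values, not measured ones):

```python
import math

def spike_activation(dt, a=0.1, b=1.0, c=2.0):
    """Activation value represented by a spike arriving dt seconds
    after the previous spike, per Formula 1.
    a, b, c are arbitrary illustrative constants."""
    return a + b * math.exp(-c * dt)

# Shorter inter-spike intervals represent larger activations:
print(spike_activation(0.01))  # ~1.08
print(spike_activation(1.0))   # ~0.24
```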
Each neuron is connected to many synapses, typically ~7000 in a human brain. When a neuron fires, at each connected synapse, some neurotransmitters are released. Most neurotransmitters have no immediate effect on neuron firing, and instead regulate slower processes, but let's consider just the short-term behavior of neurons. When a neuron N1 fires, it releases some neurotransmitters at a synapse, and they bind to receptors at neuron N2 which have some net effect on the electric potential of N2.

That net short-term effect on N2 potential is analogous to a "weight" in an ANN, but it's not a constant, it's a function: spike timing -> change in potential. (Also, the neurotransmitter receptors at synapses can be disabled or added over a longer timescale.) The N2 potential is analogous to an ANN accumulator, but it decays towards a baseline over time.
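A minimal sketch of that accumulator analogy, with illustrative constants: the potential jumps when a spike arrives and decays exponentially toward a baseline between spikes.

```python
import math

TAU = 0.020  # decay time constant, 20 ms (illustrative, not measured)

def step_potential(v, dt, baseline=0.0):
    """Decay the potential toward its baseline over dt seconds,
    like a leaky ANN accumulator."""
    return baseline + (v - baseline) * math.exp(-dt / TAU)

def apply_spike(v, weight):
    """An arriving spike shifts the potential by the synapse's
    short-term 'weight', which may be positive or negative."""
    return v + weight

# A spike raises the potential; it then leaks back toward baseline:
v = apply_spike(0.0, 0.6)    # v = 0.6
v = step_potential(v, TAU)   # one time constant later: 0.6/e ~ 0.22
```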
The represented weights can be both positive and negative. It's also common for a single synapse to have receptors with positive and negative effects on cell potential active at the same time.
I wrote above that I'd expect the activation represented by a spike to often approximately follow Formula 1. An obvious way to accomplish that is to have a synapse with one channel type with an approximately constant value on firing, and another channel type whose ions are transported away as time passes after a spike, asymptotically approaching an approximately opposite value.
That being the case, I'd expect two spikes in rapid succession
at the same synapse to usually represent a large activation value. Depending
on the "weights" at that synapse, that could either strongly inhibit neuron
firing, or lead to immediate firing. But of course, there's no need for all
synapses to use timing representations with the same shape. At some
synapses, longer times between spikes probably represent larger
values; that would be useful for making fast simple reactions, where you
want a single pulse to propagate through some paths quickly.
When neurons modify receptors at synapses, how much internal data can they draw upon? Memories are partly stored by DNA methylation patterns, so potentially quite a bit.
A typical human brain has about 7*10^14 synapses. GPT-3 has about 1.7*10^11 weights. Does this mean that GPT-3 has about 1/4000th the effective weights of a human brain? No.
1) Synapse connections are sparse, which makes them equivalent to at least 10x as many dense ANN weights.
2) Neurons can shift between receptor patterns at synapses, so at timescales long enough for that, we should multiply by at least 10x again.
I feel confident in saying a
human brain has >10^5x the effective weights of GPT-3. Does this mean that
scaling up GPT-3 to 10^5 as many parameters would produce a human-level
intelligence? No.
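The arithmetic behind that estimate, treating the two multipliers above as rough lower bounds:

```python
synapses = 7e14         # typical human brain
gpt3_weights = 1.7e11   # GPT-3 weight count

raw_ratio = synapses / gpt3_weights   # ~4000x
sparsity_factor = 10    # sparse connections vs dense ANN weights (lower bound)
receptor_factor = 10    # slow receptor-pattern changes (lower bound)

effective_ratio = raw_ratio * sparsity_factor * receptor_factor
print(f"~{effective_ratio:.2g}x")  # ~4.1e+05x, i.e. >10^5
```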
Transistors are much faster than neurons. Thanks to
that advantage, GPT-3 was trained on more text than a human can read in a
lifetime - yet it's still widely considered "undertrained"
relative to its parameter count. A "human-level AI" wouldn't be a normal
human - it would be closer to a human that spent thousands of years reading
the internet.
On the other hand, transformers scale quadratically with input context size. In that sense, they're a brute-force solution that only works well for small contexts. (Dense ANNs are also a brute-force solution - they're less efficient than sparse ones, but easier to implement and still useful.)
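A sketch of why that scaling is quadratic: in self-attention, each token is compared against every token in the context, so the comparison count grows with the square of context length (constant factors like head count and dimension are ignored here).

```python
def attention_pair_count(context_len):
    """Token-to-token comparisons in one self-attention layer,
    ignoring constant factors."""
    return context_len * context_len

# Doubling the context quadruples the work:
print(attention_pair_count(2048))  # 4194304
print(attention_pair_count(4096))  # 16777216
```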
Humans think more efficiently, and at least some humans can operate on
higher conceptual levels than something like GPT-3. That said, the remaining
insights to bridge that gap could be fairly simple.