The Intersection of Neuroscience and Artificial Intelligence: Spiking Neural Networks

Aritifical Intelligence

The Intersection of Neuroscience and Artificial Intelligence: Spiking Neural Networks

wolfhaq9083

February 6, 2024

The Intersection of Neuroscience and Artificial Intelligence: Spiking Neural Networks

High energy consumption and the increasing computational cost of Artificial Neural Network (ANN) training tend to be prohibitive. Furthermore, their difficulty and inability to learn even simple temporal tasks seem to trouble the research community.

Nonetheless, one can observe natural intelligence with minuscule energy consumption, capable of creativity, problem-solving, and multitasking. Biological systems seem to have mastered information processing and response through natural evolution. The need to understand what makes them so effective and adapt these findings led to Spiking Neural Networks (SNNs).

In this article, we will cover both the theory and a simplistic implementation of SNNs in PyTorch.

Information representation: the spike

Biological neuron cells do not behave like the neuron we use in ANNs. But what is it that makes them different?

One major difference is the input and output signals a biological neuron can process. The biological neuron’s output is not a float number to be propagated. Instead, the output is an electric current caused by the movements of ions in the cell. These currents move forward to the connected cells through the synapse. When the propagated current reaches the next cells, it increases their membrane potential, which is a result of the imbalanced concentrations of these ions.

When the membrane potential of a neuron exceeds a certain threshold, the cell emits a spike, i.e. the current to be passed to forward cells . To this end, using our understanding of typical ANNs, we can map the output of a neuron as a binary value, where 1 represents the spike’s existence in time, and the synapse as the weight that connects two neurons.

One more feature of the biological neurons that arise from this mapping, is asynchronous communication and information processing. Typical ANNs transfer information in sync, wherein in one step, a layer reads an input, computes an output, and feeds it forward. Biological neurons, on the other hand, rely on the temporal dimension. At any moment, they can take an input signal and produce an output, regardless of the behavior of the rest of the neurons.

To sum up, biological neurons have inner dynamics that cause them to change through time. As time passes, they tend to discharge and decrease their membrane potential. Hence, sparse input spikes will not cause a neuron to fire.

To further understand how biological neurons behave , we will look at an example. We will model the neuron using the Leaky Integrate-and-Fire model, one of the simplest models to be used for SNNs.

Leaky Integrate and Fire

The Leaky Integrate-and-Fire (LIF) model can be represented as a Resistor-Capacitor circuit (RC circuit), with an input that feeds the circuit with spikes and a threshold switch which causes… well, we will see what it does.

Briefly, the solution of the circuit analysis results in an exponential decay through time with sudden increases of the value of the input.

In more detail, let’s set the steady-state voltage at 0. When an input current arrives with the value V, the voltage will increase by V volts. Then, this voltage will start to decay exponentially until a second spike arrives. When an input spike causes the voltage to exceed the threshold (i.e. 1 Volt), then the neuron emits a spike itself and resets to the initial state.

τmdVm(t)dt=−(Vm−Em)+iinjectgleak\tau_{m}\frac{dV_{m}(t)}{dt}=-(V_{m}-E_{m})+\frac{i_{inject}}{g_{leak}}τmdtdVm(t)=−(Vm−Em)+gleakiinject

In this equation, the term τm=C/gleak\tau_{m}=C/g_{leak}τm=C/gleak represents a time constant, the so-called membrane time constant, which controls the decay rate of the neuron. VmV_{m}Vm and EmE_{m}Em represent the membrane’s voltage and initial state voltage of the neuron respectively. The input signal is fed through the term iinjecti_{inject}iinject.

In the following figure, we see the response of the neuron when an input spike of value 0.8 arrives. The exponential decay can be easily shown, but here we do not exceed the threshold of 1. So the neuron neither does reset its voltage nor emits a spike.

Information encoding

Information encoding is the process of getting an input signal, and converting it into spiketrains.

One such simple encoding method is the Poisson encoding , where a value of the signal to be given as an input to a neuron, is normalized between the minimum and the maximum value. The normalized value represents a probability of rrr. Over a time window TTT and a small timestep dtdtdt, the resulting spiketrain of the encoded signal, at each timestep dtdtdt, has probability rdtrdtrdt of containing a spike. In this way, the higher the probability rrr, the more spikes the resulting spiketrain will have. Thus the information will be encoded more accurately.

Images to spiketrains

Let’s see an example. Imagine a grayscale image of size 32×32 pixel. The value of the pixel at position (1,1) is 32. Normalizing the value we get r=32/256=>r=0.125r=32/256 => r = 0.125r=32/256=>r=0.125. Over the time window of 1 second and dt=0.001dt = 0.001dt=0.001 seconds, the resulting spiketrain will have about r∗T/dt=125r*T/dt = 125r∗T/dt=125 spikes, with the total number of them following a Gaussian distribution. At each timestep 0.001, we expect a probability of 0.125 of finding a spike.

This method shows high biological plausibility since human vision seems to store information the same way for the first layer of neurons. However, this disregards the information encoded at the exact timing of the spike.

One can easily realize this, since two identical values may result in a different spiketrain.

Rank Order Coding and Population Order Coding

To alleviate this problem, a wide variety of algorithms have been proposed, such as Rank Order Coding (ROC) or Population Order Coding (POC) . ROC encodes the information in the order the spikes arrive, over a given time window, with the first spike meaning the highest value of the signal. This results in losses in information. POC solves this problem by encoding a single value in multiple neurons that use receptive fields that cover all the possible values.

Let us see two encoding-to-spikes methods, one rate-based (Poisson Encoder) and one temporal-based (ROC-like). Have a look at the following figure.

Trying to process this 2 by 2 grayscale image, we realize that it is stored as 4 integers in the range of 0 (for white) to 255 (for brack). In ANNs this set of numbers is the input vector that is fed to the network. We want to transform it into spiketrains in a given time window; for example 1ms with dtdtdt of 0.02ms. This results in 50 discrete timesteps.

The Poisson encoding method will produce a proportional number of spikes for the given time window as shown below. The value of 255 will be packed with spikes, the value of 207 will produce less and so it goes for the rest.

On the other hand, the temporal method will produce only one spike for each value but the information is encoded in the timing of this spike, where the highest values produce an early spike compared to lower ones.

Both of them have their pros and cons . Even a look into nature and how biological brains handle the information is enough to understand that both types or a mix of them are used .

The choice of the encoding algorithms affects the result of the neuron’s output and the behavior of the network in general. Moreover, not all developed learning schemes can be applied to every encoding scheme. Some learning methods rely on the information coded in the exact timing of the spike and so rate encoding algorithms like the Poisson encoding algorithms will not work, and vice versa.

Many more algorithms have been developed such as Temporal Contrast , HSA , BSA that can provide a wide variety of choices.

On the other side, event cameras, most known as Dynamic Vision Sensors (DVS), overcome the whole encoding process, by recording input, such as video, directly into spiketrains.

Dynamic Vision Sensors (DVS)

In video recording, typical cameras map a value at each pixel for every timestep (frame).

DVS , on the contrary, focuses on the brightness intensity differences. When the brightness changes enough over two consecutive timesteps, the camera produces an “event”.

This event stores: a) the information about the occurrence of the spike (simply, that a spike is generated), b) the pixel that produces the spike in the form of its spatial coordinates XXX, YYY, c) the timestamp that the event occurs and d) the polarity.

Polarity denotes whether the threshold was over exceeded or under exceeded. So when the intensity increases over the threshold, the polarity will be positive. On the other hand, when it decreases over the (negative) threshold, it will be negative. The polarity storage requires the existence of “negative” spikes but it also adds to the information for the network, since it can use this attribute to extract more meaningful results.

This picture shows the data produced by a dynamic vision sensor. On the left, we see the image from a standard camera. On the right, we see the same picture taken by a DVS. Here, we see the aforementioned positive and negative spikes produced by the sensor. Again, DVS takes spatiotemporal data. DVS collect and compare luminance differences through time. The picture above does not make this clear.

One can argue that the data produced by such sensors lag behind in information contained compared to standard cameras. Importantly though, DVS:

use only a fraction of the energy consumed by typical vision recorders,

can operate to frequencies up to 1MHz, and

do not blur the resulting image or video .

This makes them very appealing to the research community and a possible candidate for robotic systems with limited energy sources.

Training the SNN

You have seen how to model the neuron for our network, studied its response, and analyzed the encoding methods to get our desired input. Now comes the most important process: the training of the network. How does the human brain learn, and what does it even mean?

Learning is the process of changing the weight that connects neurons in a desirable way. Biological neurons, though, use synapses to communicate information. The weight change is equivalent to the connection strength of the synapse, which can be altered through time with processes that cause synaptic plasticity. Synaptic plasticity is the ability of the synapse to change its strength.

Synaptic Time Dependent Plasticity (STDP)

“Neurons that fire together, wire together”. This phrase describes the well-known Hebbian learning and the Synaptic Time Dependent Plasticity (STDP) learning method that we will discuss. STDP is an unsupervised method that is based on the following principle.

A neuron adapts its pre-synaptic input spike (the input weight of a previous neuron) to match the timing of an input spike with the output spike.

A mathematical expression will help us understand.

Let’s think of an input weight w1w_1w1. If the spike that comes from w1w_1w1 arrives after the neuron has emitted the spike, the weight decreases because the input spike has no effect on the output one.

On the other hand, a spike arriving before the fire of a neuron strongly affects the timing of the neuron’s spike, and so its weight increases to temporally connect the two neurons through this synapse (weight w1w_1w1).

Due to this process, patterns emerge in the connections of the neurons, and it has been shown that learning is achieved.

But STDP is not the only learning method.

SpikeProp

SpikeProp is one of the first learning methods to be used. It can be thought of as a supervised STDP learning method.

The core principle is the same: it changes the synapse weight based on the spike timing. The difference with STDP is that while STDP measures the difference in presynaptic and postsynaptic spike timing, here we focus only on the resulting spikes of the neuron (postsynaptic) and their timing.

Since it is a supervised learning method, we need a target value, which is the desired firing time of the output spike. The loss function and the proportional weight change of the synapse are dependent on the difference in the resulting spike timing and the desired spike timing.

In other words, we try to change the input synapse weight in such a way that the timing of the output spike matches the desired timing for the resulting spike.

If we take a look at the equation of the loss function, we notice its similarity with the STDP weight update rule.

The fact that we consider the difference of the output with the desired spike timing, allows us to: a) create classes to classify data and b) use the loss function with a backpropagation-like rule to update the input weight of the neurons.

The adaptation of the delta term can give us a backpropagation rule for SpikeProp with multiple layers.

This method illustrates the power of SNNs. With SpikeProp we can code a single neuron to classify multiple classes of a given problem.