# Nano-Neurons for Artificial Intelligence

#### Julie Grollier<sup>1</sup>

Nathan Leroux<sup>1</sup>, Danijela Marković<sup>1</sup>, Jérémie Laydevant<sup>1</sup>, Dedalo Sanz Hernandez<sup>1</sup>, Philippe Talatchian<sup>1</sup>, Miguel Romera<sup>1</sup>, Mathieu Riou<sup>1</sup>, Jacob Torrejon<sup>1</sup>, Flavio Abreu Araujo<sup>1</sup>, Paolo Bortolotti<sup>1</sup>, Juan Trastoy<sup>1</sup>, Erwann Martin<sup>1</sup>, Teodora Petrisor<sup>1</sup>, Vincent Cros<sup>1</sup>, Guru Khalsa<sup>2</sup>, Mark Stiles<sup>2</sup>, Sumito Tsunegi<sup>3</sup>, Kay Yakushiji<sup>3</sup>, Akio Fukushima<sup>3</sup>, Hitoshi Kubota<sup>3</sup>, Shinji Yuasa<sup>3</sup>, Ricardo Ferreira<sup>4</sup>, Alex Jenkins<sup>4</sup>, Leandro Martins<sup>4</sup>, Tifenn Hirtzlin<sup>5</sup>, Maxence Ernoult<sup>5</sup>, Alice Mizrahi<sup>1</sup>, Damien Querlioz<sup>5</sup>

<sup>1</sup>CNRS/Thales, France <sup>2</sup>NIST, USA <sup>3</sup>AIST, Japan <sup>4</sup>INL, Portugal <sup>5</sup>C2N, France











### Deep Neural networks run on unoptimized hardware






### Current CMOS processors cannot run future AI



Based on https://nicsefc.ee.tsinghua.edu.cn/projects/neural-network-accelerator/

Training neural networks on current computers is extremely power inefficient

### Digital computer:

CPUs, GPUs, TPUs, FPGAs

Processing ⇄ data ⇄ memory

| Operation | Energy consumption |
| --- | --- |
| Addition of data | 1x |
| Access data (on-chip cache) | 60x |
| Access data (off-chip RAM) | 3500x |

Pedram et al, IEEE Xplore (2017)
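To get a feel for these ratios, here is a rough back-of-the-envelope sketch. The relative costs are the ones listed on this slide; the cost model (one MAC counted as a single addition-like operation plus two operand fetches from the same memory level) and the function name are illustrative assumptions, not taken from the cited paper.

```python
# Relative energy costs from the slide (addition of data = 1x).
RELATIVE_ENERGY = {
    "add": 1,
    "onchip_cache": 60,
    "offchip_ram": 3500,
}

def relative_energy_per_mac(memory_level: str, operands_fetched: int = 2) -> int:
    """Relative energy of one MAC under an assumed cost model:
    one addition-like operation plus `operands_fetched` memory accesses."""
    return RELATIVE_ENERGY["add"] + operands_fetched * RELATIVE_ENERGY[memory_level]

cache_cost = relative_energy_per_mac("onchip_cache")  # 1 + 2 * 60
ram_cost = relative_energy_per_mac("offchip_ram")     # 1 + 2 * 3500
print(cache_cost, ram_cost)
```

Under this simple model the arithmetic itself is a negligible part of the budget: almost all the energy goes into moving data.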


# 1000 kWh to train a natural language processing model

D. Marković et al, "Physics for neuromorphic computing", Nature Reviews Physics (2020)

Orders of magnitude in energy can be saved by assembling physical synapses and neurons in neuromorphic chips



Hundreds of millions of neurons and synapses on a 1 cm<sup>2</sup> chip  $\rightarrow$  each device smaller than 1  $\mu$m<sup>2</sup>
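The area budget behind this statement can be checked in a few lines. This is only a sketch: it takes 10<sup>8</sup> devices as a round lower bound for "hundreds of millions" and ignores wiring and peripheral circuits, both of which are simplifying assumptions.

```python
# 1 cm = 1e4 µm, so 1 cm² = 1e8 µm².
CM2_IN_UM2 = (1e4) ** 2

def area_per_device_um2(chip_area_cm2: float, n_devices: float) -> float:
    """Average area available per device, in µm², if the whole chip
    were tiled with devices (no wiring or peripheral overhead)."""
    return chip_area_cm2 * CM2_IN_UM2 / n_devices

print(area_per_device_um2(1.0, 1e8))  # 1e8 devices on 1 cm²
print(area_per_device_um2(1.0, 3e8))  # more devices leave less area each
```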

### CMOS neurons and synapses are complex circuits

- A transistor is nanoscale but it is just a switch
- CMOS does not provide non-volatile memory (CMOS memory is volatile)



Merolla et al, *Science* **345**, 668 (2014)

Davies et al, *IEEE Micro* **38**, 82–99 (2018)



#### BrainScaleS 20-wafer machine: 4M neurons, 1B synapses

# Transistors alone won't do the job: they should be complemented by emerging nanotechnologies



Zhang et al, Nature Electronics 3, 371 (2020)

# The power of novel nanotechnologies for AI

Novel nanotechnologies are monolithically integrated in major foundry processes: they are commercially available and bring memory as close as possible to compute



They are multifunctional: they can emulate many features of neurons and synapses



Neurons are non-linear and synapses are valves with memory



Most neural networks today



$y = \sum_i w_i x_i$ is called a Multiply-And-Accumulate (MAC) operation.

State-of-the-art neural networks are deep: they extract features layer by layer
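As a minimal sketch of these two ideas, the snippet below computes a single MAC and then stacks layers, each of which applies one MAC per neuron followed by a non-linearity. The function names, the choice of `tanh` as the neuron non-linearity and the layer sizes are assumptions made for illustration.

```python
import numpy as np

def mac(weights, inputs):
    """Multiply-And-Accumulate: y = sum_i w_i * x_i."""
    return float(np.dot(weights, inputs))

def layer(W, x, activation=np.tanh):
    """One layer: a MAC per output neuron (rows of W), then a non-linearity."""
    return activation(W @ x)

def deep_network(weight_matrices, x):
    """Feed x through the layers one after the other."""
    for W in weight_matrices:
        x = layer(W, x)
    return x

x = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -1.0, 2.0])
print(mac(w, x))  # 0.5*1 - 1*2 + 2*3

rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]  # 3 -> 4 -> 2
print(deep_network(weights, x))
```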



# Synapses and neurons should be densely interconnected

Cortex: 10<sup>4</sup> synapses per neuron = 10<sup>4</sup> wires per neuron



Moritz Helmstaedter lab, retina flight 2013

### Memristive neural nets

- Spintronics neural nets

### Non-volatile memristors emulate synapses

Chua, IEEE Trans. Circuit Theory (1971)





#### Filamentary switching



#### Phase change





Kuzum et al, Nanotechnology (2013)



Going deep: crossbar arrays of memristors physically implement the multiply and accumulate operation
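A minimal sketch of the idea, for an idealized crossbar: each weight is stored as a memristor conductance, Ohm's law gives the current through each device, and Kirchhoff's current law sums the currents on each column wire. Wire resistance, sneak paths and device non-linearity are neglected here, and the function name and numerical values are illustrative assumptions.

```python
import numpy as np

def crossbar_currents(G, V):
    """Column currents of an idealized crossbar.

    G[i, j]: conductance (siemens) of the device joining row i and column j.
    V[i]:    voltage (volts) applied to row i.
    Returns I[j] = sum_i V[i] * G[i, j], i.e. one MAC per column.
    """
    G = np.asarray(G, dtype=float)
    V = np.asarray(V, dtype=float)
    return V @ G

G = np.array([[1e-6, 2e-6],
              [3e-6, 4e-6]])   # conductances play the role of synaptic weights
V = np.array([0.1, 0.2])       # input voltages play the role of activations
I = crossbar_currents(G, V)
print(I)                       # currents in amperes, one per column
```

All columns are read in parallel, so a full matrix-vector product takes a single step regardless of the array size.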



HP Labs





Lin et al, Nature Electronics 3, 225 (2020)

### Spintronics neural nets

### Deep learning through RF communications?



# Magnetic tunnel junctions can be used as radio-frequency neurons

#### Nanoscale, fast (GHz), non-linear and easily measurable



#### Same structure as magnetic memories

J. Grollier et al, "Neuromorphic Spintronics", Nature Electronics (2020)

### Step 1: Single junction

# Due to its rich dynamics, the nano-oscillator recognizes spoken digits with a success rate > 99.6%

TI-46 database, 5 female speakers, cochlear pre-processing



J. Torrejon, M. Riou, F. Abreu Araujo et al, Nature 547, 428 (2017)

# **Step 2:** RF communication between the two layers of a magnetic neural network



M. Romera, P. Talatchian et al, Nature 563, 230 (2018)

# **Step 3:** connect layers of radio-frequency neurons with tunable synapses



Multiply-And-Accumulate (MAC)



N. Leroux et al, Radio-Frequency Multiply-And-Accumulate Operations with Spintronic Synapses, arXiv:2011.07885

# A magnetic tunnel junction can perform the multiplication operation on an RF signal



#### **Output = Input \* Weight**

- Input: RF power received by the MTJ
- Output: DC voltage across the MTJ
- Weight: a function of the frequency mismatch, $W(f_{RF} - f_{res})$



# We perform the MAC operation through frequency multiplexing



# Frequency multiplexing makes high-density connectivity possible



# Two magnetic tunnel junctions perform the MAC on RF signals



$$Y_{th} = P_{RF}^1 \times W_1(f_{RF}^1 - f_{res}^1) + P_{RF}^2 \times W_2(f_{RF}^2 - f_{res}^2)$$
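The sum above can be sketched numerically. The slides do not give the shape of $W$, so the snippet assumes a purely illustrative antisymmetric Lorentzian lineshape; the amplitude, linewidth, frequencies and powers are likewise hypothetical values in arbitrary units, not measured ones.

```python
import numpy as np

def weight(f_rf, f_res, amplitude=1.0, linewidth=10e6):
    """Hypothetical synaptic weight W(f_rf - f_res).

    Assumed antisymmetric Lorentzian: zero at resonance, sign set by the
    sign of the frequency mismatch, extrema at a mismatch of +/- linewidth.
    """
    d = (np.asarray(f_rf, dtype=float) - np.asarray(f_res, dtype=float)) / linewidth
    return amplitude * d / (1.0 + d ** 2)

def rf_mac(p_rf, f_rf, f_res):
    """Y = sum_k P_k * W_k(f_rf_k - f_res_k): each junction k rectifies
    the RF tone close to its own resonance frequency."""
    return float(np.sum(np.asarray(p_rf, dtype=float) * weight(f_rf, f_res)))

p_rf = [2.0, 1.0]          # RF input powers (inputs), arbitrary units
f_rf = [500e6, 700e6]      # one carrier frequency per input, in Hz
f_res = [490e6, 710e6]     # junction resonance frequencies, in Hz
y_th = rf_mac(p_rf, f_rf, f_res)
print(weight(f_rf, f_res), y_th)
```

In this picture a weight is changed by shifting a junction's resonance frequency relative to its carrier, and its sign follows the sign of the mismatch.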



### A simulated perceptron with a single synaptic layer recognizes digits from a database



Our goal is to design and build a deep neural network made of spintronic nano-synapses and nano-neurons with RF interconnections



LAYER 1

LAYER 2





# The downside of novel nanotechnologies for AI

# Nanodevices are inherently noisy, imperfect and highly variable from device to device

Panorama of memristor synapse faults



Zhang et al, Nature Electronics 3, 371 (2020)

On-Chip Trainable 1.4M 6T2R PCM Synaptic Array with 1.6K Stochastic LIF Neurons for Spiking RBM



First fully integrated memristor/CMOS chip: only 92% on MNIST due to device variability
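The snippet below does not model that chip or reproduce its accuracy. It is only a toy sketch of how device-to-device variability perturbs a MAC, assuming each stored weight is scaled by an independent Gaussian factor of relative spread `sigma`; that noise model, the function name and the sizes are all assumptions.

```python
import numpy as np

def noisy_mac_error(sigma, n_inputs=256, n_trials=1000, seed=0):
    """RMS deviation of a MAC output when every weight w_i is replaced by
    w_i * (1 + sigma * n_i), with n_i drawn from a standard normal law
    (assumed multiplicative device-to-device variability)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=n_inputs)
    x = rng.normal(size=n_inputs)
    ideal = w @ x
    # One independent set of device imperfections per trial.
    w_devices = w * (1.0 + sigma * rng.normal(size=(n_trials, n_inputs)))
    outputs = w_devices @ x
    return float(np.sqrt(np.mean((outputs - ideal) ** 2)))

for sigma in (0.0, 0.05, 0.2):
    print(sigma, noisy_mac_error(sigma))
```

Errors of this kind accumulate from layer to layer, which is one reason why training procedures that are run on the imperfect hardware itself are attractive.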



Nanodevices are poorly compatible with the flagship training algorithm of deep neural networks: backpropagation of errors



Yann LeCun, Yoshua Bengio and Geoffrey Hinton, *Nature* 521, 436 (2015)



# Three main approaches

AI inspired:

1. Implement backpropagation
2. Make backpropagation more hardware-compatible (top-down)

Neuroscience & AI inspired:

3. Find new ways to perform hardware-compatible learning (bottom-up)


**Geoffrey Hinton**, AI pioneer, **Turing Award**



Can the brain do a form of backpropagation?

Stanford Seminar - Can the brain do back-propagation?

Backpropagation requires cumbersome external circuits and additional memories to store activations and gradients

#### **Backward pass**
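To make the memory requirement concrete, here is a minimal sketch of backpropagation in a two-layer network. The network shape, the `tanh` hidden layer, the squared-error cost and all names are assumptions chosen for illustration. The point is that the forward pass has to store the input and the hidden activations in a `cache`, which the backward pass then reads to turn error signals into weight gradients.

```python
import numpy as np

def forward(x, W1, W2):
    """Forward pass. Intermediate values are stored for later use."""
    a1 = np.tanh(W1 @ x)            # hidden activations
    y = W2 @ a1                     # linear output layer
    cache = {"x": x, "a1": a1}      # extra memory needed by the backward pass
    return y, cache

def backward(y, target, W2, cache):
    """Backward pass for the cost C = 0.5 * |y - target|^2."""
    delta2 = y - target                                  # dC/dy
    gW2 = np.outer(delta2, cache["a1"])                  # dC/dW2
    delta1 = (W2.T @ delta2) * (1.0 - cache["a1"] ** 2)  # error sent backwards
    gW1 = np.outer(delta1, cache["x"])                   # dC/dW1
    return gW1, gW2

rng = np.random.default_rng(0)
x = rng.normal(size=3)
target = np.array([1.0, 0.0])
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))

y, cache = forward(x, W1, W2)
gW1, gW2 = backward(y, target, W2, cache)
print(gW1.shape, gW2.shape)
```

The backward pass also needs the transpose of `W2`, so the same synaptic weights must be readable in both directions.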



There are no external circuits and no additional memories in the brain: how are gradients computed, stored and applied to synapses?

Lillicrap et al, "Backpropagation and the brain", Nature Reviews Neuroscience (2020)

Learning through physics: networks that minimize their error at the same time as they minimize their energy






# The Equilibrium Propagation (EP) learning rule is equivalent to backpropagation through time

M. Ernoult, J. Grollier, D. Querlioz, Y. Bengio, B. Scellier, NeurIPS 2019
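A toy sketch of Equilibrium Propagation on a small layered network may help fix ideas. It follows the general two-phase recipe (relax freely, relax again with the output weakly nudged towards the target, then compare the two equilibria), but the specific energy function, the `tanh` non-linearity, the network sizes, the step sizes and all names are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def rho(s):
    """Neuron non-linearity (assumed)."""
    return np.tanh(s)

def rho_prime(s):
    return 1.0 - np.tanh(s) ** 2

def relax(x, W1, W2, target=None, beta=0.0, steps=300, step_size=0.5):
    """Let the hidden (h) and output (y) states settle by gradient descent on
    F = E + beta * C, with the assumed energy and cost
      E = 0.5*|h|^2 + 0.5*|y|^2 - x.W1.rho(h) - rho(h).W2.rho(y)
      C = 0.5*|y - target|^2
    """
    h = np.zeros(W1.shape[1])
    y = np.zeros(W2.shape[1])
    for _ in range(steps):
        dF_dh = h - rho_prime(h) * (W1.T @ x + W2 @ rho(y))
        dF_dy = y - rho_prime(y) * (W2.T @ rho(h))
        if beta != 0.0:
            dF_dy = dF_dy + beta * (y - target)   # weak nudge towards target
        h = h - step_size * dF_dh
        y = y - step_size * dF_dy
    return h, y

def ep_gradients(x, W1, W2, target, beta=1e-4):
    """Gradient estimates from two equilibria, using only quantities
    that are local to each synapse."""
    h0, y0 = relax(x, W1, W2)                  # free phase (beta = 0)
    hb, yb = relax(x, W1, W2, target, beta)    # nudged phase
    # dE/dW1 = -outer(x, rho(h)) and dE/dW2 = -outer(rho(h), rho(y))
    gW1 = (-np.outer(x, rho(hb)) + np.outer(x, rho(h0))) / beta
    gW2 = (-np.outer(rho(hb), rho(yb)) + np.outer(rho(h0), rho(y0))) / beta
    return gW1, gW2

def free_cost(x, W1, W2, target):
    """Cost evaluated at the free equilibrium."""
    _, y = relax(x, W1, W2)
    return 0.5 * np.sum((y - target) ** 2)

rng = np.random.default_rng(0)
x = rng.normal(size=4)
target = np.array([1.0, 0.0, 0.0])
W1 = 0.1 * rng.normal(size=(4, 5))   # small weights keep the relaxation stable
W2 = 0.1 * rng.normal(size=(5, 3))
gW1, gW2 = ep_gradients(x, W1, W2, target)
print(gW1.shape, gW2.shape)
```

No activations or gradients are stored separately from the neuron states: the same relaxation dynamics serve for inference and for learning. For small `beta`, the estimates approach the true gradient of the cost at the free equilibrium.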

EqSpike is a spiking version of Equilibrium Propagation compatible with neuromorphic implementations



Bidirectional SNN (784-300-10), 97.6% on MNIST (state of the art for online-trained SNNs)

Towards intrinsic learning

E. Martin et al, EqSpike: Spike-driven Equilibrium Propagation for Neuromorphic Implementations, arXiv:2010.07859

# Conclusion

Future high-performance, low-power AI requires emerging nanotechnologies and physics

