Dear Jan,

Thank you very much for your very detailed analysis. We will try to reproduce this as soon as possible.

Three questions:

- You only use threads, no MPI parallelization, correct?

- Your machine has >= 32 cores?

- Do the neurons receive the expected input currents, especially the same currents independent of number of threads?

Best,

Hans Ekkehard

Prof. Dr. Hans Ekkehard Plesser

Head, Department of Data Science

Faculty of Science and Technology

Norwegian University of Life Sciences

PO Box 5003, 1432 Aas, Norway

Phone +47 6723 1560

Email hans.ekkehard.plesser@nmbu.no

Home http://arken.nmbu.no/~plesser

On 28/04/2022, 16:22, "Jan Střeleček" <strelda@protonmail.com> wrote:

Dear NEST developers,

In our group, we are working on a model of the primary visual cortex and use step_current_generator devices to simulate the input current from the LGN neurons. We noticed that the simulation time of our model is very sensitive to the number of these generators. When trying to narrow down the cause, we found that this might be due to an issue with the parallelization of the step_current_generator. A minimal system in which the problem can be observed is attached below as simple_example.py. It essentially creates NS step_current_generators and connects them to NL neurons with a fixed indegree, using the iaf_cond_exp neuron model. Increasing the number of generators does not benefit from multithreading as much as one would expect, in contrast to increasing the number of neurons; see the technical details below. Our rough estimate is that, going from 1 to 32 threads, the generator-related cost is 10 to 20 times higher than ideal parallel scaling would suggest.

Technical details:

The relative slowdown due to the parallelization of step_current_sources was measured using linear regression over

simulation time = a NL + b NS.

See slowdown_example.png.

The ratio b/a was then calculated and measured as a function of the number of threads. A larger difference between the ratio for 1 thread and the ratio for 32 threads indicates a greater parallelization problem in the step_current_generators.
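For readers who want to repeat this analysis, the fit can be sketched as follows. This is only a minimal illustration, not the script used for the attached figures: the function name fit_two_term and the timing values are made up for the example, and the fit has no intercept because the formula above has none.

```python
def fit_two_term(n_neurons, n_sources, sim_times):
    """Least-squares fit of sim_time = a * NL + b * NS (no intercept).

    Hypothetical helper for illustration; solves the 2x2 normal
    equations directly so that no external library is required.
    """
    s_ll = sum(l * l for l in n_neurons)
    s_ss = sum(s * s for s in n_sources)
    s_ls = sum(l * s for l, s in zip(n_neurons, n_sources))
    s_lt = sum(l * t for l, t in zip(n_neurons, sim_times))
    s_st = sum(s * t for s, t in zip(n_sources, sim_times))

    det = s_ll * s_ss - s_ls * s_ls
    if det == 0:
        raise ValueError("NL and NS must not be proportional across runs")

    # Cramer's rule for [[s_ll, s_ls], [s_ls, s_ss]] @ [a, b] = [s_lt, s_st]
    a = (s_lt * s_ss - s_ls * s_st) / det
    b = (s_ll * s_st - s_ls * s_lt) / det
    return a, b


# Synthetic timings (NOT measured data), one list per thread count.
# Each entry corresponds to one (NL, NS) combination.
NL = [100, 200, 100, 200]
NS = [10, 10, 40, 40]
timings_by_threads = {
    1: [0.25, 0.45, 0.40, 0.60],
    32: [0.05, 0.06, 0.17, 0.18],
}

for n_threads, times in timings_by_threads.items():
    a, b = fit_two_term(NL, NS, times)
    print(f"threads={n_threads}: a={a:.4g}, b={b:.4g}, b/a={b / a:.4g}")
```

If b/a grows strongly with the number of threads, the cost per generator shrinks less than the cost per neuron, which is the signature described above.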

Some additional results:

· interval_dependence.png - the slowdown does not depend on the amplitude_times set for the step_current_generators

· indegree_dependence.png - the slowdown depends on the indegree of nest.Connect(source, neurons). Specifically, the slowdown is worse for low indegree values. This shows the slowdown depends on the number of step_current_sources created, not on the injections themselves.

Are you aware of a lack of parallelization in the step_current_generator or in the current injection itself? If so, are there any plans to improve it?

best regards,

Jan Střeleček