Dear Jan,
Thank you very much for your detailed analysis. We will try to reproduce this as soon as possible.
Three questions:
- You only use threads, no MPI parallelization, correct?
- Your machine has >= 32 cores?
- Do the neurons receive the expected input currents, especially the same currents independent of number of threads?
Best,
Hans Ekkehard
--
Prof. Dr. Hans Ekkehard Plesser
Head, Department of Data Science
Faculty of Science and Technology
Norwegian University of Life Sciences
PO Box 5003, 1432 Aas, Norway
Phone +47 6723 1560
Email hans.ekkehard.plesser@nmbu.no
Home http://arken.nmbu.no/~plesser
On 28/04/2022, 16:22, "Jan Střeleček" <strelda@protonmail.com> wrote:
Dear NEST developers,
In our group, we are working on a model of the primary visual cortex and use step_current_source generators to simulate the input current of the LGN neurons. We noticed that the simulation time of our model is very sensitive to the number of step_current_sources. When trying to narrow down the cause, we found that this might be due to an issue with the parallelization of the step_current_source_generators. A simple system in which the problem can be observed is attached below, simple_example.py. It essentially creates NS step_current_generators and injects them into NL neurons with fixed indegree, using the iaf_cond_exp neuron model. Increasing the number of step_current sources does not benefit from multithreading as one would expect, in contrast to increasing the number of neurons; see the technical details below. Our rough estimate is that, comparing 1 and 32 threads, the simulation is 10 to 20 times slower than the parallelization would suggest.
Technical details:
The relative slowdown due to the parallelization of step_current_sources was measured using linear regression over
simulation time = a * NL + b * NS.
See slowdown_example.png.
The ratio b/a was then calculated and measured as a function of the number of threads. A bigger difference between the ratio for 1 thread and the ratio for 32 threads indicates a greater parallelization problem in the step_current_generators.
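A minimal sketch of the fit described above, for readers who want to repeat the analysis on their own timings. It assumes the model has no intercept term, exactly as written (simulation time = a * NL + b * NS), and solves the 2x2 normal equations in plain Python. The function name fit_two_slopes and the sample timings are hypothetical and purely synthetic; they are not the measurements behind slowdown_example.png.

```python
def fit_two_slopes(samples):
    """Least-squares fit of t = a*NL + b*NS (no intercept, an assumption).

    samples: iterable of (NL, NS, t) tuples, t being the simulation time.
    Returns the fitted coefficients (a, b).
    """
    s_ll = s_ls = s_ss = s_lt = s_st = 0.0
    for nl, ns, t in samples:
        s_ll += nl * nl
        s_ls += nl * ns
        s_ss += ns * ns
        s_lt += nl * t
        s_st += ns * t
    # Normal equations: [[s_ll, s_ls], [s_ls, s_ss]] @ [a, b] = [s_lt, s_st]
    det = s_ll * s_ss - s_ls * s_ls
    if det == 0:
        raise ValueError("NL and NS must vary independently across samples")
    a = (s_lt * s_ss - s_st * s_ls) / det
    b = (s_st * s_ll - s_lt * s_ls) / det
    return a, b


# Synthetic, made-up timings generated from known coefficients,
# only to exercise the fit; real runs would supply measured times.
TRUE_A, TRUE_B = 0.002, 0.05
samples = [
    (nl, ns, TRUE_A * nl + TRUE_B * ns)
    for nl in (100, 200, 400)
    for ns in (10, 20, 40)
]

a_fit, b_fit = fit_two_slopes(samples)
ratio = b_fit / a_fit  # the b/a ratio compared across thread counts
print(f"a = {a_fit:.6f}, b = {b_fit:.6f}, b/a = {ratio:.3f}")
```

Repeating this fit once per thread count and comparing the resulting b/a values gives the comparison described above.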
Some additional results:
· interval_dependence.png - the slowdown does not depend on amplitude_times in the step_current_source function
· indegree_dependence.png - the slowdown depends on the indegree of nest.Connect(source, neurons). Specifically, the slowdown is worse for low indegree values. This shows the slowdown depends on the number of step_current_sources created, not on the injections themselves.
Are you aware of any lack of parallelization in the step_current_source or in the current injection itself? If so, are there any plans to improve it?
best regards,
Jan Střeleček