Hello Miriam,
I have been able to reproduce the error. I assume it is related to the fact that you slice
the layer of neurons, i.e., pick out individual neurons and use them as sources. I will
have a closer look at this soon.
Ideally, such slicing should not be necessary. Rather, it seems to point to a lack of support
in NEST for certain connection patterns. We can look at that at a later stage.
Best,
Hans Ekkehard
--
Prof. Dr. Hans Ekkehard Plesser
Department of Data Science
Faculty of Science and Technology
Norwegian University of Life Sciences
PO Box 5003, 1432 Aas, Norway
Phone +47 6723 1560
Email hans.ekkehard.plesser@nmbu.no
Home http://arken.nmbu.no/~plesser
From: Kempter, Miriam <m.kempter(a)fz-juelich.de>
Date: Friday, 9 February 2024 at 10:59
To: users(a)nest-simulator.org <users(a)nest-simulator.org>
Subject: [NEST Users] Problems when using a mask inside the conn_dict on 3 or 4 MPI processes
Dear NEST Community,
while adapting my model to run on multiple MPI processes, I have run into some
problems related to using masks in the connection dictionary for a 2D spatially
distributed population.
You can find a minimal example with further details attached as a .txt file.
Please change the extension from ".txt" to ".py" to execute it. The line numbers
in the following explanation refer to that example.
My setup:
- NEST version 3.4
- Python version 3.10
- Executed with mpirun inside a conda environment:
  "mpirun -np 4 python3 minEx_4MPIprocesses_problem.py"
- System: Ubuntu 22.04
- The same problem occurred when the model was executed on JURECA.
Problem description:
While using multiple MPI processes, I:
1. Create a circular mask (see line 119) and a spatial 2D population of neurons,
   "neurons" (see lines 125-132).
2. Select some of the neurons as source neurons (see line 154).
3. Set up the connection dictionary with the mask inside (see lines 157-163).
4. Loop over the source neurons (see line 173) and connect each source neuron to the
   "neurons" population using the connection dictionary (see line 183).
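For reference, the behaviour the masked connection dictionary in step 3 is expected to produce can be stated independently of NEST and MPI. The plain-Python sketch below does not use NEST at all; the names layer_distance and targets_in_circular_mask are hypothetical. It lists the targets inside a circular mask centred on one source, assuming a rectangular 2D extent and, when edge wrap is on, the minimum-image (periodic) distance. Treating a point exactly on the mask boundary as inside ("<=") is an assumption, not something checked against NEST.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def layer_distance(a: Point, b: Point, extent: Point, edge_wrap: bool) -> float:
    """2D Euclidean distance; with edge_wrap, use the minimum-image convention."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    if edge_wrap:
        # On a periodic (torus) layer the shortest path may cross the boundary.
        dx = min(dx, extent[0] - dx)
        dy = min(dy, extent[1] - dy)
    return math.hypot(dx, dy)


def targets_in_circular_mask(
    source: Point,
    positions: Sequence[Point],
    radius: float,
    extent: Point,
    edge_wrap: bool = False,
) -> List[int]:
    """Indices of all positions inside a circular mask centred on the source."""
    return [
        i
        for i, p in enumerate(positions)
        if layer_distance(source, p, extent, edge_wrap) <= radius  # boundary: assumed inside
    ]
```

With edge wrap, a source near the left border also picks up targets near the right border, which is one of the settings the report says influences the outcome.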
Where does the problem occur?
The error occurs when the nest.Connect(…) call in the loop at line 183 is executed,
i.e., when the individual source neurons are connected.
When does the problem occur?
Whether it occurs depends on whether a mask is used in the connection dictionary
passed to nest.Connect(...).
If the mask is removed, as in the conn-dict in lines 164-169, no error occurs in any
of the settings used (however, the result is then not as desired).
If the mask is used, how the problem manifests depends on the number of MPI processes
and on the settings: the number of neurons, the extent, the mask radius, the number of
source neurons, and whether edge wrap is used.
- With 1 and 2 MPI processes, the model runs correctly, independent of the conn-dict
  and setting used.
- With 3 MPI processes, either
    the model runs, but the distances between connected neurons do not correspond
    to the given mask dimensions; in other words, the established connections are
    longer or shorter than the mask should allow,
    or execution aborts with an error ("segmentation fault") reported by mpirun.
    The terminal output for an example run can be seen in the attached file
    "minEx_4MPIprocesses_problem_error_output_3_MPI".
- With 4 MPI processes, either
    the model runs correctly,
    or it produces an error ("segmentation fault") and does not terminate. (When
    executed in a terminal, Ctrl+C is needed to stop it.) The terminal output for an
    example run can be seen in the attached file
    "minEx_4MPIprocesses_problem_error_output_4_MPI".
If the model is executed with the same setting on 3 and 4 MPI processes, three
combinations of the problems described above can occur:
1. The model runs and terminates in both cases, but the connection distances created
   on 3 MPI processes are wrong.
2. The model works with 4 MPI processes but produces a "segmentation fault" on
   3 MPI processes.
3. The model produces a "segmentation fault" in both cases.
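To separate case 1 (wrong distances) from correct runs systematically, each established connection could be checked against the mask radius. Below is a minimal plain-Python sketch, assuming the (source, target) pairs and the node positions have already been collected, for example with nest.GetConnections() and nest.GetPosition(); under MPI each rank may see only part of the connections, so the data would likely need to be gathered first. mask_violations is a hypothetical helper, not a NEST function, and the minimum-image distance for edge wrap is an assumption about how the layer wraps.

```python
import math
from typing import Iterable, List, Mapping, Tuple

Point = Tuple[float, float]


def layer_distance(a: Point, b: Point, extent: Point, edge_wrap: bool) -> float:
    """2D Euclidean distance; with edge_wrap, use the minimum-image convention."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    if edge_wrap:
        dx = min(dx, extent[0] - dx)
        dy = min(dy, extent[1] - dy)
    return math.hypot(dx, dy)


def mask_violations(
    connections: Iterable[Tuple[int, int]],
    positions: Mapping[int, Point],
    radius: float,
    extent: Point,
    edge_wrap: bool = False,
    tol: float = 1e-9,
) -> List[Tuple[int, int, float]]:
    """Return (source, target, distance) for every connection longer than the mask radius."""
    bad = []
    for src, tgt in connections:
        d = layer_distance(positions[src], positions[tgt], extent, edge_wrap)
        # A small tolerance avoids flagging pairs that sit exactly on the boundary.
        if d > radius + tol:
            bad.append((src, tgt, d))
    return bad
```

Keying positions by node ID means the pairs can be used as reported, without mapping them to list indices. An empty result on every rank would indicate that all connections respect the mask.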
Please keep in mind that whether an error occurs, and in which form, depends strongly on
the setting used. The minimal example provides different settings that represent the
cases described above, although it might not cover everything that can occur.
Workaround:
For my own use case I found a workaround, although it is rather expensive in computing time.
It applies nest.SelectNodesByMask() to every source neuron and then chooses the targets
from the resulting set using the desired probability.
This requires several additional loops, and the position data of every neuron also has
to be communicated to every MPI process at the beginning.
With this approach, the connection distances seem to be correct and no error occurs during
execution. However, while writing this I am no longer sure I have tested it thoroughly
enough, so there may be problems I have not yet discovered.
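The logic of the workaround can be sketched in plain Python as follows. In the actual model, step 1 would be done by nest.SelectNodesByMask(layer, anchor, mask_obj) with a mask from nest.CreateMask("circular", {"radius": ...}); here it is replaced by an explicit distance test so the sketch runs without NEST. The function name connect_by_mask_workaround, the seed and the allow_autapses switch are hypothetical, chosen for illustration. Under MPI, the gathered positions and the random seed would presumably have to be identical on all ranks so that every rank arrives at the same list of pairs.

```python
import math
import random
from typing import List, Mapping, Sequence, Tuple

Point = Tuple[float, float]


def layer_distance(a: Point, b: Point, extent: Point, edge_wrap: bool) -> float:
    """2D Euclidean distance; with edge_wrap, use the minimum-image convention."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    if edge_wrap:
        dx = min(dx, extent[0] - dx)
        dy = min(dy, extent[1] - dy)
    return math.hypot(dx, dy)


def connect_by_mask_workaround(
    sources: Sequence[int],
    positions: Mapping[int, Point],
    radius: float,
    p: float,
    extent: Point,
    edge_wrap: bool = False,
    allow_autapses: bool = True,
    seed: int = 12345,
) -> List[Tuple[int, int]]:
    """For each source, collect targets inside the mask, then keep each with probability p."""
    rng = random.Random(seed)  # same seed on every rank -> same decisions (assumption)
    pairs: List[Tuple[int, int]] = []
    for src in sources:
        # Step 1: candidate targets inside the circular mask around the source
        # (the role nest.SelectNodesByMask() plays in the described workaround).
        candidates = [
            tgt
            for tgt, pos in positions.items()
            if layer_distance(positions[src], pos, extent, edge_wrap) <= radius
        ]
        # Step 2: one Bernoulli trial per candidate with the desired probability.
        for tgt in candidates:
            if tgt == src and not allow_autapses:
                continue
            if rng.random() < p:
                pairs.append((src, tgt))
    return pairs
```

Because every candidate is tested against every source, the cost grows with the number of sources times the population size, which is consistent with the workaround being expensive in computing time.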
Is there something I overlooked or approached incorrectly when using masks in the
connection dictionary on multiple MPI processes?
Thanks in advance!
Best,
Miriam Kempter