Dear Michele,
I assume you use the hpc_benchmark.py without any modifications? On my laptop,
  mpirun -np 2 python install/share/doc/nest/examples/pynest/hpc_benchmark.py
executes without problems with NEST 3.6 (and with current master).
Interestingly, for the hpc_benchmark, NEST should never even get to the else block to
which the assertion on line 107 in target_table.cpp belongs, since all connections in the
network are primary connections. So something seems to go wrong in the communication of
information about connections to the presynaptic side.
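
To illustrate why reaching that else block already points to corrupted connection information, here is a toy Python model of the control flow. It is a sketch only, not NEST's actual code: the class ToyTargetTable, the is_primary flag and the nested-list layout are assumptions made for illustration, and only the checked condition is taken from the assertion message.

```python
# Toy model only: names mirror the assertion message, not NEST's implementation.
from dataclasses import dataclass


@dataclass
class TargetData:
    is_primary: bool  # hypothetical flag carried in the communicated data
    syn_id: int


class ToyTargetTable:
    def __init__(self, num_threads, num_local_nodes):
        # One list of primary targets per (thread, local node).
        self.targets = [
            [[] for _ in range(num_local_nodes)] for _ in range(num_threads)
        ]
        # Per (thread, local node): one entry per secondary synapse type.
        # Assumed to stay empty when the network has no secondary connections.
        self.secondary_send_buffer_pos = [
            [[] for _ in range(num_local_nodes)] for _ in range(num_threads)
        ]

    def add_target(self, tid, lid, data):
        if data.is_primary:
            self.targets[tid][lid].append(data)
        else:
            # Counterpart of the failing check in the error message.
            assert data.syn_id < len(self.secondary_send_buffer_pos[tid][lid])
            self.secondary_send_buffer_pos[tid][lid][data.syn_id].append(data)


table = ToyTargetTable(num_threads=1, num_local_nodes=2)

# A primary connection, as expected for every connection in hpc_benchmark.
table.add_target(0, 0, TargetData(is_primary=True, syn_id=0))

# The same data with the flag flipped, e.g. by a faulty transfer.
try:
    table.add_target(0, 1, TargetData(is_primary=False, syn_id=0))
    assertion_triggered = False
except AssertionError:
    assertion_triggered = True
```

In this toy model, a network with only primary connections never fills the secondary table, so any entry that arrives marked as non-primary fails the size check regardless of its syn_id.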
Could you try this branch:
https://github.com/heplesser/nest-simulator/tree/36_nosingle
It makes sure all MPI communication strictly happens in the OpenMP master thread (in NEST
3.6, it may happen inside OpenMP single constructs). This should not make a difference for
the hpc_benchmark, since it uses a single thread by default.
Best,
Hans Ekkehard
--
Prof. Dr. Hans Ekkehard Plesser
Department of Data Science
Faculty of Science and Technology
Norwegian University of Life Sciences
PO Box 5003, 1432 Aas, Norway
Phone +47 6723 1560
Email hans.ekkehard.plesser@nmbu.no
Home http://arken.nmbu.no/~plesser
From: Michele Martinelli <michele.martinelli(a)roma1.infn.it>
Date: Monday, 29 January 2024 at 10:35
To: users(a)nest-simulator.org <users(a)nest-simulator.org>
Subject: [NEST Users] Assert failed running hpc_benchmark
Dear NEST Users & Developers,
we're currently working on a custom OpenMPI BTL (supporting a custom FPGA-based NIC)
at the National Institute for Nuclear Physics in Rome, Italy. We get an error when
running hpc_benchmark (which we currently use as a simple validation test) with 2
processes, one on each of 2 hosts. The command we run looks like this:
  mpirun -n 2 -H host1:1,host2:1 --bynode --report-bindings -mca btl apelink,self,sm python hpc_benchmark.py
(apelink is our custom BTL component), but then we see this error:
python: [...]/NEST_with_local_ompi/nest-simulator-3.6/nestkernel/target_table.cpp:107: void nest::TargetTable::add_target(size_t, size_t, const nest::TargetData&): Assertion `syn_id < secondary_send_buffer_pos_[ tid ][ lid ].size()' failed.
[host:23979] *** Process received signal ***
[host:23979] Signal: Aborted (6)
[host:23979] Signal code: (-6)
My guess is that we are transferring something incorrectly (maybe during the
initialization/setup phase?), but I'm not sure what the assert expects to have in
secondary_send_buffer_pos_[ tid ][ lid ].size() and how this field should be set.
Best,
Michele