Dear Jiadu,
Running NEST (or any software) across multiple servers with MPI is not trivial because of
the dependencies between NEST, OpenMPI and the system you are running on. The error message
you receive indicates that MPI communication does not work properly, most likely because
the OpenMPI version against which the conda-provided binary was built and the MPI version
installed on the servers you use are incompatible with each other.
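A quick way to see such a mismatch (just a rough sketch; the conda path below is taken
from your command line, and the check assumes Open MPI is installed on the hosts) is to
compare the Open MPI versions inside and outside the conda environment:

# Open MPI bundled with the conda environment (path taken from your command line)
/home/work/anaconda3/envs/pynest/bin/mpirun --version

# Open MPI installed on the servers themselves
which mpirun && mpirun --version
ompi_info | head    # version and build details, if Open MPI's ompi_info is on the PATH

If the two report different versions, or the servers use a different MPI implementation
altogether, the conda binary cannot be expected to communicate correctly across nodes.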
To run NEST on your servers, you should build NEST from source against the MPI libraries
of your server system. Please see the NEST installation instructions here:
https://nest-simulator.readthedocs.io/en/nest-2.20.1/installation/index.htm…
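As a rough sketch only (paths, the NEST version, and any compiler or module setup are
placeholders you will need to adapt to your servers), a source build against the system
MPI typically looks like this:

# Assumes the NEST 2.20.1 sources are unpacked in ~/nest-simulator-2.20.1 and the
# system's MPI compilers and libraries are found on the default paths.
mkdir ~/nest-build && cd ~/nest-build
cmake -DCMAKE_INSTALL_PREFIX=$HOME/opt/nest \
      -Dwith-mpi=ON \
      ~/nest-simulator-2.20.1
make -j4
make install
source $HOME/opt/nest/bin/nest_vars.sh   # makes this NEST installation visible to Python

The important part is the -Dwith-mpi=ON option, which tells CMake to build NEST against
the MPI installation found on the system rather than the one bundled with conda.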
We have tested NEST installations on a considerable range of systems and believe that our
build system is quite robust. If you should run into problems building on your system,
I'd suggest that you get in touch with support staff for your server system who should
be able to guide you to suitable versions of, e.g., MPI libraries. We only have very
limited resources to support installation of NEST on user systems.
Best regards,
Hans Ekkehard
--
Prof. Dr. Hans Ekkehard Plesser
Head, Department of Data Science
Faculty of Science and Technology
Norwegian University of Life Sciences
PO Box 5003, 1432 Aas, Norway
Phone +47 6723 1560
Email hans.ekkehard.plesser(a)nmbu.no
Home
http://arken.nmbu.no/~plesser
On 03/12/2020, 02:34, "jiadu xie" <1323504842(a)qq.com> wrote:
I want to run neuron simulations in parallel on multiple servers in OpenMPI.
I've installed Nest through conda: conda install -c conda-forge nest-simulator.
The code of multi_test.py is shown below:
from nest import *
SetKernelStatus({"total_num_virtual_procs": 4})
pg = Create("poisson_generator", params={"rate": 50000.0})
n = Create("iaf_psc_alpha", 4)
sd = Create("spike_detector", params={"to_file": True})
print("work01,My Rank is :{}".format(Rank()))
#print("Processes Number is :{}".format(NumProcesses())
#print("Processor Name is :{}".format(ProcessorName())
Connect(pg, [n[0]], syn_spec={'weight': 1000.0, 'delay': 1.0})
Connect([n[0]], [n[1]], syn_spec={'weight': 1000.0, 'delay': 1.0})
Connect([n[1]], [n[2]], syn_spec={'weight': 1000.0, 'delay': 1.0})
Connect([n[2]], [n[3]], syn_spec={'weight': 1000.0, 'delay': 1.0})
Connect(n, sd)
Simulate(100.0)
To Reproduce
Steps to reproduce the behavior:
(pynest) work@work01:~/xiejiadu/nest_multi_test$
/home/work/anaconda3/envs/pynest/bin/mpirun -np 2 -host work01:1,work02:1
/home/work/anaconda3/envs/pynest/bin/python3
/home/work/xiejiadu/nest_multi_test/multi_test.py
[INFO] [2020.11.23 3:57:6
/home/conda/feedstock_root/build_artifacts/nest-simulator_1580129123254/work/nestkernel/rng_manager.cpp:217
@ Network::create_rngs_] : Creating default RNGs
[INFO] [2020.11.23 3:57:6
/home/conda/feedstock_root/build_artifacts/nest-simulator_1580129123254/work/nestkernel/rng_manager.cpp:260
@ Network::create_grng_] : Creating new default global RNG
[INFO] [2020.11.23 3:57:6
/home/conda/feedstock_root/build_artifacts/nest-simulator_1580129123254/work/nestkernel/rng_manager.cpp:217
@ Network::create_rngs_] : Creating default RNGs
[INFO] [2020.11.23 3:57:6
/home/conda/feedstock_root/build_artifacts/nest-simulator_1580129123254/work/nestkernel/rng_manager.cpp:260
@ Network::create_grng_] : Creating new default global RNG
python3:
/home/conda/feedstock_root/build_artifacts/nest-simulator_1580129123254/work/sli/scanner.cc:581:
bool Scanner::operator()(Token&): Assertion `in->good()' failed.
[work02:95945] *** Process received signal ***
[work02:95945] Signal: Aborted (6)
[work02:95945] Signal code: (-6)
[work02:95945] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x12730)[0x7fc94a207730]
[work02:95945] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x10b)[0x7fc94a0697bb]
[work02:95945] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x121)[0x7fc94a054535]
[work02:95945] [ 3] /lib/x86_64-linux-gnu/libc.so.6(+0x2240f)[0x7fc94a05440f]
[work02:95945] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x30102)[0x7fc94a062102]
[work02:95945] [ 5]
/home/work/anaconda3/envs/pynest/lib/python3.8/site-packages/nest/../../../libsli.so(_ZN7ScannerclER5Token+0x1489)[0x7fc93cf3ceb9]
[work02:95945] [ 6]
/home/work/anaconda3/envs/pynest/lib/python3.8/site-packages/nest/../../../libsli.so(_ZN6ParserclER5Token+0x49)[0x7fc93cf2f229]
[work02:95945] [ 7]
/home/work/anaconda3/envs/pynest/lib/python3.8/site-packages/nest/../../../libsli.so(_ZNK14IparseFunction7executeEP14SLIInterpreter+0x96)[0x7fc93cf66666]
[work02:95945] [ 8]
/home/work/anaconda3/envs/pynest/lib/python3.8/site-packages/nest/../../../libsli.so(+0x74193)[0x7fc93cf25193]
[work02:95945] [ 9]
/home/work/anaconda3/envs/pynest/lib/python3.8/site-packages/nest/../../../libsli.so(_ZN14SLIInterpreter8execute_Em+0x222)[0x7fc93cf29a32]
[work02:95945] [10]
/home/work/anaconda3/envs/pynest/lib/python3.8/site-packages/nest/../../../libsli.so(_ZN14SLIInterpreter7startupEv+0x27)[0x7fc93cf29e57]
[work02:95945] [11]
/home/work/anaconda3/envs/pynest/lib/python3.8/site-packages/nest/../../../libnest.so(_Z11neststartupPiPPPcR14SLIInterpreterNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x1ea0)[0x7fc93d97ba40]
[work02:95945] [12]
/home/work/anaconda3/envs/pynest/lib/python3.8/site-packages/nest/pynestkernel.so(+0x444dc)[0x7fc93dd774dc]
[work02:95945] [13]
/home/work/anaconda3/envs/pynest/bin/python3(+0x1b4924)[0x55e5ae205924]
[work02:95945] [14]
/home/work/anaconda3/envs/pynest/bin/python3(_PyEval_EvalFrameDefault+0x4bf)[0x55e5ae22dbcf]
[work02:95945] [15]
/home/work/anaconda3/envs/pynest/bin/python3(_PyFunction_Vectorcall+0x1b7)[0x55e5ae21a637]
[work02:95945] [16]
/home/work/anaconda3/envs/pynest/bin/python3(_PyEval_EvalFrameDefault+0x71a)[0x55e5ae22de2a]
[work02:95945] [17]
/home/work/anaconda3/envs/pynest/bin/python3(_PyEval_EvalCodeWithName+0x260)[0x55e5ae219490]
[work02:95945] [18]
/home/work/anaconda3/envs/pynest/bin/python3(+0x1f6bb9)[0x55e5ae247bb9]
[work02:95945] [19]
/home/work/anaconda3/envs/pynest/bin/python3(+0x13a23d)[0x55e5ae18b23d]
[work02:95945] [20]
/home/work/anaconda3/envs/pynest/bin/python3(PyVectorcall_Call+0x6f)[0x55e5ae1aef2f]
[work02:95945] [21]
/home/work/anaconda3/envs/pynest/bin/python3(_PyEval_EvalFrameDefault+0x5fc1)[0x55e5ae2336d1]
[work02:95945] [22]
/home/work/anaconda3/envs/pynest/bin/python3(_PyEval_EvalCodeWithName+0x260)[0x55e5ae219490]
[work02:95945] [23]
/home/work/anaconda3/envs/pynest/bin/python3(_PyFunction_Vectorcall+0x594)[0x55e5ae21aa14]
[work02:95945] [24]
/home/work/anaconda3/envs/pynest/bin/python3(_PyEval_EvalFrameDefault+0x4e73)[0x55e5ae232583]
[work02:95945] [25]
/home/work/anaconda3/envs/pynest/bin/python3(_PyFunction_Vectorcall+0x1b7)[0x55e5ae21a637]
[work02:95945] [26]
/home/work/anaconda3/envs/pynest/bin/python3(_PyEval_EvalFrameDefault+0x4bf)[0x55e5ae22dbcf]
[work02:95945] [27]
/home/work/anaconda3/envs/pynest/bin/python3(_PyFunction_Vectorcall+0x1b7)[0x55e5ae21a637]
[work02:95945] [28]
/home/work/anaconda3/envs/pynest/bin/python3(_PyEval_EvalFrameDefault+0x71a)[0x55e5ae22de2a]
[work02:95945] [29]
/home/work/anaconda3/envs/pynest/bin/python3(_PyFunction_Vectorcall+0x1b7)[0x55e5ae21a637]
[work02:95945] *** End of error message ***
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: Open MPI failed to TCP connect to a peer MPI process. This
should not happen.
Your Open MPI job may now hang or fail.
Local host: work01
PID: 114620
Message: connect() to 192.168.204.122:1024 failed
Error: Operation now in progress (115)
--------------------------------------------------------------------------
[work01:114615] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2193
-- N E S T --
Copyright (C) 2004 The NEST Initiative
Version: nest-2.18.0
Built: Jan 27 2020 12:49:17
This program is provided AS IS and comes with
NO WARRANTY. See the file LICENSE for details.
Problems or suggestions?
Visit
https://www.nest-simulator.org
Type 'nest.help()' to find out more about NEST.
Nov 23 03:57:06 ModelManager::clear_models_ [Info]:
Models will be cleared and parameters reset.
Nov 23 03:57:06 Network::create_rngs_ [Info]:
Deleting existing random number generators
Nov 23 03:57:06 Network::create_rngs_ [Info]:
Creating default RNGs
Nov 23 03:57:06 Network::create_grng_ [Info]:
Creating new default global RNG
Nov 23 03:57:06 RecordingDevice::set_status [Info]:
Data will be recorded to file and to memory.
work01,My Rank is :0
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 95945 on node work02 exited on signal 6
(Aborted).
--------------------------------------------------------------------------
Command to run
/home/work/anaconda3/envs/pynest/bin/mpirun -np 2 -host work01:1,work02:1
/home/work/anaconda3/envs/pynest/bin/python3
/home/work/xiejiadu/nest_multi_test/multi_test.py
Desktop/Environment:
OS: Debian 10.0
Shell: conda 4.8.3
Python-Version: Python 3.8.6
NEST-Version: nest-2.18
Installation: conda package, with MPI
Best
jiaduxie
_______________________________________________
NEST Users mailing list -- users(a)nest-simulator.org
To unsubscribe send an email to users-leave(a)nest-simulator.org