I want to run neuron simulations in parallel on multiple servers using OpenMPI. I've installed NEST through conda: conda install -c conda-forge nest-simulator. The code of multi_test.py is shown below:
```python
from nest import *

SetKernelStatus({"total_num_virtual_procs": 4})

pg = Create("poisson_generator", params={"rate": 50000.0})
n = Create("iaf_psc_alpha", 4)
sd = Create("spike_detector", params={"to_file": True})

print("work01,My Rank is :{}".format(Rank()))
#print("Processes Number is :{}".format(NumProcesses()))
#print("Processor Name is :{}".format(ProcessorName()))

Connect(pg, [n[0]], syn_spec={'weight': 1000.0, 'delay': 1.0})
Connect([n[0]], [n[1]], syn_spec={'weight': 1000.0, 'delay': 1.0})
Connect([n[1]], [n[2]], syn_spec={'weight': 1000.0, 'delay': 1.0})
Connect([n[2]], [n[3]], syn_spec={'weight': 1000.0, 'delay': 1.0})
Connect(n, sd)

Simulate(100.0)
```

To Reproduce
Steps to reproduce the behavior:
(pynest) work@work01:~/xiejiadu/nest_multi_test$ /home/work/anaconda3/envs/pynest/bin/mpirun -np 2 -host work01:1,work02:1 /home/work/anaconda3/envs/pynest/bin/python3 /home/work/xiejiadu/nest_multi_test/multi_test.py
[INFO] [2020.11.23 3:57:6 /home/conda/feedstock_root/build_artifacts/nest-simulator_1580129123254/work/nestkernel/rng_manager.cpp:217 @ Network::create_rngs_] : Creating default RNGs
[INFO] [2020.11.23 3:57:6 /home/conda/feedstock_root/build_artifacts/nest-simulator_1580129123254/work/nestkernel/rng_manager.cpp:260 @ Network::create_grng_] : Creating new default global RNG
[INFO] [2020.11.23 3:57:6 /home/conda/feedstock_root/build_artifacts/nest-simulator_1580129123254/work/nestkernel/rng_manager.cpp:217 @ Network::create_rngs_] : Creating default RNGs
[INFO] [2020.11.23 3:57:6 /home/conda/feedstock_root/build_artifacts/nest-simulator_1580129123254/work/nestkernel/rng_manager.cpp:260 @ Network::create_grng_] : Creating new default global RNG
python3: /home/conda/feedstock_root/build_artifacts/nest-simulator_1580129123254/work/sli/scanner.cc:581: bool Scanner::operator()(Token&): Assertion `in->good()' failed.
[work02:95945] *** Process received signal ***
[work02:95945] Signal: Aborted (6)
[work02:95945] Signal code: (-6)
[work02:95945] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x12730)[0x7fc94a207730]
[work02:95945] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x10b)[0x7fc94a0697bb]
[work02:95945] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x121)[0x7fc94a054535]
[work02:95945] [ 3] /lib/x86_64-linux-gnu/libc.so.6(+0x2240f)[0x7fc94a05440f]
[work02:95945] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x30102)[0x7fc94a062102]
[work02:95945] [ 5] /home/work/anaconda3/envs/pynest/lib/python3.8/site-packages/nest/../../../libsli.so(_ZN7ScannerclER5Token+0x1489)[0x7fc93cf3ceb9]
[work02:95945] [ 6] /home/work/anaconda3/envs/pynest/lib/python3.8/site-packages/nest/../../../libsli.so(_ZN6ParserclER5Token+0x49)[0x7fc93cf2f229]
[work02:95945] [ 7] /home/work/anaconda3/envs/pynest/lib/python3.8/site-packages/nest/../../../libsli.so(_ZNK14IparseFunction7executeEP14SLIInterpreter+0x96)[0x7fc93cf66666]
[work02:95945] [ 8] /home/work/anaconda3/envs/pynest/lib/python3.8/site-packages/nest/../../../libsli.so(+0x74193)[0x7fc93cf25193]
[work02:95945] [ 9] /home/work/anaconda3/envs/pynest/lib/python3.8/site-packages/nest/../../../libsli.so(_ZN14SLIInterpreter8execute_Em+0x222)[0x7fc93cf29a32]
[work02:95945] [10] /home/work/anaconda3/envs/pynest/lib/python3.8/site-packages/nest/../../../libsli.so(_ZN14SLIInterpreter7startupEv+0x27)[0x7fc93cf29e57]
[work02:95945] [11] /home/work/anaconda3/envs/pynest/lib/python3.8/site-packages/nest/../../../libnest.so(_Z11neststartupPiPPPcR14SLIInterpreterNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x1ea0)[0x7fc93d97ba40]
[work02:95945] [12] /home/work/anaconda3/envs/pynest/lib/python3.8/site-packages/nest/pynestkernel.so(+0x444dc)[0x7fc93dd774dc]
[work02:95945] [13] /home/work/anaconda3/envs/pynest/bin/python3(+0x1b4924)[0x55e5ae205924]
[work02:95945] [14] /home/work/anaconda3/envs/pynest/bin/python3(_PyEval_EvalFrameDefault+0x4bf)[0x55e5ae22dbcf]
[work02:95945] [15] /home/work/anaconda3/envs/pynest/bin/python3(_PyFunction_Vectorcall+0x1b7)[0x55e5ae21a637]
[work02:95945] [16] /home/work/anaconda3/envs/pynest/bin/python3(_PyEval_EvalFrameDefault+0x71a)[0x55e5ae22de2a]
[work02:95945] [17] /home/work/anaconda3/envs/pynest/bin/python3(_PyEval_EvalCodeWithName+0x260)[0x55e5ae219490]
[work02:95945] [18] /home/work/anaconda3/envs/pynest/bin/python3(+0x1f6bb9)[0x55e5ae247bb9]
[work02:95945] [19] /home/work/anaconda3/envs/pynest/bin/python3(+0x13a23d)[0x55e5ae18b23d]
[work02:95945] [20] /home/work/anaconda3/envs/pynest/bin/python3(PyVectorcall_Call+0x6f)[0x55e5ae1aef2f]
[work02:95945] [21] /home/work/anaconda3/envs/pynest/bin/python3(_PyEval_EvalFrameDefault+0x5fc1)[0x55e5ae2336d1]
[work02:95945] [22] /home/work/anaconda3/envs/pynest/bin/python3(_PyEval_EvalCodeWithName+0x260)[0x55e5ae219490]
[work02:95945] [23] /home/work/anaconda3/envs/pynest/bin/python3(_PyFunction_Vectorcall+0x594)[0x55e5ae21aa14]
[work02:95945] [24] /home/work/anaconda3/envs/pynest/bin/python3(_PyEval_EvalFrameDefault+0x4e73)[0x55e5ae232583]
[work02:95945] [25] /home/work/anaconda3/envs/pynest/bin/python3(_PyFunction_Vectorcall+0x1b7)[0x55e5ae21a637]
[work02:95945] [26] /home/work/anaconda3/envs/pynest/bin/python3(_PyEval_EvalFrameDefault+0x4bf)[0x55e5ae22dbcf]
[work02:95945] [27] /home/work/anaconda3/envs/pynest/bin/python3(_PyFunction_Vectorcall+0x1b7)[0x55e5ae21a637]
[work02:95945] [28] /home/work/anaconda3/envs/pynest/bin/python3(_PyEval_EvalFrameDefault+0x71a)[0x55e5ae22de2a]
[work02:95945] [29] /home/work/anaconda3/envs/pynest/bin/python3(_PyFunction_Vectorcall+0x1b7)[0x55e5ae21a637]
[work02:95945] *** End of error message ***
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: Open MPI failed to TCP connect to a peer MPI process. This should not happen.
Your Open MPI job may now hang or fail.
Local host: work01
PID: 114620
Message: connect() to 192.168.204.122:1024 failed
Error: Operation now in progress (115)
--------------------------------------------------------------------------
[work01:114615] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2193
-- N E S T --
Copyright (C) 2004 The NEST Initiative

Version: nest-2.18.0
Built: Jan 27 2020 12:49:17
This program is provided AS IS and comes with NO WARRANTY. See the file LICENSE for details.
Problems or suggestions? Visit https://www.nest-simulator.org
Type 'nest.help()' to find out more about NEST.
Nov 23 03:57:06 ModelManager::clear_models_ [Info]: Models will be cleared and parameters reset.
Nov 23 03:57:06 Network::create_rngs_ [Info]: Deleting existing random number generators
Nov 23 03:57:06 Network::create_rngs_ [Info]: Creating default RNGs
Nov 23 03:57:06 Network::create_grng_ [Info]: Creating new default global RNG
Nov 23 03:57:06 RecordingDevice::set_status [Info]: Data will be recorded to file and to memory.
work01,My Rank is :0
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 95945 on node work02 exited on signal 6 (Aborted).
--------------------------------------------------------------------------

Command to run:
/home/work/anaconda3/envs/pynest/bin/mpirun -np 2 -host work01:1,work02:1 /home/work/anaconda3/envs/pynest/bin/python3 /home/work/xiejiadu/nest_multi_test/multi_test.py

Expected behavior
The simulation runs in parallel across both servers (work01 and work02) and completes without crashing.
Desktop/Environment:
OS: Debian 10.0
Shell: conda 4.8.3
Python version: Python 3.8.6
NEST version: nest-2.18.0
Installation: conda package, with MPI
Best, jiaduxie
Dear Jiadu,
Running NEST (or any software) across multiple servers with MPI is not trivial, because of the dependencies between NEST, OpenMPI and the system you are running on. The error message you get indicates that MPI communication does not work properly, most likely because the OpenMPI version against which the conda-provided binary was built and the MPI version installed on your servers do not match.
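As a quick sanity check (this is only a sketch, not an official NEST tool), you could run a minimal MPI program with exactly the same mpirun command you use for multi_test.py. It assumes mpi4py is installed into the pynest environment (it is not part of NEST), and the file name check_mpi.py is just a placeholder. If the two ranks report different MPI library versions, or the job fails before printing anything, the problem lies in the MPI setup rather than in NEST itself.

```python
# check_mpi.py (placeholder name): minimal sketch to inspect the MPI
# runtime that each rank actually uses. Assumes mpi4py is installed in
# the same conda environment, e.g. via `conda install -c conda-forge mpi4py`.
from mpi4py import MPI

comm = MPI.COMM_WORLD
print("rank {} of {} on {}: {}".format(
    comm.Get_rank(),                      # rank of this process
    comm.Get_size(),                      # total number of MPI processes
    MPI.Get_processor_name(),             # host this rank runs on
    MPI.Get_library_version().strip()))   # MPI implementation and version
```

Run it with the same mpirun, -np and -host arguments as your NEST job; both ranks should report the same OpenMPI version.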
To run NEST on your servers, you should build NEST from source against the MPI libraries of your server system. Please see the NEST installation instructions here:
https://nest-simulator.readthedocs.io/en/nest-2.20.1/installation/index.html...
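As a rough outline only (the source path and install prefix below are placeholders; the linked instructions describe the full procedure), a source build with MPI support enabled typically looks like this:

```bash
# Sketch of a from-source NEST build with MPI enabled; adapt the paths
# to your system and follow the installation instructions for details.
cd /path/to/nest-simulator-source        # unpacked NEST release tarball
mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX:PATH=$HOME/opt/nest \
      -Dwith-mpi=ON \
      ..
make
make install
# Make the freshly built PyNEST visible to the Python you pass to mpirun:
source $HOME/opt/nest/bin/nest_vars.sh
```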
We have tested NEST installations on a considerable range of systems and believe that our build system is quite robust. If you should run into problems building on your system, I'd suggest that you get in touch with support staff for your server system who should be able to guide you to suitable versions of, e.g., MPI libraries. We only have very limited resources to support installation of NEST on user systems.
Best regards, Hans Ekkehard
Thank you for your reply. I installed the precompiled NEST package through conda and then tried to simulate a neural network across multiple nodes, but it failed. The error looks to me like it comes from the NEST kernel library. Can you help me see what is wrong? Were my test code and run command incorrect?
Error:
python3: /home/conda/feedstock_root/build_artifacts/nest-simulator_1580129123254/work/sli/scanner.cc:581: bool Scanner::operator()(Token&): Assertion `in->good()' failed.
The backtrace is the same as the one posted above (rank 1 aborting on work02).
Best regards, jiaduxie