I want to run NEST neuron simulations in parallel across multiple servers with OpenMPI. I installed NEST through conda (`conda install -c conda-forge nest-simulator`). The test script `multi_test.py` is shown below:
```python
from nest import *

# Distribute the network over 4 virtual processes.
SetKernelStatus({"total_num_virtual_procs": 4})

pg = Create("poisson_generator", params={"rate": 50000.0})
n = Create("iaf_psc_alpha", 4)
sd = Create("spike_detector", params={"to_file": True})

print("work01,My Rank is :{}".format(Rank()))
#print("Processes Number is :{}".format(NumProcesses()))
#print("Processor Name is :{}".format(ProcessorName()))

# Drive the first neuron with the Poisson generator and chain the rest.
Connect(pg, [n[0]], syn_spec={'weight': 1000.0, 'delay': 1.0})
Connect([n[0]], [n[1]], syn_spec={'weight': 1000.0, 'delay': 1.0})
Connect([n[1]], [n[2]], syn_spec={'weight': 1000.0, 'delay': 1.0})
Connect([n[2]], [n[3]], syn_spec={'weight': 1000.0, 'delay': 1.0})
Connect(n, sd)

Simulate(100.0)
```
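As a sanity check (not part of the failing run), a minimal MPI-only script like the sketch below could be launched with the same mpirun command to verify that plain MPI communication between work01 and work02 works independently of NEST. This assumes mpi4py is installed in the pynest conda environment, and the file name check_mpi.py is hypothetical.

```python
# check_mpi.py -- hypothetical MPI-only sanity check, independent of NEST.
# Assumes mpi4py is available in the same "pynest" conda environment.
from mpi4py import MPI

comm = MPI.COMM_WORLD
print("host {}: rank {} of {}".format(
    MPI.Get_processor_name(), comm.Get_rank(), comm.Get_size()))
```

It would be launched exactly like multi_test.py below, with only the script path replaced.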
To Reproduce
Steps to reproduce the behavior:

```
(pynest) work@work01:~/xiejiadu/nest_multi_test$ /home/work/anaconda3/envs/pynest/bin/mpirun -np 2 -host work01:1,work02:1 /home/work/anaconda3/envs/pynest/bin/python3 /home/work/xiejiadu/nest_multi_test/multi_test.py
[INFO] [2020.11.23 3:57:6 /home/conda/feedstock_root/build_artifacts/nest-simulator_1580129123254/work/nestkernel/rng_manager.cpp:217 @ Network::create_rngs_] : Creating default RNGs
[INFO] [2020.11.23 3:57:6 /home/conda/feedstock_root/build_artifacts/nest-simulator_1580129123254/work/nestkernel/rng_manager.cpp:260 @ Network::create_grng_] : Creating new default global RNG
[INFO] [2020.11.23 3:57:6 /home/conda/feedstock_root/build_artifacts/nest-simulator_1580129123254/work/nestkernel/rng_manager.cpp:217 @ Network::create_rngs_] : Creating default RNGs
[INFO] [2020.11.23 3:57:6 /home/conda/feedstock_root/build_artifacts/nest-simulator_1580129123254/work/nestkernel/rng_manager.cpp:260 @ Network::create_grng_] : Creating new default global RNG
python3: /home/conda/feedstock_root/build_artifacts/nest-simulator_1580129123254/work/sli/scanner.cc:581: bool Scanner::operator()(Token&): Assertion `in->good()' failed.
[work02:95945] *** Process received signal ***
[work02:95945] Signal: Aborted (6)
[work02:95945] Signal code: (-6)
[work02:95945] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x12730)[0x7fc94a207730]
[work02:95945] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x10b)[0x7fc94a0697bb]
[work02:95945] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x121)[0x7fc94a054535]
[work02:95945] [ 3] /lib/x86_64-linux-gnu/libc.so.6(+0x2240f)[0x7fc94a05440f]
[work02:95945] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x30102)[0x7fc94a062102]
[work02:95945] [ 5] /home/work/anaconda3/envs/pynest/lib/python3.8/site-packages/nest/../../../libsli.so(_ZN7ScannerclER5Token+0x1489)[0x7fc93cf3ceb9]
[work02:95945] [ 6] /home/work/anaconda3/envs/pynest/lib/python3.8/site-packages/nest/../../../libsli.so(_ZN6ParserclER5Token+0x49)[0x7fc93cf2f229]
[work02:95945] [ 7] /home/work/anaconda3/envs/pynest/lib/python3.8/site-packages/nest/../../../libsli.so(_ZNK14IparseFunction7executeEP14SLIInterpreter+0x96)[0x7fc93cf66666]
[work02:95945] [ 8] /home/work/anaconda3/envs/pynest/lib/python3.8/site-packages/nest/../../../libsli.so(+0x74193)[0x7fc93cf25193]
[work02:95945] [ 9] /home/work/anaconda3/envs/pynest/lib/python3.8/site-packages/nest/../../../libsli.so(_ZN14SLIInterpreter8execute_Em+0x222)[0x7fc93cf29a32]
[work02:95945] [10] /home/work/anaconda3/envs/pynest/lib/python3.8/site-packages/nest/../../../libsli.so(_ZN14SLIInterpreter7startupEv+0x27)[0x7fc93cf29e57]
[work02:95945] [11] /home/work/anaconda3/envs/pynest/lib/python3.8/site-packages/nest/../../../libnest.so(_Z11neststartupPiPPPcR14SLIInterpreterNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x1ea0)[0x7fc93d97ba40]
[work02:95945] [12] /home/work/anaconda3/envs/pynest/lib/python3.8/site-packages/nest/pynestkernel.so(+0x444dc)[0x7fc93dd774dc]
[work02:95945] [13] /home/work/anaconda3/envs/pynest/bin/python3(+0x1b4924)[0x55e5ae205924]
[work02:95945] [14] /home/work/anaconda3/envs/pynest/bin/python3(_PyEval_EvalFrameDefault+0x4bf)[0x55e5ae22dbcf]
[work02:95945] [15] /home/work/anaconda3/envs/pynest/bin/python3(_PyFunction_Vectorcall+0x1b7)[0x55e5ae21a637]
[work02:95945] [16] /home/work/anaconda3/envs/pynest/bin/python3(_PyEval_EvalFrameDefault+0x71a)[0x55e5ae22de2a]
[work02:95945] [17] /home/work/anaconda3/envs/pynest/bin/python3(_PyEval_EvalCodeWithName+0x260)[0x55e5ae219490]
[work02:95945] [18] /home/work/anaconda3/envs/pynest/bin/python3(+0x1f6bb9)[0x55e5ae247bb9]
[work02:95945] [19] /home/work/anaconda3/envs/pynest/bin/python3(+0x13a23d)[0x55e5ae18b23d]
[work02:95945] [20] /home/work/anaconda3/envs/pynest/bin/python3(PyVectorcall_Call+0x6f)[0x55e5ae1aef2f]
[work02:95945] [21] /home/work/anaconda3/envs/pynest/bin/python3(_PyEval_EvalFrameDefault+0x5fc1)[0x55e5ae2336d1]
[work02:95945] [22] /home/work/anaconda3/envs/pynest/bin/python3(_PyEval_EvalCodeWithName+0x260)[0x55e5ae219490]
[work02:95945] [23] /home/work/anaconda3/envs/pynest/bin/python3(_PyFunction_Vectorcall+0x594)[0x55e5ae21aa14]
[work02:95945] [24] /home/work/anaconda3/envs/pynest/bin/python3(_PyEval_EvalFrameDefault+0x4e73)[0x55e5ae232583]
[work02:95945] [25] /home/work/anaconda3/envs/pynest/bin/python3(_PyFunction_Vectorcall+0x1b7)[0x55e5ae21a637]
[work02:95945] [26] /home/work/anaconda3/envs/pynest/bin/python3(_PyEval_EvalFrameDefault+0x4bf)[0x55e5ae22dbcf]
[work02:95945] [27] /home/work/anaconda3/envs/pynest/bin/python3(_PyFunction_Vectorcall+0x1b7)[0x55e5ae21a637]
[work02:95945] [28] /home/work/anaconda3/envs/pynest/bin/python3(_PyEval_EvalFrameDefault+0x71a)[0x55e5ae22de2a]
[work02:95945] [29] /home/work/anaconda3/envs/pynest/bin/python3(_PyFunction_Vectorcall+0x1b7)[0x55e5ae21a637]
[work02:95945] *** End of error message ***
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: Open MPI failed to TCP connect to a peer MPI process. This
should not happen.

Your Open MPI job may now hang or fail.

  Local host: work01
  PID:        114620
  Message:    connect() to 192.168.204.122:1024 failed
  Error:      Operation now in progress (115)
--------------------------------------------------------------------------
[work01:114615] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2193
              -- N E S T --
  Copyright (C) 2004 The NEST Initiative

 Version: nest-2.18.0
 Built: Jan 27 2020 12:49:17
This program is provided AS IS and comes with NO WARRANTY. See the file LICENSE for details.
Problems or suggestions? Visit https://www.nest-simulator.org
Type 'nest.help()' to find out more about NEST.
Nov 23 03:57:06 ModelManager::clear_models_ [Info]: Models will be cleared and parameters reset.
Nov 23 03:57:06 Network::create_rngs_ [Info]: Deleting existing random number generators
Nov 23 03:57:06 Network::create_rngs_ [Info]: Creating default RNGs
Nov 23 03:57:06 Network::create_grng_ [Info]: Creating new default global RNG
Nov 23 03:57:06 RecordingDevice::set_status [Info]: Data will be recorded to file and to memory.
work01,My Rank is :0
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 95945 on node work02 exited
on signal 6 (Aborted).
--------------------------------------------------------------------------
```

Command to run:

```
/home/work/anaconda3/envs/pynest/bin/mpirun -np 2 -host work01:1,work02:1 /home/work/anaconda3/envs/pynest/bin/python3 /home/work/xiejiadu/nest_multi_test/multi_test.py
```

Expected behavior
One NEST process starts on work01 and one on work02, each rank prints its rank number, and Simulate(100.0) completes on both nodes instead of the rank on work02 aborting with the assertion failure shown above.
Desktop/Environment:
- OS: Debian 10.0
- Shell: conda 4.8.3
- Python version: Python 3.8.6
- NEST version: nest-2.18.0
- Installation: conda package, with MPI
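For reference, a small snippet along the following lines could be used to re-check the versions listed above. It is illustrative only; the output labels are my own, and it assumes the PyNEST 2.x helper nest.version(), which returns the version string.

```python
# Illustrative snippet for collecting the environment information listed above.
import platform
import sys

import nest  # importing PyNEST prints the NEST banner shown earlier

print("OS     :", platform.platform())
print("Python :", sys.version.split()[0])
print("NEST   :", nest.version())
```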
Best,
jiaduxie