Hello,
I have tried to run one of the e-prop tutorials (eprop_supervised_regression_handwriting_bsshslm_2020.py) and ran into problems when activating MPI. I do not know whether the problem lies in the e-prop C++ implementation or in the Python tutorial code (in some cases, but not all, it looks like the latter).
I have two files (hostfile_orig_1 and hostfile_orig_2) that define on which nodes of the cluster the program runs. Their contents are:

hostfile_orig_1:
    node0 slots=1
    node1 slots=1

hostfile_orig_2:
    node0 slots=2
    node0 slots=2

The 'slots' key specifies how many MPI processes can run on a given node.
The errors differ slightly depending on the number of processes. Unless noted otherwise, all of the following examples use "total_num_virtual_procs": 2 on line 173 of the tutorial file.
For each case below, I give the command and the resulting output.

Command:
    mpirun -np 1 -hostfile hostfile_orig_1 python3 eprop_supervised_regression_handwriting_bsshslm_2020.py
Output:
    With a "serial" execution like this one, everything is OK.

Command:
    mpirun -np 2 -hostfile hostfile_orig_2 python3 eprop_supervised_regression_handwriting_bsshslm_2020.py
Output:
    Traceback (most recent call last):
      File "eprop_supervised_regression_handwriting_bsshslm_2020.py", line 404, in <module>
        nest.GetConnections(nrns_rec[0], nrns_rec[1:3]).set([params_init_optimizer] * 2)
      File "/home/neurobit/local/nest_3.9/lib64/python3.8/site-packages/nest/lib/hl_api_types.py", line 945, in set
        raise TypeError("status dict must be a dict, or a list of dicts of length {}".format(self.__len__()))
    TypeError: status dict must be a dict, or a list of dicts of length 1
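
One possible reading of the "length 1" message, offered as an assumption rather than a confirmed diagnosis: under MPI each rank may only see the connections it stores locally, so the collection returned on a rank holds 1 connection instead of 2, and a hard-coded list of 2 dicts no longer fits. A minimal sketch of sizing the list from the local count follows. `per_connection_params` is a hypothetical helper (not part of NEST or the tutorial), the placeholder dict is illustrative only, and the NEST calls appear only as comments.

```python
import copy


def per_connection_params(params, n_local_conns):
    """Build one independent copy of `params` per locally stored connection."""
    return [copy.deepcopy(params) for _ in range(n_local_conns)]


# Hypothetical use in the tutorial (commented out because it needs NEST):
# conns = nest.GetConnections(nrns_rec[0], nrns_rec[1:3])
# if len(conns) > 0:
#     conns.set(per_connection_params(params_init_optimizer, len(conns)))

# Stand-alone check with a placeholder dict (values are illustrative only).
example_params = {"optimizer": {"m": 0.0, "v": 0.0}}
local_list = per_connection_params(example_params, 1)
print(len(local_list))
```

Since the error message itself says a single dict is also accepted, passing `params_init_optimizer` directly instead of a list may be the simpler option.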

Command (with line 404 commented out):
    mpirun -np 2 -hostfile hostfile_orig_2 python3 eprop_supervised_regression_handwriting_bsshslm_2020.py
Output:
    Traceback (most recent call last):
      File "eprop_supervised_regression_handwriting_bsshslm_2020.py", line 545, in <module>
        readout_signal = readout_signal.reshape((n_out, n_iter, batch_size, steps["sequence"]))
    ValueError: cannot reshape array of size 364800 into shape (2,200,1,1824)

Command (with line 404 commented out):
    mpirun -np 2 -hostfile hostfile_orig_1 python3 eprop_supervised_regression_handwriting_bsshslm_2020.py
Output:
    Traceback (most recent call last):
      File "eprop_supervised_regression_handwriting_bsshslm_2020.py", line 545, in <module>
        readout_signal = readout_signal.reshape((n_out, n_iter, batch_size, steps["sequence"]))
    ValueError: cannot reshape array of size 364800 into shape (2,200,1,1824)
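
The numbers in this reshape error can be checked by hand. The target shape needs 2 * 200 * 1 * 1824 = 729600 elements, and the recorded array has exactly half of that. A small sketch of the arithmetic, using only the values from the error message, is below. The interpretation that each of the 2 ranks recorded only one of the two readout neurons is my assumption, and `n_out_local` is a hypothetical name.

```python
import numpy as np

# Values taken from the shape in the error message.
n_out, n_iter, batch_size, seq_steps = 2, 200, 1, 1824
observed_size = 364800  # array size reported in the ValueError

expected_size = n_out * n_iter * batch_size * seq_steps
ratio = expected_size // observed_size

# Assumption: each rank holds the recordings of only its local readout neurons.
n_out_local = observed_size // (n_iter * batch_size * seq_steps)

# With the local neuron count, the reshape succeeds on dummy data.
local_signal = np.zeros(observed_size).reshape((n_out_local, n_iter, batch_size, seq_steps))
print(expected_size, ratio, n_out_local, local_signal.shape)
```

If that assumption holds, the recordings would need to be gathered across ranks (or reshaped per rank with the local neuron count) before the tutorial's post-processing.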

Command (with line 404 commented out and "total_num_virtual_procs": 4 on line 173 of the tutorial file):
    mpirun -np 4 -hostfile hostfile_orig_2 python3 eprop_supervised_regression_handwriting_bsshslm_2020.py
Output:
    Traceback (most recent call last):
      File "eprop_supervised_regression_handwriting_bsshslm_2020.py", line 493, in <module>
        "rec_out": get_weights(nrns_rec, nrns_out),
      File "eprop_supervised_regression_handwriting_bsshslm_2020.py", line 482, in get_weights
        conns["senders"] = np.array(conns["source"]) - np.min(conns["source"])
    TypeError: tuple indices must be integers or slices, not str

In this last case, the program stops and hangs.
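
The TypeError in the last case shows that `conns` is a tuple rather than a dict at that point. My guess, which I have not verified, is that on ranks holding none of the 2 readout neurons the connection query comes back empty and yields an empty tuple. If so, the hang might simply be the remaining ranks waiting for the one that failed. A minimal sketch of a guard follows; `sender_offsets` is a hypothetical helper, not the tutorial's `get_weights`.

```python
import numpy as np


def sender_offsets(conns):
    """Return source IDs shifted to start at 0, or an empty array if there are none."""
    if not isinstance(conns, dict):
        # Covers the empty-tuple case seen in the traceback.
        return np.array([], dtype=int)
    sources = np.atleast_1d(conns["source"])
    if sources.size == 0:
        return np.array([], dtype=int)
    return sources - sources.min()


# Reproduce the error message from the traceback with a plain empty tuple.
empty = ()
try:
    empty["source"]
except TypeError as err:
    message = str(err)

print(message)
print(sender_offsets(empty))
print(sender_offsets({"source": [5, 6, 7]}))
```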
If you want, I can submit a bug report on GitHub.
Xavier