Hi all!
What I'm doing:
I've been using NEST v2.20.2 on an HPC cluster (bwForCluster NEMO) for my large-scale simulations
involving structural plasticity. Now I'm trying to move to v3.8.
How am I doing this?
I built NEST v3.8 in my HPC workspace following the CMake instructions at:
https://nest-simulator.readthedocs.io/en/v3.8/installation/cmake_options.ht…
Here's an overview of my build and installation commands:
```
source /<some_path>/NESTv3.8/bin/activate  # activate a fresh Python venv (v3.9.7) [optional]
module load mpi/openmpi/4.0-gnu-9.2
# cmake version 3.30.4
cmake --debug-find \
-DMPI_C_COMPILER=$(which mpicc) \
-DMPI_CXX_COMPILER=$(which mpicxx) \
-DMPI_HOME=$(which mpirun) \
-Dwith-mpi=ON \
-Dwith-openmp=ON \
-DCYTHON_EXECUTABLE=/<some_path>/intel/oneapi/2022.1/intelpython/latest/bin/cython \
-DCMAKE_INSTALL_PREFIX:PATH=/<some_path>/nest-simulator-3.8-build \
/<path_to_extracted_tar>/nest-simulator-3.8/
# If I don't specify the Cython path, cmake picks an outdated (incompatible) Cython
# version. I did not try the -Dcythonize-pynest=OFF option because I don't know how to
# build PyNEST from a pre-cythonized pynestkernel.pyx.
make
make install
make installcheck
```
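A note on the configure step above: CMake's FindMPI module describes `MPI_HOME` as the base directory of the MPI installation, whereas `$(which mpirun)` expands to the path of the `mpirun` binary itself. A minimal sketch, assuming the Open MPI module follows the usual `<root>/bin/mpirun` layout (the helper name `mpi_root_from_launcher` is hypothetical), derives the root directory instead and prints the flag for inspection:

```shell
#!/usr/bin/env bash
# Sketch, assuming the Open MPI module uses a <root>/bin/mpirun layout:
# pass the installation root to -DMPI_HOME instead of the mpirun binary path.

# Hypothetical helper: drop the last two path components (bin/mpirun).
mpi_root_from_launcher() {
  dirname "$(dirname "$1")"
}

if command -v mpirun >/dev/null 2>&1; then
  MPI_ROOT="$(mpi_root_from_launcher "$(command -v mpirun)")"
  # Print the flag rather than running cmake, so it can be checked first.
  echo "-DMPI_HOME=${MPI_ROOT}"
fi
```

Since `MPI_C_COMPILER` and `MPI_CXX_COMPILER` are already given explicitly, FindMPI may well locate the right MPI regardless; this only removes one possibly misleading hint.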
Once the build and installation finish, I do the usual `source`-ing of `nest_vars.sh`. I also add
the NEST path to my Python venv's `site-packages`. This has worked for every NEST version
I've ever installed.
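A minimal sketch of those two steps, assuming `nest_vars.sh` sits in the install prefix's `bin/` directory and that the venv is extended via a `.pth` file (Python's `site` module reads `.pth` files in `site-packages` at startup and adds each listed, existing directory to `sys.path`). The helper name `add_path_to_venv`, the file name `nest.pth`, and the PyNEST directory under the prefix are assumptions; the last one depends on the Python version and platform.

```shell
#!/usr/bin/env bash
# Hypothetical helper: register an extra directory with a venv by writing
# (or overwriting) a nest.pth file in its site-packages directory.
add_path_to_venv() {
  local site_packages="$1" extra_dir="$2"
  printf '%s\n' "$extra_dir" > "${site_packages}/nest.pth"
}

# Example usage (placeholder paths, not verified locations):
# source /<some_path>/nest-simulator-3.8-build/bin/nest_vars.sh
# add_path_to_venv "$(python -c 'import sysconfig; print(sysconfig.get_paths()["purelib"])')" \
#   /<some_path>/nest-simulator-3.8-build/lib/python3.9/site-packages
```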
`make installcheck` finishes with one failing test:
```
THE NEST TESTSUITE DISCOVERED PROBLEMS
The following tests failed
| regressiontests.issue-1703.py
Please report test failures by creating an issue at
https://github.com/nest/nest-simulator/issues
---------------------------------------------------------------------------
make[3]: *** [CMakeFiles/installcheck] Error 1
make[2]: *** [CMakeFiles/installcheck.dir/all] Error 2
make[1]: *** [CMakeFiles/installcheck.dir/rule] Error 2
make: *** [installcheck] Error 2
```
THE WARNING:
If I ignore this test failure and proceed with PyNEST in the venv (`import nest`), I get the
following warning:
```
--------------------------------------------------------------------------
WARNING: There is at least non-excluded one OpenFabrics device found,
but there are no active ports detected (or Open MPI was unable to use
them). This is most certainly not what you wanted. Check your
cables, subnet manager configuration, etc. The openib BTL will be
ignored for this job.
Local host: n4669
--------------------------------------------------------------------------
n4669.nemo.privat.74030PSM2 no hfi units are active (err=23)
--------------------------------------------------------------------------
Open MPI failed an OFI Libfabric library call (fi_endpoint). This is highly
unusual; your job may behave unpredictably (and/or abort) after this.
Local host: n4669
Location: mtl_ofi_component.c:627
Error: Invalid argument (22)
--------------------------------------------------------------------------
-- N E S T --
Copyright (C) 2004 The NEST Initiative
Version: 3.8.0
Built: Oct 10 2024 16:21:41
This program is provided AS IS and comes with
NO WARRANTY. See the file LICENSE for details.
Problems or suggestions?
Visit
https://www.nest-simulator.org
Type 'nest.help()' to find out more about NEST.
```
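The messages above appear to come from Open MPI while it initializes during `import nest` (the openib BTL, the PSM2 library, and the OFI MTL), not from NEST itself. Open MPI reads MCA parameters from `OMPI_MCA_<name>` environment variables, so one interactive-only experiment is to exclude the openib BTL before importing NEST and see which messages remain. This is a diagnostic sketch under that assumption, not a fix: which transports are right for NEMO's interconnect is a question for the cluster documentation. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper for interactive testing only: tell Open MPI to use all
# BTL components except openib ("^" negates the component list).
exclude_openib_btl() {
  export OMPI_MCA_btl='^openib'
}

exclude_openib_btl
# python -c 'import nest'   # then check which warnings are still printed
```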
THE CRASH:
If I ignore this warning and submit a job anyway, the execution eventually crashes
with exit code 134. The crash dump is in an MD file in this GitLab repository:
https://gitlab.rz.uni-freiburg.de/as2013/mpi-hpc-nestv3.8.git
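Exit code 134 follows the shell convention of 128 plus the signal number: 134 = 128 + 6, and signal 6 is SIGABRT. So the process aborted itself (typically via `abort()`, e.g. from a failed assertion or an uncaught C++ exception) rather than being killed externally, which would usually show up as 137 (SIGKILL) or 143 (SIGTERM). A small sketch to decode such codes (the function name is hypothetical):

```shell
#!/usr/bin/env bash
# Decode a shell exit status above 128 into the name of the terminating
# signal, using the convention status = 128 + signal number.
exit_code_to_signal() {
  local code="$1"
  if [ "$code" -gt 128 ]; then
    kill -l "$((code - 128))"
  else
    echo "not a signal: exit status $code"
  fi
}

exit_code_to_signal 134   # prints: ABRT
```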
Any idea what's going wrong here?
If I follow the same build and installation steps on my local machine (except for specifying
the Cython path in cmake, which isn't needed there), I get no warnings and none of the
simulations crash.
I'm guessing this has to do with the build and installation on the cluster.
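One way to narrow this down is to compare what CMake actually recorded on each machine. `CMakeCache.txt` in the build directory stores entries as `NAME:TYPE=VALUE`, so a sketch like the following pulls out the MPI-, Cython- and `with-*` entries so the cluster and local builds can be diffed. The function name and the chosen prefixes are assumptions, picked to match the configure command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the MPI-, Cython- and with-* entries of a
# CMakeCache.txt (lines look like NAME:TYPE=VALUE), sorted for diffing.
cache_summary() {
  grep -E '^(MPI|CYTHON|with-)' "$1" | sort
}

# Example usage (placeholder paths):
# diff <(cache_summary cluster-build/CMakeCache.txt) \
#      <(cache_summary local-build/CMakeCache.txt)
```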
I'd appreciate any input. Thanks!
Best,
Ady