Hi Stefan,
I'm also seeing an unexplained performance issue with OpenMP on our cluster and
have yet to find a solution, but perhaps the behavior I've observed will give you
some ideas.
I run my NEST scripts on a single node (so no MPI), but have set the number of threads
using `local_num_threads` in my script and also via `#SBATCH --ntasks=28`. Additionally,
I've set the `MKL_THREADING_LAYER` environment variable per [GitHub issue
#2573](https://github.com/nest/nest-simulator/issues/2573).
However, when monitoring the node via `htop`, I only sometimes see close to 28 cores
in use. The number of busy cores seems to vary randomly from one SBATCH run to the next
(sometimes 10 cores, sometimes 6), even on an otherwise idle node of my cluster, so the
issue seems to be independent of how heavily subscribed a node is.
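
A quick check that might help narrow this down is to print, from inside the job, how many CPUs the process is actually allowed to use next to the relevant environment variables. The sketch below is only a diagnostic and not part of NEST: the helper name `threading_report` is made up, and it assumes a Linux node where `os.sched_getaffinity` is available. If `allowed_cpus` comes out below 28 on some runs, the varying utilization may come from the Slurm allocation rather than from OpenMP itself. In Slurm, `--ntasks` requests tasks (processes) while `--cpus-per-task` requests CPUs for one task, so that may be worth ruling out too.

```python
import os

# Hypothetical helper (not a NEST or Slurm API): collects the CPU and
# threading settings visible to the current process.
def threading_report(env=None):
    env = os.environ if env is None else env
    report = {
        # CPUs this process may run on; sched_getaffinity is Linux-only.
        "allowed_cpus": (
            len(os.sched_getaffinity(0))
            if hasattr(os, "sched_getaffinity")
            else None
        ),
        # CPUs physically present on the node, regardless of the allocation.
        "total_cpus": os.cpu_count(),
    }
    # Standard OpenMP/MKL settings and Slurm job variables; unset ones show as None.
    for name in (
        "OMP_NUM_THREADS",
        "OMP_PROC_BIND",
        "MKL_THREADING_LAYER",
        "SLURM_NTASKS",
        "SLURM_CPUS_PER_TASK",
        "SLURM_CPUS_ON_NODE",
        "SLURM_JOB_NUM_NODES",
    ):
        report[name] = env.get(name)
    return report


if __name__ == "__main__":
    for key, value in threading_report().items():
        print(f"{key}: {value}")
```

Running this at the top of the simulation script on a few consecutive submissions and comparing `allowed_cpus` with what `htop` shows should indicate whether the allocation differs between runs.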
The GH issue #2573 also points to [GH issue
#2401](https://github.com/nest/nest-simulator/pull/2401) and [this documentation on
threading](https://nest-simulator.readthedocs.io/en/stable/hpc/threading.html#table-of-openmp-settings).