This document describes nmad installation and configuration.
For any question, send mail to: Alexandre.Denis@inria.fr.
The following development tools are required to compile NewMadeleine:
- gcc (version 4.0 and higher) or compatible (icc, clang).
- make (version 3.81 and higher).
- expat (set $EXPAT_ROOT if it cannot be found by pkg-config)
- InfiniBand ibverbs (set $IBHOME if not installed in /usr) (optional)
- PSM2 (set PSM2_DIR if not installed in system directories) (optional)
- Myrinet MX (set $MX_DIR if not installed in /usr) (optional)
- PMIx (set $PMIX_ROOT if not installed in /usr)
NewMadeleine may be downloaded either as a tarball from https://pm2.gitlabpages.inria.fr/releases/ or from the git master at https://gitlab.inria.fr/pm2/pm2.
Installation of NewMadeleine is a standard sequence of ./configure ; make ; make install. However, to build and install all modules required by NewMadeleine, we propose a script that builds the modules in the right order and with the right parameters.
Automated build (recommended):
To build all modules required by nmad, we recommend using the build script located in pm2/scripts/pm2-build-packages with a given or a custom configuration file, e.g.:
% cd pm2/scripts
% ./pm2-build-packages ./madmpi.conf --prefix=$HOME/soft/x86_64
For a standard multi-threaded build, it is advised to use madmpi.conf. For a non-threaded build (no progression!), the configuration from madmpi-mini.conf will lead to a slightly more efficient library.
Manual build (not recommended, advanced users only).
Module nmad requires other pm2 modules: Puk, PadicoTM, PukABI (optional), pioman (optional).
For each module:
./autogen.sh
mkdir build ; cd build
../configure [your options here]
make
make install
Note: nmad purposely cannot be configured in its source directory. Please use a separate build directory.
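The per-module sequence above can be wrapped in a small shell function. This is only a sketch, not part of pm2: the names run and build_module, the DRY_RUN variable and the path path/to/module are hypothetical. It replays the listed commands in a separate build directory, and uses mkdir -p (instead of mkdir) so that it can be re-run.

```shell
#!/bin/sh
# Hypothetical helper (not part of pm2): replay the manual build steps
# for one module. With DRY_RUN=1, commands are printed, not executed.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: build_module <module source dir> [configure options...]
# The body runs in a subshell, so the caller's working directory is kept.
build_module() (
  src=$1
  shift
  run cd "$src" &&
  run ./autogen.sh &&
  run mkdir -p build &&
  run cd build &&
  run ../configure "$@" &&
  run make &&
  run make install
)
```

With DRY_RUN=1, a call such as build_module path/to/module --enable-mpi only prints the commands, which is a convenient way to check the options before a real build.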
Useful configure flags (see ./configure --help):
--enable-sampling   Enable network sampling
--enable-mpi        Enable builtin MPI implementation MadMPI
--with-pioman       use pioman I/O manager [default=no]
--with-ibverbs      use Infiniband ibverbs [default=check]
--with-mx           use Myrinet MX [default=check]
--with-psm          use Intel Performance Scaled Messaging (PSM) [default=check]
--with-psm2         use Intel Performance Scaled Messaging 2 (PSM2) [default=check]
For an MPI application using MadMPI, use the standard mpicc, mpif77 and mpif90 compiler frontends to build and link.
To build an application using native NewMadeleine interface, get the required flags through pkg-config. For CFLAGS:
% pkg-config --cflags nmad
For libraries:
% pkg-config --libs nmad
In a Makefile, you will typically need:
CFLAGS += $(shell pkg-config --cflags nmad)
LIBS += $(shell pkg-config --libs nmad)
For MadMPI, use the standard mpirun as launcher. Please see mpirun --help for up-to-date documentation. Please note that MadMPI mpirun is a frontend to padico-launch, so it accepts all options described below.
For native NewMadeleine applications, it is recommended to use padico-launch as a launcher for nmad. It accepts parameters similar to mpirun. Please see padico-launch --help for up-to-date documentation. For example:
% padico-launch -n 2 -nodelist jack0,jack1 nm_bench_sendrecv
starts program 'nm_bench_sendrecv' on hosts jack0 and jack1, using auto-detected network.
Environment variables may be set using -D parameters, e.g.:
% padico-launch -c -p -n 2 -nodelist jack0,jack1 -DNMAD_DRIVER=ib nm_bench_sendrecv
starts program 'nm_bench_sendrecv' on hosts jack0 and jack1, over Infiniband, using one console per process.
On clusters using slurm, mpirun and padico-launch will start processes using srun. It is possible for the user to directly use srun without using mpirun at all. Make sure to enable pmi2 or pmix, e.g.:
% srun -N 2 --exclusive --mpi=pmi2 nm_bench_sendrecv
To launch each process in a gdb debugger, use argument -d in association with -c to get one console per node, i.e.:
% padico-launch -c -d -n 2 nm_sr_hello
To launch each process in valgrind memcheck tool, use argument --padico-valgrind, in association with -c, i.e.:
% padico-launch -c --padico-valgrind -n 2 nm_sr_hello
NewMadeleine comes with a tool to detect if user data is modified while a non-blocking send is manipulating the data, which leads to data corruption. To detect such a bug in application code, please set the environment variable NMAD_ISEND_CHECK to a non-null value, i.e.:
% padico-launch -n 2 -DNMAD_ISEND_CHECK=1 nm_sr_hello
It checks whether user buffer has been modified between nm_sr_isend and nm_sr_swait, or between MPI_Isend and MPI_Test or MPI_Wait. This feature is only available when nmad is built in debug mode.
By default, NewMadeleine is quiet and outputs only warnings and fatal errors. To display info about the init (network detection, addresses, drivers used), it is advised to use verbose mode with the '-v' parameter:
% padico-launch -v -n 2 nm_sr_hello
Verbose mode is the default when NewMadeleine is built in debug mode. It is possible to switch to quiet mode with parameter '-q'. A custom trace policy may be given with --trace (syntax not documented yet).
To help debug code on a large number of nodes, standard output and stderr may be captured and sent to disk, with one file per node using the --log parameter, e.g.:
% padico-launch --log=${HOME}/log-$$ -n 2 nm_sr_hello
will send output to files in ${HOME}/log-$$. Directory is created if needed. File names contain: the username, the session uuid, the node rank, the hostname, and the node uuid, to avoid collisions and to allow easy browsing.
To help debug deadlocks in communications, NewMadeleine is able to detect stalled packets using the environment variable NMAD_PWSEND_TIMEOUT, i.e.:
% padico-launch -n 2 -DNMAD_PWSEND_TIMEOUT=1 nm_sr_hello
It checks whether a packet wrapper takes more than 30 seconds to be sent on any track, or to be received on the large track only. Since it relies on timers from profiling, it requires NewMadeleine to be built with profiling (--enable-profile at configure).
In addition, when built with PadicoTM (the default), this flag enables a watchdog to check how often the optimizing strategy is called.
To help diagnose OOM errors, a memory monitor is available to display the allocated memory. It may be enabled by using the MemMonitor PadicoTM module. It is loaded by using the -iload-MemMonitor init flag, e.g.:
% padico-launch -n 2 -iload-MemMonitor -DPADICO_MEM_MONITOR_PERIOD=5 nm_sr_hello
By default, it periodically displays the memory usage of the whole process, as given by getrusage. The period may be tuned through the optional environment variable PADICO_MEM_MONITOR_PERIOD (in seconds); the default is 3 seconds.
In addition, when NewMadeleine profiling is enabled, it displays the amount of memory allocated directly by Puk+PadicoTM+NewMadeleine, to distinguish its memory usage from the application (total amount of memory in bytes, number of mallocs, number of frees).
NewMadeleine may generate a trace of its internal state in the PAJE format. To do so, you may use the --enable-trace configuration flag. This option requires the external library GTG.
When compiled with traces, NewMadeleine will automatically generate a PAJE trace file in the current directory at the end of the execution. A single file is generated for all nodes.
The content of traces may be controlled by the NMAD_TRACE environment variable. It must contain a comma-separated list of the following items:
- core      trace state of nmad core (beware: huge traces)
- driver    trace state of packet-wrappers
- pack      trace state of pack/unpack requests
- link      generate arrows for messages
- all       all of the above
- none      no trace
- ^core     remove core state
- ^driver   remove pw state
- ^pack     remove request state
- ^link     remove arrows
Operands are evaluated in order when adding/removing filters. The default when the variable is not set by user is NMAD_TRACE=all,^core.
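To make the in-order evaluation concrete, here is a minimal shell sketch. It is not nmad code: the function name eval_trace_filter is hypothetical and the way nmad actually parses the variable may differ. It only applies the items from left to right and prints what ends up enabled.

```shell
#!/bin/sh
# Illustrative only (not nmad code): evaluate an NMAD_TRACE-style list
# from left to right and print the items that end up enabled.
eval_trace_filter() (
  spec=${1:-"all,^core"}   # documented default when the variable is unset
  core=0; driver=0; pack=0; link=0
  IFS=,
  set -f                   # no filename expansion while splitting
  for item in $spec; do
    case $item in
      all)       core=1; driver=1; pack=1; link=1 ;;
      none)      core=0; driver=0; pack=0; link=0 ;;
      core)      core=1 ;;
      driver)    driver=1 ;;
      pack)      pack=1 ;;
      link)      link=1 ;;
      '^core')   core=0 ;;
      '^driver') driver=0 ;;
      '^pack')   pack=0 ;;
      '^link')   link=0 ;;
      *)         echo "ignoring unknown item: $item" >&2 ;;
    esac
  done
  out=""
  if [ "$core" = 1 ]; then out="$out core"; fi
  if [ "$driver" = 1 ]; then out="$out driver"; fi
  if [ "$pack" = 1 ]; then out="$out pack"; fi
  if [ "$link" = 1 ]; then out="$out link"; fi
  echo "${out# }"
)
```

Because operands are applied in order, all,^core leaves driver, pack and link enabled, whereas ^core,all re-enables core.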
To enable profiling counters in NewMadeleine, please pass --enable-profile to configure.
Then it is possible to control which counters are displayed using the PUK_PROFILE environment variable, which gives a filter to be matched against the profiling variable name. By default, nothing is displayed. Use PUK_PROFILE='*' to display all variables, PUK_PROFILE=nm_drv.* to only display variables from nmad drivers, etc.
Set PUK_DISPLAY_PROFILE=yes to display the description of all profiling variables.
Memory profiling is available only if Puk, in addition to nmad, was configured with the --enable-profile option.
NewMadeleine is tuned through parameters that can be set through environment variables or programmatically (see Puk-opt.h). For convenience, environment variables may be set on the command line using the following syntax:
% padico-launch -DVAR=value
to set a value to environment variable VAR.
Parameters are typed (string, int, unsigned, bool). Valid values for boolean variables are: 0/1, y/n, yes/no, true/false, on/off, enabled/disabled.
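As an illustration of the accepted boolean spellings, the sketch below normalizes a value to 0 or 1. The function name parse_bool is hypothetical and this is not the Puk parser; in particular, it assumes values are written in lower case exactly as listed above.

```shell
#!/bin/sh
# Illustrative only (not the Puk parser): map the documented boolean
# spellings to 1 or 0. Assumes lower-case values, as listed in the text.
parse_bool() {
  case $1 in
    1|y|yes|true|on|enabled)   echo 1 ;;
    0|n|no|false|off|disabled) echo 0 ;;
    *) echo "invalid boolean: $1" >&2; return 1 ;;
  esac
}
```

For instance, parse_bool on prints 1 and parse_bool disabled prints 0; any other spelling is rejected with a non-zero status.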
To display the list of all parameters and their values, give the parameter -DPUK_DISPLAY_ENV=yes.
The strategy used by nmad is selected using the following rules:
- if NMAD_STRATEGY is set, it is used whatever the other configuration parameters are.
Valid strategies are: default, aggreg, aggreg_autoextended, split_balance, prio.
The following are deprecated/unmaintained: split_all, qos.
The default choice should fit most cases.
The drivers used by nmad are selected using the following rules:
The following driver names are recognized:
  - ibverbs for default InfiniBand drivers
  - ibrcache, iblr2, ibsrq or ibbuf to force the InfiniBand protocol
  - tcp for TCP sockets
  - psm for Infinipath
  - psm2 for Omni-Path
  - bxi or portals4 for Portals4 network (tested only with Atos BXI)
  - ucx for UCX library
  - ofi for libfabric
  - shm for shared memory on the same node
  - local for Unix domain sockets (basic driver, for debug)
  - self for intra-process loopback; it is always added by default by nmad and does not need to be given by the enduser.
  (mx, sisci, cci, dcfa are deprecated).
- if nmad is launched with mpirun, srun, or padico-launch, then PadicoTM default NetSelector rules apply:
  opensm.
- if nmad is launched through the cmdline launcher, then a "-R <string>" parameter is taken as a railstring, with the same syntax as NMAD_DRIVER. Please note that the cmdline launcher is only for debug purposes and manages only 2 nodes.
Strategy 'prio' limits the total number of simultaneous outgoing packets. This number may be tuned using environment variable NM_PRIO_MAX_PW. The default value is 2.
NMAD_DISPLAY_DRIVERS=1 displays strategy and drivers used by each process.
For most users, auto-detection should do the right thing and endusers are not expected to manually select a driver.
By default, intra-node shared memory communication uses a pipelined copy. The more efficient "Cross Memory Attachment" (CMA) method may be used on systems that support it using environment variable NMAD_SHM_CMA=1.
NUIOA (Non-Uniform I/O Access) may be taken into account by NewMadeleine. To do so, use NMAD_NUIOA_ENABLE=1 to automatically bind threads to the NUMA node where the network board is attached.
This is disabled by default.
Infiniband may be tuned at run time through environment variables:
To use a specific IB device or port, they may be specified in the driver string through driver attributes. The supported attributes for IB drivers are ibv_device and ibv_port, e.g.:
% padico-launch -n 2 -DNMAD_DRIVER=ibverbs:ibv_device=mlx5_0:ibv_port=1 nm_sr_hello
Either ibv_device, ibv_port, or both, may be given.
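Since both attributes are optional, the driver string can be assembled piecewise. The helper below is a hypothetical convenience, not something shipped with nmad; it only reproduces the name:attribute=value syntax shown in the example above, with the device attribute placed before the port as in that example.

```shell
#!/bin/sh
# Hypothetical helper: build an NMAD_DRIVER value for the ibverbs driver.
# Usage: ib_driver_string [device] [port]; empty arguments are skipped.
ib_driver_string() (
  s=ibverbs
  if [ -n "${1:-}" ]; then s="$s:ibv_device=$1"; fi
  if [ -n "${2:-}" ]; then s="$s:ibv_port=$2"; fi
  echo "$s"
)
```

It may then be used on the command line, e.g. padico-launch -n 2 -DNMAD_DRIVER=$(ib_driver_string mlx5_0 1) nm_sr_hello.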
When opensm is used as subnet manager, the subnet GID must be customized with a value unique to the given subnet, so that nmad is able to automatically detect IB connectivity. As root:
% opensm -o -c /var/cache/opensm/opensm.opts
% /etc/init.d/infiniband restart
By default, nmad sets HFI_NO_CPUAFFINITY=1 if no value was set by the user, to ensure that PSM2 does not interfere with thread binding as set by mpirun. To disable this feature, the user can set HFI_NO_CPUAFFINITY=0 explicitly.
When multiple Omni-Path ports are present, nmad uses by default psm2 automatic port selection. A specific port may be selected by using the port attribute, e.g.:
% padico-launch -n 2 -DNMAD_DRIVER=psm2:port=2 nm_sr_hello
to select the second port. Ports are numbered from 1. port=0 enables the automatic port selection.
The appropriate launcher to use is usually selected automatically. For testing and debugging, it may be forced using environment variable NM_LAUNCHER. Valid values are:
The default is 'pmix' if PMIx is detected in the job, 'pmi2' if a slurm job is detected with pmi2 enabled, 'madico' if we detect job was launched with 'padico-launch' and neither PMI2 nor PMIx are available, 'single' if nothing else is available. 'cmdline' is never selected by default and should be used only for debug.
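The selection order described above can be summarized as a small decision function. This is a sketch under assumptions: the function name select_launcher and its three yes/no arguments are hypothetical stand-ins for nmad's own detection, which is not shown here, and the precedence of 'pmix' over 'pmi2' is read from the order of the sentence above.

```shell
#!/bin/sh
# Illustrative decision order only (not nmad code).
# Usage: select_launcher <PMIx detected> <slurm job with pmi2> <started by padico-launch>
# Each argument is "yes" or "no"; NM_LAUNCHER, when set, forces the result.
select_launcher() {
  if [ -n "${NM_LAUNCHER:-}" ]; then
    echo "$NM_LAUNCHER"
  elif [ "$1" = yes ]; then
    echo pmix
  elif [ "$2" = yes ]; then
    echo pmi2
  elif [ "$3" = yes ]; then
    echo madico
  else
    echo single
  fi
}
```

Note that 'cmdline' can only be obtained through NM_LAUNCHER, which matches the statement that it is never selected by default.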
In case PMIx is not installed in system directories, a full path may be given to configure with --with-pmix=/full/path, or it may be set globally through the PMIX_ROOT environment variable.
The default routing tree for the multicast interface is a binomial one. You can change it with the environment variable NMAD_MCAST_TREE set to binary, 3ary, 4ary, 8ary, binomial, 3nomial, 4nomial, 8nomial, flat, chain, bitree, ladder or simply default. The default choice will use 4nomial for messages < 32kB then binomial for larger messages.
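The size-dependent default can be written as a one-line rule. The sketch below is illustrative and not nmad code: the function name mcast_tree_for is hypothetical, and it assumes that "32kB" means 32 * 1024 bytes, which the text does not state.

```shell
#!/bin/sh
# Illustrative only (not nmad code): which tree is used for a message of
# a given size in bytes. Assumes 32kB = 32 * 1024 bytes.
mcast_tree_for() (
  choice=${NMAD_MCAST_TREE:-default}
  if [ "$choice" != default ]; then
    echo "$choice"          # an explicit choice applies to all sizes
  elif [ "$1" -lt 32768 ]; then
    echo 4nomial            # small messages
  else
    echo binomial           # larger messages
  fi
)
```

For example, with NMAD_MCAST_TREE unset, mcast_tree_for 1024 prints 4nomial and mcast_tree_for 1048576 prints binomial.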
When the bitree routing tree is selected, the multicast tree will be split in two trees. You have to provide the characteristics of the bitrees with environment variables:
bitree).
If you are not using bitrees, you can use the delegate option, which has to be enabled with the environment variable NMAD_MCAST_DELEGATE=1. With this option, the root node of a multicast sends data to the first recipient node and lets this recipient manage the rest of the multicast (perform a binomial tree if this kind of tree is selected, etc).
By default, broadcasting trees are reordered to take into account message priorities. You can disable this reordering with the environment variable NMAD_MCAST_REORDER_TREE=0.
Simulation may be performed by compiling NewMadeleine with support for simgrid. To do so, the requirements are:
--with-simgrid.
Compilation of user code is done as usual, with mpicc for MPI code or by using pkg-config for native NewMadeleine code. Note that building with -fPIC and linking with -shared will be forced, so as to generate a dynamically loadable object instead of a plain binary. This should be transparent for configure/makefiles as long as they do not try to start the binary (which is actually a dynamic object).
Then launching must be done with nm_simgrid_run instead of padico-launch or mpirun. See nm_simgrid_run -h for help on accepted parameters.
NewMadeleine will automatically perform global symbols privatization, and supports dynamic linking. Dynamic libraries that need to be privatized must be declared with -lib to nm_simgrid_run. To do so, libraries are automatically duplicated, thus enough disk space must be available in <prefix>/var/tmp/. To start a large number of simulated nodes (several hundreds), it may be necessary to increase /proc/sys/vm/max_map_count.
To generate doxygen documentation:
% cd $prefix/build/nmad
% make docs
It is available online at https://pm2.gitlabpages.inria.fr/pm2/nmad/doc/.