Mad-MPI: An Efficient Implementation of MPI for Fast Networks - Performances


LaBRI, INRIA Bordeaux - Sud-Ouest

High Performance Runtime Systems for Parallel Architectures

Intel MPI Benchmarks

Intel MPI Benchmarks (formerly known as 'Pallas MPI Benchmarks (PMB)') is a concise, easy-to-use set of MPI benchmarks for comparing the performance of computing platforms or MPI implementations. It exercises many MPI communication patterns, automatically detects clustering, and reports intra-cluster and inter-cluster performance. The benchmarks target important MPI functions, such as:

  • Point-to-point message passing
  • Global data movement and computation routines
  • One-sided communications
  • File I/O

Intel MPI Benchmarks can be downloaded at

##### sweetie
##### rantanplan
#    Intel (R) MPI Benchmark Suite V3.0, MPI-1 part
# Date                  : Tue Apr 24 16:14:00 2007
# Machine               : x86_64
# System                : Linux
# Release               : 2.6.17-2-amd64
# Version               : #1 SMP Wed Sep 13 17:49:33 CEST 2006
# MPI Version           : 1.0
# MPI Thread Environment: MPI_THREAD_SINGLE

# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
# MPI_Datatype                   :   MPI_BYTE
# MPI_Datatype for reductions    :   MPI_FLOAT
# MPI_Op                         :   MPI_SUM

# List of Benchmarks to run:

# PingPong
# PingPing
# Sendrecv
# Exchange
# Allreduce
# Reduce
# Reduce_scatter
# Allgather
# Allgatherv
# Alltoall
# Alltoallv
# Bcast
# Barrier

# Benchmarking PingPong
# #processes = 2
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         2.87         0.00
            1         1000         2.86         0.33
            2         1000         2.86         0.67
            4         1000         2.92         1.31
            8         1000         2.93         2.60
           16         1000         2.99         5.11
           32         1000         3.77         8.09
           64         1000         3.93        15.54
          128         1000         5.37        22.73
          256         1000         5.64        43.26
          512         1000         6.31        77.33
         1024         1000         8.24       118.56
         2048         1000         9.93       196.66
         4096         1000        14.62       267.15
         8192         1000        19.69       396.73
        16384         1000        29.52       529.38
        32768         1000        44.16       707.58
        65536          640        80.51       776.28
       131072          320       133.66       935.20
       262144          160       240.73      1038.50
       524288           80       460.54      1085.68
      1048576           40       884.54      1130.53
      2097152           20      1753.16      1140.79
      4194304           10      3521.06      1136.02

# Benchmarking PingPing
# #processes = 2
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         3.25         0.00
            1         1000         3.39         0.28
            2         1000         3.26         0.58
            4         1000         3.23         1.18
            8         1000         3.19         2.39
           16         1000         3.24         4.71
           32         1000         4.17         7.32
           64         1000         4.24        14.40
          128         1000         6.15        19.84
          256         1000         6.50        37.55
          512         1000         6.90        70.72
         1024         1000         9.55       102.31
         2048         1000        12.70       153.84
         4096         1000        20.20       193.40
         8192         1000        24.75       315.70
        16384         1000        40.36       387.13
        32768         1000        62.41       500.74
        65536          640        86.39       723.49
       131072          320       142.30       878.46
       262144          160       251.60       993.66
       524288           80       473.08      1056.90
      1048576           40       906.45      1103.21
      2097152           20      1783.21      1121.57
      4194304           10      3719.95      1075.28

# Benchmarking Sendrecv
# #processes = 2
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
            0         1000         3.19         3.19         3.19         0.00
            1         1000         3.50         3.50         3.50         0.55
            2         1000         3.22         3.22         3.22         1.18
            4         1000         3.25         3.26         3.25         2.34
            8         1000         3.29         3.29         3.29         4.64
           16         1000         3.41         3.41         3.41         8.95
           32         1000         3.95         3.95         3.95        15.44
           64         1000         4.18         4.18         4.18        29.18
          128         1000         6.11         6.11         6.11        39.94
          256         1000         6.42         6.42         6.42        76.04
          512         1000         7.05         7.06         7.06       138.33
         1024         1000         9.61         9.62         9.62       203.03
         2048         1000        12.70        12.70        12.70       307.49
         4096         1000        20.23        20.24        20.24       385.96
         8192         1000        24.83        24.84        24.83       629.00
        16384         1000        40.94        40.94        40.94       763.32
        32768         1000        62.65        62.65        62.65       997.55
        65536          640        84.92        84.93        84.92      1471.83
       131072          320       138.24       138.26       138.25      1808.24
       262144          160       247.68       247.71       247.70      2018.46
       524288           80       472.36       472.42       472.39      2116.78
      1048576           40       896.94       897.10       897.02      2229.40
      2097152           20      1816.89      1817.02      1816.95      2201.41
      4194304           10      3623.87      3624.47      3624.17      2207.22

# Benchmarking Exchange
# #processes = 2
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
            0         1000         5.42         5.42         5.42         0.00
            1         1000         5.60         5.61         5.61         0.68
            2         1000         5.60         5.60         5.60         1.36
            4         1000         5.68         5.68         5.68         2.69
            8         1000         5.99         5.99         5.99         5.09
           16         1000         6.12         6.12         6.12         9.97
           32         1000         6.87         6.87         6.87        17.76
           64         1000         7.19         7.19         7.19        33.95
          128         1000        10.41        10.41        10.41        46.89
          256         1000        10.69        10.70        10.70        91.26
          512         1000        11.71        11.72        11.72       166.64
         1024         1000        17.41        17.42        17.41       224.28
         2048         1000        22.94        22.95        22.95       340.37
         4096         1000        36.38        36.40        36.39       429.31
         8192         1000        44.18        44.20        44.19       707.08
        16384         1000        74.18        74.20        74.19       842.33
        32768         1000       120.43       120.44       120.44      1037.82
        65536          640       167.05       167.06       167.06      1496.43
       131072          320       274.43       274.45       274.44      1821.82
       262144          160       492.14       492.18       492.16      2031.77
       524288           80       932.26       932.35       932.30      2145.13
      1048576           40      1785.98      1786.15      1786.07      2239.45
      2097152           20      3549.29      3549.54      3549.41      2253.81
      4194304           10      7116.37      7116.86      7116.62      2248.18

# Benchmarking Allreduce
# #processes = 2
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         6.14         6.14         6.14
            4         1000         6.27         6.27         6.27
            8         1000         6.40         6.40         6.40
           16         1000         6.49         6.49         6.49
           32         1000         7.93         7.93         7.93
           64         1000         8.26         8.26         8.26
          128         1000        11.13        11.13        11.13
          256         1000        11.81        11.81        11.81
          512         1000        13.27        13.27        13.27
         1024         1000        17.37        17.38        17.37
         2048         1000        21.07        21.08        21.08
         4096         1000        31.62        31.63        31.63
         8192         1000        45.51        45.52        45.51
        16384         1000        71.46        71.47        71.47
        32768         1000       117.28       117.30       117.29
        65536          640       223.33       223.35       223.34
       131072          320       398.66       398.69       398.68
       262144          160       927.19       927.26       927.23
       524288           80      2113.77      2113.92      2113.85
      1048576           40      4825.84      4826.19      4826.01
      2097152           20      9830.98      9831.67      9831.32
      4194304           10     19839.45     19840.84     19840.15

# Benchmarking Reduce
# #processes = 2
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         3.26         3.26         3.26
            4         1000         3.22         3.22         3.22
            8         1000         3.24         3.24         3.24
           16         1000         3.23         3.23         3.23
           32         1000         4.05         4.05         4.05
           64         1000         4.13         4.13         4.13
          128         1000         5.63         5.63         5.63
          256         1000         6.02         6.03         6.02
          512         1000         6.81         6.81         6.81
         1024         1000         8.96         8.96         8.96
         2048         1000        11.14        11.15        11.14
         4096         1000        16.92        16.93        16.92
         8192         1000        25.51        25.53        25.52
        16384         1000        42.67        42.70        42.69
        32768         1000        71.35        71.40        71.37
        65536          640       142.00       142.08       142.04
       131072          320       256.21       256.56       256.38
       262144          160       719.45       721.31       720.38
       524288           80      1739.19      1749.91      1744.55
      1048576           40      3961.00      4015.83      3988.42
      2097152           20      7935.42      8150.40      8042.91
      4194304           10     15676.97     16552.24     16114.60

# Benchmarking Reduce_scatter
# #processes = 2
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         6.33         6.33         6.33
            4         1000         6.42         6.42         6.42
            8         1000         6.51         6.51         6.51
           16         1000         6.54         6.54         6.54
           32         1000         7.26         7.26         7.26
           64         1000         8.16         8.16         8.16
          128         1000         9.86         9.86         9.86
          256         1000        11.73        11.74        11.73
          512         1000        12.91        12.91        12.91
         1024         1000        15.75        15.75        15.75
         2048         1000        20.09        20.09        20.09
         4096         1000        28.51        28.52        28.51
         8192         1000        46.88        46.89        46.88
        16384         1000        77.12        77.12        77.12
        32768         1000       129.00       129.00       129.00
        65536          640       265.95       265.95       265.95
       131072          320       529.43       529.52       529.48
       262144          160      1256.84      1257.31      1257.07
       524288           80      2702.63      2705.38      2704.01
      1048576           40      5651.73      5664.87      5658.30
      2097152           20     11320.39     11372.71     11346.55
      4194304           10     22465.48     22661.20     22563.34

# Benchmarking Allgather
# #processes = 2
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         6.52         6.52         6.52
            1         1000         6.78         6.78         6.78
            2         1000         6.74         6.74         6.74
            4         1000         6.63         6.63         6.63
            8         1000         6.78         6.78         6.78
           16         1000         6.94         6.94         6.94
           32         1000         8.24         8.24         8.24
           64         1000         8.51         8.51         8.51
          128         1000        11.35        11.35        11.35
          256         1000        11.85        11.85        11.85
          512         1000        13.20        13.20        13.20
         1024         1000        17.23        17.23        17.23
         2048         1000        20.42        20.43        20.43
         4096         1000        30.21        30.22        30.21
         8192         1000        40.58        40.59        40.59
        16384         1000        63.64        63.66        63.65
        32768         1000       100.10       100.12       100.11
        65536          640       181.76       181.77       181.77
       131072          320       305.95       305.97       305.96
       262144          160       555.82       555.85       555.83
       524288           80      1237.10      1237.19      1237.15
      1048576           40      2799.83      2800.03      2799.93
      2097152           20      5532.60      5533.03      5532.82
      4194304           10     10964.68     10965.52     10965.10

# Benchmarking Allgatherv
# #processes = 2
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         6.39         6.39         6.39
            1         1000         6.51         6.51         6.51
            2         1000         6.52         6.52         6.52
            4         1000         6.52         6.52         6.52
            8         1000         6.63         6.63         6.63
           16         1000         7.34         7.34         7.34
           32         1000         8.25         8.25         8.25
           64         1000         9.78         9.79         9.79
          128         1000        11.66        11.66        11.66
          256         1000        12.79        12.79        12.79
          512         1000        15.15        15.15        15.15
         1024         1000        18.76        18.76        18.76
         2048         1000        25.09        25.10        25.10
         4096         1000        34.93        34.94        34.94
         8192         1000        50.28        50.29        50.29
        16384         1000        81.21        81.23        81.22
        32768         1000       135.99       136.00       135.99
        65536          640       234.22       234.24       234.23
       131072          320       412.98       413.01       412.99
       262144          160       772.88       772.93       772.91
       524288           80      1668.40      1668.52      1668.46
      1048576           40      3654.93      3655.20      3655.06
      2097152           20      7239.78      7240.32      7240.05
      4194304           10     14467.96     14469.02     14468.49

# Benchmarking Alltoall
# #processes = 2
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         3.65         3.66         3.66
            1         1000         3.73         3.73         3.73
            2         1000         3.90         3.91         3.90
            4         1000         3.91         3.91         3.91
            8         1000         3.62         3.62         3.62
           16         1000         4.00         4.00         4.00
           32         1000         4.68         4.69         4.69
           64         1000         4.75         4.75         4.75
          128         1000         6.27         6.27         6.27
          256         1000         6.52         6.52         6.52
          512         1000         7.22         7.23         7.22
         1024         1000        10.02        10.02        10.02
         2048         1000        13.18        13.19        13.19
         4096         1000        21.32        21.33        21.32
         8192         1000        26.64        26.65        26.64
        16384         1000        47.82        47.83        47.82
        32768         1000        71.15        71.16        71.16
        65536          640       105.91       105.92       105.92
       131072          320       179.49       179.51       179.50
       262144          160       327.98       328.02       328.00
       524288           80       800.14       800.25       800.20
      1048576           40      1952.86      1953.08      1952.97
      2097152           20      3889.12      3889.64      3889.38
      4194304           10      7741.64      7742.57      7742.10

# Benchmarking Alltoallv
# #processes = 2
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         3.64         3.64         3.64
            1         1000         3.78         3.78         3.78
            2         1000         3.64         3.64         3.64
            4         1000         3.59         3.59         3.59
            8         1000         4.07         4.07         4.07
           16         1000         3.66         3.66         3.66
           32         1000         4.56         4.56         4.56
           64         1000         4.82         4.83         4.82
          128         1000         6.25         6.25         6.25
          256         1000         6.72         6.73         6.73
          512         1000         7.41         7.41         7.41
         1024         1000         9.85         9.85         9.85
         2048         1000        13.05        13.05        13.05
         4096         1000        21.07        21.08        21.07
         8192         1000        27.10        27.11        27.11
        16384         1000        47.79        47.80        47.79
        32768         1000        71.22        71.22        71.22
        65536          640       103.44       103.45       103.44
       131072          320       175.67       175.69       175.68
       262144          160       323.91       323.96       323.94
       524288           80       805.34       805.39       805.36
      1048576           40      1984.66      1984.78      1984.72
      2097152           20      3948.71      3948.93      3948.82
      4194304           10      7956.05      7956.52      7956.28

# Benchmarking Bcast
# #processes = 2
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         3.07         3.07         3.07
            1         1000         3.17         3.17         3.17
            2         1000         3.22         3.22         3.22
            4         1000         3.15         3.16         3.15
            8         1000         3.22         3.23         3.23
           16         1000         3.24         3.25         3.25
           32         1000         4.00         4.00         4.00
           64         1000         4.10         4.10         4.10
          128         1000         5.63         5.64         5.63
          256         1000         5.90         5.90         5.90
          512         1000         6.44         6.45         6.45
         1024         1000         8.43         8.44         8.43
         2048         1000        10.03        10.05        10.04
         4096         1000        14.68        14.69        14.68
         8192         1000        20.02        20.04        20.03
        16384         1000        30.12        30.15        30.14
        32768         1000        45.36        45.39        45.38
        65536          640        82.28        82.29        82.28
       131072          320       135.47       135.49       135.48
       262144          160       245.56       245.60       245.58
       524288           80       468.75       468.84       468.79
      1048576           40       890.68       890.85       890.77
      2097152           20      1775.95      1776.29      1776.12
      4194304           10      3610.93      3611.58      3611.25

# Benchmarking Barrier
# #processes = 2
 #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
         1000        11.54        11.54        11.54
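
The Mbytes/sec columns in the PingPong, PingPing, Sendrecv and Exchange tables above can be cross-checked against the #bytes and time columns. The following Python sketch is not part of the benchmark suite. It assumes that one "Mbyte" is 2^20 bytes, that Sendrecv and Exchange count 2 and 4 messages per sample respectively, and that the t_max column is the one used for those two benchmarks. These assumptions are inferred from the numbers in the tables, not taken from the benchmark documentation. The function name and the dictionary are hypothetical helpers introduced for illustration.

```python
# Sketch: recompute the Mbytes/sec column from #bytes and t[usec].
# Assumption (inferred from the tables above): 1 Mbyte = 2**20 bytes.
MIB = 1 << 20

# Assumption (inferred from the tables above): number of messages
# counted per sample for each benchmark that reports a throughput.
MESSAGES_PER_SAMPLE = {
    "PingPong": 1,
    "PingPing": 1,
    "Sendrecv": 2,
    "Exchange": 4,
}


def throughput_mib_per_s(benchmark, nbytes, t_usec):
    """Return the throughput in Mbytes/sec for one table row.

    benchmark -- one of the keys of MESSAGES_PER_SAMPLE
    nbytes    -- value of the #bytes column
    t_usec    -- value of t[usec] (t_max[usec] for Sendrecv/Exchange)
    """
    if t_usec <= 0:
        raise ValueError("time must be positive")
    factor = MESSAGES_PER_SAMPLE[benchmark]
    seconds = t_usec * 1e-6
    return factor * nbytes / seconds / MIB


# Rows copied from the 4194304-byte lines of the tables above:
# (benchmark, #bytes, time in usec, reported Mbytes/sec)
SAMPLES = [
    ("PingPong", 4194304, 3521.06, 1136.02),
    ("PingPing", 4194304, 3719.95, 1075.28),
    ("Sendrecv", 4194304, 3624.47, 2207.22),
    ("Exchange", 4194304, 7116.86, 2248.18),
]

if __name__ == "__main__":
    for name, nbytes, t_usec, reported in SAMPLES:
        computed = throughput_mib_per_s(name, nbytes, t_usec)
        print(f"{name:9s} computed {computed:8.2f}  reported {reported:8.2f}")
```

Because the times in the log are rounded to two decimals, the recomputed values can differ from the reported ones in the last digit.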