docs/src/usage.md
$ mpiexecjl --project=/path/to/project -n 20 julia script.jl
```

## GPU-aware MPI support

If your MPI implementation has been compiled with CUDA or ROCm support, then `CUDA.CuArray`s (from
[CUDA.jl](https://github.com/JuliaGPU/CUDA.jl)) or `AMDGPU.ROCArray`s (from [AMDGPU.jl](https://github.com/JuliaGPU/AMDGPU.jl)) can be passed directly as
send and receive buffers for point-to-point and collective operations (they may also work with one-sided operations, but these are not often supported). In most cases, GPU-aware MPI requires using a [system-provided MPI installation](@ref using_system_mpi).
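As an illustration, the following is a minimal sketch of a collective operation on GPU buffers; it assumes a CUDA-aware MPI build and a working [CUDA.jl](https://github.com/JuliaGPU/CUDA.jl) setup (the ROCm case is analogous with `AMDGPU.ROCArray`):

```julia
# Minimal sketch: all-reduce directly on GPU buffers (requires CUDA-aware MPI).
using MPI
using CUDA

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)

send = CUDA.fill(Float64(rank), 4)   # device array used directly as send buffer
recv = CUDA.zeros(Float64, 4)        # device array used directly as receive buffer
MPI.Allreduce!(send, recv, +, comm)  # sums the buffers across all ranks

@show rank Array(recv)               # copy back to host only for printing
```

Launched with several ranks (e.g. `mpiexecjl -n 2 julia script.jl`), the device buffers are handed to MPI without an intermediate host copy.
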

!!! note "Preloads"
    On Cray machines, you may need to ensure that the following preloads are set in the preferences:
    ```
    preloads = ["libmpi_gtl_hsa.so"]
    preloads_env_switch = "MPICH_GPU_SUPPORT_ENABLED"
    ```

### CUDA

Successfully running the [alltoall\_test\_cuda.jl](../examples/alltoall_test_cuda.jl)
script should confirm that your MPI implementation has CUDA support enabled. Moreover, successfully running
the [alltoall\_test\_cuda\_multigpu.jl](../examples/alltoall_test_cuda_multigpu.jl) script should confirm that
your CUDA-aware MPI implementation can use multiple Nvidia GPUs (one GPU per rank).

If using OpenMPI, the status of CUDA support can be checked via the
[`MPI.has_cuda()`](@ref) function.
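
For example, a quick check could look like the following sketch (it assumes MPI.jl has already been configured against the GPU-aware system MPI):

```julia
using MPI
MPI.Init()
println("CUDA-aware MPI: ", MPI.has_cuda())  # true if the underlying library reports CUDA support
```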

### ROCm

Successfully running the [alltoall\_test\_rocm.jl](../examples/alltoall_test_rocm.jl)
script should confirm that your MPI implementation has ROCm (AMDGPU) support enabled. Moreover, successfully running
the [alltoall\_test\_rocm\_multigpu.jl](../examples/alltoall_test_rocm_multigpu.jl) script should confirm that
your ROCm-aware MPI implementation can use multiple AMD GPUs (one GPU per rank).

If using OpenMPI, the status of ROCm support can be checked via the
[`MPI.has_rocm()`](@ref) function.
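
As a sketch of point-to-point communication on ROCm buffers (assuming a ROCm-aware MPI build, a working [AMDGPU.jl](https://github.com/JuliaGPU/AMDGPU.jl) setup, and at least two ranks):

```julia
# Minimal sketch: send/receive directly between AMD GPU buffers (requires ROCm-aware MPI).
using MPI
using AMDGPU

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)
@assert MPI.Comm_size(comm) >= 2

buf = rank == 0 ? AMDGPU.ones(Float64, 4) : AMDGPU.zeros(Float64, 4)
if rank == 0
    MPI.Send(buf, comm; dest=1, tag=0)     # device buffer passed directly to MPI
elseif rank == 1
    MPI.Recv!(buf, comm; source=0, tag=0)  # received straight into device memory
end
@show rank Array(buf)
```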

### Multiple GPUs per node

In a configuration with multiple GPUs per node, mapping the GPU ID to the node-local MPI rank can be achieved either (1) on the application side, using a node-local communicator (`MPI.COMM_TYPE_SHARED`), or (2) on the system side, by setting device visibility accordingly.
For (1), the node-local rank `rank_loc` can be used to select the GPU device, as in the sketch below.
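
The following is a minimal sketch of approach (1), assuming CUDA devices and one GPU per rank (the variable names are illustrative):

```julia
using MPI
using CUDA

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)

# Split COMM_WORLD into per-node (shared-memory) communicators
comm_loc = MPI.Comm_split_type(comm, MPI.COMM_TYPE_SHARED, rank)
rank_loc = MPI.Comm_rank(comm_loc)

# Bind this rank to one GPU on the node (device indices are 0-based)
CUDA.device!(rank_loc)
@show rank rank_loc CUDA.device()
```

For (2), device visibility is typically restricted per rank by the launcher or the system, e.g. via environment variables such as `CUDA_VISIBLE_DEVICES` or `ROCR_VISIBLE_DEVICES`.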