$ mpiexecjl --project=/path/to/project -n 20 julia script.jl
```

!!! note "Juliaup"
    If you use `juliaup` as a manager for your `julia` installation and you want to use a non-default channel with `mpiexecjl`, you need to set the environment variable `JULIAUP_CHANNEL` instead of using the `+` syntax:

    ```shell
    JULIAUP_CHANNEL=1.12 mpiexecjl --project=/path/to/project -n 20 julia script.jl
    ```

## GPU-aware MPI support

If your MPI implementation has been compiled with CUDA or ROCm support, then `CUDA.CuArray`s (from
[CUDA.jl](https://github.com/JuliaGPU/CUDA.jl)) or `AMDGPU.ROCArray`s (from [AMDGPU.jl](https://github.com/JuliaGPU/AMDGPU.jl)) can be passed directly as
send and receive buffers for point-to-point and collective operations (they may also work with one-sided operations, but these are not often supported). In most cases, GPU-aware MPI requires a [system-provided MPI installation](@ref using_system_mpi).

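As an illustration, a device buffer can be passed directly to a collective operation. This is a minimal sketch, assuming a CUDA-aware MPI build and CUDA.jl; it would be launched with `mpiexecjl` as shown above:

```julia
# Sketch: pass CuArrays directly as MPI buffers (requires CUDA-aware MPI).
using MPI, CUDA

MPI.Init()
comm = MPI.COMM_WORLD

send = CUDA.ones(Float64, 4)   # device buffer, no host staging needed
recv = CUDA.zeros(Float64, 4)

# Element-wise sum of `send` across all ranks, result in `recv` on each rank
MPI.Allreduce!(send, recv, +, comm)
```

With a non-GPU-aware MPI, the same call would typically crash or hang, which is why the test scripts below are useful for verification.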
!!! note "Preloads"
    On Cray machines, you may need to set the following preloads in the preferences:

    ```
    preloads = ["libmpi_gtl_hsa.so"]
    preloads_env_switch = "MPICH_GPU_SUPPORT_ENABLED"
    ```

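For context, these entries belong in the `[MPIPreferences]` table of your project's `LocalPreferences.toml`. The fragment below is a hypothetical sketch for a Cray system; the `libmpi` and `mpiexec` values are illustrative placeholders and depend on your installation:

```toml
[MPIPreferences]
_format = "1.0"
binary = "system"
# Illustrative values -- adjust to your system's MPI installation
libmpi = "libmpi"
mpiexec = "srun"
preloads = ["libmpi_gtl_hsa.so"]
preloads_env_switch = "MPICH_GPU_SUPPORT_ENABLED"
```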
### CUDA

Successfully running the [alltoall\_test\_cuda.jl](../examples/alltoall_test_cuda.jl) example
should confirm that your MPI implementation has CUDA support enabled. Moreover, successfully running the
[alltoall\_test\_cuda\_multigpu.jl](../examples/alltoall_test_cuda_multigpu.jl) example should confirm that
your CUDA-aware MPI implementation uses multiple Nvidia GPUs (one GPU per rank).

If using OpenMPI, the status of CUDA support can be checked via the
[`MPI.has_cuda()`](@ref) function.

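A quick way to query this from the REPL (a sketch; the result is only meaningful with OpenMPI):

```julia
using MPI

MPI.Init()
# Returns true if the underlying (Open)MPI library reports CUDA support
MPI.has_cuda()
```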
### ROCm

Successfully running the [alltoall\_test\_rocm.jl](../examples/alltoall_test_rocm.jl) example
should confirm that your MPI implementation has ROCm support (AMDGPU) enabled. Moreover, successfully running the
[alltoall\_test\_rocm\_multigpu.jl](../examples/alltoall_test_rocm_multigpu.jl) example should confirm that
your ROCm-aware MPI implementation uses multiple AMD GPUs (one GPU per rank).

If using OpenMPI, the status of ROCm support can be checked via the
[`MPI.has_rocm()`](@ref) function.

### Multiple GPUs per node

In a configuration with multiple GPUs per node, mapping the GPU ID to the node-local MPI rank can be achieved either (1) on the application side, using a node-local communicator (`MPI.COMM_TYPE_SHARED`), or (2) on the system side, by setting device visibility accordingly.
For (1), using the node-local rank `rank_loc` is a way to select the GPU device:
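A minimal sketch of approach (1), assuming CUDA.jl (the modulo wrap-around for the case of more ranks than GPUs is an illustrative choice):

```julia
using MPI, CUDA

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)

# (1) Split the global communicator so that ranks on the same node
# share a node-local communicator
comm_loc = MPI.Comm_split_type(comm, MPI.COMM_TYPE_SHARED, rank)
rank_loc = MPI.Comm_rank(comm_loc)

# Select the GPU device from the node-local rank (wrap around if there
# are more ranks than GPUs on the node)
CUDA.device!(rank_loc % length(CUDA.devices()))
```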