# gloo
Gloo is a collective communications library. It comes with a number of collective algorithms useful for machine learning applications, including barrier, broadcast, and allreduce. Transport of data between participating machines is abstracted so that IP can be used at all times, or InfiniBand (or RoCE) when available. In the latter case, if the InfiniBand transport is used, [GPUDirect][gpudirect] can be used to accelerate cross-machine GPU-to-GPU memory transfers.

[gpudirect]: https://developer.nvidia.com/gpudirect

Where applicable, algorithms have an implementation that works with system memory buffers, and one that works with NVIDIA GPU memory buffers. In the latter case, it is not necessary to copy memory between host and device; this is taken care of by the algorithm implementations.

## Requirements

Gloo is built to run on Linux and has no hard dependencies other than libstdc++. That said, it is generally only useful in combination with one or more of the optional dependencies below:

* [CUDA][cuda] and [NCCL][nccl] -- for CUDA-aware algorithms, tests, and benchmark
* [Google Test][gtest] -- to build and run the tests
* [Hiredis][hiredis] -- for coordinating machine rendezvous through Redis
* [MPI][mpi] -- for coordinating machine rendezvous through MPI

[cuda]: http://www.nvidia.com/object/cuda_home_new.html
[nccl]: https://github.com/nvidia/nccl
[gtest]: https://github.com/google/googletest
[hiredis]: https://github.com/redis/hiredis
[mpi]: https://www.open-mpi.org/

## Documentation

Please refer to [docs/](docs/) for detailed documentation.

## Building

You can build Gloo using CMake. Since it is a library, it is most convenient to vendor it in your own project and include the project root in your own CMake configuration.

### Test

Building the tests requires Google Test version 1.8 or higher. On Ubuntu, this version ships with Ubuntu 17.10 and up.
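The vendoring approach described above can be sketched as a minimal CMake fragment. This is an illustration only: the `third_party/gloo` path and the `myapp` target are assumptions about your project layout, not part of Gloo itself.

```cmake
# Minimal sketch, assuming Gloo is vendored at third_party/gloo
# and your project defines a "myapp" executable (both hypothetical).
cmake_minimum_required(VERSION 3.5)
project(myapp CXX)

# Pull Gloo's CMakeLists.txt into this build.
add_subdirectory(third_party/gloo)

add_executable(myapp main.cc)
# Link against the gloo target defined by Gloo's build files.
target_link_libraries(myapp gloo)
```

With this in place, a regular `cmake` + `make` of your project builds Gloo as part of the same tree.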
If you run an older version, you have to install Google Test yourself and set the `GTEST_ROOT` CMake variable. You can install Google Test using conda with:

``` shell
conda install -c anaconda gmock gtest
```

Be careful: you may need to fish for a package that works with your glibc.

To build the tests, run:

``` shell
mkdir -p build
cd build
cmake ../ -DBUILD_TEST=1 -DGTEST_ROOT=/some/path  # GTEST_ROOT only if using a custom install
make
ls -l gloo/test/gloo_test*
```

To build the CUDA algorithm tests, specify `USE_CUDA=ON` as well; the CUDA tests are then built at `gloo/test/gloo_test_cuda`.

### Benchmark

First install the dependencies required by the benchmark tool. On Ubuntu, you can do so by running:

``` shell
sudo apt-get install -y libhiredis-dev
```

Then, to build the benchmark, run:

``` shell
mkdir -p build
cd build
cmake ../ -DBUILD_BENCHMARK=1
make
ls -l gloo/benchmark/benchmark
```

## Benchmarking

The benchmark tool depends on Redis/Hiredis for rendezvous. The benchmark tool for the CUDA algorithms also depends on both CUDA and NCCL. To run a benchmark:

1. Copy the benchmark tool to all participating machines.
2. Start a Redis server on any host (either a client machine or one of the machines participating in the test). Note that Redis Cluster is **not** supported.
3. Determine some unique ID for the benchmark run (e.g. using the `uuid` tool or some number).
4. On each machine, run (or pass `--help` for more options):

```
./benchmark \
  --size
```
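Step 3 above can be done, for example, with the `uuidgen` tool. A minimal sketch (the `PREFIX` variable name is just an illustration, not a Gloo convention):

``` shell
# Generate a unique identifier for this benchmark run.
# Falls back to the kernel's UUID source if uuidgen is missing.
PREFIX=$(uuidgen 2>/dev/null || cat /proc/sys/kernel/random/uuid)
echo "benchmark run id: $PREFIX"
```

The same identifier would then be passed to the benchmark invocation on every participating machine, so that they all rendezvous under the same key in Redis.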