Support CudaIpc connection within a single process #593

chhwang · 2025-07-31T07:07:19Z

Allow CudaIpc connection between GPUs in a single process
Added an example of connection in a single process
Minor interface updates


Copilot

Pull Request Overview

Extends CudaIpc transport to support connections between GPUs within a single process, whereas previously it only worked across different processes. This enables GPU-to-GPU communication within the same application.

Key Changes

Added process ID hash tracking to endpoints for intra-process GPU connections
Enabled peer access for same-process GPU connections
Updated constants and APIs for better maintainability
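
The first two changes can be pictured with a minimal C++ sketch, assuming each endpoint carries a hash of its owning process ID alongside its GPU index. The names `EndpointInfo`, `hashPid`, `makeLocalEndpoint`, `sameProcess`, and `needsPeerAccess` are hypothetical and not taken from this PR, and the hashing scheme is an assumption; the actual `pidHash` computation in the library may differ.

```cpp
#include <unistd.h>

#include <cstdint>
#include <functional>

// Hypothetical stand-in for the endpoint metadata exchanged between peers.
struct EndpointInfo {
  uint64_t pidHash;  // hash of the owning process ID (assumed representation)
  int deviceId;      // index of the GPU the endpoint is bound to
};

// Assumed hashing scheme; a real implementation may mix in other data.
inline uint64_t hashPid(pid_t pid) {
  return static_cast<uint64_t>(std::hash<long long>{}(static_cast<long long>(pid)));
}

// Builds an endpoint description for the calling process.
inline EndpointInfo makeLocalEndpoint(int deviceId) {
  return EndpointInfo{hashPid(getpid()), deviceId};
}

// True when both endpoints report the same process ID hash.
inline bool sameProcess(const EndpointInfo& a, const EndpointInfo& b) {
  return a.pidHash == b.pidHash;
}

// In this sketch, peer access is requested only for two distinct GPUs
// that belong to the same process.
inline bool needsPeerAccess(const EndpointInfo& local, const EndpointInfo& remote) {
  return sameProcess(local, remote) && local.deviceId != remote.deviceId;
}
```

In a CUDA build, a `true` result from `needsPeerAccess` would be a natural point to call `cudaDeviceEnablePeerAccess` for the remote device; that call is omitted here so the sketch compiles without the CUDA toolkit.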

Reviewed Changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.


File	Description
src/registered_memory.cc	Removed namespace prefix from SysError exception
src/include/endpoint.hpp	Added pidHash field to endpoint implementation
src/endpoint.cc	Added pidHash serialization and getter methods
src/context.cc	Added peer access enablement for same-process GPU connections
src/connection.cc	Moved connection constructor to implementation file
python/mscclpp/core_py.cpp	Removed deprecated Nvls transport enum value
include/mscclpp/port_channel_device.hpp	Replaced macros with constexpr constants
include/mscclpp/gpu.hpp	Added HIP compatibility for cudaDeviceEnablePeerAccess
include/mscclpp/core.hpp	Removed Nvls transport and updated constants
examples/tutorials/01-basic-concepts/	Added complete GPU ping-pong example with build files
apps/nccl/src/nccl.cu	Updated to use new constant naming
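
As a rough illustration of the macro-to-`constexpr` change listed for `port_channel_device.hpp`, the sketch below contrasts the two styles. The names `EXAMPLE_FIFO_SIZE`, `example::FifoSize`, and `example::wrapIndex` and the value 128 are hypothetical; the constants actually renamed in this PR are not shown on this page.

```cpp
#include <cstdint>

// Before (macro style): untyped and unscoped.
// #define EXAMPLE_FIFO_SIZE 128

// After (constexpr style): typed, namespaced, and checkable at compile time.
namespace example {

constexpr uint32_t FifoSize = 128;  // hypothetical constant, not from the PR

static_assert((FifoSize & (FifoSize - 1)) == 0, "FifoSize must be a power of two");

// Maps a monotonically increasing counter onto a slot index.
constexpr uint32_t wrapIndex(uint64_t i) { return static_cast<uint32_t>(i % FifoSize); }

}  // namespace example
```

Unlike a macro, the typed constant takes part in overload resolution and `static_assert` checks, and it cannot collide with identically named symbols in other headers.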

Comments suppressed due to low confidence (1)

examples/tutorials/01-basic-concepts/gpu_ping_pong.cu:10

[nitpick] Function names 'gpuKernel0' and 'gpuKernel1' are not descriptive. Consider more meaningful names like 'pingKernel' and 'pongKernel' or 'initiatorKernel' and 'responderKernel'.

__global__ void gpuKernel0(mscclpp::BaseMemoryChannelDeviceHandle *devHandle, int iter) {


Support CudaIpc connection within a single process

a90f354


chhwang requested review from Binyang2014 and Copilot July 31, 2025 07:14

Copilot AI reviewed Jul 31, 2025


src/context.cc (outdated, resolved)

examples/tutorials/01-basic-concepts/gpu_ping_pong.cu (outdated, resolved)

Binyang2014 reviewed Jul 31, 2025


src/context.cc (outdated, resolved)

chhwang and others added 2 commits August 1, 2025 06:01

updates

3498330

Merge branch 'main' into chhwang/1-proc-conn

481e1fe

Binyang2014 enabled auto-merge (squash) August 1, 2025 20:39

Binyang2014 disabled auto-merge August 1, 2025 20:51

Binyang2014 approved these changes Aug 1, 2025


src/context.cc (resolved)

chhwang merged commit c580e4c into main Aug 2, 2025
14 checks passed

chhwang deleted the chhwang/1-proc-conn branch August 2, 2025 04:59
