
Conversation

@ryanolson
Contributor

@ryanolson ryanolson commented Oct 30, 2025

Summary by CodeRabbit

  • New Features

    • New memory management library: system, disk, CUDA device, and pinned host backends; arena allocator; typed/untyped slice and memset helpers; NIXL registration and RAII lifecycle helpers; Torch tensor interoperability; offset sub-region views; type-erased buffer handle.
    • New config utilities: env boolean parsing and helpers.
  • Tests

    • Comprehensive unit/integration tests covering storage backends, arena behavior, NIXL registration, and config parsing.
  • Chores

    • Workspace updated to include new config and memory packages.

Signed-off-by: Ryan Olson <rolson@nvidia.com>
@ryanolson ryanolson requested a review from a team as a code owner October 30, 2025 22:47
@github-actions github-actions bot added the feat label Oct 30, 2025
@coderabbitai
Contributor

coderabbitai bot commented Oct 30, 2025

Walkthrough

Adds a new lib/memory Rust library crate with type-erased storage abstractions, multiple concrete backends (system, pinned, device, disk), arena allocator, slicing/offset views, NIXL registration integration, Torch interoperability types, and extensive tests. Also adds a new lib/config crate and updates workspace configuration.

Changes

Cohort / File(s) Summary
Workspace configuration
Cargo.toml
Adds lib/memory and lib/config to workspace members and lib/config to default-members; adds workspace dependency dynamo-config = { path = "lib/config", version = "0.6.1" }
Memory crate manifest
lib/memory/Cargo.toml
New crate manifest for dynamo-memory with workspace-driven metadata, features (unsafe-slices, testing-*), and dependencies (dynamo-config, cudarc, nixl-sys, etc.)
Config crate manifest
lib/config/Cargo.toml
New crate manifest using workspace = true and workspace-scoped dependency declarations
Core API surface & re-exports
lib/memory/src/lib.rs, lib/memory/src/prelude.rs
Introduces MemoryDescription trait, Buffer type, MemoryRegion, StorageKind, StorageError/Result alias, public re-exports of backends and prelude items (MemoryDescription, NixlCompatible, RegisteredView)
Action traits
lib/memory/src/actions.rs
Adds Memset, Slice, and SliceMut traits with typed-slice helpers, bounds/alignment checks, and unsafe transmute usage (caller-guarded)
System backend
lib/memory/src/system.rs
Implements SystemStorage (posix_memalign), MemoryDescription/NixlCompatible, Memset and Slice implementations, unsafe ptr accessors and Drop
Pinned (CUDA) backend
lib/memory/src/pinned.rs
Implements PinnedStorage with per-device CUDA context caching, cudaHostAlloc usage, NixlCompatible, Memset, unsafe accessors and Drop
Device (CUDA) backend
lib/memory/src/device.rs
Implements DeviceStorage with per-device CUDA context caching, device allocation/free, MemoryDescription and Nixl compatibility, and Send/Sync unsafe impls
Disk backend
lib/memory/src/disk.rs
Implements DiskStorage using mkostemp/open + fallocate, path/fd accessors, unlink management, MemoryDescription and Nixl params, and Drop cleanup
Arena allocator
lib/memory/src/arena.rs
Adds ArenaAllocator and ArenaBuffer with page-based allocation, error enum ArenaError, MemoryDescription impls for buffers, Drop to free allocations, and extensive tests
Offset views
lib/memory/src/offset.rs
Adds OffsetBuffer for bounds-checked sub-region views of a Buffer, propagating descriptors and implementing MemoryDescription
NIXL integration (core)
lib/memory/src/nixl.rs
Adds NixlCompatible trait, NixlDescriptor, NixlRegistered wrapper (RAII for registration), RegisteredView trait, and register_with_nixl() helper; see the RAII sketch after this table
NIXL agent wrapper
lib/memory/src/nixl/agent.rs
Adds NixlAgent wrapper around nixl_sys::Agent tracking initialized backends, backend helpers, Deref to Agent, and tests
NIXL backend config
lib/memory/src/nixl/config.rs
Adds NixlBackendConfig parsing from env vars, backend management, merge/has_backend helpers, and tests
Torch interop types
lib/memory/src/torch.rs
Adds TorchDevice enum and TorchTensor trait declarations
Tests
lib/memory/src/tests.rs
Comprehensive test suite covering System/Disk/Pinned/Device backends, NIXL registration workflows, Arena allocator behavior, and feature-gated CUDA/NIXL scenarios
Misc (module files)
lib/memory/src/nixl.rs, lib/memory/src/actions.rs, lib/memory/src/...
New modules and implementations documented across the crate (nixl, offset, arena, actions, prelude, pinned, device, disk, system, torch)
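
The NIXL integration row above describes a RAII wrapper (NixlRegistered) that ties deregistration to drop order. A minimal, self-contained sketch of that shape, with the nixl-sys calls elided and the field set assumed rather than taken from the crate:

    struct NixlRegistered<S> {
        storage: S,
        agent_name: String, // assumed: identifies the agent that registered this region
    }

    impl<S> NixlRegistered<S> {
        fn storage(&self) -> &S {
            &self.storage
        }
        fn agent_name(&self) -> &str {
            &self.agent_name
        }
    }

    impl<S> Drop for NixlRegistered<S> {
        fn drop(&mut self) {
            // Real code would deregister the region with the NIXL agent here,
            // before the wrapped storage's own Drop frees the memory.
        }
    }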

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Areas needing extra attention:
    • CUDA-related unsafe FFI and per-device context caching (device.rs, pinned.rs); see the cache sketch after this list
    • Memory safety of typed-slice conversions and alignment checks in actions.rs
    • Arena allocation logic, page math, and Drop/free sequencing (arena.rs)
    • NIXL registration lifecycle and Drop ordering in nixl.rs and nixl/agent.rs
    • File-backed allocation and unlink semantics on Unix (disk.rs)
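
For the first bullet, a minimal sketch of the per-device context caching pattern under review; CudaCtx stands in for the real cudarc context type, and context creation is elided:

    use std::collections::HashMap;
    use std::sync::{Mutex, OnceLock};

    #[derive(Clone)]
    struct CudaCtx(usize); // placeholder for a retained CUDA primary context

    fn cuda_context(device_id: usize) -> CudaCtx {
        static CACHE: OnceLock<Mutex<HashMap<usize, CudaCtx>>> = OnceLock::new();
        let cache = CACHE.get_or_init(|| Mutex::new(HashMap::new()));
        let mut map = cache.lock().unwrap();
        // Create and retain a context once per device; later calls reuse it.
        map.entry(device_id).or_insert_with(|| CudaCtx(device_id)).clone()
    }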

Poem

🐰 I dug a lane of bytes and beds,

pages, pins, and CUDA threads.
Buffers nested, descriptors shine,
NIXL gates and config fine.
Hoppity hop — the memory grows divine! 🌿

Pre-merge checks

❌ Failed checks (1 warning)

Description Check ⚠️ Warning
Explanation: The pull request description is entirely empty, with no content provided in any of the required template sections. The repository's template specifies that pull requests should include an Overview, Details explaining the changes, a section indicating where reviewers should start, and Related Issues. This is a large, XXL-sized PR that introduces substantial new public APIs across multiple modules, including trait definitions, storage backends, arena allocation, NIXL integration, and configuration utilities. The complete absence of descriptive content makes it difficult for reviewers to understand the objectives, the key files to focus on, or the relationship to any related issues.
Resolution: The author should add a comprehensive pull request description following the repository template. At minimum, the description should include: an overview of the kvbm v2 memory crate objectives and design philosophy; details of the major modules and their purposes (lib/memory backends, arena allocation, NIXL integration, lib/config utilities); specific files or modules reviewers should prioritize; and any related GitHub issues using action keywords (Closes/Fixes/Resolves/Relates to). Given the XXL size and scope of this PR, detailed guidance on a review starting point would be particularly valuable.
✅ Passed checks (2 passed)

Title Check ✅ Passed
The PR title "feat: kvbm v2 memory crate" directly relates to the primary change in the changeset: the introduction of a new lib/memory crate implementing a comprehensive v2 storage API with multiple backends (SystemStorage, DeviceStorage, DiskStorage, PinnedStorage), arena allocation, and NIXL integration. The title uses conventional commit format and is concise and specific enough that a reviewer scanning the history would understand the main focus is adding a new memory system. While the PR also introduces a lib/config crate, the title appropriately emphasizes the larger and more complex memory crate as the main contribution.

Docstring Coverage ✅ Passed
Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 9

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 22d910a and 5c90a41.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (14)
  • Cargo.toml (1 hunks)
  • lib/memory/Cargo.toml (1 hunks)
  • lib/memory/src/actions.rs (1 hunks)
  • lib/memory/src/arena.rs (1 hunks)
  • lib/memory/src/device.rs (1 hunks)
  • lib/memory/src/disk.rs (1 hunks)
  • lib/memory/src/lib.rs (1 hunks)
  • lib/memory/src/nixl.rs (1 hunks)
  • lib/memory/src/offset.rs (1 hunks)
  • lib/memory/src/pinned.rs (1 hunks)
  • lib/memory/src/prelude.rs (1 hunks)
  • lib/memory/src/system.rs (1 hunks)
  • lib/memory/src/tests.rs (1 hunks)
  • lib/memory/src/torch.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: GuanLuo
Repo: ai-dynamo/dynamo PR: 2746
File: lib/llm/src/grpc/service/tensor.rs:269-447
Timestamp: 2025-09-19T18:21:03.693Z
Learning: In lib/llm/src/grpc/service/tensor.rs, GuanLuo prefers using unsafe Vec::from_raw_parts operations to avoid data copying for performance reasons when converting raw tensor data from Vec<u8> to typed vectors like Vec<u32>, Vec<f32>, etc., even when it involves safety risks.
📚 Learning: 2025-09-19T18:21:03.693Z
Learnt from: GuanLuo
Repo: ai-dynamo/dynamo PR: 2746
File: lib/llm/src/grpc/service/tensor.rs:269-447
Timestamp: 2025-09-19T18:21:03.693Z
Learning: In lib/llm/src/grpc/service/tensor.rs, GuanLuo prefers using unsafe Vec::from_raw_parts operations to avoid data copying for performance reasons when converting raw tensor data from Vec<u8> to typed vectors like Vec<u32>, Vec<f32>, etc., even when it involves safety risks.

Applied to files:

  • lib/memory/src/torch.rs
📚 Learning: 2025-08-18T20:51:51.324Z
Learnt from: PeaBrane
Repo: ai-dynamo/dynamo PR: 2465
File: lib/runtime/src/pipeline/network/egress/push_router.rs:0-0
Timestamp: 2025-08-18T20:51:51.324Z
Learning: The runtime crate cannot depend on the llm crate due to architectural dependency constraints, preventing imports from lib/llm into lib/runtime.

Applied to files:

  • Cargo.toml
📚 Learning: 2025-08-15T02:01:01.238Z
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 2453
File: deploy/dynamo_check.py:739-741
Timestamp: 2025-08-15T02:01:01.238Z
Learning: In the dynamo project, missing cargo should be treated as a fatal error in dynamo_check.py because developers need cargo to build the Rust components. The tool is designed to validate the complete development environment, not just import functionality.

Applied to files:

  • lib/memory/Cargo.toml
📚 Learning: 2025-09-18T21:49:28.906Z
Learnt from: oandreeva-nv
Repo: ai-dynamo/dynamo PR: 2989
File: lib/llm/src/block_manager/block/transfer/cuda.rs:416-420
Timestamp: 2025-09-18T21:49:28.906Z
Learning: Pinned memory (page-locked host memory) being accessible by GPU via DMA doesn't change the fact that host and device pointers exist in separate virtual address spaces, so overlap checks between host and device pointers are still invalid regardless of whether the host memory is pinned.

Applied to files:

  • lib/memory/src/pinned.rs
🧬 Code graph analysis (10)
lib/memory/src/offset.rs (3)
lib/memory/src/arena.rs (6)
  • size (170-172)
  • addr (167-169)
  • storage_kind (173-175)
  • as_any (176-178)
  • nixl_descriptor (179-187)
  • nixl_descriptor (201-213)
lib/memory/src/lib.rs (12)
  • size (96-96)
  • size (116-118)
  • size (174-176)
  • addr (93-93)
  • addr (113-115)
  • addr (169-171)
  • storage_kind (99-99)
  • storage_kind (119-121)
  • as_any (102-102)
  • as_any (122-124)
  • nixl_descriptor (105-105)
  • nixl_descriptor (125-127)
lib/memory/src/nixl.rs (8)
  • size (34-36)
  • size (92-94)
  • addr (88-90)
  • storage_kind (96-98)
  • as_any (100-102)
  • nixl_descriptor (104-106)
  • descriptor (55-55)
  • descriptor (114-122)
lib/memory/src/disk.rs (6)
lib/memory/src/arena.rs (6)
  • size (170-172)
  • addr (167-169)
  • storage_kind (173-175)
  • as_any (176-178)
  • nixl_descriptor (179-187)
  • nixl_descriptor (201-213)
lib/memory/src/device.rs (6)
  • size (92-94)
  • addr (88-90)
  • storage_kind (96-98)
  • as_any (100-102)
  • nixl_descriptor (104-106)
  • nixl_params (111-118)
lib/memory/src/lib.rs (12)
  • size (96-96)
  • size (116-118)
  • size (174-176)
  • addr (93-93)
  • addr (113-115)
  • addr (169-171)
  • storage_kind (99-99)
  • storage_kind (119-121)
  • as_any (102-102)
  • as_any (122-124)
  • nixl_descriptor (105-105)
  • nixl_descriptor (125-127)
lib/memory/src/offset.rs (5)
  • size (59-61)
  • addr (55-57)
  • storage_kind (63-65)
  • as_any (67-69)
  • nixl_descriptor (71-76)
lib/memory/src/system.rs (6)
  • size (90-92)
  • addr (86-88)
  • storage_kind (94-96)
  • as_any (98-100)
  • nixl_descriptor (102-104)
  • nixl_params (109-111)
lib/memory/src/nixl.rs (7)
  • size (34-36)
  • size (92-94)
  • addr (88-90)
  • storage_kind (96-98)
  • as_any (100-102)
  • nixl_descriptor (104-106)
  • nixl_params (17-17)
lib/memory/src/actions.rs (3)
lib/memory/src/pinned.rs (2)
  • memset (131-142)
  • size (105-107)
lib/memory/src/system.rs (3)
  • memset (115-126)
  • size (90-92)
  • as_slice (130-132)
lib/memory/src/lib.rs (4)
  • size (96-96)
  • size (116-118)
  • size (174-176)
  • as_slice (181-183)
lib/memory/src/lib.rs (8)
lib/memory/src/offset.rs (8)
  • offset (44-46)
  • fmt (18-24)
  • addr (55-57)
  • size (59-61)
  • storage_kind (63-65)
  • as_any (67-69)
  • nixl_descriptor (71-76)
  • new (31-41)
lib/memory/src/actions.rs (12)
  • std (70-70)
  • std (72-72)
  • std (79-79)
  • std (105-105)
  • std (113-113)
  • std (169-169)
  • std (171-171)
  • std (178-178)
  • std (207-207)
  • std (215-215)
  • as_slice (32-32)
  • slice (45-56)
lib/memory/src/arena.rs (9)
  • fmt (66-72)
  • fmt (154-163)
  • addr (167-169)
  • size (170-172)
  • storage_kind (173-175)
  • as_any (176-178)
  • nixl_descriptor (179-187)
  • nixl_descriptor (201-213)
  • new (99-123)
lib/memory/src/nixl.rs (7)
  • fmt (78-84)
  • addr (88-90)
  • size (34-36)
  • size (92-94)
  • storage_kind (96-98)
  • as_any (100-102)
  • nixl_descriptor (104-106)
lib/memory/src/device.rs (6)
  • addr (88-90)
  • size (92-94)
  • storage_kind (96-98)
  • as_any (100-102)
  • nixl_descriptor (104-106)
  • new (44-61)
lib/memory/src/disk.rs (6)
  • addr (162-164)
  • size (166-168)
  • storage_kind (170-172)
  • as_any (174-176)
  • nixl_descriptor (177-179)
  • new (28-37)
lib/memory/src/pinned.rs (6)
  • addr (101-103)
  • size (105-107)
  • storage_kind (109-111)
  • as_any (113-115)
  • nixl_descriptor (117-119)
  • new (44-67)
lib/memory/src/system.rs (7)
  • addr (86-88)
  • size (90-92)
  • storage_kind (94-96)
  • as_any (98-100)
  • nixl_descriptor (102-104)
  • new (22-56)
  • as_slice (130-132)
lib/memory/src/nixl.rs (7)
lib/memory/src/arena.rs (7)
  • size (170-172)
  • agent_name (223-225)
  • addr (167-169)
  • storage_kind (173-175)
  • as_any (176-178)
  • nixl_descriptor (179-187)
  • nixl_descriptor (201-213)
lib/memory/src/lib.rs (12)
  • size (96-96)
  • size (116-118)
  • size (174-176)
  • addr (93-93)
  • addr (113-115)
  • addr (169-171)
  • storage_kind (99-99)
  • storage_kind (119-121)
  • as_any (102-102)
  • as_any (122-124)
  • nixl_descriptor (105-105)
  • nixl_descriptor (125-127)
lib/memory/src/offset.rs (5)
  • size (59-61)
  • addr (55-57)
  • storage_kind (63-65)
  • as_any (67-69)
  • nixl_descriptor (71-76)
lib/memory/src/device.rs (7)
  • nixl_params (111-118)
  • size (92-94)
  • device_id (69-71)
  • addr (88-90)
  • storage_kind (96-98)
  • as_any (100-102)
  • nixl_descriptor (104-106)
lib/memory/src/disk.rs (6)
  • nixl_params (184-207)
  • size (166-168)
  • addr (162-164)
  • storage_kind (170-172)
  • as_any (174-176)
  • nixl_descriptor (177-179)
lib/memory/src/pinned.rs (7)
  • nixl_params (124-127)
  • as_ptr (73-75)
  • size (105-107)
  • addr (101-103)
  • storage_kind (109-111)
  • as_any (113-115)
  • nixl_descriptor (117-119)
lib/memory/src/system.rs (7)
  • nixl_params (109-111)
  • as_ptr (62-64)
  • size (90-92)
  • addr (86-88)
  • storage_kind (94-96)
  • as_any (98-100)
  • nixl_descriptor (102-104)
lib/memory/src/system.rs (6)
lib/memory/src/actions.rs (13)
  • std (70-70)
  • std (72-72)
  • std (79-79)
  • std (105-105)
  • std (113-113)
  • std (169-169)
  • std (171-171)
  • std (178-178)
  • std (207-207)
  • std (215-215)
  • memset (21-21)
  • as_slice (32-32)
  • slice (45-56)
lib/memory/src/arena.rs (8)
  • new (99-123)
  • drop (242-244)
  • addr (167-169)
  • size (170-172)
  • storage_kind (173-175)
  • as_any (176-178)
  • nixl_descriptor (179-187)
  • nixl_descriptor (201-213)
lib/memory/src/device.rs (8)
  • new (44-61)
  • drop (75-84)
  • addr (88-90)
  • size (92-94)
  • storage_kind (96-98)
  • as_any (100-102)
  • nixl_descriptor (104-106)
  • nixl_params (111-118)
lib/memory/src/offset.rs (7)
  • new (31-41)
  • addr (55-57)
  • size (59-61)
  • storage_kind (63-65)
  • as_any (67-69)
  • nixl_descriptor (71-76)
  • offset (44-46)
lib/memory/src/pinned.rs (11)
  • new (44-67)
  • as_ptr (73-75)
  • as_mut_ptr (82-84)
  • drop (88-97)
  • addr (101-103)
  • size (105-107)
  • storage_kind (109-111)
  • as_any (113-115)
  • nixl_descriptor (117-119)
  • nixl_params (124-127)
  • memset (131-142)
lib/memory/src/nixl.rs (9)
  • as_ptr (30-32)
  • drop (70-74)
  • addr (88-90)
  • size (34-36)
  • size (92-94)
  • storage_kind (96-98)
  • as_any (100-102)
  • nixl_descriptor (104-106)
  • nixl_params (17-17)
lib/memory/src/device.rs (3)
lib/memory/src/pinned.rs (5)
  • cuda_context (14-25)
  • new (44-67)
  • drop (88-97)
  • size (105-107)
  • nixl_params (124-127)
lib/memory/src/nixl.rs (5)
  • device_id (44-46)
  • drop (70-74)
  • size (34-36)
  • size (92-94)
  • nixl_params (17-17)
lib/memory/src/system.rs (4)
  • new (22-56)
  • drop (77-82)
  • size (90-92)
  • nixl_params (109-111)
lib/memory/src/arena.rs (4)
lib/memory/src/lib.rs (13)
  • new (164-166)
  • size (96-96)
  • size (116-118)
  • size (174-176)
  • addr (93-93)
  • addr (113-115)
  • addr (169-171)
  • storage_kind (99-99)
  • storage_kind (119-121)
  • as_any (102-102)
  • as_any (122-124)
  • nixl_descriptor (105-105)
  • nixl_descriptor (125-127)
lib/memory/src/nixl.rs (14)
  • storage (127-129)
  • size (34-36)
  • size (92-94)
  • addr (88-90)
  • storage_kind (96-98)
  • as_ptr (30-32)
  • as_any (100-102)
  • nixl_descriptor (104-106)
  • descriptor (55-55)
  • descriptor (114-122)
  • mem_type (40-42)
  • device_id (44-46)
  • agent_name (52-52)
  • agent_name (110-112)
lib/memory/src/offset.rs (7)
  • new (31-41)
  • size (59-61)
  • offset (44-46)
  • addr (55-57)
  • storage_kind (63-65)
  • as_any (67-69)
  • nixl_descriptor (71-76)
lib/memory/src/system.rs (7)
  • new (22-56)
  • size (90-92)
  • addr (86-88)
  • storage_kind (94-96)
  • as_ptr (62-64)
  • as_any (98-100)
  • nixl_descriptor (102-104)
lib/memory/src/tests.rs (6)
lib/memory/src/nixl.rs (9)
  • storage (127-129)
  • size (34-36)
  • size (92-94)
  • storage_kind (96-98)
  • addr (88-90)
  • device_id (44-46)
  • register_with_nixl (169-196)
  • agent_name (52-52)
  • agent_name (110-112)
lib/memory/src/device.rs (5)
  • new (44-61)
  • size (92-94)
  • storage_kind (96-98)
  • addr (88-90)
  • device_id (69-71)
lib/memory/src/disk.rs (6)
  • new (28-37)
  • size (166-168)
  • storage_kind (170-172)
  • addr (162-164)
  • path (131-133)
  • new_at (39-125)
lib/memory/src/lib.rs (10)
  • new (164-166)
  • size (96-96)
  • size (116-118)
  • size (174-176)
  • storage_kind (99-99)
  • storage_kind (119-121)
  • addr (93-93)
  • addr (113-115)
  • addr (169-171)
  • create_buffer (149-151)
lib/memory/src/pinned.rs (4)
  • new (44-67)
  • size (105-107)
  • storage_kind (109-111)
  • addr (101-103)
lib/memory/src/system.rs (4)
  • new (22-56)
  • size (90-92)
  • storage_kind (94-96)
  • addr (86-88)
lib/memory/src/pinned.rs (4)
lib/memory/src/actions.rs (11)
  • std (70-70)
  • std (72-72)
  • std (79-79)
  • std (105-105)
  • std (113-113)
  • std (169-169)
  • std (171-171)
  • std (178-178)
  • std (207-207)
  • std (215-215)
  • memset (21-21)
lib/memory/src/device.rs (9)
  • cuda_context (13-24)
  • device_id (69-71)
  • new (44-61)
  • drop (75-84)
  • addr (88-90)
  • size (92-94)
  • storage_kind (96-98)
  • nixl_descriptor (104-106)
  • nixl_params (111-118)
lib/memory/src/nixl.rs (9)
  • device_id (44-46)
  • as_ptr (30-32)
  • drop (70-74)
  • addr (88-90)
  • size (34-36)
  • size (92-94)
  • storage_kind (96-98)
  • nixl_descriptor (104-106)
  • nixl_params (17-17)
lib/memory/src/system.rs (9)
  • new (22-56)
  • as_ptr (62-64)
  • drop (77-82)
  • addr (86-88)
  • size (90-92)
  • storage_kind (94-96)
  • nixl_descriptor (102-104)
  • nixl_params (109-111)
  • memset (115-126)
🪛 GitHub Actions: Copyright Checks
Cargo.toml

[error] 1-1: Copyright checkers detected missing or invalid copyright headers. File 'lib/memory/Cargo.toml' is missing required header pattern.

lib/memory/Cargo.toml

[error] 1-1: Copyright checkers detected missing or invalid copyright headers. File 'lib/memory/Cargo.toml' is missing required header pattern.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
  • GitHub Check: sglang (arm64)
  • GitHub Check: trtllm (amd64)
  • GitHub Check: trtllm (arm64)
  • GitHub Check: sglang (amd64)
  • GitHub Check: operator (arm64)
  • GitHub Check: operator (amd64)
  • GitHub Check: vllm (amd64)
  • GitHub Check: vllm (arm64)
  • GitHub Check: Build and Test - dynamo
  • GitHub Check: clippy (.)
  • GitHub Check: tests (launch/dynamo-run)
  • GitHub Check: clippy (launch/dynamo-run)
  • GitHub Check: tests (lib/runtime/examples)
  • GitHub Check: tests (.)
  • GitHub Check: tests (lib/bindings/python)
  • GitHub Check: clippy (lib/bindings/python)
🔇 Additional comments (11)
Cargo.toml (1)

10-10: Verify default-members inclusion for lib/memory.

The new lib/memory crate is added to members but not to default-members, unlike other core library crates (lib/llm, lib/runtime, lib/tokens, etc.). Confirm whether this exclusion is intentional (e.g., if the crate is slow to build or only used internally).

lib/memory/Cargo.toml (1)

16-25: All external dependency versions are current and compatible.

The dependencies declared are nixl-sys 0.7 (latest 0.7.0), nix 0.30 (latest 0.30.1), offset-allocator 0.2 (latest 0.2.0), and libc 0.2 (latest 0.2.176). All are within compatible semver ranges as declared in the manifest.

lib/memory/src/offset.rs (5)

8-25: LGTM: Clean structure and debug formatting.

The OffsetBuffer struct design is appropriate for representing sub-region views, and the manual Debug implementation provides clear visibility into the view's composition.


31-41: LGTM: Robust bounds validation.

The constructor correctly validates that the offset region fits within the base buffer bounds, using checked_add to prevent integer overflow.
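
For reference, the validated construction looks roughly like this (a standalone sketch; the actual constructor in offset.rs may differ):

    fn validate_sub_region(offset: usize, size: usize, base_size: usize) -> Result<(), String> {
        let end = offset
            .checked_add(size)
            .ok_or_else(|| "offset + size overflows usize".to_string())?;
        if end > base_size {
            return Err(format!(
                "sub-region [{offset}, {end}) exceeds base buffer of {base_size} bytes"
            ));
        }
        Ok(())
    }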


44-51: LGTM: Simple and correct accessors.

The accessor methods provide appropriate read-only access to the offset and base buffer.


55-57: Verify that address arithmetic won't overflow.

The addr() method performs unchecked addition self.base.addr() + self.offset. While the constructor validates that offset + size <= base.size(), it doesn't guarantee that adding the offset to the base address won't overflow usize. This is unlikely in practice for valid memory addresses, but could be a concern if base addresses sit near the top of the address space.

Consider using saturating (or checked) addition here for defensive programming:

 fn addr(&self) -> usize {
-    self.base.addr() + self.offset
+    self.base.addr().saturating_add(self.offset)
 }

71-76: LGTM: Consistent with arena.rs pattern.

The nixl_descriptor() implementation correctly adjusts the base descriptor to reflect the offset sub-region, consistent with the pattern used in arena.rs (lines 172-186).

lib/memory/src/prelude.rs (1)

1-6: LGTM: Standard prelude pattern.

The re-exports are appropriate for exposing the memory crate's public API surface, following standard Rust conventions for prelude modules.

lib/memory/src/torch.rs (3)

4-8: LGTM: Clean enum design.

The TorchDevice enum provides a clean abstraction for device types with appropriate derives and an extensible Other variant for future device types.


10-21: LGTM: Idiomatic helper methods.

The helper methods is_cuda() and cuda_device_index() are well-designed and follow Rust conventions, providing convenient access to device information.


23-29: LGTM: Well-designed trait for PyTorch interop.

The TorchTensor trait provides a clean abstraction for tensor metadata with appropriate supertraits (Send + Sync + Debug). The use of u64 for data_ptr() is reasonable given the target platforms for CUDA.
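
A hedged sketch of the trait shape this comment describes; only data_ptr() returning u64 and the Send + Sync + Debug supertraits are stated above, and the shape accessor is an assumption:

    use std::fmt::Debug;

    pub trait TorchTensor: Send + Sync + Debug {
        /// Raw pointer as reported by torch's Tensor.data_ptr(), widened to u64.
        fn data_ptr(&self) -> u64;
        /// Size in elements per dimension (assumed accessor).
        fn shape(&self) -> Vec<usize>;
    }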

@ryanolson
Contributor Author

@coderabbitai - full review

@coderabbitai
Contributor

coderabbitai bot commented Oct 31, 2025

@ryanolson: I'll perform a full review of the PR.

✅ Actions performed

Full review triggered.

@ryanolson ryanolson requested a review from kthui October 31, 2025 06:03
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

♻️ Duplicate comments (1)
lib/memory/src/actions.rs (1)

173-189: Handle zero-sized element types before dividing

as_slice_typed_mut still divides by size_of::<T>() without guarding against ZSTs, so passing T = () (or any zero-sized type) will panic in safe code. Please mirror the immutable path: reject elem_size == 0 up front and reuse the cached value for both the division and modulo checks.

Apply this diff:

-        let ptr = bytes.as_mut_ptr() as *mut T;
-        let len = bytes.len() / std::mem::size_of::<T>();
-
-        if !(bytes.as_ptr() as usize).is_multiple_of(std::mem::align_of::<T>()) {
+        let ptr = bytes.as_mut_ptr() as *mut T;
+        let elem_size = std::mem::size_of::<T>();
+        if elem_size == 0 {
+            return Err(StorageError::Unsupported(
+                "zero-sized types are not supported".into(),
+            ));
+        }
+        let len = bytes.len() / elem_size;
+
+        if !(bytes.as_ptr() as usize).is_multiple_of(std::mem::align_of::<T>()) {
             return Err(StorageError::Unsupported(format!(
                 "memory not aligned for type (required alignment: {})",
                 std::mem::align_of::<T>()
             )));
         }

-        if bytes.len() % std::mem::size_of::<T>() != 0 {
+        if bytes.len() % elem_size != 0 {
             return Err(StorageError::Unsupported(format!(
                 "size {} is not a multiple of type size {}",
                 bytes.len(),
-                std::mem::size_of::<T>()
+                elem_size
             )));
         }
🧹 Nitpick comments (2)
lib/memory/src/pinned.rs (1)

13-25: Consider exposing device selection to PinnedStorage::new.

The cuda_context function supports arbitrary device IDs, but PinnedStorage::new (line 51) hardcodes device 0. If multi-device pinned allocations are expected, consider adding a device_id parameter to PinnedStorage::new to leverage this flexibility.
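
A sketch of what a device-aware constructor might look like; the struct fields and error type here are stand-ins, not the crate's actual API, and the allocation itself is elided:

    pub struct PinnedStorage {
        ptr: *mut u8,
        size: usize,
        device_id: usize,
    }

    impl PinnedStorage {
        pub fn new(size: usize) -> Result<Self, String> {
            Self::new_on_device(size, 0) // preserves today's hardcoded device 0
        }

        pub fn new_on_device(size: usize, device_id: usize) -> Result<Self, String> {
            // Real code: fetch the cached context for `device_id`, then
            // cudaHostAlloc `size` bytes under that context.
            let _ = (size, device_id);
            unimplemented!("allocation elided in this sketch")
        }
    }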

lib/memory/src/lib.rs (1)

61-63: Remove or implement CUDA feature gating.

Line 61 has a commented #[cfg(feature = "cuda")], but the Cuda variant (lines 62-63) is not actually feature-gated. This appears in multiple places (lines 61, 76, 80). Either remove the comments or implement consistent feature gating to make the CUDA dependency optional.
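
A minimal sketch of the consistent-gating option, assuming a cuda feature declared in lib/memory/Cargo.toml; the variant set follows the review's description and may differ from the crate:

    #[derive(Debug, Clone, PartialEq, Eq)]
    pub enum StorageKind {
        System,
        Disk,
        #[cfg(feature = "cuda")]
        Cuda { device_id: usize },
        #[cfg(feature = "cuda")]
        Pinned,
    }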

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 22d910a and 876a4aa.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (18)
  • Cargo.toml (3 hunks)
  • lib/config/Cargo.toml (1 hunks)
  • lib/config/src/lib.rs (1 hunks)
  • lib/memory/Cargo.toml (1 hunks)
  • lib/memory/src/actions.rs (1 hunks)
  • lib/memory/src/arena.rs (1 hunks)
  • lib/memory/src/device.rs (1 hunks)
  • lib/memory/src/disk.rs (1 hunks)
  • lib/memory/src/lib.rs (1 hunks)
  • lib/memory/src/nixl.rs (1 hunks)
  • lib/memory/src/nixl/agent.rs (1 hunks)
  • lib/memory/src/nixl/config.rs (1 hunks)
  • lib/memory/src/offset.rs (1 hunks)
  • lib/memory/src/pinned.rs (1 hunks)
  • lib/memory/src/prelude.rs (1 hunks)
  • lib/memory/src/system.rs (1 hunks)
  • lib/memory/src/tests.rs (1 hunks)
  • lib/memory/src/torch.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (11)
📚 Learning: 2025-10-17T22:57:12.526Z
Learnt from: oandreeva-nv
Repo: ai-dynamo/dynamo PR: 3700
File: lib/llm/src/block_manager.rs:19-19
Timestamp: 2025-10-17T22:57:12.526Z
Learning: The dynamo library (ai-dynamo/dynamo repository) is currently developed to run on Linux only, so Linux-specific code and syscalls do not require platform guards.

Applied to files:

  • Cargo.toml
  • lib/config/Cargo.toml
  • lib/memory/Cargo.toml
📚 Learning: 2025-08-18T20:51:51.324Z
Learnt from: PeaBrane
Repo: ai-dynamo/dynamo PR: 2465
File: lib/runtime/src/pipeline/network/egress/push_router.rs:0-0
Timestamp: 2025-08-18T20:51:51.324Z
Learning: The runtime crate cannot depend on the llm crate due to architectural dependency constraints, preventing imports from lib/llm into lib/runtime.

Applied to files:

  • Cargo.toml
📚 Learning: 2025-08-15T02:01:01.238Z
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 2453
File: deploy/dynamo_check.py:739-741
Timestamp: 2025-08-15T02:01:01.238Z
Learning: In the dynamo project, missing cargo should be treated as a fatal error in dynamo_check.py because developers need cargo to build the Rust components. The tool is designed to validate the complete development environment, not just import functionality.

Applied to files:

  • Cargo.toml
  • lib/memory/Cargo.toml
📚 Learning: 2025-10-17T22:57:18.773Z
Learnt from: oandreeva-nv
Repo: ai-dynamo/dynamo PR: 3700
File: lib/llm/src/block_manager/numa_allocator.rs:4-6
Timestamp: 2025-10-17T22:57:18.773Z
Learning: The Dynamo library (lib/llm) is developed to run on Linux only, so platform-specific cfg guards for Linux-only features are not required.

Applied to files:

  • Cargo.toml
📚 Learning: 2025-09-22T18:09:23.513Z
Learnt from: KrishnanPrash
Repo: ai-dynamo/dynamo PR: 3165
File: components/backends/sglang/src/dynamo/sglang/args.py:201-202
Timestamp: 2025-09-22T18:09:23.513Z
Learning: KrishnanPrash suggested adding early validation for custom Jinja template paths in the Rust layer (lib/bindings/python/rust/lib.rs) to benefit both vLLM and SGLang workflows, using PathBuf::from() and path.exists() checks with appropriate PyFileNotFoundError handling.

Applied to files:

  • lib/memory/src/disk.rs
📚 Learning: 2025-09-11T03:24:47.820Z
Learnt from: kthui
Repo: ai-dynamo/dynamo PR: 3004
File: lib/runtime/src/pipeline/network/ingress/push_handler.rs:271-277
Timestamp: 2025-09-11T03:24:47.820Z
Learning: In lib/runtime/src/pipeline/network/ingress/push_handler.rs, the maintainer prefers to keep the existing error comparison logic using format!("{:?}", err) == STREAM_ERR_MSG unchanged until proper error types are implemented, even though it has technical debt. Avoid suggesting changes to working legacy code that will be refactored later.

Applied to files:

  • lib/memory/src/disk.rs
  • lib/memory/src/actions.rs
📚 Learning: 2025-09-22T18:09:23.513Z
Learnt from: KrishnanPrash
Repo: ai-dynamo/dynamo PR: 3165
File: components/backends/sglang/src/dynamo/sglang/args.py:201-202
Timestamp: 2025-09-22T18:09:23.513Z
Learning: Both vLLM and SGLang implementations of --custom-jinja-template pass the path directly to register_llm() without any validation. KrishnanPrash suggested implementing early validation in the Rust layer (lib/bindings/python/rust/lib.rs) using PathBuf::from() and path.exists() checks with PyFileNotFoundError for consistent error handling across both backends.

Applied to files:

  • lib/memory/src/disk.rs
📚 Learning: 2025-09-19T18:21:03.693Z
Learnt from: GuanLuo
Repo: ai-dynamo/dynamo PR: 2746
File: lib/llm/src/grpc/service/tensor.rs:269-447
Timestamp: 2025-09-19T18:21:03.693Z
Learning: In lib/llm/src/grpc/service/tensor.rs, GuanLuo prefers using unsafe Vec::from_raw_parts operations to avoid data copying for performance reasons when converting raw tensor data from Vec<u8> to typed vectors like Vec<u32>, Vec<f32>, etc., even when it involves safety risks.

Applied to files:

  • lib/memory/src/torch.rs
📚 Learning: 2025-09-18T21:49:28.906Z
Learnt from: oandreeva-nv
Repo: ai-dynamo/dynamo PR: 2989
File: lib/llm/src/block_manager/block/transfer/cuda.rs:416-420
Timestamp: 2025-09-18T21:49:28.906Z
Learning: Pinned memory (page-locked host memory) being accessible by GPU via DMA doesn't change the fact that host and device pointers exist in separate virtual address spaces, so overlap checks between host and device pointers are still invalid regardless of whether the host memory is pinned.

Applied to files:

  • lib/memory/src/pinned.rs
📚 Learning: 2025-08-05T22:51:59.230Z
Learnt from: dmitry-tokarev-nv
Repo: ai-dynamo/dynamo PR: 2300
File: pyproject.toml:64-66
Timestamp: 2025-08-05T22:51:59.230Z
Learning: The ai-dynamo/dynamo project does not ship ARM64 wheels, so platform markers to restrict dependencies to x86_64 are not needed in pyproject.toml dependencies.

Applied to files:

  • lib/config/Cargo.toml
📚 Learning: 2025-08-30T20:43:10.091Z
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 2797
File: .devcontainer/devcontainer.json:12-12
Timestamp: 2025-08-30T20:43:10.091Z
Learning: In the dynamo project, devcontainer.json files use templated container names (like "dynamo-vllm-devcontainer") that are automatically processed by the copy_devcontainer.sh script to generate framework-specific configurations with unique names, preventing container name collisions.

Applied to files:

  • lib/config/Cargo.toml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: sglang (amd64)
  • GitHub Check: vllm (arm64)
  • GitHub Check: trtllm (arm64)
  • GitHub Check: trtllm (amd64)
  • GitHub Check: operator (arm64)
  • GitHub Check: operator (amd64)
  • GitHub Check: sglang (arm64)
  • GitHub Check: vllm (amd64)
  • GitHub Check: Build and Test - dynamo
  • GitHub Check: tests (lib/runtime/examples)
  • GitHub Check: tests (launch/dynamo-run)
  • GitHub Check: tests (lib/bindings/python)
  • GitHub Check: tests (.)
  • GitHub Check: Mirror Repository to GitLab
🔇 Additional comments (3)
lib/memory/src/pinned.rs (2)

117-119: Clarify nixl_descriptor design intent.

nixl_descriptor() returns None, but NixlCompatible is implemented below (lines 123-128), suggesting NIXL registration is supported. Is this intentional? If pinned memory requires explicit NIXL registration via the NixlCompatible interface rather than automatic descriptor creation, consider adding a comment explaining this design choice.


130-146: LGTM: Overflow protection properly implemented.

The previous overflow concern has been correctly addressed using checked_add (lines 132-134). The bounds checking and unsafe operations are now properly protected.
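
The guarded shape being approved looks roughly like this (a standalone sketch, not the crate's code):

    fn memset_range(
        base: *mut u8,
        region_size: usize,
        offset: usize,
        len: usize,
        value: u8,
    ) -> Result<(), String> {
        let end = offset
            .checked_add(len)
            .ok_or_else(|| "offset + len overflows usize".to_string())?;
        if end > region_size {
            return Err(format!("range [{offset}, {end}) exceeds {region_size}-byte region"));
        }
        // Safety: the range was proven in-bounds above; the caller must guarantee
        // that `base` is valid for `region_size` bytes and not aliased concurrently.
        unsafe { std::ptr::write_bytes(base.add(offset), value, len) };
        Ok(())
    }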

lib/memory/src/lib.rs (1)

108-151: LGTM: Clean type erasure pattern.

The Buffer wrapper around Arc<dyn MemoryDescription> provides an excellent type-erased abstraction. The Deref implementation enables transparent access, and the create_buffer helper offers good ergonomics for wrapping concrete storage types.
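
A self-contained sketch of that pattern, with the trait pared down to a few of the methods named in the walkthrough (the real MemoryDescription carries more):

    use std::{any::Any, ops::Deref, sync::Arc};

    pub trait MemoryDescription: Send + Sync {
        fn addr(&self) -> usize;
        fn size(&self) -> usize;
        fn as_any(&self) -> &dyn Any;
    }

    #[derive(Clone)]
    pub struct Buffer(Arc<dyn MemoryDescription>);

    impl Deref for Buffer {
        type Target = dyn MemoryDescription;
        fn deref(&self) -> &Self::Target {
            &*self.0
        }
    }

    /// Wrap any concrete storage into the type-erased handle.
    pub fn create_buffer<S: MemoryDescription + 'static>(storage: S) -> Buffer {
        Buffer(Arc::new(storage))
    }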

