feat: kvbm v2 memory crate #4012
base: main
Conversation
Signed-off-by: Ryan Olson <rolson@nvidia.com>
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 9
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (14)
- Cargo.toml (1 hunks)
- lib/memory/Cargo.toml (1 hunks)
- lib/memory/src/actions.rs (1 hunks)
- lib/memory/src/arena.rs (1 hunks)
- lib/memory/src/device.rs (1 hunks)
- lib/memory/src/disk.rs (1 hunks)
- lib/memory/src/lib.rs (1 hunks)
- lib/memory/src/nixl.rs (1 hunks)
- lib/memory/src/offset.rs (1 hunks)
- lib/memory/src/pinned.rs (1 hunks)
- lib/memory/src/prelude.rs (1 hunks)
- lib/memory/src/system.rs (1 hunks)
- lib/memory/src/tests.rs (1 hunks)
- lib/memory/src/torch.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: GuanLuo
Repo: ai-dynamo/dynamo PR: 2746
File: lib/llm/src/grpc/service/tensor.rs:269-447
Timestamp: 2025-09-19T18:21:03.693Z
Learning: In lib/llm/src/grpc/service/tensor.rs, GuanLuo prefers using unsafe Vec::from_raw_parts operations to avoid data copying for performance reasons when converting raw tensor data from Vec<u8> to typed vectors like Vec<u32>, Vec<f32>, etc., even when it involves safety risks.
📚 Learning: 2025-09-19T18:21:03.693Z
Learnt from: GuanLuo
Repo: ai-dynamo/dynamo PR: 2746
File: lib/llm/src/grpc/service/tensor.rs:269-447
Timestamp: 2025-09-19T18:21:03.693Z
Learning: In lib/llm/src/grpc/service/tensor.rs, GuanLuo prefers using unsafe Vec::from_raw_parts operations to avoid data copying for performance reasons when converting raw tensor data from Vec<u8> to typed vectors like Vec<u32>, Vec<f32>, etc., even when it involves safety risks.
Applied to files:
lib/memory/src/torch.rs
📚 Learning: 2025-08-18T20:51:51.324Z
Learnt from: PeaBrane
Repo: ai-dynamo/dynamo PR: 2465
File: lib/runtime/src/pipeline/network/egress/push_router.rs:0-0
Timestamp: 2025-08-18T20:51:51.324Z
Learning: The runtime crate cannot depend on the llm crate due to architectural dependency constraints, preventing imports from lib/llm into lib/runtime.
Applied to files:
Cargo.toml
📚 Learning: 2025-08-15T02:01:01.238Z
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 2453
File: deploy/dynamo_check.py:739-741
Timestamp: 2025-08-15T02:01:01.238Z
Learning: In the dynamo project, missing cargo should be treated as a fatal error in dynamo_check.py because developers need cargo to build the Rust components. The tool is designed to validate the complete development environment, not just import functionality.
Applied to files:
lib/memory/Cargo.toml
📚 Learning: 2025-09-18T21:49:28.906Z
Learnt from: oandreeva-nv
Repo: ai-dynamo/dynamo PR: 2989
File: lib/llm/src/block_manager/block/transfer/cuda.rs:416-420
Timestamp: 2025-09-18T21:49:28.906Z
Learning: Pinned memory (page-locked host memory) being accessible by GPU via DMA doesn't change the fact that host and device pointers exist in separate virtual address spaces, so overlap checks between host and device pointers are still invalid regardless of whether the host memory is pinned.
Applied to files:
lib/memory/src/pinned.rs
🧬 Code graph analysis (10)

Related symbols (`size`, `addr`, `storage_kind`, `as_any`, `nixl_descriptor`, `nixl_params`, `new`, `drop`, and similar accessors) were traced across the crate; the analyzed files and the files they reference are:

- lib/memory/src/offset.rs (3): arena.rs, lib.rs, nixl.rs
- lib/memory/src/disk.rs (6): arena.rs, device.rs, lib.rs, offset.rs, system.rs, nixl.rs
- lib/memory/src/actions.rs (3): pinned.rs, system.rs, lib.rs
- lib/memory/src/lib.rs (8): offset.rs, actions.rs, arena.rs, nixl.rs, device.rs, disk.rs, pinned.rs, system.rs
- lib/memory/src/nixl.rs (7): arena.rs, lib.rs, offset.rs, device.rs, disk.rs, pinned.rs, system.rs
- lib/memory/src/system.rs (6): actions.rs, arena.rs, device.rs, offset.rs, pinned.rs, nixl.rs
- lib/memory/src/device.rs (3): pinned.rs, nixl.rs, system.rs
- lib/memory/src/arena.rs (4): lib.rs, nixl.rs, offset.rs, system.rs
- lib/memory/src/tests.rs (6): nixl.rs, device.rs, disk.rs, lib.rs, pinned.rs, system.rs
- lib/memory/src/pinned.rs (4): actions.rs, device.rs, nixl.rs, system.rs
🪛 GitHub Actions: Copyright Checks
lib/memory/Cargo.toml
[error] 1-1: Copyright checkers detected missing or invalid copyright headers. File 'lib/memory/Cargo.toml' is missing required header pattern.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
- GitHub Check: sglang (arm64)
- GitHub Check: trtllm (amd64)
- GitHub Check: trtllm (arm64)
- GitHub Check: sglang (amd64)
- GitHub Check: operator (arm64)
- GitHub Check: operator (amd64)
- GitHub Check: vllm (amd64)
- GitHub Check: vllm (arm64)
- GitHub Check: Build and Test - dynamo
- GitHub Check: clippy (.)
- GitHub Check: tests (launch/dynamo-run)
- GitHub Check: clippy (launch/dynamo-run)
- GitHub Check: tests (lib/runtime/examples)
- GitHub Check: tests (.)
- GitHub Check: tests (lib/bindings/python)
- GitHub Check: clippy (lib/bindings/python)
🔇 Additional comments (11)
Cargo.toml (1)
10-10: Verify default-members inclusion for lib/memory.

The new `lib/memory` crate is added to `members` but not to `default-members`, unlike other core library crates (`lib/llm`, `lib/runtime`, `lib/tokens`, etc.). Confirm whether this exclusion is intentional (e.g., if the crate is slow to build or only used internally).

lib/memory/Cargo.toml (1)
16-25: All external dependency versions are current and compatible.

The dependencies declared are nixl-sys 0.7 (latest 0.7.0), nix 0.30 (latest 0.30.1), offset-allocator 0.2 (latest 0.2.0), and libc 0.2 (latest 0.2.176). All are within compatible semver ranges as declared in the manifest.
lib/memory/src/offset.rs (5)
8-25: LGTM: Clean structure and debug formatting.

The `OffsetBuffer` struct design is appropriate for representing sub-region views, and the manual `Debug` implementation provides clear visibility into the view's composition.
31-41: LGTM: Robust bounds validation.

The constructor correctly validates that the offset region fits within the base buffer bounds, using `checked_add` to prevent integer overflow.
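The bounds check praised here can be sketched in isolation. This is a hypothetical standalone helper, not the crate's actual constructor; `validate_view` and the `String` error type are illustrative:

```rust
// Hypothetical sketch of the constructor's bounds check; the real
// OffsetBuffer uses its own field layout and error type.
fn validate_view(base_size: usize, offset: usize, len: usize) -> Result<(), String> {
    // checked_add returns None on overflow, so `offset + len` can never wrap
    // around and silently pass the comparison below.
    match offset.checked_add(len) {
        Some(end) if end <= base_size => Ok(()),
        Some(end) => Err(format!("view end {end} exceeds base size {base_size}")),
        None => Err("offset + len overflows usize".to_string()),
    }
}

fn main() {
    assert!(validate_view(1024, 512, 512).is_ok());
    assert!(validate_view(1024, 512, 513).is_err());
    // Near usize::MAX an unchecked addition would wrap; checked_add reports it.
    assert!(validate_view(1024, usize::MAX, 1).is_err());
    println!("bounds checks ok");
}
```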
44-51: LGTM: Simple and correct accessors.

The accessor methods provide appropriate read-only access to the offset and base buffer.
55-57: Verify that address arithmetic won't overflow.

The `addr()` method performs unchecked addition `self.base.addr() + self.offset`. While the constructor validates that `offset + size <= base.size()`, it doesn't guarantee that adding the offset to the base address won't overflow `usize::MAX`. This is unlikely in practice for valid memory addresses, but could be a concern if base addresses are near the upper limit of the address space.

Consider a saturating (or checked) addition here for defensive programming:

```diff
 fn addr(&self) -> usize {
-    self.base.addr() + self.offset
+    self.base.addr().saturating_add(self.offset)
 }
```
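To make the trade-off concrete, the following sketch contrasts the overflow behaviors; `safe_addr` is an illustrative helper, not part of the crate:

```rust
// Illustrative helper: checked addition returns None instead of wrapping.
fn safe_addr(base: usize, offset: usize) -> Option<usize> {
    base.checked_add(offset)
}

fn main() {
    // Realistic addresses: behaves like plain addition.
    assert_eq!(safe_addr(0x7f00_0000, 16), Some(0x7f00_0010));
    // Pathological base near the top of the address space: overflow is caught.
    assert_eq!(safe_addr(usize::MAX - 8, 16), None);
    // saturating_add, as in the diff above, clamps instead of failing,
    // which keeps addr() infallible at the cost of a silently wrong address.
    assert_eq!((usize::MAX - 8).saturating_add(16), usize::MAX);
    println!("ok");
}
```

The design choice is whether `addr()` should stay infallible (saturate) or surface the error (checked); either is defensible for a debug-only concern like this.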
71-76: LGTM: Consistent with arena.rs pattern.

The `nixl_descriptor()` implementation correctly adjusts the base descriptor to reflect the offset sub-region, consistent with the pattern used in `arena.rs` (lines 172-186).

lib/memory/src/prelude.rs (1)
1-6: LGTM: Standard prelude pattern.

The re-exports are appropriate for exposing the memory crate's public API surface, following standard Rust conventions for prelude modules.
lib/memory/src/torch.rs (3)
4-8: LGTM: Clean enum design.

The `TorchDevice` enum provides a clean abstraction for device types with appropriate derives and an extensible `Other` variant for future device types.
10-21: LGTM: Idiomatic helper methods.

The helper methods `is_cuda()` and `cuda_device_index()` are well-designed and follow Rust conventions, providing convenient access to device information.
23-29: LGTM: Well-designed trait for PyTorch interop.

The `TorchTensor` trait provides a clean abstraction for tensor metadata with appropriate supertraits (`Send + Sync + Debug`). The use of `u64` for `data_ptr()` is reasonable given the target platforms for CUDA.
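The shape the three comments above describe can be reconstructed roughly as follows. This is a sketch modeled on the review text: variant payloads and exact signatures are assumptions, not the crate's verified API.

```rust
use std::fmt::Debug;

// Illustrative reconstruction of the device enum and tensor trait the
// review describes; names follow the review, payloads are assumed.
#[derive(Clone, Debug, PartialEq, Eq)]
enum TorchDevice {
    Cpu,
    Cuda(usize),
    Other(String), // extensible variant for future device types
}

impl TorchDevice {
    fn is_cuda(&self) -> bool {
        matches!(self, TorchDevice::Cuda(_))
    }
    fn cuda_device_index(&self) -> Option<usize> {
        match self {
            TorchDevice::Cuda(idx) => Some(*idx),
            _ => None,
        }
    }
}

// Tensor metadata trait: u64 data_ptr covers both host and CUDA pointers.
trait TorchTensor: Send + Sync + Debug {
    fn data_ptr(&self) -> u64;
    fn device(&self) -> TorchDevice;
}

fn main() {
    let dev = TorchDevice::Cuda(1);
    assert!(dev.is_cuda());
    assert_eq!(dev.cuda_device_index(), Some(1));
    assert_eq!(TorchDevice::Cpu.cuda_device_index(), None);
    println!("ok");
}
```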
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Ryan Olson <ryanolson@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Ryan Olson <ryanolson@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Ryan Olson <ryanolson@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Ryan Olson <ryanolson@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Ryan Olson <ryanolson@users.noreply.github.com>
Signed-off-by: Ryan Olson <rolson@nvidia.com>
Signed-off-by: Ryan Olson <rolson@nvidia.com>
Signed-off-by: Ryan Olson <rolson@nvidia.com>
Signed-off-by: Ryan Olson <rolson@nvidia.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Ryan Olson <ryanolson@users.noreply.github.com>
@coderabbitai - full review
@ryanolson: I'll perform a full review of the PR.

✅ Actions performed

Full review triggered.
Actionable comments posted: 6
♻️ Duplicate comments (1)
lib/memory/src/actions.rs (1)
173-189: Handle zero-sized element types before dividing
`as_slice_typed_mut` still divides by `size_of::<T>()` without guarding against ZSTs, so passing `T = ()` (or any zero-sized type) will panic in safe code. Please mirror the immutable path: reject `elem_size == 0` up front and reuse the cached value for both the division and modulo checks.

Apply this diff:

```diff
-    let ptr = bytes.as_mut_ptr() as *mut T;
-    let len = bytes.len() / std::mem::size_of::<T>();
-
-    if !(bytes.as_ptr() as usize).is_multiple_of(std::mem::align_of::<T>()) {
+    let ptr = bytes.as_mut_ptr() as *mut T;
+    let elem_size = std::mem::size_of::<T>();
+    if elem_size == 0 {
+        return Err(StorageError::Unsupported(
+            "zero-sized types are not supported".into(),
+        ));
+    }
+    let len = bytes.len() / elem_size;
+
+    if !(bytes.as_ptr() as usize).is_multiple_of(std::mem::align_of::<T>()) {
         return Err(StorageError::Unsupported(format!(
             "memory not aligned for type (required alignment: {})",
             std::mem::align_of::<T>()
         )));
     }
-    if bytes.len() % std::mem::size_of::<T>() != 0 {
+    if bytes.len() % elem_size != 0 {
         return Err(StorageError::Unsupported(format!(
             "size {} is not a multiple of type size {}",
             bytes.len(),
-            std::mem::size_of::<T>()
+            elem_size
         )));
     }
```
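The guarded conversion the review asks for can be shown end to end in a standalone sketch. The function name and `String` error type here are illustrative, not the crate's actual `as_slice_typed_mut` signature:

```rust
// Standalone sketch of a byte-to-typed-slice conversion with the three
// guards the review requires: ZST rejection, alignment, and length.
fn as_slice_typed<T>(bytes: &[u8]) -> Result<&[T], String> {
    let elem_size = std::mem::size_of::<T>();
    // Guard 1: a zero-sized T would make the division below panic.
    if elem_size == 0 {
        return Err("zero-sized types are not supported".into());
    }
    // Guard 2: the buffer start must satisfy T's alignment.
    if (bytes.as_ptr() as usize) % std::mem::align_of::<T>() != 0 {
        return Err(format!(
            "memory not aligned (required alignment: {})",
            std::mem::align_of::<T>()
        ));
    }
    // Guard 3: the byte length must be an exact multiple of T's size.
    if bytes.len() % elem_size != 0 {
        return Err(format!(
            "size {} is not a multiple of type size {}",
            bytes.len(),
            elem_size
        ));
    }
    // SAFETY: alignment and length were checked above; T must additionally
    // be valid for any bit pattern (true for plain integer types).
    Ok(unsafe {
        std::slice::from_raw_parts(bytes.as_ptr() as *const T, bytes.len() / elem_size)
    })
}

fn main() {
    let buf = [0u8; 16];
    // ZST is rejected up front instead of dividing by zero.
    assert!(as_slice_typed::<()>(&buf).is_err());
    assert_eq!(as_slice_typed::<u8>(&buf).unwrap().len(), 16);
    println!("ok");
}
```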
🧹 Nitpick comments (2)
lib/memory/src/pinned.rs (1)
13-25: Consider exposing device selection to `PinnedStorage::new`.

The `cuda_context` function supports arbitrary device IDs, but `PinnedStorage::new` (line 51) hardcodes device 0. If multi-device pinned allocations are expected, consider adding a `device_id` parameter to `PinnedStorage::new` to leverage this flexibility.

lib/memory/src/lib.rs (1)
61-63: Remove or implement CUDA feature gating.

Line 61 has a commented-out `#[cfg(feature = "cuda")]`, but the `Cuda` variant (lines 62-63) is not actually feature-gated. This appears in multiple places (lines 61, 76, 80). Either remove the comments or implement consistent feature gating to make the CUDA dependency optional.
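Consistent gating would look roughly like the sketch below. The enum name and variants are assumptions for illustration; only the `cuda` feature name comes from the commented attribute the review quotes:

```rust
// Sketch of consistent feature gating: the variant simply does not exist
// unless the crate is built with `--features cuda`, so every match arm and
// constructor touching it must carry the same cfg attribute.
#[derive(Debug, PartialEq)]
enum StorageKind {
    System,
    Pinned,
    #[cfg(feature = "cuda")]
    Cuda { device_id: usize },
    Disk,
}

fn main() {
    let kind = StorageKind::System;
    // Without the feature, this statement (and the variant) compiles away.
    #[cfg(feature = "cuda")]
    let _gpu = StorageKind::Cuda { device_id: 0 };
    assert_eq!(kind, StorageKind::System);
    println!("ok");
}
```

Gating the variant (rather than leaving comments) lets the compiler enforce that no CUDA code path is reachable in a non-CUDA build.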
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (18)
- Cargo.toml (3 hunks)
- lib/config/Cargo.toml (1 hunks)
- lib/config/src/lib.rs (1 hunks)
- lib/memory/Cargo.toml (1 hunks)
- lib/memory/src/actions.rs (1 hunks)
- lib/memory/src/arena.rs (1 hunks)
- lib/memory/src/device.rs (1 hunks)
- lib/memory/src/disk.rs (1 hunks)
- lib/memory/src/lib.rs (1 hunks)
- lib/memory/src/nixl.rs (1 hunks)
- lib/memory/src/nixl/agent.rs (1 hunks)
- lib/memory/src/nixl/config.rs (1 hunks)
- lib/memory/src/offset.rs (1 hunks)
- lib/memory/src/pinned.rs (1 hunks)
- lib/memory/src/prelude.rs (1 hunks)
- lib/memory/src/system.rs (1 hunks)
- lib/memory/src/tests.rs (1 hunks)
- lib/memory/src/torch.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (11)
📚 Learning: 2025-10-17T22:57:12.526Z
Learnt from: oandreeva-nv
Repo: ai-dynamo/dynamo PR: 3700
File: lib/llm/src/block_manager.rs:19-19
Timestamp: 2025-10-17T22:57:12.526Z
Learning: The dynamo library (ai-dynamo/dynamo repository) is currently developed to run on Linux only, so Linux-specific code and syscalls do not require platform guards.
Applied to files:
Cargo.toml, lib/config/Cargo.toml, lib/memory/Cargo.toml
📚 Learning: 2025-08-18T20:51:51.324Z
Learnt from: PeaBrane
Repo: ai-dynamo/dynamo PR: 2465
File: lib/runtime/src/pipeline/network/egress/push_router.rs:0-0
Timestamp: 2025-08-18T20:51:51.324Z
Learning: The runtime crate cannot depend on the llm crate due to architectural dependency constraints, preventing imports from lib/llm into lib/runtime.
Applied to files:
Cargo.toml
📚 Learning: 2025-08-15T02:01:01.238Z
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 2453
File: deploy/dynamo_check.py:739-741
Timestamp: 2025-08-15T02:01:01.238Z
Learning: In the dynamo project, missing cargo should be treated as a fatal error in dynamo_check.py because developers need cargo to build the Rust components. The tool is designed to validate the complete development environment, not just import functionality.
Applied to files:
Cargo.toml, lib/memory/Cargo.toml
📚 Learning: 2025-10-17T22:57:18.773Z
Learnt from: oandreeva-nv
Repo: ai-dynamo/dynamo PR: 3700
File: lib/llm/src/block_manager/numa_allocator.rs:4-6
Timestamp: 2025-10-17T22:57:18.773Z
Learning: The Dynamo library (lib/llm) is developed to run on Linux only, so platform-specific cfg guards for Linux-only features are not required.
Applied to files:
Cargo.toml
📚 Learning: 2025-09-22T18:09:23.513Z
Learnt from: KrishnanPrash
Repo: ai-dynamo/dynamo PR: 3165
File: components/backends/sglang/src/dynamo/sglang/args.py:201-202
Timestamp: 2025-09-22T18:09:23.513Z
Learning: KrishnanPrash suggested adding early validation for custom Jinja template paths in the Rust layer (lib/bindings/python/rust/lib.rs) to benefit both vLLM and SGLang workflows, using PathBuf::from() and path.exists() checks with appropriate PyFileNotFoundError handling.
Applied to files:
lib/memory/src/disk.rs
📚 Learning: 2025-09-11T03:24:47.820Z
Learnt from: kthui
Repo: ai-dynamo/dynamo PR: 3004
File: lib/runtime/src/pipeline/network/ingress/push_handler.rs:271-277
Timestamp: 2025-09-11T03:24:47.820Z
Learning: In lib/runtime/src/pipeline/network/ingress/push_handler.rs, the maintainer prefers to keep the existing error comparison logic using format!("{:?}", err) == STREAM_ERR_MSG unchanged until proper error types are implemented, even though it has technical debt. Avoid suggesting changes to working legacy code that will be refactored later.
Applied to files:
lib/memory/src/disk.rs, lib/memory/src/actions.rs
📚 Learning: 2025-09-22T18:09:23.513Z
Learnt from: KrishnanPrash
Repo: ai-dynamo/dynamo PR: 3165
File: components/backends/sglang/src/dynamo/sglang/args.py:201-202
Timestamp: 2025-09-22T18:09:23.513Z
Learning: Both vLLM and SGLang implementations of --custom-jinja-template pass the path directly to register_llm() without any validation. KrishnanPrash suggested implementing early validation in the Rust layer (lib/bindings/python/rust/lib.rs) using PathBuf::from() and path.exists() checks with PyFileNotFoundError for consistent error handling across both backends.
Applied to files:
lib/memory/src/disk.rs
📚 Learning: 2025-09-19T18:21:03.693Z
Learnt from: GuanLuo
Repo: ai-dynamo/dynamo PR: 2746
File: lib/llm/src/grpc/service/tensor.rs:269-447
Timestamp: 2025-09-19T18:21:03.693Z
Learning: In lib/llm/src/grpc/service/tensor.rs, GuanLuo prefers using unsafe Vec::from_raw_parts operations to avoid data copying for performance reasons when converting raw tensor data from Vec<u8> to typed vectors like Vec<u32>, Vec<f32>, etc., even when it involves safety risks.
Applied to files:
lib/memory/src/torch.rs
📚 Learning: 2025-09-18T21:49:28.906Z
Learnt from: oandreeva-nv
Repo: ai-dynamo/dynamo PR: 2989
File: lib/llm/src/block_manager/block/transfer/cuda.rs:416-420
Timestamp: 2025-09-18T21:49:28.906Z
Learning: Pinned memory (page-locked host memory) being accessible by GPU via DMA doesn't change the fact that host and device pointers exist in separate virtual address spaces, so overlap checks between host and device pointers are still invalid regardless of whether the host memory is pinned.
Applied to files:
lib/memory/src/pinned.rs
📚 Learning: 2025-08-05T22:51:59.230Z
Learnt from: dmitry-tokarev-nv
Repo: ai-dynamo/dynamo PR: 2300
File: pyproject.toml:64-66
Timestamp: 2025-08-05T22:51:59.230Z
Learning: The ai-dynamo/dynamo project does not ship ARM64 wheels, so platform markers to restrict dependencies to x86_64 are not needed in pyproject.toml dependencies.
Applied to files:
lib/config/Cargo.toml
📚 Learning: 2025-08-30T20:43:10.091Z
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 2797
File: .devcontainer/devcontainer.json:12-12
Timestamp: 2025-08-30T20:43:10.091Z
Learning: In the dynamo project, devcontainer.json files use templated container names (like "dynamo-vllm-devcontainer") that are automatically processed by the copy_devcontainer.sh script to generate framework-specific configurations with unique names, preventing container name collisions.
Applied to files:
lib/config/Cargo.toml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: sglang (amd64)
- GitHub Check: vllm (arm64)
- GitHub Check: trtllm (arm64)
- GitHub Check: trtllm (amd64)
- GitHub Check: operator (arm64)
- GitHub Check: operator (amd64)
- GitHub Check: sglang (arm64)
- GitHub Check: vllm (amd64)
- GitHub Check: Build and Test - dynamo
- GitHub Check: tests (lib/runtime/examples)
- GitHub Check: tests (launch/dynamo-run)
- GitHub Check: tests (lib/bindings/python)
- GitHub Check: tests (.)
- GitHub Check: Mirror Repository to GitLab
🔇 Additional comments (3)
lib/memory/src/pinned.rs (2)
117-119: Clarify nixl_descriptor design intent.

`nixl_descriptor()` returns `None`, but `NixlCompatible` is implemented below (lines 123-128), suggesting NIXL registration is supported. Is this intentional? If pinned memory requires explicit NIXL registration via the `NixlCompatible` interface rather than automatic descriptor creation, consider adding a comment explaining this design choice.
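The two-path design the comment asks about can be sketched as follows. Trait and method names mirror the review text only; field layouts and signatures are assumptions, not the crate's verified API:

```rust
// Hypothetical sketch: automatic descriptor exposure vs. explicit
// registration. Returning None from nixl_descriptor() signals "no
// descriptor until the caller registers via NixlCompatible".
#[derive(Debug, Clone, PartialEq)]
struct NixlDescriptor {
    addr: u64,
    size: usize,
}

trait MemoryDescription {
    // Default: storage is not automatically NIXL-visible.
    fn nixl_descriptor(&self) -> Option<NixlDescriptor> {
        None
    }
}

trait NixlCompatible: MemoryDescription {
    // Parameters needed for an explicit registration call.
    fn nixl_params(&self) -> (u64, usize);
}

struct PinnedStorage {
    addr: u64,
    size: usize,
}

impl MemoryDescription for PinnedStorage {} // keeps the default None

impl NixlCompatible for PinnedStorage {
    fn nixl_params(&self) -> (u64, usize) {
        (self.addr, self.size)
    }
}

fn main() {
    let p = PinnedStorage { addr: 0x2000, size: 4096 };
    assert_eq!(p.nixl_descriptor(), None); // nothing until registered
    assert_eq!(p.nixl_params(), (0x2000, 4096)); // explicit path
    println!("ok");
}
```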
130-146: LGTM: Overflow protection properly implemented.

The previous overflow concern has been correctly addressed using `checked_add` (lines 132-134). The bounds checking and unsafe operations are now properly protected.

lib/memory/src/lib.rs (1)
108-151: LGTM: Clean type erasure pattern.

The `Buffer` wrapper around `Arc<dyn MemoryDescription>` provides an excellent type-erased abstraction. The `Deref` implementation enables transparent access, and the `create_buffer` helper offers good ergonomics for wrapping concrete storage types.
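The pattern praised here is the standard `Arc<dyn Trait>` + `Deref` idiom; a minimal sketch, with trait methods and a `SystemStorage` stand-in assumed for illustration:

```rust
use std::fmt::Debug;
use std::sync::Arc;

// Trait object behind the erased buffer; methods are assumed from the
// review's description, not copied from the crate.
trait MemoryDescription: Debug + Send + Sync {
    fn addr(&self) -> usize;
    fn size(&self) -> usize;
}

#[derive(Debug)]
struct SystemStorage {
    addr: usize,
    size: usize,
}

impl MemoryDescription for SystemStorage {
    fn addr(&self) -> usize {
        self.addr
    }
    fn size(&self) -> usize {
        self.size
    }
}

// Type-erased, cheaply cloneable handle over any concrete storage.
#[derive(Clone, Debug)]
struct Buffer(Arc<dyn MemoryDescription>);

impl std::ops::Deref for Buffer {
    type Target = dyn MemoryDescription;
    fn deref(&self) -> &Self::Target {
        &*self.0
    }
}

fn create_buffer(storage: impl MemoryDescription + 'static) -> Buffer {
    Buffer(Arc::new(storage))
}

fn main() {
    let buf = create_buffer(SystemStorage { addr: 0x1000, size: 4096 });
    // Deref makes trait methods available directly on Buffer.
    assert_eq!(buf.size(), 4096);
    assert_eq!(buf.addr(), 0x1000);
    let clone = buf.clone(); // bumps the Arc refcount, no storage copy
    assert_eq!(clone.size(), 4096);
    println!("ok");
}
```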
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Ryan Olson <ryanolson@users.noreply.github.com>
Signed-off-by: Ryan Olson <rolson@nvidia.com>
Summary by CodeRabbit
New Features
Tests
Chores