feat: enable HTTP completion endpoint to accept arrays of prompts and generate multiple completions per prompt #3953
base: main
Conversation
Enable the completion endpoint to accept arrays of prompts and generate `n` completions per prompt, matching vLLM behavior.

- Add utility functions to handle prompt arrays (`get_prompt_batch_size`, `extract_single_prompt`)
- Implement batch processing in the HTTP handler with proper choice index remapping
- Add validation for total choices (batch_size × n ≤ 128)
- Generate a unique request_id for each prompt to avoid conflicts
- Add comprehensive tests for batch prompts and `n` parameter combinations
- Maintain backward compatibility with single-prompt requests

The choice index formula matches vLLM: `final_index = prompt_idx * n + choice_idx`. Example: 3 prompts with n=2 yields indices 0,1 (prompt 0), 2,3 (prompt 1), 4,5 (prompt 2).
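The remapping formula above can be sketched as follows; this is a minimal illustration of the index math described in the PR, not the actual implementation, and the function name is hypothetical:

```rust
// Hypothetical sketch of the choice index remapping described in the PR:
// final_index = prompt_idx * n + choice_idx, where `n` is the number of
// completions requested per prompt and both indices are zero-based.
fn remap_choice_index(prompt_idx: usize, n: usize, choice_idx: usize) -> usize {
    prompt_idx * n + choice_idx
}

fn main() {
    // 3 prompts with n = 2 yield indices 0,1 / 2,3 / 4,5.
    let mut indices = Vec::new();
    for prompt_idx in 0..3 {
        for choice_idx in 0..2 {
            indices.push(remap_choice_index(prompt_idx, 2, choice_idx));
        }
    }
    assert_eq!(indices, vec![0, 1, 2, 3, 4, 5]);
    println!("{:?}", indices);
}
```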
Signed-off-by: zhongdaor <zhongdaor@nvidia.com>
…eature-parity-testingllama-33
Signed-off-by: zhongdaor <zhongdaor@nvidia.com>
Walkthrough
The pull request implements batch-aware handling for LLM completions by introducing detection logic that routes single-prompt and multi-prompt requests through dedicated code paths. Batch utilities extract and validate prompts, enforce a total-choices limit, and support per-prompt choice remapping with streaming and annotation handling.
Overview:
This PR enables the completion endpoint to accept arrays of prompts and generate multiple completions per prompt.
Details:
Where should the reviewer start?
lib/llm/src/protocols/openai/completions.rs - contains the new validation logic and utility functions
lib/llm/src/http/service/openai.rs - contains the batch processing implementation with choice index remapping
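The utilities and validation mentioned above could look roughly like the following sketch. The function names mirror the PR description, but the `Prompt` type, signatures, and the error type are assumptions for illustration:

```rust
// Hypothetical sketch of the batch utilities and the total-choices check
// described in the PR; not the actual code in completions.rs.
const MAX_TOTAL_CHOICES: usize = 128;

/// A completion prompt may be a single string or an array of strings.
enum Prompt {
    Single(String),
    Batch(Vec<String>),
}

/// Number of prompts in the request (1 for the single-prompt path).
fn get_prompt_batch_size(prompt: &Prompt) -> usize {
    match prompt {
        Prompt::Single(_) => 1,
        Prompt::Batch(prompts) => prompts.len(),
    }
}

/// Pull out the prompt at `idx` so each one can be dispatched separately.
fn extract_single_prompt(prompt: &Prompt, idx: usize) -> Option<&str> {
    match prompt {
        Prompt::Single(s) if idx == 0 => Some(s.as_str()),
        Prompt::Batch(prompts) => prompts.get(idx).map(String::as_str),
        _ => None,
    }
}

/// Enforce batch_size * n <= 128 before fanning out per-prompt requests.
fn validate_total_choices(batch_size: usize, n: usize) -> Result<(), String> {
    if batch_size * n > MAX_TOTAL_CHOICES {
        Err(format!(
            "batch_size ({batch_size}) * n ({n}) exceeds limit of {MAX_TOTAL_CHOICES}"
        ))
    } else {
        Ok(())
    }
}

fn main() {
    let single = Prompt::Single("hello".into());
    let batch = Prompt::Batch(vec!["a".into(), "b".into(), "c".into()]);
    assert_eq!(get_prompt_batch_size(&single), 1);
    assert_eq!(get_prompt_batch_size(&batch), 3);
    assert_eq!(extract_single_prompt(&batch, 1), Some("b"));
    assert!(validate_total_choices(3, 2).is_ok());
    assert!(validate_total_choices(64, 3).is_err());
    println!("ok");
}
```

A per-prompt `extract_single_prompt` keeps the downstream single-prompt engine path unchanged, which is how the PR preserves backward compatibility.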
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)