Conversation


@zhongdaor-nv zhongdaor-nv commented Oct 29, 2025

Overview:

This PR enables the completion endpoint to accept arrays of prompts and generate multiple completions per prompt.

Details:

  • Add utility functions to handle prompt arrays (get_prompt_batch_size, extract_single_prompt)
  • Implement batch processing in HTTP handler with proper choice index remapping
  • Add validation for total choices (batch_size × n ≤ 128); see the sketch after this list
  • Generate unique request_id for each prompt to avoid conflicts
  • Add comprehensive tests for batch prompts and n parameter combinations
  • Maintain backward compatibility with single prompt requests
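
A minimal sketch of that limit check, using only the names given in this PR (MAX_TOTAL_CHOICES, validate_total_choices); the real signature and error type in validate.rs may differ:

```rust
/// Upper bound on total choices per request (batch_size × n), per this PR.
pub const MAX_TOTAL_CHOICES: usize = 128;

/// Reject requests whose total choice count would exceed the cap.
/// `n` defaults to 1 when the client omits it.
pub fn validate_total_choices(batch_size: usize, n: Option<u8>) -> Result<(), String> {
    let n = n.map(usize::from).unwrap_or(1);
    let total = batch_size.saturating_mul(n);
    if total > MAX_TOTAL_CHOICES {
        return Err(format!(
            "total choices {total} (batch_size {batch_size} × n {n}) exceeds the maximum of {MAX_TOTAL_CHOICES}"
        ));
    }
    Ok(())
}
```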

Where should the reviewer start?

lib/llm/src/protocols/openai/completions.rs - contains the new validation logic and utility functions
lib/llm/src/http/service/openai.rs - contains the batch processing implementation with choice index remapping

Related Issues:

  • Closes #xxx

Summary by CodeRabbit

New Features

  • Added support for batch processing of multiple prompts in a single completion request.
  • Introduced validation to enforce a maximum limit of 128 total choices per batch operation.

Bug Fixes

  • Fixed handling of multi-element prompt arrays: requests that previously returned server errors now complete successfully with valid responses.

Enable completion endpoint to accept arrays of prompts and generate
n completions per prompt, matching vLLM behavior.

Choice index formula matches vLLM: final_index = prompt_idx * n + choice_idx
Example: 3 prompts with n=2 yields indices 0,1 (prompt0), 2,3 (prompt1), 4,5 (prompt2)
Signed-off-by: zhongdaor <zhongdaor@nvidia.com>
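
A hedged sketch of that remapping (the variable names here are illustrative; the handler's actual streaming code is more involved):

```rust
// Remap per-prompt choice indices into one flat, vLLM-compatible index space:
// final_index = prompt_idx * n + choice_idx.
fn remap_choice_index(prompt_idx: u32, n: u32, choice_idx: u32) -> u32 {
    prompt_idx * n + choice_idx
}

fn main() {
    // 3 prompts with n = 2 yields indices 0,1 / 2,3 / 4,5, as stated above.
    let (num_prompts, n) = (3u32, 2u32);
    for prompt_idx in 0..num_prompts {
        for choice_idx in 0..n {
            println!(
                "prompt {prompt_idx}, choice {choice_idx} -> final index {}",
                remap_choice_index(prompt_idx, n, choice_idx)
            );
        }
    }
}
```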
@zhongdaor-nv zhongdaor-nv marked this pull request as ready for review October 30, 2025 06:01
@zhongdaor-nv zhongdaor-nv requested review from a team as code owners October 30, 2025 06:01
@zhongdaor-nv zhongdaor-nv changed the title Zhongdaor/dis 871 5581615 p0llm nim dynamo feature parity testingllama 33 feat: enable HTTP completion endpoint to accept arrays of prompts and generate multiple completions per prompt Oct 30, 2025
@github-actions github-actions bot added the feat label Oct 30, 2025

coderabbitai bot commented Oct 30, 2025

Walkthrough

The pull request implements batch-aware handling for LLM completions by introducing detection logic that routes single-prompt and multi-prompt requests through dedicated code paths. Batch utilities extract and validate prompts, enforce a total choices limit, and support per-prompt choice remapping with streaming and annotation handling.

Changes

  • HTTP Service Batch Routing (lib/llm/src/http/service/openai.rs): Splits completions handling into completions_batch (multi-prompt) and completions_single (single-prompt) functions, with the main completions function delegating based on batch size detection. Handles per-prompt request setup, choice index remapping, stream merging, and metrics collection.
  • Protocol Layer Batch Utilities (lib/llm/src/protocols/openai/completions.rs): Adds get_prompt_batch_size and extract_single_prompt utilities for batch operations (see the sketch after this list). Extends NvCreateCompletionRequest validation to check total choices via batch size. Implements raw_prompt() on the NvExtProvider trait to expose prompt data when enabled.
  • Validation Enforcement (lib/llm/src/protocols/openai/validate.rs): Introduces the MAX_TOTAL_CHOICES constant (128) and a validate_total_choices function to enforce an upper bound on the batch × n product.
  • Test Coverage (lib/llm/tests/openai_completions.rs, tests/frontend/test_completion_mocker_engine.py): Adds batch prompt utility tests covering batch sizing, per-prompt extraction, validation limits, and multi-prompt array handling. Updates the Python test to expect success (200) instead of error (500) for multi-prompt arrays.
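
A minimal sketch of the two utilities named above; the Prompt enum here is a simplified stand-in for the actual request type (which also supports token-array prompts), so the shapes and signatures are assumptions:

```rust
/// Simplified stand-in for the OpenAI completion prompt type; the real
/// type also carries token-array variants.
#[derive(Clone)]
enum Prompt {
    Single(String),
    Batch(Vec<String>),
}

/// Number of prompts in the request: 1 for a single string, the array
/// length for a batch.
fn get_prompt_batch_size(prompt: &Prompt) -> usize {
    match prompt {
        Prompt::Single(_) => 1,
        Prompt::Batch(prompts) => prompts.len(),
    }
}

/// Extract the prompt at `index` as a single-prompt value, so each batch
/// element can be dispatched as its own request with a fresh request_id.
fn extract_single_prompt(prompt: &Prompt, index: usize) -> Option<Prompt> {
    match prompt {
        Prompt::Single(p) if index == 0 => Some(Prompt::Single(p.clone())),
        Prompt::Batch(prompts) => prompts.get(index).cloned().map(Prompt::Single),
        _ => None,
    }
}
```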

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Batch routing logic in service layer with stream merging and choice remapping requires careful verification of index calculations and state handling
  • Cross-field validation integration in NvCreateCompletionRequest needs confirmation of interaction with existing validators
  • Test expectation shift from error to success for multi-prompt arrays should be verified against any related constraints or documentation
  • New trait method raw_prompt() on NvExtProvider should be reviewed for implementation consistency across all implementors

Poem

🐰 Hop, skip, and batch we go,
Multiple prompts in a row,
Choices remapped, streams all flowing,
Validations keeping totals glowing,
Completions batch—a hop-timal show! 🌟

Pre-merge checks

✅ Passed checks (3 passed)
  • Description Check — ✅ Passed. The PR description follows the required template structure, with clear sections for Overview, Details, Where should the reviewer start, and Related Issues. The Overview succinctly explains that the PR enables array prompt support with multiple completions per prompt; the Details section lists specific implementation changes including batch utility functions, validation logic, and test coverage; and the reviewer-starting-point section directs reviewers to the key files. The Related Issues section uses a placeholder (#xxx), a minor gap that does not significantly detract from an otherwise complete description.
  • Docstring Coverage — ✅ Passed. Docstring coverage is 100.00%, which meets the required threshold of 80.00%.
  • Title Check — ✅ Passed. The title "feat: enable HTTP completion endpoint to accept arrays of prompts and generate multiple completions per prompt" directly aligns with the main objective of the changeset: the changes across the HTTP service, validation logic, utility functions, and tests all implement exactly what the title describes. The title is specific, concise, and avoids vague terminology, and the changeset stays focused on this batch capability while preserving backward compatibility for single-prompt requests.

Signed-off-by: zhongdaor <zhongdaor@nvidia.com>