[Graph Optimization] SOT+CUDAGraph support ERNIE4.5T VL 28B / 424B #4645

DrRyanHuang · 2025-10-29T05:32:47Z

Motivation

SOT+CUDAGraph 跑通 ERNIE4.5T VL 28B / 424B
前置PR: #4610

Modifications

fastdeploy/model_executor/models/ernie4_5_vl/ernie4_5_vl_moe.py
添加 SOT+CUDAGraph 运行 ERNIE4.5T VL 28B 的单测

Usage or Command

MODEL=/workspace/EB_MODELS/ERNIE-4.5-VL-28B-A3B-Paddle
MODEL=/workspace/EB_MODELS/ERNIE-4.5-VL-424B-A47B-Paddle
rm -rf log/*

export FLAGS_cuda_graph_blacklist="custom_op.static_op_append_attention_with_output_"
export CUDA_VISIBLE_DEVICES=0,1,2,3
export PORT=39905

python -m fastdeploy.entrypoints.openai.api_server \
  --model $MODEL \
  --metrics-port 39717 \
  --port $PORT \
  --engine-worker-queue-port 39719 \
  --tensor-parallel-size 4 \
  --max-model-len 32768 \
  --max-num-seqs 128 \
  --quantization wint4 \
  --enable-mm \
  --graph-optimization-config '{"graph_opt_level": 1, "use_cudagraph": true, "full_cuda_graph": false}'

Accuracy Tests

NULL

Checklist

Add at least a tag in the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code, run pre-commit before commit.
Add unit tests. Please write the reason in this PR if no unit tests.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot · 2025-10-29T05:33:06Z

Thanks for your contribution!

EmmonsCurse · 2025-10-29T06:23:45Z

You may need to update your branch to resolve errors.

gongshaotian

LGTM

gongshaotian · 2025-10-29T06:32:13Z

fastdeploy/model_executor/models/ernie4_5_vl/ernie4_5_vl_moe.py

        if self.num_shared_experts > 0:
            shared_experts_out = self.shared_experts(hidden_states)
-        hidden_states, vl_moe_meta.text_input, vl_moe_meta.image_input = text_image_gather_scatter(
+        hidden_states, text_input, image_input = text_image_gather_scatter(


这里和xpu那个修改会冲突吗

XPU 的 #4636 合入后，这个再合入就没问题

gongshaotian

LGTM

45TVL support sot+CUDAGraph

95d3c2d

DrRyanHuang requested review from EmmonsCurse, SigureMo, gongshaotian, jiangjiajun and zyfncg October 29, 2025 05:32

gongshaotian previously approved these changes Oct 29, 2025

View reviewed changes

gongshaotian reviewed Oct 29, 2025

View reviewed changes

SigureMo approved these changes Oct 29, 2025

View reviewed changes

mv unitest from ce_deploy 2 e2e

ce2aa67

DrRyanHuang dismissed gongshaotian’s stale review via ce2aa67 October 29, 2025 07:38

DrRyanHuang added 5 commits October 29, 2025 15:38

Merge branch 'develop' into 45t_vl_support_sot_cudagraph

1c4a9e1

add test_EB_VL_Lite_sot_serving

b2f6e70

rm useless line

5e19d19

add openai_client

15d7d91

fix unitest && reduce computing resources

ff6d671

EmmonsCurse approved these changes Oct 31, 2025

View reviewed changes

gongshaotian reviewed Oct 31, 2025

View reviewed changes

gongshaotian approved these changes Oct 31, 2025

View reviewed changes

gongshaotian merged commit 28de91b into PaddlePaddle:develop Oct 31, 2025
24 of 31 checks passed

gongshaotian deleted the 45t_vl_support_sot_cudagraph branch October 31, 2025 03:38

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Graph Optimization] SOT+CUDAGraph support ERNIE4.5T VL 28B / 424B #4645

[Graph Optimization] SOT+CUDAGraph support ERNIE4.5T VL 28B / 424B #4645

Uh oh!

DrRyanHuang commented Oct 29, 2025

Uh oh!

paddle-bot bot commented Oct 29, 2025

Uh oh!

EmmonsCurse commented Oct 29, 2025

Uh oh!

gongshaotian left a comment

Uh oh!

gongshaotian Oct 29, 2025

Uh oh!

DrRyanHuang Oct 29, 2025

Uh oh!

gongshaotian left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

[Graph Optimization] SOT+CUDAGraph support ERNIE4.5T VL 28B / 424B #4645

[Graph Optimization] SOT+CUDAGraph support ERNIE4.5T VL 28B / 424B #4645

Uh oh!

Conversation

DrRyanHuang commented Oct 29, 2025

Motivation

Modifications

Usage or Command

Accuracy Tests

Checklist

Uh oh!

paddle-bot bot commented Oct 29, 2025

Uh oh!

EmmonsCurse commented Oct 29, 2025

Uh oh!

gongshaotian left a comment

Choose a reason for hiding this comment

Uh oh!

gongshaotian Oct 29, 2025

Choose a reason for hiding this comment

Uh oh!

DrRyanHuang Oct 29, 2025

Choose a reason for hiding this comment

Uh oh!

gongshaotian left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants