[CPU] Improve INT8 SDPA template #3230

Xia-Weiwen · 2025-10-23T02:52:27Z

This brings about a 1% end-to-end (E2E) improvement when running INT8 ViT on 4 cores.

Benchmark results for INT8 ViT

Before: throughput 930.91
After: throughput 939.79 (0.95% improvement)

(Tested on an Intel(R) Xeon(R) 6972P machine with 4 cores per instance and 15 instances in total on one socket.)

pytorch-bot · 2025-10-23T02:52:30Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3230

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 64eae18 with merge base f3fc5e7:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Valentine233 · 2025-10-23T05:07:55Z

torchao/prototype/inductor/codegen/cpp_int8_sdpa_template.py

-        auto tmp2 = tmp1.round();
-        auto tmp3 = tmp2 + vec_beta1;
+        auto tmp1 = at::vec::fmadd(tmp0, vec_sum_scale, vec_beta1);
+        auto tmp3 = tmp1.round();


Could we also apply this optimization to the masked vectorization part below?

Updated. Thanks.
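For readers following the diff above, here is a minimal scalar sketch of the reordering, not the template code itself. It assumes `std::fma` and `std::nearbyint` (in the default round-to-nearest-even mode) as stand-ins for `at::vec::fmadd` and the vector `round()`; the helper names `round_then_add` and `fma_then_round` are hypothetical and do not appear in the PR.

```cpp
#include <cmath>

// Hypothetical scalar stand-in for the old order:
// multiply by the scale, round, then add the offset.
inline float round_then_add(float x, float scale, float beta) {
    float scaled = x * scale;
    return std::nearbyint(scaled) + beta;
}

// Hypothetical scalar stand-in for the new order:
// one fused multiply-add (x * scale + beta), then round.
inline float fma_then_round(float x, float scale, float beta) {
    return std::nearbyint(std::fma(x, scale, beta));
}
```

With an integer-valued `beta`, both helpers return 6 for `x = 3.0f, scale = 0.4f, beta = 5.0f`. They can differ when `x * scale` lands exactly on a .5 tie: for `x = 1.0f, scale = 0.5f, beta = 1.0f` the first returns 1 and the second returns 2 under round-half-to-even. Whether such ties matter for the inputs this template sees is not discussed in this PR.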

Valentine233 · 2025-10-23T05:08:10Z

torchao/prototype/inductor/codegen/cpp_int8_sdpa_template.py

-      auto tmp6 = tmp5.round();
-      auto tmp7 = tmp6 + vec_beta2;
+      auto tmp5 = at::vec::fmadd(tmp4, vec_alpha, vec_beta2);
+      auto tmp7 = tmp5.round();


Updated. Thanks.

Valentine233 · 2025-10-23T05:08:24Z

torchao/prototype/inductor/codegen/cpp_int8_sdpa_template.py

-      auto tmp6 = tmp5.round();
-      auto tmp7 = tmp6 + vec_beta2;
+      auto tmp5 = at::vec::fmadd(tmp4, vec_alpha, vec_beta2);
+      auto tmp7 = tmp5.round();


Updated. Thanks.

Xia-Weiwen · 2025-10-27T05:21:12Z

CC @mingfeima for review. Thanks.

jerryzh168 · 2025-10-28T20:16:50Z

LG, can you include any benchmarking numbers in the summary?

Xia-Weiwen · 2025-10-29T01:30:31Z

> LG, can you include any benchmarking numbers in the summary?

Thanks. I have added numbers.

[CPU] Improve INT8 SDPA template

6c7a03a

Xia-Weiwen requested a review from Valentine233 October 23, 2025 02:52

meta-cla bot added the CLA Signed label Oct 23, 2025

Xia-Weiwen added the topic: not user facing label Oct 23, 2025

Valentine233 reviewed Oct 23, 2025


Update tail

64eae18

Xia-Weiwen requested a review from Valentine233 October 27, 2025 05:20

Xia-Weiwen marked this pull request as ready for review October 27, 2025 05:21

mingfeima approved these changes Oct 27, 2025


Valentine233 approved these changes Oct 27, 2025


Xia-Weiwen requested a review from jerryzh168 October 27, 2025 06:26

jerryzh168 approved these changes Oct 28, 2025


Xia-Weiwen merged commit 3577306 into pytorch:main Oct 29, 2025
18 checks passed
