cpu: rv64: postops: add rvv postops binary op support via a primitive-based approach #4029
Hi @mgouicem, could you please take a look at this PR when you have time? Thank you so much!
Thanks for the contribution @zhangjian29.
In general relying on primitives and splitting post-ops init and execute makes sense.
However, this PR currently calls init and creates primitives related to post-ops (binary) within the main primitive's execute calls (pooling), which is contrary to oneDNN external API principles (by the time execute is called, there should be no creation overhead left).
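The principle in this comment can be illustrated with a minimal, portable sketch. All names here (binary_add_t, pooling_like_t, creations_after_runs) are hypothetical stand-ins and not the oneDNN API; the sketch only assumes that nested post-op primitives are built once at creation time and merely run at execution time.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a binary post-op primitive (not the oneDNN API).
struct binary_add_t {
    static int created; // counts constructions so creation overhead is visible
    binary_add_t() { ++created; }
    void execute(float *dst, const float *src1, std::size_t n) const {
        for (std::size_t i = 0; i < n; ++i) dst[i] += src1[i];
    }
};
int binary_add_t::created = 0;

// Hypothetical main primitive (think: pooling) that owns its post-op.
struct pooling_like_t {
    std::unique_ptr<binary_add_t> post_op;

    // Creation time: build every nested primitive exactly once.
    void init() { post_op = std::make_unique<binary_add_t>(); }

    // Execution time: only run what init() prepared; nothing is created here.
    void execute(std::vector<float> &dst, const std::vector<float> &src1) const {
        assert(post_op && "init() must run before execute()");
        // ... the main computation would write dst here ...
        post_op->execute(dst.data(), src1.data(), dst.size());
    }
};

// Runs init() once and execute() `runs` times, then reports how many
// post-op primitives were constructed in total.
int creations_after_runs(int runs) {
    binary_add_t::created = 0;
    pooling_like_t p;
    p.init();
    std::vector<float> dst(4, 1.0f), src1(4, 2.0f);
    for (int i = 0; i < runs; ++i) p.execute(dst, src1);
    return binary_add_t::created;
}
```

With this split, creations_after_runs(3) reports a single construction regardless of how many times execute runs, which is the property the comment asks for.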
Thank you so much for your feedback @mgouicem. Ready for review now.
Description
This PR adds binary operation support for rvv_postops.hpp using a primitive-based approach, following the implementation style of acl_post_ops (AArch64) and jit_brgemm_post_ops (x64). The binary operations are implemented by integrating with rvv_binary, introduced in #3899 (already merged).
Refactoring rvv_postops
Background
The current rvv_postops.hpp implementation uses an apply method during execution, which accepts a vfloat32m1_t input and vl and returns a vfloat32m1_t output. However, this kind of fine-grained approach:
Since only two implementations currently use this postop approach (merged: rvv_matmul, unmerged: #3948), making changes at this early stage requires minimal effort.
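For readers unfamiliar with the per-vector calling pattern described in the Background, here is a rough portable sketch. It is not the actual rvv_postops.hpp code: vec_chunk_t and kMaxVl are hypothetical stand-ins for vfloat32m1_t and the hardware vector length, and relu is used only as an example post-op.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Portable stand-in for an RVV register; the real code uses vfloat32m1_t.
constexpr std::size_t kMaxVl = 4; // hypothetical number of lanes
struct vec_chunk_t {
    float lane[kMaxVl];
};

// Fine-grained post-op: transforms one chunk with vl active lanes (relu here)
// and returns the transformed chunk.
vec_chunk_t apply_relu(vec_chunk_t v, std::size_t vl) {
    for (std::size_t i = 0; i < vl; ++i)
        v.lane[i] = std::max(v.lane[i], 0.0f);
    return v;
}

// Each kernel has to call apply on every chunk inside its own
// strip-mined loop, handling the tail through a shorter vl.
void kernel_with_postop(std::vector<float> &data) {
    for (std::size_t off = 0; off < data.size(); off += kMaxVl) {
        const std::size_t vl = std::min(kMaxVl, data.size() - off);
        vec_chunk_t v {};
        std::copy_n(data.begin() + off, vl, v.lane); // "load"
        v = apply_relu(v, vl); // post-op on one register
        std::copy_n(v.lane, vl, data.begin() + off); // "store"
    }
}
```

In this style the post-op logic lives inside each kernel's inner loop, one register at a time, in contrast to a primitive that processes a whole tensor in one call.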
Motivation
To address these limitations, we propose a primitive-based implementation following the acl_post_ops (AArch64) and jit_brgemm_post_ops (x64) patterns for better maintainability.
Key Changes
- init and execute methods (aligned with acl_postops.hpp)
- rvv_binary.hpp integration (future rvv_binary changes won't affect this PR)
- post_ops_ok and apply methods unchanged in a fused way, so other implementations that use the original version remain correct.
- relu (via original methods)
- binary operations (via new primitive methods): add, div, max, min, mul, sub, ge, gt, le, lt, eq, ne, and the ternary select.
- rvv_nchw_pooling
Future Plans
- rvv_matmul and other related code
- post_ops_ok and apply methods safely
- eltwise and sum operations
Checklist
General
- (make test and make test_benchdnn_*) pass locally for each commit?
Performance improvements
We tested the changes using the RISC-V GNU toolchain (14.2) and verified the functionality under the QEMU RISCV64 emulator with the command:
All tests passed with proper dispatching confirmed (relu and linear are dispatched to ref, while the others are dispatched to the RISCV64GCV implementation). Calls to the implemented rvv_nchw_pooling with all kinds of postops can be traced by searching for RISCV64GCV in: