Description
Summary
Depthwise backward-weights on AArch64 SVE-256 produces incorrect results for strided, padded cases (e.g., C=24, Kh=3, Sh=2, Ph=1). The PyTorch test TestConvolutionNN.test_Conv2d_OneDNN fails, while benchdnn does not flag the issue. A regression gtest that compares the legacy blocked-oh path against a new per-row path exposes the defect.
Fix merged in PR #4081.
cc @Sqvid
Version
- oneDNN v3.9.1 (commit 80a3a8e745d2f0186e674b0af9332fd6e074c94f)
- Also reproduced with oneDNN v3.7.1
Environment
- CPU: AArch64 SVE (256-bit) (Neoverse V1)
- oneDNN runtime: OpenMP, nthr=32
- PyTorch (arm/aarch64 build) using the oneDNN backend
- Python 3.10
Steps to reproduce
1) PyTorch unit test (fails)
# ONEDNN_VERBOSE=all captures the implementation name and commit
export ONEDNN_VERBOSE=all
python pytorch/test/nn/test_convolution.py TestConvolutionNN.test_Conv2d_OneDNN
Typical verbose snippet at failure:
onednn_verbose,v1,info,oneDNN v3.9.1 (commit 80a3a8e...)
onednn_verbose,v1,primitive,exec,cpu,convolution,jit_dw:sve_256,forward_training,...
onednn_verbose,v1,primitive,exec,cpu,convolution,jit_dw:sve_256,backward_weights,...
g24mb1_ic24oc24_ih6oh3kh3sh2ph1_iw6ow3kw3sw2pw1
2) Detailed C++ gtest reproduction steps
Start from oneDNN v3.9.1 (commit 80a3a8e745d2f0186e674b0af9332fd6e074c94f) on AArch64 SVE-256 (Neoverse V1).
Prerequisites
- Replace tests/gtests/test_convolution_backward_weights_dw_compare.cpp and src/cpu/aarch64/jit_uni_dw_convolution.cpp with the supplied versions (attachments).
- File: tests/gtests/test_convolution_backward_weights_dw_compare.cpp (attachment)
  - Compares the legacy vs new AArch64 DW BWD_W paths (env-switchable):
    - ONEDNN_AARCH64_DW_BWDW_USE_OLD=1 → legacy path
    - unset → new per-row path
- Descriptor used: g24mb1_ic24ih8iw8_oc24oh4ow4_kh3kw3_sh2sw2_ph1pw1
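As a sanity check on the two descriptors (the PyTorch config from the verbose log and the gtest descriptor), the output heights follow from the usual convolution size formula. The arithmetic also shows that, with these sizes, the bottom padding implied by oh is 0 in both cases, so only the top padding row is ever touched by the kernel window. This is a minimal sketch assuming no dilation; `conv_out_dim` and `implied_bottom_pad` are illustrative helpers, not oneDNN API.

```cpp
// Spatial size arithmetic for one dimension of a convolution, assuming
// no dilation (neither descriptor above uses any).

// Output extent: out = (in + pad_top + pad_bottom - k) / stride + 1
constexpr int conv_out_dim(int in, int k, int stride, int pad_top,
                           int pad_bottom) {
    return (in + pad_top + pad_bottom - k) / stride + 1;
}

// Bottom padding actually reached by the last output row:
//   pad_bottom = (out - 1) * stride + k - in - pad_top
constexpr int implied_bottom_pad(int in, int out, int k, int stride,
                                 int pad_top) {
    return (out - 1) * stride + k - in - pad_top;
}

// PyTorch config (ih6 oh3 kh3 sh2 ph1): symmetric padding of 1 gives oh = 3 ...
static_assert(conv_out_dim(6, 3, 2, 1, 1) == 3, "PyTorch config: oh3");
// ... but the last output row never reaches the bottom padding row.
static_assert(implied_bottom_pad(6, 3, 3, 2, 1) == 0, "no bottom pad used");

// gtest descriptor (ih8 oh4 kh3 sh2 ph1).
static_assert(conv_out_dim(8, 3, 2, 1, 1) == 4, "gtest config: oh4");
static_assert(implied_bottom_pad(8, 4, 3, 2, 1) == 0, "no bottom pad used");
```

Because the checks are `static_assert`s, a wrong shape assumption fails at compile time rather than at run time.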
Build configuration
# Configure with tests enabled
cmake -S . -B build -DDNNL_BUILD_TESTS=ON
# Rebuild so both the kernel and gtest pick up changes
cmake --build build --target all -- -j$(nproc)
Run regression test
cd build && ctest -V -R test_convolution_backward_weights_dw_compare
Optional: benchdnn verification
ONEDNN_VERBOSE=all ./build/tests/benchdnn/benchdnn --conv --dir=BWD_W --fast-ref=false g24mb1_ic24ih8iw8_oc24oh4ow4_kh3kw3_sh2sw2_ph1pw1
Logs & diff evidence
- Each run writes depthwise_bwdw_compare.log next to the binary (build/tests/gtests/depthwise_bwdw_compare.log).
- The log header shows both implementation IDs, the benchdnn descriptor, and a replay command (see tests/gtests/test_convolution_backward_weights_dw_compare.cpp:186-201).
Observed behavior
- PyTorch test failure:
  AssertionError: Tensor-likes are not close! Mismatched elements: 72 / 216 (33.3%) Greatest absolute difference: 3.0
- oneDNN chooses jit_dw:sve_256 for both FWD and BWD_W on the above config.
- The gtest A/B comparison shows that the legacy path accumulates extra bottom-row contributions in strided, padded cases (duplicate accumulation at tile boundaries). The new per-row path matches a naïve reference.
- benchdnn did not reproduce the mismatch (even with --fast-ref=false and buffer replay).
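The naïve reference can be written as a direct loop: for each kernel tap (i, j), every output row y reads input row y*sh - ph + i exactly once, and taps that land in padding are skipped, so no row contributes twice. The sketch below assumes a plain dense layout with one channel per group (G = IC = OC) rather than oneDNN's blocked layouts; `DwShape` and `dw_bwd_weights_ref` are hypothetical names, not the reference used in the attached gtest.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical shape descriptor for a depthwise convolution (G == IC == OC),
// dense layouts: src [mb][g][ih][iw], diff_dst [mb][g][oh][ow],
// diff_wei [g][kh][kw].
struct DwShape {
    int mb, g, ih, iw, oh, ow, kh, kw, sh, sw, ph, pw;
};

// Naive depthwise backward-weights:
//   diff_wei[c][i][j] = sum_{n,y,x} src[n][c][y*sh - ph + i][x*sw - pw + j]
//                                   * diff_dst[n][c][y][x]
// Taps that fall into padding contribute nothing and are skipped.
std::vector<float> dw_bwd_weights_ref(const DwShape &s,
                                      const std::vector<float> &src,
                                      const std::vector<float> &diff_dst) {
    assert(src.size() == static_cast<std::size_t>(s.mb) * s.g * s.ih * s.iw);
    assert(diff_dst.size()
            == static_cast<std::size_t>(s.mb) * s.g * s.oh * s.ow);
    std::vector<float> diff_wei(static_cast<std::size_t>(s.g) * s.kh * s.kw,
                                0.f);
    for (int n = 0; n < s.mb; ++n) {
        for (int c = 0; c < s.g; ++c) {
            for (int i = 0; i < s.kh; ++i) {
                for (int j = 0; j < s.kw; ++j) {
                    float acc = 0.f;
                    for (int y = 0; y < s.oh; ++y) {
                        const int iy = y * s.sh - s.ph + i;
                        if (iy < 0 || iy >= s.ih) continue; // top/bottom pad
                        for (int x = 0; x < s.ow; ++x) {
                            const int ix = x * s.sw - s.pw + j;
                            if (ix < 0 || ix >= s.iw) continue; // left/right pad
                            acc += src[((n * s.g + c) * s.ih + iy) * s.iw + ix]
                                 * diff_dst[((n * s.g + c) * s.oh + y) * s.ow
                                            + x];
                        }
                    }
                    diff_wei[(c * s.kh + i) * s.kw + j] += acc;
                }
            }
        }
    }
    return diff_wei;
}
```

With all-ones `src` and `diff_dst` on the gtest shape (ih8, oh4, k3, s2, p1), each weight equals the number of in-bounds taps: kernel row 0 hits the top padding once, so the top-left weight is 3 × 3 = 9, while interior taps such as (1, 1) see 4 × 4 = 16.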
Workaround validated: removing the AArch64 JIT BWD_W (SVE-256) implementation from the CPU convolution implementation list avoids the failure. The fallback implementation then passes, as it already does on Neoverse N1 and Neoverse V2.
Expected behavior
Backward-weights results should match the naïve reference (and the PyTorch path with mkldnn disabled) with zero elementwise differences for these configs.
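A comparison that reports the same two quantities PyTorch prints on failure (mismatched element count and greatest absolute difference) makes the "zero elementwise differences" criterion concrete. For the PyTorch config, 216 = 24 × 3 × 3 is the size of the depthwise weight tensor (g24, kh3, kw3). This is a minimal sketch; `CompareResult` and `compare_exact` are hypothetical names, and an absolute tolerance of 0.0 is assumed to express the zero-difference expectation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Summary of an elementwise comparison, mirroring PyTorch's failure message
// ("Mismatched elements" and "Greatest absolute difference").
struct CompareResult {
    std::size_t mismatched = 0;
    std::size_t total = 0;
    double max_abs_diff = 0.0;
};

// Compare an implementation's diff_weights against a reference.
// atol = 0.0 requires every element to be equal.
CompareResult compare_exact(const std::vector<float> &got,
                            const std::vector<float> &ref,
                            double atol = 0.0) {
    assert(got.size() == ref.size());
    CompareResult r;
    r.total = ref.size();
    for (std::size_t k = 0; k < ref.size(); ++k) {
        const double d = std::fabs(static_cast<double>(got[k]) - ref[k]);
        // Written as !(d <= atol) so that a NaN difference counts as a mismatch.
        if (!(d <= atol)) ++r.mismatched;
        // NaN differences do not update the maximum.
        if (d > r.max_abs_diff) r.max_abs_diff = d;
    }
    return r;
}
```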
Additional notes
- After applying the fix from PR #4081 ("cpu: aarch64: top padding stride inclusion, jit 256 depthwise bw kernel"), the PyTorch unit tests and nightly suite pass, and the attached gtest shows that the new path matches the reference.
- Toggle: ONEDNN_AARCH64_DW_BWDW_USE_OLD=1 (legacy) vs unset (new).
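An environment toggle like this is commonly read once with `std::getenv`. The sketch below assumes the flag selects the legacy path only when set to exactly "1", with any other value or an unset variable selecting the new path. `use_legacy_dw_bwdw_path` is a hypothetical name, and the attached files may parse the variable differently.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper: true when ONEDNN_AARCH64_DW_BWDW_USE_OLD is exactly
// "1" (legacy path). Unset or any other value selects the new per-row path.
bool use_legacy_dw_bwdw_path() {
    const char *v = std::getenv("ONEDNN_AARCH64_DW_BWDW_USE_OLD");
    return v != nullptr && std::strcmp(v, "1") == 0;
}
```

Matching the exact string keeps accidental values such as "0" or "" from silently enabling the legacy path.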
Attachments
- src/cpu/aarch64/jit_uni_dw_convolution.cpp: kernel version with the legacy/new path toggle
- tests/gtests/test_convolution_backward_weights_dw_compare.cpp: gtest repro (includes the env flag to toggle old/new)