BWDW JIT 256 error reproducible through gtests, but not through benchdnn

# Summary
Depthwise **backward-weights** convolution on AArch64 SVE-256 produces incorrect results for strided, padded cases (e.g., C=24, Kh=3, Sh=2, Ph=1). The PyTorch test `TestConvolutionNN.test_Conv2d_OneDNN` fails, while benchdnn does not flag the issue. A regression gtest that compares the **legacy** blocked-oh path against a new per-row path exposes the defect.  
Fix merged in **PR #4081**.

cc. @Sqvid 

# Version
- oneDNN **v3.9.1** (commit `80a3a8e745d2f0186e674b0af9332fd6e074c94f`)
- Also reproduced with oneDNN **v3.7.1**

# Environment
- CPU: **AArch64 SVE (256-bit)** (Neoverse V1)
- oneDNN runtime: **OpenMP**, `nthr=32`
- PyTorch (arm/aarch64 build) using oneDNN backend
- Python 3.10

# Steps to reproduce

## 1) PyTorch unit test (fails)
```bash
# Set ONEDNN_VERBOSE=all to capture the implementation name and commit
export ONEDNN_VERBOSE=all
python pytorch/test/nn/test_convolution.py TestConvolutionNN.test_Conv2d_OneDNN
```
Typical verbose snippet at failure:
```
onednn_verbose,v1,info,oneDNN v3.9.1 (commit 80a3a8e...)
onednn_verbose,v1,primitive,exec,cpu,convolution,jit_dw:sve_256,forward_training,...
onednn_verbose,v1,primitive,exec,cpu,convolution,jit_dw:sve_256,backward_weights,...
g24mb1_ic24oc24_ih6oh3kh3sh2ph1_iw6ow3kw3sw2pw1
```

## 2) Detailed C++ gtest reproduction steps
Start from oneDNN **v3.9.1** (commit `80a3a8e745d2f0186e674b0af9332fd6e074c94f`) on AArch64 SVE-256 (Neoverse V1).

### Prerequisites
- Replace `src/cpu/aarch64/jit_uni_dw_convolution.cpp` and `tests/gtests/test_convolution_backward_weights_dw_compare.cpp` with the supplied versions (see Attachments).
- The gtest compares the **legacy** and **new** AArch64 DW BWD_W paths, selected by an environment variable:
  - `ONEDNN_AARCH64_DW_BWDW_USE_OLD=1` → legacy path
  - unset → new per-row path
- Descriptor used: `g24mb1_ic24ih8iw8_oc24oh4ow4_kh3kw3_sh2sw2_ph1pw1`
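
As a sanity check on the descriptor above, the following minimal sketch parses it and confirms that the stated output sizes follow from the input size, kernel, stride, and padding. The helper names `parse_desc` and `out_size` are hypothetical and exist only for this illustration; they are not part of benchdnn or the attached gtest.

```python
import re

def parse_desc(desc):
    """Split a benchdnn-style conv descriptor into {name: int} pairs.

    Hypothetical helper for illustration; it only handles the
    letters-then-digits tokens used in this issue.
    """
    return {name: int(val) for name, val in re.findall(r"([a-z]+)(\d+)", desc)}

def out_size(i, k, s, p):
    """Convolution output size for input i, kernel k, stride s, padding p (no dilation)."""
    return (i + 2 * p - k) // s + 1

d = parse_desc("g24mb1_ic24ih8iw8_oc24oh4ow4_kh3kw3_sh2sw2_ph1pw1")

# The descriptor is self-consistent: (8 + 2*1 - 3) // 2 + 1 == 4.
assert out_size(d["ih"], d["kh"], d["sh"], d["ph"]) == d["oh"]
assert out_size(d["iw"], d["kw"], d["sw"], d["pw"]) == d["ow"]

# Depthwise: one kh x kw filter per channel, so diff_weights has g*kh*kw entries.
print(d["g"] * d["kh"] * d["kw"])  # prints 216
```

The 216 weight elements match the element count in the PyTorch assertion message, since the PyTorch case uses the same group count and kernel size.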

### Build configuration
```bash
# Configure with tests enabled
cmake -S . -B build -DDNNL_BUILD_TESTS=ON

# Rebuild so both the kernel and gtest pick up changes
cmake --build build --target all -- -j$(nproc)
```

### Run regression test
```bash
cd build && ctest -V -R test_convolution_backward_weights_dw_compare
```

### Optional: benchdnn verification
```bash
ONEDNN_VERBOSE=all ./build/tests/benchdnn/benchdnn --conv --dir=BWD_W --fast-ref=false g24mb1_ic24ih8iw8_oc24oh4ow4_kh3kw3_sh2sw2_ph1pw1
```

### Logs & diff evidence
- Each run writes `depthwise_bwdw_compare.log` next to the binary (`build/tests/gtests/depthwise_bwdw_compare.log`)
- Header shows both impl IDs, benchdnn descriptor, and replay command (see `tests/gtests/test_convolution_backward_weights_dw_compare.cpp:186-201`)

# Observed behavior
- PyTorch test failure:
  ```
  AssertionError: Tensor-likes are not close!
  Mismatched elements: 72 / 216 (33.3%)
  Greatest absolute difference: 3.0
  ```
- oneDNN selects **jit_dw:sve_256** for both FWD and BWD_W on the above config.
- gtest A/B shows **legacy** path accumulates extra bottom-row contributions on strided, padded cases (duplicate accumulation at tile boundaries). New per-row path matches a naïve reference.
- **benchdnn did not reproduce** the mismatch (even with `--fast-ref=false` and buffer replay).

**Workaround validated:** removing the AArch64 **jit BWD_W (SVE-256)** path from the CPU convolution implementation list avoids the failure; the fallback path passes, as it already does on Neoverse N1 and Neoverse V2.

# Expected behavior
Backward-weights results should match the naïve reference (and mkldnn-disabled PyTorch path) with **zero** elementwise diffs for these configs.
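
For readers unfamiliar with the computation, here is a minimal pure-Python sketch of a naïve depthwise backward-weights reference for `mb=1`. It is an illustration written for this issue, not the reference used by the attached gtest or by benchdnn; the function name `dw_bwd_weights_ref` and the nested-list layout are assumptions.

```python
def dw_bwd_weights_ref(src, diff_dst, kh, kw, sh, sw, ph, pw):
    """Naive depthwise backward-weights for a single image (mb=1).

    src:      [C][IH][IW] nested lists
    diff_dst: [C][OH][OW] nested lists
    returns   [C][KH][KW] nested lists
    """
    C, IH, IW = len(src), len(src[0]), len(src[0][0])
    OH, OW = len(diff_dst[0]), len(diff_dst[0][0])
    diff_w = [[[0.0] * kw for _ in range(kw and kh)] for _ in range(C)]
    for c in range(C):
        for r in range(kh):
            for q in range(kw):
                acc = 0.0
                for oh in range(OH):
                    ih = oh * sh - ph + r
                    if not 0 <= ih < IH:
                        continue  # tap falls into top/bottom padding
                    for ow in range(OW):
                        iw = ow * sw - pw + q
                        if 0 <= iw < IW:  # skip left/right padding
                            acc += src[c][ih][iw] * diff_dst[c][oh][ow]
                diff_w[c][r][q] = acc
    return diff_w

# All-ones inputs at the gtest shape (ih=iw=8, oh=ow=4, k=3, s=2, p=1):
# each weight entry then simply counts its in-bounds taps.
C = 2  # fewer channels than the real case, to keep the example small
src = [[[1.0] * 8 for _ in range(8)] for _ in range(C)]
diff_dst = [[[1.0] * 4 for _ in range(4)] for _ in range(C)]
ref = dw_bwd_weights_ref(src, diff_dst, 3, 3, 2, 2, 1, 1)
print(ref[0])  # prints [[9.0, 12.0, 12.0], [12.0, 16.0, 16.0], [12.0, 16.0, 16.0]]
```

With all-ones inputs, each output row of `diff_dst` must contribute to a given weight entry exactly once, so any duplicate accumulation would appear as a count larger than the values printed here.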

# Additional notes
- After applying the fix from **PR #4081**, PyTorch unit tests and nightly suite pass; the attached gtest shows new path == reference.
- Toggle: `ONEDNN_AARCH64_DW_BWDW_USE_OLD=1` (legacy) vs unset (new).

# Attachments
- `src/cpu/aarch64/jit_uni_dw_convolution.cpp` (kernel version with legacy/new path toggle)
  - [jit_uni_dw_convolution.cpp](https://github.com/user-attachments/files/22884641/jit_uni_dw_convolution.cpp)
- `tests/gtests/test_convolution_backward_weights_dw_compare.cpp` (gtest repro; includes env flag to toggle old/new)
  - [test_convolution_backward_weights_dw_compare.cpp](https://github.com/user-attachments/files/22884642/test_convolution_backward_weights_dw_compare.cpp)

# Related PR
- https://github.com/uxlfoundation/oneDNN/pull/4081
