
Large kernels can generate xbyak_aarch64 exceptions #4089

@Sqvid

Summary

This is a follow-up to #4055. Several code paths can generate very large kernels, which in turn can cause Xbyak_aarch64 to throw an exception when a Label is placed at a large jump offset.

For example, consider the following:

$ ./build/tests/benchdnn/benchdnn --conv --impl='brgconv:sve_128' --canonical=true --dt=bf16 --attr-post-ops=gelu_tanh g1ic64ih1000oc64oh1000kh3ph128dh127         

bad err=15 in Xbyak::Error
terminate called after throwing an instance of 'Xbyak_aarch64::Error'
  what():  illegal immediate parameter (range error)
zsh: abort (core dumped)

This exception is thrown when the underlying CodeArray grows large and a label is placed at the end.
Xbyak_aarch64::LabelManager computes a program-counter-relative offset for the label, and if that offset is too large for the instruction's immediate field, the instruction cannot be encoded.
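
To make the range limit concrete, the sketch below checks whether a byte offset fits in a signed, 4-byte-scaled immediate field, which is how A64 branch instructions encode PC-relative targets. The helper name `fits_scaled_imm` is hypothetical and is not part of Xbyak_aarch64. The field widths in the usage note (19 bits for `B.cond`, 26 bits for `B`) come from the A64 instruction encoding; which instruction actually overflows in the reproducer above is not established here.

```cpp
#include <cstdint>

// Hypothetical helper (not an Xbyak_aarch64 API): returns true if a
// PC-relative byte offset can be encoded as a signed immediate of `bits`
// bits that is scaled by the 4-byte A64 instruction size.
constexpr bool fits_scaled_imm(std::int64_t offset_bytes, int bits) {
    return offset_bytes % 4 == 0 // targets must be instruction-aligned
            && offset_bytes / 4 >= -(std::int64_t{1} << (bits - 1))
            && offset_bytes / 4 <= (std::int64_t{1} << (bits - 1)) - 1;
}
```

Under these assumptions, `fits_scaled_imm(2 * 1024 * 1024, 19)` is false while `fits_scaled_imm(2 * 1024 * 1024, 26)` is true: a 2 MiB forward jump is out of reach for a conditional branch (about ±1 MiB) but within reach of an unconditional one (about ±128 MiB).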

Since this error can pop up in a variety of scenarios, throws an uncaught exception, and is unrelated to any particular kernel, I think we really need to find a good way to address it.

cc: @vpirogov, @dzarukin, @jondea, @Shreyas-fuj

Environment

oneDNN includes hardware-specific optimizations and may behave
differently depending on the compiler and build environment. Include
the following information to help reproduce the issue:

  • CPU make and model: Neoverse V1 (flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm paca pacg dcpodp svei8mm svebf16 i8mm bf16 dgh rng)
  • git hash: bcc0ca0

Steps to reproduce

$ ./build/tests/benchdnn/benchdnn --conv --impl='brgconv:sve_128' --canonical=true --dt=bf16 --attr-post-ops=gelu_tanh g1ic64ih1000oc64oh1000kh3ph128dh127         

Observed behavior

Xbyak_aarch64 exception thrown.

Expected behavior

Graceful error handling.
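
One possible reading of graceful handling is to catch the generator's exception at the point where a kernel is created and report a status to the caller instead of aborting. The sketch below is a minimal illustration under that assumption; `gen_status`, `codegen_error`, and `try_generate` are hypothetical stand-ins, not oneDNN or Xbyak_aarch64 APIs.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>

// Stand-in for a library status code; not oneDNN's actual enum.
enum class gen_status { success, unimplemented };

// Stand-in for the exception type thrown by a code generator.
struct codegen_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Runs a code-generation callback and maps any std::exception it throws
// to a status value, so the caller can fall back to another implementation
// instead of the process terminating.
inline gen_status try_generate(const std::function<void()> &generate) {
    try {
        generate();
        return gen_status::success;
    } catch (const std::exception &) {
        return gen_status::unimplemented;
    }
}
```

With this shape, a callback that throws `codegen_error("illegal immediate parameter (range error)")` yields `gen_status::unimplemented`, and a callback that returns normally yields `gen_status::success`.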

Metadata

    Labels

      • help wanted
      • platform:cpu-aarch64 (Codeowner: @oneapi-src/onednn-cpu-aarch64)
      • sighting (Suspicious library behavior. Should be promoted to a bug when confirmed)
