@martin-frbg
Collaborator

No description provided.

@martin-frbg
Collaborator Author

@ChipKerchner do we expect casts from bfloat16 to float32 to "just work" for C code on RISCV64? AFAICT this is not implemented, at least in the cross-compiler setup that this gh workflow uses (even with the latest LLVM and the latest riscv-gnu-toolchain). This causes test failures, as the intermediate result0 = (float)A[ai] * (float)B[bi] in your sbgemm kernel turns the small bfloat16 numbers into huge floats...
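To illustrate the symptom described above, here is a minimal sketch that assumes bfloat16 is stored as a raw 16-bit pattern (a uint16_t typedef). The names bf16_bits, bf16_bits_to_float and bf16_bits_plain_cast are hypothetical and not taken from OpenBLAS. A bfloat16 value holds the upper 16 bits of an IEEE float32, so widening it means shifting the bits, whereas a plain cast of an integer type converts the integer value instead.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: bfloat16 is a raw 16-bit pattern, not a native floating type. */
typedef uint16_t bf16_bits;

/* Hypothetical helper: widen by placing the 16 bits in the upper half
   of a float32 bit pattern and reinterpreting it. */
static float bf16_bits_to_float(bf16_bits h)
{
    uint32_t w = (uint32_t)h << 16;
    float f;
    memcpy(&f, &w, sizeof f); /* well-defined type punning */
    return f;
}

/* What a plain C cast does when the type is an integer typedef:
   it converts the integer value, not the encoded floating value. */
static float bf16_bits_plain_cast(bf16_bits h)
{
    return (float)h;
}
```

With these definitions the pattern 0x3F80, which encodes 1.0, widens to 1.0f through the helper but becomes 16256.0f through the plain cast. That matches the "small numbers turn into huge floats" behaviour, although it does not prove that this is what happens in the CI toolchain.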

@ChipKerchner
Contributor

Scalar casting should just work from bfloat16 to float. I don't see any issue. These are the qemu flags I use.

qemu-riscv64 -cpu rv64,g=true,f=true,d=true,c=true,v=true,vlen=256,elen=64,vext_spec=v1.0,zfh=true,zvfh=true,zvfbfwma=true,rvv_ma_all_1s=true,rvv_ta_all_1s=true,zbc=true,zvbc=true -L /home/ckerchner/tools/tt-riscv-toolchain-ae8a01f3/sysroot

@ChipKerchner
Contributor

Actually, after I sync, I'm seeing a failure in sbgemm; sbgemv seems fine. BTW, I didn't write sbgemm.

@martin-frbg
Collaborator Author

Thanks for the flags. Unfortunately, adding the missing ones did not change the outcome for me. I'm also getting SGEMV FAILURES: 789504 with that setup, while the BGEMM test passes (as do all float16 ones). Most likely your TT toolchain is more advanced, and I should just leave out the SB tests in this CI job for now?
I just noticed the use of plain (float) casts in some of the code, while the tests all go to sbf16tos() for conversions.

@ChipKerchner
Contributor

Are you saying that some architectures besides RISC-V are using plain casts to float while others are using an external function?

@ChipKerchner
Contributor

BTW, I tried an external function and I'm still getting failures.

@martin-frbg
Collaborator Author

> Are you saying that some architectures besides RISC-V are using plain casts to float while others are using an external function?

No, on the contrary: I see RISC-V using plain casts while everything else uses an external function.
And at least the first few intermediate calculations in the sbgemm_kernel_16x8_zvl256 seem to make more sense now that I've changed them from casts to using the float16to32 wrapper around sbf16tos, as in the test helper header.

@ChipKerchner
Contributor

The strange thing is that SHGEMM uses the same type casting, and all tests pass there.

@martin-frbg
Collaborator Author

Yes, this got me thinking that maybe there is a conflict between the compiler having (or being expected to have) some "native" support for a floating point "bf16" type and OpenBLAS' fallback solution of assuming bfloat16 is a uint16_t.
Replacing all obvious casts with calls to the conversion function did not solve the test errors for me, however. Many of the result matrix elements became similar enough to their SGEMM counterparts, but not all. And I have no way of finding out whether the cross-compiler is at fault or qemu-riscv64 10.1 is not handling all aspects of bfloat16 correctly. My Banana PI F3 does great for checking fp16 code but appears to lack support for the bfloat16 extensions.
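One way to check which of the two situations a given build is in (a native floating type versus an integer fallback) is sketched below. The macro CAST_TREATS_AS_INTEGER and the typedef bfloat16_fallback are hypothetical names used only for illustration. The trick relies on 0.5 truncating to 0 when converted to any integer type, while a floating type able to represent 0.5 keeps it.

```c
#include <stdint.h>

/* Evaluates to 1 when T is an integer type (0.5 truncates to 0),
   and to 0 when T is a floating-point type that can represent 0.5. */
#define CAST_TREATS_AS_INTEGER(T) ((T)0.5 == (T)0)

/* Assumption: stand-in for the integer fallback typedef discussed above. */
typedef uint16_t bfloat16_fallback;

/* Returns 1 if a plain (float) cast of the fallback type would convert
   the integer value instead of decoding the bfloat16 bit pattern. */
static int plain_cast_is_unsafe(void)
{
    return CAST_TREATS_AS_INTEGER(bfloat16_fallback);
}
```

If such a check reports an integer type, plain (float) casts in a kernel cannot decode bfloat16 values, and a conversion function would be needed everywhere the values are widened.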

@ChipKerchner
Contributor

Yes, unfortunately the BananaPi does NOT support the bf16 format.

Another weird thing is that the tests pass for sizes 1 -> 100 but fail for size = 256.
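A size-dependent failure like this could be narrowed down by comparing the kernel output against a plain scalar reference at each size and reporting the first element that disagrees. The following is only a sketch under stated assumptions: bfloat16 inputs are raw uint16_t patterns, matrices are square and row-major, and ref_sbgemm, bits_to_float and first_mismatch are hypothetical names, not OpenBLAS functions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: bfloat16 inputs are raw 16-bit patterns. */
static float bits_to_float(uint16_t h)
{
    uint32_t w = (uint32_t)h << 16;
    float f;
    memcpy(&f, &w, sizeof f);
    return f;
}

/* Hypothetical scalar reference: c = a * b for n x n row-major matrices,
   widening each operand to float before multiplying and accumulating. */
static void ref_sbgemm(size_t n, const uint16_t *a, const uint16_t *b, float *c)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            float acc = 0.0f;
            for (size_t k = 0; k < n; k++)
                acc += bits_to_float(a[i * n + k]) * bits_to_float(b[k * n + j]);
            c[i * n + j] = acc;
        }
}

/* Returns the index of the first element whose error exceeds
   tol * max(1, |want|), or n * n if everything agrees. NaNs count as mismatches. */
static size_t first_mismatch(size_t n, const float *got, const float *want, float tol)
{
    for (size_t idx = 0; idx < n * n; idx++) {
        float diff = got[idx] - want[idx];
        float mag = want[idx] < 0.0f ? -want[idx] : want[idx];
        if (diff < 0.0f) diff = -diff;
        if (mag < 1.0f) mag = 1.0f;
        if (!(diff <= tol * mag))
            return idx;
    }
    return n * n;
}
```

Running this for a range of sizes and printing the row and column of the first mismatch (idx / n and idx % n) may show whether the failures line up with a tile or vector-length boundary. The tolerance has to be chosen loosely enough to allow for a different accumulation order in the optimized kernel.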
