Skip to content

Conversation

@YixingZhang007
Copy link
Contributor

@YixingZhang007 YixingZhang007 commented Oct 25, 2025

This patch resolves failures in SYCL E2E tests fp64-conv-emu-1.cpp and fp64-conv-emu-2.cpp when using the new offloading model by addressing two issues:

(1) The original implementation in the ClangLinkerWrapper.cpp incorrectly constructed ocloc command line arguments by concatenating all arguments into a single string; this caused parsing failures in the executor. This is fixed in this patch by splitting arguments on whitespace boundaries and rejoining them into a correctly formatted command string.

(2) The double-precision variable test case in fp64-conv-emu-2.cpp has been temporarily disabled because the current implementation only supports -fsycl-fp64-conv-emu, which provides limited FP64 emulation for kernels containing FP64 conversions but no FP64 computations. The test will be re-enabled once -fsycl-fp64-gen-emu (full FP64 emulation) is implemented.

Copy link
Contributor

@aelovikov-intel aelovikov-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E2E tests LGTM

/// Otherwise return 'false'.
bool Driver::GetUseNewOffloadDriverForSYCLOffload(Compilation &C,
const ArgList &Args) const {
// Check only if enabled with -fsycl
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Check only if enabled with -fsycl
// Check only if enabled with -fsycl.

Copy link
Contributor Author

@YixingZhang007 YixingZhang007 Oct 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your suggestions! The changes for enabling the new offloading model in this PR were copied directly from another PR (#15121). This PR specifically focuses on resolving test failures in fp64-conv-emu-1.cpp and fp64-conv-emu-2.cpp that occur after enabling the new offloading model. I will move any comments related to the new offloading model changes to the original PR where that functionality was implemented. The changes related to enabling the new offloading model have been reverted and removed from this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I'm good with keeping it as is for this PR, but we can follow up on a subsequent PR to address this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For sure! I guess I will add these comments to PR #15121, which contains changes for enabling the new offloading model as the default. These changes will be applied when we reopen the PR after all SYCL E2E test issues have been resolved.

// RUN: %clangxx -fsycl -fsycl-targets=intel_gpu_dg2_g10,intel_gpu_dg2_g11,intel_gpu_dg2_g12,intel_gpu_pvc,intel_gpu_mtl_h,intel_gpu_mtl_u -fsycl-fp64-conv-emu %O0 %s -o %t.out
// RUN: %{run} %t.out

// RUN: %clangxx -fsycl -fsycl-targets=intel_gpu_dg2_g10,intel_gpu_dg2_g11,intel_gpu_dg2_g12,intel_gpu_pvc,intel_gpu_mtl_h,intel_gpu_mtl_u -fsycl-fp64-conv-emu --offload-new-driver %O0 %s -o %t.out
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we also try with -g? The code you're changing used to have issues with -g, so just to be on the safe side.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes work with '-g' and I have also added a new test command in this file for running the test with -g. Thank you!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

continue my previous comment: we can keep -g test, but just removing --offload-new-driver.

Copy link
Contributor Author

@YixingZhang007 YixingZhang007 Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your suggestion!
This test is currently testing three cases:

  1. old offloading model
  2. new offloading model
  3. new offloading model and -g

Do you mean that we should replace the third case to be "old offloading model and -g"?

Copy link
Contributor

@YuriPlyakhin YuriPlyakhin Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Original test was testing 1 case:

  1. default offloading model

I suggest to add one more case, so result will be:

  1. default offloading model
  2. default offloading model and -g

We should not distinguish between new and old offloading model in this test. Instead, we should just add new entire SYCL E2E testing with new offloading model enabled by option for all tests and of course it should be separate PR.

If you still have questions, feel free to ping me in teams and we can discuss.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following up on this: there is a specific directory for New Offloading Model tests: https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/NewOffloadDriver/

Everything related to the New Offloading Model should go there, I think. Also, I agree that this can be done in a separate PR.

Copy link
Contributor

@YuriPlyakhin YuriPlyakhin Oct 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point was that instead of adding individual tests for new offloading model duplicating old offloading model tests, we should just start running entire SYCL E2E with new offloading model enabled.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but while we do that, if we need to add individual tests, they should go to the specific directory. Once we start running entire SYCL E2E with new offloading model, we can drop the specific directory, I think.

@YixingZhang007 YixingZhang007 changed the title [NewOffloadModel] Fix argument parsing in Clang Linker Wrapper [SYCL][clang-linker-wrapper] Fix argument parsing in Clang Linker Wrapper Oct 28, 2025
options::OPT_offload_new_driver, false))
return false;

if (Args.hasArg(options::OPT_fintelfpga))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Support for FPGA related options have been removed in this PR.
Don't think this check is necessary.

Copy link
Contributor Author

@YixingZhang007 YixingZhang007 Oct 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your suggestions! The changes for enabling the new offloading model in this PR were copied directly from another PR (#15121). This PR specifically focuses on resolving test failures in fp64-conv-emu-1.cpp and fp64-conv-emu-2.cpp that occur after enabling the new offloading model. I will move any comments related to the new offloading model changes to the original PR where that functionality was implemented. The changes related to enabling the new offloading model have been reverted and removed from this PR.

Copy link
Contributor

@sarnex sarnex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm but will fall back to marcos for the in-depth review

if (q.get_device().has(aspect::fp64))
nfail += test<Increment<double>>(q);
// This test is currently disabled because it requires the -ze-fp64-gen-emu
// IGC option to run FP64 arithmetic operations. The -fsycl-fp64-conv-emu flag
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry why are we seeing this failure now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out! We are seeing the following failure with this test when we enable the new offloading model (after applying changes from #15121). The error generated is as below.

Double arithmetic operation is not supported on this platform with FP64 conversion emulation mode (poison FP64 kernels is disabled).
in kernel: 'typeinfo name for int test<Increment<double>>(sycl::_V1::queue&)::'lambda'(sycl::_V1::handler&)::operator()(sycl::_V1::handler&) const::'lambda'()'

I think this error occurs because this test performs arithmetic operations with double-precision variables, which should only be permitted when the -ze-fp64-gen-emu IGC option is not enabled. The currently enabled -ze-fp64-gen-conv-emu option (activated by -fsycl-fp64-conv-emu flag that we passed into clang) should only allow FP64 conversions, not FP64 computations.

Copy link
Contributor

@YuriPlyakhin YuriPlyakhin Oct 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I have the same question, and I still do not understand: how does this test pass with old offloading model then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't able to figure out how the old offloading model is able to pass this, but I have checked that the old offloading model only has -ze-fp64-gen-conv-emu passed into ocloc which should not be sufficient. I think this failure for the old offloading model did not get captured for some issue that would need more investigation (I can definitely look more into this if needed). But I have tried passing -ze-fp64-gen-emu into the ocloc command while running the test with the new offloading model, and this helps solve the issue, and also according to the previous documentation and PRs (such as #13912), I think the option -ze-fp64-gen-emu is required for this test to have arithmetic operations on double numbers. Thank you :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to figure out how the test is passing, before disabling anything in the code.
Currently the test is passing, but with your change, this scenario will be excluded from testing for the old offloading model, which doesn't look good to me.

Copy link
Contributor

@sarnex sarnex Oct 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the new/old offloading model shouldnt change the content of any IR, it should just change the tools called (which shouldn't effect IR content either) and the arguments to the tools, so it may be easy to see if there is some difference in args to ocloc or different image compile/link flags or something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! For sure! I will investigate why errors are not raised when double-precision arithmetic operations are executed with only the fp64-gen-conv-emu option enabled.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any update on this?

if (!BeforeOptions.empty())
CmdArgs.push_back(BeforeOptions);
if (!BeforeOptions.empty()) {
SmallVector<StringRef, 8> BeforeArgs;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also just wondering, i see one of your commits is titled says 'revert nick's PR' but i don't remember writing that code so if it is referring to me do you mind linking the PR being reverted, thanks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely! I previously applied the changes directly from your PR #15121 (this PR is currently closed) to enable the new offloading model and verify the CI test results. After confirming that both fp64-conv-emu-1.cpp and fp64-conv-emu-2.cpp passed in CI with the new offloading model enabled, I reverted the changes. If my understanding is correct, I think we would enable the new offloading model changes after the other SYCL E2E test failures are resolved.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe, resolving all SYCL E2E test failures with new offloading model is necessary to enable new model by default, but it is not the only condition.
I think as soon as SYCL E2E pass, we want to enable regular testing (pre-commit? post-commit? nightly?) with new offloading model to make sure no regressions.
Or we might want to even enable it now and mark not-passing tests as XFAIL.
Thoughts?

Copy link
Contributor

@sarnex sarnex Oct 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say we should add it to the nightly, but yeah IMO we can do it now if someone has bandwidth do to it. i definitely do not but im happy to help. We need the job to pass though, so we ideally would have a LIT variable so we can XFAIL the failing ones, or more crudely we coukd have some file with a list of either passing or failing tests, in that case we could use LIT_FILTER or LIT_FILTER_OUT

btw that pr isn't from me but doesnt matter either way :P

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will ask someone to add it to nightly. I think we should do it similar to how spirv_backend is being tested, likely with a LIT variable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1.

@sarnex sarnex requested review from a team October 29, 2025 15:17
Comment on lines +20 to +21
// RUN: %clangxx -fsycl -fsycl-targets=intel_gpu_dg2_g10,intel_gpu_dg2_g11,intel_gpu_dg2_g12,intel_gpu_pvc,intel_gpu_mtl_h,intel_gpu_mtl_u -fsycl-fp64-conv-emu --no-offload-new-driver %O0 %s -o %t.out
// RUN: %{run} %t.out
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: do we want to modify this test that way by adding 2 command lines with and without offload new driver? It looks like that way we would modify a lot of tests introducing command lines like that and then when we switch to new offload driver by default, we would need to clean up all that.
I'm thinking would not it be better to just run entire SYCL E2E with new offload driver enabled, and mark currently failing tests as XFAIL? and when we are ready, we would just flip the switch and stop new offload driver testing (since it would become a default)
That way we would not need to modify all tests like that...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I vote for:

  • adding entire SYCL E2E with new offload driver enabled as nightly and marking failing tests as XFAIL
  • remove modification of tests in this PR, which are adding new-offload-driver/old-offload-driver behavior
  • keep -g addition, but using default offload driver (which is old currently)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to go with the full SYCL E2E running with new offload driver. For sure we don't want any test adding new offload driver that is not under the appropriate directory: https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/NewOffloadDriver. Otherwise, it will be very difficult to find and remove them once we make the new offload driver run by default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion! I had a discussion with Yury, and I think with this PR, we will keep the SYCL E2E tests (fp64-conv-emu-1.cpp and fp64-conv-emu-2.cpp) running only with the default offloading model (which is currently the old offloading model). A follow-up PR will be submitted afterward to enable new offloading model testing in the nightly build (I will most likely be working on this part). As a result, the changes in this PR will not be tested with the new offloading model by CI for the next couple of weeks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants