Skip to content

Conversation

@Vika-F
Copy link
Contributor

@Vika-F Vika-F commented Jun 3, 2025

Description

  • Vectorization pragmas were defined for icx/icpx, GNU and clang compilers. Previously they were defined only for icc.
  • Vectorization pragmas were redefined for all compilers in attempt to use OpenMP 5 #pragma omp simd for vectorization where possible:
    cpp/daal/src/services/service_defines.h
  • Previously defined icc-specific macros (PRAGMA_ICC_TO_STR, PRAGMA_ICC_OMP, PRAGMA_ICC_NO16) were removed
  • Compilation warnings were fixed after pragmas re-definition

Note: The use of #pargma omp simd was not implemented for MSVC because it is required to link with OpenMP to support the feature leading to additional dependency in Windows build.

  • DAAL_TYPENAME macro was removed and replaced with typename keyword in dtrees/gbt/gbt_train_updater.i.

The changes that were made in the implementations of the algorithms:

  • Common parts of the cosine distance and correlation distance algorithms were moved into cosdistance_impl.i and cordistance_impl.i files respectively. Diagonal elements of the distance matrices were copied into contiguous array diag to improve vectorization in non-diagonal elements computations.
  • Nested loops order was changed in dtrees/forest/df_train_dense_default_impl.i to make vectorization over a consecutive array elements, without strides.
  • if-else branch in a loop was replaced with a ternary (?:) operator to enable the loop's vectorization in dtrees/gbt/classification/gbt_classification_train_dense_default_impl.i
  • Constant multiplier was moved out of the loops in kmeans.
  • Scalar math functions (like sSqrt) were replaced by vector variants (like vSqrt) where possible.

PR should start as a draft, then move to ready for review state after CI is passed and all applicable checkboxes are closed.
This approach ensures that reviewers don't spend extra time asking for regular requirements.

You can remove a checkbox as not applicable only if it doesn't relate to this PR in any way.
For example, PR with docs update doesn't require checkboxes for performance while PR with any change in actual code should have checkboxes and justify how this code change is expected to affect performance (or justification should be self-evident).

Checklist to comply with before moving PR from draft:

PR completeness and readability

  • I have reviewed my changes thoroughly before submitting this pull request.
  • I have commented my code, particularly in hard-to-understand areas.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have added a respective label(s) to PR if I have a permission for that.
  • I have resolved any merge conflicts that might occur with the base branch.

Testing

  • I have run it locally and tested the changes extensively.
  • All CI jobs are green or I have provided justification why they aren't.

No new failures in the CI comparing to the nightly run.

Performance

  • I have measured performance for affected algorithms using scikit-learn_bench and provided at least summary table with measured data, if performance change is expected.
  • I have provided justification why performance has changed or why changes are not expected.

Geomean performance speedup across all the algorithms on GNR and EMR is greater than 1.0.
The degradations greater than 10% are due to performance instabilities (proven by manual re-run of the degraded algorithms on EMR with increased number of the iterations in the timing loop).

  • I have provided justification why quality metrics have changed or why changes are not expected.

The accuracy had slightly changed due to the differences in vectorization for some of the algorithms. But the changes are acceptable based on the testing results.

{
if (block[i * blockSize + i] > (algorithmFPType)0.0)
{
block[i * blockSize + i] = (algorithmFPType)1.0 / daal::internal::MathInst<algorithmFPType, cpu>::sSqrt(block[i * blockSize + i]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a PR adding a function for this from MKL: #3227

Perhaps that other one could be merged first.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am Ok to have that one merged first and reuse that functionality; but I think the performance have to be measured, as that PR clearly might have performance impact on the algorithm.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR has been merged by now and performance impact was positive. Please remember to update.

Copy link
Contributor

@icfaust icfaust Sep 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess in this case this new file could possibly be dropped (by the looks of it)?

Edit: nevermind. I was wrong. I am guess I understand what is in #3227, and wonder if there is a way to centralize this file with the copy existing in the cosdistance. If it were in the DAL side we would put it as a primitive, but I don't know if that something we do on the DAAL side.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In DAAL we don't have some centralized place for primitives, but there is an ability to share the code across the algorithms.
Quick search shown that we already have such common place for distances here: https://github.com/uxlfoundation/oneDAL/blob/main/cpp/daal/src/algorithms/service_kernel_math.h#L110

For this PR let me replace this sSqrt with vInvSqrtI from #3227.
I have created a follow up task to further refactor the distances code.

@Vika-F
Copy link
Contributor Author

Vika-F commented Jun 17, 2025

/intelci: run

@Vika-F Vika-F mentioned this pull request Jun 18, 2025
9 tasks
@Vika-F
Copy link
Contributor Author

Vika-F commented Jul 11, 2025

/intelci: run

@Vika-F
Copy link
Contributor Author

Vika-F commented Aug 1, 2025

/intelci: run

@Vika-F
Copy link
Contributor Author

Vika-F commented Sep 24, 2025

/intelci: run

@Vika-F
Copy link
Contributor Author

Vika-F commented Sep 26, 2025

/intelci: run

@Vika-F
Copy link
Contributor Author

Vika-F commented Sep 29, 2025

/intelci: run

@Vika-F
Copy link
Contributor Author

Vika-F commented Oct 1, 2025

/intelci: run

@Vika-F
Copy link
Contributor Author

Vika-F commented Oct 2, 2025

/intelci: run

@Vika-F
Copy link
Contributor Author

Vika-F commented Oct 8, 2025

/intelci: run

@david-cortes-intel
Copy link
Contributor

@Vika-F Would you see an improvement in forests if you were to replace those pragmas with omp simd where applicable instead of removing them?

@Vika-F
Copy link
Contributor Author

Vika-F commented Oct 8, 2025

@Vika-F Would you see an improvement in forests if you were to replace those pragmas with omp simd where applicable instead of removing them?

@david-cortes-intel Performance rerun is a rather time consuming thing now. Last time it took 3 CI jobs to get the results from EMR ang GNR. So, I would leave the improvements for the further PRs, and now I am just trying to have no degradations to make it mergeable finally.

@Vika-F
Copy link
Contributor Author

Vika-F commented Oct 14, 2025

/intelci: run

@Vika-F
Copy link
Contributor Author

Vika-F commented Oct 14, 2025

/intelci: run

@Vika-F
Copy link
Contributor Author

Vika-F commented Oct 23, 2025

/intelci: run

@Vika-F
Copy link
Contributor Author

Vika-F commented Oct 27, 2025

/intelci: run

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants