Fix vectorization pragmas for icx, GNU, clang and MS VC compilers #3246
{
    if (block[i * blockSize + i] > (algorithmFPType)0.0)
    {
        block[i * blockSize + i] = (algorithmFPType)1.0 / daal::internal::MathInst<algorithmFPType, cpu>::sSqrt(block[i * blockSize + i]);
There is a PR adding a function for this from MKL: #3227
Perhaps that other one could be merged first.
I am OK with merging that one first and reusing its functionality, but I think the performance has to be measured, as that PR might clearly have a performance impact on the algorithm.
PR has been merged by now and performance impact was positive. Please remember to update.
I guess in this case this new file could possibly be dropped (by the looks of it)?
Edit: never mind, I was wrong. I guess I understand what is in #3227, and I wonder if there is a way to centralize this file with the copy that exists in cosdistance. If it were on the DAL side we would add it as a primitive, but I don't know if that is something we do on the DAAL side.
In DAAL we don't have a centralized place for primitives, but it is possible to share code across the algorithms.
A quick search showed that we already have such a common place for distances here: https://github.com/uxlfoundation/oneDAL/blob/main/cpp/daal/src/algorithms/service_kernel_math.h#L110
For this PR, let me replace this sSqrt with vInvSqrtI from #3227.
I have created a follow-up task to further refactor the distances code.
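The replacement discussed in this thread can be sketched as follows. This is a minimal, self-contained illustration and not the oneDAL code: `invSqrtBatch` is a hypothetical stand-in for a vector inverse-square-root routine such as the `vInvSqrtI` mentioned above (its actual signature is not shown in this thread), and `invSqrtDiagonal` is an illustrative name. The strided diagonal is gathered into a contiguous buffer so that one batched call can replace the per-element `sSqrt` divisions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a vector inverse square root routine.
// A math library would normally provide an optimized equivalent.
template <typename T>
void invSqrtBatch(std::size_t n, const T * in, T * out)
{
    for (std::size_t i = 0; i < n; ++i)
    {
        out[i] = T(1) / std::sqrt(in[i]);
    }
}

// Replaces every positive diagonal element of a row-major
// blockSize x blockSize block with its inverse square root.
// Non-positive diagonal elements are left unchanged.
template <typename T>
void invSqrtDiagonal(std::vector<T> & block, std::size_t blockSize)
{
    assert(block.size() == blockSize * blockSize);
    std::vector<T> diag(blockSize);
    std::vector<T> inv(blockSize);

    // Gather the strided diagonal into a contiguous buffer.
    // Non-positive values are replaced by 1 so the batched call stays well defined.
    for (std::size_t i = 0; i < blockSize; ++i)
    {
        const T d = block[i * blockSize + i];
        diag[i]   = (d > T(0)) ? d : T(1);
    }

    // One batched call over contiguous memory instead of blockSize scalar calls.
    invSqrtBatch(blockSize, diag.data(), inv.data());

    // Scatter the results back, keeping the original "> 0" guard.
    for (std::size_t i = 0; i < blockSize; ++i)
    {
        if (block[i * blockSize + i] > T(0))
        {
            block[i * blockSize + i] = inv[i];
        }
    }
}
```

The gather and scatter loops add some memory traffic, which is one reason the performance impact has to be measured rather than assumed.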
/intelci: run
…inomial kernel function
.../algorithms/outlierdetection_multivariate/outlierdetection_multivariate_dense_default_impl.i
…iction to fix performance drop
@Vika-F Would you see an improvement in forests if you were to replace those pragmas with
@david-cortes-intel Rerunning the performance tests is rather time consuming now. Last time it took 3 CI jobs to get the results from EMR and GNR. So I would leave the improvements for further PRs; for now I am just trying to have no degradations so that this can finally be merged.
Description
- Use `#pragma omp simd` for vectorization where possible: `cpp/daal/src/services/service_defines.h`
- The macros `PRAGMA_ICC_TO_STR`, `PRAGMA_ICC_OMP`, and `PRAGMA_ICC_NO16` were removed.

Note: The use of `#pragma omp simd` was not implemented for MSVC because supporting the feature requires linking with OpenMP, which adds a dependency to the Windows build.

- The `DAAL_TYPENAME` macro was removed and replaced with the `typename` keyword in `dtrees/gbt/gbt_train_updater.i`.

The changes that were made in the implementations of the algorithms:

- `cosdistance_impl.i` and `cordistance_impl.i`: diagonal elements of the distance matrices were copied into a contiguous array `diag` to improve vectorization of the computations on non-diagonal elements.
- `dtrees/forest/df_train_dense_default_impl.i`: changed to vectorize over consecutive array elements, without strides.
- `dtrees/gbt/classification/gbt_classification_train_dense_default_impl.i`
- Scalar math functions (like `sSqrt`) were replaced by vector variants (like `vSqrt`) where possible.

PR should start as a draft, then move to ready for review state after CI is passed and all applicable checkboxes are closed.
This approach ensures that reviewers don't spend extra time asking for regular requirements.
You can remove a checkbox as not applicable only if it doesn't relate to this PR in any way.
For example, PR with docs update doesn't require checkboxes for performance while PR with any change in actual code should have checkboxes and justify how this code change is expected to affect performance (or justification should be self-evident).
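The `#pragma omp simd` change and the contiguous `diag` array described above can be illustrated with a minimal sketch. The macro name `EXAMPLE_PRAGMA_SIMD` and the function `scaleOffDiagonal` are hypothetical and are not the definitions in `service_defines.h`; the compiler detection shown here is an assumption about how such a macro could be guarded. On icx, GCC, and clang the pragma takes effect when SIMD-only OpenMP support is enabled (for example with `-fopenmp-simd`), which does not require linking the OpenMP runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical macro, for illustration only.
#if defined(_MSC_VER) && !defined(__clang__) && !defined(__INTEL_LLVM_COMPILER)
    // MSVC: expands to nothing in this sketch, so no OpenMP dependency is added.
    #define EXAMPLE_PRAGMA_SIMD
#else
    // icx, GCC, clang: a vectorization request that needs no OpenMP runtime.
    #define EXAMPLE_PRAGMA_SIMD _Pragma("omp simd")
#endif

// Scales every off-diagonal element (i, j) of a row-major n x n matrix by
// diag[i] * diag[j]. Because diag is contiguous, the inner loops read it
// with unit stride instead of stepping through the matrix diagonal.
template <typename T>
void scaleOffDiagonal(std::vector<T> & m, const std::vector<T> & diag)
{
    const std::size_t n = diag.size();
    assert(m.size() == n * n);
    const T * d = diag.data();

    for (std::size_t i = 0; i < n; ++i)
    {
        T * row    = m.data() + i * n;
        const T di = d[i];

        // Elements to the left of the diagonal.
        EXAMPLE_PRAGMA_SIMD
        for (std::size_t j = 0; j < i; ++j)
        {
            row[j] *= di * d[j];
        }

        // Elements to the right of the diagonal.
        EXAMPLE_PRAGMA_SIMD
        for (std::size_t j = i + 1; j < n; ++j)
        {
            row[j] *= di * d[j];
        }
    }
}
```

Splitting the inner loop at the diagonal keeps both loop bodies branch-free, which is usually easier for a compiler to vectorize than a single loop with an `i != j` check.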
Checklist to comply with before moving PR from draft:
PR completeness and readability
Testing
No new failures in the CI compared to the nightly run.
Performance
Geomean performance speedup across all the algorithms on GNR and EMR is greater than 1.0.
The degradations greater than 10% are due to performance instabilities (proven by manually re-running the degraded algorithms on EMR with an increased number of iterations in the timing loop).
The accuracy changed slightly for some of the algorithms due to differences in vectorization, but the changes are acceptable based on the testing results.
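One common reason vectorization changes results slightly is that a vectorized reduction accumulates in a different order than a scalar loop, and floating-point addition is not associative. The sketch below is a generic illustration of that effect, not code from this PR: `sumTwoLanes` emulates a two-lane reduction by hand, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Left-to-right accumulation, as a scalar loop would do it.
float sumSequential(const std::vector<float> & x)
{
    float s = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i)
    {
        s += x[i];
    }
    return s;
}

// Emulates a two-lane vectorized reduction: even-indexed and odd-indexed
// elements are accumulated separately and combined at the end.
float sumTwoLanes(const std::vector<float> & x)
{
    float lane[2] = { 0.0f, 0.0f };
    for (std::size_t i = 0; i < x.size(); ++i)
    {
        lane[i % 2] += x[i];
    }
    return lane[0] + lane[1];
}
```

For the input `{1e8f, 1.0f, -1e8f, 1.0f}` under IEEE single-precision arithmetic, the sequential sum loses the first `1.0f` when it is added to `1e8f`, while the two-lane sum keeps both small terms, so the two functions return different values for the same data.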