Releases: uxlfoundation/oneDPL
oneDPL 2022.9.0 release
New Features
- Added parallel range algorithms in namespace oneapi::dpl::ranges: fill, move, replace, replace_if,
remove, remove_if, mismatch, minmax_element, min, max, find_first_of, find_end,
is_sorted_until. These algorithms operate with C++20 random access ranges.
- Improved performance of set operation algorithms when using device policies:
set_union, set_difference, set_intersection, set_symmetric_difference.
- Improved performance of copy, fill, for_each, replace, reverse, rotate, transform and 30+
other algorithms with device policies on GPUs when using std::reverse_iterator.
- Added ADL-based customization point is_onedpl_indirectly_device_accessible, which can be used to mark iterator
types as indirectly device accessible. Added public trait oneapi::dpl::is_indirectly_device_accessible[_v] to
query if types are indirectly device accessible.
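The ADL-based customization point above follows a general C++ pattern: a type opts in by declaring a marker function in its own namespace, and a trait detects it through argument-dependent lookup. The sketch below shows only that pattern. Every name in it (the sketch namespace, is_sketch_indirectly_device_accessible, is_marked_device_accessible) is hypothetical, and it is not oneDPL's implementation.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

namespace sketch {
namespace detail {
// Primary template: no marker function is visible through ADL, so the answer is "no".
template <typename T, typename = void>
struct marked_by_adl : std::false_type {};

// Selected when an unqualified call to the (hypothetical) marker function is
// well-formed for T. The call is resolved by ADL at instantiation time.
// Assumption of this sketch: the marker returns std::true_type or std::false_type.
template <typename T>
struct marked_by_adl<
    T, std::void_t<decltype(is_sketch_indirectly_device_accessible(std::declval<T>()))>>
    : decltype(is_sketch_indirectly_device_accessible(std::declval<T>())) {};
} // namespace detail

// Public query trait of the sketch, mirroring the trait / _v pair idea.
template <typename T>
struct is_marked_device_accessible : detail::marked_by_adl<T> {};

template <typename T>
inline constexpr bool is_marked_device_accessible_v = is_marked_device_accessible<T>::value;
} // namespace sketch

namespace user {
// Opts in with a hidden friend. A declaration is enough because the
// function is only ever named in an unevaluated context.
struct device_backed_iterator {
    friend std::true_type is_sketch_indirectly_device_accessible(device_backed_iterator);
};

// Does not opt in, so the trait reports false.
struct host_only_iterator {};
} // namespace user
```

With these definitions, sketch::is_marked_device_accessible_v<user::device_backed_iterator> is true, while the same query for user::host_only_iterator or int* is false.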
Fixed Issues
- Eliminated runtime exceptions encountered when running code that called
inclusive_scan, copy_if, partition, unique, reduce_by_segment, and related algorithms with device policies
and was compiled with the open source oneAPI DPC++ Compiler without specifying an optimization flag.
- Fixed a compilation error in reduce_by_segment regarding return type deduction when called with a device policy.
- Eliminated multiple compile time warnings throughout the library.
Known Issues and Limitations
New in This Release
- The set_intersection, set_difference, set_symmetric_difference, and set_union algorithms with a device policy
require GPUs with double-precision support on Windows, regardless of the value type of the input sequences.
Existing Issues
See the oneDPL Guide for other restrictions and known limitations.
- Incorrect results may be observed when calling sort with a device policy on Intel® Arc™ graphics 140V with data
sizes of 4-8 million elements.
- histogram algorithm requires the output value type to be an integral type no larger than four bytes
when used with a device policy on hardware that does not support 64-bit atomic operations.
- histogram may provide incorrect results with device policies when the program is built with the -O0 option
and the driver version is 2448.13 or older.
- For transform_exclusive_scan and exclusive_scan to run in-place (that is, with the same data
used for both input and destination) and with an execution policy of unseq or par_unseq,
it is required that the provided input and destination iterators are equality comparable.
Furthermore, the equality comparison of the input and destination iterator must evaluate to true.
If these conditions are not met, the result of these algorithm calls is undefined.
- Incorrect results may be produced by exclusive_scan, inclusive_scan, transform_exclusive_scan,
transform_inclusive_scan, exclusive_scan_by_segment, inclusive_scan_by_segment, reduce_by_segment
with unseq or par_unseq policy when compiled by Intel® oneAPI DPC++/C++ Compiler 2024.1 or earlier
with -fiopenmp, -fiopenmp-simd, -qopenmp, -qopenmp-simd options on Linux.
To avoid the issue, pass -fopenmp or -fopenmp-simd option instead.
- With libstdc++ version 10, the compilation error "SYCL kernel cannot use exceptions" occurs
when calling the range-based adjacent_find, is_sorted or is_sorted_until algorithms with device policies.
- The range-based count_if may produce incorrect results on Intel® Data Center GPU Max Series when the driver version
is "Rolling 2507.12" and newer.
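The in-place scan requirement listed above can be illustrated with a small sketch. It uses the serial std::exclusive_scan from the C++17 standard library as a stand-in, since a oneDPL call with unseq or par_unseq needs the library and a suitable backend. The helper name exclusive_scan_in_place is hypothetical. The point is only that the destination iterator is the same iterator as the input, so comparing them yields true.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// In-place exclusive scan: the output iterator is the very same iterator as
// the input iterator, so `data.begin() == data.begin()` evaluates to true.
// This is the situation the in-place requirement describes.
std::vector<int> exclusive_scan_in_place(std::vector<int> data) {
    std::exclusive_scan(data.begin(), data.end(), data.begin(), 0);
    return data;
}
```

Passing iterators of two different types that happen to refer to the same memory would not satisfy the stated condition, because the iterators must be equality comparable and must compare equal.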
oneDPL 2022.8.0 release
New Features
- Added support for host policies in histogram algorithms.
- Added support for an undersized output range in the range-based merge algorithm.
- Improved performance of the merge and sorting algorithms
(sort, stable_sort, sort_by_key, stable_sort_by_key) that rely on Merge sort*,
with device policies for large data sizes.
- Improved performance of copy, fill, for_each, replace, reverse, rotate, transform and 30+
other algorithms with device policies on GPUs.
- Improved oneDPL use with SYCL implementations other than Intel oneAPI DPC++/C++ Compiler.
Fixed Issues
- Fixed an issue with drop_view in the experimental range-based API.
- Fixed compilation errors in find_if and find_if_not with device policies where the user provided predicate is
device copyable but not trivially copyable.
- Fixed incorrect results or synchronous SYCL exceptions for several algorithms when compiled with -O0 and executed
on a GPU device.
- Fixed an issue preventing inclusion of the <numeric> header after <execution> and <algorithm> headers.
- Fixed several issues in the sort, stable_sort, sort_by_key and stable_sort_by_key algorithms:
  - Allowed the use of non-trivially-copyable comparators.
  - Eliminated duplicate kernel names.
  - Resolved incorrect results on devices with sub-group sizes smaller than four.
  - Resolved synchronization errors that were seen on Intel® Arc™** B-series GPU devices.
Known Issues and Limitations
New in This Release
- Incorrect results may be observed when calling sort with a device policy on Intel® Arc™ graphics 140V with data
sizes of 4-8 million elements.
- sort, stable_sort, sort_by_key and stable_sort_by_key algorithms fail to compile
when using Clang 17 and earlier versions, as well as compilers based on these versions,
such as Intel oneAPI DPC++/C++ Compiler 2023.2.0.
- When compiling code that uses device policies with the open source oneAPI DPC++ Compiler (clang++ driver),
synchronous SYCL runtime exceptions regarding unfound kernels may be encountered unless an optimization flag is
specified (for example -O1) as opposed to relying on the compiler's default optimization level.
Existing Issues
See oneDPL Guide for other restrictions and known limitations.
- histogram algorithm requires the output value type to be an integral type no larger than four bytes
when used with an FPGA policy.
- histogram may provide incorrect results with device policies in a program built with -O0 option.
- Compilation issues may be encountered when passing zip iterators to exclusive_scan_by_segment on Windows.
- For transform_exclusive_scan and exclusive_scan to run in-place (that is, with the same data
used for both input and destination) and with an execution policy of unseq or par_unseq,
it is required that the provided input and destination iterators are equality comparable.
Furthermore, the equality comparison of the input and destination iterator must evaluate to true.
If these conditions are not met, the result of these algorithm calls is undefined.
- Incorrect results may be produced by exclusive_scan, inclusive_scan, transform_exclusive_scan,
transform_inclusive_scan, exclusive_scan_by_segment, inclusive_scan_by_segment, reduce_by_segment
with unseq or par_unseq policy when compiled by Intel® oneAPI DPC++/C++ Compiler
with -fiopenmp, -fiopenmp-simd, -qopenmp, -qopenmp-simd options on Linux.
To avoid the issue, pass -fopenmp or -fopenmp-simd option instead.
*The sorting algorithms in oneDPL use Radix sort for arithmetic data types and
sycl::half (since oneDPL 2022.6) compared with std::less or std::greater, otherwise Merge sort.
**Intel, the Intel logo, and Arc are the trademarks of Intel Corporation or its subsidiaries.
oneDPL 2022.7.1 release
Fixed Issues
- Fixed a build error for the oneapi::dpl::sort_by_key algorithm when multiple calls are made to the algorithm
with identically typed parameter lists.
Known Issues and Limitations
Existing Issues
See oneDPL Guide for other restrictions and known limitations.
- histogram may provide incorrect results with device policies in a program built with -O0 option.
- Inclusion of <oneapi/dpl/dynamic_selection> prior to <oneapi/dpl/random> may result in compilation errors.
Include <oneapi/dpl/random> first as a workaround.
- Incorrect results may occur when using oneapi::dpl::experimental::philox_engine with no predefined template
parameters and with word_size values other than 64 and 32.
- Incorrect results or a synchronous SYCL exception may be observed with the following algorithms built
with -O0 option and executed on a GPU device: exclusive_scan, inclusive_scan, transform_exclusive_scan,
transform_inclusive_scan, copy_if, remove, remove_copy, remove_copy_if, remove_if,
partition, partition_copy, stable_partition, unique, unique_copy, and sort.
- The value type of the input sequence should be convertible to the type of the initial element for the following
algorithms with device execution policies: transform_inclusive_scan, transform_exclusive_scan,
inclusive_scan, and exclusive_scan.
- The following algorithms with device execution policies may exceed the C++ standard requirements on the number
of applications of user-provided predicates or equality operators: copy_if, remove, remove_copy,
remove_copy_if, remove_if, partition_copy, unique, and unique_copy. In all cases,
the predicate or equality operator is applied O(n) times.
- The adjacent_find, all_of, any_of, equal, find, find_if, find_end, find_first_of,
find_if_not, includes, is_heap, is_heap_until, is_sorted, is_sorted_until, mismatch,
none_of, search, and search_n algorithms may cause a segmentation fault when used with a device execution
policy on a CPU device, and built on Linux with Intel® oneAPI DPC++/C++ Compiler 2025.0.0 and -O0 -g compiler options.
- histogram algorithm requires the output value type to be an integral type no larger than 4 bytes
when used with an FPGA policy.
- Compilation issues may be encountered when passing zip iterators to exclusive_scan_by_segment on Windows.
- For transform_exclusive_scan and exclusive_scan to run in-place (that is, with the same data
used for both input and destination) and with an execution policy of unseq or par_unseq,
it is required that the provided input and destination iterators are equality comparable.
Furthermore, the equality comparison of the input and destination iterator must evaluate to true.
If these conditions are not met, the result of these algorithm calls is undefined.
- sort, stable_sort, sort_by_key, stable_sort_by_key, partial_sort_copy algorithms
may work incorrectly or cause a segmentation fault when used with a device execution policy on a CPU device,
and built on Linux with Intel® oneAPI DPC++/C++ Compiler and -O0 -g compiler options.
To avoid the issue, pass -fsycl-device-code-split=per_kernel option to the compiler.
- Incorrect results may be produced by exclusive_scan, inclusive_scan, transform_exclusive_scan,
transform_inclusive_scan, exclusive_scan_by_segment, inclusive_scan_by_segment, reduce_by_segment
with unseq or par_unseq policy when compiled by Intel® oneAPI DPC++/C++ Compiler
with -fiopenmp, -fiopenmp-simd, -qopenmp, -qopenmp-simd options on Linux.
To avoid the issue, pass -fopenmp or -fopenmp-simd option instead.
- Incorrect results may be produced by reduce, reduce_by_segment, and transform_reduce
with 64-bit data types when compiled by Intel® oneAPI DPC++/C++ Compiler versions 2021.3 and newer
and executed on a GPU device. For a workaround, define the ONEDPL_WORKAROUND_FOR_IGPU_64BIT_REDUCTION
macro to 1 before including oneDPL header files.
- std::tuple, std::pair cannot be used with SYCL buffers to transfer data between host and device.
- std::array cannot be swapped in DPC++ kernels with std::swap function or swap member function
in the Microsoft* Visual C++ standard library.
- The oneapi::dpl::experimental::ranges::reverse algorithm is not available with -fno-sycl-unnamed-lambda option.
- STL algorithm functions (such as std::for_each) used in DPC++ kernels do not compile with the debug version of
the Microsoft* Visual C++ standard library.
oneDPL 2022.7.0 release
New Features
- Improved performance of the adjacent_find, all_of, any_of, copy_if, exclusive_scan, equal,
find, find_if, find_end, find_first_of, find_if_not, inclusive_scan, includes,
is_heap, is_heap_until, is_partitioned, is_sorted, is_sorted_until, lexicographical_compare,
max_element, min_element, minmax_element, mismatch, none_of, partition, partition_copy,
reduce, remove, remove_copy, remove_copy_if, remove_if, search, search_n,
stable_partition, transform_exclusive_scan, transform_inclusive_scan, unique, and unique_copy
algorithms with device policies.
- Improved performance of sort, stable_sort and sort_by_key algorithms with device policies when using Merge
sort [1].
- Added stable_sort_by_key algorithm in namespace oneapi::dpl.
- Added parallel range algorithms in namespace oneapi::dpl::ranges: all_of, any_of,
none_of, for_each, find, find_if, find_if_not, adjacent_find, search, search_n,
transform, sort, stable_sort, is_sorted, merge, count, count_if, equal, copy,
copy_if, min_element, max_element. These algorithms operate with C++20 random access ranges
and views while also taking an execution policy similarly to other oneDPL algorithms.
- Added support for operators ==, !=, << and >> for RNG engines and distributions.
- Added experimental support for the Philox RNG engine in namespace oneapi::dpl::experimental.
- Added the <oneapi/dpl/version> header containing oneDPL version macros and new feature testing macros.
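As a hedged sketch of how the version header mentioned above might be consumed, the helper below reports the library version when the header is present and falls back gracefully when it is not. The macro names ONEDPL_VERSION_MAJOR, ONEDPL_VERSION_MINOR and ONEDPL_VERSION_PATCH are taken from the oneDPL documentation rather than from these notes, so treat them as an assumption to verify. The function name onedpl_version_string is hypothetical.

```cpp
#include <cassert>
#include <string>

// Include the version header only if the toolchain can find it, so this
// translation unit also builds on systems without oneDPL installed.
#if defined(__has_include)
#  if __has_include(<oneapi/dpl/version>)
#    include <oneapi/dpl/version>
#  endif
#endif

// Returns "major.minor.patch" when the (assumed) version macros are visible,
// or "unknown" when the header is missing or does not define them.
std::string onedpl_version_string() {
#if defined(ONEDPL_VERSION_MAJOR) && defined(ONEDPL_VERSION_MINOR) && defined(ONEDPL_VERSION_PATCH)
    return std::to_string(ONEDPL_VERSION_MAJOR) + "." +
           std::to_string(ONEDPL_VERSION_MINOR) + "." +
           std::to_string(ONEDPL_VERSION_PATCH);
#else
    return "unknown";
#endif
}
```

The same preprocessor guard style can be used to enable code paths only when a given feature testing macro is defined.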
Fixed Issues
- Fixed unused variable and unused type warnings.
- Fixed memory leaks when using sort and stable_sort algorithms with the oneTBB backend.
- Fixed a build error for oneapi::dpl::begin and oneapi::dpl::end functions used with
the Microsoft* Visual C++ standard library and with C++20.
- Reordered template parameters of the histogram algorithm to match its function parameter order.
For affected histogram calls we recommend removing explicit specification of template parameters
and instead adding explicit type conversions of the function arguments as necessary.
- gpu::esimd::radix_sort and gpu::esimd::radix_sort_by_key kernel templates now throw std::bad_alloc
if they fail to allocate global memory.
- Fixed a potential hang occurring with gpu::esimd::radix_sort and
gpu::esimd::radix_sort_by_key kernel templates.
- Fixed documentation for sort_by_key algorithm, which used to be mistakenly described as stable, despite being
possibly unstable for some execution policies. If stability is required, use stable_sort_by_key instead.
- Fixed an error when calling sort with device execution policies on CUDA devices.
- Allowed passing C++20 random access iterators to oneDPL algorithms.
- Fixed issues caused by initialization of SYCL queues in the predefined device execution policies.
These policies have been updated to be immutable (const) objects.
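To make the stability note on sort_by_key concrete, here is a serial, standard-library-only reference for what a stable key-value sort guarantees: keys end up ascending, and values whose keys compare equal keep their original relative order. The function name stable_sort_by_key_reference is hypothetical. This is not the oneDPL algorithm, only a model of its expected result.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Reorders `keys` and `values` together. Equal keys keep the original
// relative order of their values, which is what "stable" promises.
void stable_sort_by_key_reference(std::vector<int>& keys, std::vector<char>& values) {
    assert(keys.size() == values.size());

    // Sort a permutation of indices instead of the data itself.
    std::vector<std::size_t> order(keys.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&keys](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });

    // Apply the permutation to both sequences.
    std::vector<int> sorted_keys;
    std::vector<char> sorted_values;
    sorted_keys.reserve(keys.size());
    sorted_values.reserve(values.size());
    for (std::size_t i : order) {
        sorted_keys.push_back(keys[i]);
        sorted_values.push_back(values[i]);
    }
    keys.swap(sorted_keys);
    values.swap(sorted_values);
}
```

An unstable key-value sort would still return the keys in ascending order, but the values belonging to equal keys could come back in any order.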
Known Issues and Limitations
New in This Release
- histogram may provide incorrect results with device policies in a program built with -O0 option.
- Inclusion of <oneapi/dpl/dynamic_selection> prior to <oneapi/dpl/random> may result in compilation errors.
Include <oneapi/dpl/random> first as a workaround.
- Incorrect results may occur when using oneapi::dpl::experimental::philox_engine with no predefined template
parameters and with word_size values other than 64 and 32.
- Incorrect results or a synchronous SYCL exception may be observed with the following algorithms built
with -O0 option and executed on a GPU device: exclusive_scan, inclusive_scan, transform_exclusive_scan,
transform_inclusive_scan, copy_if, remove, remove_copy, remove_copy_if, remove_if,
partition, partition_copy, stable_partition, unique, unique_copy, and sort.
- The value type of the input sequence should be convertible to the type of the initial element for the following
algorithms with device execution policies: transform_inclusive_scan, transform_exclusive_scan,
inclusive_scan, and exclusive_scan.
- The following algorithms with device execution policies may exceed the C++ standard requirements on the number
of applications of user-provided predicates or equality operators: copy_if, remove, remove_copy,
remove_copy_if, remove_if, partition_copy, unique, and unique_copy. In all cases,
the predicate or equality operator is applied O(n) times.
- The adjacent_find, all_of, any_of, equal, find, find_if, find_end, find_first_of,
find_if_not, includes, is_heap, is_heap_until, is_sorted, is_sorted_until, mismatch,
none_of, search, and search_n algorithms may cause a segmentation fault when used with a device execution
policy on a CPU device, and built on Linux with Intel® oneAPI DPC++/C++ Compiler 2025.0.0 and -O0 -g compiler options.
Existing Issues
See oneDPL Guide for other restrictions and known limitations.
- histogram algorithm requires the output value type to be an integral type no larger than 4 bytes
when used with an FPGA policy.
- Compilation issues may be encountered when passing zip iterators to exclusive_scan_by_segment on Windows.
- For transform_exclusive_scan and exclusive_scan to run in-place (that is, with the same data
used for both input and destination) and with an execution policy of unseq or par_unseq,
it is required that the provided input and destination iterators are equality comparable.
Furthermore, the equality comparison of the input and destination iterator must evaluate to true.
If these conditions are not met, the result of these algorithm calls is undefined.
- sort, stable_sort, sort_by_key, stable_sort_by_key, partial_sort_copy algorithms
may work incorrectly or cause a segmentation fault when used with a device execution policy on a CPU device,
and built on Linux with Intel® oneAPI DPC++/C++ Compiler and -O0 -g compiler options.
To avoid the issue, pass -fsycl-device-code-split=per_kernel option to the compiler.
- Incorrect results may be produced by exclusive_scan, inclusive_scan, transform_exclusive_scan,
transform_inclusive_scan, exclusive_scan_by_segment, inclusive_scan_by_segment, reduce_by_segment
with unseq or par_unseq policy when compiled by Intel® oneAPI DPC++/C++ Compiler
with -fiopenmp, -fiopenmp-simd, -qopenmp, -qopenmp-simd options on Linux.
To avoid the issue, pass -fopenmp or -fopenmp-simd option instead.
- Incorrect results may be produced by reduce, reduce_by_segment, and transform_reduce
with 64-bit data types when compiled by Intel® oneAPI DPC++/C++ Compiler versions 2021.3 and newer
and executed on a GPU device. For a workaround, define the ONEDPL_WORKAROUND_FOR_IGPU_64BIT_REDUCTION
macro to 1 before including oneDPL header files.
- std::tuple, std::pair cannot be used with SYCL buffers to transfer data between host and device.
- std::array cannot be swapped in DPC++ kernels with std::swap function or swap member function
in the Microsoft* Visual C++ standard library.
- The oneapi::dpl::experimental::ranges::reverse algorithm is not available with -fno-sycl-unnamed-lambda option.
- STL algorithm functions (such as std::for_each) used in DPC++ kernels do not compile with the debug version of
the Microsoft* Visual C++ standard library.
[1] The sorting algorithms in oneDPL use Radix sort for arithmetic data types and
sycl::half (since oneDPL 2022.6) compared with std::less or std::greater, otherwise Merge sort.
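The footnote's rule can be written down as a compile-time predicate, shown here purely as an illustration of the wording. The trait name footnote_says_radix is hypothetical, and oneDPL's real dispatch logic is not shown in these notes. The sketch leaves out sycl::half because it needs a SYCL toolchain, and it assumes that only std::less<T> and std::greater<T> for the element type T count as the comparators the footnote names.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <type_traits>

// Hypothetical trait: true when the footnote's wording suggests Radix sort,
// false when it suggests Merge sort. Not part of oneDPL.
template <typename T, typename Compare>
struct footnote_says_radix
    : std::integral_constant<bool,
          std::is_arithmetic<T>::value &&
          (std::is_same<Compare, std::less<T>>::value ||
           std::is_same<Compare, std::greater<T>>::value)> {};

// A user-defined comparator: per the footnote this falls back to Merge sort,
// even though the element type is arithmetic.
struct by_last_digit {
    bool operator()(int a, int b) const { return a % 10 < b % 10; }
};
```

Under these assumptions, int with std::less<int> maps to Radix sort, while std::string with std::less<std::string> and int with by_last_digit both map to Merge sort.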
oneDPL 2022.6.0 release
News
- oneAPI DPC++ Library Manual Migration Guide to simplify the migration of Thrust* and CUB* APIs from CUDA*.
- radix_sort and radix_sort_by_key kernel templates were moved into oneapi::dpl::experimental::kt::gpu::esimd namespace. The former oneapi::dpl::experimental::kt::esimd namespace is deprecated and will be removed in a future release.
- The for_loop, for_loop_strided, for_loop_n, for_loop_n_strided algorithms in namespace oneapi::dpl::experimental are enforced to fail with device execution policies.
New Features
- Added experimental inclusive_scan kernel template algorithm residing in the oneapi::dpl::experimental::kt::gpu namespace.
- radix_sort and radix_sort_by_key kernel templates are extended with overloads for out-of-place sorting.
These overloads preserve the input sequence and sort data into the user provided output sequence.
- Improved performance of the reduce, min_element, max_element, minmax_element, is_partitioned,
lexicographical_compare, binary_search, lower_bound, and upper_bound algorithms with device policies.
- sort, stable_sort, sort_by_key algorithms now use Radix sort for sorting sycl::half elements compared with std::less or std::greater.
Fixed Issues
- Fixed compilation errors when using reduce, min_element, max_element, minmax_element, is_partitioned, and lexicographical_compare with Intel oneAPI DPC++/C++ compiler 2023.0 and earlier.
- Fixed possible data races in the following algorithms used with device execution policies:
remove_if, unique, inplace_merge, stable_partition, partial_sort_copy, rotate.
- Fixed excessive copying of data in std::vector allocated with a USM allocator for standard library implementations which have allocator information in the std::vector::iterator type.
- Fixed an issue where checking std::is_default_constructible for transform_iterator with a functor that is not default-constructible could cause a build error or an incorrect result.
- Fixed handling of sycl device copyable for internal and public oneDPL types.
- Fixed handling of std::reverse_iterator as input to oneDPL algorithms using a device policy.
- Fixed set_intersection to always copy from the first input sequence to the output, where previously some calls would copy from the second input sequence.
- Fixed compilation errors when using oneapi::dpl::zip_iterator with the oneTBB backend and C++20.
New Known Issues and Limitations
- histogram algorithm requires the output value type to be an integral type no larger than 4 bytes when used with an FPGA policy.
oneDPL 2022.5.0 release
New Features
- Added new histogram algorithms for generating a histogram from an input sequence into an output sequence representing either equally spaced or user-defined bins. These algorithms are currently only available for device execution policies.
- Added support for zip_iterator in the transform algorithm.
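For readers unfamiliar with the equally spaced variant of histogram described above, the following serial sketch shows the idea using only the standard library. It is not the oneDPL implementation or its signature. The function name histogram_even_bins is hypothetical, and two behaviours are assumptions of the sketch rather than statements from these notes: bins are treated as half-open intervals, and values outside the overall range are not counted. The 32-bit counter type echoes the four-byte output limit mentioned in later known issues.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Counts how many inputs fall into each of `num_bins` equally wide bins
// covering [first_bin_min, last_bin_max). Assumptions of this sketch:
// half-open bins, and out-of-range (or NaN) values are skipped.
std::vector<std::uint32_t> histogram_even_bins(const std::vector<double>& input,
                                               std::size_t num_bins,
                                               double first_bin_min,
                                               double last_bin_max) {
    std::vector<std::uint32_t> counts(num_bins, 0);
    if (num_bins == 0 || !(first_bin_min < last_bin_max)) {
        return counts;  // nothing sensible to count
    }
    const double width = last_bin_max - first_bin_min;
    for (double x : input) {
        if (!(x >= first_bin_min && x < last_bin_max)) {
            continue;  // outside every bin; this test also rejects NaN
        }
        std::size_t bin = static_cast<std::size_t>((x - first_bin_min) * num_bins / width);
        if (bin >= num_bins) {
            bin = num_bins - 1;  // guard against rounding at the upper edge
        }
        ++counts[bin];
    }
    return counts;
}
```

For example, the values 0 through 9 with five bins over the range 0 to 10 land two per bin, while -1 and 10 are ignored.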
Fixed Issues
- Fixed handling of permutation_iterator as input to oneDPL algorithms for a variety of source iterator and permutation types which caused issues.
- Fixed zip_iterator to be sycl device copyable for trivially copyable source iterator types.
- Added a workaround for reduction algorithm failures with 64-bit data types. Define the ONEDPL_WORKAROUND_FOR_IGPU_64BIT_REDUCTION macro to 1 before including oneDPL header files.
New Known Issues and Limitations
- Crashes or incorrect results may occur when using oneapi::dpl::reverse_iterator or std::reverse_iterator as input to oneDPL algorithms with device execution policies.
oneDPL 2022.4.0 release
New Features
- Added experimental radix_sort and radix_sort_by_key algorithms residing in
the oneapi::dpl::experimental::kt::esimd namespace. These algorithms are first
in the family of kernel templates that allow configuring a variety of parameters
including the number of elements to process by a work item, and the size of a workgroup.
The algorithms only work with Intel® Data Center GPU Max Series.
- Added new transform_if algorithm for applying a transform function conditionally
based on a predicate, with overloads provided for one and two input sequences
that use correspondingly unary and binary operations and predicates.
- Optimizations used with Intel® oneAPI DPC++/C++ Compiler are expanded to the open source oneAPI DPC++ compiler.
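The one-input form of transform_if described above can be modelled with a short serial loop. This is a sketch of the semantics as these notes describe them, not the library's code. The names transform_if_reference and times_ten_if_even are hypothetical, and the parameter order (operation before predicate) is an assumption. The sketch also assumes that output positions whose input fails the predicate are left untouched.

```cpp
#include <cassert>
#include <vector>

// Applies `op` to each element for which `pred` holds and writes the result
// to the matching output position. Other output positions are not written
// (an assumption of this sketch).
template <typename InputIt, typename OutputIt, typename UnaryOp, typename UnaryPred>
OutputIt transform_if_reference(InputIt first, InputIt last, OutputIt result,
                                UnaryOp op, UnaryPred pred) {
    for (; first != last; ++first, ++result) {
        if (pred(*first)) {
            *result = op(*first);
        }
    }
    return result;
}

// Small usage helper: multiplies even inputs by ten, leaving the
// pre-filled output untouched for odd inputs.
std::vector<int> times_ten_if_even(const std::vector<int>& in, std::vector<int> out) {
    out.resize(in.size());
    transform_if_reference(in.begin(), in.end(), out.begin(),
                           [](int x) { return x * 10; },
                           [](int x) { return x % 2 == 0; });
    return out;
}
```

The two-input overload mentioned in the notes would follow the same shape with a binary operation and a binary predicate.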
New Known Issues and Limitations
- esimd::radix_sort and esimd::radix_sort_by_key kernel templates fail to compile when a program
is built with -g, -O0, -O1 compiler options.
- esimd::radix_sort_by_key kernel template produces wrong results with the following combinations
of kernel_param and types of keys and values:
  - sizeof(key_type) + sizeof(val_type) == 12, kernel_param::workgroup_size == 64, and kernel_param::data_per_workitem == 96
  - sizeof(key_type) + sizeof(val_type) == 16, kernel_param::workgroup_size == 64, and kernel_param::data_per_workitem == 64
oneDPL 2022.3.0 release
New Features
- Added an experimental feature to dynamically select an execution context, e.g., a SYCL queue.
The feature provides selection functions such as select, submit and submit_and_wait,
and several selection policies: fixed_resource_policy, round_robin_policy,
dynamic_load_policy, and auto_tune_policy.
- unseq and par_unseq policies now enable vectorization also for Intel® oneAPI DPC++/C++ Compiler.
- Added support for passing zip iterators as segment value data in reduce_by_segment,
exclusive_scan_by_segment, and inclusive_scan_by_segment.
- Improved performance of the merge, sort, stable_sort, sort_by_key,
reduce, min_element, max_element, minmax_element, is_partitioned, and
lexicographical_compare algorithms with DPC++ execution policies.
Fixed Issues
- Fixed the reduce_async function to not ignore the provided binary operation.
New Known Issues and Limitations
- When compiled with -fsycl-pstl-offload option of Intel® oneAPI DPC++/C++ compiler and with
libstdc++ version 8 or libc++, oneapi::dpl::execution::par_unseq offloads
standard parallel algorithms to the SYCL device similarly to std::execution::par_unseq
in accordance with the -fsycl-pstl-offload option value.
- When using the dpl modulefile to initialize the user's environment and compiling with -fsycl-pstl-offload
option of Intel® oneAPI DPC++/C++ compiler, a linking issue or program crash may be encountered due to the directory
containing libpstloffload.so not being included in the search path. Use the env/vars.sh to configure the working
environment to avoid the issue.
- Compilation issues may be encountered when passing zip iterators to exclusive_scan_by_segment on Windows.
- Incorrect results may be produced by set_intersection with a DPC++ execution policy,
where elements are copied from the second input range rather than the first input range.
- For transform_exclusive_scan and exclusive_scan to run in-place (that is, with the same data
used for both input and destination) and with an execution policy of unseq or par_unseq,
it is required that the provided input and destination iterators are equality comparable.
Furthermore, the equality comparison of the input and destination iterator must evaluate to true.
If these conditions are not met, the result of these algorithm calls is undefined.
- sort, stable_sort, sort_by_key, partial_sort_copy algorithms may work incorrectly or cause
a segmentation fault when used with a DPC++ execution policy for CPU device, and built
on Linux with Intel® oneAPI DPC++/C++ Compiler and -O0 -g compiler options.
To avoid the issue, pass -fsycl-device-code-split=per_kernel option to the compiler.
- Incorrect results may be produced by exclusive_scan, inclusive_scan, transform_exclusive_scan,
transform_inclusive_scan, exclusive_scan_by_segment, inclusive_scan_by_segment, reduce_by_segment
with unseq or par_unseq policy when compiled by Intel® oneAPI DPC++/C++ Compiler
with -fiopenmp, -fiopenmp-simd, -qopenmp, -qopenmp-simd options on Linux.
To avoid the issue, pass -fopenmp or -fopenmp-simd option instead.
- Incorrect results may be produced by reduce and transform_reduce with 64-bit types and std::multiplies,
sycl::multiplies operations when compiled by Intel® C++ Compiler 2021.3 and newer and executed on GPU devices.
oneDPL 2022.2.0 release
New Features
- Added sort_by_key algorithm for key-value sorting.
- Improved performance of the reduce, min_element, max_element, minmax_element,
is_partitioned, and lexicographical_compare algorithms with DPC++ execution policies.
- Improved performance of the reduce_by_segment, inclusive_scan_by_segment, and
exclusive_scan_by_segment algorithms for binary operators with known identities
when using DPC++ execution policies.
- Added value_type to all views in oneapi::dpl::experimental::ranges.
- Extended oneapi::dpl::experimental::ranges::sort to support projections applied to the range elements prior to comparison.
Fixed Issues
- The minimally required CMake version is raised to 3.11 on Linux and 3.20 on Windows.
- Added new CMake package oneDPLIntelLLVMConfig.cmake to resolve issues using CMake 3.20+ on Windows for icx and icx-cl.
- Fixed an error in the sort and stable_sort algorithms when performing a descending sort
on signed numeric types with negative values.
- Fixed an error in reduce_by_segment algorithm when a non-commutative predicate is used.
- Fixed an error in sort and stable_sort algorithms for integral types wider than 4 bytes.
- Fixed an error for some compilers where OpenMP or SYCL backend was selected by CMake scripts without full compiler support.
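The reduce_by_segment fix above concerns operations whose result depends on operand order. The serial sketch below models the expected result: each run of consecutive equal keys is reduced strictly left to right, so a non-commutative operation such as string concatenation gives a well-defined answer. The function name reduce_by_segment_reference is hypothetical. This is a reference model built on the standard library, not oneDPL's implementation or signature.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Collapses each run of consecutive equal keys into one key and one value.
// Values inside a run are combined left to right with `op`, so operand order
// is preserved even when `op` is not commutative.
template <typename K, typename V, typename BinaryOp>
std::pair<std::vector<K>, std::vector<V>>
reduce_by_segment_reference(const std::vector<K>& keys, const std::vector<V>& values, BinaryOp op) {
    std::vector<K> out_keys;
    std::vector<V> out_values;
    for (std::size_t i = 0; i < keys.size() && i < values.size(); ++i) {
        if (!out_keys.empty() && out_keys.back() == keys[i]) {
            out_values.back() = op(out_values.back(), values[i]);  // extend current segment
        } else {
            out_keys.push_back(keys[i]);      // start a new segment
            out_values.push_back(values[i]);
        }
    }
    return std::make_pair(std::move(out_keys), std::move(out_values));
}
```

With keys 1, 1, 2, 2, 2 and values "a" through "e", concatenation yields "ab" for key 1 and "cde" for key 2. A result such as "ba" would indicate that operands were reordered.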
New Known Issues and Limitations
- Incorrect results may be produced with in-place scans using unseq and par_unseq policies on
CPUs with the Intel® C++ Compiler 2021.8.
This release also includes the following changes from oneDPL 2022.1.1
New Features
- Improved sort algorithm performance for the arithmetic data types with std::less or std::greater comparison operator and DPC++ policy.
Fixed Issues
- Fixed an error that caused segmentation faults in transform_reduce, minmax_element, and related algorithms when run on CPU devices.
- Fixed a compilation error in transform_reduce, minmax_element, and related algorithms on FPGAs.
- Fixed permutation_iterator to support C-style array as a permutation map.
- Fixed a radix-sort issue with 64-bit signed integer types.
oneDPL 2022.1.0 release
New Features
- Added generate, generate_n, transform algorithms to Tested Standard C++ API.
- Improved performance of inclusive_scan, exclusive_scan, reduce and
max_element algorithms with DPC++ execution policies.
Fixed Issues
- Added a workaround for the "TBB headers not found" issue occurring with libstdc++ version 9 when
oneTBB headers are not present in the environment. The workaround requires inclusion of the oneDPL headers before the libstdc++ headers.
- When possible, oneDPL CMake scripts now enforce C++17 as the minimally required language version. Inspired by Daniel Simon (#739).
- Fixed an error in the exclusive_scan algorithm when the output iterator is equal to the
input iterator (in-place scan).