Added get_sreg_lanemask_lt, get_sreg_lanemask_le, get_sreg_lanemask_gt and get_sreg_lanemask_ge.
Added rocprim::transform_output_iterator and rocprim::make_transform_output_iterator.
Added experimental support for SPIR-V, so that the correct tuned config is used for some of the applicable algorithms.
Added a new CMake option, BUILD_OFFLOAD_COMPRESS. When rocPRIM is built with this option enabled, the --offload-compress switch is passed to the compiler, which then compresses the binary it generates. Compression is useful when compiling for a large number of targets, since this often results in a large binary. Without compression, the generated binary can in some cases become so large that symbols are placed out of range, resulting in linking errors. The new BUILD_OFFLOAD_COMPRESS option is set to ON by default.
Added a new CMake option -DUSE_SYSTEM_LIB to allow tests to be built from ROCm libraries provided by the system.
Added rocprim::apply which applies a function to a rocprim::tuple.
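The idea behind applying a function to a tuple can be sketched with the C++17 standard library. The block below is a host-side analog only: it uses std::tuple and std::apply, and assumes that rocprim::apply plays a similar role for rocprim::tuple (its exact signature is not stated in these notes). The helpers sum3, sum_of_tuple and example_usage are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <tuple>

// Hypothetical helper for illustration only: adds three values.
inline int sum3(int a, int b, int c)
{
    return a + b + c;
}

// Standard-library analog (not rocPRIM code): std::apply unpacks the
// tuple elements and passes them as separate arguments to the callable.
inline int sum_of_tuple(const std::tuple<int, int, int>& t)
{
    return std::apply(sum3, t);
}

// Hypothetical usage: the tuple (1, 2, 3) is expanded into sum3(1, 2, 3).
inline void example_usage()
{
    assert(sum_of_tuple(std::make_tuple(1, 2, 3)) == 6);
}
```

The benefit of such a helper is that callers do not have to index each tuple element by hand before forwarding it to a function.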
Changed
Changed tests to support ptr-to-const output in /test/rocprim/test_device_batch_memcpy.cpp.
Optimizations
Improved the performance of many algorithms by updating their tuned configs.
891 specializations have been improved.
399 specializations have been added.
Upcoming changes
Deprecated the -> operator for the zip_iterator.
Resolved issues
Fixed device_select, device_merge, and device_merge_sort not allocating the correct amount of virtual shared memory on the host.
Fixed the -> operator for the transform_iterator, the texture_cache_iterator and the arg_index_iterator, by now returning a proxy pointer.
The arg_index_iterator now also returns only the internal iterator from the -> operator.
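To illustrate why an iterator that computes its values on the fly may need a proxy pointer for the -> operator, here is a minimal sketch in standard C++. It is not rocPRIM's implementation: arrow_proxy, pair_iterator and doubled_via_arrow are hypothetical names, and the sketch only assumes the general technique of returning a small object that owns the temporary and exposes its own operator->.

```cpp
#include <cassert>
#include <utility>

// Hypothetical proxy: owns a temporary value so that operator-> has
// an object with a valid address to point at.
template <class T>
struct arrow_proxy
{
    T value;
    T* operator->() { return &value; }
};

// Hypothetical minimal transforming iterator over ints that yields a
// freshly computed pair on every dereference.
struct pair_iterator
{
    const int* ptr;

    // Dereference computes a temporary; there is no stored pair to point to.
    std::pair<int, int> operator*() const { return {*ptr, *ptr * 2}; }

    // Returning the proxy by value makes `it->second` work: the compiler
    // applies operator-> again on the proxy until it reaches a raw pointer.
    arrow_proxy<std::pair<int, int>> operator->() const { return {**this}; }
};

// Reads the transformed value through the arrow operator.
inline int doubled_via_arrow(const int* p)
{
    pair_iterator it{p};
    return it->second;
}
```

The proxy lives until the end of the full expression, so member access through `->` is safe, whereas returning a raw pointer to a temporary would dangle.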