rocPRIM 4.1.0 for ROCm 7.1.0

@rocm-ci released this 30 Oct 05:52

Added

  • Added get_sreg_lanemask_lt, get_sreg_lanemask_le, get_sreg_lanemask_gt and get_sreg_lanemask_ge.
  • Added rocprim::transform_output_iterator and rocprim::make_transform_output_iterator.
  • Added experimental support for SPIR-V, so that the correct tuned config is used for some of the applicable algorithms.
  • Added a new CMake option, BUILD_OFFLOAD_COMPRESS. When rocPRIM is built with this option enabled, the --offload-compress switch is passed to the compiler, which then compresses the binary it generates. Compression is useful when compiling for a large number of targets, since this often results in a large binary. Without compression, the generated binary can in some cases become so large that symbols are placed out of range, resulting in linking errors. BUILD_OFFLOAD_COMPRESS is set to ON by default.
  • Added a new CMake option -DUSE_SYSTEM_LIB to allow tests to be built from ROCm libraries provided by the system.
  • Added rocprim::apply, which applies a function to a rocprim::tuple.
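
To illustrate the ideas behind the transform_output_iterator and apply entries above, here is a minimal host-side sketch in standard C++. It does not use rocPRIM: transform_output_sketch, make_transform_output_sketch, write_doubled and sum3 are hypothetical names, and std::apply on a std::tuple stands in for rocprim::apply on a rocprim::tuple, on the assumption that the two behave analogously. The exact rocPRIM signatures and argument order are not stated in these notes, so check the rocPRIM documentation before relying on them.

```cpp
#include <cassert>
#include <tuple>
#include <vector>

// Hypothetical stand-in, NOT rocPRIM's implementation: an output iterator
// that applies `fn` to every value before storing it through `it`.
template <class OutputIt, class UnaryFn>
class transform_output_sketch
{
public:
    // Returned by operator*; assigning to it transforms and stores the value.
    class proxy
    {
    public:
        proxy(OutputIt it, UnaryFn fn) : it_(it), fn_(fn) {}

        template <class T>
        proxy& operator=(const T& value)
        {
            *it_ = fn_(value);
            return *this;
        }

    private:
        OutputIt it_;
        UnaryFn  fn_;
    };

    transform_output_sketch(OutputIt it, UnaryFn fn) : it_(it), fn_(fn) {}

    proxy operator*() const { return proxy(it_, fn_); }

    transform_output_sketch& operator++()
    {
        ++it_;
        return *this;
    }

private:
    OutputIt it_;
    UnaryFn  fn_;
};

// Helper that deduces the template arguments (hypothetical name).
template <class OutputIt, class UnaryFn>
transform_output_sketch<OutputIt, UnaryFn> make_transform_output_sketch(OutputIt it, UnaryFn fn)
{
    return transform_output_sketch<OutputIt, UnaryFn>(it, fn);
}

// Writes 2 * x for every x in `in` through the transforming iterator.
inline void write_doubled(const std::vector<int>& in, std::vector<int>& out)
{
    assert(out.size() >= in.size());
    auto it = make_transform_output_sketch(out.data(), [](int x) { return 2 * x; });
    for(int v : in)
    {
        *it = v; // stores 2 * v
        ++it;
    }
}

// Unpacks the tuple elements into the arguments of a callable (C++17).
inline int sum3(const std::tuple<int, int, int>& t)
{
    return std::apply([](int a, int b, int c) { return a + b + c; }, t);
}
```

The point of a transforming output iterator is that the conversion happens at write time, so an algorithm can emit results in a different form without a separate pass over the output.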

Changed

  • Changed tests to support ptr-to-const output in /test/rocprim/test_device_batch_memcpy.cpp.

Optimizations

  • Improved performance of many algorithms by updating their tuned configs.
    • 891 specializations have been improved.
    • 399 specializations have been added.

Upcoming changes

  • Deprecated the -> operator for the zip_iterator.

Resolved issues

  • Fixed device_select, device_merge, and device_merge_sort not allocating the correct amount of virtual shared memory on the host.
  • Fixed the -> operator for the transform_iterator, the texture_cache_iterator and the arg_index_iterator, which now returns a proxy pointer.
    • In addition, the -> operator of the arg_index_iterator now returns only the internal iterator.
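
For readers unfamiliar with the proxy-pointer pattern mentioned above, here is a minimal host-side sketch in standard C++. An iterator that computes its value on dereference has no stored object to point at, so its -> operator cannot return a plain pointer. A common approach is to return a small proxy that owns the computed value and exposes its own -> operator. All names here (arrow_proxy, point, to_point_iterator, y_of_second) are hypothetical, and this is not rocPRIM's actual implementation.

```cpp
#include <cassert>
#include <utility>

// Owns a temporary value and hands out a pointer to it (hypothetical name).
template <class T>
class arrow_proxy
{
public:
    explicit arrow_proxy(T value) : value_(std::move(value)) {}

    T* operator->() { return &value_; }

private:
    T value_;
};

struct point
{
    int x;
    int y;
};

// Transform-style iterator over ints: dereferencing builds a point on the fly,
// with y being the square of the underlying element.
class to_point_iterator
{
public:
    explicit to_point_iterator(const int* ptr) : ptr_(ptr) {}

    point operator*() const { return point{*ptr_, (*ptr_) * (*ptr_)}; }

    // Returns a proxy by value; the compiler then applies the proxy's own
    // operator-> to reach the member of the temporary point.
    arrow_proxy<point> operator->() const { return arrow_proxy<point>(**this); }

    to_point_iterator& operator++()
    {
        ++ptr_;
        return *this;
    }

private:
    const int* ptr_;
};

// Reads the y member of the second element through operator->.
inline int y_of_second(const int* data)
{
    assert(data != nullptr);
    to_point_iterator it(data);
    ++it;
    return it->y;
}
```

The proxy lives until the end of the full expression, so `it->y` is safe, but storing the pointer obtained from it beyond that expression would not be.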