Releases: CHIP-SPV/chipStar

v1.3.0

20 May 17:07

chipStar v1.3 Release Notes

Overview

v1.3 is a major release with approximately 700 commits since v1.2.1. The main user-visible changes are official CMake integration, macOS support, broader LLVM and HIP compatibility, expanded HIP runtime API coverage, HIPRTC improvements, and many correctness and performance fixes.


New Platform Support

macOS (ARM64 + x86)

  • macOS is now a supported chipStar platform, including Apple Silicon CPUs via PoCL
  • LLVM 21/22-based macOS builds are supported
  • Runtime fixes make queue handling and unified-memory behavior work correctly on macOS

Intel Arc B570 GPU / Linux Kernel 6.8+

  • Intel Arc B570 systems are supported with newer Linux kernels
  • Fixed timestamp handling on devices with 64-bit timestamps
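
The notes do not say what went wrong with 64-bit timestamps. As a hedged illustration of why full-width counters need care, the sketch below computes elapsed ticks from two raw counter readings. The helper names and the `valid_bits` parameter are hypothetical and not taken from chipStar. The point it shows is that building a mask with `1 << valid_bits` is undefined in C++ when `valid_bits` is 64, so that case has to be handled separately.

```cpp
#include <cstdint>

// Mask covering the low `valid_bits` bits of a counter.
// Shifting a 64-bit value by 64 is undefined behaviour, so the
// full-width case is handled explicitly (hypothetical helper).
inline std::uint64_t timestamp_mask(unsigned valid_bits) {
  return valid_bits >= 64 ? ~std::uint64_t{0}
                          : ((std::uint64_t{1} << valid_bits) - 1);
}

// Ticks elapsed between two raw readings of a counter that is
// `valid_bits` wide. Modular subtraction tolerates one wraparound.
inline std::uint64_t elapsed_ticks(std::uint64_t start, std::uint64_t end,
                                   unsigned valid_bits) {
  return (end - start) & timestamp_mask(valid_bits);
}
```

With a 32-bit counter, a reading that wraps from 0xFFFFFFF0 to 0x10 yields 0x20 elapsed ticks; with a 64-bit counter the mask leaves the difference unchanged.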

ARM Mali GPU

  • Device-side printf is now supported on Mali GPUs with the cl_arm_printf extension
  • Added workarounds for a Mali driver deadlock (flush queues before finish; clFlush for marker/callback events)

Supported Libraries

install_chipstar.py now installs chipStar together with supported CHIP-SPV library ports.

Platform-Independent Libraries

  • rocPRIM - Parallel primitives
  • hipCUB - CUB-like primitives for HIP
  • rocThrust - Thrust parallel algorithms
  • rocRAND - Random number generation
  • hipRAND - HIP random number interface
  • rocSPARSE (new) - Sparse matrix operations
  • hipSPARSE (new) - HIP sparse matrix interface
  • hipMM (new) - HIP memory manager (RMM port)

Intel MKL-Based Libraries

Application Support


LLVM and HIP Compatibility

  • Added support for LLVM 20, LLVM 21, and LLVM 22
  • LLVM 22's integrated SPIR-V backend is detected automatically and is the default/preferred option
  • Added compatibility work for upcoming LLVM 23 / Clang offload driver changes
  • Updated the bundled HIP stack to HIP 7.2.0 (chipStar-hip-7)
  • Fixed HIP 7 API signature changes, including hipMemcpyHtoD/HtoDAsync
  • Fixed HIPRTC API parameter types for compatibility with newer HIP headers

Build and Project Integration

  • chipStar is officially supported by CMake starting with CMake 4.3.0, making it easier for projects to use chipStar through CMake's HIP language support
  • New install_chipstar.py with a unified install directory, a --with-tests flag, and rocPRIM prefix support
  • install_chipstar.py now builds from cwd when run inside the repo
  • Fixed make install failure on macOS (hardcoded .so extension)
  • Excluded spdlog from install to avoid downstream symbol conflicts; renamed namespace to chipStar_spdlog
  • Installed hip-lang-config.cmake for CMake HIP language support
  • Installed FindHIP.cmake

New Features

Async Memory Allocation

  • Implemented hipMallocAsync, hipFreeAsync, hipMallocFromPoolAsync

Managed Memory

  • Implemented hipMemAdvise and hipMemRangeGetAttribute(s)
  • Implemented hipMemPrefetchAsync
  • OpenCL: use clHostMemAllocINTEL for managed memory; added clEnqueueMigrateMemINTEL support
  • Level0/OpenCL: managed memory support reporting

Device-Side malloc/free

  • Implemented device-side __chip_malloc / __chip_free with C++ new/delete wrappers

HIP Graphs

  • Fixed hipGraphAddDependencies/RemoveDependencies to iterate from[i]→to[i] pairs correctly
  • Added null-pointer parameter validation across hipGraph* API functions
  • Fixed getCaptureStatus() (was hardcoded to return None)
  • Fixed hipStreamWaitEvent incorrectly flipping stream capture status
  • Fixed hipStreamEndCapture returning IllegalState (401)
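
The pairwise semantics mentioned above (entry i of the "from" array is connected to entry i of the "to" array) can be sketched with a toy graph. This is a host-only model for illustration, assuming integer node ids; `ToyGraph` and its members are hypothetical and are not chipStar or HIP types.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>

// Hypothetical stand-in for a graph: nodes are integer ids and
// each edge is a (from, to) pair.
struct ToyGraph {
  std::set<std::pair<int, int>> edges;

  // Adds exactly one edge per index i: from[i] -> to[i].
  // It does not connect every "from" node to every "to" node.
  void add_dependencies(const int* from, const int* to, std::size_t count) {
    assert(count == 0 || (from != nullptr && to != nullptr));
    for (std::size_t i = 0; i < count; ++i) edges.emplace(from[i], to[i]);
  }

  // Removes the edge from[i] -> to[i] for each index i.
  void remove_dependencies(const int* from, const int* to, std::size_t count) {
    assert(count == 0 || (from != nullptr && to != nullptr));
    for (std::size_t i = 0; i < count; ++i)
      edges.erase(std::make_pair(from[i], to[i]));
  }
};
```

Adding `from = {1, 2}` and `to = {3, 4}` creates the edges 1→3 and 2→4 only; a cross-product interpretation would wrongly add 1→4 and 2→3 as well.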

Separate Compilation

  • Handle -dc flag in hipcc for separate compilation workflows (#893)
  • Support unbundling of static device libraries in HIPSPV toolchain

HIPRTC Improvements

  • Runtime compilation handles more SPIR-V constructs correctly
  • HIPRTC compilation-output caching support
  • Auto-include fp16 headers in HIPRTC
  • Device variable registration for compiled modules (constant memory in HIPRTC)
  • Shell metacharacter escaping in hiprtcCompileProgram
  • Fixed HIPRTC compile error with Clang 22
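
The shell-metacharacter item suggests that compile options can end up on a shell command line. The notes do not describe how chipStar escapes them; the sketch below shows one common POSIX technique (single-quote wrapping), with `shell_quote` as a hypothetical helper name.

```cpp
#include <string>

// Quote `arg` so that a POSIX shell treats it as one literal word.
// Everything inside single quotes is literal, so only the single
// quote itself needs special handling.
std::string shell_quote(const std::string& arg) {
  std::string out = "'";
  for (char c : arg) {
    if (c == '\'')
      out += "'\\''";  // close the quote, emit an escaped quote, reopen
    else
      out += c;
  }
  out += "'";
  return out;
}
```

An option such as `-DNAME=a b` becomes `'-DNAME=a b'`, and text like `$(rm -rf x)` is passed through as plain characters instead of being expanded by the shell.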

fp16 / Device Library

  • Added float→half conversion functions with rounding modes
  • Added __device__/__host__ decorators to fp16 header
  • Fixed raw bit extraction and error messages in fp16_conversion.hpp
  • Fixed __ocml_cvtrtn_f16_f32 and other missing conversion functions
  • Added float/double atomicMin/atomicMax devicelib implementations
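
To make "rounding modes" concrete, here is a host-side sketch of a float-to-half bit conversion with four IEEE-style modes. It is an illustration written for these notes, not chipStar's device-library code; the `Round` enum and `float_to_half_bits` are hypothetical names.

```cpp
#include <cstdint>
#include <cstring>

enum class Round { NearestEven, TowardZero, Up, Down };

// Convert a 32-bit float to IEEE 754 binary16 bits under `mode`.
std::uint16_t float_to_half_bits(float f, Round mode) {
  std::uint32_t x;
  std::memcpy(&x, &f, sizeof x);
  const std::uint32_t sign = (x >> 16) & 0x8000u;
  const std::uint32_t exp = (x >> 23) & 0xFFu;
  std::uint32_t man = x & 0x7FFFFFu;

  if (exp == 0xFFu)  // infinity, or NaN mapped to a quiet NaN
    return static_cast<std::uint16_t>(sign | 0x7C00u | (man ? 0x0200u : 0u));

  // True when the mode rounds the magnitude away from zero if inexact.
  const bool away = (mode == Round::Up && sign == 0) ||
                    (mode == Round::Down && sign != 0);

  const int e = static_cast<int>(exp) - 127 + 15;  // rebias the exponent
  if (e >= 31)  // too large for half: infinity or largest finite value
    return static_cast<std::uint16_t>(
        sign | ((mode == Round::NearestEven || away) ? 0x7C00u : 0x7BFFu));

  std::uint32_t result, rem, halfway;
  if (e > 0) {  // normal half: drop the low 13 mantissa bits
    result = (static_cast<std::uint32_t>(e) << 10) | (man >> 13);
    rem = man & 0x1FFFu;
    halfway = 0x1000u;
  } else if (exp == 0 || e < -10) {  // below half of the smallest subnormal
    const bool nonzero = (x & 0x7FFFFFFFu) != 0;
    return static_cast<std::uint16_t>(sign | ((nonzero && away) ? 1u : 0u));
  } else {  // subnormal half: shift in the implicit leading one
    man |= 0x800000u;
    const int shift = 14 - e;  // between 14 and 24
    result = man >> shift;
    rem = man & ((1u << shift) - 1u);
    halfway = 1u << (shift - 1);
  }

  bool round_up = false;
  if (mode == Round::NearestEven)
    round_up = rem > halfway || (rem == halfway && (result & 1u));
  else if (away)
    round_up = rem != 0;
  // A carry out of the mantissa correctly bumps the exponent.
  return static_cast<std::uint16_t>(sign | (result + (round_up ? 1u : 0u)));
}
```

The value 1.00146484375 lies exactly halfway between two half-precision values, so it converts to 0x3C02 under round-to-nearest-even and to 0x3C01 under round-toward-zero.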

SPIR-V Compatibility

  • Promote narrow integer kernel args to i32 for SPIR-V conformance (#849)
  • Inline kernel arg promotion to avoid wrapper function pattern (#849)
  • Improved linking behavior for programs that use atomics and ballot operations
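
The shape of the narrow-argument promotion can be pictured in plain C++. This is only a sketch of the idea under the assumption that the widened value is narrowed again at the top of the same function body; the real transformation happens in the compiler on kernel signatures, and both function names here are hypothetical.

```cpp
#include <cstdint>

// Function as the programmer wrote it: an 8-bit parameter.
int scale_narrow(std::int8_t factor, int value) { return factor * value; }

// Shape after promotion: the parameter travels as a 32-bit integer and
// is narrowed on entry, inside the same body, so no separate wrapper
// function is needed.
int scale_promoted(std::int32_t factor_i32, int value) {
  const std::int8_t factor = static_cast<std::int8_t>(factor_i32);
  return factor * value;
}
```

For every value that fits in 8 bits, both versions compute the same result; only the width of the parameter at the call boundary differs.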

SYCL Interop

  • Fixed hip_sycl_interop and hip_sycl_interop_no_buffers for Level Zero backend with MKL 2025 UR API

Notable Bug Fixes

  • Memory-copy validation is stricter and more compatible with HIP behavior, including invalid pointer handling and hipMemcpyDefault direction inference.
  • Atomics and ballot intrinsics received several correctness fixes, including atomicMin/atomicMax on floating-point types, __chip_all(), and __byte_perm.
  • Stream, event, and queue synchronization fixes address hangs and races seen with OpenCL, Level Zero, ARM Mali, and default-stream behavior.
  • Module loading/unloading and backend selection are more robust, including fixes for default backend selection and hipModuleUnload.
  • C and CMake integration fixes improve mixed C/HIP projects, CMake HIP language support, and downstream package integration.
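
Direction inference for hipMemcpyDefault means deciding the copy kind from the two pointers alone. How chipStar looks up a pointer is not described in these notes; the sketch below assumes a simple registry of device address ranges, and `DeviceRanges`, `CopyKind`, and `infer_copy_kind` are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>

enum class CopyKind { HostToHost, HostToDevice, DeviceToHost, DeviceToDevice };

// Hypothetical registry of device allocations: base address -> size in bytes.
class DeviceRanges {
 public:
  void add(const void* base, std::size_t size) { ranges_[addr(base)] = size; }

  // True if `p` falls inside any registered range.
  bool contains(const void* p) const {
    auto it = ranges_.upper_bound(addr(p));  // first range starting after p
    if (it == ranges_.begin()) return false;
    --it;  // last range starting at or before p
    return addr(p) - it->first < it->second;
  }

 private:
  static std::uintptr_t addr(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p);
  }
  std::map<std::uintptr_t, std::size_t> ranges_;
};

// Classify a copy by checking which side each pointer lives on.
CopyKind infer_copy_kind(const DeviceRanges& dev, const void* dst,
                         const void* src) {
  const bool dst_dev = dev.contains(dst);
  const bool src_dev = dev.contains(src);
  if (src_dev)
    return dst_dev ? CopyKind::DeviceToDevice : CopyKind::DeviceToHost;
  return dst_dev ? CopyKind::HostToDevice : CopyKind::HostToHost;
}
```

Pointers into the middle of an allocation are classified the same way as the base pointer, which is why the lookup uses ranges rather than exact addresses.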

Submodule Updates

  • HIP → 7.2.0 (chipStar-hip-7)
  • ROCm-Device-Libs: dynamic datalayout from clang; irif.h type-punning fix
  • HIPCC: various fixes including -no-hip-rt, LLVM 21, shell metacharacter escaping
  • PoCL support updated to version 7

Full Changelog

For the complete list of changes, see:

git log v1.2.1..v1.3

v1.3-RC2

29 Apr 16:49

The v1.3-RC2 release notes are identical to the v1.3.0 release notes above.

v1.3-RC1

28 Apr 21:59

Pre-release

chipStar v1.3-RC1 Release Notes

Changes since v1.2.1.

Currently Tested Hardware

  • Apple CPU - M4 via PoCL
  • ARM GPU - Mali-G52 via Arm OpenCL
  • Intel GPUs via Intel OpenCL or Level Zero provided by intel-compute-runtime
  • RISC-V CPU SiFive via PoCL OpenCL
  • AMD GPUs via rusticl OpenCL

New toolchain support

  • LLVM 22, including the integrated SPIR-V backend.
  • LLVM 21.
  • PoCL 7.
  • LLVM 18 / PoCL 6.0 retired.

New HIP APIs

  • Stream-ordered allocator: hipMallocAsync, hipFreeAsync, hipMallocFromPoolAsync.
  • hipLaunchHostFunc.
  • hipMemAdvise, hipMemRangeGetAttribute, hipMemRangeGetAttributes.
  • Memory prefetch.
  • Managed-memory hipMemset and managed-memory reporting in both backends.
  • hipMemcpyDefault direction inference.

Ecosystem libraries

install_chipstar.py now installs chipStar alongside the following libraries:

Library      Description
rocPRIM      Parallel primitives
hipCUB       CUB-like primitives for HIP
rocThrust    Thrust parallel algorithms
rocRAND      Random number generation
hipRAND      HIP random number interface
rocSPARSE    Sparse matrix operations
hipSPARSE    HIP sparse matrix interface
HipBLAS      HIP BLAS via Intel MKL
HipSOLVER    HIP linear solver via Intel MKL
HipFFT       HIP FFT via Intel MKL
MKLShim      Intel MKL shim layer
hipMM        HIP memory manager (RMM port)

v1.2.1

11 Nov 13:15
3bd6515

What's Changed

This minor release adds some fixes and performance improvements, most notably module caching.

New Contributors

Full Changelog: v1.2...v1.2.1

chipStar release 1.2

25 Sep 16:43
814e8f3

Release Notes

This release brings significant stability and performance improvements, enhanced CUDA support, and new HIP/ROCm library ports and integrations for HipBLAS, HipFFT, and HipRAND/RocRAND. It also includes initial testing of HIP/CUDA applications on RISC-V.

Tested Platforms

  • Intel, AMD CPUs via Intel Compute Runtime
  • Intel GPUs via Neo i915 driver
  • ARM Mali GPUs (Quartz64 SBC)
  • RISC-V (Starfive Visionfive 2 SBC Debian, experimental)
  • AMD GPUs via rusticl (exploratory work)

Notable Changes

  • Introduced cucc, a drop-in replacement for nvcc:

    • Added cucc, enabling direct compilation of CUDA sources.
    • Added nvcc softlink, allowing you to compile CUDA sources without making any changes.
    • Adjusted CUDA headers to improve compatibility with CUDA sources, including a dummy cublas_v2.h header to prevent conflicts with system headers.
  • Enhanced OpenCL backend:

    • Support for cl_ext_buffer_device_address extension:
      • Added support for devices featuring the cl_ext_buffer_device_address extension, improving memory management capabilities.
    • Optimized queue profiling:
      • The OpenCL backend now uses non-profiling queues by default and switches to profiling queues only when needed, resulting in performance improvements.
    • Various other performance optimizations
  • Fixed Level Zero backend issues:

    • Addressed out-of-memory (OOM) errors:
      • Fixed memory leaks and improved resource management to prevent OOM errors during heavy workloads.
    • Improved thread safety:
      • Implemented mutexes and synchronization mechanisms to enhance thread safety within the Level Zero backend.
  • Rebased to HIP 6.x and updated hip-tests:

    • Updated the codebase to be compatible with HIP 6.x.

Library Support Changes

  • Expanded HIP library support:
    • HipBLAS integration:
      • Introduced the CHIP_BUILD_HIPBLAS option to enable building HipBLAS.
    • HipFFT integration:
      • Introduced the CHIP_BUILD_HIPFFT option to enable building HipFFT.
    • RocRAND port:

v1.2-RC1

04 Sep 07:41
95acbd1

Pre-release

What's Changed

New Contributors

Full Changelog: v1.1...v1.2-RC1

chipStar release 1.1

22 Jan 13:08
ffebe4d

chipStar release 1.1

This release cycle focused on stabilization and performance improvements
over the 1.0 release. The release was measured to run some benchmarks up
to twice as fast as 1.0, with an average improvement of 30% measured on HeCBench.

Further highlights are described in the following.

Release Highlights

  • Added support for Clang/LLVM 17. LLVM 15 and 16 are still supported.
  • Ability to Use the Intel Unified Shared Memory Extension, with OpenCL backend
  • Optimized Atomic Operations
  • Use of Immediate Command Lists for Low Latency Dispatch, with Level Zero backend
  • Improved portability to other platforms & devices
  • Improved Asynchronous Execution

The full release notes are available in docs/release_notes/chipStar_1.1.rst

The full sources of the release (including git submodules) are available packaged in the attached file chipStar-1.1.tar.gz
(SHA256: 9258a313c503073a082ca310cebf048d84c4ab698facfc8d1d9ce1381ffb9fc5).

v1.1-RC4

16 Jan 09:55
ffebe4d

Pre-release

The fourth release candidate for v1.1. Please test and add your results to the test log, and report any major problems or regressions as issues under the 1.1 milestone.

1.1 release notes.

v1.1-RC3

19 Dec 21:48
c2f604a

Pre-release

The third release candidate for v1.1. Please test and add your results to the test log, and report any major problems or regressions as issues under the 1.1 milestone.

1.1 release notes.

v1.1-RC2

14 Dec 20:45

Pre-release

The second release candidate for v1.1. Please test and add your results to the test log, and report any major problems or regressions as issues under the 1.1 milestone.

1.1 release notes.