Releases: CHIP-SPV/chipStar
v1.3.0
chipStar v1.3 Release Notes
Overview
v1.3 is a major release with approximately 700 commits since v1.2.1. The main user-visible changes are official CMake integration, macOS support, broader LLVM and HIP compatibility, expanded HIP runtime API coverage, HIPRTC improvements, and many correctness and performance fixes.
New Platform Support
macOS (ARM64 + x86)
- macOS is now a supported chipStar platform, including Apple Silicon CPUs via PoCL
- LLVM 21/22-based macOS builds are supported
- Runtime fixes make queue handling and unified-memory behavior work correctly on macOS
Intel Arc B570 / GPU Kernel 6.8+
- Intel Arc B570 systems are supported with newer Linux kernels
- Fixed timestamp handling on devices with 64-bit timestamps
ARM Mali GPU
- Device-side `printf` is now supported on Mali GPUs with the `cl_arm_printf` extension
- Workarounds for Mali driver deadlock (flush queues before finish, `clFlush` for marker/callback events)
Supported Libraries
install_chipstar.py now installs chipStar together with supported CHIP-SPV library ports.
Platform-Independent Libraries
- rocPRIM - Parallel primitives
- hipCUB - CUB-like primitives for HIP
- rocThrust - Thrust parallel algorithms
- rocRAND - Random number generation
- hipRAND - HIP random number interface
- rocSPARSE (new) - Sparse matrix operations
- hipSPARSE (new) - HIP sparse matrix interface
- hipMM (new) - HIP memory manager (RMM port)
Intel MKL-Based Libraries
- H4I-MKLShim - Intel MKL shim layer
- H4I-HipBLAS - hipBLAS via Intel MKL
- H4I-HipFFT - hipFFT via Intel MKL
- H4I-HipSOLVER - hipSOLVER via Intel MKL
Application Support
LLVM and HIP Compatibility
- Added support for LLVM 20, LLVM 21, and LLVM 22
- LLVM 22's integrated SPIR-V backend is detected automatically and is the default/preferred option
- Added compatibility work for upcoming LLVM 23 / Clang offload driver changes
- Updated the bundled HIP stack to HIP 7.2.0 (`chipStar-hip-7`)
- Fixed HIP 7 API signature changes, including `hipMemcpyHtoD`/`HtoDAsync`
- Fixed HIPRTC API parameter types for compatibility with newer HIP headers
Build and Project Integration
- chipStar is officially supported by CMake starting with CMake 4.3.0, making it easier for projects to use chipStar through CMake's HIP language support
- New `install_chipstar.py` with unified install dir, `--with-tests` flag, `rocPRIM` prefix support
- `install_chipstar.py` now builds from `cwd` when run inside the repo
- Fixed `make install` failure on macOS (hardcoded `.so` extension)
- Excluded `spdlog` from install to avoid downstream symbol conflicts; renamed namespace to `chipStar_spdlog`
- Installed `hip-lang-config.cmake` for CMake HIP language support
- Installed `FindHIP.cmake`
New Features
Async Memory Allocation
- Implemented `hipMallocAsync`, `hipFreeAsync`, `hipMallocFromPoolAsync`
Managed Memory
- Implemented `hipMemAdvise` and `hipMemRangeGetAttribute(s)`
- Implemented `hipMemPrefetchAsync`
- OpenCL: use `clHostMemAllocINTEL` for managed memory; added `clEnqueueMigrateMemINTEL` support
- Level0/OpenCL: managed memory support reporting
Device-Side malloc/free
- Implemented device-side `__chip_malloc`/`__chip_free` with C++ `new`/`delete` wrappers
HIP Graphs
- Fixed `hipGraphAddDependencies`/`RemoveDependencies` to iterate `from[i]`→`to[i]` pairs correctly
- Added null-pointer parameter validation across `hipGraph*` API functions
- Fixed `getCaptureStatus()` (was hardcoded to return `None`)
- Fixed `hipStreamWaitEvent` incorrectly flipping stream capture status
- Fixed `hipStreamEndCapture` returning `IllegalState` (401)
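The `from[i]`→`to[i]` pairing mentioned above can be illustrated with a small host-side model. This is only a sketch of the pairwise semantics: `ToyGraph`, `NodeId`, and the method names are hypothetical stand-ins, not chipStar or HIP types, and the real implementation may be organized differently.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>

// Hypothetical stand-in for a graph node handle (not a HIP type).
using NodeId = int;

// Toy dependency graph: an edge (a, b) means "b runs after a".
struct ToyGraph {
  std::set<std::pair<NodeId, NodeId>> edges;

  // Adds exactly one edge per index i: from[i] -> to[i].
  // It does NOT add the cross product of the two arrays.
  void addDependencies(const NodeId *from, const NodeId *to, std::size_t n) {
    assert(n == 0 || (from != nullptr && to != nullptr));
    for (std::size_t i = 0; i < n; ++i)
      edges.emplace(from[i], to[i]);
  }

  // Removes exactly one edge per index i: from[i] -> to[i].
  void removeDependencies(const NodeId *from, const NodeId *to, std::size_t n) {
    assert(n == 0 || (from != nullptr && to != nullptr));
    for (std::size_t i = 0; i < n; ++i)
      edges.erase(std::make_pair(from[i], to[i]));
  }

  bool hasEdge(NodeId a, NodeId b) const {
    return edges.count(std::make_pair(a, b)) != 0;
  }
};
```

With `from = {1, 2}` and `to = {3, 4}`, this model records the edges 1→3 and 2→4 only; an edge such as 1→4 is never created.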
Separate Compilation
- Handle `-dc` flag in `hipcc` for separate compilation workflows (#893)
- Support unbundling of static device libraries in HIPSPV toolchain
HIPRTC Improvements
- Runtime compilation handles more SPIR-V constructs correctly
- HIPRTC compilation-output caching support
- Auto-include fp16 headers in HIPRTC
- Device variable registration for compiled modules (constant memory in HIPRTC)
- Shell metacharacter escaping in `hiprtcCompileProgram`
- Fixed `hipRTC` compile error with Clang-22
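To show why shell metacharacter escaping matters when compile options end up on a command line, here is a minimal sketch of POSIX single-quote escaping. The helper names `shellQuote` and `buildCommand` are hypothetical; the notes do not say how chipStar performs the escaping, so treat this as one common approach rather than the actual implementation.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Wraps an argument in single quotes so a POSIX shell treats it literally.
// An embedded single quote is written as: close quote, \', reopen quote.
std::string shellQuote(const std::string &arg) {
  std::string out = "'";
  for (char c : arg) {
    if (c == '\'')
      out += "'\\''";
    else
      out += c;
  }
  out += "'";
  return out;
}

// Joins a compiler path and its options into one shell-safe command line.
std::string buildCommand(const std::string &compiler,
                         const std::vector<std::string> &options) {
  assert(!compiler.empty());
  std::string cmd = shellQuote(compiler);
  for (const std::string &opt : options) {
    cmd += ' ';
    cmd += shellQuote(opt);
  }
  return cmd;
}
```

Quoted this way, an option such as `-DNAME=$(rm -rf x)` reaches the compiler as a literal string instead of being expanded by the shell.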
fp16 / Device Library
- Added float→half conversion functions with rounding modes
- Added `__device__`/`__host__` decorators to fp16 header
- Fixed raw bit extraction and error messages in `fp16_conversion.hpp`
- Fixed `__ocml_cvtrtn_f16_f32` and other missing conversion functions
- Added float/double `atomicMin`/`atomicMax` devicelib implementations
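Floating-point `atomicMin`/`atomicMax` are often built from a compare-and-swap loop when the hardware has no native instruction. The host-side sketch below shows that technique with `std::atomic<float>`. It is an illustration under that assumption, not chipStar's devicelib code, and the function names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cmath>

// Stores max(target, value) atomically and returns the previous value.
float atomicMaxFloat(std::atomic<float> &target, float value) {
  assert(!std::isnan(value));
  float observed = target.load(std::memory_order_relaxed);
  // Retry only while the stored value is still smaller than ours.
  // On failure, compare_exchange_weak refreshes `observed`.
  while (observed < value) {
    if (target.compare_exchange_weak(observed, value,
                                     std::memory_order_relaxed))
      break;
  }
  return observed;
}

// Stores min(target, value) atomically and returns the previous value.
float atomicMinFloat(std::atomic<float> &target, float value) {
  assert(!std::isnan(value));
  float observed = target.load(std::memory_order_relaxed);
  while (observed > value) {
    if (target.compare_exchange_weak(observed, value,
                                     std::memory_order_relaxed))
      break;
  }
  return observed;
}
```

The loop exits without writing when another thread has already stored a value that is at least as large (or as small), so the operation stays correct under contention.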
SPIR-V Compatibility
- Promote narrow integer kernel args to `i32` for SPIR-V conformance (#849)
- Inline kernel arg promotion to avoid wrapper function pattern (#849)
- Improved linking behavior for programs that use atomics and ballot operations
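As a rough picture of what promoting a narrow integer kernel argument to `i32` means, the sketch below contrasts a function taking an 8-bit argument with a promoted form that takes a 32-bit argument and truncates it at entry. This is a plain C++ analogy with hypothetical function names; the actual transformation happens in chipStar's compiler passes on LLVM IR, and its details are an assumption here.

```cpp
#include <cassert>
#include <cstdint>

// Original form: the "kernel" takes a narrow 8-bit argument.
std::int32_t scaleOriginal(std::int8_t factor, std::int32_t x) {
  return factor * x;
}

// Promoted form: the signature carries a 32-bit argument, and the
// truncation back to 8 bits is inlined at the top of the body
// instead of living in a separate wrapper function.
std::int32_t scalePromoted(std::int32_t factorWide, std::int32_t x) {
  // The caller is expected to pass a sign-extended 8-bit value.
  assert(factorWide >= INT8_MIN && factorWide <= INT8_MAX);
  std::int8_t factor = static_cast<std::int8_t>(factorWide);
  return factor * x;
}
```

Both forms compute the same result for any value that fits in 8 bits; only the argument width visible at the call boundary changes.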
SYCL Interop
- Fixed `hip_sycl_interop` and `hip_sycl_interop_no_buffers` for Level Zero backend with MKL 2025 UR API
Notable Bug Fixes
- Memory-copy validation is stricter and more compatible with HIP behavior, including invalid pointer handling and `hipMemcpyDefault` direction inference.
- Atomics and ballot intrinsics received several correctness fixes, including `atomicMin`/`atomicMax` on floating-point types, `__chip_all()`, and `__byte_perm`.
- Stream, event, and queue synchronization fixes address hangs and races seen with OpenCL, Level Zero, ARM Mali, and default-stream behavior.
- Module loading/unloading and backend selection are more robust, including fixes for default backend selection and `hipModuleUnload`.
- C and CMake integration fixes improve mixed C/HIP projects, CMake HIP language support, and downstream package integration.
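Direction inference for `hipMemcpyDefault` amounts to classifying the source and destination pointers as host or device memory. The sketch below models that idea with a registry of known device allocations. `DeviceRanges`, `CopyKind`, and `inferKind` are hypothetical names for illustration, and chipStar's actual bookkeeping is not described in these notes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

enum class CopyKind { HostToHost, HostToDevice, DeviceToHost, DeviceToDevice };

// Hypothetical registry of device allocations: base address -> size in bytes.
class DeviceRanges {
  std::map<std::uintptr_t, std::size_t> ranges_;

public:
  void add(const void *base, std::size_t size) {
    assert(base != nullptr && size > 0);
    ranges_[reinterpret_cast<std::uintptr_t>(base)] = size;
  }

  // True if p lies inside any registered allocation.
  bool contains(const void *p) const {
    const auto addr = reinterpret_cast<std::uintptr_t>(p);
    auto it = ranges_.upper_bound(addr); // first range starting after addr
    if (it == ranges_.begin())
      return false;
    --it; // last range starting at or before addr
    return addr - it->first < it->second;
  }
};

// Picks the copy direction from where the two pointers live.
CopyKind inferKind(const DeviceRanges &dev, const void *dst, const void *src) {
  const bool dstDev = dev.contains(dst);
  const bool srcDev = dev.contains(src);
  if (srcDev)
    return dstDev ? CopyKind::DeviceToDevice : CopyKind::DeviceToHost;
  return dstDev ? CopyKind::HostToDevice : CopyKind::HostToHost;
}
```

Pointers into the middle of an allocation are classified the same way as its base address, which is why the lookup is range-based rather than an exact-match table.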
Submodule Updates
- HIP → 7.2.0 (`chipStar-hip-7`)
- ROCm-Device-Libs: dynamic datalayout from clang; `irif.h` type-punning fix
- HIPCC: various fixes including `-no-hip-rt`, LLVM 21, shell metacharacter escaping
- PoCL support updated to version 7
Full Changelog
For the complete list of changes, see:
git log v1.2.1..v1.3
v1.3-RC2
v1.3-RC1
chipStar v1.3-RC1 Release Notes
Changes since v1.2.1.
Currently Tested Hardware
- Apple CPU - M4 via PoCL.
- ARM GPU - Mali G-52 via Arm OpenCL
- Intel GPUs via Intel OpenCL or Level Zero provided by intel-compute-runtime
- RISC-V CPU SiFive via PoCL OpenCL
- AMD GPUs via rusticl OpenCL
New toolchain support
- LLVM 22, including the integrated SPIR-V backend.
- LLVM 21.
- PoCL 7.
- LLVM 18 / PoCL 6.0 retired.
New HIP APIs
- Stream-ordered allocator: `hipMallocAsync`, `hipFreeAsync`, `hipMallocFromPoolAsync`.
- `hipLaunchHostFunc`.
- `hipMemAdvise`, `hipMemRangeGetAttribute`, `hipMemRangeGetAttributes`.
- Memory prefetch.
- Managed-memory `hipMemset` and managed-memory reporting in both backends.
- `hipMemcpyDefault` direction inference.
Ecosystem libraries
install_chipstar.py now installs chipStar alongside the following libraries:
| Library | Description |
|---|---|
| rocPRIM | Parallel primitives |
| hipCUB | CUB-like primitives for HIP |
| rocThrust | Thrust parallel algorithms |
| rocRAND | Random number generation |
| hipRAND | HIP random number interface |
| rocSPARSE | Sparse matrix operations |
| hipSPARSE | HIP sparse matrix interface |
| HipBLAS | HIP BLAS via Intel MKL |
| HipSOLVER | HIP linear solver via Intel MKL |
| HipFFT | HIP FFT via Intel MKL |
| MKLShim | Intel MKL shim layer |
| hipMM | HIP memory manager (RMM port) |
v1.2.1
What's Changed
This minor release adds some fixes and performance improvements, most notably module caching.
- Update README.md by @zjin-lcf in #935
- Add JIT timings by @pvelesko in #940
- Integrate hipSOVLER by @pvelesko in #941
- remove stl sycl header include by @pvelesko in #942
- Fences fix by @pvelesko in #944
- Integrate HIPCC macro fix by @pvelesko in #946
- Module Caching by @pvelesko in #943
- Print JIT logs to Info, always by @pvelesko in #948
- Use Level Zero Copy Queues by @pvelesko in #949
- Prune known_failures.yaml & arg checks by @pvelesko in #950
- Support specifying include directories (`-I`) through hipRTC by @kmaehashi in #951
- implement a missing atomicMax by @pvelesko in #953
- JIT flags append instead of override by @pvelesko in #954
New Contributors
- @kmaehashi made their first contribution in #951
Full Changelog: v1.2...v1.2.1
chipStar release 1.2
Release Notes
This release brings significant stability and performance improvements, enhanced support for CUDA, and new HIP/ROCm library ports and integrations for HipBLAS, HipFFT, and HipRAND/RocRAND. It also includes initial testing of HIP/CUDA applications on RISC-V.
Tested Platforms
- Intel, AMD CPUs via Intel Compute Runtime
- Intel GPUs via Neo i915 driver
- ARM Mali GPUs (Quartz64 SBC)
- RISC-V (Starfive Visionfive 2 SBC Debian, experimental)
- AMD GPUs via rusticl (exploratory work)
Notable Changes
- Introduced `cucc`, a drop-in replacement for `nvcc`:
  - Added `cucc`, enabling direct compilation of CUDA sources.
  - Added `nvcc` softlink, allowing you to compile CUDA sources without making any changes.
  - Adjusted CUDA headers to improve compatibility with CUDA sources, including a dummy `cublas_v2.h` header to prevent conflicts with system headers.
- Enhanced OpenCL backend:
  - Support for `cl_ext_buffer_device_address` extension:
    - Added support for devices featuring the `cl_ext_buffer_device_address` extension, improving memory management capabilities.
  - Optimized queue profiling:
    - The OpenCL backend now uses non-profiling queues by default and switches to profiling queues only when needed, resulting in performance improvements.
  - Various other performance optimizations
- Fixed Level Zero backend issues:
  - Addressed out-of-memory (OOM) errors:
    - Fixed memory leaks and improved resource management to prevent OOM errors during heavy workloads.
  - Improved thread safety:
    - Implemented mutexes and synchronization mechanisms to enhance thread safety within the Level Zero backend.
- Rebased to HIP 6.x and updated hip-tests:
  - Updated the codebase to be compatible with HIP 6.x.
Library Support Changes
- Expanded HIP library support:
  - HipBLAS integration:
    - Introduced the `CHIP_BUILD_HIPBLAS` option to enable building HipBLAS.
  - HipFFT integration:
    - Introduced the `CHIP_BUILD_HIPFFT` option to enable building HipFFT.
  - RocRAND port:
v1.2-RC1
What's Changed
- Refactor WaitForThreadExit by @pvelesko in #752
- Fix #757: skip texture tests with iGPU+OpenCL when USM=ON by @franz in #758
- adjust modules due to NFS going down by @pvelesko in #766
- Fix tests were unintentinally skipped by @linehill in #764
- remove Unit_hipMemsetFunctional_ZeroSize_hipMemsetD32 from exclusion list by @pvelesko in #767
- update ROCm-Device-Libs by @pvelesko in #765
- page lock runner test by @pvelesko in #773
- Various improvements by @pvelesko in #774
- Dynamic event pools by @pvelesko in #771
- Update cpp-linter-action version by @pvelesko in #777
- Map device built-ins to compiler built-ins by @linehill in #763
- Level-zero-premature-exit by @pvelesko in #778
- Add sanity check for catching unexpected atomic built-ins by @linehill in #706
- Use a fence for syncing RCL by @pvelesko in #688
- default to Debug by @pvelesko in #776
- Adjustments for future LLVM-18 release by @linehill in #714
- SYCL-HIP Interop - Drop RCL/ICL Quer by @pvelesko in #781
- OpenCL Event Cleanup by @pvelesko in #788
- OpenCL: Fix indirect USM pointer related issues by @linehill in #790
- Add CHIP_LAZY_JIT environment option to control JIT timing by @linehill in #786
- Remove SPIR-V version check in the parser by @linehill in #787
- Backend handles refactor by @pvelesko in #789
- Fixup exluded tests by @pvelesko in #795
- Add CHIP_DEVICE_TYPE to documentation by @karlwessel in #800
- Fix HIP float intrinsics were mapped double built-ins by @linehill in #793
- fix name of the cuda compiler script by @karlwessel in #802
- OpenCL BE: set CHIP_USE_INTEL_USM on by default by @linehill in #791
- Sample and Test Profiling by @pvelesko in #804
- Linter Fix include complaint by @pvelesko in #805
- Changes to reduce kernel launch overheads by @linehill in #794
- Fix Event Collection by @pvelesko in #803
- OpenCL: Skip SVM pointer annotation if possible by @linehill in #785
- Remove a confusing already registered and mapped warning by @linehill in #809
- Level Zero Refactor + Bugfixes by @pvelesko in #817
- Refactor Known Failing Tests by @pvelesko in #822
- Fix called incorrect compiler built-ins by @linehill in #820
- Internalize `__device__` functions by @linehill in #819
- Implement FencedCmdLists by @pvelesko in #823
- Rebase HIP 6.x + Update hip-tests by @pvelesko in #796
- HIPCC Fixes by @pvelesko in #827
- Add CHIP_BUILD_HIPBLAS option by @pvelesko in #831
- OpenCL: Use non-profiling queue, switch to profiling when needed by @linehill in #814
- Fixes scripts/configure_llvm.sh by @linehill in #835
- Use loginfo for printing device info by @pvelesko in #839
- OpenCL: Fix memory leak / OoM and stack overflow by @linehill in #837
- Fix bunch texture cases by @linehill in #842
- Ubuntu Fixes by @pvelesko in #825
- Small Fixes by @pvelesko in #844
- Various small optimizations by @linehill in #816
- Level Zero - Fix OOM & Improve Thread Safety by @pvelesko in #845
- Add a workaround for name mangling issue with PowerVR OpenCL by @franz in #828
- Add libCEED to testing + Update hipBLAS w/sync by @pvelesko in #847
- update spirv_hip_complex.h header by @pvelesko in #856
- rtdevlib: fix function signature mismatches by @linehill in #851
- OpenCL: Support devices with cl_ext_buffer_device_address by @linehill in #830
- Add SKIP_TESTS_WITH_DOUBLES Option by @pvelesko in #826
- [HipBLAS] Fix hiblas.h and hipsolver header conflicts by @pvelesko in #852
- Small Fixes by @pvelesko in #862
- New CUDA compiler by @pvelesko in #858
- LLVM Configure script changes by @pvelesko in #864
- known_failures.yaml add hostname key by @pvelesko in #867
- spirv-extractor link fix by @pvelesko in #871
- update configure_llvm for IPO by @pvelesko in #870
- Submodules track branches by @pvelesko in #872
- Fix math function j1 typo in dp_math.hh by @jjennychen in #876
- fixed cudaMallocManaged function parameter type issue by @jjennychen in #878
- CUDA Compiler Refactor by @pvelesko in #875
- Docker Images + update linter github action by @pvelesko in #879
- Update DockerfileFull by @pvelesko in #881
- Implement missing host-side math functions by @pvelesko in #884
- Adding runtime error conversion for Level0 backend by @jjennychen in #886
- Fix 885 by @pvelesko in #889
- Fix 887 by @pvelesko in #888
- handle relocatable code flags cucc by @pvelesko in #892
- Add more implicit casts to dim3 by @pvelesko in #895
- update HIPCC to preserve ordering by @pvelesko in #899
- spirv_hip_fp16.h header file updates by @jjennychen in #896
- ARM CI by @pvelesko in #903
- skip kernel annotation on CPU by @pvelesko in #905
- use github.sha for docker by @pvelesko in #907
- docker ref fix by @pvelesko in #908
- docker build only on merge to main by @pvelesko in #909
- Expand the use of error maps by @pvelesko in #891
- Ajust known_failures for abort,assert by @pvelesko in #906
- Properly annotate Intel USM kernels by @pvelesko in #911
- Make adjustments for LLVM-19 by @linehill in #901
- Fix device-side functions by @pvelesko in #913
- Enable building of hipFFT by @pvelesko in #912
- OpenCL Backend Fixes by @pvelesko in #914
- hipStreamSemantics Fixes by @pvelesko in #917
- Cleanup by @pvelesko in #918
New Contributors
- @karlwessel made their first contribution in #800
- @jjennychen made their first contribution in #876
Full Changelog: v1.1...v1.2-RC1
chipStar release 1.1
This release cycle focused on stabilization and performance improvements
over the 1.0 release. The release was measured to run some benchmarks up
to twice as fast as 1.0, with an average improvement of 30% measured on HeCBench.
Further highlights are described in the following.
Release Highlights
- Added support for Clang/LLVM 17. LLVM 15 and 16 are still supported.
- Ability to Use the Intel Unified Shared Memory Extension, with OpenCL backend
- Optimized Atomic Operations
- Use of Immediate Command Lists for Low Latency Dispatch, with Level Zero backend
- Improved portability to other platforms & devices
- Improved Asynchronous Execution
The full release notes are available in docs/release_notes/chipStar_1.1.rst
The full sources of the release (including git submodules) are available packaged in the attached file chipStar-1.1.tar.gz
(SHA256: 9258a313c503073a082ca310cebf048d84c4ab698facfc8d1d9ce1381ffb9fc5).
v1.1-RC4
The 4th release candidate for v1.1. Please test and add your results to the test log and any major problems or regressions as issues to the 1.1 milestone.
v1.1-RC3
The third release candidate for v1.1. Please test and add your results to the test log and any major problems or regressions as issues to the 1.1 milestone.
v1.1-RC2
The second release candidate for v1.1. Please test and add your results to the test log and any major problems or regressions as issues to the 1.1 milestone.