chipStar v1.3 Release Notes
Overview
v1.3 is a major release with approximately 700 commits since v1.2.1. The main user-visible changes are official CMake integration, macOS support, broader LLVM and HIP compatibility, expanded HIP runtime API coverage, HIPRTC improvements, and many correctness and performance fixes.
New Platform Support
macOS (ARM64 + x86)
- macOS is now a supported chipStar platform, including Apple Silicon CPUs via PoCL
- LLVM 21/22-based macOS builds are supported
- Runtime fixes make queue handling and unified-memory behavior work correctly on macOS
Intel Arc B570 / Linux Kernel 6.8+
- Intel Arc B570 systems are supported with newer Linux kernels
- Fixed timestamp handling on devices with 64-bit timestamps
ARM Mali GPU
- Device-side printf is now supported on Mali GPUs with the cl_arm_printf extension
- Workarounds for a Mali driver deadlock (flush queues before finish, clFlush for marker/callback events)
Supported Libraries
install_chipstar.py now installs chipStar together with supported CHIP-SPV library ports.
Platform-Independent Libraries
- rocPRIM - Parallel primitives
- hipCUB - CUB-like primitives for HIP
- rocThrust - Thrust parallel algorithms
- rocRAND - Random number generation
- hipRAND - HIP random number interface
- rocSPARSE (new) - Sparse matrix operations
- hipSPARSE (new) - HIP sparse matrix interface
- hipMM (new) - HIP memory manager (RMM port)
Intel MKL-Based Libraries
- H4I-MKLShim - Intel MKL shim layer
- H4I-HipBLAS - hipBLAS via Intel MKL
- H4I-HipFFT - hipFFT via Intel MKL
- H4I-HipSOLVER - hipSOLVER via Intel MKL
Application Support
LLVM and HIP Compatibility
- Added support for LLVM 20, LLVM 21, and LLVM 22
- LLVM 22's integrated SPIR-V backend is detected automatically and is the default/preferred option
- Added compatibility work for upcoming LLVM 23 / Clang offload driver changes
- Updated the bundled HIP stack to HIP 7.2.0 (chipStar-hip-7)
- Fixed HIP 7 API signature changes, including hipMemcpyHtoD/HtoDAsync
- Fixed HIPRTC API parameter types for compatibility with newer HIP headers
Build and Project Integration
- chipStar is officially supported by CMake starting with CMake 4.3.0, making it easier for projects to use chipStar through CMake's HIP language support
- New install_chipstar.py with a unified install directory, a --with-tests flag, and rocPRIM prefix support
- install_chipstar.py now builds from the current working directory when run inside the repository
- Fixed make install failure on macOS (hardcoded .so extension)
- Excluded spdlog from install to avoid downstream symbol conflicts; renamed its namespace to chipStar_spdlog
- Installed hip-lang-config.cmake for CMake HIP language support
- Installed FindHIP.cmake
New Features
Async Memory Allocation
- Implemented hipMallocAsync, hipFreeAsync, and hipMallocFromPoolAsync
Managed Memory
- Implemented hipMemAdvise and hipMemRangeGetAttribute(s)
- Implemented hipMemPrefetchAsync
- OpenCL: use clHostMemAllocINTEL for managed memory; added clEnqueueMigrateMemINTEL support
- Level Zero/OpenCL: report managed memory support
Device-Side malloc/free
- Implemented device-side __chip_malloc/__chip_free with C++ new/delete wrappers
HIP Graphs
- Fixed hipGraphAddDependencies/RemoveDependencies to iterate from[i]→to[i] pairs correctly
- Added null-pointer parameter validation across hipGraph* API functions
- Fixed getCaptureStatus() (was hardcoded to return None)
- Fixed hipStreamWaitEvent incorrectly flipping stream capture status
- Fixed hipStreamEndCapture returning IllegalState (401)
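The corrected dependency semantics treat the from and to arrays as parallel lists: entry i describes exactly one edge, from[i] → to[i]. The host-side sketch below models only that pairwise iteration and the idea of rejecting null arguments. The Graph type, integer node IDs, and the helper names are hypothetical; this is not chipStar's implementation, and the real HIP functions take hipGraphNode_t arrays and return hipError_t codes.

```cpp
#include <cstddef>
#include <set>
#include <utility>

// Hypothetical model of a graph: nodes are integer IDs and each dependency
// edge is stored as an ordered (from, to) pair.
struct Graph {
    std::set<std::pair<int, int>> edges;
};

// Shared argument check, in the spirit of the added null-pointer validation.
static bool validArgs(const Graph* g, const int* from, const int* to, std::size_t n) {
    return g != nullptr && (n == 0 || (from != nullptr && to != nullptr));
}

// Pairwise semantics: entry i describes exactly one edge, from[i] -> to[i].
bool addDependencies(Graph* g, const int* from, const int* to, std::size_t n) {
    if (!validArgs(g, from, to, n)) return false;
    for (std::size_t i = 0; i < n; ++i) {
        g->edges.insert(std::make_pair(from[i], to[i]));
    }
    return true;
}

// Removal walks the same from[i] -> to[i] pairs.
bool removeDependencies(Graph* g, const int* from, const int* to, std::size_t n) {
    if (!validArgs(g, from, to, n)) return false;
    for (std::size_t i = 0; i < n; ++i) {
        g->edges.erase(std::make_pair(from[i], to[i]));
    }
    return true;
}
```

With from = {0, 1} and to = {2, 3}, this produces exactly the edges 0→2 and 1→3; no cross edges such as 0→3 are created.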
Separate Compilation
- Handle the -dc flag in hipcc for separate compilation workflows (#893)
- Support unbundling of static device libraries in the HIPSPV toolchain
HIPRTC Improvements
- Runtime compilation handles more SPIR-V constructs correctly
- HIPRTC compilation-output caching support
- Auto-include fp16 headers in HIPRTC
- Device variable registration for compiled modules (constant memory in HIPRTC)
- Shell metacharacter escaping in hiprtcCompileProgram
- Fixed a HIPRTC compile error with Clang 22
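Escaping shell metacharacters matters whenever user-supplied compiler options end up in a command line that a shell interprets. A common POSIX technique is to wrap each argument in single quotes and rewrite every embedded single quote as '\'' (close the quote, emit an escaped quote, reopen). The sketch below shows that technique in isolation. The names shellQuote and buildCommand are hypothetical, and this is not necessarily how hiprtcCompileProgram or HIPCC performs the escaping.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: quote one argument for a POSIX shell.
// Inside single quotes every character is literal, so only the single quote
// itself needs handling: close the quoted run, emit \', and reopen it.
std::string shellQuote(const std::string& arg) {
    std::string out = "'";
    for (char c : arg) {
        if (c == '\'') {
            out += "'\\''";
        } else {
            out += c;
        }
    }
    out += "'";
    return out;
}

// Hypothetical helper: build a command line in which every token is quoted,
// so characters such as ; $ ( ) and spaces cannot change its meaning.
std::string buildCommand(const std::string& tool, const std::vector<std::string>& args) {
    std::string cmd = shellQuote(tool);
    for (const std::string& a : args) {
        cmd += ' ';
        cmd += shellQuote(a);
    }
    return cmd;
}
```

Quoting every token unconditionally is simpler to get right than escaping individual metacharacters, at the cost of slightly noisier command lines.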
fp16 / Device Library
- Added float→half conversion functions with rounding modes
- Added __device__/__host__ decorators to the fp16 header
- Fixed raw bit extraction and error messages in fp16_conversion.hpp
- Fixed __ocml_cvtrtn_f16_f32 and other missing conversion functions
- Added float/double atomicMin/atomicMax device library implementations
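Floating-point atomicMin/atomicMax cannot simply apply an integer min/max to the raw bits, because integer ordering of IEEE-754 bit patterns is reversed for negative values. One widely used approach is a compare-and-swap loop that retries until either the stored value is already no larger than the candidate or the swap succeeds. The sketch below expresses that pattern with host-side std::atomic<float> so it runs anywhere. It illustrates the general technique only; the function names are hypothetical, and chipStar's device library may use a different strategy.

```cpp
#include <atomic>

// Atomically sets addr to min(addr, value) and returns the previous value,
// following the return convention of HIP/CUDA atomicMin.
// NaN is not handled specially: every comparison involving NaN is false,
// so neither a NaN candidate nor a stored NaN triggers a swap.
float atomicMinFloat(std::atomic<float>& addr, float value) {
    float old = addr.load();
    // On failure, compare_exchange_weak reloads 'old' with the current value,
    // so each iteration re-checks against the latest stored value.
    while (value < old && !addr.compare_exchange_weak(old, value)) {
    }
    return old;
}

// Same pattern with the comparison reversed.
float atomicMaxFloat(std::atomic<float>& addr, float value) {
    float old = addr.load();
    while (value > old && !addr.compare_exchange_weak(old, value)) {
    }
    return old;
}
```

Starting from 3.5f, atomicMinFloat(a, 1.25f) returns 3.5f and leaves 1.25f stored, while a later atomicMinFloat(a, 2.0f) returns 1.25f and changes nothing.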
SPIR-V Compatibility
- Promote narrow integer kernel arguments to i32 for SPIR-V conformance (#849)
- Inline kernel argument promotion to avoid the wrapper-function pattern (#849)
- Improved linking behavior for programs that use atomics and ballot operations
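Promoting a narrow integer argument to i32 means the value travels in a 32-bit slot and is narrowed back to its declared type where it is used. Widening should follow the signedness of the original type (sign-extend int8_t/int16_t, zero-extend uint8_t/uint16_t) so the 32-bit value keeps the same numeric value; truncation then recovers the original bits. The host-side sketch below illustrates only that round trip. The helper names are hypothetical, and the chipStar change itself is a compiler-side transformation, not code like this.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical helper: widen a narrow integer argument into a 32-bit slot.
// Signed sources go through int32_t first (sign extension); unsigned sources
// convert directly (zero extension). The result is the slot's bit pattern.
template <typename T>
std::uint32_t promoteToI32(T value) {
    static_assert(std::is_integral<T>::value && sizeof(T) < 4,
                  "only narrow integer types are promoted");
    if (std::is_signed<T>::value) {
        return static_cast<std::uint32_t>(static_cast<std::int32_t>(value));
    }
    return static_cast<std::uint32_t>(value);
}

// Hypothetical helper: recover the declared narrow type from the 32-bit slot,
// as the callee would after receiving the promoted argument.
template <typename T>
T truncateFromI32(std::uint32_t slot) {
    return static_cast<T>(slot);
}
```

For example, an int8_t value of -1 occupies the slot as 0xFFFFFFFF, while a uint8_t value of 255 occupies it as 0x000000FF.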
SYCL Interop
- Fixed hip_sycl_interop and hip_sycl_interop_no_buffers for the Level Zero backend with the MKL 2025 UR API
Notable Bug Fixes
- Memory-copy validation is stricter and more compatible with HIP behavior, including invalid pointer handling and hipMemcpyDefault direction inference.
- Atomics and ballot intrinsics received several correctness fixes, including atomicMin/atomicMax on floating-point types, __chip_all(), and __byte_perm.
- Stream, event, and queue synchronization fixes address hangs and races seen with OpenCL, Level Zero, ARM Mali, and default-stream behavior.
- Module loading/unloading and backend selection are more robust, including fixes for default backend selection and hipModuleUnload.
- C and CMake integration fixes improve mixed C/HIP projects, CMake HIP language support, and downstream package integration.
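__byte_perm(x, y, s) is one of the intrinsics covered by these fixes. Under the CUDA-documented semantics that HIP follows, the eight input bytes are numbered 0-3 from x and 4-7 from y, and byte n of the result is input byte ((s >> 4n) & 7). The portable reference below (the name bytePermRef is hypothetical) can serve as a host-side oracle when checking device results; it is not chipStar's device-library implementation.

```cpp
#include <cstdint>

// Reference model of __byte_perm(x, y, s):
// input bytes 0-3 come from x and bytes 4-7 from y (least significant first);
// result byte n is input byte ((s >> (4 * n)) & 0x7). Bit 3 of each selector
// nibble and the upper 16 bits of s are ignored.
std::uint32_t bytePermRef(std::uint32_t x, std::uint32_t y, std::uint32_t s) {
    const std::uint64_t input = (static_cast<std::uint64_t>(y) << 32) | x;
    std::uint32_t result = 0;
    for (unsigned n = 0; n < 4; ++n) {
        const unsigned sel = (s >> (4 * n)) & 0x7u;
        const std::uint32_t byte =
            static_cast<std::uint32_t>((input >> (8 * sel)) & 0xFFu);
        result |= byte << (8 * n);
    }
    return result;
}
```

With x = 0x03020100 and y = 0x07060504, selector 0x3210 returns x unchanged, 0x7654 returns y, and 0x0123 reverses the bytes of x to 0x00010203.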
Submodule Updates
- HIP → 7.2.0 (chipStar-hip-7)
- ROCm-Device-Libs: dynamic data layout from Clang; irif.h type-punning fix
- HIPCC: various fixes, including -no-hip-rt, LLVM 21, and shell metacharacter escaping
- PoCL support updated to version 7
Full Changelog
For the complete list of changes, see:
git log v1.2.1..v1.3