chore: merge dev branch fix/macos backend-install-ux #8197
Merged
NSOpenPanel resolves filter extensions via UTType.typeWithFilenameExtension:, which only handles single-component extensions. `tar.gz` returns nil and `gz` resolves to a sibling UTI of `.tar.gz` files, so neither enables them in the picker. Skip the filter on macOS and rely on the existing extension check in installBackend() to reject invalid picks.
…ate the UI
Promise.all rejected the whole load() on the first failing onLoad, and ExtensionProvider only set finishedSetup on success, so a single throw (e.g. ggml_backend_error from llamacpp's configureBackends) hid the entire app, blocking MLX/OpenAI/Anthropic/etc. providers that had otherwise loaded fine.
- ExtensionManager.load() uses Promise.allSettled and logs each rejected extension by name.
- ExtensionProvider wraps setup in try/finally so setFinishedSetup(true) always fires.
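The two changes above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the `Extension` interface, `loadAll`, and `setup` are hypothetical names, and the real ExtensionManager and ExtensionProvider are not shown in this PR text.

```typescript
// Hypothetical minimal extension shape (assumption, not the real type).
interface Extension {
  name: string
  onLoad(): Promise<void>
}

// Load every extension and report failures by name instead of rejecting the batch.
async function loadAll(extensions: Extension[]): Promise<string[]> {
  const results = await Promise.allSettled(extensions.map((ext) => ext.onLoad()))
  const failed: string[] = []
  results.forEach((result, i) => {
    const ext = extensions[i]
    if (result.status === 'rejected' && ext) {
      failed.push(ext.name)
      console.error(`Extension ${ext.name} failed to load:`, result.reason)
    }
  })
  return failed
}

// The ready flag flips even if something inside setup throws.
async function setup(
  extensions: Extension[],
  setFinishedSetup: (done: boolean) => void
): Promise<void> {
  try {
    await loadAll(extensions)
  } finally {
    setFinishedSetup(true)
  }
}
```

With this shape, one throwing `onLoad` shows up in the returned list while the remaining extensions still load.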
oomError/backendError are populated from global Tauri events emitted by the llamacpp router, but the chat route rendered the banner regardless of which provider the current model belonged to. A router-side Metal init crash would surface as "GGML backend encountered an error" on top of an MLX/OpenAI/Anthropic chat that was otherwise working. Mask the raw values behind selectedProvider === 'llamacpp' so the banner, the implicit stop() on error, and the Reload button only fire for llamacpp-backed threads. Store contents are untouched; switching back to a llamacpp model resurfaces the error as before.
…tpak
Previous flow ran lddtree against llama-server itself, whose transitive tree (libc, libstdc++, libcurl, libgomp, conditional codepaths) produced mostly-noise "missing" entries. The user-actionable question is whether the GPU backend (ggml-cuda / ggml-vulkan) can resolve its system deps at dlopen time; that's what actually fails when CUDA/Vulkan runtimes are absent.
- verify_backend_dependencies now scans bin_dir for files whose names contain "cuda" or "vulkan" and that have a .so/.so.N/.dll extension, and passes only those to the analyzer. llama-server is no longer analyzed. CPU-only backends produce an empty path list and verify trivially.
- verify_backend_installation short-circuits to verified=true on Linux flatpak (jan_utils::system::is_flatpak()); sandbox library layout makes lddtree results meaningless and the checker was firing false positives there.
Lets users opt out of the startup GPU backend library check. Defaults to enabled to preserve current behavior.
The BackendUpdater dialog ran checkForUpdate() unconditionally on mount, hitting the network even when the user had disabled auto-update. Gate the call on autoUpdateEnabled; the manual "Check for Updates" button in provider settings remains the opt-in path.
The dropdown's ModelSupportStatus called read_gguf_metadata + the heavy isModelSupported probe every time the selected model changed. Swap to the estimateModelFit heuristic the hub already uses (file size + KV heuristic against RAM/VRAM) — cheap, synchronous after the one-shot sizeBytes lookup, and good enough for an at-a-glance indicator. Tooltip now labels the result as an estimate.
Move sampler editing out of the model sidebar into a composer-anchored popover scoped to the active assistant, and gate the whole surface on a provider/model capability table so users can't accidentally send params the backend will reject.
- New `providerCaps.ts` maps each built-in provider to supported/maybe sampler capabilities; custom providers default to permissive. Adds `isModelLevelRejected` for family-specific rejections (OpenAI o-series and gpt-5* reject temperature/top_p/penalties; grok-3-mini/4 quirks) and `getMutualExclusionDrops` for cross-param conflicts (Anthropic's temperature+top_p).
- `ModelFactory.createModel` strips unsupported samplers at the single dispatch chokepoint so the wire request never carries rejected keys.
- `createCustomFetch` now retries once with all injected sampling params stripped when the upstream returns a sampling-rejection error, and toasts the user so they know their overrides were dropped this turn.
- `predefinedParams.ts` gains a typed `ParamDef` schema with capability, controller props, `disabledBy` for live mutual-exclusion gating (mirostat shadows top-k/top-p/min-p, dynatemp gates its exponent, etc), effect hints, and category/group metadata.
- `ParametersSection` replaces the chip wall with a categorized "Add parameter" menu and grouped blocks for coupled samplers (mirostat, dry, xtc, dynatemp). Active rows render in stable canonical order with inline warnings when the current provider/model rejects the key.
- `SamplerPopover` adds an assistant switcher to its header (subsumes the legacy + > Use Assistant submenu and the standalone bot avatar button in ChatInput) plus a gear-icon shortcut to the assistant settings route. Bounded to `--radix-popover-content-available-height` so the header stays anchored when the body overflows.
- `ModelSetting` sidebar skips keys present in `paramsSettings` so sampler rows no longer duplicate between the sidebar and the popover.
- `SliderControl` drops the negative-margin hack that clipped the value input, inlines min/max scale labels, and supports a `warnAbove`/`warnBelow` band to tint the slider range for risky values.
Provider `models` arrays can carry duplicates (registry + locally imported, or upstream `/v1/models` returning the same id twice), producing duplicate React keys in the picker rows.
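One way to avoid the duplicate keys is to drop repeated ids before rendering. The sketch below assumes the first occurrence wins and that `id` alone identifies a model within one provider; `dedupeModelsById` is a hypothetical helper, and the commit text does not say which strategy the actual fix uses.

```typescript
// Keep the first occurrence of each id and preserve the original order.
// Assumption: a provider-scoped list, so `id` is a sufficient key.
function dedupeModelsById<T extends { id: string }>(models: T[]): T[] {
  const seen = new Set<string>()
  return models.filter((model) => {
    if (seen.has(model.id)) return false
    seen.add(model.id)
    return true
  })
}
```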
The auto-reconnect monitor introduced in #7791 ran list_all_tools every 2s against every connected server, and the stderr forwarder logged every line at WARN regardless of the server's reported level. STDIO servers (Python MCPs in particular) echo a ListToolsRequest INFO line to stderr on each probe, drowning the log in WARN entries.
- Health probe interval 2s → 30s. Reconnect signal still races the timer via tokio::select! so explicit reconnects stay instant.
- Route stderr lines through the level token the server itself prints (ERROR/WARN/DEBUG/TRACE), defaulting to INFO when no token is present; stderr != error on most servers.
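The level routing can be illustrated with a small classifier. The real forwarder lives in the Rust side of the app and is not shown here; this TypeScript sketch only assumes the token appears as a whole word somewhere in the line, and `levelForStderrLine` is a hypothetical name.

```typescript
type LogLevel = 'ERROR' | 'WARN' | 'INFO' | 'DEBUG' | 'TRACE'

// Pick the first level token the server printed; fall back to INFO when none is present.
// Assumption: tokens are upper-case whole words, as listed in the commit message.
function levelForStderrLine(line: string): LogLevel {
  const match = /\b(ERROR|WARN|DEBUG|TRACE)\b/.exec(line)
  return match ? (match[1] as LogLevel) : 'INFO'
}
```

A line such as `INFO Processing request of type ListToolsRequest` carries none of the four tokens, so it is logged at INFO rather than WARN.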
The Zustand merge of incoming settings into the providers store spread the existing controller_props on top of the fresh ones, so metadata the extension recomputes (recommended, options) survived across refetches and outlived the underlying setting. The user's `value` selection is the only field that should be preserved — keep that and let everything else come from the fresh fetch.
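A minimal sketch of that merge rule, assuming a simplified setting shape. `mergeSettings` and the interfaces below are hypothetical stand-ins for the store's real types.

```typescript
type SettingValue = string | number | boolean | undefined

interface ControllerProps {
  value: SettingValue
  recommended?: string
  options?: { value: string; name: string }[]
}

interface ProviderSetting {
  key: string
  controller_props: ControllerProps
}

// Take everything from the fresh fetch and carry over only the user's chosen value.
function mergeSettings(existing: ProviderSetting[], fresh: ProviderSetting[]): ProviderSetting[] {
  const previous = new Map(existing.map((s) => [s.key, s] as const))
  return fresh.map((setting) => {
    const prior = previous.get(setting.key)
    if (!prior) return setting
    return {
      ...setting,
      controller_props: {
        ...setting.controller_props,
        // Fall back to the fresh value when the user never picked one.
        value: prior.controller_props.value ?? setting.controller_props.value,
      },
    }
  })
}
```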
A dropdown with one option implies a choice the user doesn't have. Render the value as plain text when options.length <= 1 — no chevron, no popover. Applies globally to DropdownControl.
- Schema: replace the single composite version_backend setting with two independent llamacpp_version + llamacpp_backend selectors. Migration in onLoad splits any prior version_backend string. Internal version_backend lives on as a derived field so the Rust plugin and router commands stay untouched.
- New check_for_updates toggle (default on) gates the remote release fetch. auto_update_engine now requires it. Lets users disable update checks without hiding the dropdown options they already have, and the manual "Check for Updates" button always hits the network.
- Recommended backend is computed from the upstream-released set only via the new fetchRemoteBackends helper. Side-loaded custom backends from "Install from File" no longer bias the recommendation. When remote is unavailable (offline / check_for_updates off) no hint is surfaced; the previous "fall back to merged" behavior caused custom installs to be recommended over official ones.
- Persistence moves from localStorage to <jan_data>/llamacpp/settings.json. Atomic-ish writes via tmp + mv, serialized through a single writeChain so concurrent writes don't interleave. One-shot idempotent migration on onLoad: file presence is the marker; localStorage is only cleared after a successful parse + file write. Survives localStorage wipes, lets users inspect / edit the file.
- Drops core's "preserve old recommended" surprise: registerSettings is overridden to write through the file store and the unawaited call in configureBackends is now awaited so persistRecommended doesn't race the merge.
Tests: 106/106 (extension) + provider settings + hooks unchanged.
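The write serialization mentioned above can be sketched as a promise chain. This covers only the ordering part, not the tmp + mv step, and `SerializedWriter` is a hypothetical class; the injected `write` callback stands in for the real file write.

```typescript
// Queue async writes on one promise chain so they run strictly one after another.
class SerializedWriter {
  private writeChain: Promise<void> = Promise.resolve()
  private readonly write: (data: string) => Promise<void>

  constructor(write: (data: string) => Promise<void>) {
    this.write = write
  }

  enqueue(data: string): Promise<void> {
    const next = this.writeChain.then(() => this.write(data))
    // Swallow the failure on the chain itself so later writes still run;
    // the caller still sees the rejection through `next`.
    this.writeChain = next.catch(() => undefined)
    return next
  }
}
```

Two `enqueue` calls made back to back start in call order, and the second write begins only after the first has settled.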
…r message
Errored generations were being persisted as empty assistant messages because extractContentPartsFromUIMessage always padded to length 1 with an empty-text fallback. After a few reload-and-retry cycles, threads would render N empty rows with timestamps and action icons but no content.
- onFinish now gates persist on uiMessageHasMeaningfulContent, which inspects the raw UIMessage parts (ignoring empty-text fallbacks and bare tool stubs). Empty assistant messages never reach disk.
- On status === 'error', stamp the most recent user ThreadMessage with metadata.error so the failure survives reload and thread navigation.
- A successful assistant onFinish strips metadata.error from prior messages; forward progress clears stale errors. Editing the user message clears it too.
- On thread load, drop and delete any persisted assistant rows matching threadMessageIsEmpty. Lossless one-shot cleanup for users who already hit the bug.
- MessageItem renders an inline destructive-tinted error card under user messages with metadata.error, with a Regenerate button wired to the existing onRegenerate flow.
ChatInput.test: added providers:[] to the useModelProvider mock so the new SamplerPopover doesn't blow up reading providers.find, and stubbed Link on the @tanstack/react-router mock. AddEditAssistant.test: extended the @/lib/predefinedParams mock with the exports introduced by the sampler refactor (paramCategories, paramGroups, LLAMACPP_ONLY_PARAM_KEYS, evaluateDisabled, isGroupedParamKey), added a ResizeObserver polyfill needed by Radix sliders, and removed three obsolete tests that drove the pre-refactor chip-palette UI; the surviving cases cover save/edit/validation.
The isModelSupported gate counted model weights + KV cache only, so multimodal models with a sibling mmproj.gguf got greenlit on systems where the projector pushed total allocation past free VRAM, then crashed with a CUDA OOM during slot init. Stat mmproj.gguf next to model.gguf (local paths only) and add its size to total_required.
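The corrected accounting amounts to one extra term in the sum. The sketch below uses hypothetical names (`VramEstimate`, `totalRequired`, `fitsInFreeVram`); the real isModelSupported gate and its inputs are not shown in this PR text.

```typescript
interface VramEstimate {
  weightsBytes: number
  kvCacheBytes: number
  // Size of a sibling mmproj.gguf, when one exists next to a local model.gguf.
  mmprojBytes?: number
}

// Weights + KV cache + optional projector.
function totalRequired(estimate: VramEstimate): number {
  return estimate.weightsBytes + estimate.kvCacheBytes + (estimate.mmprojBytes ?? 0)
}

function fitsInFreeVram(estimate: VramEstimate, freeVramBytes: number): boolean {
  return totalRequired(estimate) <= freeVramBytes
}
```

For example, 6 GB of weights plus a 1 GB KV cache fits in 7.5 GB of free VRAM, but the same model with a 1 GB projector does not.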
The extension's logger template-stringed every arg into the file log,
so `logger.error('Error in load command:', err)` wrote
"[object Object]" — useless when diagnosing model-load crashes.
Format Errors as `message\nstack`, objects as JSON, primitives as-is.
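A minimal sketch of that formatting rule follows. `formatLogArg` and `formatLogArgs` are hypothetical names, and the fallback for values that cannot be serialized (such as circular objects) is an assumption not stated in the commit.

```typescript
// Errors become "message\nstack", objects become JSON, primitives are stringified as-is.
function formatLogArg(arg: unknown): string {
  if (arg instanceof Error) {
    return `${arg.message}\n${arg.stack ?? ''}`
  }
  if (typeof arg === 'object' && arg !== null) {
    try {
      return JSON.stringify(arg)
    } catch {
      // Assumption: fall back to String() for circular structures.
      return String(arg)
    }
  }
  return String(arg)
}

function formatLogArgs(args: unknown[]): string {
  return args.map(formatLogArg).join(' ')
}
```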
The send button and Enter handler required non-empty prompt text, so users couldn't ask a multimodal model to describe an image (or transcribe audio) without typing a placeholder. Permit submit when any image/audio attachment is ready, even with empty prompt.
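The relaxed submit rule can be sketched as a predicate. The `Attachment` shape and `canSubmit` are hypothetical; the commit does not show how ChatInput tracks attachment readiness.

```typescript
// Hypothetical attachment shape (assumption for illustration).
interface Attachment {
  type: 'image' | 'audio' | 'document'
  ready: boolean
}

// Allow submit on non-empty text, or on any ready image/audio attachment.
function canSubmit(prompt: string, attachments: Attachment[]): boolean {
  if (prompt.trim().length > 0) return true
  return attachments.some((a) => a.ready && (a.type === 'image' || a.type === 'audio'))
}
```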
create_message and modify_message could race when the UI stamps metadata onto a freshly-sent user message immediately after sending (e.g. metadata.error after a CUDA-OOM model-load failure). If the modify's UPDATE landed first, it silently affected 0 rows; the create then INSERTed the original row, dropping the metadata edit.
- modify_message now UPSERTs (ON CONFLICT(id) DO UPDATE) so a stamp ahead of the create still lands on disk.
- create_message uses INSERT OR IGNORE so a late create does not clobber the row already inserted by the modify.
Stack of fixes for the per-turn error UI.
1) JSONL race. Desktop persistence is messages.jsonl (SQLite is mobile
only). modify_message bailed silently when the message id was not
yet in the file — exactly the case when the UI stamps metadata onto
a freshly-sent user message before create_message acquired the
per-thread lock. Edit was dropped, restart found metadata=null.
- modify_message upserts when no row matches the id.
- create_message dedupes by id under the lock so a late create after
a modify-upsert does not duplicate or clobber.
2) Per-message error store. The old approach stamped errors onto the
AI SDK UIMessage's custom metadata in chatMessages. AI SDK reshapes
chatMessages on subsequent sendMessage calls, dropping the custom
field, so the inline card vanished as soon as the user sent another
message. New useMessageErrors Zustand store keys errors by id and is
immune to that reshaping. ThreadMessage's metadata.error is still
written for restart restoration; the thread-load path hydrates the
store from those persisted fields. MessageItem reads from the store.
3) ThreadList wipe. ThreadList eagerly fetched messages on mount and
wrote them back via setMessages(threadId, fetchedMessages). The
truthy guard `if (fetchedMessages)` always passed because an empty
array is truthy, so for a brand-new thread (empty on disk) it raced
the optimistic addMessage write and clobbered it with []. Gate on
length > 0.
4) Banner dedupe. Narrow the global error banner to the three llamacpp
signals (oom / backend / context-limit) which have unique UI; the
inline card owns generic useChat errors. No more duplicate cards on
regenerate-after-error.
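The ordering fix in item 1 can be illustrated with an in-memory stand-in for messages.jsonl. This is a sketch under stated assumptions: the array models one row per line, the per-thread lock is omitted, and `modifyMessage` / `createMessage` are hypothetical TypeScript counterparts of the real commands.

```typescript
interface ThreadMessage {
  id: string
  content: string
  metadata?: Record<string, unknown>
}

// Upsert: replace the row with a matching id, or append when none exists yet.
function modifyMessage(rows: ThreadMessage[], message: ThreadMessage): ThreadMessage[] {
  const index = rows.findIndex((row) => row.id === message.id)
  if (index === -1) return [...rows, message]
  return rows.map((row, i) => (i === index ? message : row))
}

// Dedupe by id: a late create must not duplicate or overwrite an existing row.
function createMessage(rows: ThreadMessage[], message: ThreadMessage): ThreadMessage[] {
  if (rows.some((row) => row.id === message.id)) return rows
  return [...rows, message]
}
```

If the modify runs before the create, the stamped row is appended first and the later create leaves it alone, so the metadata survives.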
Predefined remote providers expose a fixed sampling surface and reject unknown JSON fields (e.g. Gemini 400s on temperature/top_p/top_k). Hide the composer SamplerPopover for those providers and strip all paramsSettings keys from the request body so stored assistant overrides set while on a local model can't leak into a remote request.
Reasoning toggle used size=sm with an inline on/off/auto label, making it taller and wider than its neighbors and breaking the row rhythm. Drop the inline label and switch to icon-xs; state is still conveyed via icon color/opacity, tooltip, and the dropdown's checkmark.
tsc -b (project-references build) was rejecting the unknown-typed
existingValue spread into controller_props.value as {} | null |
undefined. Narrow to string | number | boolean | undefined so it
matches ProviderSetting.
ESLint forbids _-prefixed unused vars. Use object spread + delete to drop a key instead of destructuring it into an underscore-named bind.
useMessages selector reached s.messages[threadId]; the test mock has no messages key, so the optional chain prevents a TypeError without changing prod behavior. Drop three tests that asserted the old global error banner for generic chat errors. That banner was narrowed to contextLimitError/oomError/ backendError when error UI moved to per-message via useMessageErrors; the tests asserted DOM that no longer renders.
Sweep across the main src-tauri crate and 5 plugins (hardware, llamacpp, mlx, rag, vector-db). ~88 warnings eliminated. Idiomatic fixes throughout: Default impls, &Path over &PathBuf, .as_deref(), strip_prefix, .is_ok()/.is_some_and(), collapsed if-let chains, ? over manual None-returning matches, trait-bound consolidation in generics, etc. 10 #[allow(clippy::too_many_arguments)] left on stable internal and Tauri-command signatures where refactoring to argument structs is out of scope for a lint sweep. One #[allow(clippy::zombie_processes)] on the intentionally-detached setsid child in jan-cli. No new unsafe. No --fix autofixes. cargo clippy --all-targets -D warnings is clean across every crate; cargo build succeeds.
Read general.name from gguf metadata, trim and replace whitespace runs with '-'. Fall back to the model file basename (sans .gguf), then to modelId if metadata is unavailable.
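The fallback chain reads roughly as below. `deriveModelName` is a hypothetical helper, and splitting the path on both `/` and `\` is an assumption made to keep the sketch platform-neutral.

```typescript
// general.name (trimmed, whitespace runs replaced by '-'), then the file basename
// without .gguf, then modelId.
function deriveModelName(
  generalName: string | undefined,
  modelPath: string | undefined,
  modelId: string
): string {
  const fromMetadata = generalName?.trim().replace(/\s+/g, '-')
  if (fromMetadata) return fromMetadata

  const baseName = modelPath?.split(/[\\/]/).pop()?.replace(/\.gguf$/, '')
  if (baseName) return baseName

  return modelId
}
```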
…I on hydration
- Register refresh_system_info in the hardware plugin's COMMANDS and default permission set; without it the visibility-change handler hit 'Command plugin:hardware|refresh_system_info not allowed by ACL'.
- Disable SamplerPopover trigger while useAssistant is loading so edits can't land on the hardcoded in-memory default and get clobbered when setAssistants() resolves from disk.
- Drop hardcoded sampler params from the in-memory defaultAssistant; the assistant-extension seeds them to disk on first run and the file is the only source of truth.
The file-backed updateSettings override dropped the base class's onSettingUpdate dispatch loop, so changes to ctx_size, n_gpu_layers, flash_attn, and other PRESET_AFFECTING_KEYS were persisted but never propagated to this.config or scheduleRouterRestart; the live router kept serving requests with the old preset until the next app restart. Diff the new value against the persisted one, write the file, then invoke onSettingUpdate for each key that actually changed. The existing isUpdatingBackend guard keeps the version_backend recursion safe. Regression introduced in 5e79398 (localStorage → file persistence).
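The diff-then-dispatch order can be sketched as follows. `applySettings` is a hypothetical function, and the object spread stands in for the real file write; the isUpdatingBackend guard is left out.

```typescript
type PersistedValue = string | number | boolean

// Diff against the persisted values, "write" the merged result, then notify
// only for the keys whose values actually changed.
function applySettings(
  persisted: Record<string, PersistedValue>,
  incoming: Record<string, PersistedValue>,
  onSettingUpdate: (key: string, value: PersistedValue) => void
): Record<string, PersistedValue> {
  const changed = Object.entries(incoming).filter(([key, value]) => persisted[key] !== value)
  const next = { ...persisted, ...incoming } // stand-in for the settings.json write
  for (const [key, value] of changed) {
    onSettingUpdate(key, value)
  }
  return next
}
```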
The llamacpp OOM/backend banner lived only in useAppState (in-memory), so restart wiped it and switching threads leaked the banner from the offending thread onto unrelated llamacpp threads.
- LlamacppOomListener stamps metadata.oomError / metadata.backendError onto the last user message of currentStreamThreadId at error time and persists via useMessages.updateMessage (modify_message upsert from 5eeced0).
- $threadId derives useAppState.oomError/backendError from the active thread's message metadata on every thread switch: single source of truth, no cross-thread leak.
- handleSubmit / handleRegenerate now also strip the stamped metadata so the derive effect doesn't resurrect a dismissed banner.
- Test fixture extended with setOomError / setBackendError mocks.
Custom providers can now speak the Anthropic Messages API wire format (LiteLLM, Bedrock proxies, self-hosted Claude gateways) in addition to OpenAI-compatible. Picks the right SDK at dispatch time via a new `api_type` discriminant on ProviderObject; built-in 'anthropic' is backfilled by a v15->v16 zustand migration.
- AddProviderDialog gets an API-format selector and now requires an API key (matches existing backend gates that silently refuse to register or list models for key-less providers).
- model-factory + ai-model branch to @ai-sdk/anthropic when api_type === 'anthropic', regardless of provider name.
- custom-chat-transport's serial-tool-use repair keys off api_type instead of the provider name, so custom Anthropic proxies get the same fixup.
- providerCaps clamps samplers (top_p/temperature mutex) for any Anthropic-wire provider.
The router was inflating models_max by the number of installed embedders (`+N embedding`), but only one embedding is ever loaded at a time (RAG issues one load() per request). The phantom slots prevented eviction of stale chat models until the pool was genuinely full. Cap the bonus at +1 when any embedder is installed; the log still reports the installed count for diagnostics.
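The capped bonus is a one-line rule. The sketch below is illustrative only: `computeModelsMax` and its `chatSlots` parameter are hypothetical, and the router's real inputs are not shown in this PR text.

```typescript
// Reserve at most one extra slot for embeddings, however many embedders are installed.
function computeModelsMax(chatSlots: number, installedEmbedders: number): number {
  const embeddingBonus = installedEmbedders > 0 ? 1 : 0
  return chatSlots + embeddingBonus
}
```

With 2 chat slots and 3 installed embedders this yields 3, where the previous rule would have produced 5.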
Minh141120
approved these changes
May 27, 2026
Describe Your Changes
Sampling & chat composer
- … `paramsSettings` keys from the request body so assistant overrides set on a local model can't leak into a remote request. (2147fea)
- … `on/off/auto` label so the row keeps a consistent rhythm; state still readable via icon color + tooltip + dropdown checkmark. (bd1ccba)
Threads & error UX
- … `metadata.error` across restart and dedupe the error UI; errors now survive thread reload. (5eeced0, f6e4911)
llamacpp
- … `mmproj.gguf` in VRAM precheck. (20b705e)
- … `Error`/object log args instead of dumping `[object Object]`. (1780ff7)
- … `.tar.gz` backends in the macOS file picker. (2f03053)
- … `general.name`: read the metadata, trim and dash-join whitespace; fall back to the file basename (sans `.gguf`), then `modelId`. (ec0d85d)
- … `models_max`: was inflating by N per installed embedder, but only one embedding loads at a time; the phantom slots stalled chat-model eviction. (a4ac378)
- … `onSettingUpdate` from overridden `updateSettings` so the router restarts when version/backend changes via the API path, not just the UI. (a4047c9)
MCP
- … `provider+id`. (d6a3f06)
Models / providers
- … `api_type` discriminant on `ProviderObject` routes the model factory and `ai-model.ts` through `@ai-sdk/anthropic` when set, so LiteLLM/Bedrock proxies and self-hosted Claude gateways work without a built-in entry. Built-in Anthropic backfilled via a v15→v16 store migration. Custom providers now require an API key (matches existing backend gates). (b9c3fbf)
Reliability
- … `onLoad` failures so one bad extension can't gate the entire UI. (1c7658c)
- … `refresh_system_info` permission and avoid rendering sampler controls before settings hydrate (no more flash of stale defaults). (99b075f)
Fixes Issues
Self Checklist