Building local-first AI apps
Patterns from building a NotebookLM-style desktop app (Tauri + React + LanceDB + Ollama/gateway). Organized product-outward: providers → retrieval → citations → streaming → chat loop → demos.
1. Provider abstraction
- Route chat through a facade that speaks Ollama OR any OpenAI-compatible gateway (`/chat/completions`, `/models`, Bearer). Keep embeddings and OCR independently swappable (local Ollama, a bundled CPU model like Model2Vec/potion, or the gateway): users on weak hardware want cloud chat with local embeddings so their documents never leave the machine.
- Make all UI provider-aware. Don't show "Ollama offline" to a gateway user who doesn't run Ollama; branch status copy and health checks on the active provider.
- Reverse-engineering a gateway's real endpoints/auth: its official CLI is usually a downloadable npm/JS bundle; `grep` it for the API host, paths, auth header scheme, and model-list shape. (IBM Bob: LiteLLM under the hood; static `bob_` keys use `Authorization: Apikey` + `X-API-KEY`, JWTs use `Bearer`; chat needs `x-instance-id`/`x-team-id` from `/admin/v1/profile`; usage endpoints are SSO-gated → link to the portal, don't fake accounting.)
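The facade and the provider-aware status copy described above could be sketched roughly as follows. Every type and function name here (`ProviderSettings`, `buildChatRequest`, `statusCopy`) is hypothetical, and the gateway base URL is assumed to already include any version prefix. The Ollama branch targets its native `/api/chat` endpoint on the default local port; the gateway branch assumes plain Bearer auth.

```typescript
// Hypothetical shapes for illustration only; not taken from the app's code.
type Service = "ollama" | "gateway";
type Backend = Service | "bundled"; // "bundled" = in-process CPU model

interface ProviderSettings {
  chat: Service;
  embeddings: Backend;
  ocr: Backend;
  gatewayBaseUrl: string; // assumption: already includes any /v1 prefix
}

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

// One entry point for chat; callers never branch on the provider themselves.
function buildChatRequest(
  settings: ProviderSettings,
  model: string,
  messages: ChatMessage[],
  apiKey: string
): ChatRequest {
  const body = JSON.stringify({ model, messages, stream: true });
  if (settings.chat === "ollama") {
    return {
      url: "http://localhost:11434/api/chat",
      headers: { "Content-Type": "application/json" },
      body,
    };
  }
  return {
    url: `${settings.gatewayBaseUrl}/chat/completions`,
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body,
  };
}

// Only services the current settings actually depend on get health-checked.
function requiredServices(settings: ProviderSettings): Service[] {
  const out: Service[] = [];
  for (const backend of [settings.chat, settings.embeddings, settings.ocr]) {
    if (backend !== "bundled" && out.indexOf(backend) === -1) out.push(backend);
  }
  return out;
}

// Returns user-facing copy for the first required service that is down,
// or null when everything the user relies on is healthy.
function statusCopy(
  settings: ProviderSettings,
  health: Record<Service, boolean>
): string | null {
  for (const service of requiredServices(settings)) {
    if (!health[service]) {
      return service === "ollama" ? "Ollama is offline" : "Gateway is unreachable";
    }
  }
  return null;
}
```

With chat on the gateway and embeddings/OCR bundled, `statusCopy` never mentions Ollama even when Ollama is down, which is the behavior the second bullet asks for.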
2. Retrieval (RAG) patterns
- Top-k retrieval biases to the largest sources (more chunks = more chances), so corpus-level questions ("what documents do I have?") can miss small sources entirely. Not an embedding bug. Fix: inject the full source-title manifest into the system prompt, separate from the per-question excerpts.
- Waterfill the corpus budget across sources for whole-notebook generation instead of head-truncating (which silently drops later sources): allocate smallest-first so leftovers flow to bigger sources; mark truncation explicitly in the prompt.
- Word-window chunking (~280 words, ~40 overlap) is model-agnostic and good enough; store the chunk text on the citation so the UI never needs a second lookup.
- Chat + generation should return numbered citations the model must use (`[n]` hard-required in the system prompt), persisted with the message.
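Two of the patterns above are easy to get subtly wrong, so here is a minimal sketch of word-window chunking and of waterfill allocation. The function names are hypothetical, and the budget unit (characters, tokens) is left to the caller as an assumption. The waterfill visits sources smallest-first and gives each one the smaller of its full size and an equal share of what remains, so anything a small source doesn't need flows on to the larger ones.

```typescript
// Split text into overlapping word windows. Defaults follow the ~280/~40
// figures above; chunks are space-joined, so original newlines are lost.
function chunkWords(text: string, size = 280, overlap = 40): string[] {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  const step = size - overlap;
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + size).join(" "));
    // Stop once a window has reached the end, to avoid a tiny trailing chunk.
    if (start + size >= words.length) break;
  }
  return chunks;
}

interface Allocation {
  index: number; // position of the source in the input array
  granted: number; // budget units this source may use
  truncated: boolean; // true when the source did not fit in full
}

// Waterfill: smallest sources first, leftovers flow to bigger sources.
function waterfill(sizes: number[], budget: number): Allocation[] {
  const order = sizes
    .map((size, index) => ({ size, index }))
    .sort((a, b) => a.size - b.size);
  const result = new Array<Allocation>(sizes.length);
  let remaining = budget;
  order.forEach((src, i) => {
    const fairShare = Math.floor(remaining / (order.length - i));
    const granted = Math.min(src.size, fairShare);
    remaining -= granted;
    result[src.index] = {
      index: src.index,
      granted,
      truncated: granted < src.size,
    };
  });
  return result;
}
```

For sources of size 5000, 100, and 1000 with a budget of 3000, the two small sources fit in full and the large one receives the remaining 1900. The `truncated` flag is what a prompt builder would use to mark truncation explicitly.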
3. Ingestion hygiene
- Duplicate detection at ingest, before embedding. Same content in a collection silently poisons retrieval (the duplicate occupies top-k slots). Prefilter by char count, then exact content compare; fail with a friendly message such as 'Already here as "Title"'. For URLs, fail fast on an exact URL match before fetching and tell the user to "use Refresh instead".
- Friendly display titles. Filenames, slugs, and arXiv IDs make terrible titles and starve the source manifest. Heuristics first (markdown: first `#` heading), then a tiny LLM completion ("reply with only a 3–8 word title" + first ~1500 chars + filename). Best-effort only: titling must never fail an import. Skip when the name already contains whitespace (human-written).
- Store failed imports as error rows the user can see and retry, not silent drops.
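The duplicate check and the heuristic half of titling can be sketched as small pure functions. The `StoredSource` shape and all function names are assumptions for illustration; the LLM fallback is deliberately left out, since it depends on the provider facade.

```typescript
// Hypothetical record shape; a real store would carry more fields.
interface StoredSource {
  title: string;
  content: string;
  url?: string;
}

// Cheap length prefilter first, then an exact comparison only on candidates.
function findDuplicate(existing: StoredSource[], content: string): StoredSource | null {
  for (const src of existing) {
    if (src.content.length !== content.length) continue;
    if (src.content === content) return src;
  }
  return null;
}

// Exact URL match, meant to run before any network fetch.
function findUrlDuplicate(existing: StoredSource[], url: string): StoredSource | null {
  for (const src of existing) {
    if (src.url !== undefined && src.url === url) return src;
  }
  return null;
}

// Names that already contain whitespace are treated as human-written.
function needsGeneratedTitle(name: string): boolean {
  return !/\s/.test(name);
}

// First level-1 markdown heading, or null so the caller can fall back
// to the LLM completion (or simply keep the filename).
function titleFromMarkdown(content: string): string | null {
  const heading = /^#[ \t]+(.+)$/m.exec(content);
  if (heading === null || heading[1] === undefined) return null;
  const title = heading[1].trim();
  return title.length > 0 ? title : null;
}
```

Because every function returns `null` instead of throwing, a caller can chain them and still complete the import when no title is found, in line with the best-effort rule above.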
4. Citation UX — close the loop
A citation that dead-ends at a snippet builds no trust. The full loop: inline marker → click → source opens scrolled to the highlighted passage.
- Inline `[n]` markers → clickable chips. In react-markdown, a tiny dependency-free remark plugin walking mdast text nodes: regex `\[(\d{1,2})\]`, replace with `#cite-n` links (only for n ≤ citations.length; skip inside links/code), render via the `a` component override as superscript chip buttons with a hover preview (source title + snippet start).
- Locating the passage in the source is a whitespace problem. Chunks are space-joined word windows; stored content keeps newlines, so exact substring search fails. Match whitespace-tolerantly: regex-escape the first ~12 words joined with `\s+` to find the start, the last ~12 words (searched only within the chunk-length window) for the end. Highlight the range, `scrollIntoView({block:'center'})`.
- Add find-in-source (case-insensitive match count, prev/next, Enter cycles): reading tools make the reader a feature, not a modal.
- Drop raw similarity scores from the UI (`1 - distance` means nothing to users).
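The two text-matching steps above can be sketched without any markdown or DOM dependency. `splitCitations` shows only the per-text-node logic a remark plugin would apply (it is not itself a plugin), and `locatePassage` shows the whitespace-tolerant range search. All names are hypothetical, and the 2x slack on the search window is an assumption to absorb source whitespace that is wider than the chunk's single spaces.

```typescript
type Segment = { kind: "text"; value: string } | { kind: "cite"; n: number };

// Split a text node into plain text and citation markers. Markers whose
// number is outside 1..citationCount stay as literal text.
function splitCitations(text: string, citationCount: number): Segment[] {
  const segments: Segment[] = [];
  const re = /\[(\d{1,2})\]/g;
  let cursor = 0;
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    const n = Number(m[1]);
    if (n < 1 || n > citationCount) continue;
    if (m.index > cursor) {
      segments.push({ kind: "text", value: text.slice(cursor, m.index) });
    }
    segments.push({ kind: "cite", n });
    cursor = m.index + m[0].length;
  }
  if (cursor < text.length) {
    segments.push({ kind: "text", value: text.slice(cursor) });
  }
  return segments;
}

function escapeRegex(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

interface TextRange {
  start: number;
  end: number;
}

// Find a space-joined chunk inside source text that kept its newlines.
function locatePassage(source: string, chunk: string, edgeWords = 12): TextRange | null {
  const words = chunk.split(/\s+/).filter((w) => w.length > 0);
  if (words.length === 0) return null;
  const edge = Math.max(1, edgeWords);
  const toPattern = (ws: string[]): string => ws.map(escapeRegex).join("\\s+");

  const head = new RegExp(toPattern(words.slice(0, edge))).exec(source);
  if (head === null) return null;
  const start = head.index;
  const headEnd = start + head[0].length;

  // Assumption: allow up to 2x the chunk length for wider whitespace.
  const windowEnd = Math.min(source.length, start + chunk.length * 2);
  const tailRe = new RegExp(toPattern(words.slice(-edge)), "g");
  tailRe.lastIndex = start;
  let m: RegExpExecArray | null;
  while ((m = tailRe.exec(source)) !== null) {
    const end = m.index + m[0].length;
    if (end > windowEnd) break;
    if (end >= headEnd) return { start, end };
  }
  return null;
}
```

The returned range is in source-string offsets, so a viewer could wrap that slice in a highlight element and scroll it into view. Returning `null` when the tail is not found is one possible choice; falling back to highlighting only the head match would be another.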
5. Streaming and cancellation
- Stream everything users wait on: chat tokens and document generation (emit `artifact://token` events into a live preview that follows its tail). A spinner for a 3-minute PRD reads as broken.
- Scope cancellation tokens per activity (`"chat"`, `"artifact"`) in a `HashMap<String, CancellationToken>`; one global token means starting a chat kills the running document build. Race with `tokio::select!`; a user-initiated stop is an info toast, not an error.
- On cancel of streamed chat, keep the partial text.
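The per-activity scoping is described above in Rust terms. The sketch below is a TypeScript analogue of the same idea, not the Rust implementation: a hand-rolled `CancelToken` stands in for `CancellationToken`, a `Map` stands in for the `HashMap`, and a synchronous loop stands in for the `tokio::select!` race. All names are hypothetical, and cancelling a previous run when the same activity restarts is an assumption.

```typescript
// Minimal stand-in for a cancellation token; hypothetical, for illustration.
class CancelToken {
  private cancelled = false;
  cancel(): void {
    this.cancelled = true;
  }
  get isCancelled(): boolean {
    return this.cancelled;
  }
}

// One token per activity key, so "chat" and "artifact" never share fate.
class ActivityTokens {
  private tokens = new Map<string, CancelToken>();

  begin(activity: string): CancelToken {
    // Assumption: restarting an activity cancels its own previous run only.
    const previous = this.tokens.get(activity);
    if (previous !== undefined) previous.cancel();
    const token = new CancelToken();
    this.tokens.set(activity, token);
    return token;
  }

  // Returns true when there was a running activity to stop.
  cancel(activity: string): boolean {
    const token = this.tokens.get(activity);
    if (token === undefined) return false;
    token.cancel();
    this.tokens.delete(activity);
    return true;
  }
}

interface StreamResult {
  text: string; // everything received before completion or stop
  stopped: boolean; // true for a user-initiated stop, which is not an error
}

// Consume streamed pieces, checking the token before each one. On stop,
// the partial text is returned rather than discarded.
function drain(
  pieces: string[],
  token: CancelToken,
  onPiece: (piece: string) => void
): StreamResult {
  let text = "";
  for (const piece of pieces) {
    if (token.isCancelled) return { text, stopped: true };
    text += piece;
    onPiece(piece);
  }
  return { text, stopped: false };
}
```

A caller would treat `stopped: true` as the info-toast path and still persist `text` with the message, which covers the keep-the-partial-text rule.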
6. Chat-loop details that separate polished from janky
- IME guard: `if (e.key === "Enter" && !e.shiftKey && !e.nativeEvent.isComposing)`. Without `isComposing`, CJK users send mid-composition.
- Keep the composer typable while a response streams; gate only submit.
- Suggested follow-ups fill the composer (and focus it) instead of firing immediately — users want to edit.
- Failed sends hand the prompt back to the composer; never eat typed text.
- Autoscroll only when already near the bottom; jump to latest on first load of a conversation.
- Destructive deletes go through an in-app confirm (never native `window.confirm`); wrap CRUD store actions in a shared `guard()` so backend failures surface as toasts instead of silent no-ops.
- Auto-open generated documents where the user acted; hide badges that repeat the title; strip a leading markdown heading from card previews so the title isn't shown twice.
- Harden one Modal (role, aria-modal, labelled-by, focus trap + restore, Escape) and every dialog inherits the a11y.
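Several of the rules above reduce to small pure functions that are easy to unit-test apart from React. The sketch below uses structural stand-ins for the keyboard event and the scroll container so it does not depend on DOM types. The function names and the 80-pixel threshold are assumptions.

```typescript
// Structural stand-in for a React keyboard event; only the fields we read.
interface KeyLike {
  key: string;
  shiftKey: boolean;
  nativeEvent: { isComposing: boolean };
}

// Enter submits only when not composing (IME) and not mid-stream.
// Typing stays enabled while streaming; only the submit is gated.
function shouldSubmit(e: KeyLike, isStreaming: boolean): boolean {
  const isPlainEnter = e.key === "Enter" && !e.shiftKey && !e.nativeEvent.isComposing;
  return isPlainEnter && !isStreaming;
}

// Structural stand-in for a scroll container.
interface ScrollLike {
  scrollTop: number;
  clientHeight: number;
  scrollHeight: number;
}

// Autoscroll only when the reader is already close to the latest message.
function isNearBottom(el: ScrollLike, thresholdPx = 80): boolean {
  return el.scrollHeight - el.scrollTop - el.clientHeight <= thresholdPx;
}

// Drop a leading markdown heading so a card preview doesn't repeat the title.
function stripLeadingHeading(markdown: string): string {
  return markdown.replace(/^\s*#{1,6}[ \t]+[^\n]*\n*/, "");
}
```

In a component, `shouldSubmit` would sit in the composer's key handler and `isNearBottom` would be checked before scrolling on each streamed token.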
7. Demo corpus design (for screenshots and sales demos)
Fabricate an interlocking document set for a fictional company; design it so every feature has a money shot:
- 6–9 documents of different types (roadmap, customer interviews, metrics review, pricing, press-release draft, competitive intel, support report, capacity notes) that cross-reference the same numbers, names, and dates — grounded answers then cite multiple sources.
- Plant 2–3 discoverable contradictions (a price that differs between pricing doc and press release; a GA date that differs between roadmap and PR) so "find problems" features and contradiction questions surface real conflicts. Keep everything else consistent so only the planted conflicts fire.
- Include quotable verbatims (customer quotes, support tickets) — answers that quote read great.
- Name one file as a slug (`eng_capacity_h2_notes.txt`) to demo auto-titling; give markdown files clean `#` headings.
- Add a "not in the sources" question to the demo script to show honest refusal.
- Write a `DEMO-SCRIPT.md` with setup steps, per-feature questions, and where the planted conflicts are.
Related skills
- `shipping-tauri-apps`: building, signing, and releasing the desktop shell.
- `driving-tauri-apps`: automating the running app for verification and screenshots.