Architecture
Electron boundaries, setup gating, encrypted secrets, provider adapters, voice services, and packaged lifecycle.
System shape
Desktop shell
Main process
Owns windows, hotkeys, tray/menu-bar lifecycle, config, encrypted secrets, permissions, updates, diagnostics, screen capture, STT, TTS, and provider orchestration.
UI boundary
Renderer + preload
Renders overlay/setup/settings, records microphone audio, streams chunks and VU levels, plays TTS audio, and talks to main only through typed IPC.
External services
Voice + reasoning
ElevenLabs handles STT and TTS. An OpenAI-compatible provider handles screenshot-plus-transcript reasoning.
Electron app structure
FlowLens is a desktop-first Electron app because the product depends on OS-level behavior:
- global shortcuts
- transparent always-on-top overlay windows
- screen capture
- tray/menu-bar background control
- launch-at-login
- packaged installers
- secure local secret storage through Electron safeStorage
Main process responsibilities
- enforce the single-instance lock
- create the overlay window and the setup/settings window
- register the global hotkey
- gate overlay invocation until onboarding is complete
- create and update the tray/menu-bar menu
- persist config through Electron userData
- migrate legacy config from ~/.flowlens/config.json
- encrypt and decrypt API keys through the secret store
- check platform and permission status
- hide the overlay before screen capture
- assemble the invocation pipeline
- resolve the active OpenAI-compatible provider
- call ElevenLabs STT and TTS
- handle update checks, launch-at-login, diagnostics, and cleanup reset
Renderer and preload responsibilities
- render the overlay, setup wizard, and settings window
- expose narrow IPC methods through preload
- start and stop microphone capture
- stream audio chunks to main before sending flowlens:audio-stop
- compute live VU levels for the Matrix component
- wait through natural pauses using speech-turn detection
- play TTS audio chunks streamed from main
- render structured response cards and copyable output
- request settings, voice lists, permissions, diagnostics, cleanup, and update checks through IPC
Runtime flow
Hotkey gate
The global shortcut first checks onboarding status. If setup is incomplete, the setup window opens. If complete, the invocation pipeline starts.
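The gate reduces to one decision on the onboarding status. A minimal sketch, assuming a hypothetical `OnboardingStatus` shape and action names that do not come from the FlowLens source:

```typescript
// Hypothetical action names; the source only describes the two outcomes.
type HotkeyAction = "open-setup" | "start-invocation";

// Hypothetical status shape; the real onboarding record may carry more fields.
interface OnboardingStatus {
  complete: boolean;
}

// Decide what a hotkey press should do, based only on onboarding status.
function resolveHotkeyAction(onboarding: OnboardingStatus): HotkeyAction {
  return onboarding.complete ? "start-invocation" : "open-setup";
}
```

Keeping the decision in a pure function would let the tray menu and the hotkey handler share the same check.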
Screen capture
The main process hides the overlay, waits briefly for the window manager, captures the primary screen through desktopCapturer, and restores the overlay.
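The hide, wait, capture, restore ordering can be sketched with injected steps. The `CaptureSteps` interface, the function name, and the 100 ms default delay are illustrative assumptions; in the real app `captureScreen` would wrap desktopCapturer.

```typescript
// Hypothetical step callbacks, injected so the ordering can be exercised without Electron.
interface CaptureSteps {
  hideOverlay(): void;
  wait(ms: number): Promise<void>;
  captureScreen(): Promise<string>;
  restoreOverlay(): void;
}

// Hide the overlay, give the window manager time to repaint, capture, then restore.
// The finally block restores the overlay even if the capture throws.
async function captureWithoutOverlay(
  steps: CaptureSteps,
  settleMs: number = 100 // assumed value; the source only says "waits briefly"
): Promise<string> {
  steps.hideOverlay();
  try {
    await steps.wait(settleMs);
    return await steps.captureScreen();
  } finally {
    steps.restoreOverlay();
  }
}
```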
Audio capture
The renderer creates MicCapture, records with MediaRecorder, streams chunks to main, and exposes analyser data for the matrix visualizer.
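One common way to turn analyser data into a VU level is a root-mean-square over a frame of time-domain samples. The sketch below assumes the samples arrive as a `Float32Array` in the range -1 to 1 (for example from `AnalyserNode.getFloatTimeDomainData`); the function name is hypothetical and FlowLens may compute its levels differently.

```typescript
// Root-mean-square level of one analyser frame, clamped to the range 0..1.
function computeVuLevel(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  const sumSquares = samples.reduce((acc, sample) => acc + sample * sample, 0);
  return Math.min(1, Math.sqrt(sumSquares / samples.length));
}
```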
Voice and provider
Audio goes to ElevenLabs scribe_v2. The transcript, screenshot, active mode, and conversation state go to the active provider.
Structured answer
The provider response is parsed into spoken_summary, card_content, clarifying_question, and actionable_output.
Response playback
The overlay renders the answer. If voice playback is enabled, ElevenLabs TTS streams audio chunks back to the renderer.
Provider adapter layer
The current adapter is OpenAI-compatible, with small provider compatibility branches instead of separate full adapters. It reads:
- providerKey for the active secret
- providerBaseUrl for the API root
- providerProtocol for the wire protocol
- model for the request body
For standard providers, the adapter posts to /chat/completions, sends the screenshot as an image content part, requests JSON output, and validates the structured response before the overlay sees it.
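A request in that shape might be assembled as below. This is a sketch, not the FlowLens adapter: the function and interface names, the PNG data URL, and the `useJsonResponseFormat` flag are assumptions. The payload fields follow the public OpenAI chat-completions format. Sending the request, including the `Authorization` header built from providerKey, is left out.

```typescript
// Subset of settings the sketch needs; field names follow the list above.
interface ProviderSettings {
  providerBaseUrl: string;
  model: string;
}

interface ChatRequest {
  url: string;
  body: {
    model: string;
    messages: unknown[];
    response_format?: { type: "json_object" };
  };
}

// Build the URL and JSON body for a screenshot-plus-transcript request.
function buildChatRequest(
  settings: ProviderSettings,
  transcript: string,
  screenshotPngBase64: string,
  useJsonResponseFormat: boolean = true // assumed switch for providers without JSON mode
): ChatRequest {
  // Tolerate a trailing slash on the configured base URL.
  const url = settings.providerBaseUrl.replace(/\/+$/, "") + "/chat/completions";
  const body: ChatRequest["body"] = {
    model: settings.model,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: transcript },
          // Assumes the screenshot is PNG; the image travels as a data URL content part.
          { type: "image_url", image_url: { url: `data:image/png;base64,${screenshotPngBase64}` } },
        ],
      },
    ],
  };
  if (useJsonResponseFormat) {
    body.response_format = { type: "json_object" };
  }
  return { url, body };
}
```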
Provider-specific behavior is isolated here:
| Provider family | Handling |
|---|---|
| OpenAI-compatible | Normal chat-completions payload with image content and native JSON response format |
| Gemini compatible endpoint | Uses Google's OpenAI-compatible base URL with the same screenshot-plus-text payload shape |
| OpenCode Go | Infers openai-chat, anthropic-messages, or alibaba-chat behavior per model; omits unsupported JSON response-format parameters; disables the short FlowLens timeout for long-running calls; falls back from prompt-only markdown into a structured response when needed |
The goal is to keep the rest of the app provider-agnostic. The overlay, setup flow, response card, TTS, and copyable output continue to work against the same response contract.
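To illustrate how a markdown-only reply could still satisfy the shared contract, here is one possible fallback. The policy is an assumption, not the documented FlowLens behavior: the first non-empty line becomes the spoken summary, the full text becomes the card, and the copy-ready output is left empty.

```typescript
// Field names follow the structured response contract; the fallback policy is hypothetical.
interface FallbackResponse {
  spoken_summary: string;
  card_content: string;
  clarifying_question: string | null;
  actionable_output: string;
}

// Wrap a plain markdown reply in the structured shape the overlay expects.
function markdownToStructured(markdown: string): FallbackResponse {
  const trimmed = markdown.trim();
  const firstLine = trimmed.split("\n").find((line) => line.trim().length > 0) ?? "";
  // Strip leading heading, quote, and bullet markers so TTS reads plain words.
  const spoken = firstLine.replace(/^[#>*\-\s]+/, "").trim();
  return {
    spoken_summary: spoken,
    card_content: trimmed,
    clarifying_question: null,
    actionable_output: "",
  };
}
```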
Structured response contract
{
"spoken_summary": "Short answer for TTS and compact UI.",
"card_content": "Markdown body for the overlay.",
"clarifying_question": null,
"actionable_output": "Copy-ready final text."
}

This contract keeps the UI stable. The model can reason freely, but it must return a predictable shape.
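A validation step for this shape could look like the sketch below. The function name is hypothetical, and the nullability rules are inferred from the example above (only clarifying_question is shown as null), so treat them as assumptions.

```typescript
interface StructuredResponse {
  spoken_summary: string;
  card_content: string;
  clarifying_question: string | null;
  actionable_output: string;
}

// Parse raw provider text and return the contract shape, or null if it does not match.
function parseStructuredResponse(raw: string): StructuredResponse | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof data !== "object" || data === null) return null;
  const record = data as Record<string, unknown>;
  const spoken = record["spoken_summary"];
  const card = record["card_content"];
  const question = record["clarifying_question"];
  const output = record["actionable_output"];
  if (typeof spoken !== "string" || typeof card !== "string") return null;
  if (question !== null && typeof question !== "string") return null;
  if (typeof output !== "string") return null;
  // Copy only the known fields so extra keys never reach the overlay.
  return {
    spoken_summary: spoken,
    card_content: card,
    clarifying_question: question,
    actionable_output: output,
  };
}
```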
Settings and onboarding architecture
Setup and settings are normal BrowserWindows loaded with a role query:
- role=setup loads the first-run wizard
- role=settings loads the full settings surface
The wizard saves draft settings, runs connection checks, and blocks completion until provider, ElevenLabs, microphone, and screen checks pass. The tray and hotkey both rely on the same onboarding status.
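The completion rule can be expressed as a small predicate over the four checks. The type and function names below are hypothetical; only the four check categories come from the description above.

```typescript
type CheckName = "provider" | "elevenlabs" | "microphone" | "screen";
type CheckResults = Record<CheckName, boolean>;

// List the checks that still fail, in a fixed order the wizard could display.
function failingChecks(results: CheckResults): CheckName[] {
  const allChecks: CheckName[] = ["provider", "elevenlabs", "microphone", "screen"];
  return allChecks.filter((check) => !results[check]);
}

// Onboarding may complete only when every check has passed.
function canCompleteOnboarding(results: CheckResults): boolean {
  return failingChecks(results).length === 0;
}
```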
Overlay layout and positioning
Overlay size is state-driven:
| State | Typical size |
|---|---|
| recording | compact recording layout |
| processing | compact analysis layout |
| response compact | scrollable response layout |
| response expanded | larger reading layout |
| settings | bounded 460 x 560 layout |
The overlay defaults to bottom right. If the user drags it, the main process persists a custom top-left position with display ID and clamps it into the nearest work area on resize.
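Clamping a saved position into a work area is plain rectangle arithmetic. In this sketch the `Rect` shape mirrors Electron's `Rectangle` (x, y, width, height); the function name is hypothetical, and choosing the nearest display is assumed to happen before the call.

```typescript
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Move the overlay's top-left corner so the whole window fits inside the work area.
// If the overlay is larger than the work area, it is pinned to the work area's top-left.
function clampToWorkArea(overlay: Rect, workArea: Rect): { x: number; y: number } {
  const maxX = workArea.x + Math.max(0, workArea.width - overlay.width);
  const maxY = workArea.y + Math.max(0, workArea.height - overlay.height);
  return {
    x: Math.min(Math.max(overlay.x, workArea.x), maxX),
    y: Math.min(Math.max(overlay.y, workArea.y), maxY),
  };
}
```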
Security and privacy boundaries
| Boundary | Decision |
|---|---|
| Raw API keys | Stored encrypted in main-process secret store |
| Renderer settings | Receives masked key status only |
| Screenshot capture | Explicit invocation only |
| Microphone capture | Active request only |
| Diagnostics | Redacts API keys, auth tokens, screenshots, audio, transcripts, and response-like content |
| Factory reset | Clears FlowLens-owned settings, secrets, logs, and overlay position |
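
Diagnostics redaction can be approximated by masking values under sensitive-looking keys before a report is written. The key pattern and function name here are assumptions. The pattern deliberately over-matches (it would also mask a field named hotkey), and the real redaction may also scan string values.

```typescript
// Assumed pattern covering the categories listed in the table above.
const SENSITIVE_KEY = /key|token|secret|authorization|screenshot|audio|transcript|response/i;

// Return a deep copy of a diagnostics value with sensitive fields masked.
function redactDiagnostics(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map((item) => redactDiagnostics(item));
  }
  if (typeof value === "object" && value !== null) {
    const source = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const field of Object.keys(source)) {
      out[field] = SENSITIVE_KEY.test(field) ? "[redacted]" : redactDiagnostics(source[field]);
    }
    return out;
  }
  return value; // primitives pass through unchanged
}
```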
Packaged lifecycle
FlowLens uses electron-builder with:
- Windows NSIS and portable targets
- macOS DMG and ZIP targets
- app ID com.flowlens.desktop
- GitHub Releases as the update provider
- extra tray icon resource packaging
- electron-updater status tracking
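
For orientation, the targets above map onto electron-builder configuration roughly as follows. This is a hedged sketch written as a plain object. The option names are standard electron-builder keys, but the actual FlowLens config file is not shown in this document, and the tray icon paths are hypothetical.

```typescript
// Sketch of an electron-builder configuration matching the list above.
const builderConfig = {
  appId: "com.flowlens.desktop",
  win: { target: ["nsis", "portable"] },
  mac: { target: ["dmg", "zip"] },
  // GitHub Releases as the update provider.
  publish: [{ provider: "github" }],
  // Hypothetical paths; the source only says a tray icon is packaged as an extra resource.
  extraResources: [{ from: "assets/tray", to: "tray" }],
};
```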
The packaged app removes the developer-run loop from normal use. A user installs once, finishes onboarding in a normal setup window, optionally enables launch-at-login, and then interacts through the global hotkey and tray/menu-bar menu. The app stays alive in the background after windows close unless the user chooses Quit.