Architecture

Electron boundaries, setup gating, encrypted secrets, provider adapters, voice services, and packaged lifecycle.

System shape

Desktop shell

Main process

Owns windows, hotkeys, tray/menu-bar lifecycle, config, encrypted secrets, permissions, updates, diagnostics, screen capture, STT, TTS, and provider orchestration.

UI boundary

Renderer + preload

Renders overlay/setup/settings, records microphone audio, streams chunks and VU levels, plays TTS audio, and talks to main only through typed IPC.

External services

Voice + reasoning

ElevenLabs handles STT and TTS. An OpenAI-compatible provider handles screenshot-plus-transcript reasoning.

The main process owns privileged work and secrets. The renderer owns interaction, audio capture, visualization, and presentation.

Electron app structure

FlowLens is a desktop-first Electron app because the product depends on OS-level behavior:

global shortcuts
transparent always-on-top overlay windows
screen capture
tray/menu-bar background control
launch-at-login
packaged installers
secure local secret storage through Electron safeStorage

Main process responsibilities

enforce the single-instance lock
create the overlay window and the setup/settings window
register the global hotkey
gate overlay invocation until onboarding is complete
create and update the tray/menu-bar menu
persist config through Electron userData
migrate legacy config from ~/.flowlens/config.json
encrypt and decrypt API keys through the secret store
check platform and permission status
hide the overlay before screen capture
assemble the invocation pipeline
resolve the active OpenAI-compatible provider
call ElevenLabs STT and TTS
handle update checks, launch-at-login, diagnostics, and cleanup reset

Renderer and preload responsibilities

render the overlay, setup wizard, and settings window
expose narrow IPC methods through preload
start and stop microphone capture
stream audio chunks to main before sending flowlens:audio-stop
compute live VU levels for the Matrix component
wait through natural pauses using speech-turn detection
play TTS audio chunks streamed from main
render structured response cards and copyable output
request settings, voice lists, permissions, diagnostics, cleanup, and update checks through IPC
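
The level metering and pause handling in the list above can be sketched as follows. This is a minimal illustration, not the FlowLens implementation: `computeVuLevel`, `SpeechTurnDetector`, the voice threshold, and the pause length are all hypothetical, and the sketch assumes time-domain samples in the range -1 to 1, such as those a Web Audio analyser can provide.

```typescript
// Illustrative sketch only: names, thresholds, and timings are assumptions,
// not values taken from the FlowLens source.

// Root-mean-square level of one frame of time-domain samples in [-1, 1],
// clamped to the 0..1 range a VU-style visualizer expects.
function computeVuLevel(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  samples.forEach((sample) => {
    sumSquares += sample * sample;
  });
  return Math.min(1, Math.sqrt(sumSquares / samples.length));
}

interface TurnDetectorOptions {
  voiceThreshold: number; // level at or above which a frame counts as speech
  maxPauseMs: number; // longest pause tolerated inside a single turn
}

// Reports the end of a speech turn only after the speaker has been quiet
// for longer than maxPauseMs, so short natural pauses do not cut them off.
class SpeechTurnDetector {
  private readonly options: TurnDetectorOptions;
  private lastVoiceAtMs: number | null = null;

  constructor(options: TurnDetectorOptions) {
    this.options = options;
  }

  // Feed one level reading with its timestamp; returns true when the turn is over.
  update(level: number, nowMs: number): boolean {
    if (level >= this.options.voiceThreshold) {
      this.lastVoiceAtMs = nowMs;
      return false;
    }
    if (this.lastVoiceAtMs === null) return false; // nothing spoken yet
    return nowMs - this.lastVoiceAtMs > this.options.maxPauseMs;
  }
}
```

A renderer would call `update` once per analyser frame and send the stop message when it returns true.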

Runtime flow

Hotkey gate

The global shortcut first checks onboarding status. If setup is incomplete, the setup window opens. If complete, the invocation pipeline starts.

Screen capture

The main process hides the overlay so it does not appear in the screenshot, waits briefly for the window manager to apply the change, captures the primary screen through desktopCapturer, and then restores the overlay.

Audio capture

The renderer creates MicCapture, records with MediaRecorder, streams chunks to main, and exposes analyser data for the matrix visualizer.

Voice and provider

Audio goes to ElevenLabs scribe_v2 for transcription. The transcript, screenshot, active mode, and conversation state then go to the active provider.

Structured answer

The provider response is parsed into spoken_summary, card_content, clarifying_question, and actionable_output.

Response playback

The overlay renders the answer. If voice playback is enabled, ElevenLabs TTS streams audio chunks back to the renderer.

Provider adapter layer

The current adapter is OpenAI-compatible, with small per-provider compatibility branches rather than separate full adapters. It reads:

providerKey for the active secret
providerBaseUrl for the API root
providerProtocol for the wire protocol
model for the request body

For standard providers, the adapter posts to /chat/completions, sends the screenshot as an image content part, requests JSON output, and validates the structured response before the overlay sees it.
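
The standard request described above can be sketched as a pure function. This is an assumption-laden illustration: `buildChatRequest`, the `supportsJsonResponseFormat` flag, the separate `apiKey` argument, and the PNG data URL are hypothetical choices. Only the `/chat/completions` path, the image content part, and the JSON response format come from the description.

```typescript
// Illustrative sketch only: function and field names beyond providerBaseUrl
// and model are assumptions, not the FlowLens adapter's actual API.
interface ProviderSettings {
  providerBaseUrl: string;
  model: string;
  supportsJsonResponseFormat: boolean; // hypothetical compatibility flag
}

interface ChatRequest {
  url: string;
  headers: Record<string, string>;
  body: Record<string, unknown>;
}

function buildChatRequest(
  settings: ProviderSettings,
  apiKey: string, // assumed to be the already-decrypted provider secret
  transcript: string,
  screenshotBase64: string, // assumed to be PNG data encoded as base64
): ChatRequest {
  // Avoid a double slash when the configured base URL ends with "/".
  const baseUrl = settings.providerBaseUrl.replace(/\/+$/, "");
  const body: Record<string, unknown> = {
    model: settings.model,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: transcript },
          {
            type: "image_url",
            image_url: { url: "data:image/png;base64," + screenshotBase64 },
          },
        ],
      },
    ],
  };
  // Providers without native JSON mode get the payload without this field.
  if (settings.supportsJsonResponseFormat) {
    body["response_format"] = { type: "json_object" };
  }
  return {
    url: baseUrl + "/chat/completions",
    headers: {
      Authorization: "Bearer " + apiKey,
      "Content-Type": "application/json",
    },
    body,
  };
}
```

Keeping the builder free of network calls makes the payload shape easy to unit test for each compatibility branch.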

Provider-specific behavior is isolated here:

Provider family	Handling
OpenAI-compatible	Normal chat-completions payload with image content and native JSON response format
Gemini compatible endpoint	Uses Google's OpenAI-compatible base URL with the same screenshot-plus-text payload shape
OpenCode Go	Infers `openai-chat`, `anthropic-messages`, or `alibaba-chat` behavior per model; omits unsupported JSON response-format parameters; disables the short FlowLens timeout for long-running calls; falls back from prompt-only markdown into a structured response when needed

The goal is to keep the rest of the app provider-agnostic. The overlay, setup flow, response card, TTS, and copyable output continue to work against the same response contract.
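
The markdown fallback mentioned for OpenCode Go could look roughly like the sketch below. How FlowLens actually derives each field is not documented here, so the mapping is an assumption: `markdownToStructuredResponse` is a hypothetical helper that uses the first non-empty line as the spoken summary and the full markdown as both card body and copyable output.

```typescript
// Illustrative sketch only: the field mapping is an assumption, not the
// documented FlowLens fallback.
interface StructuredResponse {
  spoken_summary: string;
  card_content: string;
  clarifying_question: string | null;
  actionable_output: string;
}

function markdownToStructuredResponse(
  markdown: string,
  maxSummaryLength: number = 160,
): StructuredResponse {
  const body = markdown.trim();
  // First non-empty line, with leading heading, list, and quote markers removed.
  const firstLine =
    body
      .split("\n")
      .map((line) => line.replace(/^[#>*\-\s]+/, "").trim())
      .filter((line) => line.length > 0)[0] || "";
  // Keep the spoken summary short enough for TTS and the compact UI.
  const summary =
    firstLine.length > maxSummaryLength
      ? firstLine.slice(0, maxSummaryLength - 1).replace(/\s+$/, "") + "…"
      : firstLine;
  return {
    spoken_summary: summary,
    card_content: body,
    clarifying_question: null,
    actionable_output: body,
  };
}
```

With a fallback of this kind, the overlay receives the same response shape whether or not the model returned JSON.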

Structured response contract

{
  "spoken_summary": "Short answer for TTS and compact UI.",
  "card_content": "Markdown body for the overlay.",
  "clarifying_question": null,
  "actionable_output": "Copy-ready final text."
}

This contract keeps the UI stable. The model can reason freely, but it must return a predictable shape.
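
One way to enforce the contract before the overlay sees a response is a small parser that rejects anything with the wrong shape. This is a sketch under stated assumptions: `parseStructuredResponse` is a hypothetical name, and it assumes that only `clarifying_question` may be null, as in the example above.

```typescript
// Illustrative sketch only: the nullability rules are inferred from the
// example contract, not from the FlowLens source.
interface StructuredResponse {
  spoken_summary: string;
  card_content: string;
  clarifying_question: string | null;
  actionable_output: string;
}

function parseStructuredResponse(raw: string): StructuredResponse {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    throw new Error("Provider response is not valid JSON");
  }
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    throw new Error("Provider response must be a JSON object");
  }
  const record = value as Record<string, unknown>;

  // Read a required string field or fail with a message naming the field.
  const requireString = (key: string): string => {
    const field = record[key];
    if (typeof field !== "string") {
      throw new Error('Field "' + key + '" must be a string');
    }
    return field;
  };

  const question = record["clarifying_question"];
  if (question !== null && typeof question !== "string") {
    throw new Error('Field "clarifying_question" must be a string or null');
  }

  return {
    spoken_summary: requireString("spoken_summary"),
    card_content: requireString("card_content"),
    clarifying_question: question,
    actionable_output: requireString("actionable_output"),
  };
}
```

Returning a freshly built object also drops any extra fields the model may have added.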

Settings and onboarding architecture

Setup and settings are regular BrowserWindow instances, loaded with a role query parameter:

role=setup loads the first-run wizard
role=settings loads the full settings surface

The wizard saves draft settings, runs connection checks, and blocks completion until provider, ElevenLabs, microphone, and screen checks pass. The tray and hotkey both rely on the same onboarding status.
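
The shared gate can be sketched as one status function that both the wizard and the hotkey handler consult. The names below (`pendingChecks`, `isOnboardingComplete`, `resolveInvocationTarget`) and the check keys are hypothetical. Only the four required checks and the setup-or-pipeline decision come from the description.

```typescript
// Illustrative sketch only: type and function names are assumptions.
type CheckName = "provider" | "elevenLabs" | "microphone" | "screen";
type OnboardingChecks = Record<CheckName, boolean>;

const REQUIRED_CHECKS: readonly CheckName[] = [
  "provider",
  "elevenLabs",
  "microphone",
  "screen",
];

// Checks that still block completion, in a stable display order.
function pendingChecks(checks: OnboardingChecks): CheckName[] {
  return REQUIRED_CHECKS.filter((name) => !checks[name]);
}

function isOnboardingComplete(checks: OnboardingChecks): boolean {
  return pendingChecks(checks).length === 0;
}

type InvocationTarget = "setup-window" | "invocation-pipeline";

// Single decision point for the hotkey and the tray menu.
function resolveInvocationTarget(checks: OnboardingChecks): InvocationTarget {
  return isOnboardingComplete(checks) ? "invocation-pipeline" : "setup-window";
}
```

Deriving completion from the individual checks, rather than storing a separate flag, keeps the tray and hotkey from disagreeing about whether setup is done.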

Overlay layout and positioning

Overlay size is state-driven:

State	Typical size
recording	compact recording layout
processing	compact analysis layout
response compact	scrollable response layout
response expanded	larger reading layout
settings	bounded `460 x 560` layout

The overlay defaults to the bottom-right corner. If the user drags it, the main process persists the custom top-left position together with the display ID, and clamps it into the nearest work area on resize.
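
The positioning rules can be illustrated with two pure geometry helpers. This is a sketch, not the FlowLens code: `defaultBottomRight`, `clampToWorkArea`, and the 16-pixel margin are assumptions. In Electron, the work area rectangle would typically come from a display object returned by the `screen` module.

```typescript
// Illustrative sketch only: helper names and the default margin are assumptions.
interface Point { x: number; y: number; }
interface Size { width: number; height: number; }
interface Rect extends Point, Size {}

// Default placement: bottom-right corner of the work area, inset by a margin.
function defaultBottomRight(size: Size, workArea: Rect, margin: number = 16): Point {
  return {
    x: workArea.x + workArea.width - size.width - margin,
    y: workArea.y + workArea.height - size.height - margin,
  };
}

// Keep a persisted top-left position fully inside the work area. If the
// window is larger than the work area, it is pinned to the top-left edge.
function clampToWorkArea(position: Point, size: Size, workArea: Rect): Point {
  const maxX = workArea.x + Math.max(0, workArea.width - size.width);
  const maxY = workArea.y + Math.max(0, workArea.height - size.height);
  return {
    x: Math.min(Math.max(position.x, workArea.x), maxX),
    y: Math.min(Math.max(position.y, workArea.y), maxY),
  };
}
```

Clamping on every size change matters because the overlay grows between states, so a position that fits the compact layout may push the expanded layout off screen.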

Security and privacy boundaries

Boundary	Decision
Raw API keys	Stored encrypted in the main-process secret store
Renderer settings	Receives masked key status only
Screenshot capture	Explicit invocation only
Microphone capture	Active request only
Diagnostics	Redacts API keys, auth tokens, screenshots, audio, transcripts, and response-like content
Factory reset	Clears FlowLens-owned settings, secrets, logs, and overlay position
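
The diagnostics row can be illustrated with a recursive redaction pass over a report object before it is written or exported. This is a sketch under assumptions: `redactDiagnostics`, the key-name pattern, and the `[redacted]` placeholder are hypothetical, and a real implementation might also need to scan free-text values.

```typescript
// Illustrative sketch only: the sensitive key pattern is an assumption and
// would need to match the field names the app actually logs.
const SENSITIVE_KEY_PATTERN =
  /api[-_]?key|token|authorization|secret|screenshot|audio|transcript|response/i;

const REDACTED = "[redacted]";

// Returns a deep copy in which every value stored under a sensitive key
// is replaced by a placeholder. The input object is not modified.
function redactDiagnostics(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map((item) => redactDiagnostics(item));
  }
  if (typeof value === "object" && value !== null) {
    const source = value as Record<string, unknown>;
    const result: Record<string, unknown> = {};
    Object.keys(source).forEach((key) => {
      result[key] = SENSITIVE_KEY_PATTERN.test(key)
        ? REDACTED
        : redactDiagnostics(source[key]);
    });
    return result;
  }
  return value;
}
```

Redacting by key name replaces a whole subtree at once, so nested screenshot or transcript payloads never reach the log.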

Packaged lifecycle

FlowLens uses electron-builder with:

Windows NSIS and portable targets
macOS DMG and ZIP targets
app ID com.flowlens.desktop
GitHub Releases as the update provider
extra tray icon resource packaging
electron-updater status tracking
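
As a rough illustration, the targets above map onto an electron-builder configuration object like the one below. The `extraResources` paths are hypothetical placeholders, and the real project may set further options, such as the GitHub owner and repository, that are not described here.

```typescript
// Illustrative sketch only: resource paths are placeholders, not the
// project's actual layout.
const builderConfig = {
  appId: "com.flowlens.desktop",
  win: {
    target: ["nsis", "portable"],
  },
  mac: {
    target: ["dmg", "zip"],
  },
  // GitHub Releases as the update provider used by electron-updater.
  publish: [{ provider: "github" }],
  // Ship the tray icons alongside the packaged app (hypothetical paths).
  extraResources: [{ from: "assets/tray", to: "tray" }],
};
```

In a real project this object would be exported from the electron-builder config file or placed under the `build` key of package.json.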

The packaged app removes the developer-run loop from normal use. A user installs once, finishes onboarding in a normal setup window, optionally enables launch-at-login, and then interacts through the global hotkey and tray/menu-bar menu. The app stays alive in the background after windows close unless the user chooses Quit.