How FlowLens Works
The runtime flow from hotkey press to overlay answer, including setup gating, screen capture, STT, multimodal analysis, and TTS playback.
Global hotkey
The user presses the configured shortcut. If onboarding is incomplete, FlowLens opens setup. If complete, the overlay invocation begins.
Overlay and capture
The overlay appears, then the main process hides it just long enough to capture the primary display through Electron's desktopCapturer, so the screenshot never includes the overlay itself.
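The hide-capture-show sequencing can be sketched as follows. The real code would call `BrowserWindow.hide()`/`show()` and `desktopCapturer.getSources()`; here those operations are injected, since the ordering guarantee is the interesting part:

```typescript
// Sketch of the hide-capture-show dance, with Electron calls injected.
type CaptureDeps = {
  hideOverlay: () => Promise<void>;
  captureScreen: () => Promise<string>; // e.g. a data-URL screenshot
  showOverlay: () => Promise<void>;
};

async function captureWithoutOverlay(deps: CaptureDeps): Promise<string> {
  await deps.hideOverlay(); // overlay must not appear in the shot
  try {
    return await deps.captureScreen();
  } finally {
    await deps.showOverlay(); // restore the overlay even if capture fails
  }
}
```

The `try`/`finally` matters: if capture throws, the overlay still comes back instead of leaving the user with a hidden window.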
Microphone capture
The renderer starts MicCapture with the saved microphone device ID. It streams audio chunks to main and exposes analyser data for the Matrix visualizer.
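The analyser data can be reduced to a single VU level per chunk, the kind of value the Matrix visualizer consumes. This is an illustrative sketch; the real renderer would read samples from a Web Audio `AnalyserNode`:

```typescript
// Compute a display-ready VU level (0..1) from one chunk of audio samples.
function vuLevel(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (const s of samples) sumSquares += s * s;
  // Root mean square of the chunk, clamped to [0, 1] for the visualizer.
  return Math.min(1, Math.sqrt(sumSquares / samples.length));
}
```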
Speech turn detection
FlowLens waits for speech to start, tolerates short pauses, and ends the turn after sustained trailing silence or a max-duration cap.
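The turn logic above can be modeled as a small state machine. The thresholds here are illustrative defaults, not FlowLens's actual tuning:

```typescript
// Minimal turn-detection state machine. Thresholds are assumptions.
type TurnConfig = {
  trailingSilenceMs: number; // end the turn after this much silence once speech began
  maxDurationMs: number;     // hard cap on a single turn
  speechThreshold: number;   // VU level above which a frame counts as speech
};

type TurnState = { speaking: boolean; silenceMs: number; elapsedMs: number };

const initialTurn: TurnState = { speaking: false, silenceMs: 0, elapsedMs: 0 };

// Feed one audio frame (level + duration); returns the new state and whether the turn ended.
function stepTurn(
  state: TurnState,
  level: number,
  frameMs: number,
  cfg: TurnConfig
): { state: TurnState; done: boolean } {
  const elapsedMs = state.elapsedMs + frameMs;
  if (elapsedMs >= cfg.maxDurationMs) {
    return { state: { ...state, elapsedMs }, done: true }; // max-duration cap
  }
  if (level >= cfg.speechThreshold) {
    // Speech resets the trailing-silence counter, tolerating short pauses.
    return { state: { speaking: true, silenceMs: 0, elapsedMs }, done: false };
  }
  if (!state.speaking) {
    // Still waiting for speech to start; leading silence never ends the turn.
    return { state: { ...state, elapsedMs }, done: false };
  }
  const silenceMs = state.silenceMs + frameMs;
  return {
    state: { ...state, silenceMs, elapsedMs },
    done: silenceMs >= cfg.trailingSilenceMs,
  };
}
```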
Speech-to-text
The main process sends the completed audio buffer to ElevenLabs scribe_v2 and receives transcript text plus duration metadata.
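Assembling that request might look like the sketch below. The endpoint path, header, and field names are assumptions about the ElevenLabs speech-to-text API shape, not verified against FlowLens's code; only the `scribe_v2` model name comes from the flow above:

```typescript
// Build (but do not send) the speech-to-text request for a completed turn.
function buildSttRequest(
  audio: Blob,
  apiKey: string
): { url: string; method: string; headers: Record<string, string>; body: FormData } {
  const form = new FormData();
  form.append("model_id", "scribe_v2");    // model named in the flow above
  form.append("file", audio, "turn.webm"); // completed turn audio; filename is illustrative
  return {
    url: "https://api.elevenlabs.io/v1/speech-to-text", // assumed endpoint
    method: "POST",
    headers: { "xi-api-key": apiKey },
    body: form,
  };
}
```

Keeping request assembly separate from the `fetch` call makes this step unit-testable without network access.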
Multimodal request assembly
FlowLens sends the screenshot, transcript, selected mode, and optional prior turn state to the configured OpenAI-compatible provider.
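In OpenAI-compatible chat format, that payload can be assembled roughly as follows. The system-prompt wording and the model name are placeholders, not FlowLens's actual values:

```typescript
// Sketch of the multimodal payload. Prior-turn carryover enables follow-ups.
type PriorTurn = { question: string; answer: string };

function buildAnalysisBody(
  screenshotDataUrl: string,
  transcript: string,
  mode: string,
  prior?: PriorTurn
): { model: string; messages: any[] } {
  const messages: any[] = [
    { role: "system", content: `You are FlowLens in ${mode} mode.` }, // placeholder prompt
  ];
  if (prior) {
    // Replay the previous turn so the provider has context for follow-ups.
    messages.push({ role: "user", content: prior.question });
    messages.push({ role: "assistant", content: prior.answer });
  }
  messages.push({
    role: "user",
    content: [
      { type: "image_url", image_url: { url: screenshotDataUrl } },
      { type: "text", text: transcript },
    ],
  });
  return { model: "gpt-4o-mini", messages }; // model name is a placeholder
}
```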
Structured response
The provider response is parsed and validated into spoken_summary, card_content, clarifying_question, and actionable_output.
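Validation might look like the sketch below. Which fields are required versus optional is an assumption here; rejecting malformed responses keeps bad provider output from reaching the overlay:

```typescript
// Parse and validate the provider's JSON into the four structured fields.
type StructuredResponse = {
  spoken_summary: string;
  card_content: string;
  clarifying_question?: string; // assumed optional
  actionable_output?: string;   // assumed optional
};

function parseStructuredResponse(raw: string): StructuredResponse {
  const data = JSON.parse(raw);
  if (typeof data.spoken_summary !== "string" || typeof data.card_content !== "string") {
    throw new Error("provider response missing spoken_summary or card_content");
  }
  return {
    spoken_summary: data.spoken_summary,
    card_content: data.card_content,
    clarifying_question:
      typeof data.clarifying_question === "string" ? data.clarifying_question : undefined,
    actionable_output:
      typeof data.actionable_output === "string" ? data.actionable_output : undefined,
  };
}
```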
Overlay rendering
The overlay shows the answer in a scrollable card and exposes copy, retry, cancel, dismiss, and follow-up paths where relevant.
Spoken playback
If voice playback is enabled, ElevenLabs TTS reads the short summary and streams audio chunks back to the renderer.
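The chunk-forwarding loop, including the cancel behavior noted below, can be sketched with illustrative interfaces:

```typescript
// Forward TTS audio chunks to the renderer, checking for cancellation
// between chunks so cancel/dismiss stops playback promptly.
async function pumpTts(
  chunks: AsyncIterable<Uint8Array>,
  sendToRenderer: (chunk: Uint8Array) => void,
  isCancelled: () => boolean
): Promise<"completed" | "cancelled"> {
  for await (const chunk of chunks) {
    if (isCancelled()) return "cancelled"; // stop streaming on cancel or dismiss
    sendToRenderer(chunk);
  }
  return "completed";
}
```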
Runtime properties that matter
- The screen is captured only on explicit invocation.
- The microphone is opened only for the active request.
- Audio chunks stream to the main process during recording, arriving before the final stop signal rather than as one buffer at the end.
- The Matrix visualizer receives live VU levels instead of static frames during recording.
- Config is read per invocation, so changed provider, model, voice, and microphone settings take effect without restarting.
- TTS stops on cancel or dismiss.
- Setup must be valid before the overlay can invoke the pipeline.
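The per-invocation config property can be sketched as follows; the file reader is injected for testability, and the config shape is illustrative, not FlowLens's actual schema:

```typescript
// Re-read settings on every hotkey invocation instead of caching at startup,
// so provider, model, voice, and microphone changes apply without a restart.
type FlowConfig = {
  provider: string;
  model: string;
  voiceId: string;
  micDeviceId: string;
};

function loadConfig(readFile: (path: string) => string, path: string): FlowConfig {
  // Hypothetical shape: the real app would read its own settings file here.
  return JSON.parse(readFile(path)) as FlowConfig;
}
```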
Why this flow works well in a demo
Every step is visible or explainable. The trigger is obvious, the Matrix visualizer confirms listening, the overlay shows processing, the structured response is easy to scan, and the spoken summary makes the result feel immediate.