Clarifying Follow-Up

One extra voice turn when FlowLens needs just enough context to finish the answer well.

What it does

If the first multimodal pass cannot answer confidently, FlowLens can ask a single clarifying question and give the user one more voice turn to resolve the ambiguity.

Input

the initial screenshot
the first transcript
the first assistant response
one spoken follow-up answer from the user

Output

an updated structured response that replaces the earlier draft
a revised final actionable_output that reflects the user's follow-up answer
TTS playback for the new short summary if voice playback is enabled
the same overlay experience, now grounded by the additional detail
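
The input and output lists above can be sketched as two small data types plus one function that runs the second pass. This is a minimal sketch, not the actual FlowLens code: only the field name actionable_output comes from this page. Every other name here (StructuredResponse, FollowUpInput, resolve_follow_up, second_pass, speak) is a hypothetical stand-in chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class StructuredResponse:
    # Hypothetical shape; only actionable_output is named on this page.
    summary: str                               # short text that TTS would read aloud
    actionable_output: str                     # the final actionable result
    clarifying_question: Optional[str] = None  # set when the first pass is unsure


@dataclass(frozen=True)
class FollowUpInput:
    # Mirrors the four inputs listed above.
    screenshot: bytes                    # the initial screenshot
    first_transcript: str                # the first transcript
    first_response: StructuredResponse   # the first assistant response (the draft)
    follow_up_answer: str                # the user's one spoken follow-up, transcribed


def resolve_follow_up(
    inputs: FollowUpInput,
    second_pass: Callable[[FollowUpInput], StructuredResponse],
    speak: Optional[Callable[[str], None]] = None,
) -> StructuredResponse:
    """Run the second pass and return the response that replaces the draft.

    second_pass stands in for the model call; speak stands in for TTS and
    is None when voice playback is disabled.
    """
    final = second_pass(inputs)
    if speak is not None:
        speak(final.summary)
    return final
```

The caller is expected to discard inputs.first_response and render the returned value in the same overlay, so the user only ever sees one final answer.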

Why it is useful

This is the smallest possible conversational loop that still improves quality. It gives the model a way to recover when the screenshot is incomplete without turning the product into a full chat session.

What the current implementation intentionally does not do

open-ended conversations
more than one follow-up turn
persistent memory across separate invocations
threaded session history
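
One way to picture the limits above is a per-invocation budget that permits exactly one clarifying turn and is thrown away afterward. This is an illustrative sketch under that assumption; FollowUpBudget and try_consume are hypothetical names, not part of FlowLens.

```python
class FollowUpBudget:
    """Allows at most max_turns clarifying turns within a single invocation."""

    def __init__(self, max_turns: int = 1) -> None:
        self._remaining = max_turns

    def try_consume(self) -> bool:
        """Return True and spend one turn if any remain, otherwise False."""
        if self._remaining <= 0:
            return False
        self._remaining -= 1
        return True
```

Creating a fresh budget for each invocation, rather than storing it anywhere, matches the absence of persistent memory and threaded history: a second call to try_consume in the same invocation returns False, and the next invocation starts over with a new budget.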
