Google has added Agentic Vision functionality to its Gemini 3 Flash model. The feature combines visual reasoning with code execution to ground answers in visual evidence, which Google says fundamentally changes the way AI models process images.
Introduced on January 27, Agentic Vision is available through the Gemini API in the Google AI Studio and Vertex AI developer tools, as well as in the Gemini app.
Gemini Flash’s Agentic Vision transforms image understanding from a static act into an agentic process, Google says. Combining visual reasoning with code execution, the model devises a plan to incrementally zoom into, inspect, and manipulate an image. Until now, multimodal models typically processed the world through a single, static line of sight: if a small detail such as a serial number or a distant sign was unreadable, the model had to guess. Agentic Vision instead turns image understanding into active exploration, bringing the agent’s “think-act-observe” loop to image understanding tasks, the company said.
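
To make the loop concrete, here is a minimal sketch of calling the model through the Gemini API using the google-genai Python SDK. The model identifier, image file, and prompt are illustrative assumptions, not values from Google’s announcement; the key point is that enabling the code-execution tool is what allows the model to write and run code against the image rather than answer from a single static pass.

```python
# Minimal sketch, assuming the google-genai Python SDK.
# The model name "gemini-3-flash-preview" is an assumption; substitute
# the identifier Google publishes for Gemini 3 Flash.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

# Hypothetical input image containing a small, hard-to-read detail.
with open("warehouse_shelf.jpg", "rb") as f:
    image_bytes = f.read()

# Enabling the code-execution tool lets the model crop and zoom into
# the image programmatically as part of its think-act-observe loop.
response = client.models.generate_content(
    model="gemini-3-flash-preview",  # assumed model identifier
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
        "Read the serial number printed on the smallest label.",
    ],
    config=types.GenerateContentConfig(
        tools=[types.Tool(code_execution=types.ToolCodeExecution())],
    ),
)

print(response.text)
```

In this setup, the model can respond not from one fixed view of the photo but by iteratively generating and executing code to enlarge the relevant region, observing the result, and refining its answer.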
