
Google launches Gemini 3 and embeds it directly into Search
Google announced Gemini 3 on November 18, presenting it as its strongest general-purpose AI model and integrating it into Google Search on the same day. The first variant released, Gemini 3 Pro, was made available immediately to AI Pro and Ultra subscribers and to developers via the Gemini API and Vertex AI. A reasoning-oriented mode, Gemini 3 Deep Think, began rolling out to AI Ultra users in the weeks that followed; Gemini 3 Flash, the lower-latency variant, shipped on December 17.
In a company blog post, Google CEO Sundar Pichai described Gemini 3 as "our most intelligent model" and emphasised same-day integration into Search and the Gemini app. Reuters and the New York Times reported that Google chose to skip a staged rollout, instead making the new model the default behind Search's AI Mode for English-language users from launch day.
On benchmarks, Gemini 3 set a new state of the art on Humanity's Last Exam, surpassing the previous high held by GPT-5 Pro at 31.64. The New York Times cited a 72% accuracy figure on a standard benchmark test, a measurable improvement over Gemini 2.5. The model also topped LMArena, the human-preference leaderboard. Google's reported SWE-bench Verified score for Gemini 3 Pro was 69.6% — behind specialised agentic systems but the highest reported for a general-purpose API model at launch. (Higher coding scores reported elsewhere refer to the later Gemini 3.1 Pro release.)
Alongside the model, Google introduced Antigravity, a new agent-style coding interface, and shipped multiple agentic features in the Gemini app — including a "Gemini Agent" capable of multi-step tasks across calendar, mail, and the browser. The company described all three (Search integration, Antigravity, agentic Gemini app) as proof points that the model is meant to be exposed through products from day one rather than positioned primarily as a developer API.

Reaction across the field was rapid. OpenAI shipped no immediate counter-release; analysts at major banks read Gemini 3's same-day Search integration as Google asserting that distribution, not benchmark margin, is the dominant moat in this cycle. Smaller frontier labs and academic researchers focused on Deep Think's reasoning mode and on the question of whether Gemini 3 had narrowed the practical gap with GPT-5 in everyday use, beyond benchmark numbers.
For us, the relevant signal in this release is not the benchmark delta but the choice to ship Gemini 3 into Search on day one. It confirms a thesis we have been operating on: leading platforms now treat their best model as a product layer of an existing surface, not as a standalone product. For a small studio building characters, voice, and persona, that means the value will increasingly sit in what an experience feels like end-to-end — voice latency, character memory, a coherent personality — rather than in any single model API call.
We will keep tracking three things: (1) whether Deep Think's published reasoning gains translate into concrete improvements for the long, multi-turn conversation that our character work depends on; (2) how Gemini 3 in Japanese — particularly with mixed Japanese / English / Chinese context, which our products care about — compares with GPT-5 and Claude Opus 4.5 on real workloads; (3) whether Google further unbundles the model layer (cheaper Flash, longer-context Pro, more permissive Vertex licensing) for studios building product-level applications on top.


