Apple spent months promising a smarter, more personal Siri that would lean on the company’s vaunted on‑device silicon. But as WWDC approaches, the picture looks messier: Apple is distilling Google’s massive Gemini models to run locally where it can, while routing tougher queries into Google Cloud—powered in part by Nvidia’s confidential compute technology.
The short version: Apple is trying to have its privacy cake and eat it too. Distillation will shrink Gemini into a tamer model that can plausibly run on iPhones and Macs. For the heavy lifting, Apple appears to be licensing and using Google’s cloud-hosted Gemini variants on Nvidia hardware—encrypted while in use—to make Siri feel genuinely capable.
How distillation and confidential compute fit together
Distillation is the engineering trick Apple is reportedly using. Take a behemoth model (some reports put Google’s Gemini in the range of trillions of parameters) and train a much smaller model to mimic its outputs. The result isn’t identical intelligence, but it can capture many of the same conversational skills at far lower memory and compute cost. That’s how Apple hopes to give iPhones a smarter assistant without requiring monster servers for every little task.
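To make the idea concrete, here is a minimal sketch of the standard textbook distillation objective: the student is penalized by how far its temperature-softened output distribution diverges from the teacher's. Nothing here reflects Apple's or Google's actual training setup; the logits, the temperature, and the function names are illustrative assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Hypothetical helper for illustration, not a real training pipeline.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# Made-up logits over three candidate next tokens.
teacher = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.4]   # roughly agrees with the teacher
far_student = [0.5, 1.0, 4.0]     # prefers a different token

print(distillation_loss(teacher, close_student))
print(distillation_loss(teacher, far_student))
```

A student that ranks tokens the way the teacher does gets a small loss, while one that disagrees gets a large one. Training then amounts to nudging the small model's weights to shrink that number across many examples.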
Still, distillation has limits. Mobile NPUs and Neural Engines are optimized for efficient, contextual tasks: image fixes, transcription, or predictive typing. They don’t have the RAM or raw throughput to host multi‑trillion‑parameter networks. For queries that demand the full weight of Gemini, Apple has reportedly struggled to run the undistilled model on its own Private Cloud Compute infrastructure (servers built on the same Apple silicon it prefers to tout). So it’s turning outward.
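A back-of-the-envelope calculation shows why. The sketch below estimates only the memory needed to hold a model's weights. The parameter counts, bit widths, and the 8 GB phone figure are round-number assumptions chosen for illustration, not reported specifications of Gemini or any particular iPhone.

```python
def weights_gb(num_params, bits_per_param):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

ASSUMED_PHONE_RAM_GB = 8  # hypothetical device, for comparison only

# Hypothetical model sizes: (label, parameter count, bits per parameter)
scenarios = [
    ("1T params, 16-bit", 10**12, 16),
    ("1T params, 4-bit", 10**12, 4),
    ("3B params, 4-bit", 3 * 10**9, 4),
]

for label, params, bits in scenarios:
    size = weights_gb(params, bits)
    verdict = "fits" if size < ASSUMED_PHONE_RAM_GB else "does not fit"
    print(f"{label}: {size:.1f} GB, {verdict} in {ASSUMED_PHONE_RAM_GB} GB")
```

Even with aggressive 4-bit compression, a trillion-parameter model would need about 500 GB for weights alone, while a distilled model of a few billion parameters lands in the low single-digit gigabytes.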
Apple’s workaround: some requests will be handled in Google Cloud using Gemini, but processed on Nvidia GPUs with confidential compute enabled. Confidential compute keeps data encrypted in memory and isolated in hardware while it’s being processed, which helps Apple argue it isn’t just handing user data over to a third party in the clear. The tradeoff: that protection adds overhead and can introduce extra latency, something users might notice when Siri pauses to answer.
What users will actually see
Apple will almost certainly market this as a hybrid approach: fast, private local handling when possible; remote compute when necessary. In practice, that could mean simple context-aware commands, transcription, and device-centric tasks are handled on your iPhone, while more involved multi‑step chats, deep knowledge pulls, or complex reasoning jump to the cloud.
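As a rough illustration of that split, the sketch below routes a request locally unless it trips one of a few simple signals. Apple has not described how its routing works; the `Request` fields, the thresholds, and the `route` function are all hypothetical.

```python
from dataclasses import dataclass

ON_DEVICE = "on_device"
CLOUD = "cloud"

@dataclass
class Request:
    """Hypothetical request shape; the fields are invented for illustration."""
    text: str
    needs_world_knowledge: bool = False
    conversation_turns: int = 1

def route(req, max_local_words=40, max_local_turns=3):
    """Prefer the local model; escalate to the cloud only when a signal demands it."""
    if req.needs_world_knowledge:
        return CLOUD  # deep knowledge pulls exceed what a small model stores
    if req.conversation_turns > max_local_turns:
        return CLOUD  # long multi-step chats
    if len(req.text.split()) > max_local_words:
        return CLOUD  # unusually long, involved prompts
    return ON_DEVICE  # simple, device-centric commands stay local

print(route(Request("Set a timer for ten minutes")))
print(route(Request("Compare these two mortgage offers", needs_world_knowledge=True)))
```

The design choice worth noticing is the default: the request stays on the device unless something forces it out, which is the privacy-friendly ordering the hybrid pitch implies.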
It’s also unlikely that Siri will tell you which engine answered your question. Companies prefer a "seamless" experience, but seamlessness can obscure meaningful privacy tradeoffs. Confidential compute helps, but it isn’t the same as keeping everything on the device.
Why Apple changed direction
A few forces nudged Apple here. First: expectations. Consumers now expect assistants that can synthesize, reason, and follow long conversations—skills still easiest to deliver with very large models. Second: engineering reality. Apple’s distillation work can go a long way, but not far enough for every use case. And third: time. With WWDC weeks away, Apple needs to show forward movement on the Siri promises from 2024.
Apple has reportedly also been scouting acquisitions—startups like Liquid AI, which focus on running capable models on edge hardware—to speed that on‑device progress. But buying technology takes time, and the immediate fix is a cloud hybrid.
What this means for WWDC and beyond
At WWDC, Apple will almost certainly emphasize on‑device wins: how years of custom silicon give it an advantage, and how private inference can keep personal data local for many tasks. If you want background on Apple’s existing device‑focused AI work, smaller recent advances like Personal Voice are a useful reminder of what local models already do well (see: iOS 26's on‑device AI features). The new Siri is expected to arrive as part of the iOS 27 refresh, where Apple has been testing a standalone Siri app and multitasking extensions that hint at a chatty, context‑aware assistant (see: Siri’s redesign and multitasking tests).
Whether users will accept a hybrid answer depends on how Apple tells the story. Retaining the "Private Cloud Compute" label—even when routing some work through Google Cloud on Nvidia silicon—gives Apple a branding cushion. But branding isn’t encryption: users and regulators will care about who sees the data, where it is decrypted, and under what legal jurisdiction.
Expect a tactful WWDC message: local AI for everyday privacy, cloud for the heavy reasoning, and new marketing language about encrypted processing to soothe concerns. Engineers will keep squashing latency and refining the distilled models. And Apple’s acquisition and optimization work will continue in the background—because the company clearly views on‑device AI as a long game, not a one‑weekend patch.
The coming Siri may feel smarter than what shipped last year. But it will be the product of compromise: clever on‑device tricks where possible, and a quiet partnership with Google and Nvidia where necessary. That compromise might please users who want results; it may leave privacy purists asking whether the promise of fully local AI ever truly existed.




