Cloudflare expands Workers AI with large language model capabilities, starting with Kimi K2.5. Lower inference costs and optimized stacks mean your agent workflows just got cheaper to run.

Run large language models at the edge with lower latency and reduced costs for agent workflows, without leaving the Cloudflare ecosystem.
Signal analysis
We tracked a significant gap: Cloudflare's Workers AI platform handled smaller models well, but large language models remained out of reach for edge-deployed workloads. That changes now. Cloudflare has added native support for large models on Workers AI, with Kimi K2.5 as the launch offering. This matters because it removes a deployment constraint for builders running inference workloads on Cloudflare's global edge network.
Kimi K2.5 brings multi-modal capabilities (text and image understanding) to the platform, paired with optimizations to the inference stack itself. The real differentiator here isn't just model availability; it's the cost reduction. Cloudflare is pricing this specifically for agent use cases, which typically involve repeated inference calls and context management. If you're building agents, this is a direct cost play.
The inference stack optimization is the technical linchpin. Cloudflare has tuned how models load, execute, and unload on its Workers infrastructure. This means lower latency and fewer cold-start penalties, and the savings compound when you're running agents that make multiple sequential calls, because every call in the chain pays the per-request overhead again.
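To make the compounding effect concrete, here is a minimal back-of-the-envelope sketch. Every number in it (round-trip times, cold-start cost and frequency, inference time) is a made-up placeholder rather than a Cloudflare measurement, and the function name is hypothetical. It only illustrates that per-call overhead is multiplied by the number of sequential calls in an agent run.

```python
def total_agent_latency_ms(calls, inference_ms, network_rtt_ms,
                           cold_start_ms=0.0, cold_start_rate=0.0):
    """Estimate wall-clock time for `calls` sequential inference calls.

    Each call pays model inference time, one network round trip, and an
    expected cold-start penalty (penalty size times how often it occurs).
    """
    expected_cold_start = cold_start_rate * cold_start_ms
    per_call_ms = inference_ms + network_rtt_ms + expected_cold_start
    return calls * per_call_ms


# Hypothetical 8-step agent run. All figures are illustrative assumptions.
remote_ms = total_agent_latency_ms(
    calls=8, inference_ms=400, network_rtt_ms=120,
    cold_start_ms=1500, cold_start_rate=0.10,
)
edge_ms = total_agent_latency_ms(
    calls=8, inference_ms=400, network_rtt_ms=20,
    cold_start_ms=1500, cold_start_rate=0.02,
)

print(f"remote API:  {remote_ms:.0f} ms per agent run")
print(f"edge-hosted: {edge_ms:.0f} ms per agent run")
print(f"difference:  {remote_ms - edge_ms:.0f} ms")
```

Swap in your own measured round-trip and cold-start figures. The gap scales linearly with the number of sequential calls, so a single-shot request sees far less benefit than a long agent loop.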
For builders, the question is immediate: does this replace your current inference strategy, or does it complement it? The answer depends on three factors: latency requirements, cost sensitivity, and model selection.
If you're currently routing inference requests to external APIs (OpenAI, Anthropic, or even Kimi's direct offering), Cloudflare's edge deployment can cut network round-trip time, because your agent calls execute closer to your users and your data. For chat-heavy applications or real-time agent interactions, this is material. If you're in regions where network latency to API providers is already a pain point, this becomes critical.
Cost-wise, Cloudflare's pricing model for agent workloads is worth modeling out. Agent frameworks make many small inference calls: token counting, intermediate reasoning, tool selection. In traditional setups, each of those calls hits an external API provider. Running on Cloudflare's edge, batched and optimized, reduces the per-call overhead. Run the math on your agent call volume before deciding.
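One way to run that math is a small token-based cost model like the sketch below. The prices, the workload shape, and the names `Pricing`, `Workload`, and `monthly_cost` are all hypothetical placeholders, not Cloudflare's or any provider's published rates. It also assumes simple per-token billing with separate input and output prices, so replace the numbers with figures from the pricing pages you are actually comparing.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    """Per-token pricing in USD per million tokens (placeholder values)."""
    input_per_mtok: float
    output_per_mtok: float


@dataclass
class Workload:
    """Shape of an agent workload over one month (assumed figures)."""
    runs_per_month: int
    calls_per_run: int
    input_tokens_per_call: int
    output_tokens_per_call: int


def monthly_cost(workload: Workload, pricing: Pricing) -> float:
    """Return estimated monthly spend in USD for the given workload."""
    calls = workload.runs_per_month * workload.calls_per_run
    input_mtok = calls * workload.input_tokens_per_call / 1_000_000
    output_mtok = calls * workload.output_tokens_per_call / 1_000_000
    return (input_mtok * pricing.input_per_mtok
            + output_mtok * pricing.output_per_mtok)


# Illustrative workload: 100k agent runs a month, 10 model calls per run.
workload = Workload(runs_per_month=100_000, calls_per_run=10,
                    input_tokens_per_call=2_000, output_tokens_per_call=300)

# Two made-up price points to compare; neither is a real quote.
option_a = Pricing(input_per_mtok=0.60, output_per_mtok=2.50)
option_b = Pricing(input_per_mtok=3.00, output_per_mtok=15.00)

for name, pricing in [("option A", option_a), ("option B", option_b)]:
    print(f"{name}: ${monthly_cost(workload, pricing):,.2f} per month")
```

Because agents resend context on every call, input tokens usually dominate the bill, so the input price and any prompt-caching discount tend to matter more than the output price in this kind of comparison.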
This move positions Cloudflare directly against Vercel's Edge AI, AWS Lambda inference, and specialized platforms like Together AI. But Cloudflare has an asymmetric advantage: it already owns the edge infrastructure and routing layer. Adding inference is vertical integration, not a bolted-on capability. Builders already using Cloudflare for CDN, DDoS protection, or Workers compute now have a native inference option without switching platforms.
The choice of Kimi K2.5 as the launch model is strategic. Kimi is strong on reasoning and multi-modal tasks, areas that agents lean on heavily. This isn't a me-too play with generic models: Cloudflare picked a model that fits the agent workflow narrative. More model options will follow, but the beachhead is clear.
What you're seeing is the infrastructure layer becoming AI-native. Six months ago, edge inference was a novelty. Now it's table stakes for platforms competing for builder mindshare. Cloudflare moving here signals that inference on the edge isn't experimental anymore.