Frontier Models and Open Source
Closed-source models still lead the frontier, but open-source is closing the gap fast. What this means for organizations building AI products and the future of model access.
The Gap Is Closing
For most of AI's recent history, the most capable models have been closed-source — proprietary systems from well-funded labs, accessible only through paid APIs. GPT-4, Claude, Gemini. If you wanted the best, you paid for it and accepted the terms.
That dynamic is shifting. Open-source and open-weight models are now performing within striking distance of their closed-source counterparts — and in some cases, surpassing them. According to Epoch AI's systematic analysis, the best open-weight models now lag the most capable closed models by approximately three months on a holistic capability index. A year ago, that gap was measured in years.
This isn't a minor technical footnote. It's reshaping who can build sophisticated AI products and at what cost.
Where Closed-Source Still Leads
The leading closed-source models remain at the absolute frontier:
Claude Opus 4.6 leads on coding tasks, scoring highest on SWE-bench Verified — the benchmark that measures real-world software engineering capability. It powers Claude Code and anchors Anthropic's agentic ecosystem.
GPT-5.2 dominates pure mathematical reasoning, achieving a perfect score on AIME 2025 and the highest marks on FrontierMath — the hardest open math benchmark.
Gemini 3 Pro became the first model to surpass 1500 Elo on LMArena, with a million-token context window and state-of-the-art multimodal processing.
What's notable is that no single closed model dominates everything. The frontier is specialized — different models lead in different domains.
The enduring advantages of closed-source models tend to be in areas harder to measure on benchmarks: safety and alignment polish, instruction-following nuance, reliability under edge cases, enterprise support infrastructure, and the integrated product experiences built around them.
Where Open-Source Has Caught Up
The numbers tell the story:
General knowledge. On MMLU — the standard benchmark for broad knowledge — the gap between the best open and closed models shrank from 17.5 percentage points to just 0.3 points in a single year. DeepSeek V3.2 scores 94.2%, effectively tying the best proprietary models.
Mathematics. Qwen3 from Alibaba scores 92 out of 100 on AIME 2025 and outperforms OpenAI's o3 on HMMT, an advanced math competition benchmark — using only 22 billion active parameters via a mixture-of-experts architecture.
Coding. GLM-4.7 from Zhipu AI achieves 91.2% on SWE-bench, surpassing nearly all proprietary models on real-world software engineering tasks. Kimi K2.5 from Moonshot AI leads open-source rankings in coding and reasoning benchmarks under a fully open license.
Conversational quality. Meta's Llama 4 Maverick crossed 1400 Elo on LMArena, outperforming GPT-4o, DeepSeek V3, and Gemini 2.0 Flash. Google's Gemma 3 at just 27 billion parameters beats models fifteen times its size on the same benchmark.
On well-defined academic benchmarks, several open-source models now match or exceed proprietary models. The remaining frontier advantages show up in the harder-to-quantify dimensions of real-world reliability and product integration.
The Cost Equation
This is where open-source becomes transformative for organizations building AI products.
Closed-source API pricing averages around $1.86 per million tokens. Open-model APIs through providers like Together.ai and Fireworks.ai start at $0.20 to $0.50 per million tokens. Self-hosted open models can run at a fraction of even those costs.
For organizations processing high volumes — tens of millions of tokens per month — the savings compound dramatically. Self-hosting a Llama 70B model breaks even against GPT-4 API costs at roughly 20-30 million tokens per month. Beyond that threshold, savings scale linearly.
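To make that break-even concrete, here's a back-of-envelope sketch in Python. Every figure is an illustrative assumption rather than a quoted price; plug in your own numbers.

```python
# Back-of-envelope break-even: fixed self-hosting cost vs. metered API pricing.
# All numbers are illustrative assumptions, not quotes from any provider.

API_PRICE_PER_M = 30.0     # assumed GPT-4-era API price, $ per million tokens
SELF_HOST_MONTHLY = 750.0  # assumed fixed monthly cost of a node serving a 70B model

break_even_m = SELF_HOST_MONTHLY / API_PRICE_PER_M  # million tokens per month
print(f"Break-even volume: {break_even_m:.0f}M tokens/month")  # 25M under these assumptions

for volume_m in (10, 25, 50, 100):  # monthly volume in millions of tokens
    api_bill = volume_m * API_PRICE_PER_M
    savings = api_bill - SELF_HOST_MONTHLY
    print(f"{volume_m:>4}M tokens: API ${api_bill:8,.0f} vs "
          f"self-host ${SELF_HOST_MONTHLY:,.0f} -> net ${savings:+,.0f}")
```

Past the break-even point the self-hosting bill stays flat while the API bill keeps climbing, which is why savings scale linearly with volume.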
The efficiency innovations driving this are architectural. Mixture-of-experts models like DeepSeek V3.2 (685B total parameters, 37B active per token) and Qwen3-235B (22B active per token) achieve frontier performance while using a fraction of the compute per inference. You get large-model quality at small-model cost.
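To see why active parameters, not total parameters, drive inference cost, here is a toy mixture-of-experts layer in Python. The dimensions and expert count are made up for illustration; real models are vastly larger, but the principle is the same: each token is routed to only a few experts, so most weights sit idle on any given token.

```python
import numpy as np

# Toy mixture-of-experts layer. Each token activates only top_k of n_experts,
# so per-token compute scales with *active* parameters, not total parameters.
# All sizes here are tiny, made-up illustration values.

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 8, 2

router = rng.normal(size=(d_model, n_experts))            # routing weights
experts = rng.normal(size=(n_experts, d_model, d_model))  # one weight matrix per expert

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route one token vector to its top_k experts and mix their outputs."""
    logits = x @ router
    chosen = np.argsort(logits)[-top_k:]                  # highest-scoring experts
    gates = np.exp(logits[chosen]) / np.exp(logits[chosen]).sum()  # softmax gate
    return sum(g * (x @ experts[i]) for g, i in zip(gates, chosen))

out = moe_forward(rng.normal(size=d_model))
print(f"total expert params: {experts.size:,}; active per token: {top_k * d_model * d_model:,}")
```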
This math matters because it determines who can afford to build AI products. When running a sophisticated AI feature costs $0.20 per million tokens instead of $2.00, the barrier to entry drops by an order of magnitude. Startups, independent developers, and organizations in cost-sensitive industries can build products that were previously only viable for well-funded tech companies.
The DeepSeek Moment
If there's a single event that crystallized this shift, it's DeepSeek R1's release in January 2025. Trained for an estimated $6 million — compared to the tens of billions spent by Western labs — R1 matched OpenAI o1's reasoning capabilities. Released under the MIT license, it demonstrated that frontier-level reasoning could be achieved at a fraction of the cost.
The implications went beyond the model itself. DeepSeek R1 validated that reasoning ability can be developed through reinforcement learning without expensive supervised fine-tuning, suggesting that the cost floor for training capable models is far lower than previously assumed.
The industry's response was immediate. Within months, multiple open-source models incorporated similar techniques, and the efficiency-focused development paradigm became the default for new open-source projects.
The Tooling Ecosystem
Raw model weights aren't useful without infrastructure to run them. The open-source ecosystem has matured rapidly on this front:
Inference serving. vLLM (from UC Berkeley, now with Red Hat as primary corporate contributor) has become the production standard for deploying open models at scale, with support for multi-modal models and heterogeneous pipelines.
Local deployment. Ollama, LM Studio, and Jan have made running models locally as simple as a single command. These tools handle model downloading, quantization, and serving with minimal configuration — enabling experimentation on consumer hardware.
Drop-in API compatibility. LocalAI provides a drop-in replacement for the OpenAI API that runs entirely locally, supporting multiple model formats and recently adding Anthropic API compatibility. Organizations can migrate from closed APIs to self-hosted open models without rewriting application code; the sketch after this list shows the pattern.
Coding tools. Tabby provides a self-hosted alternative to GitHub Copilot. Continue.dev integrates open models into VS Code and JetBrains. These tools are particularly valuable for organizations in regulated industries that cannot send code to external servers.
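The practical payoff of that compatibility layer: the same client code can target a closed API, a local Ollama or LocalAI instance, or a vLLM server just by changing the base URL. A minimal sketch, assuming the openai Python package and a model already pulled locally; the base URLs are the commonly documented defaults, so verify them against each tool's docs.

```python
from openai import OpenAI

# One client, many backends: these tools all expose OpenAI-compatible endpoints.
# Base URLs are the commonly documented defaults; confirm against each tool's docs.
BACKENDS = {
    "openai":  {"base_url": "https://api.openai.com/v1", "api_key": "sk-..."},
    "ollama":  {"base_url": "http://localhost:11434/v1", "api_key": "unused"},
    "localai": {"base_url": "http://localhost:8080/v1",  "api_key": "unused"},
    "vllm":    {"base_url": "http://localhost:8000/v1",  "api_key": "unused"},
}

def ask(backend: str, model: str, prompt: str) -> str:
    """Send one chat request; swapping backends changes only the base URL."""
    client = OpenAI(**BACKENDS[backend])
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Example: point the identical application code at a local model instead.
print(ask("ollama", "llama3", "Summarize mixture-of-experts in one sentence."))
```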
This infrastructure layer means that choosing an open model doesn't mean building everything from scratch. The deployment and integration problem has largely been solved.
What This Means for Building Products
For us at Koobo, the practical impact of the open-source surge comes down to expanding what's possible:
Model selection becomes a real choice. Instead of defaulting to the most expensive frontier model for everything, we can match models to tasks. Use a frontier closed model for the hardest reasoning tasks. Use an efficient open model for high-volume processing. Use a small local model for prototyping and development. A code sketch of this routing appears after the list.
Cost-sensitive features become viable. Features that would be prohibitively expensive with closed APIs — like running analysis on every document in a large corpus, or providing real-time assistance to every user simultaneously — become feasible with open models.
Deployment flexibility. Some customers need on-premise deployment. Some industries have strict data residency requirements. Open models can run anywhere — a cloud instance, a private data center, or a laptop.
Reduced dependency risk. Building on a single provider's API creates a single point of failure. When you can swap between providers and self-hosted models, you're more resilient to pricing changes, service disruptions, or capability regressions.
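Here's a hypothetical sketch of the routing-and-fallback pattern those last two points describe. Every model name and the call_model helper are placeholders for illustration, not a description of our actual stack:

```python
# Hypothetical task routing with provider fallback. Model names and call_model
# are illustrative placeholders, not any provider's real identifiers or API.

ROUTES = {
    # task class     -> (preferred model,        fallback model)
    "hard_reasoning":  ("frontier-closed-model", "open-reasoning-model"),
    "bulk_processing": ("open-moe-model",        "frontier-closed-model"),
    "prototyping":     ("small-local-model",     "open-moe-model"),
}

def call_model(model: str, prompt: str) -> str:
    """Placeholder for a real provider call; simulates an outage for the demo."""
    if model == "frontier-closed-model":
        raise TimeoutError("simulated provider outage")
    return f"[{model}] response to: {prompt}"

def run_task(task_class: str, prompt: str) -> str:
    preferred, fallback = ROUTES[task_class]
    try:
        return call_model(preferred, prompt)
    except Exception:
        # Outage, rate limit, or capability regression: swap models without
        # touching the surrounding application code.
        return call_model(fallback, prompt)

print(run_task("hard_reasoning", "Outline a migration plan."))
# -> served by open-reasoning-model after the simulated outage
```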
The Hybrid Future
Building at the frontier isn't a binary choice between open and closed. The most effective approach for building sophisticated AI products is hybrid — using each where it's strongest.
Closed-source models excel when you need the absolute best reasoning, the strongest safety guarantees, integrated tooling ecosystems (like Claude Code's agentic capabilities), or enterprise support contracts.
Open-source models excel when you need cost efficiency at scale, deployment flexibility, domain-specific fine-tuning, data privacy guarantees, or independence from any single provider.
The organizations building the most capable AI products will be fluent in both — choosing the right model for each task, mixing providers freely, and keeping their architectures flexible enough to adopt new models as the landscape continues to evolve.
What's clear is that the days of closed-source models being the only serious option are over. The frontier is broadening, costs are dropping, and the tools available to builders have never been better.