As the race to scale up large language models (LLMs) intensifies, a growing body of research challenges the prevailing assumption that bigger is always better. Since the launches of formidable AI systems like GPT-4 and Claude, major technology players and investors have poured billions into expanding computational capacity. A striking example of this surge is The Stargate Project, a $500 billion initiative led by OpenAI, SoftBank, Oracle, and Abu Dhabi’s MGX that aims to build sprawling AI data centres and energy infrastructure, beginning with a 1.2 gigawatt facility in Texas. Oracle alone plans to purchase around 400,000 of Nvidia’s most advanced GB200 chips to power the site, which is intended to be among the world’s largest AI infrastructure complexes once operational around mid-2026.
Despite this emphasis on sheer scale, a recent position paper from NVIDIA Research provocatively argues that small language models (SLMs) are not only good enough for many tasks but are in fact better suited to agentic AI applications. Agentic AI refers to systems that automate subtasks, such as scheduling, document generation, code execution, or API calls, in precise, highly repetitive, and narrowly scoped contexts. Unlike broad, conversational LLMs designed for general-purpose language understanding, agentic systems demand models that are fast, economical, and reliable rather than exhaustive in knowledge.
The paper reveals that many existing agentic AI frameworks use LLMs inefficiently. An analysis of popular open-source systems such as MetaGPT and Open Operator showed that between 40% and 70% of LLM queries could be replaced by well-tuned SLMs: models compact enough to run with low latency on personal devices like smartphones or laptops. Notably, models with fewer than 10 billion parameters are now demonstrating performance on par with, or exceeding, older and larger LLMs on key benchmarks. Microsoft’s Phi-2, for example, with 2.7 billion parameters, matches far larger 30-billion-parameter models in code generation and common-sense reasoning while operating 15 times faster. Similarly, NVIDIA’s Hymba-1.5B and Hugging Face’s SmolLM2 series showcase impressive instruction following and tool use, competing with models many times their size.
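To make the on-device claim concrete, here is a minimal sketch of running a compact instruction-tuned model locally with the Hugging Face transformers library; the specific model ID, prompt, and generation settings are illustrative assumptions rather than details taken from the paper.

```python
# Minimal sketch: running a small instruction-tuned model locally.
# The model ID, prompt, and settings below are illustrative assumptions.
from transformers import pipeline

# A model in the 1-2 billion parameter range can run on a laptop CPU
# or a modest GPU with low latency.
generator = pipeline(
    "text-generation",
    model="HuggingFaceTB/SmolLM2-1.7B-Instruct",  # assumed choice of small model
)

prompt = "Extract the meeting day from this message: 'Let's sync next Tuesday at 3pm.'\nDay:"
output = generator(prompt, max_new_tokens=16, do_sample=False)
print(output[0]["generated_text"])
```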
The advantages of SLMs go beyond speed and efficiency. They are more reliable at producing narrowly formatted outputs, an essential property in agentic systems, where a hallucinated or inconsistent response can disrupt an entire workflow; that reliability matters most in safety-sensitive applications. SLMs are also cheaper to train and run, opening possibilities for more equitable access in regions such as India, where steep subscription fees for large commercial models are prohibitive.
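One common way agentic systems guard against malformed responses is to validate a model’s output against an expected structure before acting on it. The sketch below, with assumed key names and made-up example responses, illustrates the idea; it is not drawn from any specific framework mentioned here.

```python
import json

# Assumed schema for a tool call; real agent frameworks define their own.
REQUIRED_KEYS = {"action", "arguments"}

def parse_tool_call(raw_output: str) -> dict | None:
    """Accept a model response only if it is valid JSON with the expected keys.

    Anything else is rejected rather than passed downstream, so a malformed or
    hallucinated reply cannot trigger an unintended tool call.
    """
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data):
        return None
    return data

# A well-formed response is accepted; free-form text is rejected.
print(parse_tool_call('{"action": "create_event", "arguments": {"day": "Tuesday"}}'))
print(parse_tool_call("Sure, I'd be happy to schedule that for you!"))
```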
The research thus advocates a modular, “LEGO brick” approach to AI: assembling diverse small, specialised models to handle most routine tasks and calling on large models only where they are genuinely needed. This heterogeneous architecture promises better scalability, simpler debugging, and lower operational costs, aligning more closely with real-world deployment needs.
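A toy routing sketch can make the modular idea concrete: routine requests go to a small, specialised model, and only low-confidence cases escalate to a large general-purpose one. The helper functions and confidence threshold below are hypothetical stand-ins, not part of the paper’s proposal.

```python
# Hypothetical sketch of heterogeneous routing; the helpers and the
# confidence signal are invented purely for illustration.

def call_slm(task: str) -> tuple[str, float]:
    """Stand-in for a small, specialised model; returns (answer, confidence)."""
    if "schedule" in task.lower():
        return ('{"action": "create_event", "arguments": {"day": "Tuesday"}}', 0.95)
    return ("", 0.2)  # low confidence on unfamiliar requests

def call_llm(task: str) -> str:
    """Stand-in for a large general-purpose model (slower and costlier)."""
    return "broad, open-ended answer"

def route(task: str, threshold: float = 0.8) -> str:
    """Send routine work to the small model; escalate only when it is unsure."""
    answer, confidence = call_slm(task)
    return answer if confidence >= threshold else call_llm(task)

print(route("Schedule a meeting for Tuesday"))       # handled by the small model
print(route("Summarise the causes of World War I"))  # escalated to the large model
```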
The evolution of SLMs is also supported by advances in training methodology. Microsoft’s Phi-4 model incorporates extensive synthetic data to prioritise reasoning and problem-solving, suggesting that carefully curated data can matter more than sheer volume. The company argues that such models avoid the information overload of large models trained on indiscriminate datasets and instead hone specialised competencies efficiently.
While OpenAI and partners such as SoftBank and Oracle continue to invest vast sums in expansive AI infrastructure projects like Stargate, an endeavour highlighted by President Donald Trump as a symbol of American technological leadership, the emerging research suggests that the future of many AI applications, especially agentic systems, may lie in nimble, focused SLMs rather than colossal generalists. The tension between these approaches reflects a broader industry debate about where to place bets in AI’s next phase: massive scale or agile specialisation.
Both strategies have their place. Large models excel at broad natural language understanding and open-ended tasks that demand wide-ranging knowledge. Meanwhile, the research from NVIDIA and others makes a compelling, evidence-backed case that for many practical AI tasks (those driven by precision, speed, and economical operation), small language models represent a more sensible, scalable, and sustainable path forward.
Source: Noah Wire Services



