      AI Model Compute Costs High / Rising + Inference Costs Per Token Falling = Performance Converging + Developer Usage Rising

      To understand where AI model economics may be heading, one can look at the mounting tension between capabilities and costs. Training the most powerful large language models (LLMs) has become one of the most expensive, capital-intensive efforts in human history. As the frontier of performance pushes toward ever-larger parameter counts and more complex architectures, model training costs are rising into the billions of dollars. Ironically, this race to build the most capable general-purpose models may be accelerating commoditization and driving diminishing returns, as output quality converges across players and differentiation becomes harder to sustain.

      At the same time, the cost of applying/using these models – known as inference – is falling quickly. Hardware is improving – for example, NVIDIA’s 2024 Blackwell GPU consumes 105,000x less energy per token than its 2014 Kepler GPU predecessor. Couple that with breakthroughs in models’ algorithmic efficiency, and the cost of inference is plummeting. Inference represents a new cost curve, and – unlike training costs – it’s arcing down, not up.

      As inference becomes cheaper and more efficient, the competitive pressure among LLM providers increases – not on accuracy alone, but also on latency, uptime, and cost-per-token*. What used to cost dollars can now cost pennies. And what cost pennies may soon cost fractions of a cent.

      The implications are still unfolding. For users (and developers), this shift is a gift: dramatically lower unit costs to access powerful AI. And as end-user costs decline, the creation of new products and services is flourishing, and user adoption and usage are rising. For model providers, however, this raises real questions about monetization and profits. Training is expensive, serving is getting cheap, and pricing power is slipping. The business model is in flux. And there are new questions about the one-size-fits-all LLM approach, with smaller, cheaper models trained for custom use cases** now emerging. Will providers try to build horizontal platforms? Will they dive into specialized applications? Only time will tell. In the short term, it’s hard to ignore that the economics of general-purpose LLMs look like those of commodity businesses with venture-scale burn.

      *Cost-per-token = the expense incurred for processing or generating a single token (a word, sub-word, or character) during the operation of a language model. It is a key metric used to evaluate the computational efficiency and cost-effectiveness of deploying AI models, particularly in applications like natural language processing.
      **E.g., OpenEvidence
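      To make the cost-per-token arithmetic concrete, below is a minimal Python sketch of how a developer might estimate the cost of a single model call. The per-million-token prices, token counts, and function name are hypothetical placeholders (not drawn from any specific provider) and are chosen only to illustrate why per-request costs can land in fractions of a cent.

```python
# Minimal sketch of the cost-per-token arithmetic described above.
# The per-million-token prices below are hypothetical placeholders,
# not quotes from any specific provider or model.

def inference_cost(prompt_tokens: int, completion_tokens: int,
                   input_price_per_million: float,
                   output_price_per_million: float) -> float:
    """Return the total cost (in dollars) of a single model call."""
    input_cost = prompt_tokens * input_price_per_million / 1_000_000
    output_cost = completion_tokens * output_price_per_million / 1_000_000
    return input_cost + output_cost

# Example: a 2,000-token prompt with a 500-token response, priced at a
# hypothetical $1.00 per million input tokens and $3.00 per million
# output tokens, works out to roughly a third of a cent per call.
cost = inference_cost(2_000, 500,
                      input_price_per_million=1.00,
                      output_price_per_million=3.00)
print(f"Total cost: ${cost:.4f}")  # Total cost: $0.0035
```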
