Peaking? Token burnout? This might be the most crucial chart in the entire market
Token expenditure growth is showing signs of fatigue, and the market’s central question about AI is rapidly shifting from “whether the technology is feasible” to “whether the cost is affordable”.
On June 9th, macro strategist Andreas Steno Larsen said on social media that the Silicon Data LLM Token Expenditure Index is currently the most noteworthy chart in the entire market.
The index more than doubled from last December, climbing sharply through May 2026, but has recently turned down. Larsen warned that if token pricing continues to weaken, the current cycle, which spans memory, broader hardware, and data center trades, may come to an end.
Meanwhile, tech giants are urgently trying to contain the uncontrolled consumption of AI computing power within their organizations.
Wallstreetcn previously reported that Amazon and Microsoft are scaling back internal AI tools or shutting down projects that track usage, in order to curb “Tokenmaxxing”, in which employees burn computing power unproductively to climb internal rankings.
On the provider side, GitHub Copilot switched its billing model from per request to per token on June 1st, causing some users’ monthly bills to jump more than tenfold and sparking widespread skepticism about the sustainability of the AI subsidy model.
These signals are reshaping how investors assess the risk of the AI infrastructure trade. Marginal changes in token spending feed directly into capital expenditure expectations for NVIDIA, memory chip manufacturers, and cloud service providers through a transmission chain that runs from GPU computing power to DRAM memory to data center demand.
01 Indicator Peaks: The Hardware Trade Thesis Faces Challenges
The Silicon Data LLM Token Expenditure Index is an expenditure-weighted indicator that measures the price paid per million LLM tokens across the entire market. It is regarded as a proxy for the market’s marginal willingness to pay for AI.
Because major providers such as OpenAI, Anthropic, and Google bill customers based on token consumption, token expenditure directly ties AI usage to demand for GPUs, DRAM, and data centers.
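To make the idea of an expenditure-weighted price concrete, here is a minimal Python sketch. It is not Silicon Data’s methodology, which the article does not describe; it simply assumes the index behaves like a spend-weighted average of prices per million tokens. The function name and all figures are hypothetical.

```python
from typing import Iterable, Tuple


def expenditure_weighted_price(observations: Iterable[Tuple[float, float]]) -> float:
    """Return the spend-weighted average price per million tokens.

    Each observation is (price_per_million_tokens, spend_in_dollars).
    Assumption: each price is weighted by its share of total spend.
    """
    rows = list(observations)
    total_spend = sum(spend for _, spend in rows)
    if total_spend <= 0:
        raise ValueError("total spend must be positive")
    return sum(price * spend for price, spend in rows) / total_spend


# Hypothetical market with one expensive and one cheap model tier.
frontier_heavy = [(10.0, 300.0), (2.0, 100.0)]  # most spend on the $10 tier
cheap_heavy = [(10.0, 100.0), (2.0, 300.0)]     # most spend on the $2 tier

print(expenditure_weighted_price(frontier_heavy))  # 8.0
print(expenditure_weighted_price(cheap_heavy))     # 4.0
```

Under this assumption the reading halves when spend shifts toward the cheaper tier, even though neither list price has changed.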
The recent stagnation of this index has sparked concerns in the capital market regarding the hardware cycle. Silicon Data’s commentary points out that the recent decline may indicate a slowdown in the migration to high-end closed-source models. If Token spending continues to be weak, the marginal revenue to fund incremental GPU, DRAM, and data center procurement will weaken, which will alter the risk profile of companies that have formulated capital expenditure plans around Token-driven growth.
A single decline does not establish a trend. But because the index is treated as a leading indicator of the hardware cycle, the data suggests that enterprises’ reliance on high-cost cutting-edge models may be heading for a systematic decline.
02 Billing Crisis: Tech Giants Call a Halt to “Ineffective Consumption” of Enterprise AI
The AI boom is facing its first real billing crisis.
According to Axios, citing a message from an AI consultant, one of the consultant’s corporate clients recently spent $500 million on Claude in a single month, simply because no upper limit was set on employee usage.
Within enterprises, treating AI usage as a performance metric has also backfired. Amazon’s developer platform Kiro reportedly once had an internal leaderboard called “Kirorank”. Similar attempts to inflate token consumption to gain a ranking advantage have also occurred within Meta.
Dave Treadwell, a senior vice president at Amazon, admitted that employees had pushed up the company’s operating costs by asking AI to perform meaningless tasks in order to inflate the rankings. He explicitly instructed employees to “not use AI for the sake of using it,” and the beta dashboard was subsequently taken offline. Amazon has now shifted to using the “normalized deployment” metric to replace token consumption, tracking the actual value of AI-generated code.
03 Pricing Rebound: The End of the Subsidy Era
On the supply side, the AI industry’s long-standing business model of relying on subsidies for growth is reaching its limits.
On June 1st, GitHub Copilot officially switched to billing based on token usage. Some Reddit users said they expect their monthly fees to jump from less than $45 to more than $847.
Mario Rodriguez, Chief Product Officer of GitHub, previously stated that with the rise of agentic AI, the old pricing model has become unsustainable. Arun Chandrasekaran, an analyst at Gartner, pointed out in an interview with Business Insider that as advanced reasoning models push up computing power consumption, more enterprises will shift to pay-per-usage billing.
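The gap between the two billing models can be illustrated with a small Python sketch. Only the $45 and $847 figures come from the reports cited above. The request count, tokens per request, and unit prices are hypothetical values chosen for illustration and are not GitHub’s actual rates.

```python
def per_request_bill(requests: int, price_per_request: float) -> float:
    """Monthly bill when every request costs the same flat amount."""
    return requests * price_per_request


def per_token_bill(requests: int, tokens_per_request: int,
                   price_per_million_tokens: float) -> float:
    """Monthly bill when cost scales with the tokens each request consumes."""
    total_tokens = requests * tokens_per_request
    return total_tokens / 1_000_000 * price_per_million_tokens


def increase_multiple(old_bill: float, new_bill: float) -> float:
    """How many times larger the new bill is than the old one."""
    return new_bill / old_bill


# Multiple implied by the figures Reddit users reported.
print(round(increase_multiple(45, 847), 1))  # 18.8

# Hypothetical heavy user: 1,500 requests a month, each an agent-style
# session consuming 40,000 tokens (assumed values, not real pricing).
flat = per_request_bill(1_500, 0.03)
metered = per_token_bill(1_500, 40_000, 14.0)
print(round(flat, 2), round(metered, 2))  # 45.0 840.0
```

The flat bill depends only on the number of requests, while the metered bill grows with how many tokens each request consumes, which is why long-running agent sessions are hit hardest by the switch.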
Investor Tommy Shaughnessy has warned about the systemic risks of this subsidy model. He pointed out that the profit margins of major AI companies are currently deeply negative, and that once companies face the real price of pay-per-use, their actual consumption rate will far exceed expectations. For example, Uber’s annual AI budget for 2026 is expected to be exhausted within four months. If investors lose confidence in expected returns, the capital flows supporting GPU procurement and model training could reverse.
04 Cost Reconstruction: Cheap Models May Dominate the Market
Faced with high inference costs, the market is seeking low-cost alternatives.
Rich Privorotsky, head of Goldman Sachs’ Delta-One desk, believes that, with DeepSeek cutting prices by 75% and Xiaomi MiMo by nearly 99%, the easing of infrastructure bottlenecks is triggering a price war.
Wallstreetcn previously reported that Brian Armstrong, CEO of Coinbase, predicted that 80% of AI workloads will migrate to models that cost 99% less within 12 to 18 months, while only the 20% of tasks that require extreme intelligence will remain on cutting-edge models. He pointed out that energy and computing power will become the real bottlenecks.
Clement Delangue, CEO of Hugging Face, cited data from Stanford University to support this trend: the accuracy of local models on real-world queries has jumped to 71.3%, at extremely low cost. Ali Ansari, CEO of Micro1, sees this as a “healthy swing” from overuse to rational use.
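The spending impact of such a migration can be estimated with a quick Python sketch. It assumes the workload shares are measured by what each workload costs on cutting-edge models today and that total usage stays constant. Both assumptions go beyond what the prediction states, and the function name is hypothetical.

```python
def blended_cost_ratio(migrated_share: float, cost_reduction: float) -> float:
    """Total spend after migration as a fraction of spend before it.

    migrated_share: fraction of workloads moving to the cheaper models.
    cost_reduction: how much cheaper those models are (0.99 means 99% less).
    """
    if not 0.0 <= migrated_share <= 1.0 or not 0.0 <= cost_reduction <= 1.0:
        raise ValueError("both inputs must be fractions between 0 and 1")
    stays_on_frontier = 1.0 - migrated_share
    migrated_cost = migrated_share * (1.0 - cost_reduction)
    return stays_on_frontier + migrated_cost


ratio = blended_cost_ratio(0.80, 0.99)
print(round(ratio, 3))      # 0.208
print(round(1 - ratio, 3))  # 0.792
```

Under these assumptions total spend falls to roughly 21% of its current level, and the 20% of tasks that remain on cutting-edge models account for almost all of what is left.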
Wall Street is currently sharply divided over the true investment returns of AI. According to Jim Schneider of Goldman Sachs, agentic AI will drive a 24-fold increase in token consumption by 2030, and cloud service providers’ gross profit margins will turn positive in the short term. Economic research from JPMorgan Chase also indicates that the jump in Python packages on PyPI points to a real improvement in productivity.
However, the bearish camp is equally adamant. Jim Covello, a semiconductor analyst at Goldman Sachs, pointed out in a report that the current prosperity of the industry chain comes at the expense of upstream consumption, with almost all value flowing to semiconductor companies, a situation he considers unsustainable.
Josh Pantony, CEO of Boosted.ai, emphasized that enterprises’ concerns about opening up their data weaken the effectiveness of AI agents. With cost, return, and security all in play, the real value that the next round of AI investment generates will be the market’s final verdict on the technology.