Hands-on with Fable 5, Claude’s strongest model yet: ordinary users should proceed with caution


The worst possible news for ordinary people has arrived.

Just now, Anthropic announced the launch of Claude Fable 5 and Claude Mythos 5.

Fable 5 is Anthropic’s first Mythos-level model open to the public. Mythos 5 is aimed mainly at a small number of cybersecurity defense organizations and critical infrastructure providers, as well as biomedical researchers who later join the trusted access program.

What few people have noticed, however, is that according to Anthropic’s announcement, Fable 5 is included at no extra charge in the Pro, Max, Team, and seat-based Enterprise plans only until June 22. On June 23, Fable 5 will be removed from these plans, and continued use will require usage credits.

In other words, the days of unlocking the strongest AI with a single “monthly pass” may be gone for good. From now on, users may have to weigh not only the subscription price but also the actual token cost of every call and every long-running task.

Welcome to the era of token billing.

Claude Fable 5 debuts, and it is also the fiercest “token assassin” yet

Anthropic has also explained the names. Fable comes from the Latin fabula, “a short story told,” which is close in meaning to the Greek mythos.

Although the two names suggest two distinct models, they are really closer to two versions of the same underlying model. Fable 5 is open to the public with stricter safeguards, while Mythos 5 is currently available only to a select few cybersecurity defense organizations and critical infrastructure partners through the Project Glasswing initiative.

According to Anthropic’s official blog, Fable 5 is the most capable of the company’s generally available models, with significant improvements in software engineering, knowledge work, visual understanding, scientific research, and other fields. The longer and more complex the task, the wider its lead over previous Claude models.

Fable 5 matters because it brings Mythos-level capabilities to ordinary users at scale for the first time. Its benchmark scores, shown in the figure below, put it well ahead of the field.

The model’s name has also sparked some discussion. Tibo, the former head of OpenAI Codex, joked that Anthropic had taken the name Fable, which OpenAI had wanted to use but never did.

Among its capabilities, software engineering is one of the areas Anthropic emphasizes most.

Anthropic said that in early testing, Stripe had Fable 5 migrate a Ruby codebase of 50 million lines. An engineering team doing the work by hand would have needed more than two months; Fable 5 finished it in a single day.

Cognition’s FrontierCode evaluation also shows that Fable 5 excels at complex, production-grade coding. Rather than ordinary coding puzzles, this evaluation tests whether a model can complete difficult programming tasks to the standard expected of high-quality production codebases.

Anthropic also stressed that Fable 5 is more token-efficient than previous Claude models. Take that with a grain of salt: similar claims have accompanied nearly every past Claude release, and almost every one of those models turned out to be a token assassin, supplying the internet with plenty of jokes.

In knowledge work, Fable 5 set the top score on Hebbia’s financial benchmark, with gains mainly in document reasoning, chart comprehension, and complex problem analysis. IMC’s trading-analysis evaluation likewise found strong performance in fact retrieval, conceptual reasoning, causal analysis, and expectation analysis.

Vision is another focus of this release. Anthropic says Fable 5 can extract precise figures from complex scientific charts and reconstruct an application’s source code from screenshots of its web pages.

Anthropic also offered a more intuitive example: Fable 5 finished Pokémon Red and Blue using only the game’s graphics, without any extra maps, navigation tools, or game-state information. Earlier Claude models needed more elaborate scaffolding for similar tasks.

Long-context and memory capabilities have improved as well. In tests on Slay the Spire, Anthropic found that when given persistent file-based memory, Fable 5 gained three times as much performance as Opus 4.8 did, and the rate at which it reached the final act also tripled.

Life sciences are an especially sensitive area. According to Anthropic, its in-house protein design experts used Mythos 5 to speed up certain drug design workflows by roughly 10x.

In one case, Mythos 5, equipped with protein design and bioinformatics tools, carried out without human help the full workflow a scientist would normally handle: selecting binding sites, invoking design tools, and dealing with failed results. Across 14 protein targets, 9 yielded candidates worth further investigation.

These gains in life sciences and cybersecurity also explain why Anthropic has not released full Mythos-level capabilities to the public.

The public release of Fable 5 comes with a new set of safety classifiers. Whenever a request touches a high-risk area such as cybersecurity, biology, chemistry, or model distillation, the system automatically hands it to Claude Opus 4.8 and tells the user that the model has changed.
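The routing behavior can be pictured with a toy sketch. This is not Anthropic’s classifier: the keyword lists, the classify and route function names, and the notice text are all hypothetical, and a real system would rely on trained classifiers rather than string matching. It only illustrates the control flow described above: flag a high-risk category, hand the request to Opus 4.8, and tell the user.

```python
from dataclasses import dataclass
from typing import Optional

PRIMARY_MODEL = "Claude Fable 5"
FALLBACK_MODEL = "Claude Opus 4.8"

# Hypothetical stand-in for the safety classifiers: naive keyword matching.
# A production system would use trained classifiers, not string search.
HIGH_RISK_KEYWORDS = {
    "cybersecurity": ("exploit", "malware", "lateral movement"),
    "biology": ("pathogen", "protein design"),
    "chemistry": ("synthesis route", "toxic compound"),
    "model_distillation": ("distill",),
}


@dataclass
class RoutingDecision:
    model: str
    notice: Optional[str]  # shown to the user only when the model was switched


def classify(request: str) -> set[str]:
    """Return the high-risk categories a request appears to touch."""
    text = request.lower()
    return {
        category
        for category, words in HIGH_RISK_KEYWORDS.items()
        if any(word in text for word in words)
    }


def route(request: str) -> RoutingDecision:
    """Send flagged requests to the fallback model and tell the user."""
    flagged = classify(request)
    if flagged:
        notice = (
            f"This response comes from {FALLBACK_MODEL} instead of {PRIMARY_MODEL} "
            f"(flagged: {', '.join(sorted(flagged))})."
        )
        return RoutingDecision(FALLBACK_MODEL, notice)
    return RoutingDecision(PRIMARY_MODEL, None)
```

The visible notice is the design point worth noting: users are told when the switch happens, which is exactly what the separate, hidden protection for frontier-model development does not do.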

Anthropic says that in early data, more than 95% of Fable 5 sessions did not trigger such a switch. For everyday work such as writing, programming, analysis, design, and data processing, Fable 5 itself handles most requests. Once a request strays into a high-risk area, however, the model’s capabilities are curtailed.

Cybersecurity is the most tightly restricted area. Anthropic acknowledges that Mythos-level models are highly capable at finding and exploiting software vulnerabilities and have substantial agentic attack capabilities, potentially spanning reconnaissance, discovery, and lateral movement. To prevent abuse, Fable 5’s cybersecurity classifier covers a broad range of topics.

Biology and chemistry are handled similarly. Anthropic believes the model can already complete real scientific tasks, so blocking only a narrow set of bioweapons-related questions is no longer enough. For now, Fable 5 falls back to Opus 4.8 for most biology- and chemistry-related requests.

Notably, Anthropic has also added a hidden layer of protection to Fable 5, aimed specifically at frontier model development.

It mainly restricts Claude from helping build pre-training pipelines, distributed training infrastructure, or ML accelerator designs, so that the model does not speed up other organizations’ training of next-generation frontier models.

Unlike the safety restrictions, which switch the conversation to Opus 4.8 when triggered, this protection does not notify the user. Instead, it degrades Fable 5’s performance on the relevant tasks through techniques such as prompt modification, steering vectors, or PEFT (parameter-efficient fine-tuning). Some affected users have already spoken out.

Claude Fable 5 is now available to users worldwide. Developers can call it through the Claude API under the model name claude-fable-5, and it has been fully available on the Claude API and the pay-as-you-go Enterprise plan since launch day.

Fable 5 and Mythos 5 are priced identically: $10 per million input tokens and $50 per million output tokens. Anthropic notes that this is less than half the price of Claude Mythos Preview, but for demanding long-running tasks it is still far from cheap.
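At these list prices, the cost of a job is simple arithmetic: input tokens at $10 per million plus output tokens at $50 per million. A minimal Python sketch, assuming only those two rates and ignoring any caching, batch, or other discounts or surcharges that may apply; the function name is hypothetical:

```python
from decimal import Decimal

# List prices quoted for Fable 5 and Mythos 5, in US dollars per million tokens.
# Assumption: no prompt-caching, batch, or other discounts are applied.
INPUT_USD_PER_MTOK = 10
OUTPUT_USD_PER_MTOK = 50


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the API cost of one job at the quoted per-token list prices."""
    # Tokens times dollars-per-million gives micro-dollars; keep it an exact integer.
    micro_dollars = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    # Convert micro-dollars to dollars without floating-point rounding.
    return Decimal(micro_dollars) / Decimal(1_000_000)


if __name__ == "__main__":
    # A long agentic task: 2M tokens read, 300k tokens written (illustrative numbers).
    print(estimate_cost_usd(2_000_000, 300_000))  # 35
```

By the same arithmetic, a single task that reads 1 million tokens and writes 200,000 costs $10 + $10 = $20, the same as a month on a $20 Pro plan.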

AI can finally count six fingers

Hands-on testing shows better than the official blog where Fable 5 has actually improved. In my own tests, Fable 5 can now correctly count the fingers on a six-fingered hand.

Since the gaokao, China’s national college entrance exam, has just ended, we also gave Fable 5 the Chinese essay prompt from National Paper I. How to put it? The writing is fairly fluent overall, and by no means “ordinary.”

For a more detailed comparison, see the hands-on tests by @Hypergent. In an asteroid visualization task, Fable 5 not only extracted the data but also designed an interactive display with orbital trajectories and hover details, conveying more information without sacrificing performance.

In a fitness-resort planning task, Fable 5 used GPT-Image-2 and Nano Banana to generate a site plan that better reflects how the space would actually be used. It accounted for connections between zones, the distribution of functions, and pedestrian flow, rather than simply arranging buildings.

Fable 5 can also turn astronomical phenomena into visualizations, such as a simulation of how solar flares affect auroras, whereas Opus 4.8’s version failed even to load.

The assessment from Andrej Karpathy, Tesla’s former director of AI and an OpenAI co-founder who has since joined Anthropic, better captures how developers feel.

When it comes to design aesthetics, though, humans still hold a slight edge for now.

Wharton professor Ethan Mollick’s hands-on testing shows more clearly what has changed in Fable 5. With early access, he focused on complex tasks involving games, maps, and research tools.

One of the most representative projects was an isochrone map. Mollick asked Fable 5 to build an interactive map from real transportation data showing how far one can travel from different cities within a given time. The model spun up multiple agents to collect flight, rail, and road data, wrote and tested the code, and kept revising the results based on feedback.
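At its core, an isochrone map answers one question: starting from a city, which places can be reached within a time budget? A common way to compute that is Dijkstra’s shortest-path algorithm with a cutoff. The sketch below is a toy illustration of that idea, not Mollick’s project or Fable 5’s code; the city names and travel times are made up, and a real version would merge flight, rail, and road schedules and account for transfer times.

```python
import heapq

# Hypothetical network: city -> list of (neighbor, travel time in hours).
# Each edge could stand for a flight, rail, or road leg.
Graph = dict[str, list[tuple[str, float]]]


def reachable_within(graph: Graph, origin: str, budget_hours: float) -> dict[str, float]:
    """Dijkstra with a time cutoff: earliest arrival time for every city
    reachable from origin within budget_hours."""
    best = {origin: 0.0}
    frontier = [(0.0, origin)]
    while frontier:
        elapsed, city = heapq.heappop(frontier)
        if elapsed > best[city]:
            continue  # stale queue entry; a faster route was already found
        for neighbor, hours in graph.get(city, []):
            arrival = elapsed + hours
            # Keep the route only if it fits the budget and beats the best known time.
            if arrival <= budget_hours and arrival < best.get(neighbor, float("inf")):
                best[neighbor] = arrival
                heapq.heappush(frontier, (arrival, neighbor))
    return best


if __name__ == "__main__":
    network: Graph = {
        "A": [("B", 2.0), ("C", 1.0)],
        "C": [("B", 0.5), ("D", 3.0)],
        "B": [("D", 1.5)],
    }
    print(reachable_within(network, "A", 3.0))  # {'A': 0.0, 'B': 1.5, 'C': 1.0, 'D': 3.0}
```

Drawing the map is then a matter of shading each city by its arrival time. With a 3-hour budget, D is reachable only via B, because the direct C-to-D leg would arrive at hour 4.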

Mollick also had Fable 5 build a research tool called Concord. The model first produced a 19-page design document, then worked nonstop for nine and a half hours to finish the software. Concord analyzes open-ended research data and calibrates human and AI judgments against each other.

The testing also exposed real problems. Mollick notes that Fable 5 still makes errors and omissions that require human review and polishing. Long tasks also consume an enormous number of tokens, and Fable 5 costs noticeably more than Opus 4.8. Once it is deployed in production, cost may well be the biggest practical obstacle.

Strength on demanding long tasks ultimately shows up in the bill. On the $20 Pro plan, I used up my allowance after just a few simple tasks.

The Claude app also shows the notice “Fable 5 included until June 22.” As noted at the start, under Anthropic’s plan, once the included period ends Fable 5 will be removed from some subscription plans, and continued use will require usage credits.

Until now, a modest monthly fee bought users broad access to the world’s most powerful AI. Subscriptions obscured the true cost and, at times, let ordinary individuals start from the same line as some of the giants.

Token billing changes all of that.

AI is shifting from something close to a flat monthly service to a production resource consumed and paid for per use. The most powerful models are becoming ever more expensive, precisely metered production tools.

Some people may not mind the cost: they will let Fable 5 run 24-hour multi-step tasks, refactor 50 million lines of code, build complete applications on its own, run research projects continuously, and test and revise results over and over.

Most ordinary users, however, will instinctively weigh every call: Is this question worth the tokens? Is this task worth handing to the strongest model? If this attempt fails, should I let it try again?

Here is the worst of it: AI hasn’t gotten weaker. On the contrary, it is growing stronger at an unprecedented pace, powerful enough to complete on its own more and more of the mental work that humans used to do.

Meanwhile, the cost of accessing those capabilities keeps rising. The gap between ordinary people and cutting-edge productivity, which large models had only just narrowed, may widen again under expensive token billing.

Anthropic is not alone, and other players such as OpenAI will find it hard to avoid the same path. The stronger frontier models become, the higher their training and inference costs climb. That is especially true for these two companies, which are racing toward IPOs and must show capital markets that they can not only train stronger models but also turn those capabilities into sustainable revenue.

So rather than a mere model upgrade, the release of Fable 5 looks more like the prelude to a sweeping overhaul of AI subscriptions. If the window for AI’s mass adoption is starting to close, this is certainly not good news.
