Anthropic, an artificial intelligence company founded by exiles from OpenAI, has introduced the first AI model that can produce either conventional output or a controllable amount of “reasoning” needed to solve more grueling problems.
Anthropic says the new hybrid model, called Claude 3.7, will make it easier for users and developers to tackle problems that require a mix of instinctive output and step-by-step cogitation. “The [user] has a lot of control over the behavior—how long it thinks, and can trade reasoning and intelligence with time and budget,” says Michael Gerstenhaber, product lead, AI platform at Anthropic.
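In practice, that control surfaces as an explicit reasoning budget in the API. Here is a minimal sketch of what dialing the budget up or down looks like, assuming the Anthropic Python SDK; the model identifier and the shape of the `thinking` parameter are assumptions based on Anthropic's public documentation, not details confirmed in this article:

```python
# Minimal sketch: requesting a bounded amount of "thinking" from Claude 3.7.
# The model string and the `thinking` parameter shape are assumptions based
# on Anthropic's public API docs, not details confirmed in this article.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-7-sonnet-20250219",  # assumed model identifier
    max_tokens=4096,                     # total output cap, thinking included
    thinking={
        "type": "enabled",
        "budget_tokens": 2048,           # the reasoning "dial": raise or lower as needed
    },
    messages=[
        {"role": "user", "content": "Plan a migration of a monolith to microservices."}
    ],
)
```

The same endpoint with `thinking` omitted behaves like a conventional model, which is the hybrid behavior Gerstenhaber describes.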
Claude 3.7 also features a new “scratchpad” that reveals the model’s reasoning process. A similar feature proved popular with the Chinese AI model DeepSeek. It can help a user understand how a model is working through a problem in order to modify or refine prompts.
Dianne Penn, product lead of research at Anthropic, says the scratchpad is even more helpful when combined with the ability to ratchet a model’s “reasoning” up and down. If, for example, the model struggles to break down a problem correctly, a user can ask it to spend more time working on it.
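Concretely, the scratchpad comes back as its own content blocks alongside the final answer, so a caller can inspect the reasoning and, if the decomposition looks wrong, retry with a larger budget. A hedged continuation of the sketch above, where `response` is the message returned by that earlier call and the block type names are likewise assumptions from Anthropic's public API:

```python
# Continuing the sketch above: the response interleaves "thinking" blocks
# (the scratchpad) with ordinary "text" blocks (the answer). Block type
# and attribute names are assumptions based on Anthropic's public API.
for block in response.content:
    if block.type == "thinking":
        print("SCRATCHPAD:", block.thinking)  # how the model broke the problem down
    elif block.type == "text":
        print("ANSWER:", block.text)

# If the scratchpad shows a flawed plan, re-issue the same request with a
# larger budget, e.g. thinking={"type": "enabled", "budget_tokens": 8192}.
```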
Frontier AI companies are increasingly focused on getting the models to “reason” over problems as a way to increase their capabilities and broaden their usefulness. OpenAI, the company that kicked off the current AI boom with ChatGPT, was the first to offer a reasoning AI model, called o1, in September 2024. OpenAI has since introduced a more powerful version called o3, while rival Google has released a similar offering for its model Gemini, called Flash Thinking. In both cases, users have to switch between models to access the reasoning abilities—a key difference compared to Claude 3.7.
[Image: A user view of Claude 3.7. Courtesy of Anthropic.]
The difference between a conventional model and a reasoning one is similar to the two types of thinking described by the Nobel Prize-winning economist Daniel Kahneman in his 2011 book Thinking, Fast and Slow: fast, instinctive System 1 thinking and slower, more deliberative System 2 thinking.
The kind of model that made ChatGPT possible, known as a large language model or LLM, produces instantaneous responses to a prompt by querying a large neural network. These outputs can be strikingly clever and coherent but may fail to answer questions that require step-by-step reasoning, including simple arithmetic.
An LLM can be forced to mimic deliberative reasoning if it is instructed to come up with a plan that it must then follow. This trick is not always reliable, however, and models typically struggle to solve problems that require extensive, careful planning. OpenAI, Google, and now Anthropic are all using a machine learning method known as reinforcement learning to get their latest models to learn to generate reasoning that points toward correct answers. This requires gathering additional training data from humans on solving specific problems.
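The trick itself is just prompting. Below is a minimal, self-contained sketch of the plan-then-follow pattern applied to a small multi-step arithmetic problem; the prompt wording is illustrative, not a prescribed template, and the model identifier is again an assumption:

```python
# Minimal sketch of "plan first, then follow the plan" prompting for an
# ordinary (non-reasoning) request. The prompt wording is illustrative.
import anthropic

client = anthropic.Anthropic()

PLAN_THEN_SOLVE = (
    "First, write a numbered plan for solving the problem. "
    "Then carry out each step of the plan, showing your work. "
    "Finally, state the answer on its own line.\n\n"
    "Problem: A warehouse ships 17 crates a day for 23 days, "
    "then 11 crates a day for 9 days. How many crates in total?"
)

response = client.messages.create(
    model="claude-3-7-sonnet-20250219",  # assumed model identifier; no thinking budget here
    max_tokens=1024,
    messages=[{"role": "user", "content": PLAN_THEN_SOLVE}],
)
print(response.content[0].text)
```

Because the plan is just more generated text, nothing forces the model to actually follow it; that unreliability is what the reinforcement learning approach described above is meant to address.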
Penn says that Claude’s reasoning mode received additional data on business applications including writing and fixing code, using computers, and answering complex legal questions. “The things that we made improvements on are … technical subjects or subjects which require long reasoning,” Penn says. “What we have from our customers is a lot of interest in deploying our models into their actual workloads.”
Anthropic says that Claude 3.7 is especially good at solving coding problems that require step-by-step reasoning, outscoring OpenAI’s o1 on some benchmarks like SWE-bench. The company is today releasing a new tool, called Claude Code, specifically designed for this kind of AI-assisted coding.
“The model is already good at coding,” Penn says. But “additional thinking would be good for cases that might require very complex planning—say you’re looking at an extremely large code base for a company.”