A tweet from the OpenClaw founder drew 8 million views, and the whole internet is asking: what is loop engineering?
A term has become especially popular in the AI industry lately: loop engineering.
It started when OpenClaw founder Steinberg posted on X: “You shouldn’t write prompts for programming agents anymore. You should design a loop that prompts your agent.”
I expected the comment section to be lively, with everyone actively discussing loop engineering.
In reality, the replies under the post turned into a brawl.
Some people questioned whether loops would consume huge numbers of tokens and still require manual testing, unless tokens were unlimited. Others mocked it as hype around a new concept, joking that “loop engineering will replace harness engineering.”
The post has now reached 8 million views.
The first person to propose the term “loop engineering” was actually Boris, the founder of Claude Code.
He once said in an interview: “I no longer write prompts for Claude Code. Those loops write them for me and decide what specific changes need to be made. My job is only to write the loops.”
Clearly, not everyone is buying into loop engineering. After all, it has only been a month or two since the last new concept, “harness.”
People have not had a chance to digest that one yet, and now there is something new to absorb.
Controversy aside, what does loop engineering actually mean? And how does it differ from a loop in programming?
What is a loop?
Let’s start with the first question: what exactly is loop engineering?
The “loop” here means exactly what it says: something that repeats.
An agent loop is, in fact, similar to a loop in programming.
In traditional programming, what a loop does is very clear.
For example, if you write a for loop to traverse an array, the machine moves from the first element to the last. In programming, the essence of a loop is to make the machine repeatedly execute a well-defined sequence of instructions.
In the context of AI agents, a loop is also something that is executed repeatedly.
So what is the difference between the two?
An agent loop does not execute “instructions”; it pursues a “goal.” It runs through the cycle below, and each round’s output brings it closer to the goal. When the result meets the goal, the loop terminates.
Goal → Action → Observation → Evaluation → Revision → Next round of action
No step in this cycle is fixed.
The agent needs to observe the current state, decide what action to take, execute it, observe the result, evaluate whether expectations have been met, and then decide on the next step.
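The contrast can be sketched in a few lines of Python. This is only an illustration under assumptions: `run_agent_loop`, `toy_act`, and `toy_evaluate` are hypothetical names, and the toy functions stand in for a real agent and a real evaluator. The first function repeats one fixed instruction; the second keeps acting and revising until an evaluator says the goal is met.

```python
from typing import Callable, Optional


def double_all(values: list[int]) -> list[int]:
    """Traditional loop: the same fixed instruction runs on every element."""
    out = []
    for v in values:
        out.append(v * 2)
    return out


def run_agent_loop(
    act: Callable[[dict, Optional[str]], dict],
    evaluate: Callable[[dict], Optional[str]],
    max_rounds: int = 10,
) -> tuple[dict, int, bool]:
    """Agent loop: repeat act -> observe/evaluate -> revise until the goal is met."""
    state: dict = {}
    feedback: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        state = act(state, feedback)   # action, revised using the last feedback
        feedback = evaluate(state)     # observation + evaluation against the goal
        if feedback is None:           # goal met: terminate
            return state, round_no, True
    return state, max_rounds, False    # gave up without meeting the goal


# Toy stand-ins for a real agent and a real evaluator (both hypothetical).
REQUIRED = ("username", "password")


def toy_evaluate(state: dict) -> Optional[str]:
    missing = [word for word in REQUIRED if word not in state["draft"]]
    return f"missing: {missing[0]}" if missing else None


def toy_act(state: dict, feedback: Optional[str]) -> dict:
    if feedback is None:
        return {"draft": "login form"}
    return {"draft": state["draft"] + " " + feedback.split(": ", 1)[1]}


if __name__ == "__main__":
    print(double_all([1, 2, 3]))
    print(run_agent_loop(toy_act, toy_evaluate))
```

Note that the loop body never says how to fix anything. It only carries feedback from one round into the next, and the decisions are left to whatever `act` does with it.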
In a traditional loop, every iteration follows the same code logic. The data may differ, but the way it is processed is fixed.
So you have to think through every possible scenario in advance and write the corresponding handling logic.
For example, what to do in situation A versus situation B, which is the if and else inside a programming loop.
However, complex real-world tasks often have too many variables to anticipate in advance, so the program breaks when it hits a situation you did not plan for.
This is where the value of the agent loop lies.
You don’t need to spell out every situation. You only need to give the agent a goal, provide the necessary tools and context, and let it explore on its own inside the loop.
It may take detours and make mistakes, but with feedback mechanisms and evaluation criteria it can gradually approach the correct answer over multiple iterations.
This way of working is particularly effective for open-ended tasks. Writing code, fixing bugs, conducting research, and building products all share one trait: there is no single correct path, and the direction has to be adjusted constantly along the way. Traditional programs struggle with this kind of uncertainty, but an agent can handle it inside a loop.
Geoffrey Huntley, an Australian shepherd, released ralph in July 2025, and it is a typical agent loop.
It is essentially a bash script that repeatedly feeds the same prompt file to the agent. Its real innovation, though, lies in its discipline: it resets the context to a fixed set of anchor files on every iteration instead of letting the conversation grow without bound.
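The idea can be sketched as follows. This is not the original bash script, only a Python illustration of the same discipline under assumptions: `run_agent` and `is_done` are hypothetical callables supplied by the caller, and the anchor files are whatever files you choose to treat as the fixed base.

```python
from pathlib import Path
from typing import Callable, Optional


def build_prompt(anchor_files: list[Path]) -> str:
    """Re-read every anchor file from disk so each round starts from the same base."""
    return "\n\n".join(f"# {path.name}\n{path.read_text()}" for path in anchor_files)


def ralph_style_loop(
    anchor_files: list[Path],
    run_agent: Callable[[str], None],   # hypothetical: invokes a coding agent once
    is_done: Callable[[], bool],        # hypothetical: e.g. "do the tests pass?"
    max_iterations: int = 50,
) -> Optional[int]:
    """Return the iteration on which is_done() became true, or None if it never did."""
    for iteration in range(1, max_iterations + 1):
        prompt = build_prompt(anchor_files)  # fresh context, no chat history carried over
        run_agent(prompt)                    # progress lives on disk, not in the conversation
        if is_done():
            return iteration
    return None
```

Because the prompt is rebuilt from files each time, anything the agent wants to remember has to be written back to disk, which is what keeps the context from growing round after round.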
To test ralph’s capabilities, Huntley used this method to build an entire programming language, at a total cost of approximately $297.
This case shows that the core value of a loop is not to make the agent smarter, but to create an environment in which the agent can keep improving.
In this environment, the agent does not need to get it right in one go. It can make mistakes, learn from failures, and accumulate progress over multiple iterations.
In the spring of 2026, both Codex and Claude Code released the /oal command, productizing ralph. The command keeps running in a loop until a validation passes.
But what Steinberg means by a loop is no longer just “having an agent do one task over and over.” He treats the loop as an AI work system that runs for a long time, collaborates with other loops, and schedules itself automatically.
Specifically, Steinberg believes the loop is the fundamental unit of work.
Previously, the instructions we gave AI were things like “fix this bug for me” or “write an article.” Every task was one-off and ended once it was completed.
The loop Steinberg describes is also a kind of task, but it is a continuously running unit of work. For example: check GitHub issues every day, decide which ones need fixing, assign them to agents automatically, run the tests after each fix, keep making changes if the tests fail, and submit a PR if they pass.
The focus is no longer on “fixing a bug” but on having a long-lived process that handles a whole category of work.
Once you have several such loops running at the same time, new problems arise. Who coordinates them? Who decides the priorities? Who checks the quality of their work?
So when designing loops, Steinberg has already started using loops to supervise other loops.
A top-level loop observes the overall situation → it discovers several tasks → it distributes them to multiple sub-loops → each sub-loop runs on its own → the top-level loop checks their progress and results
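One pass of such a supervising loop might look like the sketch below. It is a simplified illustration, not Steinberg’s implementation: `Task`, `supervise_once`, and the three callables are hypothetical names, and the sub-loops run one after another here rather than in parallel.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Task:
    kind: str      # which sub-loop should handle it, e.g. "bugfix"
    payload: str   # what to work on, e.g. an issue title


def supervise_once(
    discover: Callable[[], list[Task]],
    sub_loops: dict[str, Callable[[Task], str]],
    accept: Callable[[Task, str], bool],
) -> dict[str, list[str]]:
    """One pass of the top-level loop: discover, dispatch, then check results."""
    report: dict[str, list[str]] = {"accepted": [], "rejected": [], "unassigned": []}
    for task in discover():
        sub_loop = sub_loops.get(task.kind)
        if sub_loop is None:
            report["unassigned"].append(task.payload)   # no sub-loop handles this kind
            continue
        result = sub_loop(task)                         # the sub-loop runs on its own
        bucket = "accepted" if accept(task, result) else "rejected"
        report[bucket].append(task.payload)
    return report
```

The report answers the three questions in a very small way: dispatch is the coordination, the order of `discover()` is the priority, and `accept` is the quality check.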
Prompts are the input, the loop is the process
Steinberg’s tweet sparked controversy because it touched on a sensitive question.
Is prompt engineering outdated?
For now, prompts are still the main way you communicate intent to an agent, and they still need to be clear, specific, and supplied with the necessary context.
Put it this way: a poorly written prompt will not suddenly get better just because you put it in a loop.
But a single prompt is no longer the core of working with an agent.
The reason is simple. If you could state every requirement clearly from the beginning, and the agent could meet all of them in a single output, there would be no need for any follow-up context.
In reality, you may only realize you left out an important condition after seeing the first results, or the agent’s output may satisfy your requirements literally yet expose problems in actual use.
More importantly, much of the feedback does not exist when the task begins.
For example, a bug can only be discovered during testing.
Previously, you had to watch every output of the agent, judge whether it was correct, and think about how to steer the next step.
Now all you need to do is design the loop, define the goals and evaluation criteria, and let it run on its own.
Ultimately, loop engineering means putting a framework around the agent so that in every round it knows what to look at, what to do, how to judge the result, and when to stop.
I’ll give you an example and you’ll understand:
Suppose you need the agent to generate a login page.
The prompt engineering approach is to write one detailed prompt: “Please write a login page for me. It needs username and password input fields, a login button, and a forgot-password link. The style should be simple and modern, with blue as the main color. There should be form validation: the username cannot be empty, and the password must be at least 8 characters. If login fails, an error message should be displayed.”
If your prompt is written well enough, the agent may generate a page that looks good.
But can the page actually be used? Is the form validation logic correct? Does it render properly in different browsers? Are there any security vulnerabilities?
The loop engineering approach is to design the entire process.
Step one, generate the page code from the requirements. Step two, run automated tests to check that the basic functions work. Step three, launch a browser and take a screenshot to check the visual result. Step four, if a test fails or the screenshot shows a problem, analyze the specific issue. Step five, modify the code to fix it. Step six, retest, and repeat the process until all acceptance criteria are met.
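The automated tests in step two can be very small. The sketch below covers only the two validation rules from the example (non-empty username, password of at least 8 characters). The function names and test cases are hypothetical, and a real loop would also need the browser and screenshot checks, which are not shown.

```python
from typing import Callable


def validate_login(username: str, password: str) -> list[str]:
    """Stand-in for the validation logic the agent generated."""
    errors = []
    if not username.strip():
        errors.append("username must not be empty")
    if len(password) < 8:
        errors.append("password must be at least 8 characters")
    return errors


def run_acceptance_checks(validate: Callable[[str, str], list[str]]) -> list[str]:
    """Return the names of failed checks; an empty list means the criteria are met."""
    cases = [
        ("empty username rejected", ("", "longenough1"), False),
        ("short password rejected", ("alice", "short"), False),
        ("valid input accepted", ("alice", "longenough1"), True),
    ]
    failures = []
    for name, args, should_pass in cases:
        passed = not validate(*args)      # no errors means the input was accepted
        if passed != should_pass:
            failures.append(name)
    return failures
```

The list of failed check names is exactly the kind of specific feedback that steps four and five need: it tells the agent what to analyze and fix in the next round.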
In this process, the initial prompt can be simple, because you know multiple iterations will follow. The agent does not need to get everything right the first time; it sees specific feedback in each round and makes targeted improvements.
What loop engineering designs
How do you write a loop?
You need to design five components.
The first component is the goal.
This may sound obvious, but in practice many loops fail because the goal is not defined clearly enough.
“Help me optimize this” is not a good goal. What counts as optimization? At what point is the optimization done? What are the constraints? None of this is clear.
A good goal looks like this: “Reduce the response time of this interface from 800 milliseconds to below 300 milliseconds. Preserve existing behavior; all tests must pass. Output a description of the changes and list the specific optimizations made.”
Every part of this goal is verifiable.
A clear goal gives the agent a stable anchor point that it can calibrate against in every iteration.
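“Verifiable” can be taken literally: each clause of the goal becomes a check that returns a concrete reason when it is not met. The sketch below encodes the example goal under assumptions: the class and field names are hypothetical, and the measurements in `RoundResult` are assumed to come from your own benchmark and test runner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Goal:
    max_latency_ms: float          # target: strictly below this value
    tests_must_pass: bool = True
    needs_change_summary: bool = True


@dataclass(frozen=True)
class RoundResult:
    latency_ms: float
    tests_passed: bool
    change_summary: str


def unmet_criteria(goal: Goal, result: RoundResult) -> list[str]:
    """List every part of the goal that this round's result does not satisfy."""
    unmet = []
    if not result.latency_ms < goal.max_latency_ms:
        unmet.append(f"latency {result.latency_ms} ms is not below {goal.max_latency_ms} ms")
    if goal.tests_must_pass and not result.tests_passed:
        unmet.append("tests are failing")
    if goal.needs_change_summary and not result.change_summary.strip():
        unmet.append("change summary is missing")
    return unmet
```

An empty list means the goal is met. Anything else can be fed back to the agent as the calibration signal for the next iteration.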
The second component is context management.
Context covers many things, not just your conversation with the model.
The current state of the code repository, related documents, requirement specifications, error logs, test results, user preferences, past decisions, and the attempts and results of previous rounds are all context.
Many agents perform poorly not because the model is not smart enough, but because the context fed to it in each round is too dirty, too thin, or too messy.
Too dirty means too much irrelevant information is mixed into the context, so the agent spends many tokens dealing with noise and overlooks the parts that really matter.
Too thin means key information is missing, and the agent does not have enough material to make correct judgments.
Too messy means the context is organized inconsistently from round to round, which prevents the agent from forming a stable pattern of understanding.
An important innovation of the ralph loop mentioned earlier is its context management.
It resets the context to a fixed set of anchor files on every iteration, instead of letting the conversation history grow without bound.
Simple as it is, this does solve the problem of context pollution.
You need to decide which information to keep, which to discard, and which to summarize before keeping.
Loop systems in 2026 have started using Git-based state management. Each round’s changes are committed to Git, and the agent can read the commit history to understand what was done before and why.
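The keep, discard, and summarize decisions can be made explicit in the function that assembles each round’s context. The sketch below is one possible layout, not a standard. `build_context` and its parameters are hypothetical, and the “summary” of older attempts is only a count, where a real system would more likely ask a model to summarize them.

```python
def build_context(anchors: dict[str, str], attempts: list[str], keep_last: int = 2) -> str:
    """Assemble one round's context in a fixed, repeatable layout."""
    # Stable section order, regardless of how the caller built the dict.
    parts = [f"## {name}\n{anchors[name].strip()}" for name in sorted(anchors)]

    split = max(len(attempts) - keep_last, 0)
    older, recent = attempts[:split], attempts[split:]
    if older:
        # Placeholder for a real summary, which would normally come from a model.
        parts.append(f"## Earlier attempts\n{len(older)} earlier attempt(s) omitted")
    for number, note in enumerate(recent, start=split + 1):
        parts.append(f"## Attempt {number}\n{note.strip()}")
    return "\n\n".join(parts)
```

Sorting the anchor names addresses the “inconsistent from round to round” problem, and capping the attempts that are kept in full addresses the noise problem, at the price of losing detail from early rounds.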
The third component is tools.
Simply put, this is the set of tools the agent is allowed to call.
Even the cleverest cook cannot make a meal without rice, and the choice of tools has to match the task.
If you ask the agent to write code but give it no tool for running tests, it cannot verify whether the code is correct.
But more tools are not necessarily better. Each additional tool enlarges the agent’s decision space, and it has to choose among more options. With too many tools, the agent may get lost in using them and lose sight of the real goal.
A good loop design selects its toolset carefully. Provide only the tools needed to complete the task, each with a clear purpose and a clear time to use it. That way the agent can focus on the task itself rather than on choosing tools.
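One simple way to enforce a curated toolset is an explicit allowlist per loop. The sketch below is a minimal illustration: `ToolBox` is a hypothetical class, and the two tools are stubs that return fixed strings instead of touching a real repository.

```python
from typing import Callable


class ToolBox:
    """A deliberately small, explicit set of tools for one loop."""

    def __init__(self, tools: dict[str, Callable[..., str]]) -> None:
        self._tools = dict(tools)

    def describe(self) -> list[str]:
        """What the agent is told it may use, in a stable order."""
        return sorted(self._tools)

    def call(self, name: str, *args: str) -> str:
        if name not in self._tools:
            raise PermissionError(f"tool not available in this loop: {name}")
        return self._tools[name](*args)


# Hypothetical tools for a "fix a bug" loop; real ones would touch the repository.
def read_file(path: str) -> str:
    return f"<contents of {path}>"


def run_tests() -> str:
    return "2 passed, 0 failed"


bugfix_tools = ToolBox({"read_file": read_file, "run_tests": run_tests})
```

Anything not registered simply cannot be called, so the agent’s decision space is exactly the list returned by `describe()`.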
The fourth component is evaluation.
This is the soul of the loop. Without evaluation, the loop just spins aimlessly.
The key to evaluation is automation.
If every round requires human judgment, the loop loses its ability to run autonomously. So you need to design evaluation criteria that can be executed automatically, so that the agent can judge for itself whether the current state meets the requirements.
However, automated evaluation has its limits. Some qualities are hard to judge by quantitative standards, such as code readability, design aesthetics, and how smoothly text reads.
For these, you may need to introduce manual checkpoints so that people can step into the evaluation at critical points.
AI has a concept for this called human in the loop.
A good loop does not kick people out; it places them at the most critical checkpoints. Most routine judgments are handled automatically, while humans are responsible for decisions that require subjective judgment or carry higher risk.
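This division of labor can be sketched as a two-stage evaluator: automated checks run first, and a person is asked only when they all pass. The names below are hypothetical, and `ask_human` stands for whatever review mechanism you actually use, such as a chat message or a review request.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Check:
    name: str
    run: Callable[[str], bool]


def evaluate(
    artifact: str,
    automated: list[Check],
    human_review: list[str],
    ask_human: Callable[[str, str], bool],   # hypothetical: blocks until a person answers
) -> tuple[str, list[str]]:
    """Return ("pass", []) or ("revise", reasons)."""
    failed = [check.name for check in automated if not check.run(artifact)]
    if failed:
        return "revise", failed               # automated checks first; no human time spent
    rejected = [topic for topic in human_review if not ask_human(topic, artifact)]
    if rejected:
        return "revise", rejected
    return "pass", []
```

Ordering the stages this way means human attention is spent only on artifacts that already meet every routine criterion.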
The fifth component is the stop condition.
Since the earliest days of programming, every loop has needed an exit condition.
For example, with a loop counter i, the value of i increases by 1 on each pass. When i exceeds the specified value, the loop stops.
For agents, the ideal stop condition is task completion, but reality often does not go that smoothly.
Sometimes the agent gets stuck in an endless loop, trying the same solution again and again and failing every time, without knowing when to give up. Sometimes the agent keeps making small changes, each a slight improvement, but it never reaches perfection and does not know where to stop.
So you need to design several stop conditions.
The most direct one is success: if all evaluations pass and the task meets the standard, the loop can stop. Next is the failure condition: if there is no improvement over several consecutive rounds, or the number of errors exceeds a threshold, the current approach probably does not work, and the loop should stop so the approach can be reconsidered.
There are also resource limits: if the running time exceeds the limit or the cost exceeds the budget, the loop should stop as well.
More importantly, there are risk checkpoints. When the agent needs to perform a high-risk operation, such as deleting data, it should stop and wait for manual confirmation. If such an operation goes wrong the cost is high, so it should not be fully automated.
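The four kinds of stop condition can be combined into one function that is called at the end of every round. This is a sketch under assumptions: the field names, the thresholds, and the contents of `HIGH_RISK_ACTIONS` are all illustrative, and “no improvement” is defined here as the last `patience` rounds failing to beat the best earlier score.

```python
from dataclasses import dataclass, field
from typing import Optional

HIGH_RISK_ACTIONS = {"delete_data"}   # assumed list; extend per project


@dataclass
class LoopState:
    all_checks_pass: bool = False
    scores: list[float] = field(default_factory=list)  # one score per round, higher is better
    cost_usd: float = 0.0
    elapsed_s: float = 0.0
    pending_action: str = ""


def stop_reason(
    state: LoopState,
    patience: int = 3,
    max_cost_usd: float = 50.0,
    max_elapsed_s: float = 3600.0,
) -> Optional[str]:
    """Return why the loop should stop, or None to keep going."""
    if state.pending_action in HIGH_RISK_ACTIONS:
        return "await_human_confirmation"     # risk checkpoint comes before everything else
    if state.all_checks_pass:
        return "success"
    if state.cost_usd > max_cost_usd or state.elapsed_s > max_elapsed_s:
        return "resource_limit"
    scores = state.scores
    if patience > 0 and len(scores) > patience and max(scores[-patience:]) <= max(scores[:-patience]):
        return "no_improvement"               # recent rounds did not beat the earlier best
    return None
```

Checking the risk condition first is a deliberate choice in this sketch: even a loop that is otherwise succeeding should pause before a destructive action.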
Put these five components together, and you have a complete loop.