MiniMax has launched Mavis, a veritable "Three Departments and Six Ministries" of agents.

I assigned a task, and the agent activated the plan mode, outlining 7 steps.
I approved it, and it started running. After three steps, it stopped and reported: "I have completed steps 1, 2, and 3, and the results are as follows… Shall we continue with steps 4, 5, 6, and 7?"
I said continue. It ran two more steps, then stopped again: "I've completed 4 and 5, and the results are as follows… Shall we continue with 6 and 7?"
After a whole night of handing the agent supposedly long-running tasks, nothing long-running ever materialized; the dialog box just kept showing "Continue".
This has been my experience for a long time, using various agents to get things done.

This experience makes no sense. "Stopping to confirm" can be a good habit when working with AI, but in many of these tasks I never asked it to stop, and it stopped anyway.
In its latest technical blog post, MiniMax attributes this behavior of its agent products to "contextual anxiety." The core issue is that the model itself is ambiguous about when a very long task is considered complete. Simply put, it's not that they can't do it, but that they're afraid to. They're afraid of making a mistake at every step, which is why they stop halfway through and ask questions.

Today, the MiniMax Agent desktop client underwent a major update. A new mode called Mavis has been added (the name is actually short for "MiniMax as a Jarvis").
It's well known that having one agent act as the boss and a group of agents act as employees is nothing new; that is the traditional multi-agent framework. However, MiniMax points out that previous mainstream multi-agent frameworks essentially relied on prompt orchestration to make a single model "role-play" different parts. That approach doesn't hold up over long tasks: it runs into the context anxiety, long-horizon degradation, and self-checking problems mentioned earlier.
Multi-agent systems need reliable infrastructure that keeps running, keeps being maintained, and keeps the agents from "colluding" with one another. That is what MiniMax has built.
Hands-on experience: letting the agents "nitpick" each other
MiniMax calls its Agent Team infrastructure the Team Engine, which has three core roles: Leader, Worker, and Verifier. As the names suggest, one manages, one performs the work, and one verifies.
The most crucial difference is that the Worker and Verifier stand in an "adversarial" relationship: neither can cut corners.
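As a rough mental model, this Worker/Verifier loop might look like the sketch below. Every name here (run_worker, run_verifier, the retry budget) is hypothetical; MiniMax has not published the Team Engine's internals.

```python
# Hypothetical sketch of an adversarial Worker/Verifier loop.
# MiniMax's actual Team Engine internals are not public.

def run_worker(task: str, feedback: list[str]) -> str:
    """Hypothetical worker: produces a deliverable, revised after feedback."""
    return f"deliverable for {task!r} (revision {len(feedback)})"

def run_verifier(deliverable: str) -> list[str]:
    """Hypothetical verifier: returns a list of errors; empty list means pass.
    A real verifier would fact-check the content; this stub fails the first
    revision once so the retry path is exercised."""
    return [] if "revision 1" in deliverable else ["data error in table 2"]

def adversarial_loop(task: str, max_rounds: int = 5) -> str:
    feedback: list[str] = []
    for _ in range(max_rounds):
        deliverable = run_worker(task, feedback)
        errors = run_verifier(deliverable)
        if not errors:            # verifier accepts: hand up to the leader
            return deliverable
        feedback.extend(errors)   # "failure" verdict: worker restarts with notes
    raise RuntimeError("consensus not reached within budget")

print(adversarial_loop("oil price report"))
```

The key property is that acceptance is decided by a separate agent, not by the worker judging its own output.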

A while ago, APPSO was researching a topic: "All model vendors with ambitions in coding/agent should develop their own independent coding/agent products."
(That's right, MiniMax was a negative example before, but unexpectedly, it proved itself even before the article was published!)
So we ran this problem again on MiniMax's Agent Team.
This task was divided among 5 workers. As each worker completed its part, it organized the results and submitted them to the leader (shown in the status line as "Mavis sent to General", "General sent to Mavis", and so on).

One worker had been running for 12 minutes without returning any results. APPSO noticed the leader getting impatient: it sent a bash command to check the worker's status:

After all 5 workers completed their tasks, the leader spawned 5 verifiers, displayed in the task list as agents wearing "yellow hats":

The verifier quickly found the error! One of the verifiers discovered a clear data error in the corresponding worker's deliverables and issued a "failure" penalty. Immediately afterwards, the corresponding worker restarted (displayed as running, indicated by a small blue circle).

Click into the corresponding worker's workspace to observe its thought process: "The verifier rejected my previous deliverables based on the following three errors… I need to go back and re-verify the key facts and check and correct the specific numerical issues…"
And I have to say, the agents are all uncompromising with one another, which makes their work genuinely reliable.

This back-and-forth played out dozens of times across the five 1v1 agent matchups. During the process, Mavis also said it had "learned something new" and updated its memory.

While that task was still underway, we kicked off a new deep-research task: analyzing the tourism market during the May Day holiday based on authoritative data and delivering a multi-dimensional analysis report.
This research is far more complex than the previous task. Moreover, because of the constant adversarial back-and-forth, the Agent Team spends noticeably more time on deep research than a typical single agent does.
However, the final report was indeed much cleaner and more credible than other AI deep-research deliverables we have seen.

APPSO has been preparing for many offline events recently, and planning and devising solutions has always been a challenge. We've also entrusted this task to Mavis to see how it goes.
I need to plan an offline AI developer salon in Guangzhou. Please provide me with as many venues as possible suitable for tech events with hundreds or thousands of attendees, along with approximate quotes, and information on similar events. Then, please help me plan the theme, promotion, and operation of this AI event, compiling all of this into a rigorous business plan format, as well as a beautifully designed website that matches the theme.

The planning process alone took longer than previous in-depth research tasks. Mavis replied, "This task is large-scale and requires multiple agents to work in parallel—site research, competitor analysis, theme planning, business plan, and website development."
One of Mavis's strengths is how smoothly it absorbs new requirements added mid-task:
In addition to the long report, it would be best if you could also draft a preliminary formal contract, including contracts for cooperation with the venue, cooperation with invited guests, and other possible contracts, as well as preliminary financial statements. Also, please provide a PowerPoint presentation to showcase this plan, the more detailed the better.
Upon receiving the new requirements, the Agent Team refined the plan further and launched more workflows. In the end there were as many as nine parallel tasks.

If we open Mavis's thought process, we can see a large number of messages passing between agents. These agents work under a dedicated Team Engine, relaying status to one another; some are waiting, some executing, some verifying.

Look at this Verifier, doesn't it resemble a nitpicking "client"?

The final delivery for the whole task ran to an astonishing ten-plus files, including xls spreadsheets, ppt slides, html web pages, and corresponding .md versions.

▲ The financial budget spreadsheet generated by Agent Team includes a project budget summary, cash flow forecasts, ticket price and sponsorship pricing models, and a detailed cost ledger.
Next, let's talk about another major feature of Mavis: it can connect to chat platforms and supports multitasking.
Like OpenClaw and Hermes Agent, which MiniMax already supports, Mavis can take task assignments through WeChat and Lark, two IM platforms. The integration process is extremely simple: click the settings button, scan a QR code, name the application, and you can use Mavis inside WeChat/Lark.

When a typical agent product is connected to an IM and we hand it a task that takes a long time to finish, sending that message often means we can no longer consult it on anything else.
One reason is that these agents cannot keep multiple dialog windows open simultaneously; another is a limitation of the agent's working mode: running multiple tasks in a single session easily leads to context confusion and pollution.
MiniMax's solution is to decouple the logic of "instant response" and "execution".
Through Lark, APPSO had it research the recent oil price surge; after that task started, we also asked it to research the important products Silicon Valley AI giants released over the past month.
Mavis didn't stop the earlier task. Instead, it told us the new task was already complete, while the oil-price task was still in progress.
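A minimal sketch of what decoupling "instant response" from "execution" could look like, assuming a simple thread-based design. The function names, task names, and timings below are invented for illustration and are not MiniMax's actual implementation.

```python
# Illustrative sketch: the IM handler replies immediately while long tasks
# run in background threads. All names and timings here are hypothetical.
import threading
import time

tasks: dict[str, str] = {}   # task name -> status, shared across threads
lock = threading.Lock()

def long_task(name: str, seconds: float) -> None:
    with lock:
        tasks[name] = "running"
    time.sleep(seconds)          # stand-in for real research work
    with lock:
        tasks[name] = "done"

def handle_message(name: str, seconds: float) -> str:
    """Accept a task and return immediately, without blocking the chat."""
    threading.Thread(target=long_task, args=(name, seconds)).start()
    return f"Started {name!r}; ask me for progress any time."

def status(name: str) -> str:
    with lock:
        return tasks.get(name, "unknown")

handle_message("oil prices", 0.5)          # long task keeps running...
handle_message("AI product roundup", 0.05) # ...while a short one completes
time.sleep(0.2)
print(status("AI product roundup"), "/", status("oil prices"))
```

The point of the design is that accepting a message and executing it are separate code paths, so a second task never has to wait behind the first.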

This is another key design principle of Mavis: context isolation.
Each Agent Team, and each agent within the team, only sees a summary of information relevant to their own mission, and only reads the full text when details are needed.
This approach has two advantages: firstly, it keeps token costs under control, preventing the context from easily overflowing even with a large team; secondly, it prevents context pollution, ensuring that incorrect information encountered by the agent during searches won't wipe out the entire team.
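A toy illustration of the summary-passing principle just described: each sub-agent circulates only a capped summary of its result, and siblings fetch the full text only on demand. The store and helper names are invented for the sketch.

```python
# Toy sketch of context isolation: circulate summaries, fetch details lazily.
# The store and function names are hypothetical, not MiniMax's actual API.

full_results: dict[str, str] = {}   # full deliverables, kept in a shared store

def publish(agent: str, text: str) -> str:
    """Store the full result; return only a truncated summary to circulate."""
    full_results[agent] = text
    return text[:40] + ("..." if len(text) > 40 else "")

def fetch_details(agent: str) -> str:
    """Read the full text only when the details are actually needed."""
    return full_results[agent]

summary = publish("venue_research", "Venue A: 800 seats, ~90k CNY; " * 10)
assert len(summary) <= 43                          # siblings see a capped summary
assert len(fetch_details("venue_research")) > 43   # the full text stays in the store
```

Token cost then scales with the summary size rather than with every agent's entire history, and a polluted search result stays confined to one agent's store entry.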
In the most extreme scenario, we assigned it 8 tasks over Lark within a very short window, and there was not a single instance of context confusion.
The whole experience is a lot like working with a colleague with extremely high cognitive bandwidth: not only can they reply to messages instantly, but they can also work in the background without being interrupted. If you want to know the progress, you can just ask directly without worrying about disturbing their "flow state".

Agents handling different sessions only see information relevant to their own tasks and do not share an ever-expanding conversation history.
In short, Mavis achieves end-to-end context isolation, from the IM channel to the task hub and down to each sub-agent of every sub-task.
Finally, while answering questions about the new AI products major AI companies released this month and notable embodied-intelligence products, it also successfully completed the main oil-price thread, giving us a detailed report that even mentioned the recent news that Japanese potato chip packaging is going black and white.

After this round of testing, did you notice that Mavis's orchestration strategy actually resembles the "Three Departments and Six Ministries" skill that went viral a while back?
What each role does, when it starts, and when it hands over is determined by a state machine at the engine level, rather than by the black box of the model making its own decisions.
In short, this means using engineering-level controllability, rigor, and determinism in multi-agent work orchestration to fundamentally address the uncontrollability and randomness of the model.
This approach squarely addresses the classic problem of past agents/models "acting as both referee and player".
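One way to picture engine-level orchestration is a plain finite-state machine whose legal transitions live in a fixed table rather than in the model's output. The states and transitions below are illustrative guesses, not MiniMax's actual set.

```python
# Illustrative engine-level state machine: the transition table, not the
# model, decides what an agent may do next. States here are hypothetical.

ALLOWED: dict[str, set[str]] = {       # state -> states the engine may move to
    "waiting":   {"executing"},
    "executing": {"verifying"},
    "verifying": {"done", "executing"},  # verification failure restarts the worker
}

class AgentFSM:
    def __init__(self) -> None:
        self.state = "waiting"

    def advance(self, target: str) -> None:
        if target not in ALLOWED.get(self.state, set()):
            raise ValueError(f"illegal transition {self.state} -> {target}")
        self.state = target

fsm = AgentFSM()
# One verification failure forces a re-run before the task can finish.
for step in ("executing", "verifying", "executing", "verifying", "done"):
    fsm.advance(step)
print(fsm.state)
```

Because any transition outside the table raises an error, the model cannot, for example, declare itself "done" without passing through verification.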

Unified credits, agents aplenty
After testing Mavis, let's talk about another equally important thing MiniMax did that affects all paying users: this time, the Token Plan and Agent Plan have been merged.

After the merger, whether it's for ordinary users' "daily use," such as communicating and using the Agent on the official website and in the app, or accessing the official API to call other tools (such as coding products or OpenClaw/Hermes Agent), a unified plan can now be used. Furthermore, both M2.7 and subsequent flagship models, as well as multimodal models for music, video, and voice, are all included in this single plan.
All credit limits are shared, and users can decide how to spend them. MiniMax also offers a bonus: users who previously subscribed to two plans simultaneously will receive an extra month of membership.
Why do this? From the user's perspective, it's actually quite reasonable.
To put it simply, in the Agent era, users' motivation to pay comes from the demand for "model computing power". As the models improve in coding, agent and multimodal capabilities, the scenarios for these demands will only become more diverse and will naturally occur in model vendors' products (official website, independent products, CLI) as well as outside of products (independently deployed agents that access external APIs).
This is actually a problem that all major AI giants are facing: OpenAI currently separates user subscriptions and API billing, as does Anthropic; as for smaller agent startups, they use their own subscription fees to pay the underlying API fees instead of users paying for them.

This time, MiniMax took the lead in dismantling the internal walls of its product matrix. APPSO believes that in today's highly commoditized market where users always flock to the newest and cheapest model APIs, this unified package strategy actually helps model manufacturers maintain user loyalty.
Let's go back to the product itself.
As mentioned earlier, APPSO is writing an article arguing that "model vendors serious about coding/agents must develop their own coding/agent products." MiniMax arrived late, but not too late.
Today, Mavis is not the first product to bet on a multi-agent architecture. In the past six months, companies such as ChatGPT, Manus, and Genspark have all joined this "multi-agent" war.
After the hands-on test, APPSO's impression was that Mavis performs better, on a more stable architecture, than its competitors at "running an extremely complex, long-horizon task on its own." Where other products' multi-agent approaches stop at prompt orchestration and task splitting, Mavis implements adversarial hard constraints at the engineering level, and the resulting difference is quite significant.
However, while this architecture looks promising, there's an unavoidable reality: it's expensive.

MiniMax introduced the concept of "Cost of Consensus" in its technical blog. In plain terms: having several agents "check and balance" each other makes the process and results more reliable, but reaching consensus has a price. Token consumption runs several times that of a single agent, and, just as in a human argument, a heated exchange can drift off topic, so accuracy may even drop rather than rise.
According to MiniMax's analysis, its Agent Team architecture specifically has three types of costs:
First, there's the handover cost. Information needs to be reorganized when it's transferred between agents. Each handover requires "translating" the information into a form that the next agent can use, which consumes tokens.
Secondly, there's the cost of sharing context. Context isolation is designed to keep this cost in check. But even if each agent only reads the "summaries" passed from the others, as the team grows, storing and distributing those summaries still costs tokens.
Thirdly, there's the cost of aggregation. APPSO has always wanted to stress this point: don't assume that a workflow with hundreds or thousands of skills and an extremely elaborate "Three Departments and Six Ministries" system is the ultimate solution; it often isn't. In fact, you might be walking into a trap set by token vendors… You may have made the work more fine-grained, but you also need to spend more tokens to aggregate and organize the final results.
These costs combined mean that having multiple agents is never a simple matter of "the more agents the better".
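The three costs above can be sketched as a back-of-the-envelope model. All coefficients below are invented for illustration; the blog post gives no concrete numbers.

```python
# Toy model of the three consensus costs: handover, sharing, aggregation.
# All token coefficients are made-up illustrative values.

def consensus_overhead(n_agents: int, handover_tok: int = 500,
                       summary_tok: int = 200, agg_tok: int = 300) -> int:
    """Extra tokens, beyond single-agent work, for a team of n agents."""
    handovers = handover_tok * (n_agents - 1)          # repackaging at each handoff
    sharing = summary_tok * n_agents * (n_agents - 1)  # each summary fanned out to peers
    aggregation = agg_tok * n_agents                   # merging the final results
    return handovers + sharing + aggregation

for n in (1, 3, 9):
    # the sharing term grows quadratically with team size
    print(n, consensus_overhead(n))
```

The quadratic sharing term is the arithmetic behind "more agents is never automatically better": doubling the team roughly quadruples the summary traffic.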
However, from another perspective: the more complex the information exchange in a task, the higher its inherent value often is. A thorough research report requiring multiple verifications and repeated checks, and a casual question, shouldn't be measured by the same logic. Mavis is expensive because of its meticulousness, and those meticulously handled tasks are worth the price.
High-value users behind complex tasks would rather spend more to make sure everything is done properly than settle for a shoddy job.
Of course, the MiniMax team also did some engineering design to avoid token waste caused by program redundancy.
MiniMax's advice to users is that Agent Teams are for "expensive and complex" tasks; they are a strategic option, not the default. Users should assess the task's complexity, workflow length, risk, and the value of experience reuse—the higher these factors are, the more worthwhile it is to use Agent Teams. Conversely, a single agent or even a regular chat can be used.

Does having more agents necessarily mean more intelligence? Not at all. But the significance of Mavis is that it allows truly complex, knowledge-intensive tasks to be handled by a proven engineering system with adversarial mechanisms, verification, clear division of responsibilities, and reward/punishment systems, instead of letting the model make decisions on its own.
It may not necessarily make AI smarter, but it will definitely make it harder for AI to slack off—which is a long-standing problem for large models themselves.
After all, in real interpersonal work, we don't really need our colleagues to be very smart… just don't be lazy or try to be clever, that's often enough, isn't it?
By Du Chen and Zhang Zihao








































































