
At the beginning of the year, the Mac Mini was out of stock, with waiting times reaching as long as a month and a half.
Everyone knows the Mac mini is a great product. With competitive pricing through domestic channels and the excellent performance of the M chip, the entry-level configuration can be had for under 3,000 RMB, making it a perfect main machine for creative beginners.
However, the recent surge in popularity of the Mac mini has little to do with creative work or everyday use.
Those who follow tech news should know what's going on: OpenClaw (formerly known as Clawdbot) has suddenly become popular.
OpenClaw offers multiple deployment options: you can install it on your own computer or dedicate a separate machine to it, or deploy it in a cloud virtual machine or sandbox. Later, some mainstream AI services launched one-click cloud deployments as well, significantly lowering the barrier to entry for novice users.
However, in the early stages, the most common deployment option was to buy a single Mac mini.

The reason is certainly not that it's cheap. More importantly, for OpenClaw to be meaningful, it needs a "physical body" so that it can access files and operate software.
A cloud server can run OpenClaw, but it's still not your computer. It doesn't have your files, software, or the various accounts logged into your browser, and there's no so-called "context." A Mac mini can sit on your desk 24/7 without needing to be turned off, and you don't even need a separate monitor if you can remotely control it via a chatbot.
The only significant cost of using OpenClaw on your own computer is the token fee for the large model API accessed on the backend; many early adopters have suffered losses because of this. However, if you buy a high-spec Mac mini and download a sufficiently large model to run locally, it's practically like getting free labor, aside from electricity and internet costs…

A MacBook is fine too, but…
According to reports from Tom's Hardware and TechRadar, after OpenClaw gained popularity, the waiting time for the 24GB and 32GB Mac mini configurations has increased to between 6 days and 6 weeks; the delivery time for the more powerful Mac Studio has also increased from two weeks to nearly two months.
These waiting times are the votes cast by early OpenClaw players using real purchases.
(Note: The shortage of some models is also related to Apple's recent launch of new Mac desktop computers. In the past, older models would sell out as the new model was about to be released. The popularity of OpenClaw is not the only reason.)
As if by some strange twist of fate, the Mac has become the top choice for an "AI PC" in 2026; the Windows PC industry, meanwhile, which has been touting the "AI PC" for several years, has not benefited at all.
Chipmakers like Intel, AMD, and Qualcomm, along with mainstream PC brands, have been marketing the concept of "AI PCs" since 2023. Many of these latest Windows computers are certified Copilot+ PCs, boasting impressive GPU and NPU performance, and some are even significantly cheaper than equivalent Macs.
But the question is, why are people still flocking to Macs?

Why a Mac?
The debate over whether Windows PCs or Macs are better will never have a definitive answer. When it comes to AI development, however, the Mac has quietly become the default choice.
While the "brain" of the large model resides on cloud servers, the developers' hands are on Macs. This has little to do with the Mac's form factor or user experience: the key is that macOS has UNIX roots.
The core functions of an AI Agent include manipulating files, invoking command-line tools, calling APIs, and even controlling graphical interfaces. Put more simply, the Agent is an intelligent, automated "script engineer", except that the scripts are generated in real time by a large language model. macOS, being a UNIX-like system, has excellent native support for bash and zsh.
This solves the most basic environment setup problem in AI development. On Windows, you might need to install a WSL2 virtual machine first. But on Mac, everything from the Python environment to the complex C++ compilation toolchain is basically ready to use out of the box. Package managers like Homebrew make installing various tools and dependencies a simple matter of a single command.
Additionally, macOS complies with the POSIX standard, offering slightly higher reliability when handling file paths, multi-threaded tasks, and network protocols. Agents often need to frequently read and write data and call APIs; efficient system-level scheduling allows agents to operate at a faster pace on a Mac.
This native feel and stability allow developers and early adopters to get started more quickly and spend more time on actual agent orchestration.
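The "script engineer" loop described above can be sketched in a few lines of Python. Everything below is a hypothetical illustration: the hard-coded `plan` stands in for a command that a real agent would have the language model generate at runtime.

```python
import subprocess

def run_tool(command: list[str]) -> str:
    """Execute one shell tool call on behalf of the agent and
    return its output, the way an agent would inspect a result."""
    result = subprocess.run(command, capture_output=True, text=True, timeout=30)
    if result.returncode != 0:
        raise RuntimeError(result.stderr.strip())
    return result.stdout.strip()

# One hypothetical step of the agent loop: in practice the model
# proposes the command; here it is hard-coded for illustration.
plan = ["echo", "renamed 3 files in ~/Documents"]
print(run_tool(plan))
```

On a UNIX-like system this works out of the box because the shell tools the agent wants (`echo`, `grep`, `git`, and so on) are already in the environment; that is the friction point the next paragraphs describe on Windows.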

Windows has WSL and PowerShell, which cover most of the same functionality. However, WSL is a compatibility layer built on top of Windows, and it inherits legacy friction around path conventions, the registry, and the permission model. AI models and agent projects running on Windows therefore do hit more snags.
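The path-convention friction is easy to demonstrate with Python's standard library alone. This toy comparison (not tied to any particular agent framework; the paths are made up) shows how the same file location is spelled differently on each platform, and how WSL remounts Windows drives:

```python
from pathlib import PureWindowsPath, PurePosixPath

# The same logical location, spelled in each platform's convention.
win = PureWindowsPath(r"C:\Users\me\project\data.csv")
posix = PurePosixPath("/home/me/project/data.csv")

print(win.as_posix())   # forward-slash form of the Windows path
print(win.drive)        # 'C:' -- drive letters have no POSIX equivalent
print(posix.drive)      # ''   -- POSIX paths have no drive component

# Inside WSL, Windows drives are mounted under /mnt, so a command
# generated for one convention breaks under the other.
wsl_style = "/mnt/c" + win.as_posix()[len(win.drive):]
print(wsl_style)
```

A shell command that a model generates with one convention in mind has to be translated before it runs under the other; on macOS there is only ever one convention.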
Take Ollama and LM Studio as examples: these two tools make on-device inference of large models as simple as "download, install, run". The Windows version of Ollama shipped six months later than the macOS version; and although LM Studio has supported both platforms from the start, the Mac version has always had the better reputation in the community. The same is true of OpenClaw.
Delving deeper into the hardware level, memory is the lifeblood of reasoning and execution in large language models.
Taking OpenClaw as an example again: users can pay per token to access cloud models, but its real strength lies in driving model inference on the client side. The general community finding is that for OpenClaw to work like a person of normal intelligence, the backend model needs at least around 7 billion parameters, and it often takes at least 32 billion to work reliably.
Even after 4-bit quantization, a model that size still requires approximately 20GB of memory: the weights alone occupy about 16GB, with the rest reserved for the context window.
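That back-of-the-envelope sizing is easy to reproduce. The sketch below follows the rule of thumb that one billion parameters at 8 bits takes about 1 GB; the 20% overhead allowance for KV cache and runtime is a rough assumption, not a measurement:

```python
def model_memory_gb(params_billions: float, bits_per_weight: int,
                    overhead_ratio: float = 0.2) -> float:
    """Rough memory estimate: weights at the quantized precision,
    plus a fractional allowance for KV cache / runtime overhead."""
    weights_gb = params_billions * bits_per_weight / 8  # 1B params @ 8 bits ~ 1 GB
    return weights_gb * (1 + overhead_ratio)

# 32B parameters at 4-bit: 16 GB of weights, roughly 20 GB with headroom.
print(round(model_memory_gb(32, 4), 1))
```

The same formula explains why a 7B model at 4-bit fits comfortably in 8GB of memory while 32B pushes past what most consumer GPUs hold.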
At this point, the architecture of Windows PCs becomes inadequate. Physical isolation exists between CPU memory and video memory, and data is transferred via the PCIe bus, making it susceptible to bandwidth bottlenecks. Frequent data transfers can impact the speed of the inference process.
Not to mention, large models generally rely on GPUs for accelerated inference, which means enough video memory to hold them. Among NVIDIA's consumer-grade cards, only the 24GB models (the 90 series) meet the requirement; building a complete system around one costs at least 10,000 RMB with a second-hand card, and soars to 40,000 to 50,000 RMB with a new one.

Apple's Unified Memory Architecture allows Macs with M-series chips to handle larger-scale models with ease when performing inference on the device.
In simple terms, the effect of a unified memory architecture is that the CPU, GPU, and neural computing engine can share the same memory pool, eliminating the overhead of physical bus transfers. This allows Macs to achieve extremely high memory bandwidth and provides better performance for multi-machine interconnection.
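Why bandwidth matters so much for inference: during decoding, each generated token must stream essentially the full set of model weights through memory once, so memory bandwidth puts a hard ceiling on token throughput. A rough estimate, where both figures are illustrative assumptions rather than benchmarks (roughly 270 GB/s for an M4 Pro-class chip, 16 GB for 32B weights at 4-bit):

```python
def decode_tokens_per_sec(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Bandwidth-bound upper limit: each decoded token reads the full weights once."""
    return bandwidth_gb_s / weights_gb

# Hypothetical figures: ~270 GB/s of unified memory bandwidth,
# 16 GB of quantized weights -> a ceiling of about 17 tokens/s.
print(round(decode_tokens_per_sec(270, 16), 1))
```

Real throughput lands below this ceiling, but the estimate shows why a unified pool with high bandwidth beats shuttling weights across a PCIe bus.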
Take the Mac mini as an example: with the higher-performance M4 Pro chip, 48GB of memory, and base options everywhere else, the machine totals around 13,000 RMB, which reaches the configuration level for the 32-billion-parameter models generally recommended by the OpenClaw community.
Of course, that is a professional configuration, needed only if you require high token throughput. If you are an enthusiast who just wants to try OpenClaw, a standard M4 chip with 32GB of RAM will run it.

Of course, this cost comparison is based on the premise that it's dedicated to edge inference/running OpenClaw, rather than being used as a primary machine. A similarly priced Windows PC can also be used for gaming and video editing, offering greater versatility.
Furthermore, the Mac's unified memory and the dedicated VRAM of a PC graphics card are not the same thing. Unified memory is shared between the system and the model: even on a Mac mini with 32GB of RAM, macOS and other software still claim several gigabytes. An RTX 3090's dedicated VRAM, by contrast, can be given over entirely to the model, and larger quantized models can even be run by offloading layers to the CPU and system RAM.
If you only use the cloud API as the core of OpenClaw and don't consider edge deployment, then the ease of use of Mac still holds an advantage.
In addition, although CUDA provides a unified memory programming interface, the CPU memory and GPU memory are still physically separate, and data transfer and bandwidth bottlenecks have not been eliminated.
Next, let's look at power consumption.
The agent operates in a continuous loop: task triggering, reasoning, execution, waiting, and then triggering again. A Windows PC with the aforementioned configuration would run at around 300-400W (local deployment), and the heat dissipation, noise, and electricity costs are not insignificant.
The Mac mini typically has a stable power consumption of around 10-40W, with a peak power of 65W (M4) or 155W (M4 Pro). Its heat dissipation is controllable, with almost no fan noise, resulting in quieter operation. This low-latency, low-power continuous operation creates a subtle difference in user experience.
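The running-cost gap compounds over a year of 24/7 standby. A quick calculation, where the electricity price and the average wattages are placeholder assumptions drawn from the ranges above:

```python
def yearly_kwh(avg_watts: float) -> float:
    """Energy used by a machine left on 24/7 for one year."""
    return avg_watts * 24 * 365 / 1000

PRICE_PER_KWH = 0.6  # hypothetical electricity price, RMB per kWh

for name, watts in [("Windows tower (local LLM)", 350), ("Mac mini", 30)]:
    kwh = yearly_kwh(watts)
    print(f"{name}: {kwh:.0f} kWh/year, ~{kwh * PRICE_PER_KWH:.0f} RMB")
```

At these assumed figures the tower burns roughly ten times the electricity of the Mac mini, before counting heat and noise.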

A Mac mini shell kit 3D printed by a netizen, named "Clawy MacOpenClawface".
Of course, our discussion here centers on OpenClaw, a scenario dominated by inference. If your work involves local fine-tuning and you care about efficiency, then on the macOS side you'll often need a Mac Studio, or at least a top-spec MacBook Pro, just to get started.
At the same time, the fact that Macs don't support CUDA is something that may never change. However, CUDA's real battleground is model training; inference scenarios rely on it much less, since Apple has MLX as its trump card for inference (which will be discussed in detail later).
Returning to OpenClaw: its creator, Peter Steinberger, has publicly stated that he prefers Windows and finds it more powerful. In the Lex Fridman podcast, he said that the Mac mini is not the only "physical" option, and that running OpenClaw via WSL2 is already very mature; he even publicly criticized Apple for "messing up" in the field of AI and expressed dissatisfaction with the closed nature of Apple's ecosystem.

Objectively speaking, for users with limited technical skills, the Mac mini is indeed the most worry-free, easiest-to-use deployment option. The main reasons are its low power consumption, quiet operation, and small size: it works like a "server node" you can tuck into a corner, on standby 24 hours a day, with no maintenance required.
Another example related to power consumption: a few days ago, an engineer named Manjeet Singh successfully reverse engineered the Apple Neural Engine (ANE) on the M4 and found it to be extremely power-efficient, reaching as high as 6.6 TOPS/W when its compute is fully utilized.
By comparison, Apple's M4 GPU manages roughly 1 TOPS/W, Nvidia's H100 about 0.13 TOPS/W, and the A100 about 0.08 TOPS/W.
To put that in perspective, a single A100 has 50 times the throughput of the M4 ANE, but it draws 80 times the power. The original author wrote: "For edge inference, the performance of the ANE is outstanding."

Let's start with the neural engine
In 2011, Apple first implemented real-time face detection and other functions later regarded as AI tasks by hard-wiring them into the image signal processor (ISP) of the A5.
In 2014, Apple acquired PrimeSense and began developing a new coprocessor specifically for neural network computing. This work was realized three years later in the iPhone X: the A11 Bionic processor incorporated the aforementioned Neural Engine (ANE), with a computing power of only 0.6 TOPS, to drive Face ID and Portrait mode.
At that time, AI hadn't yet reached the era of large-scale models; it mainly relied on various machine learning algorithms. The market didn't react much to Apple's launch of this coprocessor. But Apple never gave up and continued to invest heavily.
Three years later, the M1 was released, along with a unified memory architecture, and ANE was also introduced to the Mac. The more ample power budget for desktop platforms allowed ANE's computing power to jump to 11 TOPS. Subsequent generations saw further improvements: M2 at 15.8 TOPS, M3 at 18 TOPS, M4 at 38 TOPS, and by the end of 2025, M5 had reached 57 TOPS. From M1 to M5, Apple's ANE computing power increased more than fivefold.

Other PC manufacturers can't help but envy the logic behind this growth. Before Apple added AI acceleration hardware to Macs, tens of millions, even hundreds of millions, of iPhones were already running the same ANE architecture. Power consumption performance, stability, and edge cases under extreme conditions had already been verified on commercially available models, and then transferred to Macs.
Intel and AMD have virtually no consumer-grade presence in the mobile market; while Qualcomm has also put Snapdragon chips into hundreds of millions of Android phones, it is merely a chip supplier. AI on Android is developed by Google (Gemini) and major phone manufacturers in collaboration with third-party AI labs; Windows AI (Copilot) is developed by Microsoft.
Apple's difference lies in its vertical integration, controlling both hardware and software. Other chip manufacturers do not have this unified control.
Of course, running large language model inference on a Mac has little to do with the ANE, which is better suited to fixed-pattern AI tasks such as Face ID and image recognition. The GPU handles the bulk of the computation.
(Note: the situation has recently shifted slightly. First, the ANE on M-series chips now handles the prompt prefill stage; second, in the M4 ANE reverse engineering mentioned earlier, the engineer also implemented a way to bypass CoreML and call the ANE directly, significantly improving throughput. Following this line of thought, it might be possible to find a general method of using the ANE to accelerate inference, and even training.)
In late 2023, Apple open-sourced MLX, giving developers a model inference framework specifically optimized for M-series chips. Last year, the Foundation Models framework shipped alongside Apple Intelligence, allowing app developers to call the system's built-in foundation models on iPhone and Mac with no internet connection and no data leaving the device.
Apple's repeated delays on AI are undeniable. But it is equally undeniable that Apple began experimenting with AI as early as a decade ago, laying the foundation for desktop AI development many years in advance.
On the Windows side, the term "AI PC" didn't start appearing in press releases and presentations from Intel, AMD, and PC manufacturers until late 2023.

Screenshot from AMD's official website in 2023
In May 2024, Microsoft released the Copilot+ PC certification system, with its flagship feature called "Recall". The basic logic is that the system continuously takes screenshots of the screen content, and then Windows' system-level AI can help you recall what you have seen in the past.
Whatever the actual value of the feature at launch, what surfaced first were serious security problems: just one month after release, researchers discovered that Recall stored all its screenshots in an unencrypted plaintext database on the local disk.
Microsoft abruptly pulled the feature. Six months later it shipped a beta again, only to delay it once more over new security issues. Recall finally launched officially in April 2025, now disabled by default and with its data stored encrypted when enabled.

From initial announcement to actual usability took nearly a year. It's fair to say the flagship feature of the entire Windows AI PC push went through a complete redesign, a process no less awkward than the repeated delays of Apple Intelligence and the new Siri. Yet perhaps because the Windows ecosystem's AI story carries so little mindshare, few people paid attention to AI PCs, and many have never even heard of the feature.
As for the Copilot+ PC certification standard, Microsoft primarily targets the Neural Processing Unit (NPU), requiring 40 TOPS. That compute, however, is used for narrow consumer-facing tasks such as real-time captioning, background blurring, and photo enhancement; large language model inference was never within its scope (much like Apple's ANE).
When developers attempt to perform large-scale language model inference on the device, they find that although these computers are called AI PCs, they are not optimized for AI inference purposes. Microsoft Copilot's core computing power comes from the Azure cloud, and is almost unrelated to the computing power on the device itself. For users who have purchased a Windows AI PC, the most noticeable AI improvement is probably real-time captioning and automatic photo classification.

When it comes to edge inference, there is another key factor: the optimization paths in the Windows AI ecosystem are fragmented.
NVIDIA GPUs use CUDA and TensorRT, Intel NPUs use OpenVINO, Qualcomm NPUs use the QNN SDK, and AMD NPUs use their own driver stack. Model storage formats are just as fragmented, with a general format for CPU+GPU inference (GGUF, more precisely CPU inference with layer-by-layer GPU offloading) and a GPU-only format (EXL2).
This means that running models, and model-driven features, on Windows AI PCs is more complex at the inference-backend level. Microsoft has ONNX Runtime and DirectML (the latter currently in a state of flux) as a unified abstraction layer, but the price of unification is sacrificing each vendor's peak performance. Apple is currently the only PC vendor that has built, and continuously maintains, an LLM inference framework specifically for its own hardware: that framework is MLX.
On open-source model platforms like Hugging Face, you can easily find a large number of models that use the MLX framework. As long as they have the MLX suffix and your memory/processor allows, they can be used "out of the box".
However, the recent departure from Apple of Awni Hannun, one of MLX's key contributors, has introduced some uncertainty about the project's future; Hannun himself said the MLX team retains many excellent engineers and that there is no need to worry.

Our own experience
Over the past year, iFanr has conducted numerous tests on deploying AI models on edge devices and has also interviewed some external developers. Two instances are worth mentioning.
Last Chinese New Year, DeepSeek burst onto the scene, and the new Mac Studio arrived shortly after. We ran the DeepSeek R1 671B model and its distilled 70B version on an M3 Ultra Mac Studio (512GB + 16TB) priced at nearly 100,000 RMB (note: in reality only the memory matters; the drive doesn't need to be that large, and the 1TB SSD model at just over 70,000 RMB would suffice).
Our conclusion at the time was that the 70B model was sufficient for everyday on-device conversation, and that spending tens of thousands of RMB on a machine just to chat with AI was simply a waste of money. Model capabilities back then were indeed mediocre; the new multimodal models and agent capabilities only emerged later.
Still, running inference on a 671B-parameter model on a desktop machine is a remarkable feat. In 512GB of unified memory, the 671B model occupied 400GB; with the context, macOS itself, and other tasks, the machine was nearly at full load, yet it ran quietly throughout, with noise within the normal range and no overheating.
In traditional AI infrastructure logic, this scale of parameters falls under the data center level, and consumer-grade hardware shouldn't theoretically appear in this scenario. But that M3 Ultra Mac Studio actually appeared quietly nonetheless.
Later, we interviewed Exo Labs, a startup team out of Oxford University in the UK. They linked four Mac Studios, each with 512GB of unified memory, into a computing cluster with 128 CPU cores, 320 GPU cores, 2TB of unified memory, and total memory bandwidth above 3TB/s.
The team developed the Exo V2 scheduling platform for this Mac cluster, which can load two DeepSeek models (V3+R1, 8-bit quantization) simultaneously. Not only can the two models infer in parallel, but researchers can also use QLoRA technology to perform local fine-tuning, significantly reducing the training time. The entire system's power consumption is kept below 400W, and there is virtually no fan noise during operation.
The traditional solution with equivalent computing power would require about 20 NVIDIA A100s, costing more than 2 million RMB at the time; in contrast, the total cost of Exo Labs' solution was only 400,000 RMB (similarly, the SSD was significantly overkill, so it could actually be under 300,000 RMB).
The founder of Exo Labs told us at the time that Oxford had its own GPU cluster, but applications required waiting in line for months, and only one card could be applied for at a time. These constraints forced them to innovate, and they happened to find the right tools: a unified memory architecture, MLX, and Mac computers.

In our article at the time, we wrote: "If Nvidia's H-series graphics cards are the pinnacle of AI development, then Mac Studio is becoming the Swiss Army knife in the hands of small and medium-sized teams."
Apple actually knew about this a long time ago.

What is a true AI PC?
Last year, Apple released the Foundation Models framework, which lets iOS and macOS developers call the system's built-in foundation models with zero network latency, zero API fees, and no data leaving the device.
Although Apple's foundation models team nearly disintegrated later on, Apple didn't stop iterating. It has always known where developers are and what they want. Its answer was to fold large-model-driven AI capabilities into the operating system's infrastructure, making them easier for developers to use.
Last week, Apple open-sourced the python-apple-fm-sdk. Previously, fully testing and tuning Apple's foundation models required a Swift environment; this SDK widens the path, letting developers used to Python workflows participate as well.

Apple's privacy design philosophy is consistent throughout: the underlying models called by the python-apple-fm-sdk run entirely locally, and the data never leaves the device. In scenarios where Apple's entire AI system must be deployed to the cloud, it uses Private Cloud Compute, where data is processed and then deleted, and Apple has no access to it.
Recall, by contrast, also gives AI access to users' private data, but its first version stored that data in an unencrypted plaintext database. One approach prevents leaks by architecture; the other patches things up only after an incident.
However, the advantage of Mac as an AI development and deployment tool is more like an "adaptability advantage," or something that was acquired unexpectedly.
This means that Apple initially developed the Neural Engine to serve Face ID and Portrait mode; the unified memory architecture was a necessary step to break free from its long-standing dependence on Intel; and the open-sourcing of MLX was a response to developers' demand for efficient inference tools. The explosion of AI Agent scenarios, which the Mac happened to be able to capitalize on, was an unexpected benefit of these and many other unmentioned engineering decisions.
The Mac wasn't initially designed for AI; its product positioning has always been closer to that of a "creator's tool." Apple's long-term target users have been video editors, artists, and software engineers. They need machines with low noise, sustained performance, large memory capacity, and the ability to run around the clock.
AI model inference and the currently popular Agent deployment just happen to require the exact same thing.

Looking back, when Apple invested heavily in machine learning more than a decade ago, it almost certainly couldn't have foreseen the explosive popularity of OpenClaw in 2025. You could even argue that the Apple of ten years ago probably wouldn't have liked OpenClaw: a platform promising "high returns and even bigger opportunities" while users' privacy and data security are disregarded and software engineering norms go out the window once the hype takes hold…
But how to put it? Even if Apple doesn't like it now, it has no choice. Like Murphy's Law, perhaps some things were destined from the start. Every card Apple has played over the years, whether intentional or accidental, has become a winning hand in this year's Agent Year (hopefully this time it really is).
The Windows camp, which began pushing AI PCs in 2023, has actually been trying to catch up with the architectural advantage that Apple established when it launched the M1 in 2020. Of course, given the constant bad news Apple has faced regarding AI in 2025, it's possible to close this gap. But Apple won't stop and wait.
This week, Apple launched the M5 Pro and M5 Max, chips built on a dual-die fusion architecture, and specifically named LM Studio as an LLM performance benchmark in its press release.
In the past, Apple didn't talk much about "large language models" in its hardware product launches, especially in the context of on-device inference—but things are different now.

In conclusion
We've spent the whole piece raving about Apple; now let's calm down and ask the question in the title: is today's Mac a true AI PC?
iFanr believes that Apple hasn't done enough. To date, we haven't seen a personal computing product that can be called an AI PC, or truly "native AI hardware."
Returning to OpenClaw, the true form of an AI PC is already becoming clear from today's edge-deployed agents.

Meme, AI generated
At the application level, the concept of "applications" geared towards humans may partially regress to a state without graphical interfaces. After all, humans need graphical interfaces, while agents do not. Moreover, you'll find that more and more people are recently becoming accustomed to interaction methods based on dialogue and command lines.
Today, early adopters of agents are finding tools and skills to equip them with; in the future, agents will themselves be pulling new tools and plugins from public code repositories to enhance themselves.
At the system level, the permission model will be restructured around agents, letting them operate various interfaces directly. Underneath, a model orchestration and scheduling layer will switch between models as the task demands.
Local inference and privacy-preserving cloud inference will form a complete, secure, and privacy-preserving closed loop. Regardless of where the data is transmitted, it is vectorized, encrypted, and stored, and is destroyed immediately upon use…
In other words, a true AI PC should be a system that treats AI as a "first-class citizen" from the very beginning of its design, starting from the ground up.

Meme, AI generated
By this standard, both Mac and Windows are currently in a transitional phase. Mac is closer because the Unix environment, unified hardware, and mature ecosystem were already in place before the era of AI agents arrived. Windows carries a heavier historical baggage, making changes more difficult, and it's still catching up.
But after going around in circles, we still haven't gotten to the most fundamental question: Does a true AI PC really need to be a "PC"?
Shift the perspective: all agent deployment and operation happens in the cloud; the user's data, the "context", is also stored securely and privately in the cloud; a human needs only a terminal device as a "communicator", with sensors to capture photos and audio and upload what the agent needs. Such a device barely needs any local computing power at all.
Mac is the best AI PC today, but the "AI PC" of the future may be more like… iPhone?
By Du Chen
#Welcome to follow iFanr's official WeChat account: iFanr (WeChat ID: ifanr), where more exciting content will be presented to you as soon as possible.