MiniMax-M2.5
MiniMax-M2.5 is an MoE model (230B total parameters) extensively trained with reinforcement learning in hundreds of thousands of real-world environments, delivering SOTA results in coding, agentic tool use, search, and office work.
Qwen3.5
Qwen3.5 represents a significant leap forward, integrating breakthroughs in multimodal learning, architectural efficiency, reinforcement learning scale, and global accessibility. It has 35B total parameters with 3B activated, and supports a native context length of 262,144 tokens.
Qwen Coder Next
Qwen Coder Next is an 80B MoE model with 3B active parameters, designed for coding agents and local development. It excels at long-horizon reasoning, complex tool usage, and recovery from execution failures.
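To make the failure-recovery claim concrete, here is a minimal sketch of the tool-call-and-retry loop a coding agent typically runs, written against an OpenAI-compatible chat completions endpoint. The base URL, the model id `qwen-coder-next`, and the `run_shell` tool are placeholders for illustration, not part of the model's actual interface.

```python
import json
import subprocess
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # placeholder endpoint

TOOLS = [{
    "type": "function",
    "function": {
        "name": "run_shell",
        "description": "Run a shell command and return its output.",
        "parameters": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    },
}]

def run_shell(command: str) -> str:
    # On failure, return the exit code and stderr so the model can see
    # what broke and plan a recovery step instead of silently stopping.
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    if result.returncode != 0:
        return f"exit code {result.returncode}\n{result.stderr}"
    return result.stdout

messages = [{"role": "user", "content": "Run the test suite and fix any failures."}]
for _ in range(8):  # cap the number of agent turns
    reply = client.chat.completions.create(
        model="qwen-coder-next",  # placeholder model id
        messages=messages,
        tools=TOOLS,
    ).choices[0].message
    messages.append(reply)
    if not reply.tool_calls:
        break  # final answer rather than another tool call
    for call in reply.tool_calls:
        output = run_shell(json.loads(call.function.arguments)["command"])
        # Feed the result, including any error text, back into the context
        # so the model can recover from the failed execution.
        messages.append({"role": "tool", "tool_call_id": call.id, "content": output})

print(reply.content)
```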
Nemotron 3 Nano by NVIDIA
General-purpose reasoning and chat model trained from scratch by NVIDIA. Contains 30B total parameters with only 3.5B active at a time for low-latency MoE inference.
Features a reasoning toggle to enable or disable intermediate reasoning traces, with improved accuracy on complex queries when reasoning is enabled. Includes native agentic capabilities for tool use, making it suitable for AI agents, RAG systems, chatbots, and other AI-powered applications.
Supports a context length of 1M tokens.
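The reasoning toggle described above is a minimal sketch away: earlier Nemotron releases flipped it via a "detailed thinking on/off" system prompt, and the example below assumes that same convention, which Nemotron 3's actual chat template may not follow. The base URL and model id are also placeholders.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # placeholder endpoint

def ask(question: str, reasoning: bool) -> str:
    response = client.chat.completions.create(
        model="nemotron-3-nano",  # placeholder model id
        messages=[
            # Assumed toggle convention, carried over from prior Nemotron models.
            {"role": "system", "content": f"detailed thinking {'on' if reasoning else 'off'}"},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

# Reasoning on trades latency for accuracy on complex queries;
# off gives shorter, faster answers for simple ones.
print(ask("How many primes are there below 100?", reasoning=True))
print(ask("What is the capital of France?", reasoning=False))
```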
Nemotron 3 Super
General-purpose reasoning and chat model trained by NVIDIA. Contains 120B total parameters with only 12B active at a time, using a hybrid LatentMoE architecture with Multi-Token Prediction layers for efficient high-throughput inference.
Supports multiple languages including English, Spanish, French, German, Japanese, Italian, and Chinese.
Supports a context length of 1M tokens.
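As a rough illustration of the Multi-Token Prediction idea, the toy sketch below adds extra heads trained to predict tokens several steps ahead; at inference, such heads let a model draft multiple tokens per forward pass. This is a generic sketch of the technique, not NVIDIA's LatentMoE implementation, and all names are made up.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MTPHeads(nn.Module):
    def __init__(self, hidden: int, vocab: int, n_future: int = 2):
        super().__init__()
        # One linear head per future offset (1 = next token, 2 = token after, ...).
        self.heads = nn.ModuleList(nn.Linear(hidden, vocab) for _ in range(n_future))

    def loss(self, hidden_states: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq, hidden); tokens: (batch, seq)
        total = 0.0
        for k, head in enumerate(self.heads, start=1):
            logits = head(hidden_states[:, :-k])  # positions that have a target k steps ahead
            targets = tokens[:, k:]               # the token k steps in the future
            total = total + F.cross_entropy(
                logits.reshape(-1, logits.size(-1)), targets.reshape(-1)
            )
        return total / len(self.heads)

# Tiny smoke test with random data.
mtp = MTPHeads(hidden=64, vocab=1000, n_future=2)
h = torch.randn(2, 16, 64)
t = torch.randint(0, 1000, (2, 16))
print(mtp.loss(h, t))
```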