Preface
1. Why Multi-Model Hybrid Scheduling?
1.1 No Silver Bullet: Every Model Has Its Strengths
| Model | Strengths |
| --- | --- |
| Claude 3 Opus | Code generation, complex reasoning |
| GPT-4o | Strong all-rounder; vision tasks |
| DeepSeek V3/V2.5 | Code and reasoning at low cost |
| Doubao Kimi K2 | Everyday chat, strong Chinese, long context (128k+) |
| Llama 3 / Mistral | Local deployment, privacy-sensitive work |
| Gemini | Vision (gemini-pro-vision) |
OpenClaw's design philosophy: give the right task to the right model, rather than forcing users to lock into a single vendor.
1.2 Cost Optimization: Pairing Fast and Slow Models
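A common fast/slow pattern routes routine requests to a cheap, low-latency model and reserves the expensive model for hard ones. A minimal sketch, assuming a trivial keyword/length heuristic; the heuristic, tier names, and model IDs are illustrative, not OpenClaw APIs:

```typescript
// Hypothetical two-tier router: cheap "fast" model for simple requests,
// strong "slow" model for complex ones.
type Tier = 'fast' | 'slow';

function classifyRequest(prompt: string): Tier {
  // Illustrative heuristic: long prompts or code-related keywords need the slow tier.
  const needsPower =
    prompt.length > 2000 || /\b(code|refactor|prove)\b/i.test(prompt);
  return needsPower ? 'slow' : 'fast';
}

const tierToModel: Record<Tier, string> = {
  fast: 'doubao',        // cheap, low-latency chat model
  slow: 'claude-3-opus', // expensive, high-quality model
};

function pickModel(prompt: string): string {
  return tierToModel[classifyRequest(prompt)];
}
```

In practice the classifier itself can be a tiny model call, as long as its cost stays far below the price gap between the two tiers.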
1.3 High Availability: Automatic Failover
2. OpenClaw's Model Abstraction Layer Design
2.1 A Unified Model Interface
```typescript
interface ModelProvider {
  // Model name
  name: string;
  // Supported capabilities
  capabilities: {
    streaming: boolean;
    functionCall: boolean;
    vision: boolean;
  };
  // Core completion method
  complete(params: CompletionParams): Promise<CompletionResult>;
  // Streaming completion
  completeStreaming(params: CompletionParams): AsyncIterable<StreamingChunk>;
  // Count tokens in a piece of text
  countTokens(text: string): Promise<number>;
}
```
2.2 The Model Registration Mechanism
```typescript
// Register models in config.ts
registerModel('deepseek', new DeepSeekProvider({
  apiKey: process.env.DEEPSEEK_API_KEY,
  baseURL: process.env.DEEPSEEK_BASE_URL,
  defaultModel: 'deepseek-chat',
}));

registerModel('qianfan-deepseek', new QianfanProvider({
  ak: process.env.QIANFAN_AK,
  sk: process.env.QIANFAN_SK,
  defaultModel: 'deepseek-v3',
}));

registerModel('doubao', new DoubaoProvider({
  apiKey: process.env.DOUBAO_API_KEY,
  endpoint: process.env.DOUBAO_ENDPOINT,
}));

registerModel('claude', new AnthropicProvider({
  apiKey: process.env.ANTHROPIC_API_KEY,
}));
```
2.3 Model Metadata Management
```yaml
modelMetadata:
  id: deepseek-chat
  provider: deepseek
  contextWindow: 64000
  maxOutput: 4096
  costPer1kTokens:
    input: 0.001
    output: 0.002
  capabilities:
    streaming: true
    functionCall: true
    vision: false
  preferredFor: ['code', 'reasoning']
```
3. The Core Scheduling Algorithm
3.1 Selection Strategy: Four Scheduling Modes
Mode 1: Manual selection. The caller pins a specific model on the request:

```typescript
// Specify the model directly in the request
const response = await openclaw.complete({
  messages: [...],
  model: 'claude-3-opus',
});
```
Mode 2: Task-type routing. Each task category maps to an ordered list of candidate models:

```typescript
// Task type → candidate models, in preference order
const taskTypeMapping = {
  'code': ['deepseek-v3', 'claude-3-opus', 'gpt-4o'],
  'reasoning': ['deepseek-v3', 'claude-3-opus'],
  'chat': ['doubao', 'gpt-3.5-turbo'],
  'vision': ['gpt-4o', 'gemini-pro-vision'],
  'long-context': ['kimi-k2', 'claude-3-sonnet'],
};
```
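Given such a mapping, selection reduces to picking the first candidate that passes a health check. A minimal sketch; `isAvailable` is a hypothetical stand-in for a real availability check:

```typescript
// Pick the first registered/healthy candidate for a task type.
const taskTypeMapping: Record<string, string[]> = {
  code: ['deepseek-v3', 'claude-3-opus', 'gpt-4o'],
  chat: ['doubao', 'gpt-3.5-turbo'],
};

function selectModel(
  taskType: string,
  isAvailable: (id: string) => boolean,
): string | undefined {
  // Unknown task types yield no candidates, so the caller can fall back
  // to a default model.
  return (taskTypeMapping[taskType] ?? []).find(isAvailable);
}
```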
Mode 3: Cost-based scheduling. Each candidate is scored with a weighted sum, and the lowest score wins:

score = priceWeight × price + latencyWeight × latency + loadWeight × currentLoad

Mode 4: Failover. The scheduler walks an ordered fallback list until one model succeeds:

```typescript
const fallbackOrder = ['deepseek-v3', 'doubao', 'claude-sonnet'];

for (const modelId of fallbackOrder) {
  try {
    return await tryModel(modelId, params);
  } catch (err) {
    // Log the failure and move on to the next candidate
    logger.warn(`Model ${modelId} failed, trying next...`);
  }
}
```
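The weighted-cost formula above can be sketched as a scoring function. The field names, weights, and the absence of normalization are illustrative assumptions; a real scheduler would scale price, latency, and load to comparable ranges first:

```typescript
// Illustrative weighted-cost scoring: lower score is better.
interface CandidateStats {
  id: string;
  pricePer1kTokens: number; // USD per 1k tokens
  avgLatencyMs: number;     // recent average latency
  currentLoad: number;      // e.g. in-flight / maxConcurrency, in [0, 1]
}

interface Weights { price: number; latency: number; load: number; }

function score(c: CandidateStats, w: Weights): number {
  return w.price * c.pricePer1kTokens + w.latency * c.avgLatencyMs + w.load * c.currentLoad;
}

function cheapestCandidate(candidates: CandidateStats[], w: Weights): CandidateStats {
  // Reduce to the candidate with the lowest combined score.
  return candidates.reduce((best, c) => (score(c, w) < score(best, w) ? c : best));
}
```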
3.2 Load Balancing
```typescript
interface ModelHealth {
  lastFailure: number;
  failureCount: number;
  isHealthy: boolean;
}

class LoadBalancer {
  private modelHealth: Map<string, ModelHealth> = new Map();

  // Circuit-breaker style health check
  shouldTry(modelId: string): boolean {
    const health = this.modelHealth.get(modelId);
    if (!health) return true;
    // Open the circuit for a while after too many consecutive failures
    if (health.failureCount > 3 && Date.now() - health.lastFailure < 60000) {
      return false;
    }
    return true;
  }

  recordSuccess(modelId: string) {
    // Reset the failure counter
    const health = this.modelHealth.get(modelId);
    if (health) {
      health.failureCount = 0;
      health.isHealthy = true;
    }
  }

  recordFailure(modelId: string) {
    // Update failure statistics
    let health = this.modelHealth.get(modelId);
    if (!health) {
      health = { lastFailure: 0, failureCount: 0, isHealthy: true };
      this.modelHealth.set(modelId, health);
    }
    health.lastFailure = Date.now();
    health.failureCount++;
  }
}
```
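The breaker condition can be restated as a small pure function, which makes the rule easy to test in isolation. A sketch using the same more-than-3-failures threshold and 60-second cooldown as above:

```typescript
// Pure restatement of the circuit-breaker rule: after more than 3
// consecutive failures, skip the model until the cooldown has elapsed.
interface Health { failureCount: number; lastFailure: number; }

function shouldTryModel(
  h: Health | undefined,
  now: number,
  cooldownMs = 60_000,
): boolean {
  if (!h) return true; // no history yet: assume healthy
  return !(h.failureCount > 3 && now - h.lastFailure < cooldownMs);
}
```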
3.3 Concurrency Control and Queuing
```typescript
class ConcurrencyLimiter {
  private concurrentRequests = 0;
  private queue: Deferred<void>[] = [];

  constructor(private readonly maxConcurrency: number) {}

  async acquire(): Promise<void> {
    if (this.concurrentRequests < this.maxConcurrency) {
      this.concurrentRequests++;
      return;
    }
    // Wait in the queue for a free slot
    const deferred = new Deferred<void>();
    this.queue.push(deferred);
    await deferred.promise;
    this.concurrentRequests++;
  }

  release(): void {
    this.concurrentRequests--;
    if (this.queue.length > 0) {
      const next = this.queue.shift()!;
      next.resolve();
    }
  }
}
```
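Callers should pair `acquire()` and `release()` in `try`/`finally` so a thrown error never leaks a slot. A self-contained sketch with the same acquire/release shape; the `Semaphore` class here is illustrative, not OpenClaw's implementation:

```typescript
// Minimal promise-based semaphore plus a helper that guarantees release.
class Semaphore {
  private inFlight = 0;
  private waiters: Array<() => void> = [];

  constructor(private readonly max: number) {}

  async acquire(): Promise<void> {
    if (this.inFlight < this.max) { this.inFlight++; return; }
    // Park until a slot is released
    await new Promise<void>((resolve) => this.waiters.push(resolve));
    this.inFlight++;
  }

  release(): void {
    this.inFlight--;
    this.waiters.shift()?.(); // wake the next waiter, if any
  }

  // Run fn under the semaphore, releasing even if fn throws.
  async run<T>(fn: () => Promise<T>): Promise<T> {
    await this.acquire();
    try {
      return await fn();
    } finally {
      this.release();
    }
  }
}
```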
4. Session-Level Model Overrides
4.1 Specifying a Model for the Session
```
/openclaw set-model deepseek-v3
```

4.2 Per-Request Overrides
```typescript
// The coding-agent always uses a strong code model
const result = await agent.complete({
  messages: codePrompt,
  model: 'claude-3-opus', // force Opus to guarantee code quality
});
```
4.3 Default Models in the Config File
```json
{
  "model": {
    "defaultModel": "deepseek-v3",
    "fallbackModel": "doubao",
    "overrides": {
      "coding-agent": "claude-3-opus",
      "summarize": "kimi-k2",
      "weather": "doubao"
    }
  }
}
```

5. In Practice: Configuring Multi-Model Scheduling
5.1 Minimal Configuration (Single Model)
```json
{
  "model": {
    "defaultProvider": "deepseek",
    "defaultModel": "deepseek-chat",
    "providers": {
      "deepseek": {
        "apiKey": "YOUR_API_KEY",
        "baseURL": "https://api.deepseek.com/v1"
      }
    }
  }
}
```

5.2 Production Configuration (Multiple Models + Failover)
```json
{
  "model": {
    "defaultModel": "deepseek-v3",
    "fallbackOrder": ["deepseek-v3", "doubao-kimi", "claude-3-sonnet"],
    "enableAutoFallback": true,
    "providers": {
      "deepseek": { "apiKey": "YOUR_DEEPSEEK_KEY", "maxConcurrency": 5 },
      "doubao": { "apiKey": "YOUR_DOUBAO_KEY", "endpoint": "YOUR_ENDPOINT", "maxConcurrency": 10 },
      "anthropic": { "apiKey": "YOUR_ANTHROPIC_KEY", "maxConcurrency": 3 }
    },
    "skillOverrides": {
      "coding-agent": "claude-3-opus",
      "summarize": "doubao-kimi-k2",
      "weather": "doubao"
    }
  }
}
```

5.3 Cost-Optimized Configuration
```json
{
  "model": {
    "schedulingStrategy": "cost-optimized",
    "maxCostPerRequest": 0.05,
    "preferCheaper": true,
    "providers": {
      "local-llama": {
        "baseURL": "http://localhost:8000/v1",
        "costPer1kInput": 0,
        "costPer1kOutput": 0
      },
      "deepseek": {
        "apiKey": "...",
        "costPer1kInput": 0.001,
        "costPer1kOutput": 0.002
      }
    }
  }
}
```

6. Handling Streaming Output
```typescript
// OpenClaw's unified streaming chunk format
interface StreamingChunk {
  text: string;          // incremental text
  done: boolean;         // whether the stream has finished
  usage?: {              // token usage (carried by the final chunk)
    promptTokens: number;
    completionTokens: number;
    totalTokens: number;
  };
  toolCall?: ToolCall;   // tool-call information
}

// Adapter examples for different vendors
class OpenAICompatibleAdapter implements ModelProvider {
  async *completeStreaming(params: CompletionParams): AsyncIterable<StreamingChunk> {
    const response = await this.openai.chat.completions.create({
      ...params,
      stream: true,
    });
    for await (const chunk of response) {
      yield {
        text: chunk.choices[0]?.delta?.content || '',
        done: chunk.choices[0]?.finish_reason === 'stop',
      };
    }
  }
}

class AnthropicAdapter implements ModelProvider {
  async *completeStreaming(params: CompletionParams): AsyncIterable<StreamingChunk> {
    const response = await this.anthropic.messages.create({
      ...params,
      stream: true,
    });
    for await (const event of response) {
      if (event.type === 'content_block_delta') {
        yield {
          text: event.delta.text,
          done: false,
        };
      } else if (event.type === 'message_stop') {
        yield {
          text: '',
          done: true,
          usage: event.message.usage,
        };
      }
    }
  }
}
```
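Because every adapter emits the same chunk shape, consumers can drain any vendor's stream with one loop. A self-contained sketch, restating the chunk interface and using a fake generator in place of a real adapter:

```typescript
// Vendor-agnostic stream consumption over the unified chunk format.
interface StreamingChunk {
  text: string;
  done: boolean;
  usage?: { promptTokens: number; completionTokens: number; totalTokens: number };
}

// Fake stream standing in for a real adapter's completeStreaming().
async function* fakeStream(): AsyncIterable<StreamingChunk> {
  yield { text: 'Hello, ', done: false };
  yield { text: 'world', done: false };
  yield {
    text: '',
    done: true,
    usage: { promptTokens: 5, completionTokens: 2, totalTokens: 7 },
  };
}

async function drain(stream: AsyncIterable<StreamingChunk>): Promise<string> {
  let out = '';
  for await (const chunk of stream) {
    out += chunk.text; // append increments; a real UI would render progressively
    if (chunk.done && chunk.usage) {
      // the final chunk carries token usage for cost tracking
    }
  }
  return out;
}
```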
7. Token Usage Statistics and Cost Tracking
```typescript
class TokenTracker {
  private usageByModel: Map<string, { promptTokens: number; completionTokens: number; cost: number }> = new Map();

  recordUsage(modelId: string, promptTokens: number, completionTokens: number) {
    const meta = this.getModelMetadata(modelId);
    const cost =
      (promptTokens * meta.costPer1kInput) / 1000 +
      (completionTokens * meta.costPer1kOutput) / 1000;
    // Update the running totals
    const usage = this.usageByModel.get(modelId) ||
      { promptTokens: 0, completionTokens: 0, cost: 0 };
    usage.promptTokens += promptTokens;
    usage.completionTokens += completionTokens;
    usage.cost += cost;
    this.usageByModel.set(modelId, usage);
  }

  getStats() {
    return Object.fromEntries(this.usageByModel);
  }
}
```

You can run /status at any time to see:

```
📊 Model Usage Statistics:

deepseek-v3:
  Input tokens: 125,432
  Output tokens: 48,921
  Estimated cost: $0.22

claude-3-opus:
  Input tokens: 36,120
  Output tokens: 12,450
  Estimated cost: $0.15

Total estimated cost: $0.37
```
8. Best Practices and Takeaways
8.1 Recommended Combinations
- Default chat / everyday tasks: Doubao Kimi K2 (cost-effective, strong Chinese)
- Code generation / complex tasks: Claude 3 Opus or DeepSeek V3
- Very long context: Kimi K2 (128k+)
- Local, privacy-sensitive tasks: Llama 3 70B (self-hosted)
- Failover: always have at least one backup configured
夜雨聆风