Omni-note 多模态AI笔记 v1.1 使用文档-夜雨聆风

Omni-note 多模态AI笔记 v1.1 使用文档

概述

Omni-note 是一款多模态AI笔记应用，集成了文本生成、视觉识别、看图作文、文生图、语音识别、笔记知识库和MCP智能体等多项AI功能。软件版本 linux-x64-v1.1 已发布到 GitHub：https://github.com/turingevo/Omni-note^[1]

一、系统要求与环境

操作系统: Ubuntu 22.04
架构: Linux x64
必需推理工具:

llama.cpp^[2]
Funasr^[3]
stable-diffusion.cpp^[4]

二、文件目录结构

omni-note/    doc/                    # 文档目录    instance/               # 实例数据目录    model_server/           # 模型服务脚本    OmniNote                # 主程序

三、配置说明

1. 配置路径

说明	路径
实例文件夹	`instance/`
配置文件路径	`instance/config.py`
音频文件路径	`instance/audio/`
图片文件路径	`instance/uploads/`
模型文件路径	`instance/models/`

2. 模型API配置

编辑 instance/config.py 文件：

MODEL_CONFIGS = {'LLM': {'model': "你的LLM模型名称",'model_server': "模型服务地址",'api_key': "模型API_KEY"    },'VLM': {'model': "你的VLM模型名称",'model_server': "VLM模型服务地址",'api_key': "VLM模型API_KEY"    }}

3. 智能体配置

AGENTS_LLM = {"model": "Qwen/Qwen3-235B-A22B-Thinking-2507","model_server": "https://api-inference.modelscope.cn/v1","api_key": "<你的API_KEY>",}AGENTS = [    {"mcpServers": {"12306-mcp": {"args": ["-y", "12306-mcp"],"command": "npx"            }        }    },    {"mcpServers": {"weather": {"command": "node","args": ["<你的路径>/mcp/open-meteo-weather/dist/index.js"]            }        }    },    {"mcpServers": {"excel": {"command": "uvx","args": ["excel-mcp-server", "stdio"]            }        }    },]

四、模型服务说明

Omni-note 提供统一的模型服务接口，兼容行业标准API：

模型服务脚本	说明	工具	标准化API兼容
`vlm_server.sh`	视觉语言模型服务	llama.cpp	OpenAI API格式^[5]
`embeddings_server.sh`	嵌入模型服务	llama.cpp	OpenAI Embedding API格式^[6]
`reranker_server.sh`	重排模型服务	llama.cpp	Jina AI Rerank API格式^[7]
`audio_server.sh`	语音识别模型服务	Funasr	FUNASR API格式^[8]
`img_server.sh`	图像生成模型服务	stable-diffusion.cpp	stable-diffusion.cpp API格式^[9]

如果不使用内置 model_server，可通过配置 instance/config.py 指定其他模型API服务（需按照标准化API兼容格式）。

模型下载与配置

1. VLM 视觉语言模型

下载地址: https://www.modelscope.cn/models/unsloth/Qwen3.5-4B-GGUF/^[10]
下载 Qwen3.5-4B-UD-Q4_K_XL.gguf 和 mmproj-F16.gguf
放置在 instance/models/Qwen3.5-4B-GGUF/ 目录下

2. 嵌入模型

下载地址: https://huggingface.co/turingevo/bge-base-zh-v1.5-Q4_K_M-GGUF^[11]
放置在 instance/models/bge-base-zh-v1.5-GGUF/ 目录下

3. 重排模型

下载地址: https://huggingface.co/turingevo/bge-reranker-base-Q4_K_M-GGUF^[12]
放置在 instance/models/bge-reranker-base-GGUF/ 目录下

4. 图像生成模型

下载地址: https://www.modelscope.cn/models/TuringEvo/tiny-sd-gguf^[13]
放置在 instance/models/tiny-sd-gguf/ 目录下

5. 语音识别模型 (Funasr)

必需下载以下模型文件到 instance/models/funasr_model：

说明	模型名称
FSMN语音端点检测-中文-通用-16k	iic/speech_fsmn_vad_zh-cn-16k-common-onnx^[14]
Paraformer语音识别-中文-通用-16k-离线	iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-onnx^[15]
Paraformer语音识别-中文-通用-16k-实时	iic/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx^[16]
CT-Transformer标点-中文-通用-实时	iic/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx^[17]
Ngram语言模型-中文	iic/speech_ngram_lm_zh-cn-ai-wesp-fst^[18]
基于FST的中文ITN	thuduj12/fst_itn_zh^[19]

注意: 等待命令行界面出现 listen on port: 10095 后才能使用。

五、启动说明

1. 启动笔记服务

chmod a+x ./OmniNote./OmniNote

启动后可见 MCP 服务器初始化日志，服务运行在 http://0.0.0.0:7000

2. 启动模型服务

依次启动需要的模型服务脚本：

./model_server/vlm_server.sh./model_server/embeddings_server.sh./model_server/reranker_server.sh./model_server/audio_server.sh./model_server/img_server.sh

3. 打开笔记UI

浏览器访问: http://localhost:7000^[20]

六、功能使用说明

AI对话框

在编辑器任意位置点击，鼠标悬停在小方块上，弹出对话框。点击每项的 ? 获取使用说明。

1. 文本生成

使用LLM模型进行文本内容生成。

2. 视觉识别

使用VLM模型对图片内容进行识别和分析。

3. 看图作文

结合视觉理解和文本生成，根据图片创作文字内容。

4. 文生图

使用图像生成模型根据文字描述生成图片。

5. 语音识别

前提: 确保在 instance/config.py 中配置了正确的语音识别模型服务。

6. 笔记知识库

确保配置 instance/config.py 文件中指定正确的嵌入和重排序模型服务
切换到 知识库页面
首次使用点击 构建知识库，弹窗显示成功
每次笔记内容更新后，请再次点击 构建知识库，确保查询最新的笔记
在对话框中输入需要查找的笔记内容即可

7. MCP智能体

安装各种MCP智能体，安装完成后需要重启程序：

https://www.modelscope.cn/mcp^[21]
https://github.com/search?q=mcp&type=repositories^[22]

切换到 智能体助手 页面，输入指令，自动调度智能体完成工作。

七、可用MCP智能体示例

12306-mcp

用于查询12306火车票信息

weather

获取天气信息和预报
可用工具：

get-weather-by-address: 根据地址获取天气
get-forecast-by-address: 根据地址获取天气预报
get-current-location-weather: 获取当前位置天气
get-current-location-forecast: 获取当前位置天气预报
get-weather-by-coordinates: 根据坐标获取天气
geocode-address: 地址解析

excel

Excel表格处理智能体

八、自定义模型API

如果不需要使用内置的 model_server，可以通过修改 instance/config.py 配置文件，指定其他符合标准化API格式的模型服务。

感谢您使用 Omni-note 多模态AI笔记！

引用链接

[1]https://github.com/turingevo/Omni-note

[2]llama.cpp: https://github.com/ggml-org/llama.cpp

[3]Funasr: https://github.com/modelscope/FunASR

[4]stable-diffusion.cpp: https://github.com/leejet/stable-diffusion.cpp

[5]OpenAI API格式: https://github.com/openai/openai-openapi

[6]OpenAI Embedding API格式: https://github.com/openai/openai-openapi

[7]Jina AI Rerank API格式: https://doc.ai-api.chat/jinaai-rerank/

[8]FUNASR API格式: https://github.com/modelscope/FunASR/blob/main/runtime/docs/websocket_protocol_zh.md

[9]stable-diffusion.cpp API格式: https://github.com/leejet/stable-diffusion.cpp/blob/master/examples/server/api.md

[10]https://www.modelscope.cn/models/unsloth/Qwen3.5-4B-GGUF/

[11]https://huggingface.co/turingevo/bge-base-zh-v1.5-Q4_K_M-GGUF

[12]https://huggingface.co/turingevo/bge-reranker-base-Q4_K_M-GGUF

[13]https://www.modelscope.cn/models/TuringEvo/tiny-sd-gguf

[14]iic/speech_fsmn_vad_zh-cn-16k-common-onnx: https://www.modelscope.cn/models/iic/speech_fsmn_vad_zh-cn-16k-common-onnx

[15]iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-onnx: https://www.modelscope.cn/models/iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-onnx

[16]iic/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx: https://www.modelscope.cn/models/iic/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx

[17]iic/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx: https://www.modelscope.cn/models/iic/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx

[18]iic/speech_ngram_lm_zh-cn-ai-wesp-fst: https://www.modelscope.cn/models/iic/speech_ngram_lm_zh-cn-ai-wesp-fst

[19]thuduj12/fst_itn_zh: https://www.modelscope.cn/models/thuduj12/fst_itn_zh

[20]http://localhost:7000

[21]https://www.modelscope.cn/mcp

[22]https://github.com/search?q=mcp&type=repositories