从设备向量到知识文档:SonnetDB Measurement KNN 与文档关联检索
前一篇讲的是 document collection 内部的 Hybrid Search。这一篇更接近工业现场:设备事件本身是时序 measurement,知识库是 JSON 文档,设备和租户信息在关系表里。SonnetDB 的 hybrid_search(...) 可以把这些模型放到一个查询里。
先写入设备事件向量:
usingSonnetDB.Engine;usingSonnetDB.Sql;usingSonnetDB.Sql.Execution;usingvardb=Tsdb.Open(newTsdbOptions{RootDirectory="data/measurement-knowledge-demo"});SqlExecutor.Execute(db,""" CREATE MEASUREMENT incidents ( device_id TAG, embedding FIELD VECTOR(3), severity FIELD FLOAT ) """);SqlExecutor.Execute(db,""" INSERT INTO incidents (device_id, embedding, severity, time) VALUES ('pump-1', [1, 0, 0], 9.0, 1000), ('pump-2', [0.8, 0.2, 0], 7.5, 2000), ('fan-1', [0, 1, 0], 3.0, 3000) """);
再写知识库文档:
SqlExecutor.Execute(db,"CREATE DOCUMENT COLLECTION knowledge");SqlExecutor.Execute(db,""" INSERT INTO knowledge (id, document) VALUES ('kb-pump-1', '{"device_id":"pump-1","title":"Pump overheating guide","body":"pump alarm overheating recovery","category":"fault","embedding":[1,0,0]}'), ('kb-pump-2', '{"device_id":"pump-2","title":"Pump pressure guide","body":"pump alarm pressure inspection","category":"fault","embedding":[0.8,0.2,0]}'), ('kb-fan-1', '{"device_id":"fan-1","title":"Fan maintenance note","body":"fan maintenance normal","category":"normal","embedding":[0,1,0]}') """);SqlExecutor.Execute(db,"CREATE JSON INDEX idx_knowledge_device ON knowledge ('$.device_id')");SqlExecutor.Execute(db,"CREATE FULLTEXT INDEX ft_knowledge_body ON knowledge ('$.body') USING unicode");
现在可以把 measurement KNN、知识文档 JSON path、全文 BM25 融合起来:
varresult=(SelectExecutionResult)SqlExecutor.Execute(db,""" SELECT measurement.device_id AS device, document_id, json_value(document, '$.title') AS title, measurement_distance() AS m_distance, bm25_score() AS text_score, hybrid_score() AS score FROM hybrid_search( source => incidents, documents => knowledge, vector_field => embedding, vector => [1, 0, 0], k => 5, measurement_join_tag => device_id, document_join_path => '$.device_id', document_join_index => idx_knowledge_device, text_index => ft_knowledge_body, text_field => '$.body', text => 'pump alarm overheating', measurement_weight => 0.7, text_weight => 0.3) WHERE time >= 1000 AND category = 'fault' ORDER BY score DESC """);
这个查询做了几件事:
-
先在 incidents.embedding上按向量召回相近事件。 -
用 device_id关联knowledge文档里的$.device_id。 -
对知识正文 $.body做全文 BM25 评分。 -
只保留 category = 'fault'的知识文档。 -
最后按融合分数排序。
加租户维表过滤
如果你还维护设备维表,可以继续 JOIN:
SqlExecutor.Execute(db,""" CREATE TABLE devices ( id STRING, tenant STRING, site STRING, PRIMARY KEY (id) ) """);SqlExecutor.Execute(db,"CREATE INDEX idx_devices_tenant ON devices (tenant)");SqlExecutor.Execute(db,""" INSERT INTO devices (id, tenant, site) VALUES ('pump-1', 'tenant-1', 'north'), ('pump-2', 'tenant-2', 'south'), ('fan-1', 'tenant-1', 'east') """);
然后:
vartenantScoped=(SelectExecutionResult)SqlExecutor.Execute(db,""" SELECT measurement.device_id AS device, d.site AS site, document_id, hybrid_score() AS score FROM hybrid_search( source => incidents, documents => knowledge, vector_field => embedding, vector => [1, 0, 0], measurement_join_tag => device_id, document_join_path => '$.device_id', text => 'pump alarm') JOIN devices d ON measurement.device_id = d.id WHERE d.tenant = 'tenant-1' ORDER BY score DESC """);
这就是 SonnetDB 多模型能力最有意思的地方:向量不是孤立的,全文不是孤立的,关系维表也不是孤立的。它们可以一起回答一个业务问题。
官网地址:https://sonnetdb.com
技术文章站:https://iotpaper.net
开源仓库:https://github.com/IoTSharp/SonnetDB

夜雨聆风