乐于分享
好东西不私藏

从设备向量到知识文档:SonnetDB Measurement KNN 与文档关联检索

从设备向量到知识文档:SonnetDB Measurement KNN 与文档关联检索

前一篇讲的是 document collection 内部的 Hybrid Search。这一篇更接近工业现场:设备事件本身是时序 measurement,知识库是 JSON 文档,设备和租户信息在关系表里。SonnetDB 的 hybrid_search(...) 可以把这些模型放到一个查询里。

先写入设备事件向量:

usingSonnetDB.Engine;usingSonnetDB.Sql;usingSonnetDB.Sql.Execution;usingvardb=Tsdb.Open(newTsdbOptions{RootDirectory="data/measurement-knowledge-demo"});SqlExecutor.Execute(db,"""    CREATE MEASUREMENT incidents (        device_id TAG,        embedding FIELD VECTOR(3),        severity FIELD FLOAT    )    """);SqlExecutor.Execute(db,"""    INSERT INTO incidents (device_id, embedding, severity, time)    VALUES      ('pump-1', [1, 0, 0], 9.0, 1000),      ('pump-2', [0.8, 0.2, 0], 7.5, 2000),      ('fan-1',  [0, 1, 0], 3.0, 3000)    """);

再写知识库文档:

SqlExecutor.Execute(db,"CREATE DOCUMENT COLLECTION knowledge");SqlExecutor.Execute(db,"""    INSERT INTO knowledge (id, document)    VALUES      ('kb-pump-1', '{"device_id":"pump-1","title":"Pump overheating guide","body":"pump alarm overheating recovery","category":"fault","embedding":[1,0,0]}'),      ('kb-pump-2', '{"device_id":"pump-2","title":"Pump pressure guide","body":"pump alarm pressure inspection","category":"fault","embedding":[0.8,0.2,0]}'),      ('kb-fan-1',  '{"device_id":"fan-1","title":"Fan maintenance note","body":"fan maintenance normal","category":"normal","embedding":[0,1,0]}')    """);SqlExecutor.Execute(db,"CREATE JSON INDEX idx_knowledge_device ON knowledge ('$.device_id')");SqlExecutor.Execute(db,"CREATE FULLTEXT INDEX ft_knowledge_body ON knowledge ('$.body') USING unicode");

现在可以把 measurement KNN、知识文档 JSON path、全文 BM25 融合起来:

varresult=(SelectExecutionResult)SqlExecutor.Execute(db,"""    SELECT measurement.device_id AS device,           document_id,           json_value(document, '$.title') AS title,           measurement_distance() AS m_distance,           bm25_score() AS text_score,           hybrid_score() AS score    FROM hybrid_search(        source => incidents,        documents => knowledge,        vector_field => embedding,        vector => [1, 0, 0],        k => 5,        measurement_join_tag => device_id,        document_join_path => '$.device_id',        document_join_index => idx_knowledge_device,        text_index => ft_knowledge_body,        text_field => '$.body',        text => 'pump alarm overheating',        measurement_weight => 0.7,        text_weight => 0.3)    WHERE time >= 1000 AND category = 'fault'    ORDER BY score DESC    """);

这个查询做了几件事:

  • 先在 incidents.embedding 上按向量召回相近事件。
  • 用 device_id 关联 knowledge 文档里的 $.device_id
  • 对知识正文 $.body 做全文 BM25 评分。
  • 只保留 category = 'fault' 的知识文档。
  • 最后按融合分数排序。

加租户维表过滤

如果你还维护设备维表,可以继续 JOIN:

SqlExecutor.Execute(db,"""    CREATE TABLE devices (        id STRING,        tenant STRING,        site STRING,        PRIMARY KEY (id)    )    """);SqlExecutor.Execute(db,"CREATE INDEX idx_devices_tenant ON devices (tenant)");SqlExecutor.Execute(db,"""    INSERT INTO devices (id, tenant, site)    VALUES      ('pump-1', 'tenant-1', 'north'),      ('pump-2', 'tenant-2', 'south'),      ('fan-1',  'tenant-1', 'east')    """);

然后:

vartenantScoped=(SelectExecutionResult)SqlExecutor.Execute(db,"""    SELECT measurement.device_id AS device,           d.site AS site,           document_id,           hybrid_score() AS score    FROM hybrid_search(        source => incidents,        documents => knowledge,        vector_field => embedding,        vector => [1, 0, 0],        measurement_join_tag => device_id,        document_join_path => '$.device_id',        text => 'pump alarm')    JOIN devices d ON measurement.device_id = d.id    WHERE d.tenant = 'tenant-1'    ORDER BY score DESC    """);

这就是 SonnetDB 多模型能力最有意思的地方:向量不是孤立的,全文不是孤立的,关系维表也不是孤立的。它们可以一起回答一个业务问题。


官网地址:https://sonnetdb.com

技术文章站:https://iotpaper.net

开源仓库:https://github.com/IoTSharp/SonnetDB