
Few-Shot Cross-Domain Document Retrieval Model for Resource-Constrained Scenarios


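Citation: YANG Decao, MIAO Yiran, CHEN Chao, et al. Few-shot cross-domain document retrieval model for resource-constrained scenarios[J]. Journal of China West Normal University (Natural Sciences), 2025, 46(6): 667-676.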

Authors: Yang Decao, Miao Yiran, Chen Chao, Yu Jiuhuan, Li Qizhi, Peng Dezhong

Corresponding author: Yang Decao (b. 1996), male, engineer, mainly engaged in nuclear technology support work.

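Abstract: With the growth of the Internet, vast amounts of data are generated online every day, and users find it difficult to retrieve exactly the content they want from this volume of data. To help users locate target information precisely, this paper proposes a few-shot cross-domain text retrieval model based on intrinsic semantic contrastive learning and sentence vector aggregation. Intrinsic semantic contrastive learning addresses the generalization problem caused by inconsistent data distributions across domains, and also overcomes a known difficulty in NLP: contrastive learning usually relies on data augmentation to build positive pairs, but text is hard to augment because small edits can change its meaning. The sentence vector aggregation module addresses the difficulty of processing long documents when GPU memory is insufficient. Experiments on a constructed few-shot cross-domain text retrieval dataset show that the proposed method effectively improves retrieval performance and makes long texts tractable under limited GPU memory.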

Keywords: document retrieval; document representation; contrastive learning; domain generalization; few-shot learning
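The abstract names two mechanisms but gives no equations or code, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes (1) that "sentence vector aggregation" means splitting a long document into sentences, encoding a few sentences at a time, and mean-pooling the sentence vectors into one document vector; and (2) that the contrastive objective is an InfoNCE loss whose positive is a dropout-perturbed view of the same input (a SimCSE-style stand-in, since the abstract does not say how "intrinsic semantic" positive pairs are built). `toy_encode` is a hypothetical hashed bag-of-words encoder standing in for a pretrained model; the names `aggregate_document`, `dropout_view`, and `info_nce`, the batch size, the dropout rate 0.1, and the temperature 0.05 are all illustrative choices.

```python
import math
import random
import zlib
from typing import Callable, List, Sequence

Vector = List[float]


def normalize(vec: Sequence[float]) -> Vector:
    """Scale a vector to unit L2 norm (a zero vector stays zero)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def toy_encode(sentence: str, dim: int = 256) -> Vector:
    """Hypothetical stand-in for a sentence encoder: hashed bag of words."""
    vec = [0.0] * dim
    for token in sentence.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vec


def aggregate_document(sentences: List[str],
                       encode: Callable[[str], Vector] = toy_encode,
                       batch_size: int = 2) -> Vector:
    """Assumption: encode a long document a few sentences at a time and
    mean-pool the sentence vectors, so the encoder never sees the whole
    document at once; only a running sum is kept between batches."""
    total: Vector = []
    count = 0
    for start in range(0, len(sentences), batch_size):
        for vec in map(encode, sentences[start:start + batch_size]):
            total = vec[:] if not total else [t + x for t, x in zip(total, vec)]
            count += 1
    if count == 0:
        raise ValueError("document has no sentences")
    return normalize([t / count for t in total])


def dropout_view(vec: Sequence[float], p: float, rng: random.Random) -> Vector:
    """Assumption: a second 'view' of the same input made by random dropout
    (inverted scaling), used instead of editing the text itself."""
    return [0.0 if rng.random() < p else x / (1.0 - p) for x in vec]


def info_nce(anchors: List[Vector], positives: List[Vector],
             temperature: float = 0.05) -> float:
    """Mean InfoNCE loss: anchors[i] is pulled toward positives[i], and the
    other positives in the batch act as in-batch negatives."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


if __name__ == "__main__":
    docs = {
        "A": ["contrastive learning for retrieval", "sentence vectors are pooled"],
        "B": ["the weather is sunny today", "rain is expected tomorrow"],
    }
    doc_vecs = {name: aggregate_document(s) for name, s in docs.items()}
    query = aggregate_document(["contrastive learning for retrieval"])
    ranking = sorted(doc_vecs, key=lambda n: cosine(query, doc_vecs[n]), reverse=True)
    print("ranking:", ranking)

    rng = random.Random(0)
    anchors = list(doc_vecs.values())
    views = [dropout_view(v, 0.1, rng) for v in anchors]
    print("InfoNCE loss:", round(info_nce(anchors, views), 4))
```

In a real system the memory saving would come from running the neural encoder on one sentence batch at a time and keeping only the pooled result; this pure-Python version only mirrors that control flow. Mean pooling is the simplest aggregation choice; a weighted or attention-based pooling would slot into the same place if the paper uses one.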



Website: igne.cbpt.cnki.net/portal

Mailing address: No. 1 Shida Road, Shunqing District, Nanchong, Sichuan Province, China

Postal code: 637009

Office e-mail: jcwnuns@126.com

Telephone: 0817-2568651