
An effective strategy for topic information aggregation and retrieval


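  Abstract: Gathering and retrieving topic-specific Web pages on the Internet is a core problem for vertical search engines. The HITS algorithm is an early, classic solution to this problem, and many papers have improved on it, yet both the topic relevance rate of the index and the precision of the engine leave room for improvement. This paper proposes a HITS-based topic retrieval strategy that filters pages by anchor text and title information and combines this with a judgment of page content relevance. Topic relevance is judged against a topic training set, which overcomes the drawback of relying on the query string alone. Experiments show that the strategy improves both the precision of topic information gathering and the accuracy of retrieval, and reduces the number of irrelevant URLs downloaded.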

  Keywords: HITS algorithm; anchor text; Web page title; topic relevance; vector model; topic training set

  CLC number: TP301.6    Document code: A

  Article ID: 1001-3695(2010)06-2106-03

  doi:10.3969/j.issn.1001-3695.2010.06.032

  Effective strategy of topic distillation and retrieval

  WANG Yuxina, LIU Haifenga, GUO Heb, CHEN Xinb

   (a.School of Electronic & Information Engineering, b.School of Software, Dalian University of Technology, Dalian Liaoning 116023, China)

  Abstract: Topic distillation and retrieval on the Internet is a key problem in vertical search engine research. The HITS algorithm is an early, classic method for this problem, and other researchers have since improved on it. Nevertheless, both the topic relevance rate of the index and the precision of the engine still have room for improvement. This paper proposes a strategy for topic distillation and retrieval that filters Web pages based on anchor texts and titles and combines this with the content relevance of the pages. Using a topic training set to judge relevance overcomes the shortcomings of relying on query strings alone. Experimental results show that this strategy improves the accuracy of topic distillation and retrieval and reduces the download of unrelated URLs.
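
  The general idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, which is not available here. It assumes a plain term-frequency vector model, whitespace tokenization, a centroid built from a small topic training set, and a hypothetical relevance threshold; the names tf_vector, topic_centroid, filter_links and hits are illustrative only. Links whose anchor text and target title are not similar enough to the topic centroid are dropped before a standard HITS hub/authority iteration runs on the remaining graph.

```python
import math
from collections import Counter


def tf_vector(text):
    """Term-frequency vector using whitespace tokenization (an assumption)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def topic_centroid(training_docs):
    """Sum the term-frequency vectors of the topic training set."""
    centroid = Counter()
    for doc in training_docs:
        centroid.update(tf_vector(doc))
    return centroid


def filter_links(links, centroid, threshold):
    """Keep (source, target) pairs whose anchor text plus target title
    is similar enough to the topic centroid.

    links: iterable of (source, target, anchor_text, target_title).
    threshold: hypothetical cut-off, not taken from the paper.
    """
    kept = []
    for src, dst, anchor, title in links:
        if cosine(tf_vector(anchor + " " + title), centroid) >= threshold:
            kept.append((src, dst))
    return kept


def hits(edges, iterations=50):
    """Standard HITS: alternate authority and hub updates with L2 normalization."""
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of hub scores of pages linking to the node.
        auth = {n: sum(hub[s] for s, d in edges if d == n) for n in nodes}
        norm = math.sqrt(sum(a * a for a in auth.values())) or 1.0
        auth = {n: a / norm for n, a in auth.items()}
        # Hub score: sum of authority scores of pages the node links to.
        hub = {n: sum(auth[d] for s, d in edges if s == n) for n in nodes}
        norm = math.sqrt(sum(h * h for h in hub.values())) or 1.0
        hub = {n: h / norm for n, h in hub.items()}
    return hub, auth


# Toy data, invented for illustration only.
training = ["vertical search engine crawler", "topic search engine index"]
links = [
    ("p1", "p2", "search engine", "topic search"),
    ("p1", "p3", "cheap flights", "travel deals"),
    ("p4", "p2", "vertical search", "engine index"),
]
edges = filter_links(links, topic_centroid(training), threshold=0.2)
hub, auth = hits(edges)
```

  In this toy run the off-topic link to p3 is filtered out before ranking, so p3 never enters the graph and would not need to be downloaded. A content-relevance check on the downloaded page text could reuse the same cosine function against the same centroid.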

......
The full text is not available here. To read it, contact the publisher of the journal 《计算机应用研究》 (Application Research of Computers).