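(1. School of Electronics & Information, Jiangsu University of Science & Technology, Zhenjiang, Jiangsu 212003; 2. College of Computer Science & Engineering, Southeast University, Nanjing 210096)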
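Abstract: To address the shortcomings of the DBSCAN algorithm, which requires users to set parameter values and is prone to biased mining results, an improved algorithm, DBTC (density-based text clustering), is proposed. The algorithm not only discovers clusters of arbitrary shape, but also effectively resolves two problems that the density-based DBSCAN clustering algorithm faces in text mining: parameters are difficult to set, and high-density clusters are absorbed by the low-density clusters connected to them. Theoretical analysis and experimental results show that the algorithm is effective and feasible.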
Key words: word segmentation; text clustering; vector space model; core object
Text cluster mining algorithm based on density
ZHAO Kang1, LU Jieping1, NI Weiwei2, WANG Guiping1
(1. School of Electronics & Information, Jiangsu University of Science & Technology, Zhenjiang, Jiangsu 212003, China; 2. College of Computer Science & Engineering, Southeast University, Nanjing 210096, China)
Abstract: The DBSCAN algorithm requires users to set its parameters and tends to distort the mining result. To address these shortcomings, this paper proposes an improved text clustering algorithm, DBTC (density-based text clustering). The algorithm not only finds clusters of arbitrary shape, but also effectively solves two problems of DBSCAN in text mining: the parameters are difficult for users to determine, and a high-density cluster can be completely contained in the low-density cluster linked to it. Theoretical analysis and experimental results indicate that the algorithm is effective and feasible. ......