Approach to pinpointing subject information in Web pages based on heuristic rules
HU Jin-zhu, ZHOU Xing, SHU Jiang-bo, XIONG Chun-xiu
(Dept. of Computer Science, Huazhong Normal University, Wuhan 430079, China)
Abstract: At present, most information extraction methods target the subject information block as a whole and do not go further to extract each independent item of subject information. To solve this problem, this article proposes an approach to pinpointing subject information in Web pages based on heuristic rules. First, it analyzes the characteristics of each independent subject and formulates corresponding heuristic rules. Then, because different rules are of different importance for locating a subject, it derives a weight matrix for the heuristic rules. Finally, it pinpoints each subject using a localization algorithm built on these weighted heuristic rules. The method has been applied in an automatic extraction system, and the experimental results demonstrate its effectiveness and accuracy. ......
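The abstract does not give the concrete rules or the layout of the weight matrix. The Python sketch below illustrates one possible reading of the three steps. It assumes that each rule scores a candidate block in [0, 1], that the weight matrix has one row per subject type and one column per rule, and that a subject is located at the block with the highest weighted score above a threshold. The `Block` fields, the example rules and weights, the threshold, and `locate_subjects` are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


# Hypothetical candidate block extracted from a parsed Web page.
@dataclass
class Block:
    text: str
    tag: str        # HTML tag of the block, e.g. "h1", "p", "span"
    font_size: int  # hypothetical layout feature (pixels)
    position: int   # order of the block within the page


# A heuristic rule maps a block to a score in [0, 1].
Rule = Callable[[Block], float]

# Hypothetical heuristic rules (illustrative only).
RULES: List[Rule] = [
    lambda b: 1.0 if b.tag in ("h1", "h2") else 0.0,  # r0: heading tag
    lambda b: 1.0 if b.font_size >= 18 else 0.0,      # r1: large font
    lambda b: 1.0 if b.position < 5 else 0.0,         # r2: near the top of the page
    lambda b: 1.0 if len(b.text) > 80 else 0.0,       # r3: long text run
]

# Hypothetical weight matrix: one row per subject, one column per rule.
WEIGHTS: Dict[str, List[float]] = {
    "title":   [0.5, 0.3, 0.2, 0.0],
    "content": [0.0, 0.0, 0.1, 0.9],
}


def locate_subjects(
    blocks: List[Block],
    rules: List[Rule],
    weights: Dict[str, List[float]],
    threshold: float = 0.5,
) -> Dict[str, Optional[Block]]:
    """For each subject, return the block with the highest weighted rule score.

    A block must score strictly above `threshold` to be selected;
    if no block qualifies, the subject maps to None.
    """
    located: Dict[str, Optional[Block]] = {}
    for subject, row in weights.items():
        best: Optional[Block] = None
        best_score = threshold
        for block in blocks:
            # Weighted sum of rule scores for this subject's row of the matrix.
            score = sum(w * rule(block) for w, rule in zip(row, rules))
            if score > best_score:
                best, best_score = block, score
        located[subject] = best
    return located
```

For example, given a heading block `Block("Example headline", "h1", 24, 0)` and a long paragraph `Block("word " * 30, "p", 14, 6)`, the sketch assigns the heading to `"title"` and the paragraph to `"content"`. Scoring each subject independently keeps the per-subject rows of the matrix separable, so a rule can matter a great deal for one subject and not at all for another.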