Remove-Duplicate Algorithm Based on Meta Search Result

2018 
According to the characteristics of duplicate web pages in meta search engine results, a duplicate web page detection algorithm is proposed based on the page URL, title, and abstract, with a different similarity computation method for each. First, the page URL is normalized. For title detection, the algorithm improves a fuzzy string-matching method and computes similarity weighted by the frequency of each query term. For the abstract judgment, similarity is computed sentence by sentence: each sentence is assigned three weights, and the similarity score is obtained from the weighted combination over the summary sentences. The effect of the algorithm is significant; experiments verify that it is superior to the traditional algorithm in both precision and recall.
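The paper gives no pseudocode, but the first two steps it describes (URL normalization, then query-term-weighted title similarity) can be sketched roughly as follows. The specific normalization rules (lowercasing, dropping default ports, fragments, and trailing slashes) and the term-weighting formula are assumptions for illustration, not the paper's exact method:

```python
from urllib.parse import urlsplit, urlunsplit
from collections import Counter

def normalize_url(url):
    # Hypothetical normalization: lowercase scheme and host, drop
    # default ports, fragments, and trailing slashes. The paper does
    # not specify its exact rules.
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    netloc = parts.netloc.lower()
    if scheme == "http" and netloc.endswith(":80"):
        netloc = netloc[: -len(":80")]
    if scheme == "https" and netloc.endswith(":443"):
        netloc = netloc[: -len(":443")]
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((scheme, netloc, path, parts.query, ""))

def title_similarity(title_a, title_b, query_terms):
    # Weight shared title words by their frequency in the query, as
    # the abstract suggests; this particular weighted-Jaccard form is
    # an assumption.
    weights = Counter(t.lower() for t in query_terms)
    words_a = set(title_a.lower().split())
    words_b = set(title_b.lower().split())
    shared = words_a & words_b
    union = words_a | words_b
    num = sum(1 + weights[t] for t in shared)
    den = sum(1 + weights[t] for t in union)
    return num / den if den else 0.0
```

Two result URLs that normalize to the same string, or whose titles score above a threshold under the query-weighted similarity, would then be passed on to the abstract-level check.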