Entity resolution in disjoint graphs: An application on genealogical data
2016
Entity Resolution (ER) is the process of identifying references that refer to the same entity across one or more data sources. Most existing approaches either exploit the content information of references (content-based ER) or additionally consider linkage information among references (context-based ER). However, in new applications of ER, such as the genealogical domain, linkage information among references is very limited. The result is a disjoint graph to which existing content-based and context-based ER techniques have very limited applicability. In this paper we therefore propose, first, to use the homophily principle to augment the original input graph by connecting potentially similar references and, second, to use a Random Walk based approach to exploit the contextual information available for each reference in the augmented graph. We evaluate the proposed method on a large genealogical dataset, where it predicts 420,000 reference matches with 92% precision and discovers six novel and informative patterns among them that cannot be detected in the original disjoint graph.
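The two steps named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the string similarity measure (`difflib.SequenceMatcher`), the `threshold` and `restart` values, the input format, and the function names `augment_graph` and `random_walk_with_restart` are all assumptions chosen for illustration, and the toy records are invented.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def augment_graph(names, edges, threshold=0.8):
    """Add an edge between every pair of references with similar names.

    names: dict of reference id -> name string (hypothetical input format).
    edges: (u, v) pairs from the original, possibly disjoint, graph.
    The similarity measure and threshold are stand-ins, not the paper's.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    ids = sorted(names)
    for i, u in enumerate(ids):
        for v in ids[i + 1:]:
            if SequenceMatcher(None, names[u], names[v]).ratio() >= threshold:
                adj[u].add(v)
                adj[v].add(u)
    # Return a plain dict that also contains isolated references.
    return {n: set(adj[n]) for n in set(names) | set(adj)}


def random_walk_with_restart(adj, start, restart=0.15, iterations=50):
    """Approximate visiting probabilities of a walk that restarts at `start`."""
    scores = {n: 0.0 for n in adj}
    scores[start] = 1.0
    for _ in range(iterations):
        nxt = {n: 0.0 for n in adj}
        for n, p in scores.items():
            if not adj[n]:
                nxt[start] += p  # isolated node: all mass returns to start
                continue
            nxt[start] += restart * p
            share = (1.0 - restart) * p / len(adj[n])
            for m in adj[n]:
                nxt[m] += share
        scores = nxt
    return scores


# Toy data: two family fragments that share no link in the original graph.
names = {
    "a1": "Johannes Jansen",
    "a2": "Johannes Janssen",
    "b1": "Maria de Vries",
    "b2": "Pieter Bakker",
}
edges = [("a1", "b1"), ("a2", "b2")]

adj = augment_graph(names, edges)
scores = random_walk_with_restart(adj, "a1")
```

In this toy case the two near-identical names become connected, so a walk started at `a1` can reach `a2` and its neighbour `b2`, which are unreachable in the original two-component graph. The all-pairs comparison is quadratic and would need blocking or indexing on a dataset of the size the abstract reports.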