Experiences from the Catch – MiSS project

Bijan Ranjbar-Sahraei and Julia Efremova

MiSS: Mining Social Structures from Genealogical Data
Partners – Eindhoven University of Technology, Maastricht University, Brabant Historical Information Center
From 2012 to 2016

The starting point of the MiSS project was the large collection of historical documents maintained by the Brabant Historical Information Center (BHIC). A document could be anything from a scan of a birth or death certificate to an estate tax or loan declaration. At the start of the project, the documents had been tagged by source and subject. Based on those tags, researchers could use keyword-based search to find documents relevant to their research (either a scan or a pointer to a physical location). The database, however, was far from flawless: many names were duplicated, had several alternative spellings, or contained mistakes. Furthermore, important semantic links such as husband-wife relations were only implicitly available, which made even simple tasks, such as finding out whether two given persons were related, very labor-intensive.

MiSS therefore addresses the problem of how to derive the identities of persons and their social structures from large sets of genealogical data that are available in both structured and unstructured form and contain incomplete information. To do so, MiSS investigates and deploys a combination of techniques from data mining, machine learning and human computation. The project goals are (a) a semantically enriched and cleaned version of the current BHIC database; (b) advanced search tools to support historical research; and (c) automatic tools to support large-scale prosopographical research.