Linked Data Infrastructures for Digital Humanities – Experiences from Practical Applications

Eero Hyvönen

A major challenge in publishing linked Cultural Heritage (CH) data on the web is semantic interoperability. This is due to the heterogeneity of the contents and to the distributed content creation model, in which publishers focus on their own data with little consideration of others’ data. Since historical collections nearly always contain text, shared challenges of knowledge extraction arise in the content creation process. For example, one has to deal with OCR errors, clean the data, and recognize names, other entities, and more complex structures based on them. A tricky question that arises immediately is how to disambiguate names and concepts semantically so that data linking is done correctly. As a solution approach, the “Sampo” model is presented. It argues for establishing a shared Linked Data content and service infrastructure for an application domain and, more generally, at the national and international levels. The model is based on using domain-independent data standards, on a model for aligning metadata models, and on sharing domain-specific ontologies for populating the metadata models. The harmonized data is published for machines as a linked data service infrastructure, to be used by applications for human users. To illustrate and evaluate the model and the need for shared Linked Data infrastructures, several applications on the Web are presented, such as CultureSampo, BookSampo, WarSampo, and Norssi Alumni on the Semantic Web. These applications use a related linguistic service, ARPA; the ontology services ONKI/Finto and HIPLA, for keyword thesauri and for historical places and maps; and the Linked Data Finland platform for data services.
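The two ideas at the core of the model, aligning publisher-specific metadata fields to a shared model and populating that model with identifiers from a shared ontology, can be illustrated with a minimal sketch. Everything in it is a hypothetical assumption made for illustration: the field names, the publisher keys, the example.org URIs, and the type-based disambiguation rule are not taken from the Sampo systems or from the ARPA, ONKI/Finto, or HIPLA services.

```python
# Hypothetical per-publisher mappings from local field names to a shared model.
FIELD_MAPPINGS = {
    "museum": {"nimi": "title", "paikka": "place"},
    "library": {"heading": "title", "setting": "place"},
}

# Hypothetical shared ontology: a lowercased label maps to (URI, type) candidates.
# "nokia" is deliberately ambiguous between a place and an organization.
ONTOLOGY = {
    "nokia": [
        ("http://example.org/place/nokia", "place"),
        ("http://example.org/org/nokia", "organization"),
    ],
    "helsinki": [("http://example.org/place/helsinki", "place")],
}


def harmonize(record, source):
    """Rename a publisher's local fields to the shared metadata model."""
    mapping = FIELD_MAPPINGS[source]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def link(label, expected_type):
    """Return the URI for a label, using the expected type to disambiguate.

    Returns None when there is no candidate of that type, or more than one,
    so that uncertain cases are left unlinked instead of linked wrongly.
    """
    candidates = ONTOLOGY.get(label.strip().lower(), [])
    matches = [uri for uri, kind in candidates if kind == expected_type]
    return matches[0] if len(matches) == 1 else None


museum_record = harmonize({"nimi": "Talvisota", "paikka": "Nokia"}, "museum")
place_uri = link(museum_record["place"], "place")
```

In this sketch the metadata field supplies the context for disambiguation: a value in the shared "place" field is only matched against ontology entries typed as places. Real collections would need richer context than a single expected type.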


Eero Hyvönen is director of the Helsinki Centre for Digital Humanities (HELDIG) at the University of Helsinki and professor of semantic media technology at the Department of Computer Science, Aalto University, where he directs the Semantic Computing Research Group (SeCo), which specializes in Semantic Web technologies and applications. A major theme of his research since 2001 has been the development of a national-level semantic web infrastructure and its applications in different areas. Eero Hyvönen has published nearly 400 research articles and books and has received several international and national awards. He serves on the editorial boards of Semantic Web – Interoperability, Usability, Applicability; Semantic Computing; International Journal of Metadata, Semantics, and Ontologies; and International Journal on Semantic Web and Information Systems, and has co-chaired or served on the programme committees of dozens of major conferences.