Europeana semantic enrichment
Automatic semantic enrichment at Europeana
Europeana enriches its data providers’ metadata by automatically linking text strings found in the metadata to controlled terms from Linked Open Data datasets or vocabularies. This process of “augmenting” the source metadata with additional terms is called semantic enrichment.
The enrichment process can be summarised in two main steps:
- Matching the metadata of Europeana cultural heritage (CH) objects to external semantic data results in links between these objects and resources from external datasets. The example below shows an object that was automatically enriched with the concept “Costume” from the DBpedia dataset.
- The created links point to additional data such as translated labels or broader concepts. In this example, the record is supplemented with all the translated labels of the DBpedia concept, as well as a link to the broader DBpedia concept “Fashion” and all its translated labels.
For more details, refer to the Europeana Semantic Enrichment Framework.
Example of a Europeana record semantically enriched (contextualised) with concept terms from DBpedia: “A man building a wig on to the head of a woman on a kind of scaffolding; another woman wearing a tall wig looks on”, Wellcome Trust: http://www.europeana.eu/portal/record/9200105/BibliographicResource_3000006114081.html
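The two steps above can be sketched in a few lines of Python. This is a minimal illustration and not Europeana's actual implementation: the `Concept` class, the `match` and `supplement` functions and the toy vocabulary are all hypothetical, the labels are illustrative rather than copied from DBpedia, and real enrichment uses more careful matching than a case-insensitive exact comparison.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Concept:
    uri: str
    labels: Dict[str, str]                             # language code -> label
    broader: List[str] = field(default_factory=list)   # URIs of broader concepts


# Toy vocabulary standing in for an external dataset (labels are illustrative).
VOCAB = {
    "http://dbpedia.org/resource/Costume": Concept(
        uri="http://dbpedia.org/resource/Costume",
        labels={"en": "Costume", "fr": "Costume", "de": "Kostüm"},
        broader=["http://dbpedia.org/resource/Fashion"],
    ),
    "http://dbpedia.org/resource/Fashion": Concept(
        uri="http://dbpedia.org/resource/Fashion",
        labels={"en": "Fashion", "fr": "Mode", "de": "Mode"},
    ),
}


def match(values, vocab):
    """Step 1: link metadata strings to concepts (case-insensitive exact match)."""
    index = {}
    for concept in vocab.values():
        for label in concept.labels.values():
            index.setdefault(label.casefold(), concept.uri)
    keys = [v.strip().casefold() for v in values]
    return [index[k] for k in keys if k in index]


def supplement(uri, vocab):
    """Step 2: collect the translated labels of a concept and of its broader concepts."""
    concept = vocab[uri]
    return {
        "concept": concept.uri,
        "labels": dict(concept.labels),
        "broader": {b: dict(vocab[b].labels) for b in concept.broader if b in vocab},
    }


links = match(["costume", "wig"], VOCAB)   # "wig" has no match in the toy vocabulary
enriched = [supplement(uri, VOCAB) for uri in links]
```

Here `enriched` holds one entry for “Costume”, carrying its translated labels together with the labels of the broader concept “Fashion”.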
Help Europeana semantic enrichment by enriching your own metadata
The Europeana Data Model (EDM) supports contextual resources, the so-called ‘semantic layer’, including concepts from ‘value vocabularies’ such as thesauri, authority lists and classifications, coming either from the network of Europeana’s providers or from third-party data sources. Data providers are therefore strongly encouraged to include links to open and multilingual vocabularies in the metadata they send to Europeana, following the EDM recommendations for metadata on contextual resources.
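As a rough illustration of such a link, the sketch below builds an RDF/XML fragment in which an object points to a vocabulary concept through `dc:subject`, and the concept is described as a `skos:Concept` contextual resource. It is only a fragment under stated assumptions: the object URI is hypothetical, the `build_record` helper is invented for this example, and a complete EDM record needs further classes and properties that the EDM recommendations describe.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "edm": "http://www.europeana.eu/schemas/edm/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Use readable prefixes instead of ns0, ns1, ... when serialising.
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix, local):
    """Return the qualified name ElementTree expects: {namespace}local."""
    return f"{{{NS[prefix]}}}{local}"


def build_record(cho_uri, subject_uri, labels):
    """Link an object to a vocabulary concept and describe that concept."""
    root = ET.Element(q("rdf", "RDF"))
    cho = ET.SubElement(root, q("edm", "ProvidedCHO"), {q("rdf", "about"): cho_uri})
    # The link itself: a URI reference rather than a plain text string.
    ET.SubElement(cho, q("dc", "subject"), {q("rdf", "resource"): subject_uri})
    # The contextual resource, with one label per language.
    concept = ET.SubElement(root, q("skos", "Concept"), {q("rdf", "about"): subject_uri})
    for lang, text in labels.items():
        label = ET.SubElement(concept, q("skos", "prefLabel"), {XML_LANG: lang})
        label.text = text
    return ET.tostring(root, encoding="unicode")


record_xml = build_record(
    "http://example.org/object/1",            # hypothetical object URI
    "http://dbpedia.org/resource/Costume",
    {"en": "Costume"},
)
```

When the concept URI can be dereferenced, sending the `dc:subject` link alone may be enough, since the labels can then be fetched from the vocabulary itself.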
Europeana has developed a small in-house enrichment tool to ‘dereference’ the URIs, i.e. fetch all the multilingual and semantic data attached to vocabulary concepts. Europeana currently dereferences several vocabularies, which we encourage you to use as well.
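To make the idea of dereferencing concrete, here is a minimal sketch of what such a step could look like. It is not Europeana's tool: `build_request` and `extract_concept` are hypothetical helpers, the sample document and its `example.org` URIs are invented, and the parser assumes the vocabulary serves RDF/XML in which concepts appear as typed `skos:Concept` nodes, which not every vocabulary does.

```python
import urllib.request
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_request(uri):
    """Prepare an HTTP request that asks for RDF/XML via content negotiation."""
    return urllib.request.Request(uri, headers={"Accept": "application/rdf+xml"})


def extract_concept(rdf_xml, uri):
    """Pull the multilingual labels and broader links of one concept out of RDF/XML."""
    root = ET.fromstring(rdf_xml)
    labels, broader = {}, []
    for concept in root.iter(f"{{{SKOS}}}Concept"):
        if concept.get(f"{{{RDF}}}about") != uri:
            continue
        for el in concept.findall(f"{{{SKOS}}}prefLabel"):
            labels[el.get(XML_LANG, "")] = el.text
        for el in concept.findall(f"{{{SKOS}}}broader"):
            broader.append(el.get(f"{{{RDF}}}resource"))
    return {"labels": labels, "broader": broader}


# Invented response body standing in for what a vocabulary server might return.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/vocab/costume">
    <skos:prefLabel xml:lang="en">Costume</skos:prefLabel>
    <skos:prefLabel xml:lang="de">Kostüm</skos:prefLabel>
    <skos:broader rdf:resource="http://example.org/vocab/fashion"/>
  </skos:Concept>
</rdf:RDF>"""

request = build_request("http://example.org/vocab/costume")  # not sent in this sketch
concept = extract_concept(SAMPLE, "http://example.org/vocab/costume")
```

In a real setting the request would be sent with `urllib.request.urlopen(request)` and the response body passed to `extract_concept`; the sample string keeps the sketch runnable without network access.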
Selecting target datasets for semantic enrichment
The selection of the datasets you will enrich with is a crucial step in improving the quality of the enrichment and of the overall metadata. We recommend the following steps during your selection:
- Analyse the source data: a good knowledge of the source data in terms of topic coverage, gaps and quality issues is necessary before selecting an enrichment target.
- Identify the enrichment requirements: before performing an enrichment, the enricher should have already defined the expected results. For instance, an enrichment could be performed to improve the overall quality of a dataset. In this case, the quality issues to be fixed should be identified before performing the enrichment.
- Find datasets available on the Web. We recommend selecting datasets available on the Web. Several inventories are available to help enrichers source enrichment targets.
- Select the enrichment targets. Before selecting a target, the enricher will have to evaluate potential targets. We have identified a series of criteria that an enricher can use to evaluate targets against their source data:
- Availability and Access: We recommend selecting targets that are available on the Web and compliant with the Linked Data recipes. These targets should be re-usable under an open licence.
- Granularity and Coverage. The enricher should select targets that have the same coverage as the source data or that can complement the source data.
- Quality. The enricher should pay attention to the quality of the target in terms of semantics and data modelling.
- Connectivity. We recommend selecting targets with incoming and outgoing links to other targets.
- Test the selected target on a sample of source data. Once the target is selected, it should be tested on a sample of data before being applied to the whole dataset. A test makes it possible to verify that the target really covers the source data and does not introduce semantic ambiguities.
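The final test step can be approximated with a simple check on a sample of source terms. The sketch below is one possible way to do it, under stated assumptions: `evaluate_target` is a hypothetical helper, matching is a case-insensitive exact comparison, the `example.org` URIs are invented, and a term counts as ambiguous when its label maps to more than one concept in the target.

```python
from collections import defaultdict


def evaluate_target(sample_terms, target_labels):
    """Estimate coverage and ambiguity of a target on a sample of source terms.

    target_labels is an iterable of (label, concept_uri) pairs.
    """
    index = defaultdict(set)
    for label, uri in target_labels:
        index[label.casefold()].add(uri)

    unmatched, ambiguous, matched = [], [], 0
    for term in sample_terms:
        uris = index.get(term.strip().casefold(), set())
        if not uris:
            unmatched.append(term)       # gap in the target's coverage
            continue
        matched += 1
        if len(uris) > 1:
            ambiguous.append(term)       # one label, several candidate concepts

    coverage = matched / len(sample_terms) if sample_terms else 0.0
    return {"coverage": coverage, "unmatched": unmatched, "ambiguous": ambiguous}


# Invented target with one deliberately ambiguous label.
TARGET = [
    ("Costume", "http://example.org/vocab/costume"),
    ("Fashion", "http://example.org/vocab/fashion"),
    ("Mercury", "http://example.org/vocab/mercury-planet"),
    ("Mercury", "http://example.org/vocab/mercury-element"),
]

report = evaluate_target(["Costume", "Mercury", "wig", "Fashion"], TARGET)
```

For this sample, three of the four terms are matched, “wig” is reported as unmatched and “Mercury” as ambiguous. A low coverage or a long list of ambiguous terms suggests reconsidering the target, or restricting the enrichment to specific metadata fields, before applying it to the whole dataset.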
More details and examples of target datasets and vocabularies can be found in this document.