Data and Multilinguality

xml:lang tags

Throughout the Europeana Semantic Elements (ESE) and Europeana Data Model (EDM) documentation, providers are urged to use 'xml:lang' tags in all appropriate metadata elements, i.e. all elements that have a text string as their value.

<dc:description xml:lang="fr">végétation des montagnes de France</dc:description>

This tag indicates the language of the value, which would allow the value to be labelled in the display or used in multilingual functionality. At the moment the Europeana system cannot use these tags, but functionality to use them is expected to be developed (see next section).

Multiple records in different languages for the same object

If metadata records for the same object exist in several languages, they should not be submitted as separate records. This would result in redundant objects in the portal with no way to link them. The metadata should be submitted as one record and each element that has multiple language versions should be repeated (ideally also with an 'xml:lang' tag). This way, all the values will be displayed in the portal.
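As a minimal sketch, assuming a description that exists in both French and English, the single record would repeat 'dc:description' once per language, each value with its own 'xml:lang' tag. The English text is an illustrative translation, and the record wrapper and namespace declarations are omitted, as in the example above.

```xml
<!-- One record for the object; the description is repeated once per language -->
<dc:description xml:lang="fr">végétation des montagnes de France</dc:description>
<dc:description xml:lang="en">vegetation of the mountains of France</dc:description>
```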

For example, this record from the Bibliothèque nationale de France [1] contains four separate dc:type elements in two languages. At present they are displayed in a single 'Type' field without any indication of language:

Type: texte | text | publication en série imprimée | printed serial |

If 'xml:lang' tags are used then future functionality could display the values in a more useful fashion by labelling the different languages.
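Applied to the Bibliothèque nationale de France example, the tagging might look like the sketch below. It assumes that 'texte' and 'publication en série imprimée' are the French values and 'text' and 'printed serial' their English equivalents; how the values are actually encoded in the BnF record is not shown here.

```xml
<!-- Each dc:type value carries the language it is written in -->
<dc:type xml:lang="fr">texte</dc:type>
<dc:type xml:lang="en">text</dc:type>
<dc:type xml:lang="fr">publication en série imprimée</dc:type>
<dc:type xml:lang="en">printed serial</dc:type>
```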

For example:

Subject: [es] baño termal, cura, recuperación, viaje; [en] cure, recovery, travel; [fr] cure, rétablissement, station thermale

If already-submitted records have since been enriched with values in multiple languages then the data sets can be resubmitted to replace the earlier monolingual versions.

The 'dc:title' element

This is a special case because at the moment only one instance of 'dc:title' can be displayed in the portal. We still suggest providing translations in repeated 'dc:title' elements with the 'xml:lang' tag, but these translations will not be displayed as described above. To ensure that translated titles are displayed, please also put the alternative versions in the 'dcterms:alternative' element.
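A minimal ESE sketch, assuming an object whose original title is in French (the titles are illustrative, not from a real record): the original stays in 'dc:title', and the translation is given both as a tagged 'dc:title' and as 'dcterms:alternative' so that it is displayed.

```xml
<!-- Original title; only one dc:title is currently displayed -->
<dc:title xml:lang="fr">Végétation des montagnes de France</dc:title>
<!-- Translation repeated in dc:title, as recommended -->
<dc:title xml:lang="en">Vegetation of the mountains of France</dc:title>
<!-- The same translation in dcterms:alternative so that it is displayed -->
<dcterms:alternative xml:lang="en">Vegetation of the mountains of France</dcterms:alternative>
```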

For EDM data, it is possible to submit several 'dc:title' properties with 'xml:lang' tags to indicate that an object has a title in several languages. However, there is no way to indicate which title is the original and which is a translation.
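In EDM, titles are properties of the 'edm:ProvidedCHO' resource. In the sketch below (hypothetical object URI, illustrative titles, namespace declarations omitted), each title carries only a language tag, so nothing in the markup records which one is the original.

```xml
<edm:ProvidedCHO rdf:about="http://example.org/object/1">
  <!-- Two titles, distinguished only by their language tags -->
  <dc:title xml:lang="fr">Végétation des montagnes de France</dc:title>
  <dc:title xml:lang="en">Vegetation of the mountains of France</dc:title>
</edm:ProvidedCHO>
```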

Multilingual enrichment using vocabularies

Once the functionality is in place, providers will be able to submit multilingual versions of their vocabularies. These will be ingested together with their data to support multilingual functionality and data enrichment. Examples of such vocabularies are those created by Musical Instrument Museums Online (MIMO) in their EDM case study. The instrument keywords are listed in several languages together with broader and related terms.
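EDM represents vocabulary terms of this kind with the 'skos:Concept' class. The sketch below shows how a multilingual instrument keyword with a broader and a related term could be expressed; the URIs, the broader and related concepts, and the non-English labels are illustrative assumptions, not taken from the MIMO vocabulary.

```xml
<skos:Concept rdf:about="http://example.org/instruments/lute">
  <!-- One preferred label per language -->
  <skos:prefLabel xml:lang="en">Lute</skos:prefLabel>
  <skos:prefLabel xml:lang="fr">Luth</skos:prefLabel>
  <skos:prefLabel xml:lang="de">Laute</skos:prefLabel>
  <skos:prefLabel xml:lang="it">Liuto</skos:prefLabel>
  <!-- Hypothetical links to a broader and a related term in the same vocabulary -->
  <skos:broader rdf:resource="http://example.org/instruments/plucked-string-instruments"/>
  <skos:related rdf:resource="http://example.org/instruments/theorbo"/>
</skos:Concept>
```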

At present, MIMO repeats the 'dc:type' element: once for the Uniform Resource Identifier (URI) of the vocabulary term and once for the term itself (in the pivotal language). In this way the vocabulary enters Europeana one value at a time, while a human-readable term is also presented directly to the user.

dc:type =
dc:type = Lute