FSD Bulletin

Issue 19 (1/2006)

ISSN 1795-5262


FSD Bulletin is the electronic newsletter of the Finnish Social Science Data Archive. The Bulletin provides information and news related to the data archive and social science research.


Finnish Social Science Data Archive
E-mail: fsd@tuni.fi


ELSST - Meeting the Challenge of a Multilingual Thesaurus

Taina Jääskeläinen

The multilingual thesaurus ELSST (European Language Social Science Thesaurus) was translated into Finnish as part of the EU-funded Madiera project. FSD is responsible for the Finnish version of the thesaurus. What kind of challenges does thesaurus translation pose? How do multilingual thesauri function in information retrieval?

ELSST contains over 3,000 terms and has been translated wholly or partly into eight languages. The English thesaurus functions as the source thesaurus. European data archives use ELSST to index their data. In Madiera, the joint portal of European data archives, the thesaurus can be used as a multilingual search tool.


Mapping exercises

Thesauri are generally used for indexing, that is, for describing the content of, for example, a book or an article, and for retrieving information on a given subject. Multilingual thesauri therefore fulfil their purpose only if equivalent terms in different languages refer to one and the same concept. Another important requirement is that the selected terms are commonly used in the target language, or are generally accepted and used by experts. Otherwise the terms would be useless for information retrieval, and the thesaurus would not function satisfactorily as a search tool.

Thesaurus translation is a matter of mapping concepts. Conceptual equivalence is independent of the words used to express it, and therefore direct translations are not always useful. For example, the direct translation of the Finnish term ylioppilastutkinto is matriculation examination, which would mean little to a non-Finn. If we mapped the concept and looked for an equivalent term instead, we would use upper secondary school certificate.

For each thesaurus term, the translator must first clarify which particular concept or phenomenon the source term refers to, that is, what the scope of the term is. This is not as easy as one would expect, since thesaurus terms appear alone, without context. For example, we might expect the term bills to refer to requests for payment of money owed. However, it may just as well refer to proposals for legislation which, if adopted by Parliament, become new laws. Lack of context is the biggest difference between normal translation and thesaurus translation.

It is usually, but not always, possible to reach exact or near-exact equivalence between the source and target language terms. Most problems are caused by linguistic or cultural differences.

Linguistic intricacies

When translating a thesaurus, one cannot help noticing that languages tend to divide the world differently. For example, ELSST contains the English term blood sports, which refers to sports involving the killing or injuring of animals. There is no equivalent concept in Finnish. The translator has three alternatives: 1) a direct translation, which results in a "coined", artificial term; 2) mapping to a narrower concept, in which case a translation scope note is needed in both languages to explain the difference between the terms; 3) a combination of terms, where the target language needs more than one concept to express the content of the source language term. The alternatives would look like this:

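    Alternative 1 (coined term VERIURHEILU):
    UF metsästys (=hunting), härkätaistelut (=bullfighting), koiratappelut (=dogfighting), kukkotappelut (=cockfighting)
    Alternative 2 (narrower term metsästys):
    Translation Scope Note (in Finnish): Englantilainen termi kattaa myös härkätaistelut ja muut eläinten tappamista tai vahingoittamista sisältävät lajit. (=The English term also covers bullfighting and other sports involving the killing or injuring of animals.)
    Translation Scope Note: The Finnish term refers to hunting only.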

The third alternative is too long and therefore not feasible. The first alternative produces an artificial preferred term (VERIURHEILU) and is feasible only if all the narrower terms that Finns normally use for these activities are included as non-preferred terms (UF). Non-preferred terms lead the searcher to the term actually used in the thesaurus, i.e. the preferred term. The best alternative seems to be the second, where the solution is partial equivalence: the selected target language term is narrower than the source language term, but at least it is a commonly used one.
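The way non-preferred terms lead a searcher to the preferred term can be sketched in a few lines of Python. This is only an illustration of the first alternative: the dictionary layout and the function name resolve_to_preferred are assumptions made for this example and do not describe how ELSST is actually stored.

```python
# Non-preferred (UF) terms mapped to the preferred term they point to.
# The entries follow the first alternative in the text; the data
# structure itself is an assumption made for illustration.
USE_FOR = {
    "metsästys": "VERIURHEILU",
    "härkätaistelut": "VERIURHEILU",
    "koiratappelut": "VERIURHEILU",
    "kukkotappelut": "VERIURHEILU",
}
PREFERRED_TERMS = {"VERIURHEILU"}


def resolve_to_preferred(term):
    """Return the preferred term for a search term, or None if unknown."""
    normalized = term.strip()
    # A preferred term resolves to itself, regardless of letter case.
    if normalized.upper() in PREFERRED_TERMS:
        return normalized.upper()
    # A non-preferred term is redirected to its preferred term.
    return USE_FOR.get(normalized.lower())
```

A searcher who types metsästys would be redirected to VERIURHEILU, which is why the coined term is only workable when the commonly used narrower terms are all listed as non-preferred terms.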

Homonyms are often problematic. To take one example, the two straightforward English terms sons and boys have the same translation in Finnish: pojat. However, one target language term cannot function as an equivalent for two source terms. Multilingual thesauri often solve this by using a qualifier: sons would be mapped as pojat (jälkikasvu = offspring), and boys as pojat.
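The qualifier approach can be illustrated with a small Python sketch. The function names find_collisions and apply_qualifiers, and the idea of storing translations in a plain dictionary, are assumptions made for this example only.

```python
from collections import defaultdict


def find_collisions(translations):
    """Group source terms that share one target-language translation."""
    by_target = defaultdict(list)
    for source, target in translations.items():
        by_target[target].append(source)
    return {t: sorted(s) for t, s in by_target.items() if len(s) > 1}


def apply_qualifiers(translations, qualifiers):
    """Append a parenthesised qualifier to the targets named in `qualifiers`."""
    return {
        source: f"{target} ({qualifiers[source]})" if source in qualifiers else target
        for source, target in translations.items()
    }


# Source term -> Finnish term, using examples from the text.
translations = {"sons": "pojat", "boys": "pojat", "tolerance": "suvaitsevaisuus"}
collisions = find_collisions(translations)
disambiguated = apply_qualifiers(translations, {"sons": "jälkikasvu"})
```

Before the qualifier is applied, both sons and boys point to pojat. Afterwards each source term has its own target term, so the one-to-one mapping that a multilingual thesaurus needs is restored.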

One of the biggest challenges in ELSST was the term editors, which has six equivalent terms in Finnish: päätoimittajat, toimituspäälliköt, erikoistoimittajat, kustannustoimittajat, leikkaajat, and uutispäälliköt. A handy term for the English but definitely less so for the Finns.

Unlike most other languages in ELSST, Finnish does not belong to the Indo-European language family but to the Uralic one. This makes finding Finnish equivalents harder. It is often manifest in ELSST term translations that other languages have the same origin - and that Finnish does not.

For example:
  • tolerance (English)
  • tolerans (Swedish)
  • tolérance (French)
  • Toleranz (German)
  • tolerancia (Spanish)
  • toleranse (Norwegian)
  • suvaitsevaisuus (Finnish)

  • reform (English)
  • reform (Swedish)
  • réforme (French)
  • Reform (German)
  • reforma (Spanish)
  • reform (Norwegian)
  • uudistukset (Finnish)

On the other hand, Finnish may be less prone to be led astray by near cognates, that is, words which look similar and have the same origin but may differ in scope or meaning, due to changes of semantics over time.

Getting confused by systems

Differences in education, pension, social security and other systems mean that it is often impossible to find equivalents for terms connected to these systems. For example, in Britain the NHS (National Health Service) is responsible for primary health care, whereas in Finland primary health care is the responsibility of municipal health centres. This affects the terms used for primary health care.

In Britain, there is a distinction between Personal social services and Social services, but no similar distinction exists in Finland. Personal social services refers to the care of particular groups of needy persons. Legal terms are also complicated, as legislation differs from country to country. The scope of a legal term in one country is seldom the same as in another country.

As a general rule, it is better to use general or standardised terms in a multilingual thesaurus, especially for terms covering systems. Finding equivalents is much easier if thesaurus terms are not culture-specific.

Much less difficult is the case where the target language has named the phenomenon even if the phenomenon in itself does not exist in that country. Thus, for example, finding a Finnish equivalent term (täytevaalit) for the English term by-elections did not cause any problems even though by-elections are not held in Finland.

Unexpected search results

Cultural differences have an impact not only on thesaurus translation but also on information retrieval. Let us assume that a non-Finnish scholar would like to find data on married women in paid employment from Finland and Britain. She would start her search using the English ELSST term married women workers and would then extend her search to the Finnish equivalent term työssäkäyvät aviovaimot. To her surprise, she would not find data on the issue from Finland.

Naturally there are Finnish datasets where respondents have been asked about their employment, job, and marital status. These datasets could be used for research on working married women. However, since most Finnish women are in paid employment, the existence of a term like married women workers would not even occur to a Finn doing the indexing. The data would probably be indexed with terms connected to occupational life.

A scholar in Britain would probably know to look for data on the Romany (the Roma, gypsies) with the ELSST term travelling people. However, in Finland the Romany generally have a fixed abode and their children attend a local school. A Finnish scholar would start with terms like Romany or minority groups, and would probably not see a connection between the Romany and the term travelling people. The same goes for people doing the indexing. Multilingual thesauri are thus only a partial solution to cross-language information retrieval.
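The married women workers example can be turned into a toy search in Python to show why equivalent terms alone do not guarantee a hit. Everything except the two thesaurus terms is invented for this sketch: the dataset names, the Finnish index terms työelämä (working life) and siviilisääty (marital status), and the function cross_language_search are all hypothetical.

```python
# Labels of one concept in two languages (terms taken from the text).
CONCEPT_LABELS = {
    "married women workers": {
        "en": "married women workers",
        "fi": "työssäkäyvät aviovaimot",
    },
}

# Hypothetical catalogue: dataset names and index terms are invented.
DATASETS = {
    "British survey (hypothetical)": {"married women workers", "employment"},
    "Finnish survey (hypothetical)": {"työelämä", "siviilisääty"},
}


def cross_language_search(english_term):
    """Return the datasets indexed with any language label of the concept."""
    labels = set(CONCEPT_LABELS.get(english_term, {}).values())
    return sorted(name for name, terms in DATASETS.items() if terms & labels)


hits = cross_language_search("married women workers")
```

The search finds only the British dataset. The Finnish dataset may well contain suitable data, but it was indexed with other terms, so the thesaurus mapping on its own cannot surface it.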

More information

»  MADIERA Project
»  HASSET thesaurus which forms the basis of ELSST

Taina Jääskeläinen works as a translator at the FSD. She is also a qualified information officer.