The road travelled for CESSDA Vocabulary Service

CESSDA member archives in over twenty European countries have long used the international DDI metadata standard to describe their data holdings, enhancing the interoperability and discoverability of data. The DDI Alliance has created 26 vocabularies for use in specific metadata elements of the standard, many of them describing the methods used to create the data. These controlled vocabularies form an inherent part of cross-national metadata harmonization. However, managing multilingual vocabulary content requires appropriate software tools. This is the story of how the CESSDA Vocabulary Service was developed, and of my own role as a member of the project team.

Diagonally split screen. User interface above and traffic junction by night below.

The DDI Alliance produces its vocabularies in English, but in the European context local languages are important. The vocabulary concepts, or terms, and their definitions are needed in a great many languages. These local language variants are provided by CESSDA archives or associated organisations for their own use. But how can all this multilingual content and its changes be managed? The DDI Alliance did not have a vocabulary tool and resorted to laborious manual means to publish its vocabularies. So in 2018 CESSDA set up a project to either adopt or develop a vocabulary tool.

Working towards the beta: the project team

First, the project team surveyed the open-source vocabulary tools available at the time. These were found lacking, mainly in handling multilinguality or producing language-specific version history, or their continued maintenance was not guaranteed. The decision was to develop a simple, bespoke vocabulary platform, with an Editor and an interface for browsing and downloading.

Lo and behold, towards the end of 2019 we had produced a beta version of the CESSDA Vocabulary Service, which was put into use immediately. It allows vocabulary creation, maintenance, translation, browsing and downloading. We had the usual problems with scant developer resources, as well as the expected and unexpected issues with requirements and coding.

As expected, producing software took more time than anticipated. Coding the actual entering and amending of content in different languages seemed quite straightforward. Under the hood, however, things turned out to be far less simple when it came to the complicated workflow, access control and versioning. Specified user groups had different access rights for different agencies in different languages. The Editor needed to show the right action buttons, and no wrong ones, to the right users at the right stages of the workflow.
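To illustrate why access control got complicated, here is a minimal sketch of one way to decide which action buttons a user sees. It assumes a hypothetical data model (the `Role`, `Stage` and `Grant` types, the agency name `AgencyA` and the action names are all invented for illustration, not taken from the CESSDA Vocabulary Service code): each user holds grants scoped to an agency and a language, and the visible buttons are the union of what every matching grant allows at the current workflow stage.

```python
from dataclasses import dataclass, field
from enum import Enum

class Role(Enum):
    CONTRIBUTOR = "contributor"      # hypothetical: edits content in its assigned language(s)
    AGENCY_ADMIN = "agency_admin"    # hypothetical: also reviews and publishes for its agency

class Stage(Enum):
    DRAFT = "draft"
    REVIEW = "review"
    PUBLISHED = "published"

@dataclass(frozen=True)
class Grant:
    role: Role
    agency: str
    language: str  # "*" grants the role for every language of the agency

@dataclass
class User:
    name: str
    grants: list[Grant] = field(default_factory=list)

# Hypothetical mapping: which action buttons each role sees at each workflow stage.
ACTIONS = {
    Role.CONTRIBUTOR: {Stage.DRAFT: {"edit", "submit_for_review"}},
    Role.AGENCY_ADMIN: {
        Stage.DRAFT: {"edit", "submit_for_review"},
        Stage.REVIEW: {"edit", "publish", "return_to_draft"},
        Stage.PUBLISHED: {"start_new_version"},
    },
}

def visible_actions(user: User, agency: str, language: str, stage: Stage) -> set[str]:
    """Union of the actions allowed by every grant matching this agency and language."""
    actions: set[str] = set()
    for grant in user.grants:
        if grant.agency == agency and grant.language in (language, "*"):
            actions |= ACTIONS.get(grant.role, {}).get(stage, set())
    return actions

translator = User("fi-translator", [Grant(Role.CONTRIBUTOR, "AgencyA", "fi")])
admin = User("agency-admin", [Grant(Role.AGENCY_ADMIN, "AgencyA", "*")])
print(sorted(visible_actions(translator, "AgencyA", "fi", Stage.DRAFT)))  # ['edit', 'submit_for_review']
print(visible_actions(translator, "AgencyA", "de", Stage.DRAFT))          # set()
print(sorted(visible_actions(admin, "AgencyA", "de", Stage.REVIEW)))      # ['edit', 'publish', 'return_to_draft']
```

Even in this toy version, the number of combinations (role × agency × language × stage) grows quickly, which is why getting "no wrong buttons" right takes careful testing.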

In addition to access, the workflow itself needed attention. Translations cannot be made of a moving target, so they cannot be started until the source vocabulary is ready and has been entered into the tool. If the source vocabulary content is later amended, a new cycle starts, with the tool automatically copying the previous content as a draft for each language to start working on.
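The translation cycle described above can be sketched as follows. This is a minimal illustration under assumed names (`Vocabulary`, `LanguageVersion`, the `"DRAFT"`/`"PUBLISHED"` statuses and the concept code `Interview` are hypothetical), not the service's actual implementation: translation is refused until the source version is published, and amending the source starts a new cycle in which each language's previous content is copied into a fresh draft.

```python
from dataclasses import dataclass, field

@dataclass
class LanguageVersion:
    language: str
    terms: dict[str, str]   # concept code -> term in this language
    status: str = "DRAFT"   # hypothetical statuses: "DRAFT" or "PUBLISHED"

@dataclass
class Vocabulary:
    source_language: str
    versions: dict[str, LanguageVersion] = field(default_factory=dict)

    def source_ready(self) -> bool:
        """Translations may only proceed once the source version is published."""
        src = self.versions.get(self.source_language)
        return src is not None and src.status == "PUBLISHED"

    def translate(self, language: str, code: str, term: str) -> None:
        if not self.source_ready():
            raise RuntimeError("source vocabulary not ready: cannot translate a moving target")
        version = self.versions.setdefault(language, LanguageVersion(language, {}))
        version.terms[code] = term

    def publish(self, language: str) -> None:
        self.versions[language].status = "PUBLISHED"

    def amend_source(self, new_terms: dict[str, str]) -> None:
        """Start a new cycle: the source becomes a draft holding the amended terms,
        and every translation's previous content is copied into a fresh draft."""
        for language, old in list(self.versions.items()):
            terms = new_terms if language == self.source_language else old.terms
            self.versions[language] = LanguageVersion(language, dict(terms))

vocab = Vocabulary("en", {"en": LanguageVersion("en", {"Interview": "Interview"})})
# Calling vocab.translate("fi", ...) here would raise: the source is still a draft.
vocab.publish("en")
vocab.translate("fi", "Interview", "Haastattelu")
vocab.publish("fi")
vocab.amend_source({"Interview": "Interview", "Survey": "Survey"})
print(vocab.versions["fi"].status, vocab.versions["fi"].terms)  # DRAFT {'Interview': 'Haastattelu'}
```

After `amend_source`, `source_ready()` is false again, so the Finnish draft waits until the amended English source is published before translators continue from their copied content.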

Working towards the production version: the service team

By the beginning of 2021, CESSDA had decided to streamline the management of its services and tools, assigning each service and tool an owner, first called the Content Contact and later the Service Owner. This was a good move for a federated consortium: it gave each service a dedicated person, with a regular contract, to coordinate things, discuss user requirements with the developers and the platform maintenance team, and act as a contact point.

CESSDA Vocabulary Service logo

I ended up wearing the Service Owner hat for both the CESSDA Vocabulary Service and the CESSDA Data Catalogue for the 2021-2022 period.

The vocabulary tool was up and running but still a beta, and not stable enough. GESIS, the German CESSDA member organisation that had originally done the coding, received no applicants for its developer job announcements. The whole of Europe seemed to suffer from a shortage of developers. Eventually, CESSDA was able to contract a developer team from the Institute of Informatics, Slovak Academy of Sciences (IISAS). Development was on again.

So, there we were, fully occupied with online sprint meetings at regular intervals, with a lot of coding and feverish testing and issue writing in between. The international team consisted of the development team in Slovakia, me in Finland and the CESSDA platform team in Norway and the UK. The meetings were recorded so developers could always go back and check what was demonstrated or discussed.

Regular sprint meetings take time but are worth it. The exchange of information between the developers and the content side is continuous, which helps prevent one of the classic pitfalls of software development: extensive coding towards a misunderstood goal. The experienced and good-humoured developers did not hesitate to suggest better solutions to some issues than I had been able to think of. I improved my ability to write clear step-by-step descriptions of new functionalities or bugs, with screenshots where needed. The platform team was able to translate IT speak into user speak and vice versa where needed.

Issues were resolved, the tool was made stable and new languages were added, Japanese being one exciting addition. That did require consulting the Japanese organisation's IT people on the best language analyser to use under the hood, and their content people needed to check that the language came out correctly. Good progress was made, even though all of us had many other tasks unrelated to the vocabulary tool.

Handling new requirements

The next step was to figure out how to reconcile the user requirements with linked open data requirements, which had evolved over the past few years. The versioning strategy needed to be changed. Concepts should never be deleted but deprecated instead, with information on any replacement concepts. Alongside these changes, the system still needed to meet a key user requirement: language-specific version history. This is vital for organisations using vocabulary content in their own language in their metadata and systems. Both may need to be updated when the content in that language changes, so organisations need to know whether there have been any changes and what those changes are.
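The two requirements above, deprecating instead of deleting and keeping a separate version history per language, can be sketched with a small data model. This is an illustrative sketch only: `ControlledVocabulary`, `Concept`, `VersionEntry`, the version numbers and the concept codes `Telephone` and `TelephoneInterview` are hypothetical and do not describe the service's real schema.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Concept:
    code: str
    deprecated: bool = False
    replaced_by: Optional[str] = None  # code of the replacing concept, if any

@dataclass
class VersionEntry:
    number: str         # hypothetical version numbering, e.g. "1.1"
    changes: list[str]  # human-readable changes made in this language

@dataclass
class ControlledVocabulary:
    concepts: dict[str, Concept] = field(default_factory=dict)
    history: dict[str, list[VersionEntry]] = field(default_factory=dict)  # language -> versions

    def deprecate(self, code: str, replaced_by: Optional[str] = None) -> None:
        """Never delete: flag the concept and point to its replacement, if any."""
        concept = self.concepts[code]
        concept.deprecated = True
        concept.replaced_by = replaced_by

    def resolve(self, code: str) -> str:
        """Follow replacement links to the current concept (guarding against cycles)."""
        seen = set()
        while self.concepts[code].replaced_by and code not in seen:
            seen.add(code)
            code = self.concepts[code].replaced_by
        return code

    def record_version(self, language: str, number: str, changes: list[str]) -> None:
        self.history.setdefault(language, []).append(VersionEntry(number, changes))

    def changes_since(self, language: str, number: str) -> list[str]:
        """Changes made in this language after the given version (ValueError if unknown)."""
        entries = self.history.get(language, [])
        start = [e.number for e in entries].index(number) + 1
        return [change for e in entries[start:] for change in e.changes]

cv = ControlledVocabulary({c: Concept(c) for c in ["Telephone", "TelephoneInterview"]})
cv.deprecate("Telephone", replaced_by="TelephoneInterview")
cv.record_version("fi", "1.0", ["Initial Finnish translation"])
cv.record_version("fi", "1.1", ["Telephone deprecated, replaced by TelephoneInterview"])
cv.record_version("de", "1.0", ["Initial German translation"])
print(cv.resolve("Telephone"))        # TelephoneInterview
print(cv.changes_since("fi", "1.0"))  # ['Telephone deprecated, replaced by TelephoneInterview']
print(cv.changes_since("de", "1.0"))  # []
```

The deprecated concept stays in `concepts`, so existing metadata that cites it still resolves, while each language's history answers the question an organisation actually asks: what has changed in *my* language since the version I use?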

Taina Jääskeläinen sits behind the computer screen. A graphical user interface is projected onto the background.
As a Service Owner, Senior Specialist Taina Jääskeläinen spent hours and hours in online sprint meetings with the developer team.

Open-source vocabulary tools had developed a lot since 2018, particularly VocBench, which CESSDA uses for its ELSST thesaurus. Still, language-specific version history remained an issue. VocBench's new general search and browsing interface, ShowVoc, was still in beta and not yet very user-friendly, so the decision was to continue developing the CESSDA Vocabulary Service.

Changing the versioning and adopting deprecation may sound like small changes, but in fact they required extensive amendments to the workflow and role-specific functionalities. My workdays began with wading through notices from the issue tracker and testing changes. The sprint meeting schedule was stepped up to every two weeks. While many things remained the same for Editor users, there were a great number of amendments under the hood. I was also amazed at how many changes the 30-plus-page Editor User Guide needed. When they had time, the developers also tackled other outstanding issues.

I was wearing two Service Owner hats, and releases were also being prepared for the CESSDA Data Catalogue. My colleagues got used to seeing my office door closed with a 'Webinar in progress, do not disturb' note on it, or me wandering glassy-eyed around the corridors, my mind elsewhere, taking a bit of a breather.

Moving on

By now the end of 2022 was looming and I felt it was time for someone else to take charge. Preparations for a handover to a new Service Owner from Slovenia began. After all the hard work, the new version of the vocabulary tool was successfully released in January 2023. I said regretful goodbyes to my all-time-favourite developer and platform teams and went happily gallivanting abroad on a holiday, leaving my work laptop behind.

Text: Taina Jääskeläinen. Photo: Tuomas J. Alaterä, intersection image: Rostislav Kralik CC0