AI for Verifying Text File Anonymisation: A Powerful Assistant Under Expert Guidance

Artificial intelligence can, at its best, detect identifiers more accurately than a human. Properly guided AI can speed up and improve the verification of anonymisation in research data, provided it is implemented thoughtfully and with data protection practices in mind. However, reliability varies.

This article is based on FSD's test report and was produced with assistance from the Microsoft 365 Copilot Researcher agent.

Even though AI's ability to find identifiers can be very good, it is still not a completely reliable assistant. This was confirmed in experiments conducted in the FSD AIMS 2030 project, where AI solutions were tested for checking anonymisation of qualitative textual data. The tests compared different Copilot tools (Chat, Analyst, Researcher) as well as a custom-built agent for ensuring anonymity. The results show that AI can speed up the verification of anonymisation, but expert human judgment and decisions are still needed.

Background of the Tests

The FSD obtained permission from Tampere University's legal team to use AI in anonymisation checks. The data files to be checked were assessed as belonging to a data class that allows the use of university-provided AI tools. The archived data files had already been anonymised by the researchers; the FSD only needed to assess the sufficiency and consistency of the anonymisation. By contrast, the original, non-anonymised data that researchers collect from and about people almost invariably contains a significant amount of personal data, including special categories of personal data. AI applications cannot currently be used for anonymising such data (as of autumn 2025).

In spring and autumn 2025, the FSD conducted a series of tests to see if AI could be used to check the anonymisation of qualitative interview transcripts. The experiments covered Copilot tools and eventually also a fully custom-built agent. Two anonymised interview transcripts were used as test material, with sizes of 29,368 and 25,134 characters (excluding spaces). For the tests, identifiers (e.g., personal names, place names, organisation names, job titles) were added to the files. Lists of the added identifiers were made for checking purposes, to evaluate how well each method could find them.
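
To illustrate how such a checklist can be used mechanically, below is a minimal Python sketch of the comparison. The file names and the one-identifier-per-line format are our own assumptions for illustration; the actual FSD tests compared the lists manually.

    # Minimal sketch: compare a tool's output against the checklist of
    # planted identifiers. File names and the one-entry-per-line format
    # are hypothetical assumptions.

    def load_entries(path):
        """Read one identifier per line, skipping blank lines."""
        with open(path, encoding="utf-8") as f:
            return {line.strip() for line in f if line.strip()}

    planted = load_entries("planted_identifiers.txt")  # human-made checklist
    found = load_entries("agent_output.txt")           # tool's reported list

    missed = planted - found
    print(f"Found {len(planted) - len(missed)} of {len(planted)} planted identifiers")
    print("Missed:", sorted(missed))

Exact string matching understates a tool's hits when an identifier appears only in an inflected form (such as "Espoosta"), so the final judgment still belongs to the data specialist.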

Tested tools

  • Copilot Chat – General-purpose AI chat.
  • Copilot Analyst – Copilot's built-in agent for more structured data (e.g., tables).
  • Copilot Researcher – Copilot's "deep analysis" agent, which can fetch additional information and refine its answers.
  • "Text Identifier Lister" – A custom-built agent, tailored specifically for anonymisation checking in list format.

The following sections review the operation, results, and observed strengths and limitations of each tool in anonymisation checking. A summary table compares the performance of the different tools.

Copilot Chat: A Fast Chat Assistant, but Limited in Identifying Entities

We first tested the Copilot Chat tool (the chat function in Teams and Office applications). We gave it a precise prompt, asking it to find all persons, places, organisations, and job titles in the text and mark them as identifiers in a specific way.

The results were incomplete: Chat found only some of the identifiers. It failed to recognise even common first names (such as Hannu) and missed several place names (Lappeenranta, Kitee, Rovaniemi). Various prompt refinements did not significantly improve the results.

Observations: Copilot Chat completely missed several identifiers that had been added for the test. If it were used for checking anonymisation, some personal data could therefore remain in the text.

Conclusion

Copilot Chat as such is not suitable for checking the anonymisation of transcripts, because its identification results are incomplete.

The strength of the chat tool lies more in quickly retrieving and combining individual pieces of information: for example, it can fetch background information about a specific project from multiple sources based on a single question. The chat tool can thus be used in planning anonymisation. For example, it can help determine whether an organisation mentioned in an interview is local or national. Even in such use, the user should verify the information provided by Chat from reliable sources, as there may be errors or hallucinations among the results.

Copilot Analyst: A Tool for Structured Data That Doesn't Adapt to Text

Next, we tested the Copilot Analyst agent, which is Microsoft 365 Copilot's built-in tool for tabular data. Analyst is designed, for example, for analysing Excel and CSV files and can also refine its responses iteratively, although this makes it slower than Copilot Chat.

Our expectations were low, since anonymisation checking focuses on unstructured text. Analyst was given a listing prompt similar to the one used with Chat – to list all personal names, places, organisations, and occupations in the order they appear in the text.

The results were again incomplete: the agent did produce separate lists for different categories (e.g., places separately, organisations separately), but each list was missing some items – in other words, it failed to identify some identifiers.

Processing took about 2–3 minutes per file, which is slightly longer than Chat, apparently because Analyst goes through the material more thoroughly.

Observations: In one test, Analyst did separate personal and place names into their own sections, but for example, one person's surname was missing from the list entirely. Similarly, some organisation names did not appear in the listing at all. These gaps mean that you cannot rely on Analyst to find all identifiers completely.

Conclusion

M365 Copilot Analyst is not suitable for checking the anonymisation of qualitative text data, as its lists were incomplete in the same way as Copilot Chat. This is understandable, since Analyst is primarily designed for handling structured, numerical data, not for reading long free-text documents.

Analyst will be tested later at the Data Archive to support anonymisation and other processing of quantitative datasets.

Copilot Researcher: Deep Analysis That Delivers – Accurate but Slow

Copilot Researcher is a more advanced agent capable of deep analysis: it can, when needed, retrieve background information from the internet and iterate its responses to improve the final result.

We expected that Researcher would perform better at finding identifiers than Chat or Analyst – and this was indeed the case, though with certain reservations.

Marking Identifiers Directly in the Text

First, we tried an approach where Researcher was asked to mark identifiers directly in the text using specific symbols. Since the agent can supplement its knowledge by searching the web and also evaluates its own response before finishing, each run was quite slow. For example, when it was asked to mark all personal and place names in the text with a # symbol, it took well over an hour to get the final result.

However, the resulting annotated transcript was perfect: Researcher found all the requested names, including a Finnish place name in an inflected form (Espoosta, "from Espoo"), which the testers themselves had missed when compiling the checklist.

The task was continued with several different types of identifiers. When asked to mark organisation names and occupations in the text, Researcher again found everything – although it interpreted some words as identifiers too readily. For example, words like "päivystäjä" (on-call worker), "esimies" (supervisor), and "päällikkö" (manager) were marked as identifiers, even though in the test data they did not refer to a specific person or organisation to be anonymised.

Thus, the AI followed the instructions literally, which resulted in detecting additional "identifiers" that a human reader would likely leave unanonymised.

In a third experiment, a very comprehensive prompt was given, asking the agent to mark all possible persons, organisations, occupations, and places in the text with double square brackets [[...]] and to return the same text with the brackets in place. This run took about 1 hour and 15 minutes and again produced a comprehensive result.

At the end, Researcher also provided a summary of the identifiers it found and returned the entire transcript so that every occurrence of an identifier was in brackets. The result was very accurate – the AI even found two place names (Nuorgam and Kilpisjärvi) that had been added to the test data, but had not made it into the human-created checklist. Also, for example, a passage referring to listening to Elvis was marked as it should be, i.e., Elvis was recognised as a proper name.

The downside of this approach is that Researcher also marks a significant amount of information that does not actually need to be anonymised. As noted, it marks common job titles and already anonymised location information (such as "neighbouring city") as identifiers just in case, even though these do not need to be removed in anonymisation.

This means that if the user receives the entire text fully annotated, they must go through it and remove unnecessary brackets from places that were not actually identifiers. This slows down the workflow. In addition, direct text editing with AI is cumbersome: waiting over an hour per file is practically too long.
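
If the annotated transcript is nevertheless used, part of that manual pass can be eased by pulling the marked spans into one list. The sketch below assumes the [[...]] convention described above; the file name is a hypothetical example.

    import re

    # Minimal sketch: list every [[...]]-marked span with a little
    # surrounding context, so the reviewer can scan the candidates
    # without scrolling through the whole annotated transcript.
    with open("annotated_transcript.txt", encoding="utf-8") as f:
        text = f.read()

    for match in re.finditer(r"\[\[(.+?)\]\]", text):
        start = max(match.start() - 30, 0)
        context = text[start:match.end() + 30].replace("\n", " ")
        print(f"{match.group(1)}: ...{context}...")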

Conclusion (Researcher, marking in text)

Marking identifiers directly in the text works perfectly, but produces a large number of extra annotations and is a slow method. When processing large texts, one must be prepared for waiting times of even several hours. Thus, the method is not particularly efficient.

Listing Identifiers in a Separate List

We concluded that producing lists of identifiers might be a better approach than marking them directly in the text. This way, slow text editing is avoided: you simply get a list of all detected identifier-type expressions, and a data specialist can check from the list which identifiers have already been anonymised (in [square brackets]) and which still require action.

The first listing experiment immediately produced promising results. The Researcher agent returned names, places, organisations, and occupations in the order they appeared in the text, all in a single list, exactly as requested. The result was nearly perfect – only "pääkokki" (head chef) and "turkulainen" (person from Turku) did not appear in the results on the first try. When the prompt was supplemented to specify that, for example, "oululainen" (person from Oulu) or any other expression referring to a place of residence should also count as location information, "turkulainen" was included in the list on the second run, but "pääkokki" (head chef) was still missing.

The accuracy of the second run was therefore excellent, but it already required 35 minutes of processing – even the listing method can take time if the prompt is very broad. As a result, Researcher provided a list containing all expressions interpreted as identifiers (except "pääkokki"), including repetitions, in their original grammatical forms. From the list, it was easy to distinguish which items the researcher had already anonymised, as they appeared in square brackets, e.g., [Kansalaisjärjestö] (civil society organisation), and which were those that still needed to be anonymised.

By using such a list, the person reviewing the data can quickly pick out the spots where anonymisation may still be incomplete and take the necessary actions on the material.
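
The reviewer's first pass over such a list can also be sketched in code. The snippet below assumes one expression per line, with already-anonymised items in square brackets as in the example above; the file name is hypothetical.

    # Minimal sketch: separate list entries that are already anonymised
    # (in square brackets, e.g. "[Kansalaisjärjestö]") from those that
    # may still need action.
    def split_entries(lines):
        done, pending = [], []
        for line in lines:
            entry = line.strip()
            if not entry:
                continue
            target = done if entry.startswith("[") and entry.endswith("]") else pending
            target.append(entry)
        return done, pending

    with open("identifier_list.txt", encoding="utf-8") as f:
        done, pending = split_entries(f)

    # Note: inflected forms such as "[company]n" would land in the
    # pending list, which is acceptable, as they still merit a look.
    print("Already anonymised:", done)
    print("May need action:", pending)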

Copilot Researcher works very well for checking the anonymisation of text files using the listing method. The AI found practically all identifiers, including some that the human eye had missed. The main drawback is the rather long waiting time per file (typically from tens of minutes up to an hour).

Test of Researcher-Guided Anonymisation

As one experiment, we also asked Copilot Researcher to anonymise the test text from start to finish based on the FSD's Data Management Guidelines for anonymisation. We provided the agent with a link to our public guidelines and instructed it to mark identifiers in square brackets as specified in the guidelines.

The Researcher agent asked clarifying questions (e.g., whether age can be retained, how to handle indirect identifiers) before starting, and by answering these we got the process underway. The final result was ready in 45 minutes: the AI produced a table showing, with examples, how the original expressions were anonymised (for example, "Härmän Liikenne Oy:n" → "[company]n") and provided the anonymised text.

Researcher followed the instructions quite well – for example, it replaced all proper names with the format [[person's name]]. At the same time, however, it became clear that a human data curator would do some things differently: the AI left in the text a detailed description of a person's work history (several job titles in a row), because these were, individually, common professions. However, combined, they might reveal the person's identity. Most likely, a researcher would have generalised such information. It was also noted that the AI did not remove all place names that the original researcher or the FSD's data specialist would have removed – a few cities were still mentioned as such, because Copilot followed the guidelines allowing some locations if they were not identifying on their own.

Conclusion

Researcher is technically capable of anonymising text according to the guidelines, but it does not replace a human: the final result still needs to be critically evaluated, and the process takes time.

Custom Agent "Text Identifier Lister": Tailored for Speed and Accuracy

As the tests progressed, we developed a solution that combines the accuracy of the Researcher agent and the listing method, but works faster: we created our own Copilot agent called the "Text Identifier Lister." This is a user-configurable agent to which precise instructions are given for the task at hand.

The task defined for the Text Identifier Lister was: "find and list all words, names, and numbers in text files that are considered identifiers." The instructions specified in detail all types of identifiers to be anonymised (personal names, occupations, degrees, organisation names, project names, place names, age-related mentions, rare diseases, etc.). It was also emphasised that information already in square brackets (already anonymised) should be included in the list with the brackets. The agent was given background information that, in principle, the researcher has already anonymised the material according to the FSD's guidelines.

Thus, the agent acts as a kind of automatic checklist creator, based on which the data specialist can perform the final review.
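
For illustration, the core of the agent's instructions can be summarised roughly as follows. This is a paraphrase of the description above, not the exact wording configured at FSD:

    Task: Find and list all words, names, and numbers in the text file
    that are considered identifiers.
    Identifier types: personal names, occupations, degrees, organisation
    names, project names, place names, age-related mentions, rare
    diseases, and similar information.
    Output: List each expression in order of appearance. Include
    information already in square brackets, together with the brackets.
    Background: The researcher has, in principle, already anonymised the
    material according to the FSD's guidelines.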

Testing with Sample Data

First, we tried the agent with the same test transcript that had been used in the Researcher agent experiments. The results were very promising: the list was generated in less than a minute, and it contained the same identifiers as the Copilot Researcher's listing method. In other words, the accuracy was almost perfect, although the occupation "pääkokki" (head chef) was still missing from the list.

The agent's listing also included many items that are not relevant for anonymisation from an archiving perspective but which, according to the agent's instructions, counted as identifiers. The list included "Kirjasto" (Library), "Tuntematon sotilas" (The Unknown Soldier), "Museo" (Museum), and other items mentioned in the document that are not personal data. This extra information is not a problem, as the data curator can simply ignore entries that clearly do not relate to identifiability. For example, "Tuntematon sotilas" (The Unknown Soldier, a Finnish war novel by Väinö Linna) is a valid identifier according to the instructions, since the agent is supposed to find the names of theses and other works. It is even useful to see all unusual terms in the list, as they draw the data specialist's attention to potential indirect identifiers.

Verification of a Real Data File

As the final and most important test, we applied the agent to a real research dataset. We selected an interview dataset submitted for archiving in 2025, which the researcher had stated was anonymised. Before running the agent, we conducted a preliminary check of two randomly selected files. Based on this check, the data appeared to be well anonymised (no obvious real names in the text, only some locations and names marked with square brackets). We then ran the Text Identifier Lister on one randomly selected interview file.

The agent produced an identification list in under a minute. The expressions on the list confirmed that the data had been successfully anonymised: at the top of the list were two real names (those of the anonymiser and interviewer found in the document metadata) and one organisation (Tutkimustie Oy, responsible for transcription), but all other personal names were in square brackets (e.g., [Johannes], [Maria]), indicating they were pseudonyms used by the researcher in place of real names.

Similarly, age and location information was marked in square brackets as categorised data (e.g., [60–69 years], [small municipality 1]), meaning these had also been successfully anonymised by the researcher. The agent also listed a number of individual items without brackets, such as general occupations (e.g., family caregiver, supervisor). These extra entries do not hinder the verification process, as an experienced data specialist can quickly identify which items require closer inspection. Based on the test, no further anonymisation was needed for the processed data file.

Based on this test, our custom-built Text Identifier Lister performed excellently on real-world data: it quickly confirmed the success of the anonymisation and highlighted all relevant items for review. The tailored agent proved to be the best method for verifying anonymisation, identifying relevant entities just as comprehensively as Copilot Researcher.

Conclusion

Overall, the self-developed Text Identifier Lister is a valuable tool: the list it produces serves as an excellent basis for checking anonymisation. Additionally, the agent's instructions can be customised for specific datasets: if certain types of identifiers are known not to occur, or if a new category needs to be added, the instructions can be adjusted before running the agent.

Summary: Comparison of Agents in Identifying Identifiers

The table below summarises how different solutions performed in the task of identifying identifiers based on our tests:

Table 1: Comparison of performance in detecting identifiers.

Copilot Chat
  • Identifier detection: Incomplete – found only some identifiers; many names and place names were missed.
  • Processing time: Seconds per file (fast response time).
  • Notes: Not suitable for anonymisation checking. Better suited for quick retrieval of individual facts than for systematic identifier detection.

Copilot Analyst
  • Identifier detection: Incomplete – listed some identifiers, but items were missing from every category.
  • Processing time: 2–3 minutes per file.
  • Notes: Not suitable for checking text-based material. Designed for structured data, so it may work for anonymisation and processing of quantitative data, but this has not yet been tested.

Copilot Researcher
  • Identifier detection: Very comprehensive – found almost all identifiers (even some missed by humans); only rare cases (e.g., "pääkokki", head chef) were missed.
  • Processing time: 10–15+ minutes for listing (can exceed 30 minutes depending on the prompt); over an hour if marking directly in the text.
  • Notes: Very accurate and thorough. The listing format is recommended, as it is easier to interpret and use. Slow: requires waiting, so work time must be organised accordingly. Marks many unnecessary "identifiers" if the instructions are not constrained, so a human must filter the list.

Custom agent ("Text Identifier Lister")
  • Identifier detection: Very comprehensive – found the same identifiers as the Researcher listing method.
  • Processing time: Less than one minute per file (notably fast).
  • Notes: Combines accuracy and speed. Follows instructions literally: the list also includes terms that need no anonymisation, which the data curator can ignore.

The table clearly shows that Copilot Chat and Analyst cannot provide comprehensive identifier detection – they missed many essential pieces of information. Copilot Researcher, by contrast, found almost all identifiers very accurately, but its usability suffers from slowness and from producing a large amount of output that requires interpretation. Our own agent combined the best aspects of both: it achieved the accuracy of Researcher at nearly the speed of the Chat tool.

Conclusions and Words of Caution

Overall, the test results show clear differences between the Copilot tools in checking anonymisation. Copilot Chat and Copilot Analyst could not identify nearly all identifiers, so they are not suitable as such for ensuring the anonymity of qualitative texts. Copilot Researcher, on the other hand, found practically all identifiers – even some missed by the human eye – but its use is very slow and it produces a large amount of extra data to interpret. In contrast, our custom agent, the "Text Identifier Lister", combined the best aspects: it achieved almost the same level of accuracy as Researcher but completed the task in less than a minute per file.

With the help of AI, the process of checking anonymisation can be significantly accelerated. Instead of the data curator reading through a 40-page interview transcript, they can assess anonymisation using the identifier checklist. However, fully automated checking is still a long way off. In any case, a human must evaluate which identifiers need to be removed or modified and which do not. This is best accomplished by searching for the context of the identifier in the transcript.
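
This context lookup is easy to support with a small script. The sketch below prints each occurrence of a listed expression with its surrounding text; the file name and the search term are hypothetical examples.

    import re

    # Minimal sketch: show each occurrence of an identifier expression
    # in context, so the curator can judge whether it must be removed,
    # generalised, or left as is.
    def show_context(text, term, width=60):
        for match in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
            start = max(match.start() - width, 0)
            snippet = text[start:match.end() + width].replace("\n", " ")
            print(f"...{snippet}...")

    with open("interview_transcript.txt", encoding="utf-8") as f:
        show_context(f.read(), "Lappeenranta")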

Although AI can be of great help, as a language model it also has its limitations. Whereas conventional rule-based programs tend to work logically and reliably, language-model-based AI does not always function as expected. At times, the agent gets stuck when generating a checklist for a file: at first the list is produced normally, but then the agent starts repeating a single word endlessly. The next day, it may complete the same task on the same file quickly and without any issues. Sometimes certain identifiers appear only once or a few times in the list, and not always in order of appearance as instructed. Thus, the content of the identifier lists varies between runs – sometimes the list is shorter, sometimes longer – even when the agent does not get stuck.
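
If the same file is run through the agent twice, the two lists can at least be compared mechanically. A minimal sketch, assuming one expression per line and hypothetical file names:

    # Minimal sketch: diff two runs of the same agent on the same file,
    # since list contents were observed to vary between runs.
    def entries(path):
        with open(path, encoding="utf-8") as f:
            return {line.strip() for line in f if line.strip()}

    first, second = entries("run1_list.txt"), entries("run2_list.txt")
    print("Only in run 1:", sorted(first - second))
    print("Only in run 2:", sorted(second - first))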

Instead of full reliability, the AI user must accept a certain degree of random uncertainty – and the IT helpdesk cannot assist with this. One unresolved mystery in the tests was why the AI did not recognise "pääkokki" (head chef) as an occupation in the transcript. It did recognise "pääsuunnittelija" (chief designer, literally "head designer") and "käsikirurgi" (hand surgeon), so the problem is not that occupations beginning with a body-part word cause it analytical trouble.

The results confirm that carefully guided AI can significantly speed up checking the sufficiency of anonymisation – as long as it is used thoughtfully and data protection practices are followed. Applying the Tampere University data classification is the first step when considering AI-assisted processing and checking of data files.

Image: pxhere.com