
Use and archiving of social media data

Social media provides a rich source of data for scientific research. The diversity of social media content offers researchers the opportunity to use various research methods and examine phenomena from many different perspectives. Although some distinctive features can be identified when considering the nature of social media research, the use of social media data still requires careful consideration of the same fundamental ethical and legal questions that concern all scientific research. Answering these questions in the context of social media is not always easy. Social media and other user-generated online content cannot be examined as a cohesive entity. Rather, data collected from social media are uniquely characterised by each platform’s nature, purpose of use, technical implementation, and user culture.

In some cases, social media research may closely resemble the collection and processing of big data. In such cases, the people behind the data may seem more distant to the researcher. On the other hand, if research is conducted on a specific blog and its integrated discussion board, the research often becomes closely associated with the individual behind the blog as well as with the technology-mediated community of the discussion board. These differences must be taken into account when considering issues such as the privacy and anonymity of the research subjects. If an online platform is designed for people to connect with others and build networks, users often publicly display their real names and photos of themselves. However, some services, such as certain imageboards, have made user anonymity part of their basic ideology. On these platforms, users generally identify themselves with a pseudonym.

Diversity in social media research also stems from the use of different tools and methods. Researchers may, for instance, join an online service as a user to observe discussions. Alternatively, they might collect research data by directly copying user-generated content, either manually or with various automated tools. Data may also be collected through application programming interfaces (APIs). In addition, researchers and social media platforms can make various contractual arrangements. In social media research, the combination of research topic, platform, and data collection tools rarely matches that of previous studies. As a result, it is difficult to specify clear codes of practice or recommendations that would apply in all situations, and problems need to be addressed on a case-by-case basis. Unpredictable changes may also occur in, for instance, the nature of the platform, its terms of service, or the tools used for data collection. To cope with such changes, the researcher should be sufficiently familiar with all the details that may affect the research.

What to take into account when conducting social media research?

At the very least, evaluate the following issues.

  1. Rights of the research participants in general. Voluntary participation and consent are fundamental principles that protect the rights of the research participants. How are these fulfilled in the context of the present research? Has the need for an ethical assessment of the research been considered?
  2. Terms and conditions of the online platform and data collection tools. What conditions have been set by the platform that may affect the research? And what guarantees are given in the terms of service to the research subjects as users of the platform? When studying the terms and conditions, it should be noted that not all relevant information is necessarily presented in one document. Important information may also be included in, for instance, the platform’s community guidelines or privacy documents. Additionally, if data collection is conducted with the help of tools, such as APIs, the collector may be bound by separate terms and conditions concerning the use of the API.
  3. Nature of the platform and the research topic. Given the nature of the social media platform, would the users (that is, the research subjects) expect their discussions to be private and accessible only to a limited number of users, or would they consider their discussions public and open? In some cases, the sensitive nature of the research topic also needs to be considered. Some people might feel more comfortable discussing a sensitive topic privately, whereas others would prefer open and public discussion of sensitive topics. It is also worth remembering that not all users share the same view on whether their discussions are public or private.
  4. Protecting personal data. Personal data are often processed in social media research, which means that all rules and regulations regarding personal data processing must be followed. If, for instance, the use of an API is regulated by a separate agreement, the agreement may also contain regulations regarding the processing of personal data. If the rules and regulations are unclear, it is advisable to contact the data protection officer of your organisation as early as possible.

Archiving social media data

Is it possible to archive social media data? Collecting data from social media does not in itself prevent the archiving and reuse of the data. However, archiving may be prevented by a data collection process that is ethically or legally inadequate. In some cases, social media data also contain photos or audio-visual material, which presents additional challenges for archiving.

Problems also arise from the difficulty of anonymising social media data. If the data were copied directly from an online platform, a substantial amount of additional information about the research subjects may be discovered by linking the research data to other content available online. For example, entering a verbatim quotation into a search engine can lead back to the original post and its author. Even if the original intention was for the research subjects to remain anonymous, their real identities or other background information can often be revealed in this way. Most often, the archiving of social media data is prevented because the data cannot be irreversibly anonymised.
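The linking problem described above can be illustrated with a minimal sketch. It assumes a hypothetical dataset in which usernames have been replaced with keyed-hash pseudonyms while the post text was copied verbatim. The posts, usernames, key and the helper names `pseudonymise` and `reidentify` are all hypothetical and exist only for illustration. The sketch shows that replacing names alone does not anonymise the data: an exact text match against the public platform restores the original username.

```python
import hashlib
import hmac

# Hypothetical secret key held by the researcher (illustration only).
SECRET_KEY = b"replace-with-a-random-secret"


def pseudonymise(username: str) -> str:
    """Replace a username with a pseudonym derived from a keyed SHA-256 hash."""
    digest = hmac.new(SECRET_KEY, username.encode("utf-8"), hashlib.sha256)
    return "user_" + digest.hexdigest()[:12]


# Hypothetical public posts as they appear on the platform.
public_posts = [
    {"username": "alice_h", "text": "Finally got my diagnosis today."},
    {"username": "bob1987", "text": "Anyone else stuck in the flooded underpass?"},
]

# Hypothetical research data: usernames pseudonymised, text copied verbatim.
research_data = [
    {"author": pseudonymise(p["username"]), "text": p["text"]}
    for p in public_posts
]


def reidentify(record, posts):
    """Link a research record back to a public post by exact text match."""
    for post in posts:
        if post["text"] == record["text"]:
            return post["username"]
    return None


if __name__ == "__main__":
    for record in research_data:
        # Each pseudonym is mapped back to the real username.
        print(record["author"], "->", reidentify(record, public_posts))
```

Because the text itself acts as an identifier, such data are pseudonymised rather than anonymised. Achieving irreversible anonymisation would also require changing or removing content that can be searched online, which is often not possible without losing research value.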