Data collection is the process of gathering and measuring data from various relevant sources. Guidelines and established methodologies vary across disciplines, so some researchers may use codebooks or protocols, whereas others primarily follow traditional methodological steps.
Some tools that can help researchers in collecting data include:
LimeSurvey: LimeSurvey is a free and open-source online statistical survey web app distributed under the GNU General Public License. Because it is web-server-based software, researchers can use a web interface to develop and publish online surveys, collect responses, create statistics, and export the resulting data to other applications.
SurveyMonkey: SurveyMonkey is a private US-based company that offers a free version of its online statistical survey tool.
Google Forms: Google Forms is a free, easy-to-use online survey tool offered by Google, a private US-based company. It is advisable not to use Google Forms for personal or sensitive data.
Before collecting data, you must obtain approval from your institution to perform research with human (or animal) participants. You must also comply with any other applicable national or discipline-specific regulations.
Under the GDPR, personal and sensitive data may be collected and processed on the basis of the consent of the person whose data will be collected and eventually published. In research, this is referred to as informed consent.
The purpose of informed consent is to obtain a person's permission to participate in the research. Consent is valid only if it is freely given, specific, informed and unambiguous. This means that:
consent to participate must be given voluntarily and without any coercion or pressure
participants must be informed about the objectives of your research and about how their personal data will be processed. This information must be given in an understandable and easily accessible form and in clear and simple language
consent must be given for each individual processing activity: if there are several processing activities, participants must be free to choose which purposes they accept, rather than giving one generic consent
a statement or a clear affirmative act, preferably in writing, is required (participants need to opt in).
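These conditions can also shape how a project stores consent. The following is a minimal Python sketch of one possible approach, not a required format. Consent is stored separately for each processing purpose. A purpose counts as accepted only after an explicit opt-in, and each processing activity is checked against its own purpose. The class name `ConsentRecord`, the participant code and the purpose names are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ConsentRecord:
    """Hypothetical per-participant consent record (illustration only)."""

    participant_code: str
    # One entry per processing purpose; a purpose that is absent means "no consent".
    granted: dict[str, datetime] = field(default_factory=dict)

    def opt_in(self, purpose: str) -> None:
        # Consent is recorded only after an explicit affirmative act (opt-in),
        # together with the time at which it was given.
        self.granted[purpose] = datetime.now(timezone.utc)

    def allows(self, purpose: str) -> bool:
        # Specific consent: each purpose is checked on its own, never as a bundle.
        return purpose in self.granted


record = ConsentRecord("P-017")
record.opt_in("survey_analysis")
print(record.allows("survey_analysis"))  # True
print(record.allows("data_sharing"))     # False: the participant never opted in
```

The default is deliberately "no consent". Nothing is assumed until the participant acts, which corresponds to the opt-in requirement above.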
Also, if the research involves processing personal data, the informed consent must be accompanied by a GDPR statement.
More information on informed consent and templates:
All personal and sensitive data must undergo an anonymisation and/or pseudonymisation procedure to protect the respondent's or participant's identity. Examples of data anonymisation tools include:
Citizen science projects are extremely diverse, but they all share a common goal: to produce scientific knowledge by bringing together research professionals and citizens, with the support of partners (most often associations) who help implement the projects. These projects are difficult to define, and each participatory science approach is unique.
The CitieS-Health Toolkit provides a customised and interactive collection of adaptable instruments.
An important aspect of managing data is protecting the privacy of individuals who participate in research projects. The European General Data Protection Regulation (GDPR) regulates personal data processing. Important definitions include:
Personal data: any information relating to an identified or identifiable natural person (‘data subject’).
Data processing: any action performed on data, such as collecting, storing, modifying, distributing and deleting data.
Direct and indirect identification: Some identifiers, such as a name, address or IP address, enable you to single out an individual directly. Individuals can also be identified indirectly through:
a combination of information that may uniquely single out an individual (e.g. a man in a breast cancer registry combined with his town of residence, or a pregnant woman over 50). This applies to information within one record as well as to information across different data files or datasets.
unique information or patterns that are specific to an individual (e.g. genomic data; a very specific occupation, such as the president of a large company; repeated physical measurements or movement patterns that create a unique profile of an individual; or extreme measurements that could be linked to subjects such as high-level athletes).
data that is linked to directly identifying information through a random identification code or number, such as a tax or health care identification number.
Pseudonymous data: information that is indirectly identifiable is generally considered pseudonymous. Such data is not anonymous and still qualifies as personal data, so privacy laws such as the GDPR apply. One example is when direct identifiers are removed from the research data and placed in a key file (often called a subject identification log). The direct identifiers can then be mapped back to the research data through unique codes, so re-identification remains possible.
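The following is a minimal Python sketch of the key-file approach. It assumes that records are held as dictionaries, and all field names are hypothetical. `pseudonymise` moves the direct identifiers into a separate key file under a random code generated with the standard-library `secrets` module. `unique_combinations` counts how often each combination of the remaining indirect identifiers occurs. A combination found in only one record may single out an individual, as in the examples of indirect identification above. This counting check borrows the idea of k-anonymity, which the text does not name. The records are invented.

```python
import secrets
from collections import Counter

DIRECT_IDENTIFIERS = ("name", "email")              # hypothetical field names
INDIRECT_IDENTIFIERS = ("sex", "age_band", "town")  # hypothetical field names


def pseudonymise(records):
    """Move direct identifiers into a key file and replace them with random codes."""
    research_data, key_file = [], {}
    for record in records:
        code = secrets.token_hex(4)
        while code in key_file:  # regenerate in the unlikely case of a repeated code
            code = secrets.token_hex(4)
        key_file[code] = {f: record[f] for f in DIRECT_IDENTIFIERS}
        coded = {f: v for f, v in record.items() if f not in DIRECT_IDENTIFIERS}
        coded["subject_code"] = code
        research_data.append(coded)
    return research_data, key_file


def unique_combinations(records, fields, k=2):
    """Return combinations of the given fields that are shared by fewer than k records."""
    counts = Counter(tuple(r[f] for f in fields) for r in records)
    return [combo for combo, n in counts.items() if n < k]


raw = [  # made-up records from an imagined breast cancer registry
    {"name": "Participant 1", "email": "p1@example.org", "sex": "female", "age_band": "40-49", "town": "Town A"},
    {"name": "Participant 2", "email": "p2@example.org", "sex": "female", "age_band": "40-49", "town": "Town A"},
    {"name": "Participant 3", "email": "p3@example.org", "sex": "male", "age_band": "60-69", "town": "Town A"},
]

data, key = pseudonymise(raw)
print(data[0])  # only subject_code and the indirect identifiers remain
print(key[data[2]["subject_code"]]["name"])  # whoever holds the key can re-identify: Participant 3
print(unique_combinations(data, INDIRECT_IDENTIFIERS))  # [('male', '60-69', 'Town A')]
```

After pseudonymisation, the third participant is still singled out by sex, age band and town. The key file also still allows re-identification. For both reasons, data handled this way remains personal data.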
Ethics and scientific integrity are two distinct notions. Ethics concerns whether a study reflects the moral values set out by a community, whereas scientific integrity concerns whether it is good science. Consequently, a study can be very good science but have poor ethics, or vice versa. Because ethics is linked to the moral values of a community, it varies with a country’s culture, so care must be taken when carrying out multicultural research.