Advances in online data mining and the rising popularity of online social networking data are posing challenging questions with regard to ethics and privacy. How can academic research provide a comprehensive framework to secure data management and guarantee appropriate handling?
Given the current popularity of data crunching, big data and the visualisation of massive datasets, the question of how to manage data under ethical guidelines is in many cases pressing. Current institutional protocols do not cover the new aspects that arise from the accessibility of large datasets of online data.
Social science still builds on the principle of informed consent from all participants involved. These protocols were implemented in the late seventies, long before the internet. Most of them were updated around the year 2000 to cover online research involving online questionnaires and, in some cases, research in chat rooms.
The dramatic changes brought about by online social networking data, together with APIs that allow large-scale datasets to be built from Facebook, Twitter, Foursquare and the like, stem from a multiplication of dimensions. Researchers are no longer working with 10, 100 or 1000 participants, but potentially with data relating to millions of individual users. Yet the data is as detailed as a qualitative dataset with 100 participants might be, and in specific cases potentially even more detailed. This is especially true with regard to time and location.
Currently the discussion mainly revolves around whether the data is free and publicly available, the implication being that if it is, no additional measures are necessary. The argument would be that individual users are voluntarily sharing the data publicly and for free. This is, however, a very naive and short-sighted argument. A number of complicating issues need to be considered, and there are three main aspects to them.
Image by urbanTick for NCL / A screenshot of a Twitter data table with the different columns containing metadata. Each row represents one tweet.
The first aspect is the dynamic nature of the data. Since the data is time based and produced in such vast quantities, content is very quickly superseded and disappears within the platform, in many cases becoming irretrievable for the individual user. In practice this can mean that sets of mined data become unique. In that case, acquiring such a dataset is an act of making for which the researcher would have to take responsibility.
The second aspect concerns the way the service operates. It requires the user to share information, as using the service would otherwise in most cases simply be impossible. A user unwilling to share the information would in most cases be excluded, or at least see the capacity of the service dramatically reduced. A related usability aspect is that the way the user interacts with the platform can easily lead the user to believe they are acting in a private environment. In the individual setting the service only shows information from a closed circle of connections to other users. Users might therefore be tempted to share private information easily, not being aware that on a larger scale all activities are public. Furthermore, it is unclear whether the user has, by agreeing to use the service, also agreed to all of their information being mined and researched under specific conditions in relation to a vast number of other users.
The third aspect is that it is not the individual datapoint, message or piece of information that causes concern for privacy, but the series of datapoints. These newly available data sources contain a lot of metadata and continuous data that can be analysed for patterns. In other words, it is not about one or two places the individual has been to, but about the possibility of inferring a very personal pattern from the information, one that distinctively describes the person's habits in both time and space.
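To make this third point a little more tangible, here is a minimal Python sketch of how a handful of harmless-looking datapoints can add up to a routine. Everything in it is an assumption made for illustration: the check-ins are invented, the place labels are hypothetical, and the function `routine_by_hour` is not taken from the paper or from any platform's API. It simply counts which place recurs at which hour of the day.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Invented check-ins for one hypothetical user: (ISO timestamp, coarse place label).
CHECKINS = [
    ("2011-03-07T08:40:00", "station"),
    ("2011-03-07T13:05:00", "cafe"),
    ("2011-03-08T08:45:00", "station"),
    ("2011-03-09T08:38:00", "station"),
    ("2011-03-09T19:30:00", "gym"),
    ("2011-03-10T08:50:00", "station"),
    ("2011-03-16T19:25:00", "gym"),
]


def routine_by_hour(checkins, min_count=2):
    """Return {hour: (place, count)} for the most frequent place at each hour,
    keeping only places seen at least `min_count` times at that hour."""
    by_hour = defaultdict(Counter)
    for stamp, place in checkins:
        hour = datetime.fromisoformat(stamp).hour
        by_hour[hour][place] += 1

    routine = {}
    for hour, places in by_hour.items():
        place, count = places.most_common(1)[0]
        if count >= min_count:
            routine[hour] = (place, count)
    return routine


if __name__ == "__main__":
    for hour, (place, count) in sorted(routine_by_hour(CHECKINS).items()):
        print(f"around {hour:02d}:00 -> {place} ({count} times)")
```

No single row says much, but taken together the rows suggest a morning commute and a regular evening activity. With real data covering weeks or months, and with coordinates instead of labels, such a pattern could become far more specific.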
These considerations and points of discussion led to the now published paper Agile Ethics for Massified Research and Visualization, part of a special edition of Information, Communication and Society edited by A. Carusi. It is available online from Taylor & Francis.
The paper is written together with Dr. Tim Webmoor at Stanford. Besides discussing the implications and aspects of developing a framework, it uses the Twitter work as a practical example.
The topic was already discussed in an earlier blog post, Privacy - Aspects of an Ecology of Ownership, which later led to the paper. A version of the paper was also presented at the Visualisation in the Age of Computerisation conference in Oxford in early 2011.
Neuhaus, F. & Webmoor, T., 2011. Agile Ethics for Massified Research and Visualization. Information, Communication & Society, pp.1-23.