Big data refers to large, complex, potentially linkable data from diverse sources, ranging from the genome and social media, to individual health information and the contributions of citizen science monitoring, to large-scale long-term oceanographic and climate modeling and its processing in innovative and integrated “data mashups.” Over the past few decades, thanks to the rapid expansion of computer technology, there has been a growing appreciation for the potential of big data in environment and human health research.
The promise of big data mashups in environment and human health includes the ability to truly explore and understand the “wicked environment and health problems” of the 21st century, from tracking the global spread of the Zika and Ebola virus epidemics to modeling future climate change impacts and adaptation at the city or national level. Other opportunities include the possibility of identifying environment and health hot spots (i.e., locations where people and/or places are at particular risk), where innovative interventions can be designed and evaluated to prevent or adapt to climate and other environmental change over the long term with potential (co-) benefits for health; and of locating and filling gaps in existing knowledge of relevant linkages between environmental change and human health. There is the potential for the increasing control of personal data (both access to and generation of these data), benefits to health and the environment (e.g., from smart homes and cities), and opportunities to contribute via citizen science research and share information locally and globally.
At the same time, there are challenges inherent with big data and data mashups, particularly in the environment and human health arena. Environment and health represent very diverse scientific areas with different research cultures, ethos, languages, and expertise. Equally diverse are the types of data involved (including time and spatial scales, and different types of modeled data), often with no standardization of the data to allow easy linkage beyond time and space variables, as data types are mostly shaped by the needs of the communities where they originated and have been used. Furthermore, these “secondary data” (i.e., data re-used in research) are often not even originated for this purpose, a particularly relevant distinction in the context of routine health data re-use. And the ways in which the research communities in health and environmental sciences approach data analysis and synthesis, as well as statistical and mathematical modeling, are widely different.
There is a lack of trained personnel who can span these interdisciplinary divides or who have the necessary expertise in the techniques that make adequate bridging possible, such as software development, big data management and storage, and data analyses. Moreover, health data have unique challenges due to the need to maintain confidentiality and data privacy for the individuals or groups being studied, to evaluate the implications of shared information for the communities affected by research and big data, and to resolve the long-standing issues of intellectual property and data ownership occurring throughout the environment and health fields. As with other areas of big data, the new “digital data divide” is growing, where some researchers and research groups, or corporations and governments, have the access to data and computing resources while others do not, even as citizen participation in research initiatives is increasing. Finally with the exception of some business-related activities, funding, especially with the aim of encouraging the sustainability and accessibility of big data resources (from personnel to hardware), is currently inadequate; there is widespread disagreement over what business models can support long-term maintenance of data infrastructures, and those that exist now are often unable to deal with the complexity and resource-intensive nature of maintaining and updating these tools.
Nevertheless, researchers, policy makers, funders, governments, the media, and members of the general public are increasingly recognizing the innovation and creativity potential of big data in environment and health and many other areas. This can be seen in how the relatively new and powerful movement of Open Data is being crystalized into science policy and funding guidelines. Some of the challenges and opportunities, as well as some salient examples, of the potential of big data and big data mashup applications to environment and human health research are discussed.