Abstract
Data Science is one of today’s most rapidly growing academic fields and has significant implications for all conventional scientific studies. However, most relevant studies so far have been limited to one or several facets of Data Science from the perspective of a specific application domain and fail to discuss its theoretical framework. Data Science is a novel science in that its research goals, perspectives, and body of knowledge are distinct from those of other sciences. The core theories of Data Science are the DIKW pyramid, data-intensive scientific discovery, the data science lifecycle, data wrangling (or munging), big data analytics, data management and governance, data product development, and big data visualization. Six main trends characterize recent theoretical studies on Data Science, including the growing significance of DataOps, the rise of citizen data scientists, the enabling of augmented data science, the diversity of domain-specific data science, and the implementation of data stories as data products. The further development of Data Science should prioritize four ways of turning challenges into opportunities: accelerating theoretical studies of data science, balancing the trade-off between explainability and performance, achieving data ethics, privacy, and trust, and aligning academic curricula with industrial needs.