Quantitative Science Studies
Latest Publications


Total documents: 168 (five years: 168)
H-index: 7 (five years: 7)

Published by MIT Press Journals
ISSN: 2641-3337

2022, pp. 1-47
Author(s): Philip J. Purnell

Abstract: Research managers benchmarking universities against international peers face the problem of affiliation disambiguation. Different databases have taken separate approaches to this problem, and discrepancies exist between them. Bibliometric data sources typically conduct a disambiguation process that unifies the variant names of an institution and its sub-units so that researchers can then retrieve all records from that institution using a single unified name. This study examined affiliation discrepancies between Scopus, Web of Science, Dimensions, and Microsoft Academic for 18 Arab universities over a five-year period. We confirmed that digital object identifiers (DOIs) are suitable for extracting comparable scholarly material across databases and quantified the affiliation discrepancies between them. A substantial share of the records assigned to the selected universities in any one database was not assigned to the same university in another. The share of discrepancies was higher in the larger databases, Dimensions and Microsoft Academic. The smaller, more selective databases, Scopus and especially Web of Science, tended to agree to a greater degree with affiliations in the other databases. Manual examination of the affiliation discrepancies showed that they were caused by a mixture of missing affiliations, unification differences, and assignment of records to the wrong institution.
Peer Review: https://publons.com/publon/10.1162/qss_a_00175


2022, pp. 1-17
Author(s): Mike Thelwall, Pardeep Sud

Abstract: Scientometric research often relies on large-scale bibliometric databases of academic journal articles. Long-term and longitudinal research can be affected if the composition of a database varies over time, and text processing research can be affected if the percentage of articles with abstracts changes. This article therefore assesses changes in the magnitude of the coverage of a major citation index, Scopus, over the 121 years from 1900 to 2020. The results show sustained exponential growth from 1900, except for dips during both world wars, with faster growth after 2004. Over the same period, the percentage of articles with abstracts of 500+ characters increased from 1% to 95%. The number of different journals in Scopus also increased exponentially, although this growth slowed from 2010; the number of articles per journal was approximately constant until 1980 and then tripled due to megajournals and online-only publishing. The breadth of Scopus, in terms of the number of narrow fields with substantial numbers of articles, simultaneously increased from one field having 1000 articles in 1945 to 308 such fields in 2020. Scopus's international character also changed radically, from 68% of first authors being from Germany and the USA in 1900 to just 17% in 2020, with China dominating (25%).
Peer Review: https://publons.com/publon/10.1162/qss_a_00177


2022, pp. 1-26
Author(s): Keisuke Okamura

Abstract: Scholarly communications have been rapidly integrated into digitised and networked open ecosystems, where preprint servers have played a pivotal role in accelerating knowledge transfer. However, quantitative evidence is scarce regarding how this paradigm shift beyond the traditional journal publication system has affected the dynamics of collective attention on science. To address this issue, we investigate the citation data of more than 1.5 million eprints on arXiv (https://arxiv.org) and analyse the long-term citation trend for each discipline involved. We find that the typical growth and obsolescence patterns vary across disciplines, reflecting different publication and communication practices. The results provide unique evidence on the attention dynamics shaped by the research community today, including the dramatic growth and fast obsolescence of Computer Science eprints, which have not been captured in previous studies relying on the citation data of journal papers. We then develop a quantitatively and temporally normalised citation index with an approximately normal distribution, which is useful for comparing citational attention across disciplines and time periods. Further, we derive a stochastic model consistent with the observed quantitative and temporal characteristics of citation growth and obsolescence. The findings and the developed framework open a new avenue for understanding the nature of citation dynamics.
Peer Review: https://publons.com/publon/10.1162/qss_a_00174


2022, pp. 1-18
Author(s): Kayvan Kousha, Mike Thelwall

Abstract: The seriousness of the Covid-19 pandemic created two partly conflicting academic pressures: the need for faster peer review of Covid-19 health-related research and the need for greater scrutiny of its findings. This paper investigates whether decreases in peer review durations for Covid-19 articles were universal across 97 major medical journals as well as Nature, Science, and Cell. The results suggest that, on average, Covid-19 articles submitted during 2020 were reviewed 1.7–2.1 times faster than non-Covid-19 articles submitted during 2017–2020. Nevertheless, whilst the review speed of Covid-19 research was particularly fast during the first five months of the pandemic (January–May 2020; 1.9–3.4 times faster), this speed advantage was no longer evident for articles submitted in November–December 2020. Faster peer review is also associated with higher citation impact for Covid-19 articles in the same journals, suggesting that it did not usually compromise the scholarly impact of important Covid-19 research. Overall, then, it seems that core medical and general journals responded quickly but carefully to the pandemic, although the situation returned closer to normal within a year.
Peer Review: https://publons.com/publon/10.1162/qss_a_00176


2021, pp. 1-38
Author(s): Olga Zagovora, Roberto Ulloa, Katrin Weller, Fabian Flöck

Abstract: With this work, we present a publicly available dataset of the history of all the references (more than 55 million) ever used in the English Wikipedia up to June 2019. We have applied a new method for identifying and monitoring references in Wikipedia, so that for each reference we can provide data about associated actions: creation, modifications, deletions, and reinsertions. The high accuracy of this method and of the resulting dataset was confirmed via a comprehensive crowdworker labelling campaign. We use the dataset to study the temporal evolution of Wikipedia references as well as users' editing behaviour. We find evidence of a mostly productive and continuous effort to improve the quality of references: (1) there is a persistent increase in reference and document identifiers (DOI, PubMedID, PMC, ISBN, ISSN, ArXiv ID), and (2) most of the reference curation work is done by registered humans (not bots or anonymous editors). We conclude that the evolution of Wikipedia references, including the dynamics of the community processes that tend to them, should be leveraged in the design of relevance indexes for altmetrics, and that our dataset can be pivotal for such an effort.
Peer Review: https://publons.com/publon/10.1162/qss_a_00171


2021, pp. 1-23
Author(s): Mike Thelwall, Abrizah Abdullah, Ruth Fairclough

Abstract: This article assesses the balance of research concerning women and men over the past quarter century using the crude heuristic of counting Scopus-indexed journal articles that relate to women or men, as suggested by their titles or abstracts. A manual checking procedure together with a word-based heuristic was used to identify whether an article related to women or men. The heuristic covers explicit mentions of women and men, implicit mentions, and a set of gender-focused health issues and medical terminology. Based on the results, more published articles now relate to women than to men. Moreover, more than twice as many articles relate exclusively to women as exclusively to men, with the ratio increasing from 2.16:1 in 1996 to 2.25:1 in 2020. Monogender articles mostly addressed primarily female health issues (maternity, breast cancer, cervical cancer), with fewer addressing primarily male health issues (testicular cancer, pancreatic cancer, health needs of men who have sex with men). Some articles also explicitly addressed gender inequality, such as empowering women entrepreneurs. The findings suggest that the androcentrism of early science has eroded in terms of research topics. This apparent progress should be encouraging for women researchers and society.
Peer Review: https://publons.com/publon/10.1162/qss_a_00173


2021, pp. 1-18
Author(s): Alberto Baccini, Giuseppe De Nicolao

Abstract: During the Italian research assessment exercise (2004–2010), the governmental agency in charge of it (ANVUR) performed an experiment on the concordance between peer review and bibliometrics at the level of individual articles. The computed concordances were at most weak for science, technology, engineering, and mathematics. The only exception was the moderate concordance found for the area of economics and statistics. In this paper, the disclosed raw data of the experiment are used to shed light on the anomalous results obtained for economics and statistics. In particular, the data make it possible to document that the protocol of the experiment adopted for economics and statistics differed from the one used in the other areas. Indeed, in economics and statistics the same group of scholars developed the bibliometric ranking of journals used to evaluate articles, managed the peer reviews, and formed the consensus groups that decided the final scores of articles after receiving the referees' reports. This paper shows that the higher level of concordance in economics and statistics was an artifact, mainly due to the role played by consensus groups in boosting the agreement between bibliometrics and peer review.
Peer Review: https://publons.com/publon/10.1162/qss_a_00172


2021, pp. 1-37
Author(s): Aidan Kelley, Daniel Garijo

Abstract: An increasing number of researchers rely on computational methods to generate or manipulate the results described in their scientific publications. Software created to this end, scientific software, is key to understanding, reproducing, and reusing existing work in many disciplines, ranging from geosciences to astronomy and artificial intelligence. However, scientific software is usually challenging to find, set up, and compare with similar software because of its disconnected documentation (dispersed across manuals, readme files, websites, and code comments) and the lack of structured metadata to describe it. As a result, researchers have to inspect existing tools manually in order to understand their differences and incorporate them into their work. This approach scales poorly with the number of publications and tools made available every year. In this paper we address these issues by introducing a framework for automatically extracting scientific software metadata from its documentation (in particular, readme files); a methodology for structuring the extracted metadata in a Knowledge Graph (KG) of scientific software; and an exploitation framework for browsing and comparing the contents of the generated KG. We demonstrate our approach by creating a KG with metadata from over ten thousand scientific software entries from public code repositories.


2021, pp. 1-42
Author(s): Aline Menin, Franck Michel, Fabien Gandon, Raphaël Gazzotti, Elena Cabrio, ...

Abstract: The unprecedented mobilization of scientists in response to the COVID-19 pandemic has generated an enormous number of scholarly articles, which no human being can keep track of and explore without appropriate tool support. In this context, we created the Covid-on-the-Web project, which aims to support access to, querying of, and sense-making of COVID-19-related literature by combining efforts from the semantic web, natural language processing, and visualization fields. In particular, in this paper we present (i) an RDF dataset, a linked version of the "COVID-19 Open Research Dataset" (CORD-19), enriched via entity linking and argument mining, and (ii) the "Linked Data Visualizer" (LDViz), which supports querying and visual exploration of this dataset. LDViz supports the exploration of different views of the data by combining a query management interface, which enables the definition of meaningful subsets of data through SPARQL queries, with a visualization interface based on a set of six visualization techniques integrated in a chained visualization concept, which also supports the tracking of provenance information. Through use case scenarios, we demonstrate the potential of our approach to assist biomedical researchers in solving domain-related tasks and in performing exploratory analyses.


2021, pp. 1-30
Author(s): Michael Färber, David Lamprecht

Abstract: Several scholarly knowledge graphs have been proposed to model and analyze the academic landscape. However, although the number of data sets has increased remarkably in recent years, these knowledge graphs focus primarily not on data sets but on associated entities such as publications. Moreover, publicly available data set knowledge graphs do not systematically contain links to the publications in which the data sets are mentioned. In this paper, we present an approach for constructing an RDF knowledge graph that fulfills these criteria. Our data set knowledge graph, DSKG, is publicly available at http://dskg.org and contains metadata of data sets for all scientific disciplines. To ensure the high data quality of the DSKG, we first identify suitable raw data set collections for creating it. We then establish links between the data sets and the publications that mention them, as modeled in the Microsoft Academic Knowledge Graph. As the author names of data sets can be ambiguous, we develop and evaluate a method for author name disambiguation and enrich the knowledge graph with links to ORCID. Overall, our knowledge graph contains more than 2,000 data sets with associated properties, as well as 814,000 links to 635,000 scientific publications. It can be used for a variety of scenarios, facilitating advanced data set search systems and new ways of measuring and rewarding the provision of data sets.

