software repositories Latest Research Papers

Abstract: Research in Mining Software Repositories (MSR) is research involving human subjects, as the repositories usually contain data about developers’ and users’ interactions with the repositories and with each other. The ethics issues raised by such research therefore need to be considered before beginning. This paper presents a discussion of ethics issues that can arise in MSR research, using the mining challenges from the years 2006 to 2021 as a case study to identify the kinds of data used. On the basis of contemporary research ethics frameworks we discuss ethics challenges that may be encountered in creating and using repositories and associated datasets. We also report some results from a small community survey of approaches to ethics in MSR research. In addition, we present four case studies illustrating typical ethics issues one encounters in projects and how ethics considerations can shape projects before they commence. Based on our experience, we present some guidelines and practices that can help in considering potential ethics issues and reducing risks.

Quick remedy commits and their impact on mining software repositories

Empirical Software Engineering ◽

10.1007/s10664-021-10051-z ◽

2021 ◽

Vol 27 (1) ◽

Author(s):

Fengcai Wen ◽

Csaba Nagy ◽

Michele Lanza ◽

Gabriele Bavota

Keyword(s):

Software Maintenance ◽

Noisy Data ◽

Mining Software Repositories ◽

Software Repositories ◽

Bug Fixing ◽

Manual Analysis ◽

Data Points ◽

Software Maintenance And Evolution ◽

Code Changes ◽

Different Parts

Abstract: Most changes during software maintenance and evolution are not atomic changes, but rather the result of several related changes affecting different parts of the code. It may happen that developers omit needed changes, thus leaving a task partially unfinished, introducing technical debt or injecting bugs. We present a study investigating “quick remedy commits” performed by developers to implement changes omitted in previous commits. With quick remedy commits we refer to commits that (i) quickly follow a commit performed by the same developer, and (ii) aim at remedying issues introduced as the result of code changes omitted in the previous commit (e.g., fix references to code components that have been broken as a consequence of a rename refactoring) or simply improve the previously committed change (e.g., improve the name of a newly introduced variable). Through a manual analysis of 500 quick remedy commits, we define a taxonomy categorizing the types of changes that developers tend to omit. The taxonomy can (i) guide the development of tools aimed at detecting omitted changes and (ii) help researchers in identifying corner cases that must be properly handled. For example, one of the categories in our taxonomy groups the reverted commits, meaning changes that are undone in a subsequent commit. We show that not accounting for such commits when mining software repositories can undermine one’s findings. In particular, our results show that considering completely reverted commits when mining software repositories accounts, on average, for 0.07 and 0.27 noisy data points when dealing with two typical MSR data collection tasks (i.e., bug-fixing commits identification and refactoring operations mining, respectively).
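Criterion (i) of the definition above, a commit that quickly follows another commit by the same developer, can be approximated mechanically. The following is a minimal sketch and not the authors' tooling: the `Commit` record, the function name `candidate_quick_remedies`, and the 30-minute window are all assumptions made for illustration. It only flags candidates; criterion (ii), whether the follow-up actually remedies an omission, still requires inspecting the change itself.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Commit:
    """Hypothetical minimal commit record (not a real library type)."""
    sha: str
    author: str
    when: datetime


def candidate_quick_remedies(commits, window=timedelta(minutes=30)):
    """Return (previous_sha, follow_up_sha) pairs where the same author
    commits again within `window` of their own previous commit.

    The 30-minute default is an illustrative assumption, not a value
    taken from the paper.
    """
    last_by_author = {}
    pairs = []
    # Walk the history in chronological order.
    for commit in sorted(commits, key=lambda c: c.when):
        previous = last_by_author.get(commit.author)
        if previous is not None and commit.when - previous.when <= window:
            pairs.append((previous.sha, commit.sha))
        last_by_author[commit.author] = commit
    return pairs
```

Each returned pair is only a candidate for manual analysis; a short time gap alone does not show that the second commit fixes something the first one omitted.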

How Do Developers Blog?

ACM SIGSOFT Software Engineering Notes ◽

10.1145/3468744.3468753 ◽

2021 ◽

Vol 46 (3) ◽

pp. 26-29

Author(s):

Dennis Pagano ◽

Walid Maalej

Keyword(s):

Social Media ◽

Software Engineering ◽

Knowledge Sharing ◽

Mining Software Repositories ◽

Version Control ◽

Software Repositories

A decade ago, the rise of GitHub and StackOverflow as social version control and knowledge sharing environments was about to start. Social media like Twitter were mocked by some software engineering researchers and practitioners as "tools for kids not professionals". At that time, we published one of the first papers [12] on social media in software engineering at MSR 2011, the Mining Software Repositories Conference.

Comparing fine-grained source code changes and code churn for bug prediction - A retrospective

ACM SIGSOFT Software Engineering Notes ◽

10.1145/3468744.3468751 ◽

2021 ◽

Vol 46 (3) ◽

pp. 21-23

Author(s):

Martin Pinzger ◽

Emanuel Giger ◽

Harald C. Gall

Keyword(s):

Source Code ◽

Software Systems ◽

Research Directions ◽

Software Repositories ◽

The Road ◽

Fine Grained ◽

Influential Paper ◽

Code Changes ◽

To Receive ◽

Source Code Changes

More than two decades ago, researchers started to mine the data stored in software repositories to help software developers make informed decisions when developing and testing software systems. Bug prediction was one of the most promising and popular research directions; it uses the data stored in software repositories to predict the bug-proneness or number of bugs in source files. On that topic and as part of Emanuel's PhD studies, we submitted a paper titled "Comparing fine-grained source code changes and code churn for bug prediction" [8] to the 8th Working Conference on Mining Software Repositories, held in 2011 in beautiful Honolulu, Hawaii. Ten years later, it was selected as one of the finalists to receive the MSR 2021 Most Influential Paper Award. In the following, we provide a retrospective on our work, describing the road to publishing this paper, its impact in the field of bug prediction, and the road ahead.

GrimoireLab: A toolset for software development analytics

PeerJ Computer Science ◽

10.7717/peerj-cs.601 ◽

2021 ◽

Vol 7 ◽

pp. e601

Author(s):

Santiago Dueñas ◽

Valerio Cosentino ◽

Jesus M. Gonzalez-Barahona ◽

Alvaro del Castillo San Felix ◽

Daniel Izquierdo-Cortazar ◽

...

Keyword(s):

Software Development ◽

Preliminary Analysis ◽

Data Retrieval ◽

Mining Software Repositories ◽

Data Sources ◽

Data Sets ◽

Community Based ◽

Academic Environments ◽

Software Repositories ◽

Main Components

Background: After many years of research on software repositories, the knowledge for building mature, reusable tools that perform data retrieval, storage and basic analytics is readily available. However, there is still room for improvement in the area of reusable tools implementing this knowledge. Goal: To produce a reusable toolset supporting the most common tasks when retrieving, curating and visualizing data from software repositories, allowing for the easy reproduction of data sets ready for more complex analytics, and sparing the researcher or the analyst most of the tasks that can be automated. Method: Use our experience in building tools in this domain to identify a collection of scenarios where a reusable toolset would be convenient, and the main components of such a toolset. Then build those components, and refine them incrementally using the feedback from their use in commercial, community-based, and academic environments. Results: GrimoireLab, an efficient toolset composed of five main components, supporting about 30 different kinds of data sources related to software development. It has been tested in many environments, for performing different kinds of studies, and providing different kinds of services. It features a common API for accessing the retrieved data, facilities for relating items from different data sources, semi-structured storage for easing later analysis and reproduction, and basic facilities for visualization, preliminary analysis and drill-down in the data. It is also modular, making it easy to support new kinds of data sources and analysis. Conclusions: We present a mature toolset, widely tested in the field, that can help to improve the situation in the area of reusable tools for mining software repositories. We show some scenarios where it has already been used. We expect it will help to reduce the effort for doing studies or providing services in this area, leading to advances in reproducibility and comparison of results.

Topic recommendation for software repositories using multi-label classification algorithms

Empirical Software Engineering ◽

10.1007/s10664-021-09976-2 ◽

2021 ◽

Vol 26 (5) ◽

Author(s):

Maliheh Izadi ◽

Abbas Heydarnoori ◽

Georgios Gousios

Keyword(s):

Classification Algorithms ◽

Software Repositories

Individual differences limit predicting well-being and productivity using software repositories: a longitudinal industrial study

Empirical Software Engineering ◽

10.1007/s10664-021-09977-1 ◽

2021 ◽

Vol 26 (5) ◽

Author(s):

Miikka Kuutila ◽

Mika Mäntylä ◽

Maëlick Claes ◽

Marko Elovainio ◽

Bram Adams

Keyword(s):

Individual Differences ◽

Fixed Effects ◽

Prediction Models ◽

Well Being ◽

Software Project ◽

Structured Interviews ◽

Software Developers ◽

Software Repositories ◽

Generalized Linear Mixed Effects ◽

Individualized Prediction

Abstract: Poor work well-being and fluctuating productivity in software engineering have been reported in both academic and popular sources. Understanding and predicting these issues through repository analysis might help manage software developers’ well-being. Our objective is to link data from software repositories, that is, commit activity, communication, expressed sentiments, and job events, with measures of well-being obtained with a daily experience sampling questionnaire. To achieve our objective, we studied a single software project team for eight months in the software industry. Additionally, we performed semi-structured interviews to explain our results. The acquired quantitative data are analyzed with generalized linear mixed-effects models with autocorrelation structure. We find that individual variance accounts for most of the R² values in models predicting developers’ experienced well-being and productivity. In other words, using software repository variables to predict developers’ well-being or productivity is challenging due to individual differences. Prediction models developed for each developer individually work better, with a fixed-effects R² value of up to 0.24. The semi-structured interviews give insights into the well-being of software developers and the benefits of chat interaction. Our study suggests that individualized prediction models are needed for well-being and productivity prediction in software development.

Analysing Time-Stamped Co-Editing Networks in Software Development Teams using git2net

Empirical Software Engineering ◽

10.1007/s10664-020-09928-2 ◽

2021 ◽

Vol 26 (4) ◽

Author(s):

Christoph Gote ◽

Ingo Scholtes ◽

Frank Schweitzer

Keyword(s):

Software Engineering ◽

Computational Social Science ◽

Software Project ◽

High Resolution Data ◽

Software Repositories ◽

Fine Grained ◽

Development Teams ◽

History Of ◽

Software Development Teams ◽

Advance Theory

Abstract: Data from software repositories have become an important foundation for the empirical study of software engineering processes. A recurring theme in the repository mining literature is the inference of developer networks capturing e.g. collaboration, coordination, or communication from the commit history of projects. Many works in this area studied networks of co-authorship of software artefacts, neglecting detailed information on code changes and code ownership available in software repositories. To address this issue, we introduce git2net, a scalable software tool that facilitates the extraction of fine-grained co-editing networks in large repositories. It uses text mining techniques to analyse the detailed history of textual modifications within files. We apply our tool in two case studies using repositories of multiple open-source projects as well as a proprietary software project. Specifically, we use data on more than 1.2 million commits and more than 25,000 developers to test a hypothesis on the relation between developer productivity and co-editing patterns in software teams. We argue that git2net opens up an important new source of high-resolution data on human collaboration patterns that can be used to advance theory in empirical software engineering, computational social science, and organisational studies.
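The idea of a fine-grained co-editing network, as opposed to file-level co-authorship, can be illustrated with a toy model. The sketch below is not git2net's API or algorithm: the function name `co_editing_edges` and the input format are hypothetical, and it assumes each edited line can be identified by a stable (file, line) key, whereas a real tool has to track lines across diffs. A directed edge (editor, previous author) is recorded whenever a developer changes a line last written by someone else.

```python
from collections import Counter


def co_editing_edges(edits):
    """Build weighted, directed co-editing edges from line-level edits.

    `edits` is an iterable of (timestamp, author, path, line) tuples
    (a hypothetical input format). Returns a Counter mapping
    (editor, previous_author) to the number of lines the editor
    overwrote that were last written by previous_author.
    """
    last_author = {}   # (path, line) -> author who last touched the line
    edges = Counter()
    # Process edits in chronological order.
    for _, author, path, line in sorted(edits, key=lambda e: e[0]):
        key = (path, line)
        previous = last_author.get(key)
        # Editing one's own line is not a co-editing relation.
        if previous is not None and previous != author:
            edges[(author, previous)] += 1
        last_author[key] = author
    return edges
```

Because the edges are tied to individual lines and timestamps, the resulting network keeps information about who changed whose code, which file-level co-authorship networks discard.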

Leveraging Models to Reduce Test Cases in Software Repositories

2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR) ◽

10.1109/msr52588.2021.00035 ◽

2021 ◽

Author(s):

Golnaz Gharachorlu ◽

Nick Sumner

Keyword(s):

Test Cases ◽

Software Repositories

software repositories
Recently Published Documents

A systematic process for Mining Software Repositories: Results from a systematic literature review

Ethics in the mining of software repositories

Quick remedy commits and their impact on mining software repositories

How Do Developers Blog?

Comparing fine-grained source code changes and code churn for bug prediction - A retrospective

GrimoireLab: A toolset for software development analytics

Topic recommendation for software repositories using multi-label classification algorithms

Individual differences limit predicting well-being and productivity using software repositories: a longitudinal industrial study

Analysing Time-Stamped Co-Editing Networks in Software Development Teams using git2net

Leveraging Models to Reduce Test Cases in Software Repositories
