SPI: Automated Identification of Security Patches via Commits

2022 ◽  
Vol 31 (1) ◽  
pp. 1-27
Author(s):  
Yaqin Zhou ◽  
Jing Kai Siow ◽  
Chenyu Wang ◽  
Shangqing Liu ◽  
Yang Liu

Security patches in open source software, which provide fixes for identified vulnerabilities, are crucial in protecting against cyber attacks. Security advisories and announcements are often released publicly to inform users about potential security vulnerabilities. Although the National Vulnerability Database (NVD) publishes identified vulnerabilities, a vast majority of vulnerabilities and their corresponding security patches remain beyond public exposure, e.g., in the open source libraries that developers rely on heavily. Because many of these patches are scattered across open source projects, curating and gathering them is difficult due to their hidden nature. An extensive and complete security patch dataset could help end-users such as security companies (e.g., in building a security knowledge base) and researchers (e.g., in vulnerability research). To efficiently curate security patches, including undisclosed patches, at large scale and low cost, we propose a deep-neural-network-based approach built upon commits of open source repositories. First, we design and build security patch datasets that include 38,291 security-related commits and 1,045 Common Vulnerabilities and Exposures (CVE) patches from four large-scale C programming language libraries. We manually verify each of the 38,291 commits to determine whether it is security related. We then devise and implement a deep-learning-based security patch identification system that consists of two composite neural networks: a commit-message neural network that utilizes pretrained word representations learned from our commit dataset, and a code-revision neural network that takes the code before and after revision and learns their differences at the statement level. Our system combines the two networks for security patch identification. Evaluation results show that our system significantly outperforms SVM and K-fold stacking algorithms, achieving an F1-score of 87.93% and a precision of 86.24% on the combined dataset. We deployed our pipeline and learned model in an industrial production environment to evaluate the generalization ability of our approach. The industrial dataset consists of 298,917 commits from 410 new libraries covering a wide range of functionalities. Our experimental results and observations on the industrial dataset show that our approach can effectively identify security patches in open source projects.
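The two-branch design described above (a commit-message network plus a code-revision network whose outputs are combined for classification) can be sketched roughly as follows. This is a minimal illustration assuming PyTorch; the class names, layer sizes, and fusion strategy are assumptions made for the sketch, not the SPI authors' implementation.

    # Minimal sketch of a two-branch commit classifier (assumed PyTorch);
    # layer sizes and the fusion strategy are illustrative, not the SPI design.
    import torch
    import torch.nn as nn

    class CommitMessageNet(nn.Module):
        """Encodes the commit message using pretrained word embeddings."""
        def __init__(self, embedding_matrix, hidden=128):
            super().__init__()
            self.embed = nn.Embedding.from_pretrained(embedding_matrix, freeze=False)
            self.rnn = nn.LSTM(embedding_matrix.size(1), hidden, batch_first=True)

        def forward(self, tokens):                       # tokens: (batch, msg_len)
            _, (h, _) = self.rnn(self.embed(tokens))
            return h[-1]                                 # (batch, hidden)

    class CodeRevisionNet(nn.Module):
        """Encodes code before and after revision and returns their difference."""
        def __init__(self, vocab_size, dim=128, hidden=128):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, dim)
            self.rnn = nn.LSTM(dim, hidden, batch_first=True)

        def encode(self, stmts):                         # stmts: (batch, n_stmt, stmt_len)
            b, n, l = stmts.shape
            _, (h, _) = self.rnn(self.embed(stmts.view(b * n, l)))
            return h[-1].view(b, n, -1).mean(dim=1)      # average over statements

        def forward(self, before, after):
            return self.encode(after) - self.encode(before)

    class SecurityPatchClassifier(nn.Module):
        """Fuses the two branches and predicts security patch vs. not."""
        def __init__(self, msg_net, code_net, hidden=128):
            super().__init__()
            self.msg_net, self.code_net = msg_net, code_net
            self.head = nn.Sequential(nn.Linear(2 * hidden, hidden),
                                      nn.ReLU(), nn.Linear(hidden, 2))

        def forward(self, tokens, before, after):
            fused = torch.cat([self.msg_net(tokens),
                               self.code_net(before, after)], dim=1)
            return self.head(fused)                      # logits over {patch, non-patch}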

Sensors ◽  
2019 ◽  
Vol 19 (17) ◽  
pp. 3782 ◽  
Author(s):  
Julius Venskus ◽  
Povilas Treigys ◽  
Jolita Bernatavičienė ◽  
Gintautas Tamulevičius ◽  
Viktor Medvedev

The automated identification system for vessel movements receives a huge amount of multivariate, heterogeneous sensor data, which must be analyzed to make proper and timely decisions on vessel movements. The large number of vessels makes it difficult and time-consuming to detect abnormalities, so rapid-response algorithms should be developed for a decision support system that identifies abnormal vessel movements in areas of heavy traffic. This paper extends our previous study on applying a self-organizing map to the sensor stream data received by the maritime automated identification system. The more data about a vessel's movement is registered and submitted to the algorithm, the higher its accuracy should be; however, this cannot be guaranteed without an effective retraining strategy that balances precision and data processing time. In addition, retraining ensures the integration of the latest vessel movement data, which reflects the actual conditions and context. To maintain the quality of the algorithm's results, we investigated data batching strategies for retraining the neural network to detect anomalies in streaming maritime traffic data. The effectiveness of the strategies, in terms of modeling precision and data processing time, was estimated on real sensor data. The obtained results show that the neural network retraining time can be halved while sensitivity and precision change only slightly.
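The retraining scheme described above (periodically refitting a self-organizing map on batches of recent traffic data while flagging anomalies between retrainings) can be sketched as follows. The sketch assumes the minisom Python package; the feature set, batch size, iteration count, anomaly threshold, and the ais_stream() placeholder are invented for illustration and are not values from the study.

    # Sketch of batched SOM retraining on streaming AIS-like data (assumed minisom).
    # Batch size, threshold, and the synthetic stream are illustrative assumptions.
    import numpy as np
    from minisom import MiniSom

    FEATURES = 4                                   # e.g., latitude, longitude, speed, course

    def ais_stream(n=5000):
        """Placeholder for the real AIS sensor stream: yields random feature vectors."""
        rng = np.random.default_rng(0)
        for _ in range(n):
            yield rng.normal(size=FEATURES)

    som = MiniSom(10, 10, FEATURES, sigma=1.0, learning_rate=0.5)

    def is_anomalous(som, sample, threshold=2.5):
        """Flag a vessel-state vector whose distance to its best-matching unit is large."""
        w = som.get_weights()[som.winner(sample)]
        return np.linalg.norm(sample - w) > threshold

    buffer = []
    for record in ais_stream():
        if is_anomalous(som, record):
            print("possible abnormal movement:", record)
        buffer.append(record)
        if len(buffer) >= 1000:                    # batching strategy: retrain on recent data
            som.train_random(np.asarray(buffer), 500)
            buffer.clear()

The trade-off the paper evaluates lives in the last three lines: larger batches or more iterations improve the map's fit to current traffic but lengthen the retraining pause, while smaller batches shorten retraining at some cost in precision.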


Sensors ◽  
2020 ◽  
Vol 20 (11) ◽  
pp. 3055
Author(s):  
Olivier Pieters ◽  
Tom De Swaef ◽  
Peter Lootens ◽  
Michiel Stock ◽  
Isabel Roldán-Ruiz ◽  
...  

The study of the dynamic responses of plants to short-term environmental changes is becoming increasingly important in basic plant science, phenotyping, breeding, crop management, and modelling. These short-term variations are crucial in plant adaptation to new environments and, consequently, in plant fitness and productivity. Scalable, versatile, accurate, and low-cost data-logging solutions are necessary to advance these fields and complement existing sensing platforms such as high-throughput phenotyping. However, current data-logging and sensing platforms do not meet the requirements for monitoring these responses. Therefore, a new modular data-logging platform, named Gloxinia, was designed. Different sensor boards are interconnected depending on the needs, with the potential to scale to hundreds of sensors in a distributed sensor system. To demonstrate the architecture, two sensor boards were designed: Sylvatica, for single-ended measurements, and Planalta, for lock-in-amplifier-based measurements. To evaluate the performance of the system in small setups, a small-scale trial was conducted in a growth chamber. Expected plant dynamics were successfully captured, indicating proper operation of the system. Although a large-scale trial was not performed, we expect the system to scale well to larger setups. Additionally, the platform is open source, enabling other users to easily build upon our work and perform application-specific optimisations.


2003 ◽  
Vol 2003 (01) ◽  
pp. 0102
Author(s):  
Terry Bollinger

This report documents the results of a study by The MITRE Corporation on the use of free and open-source software (FOSS) in the U.S. Department of Defense (DoD). FOSS gives users the right to run, copy, distribute, study, change, and improve it as they see fit, without asking permission from or making payments to any external group or person. The study showed that FOSS provides substantial benefits to DoD security, infrastructure support, software development, and research. Given the openness of its source code, the finding that FOSS profoundly benefits security was both counterintuitive and instructive. Banning FOSS in the DoD would remove access to exceptionally well-verified infrastructure components such as OpenBSD, as well as to the robust network and software analysis tools needed to detect and respond to cyberattacks. Moreover, losing hands-on access to FOSS source code would reduce the DoD's ability to respond rapidly to cyberattacks. In short, banning FOSS would have immediate, broad, and strongly negative impacts on the DoD's ability to defend the U.S. against cyberattacks. For infrastructure support, the deep historical ties between FOSS and the emergence of the Internet mean that removing FOSS applications would strongly and negatively impact the DoD's ability to support web- and Internet-based applications. Software development would be hit especially hard because many leading-edge and broadly used tools are FOSS. Finally, the loss of access to low-cost data processing tools and the inability to share results in the more potent form of executable FOSS software would seriously and negatively impact nearly all forms of scientific and data-driven research.


2012 ◽  
Vol 3 (1) ◽  
pp. 11-14
Author(s):  
Ebtesam Najim AlShemmary ◽  
Bahaa Qasim Al-Musawi

Governments and their agencies are often challenged by the high cost of flexible telephonic and Web-based data services. Emerging technologies such as Voice over Internet Protocol (VoIP), which allow convergent systems in which voice and Web technologies share the same network to provide both services, can be used to improve such services. This paper describes a VoIP system for an enterprise network (e.g., a company or university) developed on Asterisk, an open-source software implementation of an IP-PBX system. Through development and evaluation, we confirmed that an Asterisk-based VoIP system is very powerful as a whole and that most PBX functions required for an enterprise network can be realized. Interesting findings include that the University of Kufa has the potential to implement the project. By connecting multiple Asterisk servers located at different sites over IAX2, a large-scale enterprise network can be developed. Since the software recommended for installation is open source, the project could also serve as a source of valuable information for students who specialize in real-time multimedia systems.
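As a concrete illustration of linking two Asterisk servers over IAX2, a hypothetical fragment of the configuration on server A might look like the following; the peer name, host, secret, and extension range are invented for this example and are not taken from the paper.

    ; iax.conf on server A (hypothetical values): define an IAX2 trunk to server B
    [siteB]
    type=friend
    host=pbx-b.example.edu
    username=siteA
    secret=changeme
    context=from-siteB
    trunk=yes

    ; extensions.conf on server A: route extensions 2000-2999 to server B over IAX2
    [internal]
    exten => _2XXX,1,Dial(IAX2/siteB/${EXTEN})
    exten => _2XXX,n,Hangup()

With a mirror-image peer entry on server B, users at either site can dial extensions at the other site transparently, which is how several campus or branch PBXs can be stitched into one large enterprise network.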


2013 ◽  
Vol 1492 ◽  
pp. 85-90 ◽  
Author(s):  
Megan Kreiger ◽  
Joshua M. Pearce

Although additive layer manufacturing is well established for rapid prototyping, low throughput and historically high costs have prevented mass-scale adoption. The recent development of the RepRap, an open-source self-replicating rapid prototyper, has made low-cost 3-D printers readily available to the public at reasonable prices (<$1,000). The RepRap (Prusa Mendel variant) currently prints 3-D objects in a 200 x 200 x 140 mm build envelope from acrylonitrile butadiene styrene (ABS) and polylactic acid (PLA). ABS and PLA are both thermoplastics that can be injection-molded, each with its own benefits: ABS is rigid and durable, while PLA is plant-based and can be recycled and composted. The melting temperatures of ABS and PLA enable their use in low-cost 3-D printers, as these temperatures are low enough for melt extrusion in the home, while high enough for prints to retain their shape at typical use temperatures. Manufacturing with 3-D printers provides the ability both to change the fill composition by printing voids and to fabricate shapes that are impossible to make using traditional methods like injection molding. This allows more complicated shapes to be created while using less material, which could reduce environmental impact.

As open-source 3-D printers continue to evolve and improve in both cost and performance, the potential for economically viable distributed manufacturing of products increases. Thus, products and components could be customized and printed on-site by individual consumers as needed, reversing the historical trend towards centrally mass-manufactured and shipped products. Distributed manufacturing avoids the embodied transportation energy of conventional centralized manufacturing and distribution, but questions remain about potential increases in the overall embodied energy of manufacturing due to the reduction in scale. To quantify the environmental impact of distributed manufacturing using 3-D printers, a life cycle analysis was performed on a plastic juicer. The energy consumed and emissions produced by conventional large-scale production overseas are compared to experimental measurements on a RepRap producing identical products from ABS and PLA. The results of this LCA are discussed in relation to the environmental impact of distributed manufacturing with 3-D printers and to polymer selection for 3-D printing to reduce this impact. The results show that distributed manufacturing uses less energy than conventional manufacturing due to the RepRap's ability to reduce fill composition. Distributed manufacturing also produces lower emissions than conventional manufacturing when using PLA, and when using ABS with solar photovoltaic power. These results indicate that open-source additive layer distributed manufacturing is both technically viable and beneficial from an ecological perspective.


2019 ◽  
Vol 11 (24) ◽  
pp. 2997 ◽  
Author(s):  
Clément Dechesne ◽  
Sébastien Lefèvre ◽  
Rodolphe Vadaine ◽  
Guillaume Hajduch ◽  
Ronan Fablet

The monitoring and surveillance of maritime activities are critical issues in both military and civilian fields, including, among others, fisheries monitoring, maritime traffic surveillance, coastal and at-sea safety operations, and tactical situations. In operational contexts, ship detection and identification are traditionally performed by a human observer who identifies all kinds of ships through visual analysis of remotely sensed images. Such a task is very time-consuming and cannot be conducted at a very large scale, while Sentinel-1 SAR data now provide regular, worldwide coverage. Meanwhile, with the emergence of GPUs, deep learning methods are now established as state-of-the-art solutions for computer vision, replacing human intervention in many contexts. They have been shown to be well suited to ship detection, most often with very-high-resolution SAR or optical imagery. In this paper, we go one step further and investigate a deep neural network for the joint classification and characterization of ships from Sentinel-1 SAR data. We benefit from the synergies between AIS (Automatic Identification System) and Sentinel-1 data to build large training datasets. We design a multi-task neural network architecture composed of one joint convolutional network connected to three task-specific networks for ship detection, classification, and length estimation. The experimental assessment shows that our network provides promising results, with accurate classification and length estimation (overall classification accuracy: 97.25%; mean length error: 4.65 m ± 8.55 m).
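The shared-trunk, three-head arrangement described above can be sketched roughly as follows in PyTorch; the channel counts, head sizes, and loss weights are illustrative assumptions, not the authors' exact architecture or training setup.

    # Rough sketch of a shared convolutional trunk feeding three task-specific
    # heads (detection, classification, length regression); sizes are assumed.
    import torch
    import torch.nn as nn

    class MultiTaskShipNet(nn.Module):
        def __init__(self, n_classes=5):
            super().__init__()
            self.trunk = nn.Sequential(                  # shared feature extractor
                nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
            self.detect_head = nn.Linear(64, 2)          # ship / no ship
            self.class_head = nn.Linear(64, n_classes)   # ship category
            self.length_head = nn.Linear(64, 1)          # length in metres

        def forward(self, x):                            # x: (batch, 1, H, W) SAR patch
            f = self.trunk(x)
            return self.detect_head(f), self.class_head(f), self.length_head(f)

    def joint_loss(outputs, targets, w=(1.0, 1.0, 0.1)):
        """Weighted sum of the three task losses (weights are assumptions)."""
        det, cls, length = outputs
        det_t, cls_t, len_t = targets
        return (w[0] * nn.functional.cross_entropy(det, det_t)
                + w[1] * nn.functional.cross_entropy(cls, cls_t)
                + w[2] * nn.functional.mse_loss(length.squeeze(1), len_t))

    # Toy forward pass on random SAR patches.
    net = MultiTaskShipNet()
    det, cls, length = net(torch.randn(2, 1, 64, 64))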


Author(s):  
Pavel Katunin ◽  
Jianbo Zhou ◽  
Ola M. Shehata ◽  
Andrew A. Peden ◽  
Ashley Cadby ◽  
...  

Modern data analysis methods, such as optimization algorithms or deep learning, have been successfully applied to a number of biotechnological and medical questions. For these methods to be effective, a large number of high-quality, reproducible experiments must be conducted, which requires a high degree of automation. Here, we present an open-source, low-cost hardware framework that allows automatic high-throughput generation of large amounts of cell biology data. Our design consists of an epifluorescence microscope with an automated XY stage for moving a multiwell plate containing cells, and a perfusion manifold allowing programmed application of up to eight different solutions. The system is very flexible and can easily be adapted to individual experimental needs. To demonstrate its utility, we have used it to perform high-throughput Ca2+ imaging and large-scale fluorescent labeling experiments.
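The kind of acquisition loop such a platform automates can be sketched as follows; move_stage(), select_solution(), acquire_image(), and save_frame() are hypothetical stubs for illustration, not the actual API of the published framework, and the plate geometry and timings are invented.

    # Sketch of an automated imaging/perfusion loop; all hardware calls below are
    # hypothetical placeholders, not the published framework's API.
    import itertools
    import time

    WELL_POSITIONS = [(x, y) for x in range(8) for y in range(12)]   # 96-well plate grid
    SOLUTIONS = range(8)                                             # perfusion manifold channels

    def move_stage(x, y): print(f"stage -> well ({x}, {y})")         # stub driver
    def select_solution(ch): print(f"perfusion -> channel {ch}")     # stub driver
    def acquire_image(exposure): print(f"frame at {exposure}s"); return None
    def save_frame(frame, well, solution): pass                      # stub storage

    def run_experiment(exposure_s=0.1, incubation_s=1):
        for solution, well in itertools.product(SOLUTIONS, WELL_POSITIONS):
            select_solution(solution)            # switch to the next solution
            time.sleep(incubation_s)             # allow the solution to reach the cells
            move_stage(*well)                    # position the well under the objective
            frame = acquire_image(exposure_s)    # epifluorescence snapshot
            save_frame(frame, well, solution)

    run_experiment()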


2019 ◽  
Author(s):  
Jon Paul Janet ◽  
Chenru Duan ◽  
Tzuhsiung Yang ◽  
Aditya Nandy ◽  
Heather Kulik

Machine learning (ML) models, such as artificial neural networks, have emerged as a complement to high-throughput screening, enabling characterization of new compounds in seconds instead of hours. The promise of ML models to enable large-scale chemical space exploration can only be realized if it is straightforward to identify when molecules and materials are outside the model’s domain of applicability. Established uncertainty metrics for neural network models are either costly to obtain (e.g., ensemble models) or rely on feature engineering (e.g., feature space distances), and each has limitations in estimating prediction errors for chemical space exploration. We introduce the distance to available data in the latent space of a neural network ML model as a low-cost, quantitative uncertainty metric that works for both inorganic and organic chemistry. The calibrated performance of this approach exceeds widely used uncertainty metrics and is readily applied to models of increasing complexity at no additional cost. Tightening latent distance cutoffs systematically drives down predicted model errors below training errors, thus enabling predictive error control in chemical discovery or identification of useful data points for active learning.
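The latent-distance idea can be illustrated with a short sketch: treat the activations of the model's last hidden layer as the latent representation and use the distance from a query point to its nearest training points as an uncertainty score. The choice of k, the random stand-in latent vectors, and the implied cutoff are assumptions for illustration, not the authors' calibrated procedure.

    # Sketch of latent-space distance as an uncertainty score; k and the
    # stand-in latent vectors are illustrative assumptions.
    import numpy as np

    def latent_distance(latent_train, latent_query, k=1):
        """Mean Euclidean distance from a query latent vector to its k nearest
        training latent vectors; larger values suggest a less reliable prediction."""
        d = np.linalg.norm(latent_train - latent_query, axis=1)
        return float(np.sort(d)[:k].mean())

    # Toy usage with random stand-ins for the network's last-layer activations.
    rng = np.random.default_rng(0)
    z_train = rng.normal(size=(1000, 32))          # latent vectors of the training set
    z_in    = rng.normal(size=32)                  # query resembling the training data
    z_out   = rng.normal(size=32) + 8.0            # query far from the training data

    print(latent_distance(z_train, z_in, k=5))     # small distance -> likely in-domain
    print(latent_distance(z_train, z_out, k=5))    # large distance -> flag as uncertain

In practice a cutoff on this distance, calibrated against observed errors, decides whether a prediction is trusted or the compound is routed to further computation or active learning.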

