South Africa Crime Visualization, Trends Analysis, and Prediction Using Machine Learning Linear Regression Technique

2021 ◽  
Vol 2021 ◽  
pp. 1-14
Author(s):  
Ibidun Christiana Obagbuwa ◽  
Ademola P. Abidoye

South Africa has been classified as one of the most homicidal, violent, and dangerous places in the world. The two factors that push South Africa high in the crime rankings are its rates of social violence and homicide. Business Insider reported that South Africa is among the 15 most violent nations on earth. By 1995, South Africa was rated second highest in terms of murder. The crime rate then fell for some years before rising again in recent years. Because of social violence and crime, foreign investors are no longer interested in continuing or starting businesses in the country, and its economy is declining as a result. South Africa’s government is looking for solutions to the crime problem, both to repair the country's image in crime rankings and to boost investor confidence. Many traditional approaches to data analysis have been applied in crime-related studies in South Africa, but the machine learning approach has not been adequately considered. Police stations and many other agencies that deal with crime hold large databases that can be used to analyze or predict criminal activity across the provinces of South Africa. This research work aims to offer a solution by building a model that can predict crime. A machine learning approach is used to extract useful information from crime data for South Africa's nine provinces, and a crime prediction system that can analyze and predict crime is proposed. To accomplish this, South African crime data on 27 crime categories were obtained from the popular data repository “Kaggle.” Diverse data analytics steps were applied to preprocess the datasets, and a machine learning algorithm (linear regression) was used to build a predictive model to analyze the data and predict future crime. With this model, the appropriate authorities and security agencies in South Africa can gain insight into crime trends and work to reduce them, encouraging foreign stakeholders to continue their businesses.
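
As a minimal sketch of the kind of trend model this abstract describes, the following fits an ordinary least squares line to yearly counts for one crime category and extrapolates one year ahead. The counts are made-up illustrative numbers, not the Kaggle data, and `fit_line` and `predict` are hypothetical helper names; the abstract does not give the authors' actual preprocessing or model setup.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, x):
    """Evaluate the fitted line at x."""
    intercept, slope = model
    return intercept + slope * x


# Hypothetical yearly counts for a single crime category (illustrative only).
years = [2011, 2012, 2013, 2014, 2015]
counts = [100.0, 110.0, 120.0, 130.0, 140.0]

model = fit_line(years, counts)
forecast_2016 = predict(model, 2016)
print(round(forecast_2016, 2))
```

A real analysis would presumably fit one such model per province and crime category and check the fit on held-out years before trusting any forecast.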

2020 ◽  
Vol 2 (4) ◽  
Author(s):  
Ibrahim Goni ◽  
Murtala Mohammad

Mobile cybercrime detection is challenged by the number of mobile devices (the Internet of Things) and by large, complex data. The size, velocity, nature, and complexity of the data and devices have become so high that data mining techniques are no longer efficient, since they cannot handle Big Data and the Internet of Things. The aim of this research work was to develop a mobile forensics framework for cybercrime detection using a machine learning approach. The framework starts when a call is detected, with detection performed by a machine learning algorithm. In addition, the intelligent mass media towers and satellite proposed in this work can classify whether a call is a threat and send a signal directly to the Nigerian Communications Commission (NCC) forensic lab for necessary action.


2022 ◽  
pp. 181-194
Author(s):  
Bala Krishna Priya G. ◽  
Jabeen Sultana ◽  
Usha Rani M.

Mining Telugu news data and categorizing it by public sentiment is important, since a lot of fake news has emerged with the rise of social media. This research work covers identifying whether a news text is positive, negative, or neutral, and then classifying it by area: business, editorial, entertainment, nation, or sports. It proposes an efficient model that adopts machine learning classifiers to perform classification on Telugu news data. The results obtained by various machine learning models are compared, and the proposed model is observed to outperform the others in terms of accuracy, precision, recall, and F1-score.
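
The four metrics named in this abstract can be computed from true and predicted labels as in the sketch below. The labels are invented for illustration (they are not the Telugu news data), and `precision_recall_f1` is a hypothetical helper that scores one class against all others.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F1 for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical sentiment labels for six news items (illustrative only).
y_true = ["positive", "negative", "neutral", "positive", "negative", "positive"]
y_pred = ["positive", "negative", "positive", "positive", "neutral", "negative"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall, f1 = precision_recall_f1(y_true, y_pred, "positive")
print(accuracy, precision, recall, f1)
```

For a multi-class task such as this one, the per-class scores are usually averaged across classes; the abstract does not say which averaging the authors used.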


Entropy ◽  
2019 ◽  
Vol 21 (10) ◽  
pp. 1015 ◽  
Author(s):  
Carles Bretó ◽  
Priscila Espinosa ◽  
Penélope Hernández ◽  
Jose M. Pavía

This paper applies a Machine Learning approach with the aim of providing a single aggregated prediction from a set of individual predictions. Departing from the well-known maximum-entropy inference methodology, a new factor capturing the distance between the true and the estimated aggregated predictions presents a new problem. Algorithms such as ridge, lasso or elastic net help in finding a new methodology to tackle this issue. We carry out a simulation study to evaluate the performance of such a procedure and apply it in order to forecast and measure predictive ability using a dataset of predictions on Spanish gross domestic product.
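
To illustrate how a ridge penalty can be used to combine individual predictions into one aggregate, the sketch below solves the ridge normal equations for two forecasters. It is not the paper's maximum-entropy-based procedure, which the abstract does not spell out; the data are invented and `ridge_weights` is a hypothetical helper.

```python
def ridge_weights(p1, p2, y, lam):
    """Solve (P^T P + lam * I) w = P^T y for two forecasters via Cramer's rule."""
    a11 = sum(a * a for a in p1) + lam
    a12 = sum(a * b for a, b in zip(p1, p2))
    a22 = sum(b * b for b in p2) + lam
    b1 = sum(a * t for a, t in zip(p1, y))
    b2 = sum(b * t for b, t in zip(p2, y))
    det = a11 * a22 - a12 * a12
    w1 = (b1 * a22 - a12 * b2) / det
    w2 = (a11 * b2 - a12 * b1) / det
    return w1, w2


# Hypothetical forecasts from two sources and the realized values.
p1 = [1.0, 2.0, 3.0, 4.0]
p2 = [2.0, 1.0, 4.0, 3.0]
y = [1.5, 1.5, 3.5, 3.5]  # constructed as the plain average of p1 and p2

w_ols = ridge_weights(p1, p2, y, lam=0.0)    # no penalty: recovers (0.5, 0.5)
w_ridge = ridge_weights(p1, p2, y, lam=1.0)  # penalty shrinks both weights
print(w_ols, w_ridge)
```

Lasso and elastic net replace or supplement the squared penalty with an absolute-value penalty, which has no closed form and is typically solved iteratively.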


Author(s):  
B.D. Britt ◽  
T. Glagowski

Abstract: This paper describes current research toward automating the redesign process. In redesign, a working design is altered to meet new problem specifications. This process is complicated by interactions between different parts of the design, and many researchers have addressed these issues. An overview is given of a large design tool under development, the Circuit Designer's Apprentice. This tool integrates various techniques for reengineering existing circuits so that they meet new circuit requirements. The primary focus of the paper is one particular technique being used to reengineer circuits when they cannot be transformed to meet the new problem requirements. In these cases, a design plan is automatically generated for the circuit, and then replayed to solve all or part of the new problem. This technique is based upon the derivational analogy approach to design reuse. Derivational Analogy is a machine learning algorithm in which a design plan is saved at the time of design so that it can be replayed on a new design problem. Because design plans were not saved for the circuits available to the Circuit Designer's Apprentice, an algorithm was developed that automatically reconstructs a design plan for any circuit. This algorithm, Reconstructive Derivational Analogy, is described in detail, including a quantitative analysis of the implementation of this algorithm.


2021 ◽  
Author(s):  
Diti Roy ◽  
Md. Ashiq Mahmood ◽  
Tamal Joyti Roy

Heart disease is the most dominant disease, causing a large number of deaths every year. A 2016 WHO report indicated that at least 17 million people die of heart disease every year. This number is steadily increasing, and WHO estimated that the death toll will reach 75 million by 2030. Despite modern technology and health care systems, predicting heart disease remains limited. Because machine learning algorithms are a valuable means of making predictions from available data sets, we used a machine learning approach to predict heart disease. We collected data from the UCI repository. In our study, we used the Random Forest, Zero R, Voted Perceptron, and K star classifiers. We obtained the best result with the Random Forest classifier, with an accuracy of 97.69.
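
Of the classifiers listed, Zero R is the simplest: it ignores every feature and always predicts the majority class, which makes it a useful floor against which to read an accuracy figure. The sketch below uses invented labels, not the UCI data, and the helper names are hypothetical.

```python
from collections import Counter


def zero_r_fit(labels):
    """Zero R ignores all features and returns the majority class."""
    return Counter(labels).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labels: 1 = heart disease, 0 = no heart disease.
train_labels = [1, 0, 0, 1, 0, 0, 0, 1]
test_labels = [0, 1, 0, 0]

majority = zero_r_fit(train_labels)
baseline_acc = accuracy(test_labels, [majority] * len(test_labels))
print(majority, baseline_acc)
```

Any classifier worth using should clearly beat this baseline on the same test split.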


Author(s):  
Ganesh K. Shinde

Abstract: The most important part of information gathering is to focus on how people think. Many opinion resources, such as online review sites and personal blogs, are available. In this paper we focus on Twitter, which allows users to express their opinions on a variety of entities. We performed sentiment analysis on tweets using text mining methods, namely the lexicon approach and the machine learning approach. We performed sentiment analysis in two steps: first, by searching for polarity words in a pool of words predefined in a lexicon dictionary, and second, by training the machine learning algorithm using the polarities assigned in the first step. Keywords: Sentiment analysis, Social Media, Twitter, Lexicon Dictionary, Machine Learning Classifiers, SVM.
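
The first of the two steps described here, assigning a polarity from a predefined lexicon, might look like the sketch below. The word lists and tweets are invented, and `lexicon_label` is a hypothetical helper; the abstract does not specify the lexicon or the tokenization the author used.

```python
# Hypothetical lexicon entries (illustrative only).
POSITIVE = {"good", "great", "love", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "sad"}


def lexicon_label(text):
    """Label a text by counting positive and negative lexicon hits."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


tweets = [
    "I love this phone",
    "terrible service and bad support",
    "the bus arrives at noon",
]
labels = [lexicon_label(t) for t in tweets]
print(labels)
```

In the second step, these lexicon-derived labels would serve as training targets for a classifier such as the SVM named in the keywords.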


2018 ◽  
Vol 8 (12) ◽  
pp. 2422 ◽  
Author(s):  
Ali Muhamed Ali ◽  
Hanqi Zhuang ◽  
Ali Ibrahim ◽  
Oneeb Rehman ◽  
Michelle Huang ◽  
...  

Kidney cancer is one of the deadliest diseases and its diagnosis and subtype classification are crucial for patients’ survival. Thus, developing automated tools that can accurately determine kidney cancer subtypes is an urgent challenge. It has been confirmed by researchers in the biomedical field that miRNA dysregulation can cause cancer. In this paper, we propose a machine learning approach for the classification of kidney cancer subtypes using miRNA genome data. Through empirical studies we found 35 miRNAs that possess distinct key features that aid in kidney cancer subtype diagnosis. In the proposed method, Neighbourhood Component Analysis (NCA) is employed to extract discriminative features from miRNAs and Long Short Term Memory (LSTM), a type of Recurrent Neural Network, is adopted to classify a given miRNA sample into kidney cancer subtypes. In the literature, only a couple of kidney subtypes have been considered for classification. In the experimental study, we used the miRNA quantitative read counts data, which was provided by The Cancer Genome Atlas data repository (TCGA). The NCA procedure selected 35 of the most discriminative miRNAs. With this subset of miRNAs, the LSTM algorithm was able to group kidney cancer miRNAs into five subtypes with average accuracy around 95% and Matthews Correlation Coefficient value around 0.92 under 10 runs of randomly grouped 5-fold cross-validation, which were very close to the average performance of using all miRNAs for classification.
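
The Matthews Correlation Coefficient reported here extends to more than two classes. The sketch below implements the standard multi-class form from label counts; the subtype labels are placeholders rather than TCGA data, and `multiclass_mcc` is a hypothetical helper name.

```python
from collections import Counter
from math import sqrt


def multiclass_mcc(y_true, y_pred):
    """Multi-class Matthews correlation coefficient from label counts."""
    s = len(y_true)                                   # total samples
    c = sum(t == p for t, p in zip(y_true, y_pred))   # correctly predicted
    t_counts = Counter(y_true)                        # occurrences per true class
    p_counts = Counter(y_pred)                        # predictions per class
    classes = set(t_counts) | set(p_counts)
    cov_tp = c * s - sum(t_counts[k] * p_counts[k] for k in classes)
    cov_pp = s * s - sum(p_counts[k] ** 2 for k in classes)
    cov_tt = s * s - sum(t_counts[k] ** 2 for k in classes)
    denom = sqrt(cov_pp * cov_tt)
    return cov_tp / denom if denom else 0.0


# Placeholder subtype labels (illustrative only).
y_true = ["A", "B", "C", "A", "B"]
y_perfect = ["A", "B", "C", "A", "B"]
y_one_error = ["A", "B", "C", "A", "C"]

mcc_perfect = multiclass_mcc(y_true, y_perfect)
mcc_one_error = multiclass_mcc(y_true, y_one_error)
print(mcc_perfect, mcc_one_error)
```

Unlike plain accuracy, this score accounts for how predictions are distributed across classes, which matters when subtypes are imbalanced.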


2019 ◽  
Vol 11 (8) ◽  
pp. 920 ◽  
Author(s):  
Syed Haleem Shah ◽  
Yoseline Angel ◽  
Rasmus Houborg ◽  
Shawkat Ali ◽  
Matthew F. McCabe

Developing rapid and non-destructive methods for chlorophyll estimation over large spatial areas is a topic of much interest, as it would provide an indirect measure of plant photosynthetic response, be useful in monitoring soil nitrogen content, and offer the capacity to assess vegetation structural and functional dynamics. Traditional methods of direct tissue analysis or the use of handheld meters, are not able to capture chlorophyll variability at anything beyond point scales, so are not particularly useful for informing decisions on plant health and status at the field scale. Examining the spectral response of plants via remote sensing has shown much promise as a means to capture variations in vegetation properties, while offering a non-destructive and scalable approach to monitoring. However, determining the optimum combination of spectra or spectral indices to inform plant response remains an active area of investigation. Here, we explore the use of a machine learning approach to enhance the estimation of leaf chlorophyll (Chlt), defined as the sum of chlorophyll a and b, from spectral reflectance data. Using an ASD FieldSpec 4 Hi-Res spectroradiometer, 2700 individual leaf hyperspectral reflectance measurements were acquired from wheat plants grown across a gradient of soil salinity and nutrient levels in a greenhouse experiment. The extractable Chlt was determined from laboratory analysis of 270 collocated samples, each composed of three leaf discs. A random forest regression algorithm was trained against these data, with input predictors based upon (1) reflectance values from 2102 bands across the 400–2500 nm spectral range; and (2) 45 established vegetation indices. As a benchmark, a standard univariate regression analysis was performed to model the relationship between measured Chlt and the selected vegetation indices. 
Results show that the root mean square error (RMSE) was significantly reduced when using the machine learning approach compared to standard linear regression. When exploiting the entire spectral range of individual bands as input variables, the random forest estimated Chlt with an RMSE of 5.49 µg·cm−2 and an R2 of 0.89. Model accuracy was improved when using vegetation indices as input variables, producing an RMSE ranging from 3.62 to 3.91 µg·cm−2, depending on the particular combination of indices selected. In further analysis, input predictors were ranked according to their importance level, and a step-wise reduction in the number of input features (from 45 down to 7) was performed. Implementing this resulted in no significant effect on the RMSE, and showed that much the same prediction accuracy could be obtained by a smaller subset of indices. Importantly, the random forest regression approach identified many important variables that were not good predictors according to their linear regression statistics. Overall, the research illustrates the promise in using established vegetation indices as input variables in a machine learning approach for the enhanced estimation of Chlt from hyperspectral data.
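
The two accuracy measures quoted in this abstract, RMSE and R2, can be computed as in the sketch below. The measured and estimated values are invented for illustration and are not the study's chlorophyll data.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean square error between measurements and estimates."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured vs. estimated chlorophyll values (illustrative only).
measured = [30.0, 40.0, 50.0, 60.0]
estimated = [32.0, 38.0, 52.0, 58.0]

error = rmse(measured, estimated)
fit = r_squared(measured, estimated)
print(error, fit)
```

RMSE is in the same units as the measurements, while R2 is unitless, so the two are best read together.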


2018 ◽  
Vol 1 (2) ◽  
pp. 24-32
Author(s):  
Lamiaa Abd Habeeb

In this paper, we design a system that extracts citizens' opinions about the Iraqi government and Iraqi politicians by analyzing their comments on Facebook (a social media network). Since the data is random and contains noise, we cleaned the text and built a stemmer to stem the words as much as possible. Cleaning and stemming reduced the vocabulary size from 28968 to 17083 words, which in turn reduced the memory size from 382858 bytes to 197102 bytes. Generally, there are two approaches to extracting users' opinions: the lexicon-based approach and the machine learning approach. In our work, the machine learning approach is applied with three machine learning algorithms: Naïve Bayes, K-Nearest Neighbor, and the AdaBoost ensemble algorithm. For Naïve Bayes, we apply two models: Bernoulli and Multinomial. We found that Naïve Bayes with the Multinomial model gives the highest accuracy.
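
A minimal sketch of the Multinomial Naïve Bayes model mentioned here is given below, with add-one (Laplace) smoothing over word counts. The tokenized comments are invented English stand-ins, not the study's Facebook data, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from math import log


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log priors and smoothed log word likelihoods per class."""
    vocab = {w for d in docs for w in d}
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, y in zip(docs, labels):
        word_counts[y].update(doc)
    model = {}
    for y in class_counts:
        total = sum(word_counts[y].values())
        log_prior = log(class_counts[y] / len(labels))
        log_lik = {
            w: log((word_counts[y][w] + alpha) / (total + alpha * len(vocab)))
            for w in vocab
        }
        model[y] = (log_prior, log_lik)
    return model


def predict_nb(model, doc):
    """Return the class with the highest posterior log score."""
    def score(y):
        log_prior, log_lik = model[y]
        # Words outside the training vocabulary are skipped.
        return log_prior + sum(log_lik[w] for w in doc if w in log_lik)
    return max(model, key=score)


# Hypothetical tokenized comments (illustrative only).
docs = [["good", "policy"], ["good", "leader"], ["bad", "policy"], ["corrupt", "bad"]]
labels = ["pos", "pos", "neg", "neg"]

model = train_multinomial_nb(docs, labels)
pred_a = predict_nb(model, ["good", "good", "leader"])
pred_b = predict_nb(model, ["bad", "corrupt"])
print(pred_a, pred_b)
```

The Bernoulli variant differs in that it models only whether each vocabulary word is present or absent in a comment, not how many times it occurs.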


2020 ◽  
Author(s):  
Mareen Lösing ◽  
Jörg Ebbing ◽  
Wolfgang Szwillus

Improving the understanding of geothermal heat flux in Antarctica is crucial for ice-sheet modelling and glacial isostatic adjustment. It affects the ice rheology and can lead to basal melting, thereby promoting ice flow. Direct measurements are sparse and models inferred from e.g. magnetic or seismological data differ immensely. By Bayesian inversion, we evaluated the uncertainties of some of these models and studied the interdependencies of the thermal parameters. In contrast to previous studies, our method allows the parameters to vary laterally, which leads to a heterogeneous West Antarctica and a slightly more homogeneous East Antarctica with overall lower surface heat flux. The Curie isotherm depth and radiogenic heat production have the strongest impact on our results, but both parameters have a high uncertainty.
To overcome such shortcomings, we adopt a machine learning approach, more specifically a Gradient Boosted Regression Tree model, in order to find an optimal predictor for locations with sparse measurements. However, this approach largely relies on global data sets, which are notoriously unreliable in Antarctica. Therefore, the validity and quality of the data sets are reviewed and discussed. Using regional and more detailed data sets of Antarctica’s Gondwana neighbors might improve the predictions due to their similar tectonic history. The performance of the machine learning algorithm can then be examined by comparing the predictions to the existing measurements. From our study, we expect to get new insights into the geothermal structure of Antarctica, which will help with future studies on the coupling of Solid Earth and Cryosphere.
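
The core idea of a Gradient Boosted Regression Tree model, repeatedly fitting small trees to the residuals of the current prediction, can be sketched with one-split trees (stumps) on a single feature. This is a simplified illustration under squared-error loss with invented numbers; the authors' model, predictors, and settings are not described in the abstract, and all function names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Best single-split regression stump on one feature (squared error)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # keep the right side non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def stump_predict(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_gbrt(xs, ys, n_rounds=50, learning_rate=0.1):
    """Start from the mean, then add shrunken stumps fitted to residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x)
                 for p, x in zip(preds, xs)]
    return base, stumps, learning_rate


def gbrt_predict(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)


# Hypothetical one-feature data set (illustrative only).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [10.0, 12.0, 15.0, 20.0, 40.0, 42.0, 45.0, 50.0]

model = fit_gbrt(xs, ys)
mean_y = sum(ys) / len(ys)
mse_base = sum((y - mean_y) ** 2 for y in ys) / len(ys)
mse_model = sum((y - gbrt_predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
print(mse_base, mse_model)
```

The comparison here is on training data only; as the abstract notes, real performance would be judged by comparing predictions against existing measurements.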

