Data mining techniques and SAS as a tool for graphical presentation of principal components analysis and disjoint cluster analysis results

Author(s): Emir Slanjankic, Haris Balta, Adil Joldic, Alsa Cvitkovic, Djenan Heric, ...
2019, pp. 016555151986549
Author(s): Hakan Kaygusuz

In this article, chemistry research in 51 European countries between 2006 and 2016 was studied using statistical methods. The study consists of two parts. In the first part, different economic, institutional and citation parameters were correlated with the number of publications, citations and chemical industry figures using principal components analysis and hierarchical cluster analysis. The results of the first part indicated that economic and geographical parameters directly affect the outcome of chemistry research. In the second part, research in branches of chemistry and related disciplines, such as analytical chemistry, polymer science and physical chemistry, was analysed for each country using principal components analysis and hierarchical cluster analysis. Publication data were collected as the number of chemistry publications indexed in the Science Citation Index–Expanded (SCI-E) between 2006 and 2016 in different chemistry subdisciplines and related scientific areas. Interestingly, the second part of the study produced geographical and economic clusters of countries even though no geographical data were included.
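The two-step workflow in this abstract (summarize correlated country indicators with principal components analysis, then group countries with hierarchical cluster analysis) can be sketched in Python. This is a minimal illustration, not the author's procedure: it assumes standardized variables, keeps the first two component scores, and merges clusters by average linkage on Euclidean distance, none of which the abstract specifies. The data matrix and the helper names `pca_scores` and `average_linkage` are hypothetical.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """PCA on standardized columns via SVD; returns scores, loadings, explained-variance ratios."""
    # Standardize so indicators on different scales weigh equally.
    # Assumes no column is constant (zero standard deviation).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)          # share of total variance per component
    loadings = Vt[:n_components]             # rows = components, columns = variables
    scores = Z @ loadings.T                  # coordinates of each row (country) on the components
    return scores, loadings, explained[:n_components]

def average_linkage(points, n_clusters):
    """Naive agglomerative clustering: repeatedly merge the two clusters with the
    smallest mean pairwise Euclidean distance until n_clusters remain."""
    D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]                      # b > a, so index a is unaffected
    return sorted(sorted(c) for c in clusters)

# Hypothetical indicators for six countries: GDP per capita (k$), R&D spend (% GDP), publications.
X = np.array([
    [10, 1.0, 100],
    [11, 1.1, 110],
    [12, 0.9, 105],
    [40, 3.0, 900],
    [42, 3.2, 950],
    [41, 2.9, 920],
], dtype=float)

scores, loadings, explained = pca_scores(X, n_components=2)
groups = average_linkage(scores, n_clusters=2)
print(groups)  # → [[0, 1, 2], [3, 4, 5]]
```

With real data, a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` would replace the quadratic merge loop; the hand-written version only keeps every step visible.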


2013, Vol 8 (1), pp. 42-62
Author(s): Qing Liu, David Pitt, Xueyuan Wu

This paper explores how various modern data mining techniques can be applied to better understand Australian Income Protection Insurance (IPI). We provide a fast and objective method of scoring claims into different portfolios using available rating factors. Results from fitting several prediction models are compared using not only the conventional loss prediction error function but also a modified loss function. Using a misclassification plot, we demonstrate that all the data mining methods under consideration have clearly evident predictive power. We also point out that this predictability can be masked if only the conventional prediction error function is examined. We then suggest using the stepwise regression technique to reduce the number of variables used in the data mining methods. Apart from this variable selection method, we also use principal components analysis to improve understanding of the rating factors that drive the claim durations of insured lives. We also discuss and compare how different variable combining techniques can be used to weight the available predictor variables. One interesting finding is that principal components analysis and the weighted combination prediction model together give highly consistent results in identifying the most significant variables for explaining claim durations.
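The abstract does not say which stepwise variant was used. As one common form, forward selection starts from an intercept-only model and repeatedly adds the predictor that lowers the Akaike information criterion (AIC) most, stopping when no addition helps. The sketch below assumes ordinary least squares with AIC as the entry criterion; the simulated predictors `x1`, `x2`, `x3` and the helper names are hypothetical stand-ins, not the insurance rating factors.

```python
import numpy as np

def ols_aic(X, y):
    """AIC of an OLS fit with intercept: n*log(RSS/n) + 2*(number of coefficients)."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])     # prepend intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    return n * np.log(rss / n) + 2 * A.shape[1]

def forward_stepwise(X, y, names):
    """Forward selection: add the predictor that lowers AIC most; stop when none does."""
    selected, remaining = [], list(range(X.shape[1]))
    best_aic = ols_aic(X[:, np.array(selected, dtype=int)], y)   # intercept-only model
    while remaining:
        trials = [(ols_aic(X[:, np.array(selected + [j], dtype=int)], y), j) for j in remaining]
        aic, j = min(trials)
        if aic >= best_aic:
            break
        selected.append(j)
        remaining.remove(j)
        best_aic = aic
    return [names[j] for j in selected]

# Hypothetical data: only x1 and x2 influence the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=200)
chosen = forward_stepwise(X, y, ["x1", "x2", "x3"])
print(chosen)
```

Because `x1` and `x2` drive the simulated response, they enter first; the pure-noise `x3` is usually, though not always, rejected by AIC. Backward elimination or a p-value entry rule would be equally valid readings of "stepwise regression".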


2009, pp. 81-114
Author(s): Ferruccio Biolcati Rinaldi, Daniele Checchi, Chiara Guglielmetti, Silvia Salini, Matteo Turri

The paper consists of two parts. The first is more general: it introduces university rankings, presents the leading international rankings and discusses the uses people make of rankings. The second focuses on the Italian Censis-la Repubblica ranking and develops two different kinds of analysis: after the validity and reliability of the indicators are considered, principal components analysis and cluster analysis are applied to a partial replication of the Censis-la Repubblica data. These analyses yield a list of points requiring attention, which can be useful when defining rankings of complex institutions such as universities.
Key words: ranking, university ranking, Censis-la Repubblica, validity and reliability, normalisation and combination of indicators.


1997, Vol 48 (2), pp. 215-227
Author(s): Francisco Serrano, Antonio Guerra-Merchán, Carmen Lozano-Francisco, José Luis Vera-Peláez

Nerja Cave is a karstic cavity used by humans from Late Paleolithic to post-Chalcolithic times. Remains of molluscan foods in the uppermost Pleistocene and Holocene sediments were studied with cluster analysis and principal components analysis, in both Q and R modes. Cluster analysis grouped the intervals mainly by chronology and the species assemblages mainly by habitat. The analyses revealed significant changes in the shellfish diet over time. In the Late Magdalenian, most of the molluscs consumed were pulmonate gastropods and species from sandy sea bottoms. The Epipaleolithic diet was more varied and included species from rocky shorelines. From the Neolithic onward, most of the molluscs consumed came from rocky shorelines. In the Q-mode principal components analysis, the first factor mainly reflected changes in the predominant capture environment, probably because of major paleogeographic changes. The second factor may reflect selective capture along rocky coastlines during certain times. The third factor correlated well with the sea-surface temperature curve in the western Mediterranean (Alboran Sea) during the late Quaternary.
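In Q mode the samples (here, sediment intervals) are compared with one another, whereas in R mode the variables (here, mollusc species) are compared. A minimal numpy sketch, assuming a small hypothetical matrix of shell counts and Pearson correlation as the similarity measure (the abstract does not name the coefficient, and ecological studies often use others): the same matrix yields an interval-by-interval matrix in Q mode and a species-by-species matrix in R mode, either of which can then be passed to cluster analysis or PCA. The function name `q_and_r_mode` is hypothetical.

```python
import numpy as np

# Hypothetical shell counts: rows = sediment intervals, columns = mollusc species.
counts = np.array([
    [30, 25,  2,  1, 10],
    [28, 27,  3,  2, 12],
    [ 3,  2, 35, 30,  9],
    [ 1,  4, 33, 31, 11],
], dtype=float)

def q_and_r_mode(M):
    """Return (Q-mode, R-mode) Pearson correlation matrices of a samples-by-variables matrix."""
    q = np.corrcoef(M)                # rows as variables: interval-by-interval similarity
    r = np.corrcoef(M, rowvar=False)  # columns as variables: species-by-species similarity
    return q, r

q, r = q_and_r_mode(counts)
print(q.shape, r.shape)  # → (4, 4) (5, 5)
```

In this toy matrix the first two intervals correlate strongly with each other and negatively with the last two, which is the kind of pattern a Q-mode cluster analysis would turn into interval groups.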


1984, Vol 54 (1), pp. 147-155
Author(s): W. Hovenkamp, F. Hovenkamp, J.J. van der Heide

A short introduction is given to the taxonomic status of the genus Niphargus, especially the species related to N. longicaudatus corsicanus. Previous findings and descriptions are mentioned. An attempt is made to clarify the relationships between Corsican Niphargus populations by means of a cluster analysis and of a principal components analysis combined with a cluster analysis. Special attention has been paid to the size-dependent variability of most of the characters. The results of the two methods of analysis are compared with each other and evaluated. On average, morphological differentiation is greater between populations than within them. This, together with the large amount of character variability, makes it very difficult to assign populations to, or distinguish them from, any of the taxa of Niphargus, which are often poorly described.


Psihologija, 2015, Vol 48 (1), pp. 61-78
Author(s): Slobodan Markovic, Djordje Alfirevic

The purpose of the present study was to compare how architects and non-architects structure their experience of architectural expressiveness. Twenty architects and twenty non-architects rated twenty photographs of architectural objects on thirty expressiveness scales. Principal components analysis revealed four factors in both groups of participants: Aggressiveness, Regularity, Color and Aesthetics. Cluster analysis yielded two clusters of architectural objects: Choleric (high Aggressiveness and Color) and Phlegmatic (low Aggressiveness and Color, and high Regularity). All objects were highly rated on Aesthetics. Analysis of variance showed that architects rated both clusters as less aggressive than non-architects did. In addition, architects rated the Phlegmatic cluster as more aesthetic, whereas non-architects rated the Choleric cluster as more aesthetic. These results supported the Processing Fluency model: compared with non-architects, architects processed the expressive information of minimalistic objects (Phlegmatic cluster) with ease, which led to positive hedonic reactions and higher.


1979, Vol 9 (2), pp. 305-311
Author(s): R. M. Murray, J. E. Cooper, A. Smith

The Leyton Obsessional Inventory was administered to 73 obsessive-compulsive neurotics, and their responses were compared with those of 100 normal subjects. The ratio of mean patient scores to mean normal scores ranged from 2·4:1 for obsessional traits and 3·2:1 for symptoms to 6·2:1 for resistance and 12·5:1 for interference with other activities. A principal components analysis of the patients' replies produced 3 unitary components (household order, personal contamination, and doubting) plus 2 bipolar components (checking/parsimony and desire for closure/unpleasant ruminations). These appeared to be more clearly defined versions of the components identified in a similar analysis of normal subjects, suggesting that obsessional neurotics differ from normal subjects quantitatively rather than qualitatively. A cluster analysis of the patients' responses produced 3 subgroups. Thirty-two patients were predominantly hesitant and indecisive ('doubters'), 30 were concerned with bodily and clothing contamination ('contaminators'), and 7 were preoccupied with checking ('checkers').
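A component is usually called bipolar when its salient loadings have opposite signs (items at one pole load positively, items at the other negatively) and unitary when they share a sign. The sketch below encodes that reading under an assumed salience cut-off of |loading| ≥ 0.3, a common rule of thumb rather than a criterion stated in the abstract; the function name, item names and loadings are hypothetical, not Leyton Obsessional Inventory results.

```python
def classify_component(loadings, threshold=0.3):
    """Label one component 'unitary' or 'bipolar' from its salient loadings.

    Assumption: an item is salient when |loading| >= threshold (0.3 is a rule of
    thumb, not a criterion taken from the study).
    """
    salient = [v for v in loadings.values() if abs(v) >= threshold]
    if not salient:
        return "no salient items"
    has_pos = any(v > 0 for v in salient)
    has_neg = any(v < 0 for v in salient)
    # Component signs are arbitrary, so all-negative salient loadings are still unitary.
    return "bipolar" if has_pos and has_neg else "unitary"

# Hypothetical loadings for two components on four questionnaire items.
comp_a = {"item_1": 0.62, "item_2": 0.55, "item_3": -0.48, "item_4": 0.12}
comp_b = {"item_1": 0.70, "item_2": 0.45, "item_3": -0.10, "item_4": 0.05}
print(classify_component(comp_a), classify_component(comp_b))  # → bipolar unitary
```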

