Some Common Pitfalls in Causal Analysis of Categorical Data

1982 ◽  
Vol 19 (4) ◽  
pp. 461-471
Author(s):  
Jay Magidson

Examples of some common pitfalls in the analysis of categorical data are discussed in the context of causal interpretation of the results. Though no statistical technique can replace theory, the author shows that log-linear modeling and chi-square automatic interaction detection (CHAID) can provide researchers with powerful tools for gaining valuable causal insights into their data. Examples include the biasing effects of omitted variables, omitted interactions, improper contrast coding, and misspecification of the structure of a hypothesized interaction.
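The omitted-variable pitfall can be illustrated with a small sketch. The counts below are hypothetical, invented only for illustration, and are not taken from the article; the variable names X, Y, and Z are likewise assumptions. Within each level of Z the odds ratio between X and Y is 1 (no association), yet collapsing over Z produces a strong marginal association.

```python
# Hypothetical counts, invented purely for illustration (not from the article).
# Each stratum of the omitted variable Z holds a 2x2 table:
# rows = X (0, 1), columns = Y (0, 1).
strata = {
    "z=0": [[80, 20], [16, 4]],
    "z=1": [[4, 16], [20, 80]],
}


def odds_ratio(table):
    """Cross-product ratio (a*d)/(b*c) of a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)


def collapse(tables):
    """Sum 2x2 tables cell by cell, i.e. ignore the stratifying variable."""
    return [[sum(t[i][j] for t in tables) for j in range(2)] for i in range(2)]


# Association between X and Y within each level of Z.
conditional = {name: odds_ratio(t) for name, t in strata.items()}
# Association between X and Y when Z is omitted from the analysis.
marginal = odds_ratio(collapse(list(strata.values())))

print(conditional)
print(marginal)
```

Under these assumed counts, an analyst who ignores Z would read a causal link into an association that vanishes once Z is controlled.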

Author(s):  
Kacha Chansilp

This research presents a data-driven modeling method that derives a combined trading model from the analysis of negative correlations among the top five active stocks in each sector of the Thailand stock market. The negative movements are computed from the closing-price direction of major stocks in the eight biggest sectors. The highly negatively correlated stocks among market groups are then used to build predictive trading models with three algorithms: regression analysis, generalized linear modeling, and chi-square automatic interaction detection. An ensemble combining the best two models is then created. The experimental results reveal that the proposed method of trading based on negative movement analysis can accurately predict the closing price of the active stock, with a low error rate of around 1.86%.
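One possible reading of "negative correlation of closing-price direction" is sketched below. The abstract does not specify the exact computation, so this is an assumption: direction is taken as the sign of the day-over-day change, and the Pearson correlation of the two direction series is used. The prices and the names `stock_a` and `stock_b` are hypothetical.

```python
from math import sqrt


def directions(closes):
    """+1 if the close rose versus the previous day, -1 if it fell, 0 if unchanged."""
    return [(b > a) - (b < a) for a, b in zip(closes, closes[1:])]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


# Hypothetical closing prices for two stocks (illustration only).
stock_a = [10.0, 10.5, 10.2, 10.8, 10.6, 11.0]
stock_b = [20.0, 19.6, 19.9, 19.5, 19.8, 19.4]

r = pearson(directions(stock_a), directions(stock_b))
print(r)  # a value near -1 means the two stocks tend to move in opposite directions
```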


Psych ◽  
2021 ◽  
Vol 3 (3) ◽  
pp. 522-541
Author(s):  
Jörg-Henrik Heine ◽  
Mark Stemmler

The person-centered approach in categorical data analysis is introduced as a complement to the variable-centered approach. The former considers persons, animals, or objects on the basis of their combination of characteristics, which can be displayed in multiway contingency tables. Configural Frequency Analysis (CFA) and log-linear modeling (LLM) are the two most prominent (and related) statistical methods. Both compare observed frequencies ($f_{o_{i \ldots k}}$) with expected frequencies ($f_{e_{i \ldots k}}$). While LLM primarily uses a model-fitting approach, CFA analyzes the residuals of non-fitting models. Residuals with significantly more observed than expected frequencies ($f_{o_{i \ldots k}} > f_{e_{i \ldots k}}$) are called types, while residuals with significantly fewer observed than expected frequencies ($f_{o_{i \ldots k}} < f_{e_{i \ldots k}}$) are called antitypes. The R package confreq is presented and its use is demonstrated with several data examples. Results of contingency table analyses can be displayed in tables but also in graphics representing the size and type of residual. The expected frequencies represent the null hypothesis, and different null hypotheses result in different expected frequencies. Different kinds of CFA are presented: the first-order CFA based on the null hypothesis of independence, CFA with covariates, and the two-sample CFA. The calculation of the expected frequencies can be controlled through the design matrix, which can be easily handled in confreq.
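The idea behind a first-order CFA can be sketched in a few lines. This is not the confreq API (confreq is an R package); it is a minimal Python illustration under stated assumptions: expected frequencies come from the independence model, each cell is tested with the standardized residual $z = (f_o - f_e)/\sqrt{f_e}$, and a two-sided Bonferroni-adjusted normal critical value is used. The function name `first_order_cfa` and the example counts are hypothetical.

```python
from itertools import product
from math import prod, sqrt
from statistics import NormalDist


def first_order_cfa(observed, alpha=0.05):
    """observed: dict mapping configuration tuples (one category per variable) to counts.

    Returns {configuration: (observed, expected, z, label)}.
    """
    n = sum(observed.values())
    n_vars = len(next(iter(observed)))
    # One-way marginal counts for each variable.
    margins = [{} for _ in range(n_vars)]
    for config, count in observed.items():
        for v, category in enumerate(config):
            margins[v][category] = margins[v].get(category, 0) + count
    cells = list(product(*(sorted(m) for m in margins)))
    # Bonferroni-adjusted two-sided critical value (an assumption of this sketch).
    z_crit = NormalDist().inv_cdf(1 - alpha / (2 * len(cells)))
    results = {}
    for config in cells:
        o = observed.get(config, 0)
        # Expected frequency under independence: N times the product of marginal proportions.
        e = n * prod(margins[v][c] / n for v, c in enumerate(config))
        z = (o - e) / sqrt(e)
        label = "type" if z > z_crit else "antitype" if z < -z_crit else "-"
        results[config] = (o, e, z, label)
    return results


# Hypothetical 2x2 table (illustration only).
table = {("a", "x"): 40, ("a", "y"): 10, ("b", "x"): 10, ("b", "y"): 40}
results = first_order_cfa(table)
for config, row in results.items():
    print(config, row)
```

Changing the null hypothesis (for example, adding covariates) would change only how the expected frequencies are computed; the comparison of observed against expected stays the same.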


2020 ◽  
Vol 2019 (1) ◽  
pp. 357-367
Author(s):  
Isti Samrotul Hidayati ◽  
I Made Arcana

The Chi-squared Automatic Interaction Detection (CHAID) method is a segmentation method based on the relationship between response and explanatory variables, assessed with the chi-square test. In practice, it requires attention to class balance in the data in order to minimize classification errors. One approach that can be used with imbalanced data is the Synthetic Minority Over-sampling Technique (SMOTE). In this study, CHAID with the SMOTE approach is applied to the under-five mortality rate (Angka Kematian Balita, AKBa) in Eastern Indonesia (Kawasan Timur Indonesia, KTI). The aim is to identify the variables that characterize under-five mortality using the CHAID analysis and to compare it with the SMOTE approach. The comparison shows that the SMOTE approach performs better, with a sensitivity of 48.3% and a precision of 75.9%. The variables that significantly characterize under-five mortality in KTI are birth weight, type of birth, mother's employment status, and household wealth; the main characteristics are children under five with low birth weight who were born as twins.
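The core step of SMOTE can be sketched as follows: a synthetic minority-class point is placed at a random position on the line segment between a minority sample and one of its nearest minority-class neighbours. This is a minimal illustration, not the implementation used in the study; the function name `smote_sketch`, the parameter defaults, and the sample points are hypothetical.

```python
import random
from math import dist


def smote_sketch(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority-class neighbours of x, excluding x itself.
        neighbours = sorted(
            (p for p in minority if p is not x), key=lambda p: dist(x, p)
        )[:k]
        neighbour = rng.choice(neighbours)
        u = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(xi + u * (ni - xi) for xi, ni in zip(x, neighbour)))
    return synthetic


# Hypothetical minority-class feature vectors (illustration only).
minority_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (3.0, 3.0)]
new_points = smote_sketch(minority_points, n_new=5)
print(new_points)
```

The synthetic points are added to the training data only, so that a classifier such as CHAID sees a more balanced response variable; evaluation metrics such as sensitivity and precision should still be computed on untouched data.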


1979 ◽  
Vol 27 (3) ◽  
pp. 458-468
Author(s):  
Henry Lever

There is some controversy concerning the role of ethnicity in South African electoral behaviour. Since the society is segmented along ethnic lines, it is to be expected that ethnicity would play a crucial role in political choices. Some writers have gone so far as to suggest that ethnicity is the only significant factor affecting voting preferences. The controversy arose at a time when Goodman's method of log-linear analysis for hierarchical models had not yet been developed. This method provides the most powerful tool available for the multivariate analysis of categorical data. A re-analysis of previously published research using Goodman's method shows that ethnicity is not the only significant factor bearing on voting preferences. The first four-way table of voting preferences in South Africa is presented. The order of importance of the variables affecting party choice is: (1) ethnicity, (2) socio-economic status, and (3) age of the voter. The recursive model suggested by the analysis explains approximately 98 per cent of the data.

