Some Common Pitfalls in Causal Analysis of Categorical Data

Common pitfalls in the analysis of categorical data are discussed in the context of causal interpretation of the results. Though no statistical technique can replace theory, the author shows that log-linear modeling and chi-square automatic interaction detection (CHAID) can provide researchers with powerful tools for gaining valuable causal insights into their data. Examples include the biasing effects of omitted variables, omitted interactions, improper contrast coding, and misspecification of the structure of a hypothesized interaction.
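The omitted-variable pitfall mentioned above can be illustrated with a small numeric sketch. The counts below are invented for illustration and are not taken from the article; the variable names (`strata`, `odds_ratio`, `collapse`) are likewise hypothetical. Within each level of a third variable Z the association between X and Y is null (odds ratio of 1), yet collapsing over Z produces a strong apparent association.

```python
def odds_ratio(table):
    """Odds ratio of a 2x2 table laid out as [[n00, n01], [n10, n11]]."""
    return (table[0][0] * table[1][1]) / (table[0][1] * table[1][0])


def collapse(tables):
    """Sum a list of 2x2 tables cell by cell, ignoring the stratifying variable."""
    return [
        [sum(t[i][j] for t in tables) for j in range(2)]
        for i in range(2)
    ]


# Invented counts: rows are X = 0/1, columns are Y = 0/1, one table per level of Z.
strata = [
    [[80, 20], [8, 2]],   # Z = 0
    [[2, 8], [20, 80]],   # Z = 1
]

conditional = [odds_ratio(t) for t in strata]   # association within each level of Z
marginal = odds_ratio(collapse(strata))         # association when Z is omitted

print("conditional odds ratios:", conditional)
print("marginal odds ratio:", round(marginal, 2))
```

In this constructed example, a model that omits Z would attribute to X an effect on Y that disappears once Z is included.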

Log-Linear Modeling of Categorical Data in Developmental Research

Life-Span Development and Behavior ◽

10.4324/9781315789255-6 ◽

2019 ◽

pp. 225-248

Author(s):

Alexander von Eye ◽

Kurt Kreppner ◽

Holger Weßels

Keyword(s):

Categorical Data ◽

Developmental Research ◽

Linear Modeling ◽

Log Linear

Log-Linear Causal Analysis of Cross-Classified Categorical Data

Statistics and Causality - Wiley Series in Probability and Statistics ◽

10.1002/9781118947074.ch13 ◽

2016 ◽

pp. 311-331 ◽

Cited By ~ 2

Author(s):

Kazuo Yamaguchi

Keyword(s):

Categorical Data ◽

Causal Analysis ◽

Log Linear

Stock Movement Modeling Based on the Analysis of Negative Correlation

International Journal of e-Education e-Business e-Management and e-Learning ◽

10.17706/ijeeee.2020.10.2.125-134 ◽

2020 ◽

Vol 10 (2) ◽

pp. 125-134

Author(s):

Kacha Chansilp ◽

Keyword(s):

Regression Analysis ◽

Movement Analysis ◽

Data Driven ◽

Linear Modeling ◽

Chi Square ◽

Interaction Detection ◽

Generalized Linear Modeling ◽

Closing Price ◽

Trading Model ◽

Data Driven Modeling

This research presents a data-driven modeling method to derive a combined trading model from the analysis of negative correlations among the top five active stocks from each sector of the Thailand stock market. The negative movements are computed from the closing price direction of major stocks in the eight biggest sectors. The highly negatively correlated stocks among market groups are then used to build predictive trading models with three algorithms: regression analysis, generalized linear modeling, and chi-square automatic interaction detection. An ensemble combining the best two models is then created. The experimental results reveal that the proposed method of trading based on negative movement analysis can accurately predict the closing price of the active stock, with a low error rate of around 1.86%.

Analysis of Categorical Data with the R Package confreq

Psych ◽

10.3390/psych3030034 ◽

2021 ◽

Vol 3 (3) ◽

pp. 522-541

Author(s):

Jörg-Henrik Heine ◽

Mark Stemmler

Keyword(s):

Categorical Data ◽

Null Hypothesis ◽

Model Fitting ◽

R Package ◽

Design Matrix ◽

Linear Modeling ◽

Categorical Data Analysis ◽

Configural Frequency Analysis ◽

Complementary Approach ◽

Log Linear

The person-centered approach in categorical data analysis is introduced as a complementary approach to the variable-centered approach. The former groups persons, animals, or objects on the basis of their combinations of characteristics, which can be displayed in multiway contingency tables. Configural Frequency Analysis (CFA) and log-linear modeling (LLM) are the two most prominent (and related) statistical methods. Both compare observed frequencies ($f_{o_{i \ldots k}}$) with expected frequencies ($f_{e_{i \ldots k}}$). While LLM primarily uses a model-fitting approach, CFA analyzes residuals of non-fitting models. Residuals with significantly more observed than expected frequencies ($f_{o_{i \ldots k}} > f_{e_{i \ldots k}}$) are called types, while residuals with significantly fewer observed than expected frequencies ($f_{o_{i \ldots k}} < f_{e_{i \ldots k}}$) are called antitypes. The R package confreq is presented and its use is demonstrated with several data examples. Results of contingency table analyses can be displayed in tables but also in graphics representing the size and type of residual. The expected frequencies represent the null hypothesis, and different null hypotheses result in different expected frequencies. Different kinds of CFA are presented: the first-order CFA based on the null hypothesis of independence, CFA with covariates, and the two-sample CFA. The calculation of the expected frequencies can be controlled through the design matrix, which can be easily handled in confreq.
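The comparison of observed and expected frequencies described above can be sketched for the simplest case, a two-way table under the null hypothesis of independence. This is a simplified illustration in Python and does not reproduce the confreq package or its interface: the function name `first_order_cfa`, the standardized residual used as the test statistic, the unadjusted cutoff `z_crit`, and the example counts are all assumptions made here for illustration.

```python
from math import sqrt


def first_order_cfa(table, z_crit=1.96):
    """Label each cell of a two-way table as 'type', 'antitype', or 'neither'.

    Expected frequencies come from the independence model:
    expected = row total * column total / grand total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    results = []
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            z = (observed - expected) / sqrt(expected)  # standardized residual
            if z > z_crit:
                label = "type"       # more cases observed than expected
            elif z < -z_crit:
                label = "antitype"   # fewer cases observed than expected
            else:
                label = "neither"
            results.append(((i, j), observed, expected, z, label))
    return results


# Invented counts for a 2x2 cross-classification.
for cell, observed, expected, z, label in first_order_cfa([[40, 10], [10, 40]]):
    print(cell, observed, expected, round(z, 2), label)
```

A fuller analysis would normally adjust the cutoff for the number of cells tested; the fixed value used here keeps the sketch short.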

Log-linear Causal Analysis of Cross-classified Categorical Data

Sociological Methodology ◽

10.1177/0081175012460661 ◽

2012 ◽

Vol 42 (1) ◽

pp. 257-285 ◽

Cited By ~ 4

Author(s):

Kazuo Yamaguchi

Keyword(s):

Categorical Data ◽

Causal Analysis ◽

Log Linear

Modeling Categorical Data with Chi Square Automatic Interaction Detection and Correspondence Analysis

Geographical Analysis ◽

10.1111/j.1538-4632.1991.tb00243.x ◽

2010 ◽

Vol 23 (4) ◽

pp. 332-345 ◽

Cited By ~ 4

Author(s):

W.A.V. Clark ◽

M. C. Deurloo ◽

F. M. Dieleman

Keyword(s):

Correspondence Analysis ◽

Categorical Data ◽

Chi Square ◽

Interaction Detection

Log-Linear Modeling and Analysis: Reflections on the Use of Multivariate Categorical Data in Social Science Research

Exceptionality ◽

10.1207/s15327035ex0703_6 ◽

1997 ◽

Vol 7 (3) ◽

pp. 199-203 ◽

Cited By ~ 2

Author(s):

Jay W. Rojewski ◽

Roger Bakeman

Keyword(s):

Social Science ◽

Categorical Data ◽

Social Science Research ◽

Science Research ◽

Linear Modeling ◽

Modeling And Analysis ◽

Log Linear ◽

Multivariate Categorical

PENERAPAN CHAID DENGAN PENDEKATAN SMOTE PADA KEMATIAN BALITA DI KAWASAN TIMUR INDONESIA TAHUN 2017 (Application of CHAID with the SMOTE Approach to Under-Five Mortality in Eastern Indonesia, 2017)

Seminar Nasional Official Statistics ◽

10.34123/semnasoffstat.v2019i1.97 ◽

2020 ◽

Vol 2019 (1) ◽

pp. 357-367

Author(s):

Isti Samrotul Hidayati ◽

I Made Arcana

Keyword(s):

Sampling Technique ◽

Chi Square ◽

Interaction Detection ◽

Chi Squared

Chi-squared Automatic Interaction Detection (CHAID) is a segmentation method based on the relationship between response and explanatory variables, assessed with the chi-square test. In applying it, attention must be paid to the balance of the data in order to minimize classification errors. One approach that can be used for imbalanced data is the Synthetic Minority Over-sampling Technique (SMOTE). In this study, CHAID with the SMOTE approach is applied to the under-five mortality rate (Angka Kematian Balita, AKBa) in Eastern Indonesia (Kawasan Timur Indonesia, KTI). The aim is to identify the variables that characterize under-five mortality based on the CHAID analysis and to compare it with the SMOTE approach. The comparison shows that the SMOTE approach performs better, with a sensitivity of 48.3% and a precision of 75.9%. The variables that significantly characterize under-five mortality in KTI are birth weight, type of birth, mother's employment status, and household wealth, with the main characteristics being children with low birth weight who were born as twins.
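The two evaluation measures reported in this abstract, sensitivity and precision, are standard ratios of confusion-matrix counts. The sketch below shows only how they are computed; the counts are invented for illustration and are not the study's data.

```python
def sensitivity(true_pos, false_neg):
    """Share of actual positive cases that the classifier detects."""
    return true_pos / (true_pos + false_neg)


def precision(true_pos, false_pos):
    """Share of predicted positive cases that are actually positive."""
    return true_pos / (true_pos + false_pos)


# Invented confusion-matrix counts for the minority (positive) class.
tp, fn, fp = 30, 20, 10

print(f"sensitivity = {sensitivity(tp, fn):.1%}")
print(f"precision   = {precision(tp, fp):.1%}")
```

With imbalanced data, overall accuracy can look high even when few minority-class cases are detected, which is why measures focused on the positive class are commonly reported.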

Review of "Analyzing Qualitative/Categorical Data: Log-Linear Models and Latent-Structure Analysis, by Leo A. Goodman", Abt Books, 1978

ACM SIGSIM Simulation Digest ◽

10.1145/1102815.1102830 ◽

1979 ◽

Vol 10 (4) ◽

pp. 69-69

Keyword(s):

Structure Analysis ◽

Categorical Data ◽

Linear Models ◽

Latent Structure ◽

Latent Structure Analysis ◽

Log Linear ◽

Data Log

Ethnicity and Voting Patterns in South Africa

Political Studies ◽

10.1111/j.1467-9248.1979.tb01215.x ◽

1979 ◽

Vol 27 (3) ◽

pp. 458-468 ◽

Cited By ~ 5

Author(s):

Henry Lever

Keyword(s):

South Africa ◽

South African ◽

Categorical Data ◽

Economic Status ◽

Party Choice ◽

Political Choices ◽

Electoral Behaviour ◽

Log Linear ◽

Published Research

There is some controversy concerning the role of ethnicity in South African electoral behaviour. Since the society is segmented on ethnic lines, it is to be expected that ethnicity would play a crucial role in affecting political choices. Some writers have gone so far as to suggest that ethnicity is the only significant factor affecting voting preferences. The controversy arose at a time when Goodman's method of log-linear analysis for hierarchical models had not yet been developed. This method provides the most powerful tool available for the multivariate analysis of categorical data. A re-analysis of previously published research using Goodman's method shows that ethnicity is not the only significant factor having a bearing on voting preferences. The first four-way table of voting preferences in South Africa is presented. The order of importance of the variables affecting party choice is: (1) ethnicity, (2) socio-economic status, and (3) age of the voter. The recursive model suggested by the analysis explains approximately 98 per cent of the data.
