Simultaneous Edit and Imputation For Household Data with Structural Zeros

Abstract: Multivariate categorical data nested within households often include reported values that fail edit constraints (for example, a participating household reports a child's age as older than that of the child's biological parent) and have missing values. Generally, agencies prefer datasets to be free from erroneous or missing values before analyzing them or disseminating them to secondary data users. We present a model-based engine for editing and imputation of household data based on a Bayesian hierarchical model that includes (i) a nested data Dirichlet process mixture of products of multinomial distributions as the model for the true latent values of the data, truncated to allow only households that satisfy all edit constraints, (ii) a model for the location of errors, and (iii) a reporting model for the observed responses in error. The approach propagates uncertainty due to unknown locations of errors and missing values, generates plausible datasets that satisfy all edit constraints, and can preserve multivariate relationships within and across individuals in the same household. We illustrate the approach using data from the 2012 American Community Survey.
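
To make the idea of an edit constraint concrete, the following is a minimal sketch of a household-level check of the kind the abstract describes. The `Person` record, the relation coding, and the single rule (a child may not be as old as or older than the household head) are hypothetical simplifications for illustration, not the paper's actual edit rules or model.

```python
from dataclasses import dataclass


@dataclass
class Person:
    age: int
    relation: str  # hypothetical coding: "head", "spouse", or "child"


def violates_edits(household: list[Person]) -> bool:
    """Return True if any child is reported as old as or older than the head.

    Assumption: "child" means a biological child of the household head.
    A household that returns True is a structural zero, i.e. a combination
    of values that a truncated model would assign probability zero.
    """
    heads = [p for p in household if p.relation == "head"]
    children = [p for p in household if p.relation == "child"]
    return any(c.age >= h.age for h in heads for c in children)


# Two illustrative (made-up) households: one consistent, one failing the rule.
valid = [Person(40, "head"), Person(12, "child")]
invalid = [Person(30, "head"), Person(45, "child")]
print(violates_edits(valid), violates_edits(invalid))
```

Because the rule compares members of the same household, it cannot be checked one person at a time, which is why the constraint has to be handled at the household level.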

Bayesian Hierarchical Growth Model for Experimental Data on the Effectiveness of an Incentive-Based Weight Reduction Method

Turkish Journal of Computer and Mathematics Education (TURCOMAT) ◽

10.17762/turcomat.v12i3.840 ◽

2021 ◽

Vol 12 (3) ◽

pp. 1036-1047

Author(s):

Md Azman Shahadan et al.

Keyword(s):

Growth Model ◽

Weight Reduction ◽

Dirichlet Process ◽

Reduction Method ◽

Growth Models ◽

Dirichlet Process Mixture ◽

Bayesian Hierarchical ◽

Mixture Prior ◽

Hierarchical Growth Model ◽

Hierarchical Growth Models

The objective of this research is to model experimental data on the effectiveness of an incentive-based weight reduction method using Bayesian hierarchical growth models. Three models are proposed: a parametric Bayesian hierarchical growth model with correlated intercept and slope random effects, a parametric Bayesian hierarchical growth model with uncorrelated intercept and slope random effects, and a semi-parametric Bayesian hierarchical growth model with a Dirichlet process mixture prior. The data were obtained from forty-eight (48) students who participated in an experiment on the weight reduction method. The students were divided equally into two groups: single and pair. The experiment ran for three months, with a weight reading session every two weeks. At the end of the study, we had six repeated measures of each student's weight in kg, along with measures of several covariates and factors. Based on the Bayesian fit indexes and the models' flexibility, the best model for these data is the semi-parametric Bayesian hierarchical growth model with a Dirichlet process mixture prior. The results of the semi-parametric model showed that the 'growth' (reduction) rates in the weight reduction experiment relate to the students' gender, height in cm, experimental group (single or pair), and time in weeks.

Bayesian non-parametric clustering of single-cell mutation profiles

10.1101/2020.01.15.907345 ◽

2020 ◽

Cited By ~ 1

Author(s):

Nico Borgsmüller ◽

Jose Bonet ◽

Francesco Marass ◽

Abel Gonzalez-Perez ◽

Nuria Lopez-Bigas ◽

...

Keyword(s):

Single Cell ◽

Dirichlet Process ◽

Tumor Heterogeneity ◽

Missing Values ◽

Parametric Method ◽

Simulated Data ◽

Error Rates ◽

Data Sets ◽

Dirichlet Process Mixture ◽

Non Parametric

Abstract: The high resolution of single-cell DNA sequencing (scDNA-seq) offers great potential to resolve intra-tumor heterogeneity by distinguishing clonal populations based on their mutation profiles. However, the increasing size of scDNA-seq data sets and technical limitations, such as high error rates and a large proportion of missing values, complicate this task and limit the applicability of existing methods. Here we introduce BnpC, a novel non-parametric method to cluster individual cells into clones and infer their genotypes based on their noisy mutation profiles. BnpC employs a Dirichlet process mixture model coupled with a Markov chain Monte Carlo sampling scheme, including a modified split-merge move and a novel posterior estimator to predict clones and genotypes. We benchmarked our method comprehensively against state-of-the-art methods on simulated data using various data sizes, and applied it to three cancer scDNA-seq data sets. On simulated data, BnpC compared favorably against current methods in terms of accuracy, runtime, and scalability. Its inferred genotypes were the most accurate, and it was the only method able to run and produce results on data sets with 10,000 cells. On tumor scDNA-seq data, BnpC was able to identify clonal populations missed by the original cluster analysis but supported by supplementary experimental data. With ever-growing scDNA-seq data sets, scalable and accurate methods such as BnpC will become increasingly relevant, not only to resolve intra-tumor heterogeneity but also as a pre-processing step to reduce data size. BnpC is freely available under the MIT license at https://github.com/cbg-ethz/BnpC.
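
As background on why a Dirichlet process mixture does not need the number of clusters fixed in advance, here is a minimal sketch of drawing cluster labels from the Chinese restaurant process, the prior over partitions that a Dirichlet process induces. This is a generic textbook construction and not BnpC's sampler; the function name `crp_assignments` and the concentration value `alpha` are assumptions made for illustration.

```python
import random


def crp_assignments(n_items: int, alpha: float, seed: int = 0) -> list[int]:
    """Draw cluster labels for n_items from a Chinese restaurant process prior.

    Item i (0-based) joins existing cluster k with probability
    counts[k] / (i + alpha) and opens a new cluster with probability
    alpha / (i + alpha), so the number of clusters grows with the data.
    """
    rng = random.Random(seed)
    counts: list[int] = []  # current size of each cluster
    labels: list[int] = []  # cluster label assigned to each item
    for _ in range(n_items):
        # Unnormalised weights: existing cluster sizes, then alpha for a new one.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1
        labels.append(k)
    return labels


labels = crp_assignments(n_items=50, alpha=1.0)
print(len(set(labels)), "clusters among", len(labels), "items")
```

In a full mixture model these prior draws would be combined with a likelihood for each item's data, which this sketch deliberately leaves out.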

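STRATEGI MAKSIMALISASI ANGGARAN BELANJA PADA BALAI PENELITIAN DAN PENGEMBANGAN LINGKUNGAN HIDUP DAN KEHUTANAN MANOKWARI [Strategy for Maximizing Budget Expenditure at the Manokwari Environment and Forestry Research and Development Center]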

Jurnal Manajemen Pembangunan Daerah ◽

10.29244/jurnal_mpd.v8i2.24821 ◽

2016 ◽

Vol 8 (2) ◽

Author(s):

Arif Hasan ◽

Dedi Budiman Hakim ◽

Irdika Mansur

Keyword(s):

Decision Making ◽

Descriptive Analysis ◽

Secondary Data ◽

Primary Data ◽

Fiscal Year ◽

The Third ◽

Swot Matrix ◽

Three Stages ◽

Using Data

This study analyzes the causes of low budget absorption and formulates a strategy for maximizing the absorption of expenditure at Balai Penelitian dan Pengembangan Lingkungan Hidup dan Kehutanan Manokwari. The 20 respondents consisted of treasury officials and activity output holders. The secondary data were budget realization reports (LRA) for quarters I, II, III and IV of fiscal years 2011 to 2015, and the primary data were interviews conducted with the help of a questionnaire. The data were analyzed descriptively using tabulation, and the three-stage decision-making strategy analysis used the IFE and EFE matrices, the SWOT matrix and the QSPM matrix. The results showed that 19 factors caused low budget absorption up to the end of the third quarter, and that there were 10 draft policies serving as a strategy for maximizing budget absorption at Balai Penelitian dan Pengembangan Lingkungan Hidup dan Kehutanan Manokwari.

Dynamic Dirichlet process mixture model for identifying voting coalitions in the United Nations General Assembly human rights roll call votes

Journal of Applied Statistics ◽

10.1080/02664763.2021.1931820 ◽

2021 ◽

pp. 1-20

Author(s):

Qiushi Yu

Keyword(s):

Human Rights ◽

United Nations ◽

Mixture Model ◽

Dirichlet Process ◽

General Assembly ◽

Roll Call ◽

Dirichlet Process Mixture ◽

Dirichlet Process Mixture Model ◽

United Nations General ◽

United Nations General Assembly

Dirichlet process mixture models made scalable and effective by means of massive distribution

Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing - SAC '19 ◽

10.1145/3297280.3297327 ◽

2019 ◽

Cited By ~ 1

Author(s):

Khadidja Meguelati ◽

Benedicte Fontez ◽

Nadine Hilgert ◽

Florent Masseglia

Keyword(s):

Mixture Models ◽

Dirichlet Process ◽

Dirichlet Process Mixture ◽

Dirichlet Process Mixture Models ◽

Massive Distribution

Association between E-cigarette use and chronic obstructive pulmonary disease in non-asthmatic adults in the USA

Journal of Public Health ◽

10.1093/pubmed/fdaa229 ◽

2020 ◽

Author(s):

Godfred O Antwi ◽

Darson L Rhodes

Keyword(s):

Sex Education ◽

Smoking Status ◽

Secondary Data ◽

Chronic Obstructive ◽

Behavioral Risk ◽

Cigarette Use ◽

Obstructive Pulmonary Disease ◽

The Usa ◽

Potential Link ◽

Using Data

Abstract. Background: Concern about the health impacts of e-cigarette use is growing; however, limited research exists regarding potential long-term health effects of this behavior. This study explored the relationship between e-cigarette use and COPD in a sample of US adults. Methods: A secondary data analysis using data from the 2018 Behavioral Risk Factor Surveillance Survey in the USA was conducted to examine associations between e-cigarette use and COPD, controlling for conventional cigarette smoking status, past month leisure physical activity and demographic characteristics including age, sex, education, race, marital status and body mass index. Results: Significant associations between e-cigarette use and COPD were found among former combustible cigarette smokers and those who reported never using combustible cigarettes. Compared with never e-cigarette users, the odds of having COPD were significantly greater for daily e-cigarette users (OR = 1.53; 95% CI: 1.11–2.03), occasional users (OR = 1.43; 95% CI: 1.13–1.80) and former users (OR = 1.46; 95% CI: 1.28–1.67). Conclusions: Findings from this study indicate a potential link between e-cigarette use and COPD. Further research to explore the potential effects of e-cigarette use on COPD is recommended.
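
For readers who want to relate a reported odds ratio to its confidence interval, the sketch below back-calculates the standard error of the log odds ratio from the interval bounds for occasional users (OR = 1.43; 95% CI: 1.13–1.80). It assumes the interval is a Wald interval, symmetric on the log scale with critical value 1.96; the abstract does not state how the intervals were computed, so this is only an approximation. The helper name `log_or_standard_error` is hypothetical.

```python
import math


def log_or_standard_error(lower: float, upper: float, z: float = 1.96) -> float:
    """Approximate SE of log(OR) from a CI assumed symmetric on the log scale.

    Under that assumption the interval is exp(log(OR) +/- z * SE), so its
    width on the log scale is 2 * z * SE.
    """
    return (math.log(upper) - math.log(lower)) / (2 * z)


# Reported values for occasional e-cigarette users, taken from the abstract.
se = log_or_standard_error(1.13, 1.80)
z_stat = math.log(1.43) / se  # approximate Wald z statistic for log(OR) = 0
print(f"SE(log OR) ~ {se:.3f}, z ~ {z_stat:.2f}")
```

A z statistic above 1.96 is consistent with the interval excluding 1, which matches the abstract's description of the association as significant.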

Outlier detection in traffic data based on the Dirichlet process mixture model

IET Intelligent Transport Systems ◽

10.1049/iet-its.2014.0063 ◽

2015 ◽

Vol 9 (7) ◽

pp. 773-781 ◽

Cited By ~ 12

Author(s):

Henry Y.T. Ngan ◽

Anthony G.O. Yeh ◽

Nelson H.C. Yung

Keyword(s):

Outlier Detection ◽

Mixture Model ◽

Dirichlet Process ◽

Traffic Data ◽

Dirichlet Process Mixture ◽

Dirichlet Process Mixture Model

Clustering disaggregated load profiles using a Dirichlet process mixture model

Energy Conversion and Management ◽

10.1016/j.enconman.2014.12.080 ◽

2015 ◽

Vol 92 ◽

pp. 507-516 ◽

Cited By ~ 27

Author(s):

Ramon Granell ◽

Colin J. Axon ◽

David C.H. Wallom

Keyword(s):

Mixture Model ◽

Dirichlet Process ◽

Dirichlet Process Mixture ◽

Dirichlet Process Mixture Model

ADAB PEMBELAJARAN AL-QURAN: STUDI KITAB AT-TIBYAN FI ADABI HAMALATIL QURAN [The Etiquette of Learning the Quran: A Study of the Book At-Tibyan fi Adabi Hamalatil Quran]

Ar-Risalah: Media Keislaman, Pendidikan dan Hukum Islam ◽

10.29062/arrisalah.v18i2.392 ◽

2020 ◽

Vol 18 (2) ◽

pp. 219

Author(s):

Ismail Ismail ◽

Abdulloh Hamid

Keyword(s):

Data Collection ◽

Research Methods ◽

Secondary Data ◽

Study Data ◽

Library Research ◽

The Face ◽

Using Data

This research examines the etiquette of reading the Quran as set out in the book At-Tibyan fi Adabi Hamalatil Quran by Imam Nawawi. The study addresses two questions: (1) what etiquette for reading the Quran does At-Tibyan prescribe, and (2) how relevant is that etiquette in contemporary times? The research method is library research. Data were collected by examining relevant sources, including books, articles, journals and theses related to the study, and were divided into primary and secondary sources. The data were then analyzed using descriptive and contextual methods. The results show that the etiquette of reading the Quran in At-Tibyan fi Adabi Hamalatil Quran includes being solemn, sincere and ethical; being in a clean and pure state; facing the Qibla; and starting with Ta'awudz. This etiquette remains relevant in the contemporary context and can offer a way to improve manners when interacting with the Quran, particularly given the characteristics of the present day.
