Modelling Migration with Poisson Regression

Author(s): Robin Flowerdew

Most statistical analysis is based on the assumption that errors are normally distributed, but many data sets consist of discrete data (the number of migrants from one place to another must be a whole number). Recent developments in statistics have often involved generalising methods so that they can be properly applied to non-normal data. For example, Nelder and Wedderburn (1972) developed the theory of generalised linear modelling, in which the dependent or response variable can follow one of a variety of probability distributions and is linked, in one of several possible ways, to a linear predictor formed from a combination of independent or explanatory variables. Several common statistical techniques are special cases of the generalised linear model, including the usual form of regression analysis, Ordinary Least Squares (OLS) regression, and binomial logit modelling. Another important special case is Poisson regression, which has a Poisson-distributed dependent variable linked logarithmically to a linear combination of independent variables. Poisson regression may be an appropriate method when the dependent variable is constrained to be a non-negative integer, usually a count of the number of events in certain categories. It assumes that each event is independent of the others, though the probability of an event may be linked to available explanatory variables. This chapter illustrates how Poisson regression can be carried out using the Stata package, and goes on to discuss various problems and issues that may arise in using the method. The number of migrants from area i to area j must be a non-negative integer and is likely to vary according to zone population, distance and economic variables. The availability of high-quality migration data through the WICID facility permits detailed analysis at levels ranging from the region down to the output area. A vast range of possible explanatory variables can also be derived from the 2001 Census data.
Model results are discussed in terms of the significant explanatory variables, the overall goodness of fit and the largest residuals. Comparisons are drawn with other analytic techniques such as OLS regression. The relationship to Wilson’s entropy-maximising methods is described, and variants on the method are explained. These include negative binomial regression and zero-censored and zero-truncated models.
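The log-linear Poisson model described above can be sketched in plain Python. This is a minimal illustration, not the chapter's Stata workflow: it fits log(mu_i) = b0 + b1 * x_i by Newton-Raphson (which, for the canonical log link, coincides with iteratively reweighted least squares), and the flow counts and log-distances are hypothetical values made up for illustration.

```python
import math

def fit_poisson_regression(x, y, iterations=25):
    """Fit log(mu_i) = b0 + b1 * x_i by Newton-Raphson on the Poisson log-likelihood."""
    b0 = math.log(sum(y) / len(y))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector U = X'(y - mu)
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information I = X'WX with W = diag(mu) for the log link
        i00 = sum(mu)
        i01 = sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Newton step: beta <- beta + I^{-1} U, using the explicit 2x2 inverse
        b0 += (i11 * u0 - i01 * u1) / det
        b1 += (i00 * u1 - i01 * u0) / det
    return b0, b1

# Hypothetical migration flows from one origin to six destinations
log_distance = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
flows = [120, 80, 45, 30, 14, 9]

b0, b1 = fit_poisson_regression(log_distance, flows)
fitted = [math.exp(b0 + b1 * d) for d in log_distance]
print(f"b0 = {b0:.3f}, distance-decay b1 = {b1:.3f}")
```

Because the model includes an intercept, the fitted flows sum exactly to the observed total at convergence; adding origin or destination dummy variables extends this to row or column totals, which is one way to see the link to constrained spatial interaction models.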

2021, Vol 5 (1), pp. 1-13
Author(s): Yopi Ariesia Ulfa, Agus M Soleh, Bagus Sartono

Based on data from the Directorate General of Disease Prevention and Control of the Ministry of Health of the Republic of Indonesia, Java Island had the highest number of new leprosy cases in Indonesia in 2017, compared with the other islands. The purpose of this study is to compare Poisson regression with negative binomial regression as models for the number of new leprosy cases, and to identify which explanatory variables have a significant effect on the number of new leprosy cases in Java. The results indicate that the negative binomial regression model can handle the overdispersion that the Poisson regression model fails to account for. Based on the negative binomial regression model, the variables that significantly affect the number of new leprosy cases are total population, the percentage of children under five who had been immunized with BCG, and the percentage of the population with sustainable access to clean water.
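Overdispersion, the problem the negative binomial model is used to overcome here, can be checked with a simple statistic. The sketch below (an illustration, not the authors' procedure) computes the Pearson chi-square divided by its degrees of freedom for an intercept-only Poisson model, which reduces to the sample variance over the mean; values well above 1 suggest overdispersion. It also shows the variance function mu + mu^2 / k, assuming the common NB2 parameterisation of the negative binomial. The district counts are hypothetical.

```python
def dispersion_index(counts):
    """Pearson chi-square / df for an intercept-only Poisson model (sample variance / mean)."""
    n = len(counts)
    mean = sum(counts) / n
    pearson = sum((c - mean) ** 2 / mean for c in counts)
    return pearson / (n - 1)

def nb2_variance(mu, k):
    """NB2 variance mu + mu**2 / k; the Poisson variance (mu) is the limit as k -> infinity."""
    return mu + mu ** 2 / k

# Hypothetical new-case counts for ten districts
cases = [0, 3, 1, 12, 40, 7, 2, 25, 0, 5]
print(round(dispersion_index(cases), 2))  # well above 1: the Poisson variance is too small
```

In a full analysis the same ratio would be computed from the fitted regression model rather than the intercept-only one.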


Author(s): Hussein Ahmad Abdulsalam, Sule Omeiza Bashiru, Alhaji Modu Isa, Yunusa Adavi Ojirobe

The Gompertz Rayleigh (GomR) distribution was introduced in an earlier study, with only a few statistical properties derived and its parameters estimated using only the most common traditional method, Maximum Likelihood Estimation (MLE). This paper aims to derive further statistical properties of the GomR distribution, to estimate its three unknown parameters with a competing method, Maximum Product of Spacings (MPS), and to evaluate goodness of fit using rainfall data sets from Nigeria, Malaysia and Argentina. The statistical properties explicitly derived for the GomR distribution include the distributions of the smallest and largest order statistics, the cumulative (integrated) hazard function, the odds function, the rth non-central moments, the moment generating function, the mean, the variance and entropy measures. The fitted data sets show the flexibility of the GomR distribution relative to the distributions it was compared with. A simulation study was used to evaluate the consistency, accuracy and unbiasedness of the GomR parameter estimates obtained by MPS. The study found that the GomR distribution did not provide a better fit for the Argentine rainfall data, but it was the best distribution for the rainfall data sets from Nigeria and Malaysia in comparison with the following distributions: Generalized Weibull Rayleigh (GWR), Exponentiated Weibull Rayleigh (EWR), Type (II) Topp Leone Generalized Inverse Rayleigh (TIITLGIR), Kumaraswamy Exponential Inverse Rayleigh (KEIR), Negative Binomial Marshall-Olkin Rayleigh (NBMOR) and Exponentiated Weibull (EW). Furthermore, the MPS estimates were consistent as the sample size increased, but not as efficient as the MLE estimates.
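The Maximum Product of Spacings idea can be sketched without the GomR distribution itself. Purely for illustration, the sketch below substitutes a one-parameter exponential distribution (an assumption, not the GomR model) and chooses the rate that maximises the mean log spacing between consecutive values of the fitted CDF at the ordered observations, with F = 0 and F = 1 added at the ends. The rainfall values are hypothetical and must be distinct, since a tied pair gives a zero spacing.

```python
import math

def mean_log_spacing(rate, sample):
    """(1/(n+1)) * sum log[F(x_(i)) - F(x_(i-1))] for an exponential CDF F.

    Spacings are computed in log form to avoid underflow; sample values must be distinct.
    """
    xs = sorted(sample)
    total = math.log(-math.expm1(-rate * xs[0]))  # F(x_(1)) - F(0)
    for a, b in zip(xs, xs[1:]):
        # F(b) - F(a) = exp(-rate*a) * (1 - exp(-rate*(b - a)))
        total += -rate * a + math.log(-math.expm1(-rate * (b - a)))
    total += -rate * xs[-1]  # log(1 - F(x_(n)))
    return total / (len(xs) + 1)

def mps_exponential_rate(sample, tol=1e-10):
    """Maximise the mean log spacing over log(rate) by golden-section search."""
    mean = sum(sample) / len(sample)
    lo, hi = math.log(1e-3 / mean), math.log(1e3 / mean)
    g = (math.sqrt(5) - 1) / 2
    f = lambda t: mean_log_spacing(math.exp(t), sample)
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc > fd:  # maximum lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - g * (hi - lo)
            fc = f(c)
        else:        # maximum lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + g * (hi - lo)
            fd = f(d)
    return math.exp((lo + hi) / 2)

# Hypothetical rainfall totals (distinct values)
rainfall = [12.1, 3.4, 25.0, 7.8, 1.2, 15.6, 9.9, 4.4, 30.2, 6.0]
rate_mps = mps_exponential_rate(rainfall)
rate_mle = len(rainfall) / sum(rainfall)  # exponential MLE, for comparison
print(rate_mps, rate_mle)
```

For a three-parameter model such as GomR, the same objective would be maximised jointly over all parameters, with that model's CDF in place of the exponential one.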


Parasitology, 1998, Vol 117 (6), pp. 597-610
Author(s): D. J. Shaw, B. T. Grenfell, A. P. Dobson

Frequency distributions from 49 published wildlife host–macroparasite systems were analysed by maximum likelihood for goodness of fit to the negative binomial distribution. In 45 of the 49 (92%) data sets, the negative binomial distribution provided a statistically satisfactory fit. In the other 4 data sets the negative binomial distribution still provided a better fit than the Poisson distribution, and only 1 of the data sets fitted the Poisson distribution. The degree of aggregation was high, with 43 of the 49 data sets having an estimated k of less than 1. From these 49 data sets, 22 subsets of host data were available (i.e. host data could be divided by host sex, age, or where or when hosts were sampled). In 11 of these 22 subsets there was significant variation in the degree of aggregation between host subsets of the same host–parasite system. A common k estimate was always larger than that obtained with all the host data considered together. These results indicate that lumping host data together can hide important variations in aggregation between hosts and can exaggerate the true degree of aggregation. Wherever possible, common k estimates should be used to estimate the degree of aggregation. In addition, significant differences in the degree of aggregation between subgroups of host data were generally associated with significant differences in both mean parasite burdens and the prevalence of infection.
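The maximum likelihood estimate of the negative binomial aggregation parameter k can be sketched as follows. With the mean fixed at its MLE (the sample mean), the score for k reduces to sum[psi(y + k) - psi(k)] + n log(k / (k + mean)); for integer counts the digamma difference is a finite sum, so no special functions are needed. The sketch solves this score equation by bisection on the log scale and assumes the sample variance exceeds the mean (otherwise the estimate runs off to infinity, the Poisson limit). The parasite burdens are hypothetical.

```python
import math

def k_score(counts, k):
    """d/dk of the NB log-likelihood, with the mean fixed at its MLE (the sample mean)."""
    n = len(counts)
    mean = sum(counts) / n
    # psi(y + k) - psi(k) equals sum_{j=0}^{y-1} 1/(k + j) for integer y >= 0
    digamma_part = sum(sum(1.0 / (k + j) for j in range(y)) for y in counts)
    return digamma_part + n * math.log(k / (k + mean))

def fit_nb_k(counts, lo=1e-4, hi=1e3):
    """ML estimate of the aggregation parameter k by bisection on the log scale."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((y - mean) ** 2 for y in counts) / n
    if var <= mean:
        return math.inf  # no overdispersion: the Poisson limit (k -> infinity)
    if not k_score(counts, lo) > 0 > k_score(counts, hi):
        raise ValueError("root of the score equation is not bracketed by [lo, hi]")
    for _ in range(200):
        mid = math.sqrt(lo * hi)  # geometric midpoint
        if k_score(counts, mid) > 0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

# Hypothetical worm burdens in 20 hosts: most lightly infected, a few heavily
burdens = [0, 0, 0, 0, 0, 1, 0, 2, 0, 0, 5, 0, 0, 13, 1, 0, 0, 0, 27, 3]
k_hat = fit_nb_k(burdens)
print(f"k = {k_hat:.3f}")  # small k (< 1) indicates strong aggregation
```

A common k across host subsets could be estimated the same way by summing the subset score functions, each with its own subset mean.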


2018, Vol 52 (4), pp. 339-345
Author(s): Alex Man Him Chau, Edward Chin Man Lo, May Chun Mei Wong, Chun Hung Chu

Oral epidemiology involves studying and investigating the distribution and determinants of dental-related diseases in a specified population group to inform decisions in the management of health problems. In oral epidemiology studies, the hypothesis is typically followed by a cogent study design and data collection. Appropriate statistical analysis is essential to demonstrate the scientific association between the independent factors and the target variable. Analysis also helps to develop and build a statistical model. Poisson regression and its extensions have gained more attention in caries epidemiology than other working models such as logistic regression. This review discusses the fundamental principles and basic knowledge of Poisson regression models. It also introduces the use of a robust variance estimator with a focus on the “robust” interpretation of the model. In addition, extensions of regression models, including the zero-inflated model, hurdle model, and negative binomial model, and their interpretation in caries studies are reviewed. Principles of model fitting, including goodness-of-fit measures, are also discussed. Clinicians and researchers should pay attention to the statistical context of the models used and interpret the models to improve the oral and general health of the communities in which they live.
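The robust (sandwich) variance estimator mentioned above can be illustrated in its simplest setting, an intercept-only Poisson model log(mu) = b0. The model-based variance of b0 is the inverse Fisher information 1 / (n * mean), while the sandwich estimator replaces the assumed Poisson variance with the observed squared residuals. This is a minimal sketch with hypothetical caries counts, not the review's own example; with covariates, the same bread-meat-bread structure applies to matrices.

```python
import math

def intercept_only_poisson_se(counts):
    """Estimate b0 in log(mu) = b0 with model-based and robust (sandwich) standard errors."""
    n = len(counts)
    mean = sum(counts) / n
    b0 = math.log(mean)                          # MLE: fitted mean = sample mean
    bread = n * mean                             # Fisher information X'WX, W = diag(mu)
    meat = sum((y - mean) ** 2 for y in counts)  # sum of squared score contributions
    model_se = math.sqrt(1.0 / bread)
    robust_se = math.sqrt(meat) / bread          # sqrt(bread^-1 * meat * bread^-1)
    return b0, model_se, robust_se

# Hypothetical decayed-tooth counts for 16 children
dmft = [0, 0, 1, 0, 3, 0, 0, 6, 2, 0, 0, 4, 0, 0, 9, 1]
b0, model_se, robust_se = intercept_only_poisson_se(dmft)
print(f"b0 = {b0:.3f}, model SE = {model_se:.3f}, robust SE = {robust_se:.3f}")
```

When the counts are overdispersed, as here, the robust standard error exceeds the model-based one, so inference that relies on the Poisson variance assumption would be too optimistic.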


2021, Vol 2123 (1), pp. 012028
Author(s): Dian Handayani, A F Artari, W Safitri, W Rahayu, V M Santi

Crime rate is the number of reported crimes divided by the total population. Several factors could contribute to the variability of crime rates among areas. This study aims to model the relationship between crime rates among regencies and cities in East Java Province (Indonesia) and some potential explanatory variables, based on a 2020 Statistics Indonesia publication. From 2017 to 2019, the crime rate in East Java Province consistently ranked in the top three, behind DKI Jakarta and North Sumatra, which motivates a closer study of crime in East Java. Our preliminary analysis indicates overdispersion in the sample data. To handle the overdispersion, we fit Generalized Poisson and Negative Binomial regression models. The ratio of deviance to degrees of freedom is smaller for the Negative Binomial model (1.38) than for the Generalized Poisson model (1.99). The results indicate that both Negative Binomial and Generalized Poisson regression fit our crime rate data reasonably well compared with standard Poisson regression. Factors that contribute significantly (α = 0.05) to the crime rate in East Java Province under both the Negative Binomial and Generalized Poisson regressions are the percentage of poor people, the number of households, the unemployment rate, and the percentage of expenditure.
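The deviance-to-degrees-of-freedom ratio used above to compare models can be sketched for the Poisson case, where D = 2 * sum[y log(y / mu) - (y - mu)] (with y log(y / mu) taken as 0 when y = 0) and the ratio is D / (n - p) for p estimated parameters. Values far above 1 signal overdispersion. This is an illustrative sketch with hypothetical regency counts and an intercept-only fit; the Negative Binomial and Generalized Poisson deviances use their own likelihoods and are not shown.

```python
import math

def poisson_deviance(y, mu):
    """D = 2 * sum[y*log(y/mu) - (y - mu)], taking y*log(y/mu) = 0 when y = 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += term - (yi - mi)
    return 2.0 * total

def deviance_df_ratio(y, mu, n_params):
    """Deviance divided by residual degrees of freedom (n - p)."""
    return poisson_deviance(y, mu) / (len(y) - n_params)

# Hypothetical reported-crime counts for eight regencies, intercept-only fit
crimes = [120, 45, 300, 80, 15, 210, 60, 95]
mean = sum(crimes) / len(crimes)
ratio = deviance_df_ratio(crimes, [mean] * len(crimes), n_params=1)
print(round(ratio, 2))
```

Because the crime rate divides counts by population, a fuller sketch would model the counts with log(population) as an offset in the linear predictor.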


2019, Vol 3 (2), pp. 184-201
Author(s): Kusni Rohani Rumahorbo, Budi Susetyo, Kusman Sadik

Health is very important to people. One way to assess a person's health is through the number of unhealthy days, which can also indicate the productivity of the community in a region. The number of unhealthy days is count data and can be modelled using Poisson regression. Problems often encountered with count data are overdispersion and excess zeros, and Poisson regression cannot be applied to data that exhibit both. Zero-Inflated Negative Binomial and Hurdle Negative Binomial models were fitted to the data under two conditions: uncensored and censored. The explanatory variables used are gender, age, marital status, education level, home ownership status and rural-urban status. Based on AIC and RMSE, the Zero-Inflated Negative Binomial model on censored data showed the best performance for estimating the number of unhealthy days.
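The two models compared above differ in where zeros come from. In the zero-inflated negative binomial (ZINB) model, a zero arises either from a separate "always zero" class with probability pi or from the NB count process; in the hurdle model, all zeros come from the binary hurdle and positive counts follow a zero-truncated NB. The sketch below writes out both probability mass functions, assuming the NB2 parameterisation with mean mu and dispersion k; the parameter values are arbitrary, and covariates (which would enter through link functions for pi and mu) and the censoring used in the study are left out.

```python
import math

def nb_pmf(y, mu, k):
    """NB2 probability mass with mean mu and dispersion k (variance mu + mu**2 / k)."""
    log_p = (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
             + k * math.log(k / (k + mu)) + y * math.log(mu / (k + mu)))
    return math.exp(log_p)

def zinb_pmf(y, pi, mu, k):
    """Zero-inflated NB: an extra 'always zero' class with probability pi."""
    count_part = (1.0 - pi) * nb_pmf(y, mu, k)
    return pi + count_part if y == 0 else count_part

def hurdle_nb_pmf(y, p_zero, mu, k):
    """Hurdle NB: zeros only from the hurdle; positives from a zero-truncated NB."""
    if y == 0:
        return p_zero
    return (1.0 - p_zero) * nb_pmf(y, mu, k) / (1.0 - nb_pmf(0, mu, k))

# Arbitrary illustrative parameters
for y in range(4):
    print(y, round(zinb_pmf(y, 0.3, 3.0, 0.5), 4), round(hurdle_nb_pmf(y, 0.6, 3.0, 0.5), 4))
```

In the zero-inflated model the zero probability always exceeds the NB zero probability, whereas the hurdle model can produce either more or fewer zeros than the count distribution alone.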


NeoBiota, 2018, Vol 38, pp. 77-96
Author(s): César Capinha, Franz Essl, Hanno Seebens, Henrique Miguel Pereira, Ingolf Kühn

Robust predictions of alien species richness are useful for assessing global biodiversity change. Nevertheless, the capacity to predict spatial patterns of alien species richness remains largely unassessed. Using 22 data sets of alien species richness from diverse taxonomic groups and covering various parts of the world, we evaluated whether different statistical models were able to provide useful predictions of absolute and relative alien species richness as a function of explanatory variables representing geographical, environmental and socio-economic factors. Five state-of-the-art count data modelling techniques were used and compared: Poisson and negative binomial generalised linear models (GLMs), multivariate adaptive regression splines (MARS), random forests (RF) and boosted regression trees (BRT). We found that predictions of absolute alien species richness had low to moderate accuracy in the region where the models were developed and consistently poor accuracy in new regions. Predictions of relative richness were better in both geographical settings, but still not good. Flexible tree-ensemble techniques (RF and BRT) were significantly better at modelling alien species richness than parametric linear models (such as GLMs), despite the latter being more commonly applied for this purpose. Importantly, the poor spatial transferability of the models also warrants caution in assuming the generality of the relationships they identify, e.g. when making projections under future scenario conditions. Ultimately, our results strongly suggest that the predictability of spatial variation in alien species richness is limited. The somewhat more robust ability to rank regions according to the number of alien species they have (i.e. relative richness) suggests that models of alien species richness may be useful for prioritising and comparing regions, but not for predicting exact species numbers.


2017, Vol 25 (3), pp. 369-381
Author(s): Sabri Ciftci, Tevfik Murat Yildirim

Why do representatives prioritize certain types of constituency service in parliamentary systems? This study argues that the choice of constituency-oriented activities is conditioned by both partisan factors and legislative role orientations. Two novel data sets combining behavioral and attitudinal measures of constituency-oriented behavior are used for empirical tests: an elite survey including detailed interviews with 204 members of the Turkish parliament, and 4000 parliamentary questions tabled by these members. The results from a series of ordered logit, ordinary least squares (OLS), and negative binomial regression models confirm that members of parliament choose different types of constituency-oriented activities based on their visibility to the party leadership and to their constituency. This choice is primarily driven by partisanship and by members' perceptions of the party leader's influence over renomination. The analysis provides important insights into the role of partisan factors as drivers of parliamentary behavior.


2018, Vol 7 (6), pp. 1
Author(s): Bayo H. Lawal

In this paper, we present generalized linear regression models (GLMs) for the class of Conway-Maxwell-Poisson (Com-Poisson) distributions. This class includes the Com-Poisson, the Com-Poisson negative binomial, the Generalized Com-Poisson and the Extended Com-Poisson distributions, all of which have been presented in the literature within the last five years. While these distributions have been applied mostly to frequency count data exhibiting over- or underdispersion, little has been published on applying this class of models to data with several covariates (the exception being the Com-Poisson itself). In this paper, we therefore present the generalized linear model formulation for these distributions and compare our results with the baseline Com-Poisson and Poisson models. Two data sets are employed in this application. We further extend our discussion to the zero-inflated versions of these distributions and apply them to a well-established data set with 64% zero observations. All the models are fitted using SAS PROC NLMIXED. In all cases, empirical means and variances are generated, which allows us to compute Wald's goodness-of-fit test statistic for all the models employed in this paper.
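The COM-Poisson distribution at the base of this class has P(Y = y) = lam^y / (y!)^nu / Z(lam, nu), where Z is the normalising series, the sum over j of lam^j / (j!)^nu. The dispersion parameter nu gives the Poisson distribution at nu = 1, overdispersion for nu < 1 and underdispersion for nu > 1. The sketch below evaluates the pmf by truncating Z at a fixed number of terms (an assumption that is adequate for moderate lam and nu, not a general-purpose implementation) and checks the mean-variance relationship numerically. It covers only the baseline COM-Poisson, not the extended variants or the regression formulation.

```python
import math

def com_poisson_pmf(lam, nu, max_y=300):
    """Return [P(Y=0), ..., P(Y=max_y)] for COM-Poisson(lam, nu), truncating Z at max_y."""
    log_terms = [y * math.log(lam) - nu * math.lgamma(y + 1) for y in range(max_y + 1)]
    m = max(log_terms)
    weights = [math.exp(t - m) for t in log_terms]  # shift by the max to avoid overflow
    z = sum(weights)
    return [w / z for w in weights]

def mean_and_variance(pmf):
    """Mean and variance of a pmf given as a list indexed by y = 0, 1, 2, ..."""
    mean = sum(y * p for y, p in enumerate(pmf))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(pmf))
    return mean, var

for nu in (0.5, 1.0, 2.0):
    mean, var = mean_and_variance(com_poisson_pmf(2.0, nu))
    print(f"nu = {nu}: mean = {mean:.3f}, variance = {var:.3f}")
```

The truncation point max_y is a hypothetical default chosen for this illustration; for large lam or small nu it would need to grow until the neglected terms are negligible.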


2014, Vol 51 (1), pp. 41-49
Author(s): A. Mishra

A new generalization of the logarithmic series distribution has been obtained as a limiting case of the zero-truncated form of Mishra’s [10] generalized negative binomial distribution (GNBD). This distribution has an advantage over Mishra’s [9] quasi logarithmic series distribution (QLSD) in that its moments have compact forms, unlike those of the QLSD, which makes estimating the parameters by the method of moments easier. The first four moments of this distribution have been obtained, and the distribution has been fitted to some well-known data sets to test its goodness of fit.
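The classical version of this limiting argument can be checked numerically: as k tends to 0, a zero-truncated negative binomial distribution with P(y) proportional to C(y + k - 1, y) theta^y (1 - theta)^k converges to Fisher's logarithmic series distribution P(x) = -theta^x / (x log(1 - theta)), x >= 1. The sketch below compares the two for a small k. It uses the standard negative binomial, not Mishra's GNBD, whose limiting case yields the new generalization described here.

```python
import math

def log_series_pmf(x, theta):
    """Fisher's logarithmic series: P(X = x) = -theta**x / (x * log(1 - theta)), x >= 1."""
    return -theta ** x / (x * math.log(1.0 - theta))

def zero_truncated_nb_pmf(y, k, theta):
    """Zero-truncated NB with P(y) proportional to C(y+k-1, y) theta**y (1-theta)**k, y >= 1."""
    log_p = (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
             + k * math.log(1.0 - theta) + y * math.log(theta))
    p_positive = -math.expm1(k * math.log(1.0 - theta))  # 1 - (1 - theta)**k, computed stably
    return math.exp(log_p) / p_positive

theta = 0.6
for y in range(1, 6):
    print(y, round(zero_truncated_nb_pmf(y, 1e-7, theta), 6), round(log_series_pmf(y, theta), 6))
```

Using expm1 for the truncation constant matters here: for tiny k, computing 1 - (1 - theta)**k directly would lose most of its significant digits.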

