A Poisson XLindley Distribution with Applications

Author(s):  
Fatma Zohra Seghier
Halim Zeghdoudi

Abstract: In this paper, a Poisson XLindley distribution (PXLD) is obtained by compounding the Poisson distribution (PD) with the continuous XLindley distribution. A general expression for its rth factorial moment about the origin is derived, from which its raw and central moments are obtained. Expressions for its coefficient of variation, skewness, kurtosis and index of dispersion are also given. In particular, the method of maximum likelihood and the method of moments for the estimation of its parameters are discussed. Finally, real-life data sets on Nipah virus infection, Hemocytometer yeast cell counts and epileptic seizure counts are analyzed to investigate the suitability of the proposed distribution.
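
As a quick illustration of the compounding step, here is a minimal Python sketch; it assumes the one-parameter XLindley density f(lam; theta) = theta^2 (2 + theta + lam) exp(-theta*lam) / (1 + theta)^2 and recovers the PXLD pmf by numerical integration rather than by the paper's closed-form expression.

```python
# A minimal sketch (not the paper's closed form): obtain the PXLD pmf by
# numerically mixing a Poisson pmf over an XLindley-distributed rate.
# The XLindley density used here is an assumption:
#   f(lam; theta) = theta**2 * (2 + theta + lam) * exp(-theta*lam) / (1 + theta)**2
import numpy as np
from scipy.integrate import quad
from scipy.stats import poisson

def xlindley_pdf(lam, theta):
    return theta**2 * (2 + theta + lam) * np.exp(-theta * lam) / (1 + theta) ** 2

def pxld_pmf(k, theta):
    # P(X = k) = integral_0^inf Poisson(k | lam) * f_XLindley(lam; theta) dlam
    value, _ = quad(lambda lam: poisson.pmf(k, lam) * xlindley_pdf(lam, theta),
                    0, np.inf)
    return value

# Sanity check: the pmf should sum to roughly 1 over a generous support.
print(sum(pxld_pmf(k, theta=1.5) for k in range(60)))
```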

2013
Vol 3 (4)
pp. 1-14
Author(s):  
S. Sampath
B. Ramya

Cluster analysis is a branch of data mining that plays a vital role in bringing out hidden information in databases. Clustering algorithms help medical researchers identify natural subgroups in a data set. Different types of clustering algorithms are available in the literature, the most popular among them being k-means clustering. Although k-means clustering is widely used, its application requires prior knowledge of the number of clusters present in the given data set, and several solutions are available in the literature to overcome this limitation. The k-means method creates a disjoint and exhaustive partition of the data set; in some situations, however, one can come across objects that belong to more than one cluster. In this paper, a clustering algorithm is proposed that is capable of producing rough clusters automatically, without requiring the user to supply the number of clusters as input. The efficiency of the algorithm in detecting the number of clusters present in a data set has been studied with the help of some real-life data sets. Further, a nonparametric statistical analysis of the results of the experimental study has been carried out, using a rough version of the Davies-Bouldin index, to assess the efficiency of the proposed algorithm in automatically detecting the number of clusters.
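
The paper's rough-clustering algorithm is not reproduced here, but the underlying model-selection idea can be sketched generically: run k-means over a range of candidate cluster counts and keep the value that minimizes the (crisp) Davies-Bouldin index.

```python
# A generic sketch of the model-selection idea (not the paper's rough-clustering
# algorithm): run k-means for candidate k and keep the k minimizing the
# Davies-Bouldin index, where lower is better.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

best_k, best_db = None, np.inf
for k in range(2, 11):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    db = davies_bouldin_score(X, labels)
    if db < best_db:
        best_k, best_db = k, db

print(f"estimated number of clusters: {best_k} (DB index = {best_db:.3f})")
```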


2017
Vol 51 (1)
pp. 41-60
Author(s):  
C. Satheesh Kumar
S. H. S. Dharmaja

In this paper, we consider a class of distributions with bathtub-shaped hazard functions, obtained by modifying the Kies distribution, and investigate some of its important properties by deriving expressions for its percentile function, raw moments, stress-strength reliability measure, etc. The parameters of the distribution are estimated by the method of maximum likelihood, and some of its reliability applications are discussed with the help of certain real-life data sets. In addition, the asymptotic behavior of the maximum likelihood estimators of the parameters is examined using simulated data sets.
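
As a hedged illustration of the estimation step, the sketch below fits one common parameterization of the unmodified Kies distribution on (0, 1) by maximum likelihood; the paper's modified form is not reproduced.

```python
# A hedged maximum-likelihood sketch under one common parameterization of the
# (unmodified) Kies distribution on (0, 1),
#   F(x) = 1 - exp(-lam * (x / (1 - x))**beta),
# with density f(x) = lam*beta*t**(beta-1)*exp(-lam*t**beta)/(1-x)**2, t = x/(1-x).
# The paper's modified form is not reproduced here.
import numpy as np
from scipy.optimize import minimize

def neg_log_lik(params, x):
    lam, beta = params
    if lam <= 0 or beta <= 0:
        return np.inf
    t = x / (1 - x)
    logf = (np.log(lam) + np.log(beta) + (beta - 1) * np.log(t)
            - lam * t**beta - 2 * np.log(1 - x))
    return -np.sum(logf)

# Simulate by inverse transform, then recover the parameters.
rng = np.random.default_rng(0)
u = rng.uniform(size=500)
t = (-np.log(1 - u) / 2.0) ** (1 / 1.5)        # true lam = 2.0, beta = 1.5
x = t / (1 + t)
res = minimize(neg_log_lik, x0=[1.0, 1.0], args=(x,), method="Nelder-Mead")
print(res.x)                                    # should be near (2.0, 1.5)
```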


Author(s):  
Wahid A. M. Shehata
Haitham Yousof
Mohamed Aboraya

This paper presents a novel two-parameter G family of distributions. Relevant statistical properties such as the ordinary moments, incomplete moments and moment generating function are derived. Using common copulas, some new bivariate G families are derived. Special attention is devoted to the standard exponential baseline model. The density of the new exponential extension can be asymmetric and right-skewed with no peak, asymmetric and right-skewed with one peak, symmetric, or asymmetric and left-skewed with one peak. The hazard rate of the new exponential distribution can be increasing, U-shaped, decreasing or J-shaped. The usefulness and flexibility of the new family are illustrated by means of two applications to real data sets. The new family is compared with many common G families in modeling relief-times and survival-times data sets.
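
The paper's specific generator is not given here, but the copula route to the bivariate families can be sketched with a standard example: joining two exponential marginals through the Farlie-Gumbel-Morgenstern (FGM) copula, one of the "common copulas" such a construction could use.

```python
# A minimal sketch of the copula route to bivariate families (the paper's exact
# construction is not reproduced): join two exponential marginals through the
# Farlie-Gumbel-Morgenstern (FGM) copula C(u, v) = u*v*(1 + lam*(1-u)*(1-v)),
# valid for |lam| <= 1.
import numpy as np

def exp_cdf(x, rate=1.0):
    return 1.0 - np.exp(-rate * x)

def fgm_copula(u, v, lam=0.5):
    return u * v * (1.0 + lam * (1.0 - u) * (1.0 - v))

def bivariate_cdf(x, y, rate_x=1.0, rate_y=2.0, lam=0.5):
    # F(x, y) = C(G1(x), G2(y)) is a valid joint cdf with marginals G1, G2
    return fgm_copula(exp_cdf(x, rate_x), exp_cdf(y, rate_y), lam)

print(bivariate_cdf(1.0, 0.5))
```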


2018
Author(s):  
Diem-Trang T. Tran
Aditya Bhaskara
Matthew Might
Balagurunathan Kuberan

Abstract: The use of RNA sequencing has garnered much attention in recent years for characterizing and understanding various biological systems. However, it remains a major challenge to gain insights from a large number of RNA-seq experiments collectively, due to the normalization problem. Current normalization methods are based on assumptions that fail to hold when RNA-seq profiles become more abundant and heterogeneous. We present a normalization procedure that does not rely on these assumptions, or on prior knowledge about the reference transcripts in those conditions. The algorithm is based on a graph constructed from intrinsic correlations among RNA-seq transcripts and seeks to identify a set of densely connected vertices as references. Application of this algorithm to our benchmark data showed that it can recover the reference transcripts with high precision, thus resulting in high-quality normalization. As demonstrated on a real data set, the algorithm gives good results and is efficient enough to be applicable to real-life data.

2012 ACM Subject Classification: Applied computing → Computational transcriptomics; Applied computing → Bioinformatics
Digital Object Identifier: 10.4230/LIPIcs.WABI.2018.xxx
Funding: This material was based on research supported by the National Heart, Lung, and Blood Institute (NHLBI)-NIH sponsored Programs of Excellence in Glycosciences [grant number HL107152 to B.K.], and partially by NSF [CAREER grant 1350344 to M.M.]. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon.
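
The authors' exact procedure is not reproduced below; the sketch only illustrates the general idea under assumed choices (a correlation threshold, the maximal k-core as the "densely connected" reference set, and median scaling).

```python
# An illustrative sketch only (not the authors' exact algorithm): connect
# transcripts whose expression profiles correlate strongly across samples,
# take a densely connected subgraph (here the maximal k-core) as the reference
# set, and rescale each sample by the median of those references. The threshold
# and the k-core choice are assumptions.
import numpy as np
import networkx as nx

def normalize(counts, corr_threshold=0.9):
    # counts: (n_transcripts, n_samples) matrix of expression values
    corr = np.corrcoef(np.log1p(counts))           # transcript-transcript correlations
    g = nx.Graph()
    g.add_nodes_from(range(counts.shape[0]))
    rows, cols = np.where(np.triu(corr, k=1) > corr_threshold)
    g.add_edges_from(zip(rows, cols))
    refs = list(nx.k_core(g).nodes)                # densest core as reference set
    scale = np.median(counts[refs, :], axis=0)     # per-sample scaling factors
    return counts / scale, refs

rng = np.random.default_rng(0)
demo = rng.poisson(60, size=(100, 6)).astype(float)
normed, refs = normalize(demo)
print(len(refs), "reference transcripts selected")
```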


Author(s):  
Brijesh P. Singh
Sandeep Singh
Utpal Dhar Das

Migration is a term that encompasses a permanent or temporary change of residence between specified geographical or political areas. In recent years it has not only contributed substantially to changes in the size and composition of the population, but has also had a significant impact on the socio-economic characteristics of the origin and destination populations. In the present paper, an attempt has been made to examine the distribution of the number of rural out-migrants from a household through composite probability models based on certain assumptions. The Poisson distribution compounded with the exponential distribution, together with its composite and inflated forms, has been examined for some real data sets on rural out-migration. The parameters of the proposed models have been estimated by the method of moments. The distributions explain the phenomenon of rural out-migration quite satisfactorily. The distribution of the average number of adult migrants has also been examined for all the data sets.
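
The basic compounding step has a well-known closed form: mixing a Poisson rate over an exponential distribution yields a geometric pmf, so the method-of-moments estimator follows directly from the sample mean. The sketch below covers this base model only; the paper's composite and inflated variants are not reproduced.

```python
# The compounding step in its simplest form: a Poisson rate mixed over an
# Exponential(theta) distribution integrates to a geometric pmf,
#   P(X = k) = (theta / (1 + theta)) * (1 + theta)**(-k),  k = 0, 1, 2, ...
# with mean 1/theta, so the method-of-moments estimate is theta_hat = 1/xbar.
# The paper's composite and inflated variants are not reproduced here.
import numpy as np

def poisson_exp_pmf(k, theta):
    return (theta / (1 + theta)) * (1 + theta) ** (-k)

def fit_moments(counts):
    return 1.0 / np.mean(counts)    # E[X] = 1/theta for this mixture

rng = np.random.default_rng(0)
rates = rng.exponential(scale=1 / 0.8, size=10_000)   # true theta = 0.8
sample = rng.poisson(rates)
print(fit_moments(sample))                            # should be near 0.8
```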


Author(s):  
Umar Kabir
Terna Godfrey Ieren

This article proposes a new distribution, referred to as the transmuted Exponential Lomax distribution (TELD), as an extension of the popular Lomax distribution in its Exponential Lomax form, using the quadratic rank transmutation map proposed and studied in earlier research. Using the transmutation map, we define the probability density function (PDF) and cumulative distribution function (CDF) of the transmuted Exponential Lomax distribution. Some properties of the new distribution are studied extensively after their derivation. The distribution's parameters are estimated by the method of maximum likelihood. The performance of the proposed distribution is compared with that of some other generalizations of the Lomax distribution using three real-life data sets. The results indicate that the TELD performs better than the power Lomax, Exponential Lomax and Lomax distributions.
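
The quadratic rank transmutation map itself has a standard general form, F(x) = (1 + λ)G(x) − λG(x)² with |λ| ≤ 1, which the sketch below applies to a stand-in exponential baseline; the Exponential Lomax baseline of the paper is not reproduced.

```python
# The quadratic rank transmutation map in its standard general form; an
# exponential cdf stands in for the baseline G, since the paper's
# Exponential Lomax baseline is not reproduced here.
import numpy as np

def transmuted_cdf(G, x, lam):
    # F(x) = (1 + lam) * G(x) - lam * G(x)**2,  |lam| <= 1
    u = G(x)
    return (1 + lam) * u - lam * u**2

def transmuted_pdf(G, g, x, lam):
    # f(x) = g(x) * (1 + lam - 2 * lam * G(x))
    return g(x) * (1 + lam - 2 * lam * G(x))

G = lambda x: 1 - np.exp(-x)    # stand-in baseline cdf
g = lambda x: np.exp(-x)        # its density
print(transmuted_cdf(G, 1.0, 0.3), transmuted_pdf(G, g, 1.0, 0.3))
```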


Author(s):  
Muhammad H. Tahir
Muhammad Adnan Hussain
Gauss Cordeiro
Mahmoud El-Morshedy
Mohammed S. Eliwa

For the bounded unit interval, we propose a new Kumaraswamy generalized (G) family of distributions from a new generator, which can serve as an alternative to the Kumaraswamy-G family proposed by Cordeiro and de Castro in 2011. The new generator can also be used to develop alternatives to G-classes such as beta-G, McDonald-G, Topp-Leone-G, Marshall-Olkin-G and Transmuted-G for the bounded unit interval. Some mathematical properties of the new family are obtained, and the maximum likelihood method is used to estimate the family parameters. We investigate the properties of one special model, called the new Kumaraswamy-Weibull (NKwW) distribution. Parameter estimation is addressed, and the maximum likelihood estimators are assessed through a simulation study. Two real-life data sets are analyzed to illustrate the importance and flexibility of the distribution. In fact, this model outperforms several generalized Weibull models, such as the Kumaraswamy-Weibull, McDonald-Weibull, beta-Weibull, exponentiated-generalized Weibull, gamma-Weibull, odd log-logistic Weibull, Marshall-Olkin Weibull, transmuted Weibull, exponentiated Weibull and Weibull distributions, when applied to these data sets. A bivariate extension of the family is also proposed, together with the estimation of its parameters, and its usefulness is illustrated empirically by means of a real-life data set.
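
For reference, the classical Kumaraswamy-G construction of Cordeiro and de Castro, to which the paper offers an alternative, is easy to state and code; the paper's new generator itself is not reproduced here.

```python
# The classical Kumaraswamy-G construction of Cordeiro and de Castro (2011),
# to which the paper proposes an alternative; the new generator itself is not
# reproduced here. With a Weibull baseline this gives the usual Kw-Weibull cdf.
import numpy as np

def kw_g_cdf(G, x, a, b):
    # F(x) = 1 - (1 - G(x)**a)**b,  a > 0, b > 0
    return 1 - (1 - G(x) ** a) ** b

def weibull_cdf(x, shape=1.5, scale=1.0):
    return 1 - np.exp(-((x / scale) ** shape))

x = np.linspace(0.1, 3.0, 5)
print(kw_g_cdf(weibull_cdf, x, a=2.0, b=0.5))
```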


Interval data mining is used to extract unknown patterns, hidden rules and associations from interval-based data. The extraction of closed intervals is important because, given the set of closed intervals and their support counts, the support count of any interval can be computed easily. In this work, an incremental algorithm for computing closed intervals together with their support counts from an interval dataset is proposed. Many methods for mining closed intervals are available, but most of them assume a static data set as input, and hence the algorithms are non-incremental. Real-life data sets, however, are dynamic by nature. An efficient incremental algorithm called CI-Tree has already been proposed for computing the closed intervals present in dynamic interval data, but it cannot compute the support values of the closed intervals. The proposed algorithm, called SCI-Tree, extracts all closed intervals together with their support values incrementally from the given interval data. Moreover, all frequent closed intervals can be computed for any user-defined minimum support with a single scan of the SCI-Tree, without revisiting the dataset. The proposed method has been tested with real-life and synthetic datasets, and the results are reported.
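
The SCI-Tree algorithm is not reproduced here, but the objects it maintains can be illustrated naively: under an assumed containment-based notion of support, a candidate interval is closed when no strictly larger candidate has the same support.

```python
# A naive, non-incremental sketch of the objects involved (the SCI-Tree
# algorithm itself is not reproduced). Support of a candidate [l, u] is taken
# here to be the number of data intervals containing it, which is an assumed
# definition; a candidate is closed if no strictly larger candidate has the
# same support.
from itertools import product

data = [(1, 5), (2, 6), (1, 6), (3, 4)]

def support(l, u):
    return sum(1 for a, b in data if a <= l and u <= b)

# Candidates are built from observed endpoints.
lefts = sorted({a for a, _ in data})
rights = sorted({b for _, b in data})
cands = [(l, u, support(l, u)) for l, u in product(lefts, rights) if l <= u]

closed = [(l, u, s) for (l, u, s) in cands
          if s > 0 and not any(s2 == s and l2 <= l and u <= u2
                               and (l2 < l or u2 > u)
                               for (l2, u2, s2) in cands)]
print(closed)
```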


2017
Vol 26 (1)
pp. 153-168
Author(s):  
Vijay Kumar
Jitender Kumar Chhabra
Dinesh Kumar

Abstract: The main problem with classical clustering techniques is that they are easily trapped in local optima. An attempt has been made to solve this problem by proposing a grey wolf algorithm (GWA)-based clustering technique, called GWA clustering (GWAC), in this paper. The search capability of the GWA is used to find the optimal cluster centers in the given feature space, with the agent representation used to encode the cluster centers. The proposed GWAC technique is tested on both artificial and real-life data sets and compared to six well-known metaheuristic-based clustering techniques. The computational results are encouraging and demonstrate that GWAC provides better values in terms of precision, recall, G-measure and intracluster distances. GWAC is further applied to a gene expression data set and its performance is compared to that of the other techniques. The experimental results reveal the efficiency of GWAC over the other techniques.
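
A compact sketch of the general GWA-based clustering idea is given below (not the authors' implementation): each agent encodes all k cluster centers, fitness is the total distance from points to their nearest center, and agents move toward the three best wolves via the standard GWO update.

```python
# A compact sketch of grey-wolf-optimizer-based clustering in the spirit of
# GWAC (not the authors' implementation): each agent encodes all k cluster
# centers, fitness is the total distance from points to their nearest center,
# and agents move toward the three best wolves via the standard GWO update.
import numpy as np

def gwo_cluster(X, k, n_agents=20, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    wolves = rng.uniform(lo, hi, size=(n_agents, k, X.shape[1]))

    def fitness(centers):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        return d.min(axis=1).sum()          # total intracluster distance

    for t in range(iters):
        scores = np.array([fitness(w) for w in wolves])
        alpha, beta, delta = wolves[np.argsort(scores)[:3]]
        a = 2 * (1 - t / iters)             # coefficient a decays from 2 to 0
        for i in range(n_agents):
            new = np.zeros_like(wolves[i])
            for leader in (alpha, beta, delta):
                r1, r2 = rng.random(leader.shape), rng.random(leader.shape)
                A, C = 2 * a * r1 - a, 2 * r2
                new += leader - A * np.abs(C * leader - wolves[i])
            wolves[i] = np.clip(new / 3, lo, hi)
    return min(wolves, key=fitness)         # best center set found

rng = np.random.default_rng(1)
pts = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(4, 0.5, (50, 2))])
print(gwo_cluster(pts, k=2))
```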


2018
Vol 12 (3)
pp. 100-122
Author(s):  
Benjamin Stark
Heiko Gewald
Heinrich Lautenbacher
Ulrich Haase
Siegmar Ruff

This article describes how information about an individual's personal health is among one's most sensitive and important intangible belongings. When health information is misused, serious irreversible damage can be caused, e.g. by making intimate details public or leaking them to employers, insurers, etc. Health information therefore needs to be treated with the highest degree of confidentiality, yet in practice this goal proves difficult to achieve. In a hospital setting, medical staff across departments often need to access patient data without directly obvious reasons, which makes it difficult to distinguish legitimate from illegitimate access. This article provides a mechanism for classifying transactions at a large university medical center into plausible and questionable data accesses, using a real-life data set of more than 60,000 transactions. The classification mechanism works with minimal data requirements and unsupervised data sets. The results were evaluated through manual cross-checks internally and by a group of external experts. As a consequence, the hospital's data protection officer is now able to focus on analyzing questionable transactions instead of checking random samples.
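
The article's concrete classification mechanism is not described in the abstract, so the sketch below is only a generic stand-in: scoring transactions by how unusual their feature combination is with an isolation forest, and flagging the outliers for manual review. All feature names are hypothetical.

```python
# A generic stand-in only (the article's concrete mechanism is not described
# in the abstract): score transactions by how unusual their feature
# combination is with an isolation forest and flag the outliers for manual
# review. All feature names are hypothetical.
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
tx = pd.DataFrame({
    "staff_dept": rng.integers(0, 12, 60_000),    # hypothetical encodings
    "patient_ward": rng.integers(0, 30, 60_000),
    "hour": rng.integers(0, 24, 60_000),
})

model = IsolationForest(contamination=0.01, random_state=0).fit(tx)
tx["questionable"] = model.predict(tx) == -1      # True -> manual review
print(tx["questionable"].sum(), "transactions flagged for review")
```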

