Sparse Multicategory Generalized Distance Weighted Discrimination in Ultra-High Dimensions

Entropy ◽  
2020 ◽  
Vol 22 (11) ◽  
pp. 1257
Author(s):  
Tong Su ◽  
Yafei Wang ◽  
Yi Liu ◽  
William G. Branton ◽  
Eugene Asahchop ◽  
...  

Distance weighted discrimination (DWD) is an appealing classification method that is capable of overcoming data piling problems in high-dimensional settings. Especially when various sparsity structures are assumed in these settings, variable selection in multicategory classification poses great challenges. In this paper, we propose a multicategory generalized DWD (MgDWD) method that maintains intrinsic variable group structures during selection using a sparse group lasso penalty. Theoretically, we derive minimizer uniqueness for the penalized MgDWD loss function and consistency properties for the proposed classifier. We further develop an efficient algorithm based on the proximal operator to solve the optimization problem. The performance of MgDWD is evaluated using finite sample simulations and miRNA data from an HIV study.
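The abstract does not spell out the MgDWD algorithm, but its key computational ingredient, the proximal operator of a sparse group lasso penalty, is standard. Below is a minimal Python sketch of that operator using the usual two-step decomposition (elementwise soft-thresholding followed by groupwise shrinkage); function names are illustrative and this is not necessarily the paper's exact implementation.

```python
import math

def soft_threshold(x, t):
    """Elementwise soft-thresholding: the proximal operator of t*|x|."""
    return math.copysign(max(abs(x) - t, 0.0), x)

def prox_sparse_group_lasso(beta, groups, lam1, lam2, step=1.0):
    """Proximal operator of step*(lam1*||b||_1 + lam2*sum_g ||b_g||_2).

    beta:   list of coefficients
    groups: list of index lists, one per variable group
    """
    # Step 1: elementwise (lasso) soft-thresholding.
    z = [soft_threshold(b, step * lam1) for b in beta]
    # Step 2: groupwise shrinkage toward zero (group lasso part),
    # which can zero out an entire variable group at once.
    out = list(z)
    for g in groups:
        norm = math.sqrt(sum(z[j] ** 2 for j in g))
        scale = max(1.0 - step * lam2 / norm, 0.0) if norm > 0 else 0.0
        for j in g:
            out[j] = scale * z[j]
    return out
```

A proximal-gradient loop would alternate a gradient step on the (smooth) MgDWD loss with this operator; the group structure ensures whole groups of variables are selected or dropped together.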

2002 ◽  
Vol 18 (5) ◽  
pp. 1019-1039 ◽  
Author(s):  
Tucker McElroy ◽  
Dimitris N. Politis

The problem of statistical inference for the mean of a time series with possibly heavy tails is considered. We first show that the self-normalized sample mean has a well-defined asymptotic distribution. Subsampling theory is then used to develop asymptotically correct confidence intervals for the mean without knowledge (or explicit estimation) either of the dependence characteristics, or of the tail index. Using a symmetrization technique, we also construct a distribution estimator that combines robustness and accuracy: it is higher-order accurate in the regular case, while remaining consistent in the heavy tailed case. Some finite-sample simulations confirm the practicality of the proposed methods.
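The two devices the abstract combines, self-normalization and subsampling, can be sketched roughly as follows. This is a simplified illustration (overlapping blocks, logistic of quantile conventions, and the block size are illustrative choices), not the paper's exact procedure.

```python
import math

def self_normalized_stat(x, mu):
    """T = n*(xbar - mu)/sqrt(sum (x_i - xbar)^2): the scaling cancels,
    so T has a well-defined limit even under heavy tails."""
    n = len(x)
    xbar = sum(x) / n
    denom = math.sqrt(sum((xi - xbar) ** 2 for xi in x))
    return n * (xbar - mu) / denom

def subsampling_ci(x, b, alpha=0.05):
    """Subsampling CI for the mean based on the self-normalized statistic.

    b: subsample (block) size; overlapping blocks respect serial dependence.
    """
    n = len(x)
    xbar = sum(x) / n
    # Recompute the statistic on each overlapping block, centred at the
    # full-sample mean, and take empirical quantiles.
    stats = sorted(self_normalized_stat(x[i:i + b], xbar) for i in range(n - b + 1))
    lo_q = stats[int(math.floor(alpha / 2 * len(stats)))]
    hi_q = stats[int(math.ceil((1 - alpha / 2) * len(stats))) - 1]
    denom = math.sqrt(sum((xi - xbar) ** 2 for xi in x))
    # Invert T = n*(xbar - mu)/denom  =>  mu = xbar - T*denom/n.
    return (xbar - hi_q * denom / n, xbar - lo_q * denom / n)
```

No tail index or dependence parameters are estimated anywhere, which is the point of the construction.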


2016 ◽  
Vol 5 (1) ◽  
pp. 1-18 ◽  
Author(s):  
Laura Balzer ◽  
Jennifer Ahern ◽  
Sandro Galea ◽  
Mark van der Laan

Many of the secondary outcomes in observational studies and randomized trials are rare. Methods for estimating causal effects and associations with rare outcomes, however, are limited, and this represents a missed opportunity for investigation. In this article, we construct a new targeted minimum loss-based estimator (TMLE) for the effect or association of an exposure on a rare outcome. We focus on the causal risk difference and statistical models incorporating bounds on the conditional mean of the outcome, given the exposure and measured confounders. By construction, the proposed estimator constrains the predicted outcomes to respect this model knowledge. Theoretically, this bounding provides stability and power to estimate the exposure effect. In finite sample simulations, the proposed estimator performed as well, if not better, than alternative estimators, including a propensity score matching estimator, inverse probability of treatment weighted (IPTW) estimator, augmented-IPTW and the standard TMLE algorithm. The new estimator yielded consistent estimates if either the conditional mean outcome or the propensity score was consistently estimated. As a substitution estimator, TMLE guaranteed the point estimates were within the parameter range. We applied the estimator to investigate the association between permissive neighborhood drunkenness norms and alcohol use disorder. Our results highlight the potential for double robust, semiparametric efficient estimation with rare events and high dimensional covariates.
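The substitution property the abstract emphasizes, plug-in estimates that cannot leave the parameter range, can be illustrated with a toy g-computation-style estimator that truncates the outcome regression to the model bound. This is a deliberate simplification (no TMLE targeting step is shown), and `qbar` and `upper` are hypothetical inputs.

```python
def substitution_risk_difference(data, qbar, upper):
    """Plug-in (substitution) estimate of the risk difference
    E[Q(1,W)] - E[Q(0,W)], with the outcome regression qbar truncated to
    the model bound [0, upper] on P(Y=1 | A, W) for a rare outcome."""
    def q_bounded(a, w):
        # Enforce the model knowledge: predicted means stay in [0, upper].
        return min(max(qbar(a, w), 0.0), upper)
    # Average the bounded predicted contrast over the empirical
    # distribution of the covariates W.
    return sum(q_bounded(1, w) - q_bounded(0, w) for (w, a, y) in data) / len(data)
```

Because every prediction is bounded before averaging, the resulting estimate is automatically within the range the model allows, unlike, say, an IPTW estimator, which can stray outside it in small samples.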


2020 ◽  
Author(s):  
Liang Chen ◽  
Yulong Huo

Summary: This paper considers panel data models where the idiosyncratic errors are subject to conditional quantile restrictions. We propose a two-step estimator based on smoothed quantile regressions that is easy to implement. The asymptotic distribution of the estimator is established, and the analytical expression of its asymptotic bias is derived. Building on these results, we show how to make asymptotically valid inference on the basis of both analytical and split-panel jackknife bias corrections. Finite-sample simulations are used to support our theoretical analysis and to illustrate the importance of bias correction in quantile regressions for panel data. Finally, in an empirical application, the proposed method is used to study the growth effects of foreign direct investment.
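The smoothing that makes such estimators tractable replaces the kinked quantile check function with a differentiable surrogate. A common construction, sketched below, substitutes a logistic CDF for the indicator in the check function's subgradient; this follows the generic convolution-smoothing idea and is not necessarily the exact smoother used in the paper.

```python
import math

def smoothed_check_loss(u, tau, h):
    """Smoothed quantile check function rho_tau(u) = u*(tau - 1{u<0}),
    with the indicator replaced by a logistic CDF of bandwidth h, so the
    loss is differentiable everywhere (including at the kink u = 0)."""
    z = u / h
    if z > 35:  # softplus(z) ~ z for large z; avoids overflow in exp
        return u * tau
    # u*(tau - 1) + h*log(1 + exp(u/h)):
    # -> tau*u       as u -> +inf
    # -> (tau-1)*u   as u -> -inf
    return u * (tau - 1.0) + h * math.log1p(math.exp(z))
```

As the bandwidth h shrinks to zero, the surrogate converges to the ordinary check loss, while for fixed h > 0 gradient-based optimization and standard asymptotic expansions (the source of the paper's analytical bias expression) become available.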


2018 ◽  
Vol 35 (1) ◽  
pp. 142-166 ◽  
Author(s):  
Yeonwoo Rho ◽  
Xiaofeng Shao

In unit root testing, a piecewise locally stationary process is adopted to accommodate nonstationary errors that can have both smooth and abrupt changes in second- or higher-order properties. Under this framework, the limiting null distributions of the conventional unit root test statistics are derived and shown to contain a number of unknown parameters. To circumvent the difficulty of direct consistent estimation, we propose to use the dependent wild bootstrap to approximate the nonpivotal limiting null distributions and provide a rigorous theoretical justification for bootstrap consistency. The proposed method is compared through finite sample simulations with the recolored wild bootstrap procedure, which was developed for errors that follow a heteroscedastic linear process. Furthermore, a combination of autoregressive sieve recoloring with the dependent wild bootstrap is shown to perform well. The validity of the dependent wild bootstrap in a nonstationary setting is demonstrated for the first time, showing the possibility of extensions to other inference problems associated with locally stationary processes.
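The dependent wild bootstrap's core device is a set of multiplier variables that are mean-zero with unit variance, but dependent over a window of length l, so the bootstrap sample inherits local dependence from the data. One standard construction uses a normalized moving average of i.i.d. normals, sketched below; the paper's exact multiplier process may differ.

```python
import random

def dwb_multipliers(n, l, rng):
    """Dependent wild bootstrap multipliers: a moving average of i.i.d.
    N(0,1) draws, normalized so Var(w_i) = 1, with Bartlett-type
    autocovariance Cov(w_i, w_j) = (l - |i-j|)/l for |i-j| < l and 0 beyond."""
    e = [rng.gauss(0.0, 1.0) for _ in range(n + l - 1)]
    return [sum(e[i:i + l]) / l ** 0.5 for i in range(n)]

def dwb_mean_distribution(resid, l, B, seed=0):
    """Bootstrap distribution of the sample mean of residuals under the DWB:
    each draw multiplies the residuals by a fresh dependent multiplier path."""
    rng = random.Random(seed)
    n = len(resid)
    draws = []
    for _ in range(B):
        w = dwb_multipliers(n, l, rng)
        draws.append(sum(r * wi for r, wi in zip(resid, w)) / n)
    return draws
```

The window length l plays the role of a bandwidth: it must grow with the sample size for the bootstrap to capture the (locally stationary) dependence in the errors.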


2019 ◽  
Vol 36 (3) ◽  
pp. 410-456 ◽  
Author(s):  
Wenxin Huang ◽  
Sainan Jin ◽  
Liangjun Su

We consider a panel cointegration model with latent group structures that allows for heterogeneous long-run relationships across groups. We extend Su, Shi, and Phillips (2016, Econometrica 84(6), 2215–2264) classifier-Lasso (C-Lasso) method to the nonstationary panels and allow for the presence of endogeneity in both the stationary and nonstationary regressors in the model. In addition, we allow the dimension of the stationary regressors to diverge with the sample size. We show that we can identify the individuals’ group membership and estimate the group-specific long-run cointegrated relationships simultaneously. We demonstrate the desirable property of uniform classification consistency and the oracle properties of both the C-Lasso estimators and their post-Lasso versions. The special case of dynamic penalized least squares is also studied. Simulations show superb finite sample performance in both classification and estimation. In an empirical application, we study the potential heterogeneous behavior in testing the validity of long-run purchasing power parity (PPP) hypothesis in the post–Bretton Woods period from 1975–2014 covering 99 countries. We identify two groups in the period 1975–1998 and three groups in the period 1999–2014. The results confirm that at least some countries favor the long-run PPP hypothesis in the post–Bretton Woods period.


2021 ◽  
Vol 2021 ◽  
pp. 1-8
Author(s):  
Lingju Chen ◽  
Shaoxin Hong ◽  
Bo Tang

We study the identification and estimation of graphical models with nonignorable nonresponse. An observable variable correlated to nonresponse is added to identify the mean of response for the unidentifiable model. An approach to estimating the marginal mean of response is proposed, based on simulation imputation methods which are introduced for a variety of models including linear, generalized linear, and monotone nonlinear models. The proposed mean estimators are √N-consistent, where N is the sample size. Finite sample simulations confirm the effectiveness of the proposed method. Sensitivity analysis for the untestable assumption on our augmented model is also conducted. A real data example is employed to illustrate the use of the proposed methodology.


Author(s):  
Samar Bashath ◽  
Amelia Ritahani Ismail

High-dimensional optimization is considered one of the greatest challenges facing algorithms that search for optimal solutions to real-world problems. Such problems appear in diverse practical fields, including business and industry. Given the huge number of available algorithms, selecting one among them to solve a high-dimensional optimization problem is not an easy task. This paper presents a comprehensive study of two swarm-intelligence-based algorithms: (1) particle swarm optimization (PSO) and (2) cuckoo search (CS). The two algorithms are analyzed and compared on high-dimensional problems with respect to solution accuracy and runtime performance, using various classes of benchmark functions.
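For reference, a minimal PSO implementation on a generic objective might look like the sketch below. The parameter values (inertia weight, acceleration coefficients, swarm size) are conventional textbook defaults, not the settings used in the study.

```python
import random

def pso_minimize(f, dim, n_particles=30, iters=200, bounds=(-5.0, 5.0), seed=0):
    """Minimal particle swarm optimization: each particle keeps its personal
    best and is pulled toward both it and the swarm's global best."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]            # personal best positions
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # global best
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia and cognitive/social coefficients
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

On a smooth unimodal benchmark such as the sphere function, this converges quickly; the comparisons in the paper concern how accuracy and runtime degrade as the dimension grows.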


2002 ◽  
Vol 18 (6) ◽  
pp. 1350-1366 ◽  
Author(s):  
Nicholas M. Kiefer ◽  
Timothy J. Vogelsang

Asymptotic theory for heteroskedasticity-autocorrelation consistent (HAC) covariance matrix estimators requires the truncation lag, or bandwidth, to increase more slowly than the sample size. This paper considers an alternative approach covering the case with the asymptotic covariance matrix estimated by kernel methods with truncation lag equal to sample size. Although such estimators are inconsistent, valid tests (asymptotically pivotal) for regression parameters can be constructed. The limiting distributions explicitly capture the truncation lag and choice of kernel. A local asymptotic power analysis shows that the Bartlett kernel delivers the highest power within a group of popular kernels. Finite sample simulations suggest that, regardless of the kernel chosen, the null asymptotic approximation of the new tests is often more accurate than that for conventional HAC estimators and asymptotics. Finite sample results on power show that the new approach is competitive.
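The fixed-bandwidth construction, a Bartlett-kernel long-run variance estimator with truncation lag set equal to the sample size, can be sketched for the scalar case (the mean of a series) as follows; the regression version replaces the demeaned series with score contributions.

```python
def bartlett_hac_variance(u, M=None):
    """Bartlett-kernel HAC estimate of the long-run variance of the mean of u.
    Setting the truncation lag M equal to the sample size (the default here)
    gives the inconsistent but asymptotically pivotal construction the
    abstract describes; the usual consistent estimator takes M << n."""
    n = len(u)
    if M is None:
        M = n  # bandwidth = sample size
    ubar = sum(u) / n
    v = [ui - ubar for ui in u]
    # Sample autocovariance at lag k of the demeaned series.
    gamma = lambda k: sum(v[t] * v[t + k] for t in range(n - k)) / n
    s = gamma(0)
    for k in range(1, min(M, n)):
        # Bartlett weights (1 - k/M) taper the higher-order lags.
        s += 2.0 * (1.0 - k / M) * gamma(k)
    return s
```

With M = n every lag enters with a nonzero weight, so the estimate never settles down as n grows; the test statistics remain usable because their limiting distributions absorb the kernel and the bandwidth choice.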


Author(s):  
Pranab K. Sen ◽  
Julio M. Singer ◽  
Antonio C. Pedroso de Lima

Methodology ◽  
2012 ◽  
Vol 8 (1) ◽  
pp. 23-38 ◽  
Author(s):  
Manuel C. Voelkle ◽  
Patrick E. McKnight

The use of latent curve models (LCMs) has increased almost exponentially during the last decade. Oftentimes, researchers regard LCM as a “new” method to analyze change with little attention paid to the fact that the technique was originally introduced as an “alternative to standard repeated measures ANOVA and first-order auto-regressive methods” (Meredith & Tisak, 1990, p. 107). In the first part of the paper, this close relationship is reviewed, and it is demonstrated how “traditional” methods, such as the repeated measures ANOVA, and MANOVA, can be formulated as LCMs. Given that latent curve modeling is essentially a large-sample technique, compared to “traditional” finite-sample approaches, the second part of the paper addresses the question to what degree the more flexible LCMs can actually replace some of the older tests by means of a Monte-Carlo simulation. In addition, a structural equation modeling alternative to Mauchly’s (1940) test of sphericity is explored. Although “traditional” methods may be expressed as special cases of more general LCMs, we found the equivalence holds only asymptotically. For practical purposes, however, no approach always outperformed the other alternatives in terms of power and type I error, so the best method to be used depends on the situation. We provide detailed recommendations of when to use which method.

