Sparse Multicategory Generalized Distance Weighted Discrimination in Ultra-High Dimensions

Tong Su; Yafei Wang; Yi Liu; William G. Branton; Eugene Asahchop; Christopher Power; Bei Jiang; Linglong Kong; Niansheng Tang

doi:10.3390/e22111257

Entropy ◽

2020 ◽

Vol 22 (11) ◽

pp. 1257

Keyword(s):

Optimization Problem ◽

Finite Sample ◽

High Dimensions ◽

Lasso Penalty ◽

Distance Weighted ◽

Finite Sample Simulations ◽

Group Structures ◽

Consistency Properties ◽

Intrinsic Variable ◽

Variable Group

Distance weighted discrimination (DWD) is an appealing classification method that is capable of overcoming data piling problems in high-dimensional settings. Especially when various sparsity structures are assumed in these settings, variable selection in multicategory classification poses great challenges. In this paper, we propose a multicategory generalized DWD (MgDWD) method that maintains intrinsic variable group structures during selection using a sparse group lasso penalty. Theoretically, we derive minimizer uniqueness for the penalized MgDWD loss function and consistency properties for the proposed classifier. We further develop an efficient algorithm based on the proximal operator to solve the optimization problem. The performance of MgDWD is evaluated using finite sample simulations and miRNA data from an HIV study.
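The abstract states that the MgDWD solver is built on a proximal operator for the sparse group lasso penalty. As a hedged illustration (not the authors' algorithm), the sketch below evaluates the standard proximal map of t·(λ1‖β‖₁ + λ2 Σ_g ‖β_g‖₂) for non-overlapping groups: elementwise soft-thresholding followed by group-wise shrinkage. The function names and the unit group weights are assumptions; the paper's penalty may use group-specific weights and a multicategory coefficient layout.

```python
import numpy as np

def soft_threshold(v, thresh):
    """Elementwise soft-thresholding: the prox of thresh * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - thresh, 0.0)

def prox_sparse_group_lasso(v, groups, step, lam1, lam2):
    """Prox of step * (lam1 * ||b||_1 + lam2 * sum_g ||b_g||_2) evaluated at v.

    groups: list of index arrays that partition the coordinates of v.
    Assumption: every group has weight 1 (no sqrt(group size) scaling).
    """
    v = np.asarray(v, dtype=float)
    out = np.zeros_like(v)
    for idx in groups:
        u = soft_threshold(v[idx], step * lam1)  # lasso (within-group) part
        norm = np.linalg.norm(u)
        if norm > step * lam2:                   # group survives, shrink its norm
            out[idx] = (1.0 - step * lam2 / norm) * u
        # otherwise the whole group stays at zero
    return out
```

For v = (3, −0.5, 0.2, 0.1) with groups {0, 1} and {2, 3}, t = 1 and λ1 = λ2 = 0.5, the first group shrinks to (2, 0) and the second group is zeroed out entirely, which is the combination of within-group and group-level sparsity the penalty is meant to produce.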

ROBUST INFERENCE FOR THE MEAN IN THE PRESENCE OF SERIAL CORRELATION AND HEAVY-TAILED DISTRIBUTIONS

Econometric Theory ◽

10.1017/s026646660218501x ◽

2002 ◽

Vol 18 (5) ◽

pp. 1019-1039 ◽

Cited By ~ 14

Author(s):

Tucker McElroy ◽

Dimitris N. Politis

Keyword(s):

Serial Correlation ◽

Heavy Tails ◽

The Self ◽

Finite Sample ◽

Regular Case ◽

Sample Mean ◽

Heavy Tailed Distributions ◽

The Mean ◽

Heavy Tailed ◽

Finite Sample Simulations

The problem of statistical inference for the mean of a time series with possibly heavy tails is considered. We first show that the self-normalized sample mean has a well-defined asymptotic distribution. Subsampling theory is then used to develop asymptotically correct confidence intervals for the mean without knowledge (or explicit estimation) either of the dependence characteristics, or of the tail index. Using a symmetrization technique, we also construct a distribution estimator that combines robustness and accuracy: it is higher-order accurate in the regular case, while remaining consistent in the heavy tailed case. Some finite-sample simulations confirm the practicality of the proposed methods.
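As a rough illustration of the subsampling step, the sketch below builds an equal-tailed confidence interval for the mean from overlapping blocks, recomputing a self-normalized statistic on each block. It assumes the sample standard deviation as the self-normalizer and a user-chosen block length b; the exact normalization and block-size rule used by McElroy and Politis may differ, and the function name is hypothetical.

```python
import numpy as np

def subsampling_ci_mean(x, b, alpha=0.05):
    """Equal-tailed subsampling CI for the mean.

    Assumption: the self-normalized statistic is sqrt(n) * (mean - mu) / sd,
    and its sampling distribution is approximated by the same statistic
    computed on every overlapping block of length b (centered at the full mean).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    xbar, s = x.mean(), x.std(ddof=1)
    stats = []
    for i in range(n - b + 1):
        block = x[i:i + b]
        sb = block.std(ddof=1)
        if sb > 0:  # skip degenerate (constant) blocks
            stats.append(np.sqrt(b) * (block.mean() - xbar) / sb)
    q_lo, q_hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    # Invert P(q_lo <= sqrt(n) * (xbar - mu) / s <= q_hi) ~ 1 - alpha for mu.
    return xbar - s * q_hi / np.sqrt(n), xbar - s * q_lo / np.sqrt(n)
```

Because the same ratio is computed at both the block and full-sample level, the interval never uses an estimate of the tail index or of the dependence structure, which is the property the abstract emphasizes.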

Estimating Effects with Rare Outcomes and High Dimensional Covariates: Knowledge is Power

Epidemiologic Methods ◽

10.1515/em-2014-0020 ◽

2016 ◽

Vol 5 (1) ◽

pp. 1-18 ◽

Cited By ~ 5

Author(s):

Laura Balzer ◽

Jennifer Ahern ◽

Sandro Galea ◽

Mark van der Laan

Keyword(s):

Propensity Score ◽

Risk Difference ◽

Efficient Estimation ◽

High Dimensional ◽

Finite Sample ◽

Exposure Effect ◽

Conditional Mean ◽

Double Robust ◽

Matching Estimator ◽

Finite Sample Simulations

Many of the secondary outcomes in observational studies and randomized trials are rare. Methods for estimating causal effects and associations with rare outcomes, however, are limited, and this represents a missed opportunity for investigation. In this article, we construct a new targeted minimum loss-based estimator (TMLE) for the effect or association of an exposure on a rare outcome. We focus on the causal risk difference and statistical models incorporating bounds on the conditional mean of the outcome, given the exposure and measured confounders. By construction, the proposed estimator constrains the predicted outcomes to respect this model knowledge. Theoretically, this bounding provides stability and power to estimate the exposure effect. In finite sample simulations, the proposed estimator performed as well as, if not better than, alternative estimators, including a propensity score matching estimator, an inverse probability of treatment weighted (IPTW) estimator, augmented IPTW, and the standard TMLE algorithm. The new estimator yielded consistent estimates if either the conditional mean outcome or the propensity score was consistently estimated. As a substitution estimator, TMLE guaranteed that the point estimates were within the parameter range. We applied the estimator to investigate the association between permissive neighborhood drunkenness norms and alcohol use disorder. Our results highlight the potential for doubly robust, semiparametric efficient estimation with rare events and high-dimensional covariates.
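The abstract benchmarks the bounded TMLE against IPTW and augmented IPTW. As a point of reference only (this is not the proposed TMLE), those two comparators reduce to simple averages once propensity scores g(W) = P(A = 1 | W) and outcome predictions have been estimated elsewhere. The function names and argument layout below are assumptions.

```python
import numpy as np

def iptw_rd(y, a, g):
    """Horvitz-Thompson IPTW estimate of the risk difference E[Y(1)] - E[Y(0)].

    y: outcomes, a: binary exposure (0/1), g: estimated P(A = 1 | W).
    """
    y, a, g = (np.asarray(z, dtype=float) for z in (y, a, g))
    return np.mean(a * y / g) - np.mean((1 - a) * y / (1 - g))

def aiptw_rd(y, a, g, q1, q0):
    """Augmented IPTW risk difference.

    q1, q0: predicted outcomes E[Y | A = 1, W] and E[Y | A = 0, W].
    Consistent if either the outcome model or the propensity model is correct.
    """
    y, a, g, q1, q0 = (np.asarray(z, dtype=float) for z in (y, a, g, q1, q0))
    return np.mean(q1 - q0 + a * (y - q1) / g - (1 - a) * (y - q0) / (1 - g))
```

Unlike a substitution estimator such as TMLE, these weighted averages are not guaranteed to stay within the parameter range, which matters most when the outcome is rare and the weights are extreme.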

A simple estimator for quantile panel data models using smoothed quantile regressions

Econometrics Journal ◽

10.1093/ectj/utaa023 ◽

2020 ◽

Author(s):

Liang Chen ◽

Yulong Huo

Keyword(s):

Panel Data ◽

Bias Correction ◽

Direct Investment ◽

Data Models ◽

Asymptotic Bias ◽

Panel Data Models ◽

Finite Sample ◽

Quantile Regressions ◽

Valid Inference ◽

Finite Sample Simulations

This paper considers panel data models where the idiosyncratic errors are subject to conditional quantile restrictions. We propose a two-step estimator based on smoothed quantile regressions that is easy to implement. The asymptotic distribution of the estimator is established, and the analytical expression of its asymptotic bias is derived. Building on these results, we show how to make asymptotically valid inference on the basis of both analytical and split-panel jackknife bias corrections. Finite-sample simulations are used to support our theoretical analysis and to illustrate the importance of bias correction in quantile regressions for panel data. Finally, in an empirical application, the proposed method is used to study the growth effects of foreign direct investment.
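The split-panel jackknife mentioned in the abstract removes a leading O(1/T) bias by combining the full-panel estimate with estimates from the two halves of the time dimension. The sketch below shows the common half-panel form 2θ̂ − (θ̂₁ + θ̂₂)/2 for a generic estimator; it does not implement the authors' smoothed quantile regression step, the function name is hypothetical, and it assumes an even number of time periods.

```python
import numpy as np

def half_panel_jackknife(estimator, y):
    """Half-panel jackknife: 2 * theta_full - (theta_half1 + theta_half2) / 2.

    y: N x T panel (units in rows, time periods in columns).
    estimator: callable mapping such an array to a scalar or array estimate.
    """
    y = np.asarray(y)
    T = y.shape[1]
    if T % 2:
        raise ValueError("this sketch assumes an even number of time periods")
    full = estimator(y)
    first = estimator(y[:, : T // 2])   # periods 1 .. T/2
    second = estimator(y[:, T // 2 :])  # periods T/2 + 1 .. T
    return 2 * full - (first + second) / 2
```

With a toy estimator whose value is 5 − 1/T, the full panel gives 4.9 and each half gives 4.8 when T = 10, so the correction returns exactly 5: a bias proportional to 1/T is eliminated.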

BOOTSTRAP-ASSISTED UNIT ROOT TESTING WITH PIECEWISE LOCALLY STATIONARY ERRORS

Econometric Theory ◽

10.1017/s0266466618000038 ◽

2018 ◽

Vol 35 (1) ◽

pp. 142-166 ◽

Cited By ~ 1

Author(s):

Yeonwoo Rho ◽

Xiaofeng Shao

Keyword(s):

Unit Root ◽

Unit Root Test ◽

Linear Process ◽

Wild Bootstrap ◽

Finite Sample ◽

Unknown Parameters ◽

Null Distributions ◽

Inference Problems ◽

Unit Root Testing ◽

Finite Sample Simulations

In unit root testing, a piecewise locally stationary process is adopted to accommodate nonstationary errors that can have both smooth and abrupt changes in second- or higher-order properties. Under this framework, the limiting null distributions of the conventional unit root test statistics are derived and shown to contain a number of unknown parameters. To circumvent the difficulty of direct consistent estimation, we propose to use the dependent wild bootstrap to approximate the nonpivotal limiting null distributions and provide a rigorous theoretical justification for bootstrap consistency. The proposed method is compared through finite sample simulations with the recolored wild bootstrap procedure, which was developed for errors that follow a heteroscedastic linear process. Furthermore, a combination of autoregressive sieve recoloring with the dependent wild bootstrap is shown to perform well. The validity of the dependent wild bootstrap in a nonstationary setting is demonstrated for the first time, showing the possibility of extensions to other inference problems associated with locally stationary processes.
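The core of the dependent wild bootstrap is to multiply residuals by a sequence of dependent, mean-zero, unit-variance multipliers whose correlation decays with the time gap. The sketch below assumes Gaussian multipliers with a Bartlett-type correlation max(0, 1 − |h|/l), generated exactly as a scaled moving sum of i.i.d. normals; the kernel, the bandwidth l, and the function names are assumptions, and the step of recomputing the unit root statistic on each bootstrap series is omitted.

```python
import numpy as np

def dwb_multipliers(n, l, rng):
    """Dependent multipliers W_1..W_n with Var(W_t) = 1 and
    Cov(W_t, W_{t+h}) = max(0, 1 - |h| / l) (Bartlett-type correlation),
    built as a moving sum of l i.i.d. N(0, 1) draws scaled by 1/sqrt(l)."""
    e = rng.standard_normal(n + l - 1)
    window = np.ones(l) / np.sqrt(l)
    return np.convolve(e, window, mode="valid")  # length exactly n

def dwb_sample(residuals, l, rng):
    """One dependent-wild-bootstrap draw: centered residuals times W_t."""
    u = np.asarray(residuals, dtype=float)
    u = u - u.mean()
    return u * dwb_multipliers(u.size, l, rng)
```

Because neighboring multipliers are correlated, each bootstrap series inherits the local dependence and time-varying scale of the residuals, which is why the method can mimic nonpivotal limits without estimating the nuisance parameters directly.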

IDENTIFYING LATENT GROUPED PATTERNS IN COINTEGRATED PANELS

Econometric Theory ◽

10.1017/s0266466619000197 ◽

2019 ◽

Vol 36 (3) ◽

pp. 410-456 ◽

Cited By ~ 1

Author(s):

Wenxin Huang ◽

Sainan Jin ◽

Liangjun Su

Keyword(s):

Purchasing Power Parity ◽

Panel Cointegration ◽

Finite Sample ◽

Power Parity ◽

Oracle Properties ◽

Bretton Woods ◽

Long Run ◽

Heterogeneous Behavior ◽

Latent Group ◽

Group Structures

We consider a panel cointegration model with latent group structures that allows for heterogeneous long-run relationships across groups. We extend the classifier-Lasso (C-Lasso) method of Su, Shi, and Phillips (2016, Econometrica 84(6), 2215–2264) to nonstationary panels and allow for the presence of endogeneity in both the stationary and nonstationary regressors in the model. In addition, we allow the dimension of the stationary regressors to diverge with the sample size. We show that we can simultaneously identify the individuals’ group membership and estimate the group-specific long-run cointegrated relationships. We demonstrate the desirable property of uniform classification consistency and the oracle properties of both the C-Lasso estimators and their post-Lasso versions. The special case of dynamic penalized least squares is also studied. Simulations show superb finite sample performance in both classification and estimation. In an empirical application, we study potential heterogeneous behavior in testing the validity of the long-run purchasing power parity (PPP) hypothesis in the post–Bretton Woods period from 1975 to 2014, covering 99 countries. We identify two groups in the period 1975–1998 and three groups in the period 1999–2014. The results confirm that at least some countries favor the long-run PPP hypothesis in the post–Bretton Woods period.
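The C-Lasso idea of Su, Shi, and Phillips penalizes each individual coefficient vector by the product of its distances to K candidate group vectors, (λ/N) Σᵢ Πₖ ‖βᵢ − αₖ‖, so the penalty vanishes when βᵢ coincides with any αₖ. The sketch below computes only that penalty term, plus a nearest-group assignment rule that is a simplifying assumption made here; it omits the cointegration-specific loss, the endogeneity corrections, and the optimization itself, and the function names are hypothetical.

```python
import numpy as np

def _pairwise_dist(betas, alphas):
    """dists[i, k] = ||beta_i - alpha_k|| (Euclidean norm)."""
    betas = np.asarray(betas, dtype=float)
    alphas = np.asarray(alphas, dtype=float)
    return np.linalg.norm(betas[:, None, :] - alphas[None, :, :], axis=2)

def classo_penalty(betas, alphas, lam):
    """C-Lasso-type penalty (lam / N) * sum_i prod_k ||beta_i - alpha_k||.

    betas: N x p individual coefficients; alphas: K x p group coefficients.
    The product term is zero whenever beta_i equals some alpha_k.
    """
    dists = _pairwise_dist(betas, alphas)
    return lam / dists.shape[0] * dists.prod(axis=1).sum()

def classify(betas, alphas):
    """Assumed rule: assign unit i to the group whose alpha_k is closest."""
    return _pairwise_dist(betas, alphas).argmin(axis=1)
```

The multiplicative form is what lets a single penalty shrink each unit toward whichever group center is nearest, so classification and estimation happen in one step.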

Identification and Estimation of Graphical Models with Nonignorable Nonresponse

Journal of Mathematics ◽

10.1155/2021/7570222 ◽

2021 ◽

Vol 2021 ◽

pp. 1-8

Author(s):

Lingju Chen ◽

Shaoxin Hong ◽

Bo Tang

Keyword(s):

Sensitivity Analysis ◽

Graphical Models ◽

Nonlinear Models ◽

Real Data ◽

Finite Sample ◽

Imputation Methods ◽

Observable Variable ◽

The Mean ◽

Nonignorable Nonresponse ◽

Finite Sample Simulations

We study the identification and estimation of graphical models with nonignorable nonresponse. An observable variable correlated with nonresponse is added to identify the mean of the response for the unidentifiable model. An approach to estimating the marginal mean of the response is proposed, based on simulation imputation methods, which are introduced for a variety of models including linear, generalized linear, and monotone nonlinear models. The proposed mean estimators are √N-consistent, where N is the sample size. Finite sample simulations confirm the effectiveness of the proposed method. A sensitivity analysis for the untestable assumption of our augmented model is also conducted. A real data example is employed to illustrate the use of the proposed methodology.

Comparison of Swarm Intelligence Algorithms for High Dimensional Optimization Problem

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v11.i1.pp300-307 ◽

2018 ◽

Vol 11 (1) ◽

pp. 300-307 ◽

Cited By ~ 1

Author(s):

Samar Bashath ◽

Amelia Ritahani Ismail

Keyword(s):

Swarm Intelligence ◽

Optimization Problem ◽

Optimal Solution ◽

Cuckoo Search ◽

High Dimensional ◽

High Dimensions ◽

Solution Accuracy ◽

Dimensional Optimization ◽

Real World Problems ◽

Comprehensive Study

High-dimensional optimization is considered one of the most challenging problems that algorithms face when searching for optimal solutions to real-world problems. Such problems arise in diverse practical fields, including business and industry. Given the huge number of available algorithms, selecting one to solve a high-dimensional optimization problem is not an easy task. This paper presents a comprehensive study of two swarm-intelligence-based algorithms: (1) particle swarm optimization (PSO) and (2) cuckoo search (CS). The two algorithms are analyzed and compared on high-dimensional problems in terms of solution accuracy and runtime performance, using various classes of benchmark functions.
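To make the PSO side of the comparison concrete, the sketch below is a minimal global-best PSO with an inertia weight, tested on the sphere benchmark. The parameter values (w = 0.7, c1 = c2 = 1.5, 30 particles, position clipping to the search box) are common textbook choices assumed here, not the settings used in the paper, and cuckoo search is not shown.

```python
import numpy as np

def pso(f, dim, n_particles=30, iters=300, bounds=(-5.0, 5.0),
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize f over [lo, hi]^dim with global-best particle swarm optimization.

    Returns (best_position, best_value)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, (n_particles, dim))   # particle positions
    v = np.zeros_like(x)                          # particle velocities
    pbest = x.copy()                              # personal best positions
    pbest_val = np.apply_along_axis(f, 1, x)
    g = pbest[pbest_val.argmin()].copy()          # global best position
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # Inertia + pull toward personal best + pull toward global best.
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        vals = np.apply_along_axis(f, 1, x)
        improved = vals < pbest_val
        pbest[improved] = x[improved]
        pbest_val[improved] = vals[improved]
        g = pbest[pbest_val.argmin()].copy()
    return g, pbest_val.min()
```

For example, `pso(lambda z: float(np.sum(z ** 2)), dim=10)` drives the sphere function close to zero; increasing `dim` while holding the swarm size and iteration budget fixed is one simple way to see the accuracy and runtime trade-offs the study examines.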

HETEROSKEDASTICITY-AUTOCORRELATION ROBUST TESTING USING BANDWIDTH EQUAL TO SAMPLE SIZE

Econometric Theory ◽

10.1017/s026646660218604x ◽

2002 ◽

Vol 18 (6) ◽

pp. 1350-1366 ◽

Cited By ~ 90

Author(s):

Nicholas M. Kiefer ◽

Timothy J. Vogelsang

Keyword(s):

Sample Size ◽

Covariance Matrix ◽

Asymptotic Covariance Matrix ◽

Finite Sample ◽

New Approach ◽

Robust Testing ◽

Regression Parameters ◽

Alternative Approach ◽

Local Asymptotic Power ◽

Finite Sample Simulations

Asymptotic theory for heteroskedasticity autocorrelation consistent (HAC) covariance matrix estimators requires the truncation lag, or bandwidth, to increase more slowly than the sample size. This paper considers an alternative approach covering the case with the asymptotic covariance matrix estimated by kernel methods with truncation lag equal to sample size. Although such estimators are inconsistent, valid tests (asymptotically pivotal) for regression parameters can be constructed. The limiting distributions explicitly capture the truncation lag and choice of kernel. A local asymptotic power analysis shows that the Bartlett kernel delivers the highest power within a group of popular kernels. Finite sample simulations suggest that, regardless of the kernel chosen, the null asymptotic approximation of the new tests is often more accurate than that for conventional HAC estimators and asymptotics. Finite sample results on power show that the new approach is competitive.
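For the simplest case of testing a mean (an intercept-only regression, assumed here for brevity), the Bartlett-kernel estimator with bandwidth equal to the sample size T can be written either as a weighted sum of sample autocovariances or, by a standard algebraic identity, as 2T⁻² times the sum of squared partial sums of the demeaned data. The sketch below computes both forms and a t-type statistic; the function names are hypothetical. As the abstract notes, the statistic's null limit is nonstandard, so its critical values must come from that limiting distribution rather than from the normal table.

```python
import numpy as np

def bartlett_lrv_full_bandwidth(u):
    """Bartlett-kernel long-run variance with bandwidth M = T:
    gamma_0 + 2 * sum_{j=1}^{T-1} (1 - j/T) * gamma_j, on demeaned data."""
    u = np.asarray(u, dtype=float)
    u = u - u.mean()
    T = u.size
    omega = u @ u / T                       # gamma_0
    for j in range(1, T):
        gamma_j = u[j:] @ u[:-j] / T        # lag-j sample autocovariance
        omega += 2 * (1 - j / T) * gamma_j
    return omega

def bartlett_lrv_partial_sums(u):
    """Equivalent form 2 * T^{-2} * sum_t S_t^2, S_t = partial sums of demeaned u."""
    u = np.asarray(u, dtype=float)
    s = np.cumsum(u - u.mean())
    return 2 * (s @ s) / u.size ** 2

def kv_t_stat(y, mu0):
    """t-type statistic for H0: E[y] = mu0 using the M = T Bartlett estimator.
    Its null limit is nonstandard: do not compare it with N(0, 1) quantiles."""
    y = np.asarray(y, dtype=float)
    return np.sqrt(y.size) * (y.mean() - mu0) / np.sqrt(bartlett_lrv_partial_sums(y))
```

The partial-sum form costs O(T) instead of O(T²) and makes it visible that the estimator is nonnegative, even though it is not consistent for the long-run variance.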

From Finite Sample to Asymptotic Methods in Statistics

10.1017/cbo9780511806957 ◽

2009 ◽

Cited By ~ 10

Author(s):

Pranab K. Sen ◽

Julio M. Singer ◽

Antonio C. Pedroso de Lima

Keyword(s):

Asymptotic Methods ◽

Finite Sample

One Size Fits All?

Methodology ◽

10.1027/1614-2241/a000044 ◽

2012 ◽

Vol 8 (1) ◽

pp. 23-38 ◽

Cited By ~ 1

Author(s):

Manuel C. Voelkle ◽

Patrick E. McKnight

Keyword(s):

Repeated Measures ◽

Structural Equation ◽

Type I Error ◽

Equation Modeling ◽

Type I ◽

Finite Sample ◽

Traditional Methods ◽

Repeated Measures Anova ◽

Special Cases ◽

Latent Curve

The use of latent curve models (LCMs) has increased almost exponentially during the last decade. Oftentimes, researchers regard LCM as a “new” method to analyze change with little attention paid to the fact that the technique was originally introduced as an “alternative to standard repeated measures ANOVA and first-order auto-regressive methods” (Meredith & Tisak, 1990, p. 107). In the first part of the paper, this close relationship is reviewed, and it is demonstrated how “traditional” methods, such as the repeated measures ANOVA, and MANOVA, can be formulated as LCMs. Given that latent curve modeling is essentially a large-sample technique, compared to “traditional” finite-sample approaches, the second part of the paper addresses the question to what degree the more flexible LCMs can actually replace some of the older tests by means of a Monte-Carlo simulation. In addition, a structural equation modeling alternative to Mauchly’s (1940) test of sphericity is explored. Although “traditional” methods may be expressed as special cases of more general LCMs, we found the equivalence holds only asymptotically. For practical purposes, however, no approach always outperformed the other alternatives in terms of power and type I error, so the best method to be used depends on the situation. We provide detailed recommendations of when to use which method.
