Scalable statistics of correlated random variables and extremes applied to deep borehole porosities

Abstract. We analyze scale-dependent statistics of correlated random hydrogeological variables and their extremes using neutron porosity data from six deep boreholes, in three diverse depositional environments, as an example. We show that key statistics of porosity increments behave and scale in manners typical of many earth and environmental (as well as other) variables. These scaling behaviors include a tendency of increments to have symmetric, non-Gaussian frequency distributions characterized by heavy tails that decay with separation distance or lag; power-law scaling of sample structure functions (statistical moments of absolute increments) in midranges of lags; linear relationships between log structure functions of successive orders at all lags, known as extended self-similarity or ESS; and nonlinear scaling of structure function power-law exponents with function order, a phenomenon commonly attributed in the literature to multifractals. Elsewhere we proposed, explored and demonstrated a new method of geostatistical inference that captures all of these phenomena within a unified theoretical framework. The framework views data as samples from random fields constituting scale mixtures of truncated (monofractal) fractional Brownian motion (tfBm) or fractional Gaussian noise (tfGn). Important questions not addressed in previous studies concern the distribution and statistical scaling of extreme incremental values. Of special interest in hydrology (and many other areas) are statistics of absolute increments exceeding given thresholds, known as peaks over threshold or POTs. In this paper we explore the statistical scaling of data and, for the first time, corresponding POTs associated with samples from scale mixtures of tfBm or tfGn. We demonstrate that the porosity data we analyze possess properties of such samples and thus follow the theory we proposed. The porosity data are of additional value in revealing a remarkable cross-over from one scaling regime to another at certain lags. The phenomena we uncover are of key importance for the analysis of fluid flow and solute as well as particulate transport in complex hydrogeologic environments.
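
The structure-function and ESS quantities named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the input is a synthetic random walk standing in for a porosity log, and the function names, the choice of lags, and the least-squares fit are all assumptions made for illustration.

```python
import math
import random

def structure_function(values, lag, q):
    """Sample structure function of order q: mean of |increment|**q at one lag."""
    incs = [abs(values[i + lag] - values[i]) for i in range(len(values) - lag)]
    return sum(d ** q for d in incs) / len(incs)

def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

# Hypothetical data: an ordinary Gaussian random walk, not real porosity values.
rng = random.Random(0)
walk = [0.0]
for _ in range(20000):
    walk.append(walk[-1] + rng.gauss(0.0, 1.0))

lags = [1, 2, 4, 8, 16, 32, 64]
log_lags = [math.log(s) for s in lags]
log_s1 = [math.log(structure_function(walk, s, 1)) for s in lags]
log_s2 = [math.log(structure_function(walk, s, 2)) for s in lags]

# Power-law exponents from log structure function versus log lag.
xi_1 = slope(log_lags, log_s1)
xi_2 = slope(log_lags, log_s2)

# ESS: log structure function of order 2 versus that of order 1.
ess_ratio = slope(log_s1, log_s2)
```

For an ordinary random walk the fitted exponents should fall near 0.5 and 1.0, and the ESS slope near 2. Data of the kind described in the abstract would be expected to depart from this linear dependence on order.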

Extreme value statistics of scalable data exemplified by neutron porosities in deep boreholes

Hydrology and Earth System Sciences Discussions ◽

10.5194/hessd-11-11637-2014 ◽

2014 ◽

Vol 11 (10) ◽

pp. 11637-11686

Author(s):

A. Guadagnini ◽

S. P. Neuman ◽

T. Nan ◽

M. Riva ◽

C. L. Winter

Keyword(s):

Power Law ◽

Heavy Tails ◽

Structure Functions ◽

Separation Distance ◽

Depositional Environments ◽

Extreme Value Statistics ◽

Frequency Distributions ◽

Scale Mixtures ◽

Log Structure ◽

Deep Boreholes

Abstract. Spatial statistics of earth and environmental (as well as many other) data tend to vary with scale. Common manifestations of scale-dependent statistics include a tendency of increments to have symmetric, non-Gaussian frequency distributions characterized by heavy tails that decay with separation distance or lag; power-law scaling of sample structure functions (statistical moments of absolute increments) in midranges of lags; linear relationships between log structure functions of successive orders at all lags, known as extended self-similarity or ESS; and nonlinear scaling of structure function power-law exponents with function order, a phenomenon commonly attributed in the literature to multifractals. Elsewhere we proposed, explored and demonstrated a new method of geostatistical inference that captures all of these phenomena within a unified theoretical framework. The framework views data as samples from random fields constituting scale-mixtures of truncated (monofractal) fractional Brownian motion (tfBm) or fractional Gaussian noise (tfGn). Important questions not addressed in previous studies concern the distribution and statistical scaling of extreme incremental values. Of special interest in hydrology (and many other areas) are statistics of absolute increments exceeding given thresholds, known as peaks over thresholds or POTs. In this paper we explore for the first time the statistical behavior of POTs associated with samples from scale-mixtures of tfBm or tfGn. We are fortunate to have at our disposal thousands of neutron porosity values from six deep boreholes, in three diverse depositional environments, which we show possess the properties of such samples and thus follow the theory we proposed. The porosity data are of additional value in revealing a remarkable transition from one scaling regime to another at certain lags. The phenomena we uncover are of fundamental importance for the analysis of fluid flow and solute as well as particulate transport in complex hydrogeologic environments.
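
The definition of POTs given in this abstract (absolute increments exceeding a given threshold) can be written out as a short Python sketch. The function name, the sample values, and the threshold are hypothetical and chosen only to make the definition concrete; the abstract does not describe the authors' actual procedure.

```python
def peaks_over_threshold(values, lag, threshold):
    """Absolute increments at a given lag that exceed the threshold."""
    incs = [abs(values[i + lag] - values[i]) for i in range(len(values) - lag)]
    return [d for d in incs if d > threshold]

# Hypothetical porosity-like readings along a borehole (not real data).
readings = [0.10, 0.12, 0.30, 0.28, 0.05, 0.07]

pots_lag1 = peaks_over_threshold(readings, lag=1, threshold=0.1)
pots_lag2 = peaks_over_threshold(readings, lag=2, threshold=0.1)

# Excesses (exceedance minus threshold) are the quantity often modeled
# in extreme value analysis; computing them here is an illustrative choice.
excesses_lag1 = [d - 0.1 for d in pots_lag1]
```

Repeating the calculation over a range of lags gives one set of exceedances per lag, which is the kind of lag-dependent sample whose statistics the abstract refers to.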

Extended power-law scaling of air permeabilities measured on a block of tuff

Hydrology and Earth System Sciences ◽

10.5194/hess-16-29-2012 ◽

2012 ◽

Vol 16 (1) ◽

pp. 29-42 ◽

Cited By ~ 25

Author(s):

M. Siena ◽

A. Guadagnini ◽

M. Riva ◽

S. P. Neuman

Keyword(s):

Power Law ◽

Structure Functions ◽

Self Similarity ◽

Scaling Exponents ◽

Multi Scale ◽

Sample Structure ◽

Heavy Tailed ◽

Non Gaussian ◽

Nonlinear Fashion ◽

Extended Power

Abstract. We use three methods to identify power-law scaling of multi-scale log air permeability data collected by Tidwell and Wilson on the faces of a laboratory-scale block of Topopah Spring tuff: method of moments (M), Extended Self-Similarity (ESS) and a generalized version thereof (G-ESS). All three methods focus on q-th-order sample structure functions of absolute increments. Most such functions exhibit power-law scaling at best over a limited midrange of experimental separation scales, or lags, which are sometimes difficult to identify unambiguously by means of M. ESS and G-ESS extend this range in a way that renders power-law scaling easier to characterize. Our analysis confirms the superiority of ESS and G-ESS over M in identifying the scaling exponents, ξ(q), of corresponding structure functions of orders q, suggesting further that ESS is more reliable than G-ESS. The exponents vary in a nonlinear fashion with q as is typical of real or apparent multifractals. Our estimates of the Hurst scaling coefficient increase with support scale, implying a reduction in roughness (anti-persistence) of the log permeability field with measurement volume. The finding by Tidwell and Wilson that log permeabilities associated with all tip sizes can be characterized by stationary variogram models, coupled with our findings that log permeability increments associated with the smallest tip size are approximately Gaussian and those associated with all tip sizes scale show nonlinear variations in ξ(q) with q, are consistent with a view of these data as a sample from a truncated version (tfBm) of self-affine fractional Brownian motion (fBm). Since in theory the scaling exponents, ξ(q), of tfBm vary linearly with q, we conclude that nonlinear scaling in our case is not an indication of multifractality but an artifact of sampling from tfBm. This allows us to explain theoretically how power-law scaling of our data, as well as of non-Gaussian heavy-tailed signals subordinated to tfBm, is extended by ESS. It further allows us to identify the functional form and estimate all parameters of the corresponding tfBm based on sample structure functions of first and second orders.

Spatial Heterogeneity of the Incidence of Powdery Mildew on Hop Cones

Plant Disease ◽

10.1094/pd-90-1433 ◽

2006 ◽

Vol 90 (11) ◽

pp. 1433-1440 ◽

Cited By ~ 17

Author(s):

David H. Gent ◽

Walter F. Mahaffee ◽

William W. Turechek

Keyword(s):

Powdery Mildew ◽

Spatial Heterogeneity ◽

Power Law ◽

Binomial Distribution ◽

Disease Incidence ◽

Disease Assessment ◽

Cluster Sampling ◽

Data Sets ◽

Ratio Test ◽

Frequency Distributions

The spatial heterogeneity of the incidence of hop cones with powdery mildew (Podosphaera macularis) was characterized from transect surveys of 41 commercial hop yards in Oregon and Washington from 2000 to 2005. The proportion of sampled cones with powdery mildew (p) was recorded for each of 221 transects, where N = 60 sampling units of n = 25 cones were assessed in each transect according to a cluster sampling strategy. Disease incidence ranged from 0 to 0.92 among all yards and dates. The binomial and beta-binomial frequency distributions were fit to the N sampling units in a transect using maximum likelihood. The estimation procedure converged for 74% of the data sets where p > 0, and a log-likelihood ratio test indicated that the beta-binomial distribution provided a better fit to the data than the binomial distribution for 46% of the data sets, indicating an aggregated pattern of disease. Similarly, the C(α) test indicated that 54% could be described by the beta-binomial distribution. The heterogeneity parameter of the beta-binomial distribution, θ, a measure of variation among sampling units, ranged from 0.01 to 0.20, with a mean of 0.037 and a median of 0.015. Estimates of the index of dispersion ranged from 0.79 to 7.78, with a mean of 1.81 and a median of 1.37, and were significantly greater than 1 for 54% of the data sets. The binary power law provided an excellent fit to the data, with slope and intercept parameters significantly greater than 1, which indicated that heterogeneity varied systematically with the incidence of infected cones. A covariance analysis indicated that the geographic location (region) of the yards and the type of hop cultivar had little effect on heterogeneity; however, the year of sampling significantly influenced the intercept and slope parameters of the binary power law. Significant spatial autocorrelation was detected in only 11% of the data sets, with estimates of first-order autocorrelation, r1, ranging from -0.30 to 0.70, with a mean of 0.06 and a median of 0.04; however, correlation was detected in only 20 and 16% of the data sets by median and ordinary runs analysis, respectively. Together, these analyses suggest that the incidence of powdery mildew on cones was slightly aggregated among plants, but patterns of aggregation larger than the sampling unit were rare (20% or less of data sets). Knowledge of the heterogeneity of diseased cones was used to construct fixed sampling curves to precisely estimate the incidence of powdery mildew on cones at varying disease intensities. Use of the sampling curves developed in this research should help to improve sampling methods for disease assessment and management decisions.
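
As a rough illustration of the index of dispersion mentioned in this abstract, the following Python sketch uses the common definition for incidence data: the observed variance of the per-unit proportions divided by the variance expected under a binomial distribution, p(1 − p)/n. The abstract does not state the exact estimator used, so this definition, the function name, and the counts are assumptions.

```python
def index_of_dispersion(diseased_counts, n):
    """Observed variance of per-unit incidence over the binomial variance.

    diseased_counts: number of diseased cones in each sampling unit
    n: cones assessed per sampling unit
    """
    N = len(diseased_counts)
    proportions = [c / n for c in diseased_counts]
    p = sum(proportions) / N
    if p <= 0.0 or p >= 1.0:
        raise ValueError("index is undefined when mean incidence is 0 or 1")
    observed_var = sum((x - p) ** 2 for x in proportions) / (N - 1)
    binomial_var = p * (1.0 - p) / n
    return observed_var / binomial_var

# Hypothetical transects with n = 25 cones per unit (not data from the study).
aggregated = [0, 0, 1, 12, 0, 15, 2, 0]   # disease concentrated in a few units
uniform = [5, 5, 5, 5, 5, 5, 5, 5]        # identical incidence in every unit

d_aggregated = index_of_dispersion(aggregated, n=25)
d_uniform = index_of_dispersion(uniform, n=25)
```

Under this definition, values above 1 point to more variation among sampling units than a binomial (random) pattern would produce, which is how the abstract reads values significantly greater than 1.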

The breakdown of the power-law frequency distributions for the hard X-ray peak count rates of solar flares

Research in Astronomy and Astrophysics ◽

10.1088/1674-4527/13/12/009 ◽

2013 ◽

Vol 13 (12) ◽

pp. 1482-1492 ◽

Cited By ~ 1

Author(s):

You-Ping Li ◽

Wei-Qun Gan ◽

Li Feng ◽

Si-Ming Liu ◽

A. Struminsky

Keyword(s):

Power Law ◽

Solar Flares ◽

Frequency Distributions ◽

X Ray ◽

Peak Count

The Observational Error of Automated Wind Reports from Aircraft

Bulletin of the American Meteorological Society ◽

10.1175/1520-0477-67.2.177 ◽

1986 ◽

Vol 67 (2) ◽

pp. 177-185 ◽

Cited By ~ 5

Author(s):

Lauren L. Morone

Keyword(s):

Measurement Error ◽

Structure Function ◽

Error Variance ◽

Structure Functions ◽

Separation Distance ◽

Observational Error ◽

Random Measurement Error ◽

Measurement Error Variance ◽

Flight Level ◽

Types Of Error

Data collected from aircraft equipped with AIDS (Aircraft Integrated Data System) instrumentation during the Global Weather Experiment year of 1979 are used to estimate the observational error of winds at flight level from this and other aircraft automated wind-reporting systems. Structure functions are computed from reports that are paired using specific criteria. The value of this function extrapolated to zero separation distance is an estimate of twice the random measurement-error variance of the AIDS-measured winds. Component-wind errors computed in this way range from 2.1 to 3.1 m · s−1 for the two months of data examined, January and August 1979. Observational error, specified in optimum-interpolation analyses to allow the analysis to distinguish among observations of differing quality, is composed of both measurement error and the error of unrepresentativeness. The latter type of error is a function of the resolvable scale of the analysis-prediction system. The structure function, which measures the variability of a field as a function of separation distance, includes both of these types of error. If the resolvable scale of an analysis procedure is known, an estimate of the observational error can be computed from the structure function at that particular distance. An observational error of 5.3 m · s−1 was computed for the u and v wind components for a sample resolvable scale of 300 km. The errors computed from the structure functions are compared to colocation statistics from radiosondes. The errors associated with automated wind reports are found to compare favorably with those estimated for radiosonde winds at that level.
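
The extrapolation step described in this abstract can be sketched in Python as follows. The abstract does not say how the structure function was extrapolated to zero separation, so the straight-line least-squares fit used here is an assumption, as are the function names and the synthetic values.

```python
import math

def fit_line(xs, ys):
    """Least-squares straight line; returns (intercept, slope)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def measurement_error(distances_km, sf_values):
    """RMS random measurement error from the zero-separation intercept.

    The intercept estimates twice the error variance, so the
    RMS error is sqrt(intercept / 2).
    """
    intercept, _ = fit_line(distances_km, sf_values)
    if intercept < 0.0:
        raise ValueError("negative intercept: no valid error estimate")
    return math.sqrt(intercept / 2.0)

# Synthetic structure function values in m^2 s^-2 (not data from the study):
# an intercept of 18 corresponds to an error variance of 9 and an RMS error of 3.
distances_km = [50.0, 100.0, 150.0, 200.0]
sf_values = [18.0 + 0.1 * d for d in distances_km]

rms_error = measurement_error(distances_km, sf_values)
```

The factor of two arises because each paired difference contains the independent random errors of two reports.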

High-Precision Measurements of the Copolar Correlation Coefficient: Non-Gaussian Errors and Retrieval of the Dispersion Parameter μ in Rainfall

Journal of Applied Meteorology and Climatology ◽

10.1175/jamc-d-15-0272.1 ◽

2016 ◽

Vol 55 (7) ◽

pp. 1615-1632 ◽

Cited By ~ 3

Author(s):

W. J. Keat ◽

C. D. Westbrook ◽

A. J. Illingworth

Keyword(s):

Correlation Coefficient ◽

Error Estimates ◽

Dispersion Parameter ◽

Rain Event ◽

Error Statistics ◽

Frequency Distributions ◽

Ground Clutter ◽

Gaussian Probability Distribution ◽

Raindrop Size ◽

Non Gaussian

Abstract. The copolar correlation coefficient ρhv has many applications, including hydrometeor classification, ground clutter and melting-layer identification, interpretation of ice microphysics, and the retrieval of raindrop size distributions (DSDs). However, the quantitative error estimates that are necessary if these applications are to be fully exploited are currently lacking. Previous error estimates of ρhv rely on knowledge of the unknown “true” ρhv and implicitly assume a Gaussian probability distribution function of ρhv samples. Frequency distributions of ρhv estimates are in fact shown to be highly negatively skewed. A new variable, L = log10(1 − ρhv), is defined that does have Gaussian error statistics and a standard deviation depending only on the number of independent radar pulses. This is verified using observations of spherical drizzle drops, allowing, for the first time, the construction of rigorous confidence intervals in estimates of ρhv. In addition, the manner in which the imperfect collocation of the horizontal and vertical polarization sample volumes may be accounted for is demonstrated. The possibility of using L to estimate the dispersion parameter μ in the gamma drop size distribution is investigated. Including drop oscillations is found to be essential for this application; otherwise, there could be biases in retrieved μ of up to approximately 8. Preliminary results in rainfall are presented. In a convective rain case study, the estimates presented herein show μ to be substantially larger than 0 (an exponential DSD). In this particular rain event, rain rate would be overestimated by up to 50% if a simple exponential DSD is assumed.
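
A short Python sketch may help show why working in L = log10(1 − ρhv) yields asymmetric confidence intervals for ρhv. The abstract states only that the standard deviation of L depends on the number of independent pulses and gives no formula, so `sigma_L` below is a placeholder input. The function names and the numbers are likewise assumptions.

```python
import math

def rho_to_L(rho_hv):
    """Transform a correlation coefficient to L = log10(1 - rho_hv)."""
    return math.log10(1.0 - rho_hv)

def L_to_rho(L):
    """Inverse transform back to the correlation coefficient."""
    return 1.0 - 10.0 ** L

def rho_confidence_interval(rho_hv, sigma_L, z=1.96):
    """Approximate 95% interval for rho_hv from a Gaussian interval in L.

    sigma_L is a placeholder; its dependence on the number of
    independent pulses is not given in the abstract.
    """
    L = rho_to_L(rho_hv)
    lower = L_to_rho(L + z * sigma_L)   # larger L means smaller rho_hv
    upper = L_to_rho(L - z * sigma_L)
    return lower, upper

lower, upper = rho_confidence_interval(0.99, sigma_L=0.1)
```

Because the interval is symmetric in L, it extends further below the estimate than above it in ρhv, in keeping with the negative skew described in the abstract.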

Heavy-tailed random matrices

10.1093/oxfordhb/9780198744191.013.13 ◽

2018 ◽

Author(s):

Vladimir Kravtsov

Keyword(s):

Random Matrices ◽

Heavy Tails ◽

Probability Distributions ◽

Basin Of Attraction ◽

Random Variables ◽

Universal Properties ◽

Random Matrix Ensembles ◽

Free Random Variables ◽

Heavy Tailed ◽

Non Gaussian

This article considers non-Gaussian random matrices consisting of random variables with heavy-tailed probability distributions. In probability theory heavy tails of distributions describe rare but violent events which usually have a dominant influence on the statistics. Furthermore, they completely change the universal properties of eigenvalues and eigenvectors of random matrices. This article focuses on the universal macroscopic properties of Wigner matrices belonging to the Lévy basin of attraction, matrices representing stable free random variables, and a class of heavy-tailed matrices obtained by parametric deformations of standard ensembles. It first examines the properties of heavy-tailed symmetric matrices known as Wigner–Lévy matrices before discussing free random variables and free Lévy matrices as well as heavy-tailed deformations. In particular, it describes random matrix ensembles obtained from standard ensembles by a reweighting of the probability measure. It also analyses several matrix models belonging to heavy-tailed random matrices and presents methods for integrating them.
