An anomaly correlation skill score for the evaluation of the performance of hyperspectral infrared sounders

Author(s):  
Hartmut H. Aumann ◽  
Evan Manning ◽  
Chris Barnet ◽  
Eric Maddy ◽  
William Blackwell
2021 ◽  
Author(s):  
Tongtiegang Zhao ◽  
Haoling Chen ◽  
Quanxi Shao

Abstract. Climate teleconnections are essential for the verification of valuable precipitation forecasts generated by global climate models (GCMs). This paper develops a novel approach to attributing the correlation skill of dynamical GCM forecasts to the statistical El Niño-Southern Oscillation (ENSO) teleconnection by using the coefficient of determination (R2). Specifically, observed precipitation is regressed against GCM forecasts, against Niño3.4 and against both of them, and the intersection operation is then applied to quantify the overlapping R2 shared by GCM forecasts and Niño3.4. The significance of the overlapping R2 and the sign of the ENSO teleconnection distinguish three cases of attribution, i.e., significantly positive anomaly correlation attributable to positive ENSO teleconnection, attributable to negative ENSO teleconnection, and not attributable to ENSO teleconnection. A case study is devised for the Climate Forecast System version 2 (CFSv2) seasonal forecasts of global precipitation. For grid cells around the world, the proportion of significantly positive anomaly correlation attributable to positive (negative) ENSO teleconnection is respectively 10.8 % (11.7 %) in December-January-February (DJF), 7.1 % (7.3 %) in March-April-May (MAM), 6.3 % (7.4 %) in June-July-August (JJA) and 7.0 % (14.3 %) in September-October-November (SON). The results not only confirm the prominent contribution of the ENSO teleconnection to GCM forecast skill, but also provide spatial plots of the regions where significantly positive anomaly correlation is subject to positive ENSO teleconnection, negative ENSO teleconnection or teleconnections other than ENSO. Overall, the proposed attribution approach can serve as an effective tool to investigate the sources of predictability in GCM seasonal forecasts of global precipitation.
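
The attribution step above lends itself to a short numerical sketch: regress observed precipitation on the GCM forecasts, on Niño3.4 and on both together, then take the overlap of the first two R2 values via inclusion-exclusion. The snippet below does this with synthetic data; the variable names and the synthetic series are illustrative assumptions, not the authors' code.

```python
# A minimal sketch of the overlapping-R^2 attribution, assuming synthetic
# annual series; ordinary least squares via numpy stands in for whatever
# regression machinery the paper actually uses.
import numpy as np

def r_squared(y, X):
    """Coefficient of determination for an OLS fit of y on the columns of X."""
    X = np.column_stack([np.ones_like(y), X])      # add an intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid.var() / y.var()

rng = np.random.default_rng(0)
nino34 = rng.standard_normal(40)                   # 40 years of a Nino3.4 index
gcm_fcst = 0.8 * nino34 + 0.4 * rng.standard_normal(40)
precip_obs = 0.6 * nino34 + 0.5 * rng.standard_normal(40)

r2_gcm = r_squared(precip_obs, gcm_fcst)           # observed precip vs GCM forecasts
r2_enso = r_squared(precip_obs, nino34)            # observed precip vs Nino3.4
r2_both = r_squared(precip_obs, np.column_stack([gcm_fcst, nino34]))

# Intersection operation: R^2 shared by GCM forecasts and Nino3.4.
r2_overlap = r2_gcm + r2_enso - r2_both
print(f"overlapping R^2 = {r2_overlap:.3f}")
```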


2021 ◽  
Author(s):  
Nicola Cortesi ◽  
Verónica Torralba ◽  
Llorenç Lledó ◽  
Andrea Manrique-Suñén ◽  
Nube González-Reviriego ◽  
...  

Abstract It is often assumed that weather regimes adequately characterize atmospheric circulation variability. However, regime classifications spanning many months and with a low number of regimes may not satisfy this assumption. The first aim of this study is to test this assumption for the Euro-Atlantic region. The second is to extend the assessment of sub-seasonal forecast skill in predicting the frequencies of occurrence of the regimes beyond the winter season. Two regime classifications of four regimes each were obtained from sea level pressure anomalies clustered from October to March and from April to September respectively. Their spatial patterns were compared with those representing the annual cycle. Results highlight that the two regime classifications reproduce most of the patterns of the annual cycle, except during the transition weeks between the two periods, when patterns of the annual cycle resembling the Atlantic Low regime are not observed in either classification. Forecast skill for Atlantic Low was found to be similar to that of NAO+, the regime replacing Atlantic Low in the two classifications. Thus, although clustering yearly circulation data into two periods of six months each introduces a few deviations from the annual cycle of the regime patterns, it does not negatively affect sub-seasonal forecast skill. Beyond the winter season and the first ten forecast days, sub-seasonal forecasts of ECMWF are still able to achieve weekly frequency correlations of r = 0.5 for some regimes and start dates, including summer ones. ECMWF forecasts beat climatological forecasts for long-lasting regime events, and when measured by the fair continuous ranked probability skill score, but not when measured by the Brier skill score. Thus, more effort is still needed to reach the minimum skill necessary to develop forecast products based on weather regimes outside the winter season.
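
As a rough sketch of the classification step, the snippet below clusters (synthetic) sea level pressure anomaly maps into four regimes with k-means and computes weekly frequencies of occurrence, the quantity whose forecast skill is verified above. scikit-learn's KMeans and all array shapes are illustrative assumptions rather than the study's actual setup.

```python
# A minimal sketch: four weather regimes from clustered SLP anomalies.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
n_days, n_gridpoints = 1820, 500            # ~10 six-month seasons, flattened maps
slp_anom = rng.standard_normal((n_days, n_gridpoints))   # stand-in for reanalysis data

km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(slp_anom)
regime_of_day = km.labels_                  # regime index (0..3) assigned to each day
regime_patterns = km.cluster_centers_       # composite SLP anomaly pattern per regime

# Weekly frequencies of occurrence of each regime.
weeks = regime_of_day[: (n_days // 7) * 7].reshape(-1, 7)
weekly_freq = np.stack([(weeks == k).mean(axis=1) for k in range(4)], axis=1)
```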


Author(s):  
Hermann Anetzberger ◽  
Stephan Reppenhagen ◽  
Hansjörg Eickhoff ◽  
Franz Josef Seibert ◽  
Bernd Döring ◽  
...  

2013 ◽  
Vol 28 (3) ◽  
pp. 802-814 ◽  
Author(s):  
Timothy W. Armistead

Abstract The paper briefly reviews measures that have been proposed since the 1880s to assess accuracy and skill in categorical weather forecasting. The majority of the measures consist of a single expression, for example, a proportion, the difference between two proportions, a ratio, or a coefficient. Two exemplar single-expression measures for 2 × 2 categorical arrays that chronologically bracket the 130-yr history of this effort—Doolittle's inference ratio i and Stephenson's odds ratio skill score (ORSS)—are reviewed in detail. Doolittle's i is appropriately calculated using conditional probabilities, and the ORSS is a valid measure of association, but both measures are limited in ways that, to varying degrees, mirror all single-expression measures for categorical forecasting. The limitations that affect such measures, in varying combinations, include their inability to assess the separate accuracy rates of different forecast–event categories in a matrix, their sensitivity to the interdependence of forecasts in a 2 × 2 matrix, and the inapplicability of many of them to the general k × k (k ≥ 2) problem. The paper demonstrates that Wagner's unbiased hit rate, developed for use in categorical judgment studies with any k × k (k ≥ 2) array, avoids these limitations while extending the dual-measure Bayesian approach proposed by Murphy and Winkler in 1987.
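
For concreteness, here is a small sketch of two of the measures discussed above on a 2 × 2 contingency table. The ORSS follows Stephenson's (θ − 1)/(θ + 1) with θ = ad/bc; the per-category unbiased hit rate x_ii²/(row total × column total) is my reading of Wagner's joint-probability definition and should be treated as an assumption rather than a definitive implementation.

```python
# Contingency table convention: rows are forecasts, columns are observations,
# so table = [[hits, false alarms], [misses, correct rejections]].
import numpy as np

def orss(table):
    (a, b), (c, d) = table
    return (a * d - b * c) / (a * d + b * c)   # equals (theta - 1)/(theta + 1)

def unbiased_hit_rate(table):
    """Wagner-style Hu per category: x_ii^2 / (row_i total * column_i total)."""
    t = np.asarray(table, dtype=float)
    return np.diag(t) ** 2 / (t.sum(axis=1) * t.sum(axis=0))

table = np.array([[50, 20], [10, 120]])
print(orss(table))                 # association in the 2 x 2 array
print(unbiased_hit_rate(table))    # one accuracy rate per category, extends to k x k
```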


2010 ◽  
Vol 27 (3) ◽  
pp. 409-427 ◽  
Author(s):  
Kun Tao ◽  
Ana P. Barros

Abstract The objective of spatial downscaling strategies is to increase the information content of coarse datasets at smaller scales. In the case of quantitative precipitation estimation (QPE) for hydrological applications, the goal is to close the scale gap between the spatial resolution of coarse datasets (e.g., gridded satellite precipitation products at resolution L × L) and the high resolution (l × l; L ≫ l) necessary to capture the spatial features that determine spatial variability of water flows and water stores in the landscape. In essence, the downscaling process consists of weaving subgrid-scale heterogeneity over a desired range of wavelengths in the original field. The defining question is, which properties, statistical and otherwise, of the target field (the known observable at the desired spatial resolution) should be matched, with the caveat that downscaling methods be as general as possible and therefore ideally without case-specific constraints and/or calibration requirements? Here, the attention is focused on two simple fractal downscaling methods using iterated functions systems (IFS) and fractal Brownian surfaces (FBS) that meet this requirement. The two methods were applied to disaggregate spatially 27 summertime convective storms in the central United States during 2007 at three consecutive times (1800, 2100, and 0000 UTC, thus 81 fields overall) from the Tropical Rainfall Measuring Mission (TRMM) version 6 (V6) 3B42 precipitation product (∼25-km grid spacing) to the same resolution as the NCEP stage IV products (∼4-km grid spacing). Results from bilinear interpolation are used as the control. A fundamental distinction between IFS and FBS is that the latter implies a distribution of downscaled fields and thus an ensemble solution, whereas the former provides a single solution. The downscaling effectiveness is assessed using fractal measures (the spectral exponent β, fractal dimension D, Hurst coefficient H, and roughness amplitude R) and traditional operational skill scores [false alarm rate (FR), probability of detection (PD), threat score (TS), and Heidke skill score (HSS)], as well as bias and the root-mean-square error (RMSE). The results show that both IFS and FBS fractal interpolation perform well with regard to operational skill scores, and they meet the additional requirement of generating structurally consistent fields. Furthermore, confidence intervals can be directly generated from the FBS ensemble. The results were used to diagnose errors relevant for hydrometeorological applications, in particular a spatial displacement with characteristic length of at least 50 km (2500 km2) in the location of peak rainfall intensities for the cases studied.
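
One of the two ingredients named above, the fractional Brownian surface, admits a compact spectral-synthesis sketch: white noise is shaped in Fourier space so that power falls off as k^(−β). The grid size and β below are illustrative choices, not the paper's settings.

```python
# A minimal FBS generator by spectral synthesis (assumed parameters).
import numpy as np

def fbs_field(n=128, beta=3.0, seed=0):
    rng = np.random.default_rng(seed)
    kx = np.fft.fftfreq(n)[:, None]
    ky = np.fft.fftfreq(n)[None, :]
    k = np.sqrt(kx**2 + ky**2)
    k[0, 0] = 1.0                               # avoid dividing by zero at the mean
    amplitude = k ** (-beta / 2.0)              # power ~ k^-beta => amplitude ~ k^(-beta/2)
    phase = np.exp(2j * np.pi * rng.random((n, n)))
    field = np.fft.ifft2(amplitude * phase).real
    return (field - field.mean()) / field.std() # standardized anomaly surface

surface = fbs_field()   # each seed gives one member of the FBS ensemble
```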


2016 ◽  
Vol 29 (17) ◽  
pp. 6065-6083 ◽  
Author(s):  
Yinghui Liu ◽  
Jeffrey R. Key

Abstract Cloud cover is one of the largest uncertainties in model predictions of the future Arctic climate. Previous studies have shown that cloud amounts in global climate models and atmospheric reanalyses vary widely and may have large biases. However, many climate studies are based on anomalies rather than absolute values, for which biases are less important. This study examines the performance of five atmospheric reanalysis products—ERA-Interim, MERRA, MERRA-2, NCEP R1, and NCEP R2—in depicting monthly mean Arctic cloud amount anomalies against Moderate Resolution Imaging Spectroradiometer (MODIS) satellite observations from 2000 to 2014 and against Cloud–Aerosol Lidar and Infrared Pathfinder Satellite Observation (CALIPSO) observations from 2006 to 2014. All five reanalysis products exhibit biases in the mean cloud amount, especially in winter. The Gerrity skill score (GSS) and correlation analysis are used to quantify their performance in terms of interannual variations. Results show that ERA-Interim, MERRA, MERRA-2, and NCEP R2 perform similarly, with annual mean GSSs of 0.36/0.22, 0.31/0.24, 0.32/0.23, and 0.32/0.23 and annual mean correlation coefficients of 0.50/0.51, 0.43/0.54, 0.44/0.53, and 0.50/0.52 against MODIS/CALIPSO, indicating that the reanalysis datasets do exhibit some capability for depicting the monthly mean cloud amount anomalies. There are no significant differences in the overall performance of reanalysis products. They all perform best in July, August, and September and worst in November, December, and January. All reanalysis datasets have better performance over land than over ocean. This study identifies the magnitudes of errors in Arctic mean cloud amounts and anomalies and provides a useful tool for evaluating future improvements in the cloud schemes of reanalysis products.
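
The Gerrity skill score used above can be sketched from its published construction: category weights derived from the observed marginal probabilities define a symmetric scoring matrix, which is then contracted with the joint forecast-observation frequencies. The code below follows Gerrity's (1992) recipe as described in standard verification texts and should be read as a best-effort illustration, not the study's own code.

```python
import numpy as np

def gerrity_skill_score(table):
    """GSS for a K x K table (rows: forecast category, columns: observed)."""
    p = np.asarray(table, dtype=float)
    p /= p.sum()                        # joint relative frequencies p(i, j)
    K = p.shape[0]
    obs = p.sum(axis=0)                 # observed marginal probabilities
    cum = np.cumsum(obs)[:-1]           # cumulative probabilities, r = 1..K-1
    a = (1.0 - cum) / cum               # Gerrity's odds-like weights a_r
    b = 1.0 / (K - 1)
    s = np.empty((K, K))
    for i in range(K):
        for j in range(i, K):           # scoring matrix, symmetric by construction
            s[i, j] = b * (np.sum(1.0 / a[:i]) - (j - i) + np.sum(a[j:]))
            s[j, i] = s[i, j]
    return float(np.sum(p * s))

table = np.array([[50, 20, 5], [15, 60, 20], [5, 25, 40]])
print(round(gerrity_skill_score(table), 3))
```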


2014 ◽  
Vol 142 (2) ◽  
pp. 716-738 ◽  
Author(s):  
Craig S. Schwartz ◽  
Zhiquan Liu

Abstract Analyses with 20-km horizontal grid spacing were produced from parallel continuously cycling three-dimensional variational (3DVAR), ensemble square root Kalman filter (EnSRF), and “hybrid” variational–ensemble data assimilation (DA) systems between 0000 UTC 6 May and 0000 UTC 21 June 2011 over a domain spanning the contiguous United States. Beginning 9 May, the 0000 UTC analyses initialized 36-h Weather Research and Forecasting Model (WRF) forecasts containing a large convection-permitting 4-km nest. These 4-km 3DVAR-, EnSRF-, and hybrid-initialized forecasts were compared to benchmark WRF forecasts initialized by interpolating 0000 UTC Global Forecast System (GFS) analyses onto the computational domain. While important differences regarding mean state characteristics of the 20-km DA systems were noted, verification efforts focused on the 4-km precipitation forecasts. The 3DVAR-, hybrid-, and EnSRF-initialized 4-km precipitation forecasts performed similarly regarding general precipitation characteristics, such as timing of the diurnal cycle, and all three forecast sets had high precipitation biases at heavier rainfall rates. However, meaningful differences emerged regarding precipitation placement as quantified by the fractions skill score. For most forecast hours, the hybrid-initialized 4-km precipitation forecasts were better than the EnSRF-, 3DVAR-, and GFS-initialized forecasts, and the improvement was often statistically significant at the 95% confidence level. These results demonstrate the potential of limited-area continuously cycling hybrid DA configurations and suggest additional hybrid development is warranted.
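
The hybrid idea referenced above can be caricatured in a few lines: blend a static (3DVAR-like) background error covariance with a flow-dependent ensemble estimate, then form the usual analysis update. The dimensions, the 50/50 weighting, and the identity observation operator below are all illustrative assumptions.

```python
# A toy hybrid variational-ensemble update (not the WRF/DA systems' code).
import numpy as np

rng = np.random.default_rng(2)
n, n_ens, alpha = 10, 20, 0.5               # state size, ensemble size, ensemble weight

B_static = np.eye(n)                        # static background error covariance
ens = rng.standard_normal((n_ens, n))       # ensemble of background states
B_ens = np.cov(ens, rowvar=False)           # flow-dependent covariance estimate
B_hyb = (1 - alpha) * B_static + alpha * B_ens

H = np.eye(n)                               # observe the full state, for simplicity
R = 0.25 * np.eye(n)                        # observation error covariance
x_b = np.zeros(n)                           # background state
y = rng.standard_normal(n)                  # observations

K = B_hyb @ H.T @ np.linalg.inv(H @ B_hyb @ H.T + R)   # Kalman gain
x_a = x_b + K @ (y - H @ x_b)               # hybrid analysis
```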


2013 ◽  
Vol 141 (10) ◽  
pp. 3477-3497 ◽  
Author(s):  
Mingyue Chen ◽  
Wanqiu Wang ◽  
Arun Kumar

Abstract An analysis of lagged ensemble seasonal forecasts from the National Centers for Environmental Prediction (NCEP) Climate Forecast System, version 2 (CFSv2), is presented. The focus of the analysis is on the construction of lagged ensemble forecasts with increasing lead time (thus allowing use of larger ensemble sizes) and its influence on seasonal prediction skill. Predictions of seasonal means of sea surface temperature (SST), 200-hPa height (z200), precipitation, and 2-m air temperature (T2m) over land are analyzed. Measures of prediction skill include deterministic measures (anomaly correlation and mean square error) and a probabilistic measure [the rank probability skill score (RPSS)]. The results show that for a fixed lead time, and as one would expect, the skill of seasonal forecasts improves as the ensemble size increases, while for a fixed ensemble size the forecast skill decreases as the lead time becomes longer. However, when a forecast is based on a lagged ensemble, there exists an optimal lagged ensemble time (OLET) at which the positive influence of increasing ensemble size and the negative influence of increasing lead time combine to give a maximum in seasonal prediction skill. The OLET is shown to depend on the geographical location and variable. For precipitation and T2m, OLET is relatively longer and the skill gain is larger than for SST and tropical z200. OLET is also dependent on the skill measure, with RPSS having the longest OLET. Results of this analysis will be useful in providing guidelines on the design of seasonal prediction systems and in understanding the relative merits of different configurations.
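
The OLET trade-off described above is easy to sketch with synthetic data: adding older initializations grows the ensemble but each added member is less skillful, so the anomaly correlation of the lagged-ensemble mean peaks at an intermediate lag. The assumed linear skill decay below is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n_years, max_lag = 30, 10
truth = rng.standard_normal(n_years)

# fcst[lag, year]: the signal fraction decays as the initialization gets older.
fcst = np.stack([
    np.sqrt(max(0.8 - 0.08 * lag, 0.0)) * truth
    + np.sqrt(1.0 - max(0.8 - 0.08 * lag, 0.0)) * rng.standard_normal(n_years)
    for lag in range(max_lag)
])

# Anomaly correlation of the lagged-ensemble mean as more lags are included.
acc = [np.corrcoef(fcst[: lag + 1].mean(axis=0), truth)[0, 1]
       for lag in range(max_lag)]
olet = int(np.argmax(acc))   # the optimal lagged ensemble time
print(f"OLET = {olet} lags (ACC = {acc[olet]:.2f})")
```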


2010 ◽  
Vol 25 (1) ◽  
pp. 343-354 ◽  
Author(s):  
Marion Mittermaier ◽  
Nigel Roberts

Abstract The fractions skill score (FSS) was one of the measures that formed part of the Intercomparison of Spatial Forecast Verification Methods project. The FSS was used to assess a common dataset that consisted of real and perturbed Weather Research and Forecasting (WRF) model precipitation forecasts, as well as geometric cases. These datasets are all based on the NCEP 240 grid, which translates to approximately 4-km resolution over the contiguous United States. The geometric cases showed that the FSS can provide a truthful assessment of displacement errors and forecast skill. In addition, the FSS can be used to determine the scale at which an acceptable level of skill is reached and this usage is perhaps more helpful than interpreting the actual FSS value. This spatial-scale approach is becoming more popular for monitoring operational forecast performance. The study also shows how the FSS responds to forecast bias. A more biased forecast always gives lower FSS values at large scales and usually at smaller scales. It is possible, however, for a more biased forecast to give a higher score at smaller scales, when additional rain overlaps the observed rain. However, given a sufficiently large sample of forecasts, a more biased forecast system will score lower. The use of percentile thresholds can remove the impacts of the bias. When the proportion of the domain that is “wet” (the wet-area ratio) is small, subtle differences introduced through near-threshold misses can lead to large changes in FSS magnitude in individual cases (primarily because the bias is changed). Reliable statistics for small wet-area ratios require a larger sample of forecasts. Care needs to be taken in the choice of verification domain. For high-resolution models, the domain should be large enough to encompass the length scale of the typical mesoscale forcing (e.g., upper-level troughs or squall lines). If the domain is too large, the wet-area ratios will always be small. If the domain is too small, fluctuations in the wet-area ratio can be large and larger spatial errors may be missed. The FSS is a good measure of the spatial accuracy of precipitation forecasts. Different methods are needed to determine other patterns of behavior.
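
For reference, a minimal FSS computation follows, assuming binary exceedance fields on a common grid and the no-overlap reference MSE of Roberts and Lean (2008). The displaced synthetic forecast illustrates the score rising with neighborhood scale, the behavior used above to find the scale at which acceptable skill is reached.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def fss(fcst, obs, threshold, scale):
    """Fractions skill score for one threshold and one neighborhood scale."""
    f = uniform_filter((fcst >= threshold).astype(float), size=scale)
    o = uniform_filter((obs >= threshold).astype(float), size=scale)
    mse = np.mean((f - o) ** 2)
    mse_ref = np.mean(f ** 2) + np.mean(o ** 2)   # worst-case (no overlap) reference
    return 1.0 - mse / mse_ref

rng = np.random.default_rng(4)
obs = rng.gamma(0.5, 2.0, size=(200, 200))        # synthetic rain field
fcst = np.roll(obs, 8, axis=1)                    # same field, displaced 8 grid lengths
print([round(fss(fcst, obs, 2.0, s), 3) for s in (1, 5, 25, 75)])
```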


Author(s):  
Ryan Lagerquist ◽  
Jebb Q. Stewart ◽  
Imme Ebert-Uphoff ◽  
Christina Kumler

Abstract Predicting the timing and location of thunderstorms (“convection”) allows for preventive actions that can save both lives and property. We have applied U-nets, a deep-learning-based type of neural network, to forecast convection on a grid at lead times up to 120 minutes. The goal is to make skillful forecasts with only present and past satellite data as predictors. Specifically, predictors are multispectral brightness-temperature images from the Himawari-8 satellite, while targets (ground truth) are provided by weather radars in Taiwan. U-nets are becoming popular in atmospheric science due to their advantages for gridded prediction. Furthermore, we use three novel approaches to advance U-nets in atmospheric science. First, we compare three architectures (vanilla, temporal, and U-net++) and find that vanilla U-nets are best for this task. Second, we train U-nets with the fractions skill score, which is spatially aware, as the loss function. Third, because we do not have adequate ground truth over the full Himawari-8 domain, we train the U-nets with small radar-centered patches, then apply trained U-nets to the full domain. Also, we find that the best predictions are given by U-nets trained with satellite data from multiple lag times, not only the present. We evaluate U-nets in detail (by time of day, month, and geographic location) and compare to persistence models. The U-nets outperform persistence at lead times ≥ 60 minutes, and at all lead times the U-nets provide a more realistic climatology than persistence. Our code is available publicly.
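
Using the fractions skill score as a training loss, as described above, amounts to a differentiable neighborhood comparison. The Keras-style sketch below with average pooling is a hedged guess at the general shape; the paper's window size and reduction details are not given here, so treat every choice as an assumption.

```python
import tensorflow as tf

def fss_loss(window=9):
    """Returns a loss equal to 1 - FSS over a square neighborhood (assumed size)."""
    def loss(y_true, y_pred):
        # y_true, y_pred: (batch, height, width, 1) convection probabilities.
        pool = lambda x: tf.nn.avg_pool2d(x, ksize=window, strides=1, padding="SAME")
        f, o = pool(y_pred), pool(y_true)
        mse = tf.reduce_mean(tf.square(f - o))
        mse_ref = tf.reduce_mean(tf.square(f)) + tf.reduce_mean(tf.square(o))
        return mse / (mse_ref + tf.keras.backend.epsilon())   # 1 - FSS, minimized
    return loss
```

A model would then be compiled with something like model.compile(optimizer="adam", loss=fss_loss(9)), where the window of 9 grid cells is a hypothetical choice.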

