Validation of experimental charge-density refinement strategies: when do we overfit?

IUCrJ ◽  
2017 ◽  
Vol 4 (4) ◽  
pp. 420-430 ◽  
Author(s):  
Lennard Krause ◽  
Benedikt Niepötter ◽  
Christian J. Schürmann ◽  
Dietmar Stalke ◽  
Regine Herbst-Irmer

A cross-validation method is supplied to judge between various strategies in multipole refinement procedures. Its application enables straightforward detection of whether the refinement of additional parameters leads to an improvement in the model or an overfitting of the given data. For all tested data sets it was possible to prove that the multipole parameters of atoms in comparable chemical environments should be constrained to be identical. In an automated approach, this method additionally delivers parameter distributions of k different refinements. These distributions can be used for further error diagnostics, e.g. to detect erroneously defined parameters or incorrectly determined reflections. Visualization tools show the variation in the parameters. These different refinements also provide rough estimates for the standard deviation of topological parameters.
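The k-fold idea described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the "refinement" is replaced by a toy one-parameter least-squares scale fit, and the names `k_fold_indices`, `fit_scale`, `cross_validate` and `spread` are hypothetical.

```python
import random
import statistics


def k_fold_indices(n, k, seed=0):
    """Partition indices 0..n-1 into k disjoint, shuffled test sets."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [sorted(indices[i::k]) for i in range(k)]


def fit_scale(points):
    """Toy stand-in for a refinement: least-squares scale a in y = a * x."""
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    return sxy / sxx


def cross_validate(points, k, seed=0):
    """Fit k times, each time leaving one test set out of the fit.

    Returns the k fitted parameters and the k R values computed
    only on the left-out points (an R_free-like quantity).
    """
    params, r_free = [], []
    for test in k_fold_indices(len(points), k, seed):
        held_out = set(test)
        work = [p for i, p in enumerate(points) if i not in held_out]
        a = fit_scale(work)
        params.append(a)
        num = sum(abs(points[i][1] - a * points[i][0]) for i in test)
        den = sum(abs(points[i][1]) for i in test)
        r_free.append(num / den)
    return params, r_free


def spread(params):
    """Mean and sample standard deviation of the k fitted parameters."""
    return statistics.mean(params), statistics.stdev(params)
```

Calling `cross_validate(points, k)` and then `spread(params)` gives a rough estimate of how much a parameter varies between the k fits, which is the kind of distribution the abstract refers to.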

2007 ◽  
Vol 62 (5) ◽  
pp. 696-704 ◽  
Author(s):  
Diana Förster ◽  
Armin Wagner ◽  
Christian B. Hübschle ◽  
Carsten Paulmann ◽  
Peter Luger

The charge density of the tripeptide L-alanyl-glycyl-L-alanine was determined from three X-ray data sets measured at different experimental setups and under different conditions. Two of the data sets were measured with synchrotron radiation (beamline F1 of Hasylab/DESY, Germany and beamline X10SA of SLS, Paul Scherrer Institute, Switzerland) at temperatures around 100 K, while a third data set was measured under home laboratory conditions (Mo Kα radiation) at a low temperature of 20 K. The multipole refinement strategy to derive the experimental charge density was the same in all cases, so that the obtained charge density properties could be compared directly. While the general analysis of the three data sets suggested a small preference for one of the synchrotron data sets (Hasylab F1), a comparison of topological and atomic properties in no case indicated a preference for any of the three data sets. It follows that even the 4 h data set measured at the SLS performed equally well compared to the data sets of substantially longer exposure time.


Author(s):  
Rafał Janicki ◽  
Przemysław Starynowicz

The experimental charge-density distribution in [Gd(H2O)9](CF3SO3)3 has been analysed and compared with the theoretical density functional theory calculations. Although the Gd—OH2 bonds are mainly ionic, a covalent contribution is detectable when inspecting both the topological parameters of these bonds and the natural bond orbital results. This contribution originates from small electron transfer from the lone pairs of oxygen atoms to empty 5d and 6s spin orbitals of Gd3+.


Author(s):  
Hilke Wolf ◽  
Mads R. V. Jørgensen ◽  
Yu-Sheng Chen ◽  
Regine Herbst-Irmer ◽  
Dietmar Stalke

Four datasets on [2,2]-paracyclophane were collected in-house and at the Advanced Photon Source at two different temperatures for charge density investigation. Global data quality indicators such as high resolution, high I/σ(I) values, low merging R values and high multiplicity were matched for all four datasets. The structural parameters did not show significant differences, but the synchrotron data depicted deficiencies in the topological analysis. In retrospect these deficiencies could be assigned to the low quality of the innermost data, which could have been identified by e.g. merging R values for only these reflections. In the multipole refinement these deficiencies could be monitored using DRK-plot and residual density analysis. In this particular example the differences in the topological parameters were relatively small but significant.


2009 ◽  
Vol 42 (6) ◽  
pp. 1110-1121 ◽  
Author(s):  
B. Dittrich ◽  
C. B. Hübschle ◽  
J. J. Holstein ◽  
F. P. A. Fabbiani

The limiting factor for charge-density studies is crystal quality. Although area detection and low temperatures enable redundant data collection, only compounds that form well diffracting single crystals without disorder are amenable to these studies. If thermal motion and electron density ρ(r) were de-convoluted, multipole parameters could also be refined with lower-resolution data, such as those commonly collected for macromolecules. Using the invariom database for first refining conventional parameters (x, y, z and atomic displacement parameters), de-convolution can be achieved. In a subsequent least-squares refinement of multipole parameters only, information on the charge density becomes accessible also for data not fulfilling charge-density requirements. A critical aspect of this procedure is the missing information on the correlation between refined and non-refined parameters. This correlation is investigated in detail by comparing a full multipole refinement on high-resolution and a blocked refinement on 'normal-resolution' data sets of ciprofloxacin hexahydrate. Topological properties and dipole moments are shown to be in excellent agreement for the two refinements. A 'normal-resolution' data set of ciprofloxacin hydrochloride 1.4-hydrate is also evaluated in this manner.


2014 ◽  
Vol 70 (a1) ◽  
pp. C282-C282 ◽  
Author(s):  
Regine Herbst-Irmer

In experimental charge density investigations it is indispensable to use data of the highest possible quality. The multiplicity should therefore be as high as possible, but poor data should be omitted. Limits on Rint or I/σ(I) are often used to decide on the resolution limit and on the discarding of outlier data. A better approach is the 'paired refinement method' [1], which compares two data sets by how well the models derived with the same refinement protocol fit both data sets. For macromolecular data sets it could be shown that a higher resolution should be used than is normally derived from the above-mentioned criteria. First results for charge density data seem to show the same tendency, but of course on a different level. The paired refinement strategy can also be used to investigate the influence of different scaling methods. In a recent version of SADABS [2] a new error model and a 3λ correction are implemented. With the paired refinement strategy the improvement in data quality becomes obvious. A further concern in charge density investigations is the question of overfitting. In macromolecular refinement this is answered by the Rfree concept [3]. Here a refinement protocol is developed by refining against a work set of reflections, e.g. 90% of the data. The remaining reflections are left untouched throughout the refinement process, but an Rfree value is calculated using only this test set of reflections. Overfitting can be identified clearly by a decrease in Rwork accompanied by an increase in Rfree. The refinement protocol is then used for a final refinement against all data. It will be discussed how this method could support charge density studies.
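The Rwork/Rfree bookkeeping described in this abstract can be illustrated with a short Python sketch. It assumes the conventional definition R = Σ|Fo − Fc| / Σ|Fo| and a random 90/10 split; the function names `r_factor`, `split_work_test` and `looks_overfitted` are hypothetical and do not come from any refinement program.

```python
import random


def r_factor(f_obs, f_calc):
    """Conventional R = sum|Fo - Fc| / sum|Fo| over the given reflections."""
    num = sum(abs(o - c) for o, c in zip(f_obs, f_calc))
    den = sum(abs(o) for o in f_obs)
    return num / den


def split_work_test(n_reflections, test_fraction=0.1, seed=0):
    """Randomly assign reflection indices to a work set and a test set.

    The test set is never refined against; it is used only for R_free.
    """
    indices = list(range(n_reflections))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(n_reflections * test_fraction))
    return sorted(indices[n_test:]), sorted(indices[:n_test])


def looks_overfitted(before, after):
    """Compare (R_work, R_free) pairs of a simpler and a richer model.

    Overfitting is flagged when R_work drops while R_free rises.
    """
    return after[0] < before[0] and after[1] > before[1]
```

In use, `r_factor` would be evaluated once on the work indices and once on the test indices for each candidate refinement protocol, and `looks_overfitted` applied to the resulting pairs.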


2005 ◽  
Vol 61 (1) ◽  
pp. 115-121 ◽  
Author(s):  
Marc Messerschmidt ◽  
Stephan Scheins ◽  
Peter Luger

Strychnine has an interesting oligocyclic structure of seven condensed rings. It is easy to crystallize and gives crystals of excellent quality which diffract nicely to high regions in reciprocal space. It was thus chosen for a comparative charge-density study based on four high-resolution data sets (sin θ/λ ≥ 1.15 Å−1) that were measured with different experimental setups in the temperature range 100–15 K. In addition, a theoretical charge density was derived from a B3LYP/6-311++G(3df,3pd) calculation. The agreement expressed in bond topological parameters among the four experimental charge densities is better than between experiment and theory.


2021 ◽  
Vol 29 ◽  
pp. 115-124
Author(s):  
Xinlu Wang ◽  
Ahmed A.F. Saif ◽  
Dayou Liu ◽  
Yungang Zhu ◽  
Jon Atli Benediktsson

BACKGROUND: DNA sequence alignment is one of the most fundamental and important operations for identifying which gene family may contain a given sequence, and pattern matching for DNA sequences has been a fundamental issue in biomedical engineering, biotechnology and health informatics. OBJECTIVE: To solve this problem, this study proposes an optimal multi-pattern matching method with wildcards for DNA sequences. METHODS: The proposed method packs the patterns and a sliding window of the text; the window slides along the packed text and is matched against the stored packed patterns. RESULTS: Three data sets were used to test the performance of the proposed algorithm, which was found to be more efficient than its competitors because its operation is close to machine language. CONCLUSIONS: Theoretical analysis and experimental results both demonstrate that the proposed method outperforms the state-of-the-art methods and is especially effective for DNA sequences.
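The general idea of matching packed patterns against a packed sliding window can be sketched as follows. This is not the algorithm of the paper, whose details are not given in the abstract. It is a minimal illustration assuming a 2-bit encoding of A, C, G and T, a wildcard character `N` in patterns only, and hypothetical function names `pack_pattern` and `find_matches`.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit encoding


def pack_pattern(pattern, wildcard="N"):
    """Pack a pattern into (value, mask); wildcard positions get mask bits 00."""
    value = 0
    mask = 0
    for ch in pattern:
        value <<= 2
        mask <<= 2
        if ch != wildcard:
            value |= CODE[ch]
            mask |= 0b11
    return value, mask


def find_matches(text, patterns, wildcard="N"):
    """Return sorted (start, pattern) pairs for every occurrence in text.

    The text must contain only A, C, G and T. Patterns of equal length
    share one packed window that slides along the text.
    """
    by_length = {}
    for p in patterns:
        by_length.setdefault(len(p), []).append((p, *pack_pattern(p, wildcard)))

    hits = []
    for m, packed in by_length.items():
        if m == 0 or m > len(text):
            continue
        full = (1 << (2 * m)) - 1  # keeps only the last m bases
        window = 0
        for i, ch in enumerate(text):
            window = ((window << 2) | CODE[ch]) & full
            if i >= m - 1:
                for p, value, mask in packed:
                    # Equal on all non-wildcard positions?
                    if (window ^ value) & mask == 0:
                        hits.append((i - m + 1, p))
    return sorted(hits)
```

Each comparison is a single XOR and AND on integers, which is the sense in which a packed comparison stays close to machine-level operations.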


Author(s):  
Zhijie Chua ◽  
Bartosz Zarychta ◽  
Christopher G. Gianopoulos ◽  
Vladimir V. Zhurov ◽  
A. Alan Pinkerton

A high-resolution X-ray diffraction measurement of 2,5-dichloro-1,4-benzoquinone (DCBQ) at 20 K was carried out. The experimental charge density was modeled using the Hansen–Coppens multipolar expansion and the topology of the electron density was analyzed in terms of the quantum theory of atoms in molecules (QTAIM). Two different multipole models, predominantly differentiated by the treatment of the chlorine atom, were obtained. The experimental results have been compared to theoretical results in the form of a multipolar refinement against theoretical structure factors and through direct topological analysis of the electron density obtained from the optimized periodic wavefunction. The similarity of the properties of the total electron density in all cases demonstrates the robustness of the Hansen–Coppens formalism. All intra- and intermolecular interactions have been characterized.


2004 ◽  
Vol 384 (1-3) ◽  
pp. 40-44 ◽  
Author(s):  
Konstantin A Lyssenko ◽  
Mikhail Yu Antipin ◽  
Mikhail E Gurskii ◽  
Yurii N Bubnov ◽  
Anna L Karionova ◽  
...  
