Data Associations Between Two Hierarchy Trees

2018 ◽  
Vol 29 (07) ◽  
pp. 1181-1201
Author(s):  
Shuo Yan ◽  
Yunyong Zhang ◽  
Binfeng Yan ◽  
Lin Yan ◽  
Jinfeng Kou

To study data associations, a structure called a hierarchy tree is constructed. It is based on an approach to hierarchical data processing and is constituted by the different level partitions of a data set. This leads to the definition of a data association, thereby linking two hierarchy trees together. The research on data associations focuses on how to check whether data are associated with other data. The investigation covers the following issues: intuitive and formal methods for constructing hierarchy trees, a technique for making granules hierarchical, a sufficient and necessary condition for measuring the data association, an analysis showing that closer data associations rest on closer data identity, and a discussion connecting numerical information with association closeness. Crucially, hierarchical data processing and numerical information are important characteristics of the research. As an applied example, two hierarchy trees are set up, demonstrating the hierarchical granulation process of two actual data sets. Data associations between the data sets are characterized by the approach developed in this paper, which provides the basis of algorithm design for the actual problem. In particular, since the research is relevant to granules and alterations of granularity, it may offer an avenue of research on granular computing.
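
As a rough illustration of the construction, the Python sketch below encodes a hierarchy tree as a list of nested partitions and checks whether two data items share a granule at a given level. The representation and the association check are our own simplified assumptions, not the paper's formal definitions:

    # A minimal sketch: a hierarchy tree as successively finer partitions.
    def is_refinement(finer, coarser):
        """Each block of `finer` must lie inside some block of `coarser`."""
        return all(any(f <= c for c in coarser) for f in finer)

    class HierarchyTree:
        def __init__(self, levels):
            # levels[0] is the coarsest partition; each later level
            # refines the one before it.
            for a, b in zip(levels, levels[1:]):
                assert is_refinement(b, a), "levels must be nested partitions"
            self.levels = levels

    def associated(x, y, tree, depth):
        """x and y are associated at `depth` if one granule contains both."""
        return any(x in g and y in g for g in tree.levels[depth])

    data = {1, 2, 3, 4}
    tree = HierarchyTree([
        [frozenset(data)],
        [frozenset({1, 2}), frozenset({3, 4})],
        [frozenset({1}), frozenset({2}), frozenset({3}), frozenset({4})],
    ])
    print(associated(1, 2, tree, 1))  # True: same granule at level 1
    print(associated(1, 3, tree, 1))  # False: split at level 1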

Author(s):  
J.-F. Hullo

We propose a complete methodology for the fine registration and referencing of kilo-station networks of terrestrial laser scanner data, currently used for many valuable purposes such as 3D as-built reconstruction of Building Information Models (BIM) or industrial as-built mock-ups. This comprehensive target-based process aims to achieve a global tolerance below a few centimetres across a 3D network including more than 1,000 laser stations spread over 10 floors. The procedure is particularly valuable for 3D networks of indoor congested environments. In situ, the use of terrestrial laser scanners, the layout of the targets and the set-up of a topographic control network should comply with the expert methods specific to surveyors. Using parametric and reduced Gauss-Helmert models, the network is expressed as a set of functional constraints with a related stochastic model. During the post-processing phase, inspired by geodesy methods, a robust cost function is minimised. At the scale of such a data set, the complexity of the 3D network is beyond comprehension, and the surveyor, even an expert, must be supported in the analysis by digital and visual indicators. In addition to the standard indicators used for adjustment methods, including Baarda's reliability, we introduce spectral analysis tools from graph theory for identifying different types of errors or a lack of robustness of the system, as well as, in fine, documenting the quality of the registration.
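
As a hedged illustration of the graph-spectral idea, the following Python sketch (a toy formulation of our own, not the paper's implementation) computes the algebraic connectivity of a station graph; a value near zero flags a network that is close to splitting apart, i.e. a lack of robustness:

    # Stations are nodes; an edge means two stations share registration targets.
    import numpy as np

    def algebraic_connectivity(n_stations, edges):
        A = np.zeros((n_stations, n_stations))
        for i, j in edges:
            A[i, j] = A[j, i] = 1.0
        L = np.diag(A.sum(axis=1)) - A      # graph Laplacian
        eigvals = np.linalg.eigvalsh(L)     # sorted ascending
        return eigvals[1]                   # Fiedler value; ~0 => weakly connected

    # A chain of 5 stations is connected, but barely (low Fiedler value).
    print(algebraic_connectivity(5, [(0, 1), (1, 2), (2, 3), (3, 4)]))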


2016 ◽  
Vol 12 (S325) ◽  
pp. 320-323
Author(s):  
Hailong Yuan ◽  
Yanxia Zhang ◽  
Yue Wu ◽  
Yajuan Lei ◽  
Yiqiao Dong ◽  
...  

Abstract. Currently, large sky-area spectral surveys such as SDSS, 2dF, and LAMOST, using the new generation of telescopes and observatories, have provided massive spectral data sets for astronomical research. Most of the data can be handled automatically by pipelines, but visual inspection by human eyes is still necessary in several situations, such as low-SNR spectra, QSO recognition, and peculiar-spectra mining. Using ASERA, A Spectrum Eye Recognition Assistant, we can set up a team spectral-inspection platform. On a preselected spectral data set, members of a team can individually view spectra one by one, find the best-matching template, and estimate the redshift. Results from different members are then gathered and merged to improve the team's efficiency. ASERA mainly targets spectra in the SDSS and LAMOST FITS data formats; other formats can be supported with some conversion. Spectral templates from the SDSS and LAMOST pipelines are embedded, and users can easily add their own templates. Convenient cross-identification interfaces with SDSS, SIMBAD, VIZIER, NED and DSS are also provided. An application example of finding strong emission-line spectra in LAMOST DR2 is presented.
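
The core of such inspection is comparing an observed spectrum with redshifted templates. The Python sketch below illustrates template matching over a redshift grid; it is our own minimal illustration, not ASERA's actual matching procedure:

    import numpy as np

    def best_redshift(wave_obs, flux_obs, wave_tpl, flux_tpl, z_grid):
        chi2 = []
        for z in z_grid:
            # Redshift the template and resample it onto the observed grid.
            shifted = np.interp(wave_obs, wave_tpl * (1 + z), flux_tpl,
                                left=np.nan, right=np.nan)
            ok = ~np.isnan(shifted)
            # Least-squares flux scaling, then goodness of fit.
            scale = np.sum(flux_obs[ok] * shifted[ok]) / np.sum(shifted[ok] ** 2)
            chi2.append(np.sum((flux_obs[ok] - scale * shifted[ok]) ** 2))
        return z_grid[int(np.argmin(chi2))]

    # Toy check: a template "observed" at z = 0.1 is recovered from the grid.
    wt = np.linspace(4000, 7000, 1000)
    ft = np.exp(-0.5 * ((wt - 5000) / 20) ** 2)        # one emission line
    wo = np.linspace(4000, 7700, 1000)
    fo = np.interp(wo, wt * 1.1, ft)
    print(best_redshift(wo, fo, wt, ft, np.linspace(0, 0.2, 201)))  # ~0.1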


Author(s):  
Jiangping Chen ◽  
Wanshu Feng ◽  
Minghai Luo

In mining association rules, the evaluation of the rules is highly important because it directly affects the usability and applicability of the mining results. In this paper, the concept of reliability was introduced into association rule evaluation. The reliability of association rules was defined as the degree to which the rules accord with the mined data set; this degree comprises three measures, namely, the accuracy, completeness, and consistency of the rules. To show its effectiveness, the "accuracy-completeness-consistency" reliability evaluation system was applied to two very different data sets: a simulated basket data set and a multi-source lightning data fusion. Results show that the reliability evaluation system works well on both the simulated data set and the actual problem. The three-dimensional reliability evaluation can effectively detect useless rules to be screened out and add missing rules, thereby improving the reliability of the mining results. Furthermore, the proposed reliability evaluation system is applicable to many research fields; using it in analysis helps obtain more accurate, complete, and consistent association rules.
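
For readers unfamiliar with rule evaluation, the Python sketch below shows the basic machinery such a system builds on: evaluating a rule X => Y against a transaction set with the standard support and confidence measures. The paper's accuracy, completeness, and consistency measures are its own definitions and are not reproduced here:

    def evaluate_rule(antecedent, consequent, transactions):
        X, Y = set(antecedent), set(consequent)
        n = len(transactions)
        n_x = sum(X <= t for t in transactions)        # transactions with X
        n_xy = sum(X | Y <= t for t in transactions)   # transactions with X and Y
        support = n_xy / n
        confidence = n_xy / n_x if n_x else 0.0
        return support, confidence

    baskets = [{"bread", "butter"}, {"bread", "milk"},
               {"bread", "butter", "milk"}]
    print(evaluate_rule({"bread"}, {"butter"}, baskets))  # (0.666..., 0.666...)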


Author(s):  
Weiping Liu ◽  
Jennifer Fung ◽  
Craig Abbey ◽  
John W. Sedat ◽  
David A. Agard

In the electron tomographic (EM) reconstruction process, the mutual alignment between projections of different view angles is a crucial step. The routinely used alignment method is based on fiducial markers: a single-axis tilt projection series is collected with gold particles distributed on the specimen, the positions of the high-density gold beads on the projections are found, and the relationship between the specimen and digital projection coordinate systems is determined by least-squares fitting of the found bead positions. There are four alignment parameters for each projection: two shifts, an in-plane rotation, and a magnification. In three-dimensional studies of subcellular biological structures, we routinely collect data sets of more than 100 projections in the tilt range of ±75° with our automated EM set-up. Normally around 10 bead positions are used on each projection to achieve the alignment. Bead alignment used to be a laborious task, since approximately 1000 bead positions needed to be hand-picked for each data set.
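
The least-squares step can be made concrete. The Python sketch below is our illustration, using a linear parametrisation of the similarity transform to recover the four parameters (shifts tx, ty; in-plane rotation theta; magnification m) from matched bead positions:

    import numpy as np

    def fit_alignment(ref, obs):
        """Fit obs ~= m * R(theta) @ ref + t via the linear parameters
        (a, b, tx, ty) with a = m*cos(theta), b = m*sin(theta)."""
        A, y = [], []
        for (x, z), (u, v) in zip(ref, obs):
            A.append([x, -z, 1, 0]); y.append(u)
            A.append([z,  x, 0, 1]); y.append(v)
        (a, b, tx, ty), *_ = np.linalg.lstsq(np.array(A), np.array(y),
                                             rcond=None)
        return np.hypot(a, b), np.arctan2(b, a), (tx, ty)  # m, theta, shift

    ref = np.random.rand(10, 2)                 # ~10 beads per projection
    theta, m, t = 0.05, 1.02, np.array([3.0, -1.5])
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    obs = m * ref @ R.T + t
    print(fit_alignment(ref, obs))              # ~ (1.02, 0.05, (3.0, -1.5))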


Author(s):  
J. I. Peláez ◽  
J. M. Doña ◽  
D. La Red

Missing data is a common problem in real data sets, and different imputation techniques are normally used to alleviate it. Imputation is a method of filling in missing data with plausible values to produce a complete data set. In this chapter, we analyze the performance of the traditional data imputation methods. A new fuzzy imputation approach is proposed, using ordered weighted averaging (OWA) operators and the majority concept. To form the majority concept, we propose the use of neat OWA operators and linguistic quantifiers, with two fusion strategies for the aggregation operators.
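
A quantifier-guided OWA aggregation can be sketched in a few lines of Python. The example below is illustrative only; the chapter's neat OWA operators and fusion strategies are more elaborate, and the quantifier Q(r) = r**alpha is our assumed choice:

    def owa(values, alpha=2.0):
        n = len(values)
        Q = lambda r: r ** alpha                   # regular increasing quantifier
        weights = [Q((i + 1) / n) - Q(i / n) for i in range(n)]
        ordered = sorted(values, reverse=True)     # OWA sorts before weighting
        return sum(w * v for w, v in zip(weights, ordered))

    # Imputing a missing entry from the observed values of its attribute:
    print(owa([4.0, 5.0, 3.0, 4.5]))   # quantifier-guided aggregate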


2005 ◽  
Vol 5 (2) ◽  
pp. 2377-2426
Author(s):  
R. Sussmann ◽  
W. Stremme ◽  
J. P. Burrows ◽  
A. Richter ◽  
W. Seiler ◽  
...  

Abstract. Columnar NO2 retrievals from solar FTIR measurements at the Zugspitze (47.42° N, 10.98° E, 2964 m a.s.l.), Germany, were investigated synergistically with columnar NO2 retrieved from SCIAMACHY data by the University of Bremen scientific algorithm UB1.5 for the time span July 2002–October 2004. A new concept for matching FTIR data to the time of satellite overpass makes use of the NO2 daytime increase rate retrieved from the FTIR data set itself (+1.02E+14 cm-2/h). This measured increase rate shows no significant seasonal variation. SCIAMACHY data within a 200-km selection radius around the Zugspitze were considered, and a pollution-clearing scheme was developed to select only pixels corresponding to clean-background (free) tropospheric conditions and to exclude local pollution hot spots. The resulting (uncorrected) difference between SCIAMACHY and FTIR columns varies between 0.59–0.95E+15 cm-2, with an average of 0.74E+15 cm-2. A day-to-day scatter of daily means of 7–10% could be retrieved in mutual agreement from FTIR and SCIAMACHY; both data sets show sufficient precision to make this assessment. Analysis of the averaging kernels proves that, at the high mountain site, FTIR is a perfect measure of the pure stratospheric column, while SCIAMACHY shows significant tropospheric sensitivity. Based on this finding, we set up a combined a posteriori FTIR-SCIAMACHY retrieval for tropospheric NO2.
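
The time-matching concept amounts to a linear extrapolation with the retrieved rate. A minimal Python sketch follows; the rate is the abstract's value, while the function and the sample numbers are our illustration:

    RATE = 1.02e14  # NO2 daytime increase, molecules cm^-2 per hour (from abstract)

    def column_at_overpass(column_ftir, t_ftir_h, t_overpass_h, rate=RATE):
        """Shift a columnar NO2 value measured at t_ftir to the overpass time."""
        return column_ftir + rate * (t_overpass_h - t_ftir_h)

    # FTIR column of 2.5e15 cm^-2 at 08:30, SCIAMACHY overpass at 10:00:
    print(column_at_overpass(2.5e15, 8.5, 10.0))   # 2.653e15 cm^-2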


2018 ◽  
Author(s):  
Poerbandono ◽  
Philip J. Ward ◽  
Miga Magenika Julian

This paper discusses the application of global spatio-temporal climate data sets to a hydrological model operated in the Spatial Tools for River Basin Environmental Analysis and Management (STREAM) package. The study investigates the reconstruction of monthly hydrographs at several selected points in the western part of Java, Indonesia, for the period 1983-2002. Prior to the reconstruction, set-up and calibration are carried out. The set-up includes preparation of a monthly precipitation and temperature data set and a digital elevation model of the study domain, and their compilation with a land cover map. Discharge observations from six stations, located mostly in the upper parts of the major watersheds in the domain, are used to calibrate the model. It is found that the model produces results with acceptable agreement: computed and observed monthly average discharges correlate quite well, with coefficients ranging from 0.72 to 0.93, and the computed total annual average discharge at five out of six observation stations is accurate to within 7%. An optimum setting of the calibration parameters is identified. This study offers a scheme for reconstructing historical discharge from a paleo-climate perspective and, given predicted climate data sets and the geographic setting (i.e. topography and land cover), for predicting the local effect of global climate change under future scenarios.
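
The reported agreement is a correlation between computed and observed discharges, which can be checked as in the short Python sketch below (the discharge values are invented for illustration, not the study's data):

    import numpy as np

    observed = np.array([120.0, 95.0, 60.0, 45.0, 80.0, 140.0])  # m^3/s
    computed = np.array([110.0, 100.0, 55.0, 50.0, 90.0, 130.0])

    r = np.corrcoef(observed, computed)[0, 1]      # Pearson correlation
    bias = (computed.sum() - observed.sum()) / observed.sum()
    print(f"r = {r:.2f}, total-discharge bias = {bias:+.1%}")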


1999 ◽  
Vol 42 (4) ◽  
Author(s):  
M. Anzidei ◽  
P. Baldi ◽  
A. Galvani ◽  
A. Pesci ◽  
I. Hunstad ◽  
...  

On September 26, 1997, two earthquakes of Mw 5.7 (00:33 GMT) and Mw 6.0 (09:40 GMT) occurred in the Umbria-Marche region (Central Apennines, Italy). The epicentres were located in an area of the Apenninic chain that has experienced historical earthquakes up to intensity X on the MCS scale. During the time span 1992-1996, the Italian Istituto Geografico Militare (IGM) set up a new national geodetic network, measured by the Global Positioning System (GPS) space geodetic technique and consisting of more than 1200 vertices uniformly distributed over the Italian peninsula and islands. From October 7 to 11, 1997, shortly after the main shocks of the Umbria-Marche seismic sequence, we reoccupied thirteen stations belonging to the IGM and TYRGEONET networks to measure coseismic displacement. The determination of post-seismic coordinates at the 13 GPS monuments detected significant coseismic displacements. The comparison between the preseismic and postseismic data sets shows maximum displacements of 14 cm and 25 cm in the horizontal and vertical components, respectively. In this paper, the GPS network, the field work, the data processing procedures and the computed coseismic displacements measured at the geodetic monuments are discussed, with the aim of providing a data set useful to the scientific community.
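
The displacement computation itself is a differencing of the two coordinate sets, as in the toy Python sketch below (station names and coordinates are invented, not the campaign's data):

    import numpy as np

    # Local coordinates (east, north, up) in metres for each monument.
    pre  = {"SITE1": np.array([0.000,  0.000, 0.000]),
            "SITE2": np.array([1.250, -0.480, 0.020])}
    post = {"SITE1": np.array([0.083, -0.112, 0.190]),
            "SITE2": np.array([1.260, -0.455, 0.035])}

    for site in pre:
        d = post[site] - pre[site]
        horiz = np.hypot(d[0], d[1])
        print(f"{site}: horizontal {horiz*100:.1f} cm, vertical {d[2]*100:.1f} cm")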


2018 ◽  
Vol 44 (1) ◽  
pp. 52-73 ◽  
Author(s):  
Jean-Christophe Plantin

This article investigates the work of processors who curate and "clean" the data sets that researchers submit to data archives for archiving and further dissemination. Based on ethnographic fieldwork conducted at the data processing unit of a major US social science data archive, I investigate how these data processors work, under what status, and how they contribute to data sharing. The article presents two main results. First, it contributes to the study of invisible technicians in science by showing that the same procedures can make technical work invisible outside and visible inside the archive, allowing peer review and quality control. Second, it contributes to the social study of scientific data sharing by showing that the organization of data processing stems directly from the conception the archive promotes of a valid data set: one that must look "pristine" at the end of its processing. After critically interrogating this notion of pristineness, I show how it perpetuates a misleading conception of data as "raw" instead of acknowledging the important contribution of data processors to data sharing and social science.

