Selection of the Sub-noise Gain Level for Acquisition of VOCAL Data Sets: A Reliability Study

2014
Vol 40 (3)
pp. 562-567
Author(s):
Jennifer Sanderson
Linda Wu
Aditi Mahajan
Neama Meriki
Amanda Henry
...


1995
Vol 31 (2)
pp. 193-204
Author(s):
Koen Grijspeerdt
Peter Vanrolleghem
Willy Verstraete

A comparative study of several recently proposed one-dimensional sedimentation models has been made by fitting the models to steady-state and dynamic concentration profiles obtained in a down-scaled secondary decanter. The models were evaluated with several a posteriori model selection criteria. Since the models are intended for on-line simulation, calculation time was included among the selection criteria. Finally, the practical identifiability of the models for the available data sets was also investigated. It could be concluded that the model of Takács et al. (1991) gave the most reliable results.
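A minimal sketch of this kind of a posteriori model comparison, assuming synthetic settling-velocity observations: two candidate models (the Takács double-exponential function and a simpler Vesilind-type function, with illustrative parameters) are fitted to the data and ranked by AIC and fit time. None of the numbers come from the paper.

```python
# A posteriori comparison of two settling-velocity models on synthetic data.
# Parameter values and the noise level are illustrative only.
import time

import numpy as np
from scipy.optimize import curve_fit

def takacs_velocity(X, v0, rh, rp, Xmin=0.0):
    """Takács et al. (1991) double-exponential settling velocity."""
    v = v0 * (np.exp(-rh * (X - Xmin)) - np.exp(-rp * (X - Xmin)))
    return np.clip(v, 0.0, None)

def vesilind_velocity(X, v0, n):
    """Simpler single-exponential (Vesilind-type) settling velocity."""
    return v0 * np.exp(-n * X)

def aic(residuals, n_params):
    """Akaike information criterion from least-squares residuals."""
    n = residuals.size
    return n * np.log(np.sum(residuals**2) / n) + 2 * n_params

# Hypothetical settling observations: concentration [kg/m^3] vs velocity [m/h].
rng = np.random.default_rng(0)
X_obs = np.linspace(0.5, 8.0, 30)
v_obs = takacs_velocity(X_obs, 7.0, 0.4, 2.5) + rng.normal(0, 0.05, 30)

for name, model, p0 in [("Takacs", takacs_velocity, [7.0, 0.4, 2.5]),
                        ("Vesilind", vesilind_velocity, [7.0, 0.4])]:
    t0 = time.perf_counter()
    popt, _ = curve_fit(model, X_obs, v_obs, p0=p0, maxfev=10000)
    dt = time.perf_counter() - t0
    print(f"{name}: AIC={aic(v_obs - model(X_obs, *popt), len(popt)):.1f}, "
          f"fit time={dt * 1e3:.1f} ms")
```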


Author(s):
Christian Luksch
Lukas Prost
Michael Wimmer

We present a real-time rendering technique for photometric polygonal lights. Our method uses a numerical integration technique based on a triangulation to calculate noise-free diffuse shading. We include a dynamic point in the triangulation that provides a continuous near-field illumination resembling the shape of the light emitter and its characteristics. We evaluate the accuracy of our approach with a diverse selection of photometric measurement data sets in a comprehensive benchmark framework. Furthermore, we provide an extension for specular reflection on surfaces with arbitrary roughness that facilitates the use of existing real-time shading techniques. Our technique is easy to integrate into real-time rendering systems and extends the range of possible applications with photometric area lights.
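The paper's own integration scheme is not reproduced here; as a hedged illustration of analytic polygonal-light shading, the sketch below evaluates exact diffuse irradiance from a uniformly emitting convex polygon using Lambert's classical edge-integral formula. The photometric (spatially varying) emission and the dynamic triangulation point that the paper adds are omitted.

```python
# Exact diffuse irradiance from a uniform convex polygonal emitter via
# Lambert's edge-integral formula (a standard analytic result, not the
# paper's photometric integration scheme).
import numpy as np

def polygon_irradiance(p_shade, n_shade, verts, radiance=1.0):
    """Irradiance at p_shade (unit normal n_shade) from polygon `verts`."""
    u = verts - p_shade
    u = u / np.linalg.norm(u, axis=1, keepdims=True)   # project to unit sphere
    E = 0.0
    for i in range(len(u)):
        a, b = u[i], u[(i + 1) % len(u)]
        gamma = np.cross(a, b)                         # edge plane normal
        g_norm = np.linalg.norm(gamma)
        if g_norm < 1e-9:                              # degenerate edge
            continue
        theta = np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))
        E += theta * np.dot(n_shade, gamma / g_norm)
    return max(0.0, 0.5 * radiance * E)

# Unit square light 1 m above a point shaded with an upward-facing normal.
quad = np.array([[-0.5, -0.5, 1.0], [0.5, -0.5, 1.0],
                 [0.5, 0.5, 1.0], [-0.5, 0.5, 1.0]])
print(polygon_irradiance(np.zeros(3), np.array([0.0, 0.0, 1.0]), quad))
```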


2017
Vol 21 (9)
pp. 4747-4765
Author(s):
Clara Linés
Micha Werner
Wim Bastiaanssen

Abstract. The implementation of drought management plans contributes to reducing the wide range of adverse impacts caused by water shortage. A crucial element of the development of drought management plans is the selection of appropriate indicators and their associated thresholds to detect drought events and monitor their evolution. Drought indicators should be able to detect emerging drought processes that will lead to impacts with sufficient anticipation to allow measures to be undertaken effectively. However, in the selection of appropriate drought indicators, the connection to the final impacts is often disregarded. This paper explores the utility of remotely sensed data sets to detect early stages of drought at the river basin scale and to determine how much time can be gained to inform operational land and water management practices. Six remote sensing data sets with different spectral origins and measurement frequencies are considered, complemented by a group of classical in situ hydrological indicators. Their predictive power to detect past drought events is tested in the Ebro Basin. Qualitative (binary information based on media records) and quantitative (crop yields) data on drought events and impacts spanning a period of 12 years are used as a benchmark in the analysis. Results show that early signs of drought impacts can be detected up to 6 months before impacts are reported in newspapers, with the best correlation–anticipation relationships for the standardised precipitation index (SPI), the normalised difference vegetation index (NDVI) and evapotranspiration (ET). Soil moisture (SM) and land surface temperature (LST) also offer good anticipation but with weaker correlations, while gross primary production (GPP) presents moderate positive correlations only for some of the rain-fed areas. Although classical hydrological information from water levels and water flows provided better anticipation than the remote sensing indicators in most of the areas, its correlations were weaker. The indicators show consistent behaviour with respect to the different levels of crop yield in rain-fed areas among the analysed years, with SPI, NDVI and ET again providing the strongest correlations. Overall, the results confirm the ability of remote sensing products to anticipate reported drought impacts, making them a useful source of information to support drought management decisions.
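As an illustration of the correlation–anticipation analysis described above, the sketch below shifts a monthly indicator series by increasing lead times and records the correlation with an impact series at each lag. The data are synthetic stand-ins; the 4-month lag built into the toy impacts and the 0–12-month search range are assumptions, not values from the study.

```python
# Lead-time screening: correlate an indicator shifted by 0-12 months with a
# toy impact series. Data are synthetic; real SPI and impact records differ.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
months = pd.date_range("2003-01-01", periods=144, freq="MS")  # ~12 years
spi = pd.Series(rng.standard_normal(144), index=months)       # stand-in SPI
impacts = (spi.shift(4) < -1).astype(float)  # toy impacts lag SPI by 4 months

for lag in range(13):                        # candidate lead time in months
    r = spi.shift(lag).corr(impacts)         # indicator `lag` months earlier
    print(f"lead time {lag:2d} months: r = {r:+.2f}")
```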


Author(s):
Anastasiia Ivanitska
Dmytro Ivanov
Ludmila Zubik

We analyse the available methods and models for generating recommendations for potential buyers in networked information systems, with the aim of developing effective advertising-selection modules. The effectiveness of machine learning for analysing user preferences is substantiated, based on the processing of purchase data from users with similar profiles. A recommendation model based on machine learning is proposed, its operation is tested on benchmark data sets, and its adequacy is assessed using RMSE. Keywords: behavior prediction; advertising based on similarity; collaborative filtering; matrix factorization; big data; machine learning
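A minimal sketch of the matrix-factorization approach named in the keywords, assuming a toy dense rating matrix: user and item factors are trained by stochastic gradient descent and the model is scored with RMSE on held-out entries, mirroring the evaluation mentioned in the abstract. All dimensions and hyperparameters are illustrative.

```python
# Matrix factorization by SGD on a toy rating matrix, scored with RMSE.
import numpy as np

rng = np.random.default_rng(42)
n_users, n_items, k = 50, 40, 8

# Synthetic low-rank ratings plus noise stand in for real purchase data.
R = rng.normal(size=(n_users, k)) @ rng.normal(size=(n_items, k)).T \
    + rng.normal(0, 0.1, (n_users, n_items))

pairs = [(u, i) for u in range(n_users) for i in range(n_items)]
rng.shuffle(pairs)
train, test = pairs[:1600], pairs[1600:]        # hold out some entries

P = rng.normal(0, 0.1, (n_users, k))            # user factors
Q = rng.normal(0, 0.1, (n_items, k))            # item factors
lr, reg = 0.01, 0.02
for epoch in range(30):
    for u, i in train:
        err = R[u, i] - P[u] @ Q[i]
        P[u] += lr * (err * Q[i] - reg * P[u])  # SGD step on user factors
        Q[i] += lr * (err * P[u] - reg * Q[i])  # SGD step on item factors

rmse = np.sqrt(np.mean([(R[u, i] - P[u] @ Q[i]) ** 2 for u, i in test]))
print(f"held-out RMSE: {rmse:.3f}")
```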


Complexity
2018
Vol 2018
pp. 1-16
Author(s):
Yiwen Zhang
Yuanyuan Zhou
Xing Guo
Jintao Wu
Qiang He
...

The K-means algorithm is one of the ten classic algorithms in the area of data mining and has been studied by researchers in numerous fields for a long time. However, the number of clusters k in the K-means algorithm is not always easy to determine, and the selection of the initial centers is vulnerable to outliers. This paper proposes an improved K-means clustering algorithm called the covering K-means algorithm (C-K-means). The C-K-means algorithm can not only acquire efficient and accurate clustering results but also self-adaptively provide a reasonable number of clusters based on the data features. It includes two phases: initialization by the covering algorithm (CA) and the Lloyd iteration of K-means. The first phase executes the CA, which self-organizes and recognizes the number of clusters k based on the similarities in the data; it requires neither the number of clusters to be prespecified nor the initial centers to be manually selected. It therefore has a "blind" feature, that is, k is not preselected. The second phase performs the Lloyd iteration based on the results of the first phase. The C-K-means algorithm thus combines the advantages of CA and K-means. Experiments carried out on the Spark platform verify the good scalability of the C-K-means algorithm, showing that it can effectively solve the problem of large-scale data clustering. Extensive experiments on real data sets show that the C-K-means algorithm outperforms existing algorithms in both accuracy and efficiency under both sequential and parallel conditions.
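The paper's covering algorithm is not spelled out in the abstract; the sketch below is a hedged stand-in that captures the two-phase idea: a greedy covering pass chooses centers so that every point lies within an assumed radius of some center (so k emerges from the data rather than being prespecified), and Lloyd iterations then refine those centers.

```python
# Two-phase clustering: a greedy covering pass sets the number of centers,
# then Lloyd iterations refine them. The paper's actual covering algorithm
# is more elaborate; `radius` here is an assumed tuning parameter.
import numpy as np

def covering_init(X, radius):
    """Any point farther than `radius` from all current centers becomes one."""
    centers = [X[0]]
    for x in X[1:]:
        if min(np.linalg.norm(x - c) for c in centers) > radius:
            centers.append(x)
    return np.array(centers)

def lloyd(X, centers, n_iter=50):
    """Standard Lloyd iterations; empty clusters keep their previous center."""
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(len(centers))])
    return centers, labels

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.3, (100, 2)) for m in ((0, 0), (3, 3), (0, 4))])
centers = covering_init(X, radius=1.5)
centers, labels = lloyd(X, centers)
print(f"clusters found: {len(centers)}")       # typically 3 at this radius
```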


Geophysics
2016
Vol 81 (2)
pp. V141-V150
Author(s):
Emanuele Forte
Matteo Dossi
Michele Pipan
Anna Del Ben

We have applied an attribute-based autopicking algorithm to reflection seismic data with the aim of reducing the influence of the user's subjectivity on the picking results and making interpretation faster with respect to manual and semiautomated techniques. Our picking procedure uses the cosine of the instantaneous phase to automatically detect and mark as a horizon any recorded event characterized by lateral phase continuity. A patching procedure, which exploits horizon parallelism, can be used to connect consecutive horizons that mark the same event but are separated by noise-related gaps. The picking process marks all coherent events regardless of their reflection strength; therefore, a large number of independent horizons can be constructed. To facilitate interpretation, horizons marking different phases of the same reflection can be automatically grouped together, and specific horizons from each reflection can be selected using different possible methods. In the phase method, the algorithm reconstructs the reflected wavelets by averaging the cosine of the instantaneous phase along each horizon. The resulting wavelets are then locally analyzed and compared through cross-correlation, allowing the recognition and selection of specific reflection phases. When the reflected wavelets cannot be recovered due to shape-altering processing or a low signal-to-noise ratio, the energy method uses the reflection strength to group subparallel horizons within the same energy package and to select those satisfying either energy or arrival-time criteria. These methods can be applied automatically to all the picked horizons or to horizons individually selected by the interpreter for specific analysis. We show examples of application to 2D reflection seismic data sets in complex geologic and stratigraphic conditions, critically reviewing the performance of the whole process.
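A minimal sketch of the underlying attribute, assuming a synthetic 2D section: the cosine of the instantaneous phase is computed per trace from the analytic signal (Hilbert transform), which is the amplitude-independent quantity the autopicker tracks. Horizon tracking, patching and the phase/energy selection methods are beyond this sketch.

```python
# Cosine of instantaneous phase on a synthetic section with one dipping event.
import numpy as np
from scipy.signal import hilbert

n_traces, n_samples = 60, 400
t = np.arange(n_samples)
section = np.zeros((n_traces, n_samples))
for i in range(n_traces):
    t0 = 150 + i                                # linearly dipping reflection
    section[i] = np.exp(-((t - t0) / 8.0) ** 2) * np.cos(0.6 * (t - t0))
section += 0.05 * np.random.default_rng(0).standard_normal(section.shape)

analytic = hilbert(section, axis=1)            # analytic signal per trace
cos_phase = np.cos(np.angle(analytic))         # amplitude-independent attribute

# Candidate picks: samples where the attribute is near its +1 extremum.
picks = [np.flatnonzero(cos_phase[i] > 0.99) for i in range(n_traces)]
print(f"trace 0 candidate picks: {picks[0][:5]} ...")
```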


2018
Vol 11 (11)
pp. 6203-6230
Author(s):
Simon Ruske
David O. Topping
Virginia E. Foot
Andrew P. Morse
Martin W. Gallagher

Abstract. Primary biological aerosols, including bacteria, fungal spores and pollen, have important implications for public health and the environment. Such particles may have different concentrations of chemical fluorophores and will respond differently in the presence of ultraviolet light, potentially allowing different types of biological aerosol to be discriminated. The development of ultraviolet light-induced fluorescence (UV-LIF) instruments such as the Wideband Integrated Bioaerosol Sensor (WIBS) has allowed size, morphology and fluorescence measurements to be collected in real time. However, without studying instrument responses in the laboratory, it is unclear to what extent different types of particles can be discriminated. Collection of laboratory data is vital to validate any approach used to analyse the data and to ensure that the available data are utilized as effectively as possible. In this paper a variety of methodologies are tested on a range of particles collected in the laboratory. Hierarchical agglomerative clustering (HAC), which has been applied to UV-LIF data in a number of previous studies, is tested alongside other algorithms that could be used to solve the classification problem: density-based spatial clustering of applications with noise (DBSCAN), k-means and gradient boosting. Whilst HAC was able to effectively discriminate between reference narrow-size-distribution PSL particles, yielding a classification error of only 1.8 %, similar results were not obtained when testing on laboratory-generated aerosol, where the classification error was found to be between 11.5 % and 24.2 %. Furthermore, this approach carries a large uncertainty in terms of the data preparation and the cluster index used, and we were unable to attain consistent results across the different sets of laboratory-generated aerosol tested. The lowest classification errors were obtained using gradient boosting, where the misclassification rate was between 4.38 % and 5.42 %. The largest contribution to the error, in the case of the higher misclassification rate, came from the pollen samples, where 28.5 % of the samples were incorrectly classified as fungal spores. The technique was robust to changes in data preparation provided a fluorescence threshold was applied to the data. In the event that laboratory training data are unavailable, DBSCAN was found to be a potential alternative to HAC. For one of the data sets, where 22.9 % of the data were left unclassified, we were able to produce three distinct clusters, obtaining a classification error of only 1.42 % on the classified data. These results could not be replicated for the other data set, where 26.8 % of the data were not classified and a classification error of 13.8 % was obtained. This method, like HAC, also appeared to be heavily dependent on data preparation, requiring a different selection of parameters depending on the preparation used. Further analysis will also be required to confirm our selection of the parameters when using this method on ambient data. There is a clear need for the collection of additional laboratory-generated aerosol to improve the interpretation of current databases and to aid the analysis of data collected from an ambient environment. New instruments with greater resolution are likely to improve current discrimination between pollen, bacteria and fungal spores, and even between different species; however, the need for extensive laboratory data sets will grow as a result.
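As a hedged illustration of two of the approaches compared above, the sketch below runs HAC and gradient boosting on synthetic stand-ins for UV-LIF features (five columns standing in for size, asymmetry and fluorescence channels). The real WIBS data, the fluorescence threshold and the paper's preprocessing choices are not reproduced.

```python
# HAC (unsupervised) vs. gradient boosting (supervised) on synthetic
# stand-ins for UV-LIF features; three hypothetical particle classes.
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 1.0, (200, 5)) for m in (0.0, 2.5, 5.0)])
y = np.repeat([0, 1, 2], 200)
X = StandardScaler().fit_transform(X)

# Unsupervised: hierarchical agglomerative clustering with k = 3.
hac_labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)
print(f"HAC cluster sizes: {np.bincount(hac_labels)}")

# Supervised: gradient boosting with a held-out test split.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
gb = GradientBoostingClassifier().fit(X_tr, y_tr)
print(f"gradient boosting test error: {1 - gb.score(X_te, y_te):.3f}")
```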


2018
Vol 24 (1)
pp. 1-24
Author(s):
Terry R. Van Vleet
Michael J. Liguori
James J. Lynch
Mohan Rao
Scott Warder

Pharmaceutical discovery and development is a long and expensive process that, unfortunately, still results in a low success rate, with drug safety continuing to be a major impediment. Improved safety screening strategies and methods are needed to fill this critical gap more effectively. Recent advances in informatics are now making it possible to manage larger data sets and to integrate multiple sources of screening data in a manner that can potentially improve the selection of higher-quality drug candidates. Integrated screening paradigms have become the norm in Pharma, both in discovery screening and in the identification of off-target toxicity mechanisms during later-stage development. Furthermore, advances in computational methods are making in silico screens more relevant and suggest that they may represent a feasible option for augmenting the current screening paradigm. This paper outlines several fundamental methods of the current drug screening processes across Pharma and emerging techniques/technologies that promise to improve molecule selection. In addition, the authors discuss integrated screening strategies and provide examples of advanced screening paradigms.

