A Further Investigation on the Application of Critical Pore Size as an Approach for Reservoir Rock Typing

2021
Vol 143 (11)
Author(s): Mohsen Faramarzi-Palangar, Behnam Sedaee, Mohammad Emami Niri

Abstract: The correct definition of rock types plays a critical role in reservoir characterization, simulation, and field development planning. In this study, we use the critical pore size (linf) as an approach for reservoir rock typing. Two linf relations were separately derived based on two permeability prediction models and then combined into a generalized linf relation. The proposed rock typing methodology comprises two main parts: in the first, we determine an appropriate constant coefficient, and in the second, we perform reservoir rock typing under two different scenarios. The first scenario forms groups of rocks using statistical analysis, and the second forms groups of rocks with similar capillary pressure curves. This approach was applied to three data sets: two were used to determine the constant coefficient, and one was used to demonstrate the applicability of the linf method in comparison with the flow zone indicator (FZI) for rock typing.
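The abstract above benchmarks the linf approach against FZI. As background, the standard Flow Zone Indicator can be computed from routine core data; the sketch below is our own illustration (the binning rule is a hypothetical grouping convention, not the authors' method):

```python
import math

def fzi(porosity, permeability_md):
    # Flow Zone Indicator (Amaefule et al.), the baseline index the linf
    # method is compared against. porosity is a fraction (0-1),
    # permeability is in millidarcies; RQI comes out in micrometres.
    rqi = 0.0314 * math.sqrt(permeability_md / porosity)  # Reservoir Quality Index
    phi_z = porosity / (1.0 - porosity)                   # normalized porosity
    return rqi / phi_z

def rock_type(porosity, permeability_md, bin_width=0.25):
    # One simple grouping rule (illustrative): bin samples by log10(FZI)
    # so that samples with similar flow behaviour share a rock type.
    return round(math.log10(fzi(porosity, permeability_md)) / bin_width)
```

Samples falling in the same FZI bin are treated as one rock type; the paper's contribution is to perform the analogous grouping with linf instead.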

2015
Vol 17 (5)
pp. 719-732
Author(s): Dulakshi Santhusitha Kumari Karunasingha, Shie-Yui Liong

A simple clustering method is proposed for extracting representative subsets from lengthy data sets. The main purpose of the extracted subset is to use it to build prediction models (in the form of approximating functional relationships) instead of using the entire large data set. Such smaller subsets of data are often required in the exploratory analysis stages of studies that involve resource-consuming investigations. A few recent studies have used a subtractive clustering method (SCM) for such data extraction, in the absence of clustering methods for function approximation. SCM, however, requires several parameters to be specified. This study proposes a clustering method that requires only a single parameter to be specified, yet is shown to be as effective as SCM. A method to find suitable values for the parameter is also proposed. Due to having only a single parameter, the proposed clustering method is shown to be orders of magnitude more efficient than SCM. The effectiveness of the proposed method is demonstrated on phase space prediction of three univariate time series and prediction of two multivariate data sets. Some drawbacks of SCM when applied to data extraction are identified, and the proposed method is shown to be a solution for them.
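For context, the SCM baseline mentioned above (Chiu's subtractive clustering) assigns each point a "potential" proportional to local density, repeatedly selects the highest-potential point as a centre, and subtracts its influence. A minimal sketch, with our own simplifications (a fixed centre count replaces the usual acceptance thresholds):

```python
import math

def subtractive_clustering(points, ra=0.5, rb=None, n_centers=2):
    # Minimal sketch of subtractive clustering. ra is the neighbourhood
    # radius; rb (typically 1.5 * ra) controls how strongly an accepted
    # centre suppresses nearby candidates.
    if rb is None:
        rb = 1.5 * ra
    alpha, beta = 4.0 / ra ** 2, 4.0 / rb ** 2
    d2 = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b))
    # potential of each point = density of its neighbourhood
    pot = [sum(math.exp(-alpha * d2(p, q)) for q in points) for p in points]
    centers = []
    for _ in range(n_centers):
        i = max(range(len(points)), key=lambda k: pot[k])
        peak = pot[i]
        centers.append(points[i])
        # subtract the chosen centre's influence from every potential
        pot = [p - peak * math.exp(-beta * d2(points[i], q))
               for p, q in zip(pot, points)]
    return centers

# Two dense regions around (0, 0) and (10, 10)
pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
       (10.0, 10.0), (10.1, 10.0), (10.0, 10.1)]
centers = subtractive_clustering(pts, ra=2.0, n_centers=2)
```

Even this stripped-down version exposes the parameter burden (ra, rb, and in the full method acceptance thresholds) that the proposed single-parameter method avoids.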


Author(s): Soumya Raychaudhuri

The genomics era has presented many new high-throughput experimental modalities that are capable of producing large amounts of data on comprehensive sets of genes. In time there will certainly be many more new techniques that explore new avenues in biology. In any case, textual analysis will remain an important aspect of that work. The body of peer-reviewed scientific literature represents all of our accomplishments in biology, and it plays a critical role in hypothesizing about and interpreting any data set. To ignore it altogether is tantamount to reinventing the wheel with each analysis. The volume of relevant literature has grown to proportions where it is all but impossible to search through all of it manually. Instead we must often rely on automated text mining methods to access the literature efficiently and effectively. The methods we present in this book provide an introduction to the avenues one can employ to include text in a meaningful way in the analysis of these functional genomics data sets. They serve as a complement to statistical methods such as classification and clustering that are commonly employed to analyze data sets. We are hopeful that this book will encourage the reader to utilize and further develop text mining in their own analyses.


10.29007/rh9l
2019
Author(s): Cuauhtémoc López-Martín

Defect density (DD) is a measure used to determine the effectiveness of software processes; it is defined as the total number of defects divided by the size of the software. Defect prediction, in turn, is an activity of software project planning. This study analyzes the attributes of data sets commonly used for building DD prediction models. The data sets of software projects were selected from the International Software Benchmarking Standards Group (ISBSG) Release 2018. The selection criteria were based on attributes such as type of development, development platform, and programming language generation, as suggested by the ISBSG. Since applying these criteria reduces the size of the resulting data sets, it hampers good generalization of the models. Therefore, in this study, a statistical analysis of the data sets was performed with the objective of determining whether they could be pooled rather than used as separate data sets. Results showed that there was no difference among the DD of new projects, nor among the DD of enhancement projects, but that there was a difference between the DD of new and enhancement projects. These results suggest that prediction models can be constructed separately for new projects and enhancement projects, but not by pooling new and enhancement ones.
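The defect-density definition above is straightforward to operationalize. The toy numbers below are hypothetical and only illustrate the kind of per-group comparison the study performs before deciding whether new and enhancement projects may be pooled:

```python
def defect_density(defects, size_fp):
    # DD as defined above: total defects divided by software size
    # (here functional size in function points, per ISBSG convention).
    return defects / size_fp

# Hypothetical mini-sample: (defects, size) pairs for the two project types
new_projects = [(10, 200), (7, 150), (12, 260)]
enh_projects = [(30, 210), (25, 140), (28, 180)]

dd_new = [defect_density(d, s) for d, s in new_projects]
dd_enh = [defect_density(d, s) for d, s in enh_projects]
```

In the study the pooling decision rests on a formal statistical test over such per-group DD samples, not on the simple inspection shown here.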


2020
Vol 10 (8)
pp. 2725-2739
Author(s): Diego Jarquin, Reka Howard, Jose Crossa, Yoseph Beyene, Manje Gowda, ...

“Sparse testing” refers to reduced multi-environment breeding trials in which not all genotypes of interest are grown in each environment. Using genomic-enabled prediction and a model embracing genotype × environment interaction (GE), the non-observed genotype-in-environment combinations can be predicted; consequently, overall costs can be reduced and testing capacities increased. The accuracy of predicting the unobserved data depends on several factors, including (1) how many genotypes overlap between environments, (2) in how many environments each genotype is grown, and (3) which prediction method is used. In this research, we studied the predictive ability obtained when using a fixed number of plots under different sparse testing designs. The designs considered included the two extreme cases of no overlap of genotypes between environments and complete overlap of the genotypes between environments; in the latter case, the prediction set consists entirely of genotypes that have not been tested at all. We also moved gradually from one extreme to the other by considering intermediate designs with varying numbers of non-overlapping (NO) and overlapping (O) genotypes. The empirical study is built upon two maize hybrid data sets consisting of different genotypes crossed to two testers (T1 and T2); each data set was analyzed separately, and for each set phenotypic records on yield from three environments are available. Three prediction models were implemented: two main-effects models (M1 and M2) and a model (M3) including GE. The results showed that the genome-based model including GE (M3) captured more phenotypic variation than the models without this component. Also, M3 provided higher prediction accuracy than models M1 and M2 across the different allocation scenarios.
Reducing the size of the calibration sets decreased the prediction accuracy under all allocation designs, with M3 being the least affected model; however, using the genome-enabled models (i.e., M2 and M3), the predictive ability is recovered when more genotypes are tested across environments. Our results indicate that a substantial part of the testing resources can be saved when using genome-based models including GE to optimize sparse testing designs.
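The allocation scenarios described above can be made concrete with a small sketch. This is our own illustration, not the paper's algorithm: the function names and the rounding rule are assumptions, but setting overlap_frac to 0 or 1 reproduces the two extreme designs:

```python
import random

def sparse_allocation(genotypes, environments, overlap_frac, plots_per_env, seed=0):
    # A fraction of each environment's plots goes to a shared set of
    # genotypes grown in every environment (the "overlapping" genotypes);
    # the remaining plots are unique to that environment.
    rng = random.Random(seed)
    pool = list(genotypes)
    rng.shuffle(pool)
    n_overlap = int(round(overlap_frac * plots_per_env))
    shared = pool[:n_overlap]          # grown in every environment
    rest = pool[n_overlap:]
    n_unique = plots_per_env - n_overlap
    design = {}
    for i, env in enumerate(environments):
        unique = rest[i * n_unique:(i + 1) * n_unique]
        design[env] = shared + unique
    return design

# 30 genotypes, 3 environments, 10 plots each, half the plots overlapping
design = sparse_allocation(range(30), ["E1", "E2", "E3"],
                           overlap_frac=0.5, plots_per_env=10)
```

With overlap_frac = 0 no genotype is repeated across environments; with overlap_frac = 1 the same genotypes are grown everywhere and all remaining genotypes form the fully untested prediction set.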


2021
Author(s): Alessandra Toniato, Philippe Schwaller, Antonio Cardinale, Joppe Geluykens, Teodoro Laino

Existing deep learning models applied to reaction prediction in organic chemistry can reach high levels of accuracy (>90% for Natural Language Processing-based ones). With no chemical knowledge embedded other than the information learnt from reaction data, the quality of the data sets plays a crucial role in the performance of the prediction models. While human curation is prohibitively expensive, unaided approaches to remove chemically incorrect entries from existing data sets are essential for improving the performance of artificial intelligence models in synthetic chemistry tasks. Here we propose a machine learning-based, unassisted approach to remove chemically wrong entries from chemical reaction collections. We applied this method to the Pistachio collection of chemical reactions and to an open data set, both extracted from USPTO (United States Patent and Trademark Office) patents. Our results show improved prediction quality for models trained on the cleaned and balanced data sets. For the retrosynthetic models, the round-trip accuracy metric grows by 13 percentage points and the cumulative Jensen-Shannon divergence decreases by 30% compared to its original record. The coverage remains high at 97%, and the class diversity is not affected by the cleaning. The proposed strategy is the first unassisted, rule-free technique to address automatic noise reduction in chemical data sets.
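The round-trip accuracy metric cited above is easy to state in code: a retrosynthesis prediction counts as correct when the forward model maps the proposed precursors back to the original product. A schematic sketch, with plain callables and toy strings standing in for the trained models and real molecules:

```python
def round_trip_accuracy(products, retro_model, forward_model):
    # A retro prediction is a hit when forward(retro(product)) == product.
    hits = sum(1 for p in products if forward_model(retro_model(p)) == p)
    return hits / len(products)

# Toy stand-ins for trained seq2seq models (not real chemistry):
retro = {"A": "x", "B": "y", "C": "z"}.get      # product -> precursors
forward = {"x": "A", "y": "B", "z": "C?"}.get   # precursors -> product
```

Here the round trip succeeds for products "A" and "B" but fails for "C", giving an accuracy of 2/3; the paper reports this metric before and after cleaning the training data.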


2019
Vol 10 (1)
Author(s): Dipendra Jha, Kamal Choudhary, Francesca Tavazza, Wei-keng Liao, Alok Choudhary, ...

Abstract: The current predictive modeling techniques applied to Density Functional Theory (DFT) computations have helped accelerate the process of materials discovery by providing significantly faster methods to screen materials candidates, thereby reducing the search space for future DFT computations and experiments. However, in addition to prediction error against DFT-computed properties, such predictive models also inherit the DFT-computation discrepancies against experimentally measured properties. To address this challenge, we demonstrate that using deep transfer learning, existing large DFT-computational data sets (such as the Open Quantum Materials Database (OQMD)) can be leveraged together with other smaller DFT-computed data sets as well as available experimental observations to build robust prediction models. We build a highly accurate model for predicting the formation energy of materials from their compositions; using an experimental data set of 1,643 observations, the proposed approach yields a mean absolute error (MAE) of 0.07 eV/atom, which is significantly better than existing machine learning (ML) prediction models based on DFT computations and is comparable to the MAE of DFT computation itself.
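The transfer-learning idea above, pretraining on a large DFT set and then fine-tuning on scarce experimental data to absorb the systematic DFT-versus-experiment discrepancy, can be illustrated with a deliberately tiny model: a 1-D linear fit via gradient descent stands in for the deep network, and all data below are synthetic:

```python
def fit_linear(xs, ys, w=0.0, b=0.0, lr=0.01, epochs=2000):
    # Gradient descent on mean squared error. Passing in (w, b) from a
    # previous fit is the "transfer" step: the model is warm-started
    # instead of trained from scratch.
    n = len(xs)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def mae(xs, ys, w, b):
    # Mean absolute error of the linear model on (xs, ys).
    return sum(abs(w * x + b - y) for x, y in zip(xs, ys)) / len(xs)

# Synthetic stand-ins: the "DFT" data carries a systematic +0.3 offset
# relative to the "experimental" ground truth.
dft_x = [0.1 * i for i in range(50)]
dft_y = [1.5 * x + 0.3 for x in dft_x]
exp_x = [0.5, 1.5, 2.5, 3.5]
exp_y = [1.5 * x for x in exp_x]

w0, b0 = fit_linear(dft_x, dft_y)          # pretrain on the large DFT set
w1, b1 = fit_linear(exp_x, exp_y, w0, b0)  # fine-tune on scarce experiments
```

Warm-starting the second fit from (w0, b0) is the transfer step; in the paper the analogue is initializing the deep network with OQMD-trained weights before fine-tuning, so that the small experimental set only has to correct the systematic offset.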


2009
Vol 111 (7)
pp. 1583-1618
Author(s): Louise B. Jennings, Heidi Mills

Background/Context In an age of test-driven accountability, many schools are returning to banking pedagogies in which students passively take in content. Inquiry-based instruction offers one approach for actively involving students in meaningful learning activity; however, research on inquiry pedagogies often focuses on academic accomplishments alone. Our study examines how inquiry-based dialogue supports not only academic learning but also social learning as students and teachers negotiate, share ideas, collaborate, and problem-solve together. Purpose This longitudinal study builds on conceptualizations of dialogic inquiry to examine how teachers and students coconstructed a discourse of inquiry in a public magnet school. We examine the processes and practices that make up this discourse of inquiry and study the function of teacher talk in supporting academic and social learning and agency among students. Setting The Center for Inquiry is a public magnet elementary school located in an ethnically diverse suburban community that was formed in partnership with the University of South Carolina. Participants Participants included teachers and 135 students (65% European American, 30% African American) studied during a 5-year period. Research Design The two authors worked collaboratively with school members to collect two related ethnographic data sets. Data Set 1 captured classroom practices across all six classrooms, and Data Set 2 followed one cohort during the same 5-year period. Findings Findings are presented in two sections. The first section presents a discourse of inquiry made up of six interacting practices of inquiry constructed by teachers and students across classrooms.
This discourse of inquiry integrates academic and social practices that position inquiry as (1) dynamic and dialogic; (2) attentive, probing, and thoughtful; (3) agentive and socially responsible; (4) relational and compassionate; (5) reflective and reflexive; and (6) valuing multiple and interdisciplinary perspectives. The second section makes visible how these practices of inquiry were coconstructed, through transcripts of classroom discourse drawn from both data sets that centered on discussions of life science. Conclusions/Recommendations This discourse of inquiry supports students as active, thoughtful, engaged learners and community members and underscores the critical role of classroom talk, collaboration, and deliberation in meaningful learning engagements. Although teachers and students alike took on multiple roles and responsibilities through inquiry, the teacher's discourse was critical in supporting and extending student learning. We recommend professional development opportunities that equip preservice and in-service teachers with the resources, skills, and dispositions to become active inquirers into their own classroom and school practices and to recognize the power of classroom talk to shape and limit possibilities.


2021
Vol 10 (02)
pp. 170-186
Author(s): Normadiah Mahiddin, Zulaiha Ali Othman, Nur Arzuar Abdul Rahim

Diabetes is one of the growing chronic diseases, and proper treatment is needed to manage its effects. Past studies have proposed the Interrelated Decision-making Model (IDM) as an intelligent decision support system (IDSS) solution for healthcare; this model can provide accurate results in determining the treatment of a particular patient. The purpose of this study is therefore to develop a diabetic IDM to assess the increase in decision-making accuracy achievable with the IDM concept. The IDM concept allows the amount of data to grow through the addition of data records at the same level of care, and through the addition of records and attributes from the previous or subsequent levels of care; the more data or information available, the more accurate a decision can be made. Data were developed to make diagnostic predictions for each stage of care in the development of type 2 diabetes, and the data for each stage of care were confirmed by specialists. However, the experiments were performed using simulation data for two stages of care only. Four data sets of different sizes were prepared to observe changes in prediction accuracy. Each data set contained two subsets, one at the primary-care level and one at the secondary-care level, with the number of attributes varied four times from 25 to 58 and the number of records from 300 to 11,000. The experimental results showed that, on average, the J48 algorithm produced the best model (99%), followed by Logistic (98%), RandomTree (95%), NaiveBayesUpdateable (93%), BayesNet (84%), and AdaBoostM1 (67%). Ratio analysis also showed that the accuracy of the prediction model increased by up to 49%. The MAPKB model for diabetes care is designed with dynamic data-change criteria and is able to develop up-to-date dynamic prediction models effectively.


Nadwa
2014
Vol 8 (2)
pp. 193
Author(s): Indra Kusuma

This paper describes the development of a planning model for data sets and instrumentation applications based on the seventeen-plus pattern for guidance and counselling (BK) teachers in SMP/MTs schools in Bondowoso. The results indicate that a counselling activity planning model using the seventeen-plus pattern approach is very much needed so that its implementation matches students' needs. Many BK teachers in SMP/MTs of Bondowoso still do not possess the range of data that should be held for the provision of counselling services. The teachers consider it important to have a variety of data sets and instrumentation applications for the smooth running of counselling services (score = 3.23), whereas the evaluation of the implementation of existing data sets and application instruments was still very low (score = 1.14). The BK teachers expressed a strong need for the proposed planning model for data-set and instrumentation-application activities (score = 4.28), and the promoted planning model was rated excellent (score = 4.47).


2020
Vol 15
Author(s): Pratik Joshi, V Masilamani, Raj Ramesh

Background: Preventing adverse drug reactions (ADRs) is imperative for people's safety. Under-reporting of ADRs has been prevalent across the world, making it difficult to develop unbiased prediction models. As a result, most models are skewed towards the negative samples, leading to high accuracy but poor performance on other metrics such as precision, recall, F1 score, and AUROC. Objective: In this work, we propose a novel way of predicting ADRs by balancing the data set. Method: The whole data set is partitioned into smaller balanced data sets. SVMs with optimal kernels are learned on each of the balanced data sets, and the prediction of a given ADR for a given drug is obtained by voting over the ensemble of learned SVMs. Results: The results are encouraging and comparable with competing methods in the literature, with an average sensitivity of 0.97 across all ADRs. The model has been interpreted and explained with SHAP values through various plots.
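The Method paragraph above can be sketched in a few lines. A trivial 1-D nearest-class-mean rule stands in for the per-subset SVM here; the feature values and the centroid rule are our illustrative assumptions, while the paper uses real drug features and kernel SVMs:

```python
from statistics import mean

def balanced_partitions(pos, neg):
    # Split the larger (negative) class into chunks the size of the
    # positive class, pairing each chunk with all positives -- the
    # balancing step described above.
    k = len(pos)
    return [(pos, neg[i:i + k]) for i in range(0, len(neg) - k + 1, k)]

def train_centroid(pos, neg):
    # Stand-in for the per-subset SVM: classify by nearest class mean.
    mp, mn = mean(pos), mean(neg)
    return lambda x: 1 if abs(x - mp) < abs(x - mn) else 0

def ensemble_predict(models, x):
    # Majority vote over the per-subset models.
    votes = sum(m(x) for m in models)
    return 1 if votes * 2 > len(models) else 0

pos = [9.0, 10.0, 11.0]                               # minority: drug causes the ADR
neg = [0.0, 1.0, 2.0, 0.5, 1.5, 2.5, 1.0, 2.0, 0.0]  # majority: no ADR
models = [train_centroid(p, n) for p, n in balanced_partitions(pos, neg)]
```

Each model sees a balanced view of the data, so no single classifier is dominated by the majority class, and the vote aggregates them into one prediction per drug-ADR pair.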

