Collecting and evaluating large volumes of bibliographic metadata aggregated in the WorldCat database: a proposed methodology to overcome challenges

2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Vyacheslav I. Zavalin ◽  
Shawne D. Miksa

Purpose This paper aims to discuss the challenges encountered in collecting, cleaning and analyzing a large data set of bibliographic metadata records in machine-readable cataloging (MARC 21) format. Possible solutions are presented. Design/methodology/approach This mixed-methods study relied on content analysis and social network analysis. The study examined subject representation in MARC 21 metadata records created in 2020 in WorldCat – the largest international database of “big smart data.” The methodological challenges that were encountered and solutions are examined. Findings In this general review paper with a focus on methodological issues, the discussion of challenges is followed by a discussion of solutions developed and tested as part of this study. Data collection, processing, analysis and visualization are addressed separately. Lessons learned and conclusions related to challenges and solutions for the design of a large-scale study evaluating MARC 21 bibliographic metadata from WorldCat are given. Overall recommendations for the design and implementation of future research are suggested. Originality/value There are no previous publications that address the challenges and solutions of data collection and analysis of WorldCat’s “big smart data” in the form of MARC 21 data. This is the first study to use a large data set to systematically examine MARC 21 library metadata records created after the most recent addition of new fields and subfields to the MARC 21 Bibliographic Format standard in 2019 based on Resource Description and Access rules. It is also the first to focus its analyses on the networks formed by subject terms shared by MARC 21 bibliographic records in a data set extracted from the heterogeneous centralized database WorldCat.
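
The idea of a network formed by subject terms shared between records can be illustrated with a minimal sketch. The record identifiers, the toy subject terms and the function name `shared_subject_edges` are all hypothetical and are not taken from the paper; the sketch only assumes that each record has already been reduced to a set of subject strings (for example, extracted from MARC 21 6XX fields).

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: record id -> set of subject terms.
records = {
    "rec1": {"Metadata", "Cataloging"},
    "rec2": {"Cataloging", "Libraries"},
    "rec3": {"Astronomy"},
}


def shared_subject_edges(records):
    """Return {(rec_a, rec_b): n} where n is the number of subject terms
    the two records share. Records with no shared term get no edge."""
    # Invert the mapping: subject term -> records that carry it.
    term_to_records = defaultdict(set)
    for rec_id, terms in records.items():
        for term in terms:
            term_to_records[term].add(rec_id)

    # Every pair of records under the same term gains one unit of edge weight.
    edges = defaultdict(int)
    for rec_ids in term_to_records.values():
        for a, b in combinations(sorted(rec_ids), 2):
            edges[(a, b)] += 1
    return dict(edges)


edges = shared_subject_edges(records)
```

Inverting the mapping first avoids comparing every pair of records directly, which may matter for large data sets; whether the study itself used this approach is not stated in the abstract.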

2016 ◽  
Vol 8 (2) ◽  
pp. 137-172 ◽  
Author(s):  
Diana M. Hechavarría

Purpose Drawing on the multiplicity of context approach, this study investigates whether female entrepreneurs are more likely than male entrepreneurs to create environmentally oriented organizations. This study aims to examine how context, measured by gender socialization stereotypes and post-materialism, differentially affects the kinds of organizations entrepreneurs choose to create. Design/methodology/approach To test the hypotheses, this study utilizes Global Entrepreneurship Monitor data from 2009 (n = 17,364) for nascent entrepreneurs, baby business owners and established business owners in 47 countries. This study also utilizes the World Values Surveys to measure gender ideologies and post-materialist cultural values at the country level. To test the hypotheses, a logistic multi-level model is estimated to identify the drivers of environmental venturing. Data are nested by countries, and this allows random intercepts by countries with a variance components covariance structure. Findings Findings indicate that female entrepreneurs are more likely to engage in ecological venturing. Societies with high levels of post-materialist national values are significantly more likely to affect female entrepreneurs to engage in environmental ventures when compared to male entrepreneurs. Moreover, traditional gender socialization stereotypes decrease the probability of engaging in environmental entrepreneurship. Likewise, female entrepreneurs in societies with strong stereotypes regarding gender socialization will more likely engage in environmental entrepreneurship than male entrepreneurs. Research limitations/implications The present study uses a gender analysis approach to investigate empirical differences in environmental entrepreneurial activity based on biological sex. However, this research assumes that gender is the driver behind variations in ecopreneurship emphasis between the engagement of males and females in venturing activity.
The findings suggest that female entrepreneurs pursuing ecological ventures are more strongly influenced by contextual factors, when compared to male entrepreneurs. Future research can build upon these findings by applying a more nuanced view of gender via constructivist approaches. Originality/value This study is one of the few to investigate ecologically oriented ventures with large-scale empirical data by utilizing a 47-country data set. As a result, it begins to open the black box of environmental entrepreneurship by investigating the role of gender, seeking to understand if men and women entrepreneurs equally engage in environmental venturing. And it responds to calls that request more research at the intersection of gender and context in terms of environmental entrepreneurship.


2020 ◽  
Vol 22 (34) ◽  
pp. 19326-19341
Author(s):  
Aditya Nandy ◽  
Daniel B. K. Chu ◽  
Daniel R. Harper ◽  
Chenru Duan ◽  
Naveen Arunachalam ◽  
...  

The origin of distinct 3d vs. 4d transition metal complex sensitivity to exchange is explored over a large data set.


2019 ◽  
Vol 32 (4) ◽  
pp. 1523-1538 ◽  
Author(s):  
Sérgio Moro ◽  
Joaquim Esmerado ◽  
Pedro Ramos ◽  
Bráulio Alturas

Purpose This paper aims to propose a data mining approach to evaluate a conceptual model in tourism, encompassing a large data set characterized by dimensions grounded on existing literature. Design/methodology/approach The approach is tested using a guest satisfaction model encompassing nine dimensions. A large data set of 84k online reviews and 31 features was collected from TripAdvisor. The review score granted was considered a proxy of guest satisfaction and was defined as the target feature to model. A sequence of data understanding and preparation tasks led to a tuned set of 60k reviews and 29 input features which were used for training the data mining model. Finally, the data-based sensitivity analysis was adopted to understand which dimensions most influence guest satisfaction. Findings Users’ previous experience with the online platform, individual preferences, and hotel prestige were the most relevant dimensions concerning guests’ satisfaction. Conversely, homogeneous characteristics among the Las Vegas hotels such as the hotel size were found to be of little relevance to satisfaction. Originality/value This study intends to set a baseline for an easier adoption of data mining to evaluate conceptual models through a scalable approach, helping to bridge theory and practice, which is especially relevant when dealing with Big Data sources such as social media. Thus, the steps undertaken during the study are detailed to facilitate replication to other models.


2017 ◽  
Vol 11 (4) ◽  
pp. 138-147 ◽  
Author(s):  
Nina Evans ◽  
Rae Baines

Purpose The purpose of this paper is to explore a large data set compiled by a UK charity loan scheme to identify trends and paint a practice-based picture of how young children use early years powered mobility (EYPM). Design/methodology/approach Statistical analysis was used to investigate a database of 90 children, ranging in age from 15 to 72 months who completed use of an EYPM device (the Wizzybug, or WB) between April 2011 and December 2015. Goals were set and reviewed, and thematic analysis was used to understand families’ insights into their children’s use of EYPM, using a free-text review form. Findings Children’s mean age when joining this free loan scheme was 39.6 months. The later the child started using a Wizzybug, the less likely they were to achieve their goals. A theme of happiness and enjoyment emerged as important for both child and family. The child’s independence translated to independence for the whole family. Research limitations/implications The database was operational and incomplete. Lack of a standardised outcome measure was disadvantageous. Practical implications Challenges of translating research knowledge into practice are highlighted, supporting the need for more rigorous and standardised outcome measures. Earlier identification of children’s readiness for EYPM is required alongside research and recognition of the holistic benefits of EYPM for all the family. Originality/value This research profited from a large data set of young children with long-term access to powered mobility at home.


2020 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Rishi Dwesar ◽  
Debajani Sahoo

Purpose Increased global air travel and competition in the airline industry entail better service delivery and failure management. This study examines how airline type, failure criticality and the traveller's culture influence travellers' airline evaluations of service failure. Design/methodology/approach The study uses a large data set of customers' online reviews and incorporates quantitative and qualitative feedback from 20 major airlines across the world. Semantic tagging, sentiment and multivariate analyses have been used to analyse the data. Findings Failure criticality and travellers' cultural backgrounds significantly affect airline evaluations after service failures. Moreover, failure criticality influences evaluations of travellers from individualistic cultures more severely. Contrary to expectations, full-service airlines were evaluated positively after less critical service failures. Practical implications The findings support that customers undergo different emotional states when they experience service failure. Understanding these internal emotional sensitivities and how services would be judged by travellers across cultures can help airlines to better manage their service recovery efforts and to strategise prioritisation of scarce resources. Originality/value Though airline service failure has been well researched, this study examines the role of culture in service failure evaluations. The study uses a novel method to analyse a large data set of both quantitative and qualitative traveller feedback useful in service recovery management.


2017 ◽  
Vol 83 (21) ◽  
Author(s):  
Benjamin J. K. Davis ◽  
John M. Jacobs ◽  
Meghan F. Davis ◽  
Kellogg J. Schwab ◽  
Angelo DePaola ◽  
...  

ABSTRACT Vibrio parahaemolyticus naturally occurs in brackish and marine waters and is one of the leading causes of seafood-borne illness. Previous work studying the ecology of V. parahaemolyticus has often been limited in geographic extent and lacked a full range of environmental measures. This study used a unique large data set of surface water samples in the Chesapeake Bay (n = 1,385) collected from 148 monitoring stations from 2007 to 2010. Water was analyzed for more than 20 environmental parameters, with additional meteorological and surrounding land use data. The V. parahaemolyticus-specific genetic markers thermolabile hemolysin (tlh), thermostable direct hemolysin (tdh), and tdh-related hemolysin (trh) were assayed using quantitative PCR (qPCR), and interval-censored regression models with nonlinear effects were estimated to account for limits of detection and quantitation. tlh was detected in 19.6% of water samples; tdh or trh markers were not detected. The results confirmed previously reported positive associations for V. parahaemolyticus abundance with temperature and turbidity and negative associations with high salinity (>10 to 23‰). Furthermore, the salinity relationship was determined to be a function of both low temperature and turbidity, with an increase of either nullifying the high salinity effect. Associations with dissolved oxygen and phosphate also appeared stronger when samples were taken near human developments. A renewed focus on the V. parahaemolyticus ecological paradigm is warranted to protect public health. IMPORTANCE Vibrio parahaemolyticus is one of the leading causes of seafood-borne illness in the United States and across the globe. Exposure is often through consuming raw or undercooked shellfish. Given the natural presence of the bacterium in the marine environment, an improved understanding of its environmental determinants is necessary for future preventative measures. 
This analysis of environmental Vibrio parahaemolyticus is one of only a few that utilize a large data set measured over a wide geographic and temporal range. The analysis also includes a large number of environmental parameters for Vibrio modeling, many of which have previously only been tested sporadically, and some of which have not been considered before. The results of the analysis revealed previously unknown relationships between salinity, turbidity, and temperature that provide significant insight into the abundance and persistence of V. parahaemolyticus bacterium in the environment. This information will be essential for developing environmental forecast models for the bacterium.


2020 ◽  
Vol 47 (3) ◽  
pp. 547-560 ◽  
Author(s):  
Darush Yazdanfar ◽  
Peter Öhman

Purpose The purpose of this study is to empirically investigate determinants of financial distress among small and medium-sized enterprises (SMEs) during the global financial crisis and post-crisis periods. Design/methodology/approach Several statistical methods, including multiple binary logistic regression, were used to analyse a longitudinal cross-sectional panel data set of 3,865 Swedish SMEs operating in five industries over the 2008–2015 period. Findings The results suggest that financial distress is influenced by macroeconomic conditions (i.e. the global financial crisis) and, in particular, by various firm-specific characteristics (i.e. performance, financial leverage and financial distress in previous year). However, firm size and industry affiliation have no significant relationship with financial distress. Research limitations Due to data availability, this study is limited to a sample of Swedish SMEs in five industries covering eight years. Further research could examine the generalizability of these findings by investigating other firms operating in other industries and other countries. Originality/value This study is the first to examine determinants of financial distress among SMEs operating in Sweden using data from a large-scale longitudinal cross-sectional database.


Author(s):  
Lior Shamir

Abstract Several recent observations using large data sets of galaxies showed non-random distribution of the spin directions of spiral galaxies, even when the galaxies are too far from each other to have gravitational interaction. Here, a data set of $\sim8.7\cdot10^3$ spiral galaxies imaged by the Hubble Space Telescope (HST) is used to test and profile a possible asymmetry between galaxy spin directions. The asymmetry between galaxies with opposite spin directions is compared to the asymmetry of galaxies from the Sloan Digital Sky Survey (SDSS). The two data sets contain different galaxies at different redshift ranges, and each data set was annotated using a different annotation method. The results show that both data sets show a similar asymmetry in the COSMOS field, which is covered by both telescopes. Fitting the asymmetry of the galaxies to cosine dependence shows a dipole axis with probabilities of $\sim2.8\sigma$ and $\sim7.38\sigma$ in HST and SDSS, respectively. The most likely dipole axis identified in the HST galaxies is at $(\alpha=78^\circ,\delta=47^\circ)$ and is well within the $1\sigma$ error range compared to the location of the most likely dipole axis in the SDSS galaxies with $z>0.15$, identified at $(\alpha=71^\circ,\delta=61^\circ)$.
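
As a rough illustration of what fitting spin directions to a cosine dependence can mean, the sketch below computes a least-squares amplitude for the model spin ≈ A·cos(φ), where φ is each galaxy's angular distance from a candidate axis and spin is +1 or −1. This is an assumption-laden toy, not the paper's procedure: the function name `fit_cosine_amplitude`, the encoding of spins as ±1 and the one-parameter model are all hypothetical.

```python
import math


def fit_cosine_amplitude(angles_deg, spins):
    """Least-squares amplitude A for the model spin ~ A * cos(angle).

    angles_deg: angular distance of each galaxy from a candidate axis, in degrees.
    spins: +1 or -1 per galaxy (hypothetical encoding of the two spin directions).
    """
    cosines = [math.cos(math.radians(a)) for a in angles_deg]
    numerator = sum(s * c for s, c in zip(spins, cosines))
    denominator = sum(c * c for c in cosines)
    if denominator == 0.0:
        raise ValueError("all angles are perpendicular to the axis; fit undefined")
    # Closed-form minimiser of sum((s - A*c)^2) with respect to A.
    return numerator / denominator


# Spins perfectly aligned with the axis give A = 1; uncorrelated spins give A = 0.
aligned = fit_cosine_amplitude([0.0, 180.0], [1, -1])
uncorrelated = fit_cosine_amplitude([0.0, 180.0], [1, 1])
```

In a fuller analysis one would presumably repeat such a fit over many candidate axes and compare against randomised spin assignments to estimate significance; the abstract does not give those details, so they are left out here.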


2021 ◽  
pp. 102586
Author(s):  
Chuanjun Du ◽  
Ruoying He ◽  
Zhiyu Liu ◽  
Tao Huang ◽  
Lifang Wang ◽  
...  

2017 ◽  
Vol 128 (1) ◽  
pp. 243-250 ◽  
Author(s):  
Mark L. Scheuer ◽  
Anto Bagic ◽  
Scott B. Wilson
