Mining Connected Vehicle Data for Beneficial Patterns in Dubai Taxi Operations

2018 ◽  
Vol 2018 ◽  
pp. 1-8 ◽  
Author(s):  
Raj Bridgelall ◽  
Pan Lu ◽  
Denver D. Tolliver ◽  
Tai Xu

On-demand shared mobility services such as Uber and microtransit are steadily penetrating the worldwide market for traditional dispatched taxi services, so taxi companies are seeking ways to compete. This study mined large-scale mobility data from connected taxis to discover beneficial patterns that may inform strategies to improve the dispatched taxi business. Manually cleaning and filtering large-scale mobility data that contain GPS information is impractical. Therefore, this research contributes and demonstrates an automated data cleaning and filtering method suited to such datasets. The method defines three filter variables and applies a layered statistical filtering technique to eliminate outlier records that cause the distributions of those variables to depart from their expected theoretical distributions. Chi-squared statistical tests evaluate the quality of the cleaned data by comparing the distribution of each of the three variables with its expected distribution. The overall cleaning method removed approximately 5% of the data, consisting of both obvious errors and poor-quality outliers. Subsequently, mining the cleaned data revealed that trip production in Dubai peaks when only the same two drivers operate a given taxi. This finding would not have been possible without access to proprietary data that contain unique identifiers for both drivers and taxis; datasets that identify individual drivers are not publicly available.
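The two statistical steps above can be illustrated with a minimal Python sketch. It assumes a simple percentile trim as the per-variable filter (the abstract does not state the exact rule), hypothetical field names `distance_km` and `duration_min`, and expected bin counts that have already been derived from some theoretical distribution. It computes only the Pearson chi-squared statistic, which must then be compared with a critical value for the chosen degrees of freedom.

```python
# Hypothetical sketch: layered percentile filtering plus a chi-squared
# goodness-of-fit statistic. Field names and cut-offs are illustrative only.

def layered_filter(records, keys, lower_q=0.01, upper_q=0.99):
    """Trim each filter variable in turn; each pass sees only survivors."""
    kept = list(records)
    for key in keys:
        values = sorted(r[key] for r in kept)
        n = len(values)
        lo = values[round(lower_q * (n - 1))]
        hi = values[round(upper_q * (n - 1))]
        kept = [r for r in kept if lo <= r[key] <= hi]
    return kept


def chi_squared_statistic(observed, expected):
    """Pearson statistic: sum over bins of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same bins")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


if __name__ == "__main__":
    trips = [{"distance_km": d, "duration_min": 2 * d + 5} for d in range(1, 101)]
    trips.append({"distance_km": 5000, "duration_min": 3})  # obvious GPS error
    clean = layered_filter(trips, ["distance_km", "duration_min"])
    # Percentile trimming also drops some legitimate tail records.
    print(len(trips) - len(clean), "records removed")
    # Binned counts of a cleaned variable versus counts expected under a model.
    stat = chi_squared_statistic([18, 22, 20, 40], [25, 25, 25, 25])
    print(round(stat, 2))  # 12.32, above 7.815 (3 degrees of freedom, alpha = 0.05)
```

In this sketch, "layered" is taken to mean that each variable's filter runs only on the records that survived the previous variable's filter; the 1st and 99th percentile cut-offs are placeholders rather than values from the study.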

2021 ◽  
Vol 1 (1) ◽  
Author(s):  
Xiao Li ◽  
Haowen Xu ◽  
Xiao Huang ◽  
Chenxiao Guo ◽  
Yuhao Kang ◽  
...  

Effectively monitoring the dynamics of human mobility is of great importance in urban management, especially during the COVID-19 pandemic. Traditionally, human mobility data are collected by roadside sensors, which have limited spatial coverage and are insufficient for large-scale studies. With the maturing of mobile sensing and Internet of Things (IoT) technologies, various crowdsourced data sources are emerging, paving the way for monitoring and characterizing human mobility during the pandemic. This paper presents the authors’ opinions on three types of emerging mobility data sources: mobile device data, social media data, and connected vehicle data. We first introduce each data source’s main features and summarize its current applications for tracking mobility dynamics during the COVID-19 pandemic. Then, we discuss the challenges associated with using these data sources. Based on our research experience, we argue that data uncertainty, big data processing problems, data privacy, and theory-guided data analytics are the most common challenges in using these emerging mobility data sources. Last, we share experiences and opinions on potential solutions to these challenges and possible research directions associated with acquiring, discovering, managing, and analyzing big mobility data.


Author(s):  
Yun Zhou ◽  
Raj Bridgelall

GPS loggers and cameras aboard connected vehicles can produce vast amounts of data. Analysts can mine such data to decipher patterns in vehicle trajectories and driver–vehicle interactions. The ability to process such large-scale data in real time can inform strategies to reduce crashes, improve traffic flow, enhance system operational efficiency, and reduce environmental impacts. However, connected vehicle technologies are in the very early phases of deployment. Therefore, related datasets are extremely scarce, and the utility of such emerging datasets is largely unknown. This paper provides a comprehensive review of studies that used large-scale connected vehicle data from the United States Department of Transportation Connected Vehicle Safety Pilot Model Deployment program, the first and only such dataset available to the public. The data contain real-world information about the operation of connected vehicles that organizations are testing. The paper summarizes the available datasets, their organization, and the overall structure and other characteristics of the data captured during pilot deployments. It then classifies uses of the data into three categories: driving pattern identification, development of surrogate safety measures, and improvements in the operation of signalized intersections. Finally, it identifies some limitations of the existing datasets.
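Time-to-collision (TTC) is one widely used surrogate safety measure and gives a concrete sense of what such measures compute from connected vehicle trajectories. The Python sketch below assumes constant speeds over the prediction horizon, a known gap between follower and leader, and a hypothetical 1.5 s threshold for flagging conflicts; the reviewed studies may define TTC or choose its threshold differently.

```python
import math


def time_to_collision(gap_m, follower_mps, leader_mps):
    """Seconds until the follower closes the gap, assuming constant speeds.

    Returns math.inf when the follower is not faster than the leader.
    """
    closing = follower_mps - leader_mps
    if closing <= 0:
        return math.inf
    return gap_m / closing


def count_conflicts(samples, threshold_s=1.5):
    """Count (gap, follower speed, leader speed) samples whose TTC falls
    below a hypothetical threshold."""
    return sum(
        1 for gap, vf, vl in samples if time_to_collision(gap, vf, vl) < threshold_s
    )


if __name__ == "__main__":
    # Hypothetical car-following samples: gap in m, speeds in m/s.
    trajectory = [(30.0, 20.0, 15.0), (9.0, 22.0, 14.0), (10.0, 14.0, 16.0)]
    for gap, vf, vl in trajectory:
        print(time_to_collision(gap, vf, vl))
    print(count_conflicts(trajectory), "conflict(s) below 1.5 s")
```

Returning infinity for non-closing pairs keeps the threshold comparison uniform, so opening gaps are simply never flagged.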


Author(s):  
Weijia Xu ◽  
Natalia Ruiz Juri ◽  
Amit Gupta ◽  
Amanda Deering ◽  
Chandra Bhat ◽  
...  

2011 ◽  
Vol 24 (13) ◽  
pp. 3457-3468 ◽  
Author(s):  
Keyan Fang ◽  
Xiaohua Gou ◽  
Fahu Chen ◽  
Edward Cook ◽  
Jinbao Li ◽  
...  

This preliminary study explores a point-by-point spatial precipitation reconstruction for northwestern (NW) China based on a tree-ring network of 132 chronologies. Precipitation variations during the past ~200–400 yr (the common reconstruction period is 1802–1990) are reconstructed for 26 stations in NW China, drawn from a nationwide 160-station dataset. The authors introduce a “search spatial correlation contour” method to locate candidate tree-ring predictors for reconstructing the record of a given climate station. Calibration and verification results indicate that most precipitation reconstruction models are acceptable, except for a few reconstructions (stations Hetian, Hami, Jiuquan, and Wuwei) of degraded quality. Additionally, the authors compare four spatial precipitation factors, derived from a rotated principal component analysis (RPCA), in the instrumental records and the reconstructions. The northern and southern Xinjiang factors from the instrumental and reconstructed data agree well with each other. However, the other two factors show differences in spatial pattern between the instrumental and reconstructed data, which probably result from the relatively poor quality of a few stations. Major drought events documented in previous studies, for example from the 1920s through the 1930s in the eastern part of NW China, are also reconstructed in this study.
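Screening candidate predictors by their correlation with a station's instrumental record is the basic idea behind locating tree-ring predictors for that station. The Python sketch below is a plain correlation-threshold screen over a calibration period, not a reproduction of the authors' “search spatial correlation contour” method; the series, chronology names, and the 0.3 threshold are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def screen_predictors(station_precip, chronologies, r_min=0.3):
    """Return (name, r) for chronologies with |r| >= r_min against the
    station record over the same calibration years, strongest first."""
    scored = [(name, pearson_r(series, station_precip))
              for name, series in chronologies.items()]
    kept = [(name, r) for name, r in scored if abs(r) >= r_min]
    return sorted(kept, key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    precip = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical calibration-period totals
    chronologies = {
        "site_a": [2.0, 4.0, 6.0, 8.0, 10.0],
        "site_b": [5.0, 4.0, 3.0, 2.0, 1.0],
        "site_c": [1.0, 3.0, 2.0, 5.0, 4.0],
    }
    for name, r in screen_predictors(precip, chronologies):
        print(name, round(r, 2))
```

Screening on |r| keeps negatively correlated chronologies as candidates; whether such predictors are physically meaningful for a given station is a separate judgment.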


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Esteban Moro ◽  
Dan Calacci ◽  
Xiaowen Dong ◽  
Alex Pentland

Traditional understanding of urban income segregation is largely based on static, coarse-grained residential patterns. However, these patterns do not capture the income segregation experienced through the rich social interactions that happen in places, which may relate to individual choices, opportunities, and mobility behavior. Using a large-scale, high-resolution mobility dataset of 4.5 million mobile phone users and 1.1 million places in 11 large American cities, we show that income segregation experienced in places and by individuals can differ greatly even within close spatial proximity. To further understand these fine-grained income segregation patterns, we introduce a Schelling extension of a well-known mobility model and show that experienced income segregation is associated with an individual’s tendency to explore new places (place exploration) as well as places with visitors from different income groups (social exploration). Interestingly, while the latter is more strongly associated with demographic characteristics, the former is more strongly associated with mobility behavioral variables. Our results suggest that mobility behavior plays an important role in the experienced income segregation of individuals. To measure this form of income segregation, urban researchers should take into account mobility behavior and not only residential patterns.
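Experienced segregation at a place can be summarized by comparing how visit time splits across income quartiles with an even split. The Python sketch below assumes such a dissimilarity-style index, normalized so that an even split across the four quartiles scores 0 and time from a single quartile scores 1; the exact index used in the study may differ, and the inputs are hypothetical.

```python
def place_segregation(time_by_quartile):
    """Dissimilarity-style index in [0, 1] for one place.

    time_by_quartile holds the total time visitors from each income group
    spent at the place (hypothetical input; four groups for quartiles).
    """
    k = len(time_by_quartile)
    total = sum(time_by_quartile)
    if k < 2 or total <= 0:
        raise ValueError("need at least two groups and positive total time")
    shares = [t / total for t in time_by_quartile]
    deviation = sum(abs(s - 1 / k) for s in shares)
    # The maximum deviation, reached when one group supplies all the time,
    # is 2 * (k - 1) / k; dividing by it rescales to [0, 1]
    # (a factor of 2/3 when k = 4).
    return deviation * k / (2 * (k - 1))


if __name__ == "__main__":
    print(place_segregation([10, 10, 10, 10]))  # even mix
    print(place_segregation([40, 0, 0, 0]))     # one quartile only
    print(place_segregation([30, 10, 0, 0]))    # skewed toward one quartile
```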


Author(s):  
Dhamanpreet Kaur ◽  
Matthew Sobiesk ◽  
Shubham Patil ◽  
Jin Liu ◽  
Puran Bhagat ◽  
...  

Objective: This study seeks to develop a fully automated method of generating synthetic data from a real dataset that medical organizations could use to distribute health data to researchers, reducing the need for access to real data. We hypothesize that applying Bayesian networks will improve upon the predominant existing method, medBGAN, in handling the complexity and dimensionality of healthcare data. Materials and Methods: We employed Bayesian networks to learn probabilistic graphical structures and simulated synthetic patient records from the learned structure. We used the University of California Irvine (UCI) heart disease and diabetes datasets as well as the MIMIC-III diagnoses database. We evaluated our method through statistical tests, machine learning tasks, preservation of rare events, disclosure risk, and the ability of a machine learning classifier to discriminate between the real and synthetic data. Results: Our Bayesian network model outperformed or equaled medBGAN in all key metrics. Notable improvement was achieved in capturing rare variables and preserving association rules. Discussion: Bayesian networks generated data sufficiently similar to the original data with minimal risk of disclosure, while offering additional transparency, computational efficiency, and capacity to handle more data types than existing methods. We hope this method will allow healthcare organizations to efficiently disseminate synthetic health data to researchers, enabling them to generate hypotheses and develop analytical tools. Conclusion: We conclude that applying Bayesian networks is a promising option for generating realistic synthetic health data that preserves the features of the original data without compromising data privacy.
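Once a Bayesian network's structure and conditional probability tables have been learned, synthetic records can be drawn by ancestral (forward) sampling, visiting nodes in topological order. The Python sketch below shows only that sampling step for a hand-specified network; structure learning is omitted, and the three binary variables and their probabilities are hypothetical, not the study's model.

```python
import random

# Hypothetical network in topological order: (node, parents, CPT), where the
# CPT maps a tuple of parent values to P(node = True).
NETWORK = [
    ("age_over_60", (), {(): 0.3}),
    ("hypertension", ("age_over_60",), {(True,): 0.6, (False,): 0.2}),
    ("heart_disease", ("age_over_60", "hypertension"),
     {(True, True): 0.5, (True, False): 0.25,
      (False, True): 0.2, (False, False): 0.05}),
]


def sample_record(network, rng):
    """Ancestral sampling: draw each node once its parents are fixed."""
    record = {}
    for name, parents, cpt in network:
        key = tuple(record[p] for p in parents)
        record[name] = rng.random() < cpt[key]
    return record


def synthesize(network, n, seed=0):
    """Generate n synthetic records with a seeded, reproducible RNG."""
    rng = random.Random(seed)
    return [sample_record(network, rng) for _ in range(n)]


if __name__ == "__main__":
    rows = synthesize(NETWORK, 10_000)
    share = sum(r["age_over_60"] for r in rows) / len(rows)
    print(f"share over 60: {share:.3f} (CPT value 0.3)")
```

Because every node is sampled conditional on its parents, the synthetic rows reproduce the dependencies encoded in the network rather than only the marginal frequencies.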


2021 ◽  
Vol 12 ◽  
pp. 204062232098245
Author(s):  
Hye Yun Park ◽  
Hyun Lee ◽  
Danbee Kang ◽  
Hye Sook Choi ◽  
Yeong Ha Ryu ◽  
...  

Background: There are limited data about racial differences in the characteristics of chronic obstructive pulmonary disease (COPD) patients treated at clinics. We aimed to compare sociodemographic and clinical characteristics between US and Korean COPD patients using large-scale nationwide COPD cohorts. Methods: We used baseline demographic and clinical data of COPD patients aged 45 years or older with a smoking history of at least 10 pack-years from the Korean COPD Subtype Study (KOCOSS; n = 1686) cohort (2012–2018) and phase I (2008–2011) of the US Genetic Epidemiology of COPD (COPDGene) study (n = 4477; 3461 non-Hispanic whites [NHW] and 1016 African Americans [AA]). Results: Compared with NHW, AA had a significantly lower adjusted prevalence ratio (aPR) of cough >3 months (aPR: 0.67; 95% CI [confidence interval]: 0.60–0.75) and phlegm >3 months (aPR: 0.78; 95% CI: 0.70–0.86), but a higher aPR of dyspnea (modified Medical Research Council scale ⩾2) (aPR: 1.22; 95% CI: 1.15–1.29), short six-minute walk distance (<350 m) (aPR: 1.98; 95% CI: 1.81–2.14), and poor quality of life (aPR: 1.10; 95% CI: 1.05–1.15). Compared with NHW, Koreans had a significantly lower aPR of cough >3 months (aPR: 0.53; 95% CI: 0.47–0.59), phlegm >3 months (aPR: 0.75; 95% CI: 0.67–0.82), dyspnea (aPR: 0.72; 95% CI: 0.66–0.79), and moderate-to-severe acute exacerbation in the previous year (aPR: 0.73; 95% CI: 0.65–0.82). NHW had the highest burden of chronic bronchitis symptoms and cardiovascular comorbidities. Conclusion: There are substantial differences in sociodemographic characteristics, clinical presentation, and comorbidities between COPD patients from KOCOSS and COPDGene, which might be caused by interactions among various intrapersonal, interpersonal, and environmental factors of the ecological model. Thus, a broader and more comprehensive approach is needed to understand racial differences among COPD patients.
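A prevalence ratio compares the proportion of patients with a characteristic in one group against the proportion in a reference group. The Python sketch below computes a crude (unadjusted) ratio with a log-scale Wald 95% confidence interval from hypothetical counts; the study reports adjusted ratios, which additionally require a regression model controlling for covariates and are not reproduced here.

```python
import math


def prevalence_ratio(cases_1, n_1, cases_0, n_0, z=1.96):
    """Crude prevalence ratio of group 1 versus reference group 0.

    Returns (ratio, lower, upper) using the Wald interval on the log scale,
    with SE = sqrt(1/a - 1/n1 + 1/b - 1/n0).
    """
    if min(cases_1, cases_0) <= 0:
        raise ValueError("each group needs at least one case")
    ratio = (cases_1 / n_1) / (cases_0 / n_0)
    se = math.sqrt(1 / cases_1 - 1 / n_1 + 1 / cases_0 - 1 / n_0)
    log_ratio = math.log(ratio)
    return ratio, math.exp(log_ratio - z * se), math.exp(log_ratio + z * se)


if __name__ == "__main__":
    # Hypothetical counts: 20 of 100 patients versus 10 of 100 patients.
    pr, lo, hi = prevalence_ratio(20, 100, 10, 100)
    print(f"PR {pr:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

The interval is symmetric on the log scale, which is why ratios such as those above are reported with asymmetric confidence limits.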


2021 ◽  
Author(s):  
Zeyu Lyu ◽  
Hiroki Takikawa

Background: The availability of large-scale, fine-grained aggregated mobility data has allowed researchers to observe the dynamics of social distancing behaviors at high spatial and temporal resolutions. Despite the increasing attention paid to this research agenda, few studies have focused on the demographic factors related to mobility, and the dynamics of social distancing behaviors have not been fully investigated. Objective: This study aims to assist in the design and implementation of public health policies by exploring social distancing behaviors among various demographic groups over time. Methods: We combined several data sources, including mobile tracking data and geographical statistics, to estimate the visiting population of entertainment venues across demographic groups, which can be considered a proxy for social distancing behaviors. We then employed time series analysis methods to investigate how voluntary and policy-induced social distancing behaviors shifted over time across demographic groups. Results: Our findings demonstrate distinct patterns of social distancing behaviors and their dynamics across age groups. The population in the entertainment venues consisted mainly of individuals aged 20–40 years, whereas, according to the dynamics of the mobility index and the policy-induced behavior, the reduction in the frequency of visiting entertainment venues during the pandemic was generally greatest among younger individuals. Our results also indicate the importance of implementing social distancing policies promptly to limit the spread of COVID-19 infection. However, although the policy intervention during the second wave in Japan appeared to increase awareness of the severity of the pandemic and concerns regarding COVID-19, its direct impact decreased substantially and lasted only a short time.
Conclusions: At the time we wrote this paper, the number of daily confirmed cases in Japan was continuously increasing. Thus, this study provides a timely reference for decision makers about the current state of policy-induced compliance behaviors. On the one hand, the age-dependent disparity requires targeted mitigation strategies to increase the intention of elderly individuals to adopt mobility restriction behaviors. On the other hand, given the decreasing impact of self-restriction recommendations, the government should employ policy interventions that limit the resurgence of cases, especially stronger, stricter social distancing interventions, as they are necessary to promote social distancing behaviors and mitigate the transmission of COVID-19. Clinical Trial: None
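A mobility index of the kind described here is commonly expressed as the percentage change in visits relative to a pre-pandemic baseline, smoothed over a week to damp day-of-week effects. The Python sketch below assumes a single baseline value, a 7-day trailing window, and a hypothetical policy start day; the study's actual index and time series methods are not specified in the abstract.

```python
def percent_change(counts, baseline):
    """Percentage change of each day's visit count relative to a baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return [100.0 * (c - baseline) / baseline for c in counts]


def trailing_mean(values, window=7):
    """Mean of each value and up to window - 1 preceding values."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def mean_shift(values, policy_day):
    """Mean on and after the (hypothetical) policy start minus mean before."""
    before, after = values[:policy_day], values[policy_day:]
    if not before or not after:
        raise ValueError("policy_day must split the series")
    return sum(after) / len(after) - sum(before) / len(before)


if __name__ == "__main__":
    visits = [980, 1010, 1000, 950, 700, 650, 640, 620]  # hypothetical daily counts
    index = percent_change(visits, baseline=1000)
    print([round(v, 1) for v in trailing_mean(index, window=3)])
    print(round(mean_shift(index, policy_day=4), 1))
```

Comparing the shift for each age group separately would show which groups reduced visits most after a policy, which is the kind of contrast the Results paragraph describes.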


Author(s):  
Young Hyun Kim ◽  
Eun-Gyu Ha ◽  
Kug Jin Jeon ◽  
Chena Lee ◽  
Sang-Sun Han

Objectives: This study aimed to develop a fully automated human identification method based on a convolutional neural network (CNN) using a large-scale dental panoramic radiograph (DPR) dataset. Methods: In total, 2,760 DPRs were collected from 746 subjects who each had 2 to 17 DPRs showing various changes in image characteristics due to dental treatments (tooth extraction, oral surgery, prosthetics, orthodontics, or tooth development). The test dataset included the latest DPR of each subject (746 images), and the remaining DPRs (2,014 images) were used for model training. A modified VGG16 model with two fully connected layers was applied for human identification. The proposed model was evaluated with rank-1, rank-3, and rank-5 accuracies, running time, and images with gradient-weighted class activation mapping (Grad-CAM) applied. Results: The model achieved rank-1, rank-3, and rank-5 accuracies of 82.84%, 89.14%, and 92.23%, respectively. All rank-1 accuracy values of the proposed model were above 80% regardless of changes in image characteristics. The average training time was 60.9 sec per epoch, and the prediction time for the 746 test DPRs was short (3.2 sec/image). Grad-CAM confirmed that the model automatically identified humans by focusing on identifiable dental information. Conclusion: The proposed model performed well in fully automatic human identification despite differing image characteristics among DPRs acquired from the same patients. The model is expected to help experts identify individuals quickly and accurately by comparing large numbers of images and proposing identification candidates at high speed.
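Rank-k accuracy counts a test image as correct when the true subject appears among the model's k highest-scoring identities. The Python sketch below computes it from a hypothetical score matrix (one row per test image, one column per enrolled subject), such as softmax outputs from a classifier; it is not the authors' evaluation code.

```python
def rank_k_accuracy(scores, true_ids, k):
    """Fraction of queries whose true identity is among the top-k scores.

    scores[i][j] is the score of query i for enrolled subject j.
    """
    if len(scores) != len(true_ids):
        raise ValueError("one true identity is needed per query")
    hits = 0
    for row, truth in zip(scores, true_ids):
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if truth in ranked[:k]:
            hits += 1
    return hits / len(true_ids)


if __name__ == "__main__":
    # Hypothetical scores for 3 test images over 3 enrolled subjects.
    scores = [[0.9, 0.05, 0.05], [0.2, 0.3, 0.5], [0.6, 0.3, 0.1]]
    truth = [0, 1, 2]
    for k in (1, 2, 3):
        print(f"rank-{k}: {rank_k_accuracy(scores, truth, k):.2f}")
```

Rank-k accuracy can only rise as k grows, which matches the increasing rank-1, rank-3, and rank-5 figures reported above.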


2018 ◽  
Vol 20 (6) ◽  
pp. 513-527
Author(s):  
Alexander M. Soley ◽  
Joshua E. Siegel ◽  
Dajiang Suo ◽  
Sanjay E. Sarma

Purpose: The purpose of this paper is to develop a model to estimate the value of information generated by and stored within vehicles, to help people, businesses and researchers. Design/methodology/approach: The authors provide a taxonomy for data within connected vehicles, as well as for actors that value such data. The authors create a monetary value model for different data generation scenarios from the perspective of multiple actors. Findings: Actors value data differently depending on whether the information is kept within the vehicle or on peripheral devices. The model shows that the US connected vehicle data market is worth between US$11.6bn and US$92.6bn. Research limitations/implications: This model estimates the value of vehicle data, but a lack of academic references for individual inputs makes finding reliable inputs difficult. The model’s performance is limited by the accuracy of the authors’ assumptions. Practical implications: The proposed model demonstrates that connected vehicle data are more valuable than people and companies realize; therefore, we must secure these data and establish comprehensive rules pertaining to data ownership and stewardship. Social implications: Estimating the value of vehicle data will help companies understand the importance of responsible data stewardship and drive individuals to become more responsible digital citizens. Originality/value: This is the first paper to propose a model for computing the monetary value of connected vehicle data, as well as the first to provide an estimate of this value.

