Clustering Approach toward Large Truck Crash Analysis

Author(s):  
Alireza Rahimi ◽  
Ghazaleh Azimi ◽  
Hamidreza Asgari ◽  
Xia Jin

Heterogeneity of crash data masks the underlying crash patterns and complicates crash analysis. This paper aims to explore an advanced high-dimensional clustering approach to investigate heterogeneity in large datasets. Detailed records of crashes involving large trucks occurring in the state of Florida between 2007 and 2016 were examined to identify truck crash patterns and significant conditions contributing to the patterns. The block clustering method was applied to more than 220,000 crash records with nearly 200 attributes. The analysis showed promising results in segmenting a large heterogeneous dataset into meaningful subgroups (with 95.72% average degree of homogeneity for selected blocks). The goodness of fit of the clustering was evaluated, and both the integrated completed likelihood (ICL) and pseudo-likelihood values improved significantly (20.8% and 21.1%, respectively). Attribute clustering showed distinct characteristics for each cluster. Crash clustering revealed significant differences among the clusters and suggested that this crash dataset could be partitioned into same-direction, opposing-direction, and single-vehicle crashes. Individual blocks defined by both row and column clustering were further investigated to better understand the set of contributing conditions that lead to large truck crashes. Major features for each of the three major types of crashes were analyzed, which may provide additional insights to develop potential countermeasures and strategies that target specific segments. The clustering approach could be used as a preanalysis method to identify homogeneous subgroups for further analysis, which will help enhance the effectiveness of safety programs.

Author(s):  
Sayak Dey ◽  
Swagatam Das ◽  
Rammohan Mallipeddi

Classical clustering methods usually face tough challenges when the number of features exceeds the number of items to be partitioned. We propose a Sparse MinMax k-Means Clustering approach by reformulating the objective of the MinMax k-Means algorithm (a variation of classical k-Means that minimizes the maximum intra-cluster variance instead of the sum of intra-cluster variances) into a new weighted between-cluster sum of squares (BCSS) form. We impose sparse regularization on these weights to make it suitable for high-dimensional clustering. We seek to use the advantages of the MinMax k-Means algorithm in the high-dimensional space to generate good quality clusters. The efficacy of the proposal is showcased through comparison against a few representative clustering methods over several real-world datasets.
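The difference between the two objectives named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' sparse algorithm: it only evaluates, for a fixed assignment, the classical k-Means objective (sum over clusters) and the MinMax objective (maximum over clusters). The function names are hypothetical, and "intra-cluster variance" is assumed here to mean the sum of squared distances to the cluster centroid.

```python
def cluster_scatter(points, labels, k):
    """Sum of squared distances to the centroid, one value per cluster.

    Assumption: this quantity stands in for 'intra-cluster variance'.
    """
    scatter = []
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            scatter.append(0.0)  # an empty cluster contributes nothing
            continue
        dim = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        scatter.append(
            sum(sum((p[d] - centroid[d]) ** 2 for d in range(dim)) for p in members)
        )
    return scatter


def kmeans_objective(points, labels, k):
    """Classical k-Means: sum of the per-cluster scatter values."""
    return sum(cluster_scatter(points, labels, k))


def minmax_objective(points, labels, k):
    """MinMax k-Means: the largest per-cluster scatter value."""
    return max(cluster_scatter(points, labels, k))


# Toy data (made up for illustration): one tight and one loose cluster.
toy_points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (10.0, 4.0)]
toy_labels = [0, 0, 1, 1]
print(kmeans_objective(toy_points, toy_labels, 2))
print(minmax_objective(toy_points, toy_labels, 2))
```

Minimizing the maximum rather than the sum penalizes any single loose cluster, which is the property the abstract attributes to MinMax k-Means.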


2021 ◽  
Vol 13 (11) ◽  
pp. 2125
Author(s):  
Bardia Yousefi ◽  
Clemente Ibarra-Castanedo ◽  
Martin Chamberland ◽  
Xavier P. V. Maldague ◽  
Georges Beaudoin

Clustering methods have considerable influence on many recent algorithms and play an important role in hyperspectral data analysis. Here, we challenge clustering for mineral identification using two different strategies in hyperspectral long wave infrared (LWIR, 7.7–11.8 μm). For that, we compare two algorithms to perform the mineral identification in a unique dataset. The first algorithm uses spectral comparison techniques for all the pixel-spectra and creates RGB false color composites (FCC). Then, a color-based clustering is used to group the regions (called FCC-clustering). The second algorithm clusters all the pixel-spectra to directly group the spectra. Then, the first rank of non-negative matrix factorization (NMF) extracts the representative of each cluster and compares results with the spectral library of JPL/NASA. These techniques give the comparison values as features, which are converted into RGB-FCC as the results (called clustering-rank1-NMF). We applied K-means as the clustering approach, which can be replaced by any other similar clustering approach. The results of the clustering-rank1-NMF algorithm indicate significant computational efficiency (more than 20 times faster than the previous approach) and promising performance for mineral identification, with up to 75.8% and 84.8% average accuracies for the FCC-clustering and clustering-rank1-NMF algorithms (using spectral angle mapper (SAM)), respectively. Furthermore, several spectral comparison techniques are also used, such as adaptive matched subspace detector (AMSD), orthogonal subspace projection (OSP) algorithm, principal component analysis (PCA), local matched filter (PLMF), SAM, and normalized cross correlation (NCC), for both algorithms, and most of them show a similar range in accuracy. However, SAM and NCC are preferred due to their computational simplicity. Our algorithms strive to identify eleven different mineral grains (biotite, diopside, epidote, goethite, kyanite, scheelite, smithsonite, tourmaline, pyrope, olivine, and quartz).
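The spectral angle mapper (SAM) mentioned above compares two spectra by the angle between them when treated as vectors. The following Python sketch shows that standard definition and a nearest-reference lookup. The function names and the toy spectra are hypothetical and are not taken from the paper or from the JPL/NASA library.

```python
import math


def spectral_angle(a, b):
    """Angle in radians between two spectra treated as vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("spectral angle is undefined for an all-zero spectrum")
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cosine)


def best_match(spectrum, library):
    """Name of the library entry with the smallest spectral angle."""
    return min(library, key=lambda name: spectral_angle(spectrum, library[name]))


# Toy three-band reference spectra (made-up values, for illustration only).
toy_library = {
    "quartz": [1.0, 0.1, 0.2],
    "biotite": [0.1, 1.0, 0.3],
}
print(best_match([0.9, 0.1, 0.2], toy_library))
```

Because the angle ignores vector length, SAM is insensitive to a uniform scaling of a spectrum, which is one reason it is computationally simple to apply.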


Safety ◽  
2021 ◽  
Vol 7 (2) ◽  
pp. 32
Author(s):  
Syed As-Sadeq Tahfim ◽  
Chen Yan

The unobserved heterogeneity in traffic crash data hides certain relationships between the contributory factors and injury severity. The literature has been limited in exploring different types of clustering methods for the analysis of the injury severity in crashes involving large trucks. Additionally, the variability of data types in traffic crash data has rarely been addressed. This study explored the application of the k-prototypes clustering method to address the unobserved heterogeneity in large truck-involved crashes that occurred in the United States between 2016 and 2019. The study segmented the entire dataset (EDS) into three homogeneous clusters. Four gradient boosted decision tree (GBDT) models were developed on the EDS and the individual clusters to predict the injury severity in crashes involving large trucks. The list of input features included crash characteristics, truck characteristics, roadway attributes, time and location of the crash, and environmental factors. Each cluster-based GBDT model was compared with the EDS-based model. Two of the three cluster-based models showed significant improvement in their predictive performance. Additionally, feature analysis using the SHAP (Shapley additive explanations) method identified a few new important features in each cluster and showed that some features have different degrees of effect on severe injuries in the individual clusters. The current study concluded that the k-prototypes clustering-based GBDT model is a promising approach to reveal hidden insights, which can be used to improve safety measures, roadway conditions, and policies for the prevention of severe injuries in crashes involving large trucks.
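The reason k-prototypes suits mixed crash data can be seen in its dissimilarity measure, sketched below in Python under the commonly cited formulation: squared Euclidean distance on numeric attributes plus a weighted count of mismatches on categorical attributes. The function name, the weight `gamma`, and the example attributes are assumptions for illustration. The abstract does not state the study's actual settings.

```python
def kprototypes_dissimilarity(num_a, cat_a, num_b, cat_b, gamma=1.0):
    """Mixed-type dissimilarity between two records.

    num_a, num_b: numeric attributes (assumed to be scaled already).
    cat_a, cat_b: categorical attributes, compared by simple matching.
    gamma: assumed weight balancing the categorical part against the numeric part.
    """
    numeric_part = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    categorical_part = sum(1 for x, y in zip(cat_a, cat_b) if x != y)
    return numeric_part + gamma * categorical_part


# Hypothetical crash records: (scaled speed limit, scaled hour), (weather, area type).
record_a = ([1.0, 2.0], ["rain", "rural"])
record_b = ([2.0, 4.0], ["rain", "urban"])
print(kprototypes_dissimilarity(*record_a, *record_b, gamma=0.5))
```

A larger `gamma` makes categorical mismatches dominate the grouping, so in practice the weight has to be chosen with the scale of the numeric attributes in mind.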


Author(s):  
Mehdi Hosseinpour ◽  
Kirolos Haleem

Road departure (RD) crashes are among the most severe crashes that can result in fatal or serious injuries, especially when involving large trucks. Most previous studies neglected to incorporate both roadside and median hazards into large-truck RD crash severity analysis. The objective of this study was to identify the significant factors affecting driver injury severity in single-vehicle RD crashes involving large trucks. A random-parameters ordered probit (RPOP) model was developed using extensive crash data collected on roadways in the state of Kentucky between 2015 and 2019. The RPOP model results showed that the effects of local roadways, the natural logarithm of annual average daily traffic (AADT), the presence of median concrete barriers, cable barrier-involved collisions, and dry surfaces were random across the crash observations. The results also showed that older drivers, ejected drivers, and drivers trapped in their truck were more likely to sustain severe injuries in single-vehicle RD crashes. Other variables increasing the probability of severe driver injury included rural areas, dry road surfaces, higher speed limits, single-unit truck types, principal arterials, overturning consequences, truck fire occurrence, segments with median concrete barriers, and roadside fixed object strikes. On the other hand, wearing a seatbelt, local roads and minor collectors, higher AADT, and hitting median cable barriers were associated with lower injury severities. Potential safety countermeasures from the study findings include installing median cable barriers and flattening steep roadside embankments along roadway stretches with a high history of RD large-truck-related crashes.


2018 ◽  
Vol 89 (4) ◽  
pp. 590-611 ◽  
Author(s):  
Jie Pei ◽  
Huiju Park ◽  
Susan P. Ashdown

In this study we explore the variation in female breast shape across the younger (age: 18–45), non-obese (BMI < 30) North American Caucasian population, a population that has not previously been well-represented in studies of breast shape. A method of classifying breast shape was developed based on multiple data-mining techniques. Forty-one relative measurements (i.e., ratios and angles) were constructed from 66 raw measurements (circumferences, depths, widths, etc.), extracted from 478 CAESAR (Civilian American and European Surface Anthropometry Resource) scans, using self-developed Matlab® programs. Seventy subjects were regarded as outliers and were removed. The remaining data were transformed and standardized to ensure robust analysis. To judge results, an algorithm was developed to visualize clustering outcomes in the form of side profiles of breasts. The results of three clustering methods, namely hierarchical, K-means, and K-medoids clustering, were compared. Finally, breast shapes were categorized into three and five groups by two different cluster number selection criteria proposed by the study: (1) based on misclassification rate; (2) based on the goodness-of-fit of the model. Several of the relative body measurements were identified to be critical in defining breast shape. The findings and the proposed methods of this study can contribute to the development of improved shape and sizing systems of bra products that work for both manufacturers and consumers. The new methodology developed in this study can also be applied to other types of intimate apparel products where an understanding of body shape plays a key role in body support, comfort, and fit.


2020 ◽  
Vol 5 (8) ◽  
pp. 62
Author(s):  
Clint Morris ◽  
Jidong J. Yang

Generating meaningful inferences from crash data is vital to improving highway safety. Classic statistical methods are fundamental to crash data analysis and are often valued for their interpretability. However, given the complexity of crash mechanisms and associated heterogeneity, classic statistical methods, which lack versatility, might not be sufficient for granular crash analysis because of the high-dimensional features involved in crash-related data. In contrast, machine learning approaches, which are more flexible in structure and capable of harnessing richer data sources available today, emerge as a suitable alternative. With the aid of new methods for model interpretation, complex machine learning models, previously considered enigmatic, can be properly interpreted. In this study, two modern machine learning techniques, Linear Discriminant Analysis and eXtreme Gradient Boosting, were explored to classify three major types of multi-vehicle crashes (i.e., rear-end, same-direction sideswipe, and angle) that occurred on Interstate 285 in Georgia. The study demonstrated the utility and versatility of modern machine learning methods in the context of crash analysis, particularly in understanding the potential features underlying different crash patterns on freeways.


Author(s):  
Lingtao Wu ◽  
Srinivas R. Geedipally ◽  
Adam M. Pike

Roadway departure crashes are a major contributor to traffic fatalities and injuries. Rumble strips have been shown to be an effective countermeasure in reducing roadway departure crashes. However, some roadway situations, for instance, inadequate shoulder width or roadway surface depth, have limited the application of conventional milled or rolled-in rumble strips. Alternative audible lane departure warning systems, including profile (audible) pavement markings and preformed rumble bars, are increasingly used to overcome the limitations of milled rumble strips. So far, the safety effectiveness of these alternative audible lane departure warning systems has not been extensively assessed. The main purpose of this paper is to examine the safety effect of installing profile pavement markings and preformed rumble bars. Specifically, this study developed crash modification factors for these treatments that quantify their effectiveness in reducing single-vehicle run-off-road (SVROR) and opposite-direction (OD) crashes. Traffic, roadway, and crash data at the treated sites on 189 miles of rural two-lane highways in Texas were analyzed using an empirical Bayes (EB) before–after analysis method. Safety performance functions from the Highway Safety Manual and Texas Highway Safety Design Workbook were used in the EB analysis. The results revealed a 21.3% reduction in all SVROR and OD crashes, and a 32.5% to 39.9% reduction in fatal and injury SVROR and OD crashes after installing profile pavement markings and preformed rumble bars.
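The reported percentage reductions relate to crash modification factors (CMFs) through the usual convention that a CMF is one minus the fractional crash reduction. The short Python sketch below performs only that conversion for the figures quoted in the abstract. It does not reproduce the empirical Bayes analysis, and the function name is hypothetical.

```python
def cmf_from_reduction(percent_reduction):
    """Convert a percentage crash reduction into a crash modification factor.

    Assumes the usual convention CMF = 1 - reduction / 100, so a CMF below 1
    indicates fewer expected crashes after treatment.
    """
    if percent_reduction > 100:
        raise ValueError("a reduction cannot exceed 100 percent")
    return 1.0 - percent_reduction / 100.0


# Reductions quoted in the abstract for SVROR and OD crashes.
for label, reduction in [
    ("all severities", 21.3),
    ("fatal and injury, lower bound", 32.5),
    ("fatal and injury, upper bound", 39.9),
]:
    print(f"{label}: CMF = {cmf_from_reduction(reduction):.3f}")
```

Multiplying the expected crash frequency without treatment by the CMF gives the expected frequency with treatment, which is how such factors are typically applied.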


2007 ◽  
Vol 05 (04) ◽  
pp. 895-913 ◽  
Author(s):  
MENG P. TAN ◽  
JAMES R. BROACH ◽  
CHRISTODOULOS A. FLOUDAS

We study the effects of different normalization and pre-clustering techniques on clustering quality for a novel mixed-integer nonlinear optimization-based clustering algorithm, the Global Optimum Search with Enhanced Positioning (EP_GOS_Clust). These are important issues to be addressed. DNA microarray experiments are informative tools to elucidate gene regulatory networks. But in order for gene expression levels to be comparable across microarrays, normalization procedures have to be properly undertaken. The aim of pre-clustering is to use an adequate amount of discriminatory characteristics to form rough information profiles, so that data with similar features can be pre-grouped together and outliers deemed insignificant to the clustering process can be removed. Using experimental DNA microarray data from the yeast Saccharomyces cerevisiae, we study the merits of pre-clustering genes based on distance/correlation comparisons and symbolic representations such as {+, o, -}. As a performance metric, we look at the intra- and inter-cluster error sums, two generic but intuitive measures of clustering quality. We also use publicly available Gene Ontology resources to assess the clusters' level of biological coherence. Our analysis indicates that normalization and pre-clustering methods have a significant effect on the clustering results. Hence, the outcome of this study has significance in fine-tuning the EP_GOS_Clust clustering approach.


Author(s):  
Siti Aisyah Mohamed ◽  
Muhaini Othman ◽  
Mohd Hafizul Afifi

The recent evolution of Artificial Neural Networks has drawn researchers' interest toward deep learning with Spiking Neural Network clustering methods. Spiking Neural Network (SNN) models capture neuronal behaviour more precisely than traditional neural networks because they incorporate the concept of time into their functioning model [1]. The aim of this paper is to review studies related to clustering problems that employ SNN models. Even though many algorithms are used to solve clustering problems, most of the methods are only suitable for static data and fixed windows of time series. Hence, there is a need to analyse complex data types, and there is room for improvement. Therefore, this paper summarizes the significant results obtained by applying SNN models in different clustering approaches. The findings of this paper could demonstrate the purpose of SNN-based clustering methods to researchers from various disciplines who seek to discover and understand complex data.

