Heterogeneous Ensemble Learning for Enhanced Crash Forecasts -- A Frequentest and Machine Learning based Stacking Framework

Paper Title

Heterogeneous Ensemble Learning for Enhanced Crash Forecasts -- A Frequentest and Machine Learning based Stacking Framework

Authors

Ahmad, Numan, Wali, Behram, Khattak, Asad J.

Abstract

A variety of statistical and machine learning methods are used to model crash frequency on specific roadways, with machine learning methods generally having higher prediction accuracy. Recently, heterogeneous ensemble methods (HEM), including stacking, have emerged as more accurate and robust intelligent techniques and are often used to solve pattern recognition problems by providing more reliable and accurate predictions. In this study, we apply one of the key HEM methods, stacking, to model crash frequency on five-lane undivided segments (5T) of urban and suburban arterials. The prediction performance of stacking is compared with parametric statistical models (Poisson and negative binomial) and three state-of-the-art machine learning techniques (decision tree, random forest, and gradient boosting), each of which is termed a base learner. By employing an optimal weight scheme to combine individual base learners through stacking, the problem of biased predictions in individual base learners due to differences in specifications and prediction accuracies is avoided. Crash, traffic, and roadway inventory data from 2013 to 2017 were collected and integrated. The data are split into training, validation, and testing datasets. Estimation results of the statistical models reveal that, besides other factors, crashes increase with the density (number per mile) of different types of driveways. Comparison of out-of-sample predictions of the various models confirms the superiority of stacking over the alternative methods considered. From a practical standpoint, stacking can enhance prediction accuracy (compared to using only one base learner with a particular specification). When applied systemically, stacking can help identify more appropriate countermeasures.
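The weighted combination of base learners described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the weighting scheme, so the version below makes an assumption: weights are non-negative, sum to 1, and are chosen by a simple grid search to minimise mean squared error on the validation set. The function names (`fit_stacking_weights`, `blend`, `mse`) and all numbers are hypothetical and are not taken from the paper.

```python
from itertools import product


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted crash counts."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def blend(base_preds, weights):
    """Weighted average of base-learner predictions, one value per segment."""
    n = len(base_preds[0])
    return [sum(w * preds[i] for w, preds in zip(weights, base_preds))
            for i in range(n)]


def fit_stacking_weights(base_preds, y_val, steps=20):
    """Grid-search non-negative weights summing to 1 that minimise validation MSE.

    This is an assumed weighting scheme for illustration only.
    """
    k = len(base_preds)
    best_w, best_err = None, float("inf")
    # Enumerate the first k-1 weights on a grid; the last weight is the remainder.
    for combo in product(range(steps + 1), repeat=k - 1):
        if sum(combo) > steps:
            continue
        w = [c / steps for c in combo] + [(steps - sum(combo)) / steps]
        err = mse(y_val, blend(base_preds, w))
        if err < best_err:
            best_w, best_err = w, err
    return best_w, best_err


# Hypothetical validation data: observed crashes on six segments and
# predictions from three made-up base learners (not results from the paper).
y_val = [2, 0, 5, 1, 3, 4]
base_preds = [
    [2.6, 0.9, 4.1, 1.8, 2.9, 3.2],  # e.g. a negative binomial model
    [1.5, 0.2, 5.9, 0.4, 3.8, 4.9],  # e.g. a random forest
    [2.2, 0.5, 4.6, 1.3, 3.1, 3.7],  # e.g. gradient boosting
]

weights, stacked_err = fit_stacking_weights(base_preds, y_val)
single_errs = [mse(y_val, p) for p in base_preds]
print("weights:", weights)
print("stacked validation MSE:", stacked_err)
print("single-learner validation MSEs:", single_errs)
```

Because the grid includes the case where one learner receives all the weight, the blended validation error can never exceed that of the best single learner. That guarantee holds only on the data used to fit the weights, so the fitted weights would still need to be checked on the held-out testing set before any out-of-sample claim is made.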
