论文标题
最后一英里交付中学习偏好的概率估计和结构化输出预测
Probability estimation and structured output prediction for learning preferences in last mile delivery
论文作者
论文摘要
我们研究了在最后一英里交付的背景下学习驾驶员和计划者的偏好的问题。给定一个包含历史决策和交付位置的数据集,目标是捕获决策者的内在偏好。我们考虑使用历史数据的两种方法:一种是通过一种概率估计方法来学习停止(或区域)之间的过渡概率。这是一种快速准确的方法,最近在VRP设置中进行了研究。此外,我们探讨了机器学习的使用来推断如何最好地平衡多个目标,例如距离,概率和惩罚。具体而言,我们将学习问题视为结构化的输出预测问题,在该问题中,通过反复调用TSP求解器来完成培训。我们认为的另一个重要方面是,对于最后一英里的交付,每个地址都是一个潜在的客户,因此数据非常稀少。因此,我们提出了一种两阶段的方法,该方法首先在区域级别学习偏好,以计算区域路由。之后,基于惩罚的TSP计算停止路由。结果表明,区域过渡概率估计表现良好,结构化的输出预测学习可以进一步改善结果。因此,我们在学习和计算最终解决方案期间都使用标准TSP求解器展示了概率估计和机器学习的成功组合;这意味着该方法适用于其他现实生活,TSP变体或专有求解器。
We study the problem of learning the preferences of drivers and planners in the context of last mile delivery. Given a data set containing historical decisions and delivery locations, the goal is to capture the implicit preferences of the decision-makers. We consider two ways to use the historical data: one is through a probability estimation method that learns transition probabilities between stops (or zones). This is a fast and accurate method, recently studied in a VRP setting. Furthermore, we explore the use of machine learning to infer how to best balance multiple objectives such as distance, probability and penalties. Specifically, we cast the learning problem as a structured output prediction problem, where training is done by repeatedly calling the TSP solver. Another important aspect we consider is that for last-mile delivery, every address is a potential client and hence the data is very sparse. Hence, we propose a two-stage approach that first learns preferences at the zone level in order to compute a zone routing; after which a penalty-based TSP computes the stop routing. Results show that the zone transition probability estimation performs well, and that the structured output prediction learning can improve the results further. We hence showcase a successful combination of both probability estimation and machine learning, all the while using standard TSP solvers, both during learning and to compute the final solution; this means the methodology is applicable to other, real-life, TSP variants, or proprietary solvers.