Paper Title

Self-adapting Robustness in Demand Learning

Paper Authors

Boxiao Chen, Selvaprabu Nadarajah, Parshan Pakiman, Stefanus Jasin

Paper Abstract

We study dynamic pricing over a finite number of periods in the presence of demand model ambiguity. Departing from the typical no-regret learning environment, where price changes are allowed at any time, pricing decisions are made at pre-specified points in time and each price can be applied to a large number of arrivals. In this environment, which arises in retailing, a pricing decision based on an incorrect demand model can significantly impact cumulative revenue. We develop an adaptively-robust-learning (ARL) pricing policy that learns the true model parameters from the data while actively managing demand model ambiguity. It optimizes an objective that is robust with respect to a self-adapting set of demand models, where a given model is included in this set only if the sales data revealed from prior pricing decisions makes it "probable". As a result, it gracefully transitions from being robust when demand model ambiguity is high to minimizing regret when this ambiguity diminishes upon receiving more data. We characterize the stochastic behavior of ARL's self-adapting ambiguity sets and derive a regret bound that highlights the link between the scale of revenue loss and the customer arrival pattern. We also show that ARL, by being conscious of both model ambiguity and revenue, bridges the gap between a distributionally robust policy and a follow-the-leader policy, which focus on model ambiguity and revenue, respectively. We numerically find that the ARL policy, or an extension thereof, exhibits superior performance compared to distributionally robust, follow-the-leader, and upper-confidence-bound policies in terms of expected revenue and/or value at risk.
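
To make the mechanism in the abstract concrete, below is a minimal sketch of an adaptively-robust pricing loop. It is an illustration under assumed specifics, not the paper's actual formulation: the logistic demand model, the finite candidate grid, and all names and numbers (buy_prob, THRESHOLD, the price and arrival grids) are hypothetical. The sketch keeps only the two ideas the abstract states: a model stays in the ambiguity set only if the likelihood of the sales revealed so far makes it "probable", and the price at each pre-specified epoch maximizes worst-case expected revenue over that shrinking set.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical logit demand: each arrival buys with prob sigmoid(a - b * price).
# The finite candidate grid stands in for the set of possible demand models.
CANDIDATES = [(a, b) for a in (0.5, 1.0, 1.5) for b in (0.3, 0.5, 0.7)]
TRUE_PARAMS = (1.0, 0.5)                 # unknown to the policy
PRICES = np.linspace(1.0, 8.0, 29)       # pre-specified feasible price grid
ARRIVALS = [200, 200, 400, 400, 800]     # batch of arrivals per pricing epoch
THRESHOLD = 4.0                          # log-likelihood slack defining "probable"

def buy_prob(params, price):
    a, b = params
    return 1.0 / (1.0 + np.exp(-(a - b * price)))

def log_likelihood(params, history):
    """Binomial log-likelihood of observed (price, arrivals, sales) triples."""
    ll = 0.0
    for price, n, sales in history:
        p = buy_prob(params, price)
        ll += sales * np.log(p) + (n - sales) * np.log(1.0 - p)
    return ll

history = []
for n in ARRIVALS:
    # Self-adapting ambiguity set: retain only candidates whose likelihood,
    # given the sales revealed by prior pricing decisions, is near the best.
    lls = [log_likelihood(c, history) for c in CANDIDATES]
    best = max(lls)
    ambiguity_set = [c for c, ll in zip(CANDIDATES, lls) if ll >= best - THRESHOLD]

    # Robust step: pick the price maximizing worst-case expected revenue
    # over the surviving models (max-min objective).
    price = max(PRICES, key=lambda p: min(p * buy_prob(c, p) for c in ambiguity_set))

    # Apply one price to the whole batch of arrivals and observe sales.
    sales = rng.binomial(n, buy_prob(TRUE_PARAMS, price))
    history.append((price, n, sales))
    print(f"set size={len(ambiguity_set):2d}  price={price:.2f}  sales={sales}")
```

In this toy loop the behavior the abstract describes is visible directly: before any data arrives all candidates are "probable", so the policy prices conservatively against the full set; as batches of sales accumulate, implausible models fall below the likelihood threshold, the set shrinks, and the policy's price approaches the one optimal for the surviving (true) model.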
