Paper title

Transparent Serverless execution of Python multiprocessing applications

Authors

Aitor Arjona, Gerard Finol, Pedro Garcia-Lopez

Abstract

Access transparency means that local and remote resources are accessed using identical operations. With transparency, unmodified single-machine applications could run over disaggregated compute, storage, and memory resources. Hiding the complexity of distributed systems through transparency would have great benefits, such as scaling out local-parallel scientific applications over flexible disaggregated resources in the Cloud. This paper presents a performance evaluation in which we assess the feasibility of access transparency over state-of-the-art Cloud disaggregated resources for Python multiprocessing applications. We have interfaced the multiprocessing module with an implementation that transparently runs processes on serverless functions and uses an in-memory data store for shared state. To evaluate transparency, we run four unmodified applications in the Cloud: Uber Research's Evolution Strategies, Baselines-AI's Proximal Policy Optimization, Pandaral·lel's dataframe, and ScikitLearn's Hyperparameter tuning. We compare the execution time and scalability of each application running over disaggregated resources using our library against the same application running with the single-machine Python multiprocessing library in a large VM. For equal resources, applications that use message-passing abstractions efficiently achieve comparable results despite the significant overheads of remote communication. Other, shared-memory-intensive applications do not perform well because of high remote memory latency. The results show that Python's multiprocessing library design is an enabler of transparency: legacy applications that use efficient disaggregated abstractions can transparently scale beyond the limited resources of a VM for increased parallelism, without changing the underlying code or architecture.
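To make the idea of access transparency concrete, here is a minimal sketch in Python. It is not the authors' library: the function names `square` and `sum_of_squares` are hypothetical, and the standard library's thread-backed `multiprocessing.dummy` module stands in for an alternative backend, purely to illustrate that application code written against the `Pool` interface does not need to change when the implementation behind that interface does. A serverless backend of the kind the abstract describes would, under this assumption, be swapped in the same way.

```python
import multiprocessing
import multiprocessing.dummy  # thread-backed module exposing the same Pool API


def square(x):
    """Stand-in for one unit of application work."""
    return x * x


def sum_of_squares(backend, n, workers=4):
    """Application code: relies only on the Pool interface of `backend`.

    Arguments and results travel through the pool (message-passing style);
    the workers share no mutable state.
    """
    with backend.Pool(workers) as pool:
        return sum(pool.map(square, range(n)))


if __name__ == "__main__":
    # Same application code, two interchangeable backends (illustrative only).
    print(sum_of_squares(multiprocessing, 100))        # local processes
    print(sum_of_squares(multiprocessing.dummy, 100))  # local threads
```

The example uses only `Pool.map`, a message-passing abstraction, which matches the class of applications the abstract reports as achieving comparable results. Code that leans heavily on shared-memory primitives would, per the abstract, be penalised by remote memory latency.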
