Title

Accelerating Irregular Applications via Efficient Synchronization and Data Access Techniques

Author

Giannoula, Christina

Abstract

Irregular applications comprise an increasingly important workload domain for many fields, including bioinformatics, chemistry, physics, social sciences, and machine learning. Achieving high performance and energy efficiency when executing emerging irregular applications is therefore of vital importance. This dissertation studies the root causes of the inefficiency of irregular applications in modern computing systems and addresses these inefficiencies by proposing low-overhead synchronization techniques among parallel threads in cooperation with well-crafted data access policies. We make four major contributions to accelerating irregular applications in different contexts, including CPU systems and Near-Data-Processing (NDP), also called Processing-In-Memory (PIM), systems. First, we design ColorTM, a novel parallel graph coloring algorithm for CPU systems that trades the use of synchronization for lower data access costs. Second, we propose SmartPQ, an adaptive priority queue that achieves high performance across all contention scenarios in Non-Uniform Memory Access (NUMA) CPU systems. Third, we introduce SynCron, the first practical hardware synchronization mechanism tailored for NDP systems. Fourth, we design SparseP, the first library for high-performance Sparse Matrix Vector Multiplication (SpMV) on real PIM systems. We demonstrate that the execution of irregular applications on CPU and NDP/PIM architectures can be significantly accelerated by co-designing lightweight synchronization approaches with well-crafted data access policies. This dissertation bridges the gap between processor-centric CPU systems and memory-centric PIM systems in the critically important area of irregular applications. We hope that this dissertation inspires future work on co-designing software algorithms with cutting-edge computing platforms to significantly accelerate emerging irregular applications.
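To make "irregular" concrete, here is a minimal sketch of Sparse Matrix Vector Multiplication, the kernel the abstract names, over the common Compressed Sparse Row (CSR) layout. It is a plain sequential illustration and not SparseP's API or implementation; the function name `spmv_csr` and its parameter names are assumptions chosen for this example. The point to notice is the indirect read `x[col_idx[k]]`, whose address depends on the matrix contents rather than on the loop index.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR format.

    row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i;
    col_idx[k] and values[k] give the column and value of nonzero k.
    (Hypothetical helper for illustration only.)
    """
    num_rows = len(row_ptr) - 1
    y = [0.0] * num_rows
    for row in range(num_rows):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # Irregular access: the index into x comes from the data itself.
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Example matrix:
# [[10, 0, 2],
#  [ 0, 3, 0],
#  [ 1, 0, 4]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [10.0, 2.0, 3.0, 1.0, 4.0]
x = [1.0, 2.0, 3.0]

y = spmv_csr(row_ptr, col_idx, values, x)
print(y)  # [16.0, 6.0, 13.0]
```

Rows can hold very different numbers of nonzeros, so splitting rows evenly across parallel threads does not split the work evenly, and the scattered reads of `x` make poor use of caches. These are the kinds of data access and load-balance costs the abstract refers to.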
