  • This event has passed.

Tech Talk #11 – LOOG: A Case For Non-Conventional GPU Architectures

February 5, 2021 @ 10:00 am - 11:30 pm

On Friday, 5/2/2021, at 10:00 EET (Greek Time), PhD candidate Kostis Iliakis will give the Fridays Tech Talks presentation, titled:

“LOOG: A Case For Non-Conventional GPU Architectures”

The presentation will be held via Webex at the following link: Webex.

A short abstract of the presentation follows:

“GPU is the dominant platform for accelerating general-purpose workloads due to its computing capacity and cost-efficiency. GPU applications cover an ever-growing range of domains. To achieve high throughput, GPUs rely on massive multi-threading and fast context switching to overlap computations with memory operations. We observe that among the diverse GPU workloads, there exists a significant class of kernels that fail to maintain a sufficient number of active warps to hide the latency of memory operations, and thus suffer from frequent stalling. We argue that the dominant Thread-Level Parallelism model is not enough to efficiently accommodate the variability of modern GPU applications. To address this inherent inefficiency, we propose LOOG, a novel micro-architecture with lightweight Out-Of-Order execution capability enabling Instruction-Level Parallelism to complement the conventional Thread-Level Parallelism model. To minimize the hardware overhead, we carefully design our extension to highly re-use the existing micro-architectural structures and study various design trade-offs to contain the overall area and power overhead, while providing improved performance. We show that the proposed architecture outperforms conventional platforms by 23% on average for low-occupancy kernels, with an area and power overhead of 1.29% and 10.05%, respectively. Finally, we establish the potential of our proposal as a micro-architecture alternative by providing 16% speedup over a wide collection of 60 general-purpose kernels.”