Abstract
This work is motivated by advances in heterogeneous computing and the strong practical demand for workload acceleration. Considering pipeline workloads on FPGAs, this paper explores a systematic methodology for configuring the hardware instances of each pipeline stage so that the maximum execution time over all stages is minimized, taking into account FPGA allocation under a memory bandwidth constraint. For the target problem, an algorithm is proposed and proved to be optimal, and a real implementation study is conducted. In the experimental results, an FPGA implementation of an image filter outperforms the CPU, GPU, and baseline FPGA solutions by 460%, 73%, and 1030%, respectively. Extensive simulations with a large FPGA size were also conducted to show the scalability of this work.
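The min-max objective described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm, which the abstract does not spell out; it is a simple greedy heuristic under an assumed model in which a stage with `n` instances takes `work / n` time units and each instance consumes a fixed, positive amount of FPGA area and memory bandwidth. The function name `min_bottleneck` and all of its parameters are hypothetical.

```python
def min_bottleneck(work, area, bw, total_area, total_bw):
    """Greedy sketch: keep adding an instance to the slowest stage.

    Assumed model (not taken from the paper):
      - stage i with n instances runs in work[i] / n time units
      - each instance of stage i costs area[i] (> 0) and bw[i] (>= 0)
    Returns (bottleneck_time, instance_counts), or None if even one
    instance per stage does not fit.
    """
    n_stages = len(work)

    def fits(counts):
        # Check both resource budgets for a candidate configuration.
        used_area = sum(a * n for a, n in zip(area, counts))
        used_bw = sum(b * n for b, n in zip(bw, counts))
        return used_area <= total_area and used_bw <= total_bw

    counts = [1] * n_stages  # every stage needs at least one instance
    if not fits(counts):
        return None

    while True:
        # The stage with the largest execution time is the bottleneck.
        slow = max(range(n_stages), key=lambda i: work[i] / counts[i])
        trial = counts.copy()
        trial[slow] += 1
        if not fits(trial):
            break  # the bottleneck stage cannot be sped up any further
        counts = trial

    bottleneck = max(work[i] / counts[i] for i in range(n_stages))
    return bottleneck, counts


# Three stages with made-up costs, 10 area units, 8 bandwidth units.
result = min_bottleneck([12, 6, 4], [2, 1, 1], [1, 1, 1], 10, 8)
infeasible = min_bottleneck([12, 6, 4], [2, 1, 1], [1, 1, 1], 3, 8)
```

In this example the loop stops once the slowest stage can no longer receive another instance within the area budget, since the bottleneck time cannot drop without speeding up that stage.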
Original language | English |
---|---|
Title of host publication | RTCSA 2014 - 20th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
ISBN (Electronic) | 9781479939534 |
State | Published - 25 09 2014 |
Event | 20th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, RTCSA 2014 - Chongqing, China. Duration: 20 08 2014 → 22 08 2014 |
Publication series
Name | RTCSA 2014 - 20th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications |
---|
Conference
Conference | 20th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, RTCSA 2014 |
---|---|
Country/Territory | China |
City | Chongqing |
Period | 20/08/14 → 22/08/14 |
Bibliographical note
Publisher Copyright: © 2014 IEEE.