On Pipelining Sequences of Data-Dependent Loops
Rui M. M. Rodrigues (INESC-ID/IST, Portugal)
João M. P. Cardoso (INESC-ID/IST, Portugal)
Abstract: Sequences of data-dependent tasks, each one traversing large data sets, exist in many applications (such as video, image and signal processing applications). Those tasks usually perform computations (with loop intensive behavior) and produce new data to be consumed by subsequent tasks. This paper shows a scheme to pipeline sequences of data-dependent loops, in such a way that subsequent loops can start execution before the completion of the previous ones, which achieves performance improvements. It uses a hardware scheme with decoupled and concurrent data-path and control units that start execution at the same time. The communication of array elements between two loops in sequence is performed by special buffers with a data-driven, fine-grained scheme. Buffer elements are responsible to flag the availability of each array element requested by a subsequent loop (i.e., a ready protocol is used to trigger the execution of operations in the succeeding loop). Thus, the control execution of following loops is also orchestrated by data availability (in this case at the array element grain) and out-of-order produced-consumed pairs are permitted. The concept has been applied using Nau, a compiler infrastructure to map algorithms described in Java onto FPGAs. This paper presents very encouraging results showing important performance improvements and buffer size reductions for a number of benchmarks.
Keywords: FPGAs, compilation,, hardware schemes, loop pipelining
Categories: B.1.2, B.5.2, B.7.2, C.1.2,, C.1.4, C.5.4, D.3.4