This paper describes a scheme for the parallel execution on FPGAs of digital signal processing (DSP) tasks that rely heavily on multiply-accumulate (MAC) operations. Multiple operations are assigned to a single ‘processing node’ such that each node is loaded as fully as possible while still operating in real time. Where the number of MACs required exceeds the capability of a single processing node, additional nodes are added until the capacity of the FPGA is exhausted. Requirements beyond the capability of a single FPGA are accommodated by extending the design across multiple devices, which allows the scheme to scale. Resource usage and performance results are presented for an example acoustic modelling application running on a modest single FPGA and development system.
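The sizing logic implied by this scheme can be illustrated with a minimal sketch. It assumes that a processing node completes one MAC per clock cycle, so that its real-time budget is the number of clock cycles in one sample period. The function name `plan_nodes`, its parameters, and the figures in the usage line are hypothetical and are not taken from the system described in this paper.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def plan_nodes(macs_per_sample: int, clock_hz: int,
               sample_rate_hz: int, nodes_per_fpga: int):
    """Estimate how many processing nodes and FPGAs a MAC workload needs.

    Assumption (not from the paper): each node performs one MAC per clock
    cycle, so it can complete clock_hz // sample_rate_hz MACs per sample.
    """
    # Real-time budget of one node: whole MACs that fit in one sample period.
    macs_per_node = clock_hz // sample_rate_hz
    if macs_per_node < 1:
        raise ValueError("clock is too slow to complete one MAC per sample")

    # Add nodes until the whole workload is covered.
    nodes = ceil_div(macs_per_sample, macs_per_node)

    # Spill onto further devices once a single FPGA is full.
    fpgas = ceil_div(nodes, nodes_per_fpga)
    return macs_per_node, nodes, fpgas


# Hypothetical figures: 50,000 MACs per sample, 100 MHz clock,
# 48 kHz sample rate, room for 16 nodes on each FPGA.
print(plan_nodes(50_000, 100_000_000, 48_000, 16))
```

In practice the number of nodes that fit on one device depends on the available multipliers, memory and routing, so `nodes_per_fpga` would have to be obtained from synthesis results rather than chosen freely.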