How Hardware Accelerators Trade Off Pipelining and Parallelism to Maximize Efficiency