Latency and bandwidth requirements of massively parallel programs: FFT as a case study