Table 3 shows the parallel execution performance measured on a Cray Superserver configured with 32 CPUs. The compiled Fibonacci function achieved linear parallel gain, because it makes no shared memory accesses and its code is small enough to fit entirely in the cache memory of each processor. In contrast, when the same program was interpreted, linear speedup could not be attained, since its memory accesses are scattered. Furthermore, some programs that frequently refer to shared memory and request memory allocation perform no better than when executed on a single processor. This can be understood as the result of frequent cache memory purging.
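To make the notion of "linear parallel gain" concrete, the following is a minimal sketch in Python of how speedup and per-CPU efficiency could be derived from two measured wall-clock times. The function names `speedup` and `efficiency` and their parameters are assumptions made for illustration; they are not part of the measured system, and no values from the table are reproduced here.

```python
def speedup(t_single: float, t_parallel: float) -> float:
    """Ratio of single-processor time to parallel time (hypothetical helper)."""
    if t_single <= 0 or t_parallel <= 0:
        raise ValueError("execution times must be positive")
    return t_single / t_parallel


def efficiency(t_single: float, t_parallel: float, n_cpus: int) -> float:
    """Speedup divided by the number of CPUs used (hypothetical helper)."""
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    return speedup(t_single, t_parallel) / n_cpus
```

Under these definitions, linear gain corresponds to a speedup close to the number of CPUs, that is, an efficiency close to 1. A program that runs no faster in parallel than on a single processor has a speedup of at most 1.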
k-okada 2013-05-21