Status meeting, March 23rd, 2012

Summary of Discussion with John A.

See notes at: Phone meeting with John Apostolakis 3/22/2012

Result from performance evaluation of CUDA code (Soon)

Compared GPU and CPU performance for field extrapolation and one RK step, on various block/grid layouts.

Also compared with 100 times more (repeated) work. Factor of 10 improvement with 1 times the work. Factor of 160 improvement with 100 times the work.

Jim: What was the bandwidth obtained?

Moving data up (to the GPU) takes longer than moving it down (from the GPU). Tested on 1000 or 10,000 tracks (6 or 7 floats each).
Down was the same.

If doing 'just' the RK ... the speed improvement goes down.

Marc: question about memory bandwidth (0.6 ms for 10000*6*sizeof(double) -> 1.2Gb, and up to 2Gb for 100,000 tracks)
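
For reference, the payload arithmetic behind this question can be sketched as below. This is only a sketch: it assumes sizeof(double) = 8 bytes and 1 GB = 1e9 bytes, and the function names are hypothetical. The notes do not record whether 'Gb' means bits or bytes, or exactly which copies the 0.6 ms covers, so the computed rate need not match the figures quoted above.

```python
def payload_bytes(n_tracks, values_per_track, bytes_per_value=8):
    """Size in bytes of one transfer of track data.

    bytes_per_value=8 assumes sizeof(double) == 8.
    """
    return n_tracks * values_per_track * bytes_per_value


def bandwidth_gbytes_per_s(n_bytes, seconds):
    """Effective transfer rate in gigabytes per second (1 GB = 1e9 bytes)."""
    return n_bytes / seconds / 1e9


# 10000 tracks * 6 values * sizeof(double), as in the question above
size = payload_bytes(10_000, 6)
print(size)  # 480000

# Rate for a single copy of that payload taking 0.6 ms (assumed one direction)
rate = bandwidth_gbytes_per_s(size, 0.6e-3)
print(f"{rate:.2f} GB/s = {rate * 8:.2f} Gbit/s")
```

The same helper can be applied to the 100,000-track case once the measured copy time is known.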

Also tried the zero-copy technique: poor performance. Page-locked memory also does not help (worst performance).

Update on use of texture memory for magnetic field interpolation (Philippe)

... results ...

Need a better understanding of why the 'single thread' GPU is 3 times slower than expected.
Calculate the actual number of GPU cores.

Action Items for Philippe:
Run 2 kernels in parallel ... expand to more of Runge-Kutta.
Marc would like a sample of 1000 lookups.
Check the timing inclusive of the memory copies.