Status meeting, March 23rd, 2012
Summary of Discussion with John A.
See notes at: Phone meeting with John Apostolakis 3/22/2012
Results from performance evaluation of CUDA code (Soon)
Compared GPU and CPU performance for field extrapolation plus one RK step, on various block/grid layouts.
Also compared with the work repeated 100 times: a factor of 10 improvement with the work done once, a factor of 160 with the work repeated 100 times.
Jim: What bandwidth was obtained?
Moving up takes longer than moving down (from the GPU). Tested on 1,000 or 10,000 tracks (6 or 7 floats each).
Down was the same.
When doing 'just' the RK step, the speed improvement goes down.
Marc: Question about memory bandwidth (0.6 ms for 10000*6*sizeof(double) -> 1.2Gb and up 2Gb for 100,000 tracks).
Also tried the zero-copy technique: poor performance. Page-locked memory does not help either (worst performance).
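One way to read the factor 10 and factor 160 figures above is a fixed-overhead model. This is an assumption, not something stated in the meeting: each GPU run pays a constant cost (memory copies, kernel launch) plus a cost proportional to the number of repeats, while the CPU cost is purely proportional to the number of repeats. The Python sketch below fits those two unknowns to the two reported factors. The function names are hypothetical, and the result is only as good as the model.

```python
# Sketch under an ASSUMED model (not from the meeting):
#   T_cpu(r) = r                          (CPU time per repeat is the time unit)
#   T_gpu(r) = overhead + r * per_repeat  (fixed cost + cost per repeat)
# so speedup(r) = r / (overhead + r * per_repeat).

def fit_fixed_overhead(speedup_once, speedup_n, n):
    """Solve for (overhead, per_repeat) from two measured speedups.

    speedup_once: speedup with the work done once
    speedup_n:    speedup with the work repeated n times
    Both results are in units of CPU time per repeat.
    """
    t_gpu_once = 1.0 / speedup_once   # GPU time for 1 repeat
    t_gpu_n = n / speedup_n           # GPU time for n repeats
    per_repeat = (t_gpu_n - t_gpu_once) / (n - 1)
    overhead = t_gpu_once - per_repeat
    return overhead, per_repeat


def predicted_speedup(overhead, per_repeat, repeats):
    """Speedup the assumed model predicts for a given number of repeats."""
    return repeats / (overhead + repeats * per_repeat)


if __name__ == "__main__":
    # Factors reported in the notes: 10 with the work done once,
    # 160 with the work repeated 100 times.
    overhead, per_repeat = fit_fixed_overhead(10.0, 160.0, 100)
    print("fixed overhead share of a single pass:",
          overhead / (overhead + per_repeat))
    print("speedup limit for many repeats:", 1.0 / per_repeat)
    for repeats in (1, 10, 100, 1000):
        print(repeats, predicted_speedup(overhead, per_repeat, repeats))
```

Under that assumption, roughly 95% of the single-pass GPU time would be fixed overhead, and the speedup would level off near 190 for heavily repeated work. That would also be consistent with the smaller improvement seen when doing 'just' the RK step, where there is less computation per transfer.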
Update on use of texture memory for magnetic field interpolation (Philippe)
... results ...
Need a better understanding of why the 'single thread' GPU run is 3 times slower than expected.
Calculate the actual number of GPU cores.
Action Items for Philippe:
Run 2 kernels in parallel ... expand to more of Runge-Kutta.
Marc would like a sample of 1000 lookups.
Check with the memory copies included.