Status meeting, March 30th, 2012
Update on propagation algorithm (Marc)
Checking the interpolation accuracy. How do we decide if an interpolation is good enough?
In general, there is some unexplained structure in the pattern of error. Roughly, the GPU/texture is accurate to within 0.5%.
Q: How accurate is the input of the magnetic field strength?
To do: Ask G4/CMS collaboration what is accurate enough.
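One way to frame the "good enough" question is to compare interpolated values against a reference and check the worst relative error against a tolerance. The sketch below does this in 1-D under stated assumptions: reference_field is a made-up smooth profile standing in for the real field map, the 0.5% tolerance is simply the figure quoted above (the real target is still open), and frac_bits mimics the reduced-precision weights that CUDA documents for hardware texture filtering. Whether that quantization explains the error pattern seen here has not been established.

```python
import math

def reference_field(x):
    """Hypothetical smooth 1-D field profile (not the real field map)."""
    return 3.8 * math.exp(-(x / 4.0) ** 2)

def lerp_from_grid(samples, x0, dx, x, frac_bits=None):
    """Linear interpolation on a uniform grid starting at x0 with spacing dx.

    If frac_bits is given, the interpolation weight is rounded to that many
    fractional bits (assumption: a stand-in for fixed-point hardware weights).
    """
    t = (x - x0) / dx
    i = min(int(t), len(samples) - 2)   # clamp so that i + 1 stays in range
    w = t - i
    if frac_bits is not None:
        scale = 1 << frac_bits
        w = round(w * scale) / scale
    return (1.0 - w) * samples[i] + w * samples[i + 1]

def max_relative_error(n_cells, n_probe=10000, frac_bits=None):
    """Worst relative error over n_probe points spread across the grid."""
    x0, x1 = 0.0, 6.0
    dx = (x1 - x0) / n_cells
    samples = [reference_field(x0 + k * dx) for k in range(n_cells + 1)]
    worst = 0.0
    for k in range(n_probe):
        x = x0 + (x1 - x0) * (k + 0.5) / n_probe
        exact = reference_field(x)
        approx = lerp_from_grid(samples, x0, dx, x, frac_bits)
        worst = max(worst, abs(approx - exact) / abs(exact))
    return worst

TOLERANCE = 0.005  # 0.5%, the figure quoted in the notes; not an agreed target

for cells in (8, 32, 128):
    full = max_relative_error(cells)
    quantized = max_relative_error(cells, frac_bits=8)
    print(cells, full, quantized, quantized <= TOLERANCE)
```

Running the same comparison with the real field map and the tolerance the G4/CMS collaboration asks for would turn the open question into a pass/fail check per grid spacing.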
Update on extending the embedded RK GPU implementation (Soon)
Soon: I am starting from the transportation get_physical_interaction_length and trying to go all the way downstream. There are two big parts: the magnetic field part, which can be implemented on the GPU because we have a field map that does not rely on geometry; and the navigator, which needs to calculate the length from the current position to the next interaction location. If we fix the interaction length, there are still a couple of tricky places. Without the interaction locator, I think we can implement everything we need. I am nearly done but still need to do a lot of testing. I should start testing early next week.
Update on performance characteristic of GPU version of magnetic field interpolation (Philippe)
Performance of the interpolation code on the GPU is strongly influenced by the performance of memory lookups within the GPU, in particular by how well one memory fetch (a cache line of 128 bits) is shared across the multiple threads that are part of a warp.
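To make the fetch-sharing effect concrete, the sketch below counts how many distinct memory lines the 32 threads of one warp would touch for a given access stride. It is a simplified model, not a measurement: the function names are hypothetical, elements are assumed to be 4-byte floats, and line_bytes defaults to 16 to match the 128-bit figure above but is a parameter because the real fetch granularity depends on the hardware and the memory path used.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs

def lines_touched(indices, element_bytes=4, line_bytes=16):
    """Number of distinct memory lines read by a group of threads.

    indices: element index read by each thread.
    line_bytes=16 corresponds to the 128-bit line mentioned in the notes
    (assumption; adjust for the actual hardware).
    """
    return len({(i * element_bytes) // line_bytes for i in indices})

def warp_indices(stride, warp=0):
    """Element indices read by the lanes of one warp for a fixed stride."""
    base = warp * WARP_SIZE * stride
    return [base + lane * stride for lane in range(WARP_SIZE)]

# Neighbouring threads read neighbouring elements: few lines, well shared.
contiguous = lines_touched(warp_indices(stride=1))

# Each thread reads far from its neighbour: one line per thread.
strided = lines_touched(warp_indices(stride=64))

print("contiguous access:", contiguous, "lines per warp")
print("strided access:   ", strided, "lines per warp")
```

In this model the contiguous pattern needs a quarter of the fetches of the strided one, which is the kind of gap that separates the optimal and worst memory access cases.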
The task (getting the field for 64512) takes 10.8 ms on the CPU. On the GPU it can take as little as 0.210 ms exclusive of data upload and download, and 3.349 ms inclusive of data upload and download.
In pure calculation: CPU is 4.8 times faster in total 'core' time used (exclusive of memory upload and download).
In optimal memory access: CPU is 8.7 times faster in total 'core' time used (exclusive of memory upload and download).
In worst memory access: CPU is 25.1 times faster in total 'core' time used (exclusive of memory upload and download).
In pure calculation: GPU is 93 times faster in real time with 448 cores fully used (exclusive of memory upload and download).
In optimal memory access: GPU is 47.8 times faster in real time with 448 cores fully used (exclusive of memory upload and download).
Also, texture interpolation (used in the numbers above) is twice as fast as the explicit interpolation.
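The 'core' time and real time figures are two views of the same timings, related through the number of cores. The sketch below recomputes them from the rounded timings quoted above, assuming the CPU number is for a single core and that all 448 GPU cores are busy for the whole kernel. The helper names are illustrative only. From these rounded inputs the real-time ratio comes out a little above the 47.8 quoted, so that figure presumably rests on a slightly different measurement; the notes do not say.

```python
CPU_MS = 10.8          # CPU time for the task (assumed single core)
GPU_KERNEL_MS = 0.210  # best GPU time, exclusive of upload and download
GPU_TOTAL_MS = 3.349   # GPU time, inclusive of upload and download
GPU_CORES = 448

def wall_speedup(cpu_ms, gpu_ms):
    """How many times faster the GPU finishes in real (wall-clock) time."""
    return cpu_ms / gpu_ms

def core_time_ratio(cpu_ms, gpu_ms, cores):
    """How many times more total core time the GPU spends than the CPU."""
    return gpu_ms * cores / cpu_ms

kernel_only = wall_speedup(CPU_MS, GPU_KERNEL_MS)
with_transfers = wall_speedup(CPU_MS, GPU_TOTAL_MS)
core_ratio = core_time_ratio(CPU_MS, GPU_KERNEL_MS, GPU_CORES)

print(round(kernel_only, 1))     # 51.4
print(round(with_transfers, 1))  # 3.2
print(round(core_ratio, 1))      # 8.7

# The two views are linked by: wall speedup = cores / core-time ratio.
print(round(GPU_CORES / 4.8, 1))  # 93.3, matching the pure calculation case
```

Once the data transfers are included, the real-time advantage drops to roughly a factor of three, so keeping the field map and the track data resident on the GPU matters as much as the kernel itself.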