LibTorch v1.4 for DUNE reconstruction
I'd like to request that a build of LibTorch v1.4.0 be made available within LArSoft (this use case is for larpandoracontent in particular).
The Pandora team have developed a couple of algorithms using PyTorch-based neural networks, which we'll be looking to make available in the near future. One of these algorithms makes use of torch.optim.lr_scheduler modules that are not available in early versions of LibTorch, and pending developments using sparse convolutional networks also require v1.3+.
It's my understanding that there is a longer-term program to provide PyTorch and TensorFlow interfaces in support of machine learning solutions being developed for DUNE sim/reco. In the interim, would it be possible to provide a v1.4.0 build of LibTorch to link against? Thanks very much.
#1 Updated by Lynn Garren 8 months ago
- Assignee set to Lynn Garren
Note that we have concerns about which platforms and compilers can be supported. I see that we only built libtorch 1.0.1 for e17 on SLF7.
We strongly suggest that any interface using libtorch be modular and optional, since it may not be available for all compilers and platforms. We are currently sorting out how best to remove tensorflow from larreco. There will be a new github repository, name as yet unclear.
#2 Updated by Andrew Chappell 8 months ago
From the Pandora side, we will look to make the use of LibTorch optional, with default values assigned to the properties populated by the neural network to ensure that downstream algorithms that can make use of this information will continue to operate correctly if the network is unavailable.
Thanks very much for taking the time to look into this. It is greatly appreciated, as this will be very useful to us for both current and future development.
#6 Updated by Lynn Garren 4 months ago
- % Done changed from 0 to 100
- Status changed from Assigned to Resolved
libtorch v1_5_1 is available on SciSoft and larsoft cvmfs. For various reasons, this package is only available for e19 on SLF7. When larsoft moves to art 3.6, it should be available for all supported platforms and compilers.
To use libtorch:
setup libtorch v1_5_1 -q e19:eigen
#9 Updated by Andrew Chappell 4 months ago
Lynn Garren wrote:
libtorch v1_5_1a is now available on larsoft cvmfs. Please let us know if this works for you.
Thanks for this Lynn. I'm currently on vacation, so I will check this when I return, but I don't anticipate any issues with moving to 1.5.
#11 Updated by Andrew Chappell 3 months ago
Hi Lynn, Kyle,
I've now had the opportunity to test the LibTorch v1.5.1a build - thanks again for your efforts here. Although I've been able to get the network running and reproducing output consistent with what I was seeing with v1.4, the inference is taking much longer (per-event processing time has jumped from ~1.5 seconds on v1.4 to ~30 seconds on v1.5.1).
I've tested this both on my local system at Warwick (which is where the v1.4 baseline timing was established) and on dunegpvm02, and find similar runtimes on both. I've also tested a network produced using 1.5.1 rather than 1.4 and, again, this did not improve the runtime.
Do you have any thoughts on possible reasons for this behaviour? Thanks again.
#12 Updated by Kyle Knoepfel 3 months ago
Andy, we have a few ideas about why you might be seeing this behavior. Please open another issue to address the efficiency side of the installation. Two requests:
- We ask that you include the build flags you used for your personal installation of libtorch.
- Can you build libtorch 1.5 yourself and reproduce the same difference in behavior?