Support #22504

Request to upgrade the tensorflow version

Added by Tingjun Yang 4 months ago. Updated about 1 month ago.

Status: Feedback
Priority: Normal
Assignee: -
Category: -
Target version: -
Start date: 05/02/2019
Due date:
% Done: 0%
Estimated time:
Experiment: ArgoNeut, DUNE
Co-Assignees:
Duration:

Description

Dear LArSoft experts,

Would it be possible to upgrade tensorflow in larsoft? The current version is v1_3_0, and it would be great if it could be upgraded to v1_8_0 or higher. This would help deep learning development in DUNE and ArgoNeuT, and possibly in other experiments as well.

Thanks,
Tingjun

errors.txt (49.4 KB) — Tingjun Yang, 07/12/2019 12:15 PM

History

#1 Updated by Kyle Knoepfel 3 months ago

  • Status changed from New to Feedback

We are accepting this, but we would like assistance from your tensorflow experts. With whom should we set up a meeting?

#2 Updated by Tingjun Yang 3 months ago

Hi Kyle,

You can set up a meeting with Saul and Leigh. I would like to join the meeting, but they are the tensorflow experts.

Thanks,
Tingjun

#3 Updated by Lynn Garren 3 months ago

We also have a request from MicroBooNE for a working python interface. I would like to have a joint meeting. Tentatively scheduled for 10 am May 8.

#4 Updated by Tingjun Yang 3 months ago

Lynn Garren wrote:

We also have a request from MicroBooNE for a working python interface. I would like to have a joint meeting. Tentatively scheduled for 10 am May 8.

This does not work for us as there will be a ProtoDUNE meeting at the same time.

#5 Updated by Tingjun Yang about 1 month ago

  • Status changed from Feedback to Work in progress

From Lynn's email:

I had a talk with Marc P about the tensorflow build. We came up with two possible options. Fortunately, the cleanest option has worked and I have provided new patches for the c++17 build. I also had a look at the headers that are installed in the tensorflow include directory and refined the set.

tensorflow v1_12_0b is available for testing on cvmfs.

Please let us know if you have problems using this release. We presume that you will supply a larsim feature branch for use with tensorflow v1_12_0b. We should alert the larsoft mailing list and make sure there are no objections before making a release with the new build.

This release is only available for e17 (and e17:py3).

I remain concerned about tensorflow going forward. With newer releases we will have to use the bazel build, which appears to be problematic for spack. Also, tensorflow seems to be making its own copy of some utilities that are usually provided by the system. As long as everything is completely contained, that should be fine. Given that tensorflow maintains its own ecosystem, it may be wise to consider running it inside a container and taking the output in some fashion.

Lynn

#6 Updated by Tingjun Yang about 1 month ago

I have started testing tensorflow v1_12_0b. The upgrade seems to be quite straightforward. The only issue is the following compilation error:

In file included from /cvmfs/larsoft.opensciencegrid.org/products/tensorflow/v1_12_0b/Linux64bit+3.10-2.17-e17-prof/include/tensorflow/core/public/session.h:24,
                 from /data/tjyang/dune/larsoft_em/srcs/dunetpc/dune/CVN/tf/tf_graph.cc:12:
/cvmfs/larsoft.opensciencegrid.org/products/tensorflow/v1_12_0b/Linux64bit+3.10-2.17-e17-prof/include/tensorflow/core/lib/core/stringpiece.h:29:10: fatal error: absl/strings/string_view.h: No such file or directory
 #include "absl/strings/string_view.h" 
          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~

The solution is to add include_directories( $ENV{TENSORFLOW_INC}/absl ) to the CMakeLists.txt file wherever session.h is included.
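As a sketch, the relevant CMakeLists.txt fragment might look like the following. Only the include_directories line comes from the fix described above; the surrounding context is hypothetical and will differ per package:

```cmake
# tensorflow v1_12_0b bundles its own copy of the Abseil headers.
# TENSORFLOW_INC is set by the tensorflow ups product; adding the
# absl subdirectory makes "absl/strings/string_view.h" resolvable
# in any translation unit that includes tensorflow's session.h.
include_directories( $ENV{TENSORFLOW_INC}/absl )
```

This is needed in every package directory whose sources include tensorflow/core/public/session.h, e.g. the CVN code in dunetpc.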

I have created feature branches feature/team_for_tensorflow_v1_12_0b in both larreco and dunetpc. I am going to test if the tensorflow results remain the same for DUNE. I will deal with argoneutcode later.

#7 Updated by Tingjun Yang about 1 month ago

Tried to run tensorflow on an existing DUNE MC file:

lar -c select_ana_dune10kt_nu.fcl xroot://fndca1.fnal.gov:1094/pnfs/fnal.gov/usr/dune/tape_backed/dunepro/mcc11/protodune/mc/full-reconstructed/07/51/34/20/nue_dune10kt_1x2x6_12855888_0_20181104T211321_gen_g4_detsim_reco.root -n -1

There are many errors like this one:

2019-07-12 12:04:58.201673: E tensorflow/core/framework/op_kernel.cc:1197] OpKernel ('op: "MutableDenseHashTableV2" device_type: "CPU" constraint { name: "key_dtype" allowed_values { list { type: DT_STRING } } } constraint { name: "value_dtype" allowed_values { list { type: DT_INT64 } } }') for unknown op: MutableDenseHashTableV2

The full error log is attached.
The program hangs on the first event:
Classifier summary: 
Output 0: 0.0257705, 
Output 1: 2.55598e-05, 0.999172, 0.000753308, 4.93325e-05, 
Output 2: 0.00586059, 0.367086, 0.627042, 1.20333e-05, 
Output 3: 6.90421e-05, 3.75733e-05, 0.000108019, 0.999785, 
Output 4: 0.997856, 0.0021328, 1.05676e-05, 3.6611e-07, 
Output 5: 0.999907, 9.18793e-05, 7.18959e-07, 3.51992e-08, 
Output 6: 0.999743, 0.000255833, 9.39033e-07, 3.14338e-07, 

I guess the conclusion is that we cannot use the old networks with the new version of tensorflow.

#8 Updated by Kyle Knoepfel about 1 month ago

There appears to be a conflict between the old and new tensorflow-generated data schema. Unless tensorflow supports schema evolution, you may need to regenerate your tensorflow-formatted data.

#9 Updated by Kyle Knoepfel about 1 month ago

  • Status changed from Work in progress to Feedback

Do you think the feature branches you are working on will be ready for this week's release?

#10 Updated by Tingjun Yang about 1 month ago

Kyle Knoepfel wrote:

Do you think the feature branches you are working on will be ready for this week's release?

Hi Kyle,
No, we need to train new networks in order to use the new version of tensorflow. By the way, does MicroBooNE use tensorflow in their code? If so, they will also need to retrain in order to use the new tensorflow.

#11 Updated by Lynn Garren about 1 month ago

MicroBooNE uses tensorflow via a container. They do not use our build. We expect to have a report from them at the next coordination meeting.


