Support #23361

Abseil not c++17 compliant, causing problems in Tensorflow v1_12_0b e17 build

Added by Pengfei Ding about 2 months ago. Updated 22 days ago.

Status: Resolved
Priority: Normal
Target version: -
Start date: 10/02/2019
Due date:
% Done: 100%
Estimated time:
Duration:

Description

This ticket passes on what NOvA has just learned about the Tensorflow v1_12_0b e17 build, in case other experiments run into the same issue.

They made their own e17 build of Tensorflow v1_12_0b with two changes:
1. build with gcc 7.3, but with C++14 instead of C++17;
2. build on a node with an Intel CPU so that the "SSE" extensions are enabled in the resulting shared library.

The investigation started when NOvA ran into problems with the existing v1_12_0b Tensorflow e17 build.

The code hung at:

absl::InlinedVector<long long, 4ul, std::allocator<long long> >::EnlargeBy(unsigned long) ()

It looked as though the third-party package Abseil, which Tensorflow uses, is not C++17 compliant. When testing a build made with gcc 7.3 but with C++14, the problem seemed to go away.

Additionally, they found that the shared library in the current v1_12_0b on SciSoft does not seem to use Intel's "SSE" extensions, which made running it very slow and inefficient. This might be because the Jenkins server dispatched the build task to an AMD machine that did not support the SSE extensions. Rebuilding the product on a machine with Intel CPUs mitigated the problem.

(NOvA previously had issues when running grid jobs with the tensorflow product built with SSE turned on. These were caused by jobs landing on nodes with AMD CPUs, or on older nodes, that did not support SSE. That problem was mitigated later when they found that condor/jobsub can filter on CPU type when submitting jobs.)
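One way to check both sides of the SSE question (what a binary was compiled to use, and what the node it runs on supports) is sketched below. This is a minimal illustration and not something NOvA or SciSoft is known to use: the `demo` namespace and both function names are hypothetical, and the run-time check relies on the GCC/Clang x86 builtin `__builtin_cpu_supports`, so on other compilers or architectures it simply reports success.

```cpp
#include <string>
#include <vector>

namespace demo {  // hypothetical helper names, for illustration only

// Instruction sets the compiler was allowed to use for this translation unit,
// as reported by the compiler's predefined macros.
inline std::vector<std::string> compiled_features() {
  std::vector<std::string> out;
#ifdef __SSE2__
  out.push_back("sse2");
#endif
#ifdef __SSE4_2__
  out.push_back("sse4.2");
#endif
#ifdef __AVX__
  out.push_back("avx");
#endif
  return out;
}

// True if the CPU we are running on supports every feature listed above.
// Only checked with GCC/Clang on x86; elsewhere this returns true unchecked.
inline bool cpu_supports_compiled_features() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  __builtin_cpu_init();
#ifdef __SSE2__
  if (!__builtin_cpu_supports("sse2")) return false;
#endif
#ifdef __SSE4_2__
  if (!__builtin_cpu_supports("sse4.2")) return false;
#endif
#ifdef __AVX__
  if (!__builtin_cpu_supports("avx")) return false;
#endif
#endif
  return true;
}

}  // namespace demo
```

Calling `demo::compiled_features()` from code compiled with the same flags as the product shows which extensions the build enabled, and `demo::cpu_supports_compiled_features()` could be called at job start to fail early on a node that lacks them.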

Thanks,
Pengfei


Related issues

Related to LArSoft - Support #22504: Request to upgrade the tensorflow version (Feedback, 05/02/2019)

History

#1 Updated by Kyle Knoepfel about 1 month ago

  • Related to Support #22504: Request to upgrade the tensorflow version added

#2 Updated by Lynn Garren 23 days ago

  • Assignee set to Christopher Green
  • Status changed from New to Assigned

#3 Updated by Christopher Green 22 days ago

  • % Done changed from 0 to 100
  • Status changed from Assigned to Resolved

Thank you for this information. As a side note: generally speaking "standards compliance" issues do not cause run-time errors, although compiler-specific bugs could plausibly do so. In this case however, I understand from other sources that this was a bug in Abseil itself.

However, it is important with Abseil specifically (this is not the case in general) to ensure that everything is built to the same C++ standard, as it appears that in recent versions some types (e.g. absl::string_view) may be defined differently in the headers depending on the standard selected. If the standard differs between the Abseil library and the code you're using it with, there may be issues.
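The kind of mismatch described above can be guarded against with a small check on `__cplusplus`. The sketch below is an illustration only: the `demo` namespace and its functions are hypothetical and not part of Abseil or Tensorflow, and it assumes that the header in question switches its definitions at the C++17 boundary.

```cpp
#include <cassert>
#include <string>

namespace demo {  // hypothetical names; not part of Abseil or Tensorflow

// Map a __cplusplus value to a readable label.
inline std::string standard_label(long value) {
  if (value >= 201703L) return "C++17 or later";
  if (value >= 201402L) return "C++14";
  if (value >= 201103L) return "C++11";
  return "pre-C++11";
}

// Assumption: the header changes its type definitions at the C++17 boundary,
// so two builds agree only if they fall on the same side of it.
inline bool same_header_branch(long library_std, long consumer_std) {
  return (library_std >= 201703L) == (consumer_std >= 201703L);
}

// Abort (in builds with assertions enabled) when this translation unit would
// select a different header branch than the library did.
inline void require_same_branch(long library_std) {
  assert(same_header_branch(library_std, __cplusplus) &&
         "library and consumer were built with different C++ standards");
}

}  // namespace demo
```

A library could record its own `__cplusplus` value in one of its compiled source files and expose it through an accessor; client code would then pass that value to `demo::require_same_branch` at start-up. That accessor is also an assumption of this sketch, not an existing API.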


