Project

General

Profile

Bug #15400

unit test test_feag and test_feag2 fail on MacOS only

Added by Vito Di Benedetto about 3 years ago. Updated over 2 years ago.

Status:
Closed
Priority:
Normal
Category:
Other
Target version:
-
Start date:
01/31/2017
Due date:
% Done:

100%

Estimated time:
Spent time:
Occurs In:
Experiment:
-
Co-Assignees:
Duration:

Description

Running LArSoft unit tests as part of the CI build on Jenkins, the unit tests test_feag and test_feag2 fail on MacOS platform (on Linux these unit test succeed).
From the LArSoft CI build I see that the unit test test_feag2 fails almost all the time, while unit test test_feag fail occasionally.

The ctest log of unit test test_feag2 reports:

41/84 Testing: test_feag2
41/84 Test: test_feag2
Command:
"/grid/fermiapp/products/larsoft/cetbuildtools/v5_06_01/bin/cet_exec_test" 
"--wd" 
"/Users/macdev/workspace/lar_ci_test_2/label_exp/OSX-10.10/label_exp2/swarm/build_d14.x86_64/lardata/test/Utilities/testForEachAssociatedGroup/test_feag2.d" 
"--required-files" "" "--datafiles" 
"/Users/macdev/workspace/lar_ci_test_2/label_exp/OSX-10.10/label_exp2/swarm/srcs/lardata/test/Utilities/testForEachAssociatedGroup/test_feag2.fcl" 
"--skip-return-code" "247" "lar" "--rethrow-all" "--config" "test_feag2.fcl" 
Directory:
/Users/macdev/workspace/lar_ci_test_2/label_exp/OSX-10.10/label_exp2/swarm/build_d14.x86_64/lardata/test/Utilities/testForEachAssociatedGroup/test_feag2.d
"test_feag2" start time: Jan 16 21:04 CST
Output:
----------------------------------------------------------
%MSG-i MF_INIT_OK:  16-Jan-2017 21:04:38 CST JobSetup
Messagelogger initialization complete.
%MSG
16-Jan-2017 21:04:40 CST  Initiating request to open input file
"../test_feag.d/out.root" 
16-Jan-2017 21:04:40 CST  Opened input file "../test_feag.d/out.root" 
Begin processing the 1st record. run: 1 subRun: 0 event: 1 at
16-Jan-2017 21:04:40 CST
<end of output>
Test time =   3.06 sec
----------------------------------------------------------
Test Failed.
"test_feag2" end time: Jan 16 21:04 CST
"test_feag2" time elapsed: 00:00:03
----------------------------------------------------------

The ctest log of unit test test_feag reports:

345: 40/86 Testing: test_feag
346: 40/86 Test: test_feag
347: Command:
"/grid/fermiapp/products/larsoft/cetbuildtools/v5_06_01/bin/cet_exec_test" 
"--wd" 
"/Users/macdev/workspace/lar_ci_test/label_exp/OSX-10.10/label_exp2/swarm/build_d14.x86_64/lardata/test/Utilities/testForEachAssociatedGroup/test_feag.d" 
"--required-files" "" "--datafiles" 
"/Users/macdev/workspace/lar_ci_test/label_exp/OSX-10.10/label_exp2/swarm/srcs/lardata/test/Utilities/testForEachAssociatedGroup/test_feag.fcl" 
"--skip-return-code" "247" "lar" "--rethrow-all" "--config" "test_feag.fcl" 
348: Directory:
/Users/macdev/workspace/lar_ci_test/label_exp/OSX-10.10/label_exp2/swarm/build_d14.x86_64/lardata/test/Utilities/testForEachAssociatedGroup/test_feag.d
349: "test_feag" start time: Jan 30 08:18 CST
350: Output:
351: ----------------------------------------------------------
352: %MSG-i MF_INIT_OK:  30-Jan-2017 08:18:19 CST JobSetup
353: Messagelogger initialization complete.
354: %MSG
355: %MSG-i FastCloning:  RootOutput:out1@Construction 30-Jan-2017
08:18:22 CST  ModuleConstruction
356: Initial fast cloning configuration (from default): true
357: %MSG
358: Begin processing the 1st record. run: 1 subRun: 0 event: 1 at
30-Jan-2017 08:18:22 CST
359: 30-Jan-2017 08:18:23 CST  Opened output file with pattern "out.root" 
360: %MSG-w FastCloning:  PostProcessPath end_path 30-Jan-2017 08:18:23
CST  PostProcessEvent
361: Fast cloning deactivated for this input file due to empty event
tree and/or event limits.
362: %MSG
363: Begin processing the 2nd record. run: 1 subRun: 0 event: 2 at
30-Jan-2017 08:18:23 CST
364: <end of output>
365: Test time =   5.31 sec
366: ----------------------------------------------------------
367: Test Failed.
368: "test_feag" end time: Jan 30 08:18 CST
369: "test_feag" time elapsed: 00:00:05
370: ----------------------------------------------------------

Associated revisions

Revision 53691de9 (diff)
Added by Gianluca Petrillo about 3 years ago

Tentative fix to RangeForWrapperBox (util::range_for) per issue #15400 .

I was saving the begin and end iterators from the (potentially temporary) range into my wrapper class.
Chances are that those iterators still need the range to be around to work (as they have the right to demand).
Sometimes the temporary was disappearing too early, and the iterators would go berserk.

Revision af3507b6 (diff)
Added by Gianluca Petrillo about 3 years ago

Another try to fix issue #15400 (this one works on Linux).

Revision 48627fca (diff)
Added by Gianluca Petrillo about 3 years ago

And added another bit of test for issue #15400 .

History

#1 Updated by Lynn Garren about 3 years ago

  • Status changed from New to Assigned
  • Assignee set to Gianluca Petrillo

#2 Updated by Gianluca Petrillo about 3 years ago

  • Category set to Other
  • Status changed from Assigned to Resolved
  • % Done changed from 0 to 100
  • Occurs In v06_21_00 added

Fix has been pushed as lardata:53691de98c6fbccde8b262914672b382529a2a4c .
An explanation of the problem follows.

What range-for loops guarantee

According to cppreference.com, a range-for loop

for (range_declaration: range_expression) loop_statement
is equivalent to
{
  auto && __range = range_expression ; 
  for (auto __begin = begin_expr, __end = end_expr; __begin != __end; ++__begin) { 
    range_declaration = *__begin; 
    loop_statement 
  } 
} 
A key aspect is that __range is either a (lvalue) reference or a value holding content moved from the range_expression. In both cases, the result of range_expression is kept in scope until the end of the loop.

The issue in the old wrapper implementation

The class I implemented is used as

for (range_declaration: range_expression | util::range_for) loop_statement
where the additional construct creates an object wrapping the original range_expression, so that __range stores an instance of the wrapper rather than of range_expression (the wrapper is always a temporary and it is therefore always moved into __range).
The wrapped did not save range_expression, but rather it saved the begin() and end() iterators directly. Therefore, the wrapper did not explicitly keep the result of range_expression in scope.
Then, when range_expression is a temporary, like in our test case where it is the return value of a "pipe" of ranges from the ranges library, it falls out of scope after the instance of the wrapper has been created. If the iterators refer to that structure, they will access some "unguarded" memory. At that point, depending on the compiler choice and on the content of the loop, that memory could be overwritten, and the use of those iterators will cause undefined behaviour.

It was not obvious that the iterators from the range library would need the original range, since the actual data is not in the range itself. Experimental evidence shows at least some of them do.

Test failure

The bugged code was unit-tested using a loop for(auto& v: container | util::range_for), where container was:

  1. a std::vector: the problem does not show because the wrapper is a no-op for ranges/containers with begin() and end() iterators of the same type
  2. a custom class with different begin() and end() iterators: the problem did not show because the class was not a temporary

Tests have been added using a temporary copy of the containers. Anyway, I can't recall having been able to systematically reproduce the problem, because of the compilation-dependent quality of the issue.

#3 Updated by Gianluca Petrillo about 3 years ago

  • Status changed from Resolved to Assigned

Reopening after the following report by Vito Di Benedetto:

the CI build of the develop branch of LArSoft reports a failure in the lardata unit test RangeForWrapper_test. For now this is for the Linux build, the MacOS build is still building the code.
I'll provide an update as it is available.

From the log, the relevant part about the unit test RangeForWrapper_test is:

964: 35/147 Testing: RangeForWrapper_test
965: 35/147 Test: RangeForWrapper_test
966: Command:
"/grid/fermiapp/products/larsoft/cetbuildtools/v5_06_06/bin/cet_exec_test" 
"--wd" 
"/scratch/workspace/lar_ci_test/label_exp/SLF6/label_exp2/swarm/build_slf6.x86_64/lardata/test/Utilities/RangeForWrapper_test.d" 
"--required-files" "" "--datafiles" "" "--skip-return-code" "247" 
"/scratch/workspace/lar_ci_test/label_exp/SLF6/label_exp2/swarm/build_slf6.x86_64/lardata/bin/RangeForWrapper_test" 
967: Directory:
/scratch/workspace/lar_ci_test/label_exp/SLF6/label_exp2/swarm/build_slf6.x86_64/lardata/test/Utilities/RangeForWrapper_test.d
968: "RangeForWrapper_test" start time: Feb 13 10:31 CST
969: Output:
970: ----------------------------------------------------------
971: Running 2 test cases...
972:
/scratch/workspace/lar_ci_test/label_exp/SLF6/label_exp2/swarm/srcs/lardata/test/Utilities/RangeForWrapper_test.cc(119):
error: in "RangeForWrapperSameIterator_test": check total ==
expected_total has failed [4 != 9]
973:
/scratch/workspace/lar_ci_test/label_exp/SLF6/label_exp2/swarm/srcs/lardata/test/Utilities/RangeForWrapper_test.cc(142):
error: in "RangeForWrapperSameIterator_test": check total ==
expected_total has failed [10604548 != 9]
974:
/scratch/workspace/lar_ci_test/label_exp/SLF6/label_exp2/swarm/srcs/lardata/test/Utilities/RangeForWrapper_test.cc(150):
error: in "RangeForWrapperSameIterator_test": check total == 3 *
expected_total has failed [31794156 != 27]
975:
976: *** 3 failures are detected in the test module "(
PointIsolationAlg_test )" 
977: <end of output>
978: Test time =   3.45 sec
979: ----------------------------------------------------------
980: Test Failed.
981: "RangeForWrapper_test" end time: Feb 13 10:31 CST
982: "RangeForWrapper_test" time elapsed: 00:00:03
983: ----------------------------------------------------------

#4 Updated by Gianluca Petrillo about 3 years ago

  • Status changed from Assigned to Resolved

The failure pertains the "other" part of the code.
The util::range_for is design to present the input range as an object whose begin and end iterators are of the same type. When the input range does have different types, a wrapping class is used. That was the place where the previous problems were located.
When the input range already has the two iterators with the same type, the utility should be sort of a no-operation. The last issue was affecting this half, when the input range is temporary (a rvalue).
I have imposed more control on what the code does, explicitly coding what to do in the different cases rather than relying on my understanding of what C++ would implicitly do.

Marking again to resolved. Finger crossed.

#5 Updated by Katherine Lato over 2 years ago

  • Status changed from Resolved to Closed


Also available in: Atom PDF