Bug #7576
test_space_charge_3d_open_hockney_mpi4 fails intermittently
Description
test_space_charge_3d_open_hockney_mpi4 fails intermittently. The failures are most prominent on the upgraded Wilson cluster, but I also see them on the SLF6.5 desktop machine panal7.
From ctest, it sometimes passes and sometimes fails:
[egstern@panal7 synergia2]$ ctest --verbose -R test_space_charge_3d_open_hockney_mpi4
UpdateCTestConfiguration from :/home/egstern/syn2-dev/build/synergia2/DartConfiguration.tcl
UpdateCTestConfiguration from :/home/egstern/syn2-dev/build/synergia2/DartConfiguration.tcl
Test project /home/egstern/syn2-dev/build/synergia2
Constructing a list of tests
Done constructing a list of tests
Checking test dependency graph...
Checking test dependency graph end
test 146
Start 146: test_space_charge_3d_open_hockney_mpi4
146: Test command: /usr/lib64/openmpi/bin/mpiexec "-np" "4" "/home/egstern/syn2-dev/build/synergia2/src/synergia/collective/tests/test_space_charge_3d_open_hockney_mpi"
146: Test timeout computed to be: 9.99988e+06
146: Running 17 test cases...
146: Running 17 test cases...
146: Running 17 test cases...
146: Running 17 test cases...
146:
146: *** No errors detected
146:
146: *** No errors detected
146:
146: *** No errors detected
146:
146: *** No errors detected
1/1 Test #146: test_space_charge_3d_open_hockney_mpi4 ... Passed 1.43 sec
The following tests passed:
test_space_charge_3d_open_hockney_mpi4
100% tests passed, 0 tests failed out of 1
Total Test time (real) = 1.44 sec
[egstern@panal7 synergia2]$ ctest --verbose -R test_space_charge_3d_open_hockney_mpi4
UpdateCTestConfiguration from :/home/egstern/syn2-dev/build/synergia2/DartConfiguration.tcl
UpdateCTestConfiguration from :/home/egstern/syn2-dev/build/synergia2/DartConfiguration.tcl
Test project /home/egstern/syn2-dev/build/synergia2
Constructing a list of tests
Done constructing a list of tests
Checking test dependency graph...
Checking test dependency graph end
test 146
Start 146: test_space_charge_3d_open_hockney_mpi4
146: Test command: /usr/lib64/openmpi/bin/mpiexec "-np" "4" "/home/egstern/syn2-dev/build/synergia2/src/synergia/collective/tests/test_space_charge_3d_open_hockney_mpi"
146: Test timeout computed to be: 9.99988e+06
146: Running 17 test cases...
146: Running 17 test cases...
146: Running 17 test cases...
146: Running 17 test cases...
146: unknown location(0): fatal error in "auto_tune_comm_sptr": memory access violation at address: 0x01b33780: no mapping at fault address
146: /home/egstern/syn2-dev/build/synergia2/src/synergia/collective/tests/test_space_charge_3d_open_hockney_mpi.cc(92): last checkpoint
The same intermittent failure occurs when I call the test executable directly:
[egstern@panal7 synergia2]$ mpirun -np 4 /home/egstern/syn2-dev/build/synergia2/src/synergia/collective/tests/test_space_charge_3d_open_hockney_mpi
Running 17 test cases...
Running 17 test cases...
Running 17 test cases...
Running 17 test cases...
*** No errors detected
*** No errors detected
*** No errors detected
*** No errors detected
[egstern@panal7 synergia2]$ mpirun -np 4 /home/egstern/syn2-dev/build/synergia2/src/synergia/collective/tests/test_space_charge_3d_open_hockney_mpi
Running 17 test cases...
Running 17 test cases...
Running 17 test cases...
Running 17 test cases...
unknown location(0): fatal error in "auto_tune_comm_sptr": memory access violation at address: 0x021ff8b0: no mapping at fault address
/home/egstern/syn2-dev/build/synergia2/src/synergia/collective/tests/test_space_charge_3d_open_hockney_mpi.cc(92): last checkpoint
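Because the failure is intermittent, it may help to repeat the run and count how often it fails, both to estimate the failure rate and to check a candidate fix. The following is a minimal sketch, not part of Synergia: the helper name count_failures, the script name repeat_test.py, and the default of 20 runs are all hypothetical, and it assumes that a failing run exits with a non-zero status.

```python
import subprocess
import sys


def count_failures(cmd, runs=20):
    """Run cmd `runs` times and return how many runs exited non-zero.

    Assumption: a failed test run is signalled by a non-zero exit status.
    """
    failures = 0
    for _ in range(runs):
        # Discard the test output; only the exit status is inspected.
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode != 0:
            failures += 1
    return failures


if __name__ == "__main__":
    # Hypothetical usage:
    #   python repeat_test.py mpirun -np 4 <path to test executable>
    cmd = sys.argv[1:]
    if not cmd:
        sys.exit("usage: repeat_test.py COMMAND [ARGS...]")
    runs = 20
    print(f"{count_failures(cmd, runs)} of {runs} runs failed")
```

Passing the same mpirun command line shown above as the arguments would give a rough failure count for that platform.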
History
#1 Updated by Eric Stern about 6 years ago
The memory access violation is apparently occurring in get_global_electric_field_component_allgatherv.
#2 Updated by Eric Stern about 6 years ago
- Status changed from New to Resolved
Removed the auto_tune_comm code and its use in the tests, at the suggestion of J. Amundson, in commit a99f2e6302294a8084d8a382633e5f9a15f29516. The fix was tested on the platforms where the bug was most evident: panal7, tev, and an amd32 worker node.
#3 Updated by Eric Stern about 6 years ago
- Status changed from Resolved to Closed