Bug #20246

Illegal Instruction in hep_concurrency

Added by Christopher Backhouse over 2 years ago. Updated over 2 years ago.

Estimated time: 2.00 h


Running essentially any job with art v2_11_01 fails with an illegal instruction (SIGILL) inside hep_concurrency, called from messagefacility. The same machine worked fine with art 1.x releases.

Program received signal SIGILL, Illegal instruction.
hep::concurrency::getTSCP (cpuidx=@0x7ffffffe66d4: 0)
   at /scratch/workspace/art-release-build/SLF6/debug/build/hep_concurrency/v1_00_01/src/hep_concurrency/
28    /scratch/workspace/art-release-build/SLF6/debug/build/hep_concurrency/v1_00_01/src/hep_concurrency/ No such file or directory.

#1  0x00007fffee94506b in hep::concurrency::RecursiveMutex::lock (
   this=0x7ffff7b4a440 <mf::(anonymous namespace)::msgMutex_>, opName=...)
   at /scratch/workspace/art-release-build/SLF6/debug/build/hep_concurrency/v1_00_01/src/hep_concurrency/
72    /scratch/workspace/art-release-build/SLF6/debug/build/hep_concurrency/v1_00_01/src/hep_concurrency/ No such file or directory.

#2  0x00007fffee946d40 in hep::concurrency::RecursiveMutexSentry::RecursiveMutexSentry (this=0x7ffffffe67e0, mutex=..., name=...)
   at /scratch/workspace/art-release-build/SLF6/debug/build/hep_concurrency/v1_00_01/src/hep_concurrency/
283    in /scratch/workspace/art-release-build/SLF6/debug/build/hep_concurrency/v1_00_01/src/hep_concurrency/

#3  0x00007ffff78db6aa in mf::(anonymous namespace)::logMessage (msg=0x44c4ec0)
   at /scratch/workspace/art-release-build/SLF6/debug/build/messagefacility/v2_02_01/src/messagefacility/MessageLogger/
438    /scratch/workspace/art-release-build/SLF6/debug/build/messagefacility/v2_02_01/src/messagefacility/MessageLogger/ No such file or directory.

#4  0x00007ffff78dba4f in mf::LogErrorObj (msg=0x44c4ec0)
   at /scratch/workspace/art-release-build/SLF6/debug/build/messagefacility/v2_02_01/src/messagefacility/MessageLogger/
547    in /scratch/workspace/art-release-build/SLF6/debug/build/messagefacility/v2_02_01/src/messagefacility/MessageLogger/

#5  0x00007ffff7d55ec3 in mf::MaybeLogger_<(mf::ELseverityLevel::ELsev_)3, false>::~MaybeLogger_ (this=0x7ffffffe8b28, __in_chrg=<optimized out>)
   at /scratch/workspace/art-release-build/SLF6/debug/build/messagefacility/v2_02_01/include/messagefacility/MessageLogger/MessageLogger.h:143
143    /scratch/workspace/art-release-build/SLF6/debug/build/messagefacility/v2_02_01/include/messagefacility/MessageLogger/MessageLogger.h: No such file or directory.

#6  0x00007ffff7d51ff9 in art::run_art_common_ (main_pset=...)
   at /scratch/workspace/art-release-build/SLF6/debug/build/art/v2_11_01/src/art/Framework/Art/
287    /scratch/workspace/art-release-build/SLF6/debug/build/art/v2_11_01/src/art/Framework/Art/ No such file or directory.

#7  0x00007ffff7d51315 in art::run_art (argc=4, argv=0x7ffffffe9608,
   in_desc=..., lookupPolicy=..., handlers=...)
   at /scratch/workspace/art-release-build/SLF6/debug/build/art/v2_11_01/src/art/Framework/Art/
206    in /scratch/workspace/art-release-build/SLF6/debug/build/art/v2_11_01/src/art/Framework/Art/

/proc/cpuinfo says this:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Core(TM)2 Quad CPU    Q9550  @ 2.83GHz
stepping        : 10
microcode       : 2571
cpu MHz         : 2826.477
cache size      : 6144 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm dtherm pti retpoline tpr_shadow vnmi flexpriority
bogomips        : 5652.95
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

Is it possible to rebuild with less aggressive CPU requirements? It's obviously bad to have machines that used to work stop working. I wouldn't be surprised if we've also made some fraction of the grid unusable.

Related issues

Has duplicate cet-is - Bug #20488: DUNE jobs on some off-site worker nodes are terminated with exit status 4 (SIGILL) (Rejected, 07/30/2018)


#1 Updated by Kyle Knoepfel over 2 years ago

  • Status changed from New to Feedback

We need more details, Chris:

  • Which platform (SLF6?)
  • Printout of 'ups active'
  • Sample job that reproduces the problem

#2 Updated by Christopher Backhouse over 2 years ago


art               v2_11_01        -f Linux64bit+2.6-2.12  -q debug:e15       -z /cvmfs/
artdaq_core       v3_01_08        -f Linux64bit+2.6-2.12  -q debug:e15:s67   -z /cvmfs/
awscli            v1_7_15         -f Linux64bit+2.6-2.12                     -z /cvmfs/
boost             v1_66_0a        -f Linux64bit+2.6-2.12  -q debug:e15       -z /cvmfs/
bpf               v02.01          -f NULL                                    -z /cvmfs/
caffe             v1_0i           -f Linux64bit+2.6-2.12  -q debug:e15       -z /cvmfs/
calibcsvs         v12.06          -f NULL                                    -z /cvmfs/
calibfixnd        v01.00          -f NULL                                    -z /cvmfs/
canvas            v3_03_01        -f Linux64bit+2.6-2.12  -q debug:e15       -z /cvmfs/
canvas_root_io    v1_01_05        -f Linux64bit+2.6-2.12  -q debug:e15       -z /cvmfs/
castxml           v0_00_00_f20180122 -f Linux64bit+2.6-2.12                     -z /cvmfs/
ccache            v03.03.03       -f Linux64bit+2.6-2.12                     -z /cvmfs/
cetbuildtools     v7_03_01        -f NULL                                    -z /cvmfs/
cetlib            v3_03_00        -f Linux64bit+2.6-2.12  -q debug:e15       -z /cvmfs/
cetlib_except     v1_02_00        -f Linux64bit+2.6-2.12  -q debug:e15       -z /cvmfs/
cetpkgsupport     v1_14_01        -f NULL                                    -z /cvmfs/
cigetcert         v1_16_1         -f Linux64bit+2.6-2.12                     -z /cvmfs/
cigetcertlibs     v1_1            -f Linux64bit+2.6-2.12                     -z /cvmfs/
clhep             v2_3_4_6        -f Linux64bit+2.6-2.12  -q debug:e15       -z /cvmfs/
cmake             v3_10_1         -f Linux64bit+2.6-2.12                     -z /cvmfs/
condb             v2_0b           -f NULL                                    -z /cvmfs/
cpn               v1.7            -f NULL                                    -z /cvmfs/
cppunit           v1_13_2c        -f Linux64bit+2.6-2.12  -q debug:e15       -z /cvmfs/
cry               v1_7k           -f Linux64bit+2.6-2.12  -q debug:e15       -z /cvmfs/
cstxsd            v4_0_0h         -f Linux64bit+2.6-2.12  -q e15             -z /cvmfs/
cvn               v01.04          -f NULL                                    -z /cvmfs/
cvnprong          v01.00          -f NULL                                    -z /cvmfs/
cvnreg            v01.01          -f NULL                                    -z /cvmfs/
dk2nudata         v01_06_01b      -f Linux64bit+2.6-2.12  -q debug:e15       -z /cvmfs/
dk2nugenie        v01_06_01e      -f Linux64bit+2.6-2.12  -q debug:e15       -z /cvmfs/
eid               v01.00          -f NULL                                    -z /cvmfs/
FCHelperAna2017   v01.02          -f NULL                                    -z /cvmfs/
fftw              v3_3_6_pl2      -f Linux64bit+2.6-2.12  -q debug           -z /cvmfs/
fhiclcpp          v4_06_07        -f Linux64bit+2.6-2.12  -q debug:e15       -z /cvmfs/
fife_utils        v3_1_3          -f NULL                                    -z /cvmfs/
g4abla            v3_0            -f NULL                                    -z /cvmfs/
g4emlow           v6_50           -f NULL                                    -z /cvmfs/
g4neutron         v4_5            -f NULL                                    -z /cvmfs/
g4neutronxs       v1_4            -f NULL                                    -z /cvmfs/
g4nucleonxs       v1_1            -f NULL                                    -z /cvmfs/
g4nuclide         v2_1            -f NULL                                    -z /cvmfs/
g4photon          v4_3_2          -f NULL                                    -z /cvmfs/
g4pii             v1_3            -f NULL                                    -z /cvmfs/
g4radiative       v5_1_1          -f NULL                                    -z /cvmfs/
g4surface         v1_0            -f NULL                                    -z /cvmfs/
g4tendl           v1_3            -f NULL                                    -z /cvmfs/
gcc               v6_4_0          -f Linux64bit+2.6-2.12                     -z /cvmfs/
gdb               v8_0_1          -f Linux64bit+2.6-2.12                     -z /cvmfs/
geant4            v4_10_3_p01d    -f Linux64bit+2.6-2.12  -q debug:e15       -z /cvmfs/
genie_fluxopt     v17_03_14a      -f NULL                 -q nova            -z /cvmfs/
genie             v2_12_10b       -f Linux64bit+2.6-2.12  -q debug:e15       -z /cvmfs/
genie_phyopt      v2_12_10        -f NULL                 -q dkcharmtau      -z /cvmfs/
genie_xsec        v2_12_10        -f NULL                 -q DefaultPlusMECWithNC -z /cvmfs/
gflags            v2_2_1          -f Linux64bit+2.6-2.12  -q e15             -z /cvmfs/
gibuu_libs        v00.01          -f NULL                                    -z /cvmfs/
glog              v0_3_5          -f Linux64bit+2.6-2.12  -q e15             -z /cvmfs/
gsl               v2_4            -f Linux64bit+2.6-2.12  -q debug           -z /cvmfs/
hdf5              v1_10_1c        -f Linux64bit+2.6-2.12  -q e15             -z /cvmfs/
hep_concurrency   v1_00_01        -f Linux64bit+2.6-2.12  -q debug:e15       -z /cvmfs/
ifbeam            v2_2_3          -f Linux64bit+2.6-2.12  -q debug:e15:p2714b -z /cvmfs/
ifdh_art          v2_06_01        -f Linux64bit+2.6-2.12  -q debug:e15:s67   -z /cvmfs/
ifdhc_config      v2_3_3          -f NULL                                    -z /cvmfs/
ifdhc             v2_3_3          -f Linux64bit+2.6-2.12  -q debug:e15:p2714b -z /cvmfs/
jobsub_client     v1_2_6_2        -f NULL                                    -z /cvmfs/
kx509             v3_1_1          -f NULL                                    -z /cvmfs/
lapack            v3_7_1          -f Linux64bit+2.6-2.12  -q e15:prof        -z /cvmfs/
lemlittle         v01.03          -f NULL                                    -z /cvmfs/
leveldb           v1_20a          -f Linux64bit+2.6-2.12  -q e15             -z /cvmfs/
lhapdf            v5_9_1k         -f Linux64bit+2.6-2.12  -q debug:e15       -z /cvmfs/
library_shim      v03.03          -f NULL                                    -z /cvmfs/
libwda            v2_24_0         -f Linux64bit+2.6-2.12                     -z /cvmfs/
libxml2           v2_9_5          -f Linux64bit+2.6-2.12  -q debug           -z /cvmfs/
lid               v01.03          -f NULL                                    -z /cvmfs/
lmdb              v0_9_21         -f Linux64bit+2.6-2.12                     -z /cvmfs/
log4cpp           v1_1_3a         -f Linux64bit+2.6-2.12  -q debug:e15       -z /cvmfs/
messagefacility   v2_02_01        -f Linux64bit+2.6-2.12  -q debug:e15       -z /cvmfs/
monopoleid        v01.00          -f NULL                                    -z /cvmfs/
mysql_client      v5_5_58a        -f Linux64bit+2.6-2.12  -q e15             -z /cvmfs/
ncid              v01.03          -f NULL                                    -z /cvmfs/
novaproduction    v02.49          -f NULL                                    -z /cvmfs/
nucondb           v2_2_3          -f Linux64bit+2.6-2.12  -q debug:e15:p2714b -z /cvmfs/
nuecosrej         v01.01          -f NULL                                    -z /cvmfs/
nuededx           v01.01          -f NULL                                    -z /cvmfs/
nuone             v01.02          -f NULL                                    -z /cvmfs/
nusdata           v00.10          -f NULL                                    -z /cvmfs/
nusimdata         v1_13_00        -f Linux64bit+2.6-2.12  -q debug:e15       -z /cvmfs/
nutools           v2_22_01        -f Linux64bit+2.6-2.12  -q debug:e15       -z /cvmfs/
opencv            v3_3_0c         -f Linux64bit+2.6-2.12  -q e15             -z /cvmfs/
pdfsets           v5_9_1b         -f NULL                                    -z /cvmfs/
poms_client       v3_0_0          -f NULL                                    -z /cvmfs/
postgresql        v9_6_6a         -f Linux64bit+2.6-2.12  -q p2714b          -z /cvmfs/
ppfx              v02_03          -f Linux64bit+2.6-2.12  -q debug:e15       -z /cvmfs/
protobuf          v3_3_1a         -f Linux64bit+2.6-2.12  -q e15             -z /cvmfs/
psycopg2          v2_5_p2_7       -f Linux64bit+2.6                          -z /cvmfs/
pycurl            v7_16_4         -f Linux64bit+2.6-2.12                     -z /cvmfs/
pygccxml          v1_9_1          -f NULL                 -q p2714b          -z /cvmfs/
pythia            v6_4_28k        -f Linux64bit+2.6-2.12  -q debug:gcc640    -z /cvmfs/
python            v2_7_14b        -f Linux64bit+2.6-2.12                     -z /cvmfs/
python_request    v2_9_1          -f NULL                                    -z /cvmfs/
pyyaml            v3_12           -f Linux64bit+2.6-2.12                     -z /cvmfs/
qepid             v01.01          -f NULL                                    -z /cvmfs/
range             v3_0_3_0        -f NULL                                    -z /cvmfs/
remid             v01.03          -f NULL                                    -z /cvmfs/
root              v6_12_06a       -f Linux64bit+2.6-2.12  -q debug:e15       -z /cvmfs/
rvp               v01.00          -f NULL                                    -z /cvmfs/
sam_web_client    v2_0            -f NULL                                    -z /cvmfs/
setpath           v1_11           -f NULL                                    -z /cvmfs/
snappy            v1_1_7a         -f Linux64bit+2.6-2.12  -q e15             -z /cvmfs/
sqlite            v3_20_01_00     -f Linux64bit+2.6-2.12                     -z /cvmfs/
tbb               v2018_2a        -f Linux64bit+2.6-2.12  -q debug:e15       -z /cvmfs/
tensorflow        v1_3_0c         -f Linux64bit+2.6-2.12  -q debug:e15:p2714b -z /cvmfs/
TRACE             v3_13_05        -f Linux64bit+2.6-2.12                     -z /cvmfs/
ucana             v01.07          -f NULL                                    -z /cvmfs/
ups               v6_0_6          -f Linux64bit+2.6-2.12                     -z /cvmfs/
valgrind          v3_13_0         -f Linux64bit+2.6-2.12                     -z /cvmfs/
wsnumu            v01.00          -f NULL                                    -z /cvmfs/
xerces_c          v3_2_0a         -f Linux64bit+2.6-2.12  -q debug:e15       -z /cvmfs/
xgboost           v0.60           -f Linux64bit+2.6-2.12                     -z /cvmfs/
xrootd            v4_8_0b         -f Linux64bit+2.6-2.12  -q debug:e15       -z /cvmfs/
xsecccpi0inc      v01.01          -f NULL                                    -z /cvmfs/
xsecncpi0         v01.04          -f NULL                                    -z /cvmfs/

Specifically, the list includes messagefacility v2_02_01 and hep_concurrency v1_00_01.

I'm just running nova -c eventdump.fcl <some art file>. I strongly suspect this happens on every messagefacility call.

#3 Updated by Kyle Knoepfel over 2 years ago

  • Status changed from Feedback to Accepted

Thanks, Chris. Yes, I suspect you're correct. Based on the backtrace you provided, the code in question calls the __rdtscp intrinsic, which emits the RDTSCP instruction. Not all CPU models support that instruction, and the flags line of your /proc/cpuinfo output does not list rdtscp. We will discuss a path forward at next week's SciSoft team meeting.
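You can check whether a particular machine, such as a grid worker node, is affected by looking for the rdtscp token in the flags line of /proc/cpuinfo. The Q9550 flags line in the original report does not contain it. The sketch below makes some assumptions. It is Linux-only, and the helper names has_cpu_flag and first_flags_line are hypothetical: they are not part of art or hep_concurrency.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Returns true if `flag` appears as a whole, whitespace-separated token in a
// /proc/cpuinfo "flags" line. A plain substring search would be wrong:
// "tsc" also occurs inside "constant_tsc".
bool has_cpu_flag(const std::string& flags_line, const std::string& flag)
{
  assert(!flag.empty());
  // Skip the "flags : " prefix if it is present.
  const std::string::size_type colon = flags_line.find(':');
  std::istringstream tokens(colon == std::string::npos
                              ? flags_line
                              : flags_line.substr(colon + 1));
  std::string token;
  while (tokens >> token) {
    if (token == flag) {
      return true;
    }
  }
  return false;
}

// Returns the first line of `path` that starts with "flags", or an empty
// string if the file cannot be read or has no such line (Linux only).
std::string first_flags_line(const std::string& path = "/proc/cpuinfo")
{
  std::ifstream in(path);
  std::string line;
  while (std::getline(in, line)) {
    if (line.compare(0, 5, "flags") == 0) {
      return line;
    }
  }
  return {};
}
```

For example, has_cpu_flag(first_flags_line(), "rdtscp") returns false on the Q9550 machine from the report. It should return true on CPUs that can run the current hep_concurrency build.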

#4 Updated by Christopher Backhouse over 2 years ago


#5 Updated by Kyle Knoepfel over 2 years ago

  • Assignee set to Kyle Knoepfel
  • Estimated time set to 2.00 h

There is a straightforward fix to this issue. We will issue bug-fix releases, but the builds will not be available until the middle of the month.
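Here is one possible shape for such a fix, sketched under stated assumptions. The idea is to probe CPUID once for RDTSCP support (leaf 0x80000001, EDX bit 27). On CPUs without it, the code falls back to plain RDTSC plus the kernel's sched_getcpu(). This is not the change made in commit hep_concurrency:0a04dd3e. TscReading and read_tsc_and_cpu are hypothetical names, and the code assumes Linux with GCC or Clang.

```cpp
#include <cstdint>
#include <sched.h>      // sched_getcpu (glibc; g++ defines _GNU_SOURCE by default)

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>      // __get_cpuid (GCC/Clang)
#include <x86intrin.h>  // __rdtsc, __rdtscp
#endif

// Hypothetical result type: a timestamp-counter reading plus the index of
// the CPU the calling thread was running on.
struct TscReading {
  std::uint64_t tsc;
  int cpu;
};

// True if the CPU advertises RDTSCP (CPUID leaf 0x80000001, EDX bit 27).
bool cpu_supports_rdtscp()
{
#if defined(__x86_64__) || defined(__i386__)
  unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
  // __get_cpuid returns 0 if the extended leaf is not available at all.
  if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx)) {
    return false;
  }
  return (edx & (1u << 27)) != 0;
#else
  return false;
#endif
}

// Hypothetical stand-in for a getTSCP-style helper that never executes an
// instruction the CPU lacks.
TscReading read_tsc_and_cpu()
{
  static const bool have_rdtscp = cpu_supports_rdtscp();  // probe only once
#if defined(__x86_64__) || defined(__i386__)
  if (have_rdtscp) {
    unsigned int aux = 0;
    const std::uint64_t t = __rdtscp(&aux);
    // On Linux the kernel loads IA32_TSC_AUX with (node << 12) | cpu.
    return {t, static_cast<int>(aux & 0xfffu)};
  }
  // Fallback for CPUs such as the Core 2 Q9550: plain RDTSC, CPU from the kernel.
  return {__rdtsc(), sched_getcpu()};
#else
  (void)have_rdtscp;
  return {0, sched_getcpu()};  // no TSC on this architecture in this sketch
#endif
}
```

One design caveat applies to the fallback path. Unlike RDTSCP, it does not read the timestamp and the CPU index atomically, because the thread can migrate between the two reads. That is acceptable only if callers treat the CPU index as a hint.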

#6 Updated by Kyle Knoepfel over 2 years ago

  • Category set to Infrastructure
  • Status changed from Accepted to Resolved
  • SSI Package art added

Implemented with commit hep_concurrency:0a04dd3e.

#7 Updated by Kyle Knoepfel over 2 years ago

  • % Done changed from 0 to 100

#8 Updated by Kyle Knoepfel over 2 years ago

  • Target version set to 2.11.03

#9 Updated by Kyle Knoepfel over 2 years ago

  • Status changed from Resolved to Closed

#10 Updated by Christopher Green about 2 years ago

  • Has duplicate Bug #20488: DUNE jobs on some off-site worker nodes are terminated with exit status 4 (SIGILL) added
