Bug #2138
Problems running dogwood5 on R2.6 through R2.8 (+development)
Description
loon -b -q ../asciidb/set_tsql_override.C reco_near_spill_data_base_dogwood5.C /path/to/N00007853_0001.mdaq.root
Fails in various ways for various builds:
Release | ROOT | gcc | built on SLF | run on SLF | failure mode |
---|---|---|---|---|---|
R2.5 | v5-26-00d | 3.4.3 | 4(?) | 4 | (complete success) |
R2.5 | v5-26-00d | 3.4.3 | 4(?) | 5 | x |
R2.5 | v5-26-00d | 4.5.1 | 4(?) | 4 | (complete success) |
R2.5 | v5-26-00d | 4.5.1 | 4(?) | 5 | (complete success) |
R2.6 | v5-28-00b | 4.5.1 | 5 | 5 | Error: abstract class object 'TGeant3TGeo' is created |
S11-04-29-R2-06 | v5-29-02 | 4.5.1 | 4 | 4 | loon: error while loading shared libraries: libpcre.so.0; ldd loon: also missing libssl.so.6 and libcrypto.so.6 |
S11-04-29-R2-06 | v5-29-02 | 4.5.1 | 4 | 5 | Segmentation fault |
R2.7 | v5-30-00 | 4.5.1 | 5 | 5 | Segmentation fault |
R2.8 | v5-28-00b | 4.5.1 | 5 | 5 | glibc detected loon: free(): invalid next size (normal): 0x0ed64420 |
development | nightly | 3.4.3 | 4 | 4 | glibc detected corrupted double-linked list: 0x16b93128 (earlier snarl) |
development | nightly | 3.4.3 | 4 | 5 | glibc detected loon: free(): invalid next size (normal): 0x0f0072a8 |
development | nightly | 4.5.1 | 4 | 4 | glibc detected free(): invalid next size (normal): 0x16acee68 |
development | nightly | 4.5.1 | 4 | 5 | glibc detected loon: free(): invalid next size (normal): 0x0fbebea0 |
development | nightly | 4.5.1 | 5 | 5 | test1: Segmentation fault (earlier snarl); test2: glibc detected loon: free(): invalid next size (normal): 0x10907180. Note: temporary test build of ROOT+MINOS on minos50, stomping on normal build |
development | nightly | 4.2.1 | OSX 10.6.8 | OSX 10.6.8 | other issues w/ current ROOT build |
http://www-numi.fnal.gov/offline_software/srt_public_context/WebDocs/FrozenRel.html
Remember that MessageService warnings/errors report [run|snarl], but only for problematic snarls. At best these can be used to bracket a range of snarls where the problem lies. Catastrophic failures don't have to correspond to the last reported snarl.
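The bracketing idea can be sketched in a few lines of Python. This is only an illustration: the helper names `reported_snarls` and `crash_bracket` are made up for this note, the sample lines are the tail of the failing development log quoted in this report (message text shortened), and the upper bound 94913 is the LAST_SNARL_NUM from the database summary at the end of this report.

```python
import re

# "[run|snarl]" tag as it appears in MessageService warning/error lines.
TAG = re.compile(r"\[(\d+)\|(\d+)\]")


def reported_snarls(log_text, run):
    """Snarl numbers tagged on messages for `run`, in order of appearance."""
    return [int(snarl) for r, snarl in TAG.findall(log_text) if int(r) == run]


def crash_bracket(log_text, run, last_snarl_in_file):
    """(low, high) snarl range in which a crash after the last message may lie.

    Only problematic snarls are reported, so the crash may be at the last
    reported snarl or at any later, silent one.
    """
    snarls = reported_snarls(log_text, run)
    if not snarls:
        return None
    return (snarls[-1], last_snarl_in_file)


# Tail of the failing development log (message text shortened).
failing_tail = """
=E= AlgFitTr 2011/11/10 11:48:16 [7853|51789] AlgFitTrackCam.cxx,v1.78:3411> SpectrometerSwim
=E= VertexFi 2011/11/10 11:50:49 [7853|52323] VertexFinder.cxx,v1.13:119> Zero Energy event
=E= VertexFi 2011/11/10 11:54:00 [7853|53001] VertexFinder.cxx,v1.13:119> Zero Energy event
=E= AlgFitTr 2011/11/10 11:55:37 [7853|53313] AlgFitTrackCam.cxx,v1.78:3411> SpectrometerSwim
"""

print(crash_bracket(failing_tail, run=7853, last_snarl_in_file=94913))
```

The bracket is deliberately loose. A log from a build that completes may help narrow it, but different builds need not flag the same snarls, so treat any tighter bound as a guess.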
For the development SLF4-build SLF5-run (and most others) failure, the MINOS output ends with:
=E= AlgFitTr 2011/11/10 11:48:16 [7853|51789] AlgFitTrackCam.cxx,v1.78:3411> SpectrometerSwim - unexpectedly large u or v (u=-2.36607e+09 v=-573.878) bailing out.
=E= VertexFi 2011/11/10 11:50:49 [7853|52323] VertexFinder.cxx,v1.13:119> Zero Energy event, I declare that bad
=E= VertexFi 2011/11/10 11:54:00 [7853|53001] VertexFinder.cxx,v1.13:119> Zero Energy event, I declare that bad
=E= AlgFitTr 2011/11/10 11:55:37 [7853|53313] AlgFitTrackCam.cxx,v1.78:3411> SpectrometerSwim - unexpectedly large u or v (u=9050.62 v=1.1526) bailing out.
[...error occurs here...]
On R2.5 GCC_3_4 running on SLF4 in the area of the above problems:
=E= VertexFi 2011/11/10 11:24:46 [7853|53001] VertexFinder.cxx,v1.13:119> Zero Energy event, I declare that bad
=E= AlgFitTr 2011/11/10 11:26:27 [7853|53313] AlgFitTrackCam.cxx,v1.76:3410> SpectrometerSwim - unexpectedly large u or v (u=2111.92 v=-2.481e+11) bailing out.
=E= AlgFitTr 2011/11/10 11:33:10 [7853|54566] AlgFitTrackCam.cxx,v1.76:3410> SpectrometerSwim - unexpectedly large u or v (u=1.65035e+08 v=-46.5585) bailing out.
=E= VertexFi 2011/11/10 11:37:35 [7853|55302] VertexFinder.cxx,v1.13:119> Zero Energy event, I declare that bad
=E= AlgFitTr 2011/11/10 11:40:10 [7853|55851] AlgFitTrackCam.cxx,v1.76:3410> SpectrometerSwim - unexpectedly large u or v (u=-814.317 v=1.66687e+09) bailing out.
=E= VertexFi 2011/11/10 11:46:48 [7853|57104] VertexFinder.cxx,v1.13:119> Zero Energy event, I declare that bad
R2.5 GCC_3_4 on SLF4 completed with:
=E= AlgFitTr 2011/11/10 14:29:36 [7853|94851] AlgFitTrackCam.cxx,v1.76:3410> SpectrometerSwim - unexpectedly large u or v (u=800.171 v=-2.82894e+07) bailing out.
Spill(48681 in 1565 out 47116 filt.)
 1) +DataQualityReader::Reco n=48681 ( 48197/ 484) t=( 203.85/ 0.21)
 2) RecordSetupModule::Get n=48197 ( 48197/ 0) t=( 0.66/ 0.00)
 3) +NeardetBeamSelect::Ana n=48197 ( 1565/ 46632) t=( 0.08/ 0.01)
 4) DigitListModule::Get n=1565 ( 1565/ 0) t=( 0.00/ 0.00)
[...]
17) NtpMRModule::Reco n=1565 ( 1565/ 0) t=( 289.14/ 0.00)
18) Output::Put n=1565 ( 1565/ 0) t=( 12.11/ 0.12)
For the development SLF4-build SLF5-run failure, the ROOT induced backtrace looks like:
/lib/libc.so.6[0x7666c5]
/lib/libc.so.6(cfree+0x59)[0x766b09]
/grid/fermiapp/minos/products/prd/MINOS_ROOT/Linux2.6-GCC_4_5/trunk/lib/libCore.so(_ZN8TStorage11ReAllocCharEPcjj+0x126)[0xf50c257c]
/grid/fermiapp/minos/products/prd/MINOS_ROOT/Linux2.6-GCC_4_5/trunk/lib/libCore.so(_ZN7TBuffer6ExpandEib+0x6f)[0xf507d18d]
/grid/fermiapp/minos/products/prd/MINOS_ROOT/Linux2.6-GCC_4_5/trunk/lib/libCore.so(_ZN7TBuffer10AutoExpandEi+0x65)[0xf507cfdb]
/grid/fermiapp/minos/products/prd/MINOS_ROOT/Linux2.6-GCC_4_5/trunk/lib/libRIO.so(_ZN11TBufferFile9WriteCharEc+0x45)[0xf460aeeb]
/grid/fermiapp/minos/minossoft/releases/development/lib/Linux2.6-GCC_4_5/libRegistry.so(_ZlsR7TBufferc+0x28)[0xf5e4e2d0]
/grid/fermiapp/minos/products/prd/MINOS_ROOT/Linux2.6-GCC_4_5/trunk/lib/libRIO.so(_ZN13TStreamerInfo14WriteBufferAuxIPPcEEiR7TBufferRKT_iiii+0x34c2)[0xf47300a4]
/grid/fermiapp/minos/products/prd/MINOS_ROOT/Linux2.6-GCC_4_5/trunk/lib/libRIO.so(_ZN20TStreamerInfoActions27GenericVectorPtrWriteActionER7TBufferPvPKvPKNS_14TConfigurationE+0x6b)[0xf466aa94]
/grid/fermiapp/minos/products/prd/MINOS_ROOT/Linux2.6-GCC_4_5/trunk/lib/libRIO.so(_ZNK20TStreamerInfoActions17TConfiguredActionclER7TBufferPvPKv+0x2c)[0xf460bd62]
/grid/fermiapp/minos/products/prd/MINOS_ROOT/Linux2.6-GCC_4_5/trunk/lib/libRIO.so(_ZN11TBufferFile19ApplySequenceVecPtrERKN20TStreamerInfoActions15TActionSequenceEPvS4_+0x11e)[0xf460a0d8]
/grid/fermiapp/minos/products/prd/MINOS_ROOT/Linux2.6-GCC_4_5/trunk/lib/libTree.so(_ZN14TBranchElement22FillLeavesClonesMemberER7TBuffer+0xf9)[0xf3a02929]
/grid/fermiapp/minos/products/prd/MINOS_ROOT/Linux2.6-GCC_4_5/trunk/lib/libTree.so(_ZN7TBranch4FillEv+0x1f9)[0xf39ef8c9]
/grid/fermiapp/minos/products/prd/MINOS_ROOT/Linux2.6-GCC_4_5/trunk/lib/libTree.so(_ZN14TBranchElement4FillEv+0x156)[0xf3a0199c]
/grid/fermiapp/minos/products/prd/MINOS_ROOT/Linux2.6-GCC_4_5/trunk/lib/libTree.so(_ZN14TBranchElement4FillEv+0x2c9)[0xf3a01b0f]
/grid/fermiapp/minos/products/prd/MINOS_ROOT/Linux2.6-GCC_4_5/trunk/lib/libTree.so(_ZN14TBranchElement4FillEv+0x2c9)[0xf3a01b0f]
/grid/fermiapp/minos/products/prd/MINOS_ROOT/Linux2.6-GCC_4_5/trunk/lib/libTree.so(_ZN5TTree4FillEv+0xf8)[0xf3a4cab8]
/grid/fermiapp/minos/minossoft/releases/development/lib/Linux2.6-GCC_4_5/libPersistency.so(_ZN15PerOutputStream5StoreEv+0x60d)[0xf5f78687]
/grid/fermiapp/minos/minossoft/releases/development/lib/Linux2.6-GCC_4_5/libPersistency.so(_ZN22PerOutputStreamManager3PutEPK12MomNavigator+0x103)[0xf5f79f47]
/grid/fermiapp/minos/minossoft/releases/development/lib/Linux2.6-GCC_4_5/libIoModules.so(_ZN14IoOutputModule3PutEPK12MomNavigator+0x47)[0xf62f72bd]
/grid/fermiapp/minos/minossoft/releases/development/lib/Linux2.6-GCC_4_5/libJobControl.so(_ZNK10JobCMethod7ExecuteEP10JobCModuleP12MomNavigator+0x80)[0xf5d34860
Just because the backtrace points at Persistency doesn't mean the problem lies there. It could mean something left a dangling pointer for it to trip over, or that something stomped on its memory, etc. Or it could mean the problem is in Persistency.
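To see at a glance which libraries the call chain passes through, the backtrace can be collapsed by library. The following is a rough sketch, not an established tool: the regular expression assumes the glibc frame layout seen in this report (`path(symbol+0xoffset)[0xaddress]`, with the symbol part optional), the function name `libraries_in_order` is invented here, and the sample covers only the innermost seven frames of the backtrace above.

```python
import os
import re

# One frame: /path/to/lib.so(mangled_symbol+0xoff)[0xaddr
# The "(symbol+0xoff)" part is optional, and the closing "]" is not required
# so that a truncated final frame still matches.
FRAME = re.compile(
    r"(?P<lib>/\S+?)(?:\((?P<sym>[^)+]*)\+0x[0-9a-f]+\))?\[0x[0-9a-f]+"
)


def libraries_in_order(trace):
    """Collapse consecutive frames from the same library into (name, count)."""
    runs = []
    for match in FRAME.finditer(trace):
        name = os.path.basename(match.group("lib"))
        if runs and runs[-1][0] == name:
            runs[-1][1] += 1
        else:
            runs.append([name, 1])
    return [(name, count) for name, count in runs]


ROOT_LIB = "/grid/fermiapp/minos/products/prd/MINOS_ROOT/Linux2.6-GCC_4_5/trunk/lib"
MINOS_LIB = "/grid/fermiapp/minos/minossoft/releases/development/lib/Linux2.6-GCC_4_5"

# Innermost seven frames of the backtrace quoted in this report.
trace_head = " ".join([
    "/lib/libc.so.6[0x7666c5]",
    "/lib/libc.so.6(cfree+0x59)[0x766b09]",
    ROOT_LIB + "/libCore.so(_ZN8TStorage11ReAllocCharEPcjj+0x126)[0xf50c257c]",
    ROOT_LIB + "/libCore.so(_ZN7TBuffer6ExpandEib+0x6f)[0xf507d18d]",
    ROOT_LIB + "/libCore.so(_ZN7TBuffer10AutoExpandEi+0x65)[0xf507cfdb]",
    ROOT_LIB + "/libRIO.so(_ZN11TBufferFile9WriteCharEc+0x45)[0xf460aeeb]",
    MINOS_LIB + "/libRegistry.so(_ZlsR7TBufferc+0x28)[0xf5e4e2d0]",
])

for name, count in libraries_in_order(trace_head):
    print(name, count)
```

Such a summary only shows where the corruption was detected (here, inside `free()` while a ROOT buffer was being expanded), not where the memory was damaged. The mangled names can be made readable with `c++filt`.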
About this file:
mysql> select FIRST_SNARL_NUM,FIRST_SNARL_TIME,LAST_SNARL_NUM,LAST_SNARL_TIME,SNARL_RECS,REC_SETS from offline_dev.DBUDAQFILESUMMARY where DETECTOR='Near' and RUN=7853 and SUBRUN=1;
+-----------------+---------------------+----------------+---------------------+------------+----------+
| FIRST_SNARL_NUM | FIRST_SNARL_TIME    | LAST_SNARL_NUM | LAST_SNARL_TIME     | SNARL_RECS | REC_SETS |
+-----------------+---------------------+----------------+---------------------+------------+----------+
|           46717 | 2005-05-27 21:23:01 |          94913 | 2005-05-27 22:23:01 |      48197 |    51798 |
+-----------------+---------------------+----------------+---------------------+------------+----------+
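As a quick sanity check on these numbers, the snarl range can be compared with SNARL_RECS, and the last snarl reported before the development failure (53313, from the log above) can be placed within the file. This is plain arithmetic on values quoted in this report; the variable names are only illustrative.

```python
# Values from the DBUDAQFILESUMMARY row for run 7853, subrun 1.
first_snarl = 46717
last_snarl = 94913
snarl_recs = 48197

# Last [run|snarl] reported before the development build failed.
last_reported = 53313

# Inclusive count of snarl numbers in the file.
span = last_snarl - first_snarl + 1

# How far into the file the last reported snarl sits.
offset = last_reported - first_snarl
fraction = offset / snarl_recs

print("snarl numbers in range:", span)
print("matches SNARL_RECS:", span == snarl_recs)
print("offset of last reported snarl:", offset)
print("fraction of file processed (approx.): %.3f" % fraction)
```

The range is contiguous (the inclusive span equals SNARL_RECS, which also matches the n=48197 seen by RecordSetupModule::Get in the successful R2.5 job summary), and the failure shows up roughly 14% of the way through the file.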