Bug #2138

Problems running dogwood5 on R2.6 through R2.8 (+development)

Added by Robert Hatcher almost 8 years ago.

Status: New
Priority: Normal
Assignee: -
Start date: 11/10/2011
Due date:
% Done: 0%
Estimated time:
Duration:

Description

loon -b -q ../asciidb/set_tsql_override.C reco_near_spill_data_base_dogwood5.C /path/to/N00007853_0001.mdaq.root

Fails in various ways for various builds:

Release          ROOT       gcc    built on SLF  run on SLF  failure mode
R2.5             v5-26-00d  3.4.3  4(?)          4           (complete success)
                                                 5           x
                            4.5.1  4(?)          4           (complete success)
                                                 5           (complete success)
R2.6             v5-28-00b  4.5.1  5             5           Error: abstract class object 'TGeant3TGeo' is created
S11-04-29-R2-06  v5-29-02   4.5.1  4             4           loon: error while loading shared libraries: libpcre.so.0
                                                             ldd loon: also missing libssl.so.6 and libcrypto.so.6
                                                 5           Segmentation fault
R2.7             v5-30-00   4.5.1  5             5           Segmentation fault
R2.8             v5-28-00b  4.5.1  5             5           glibc detected loon: free(): invalid next size (normal): 0x0ed64420
development      nightly    3.4.3  4             4           glibc detected corrupted double-linked list: 0x16b93128 ( earlier snarl )
                                                 5           glibc detected loon: free(): invalid next size (normal): 0x0f0072a8
                            4.5.1  4             4           glibc detected free(): invalid next size (normal): 0x16acee68
                                                 5           glibc detected loon: free(): invalid next size (normal): 0x0fbebea0
                                   5             5           test1 Segmentation fault ( earlier snarl )
                                                             test2 glibc detected loon: free(): invalid next size (normal): 0x10907180
                                                             Note: temporary test build of ROOT+MINOS on minos50, stomping on normal build
                            4.2.1  OSX 10.6.8                other issues w/ current ROOT build

http://www-numi.fnal.gov/offline_software/srt_public_context/WebDocs/FrozenRel.html

Remember that MessageService warnings/errors report [run|snarl], but only for problematic snarls. At best these can be used to bracket a range of snarls where the problem lies. A catastrophic failure doesn't have to correspond to the last reported snarl.

For the development SLF4-build, SLF5-run failure (and most others), the MINOS output ends with:

=E= AlgFitTr 2011/11/10 11:48:16 [7853|51789] AlgFitTrackCam.cxx,v1.78:3411> SpectrometerSwim - unexpectedly large u or v (u=-2.36607e+09 v=-573.878) bailing out.
=E= VertexFi 2011/11/10 11:50:49 [7853|52323] VertexFinder.cxx,v1.13:119> Zero Energy event, I declare that bad
=E= VertexFi 2011/11/10 11:54:00 [7853|53001] VertexFinder.cxx,v1.13:119> Zero Energy event, I declare that bad
=E= AlgFitTr 2011/11/10 11:55:37 [7853|53313] AlgFitTrackCam.cxx,v1.78:3411> SpectrometerSwim - unexpectedly large u or v (u=9050.62 v=1.1526) bailing out.
[...error occurs here...]
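
The bracketing described above can be sketched in Python. This is only an illustration: the regular expression is inferred from the `=E=` lines quoted here (other severity letters are an assumption), the function names are hypothetical, and the conclusion "the crash is at or after the last reported snarl" assumes snarls are processed in increasing order.

```python
import re

# Matches MessageService lines of the form seen in this ticket, e.g.
# =E= AlgFitTr 2011/11/10 11:48:16 [7853|51789] AlgFitTrackCam.cxx,v1.78:3411> ...
# Assumption: the severity marker is a single capital letter between '=' signs.
MSG_RE = re.compile(
    r"^=(?P<level>[A-Z])=\s+(?P<stream>\S+)\s+"
    r"(?P<date>\d{4}/\d{2}/\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"\[(?P<run>\d+)\|(?P<snarl>\d+)\]"
)

def reported_snarls(lines):
    """Return (run, snarl) pairs for every matching line, in order."""
    pairs = []
    for line in lines:
        m = MSG_RE.match(line)
        if m:
            pairs.append((int(m.group("run")), int(m.group("snarl"))))
    return pairs

def last_reported_snarl(lines):
    """Return the last (run, snarl) seen, or None if nothing matched."""
    pairs = reported_snarls(lines)
    return pairs[-1] if pairs else None

# Tail of the failing job's output, copied from this ticket.
log_tail = [
    "=E= AlgFitTr 2011/11/10 11:48:16 [7853|51789] AlgFitTrackCam.cxx,v1.78:3411> SpectrometerSwim - unexpectedly large u or v (u=-2.36607e+09 v=-573.878) bailing out.",
    "=E= VertexFi 2011/11/10 11:50:49 [7853|52323] VertexFinder.cxx,v1.13:119> Zero Energy event, I declare that bad",
    "=E= VertexFi 2011/11/10 11:54:00 [7853|53001] VertexFinder.cxx,v1.13:119> Zero Energy event, I declare that bad",
    "=E= AlgFitTr 2011/11/10 11:55:37 [7853|53313] AlgFitTrackCam.cxx,v1.78:3411> SpectrometerSwim - unexpectedly large u or v (u=9050.62 v=1.1526) bailing out.",
    "[...error occurs here...]",
]

if __name__ == "__main__":
    # Lower bound on where the crash happened (not the exact snarl).
    print(last_reported_snarl(log_tail))
```

Comparing this lower bound with the next snarl reported by a build that completes gives the upper end of the bracket.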

On R2.5 GCC_3_4 running on SLF4, the output in the region of the above problems reads:

=E= VertexFi 2011/11/10 11:24:46 [7853|53001] VertexFinder.cxx,v1.13:119> Zero Energy event, I declare that bad
=E= AlgFitTr 2011/11/10 11:26:27 [7853|53313] AlgFitTrackCam.cxx,v1.76:3410> SpectrometerSwim - unexpectedly large u or v (u=2111.92 v=-2.481e+11) bailing out.
=E= AlgFitTr 2011/11/10 11:33:10 [7853|54566] AlgFitTrackCam.cxx,v1.76:3410> SpectrometerSwim - unexpectedly large u or v (u=1.65035e+08 v=-46.5585) bailing out.
=E= VertexFi 2011/11/10 11:37:35 [7853|55302] VertexFinder.cxx,v1.13:119> Zero Energy event, I declare that bad
=E= AlgFitTr 2011/11/10 11:40:10 [7853|55851] AlgFitTrackCam.cxx,v1.76:3410> SpectrometerSwim - unexpectedly large u or v (u=-814.317 v=1.66687e+09) bailing out.
=E= VertexFi 2011/11/10 11:46:48 [7853|57104] VertexFinder.cxx,v1.13:119> Zero Energy event, I declare that bad

R2.5 GCC_3_4 on SLF4 completed with:
=E= AlgFitTr 2011/11/10 14:29:36 [7853|94851] AlgFitTrackCam.cxx,v1.76:3410> SpectrometerSwim - unexpectedly large u or v (u=800.171 v=-2.82894e+07) bailing out.

Spill(48681 in 1565 out 47116 filt.)
  1) +DataQualityReader::Reco   n=48681 ( 48197/   484) t=(  203.85/    0.21)
  2)  RecordSetupModule::Get    n=48197 ( 48197/     0) t=(    0.66/    0.00)
  3) +NeardetBeamSelect::Ana    n=48197 (  1565/ 46632) t=(    0.08/    0.01)
  4)  DigitListModule::Get      n=1565  (  1565/     0) t=(    0.00/    0.00)
[...]
 17)  NtpMRModule::Reco         n=1565  (  1565/     0) t=(  289.14/    0.00)
 18)  Output::Put               n=1565  (  1565/     0) t=(   12.11/    0.12)

For the development SLF4-build, SLF5-run failure, the ROOT-induced backtrace looks like:

/lib/libc.so.6[0x7666c5]
/lib/libc.so.6(cfree+0x59)[0x766b09]
/grid/fermiapp/minos/products/prd/MINOS_ROOT/Linux2.6-GCC_4_5/trunk/lib/libCore.so(_ZN8TStorage11ReAllocCharEPcjj+0x126)[0xf50c257c]
/grid/fermiapp/minos/products/prd/MINOS_ROOT/Linux2.6-GCC_4_5/trunk/lib/libCore.so(_ZN7TBuffer6ExpandEib+0x6f)[0xf507d18d]
/grid/fermiapp/minos/products/prd/MINOS_ROOT/Linux2.6-GCC_4_5/trunk/lib/libCore.so(_ZN7TBuffer10AutoExpandEi+0x65)[0xf507cfdb]
/grid/fermiapp/minos/products/prd/MINOS_ROOT/Linux2.6-GCC_4_5/trunk/lib/libRIO.so(_ZN11TBufferFile9WriteCharEc+0x45)[0xf460aeeb]
/grid/fermiapp/minos/minossoft/releases/development/lib/Linux2.6-GCC_4_5/libRegistry.so(_ZlsR7TBufferc+0x28)[0xf5e4e2d0]
/grid/fermiapp/minos/products/prd/MINOS_ROOT/Linux2.6-GCC_4_5/trunk/lib/libRIO.so(_ZN13TStreamerInfo14WriteBufferAuxIPPcEEiR7TBufferRKT_iiii+0x34c2)[0xf47300a4]
/grid/fermiapp/minos/products/prd/MINOS_ROOT/Linux2.6-GCC_4_5/trunk/lib/libRIO.so(_ZN20TStreamerInfoActions27GenericVectorPtrWriteActionER7TBufferPvPKvPKNS_14TConfigurationE+0x6b)[0xf466aa94]
/grid/fermiapp/minos/products/prd/MINOS_ROOT/Linux2.6-GCC_4_5/trunk/lib/libRIO.so(_ZNK20TStreamerInfoActions17TConfiguredActionclER7TBufferPvPKv+0x2c)[0xf460bd62]
/grid/fermiapp/minos/products/prd/MINOS_ROOT/Linux2.6-GCC_4_5/trunk/lib/libRIO.so(_ZN11TBufferFile19ApplySequenceVecPtrERKN20TStreamerInfoActions15TActionSequenceEPvS4_+0x11e)[0xf460a0d8]
/grid/fermiapp/minos/products/prd/MINOS_ROOT/Linux2.6-GCC_4_5/trunk/lib/libTree.so(_ZN14TBranchElement22FillLeavesClonesMemberER7TBuffer+0xf9)[0xf3a02929]
/grid/fermiapp/minos/products/prd/MINOS_ROOT/Linux2.6-GCC_4_5/trunk/lib/libTree.so(_ZN7TBranch4FillEv+0x1f9)[0xf39ef8c9]
/grid/fermiapp/minos/products/prd/MINOS_ROOT/Linux2.6-GCC_4_5/trunk/lib/libTree.so(_ZN14TBranchElement4FillEv+0x156)[0xf3a0199c]
/grid/fermiapp/minos/products/prd/MINOS_ROOT/Linux2.6-GCC_4_5/trunk/lib/libTree.so(_ZN14TBranchElement4FillEv+0x2c9)[0xf3a01b0f]
/grid/fermiapp/minos/products/prd/MINOS_ROOT/Linux2.6-GCC_4_5/trunk/lib/libTree.so(_ZN14TBranchElement4FillEv+0x2c9)[0xf3a01b0f]
/grid/fermiapp/minos/products/prd/MINOS_ROOT/Linux2.6-GCC_4_5/trunk/lib/libTree.so(_ZN5TTree4FillEv+0xf8)[0xf3a4cab8]
/grid/fermiapp/minos/minossoft/releases/development/lib/Linux2.6-GCC_4_5/libPersistency.so(_ZN15PerOutputStream5StoreEv+0x60d)[0xf5f78687]
/grid/fermiapp/minos/minossoft/releases/development/lib/Linux2.6-GCC_4_5/libPersistency.so(_ZN22PerOutputStreamManager3PutEPK12MomNavigator+0x103)[0xf5f79f47]
/grid/fermiapp/minos/minossoft/releases/development/lib/Linux2.6-GCC_4_5/libIoModules.so(_ZN14IoOutputModule3PutEPK12MomNavigator+0x47)[0xf62f72bd]
/grid/fermiapp/minos/minossoft/releases/development/lib/Linux2.6-GCC_4_5/libJobControl.so(_ZNK10JobCMethod7ExecuteEP10JobCModuleP12MomNavigator+0x80)[0xf5d34860

Just because the backtrace points at Persistency doesn't mean the problem lies there. It could mean something left a dangling pointer for it to trip over, or that something stomped on its memory, etc. Or it could mean the problem is in Persistency.
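
To see at a glance which libraries the failing call passes through, the frames can be reduced to a chain of library names. The following is a rough sketch, not part of any MINOS or ROOT tooling: the frame format is inferred from the backtrace above, the function names are hypothetical, and only a subset of the frames is included for brevity. It does not demangle symbols; a tool such as `c++filt` can do that separately.

```python
import os
import re

# Frame layout as printed above: /path/to/lib.so(symbol+0xoffset)[0xaddress]
# The "(symbol+0xoffset)" part is optional (the first libc frame has none).
FRAME_RE = re.compile(
    r"^(?P<lib>[^(\[]+)"
    r"(?:\((?P<sym>[^+)]*)(?:\+0x[0-9a-fA-F]+)?\))?"
    r"\["
)

def parse_frame(frame):
    """Return (library basename, mangled symbol or None), or None if unparsable."""
    m = FRAME_RE.match(frame.strip())
    if not m:
        return None
    return os.path.basename(m.group("lib")), m.group("sym") or None

def library_chain(frames):
    """Collapse consecutive frames from the same library; innermost first."""
    chain = []
    for frame in frames:
        parsed = parse_frame(frame)
        if parsed is None:
            continue
        lib = parsed[0]
        if not chain or chain[-1] != lib:
            chain.append(lib)
    return chain

# A subset of the frames from this ticket, innermost first.
ROOT_LIB = "/grid/fermiapp/minos/products/prd/MINOS_ROOT/Linux2.6-GCC_4_5/trunk/lib"
MINOS_LIB = "/grid/fermiapp/minos/minossoft/releases/development/lib/Linux2.6-GCC_4_5"
frames = [
    "/lib/libc.so.6[0x7666c5]",
    ROOT_LIB + "/libCore.so(_ZN8TStorage11ReAllocCharEPcjj+0x126)[0xf50c257c]",
    ROOT_LIB + "/libRIO.so(_ZN11TBufferFile9WriteCharEc+0x45)[0xf460aeeb]",
    MINOS_LIB + "/libRegistry.so(_ZlsR7TBufferc+0x28)[0xf5e4e2d0]",
    ROOT_LIB + "/libTree.so(_ZN5TTree4FillEv+0xf8)[0xf3a4cab8]",
    MINOS_LIB + "/libPersistency.so(_ZN15PerOutputStream5StoreEv+0x60d)[0xf5f78687]",
]

if __name__ == "__main__":
    print(" <- ".join(library_chain(frames)))
```

The chain only shows where the corruption was detected (inside `free()` during a buffer reallocation), which is consistent with the caution above: the code that damaged the heap may have run much earlier.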

About this file:

mysql> select FIRST_SNARL_NUM,FIRST_SNARL_TIME,LAST_SNARL_NUM,LAST_SNARL_TIME,SNARL_RECS,REC_SETS from offline_dev.DBUDAQFILESUMMARY where DETECTOR='Near' and RUN=7853 and SUBRUN=1;
+-----------------+---------------------+----------------+---------------------+------------+----------+
| FIRST_SNARL_NUM | FIRST_SNARL_TIME    | LAST_SNARL_NUM | LAST_SNARL_TIME     | SNARL_RECS | REC_SETS |
+-----------------+---------------------+----------------+---------------------+------------+----------+
|           46717 | 2005-05-27 21:23:01 |          94913 | 2005-05-27 22:23:01 |      48197 |    51798 |
+-----------------+---------------------+----------------+---------------------+------------+----------+
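
As a quick cross-check of these numbers, the snarl range is consistent with SNARL_RECS (and with n=48197 in the R2.5 job summary above), and the last snarl reported before the crash sits fairly early in the file. The sketch below assumes snarl numbers are contiguous, which the matching counts suggest but do not prove; the variable and function names are illustrative only.

```python
# Values copied from the DBUDAQFILESUMMARY row above.
first_snarl = 46717
last_snarl = 94913
snarl_recs = 48197

# Last snarl reported by MessageService before the development build crashed.
last_reported = 53313

def snarl_count(first, last):
    """Number of snarls in an inclusive, contiguous range."""
    return last - first + 1

def fraction_through(snarl, first, last):
    """Position of a snarl within the file, 0.0 at the first and 1.0 at the last."""
    return (snarl - first) / (last - first)

if __name__ == "__main__":
    print("snarls in range:", snarl_count(first_snarl, last_snarl))
    print("matches SNARL_RECS:", snarl_count(first_snarl, last_snarl) == snarl_recs)
    frac = fraction_through(last_reported, first_snarl, last_snarl)
    print("last reported snarl is {:.1%} of the way through".format(frac))
```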


