Bug #5436

NOvARawInputDriver/DAQ2RawDigit crashes when no triggers are present

Added by Dominick Rocco almost 6 years ago. Updated about 3 years ago.

Status: Rejected
Priority: High
Assignee: -
Category: -
Start date: 02/13/2014
Due date:
% Done: 0%
Estimated time:
Duration:

Description

Our daq2rawdigitjob crashes when we run over a file that has no triggers in it. In this case, jobs terminate with a "Bus error." It's not clear that this is a problem in the DAQ2RawDigit module; it seems to be more related to NOvARawInputDriver.

Files which trigger this error can be found using the following samweb query:

samweb -e nova list-files Online.TotalEvents = 0 AND data_tier="artdaq" AND Online.RunNumber > 11497 AND Online.Detector="fardet" AND Online.SubRunEndTime = 0 AND Online.Stream = 0 AND Online.Partition = 1

Those are ROOT files that were output even though the jobs crashed. For a list of raw files, try this guy:
samweb -e nova list-files Online.TotalEvents = 0 AND data_tier="raw" AND Online.RunNumber > 11497 AND Online.Detector="fardet" AND Online.SubRunEndTime = 0 AND Online.Stream = 0 AND Online.Partition = 1

This can currently be reproduced in the development build. After sourcing development, you could try, for instance:
nova -c daq2rawdigitjob.fcl /pnfs/nova/rawdata/FarDet/000130/13078/fardet_r00013078_s48_t00.raw

Debugger output is shown below. Question: why does the trace show files that live in Chris Green's home directory?

0x00002aaaf3ce4068 in rawfileparser::RawFileParser::CheckFileType_mem (this=0x1d9b460)
    at /build/nova/novasoft/releases/development/RawFileParser/cxx/src/RawFileParser.cpp:268
268    /build/nova/novasoft/releases/development/RawFileParser/cxx/src/RawFileParser.cpp: No such file or directory.
(gdb) bt
#0  0x00002aaaf3ce4068 in rawfileparser::RawFileParser::CheckFileType_mem (this=0x1d9b460)
    at /build/nova/novasoft/releases/development/RawFileParser/cxx/src/RawFileParser.cpp:268
#1  0x00002aaaf3ce4012 in rawfileparser::RawFileParser::CheckFileType (this=0x1d9b460)
    at /build/nova/novasoft/releases/development/RawFileParser/cxx/src/RawFileParser.cpp:257
#2  0x00002aaaf3ce3ed9 in rawfileparser::RawFileParser::open (this=0x1d9b460, filename=0x1da2178 "/pnfs/nova/rawdata/FarDet/000130/13078/fardet_r00013078_s48_t00.raw")
    at /build/nova/novasoft/releases/development/RawFileParser/cxx/src/RawFileParser.cpp:208
#3  0x00002aaaf34677b1 in daq2raw::NOvARawInputDriver::readFile (this=0x1d9b440, name=..., fb=@0x7fffffff42a8: 0x0)
    at /build/nova/novasoft/releases/development/DAQ2RawDigit/NOvARawInputDriver.cxx:74
#4  0x00002aaaf31e0b96 in art::Source<daq2raw::NOvARawInputDriver>::readFile (this=0x1d9b3c0)
    at /build/nova/externals/art/v1_08_10/include/art/Framework/IO/Sources/Source.h:571
#5  0x00002aaab0c8840d in art::EventProcessor::readFile (this=0x1d55fd0)
    at /home/greenc/work/cet-is/test-products/art/v1_08_10/src/art/Framework/EventProcessor/EventProcessor.cc:516
#6  0x00002aaab0ccc08b in statemachine::HandleFiles::openFiles (this=0x1da4590)
    at /home/greenc/work/cet-is/test-products/art/v1_08_10/src/art/Framework/EventProcessor/EPStates.cc:121
#7  0x00002aaab0ccbb38 in statemachine::HandleFiles::HandleFiles (this=0x1da4590, ctx=...)
    at /home/greenc/work/cet-is/test-products/art/v1_08_10/src/art/Framework/EventProcessor/EPStates.cc:56
#8  0x00002aaab0cedc58 in boost::statechart::state<statemachine::HandleFiles, statemachine::Machine, statemachine::FirstFile, (boost::statechart::history_mode)0>::shallow_construct (pContext=@0x7fffffff4758: 0x1da4b00, outermostContextBase=...)
    at /home/greenc/work/cet-is/test-products/boost/v1_53_0/Linux64bit+2.6-2.5-e4-debug/include/boost/statechart/state.hpp:89
#9  0x00002aaab0ceda66 in boost::statechart::state<statemachine::HandleFiles, statemachine::Machine, statemachine::FirstFile, (boost::statechart::history_mode)0>::deep_construct (pContext=@0x7fffffff4758: 0x1da4b00, outermostContextBase=...)
    at /home/greenc/work/cet-is/test-products/boost/v1_53_0/Linux64bit+2.6-2.5-e4-debug/include/boost/statechart/state.hpp:79
#10 0x00002aaab0ced7c4 in boost::statechart::detail::inner_constructor<boost::mpl::l_item<mpl_::long_<1l>, statemachine::HandleFiles, boost::mpl::l_end>, boost::statechart::state_machine<statemachine::Machine, statemachine::Starting, std::allocator<void>, boost::statechart::null_exception_translator> >::construct (
    pContext=@0x7fffffff4758: 0x1da4b00, outermostContextBase=...)
    at /home/greenc/work/cet-is/test-products/boost/v1_53_0/Linux64bit+2.6-2.5-e4-debug/include/boost/statechart/detail/constructor.hpp:93
#11 0x00002aaab0ced430 in boost::statechart::simple_state<statemachine::Starting, statemachine::Machine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::transit_impl<statemachine::HandleFiles, statemachine::Machine, boost::statechart::detail::transition_function<statemachine::Machine, statemachine::File> > (
    this=0x1da27a0, transitionAction=...) at /home/greenc/work/cet-is/test-products/boost/v1_53_0/Linux64bit+2.6-2.5-e4-debug/include/boost/statechart/simple_state.hpp:798

History

#1 Updated by Christopher Backhouse almost 6 years ago

Question: why does the trace show files that live in Chris Green's home directory?

Because that's where he built the version of art that's in our externals, and the debugging info gets baked in at build time.

#2 Updated by Dominick Rocco almost 6 years ago

I've found another peculiar feature regarding this issue. If I try to run over the file located in dCache, the job terminates with a bus error. However, if I do ifdh_fetch on the file to move it elsewhere and run over that file, the job succeeds and ART exits with status 0.

ifdh_fetch doesn't seem to like something about the dCache location; it then finds the file on bluearc. Here is the output:

found file on enstore, using dcache srm
doing: ifdh cp "/pnfs/nova/rawdata/FarDet/000130/13078/fardet_r00013078_s48_t00.raw" "/tmp/$f" ; cp /tmp/$f ./$f; rm /tmp/$f
Exception:http://samweb.fnal.gov:8480/sam/nova/api/files/name/%2e%2f/locations
HTTP-Status: 404
Error text is:
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>404 Not Found</title>
</head><body>
<h1>Not Found</h1>
<p>The requested URL /sam/nova/api/files/name/.//locations was not found on this server.</p>
<hr>
<address>Apache/2.2.3 (Scientific Linux) Server at samweb.fnal.gov Port 8480</address>
</body></html>

found file on bluearc
doing: ifdh cp "/./" "/tmp/$f" ; cp /tmp/$f ./$f; rm /tmp/$f

#3 Updated by Denis Perevalov almost 6 years ago

Appears to be crashing in RawFileParser::CheckFileType_mem.

mmfile_start32 appears to have a bad memory address.

I attached an image from TotalView. It shows the line where it crashes.

This is something new. Could the issue be related to the /pnfs mounts? I can't seem to run DAQ2RawDigit on any file from there. Also, I can't copy files from there.
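
One way to rule out a too-short file as the cause is to check the file size before touching the mapping. The Python sketch below is not the RawFileParser code: the function name `read_first_word`, the little-endian layout, and the idea that `mmfile_start32` points at the start of a memory-mapped file are all assumptions made for illustration.

```python
import mmap
import os
import struct


def read_first_word(path):
    """Return the first 32-bit word of the file, or None if the file is too short.

    Hypothetical helper: the real parser's layout and checks may differ.
    """
    size = os.path.getsize(path)
    if size < 4:
        # An empty or truncated file has no complete 32-bit word to inspect.
        return None
    with open(path, "rb") as f:
        # Map the whole file read-only and decode the leading word.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            (word,) = struct.unpack_from("<I", mm, 0)
            return word
```

A `None` result here would point at the file contents; a failure inside the mapping itself would point back at the mount.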

#4 Updated by Denis Perevalov almost 6 years ago

Doesn't appear to be related to the number of events.

For instance, the following run certainly has events, but this job crashes:

nova -c daq2rawdigitjob.fcl /pnfs/nova/rawdata/FarDet/000130/13078/fardet_r00013078_s55_t02.raw

So, I'm pretty confident it's related to the /pnfs mount. Is it tapes? Is there a special way we need to read these files?

#5 Updated by Andrew Norman almost 6 years ago

Denis's test won't work. You must copy the file out of dCache to read it. You can't read directly from the /pnfs tree.
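
A minimal sketch of that workflow, assuming `ifdh cp` (the command visible in the ifdh_fetch output earlier in this thread) is on the PATH and that a local scratch directory exists. The helper names `needs_staging` and `build_commands` and the `/tmp/stage` directory are hypothetical; the script only builds the command lines and leaves running them to the caller.

```python
import posixpath

PNFS_PREFIX = "/pnfs/"  # dCache namespace mount discussed in this ticket


def needs_staging(path):
    """Return True when the file lives under /pnfs and must be copied out first."""
    return posixpath.normpath(path).startswith(PNFS_PREFIX)


def build_commands(raw_path, scratch_dir="/tmp/stage", fcl="daq2rawdigitjob.fcl"):
    """Build argv lists that stage a raw file (if needed) and run the job on it.

    scratch_dir is a hypothetical local directory; nothing is executed here.
    """
    commands = []
    local_path = raw_path
    if needs_staging(raw_path):
        # Copy out of dCache to local disk before reading.
        local_path = posixpath.join(scratch_dir, posixpath.basename(raw_path))
        commands.append(["ifdh", "cp", raw_path, local_path])
    commands.append(["nova", "-c", fcl, local_path])
    return commands


if __name__ == "__main__":
    raw = "/pnfs/nova/rawdata/FarDet/000130/13078/fardet_r00013078_s48_t00.raw"
    for argv in build_commands(raw):
        print(" ".join(argv))
```

Each argv list could then be passed to `subprocess.run(argv, check=True)` so that a failed copy stops the job before `nova` starts.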

#6 Updated by Denis Perevalov almost 6 years ago

I see. Thanks Andrew, that's what I thought.

Well, I was just following the described instructions for reproducing the problem.

So, I don't know at this point if there is even an issue or not.

If you believe there is still something to investigate, I need instructions for how to copy the file. The ifdh_fetch help function is not very useful, and we don't have anything on our Wiki either.

#7 Updated by Alexander Himmel about 3 years ago

  • Status changed from New to Rejected

