NOvARawInputDriver/DAQ2RawDigit crashes when no triggers are present
Our daq2rawdigitjob crashes when we run over a file that has no triggers in it. In this case, jobs terminate with a "Bus error." It's not clear that this is a problem in the DAQ2RawDigit module; it seems to be more closely related to NOvARawInputDriver.
Files which trigger this error can be found using the following samweb query:
samweb -e nova list-files Online.TotalEvents = 0 AND data_tier="artdaq" AND Online.RunNumber \> 11497 AND Online.Detector="fardet" AND Online.SubRunEndTime = 0 AND Online.Stream = 0 AND Online.Partition = 1
Those are ROOT files that were output even though the jobs crashed. For a list of raw files, try this guy:
samweb -e nova list-files Online.TotalEvents = 0 AND data_tier="raw" AND Online.RunNumber \> 11497 AND Online.Detector="fardet" AND Online.SubRunEndTime = 0 AND Online.Stream = 0 AND Online.Partition = 1
This can currently be reproduced in the development build. After sourcing development, you could try, for instance:
nova -c daq2rawdigitjob.fcl /pnfs/nova/rawdata/FarDet/000130/13078/fardet_r00013078_s48_t00.raw
Debugger output is shown below. Question: why does the trace show files that live in Chris Green's home directory?
0x00002aaaf3ce4068 in rawfileparser::RawFileParser::CheckFileType_mem (this=0x1d9b460) at /build/nova/novasoft/releases/development/RawFileParser/cxx/src/RawFileParser.cpp:268
268 /build/nova/novasoft/releases/development/RawFileParser/cxx/src/RawFileParser.cpp: No such file or directory.
(gdb) bt
#0 0x00002aaaf3ce4068 in rawfileparser::RawFileParser::CheckFileType_mem (this=0x1d9b460) at /build/nova/novasoft/releases/development/RawFileParser/cxx/src/RawFileParser.cpp:268
#1 0x00002aaaf3ce4012 in rawfileparser::RawFileParser::CheckFileType (this=0x1d9b460) at /build/nova/novasoft/releases/development/RawFileParser/cxx/src/RawFileParser.cpp:257
#2 0x00002aaaf3ce3ed9 in rawfileparser::RawFileParser::open (this=0x1d9b460, filename=0x1da2178 "/pnfs/nova/rawdata/FarDet/000130/13078/fardet_r00013078_s48_t00.raw") at /build/nova/novasoft/releases/development/RawFileParser/cxx/src/RawFileParser.cpp:208
#3 0x00002aaaf34677b1 in daq2raw::NOvARawInputDriver::readFile (this=0x1d9b440, name=..., fb=@0x7fffffff42a8: 0x0) at /build/nova/novasoft/releases/development/DAQ2RawDigit/NOvARawInputDriver.cxx:74
#4 0x00002aaaf31e0b96 in art::Source<daq2raw::NOvARawInputDriver>::readFile (this=0x1d9b3c0) at /build/nova/externals/art/v1_08_10/include/art/Framework/IO/Sources/Source.h:571
#5 0x00002aaab0c8840d in art::EventProcessor::readFile (this=0x1d55fd0) at /home/greenc/work/cet-is/test-products/art/v1_08_10/src/art/Framework/EventProcessor/EventProcessor.cc:516
#6 0x00002aaab0ccc08b in statemachine::HandleFiles::openFiles (this=0x1da4590) at /home/greenc/work/cet-is/test-products/art/v1_08_10/src/art/Framework/EventProcessor/EPStates.cc:121
#7 0x00002aaab0ccbb38 in statemachine::HandleFiles::HandleFiles (this=0x1da4590, ctx=...) at /home/greenc/work/cet-is/test-products/art/v1_08_10/src/art/Framework/EventProcessor/EPStates.cc:56
#8 0x00002aaab0cedc58 in boost::statechart::state<statemachine::HandleFiles, statemachine::Machine, statemachine::FirstFile, (boost::statechart::history_mode)0>::shallow_construct (pContext=@0x7fffffff4758: 0x1da4b00, outermostContextBase=...) at /home/greenc/work/cet-is/test-products/boost/v1_53_0/Linux64bit+2.6-2.5-e4-debug/include/boost/statechart/state.hpp:89
#9 0x00002aaab0ceda66 in boost::statechart::state<statemachine::HandleFiles, statemachine::Machine, statemachine::FirstFile, (boost::statechart::history_mode)0>::deep_construct (pContext=@0x7fffffff4758: 0x1da4b00, outermostContextBase=...) at /home/greenc/work/cet-is/test-products/boost/v1_53_0/Linux64bit+2.6-2.5-e4-debug/include/boost/statechart/state.hpp:79
#10 0x00002aaab0ced7c4 in boost::statechart::detail::inner_constructor<boost::mpl::l_item<mpl_::long_<1l>, statemachine::HandleFiles, boost::mpl::l_end>, boost::statechart::state_machine<statemachine::Machine, statemachine::Starting, std::allocator<void>, boost::statechart::null_exception_translator> >::construct (pContext=@0x7fffffff4758: 0x1da4b00, outermostContextBase=...) at /home/greenc/work/cet-is/test-products/boost/v1_53_0/Linux64bit+2.6-2.5-e4-debug/include/boost/statechart/detail/constructor.hpp:93
#11 0x00002aaab0ced430 in boost::statechart::simple_state<statemachine::Starting, statemachine::Machine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::transit_impl<statemachine::HandleFiles, statemachine::Machine, boost::statechart::detail::transition_function<statemachine::Machine, statemachine::File> > (this=0x1da27a0, transitionAction=...) at /home/greenc/work/cet-is/test-products/boost/v1_53_0/Linux64bit+2.6-2.5-e4-debug/include/boost/statechart/simple_state.hpp:798
#2 Updated by Dominick Rocco over 5 years ago
I've found another peculiar feature regarding this issue. If I try to run over the file located in dCache, the job terminates with a bus error. However, if I do ifdh_fetch on the file to move it elsewhere and run over that file, the job succeeds and ART exits with status 0.
ifdh_fetch does not seem to like something about the dCache location; it then finds the file on bluearc instead. Here is the output:
found file on enstore, using dcache srm
doing: ifdh cp "/pnfs/nova/rawdata/FarDet/000130/13078/fardet_r00013078_s48_t00.raw" "/tmp/$f" ; cp /tmp/$f ./$f; rm /tmp/$f
Exception:http://samweb.fnal.gov:8480/sam/nova/api/files/name/%2e%2f/locations
HTTP-Status: 404
Error text is:
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>404 Not Found</title>
</head><body>
<h1>Not Found</h1>
<p>The requested URL /sam/nova/api/files/name/.//locations was not found on this server.</p>
<hr>
<address>Apache/2.2.3 (Scientific Linux) Server at samweb.fnal.gov Port 8480</address>
</body></html>
found file on bluearc
doing: ifdh cp "/./" "/tmp/$f" ; cp /tmp/$f ./$f; rm /tmp/$f
#3 Updated by Denis Perevalov over 5 years ago
It appears to be crashing in RawFileParser::CheckFileType_mem.
mmfile_start32 appears to have a bad memory address.
I attached an image from TotalView. It shows a line where it crashes.
This is something new. Could the issue be related to the /pnfs mounts? I can't seem to run DAQ2RawDigit on any file from there, and I can't copy files from there either.
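For reference, one general way a read through a memory mapping can end in a bus error: on POSIX systems, touching mapped pages that lie past the end of the file, or whose backing storage fails on access, raises SIGBUS instead of returning an error code. Whether RawFileParser actually maps the file this way is an assumption here, suggested only by the names CheckFileType_mem and mmfile_start32. The sketch below uses a hypothetical helper, first_word_mapped, which is not part of novasoft, to show the defensive pattern: check the file size before mapping, and check the mmap result before dereferencing.

```cpp
#include <cstdint>
#include <cstring>
#include <string>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper (NOT part of RawFileParser): copy the first 32-bit word
// of a file into `word` through a read-only mapping. Returns false if the file
// cannot be opened, is shorter than 4 bytes, or cannot be mapped.
bool first_word_mapped(const std::string& path, std::uint32_t& word) {
  const int fd = ::open(path.c_str(), O_RDONLY);
  if (fd < 0) return false;

  struct stat st {};
  if (::fstat(fd, &st) != 0 ||
      st.st_size < static_cast<off_t>(sizeof(std::uint32_t))) {
    ::close(fd);
    return false;  // too short: reading through a mapping could raise SIGBUS
  }

  void* base = ::mmap(nullptr, sizeof(std::uint32_t), PROT_READ, MAP_PRIVATE,
                      fd, 0);
  ::close(fd);  // the mapping remains valid after the descriptor is closed
  if (base == MAP_FAILED) return false;

  std::memcpy(&word, base, sizeof word);  // avoids alignment assumptions
  ::munmap(base, sizeof word);
  return true;
}
```

A size check like this only covers the short-file case. If the backing storage itself fails when a page is read, the access can still raise SIGBUS, which would be consistent with the same file working once it has been copied to a local disk.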
#4 Updated by Denis Perevalov over 5 years ago
It doesn't appear to be related to the number of events.
For instance, the following subrun certainly contains events, but the job still crashes:
nova -c daq2rawdigitjob.fcl /pnfs/nova/rawdata/FarDet/000130/13078/fardet_r00013078_s55_t02.raw
So, I'm pretty confident it's related to the /pnfs mount. Is it because the files are on tape? Is there a special way we need to read these files?
#6 Updated by Denis Perevalov over 5 years ago
I see. Thanks Andrew, that's what I thought.
Well, I was just following the described instructions for reproducing the problem.
So, I don't know at this point if there is even an issue or not.
If you believe there is still something to investigate, I need instructions for how to copy the file. The ifdh_fetch help output is not very useful, and we don't have anything on our Wiki either.