Task #23305

Build dunetpc v08_31_00 --> v08_31_01

Added by Tingjun Yang 28 days ago. Updated 14 days ago.

Status: Closed
Priority: Normal
Assignee:
Start date: 09/21/2019
Due date:
% Done: 0%
Estimated time:
Duration:

Description

We are preparing to build dunetpc v08_31_00.

Tom, we need new versions of lbne_raw_data and dune_raw_data first. Thanks.

History

#1 Updated by David Adams 26 days ago

Tom, are you working on the products needed here? Thanks. --da

#2 Updated by Thomas Junk 26 days ago

lbne_raw_data v1_04_37 and dune_raw_data v1_17_36 are now available in cvmfs and /grid/fermiapp with dependencies consistent with larsoft v08_31_00. I ran into a permissions problem uploading to SciSoft. I have submitted INC000001087498 regarding the SciSoft upload issue.

#3 Updated by Thomas Junk 26 days ago

  • Assignee changed from Thomas Junk to David Adams

#4 Updated by Thomas Junk 25 days ago

Tarballs uploaded to SciSoft. It turns out the /etc/ssh/ssh_config files on the SLF7 dunegpvms don't specify ticket forwarding like they do on the SLF6 dunegpvms.

#5 Updated by Tingjun Yang 25 days ago

  • Assignee changed from David Adams to Tingjun Yang

I am preparing the release.

#6 Updated by Tingjun Yang 25 days ago

  • Assignee changed from Tingjun Yang to David Adams

David, please tag dunetpc v08_31_00.

#7 Updated by David Adams 25 days ago

Test build is underway...

#8 Updated by David Adams 25 days ago

I just restarted the test build with a fix for the configuration of the keep-all signal finder.

#9 Updated by David Adams 25 days ago

The tag is made and builds started.

But I have trouble running test jobs with the code. There are many messages "Upgrading RawFragmentHeaderVI" and I get a crash from the TPC decoder on the second event processed.

#10 Updated by David Adams 25 days ago

After consulting with Tom, my conclusion is:

We don't have a working version of artdaq_core that is compatible with the version of Root used in larsoft v08_31_00. Until we have that, dunetpc must remain at v08_30_02.

I propose we move dunetpc back to v08_30_02 until we have demonstrated a working version of artdaq_core in a later release. Tingjun, do you want to do this? Or should I just change the larsoft version number?

#11 Updated by Lynn Garren 25 days ago

You should be using artdaq_core v3_05_02 with the s91 qualifier

#12 Updated by David Adams 25 days ago

Lynn:

What needs to be changed to do as you suggest?

#13 Updated by Thomas Junk 25 days ago

Hi Lynn, David,

It is artdaq_core v3_05_02 which throws an exception about missing fragment metadata when reading in ProtoDUNE-SP data. I am attempting to reproduce this myself. Indeed, no other version of artdaq_core can be set up with the new larsoft due to the root and other dependencies.

#14 Updated by David Adams 25 days ago

I see there are a lot of changes in dunetpc product_deps for v08_31_00. I copied a version I have from Sep 20 for v08_30_02 to see if I can build with that.

#15 Updated by Lynn Garren 25 days ago

I presume you found the feature branch we provided and/or added the missing artdaq_core library in the appropriate link lists. If there are further problems, you need to talk to the artdaq_core support people directly. We were advised that it was appropriate to tag 3.05, so I am also interested in the resolution to this problem. A short term solution is possible if it turns out to be complex, but I'd like to know more first.

David, there are many necessary updates in product_deps for larsoft v08_31_00.

#16 Updated by Tingjun Yang 25 days ago

David, it may not be easy to move back to v08_30_02 since I merged Lynn's feature branch that was needed for using the new root version.

#17 Updated by Lynn Garren 25 days ago

Give us a bit. The artdaq team may have a resolution already in place.

#18 Updated by David Adams 25 days ago

  • Assignee changed from David Adams to Thomas Junk

The build with the old version was fine and unit tests all passed. But, in light of the above and if we don't get any complaints from developers about the current head, I will not commit the change before tomorrow morning.

#19 Updated by Thomas Junk 25 days ago

Sorry I meant to post this earlier:

One would have to build dunetpc from the development head (we were trying to build a release and held back due to this), and then run

lar -c RunRawDecoder.fcl root://fndca1.fnal.gov:1094/pnfs/fnal.gov/usr/dune/tape_backed/dunepro/protodune-sp/raw/2019/detector/test/None/00/00/97/01/np04_raw_run009701_0004_dl1.root

David may be unwinding product_deps so that the head of develop isn't broken. I have a build on my desktop that exhibits the problem.

#20 Updated by David Adams 24 days ago

  • Assignee changed from Thomas Junk to David Adams

If there are no objections, I am going to put dunetpc back to v08_30_02. I will keep a copy of the updated configuration so one only has to do

cp dunetpc/ups/product_deps.v08_31_00 dunetpc/ups/product_deps

to switch back to v08_31_00.

#21 Updated by Tingjun Yang 24 days ago

If you do switch back to v08_30_02, please also change the duneutil dependence to v08_29_00. Otherwise there will be a version conflict.

#22 Updated by Thomas Junk 24 days ago

Yes, we are chasing a bug introduced in the new version of artdaq_core:

https://cdcvs.fnal.gov/redmine/issues/23319

It is not ready for use and so we cannot move to v08_31_00. People developing but not decoding raw data may be able to survive with the develop branch if we don't roll back, but I wouldn't tag this until the problem is solved.

#23 Updated by Thomas Junk 24 days ago

Just as I was typing that, Eric Flumerfelt committed a fix to artdaq-core -- it still needs review, approval, and a few releases (artdaq_core, dune_raw_data, lbne_raw_data) made before we can proceed.

#24 Updated by David Adams 24 days ago

Tingjun: The product_deps for v08_30_02 do use duneutil v08_29_00 as you indicate.

#25 Updated by David Adams 24 days ago

I switched back to v08_30_02. Note that one can easily build in v08_31_00 following the prescription above.

#26 Updated by Tingjun Yang 24 days ago

Some users have noticed and been affected by this change back to v08_30_02. We should probably send a notice.

#27 Updated by Tingjun Yang 23 days ago

I created a new feature branch feature/team_for_v08_31_00 with the new artdaq_core, lbne_raw_data, dune_raw_data.

#28 Updated by Christoph Alt 22 days ago

I am starting to prepare dunetpc v08_31_01 now. I will make sure that the problem mentioned above is resolved before requesting the tag.

#29 Updated by Tingjun Yang 22 days ago

Thanks Christoph. I suggest we tag v08_31_01 as soon as possible. Several people are waiting for new dunetpc releases to continue their work.

#30 Updated by Thomas Junk 22 days ago

Decoding ProtoDUNE-SP RCE raw data needs a small tweak. Should I commit it to the feature/team_for_v08_31_00 or to develop? Eric Flumerfelt is able to reproduce the problem that required the tweak, and it may be corrupting other readback methods too -- the artdaq::Fragment data sizes are returned inconsistently.

In a separate issue (perhaps it's separate), there is a memory leak with the new version of root and things on top of it when reading data products from a file.

#31 Updated by Tingjun Yang 22 days ago

Hi Tom, I suggest you implement the change in feature/team_for_v08_31_00 and then bump the dependence to v08_31_01. Thanks.

#32 Updated by Thomas Junk 22 days ago

Okay, done.

#33 Updated by Tingjun Yang 22 days ago

Thanks Tom. David, could you please tag dunetpc v08_31_01 using develop? Many people are waiting for this release. Thanks.

#34 Updated by David Adams 21 days ago

I have started the test build for v08_31_01.

#35 Updated by David Adams 21 days ago

  • Status changed from New to Work in progress

#36 Updated by David Adams 21 days ago

Tag is made and builds started for v08_31_01.

#37 Updated by David Adams 21 days ago

  • Subject changed from Build dunetpc v08_31_00 to Build dunetpc v08_31_00 --> v08_31_01

The linux c2 builds failed (mac still in progress).

The v08_31_01 gcc builds have been installed.

#38 Updated by Tingjun Yang 21 days ago

I have notified Aaron about the unused variables.

#39 Updated by Christoph Alt 19 days ago

I see that you already tagged v08_31_01. After moving to larsoft v08_31_01, running:

lar -c RunRawDecoder.fcl root://fndca1.fnal.gov:1094/pnfs/fnal.gov/usr/dune/tape_backed/dunepro/protodune-sp/raw/2019/detector/test/None/00/00/97/01/np04_raw_run009701_0004_dl1.root

worked fine. The CI test for data_reco_protoDUNEsp, however, was (and still is) hanging. I'll investigate and update here.

#40 Updated by Christoph Alt 19 days ago

Running the data_reco_protoDUNEsp CI test on a dunegpvm with

lar --rethrow-all -n 1 --timing-db time.db -o protoDune_rawdata_datareco_Current.root --config ci_test_datareco_protoDUNEsp.fcl xroot://fndca1.fnal.gov:1094/pnfs/fnal.gov/usr/dune/tape_backed/dunepro/protodune/np04/beam/detector/None/raw/07/73/51/42/np04_raw_run005809_0004_dl6.root

gives the following output:

.
.
.
Begin processing the 1st record. run: 5809 subRun: 1 event: 4503 at 30-Sep-2019 03:45:30 CDT
CRT fragment: N hit (^C -> 32B) mismatches size 40B
M/77^C/46^W/46^@/46^Y/46�/46^O/46�/46 i/105�/46^Q/46^A/46�/460/48^K/46�/46
H/72^W/46^N/46^@/46H/72^_/46'/39^@/46 H/72;/59^]/46^@/46H/72//47^Z/46^@/46
m/109�/467/55H/72�/46i/105�/46!/33
CRT fragment: N hit (^B -> 24B) mismatches size 32B
M/77^B/46^U/46^@/46�/46�/46^O/46�/46 i/105�/46^Q/46^A/46T/844/52^K/46�/46
H/72^D/46�/46^@/46H/72$/36#/35^A/46 �/46v/118^C/46^@/46f/102�/46^C/46^@/46

CRT fragment: N hit (^C -> 32B) mismatches size 40B
.
.
.

Vito mentioned that the "�" characters may crash the CI python parser script, which in turn makes the test appear to hang in the CI dashboard. The test actually continues and exits with status 0. Is this output something we need to worry about?
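
As an illustration of Vito's point, here is a minimal Python sketch of how undecodable bytes in job output can break a strict parser, and how a tolerant decode avoids that. This is not the actual CI parser script; the function name `decode_log_line` and the sample bytes are hypothetical, made up only for this example.

```python
# Hypothetical helper, not taken from the real CI parser script.
def decode_log_line(raw: bytes) -> str:
    """Decode one line of job output without raising on invalid UTF-8."""
    # errors="replace" substitutes U+FFFD for every undecodable byte.
    return raw.decode("utf-8", errors="replace")


# Made-up sample: readable text followed by an invalid UTF-8 byte sequence.
sample = b"CRT fragment: N hit (\x03 -> 32B) mismatches size 40B \xf0\x2e"

# A strict decode raises on the invalid bytes.
try:
    sample.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# The tolerant decode always returns a string.
safe_text = decode_log_line(sample)
print(strict_ok, repr(safe_text))
```

If the parser reads job output as bytes and decodes it this way, a binary dump in the log would show up as replacement characters rather than stopping the parser.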

#41 Updated by Thomas Junk 19 days ago

Yes, artdaq_core v3_05_04 is broken. It returns the wrong sizes of data fragments from the DAQ, which causes undefined behavior. It's been fixed on a feature branch:

https://cdcvs.fnal.gov/redmine/issues/23345

I put in a stopgap in an integrity checker for the RCEs to let that check pass, but we use the fragment size in many places, such as the CRT decoder.

Another issue is a memory leak, introduced with the same version of artdaq_core:

https://cdcvs.fnal.gov/redmine/issues/23348

This version of artdaq_core cannot be used to read ProtoDUNE-SP data. Hopefully it will be debugged soon.
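
To make the size mismatch in the CRT messages above concrete, here is a minimal Python sketch of the kind of consistency check involved. It is not the CRT decoder code. The log only shows that 3 hits correspond to 32B and 2 hits to 24B; the split into an 8-byte header plus 8 bytes per hit is an assumption chosen to reproduce those two numbers, and the function names are hypothetical.

```python
# Hypothetical sketch, not the actual CRT decoder.
# Assumption: 8-byte header + 8 bytes per hit. This reproduces the sizes in
# the log (3 hits -> 32B, 2 hits -> 24B) but is not confirmed by the source.
def expected_crt_size(n_hits: int, header_bytes: int = 8, hit_bytes: int = 8) -> int:
    """Size in bytes implied by the hit count, under the assumed layout."""
    return header_bytes + n_hits * hit_bytes


def size_matches(n_hits: int, reported_size: int) -> bool:
    """True if the fragment size reported by the library agrees with the hit count."""
    return expected_crt_size(n_hits) == reported_size


# The two cases from the log output: both report 8 bytes more than expected.
for n_hits, reported in [(3, 40), (2, 32)]:
    if not size_matches(n_hits, reported):
        print(f"N hit ({n_hits} -> {expected_crt_size(n_hits)}B) mismatches size {reported}B")
```

Any decoder that trusts the reported fragment size would fail a check of this kind while the library returns inconsistent sizes.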

#42 Updated by David Adams 19 days ago

As expected, the v08_31_00 c2 builds failed for all platforms. So that release is complete with only gcc builds.

As noted above, there are many problems with the artdaq code in that release and so I leave this ticket open to track progress on those issues.

#43 Updated by Tingjun Yang 19 days ago

The c2 build issue (unused variables) was fixed by Aaron last night.

#44 Updated by David Adams 14 days ago

  • Status changed from Work in progress to Closed

Release v08_31_00 was abandoned and v08_31_01 was made for gcc only, with a memory leak in protoDUNE TPC decoding.

I think a fix for the latter is in the upcoming v08_32_00 (#23382).


