Attending: Alex, Satish, Susan, Dominick, OPOS, Joseph, Paul Rojas, Enrique, Kanika, Bruno
We're planning to tag and test the prod2reco release today. However, production will only start if it passes (using the production tests for that) and if we get sign-off from the analysis groups. The main obstacle there is the lack of near detector MC.
The main concern for this tag is whether to include subrun-level masking. Alex will poke Jon Paley on this matter today and come back with an answer.
We're running well under quota onsite but doing very well offsite, so Alex suggests that we continue the ND MC generation aggressively offsite. The main consequence of the lack of onsite resources is that we probably won't have as many cosmics as we'd like. In any case, OPOS will run as many as possible over the next month.
Many idling jobs onsite
There are issues correctly configuring the new GPGrid that are affecting all the experiments. As a consequence, the number of available slots is small. People have been working on this since 9:00 this morning. The good news is that this only affects Fermigrid; the number of jobs offsite was very good, several thousand. So the recommendation is to run as much as possible offsite and ask for lots of slots.
It was working fine for a while, but when I tried to submit more jobs it started to kill all of them, even the ones that were running OK. Since the server was always below 30% load, Chris thought this was yet another case of the auto-blocker triggering: we changed the address of the LEM server, and it probably wasn't added to the exceptions. Chris opened a ticket about this, and it should be resolved soon.
It was not performing well over the weekend, but after a restart things look better. We will need to keep an eye on it for a while longer to be sure this is fixed.
FD MC reco filenames collision
Files are being created with names including nogenierw instead of none. There's nothing too suspicious in the logs yet, but the changes in NovaGridUtils might be the culprit. According to Enrique, draining datasets were not run, so it's still unclear what caused this. Alex recommends excluding these files from the definitions by name. We will also retire the colliding files.
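As a hedged illustration of the exclusion-by-name approach: a minimal sketch in which the file names and the helper are assumptions for illustration, not taken from the actual datasets or NovaGridUtils.

```python
# Hypothetical sketch of filtering out the mis-named files (containing
# "nogenierw" where "none" was expected) before building a definition.
def exclude_misnamed(files, bad_token="nogenierw"):
    """Keep only files whose names do not contain the bad token."""
    return [f for f in files if bad_token not in f]

files = [
    "fardet_genie_none_r00012345_s01_reco.root",       # expected name
    "fardet_genie_nogenierw_r00012345_s01_reco.root",  # colliding name
]

print(exclude_misnamed(files))
# -> ['fardet_genie_none_r00012345_s01_reco.root']
```

The same pattern would apply whether the filter is done in a script or directly in a SAM dimensions query on file_name.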
Pidparts and reco ignored by FTS
Apparently this is due to the output tier outreco:reco getting switched with pid; essentially, the parsing of outTiers has not worked as expected in the metadata. Satish and Alex will look at this, first fixing the bug in submit_nova_art.py and then finding a way to use these files without having to produce everything from scratch. Satish will fix the issue, Paul will send definitions around, and Alex will try to find a solution for the metadata so the right files end up in the right place. This is a very broad issue, but a possible solution would be to add outpid and outreco to the list of valid tiers.
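A minimal sketch of the kind of tier-spec parsing described here; this is illustrative, assuming specs of the form "flag:tier", and is not the actual submit_nova_art.py code.

```python
# Each output tier spec pairs a flag with a tier name, e.g. "outreco:reco".
# If the two halves get swapped (or matched to the wrong output) during
# parsing, reco metadata ends up on the pid files and the FTS ignores them.
def parse_out_tier(spec):
    """Split 'outreco:reco' into ('outreco', 'reco')."""
    flag, sep, tier = spec.partition(":")
    if not sep:
        raise ValueError("expected 'flag:tier', got %r" % spec)
    return flag, tier

print(parse_out_tier("outreco:reco"))  # -> ('outreco', 'reco')
print(parse_out_tier("outpid:pid"))    # -> ('outpid', 'pid')
```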
Any other issues
Enrique will hold off on LEM+CAF until the auto-blocker issue is resolved.
Second analysis tag
The plan is to tag prod2reco, but only once we figure out the problems. The deadline for commits is today, so we can see the outcome of the production tests tonight. This will require a new development build, since Jon's commits came in after the nightly build was done.
Subrun to subrun masking
Alex will get Jon's opinion on this. If we feel it isn't stable enough for production, the alternative will be the data quality approach. It's not as good at recovering data, but it's still better than losing 15% of the total POT.
Calibration UPS product
Matthew's problem creating a UPS product is due to a lack of free space in tmp. Kanika will talk to him to find a solution.
ND NuMI data
On hold until the LEM server issues are resolved. We will use this as a benchmark to test producing decafs from limited CAFs, which will be needed once we have all the ND Monte Carlo.
FD cosmic data
Still not much progress due to memory issues. Even after removing the memory limit, jobs still fail with bad_alloc. Only one of the failed jobs produced a memory tracer dump; apparently, the break point fitter (BPF) energy estimator is using 1.1 GB at startup. A possible solution would be to drop BPF for the cosmics, given that this sample is not going into the main analysis. Joseph and Dominick will report on different memory tests and analyses, which might help make sense of the actual figures. Satish also suggests that there might be an implicit virtual memory limit somewhere that we're not overriding.
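Satish's suggestion can be checked in a few lines. This is a generic sketch using the standard resource module on Linux, not code from the actual job wrapper, and the decision to raise the limit to the hard cap is an assumption about what the jobs would want.

```python
# Check for an implicit virtual-memory (address space) limit and raise
# the soft limit to the hard limit if one is set.
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_AS)
print("soft:", "unlimited" if soft == resource.RLIM_INFINITY else soft)

# Only the soft limit can be raised without privileges, and only up to hard.
if soft != resource.RLIM_INFINITY and (hard == resource.RLIM_INFINITY or soft < hard):
    resource.setrlimit(resource.RLIMIT_AS, (hard, hard))
```

An unexpectedly finite soft limit here would explain bad_alloc failures that persist after the batch-system memory limit is lifted.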
FD ideal conditions generation
Almost finished. Flux swap still has a few hundred files that need draining, but very good progress has been made. We will hold off on the reconstruction until we freeze it.
Still waiting for the green light from Adam.
ND Genie MC generation
3000 more files to go. Reconstruction is on hold until we decide what to do about the naming and metadata issues. During the meeting, Satish thought he had just found the issue, and that it is part of the SAM configuration.
FD real conditions MC
90% done; just waiting to solve the naming issues.
15,000 jobs submitted, split into smaller datasets. Felipe will submit the rest to get this completed. Everything seems OK with this sample, but Felipe will wait a bit to make sure the files go through the FTS.
Some of last week's files were found to be bigger than 1.5 GB, so Vito has changed the threshold and they've been processed successfully. There are still a few files with issues, so Vito will send an email with some examples and insight.
Many files were stuck in the dropbox, so Felipe paused the reco keepup for a day; it will resume tomorrow. The tickets will continue to be renewed on Tuesdays.
Any other business
Nothing on this matter