22-February-2016

Attending: Satish, Susan, Enrique, Alex, Paul, Bruno, Joe, Felipe, Paul, Kanika, Paola, Vito, Gavin

Schedule Overview

docdb 14825

Gavin commented that he has made some changes to Hough Vertex that should make it run faster.

Computing Issues

CVMFS issues on UChicago/MWT2 (Enrique)

Gavin made some library changes, and Enrique submitted some test jobs this morning. He is waiting to see results back, and should know in a few hours.

SMU Status

The cluster at SMU has had a number of issues, some not related to nova. Amit has been working on these problems, and we should know more about the status on Wednesday.

LEMServer

Bruno has been running LEM+CAF jobs over the weekend. With up to about 200 simultaneous jobs, things seem to run fine, but running more than that tends to cause issues. This matches well with Alex’s expectations of when LEMServer should start running into trouble. So far we do not have any tests with Chris’s latest fixes.

StashCache (Paul R)

Paul ran some stash cache test jobs over the weekend. The jobs appear to be running successfully, but Paul is not 100% convinced they are pulling in the flux files through FNAL. Robert believes they are, however. Some code updates are needed, which Paul will commit (and then Satish will cut a new version of NovaGridUtils). But otherwise, it was decided to call this issue done.

Shim lib status (Gavin)

On Alex’s suggestion, Gavin will set this up so that it is used everywhere. He expects to go live with it tomorrow, and integrate it with the nova setup. In this vein, Dominick asked about making use of the CVMFS checking script. This is problematic because it has to go very early in art_sam_wrap.sh, before even the novasoft setup. It was generally agreed that this should be done, but there were no volunteers to do the work.

SW Tags (Paul S)

Not much happened last week. The development build was giving trouble; this has been fixed and a tag will be cut soon. Paul will be making prod2reco soon.

Scrum

ND numi data

Production of the limited CAFs is pretty much done. The LEM+CAF jobs have made little progress because of the troubles with LEMServer.

FD cosmic data (Joe)

Over the weekend, all of these jobs encountered a problematic file that caused them to crash. Joe needs to look at the logs. Alex recalled that cosmic ray samples had been problematic before, but Joe commented that it was not at this level.

FD ideal conditions genie (Joe)

This has languished as Joe has been paying attention to the cosmic ray data. He will start paying attention to it again now.

FD real conditions MC (Enrique)

Limited CAFs are nearly done, but datasets are slow to update because of what appear to be FTS backlogs. Full CAFs are delayed by the same LEMServer issues plaguing Bruno. It will probably take until tomorrow to spin through these.

ND genie MC

Last night's full chain jobs looked strange. This led Paul to look at the artdaq tier files, which appear to be much smaller than expected. He is investigating further. There was some discussion of pulling Gavin in, but he is working on validation.

Raw2Root keepup (Vito)

This has been mostly running smoothly. Last week there were some authentication issues, but those have been resolved. This morning Vito noticed that there are no new raw files to process. No one in the meeting was aware of any reason why this could be, and Alex could not identify any issues from the ECL that would cause this. Satish will follow up with the watchdogs.

Reco Keepup (Felipe)

There were no errors, but there are some pending files that need to be followed up on.

Assignments

Enrique will generate rock neutrino FCLs. Job submission will wait until shim lib has been rolled out.

AOB:

The latest set of rock neutrino jobs was failing because of missing libraries. OPOS was unsure how to handle it. This is exactly the sort of thing that shim lib should address, and so we will wait until that problem is resolved before resubmitting. We should make sure to include previously problematic sites.