10/20/14

Note: these notes were uploaded significantly after the meeting (on November 6th), so no attempt has been made to format them more legibly; they are presented here for posterity.

Attendance

Craig, Matthew, Jonathan, Nate, Paola, Chris, Dominick, Susan, Nick, Ryan, Alex R., Robert I., Michael G., Gareth.

News (Matthew)

SAM DB migration

The DB works fine: all data is across and there are no serious problems querying it. There were problems running projects: the Apache web server got overloaded (increased number), and there are SAM station bugs under heavy load.

Dominick restarted things on Saturday and it looked OK.

Chris had some serious timeouts, but then it all worked fine. Things mostly work for him, with intermittent failures.

Robert is keen for us to hit it hard now to see if it stands up.

FTS-side problems from Dominick: files not going to BlueArc? Robert thinks they should be working; on a second look, it looks like pnfs to BlueArc. Files are getting store locations, but not BlueArc ones. Robert is looking into this.

FTS migration (Nate)

Slow; checksums have not been dropped yet. Less than 10 Gb/hour. Seemed to work fine when switched to v4_1; the checksum was tested and works fine.

Don’t think corruption during copies will be a serious problem.

Bottleneck at the moment seems to be the checksums. Nate will make that change right after this meeting.

PNFS issues (all)

Still slow, with a big backlog. 31k new files in 02 only. 17k files; will continue working on this today. High priority.

nstore

AWS tests (Paola)

Went great. Nate submitted the jobs on Friday morning. S. Timm hit an ifdh handling problem over the weekend, which led to stuck jobs. Related to the SAM problems?

Lots of duplicate filenames in the Nd_cry 10-09-03
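
A quick check for duplicated filenames could look like the sketch below. It assumes a plain list of file paths is already available (for example from a directory listing); the function name `find_duplicate_names` and the example paths are hypothetical and were not discussed at the meeting.

```python
from collections import Counter
import posixpath


def find_duplicate_names(paths):
    """Return {basename: count} for every basename that appears more than once."""
    # Compare on the basename only, so the same filename stored in
    # different directories is still counted as a duplicate.
    counts = Counter(posixpath.basename(p) for p in paths)
    return {name: n for name, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    example = ["/dir_a/file_1.root", "/dir_b/file_1.root", "/dir_a/file_2.root"]
    for name, n in sorted(find_duplicate_names(example).items()):
        print(f"{name}: {n} copies")
```

Counting on the basename rather than the full path is a deliberate choice here, since the concern is name collisions rather than identical locations.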

Ask Steve to summarise.

First analysis SIM update (Nate)

The only things left are ND cry and ND genie. ND cry projects are crashing; ND genie is working fine. Chris suggested quantising projects.

RHC v03? These need to be remade. This was deliberate, as Raphael wanted to see that the FHC ones were good before making the new RHC flux files.

Systematic samples not running yet. Will wait for initial samples to be done.

Crashing projects could have been him; 2 cases have a known cause. Nate will work with Robert to resolve this.

First analysis RECO discussion (all)

Nothing more since Friday's conveners meeting. This is not the time-limiting step. Waiting on calibration before we tag and run.

Additional validation.

Keep-up calibration (Paola)

Not convinced 40M would be enough

Projects got stuck 600 files for ND

Why 100M? By eye, comparing MC stats.

Submitted jobs on Friday. Projects got to 60% of jobs; it is unclear why things failed. ND detector files.

Data files failing at 40%-ish completion.

FD jobs not submitted yet.

Will continue to work on this.

Work offline to check whether the number of filtered events is sensible (output number of events).

Keep-up reco round 3 (Dominick)

It worked. Ran in low priority. Some level of failures, which he hasn't looked into yet. Output is in the FTS backlog and not on BlueArc; it is slowly trickling into SAM and now trickling onto BlueArc. In a retry loop.

Only about 10% made it through. At this rate we won't get all of it before the collaboration meeting. Only CAFs are going to BlueArc, so they might appear quicker.

Collaboration meeting agenda and slide request (Matthew)

AOB

Chris is using the nova pro to move files into the dropbox. Paola wanted to point that out.

Priority rules feature request for FTS - essential under current paradigm. If we’re going to be stuck in this mode for a while then this feature would be useful now.

Matthew will write this down and submit it to the Fermilab FTS Redmine.