05 1 14
Nate, Pavan, Michael, Eric, Mathew, Craig, Tom, Kanika, Jan, Dominik, Satish, Chris, Andrew, Igor, Gareth, Jon
Not available: Gavin, Ryan, …
Slides posted to DocDB:
Pavan: disk cleanup update
- Slides posted
- Files being moved to PNFS
- documentation of first large dataset
- On DCACHE and also on tape?
(new file family, won’t get mixed with other files)
(can be easily removed later)
Kanika: Code update
- Slides posted
- 3 tags within the last month
- tag two days ago – will it work for PID?
- fix for circular dependencies is in — but waiting for tag until well clear of PID
- Need new externals for new ART - LEM mixing.
—> Big jump due to the externals.
- Will even need another version of ART soon — for memory issues.
—> Need to keep some pressure on the Artists.
- iftools documentation?
- SRT vs. CMake? Need some discussion about the effect on users.
—> Invite Brian Rebel and others to discuss this at a future meeting?
- FTS instances are running as they should (01 only has 6 GB of RAM compared to 02 and 03)
- Nate — can make changes to the FTS.
- SAM – good lately.
- Tape report: hard to find for NOvA?
- NOvA write pool in DCACHE seems to be full.
- Occasional authentication failures, and long wait times on SAM?
-> Long wait times: Number of processes on the SAM server is really high
-> One in a few hundred fails due to authentication failures; these seem to come in clusters.
-> Do we need some retry functionality?
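
On the retry question above: a minimal sketch of what client-side retry could look like, assuming the failures are transient. The names `TransientError` and `call_with_retry` are hypothetical and not part of SAM or any NOvA package; the real client would need to map its own authentication and timeout errors onto something like this.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for an authentication failure or a timeout."""


def call_with_retry(request, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call request() and retry on TransientError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except TransientError:
            if attempt == max_attempts:
                # Out of attempts: let the caller see the failure.
                raise
            # Wait base_delay, then 2x, 4x, ... plus random jitter so that
            # a cluster of failing jobs does not retry all at once.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))
```

The jitter matters here because the failures reportedly come in clusters and the server is already heavily loaded; retrying everything at the same instant could make the long wait times worse.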
Jon/Igor : Database
- There was an issue: the setup was meant to use the nova_grid user, but the run history code was hard-coded to use nova_reader (fixed now).
—> For now use nova_reader account for everything on the grid?
-> Problem is now much more intermittent. Jon needs the IP address so he can check the log files.
- Dominik: NOVA hardware database access from the grid has some
—> Immediately tell us while the file is stalled.
- hardware database and run history are not using the web service (non validity tables)
-> very quick (milliseconds), one query every 20 seconds.
- Some changes to the database package that give more accurate query times. Grepping log files in short term should work.
— turn on reporting.
- Run History issue – using the wrong Ash River tables. Probably not a big impact on production? Impacts bad channels – mostly just APD swaps.
- Memory issues still exist?
- Squid doesn’t function properly?
- Post a ticket and give a call to Jon
- There was a big table with 2 million channels – memory issues. Igor would like us to use it again.
—> Data size 60 MB, so it seems strange to produce such a memory issue.
—> Some dedicated interactive tests would be useful here.
- Strongly suggest to push hard on better monitoring for the databases.
—> No ganglia — should be on every database server.
—> Memory usage,
—> Need to go through scientific services group.
- Have database parameters been chosen correctly?
- Igor would like a requirements estimate for the conditions database (10x higher than now).
—> Need to understand better what resources NOvA needs.
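
For the dedicated interactive tests suggested above for the 2-million-channel table, one possible starting point is to measure peak memory around the load step alone. This is only a sketch: `peak_memory_mb` and `fake_channel_table` are hypothetical names, the fake table is a stand-in for the real database query, and `tracemalloc` only sees Python-level allocations, so a C++ loader would need a different tool.

```python
import tracemalloc


def peak_memory_mb(loader):
    """Run loader() and return (result, peak traced memory in MB)."""
    tracemalloc.start()
    try:
        result = loader()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak / 1e6


def fake_channel_table(n_channels):
    # Hypothetical stand-in for the real table: one (channel, value) row each.
    return [(ch, float(ch)) for ch in range(n_channels)]


if __name__ == "__main__":
    rows, peak = peak_memory_mb(lambda: fake_channel_table(2_000_000))
    print(f"{len(rows)} rows, peak {peak:.0f} MB")
```

Comparing the measured peak with the 60 MB raw data size would show how much of the memory problem comes from in-memory representation rather than from the data itself.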
Production Discussion :
— Reco: working on subrun 0-9 for cosmic data
— Close on CAF readiness…
- Question about project size: have been running 30,000 regularly and successfully; how about 60,000? Craig suggests trying 50,000.
-> Is it still first in, first out, or was the algorithm updated to something better? Robert is on vacation…
- Satish: still waiting on account privileges.
— PCHits for ND MC might be a nice starter project.
- Kanika – will keep us informed on the code and let us know when it is ready to go.
- Eric — PID MC
- Dominik — PID Data
ND – overlays
- Nate – made the changes to metadata module that were needed.
-> testing metadata – if tests pass, will commit and run a larger test sample for Jim