JobSub Users Meeting: July 10, 2014
Agenda
- Stdout/Stderr Size limit
- Jobsub User Interface
- AOB
Meeting Notes
Present: Parag Mhashilkar, Neha Sharma, Tanya Levshina, Marek Zielinski, Art Kreymer, Joe Boyd, Mike Kirby, Dennis Box, Gerard, Yun Tse, Adam
Stdout/Stderr
Architecture: 50K jobs running at a time, 2G input sandbox per job. How long should the sandbox be kept?
Based on the Fermigrid capacity: 50K jobs per day.
Limit of 2MB of stdout and stderr per Condor procid (job).
Users can redirect stdout/stderr to a data directory and have the jobsub tools transfer the output using ifdh.
We need a means to fetch a single process rather than an entire cluster.
Can we compress the individual files when they are first accessed?
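The figures above imply rough worst-case storage bounds, which the following sketch works out. It assumes decimal units and that every job reaches its limit; neither assumption was stated in the meeting. The notes also do not say whether the 2MB limit covers stdout and stderr combined or each stream separately, so the hypothetical `streams` parameter covers both readings.

```python
# Back-of-the-envelope storage bounds from the figures in these notes.
# Assumptions (not from the meeting): decimal units, every job hits its limit.
MB = 10**6
GB = 10**9
TB = 10**12

CONCURRENT_JOBS = 50_000          # jobs running at a time
INPUT_SANDBOX_BYTES = 2 * GB      # input sandbox per job
JOBS_PER_DAY = 50_000             # based on Fermigrid capacity
LOG_LIMIT_BYTES = 2 * MB          # stdout/stderr limit per procid


def sandbox_upper_bound(jobs=CONCURRENT_JOBS, per_job=INPUT_SANDBOX_BYTES):
    """Bytes needed if every concurrently running job has a full sandbox."""
    return jobs * per_job


def daily_log_upper_bound(jobs_per_day=JOBS_PER_DAY, limit=LOG_LIMIT_BYTES, streams=1):
    """Bytes of stdout/stderr per day; streams=2 if the limit applies per stream."""
    return jobs_per_day * limit * streams


print(sandbox_upper_bound() / TB, "TB of input sandboxes")
print(daily_log_upper_bound() / GB, "GB of logs per day (combined limit)")
print(daily_log_upper_bound(streams=2) / GB, "GB of logs per day (per-stream limit)")
```

Under these assumptions the sandbox retention period matters far more than the log limit, since the sandboxes dominate the total by about three orders of magnitude.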
Jobsub User Interface
Microboone: Yun Tse: Working well so far. Currently running on an 8G VM, but 8G is not needed because memory consumption is low now that the memory leaks have been fixed. Huge production spikes are expected in early August.
Minos:
- Done with preliminary jobsub and will be using HA only.
- Adam is in charge of data production jobs
- SES Data production group (Marek)
- Going directly to Jobsub HA
- Ran MC production run.
- Sign off on moving fifebatch1 to HA
- Discussion about the Jobsub job id.
- options:
- 100.12@fifebatch1.fnal.gov
- fifebatch1.fnal.gov@100.12
- fifebatch1.fnal.gov100.12
- xxx100.12
- Operations: Jobsub brings back many other files, which it should stop doing.
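As an illustration of the job id discussion, here is a minimal sketch of parsing the first option, `100.12@fifebatch1.fnal.gov`. The notes do not record which format was chosen, so this is not the jobsub implementation. It assumes the numeric part follows the usual Condor `cluster.process` convention; `JobId` and `parse_job_id` are hypothetical names introduced for this example.

```python
import re
from typing import NamedTuple


class JobId(NamedTuple):
    """Hypothetical container for the parts of a <cluster>.<process>@<server> id."""
    cluster: int
    process: int
    server: str


# Digits, a dot, digits, an "@", then a host name.
_JOB_ID_RE = re.compile(r"^(\d+)\.(\d+)@([A-Za-z0-9.-]+)$")


def parse_job_id(text: str) -> JobId:
    """Split an id such as '100.12@fifebatch1.fnal.gov' into its parts."""
    match = _JOB_ID_RE.match(text)
    if match is None:
        raise ValueError(f"not a <cluster>.<process>@<server> id: {text!r}")
    cluster, process, server = match.groups()
    return JobId(int(cluster), int(process), server)


job = parse_job_id("100.12@fifebatch1.fnal.gov")
print(job.cluster, job.process, job.server)
```

Putting the server after an `@` keeps the id unambiguous to split, which is harder with the options that concatenate the host name and the numeric part without a separator.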