Instructions for Offline Production Checklist - Shifters

1. Is the job success rate < 90%?

2. Are there more than 300 held jobs or more than 300 failed jobs?

  • Check the top left panel of the production shifter dashboard

3. Are there fewer than 10K files pending and fewer than 10K files waiting on tape?

  • Check the “FTS Status” panel at the top right of the production shifter dashboard

4. Is there any individual campaign running with low efficiency?

  • Check the “Active/Recent POMS campaigns” panel at the bottom of the production shifter dashboard

5. Is the occupancy of the BlueArc and persistent disks below 90%?

  • Check the “storage usage” panel on the production shifter dashboard for persistent/dCache usage and BlueArc usage

6. Is the production efficiency < 50% (check the onsite efficiency)?

  • Check the production efficiency at the top of this link

7. Is the production memory efficiency below 50%?

  • Check here (check if the light red band is above 50%)

8. Are there about 2K jobs running and fewer than 5K jobs held?

  • Check the “job status” panel at the top left of this link

9. Is the uBooNE database healthy?

  • Check the uBooNE database page: in the bottom plot, the active requests and queued requests should both be below 200. If either exceeds 200, notify the Data Production Team to file a Service Desk ticket.

10. Pre-staging data: are any users processing large datasets without coordinating with the production team?

  • Check the uboone station at this link for any large file pre-staging done without coordination with the production team. Look in the “file in snapshot” column for snapshots with 4000+ files, and make sure those users are listed in the recent production pre-staging data approval table. The exact columns in the table may differ slightly from what is shown below.

11. Did any user cancel more than 300 jobs in the last 14 hours?

  • Check the “aborted jobs by user” panel at the bottom right of this link

12. Is there any user with more than 300 failed jobs in the last 14 hours?

  • Check the “finished jobs by user (failure)” panel at the middle left of this link

13. Is there any user with more than 300 held jobs in the last 14 hours?

  • Check the “held jobs by user (failure)” panel at the bottom left of this link
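
For shifters who prefer to record the numbers and compare them in one pass, the numeric thresholds in the checklist above could be sketched as a small Python helper. This is only an illustration, not an existing production tool: the `ShiftReadings` class, its field names, and the `checklist_flags` function are all hypothetical, and the values are assumed to be typed in by hand from the dashboard panels. It also assumes which side of each threshold counts as a problem (for example, that 10K or more pending files is the unhealthy case). Items 4 and 8 are left out because the checklist gives no exact threshold for campaign efficiency or for "about 2K" running jobs.

```python
from dataclasses import dataclass, field

USER_JOB_LIMIT = 300   # items 11-13: per-user aborted/failed/held jobs in 14 hours
LARGE_SNAPSHOT = 4000  # item 10: files in a single snapshot


@dataclass
class ShiftReadings:
    """Values read by hand from the dashboards (all field names are hypothetical)."""
    job_success_rate: float       # percent
    held_jobs: int
    failed_jobs: int
    files_pending: int
    files_waiting_on_tape: int
    disk_occupancy: dict          # e.g. {"BlueArc": 72.0, "persistent": 85.5}, percent
    production_efficiency: float  # percent, onsite
    memory_efficiency: float      # percent
    db_active_requests: int
    db_queued_requests: int
    # {"aborted": {user: count}, "failed": {...}, "held": {...}} for the last 14 hours
    per_user_jobs: dict = field(default_factory=dict)
    # list of (user, files_in_snapshot) pairs
    snapshots: list = field(default_factory=list)
    # users found in the pre-staging approval table
    approved_users: set = field(default_factory=set)


def checklist_flags(r):
    """Return a list of problems to report; an empty list means nothing was flagged."""
    flags = []
    if r.job_success_rate < 90:
        flags.append("item 1: job success rate below 90%")
    if r.held_jobs > 300 or r.failed_jobs > 300:
        flags.append("item 2: more than 300 held or failed jobs")
    if r.files_pending >= 10_000 or r.files_waiting_on_tape >= 10_000:
        flags.append("item 3: 10K or more files pending or waiting on tape")
    for disk, percent in r.disk_occupancy.items():
        if percent >= 90:
            flags.append(f"item 5: {disk} occupancy at or above 90%")
    if r.production_efficiency < 50:
        flags.append("item 6: production efficiency below 50%")
    if r.memory_efficiency < 50:
        flags.append("item 7: production memory efficiency below 50%")
    if r.db_active_requests > 200 or r.db_queued_requests > 200:
        flags.append("item 9: more than 200 active or queued database requests")
    for user, n_files in r.snapshots:
        if n_files >= LARGE_SNAPSHOT and user not in r.approved_users:
            flags.append(f"item 10: {user} pre-staging {n_files} files without approval")
    for kind, counts in r.per_user_jobs.items():
        for user, n_jobs in counts.items():
            if n_jobs > USER_JOB_LIMIT:
                flags.append(f"items 11-13: {user} has {n_jobs} {kind} jobs in 14 hours")
    return flags


if __name__ == "__main__":
    # Made-up example readings, not real dashboard values.
    readings = ShiftReadings(
        job_success_rate=95, held_jobs=10, failed_jobs=20,
        files_pending=500, files_waiting_on_tape=100,
        disk_occupancy={"BlueArc": 70.0, "persistent": 92.0},
        production_efficiency=80, memory_efficiency=60,
        db_active_requests=5, db_queued_requests=0,
    )
    for flag in checklist_flags(readings):
        print(flag)
```

Keeping the thresholds in one function makes it easy to update them if the checklist changes; the dashboards themselves remain the authoritative source for every value.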