h1. How to submit and monitor production jobs with POMS

Version 8, Marianette Wospakrik, 03/24/2020 04:58 PM

* The first and most important step is to make sure that your Kerberos ID is associated with the icaruspro account. Log in with your Kerberos ID at https://pomsgpvm01.fnal.gov/poms/ and check that you can open the page and edit any of the job types and campaigns. If you do not have permission, request access by submitting a Service Desk ticket asking for permission to run production under icaruspro.
!2.png!

* Create a configuration file. Log in to icaruspro@icarusgpvm01.fnal.gov and set up the environment needed to run as icaruspro with the following command:
<pre>setup_icaruspro <icaruscode_software_version> <qualifier></pre>
e.g. <pre>setup_icaruspro v08_37_00 e17</pre>

Then cd to the POMS configuration directory as follows:

<pre>
$ ssh icaruspro@icarusgpvm01.fnal.gov

[01:01:26 ~]$ setup_icaruspro v08_13_02 e17
Setting up LArSoft from "CVMFS":
 - executing '/cvmfs/larsoft.opensciencegrid.org/products/setup'
 - appending '/cvmfs/fermilab.opensciencegrid.org/products/common/db'
Setting up artdaq from "CVMFS":
 - appending '/cvmfs/fermilab.opensciencegrid.org/products/artdaq'
Setting up ICARUS from "CVMFS":
 - prepending '/cvmfs/icarus.opensciencegrid.org/products/icarus'


[01:01:47 ~]$ cd /icarus/app/poms_test/cfg/
</pre>

All the configuration files needed to run the SBN workshop production are currently in this directory:

<pre>
[01:02:31 /icarus/app/poms_test/cfg]$ ls -1 *workshop*

icarus_workshop_cosmicmuon_launch.cfg
icarus_workshop_cosmicmuon_launch_injectrun.cfg
icarus_workshop_cosmiconly.cfg
icarus_workshop_cosmiconly_injectrun.cfg
icarus_workshop_fastoptical_injectrun.cfg
icarus_workshop_intrinsic_nue_injectrun.cfg
icarus_workshop_nominal_bnb_instrinsic_nue.cfg
icarus_workshop_nominal_bnb_instrinsic_nue_injectrun.cfg
icarus_workshop_nominal_bnb_neutrino.cfg
icarus_workshop_nominal_bnb_neutrino_injectrun.cfg
icarus_workshop_nominal_bnb_oscillated_nue.cfg
icarus_workshop_nominal_bnb_oscillated_nue_injectrun.cfg
icarus_workshop_osc_nue_injectrun.cfg
icarus_workshop_single_electron.cfg
icarus_workshop_single_electron_injectrun.cfg
icarus_workshop_single_electronpiplus_injectrun.cfg
icarus_workshop_single_muon_bnb_injectrun.cfg
icarus_workshop_single_muon_parallel_injectrun.cfg
icarus_workshop_single_muons.cfg
icarus_workshop_single_muons_injectrun.cfg
icarus_workshop_single_pi0_injectrun.cfg
icarus_workshop_standard_singles_neutrino.cfg
</pre>

Most of the neutrino and single-particle samples have separate configuration files for the gen stage. Because both sample types share a similar production workflow and similar memory requirements, a skeleton configuration file, @icarus_workshop_standard_singles_neutrino.cfg@, handles the workflow from the g4 stage through reco. When using this configuration file, each sample is distinguished by a global parameter, @global.sample@, which tags the directory the output files are written to with the name of the produced sample.

To re-run a production sample under a new icaruscode release, you do not have to create a new configuration file; simply change the software version inside the configuration file to the current one. For example, setting

<pre> version = v08_37_00 </pre>

sets icaruscode to version v08_37_00, and this is the version used to run the sample production.

*IMPORTANT*: make sure that you _always_ change the software version both inside the configuration file and in the POMS editor. Changing the version inside the configuration file ensures that the jobs are run with the correct icaruscode version. Keeping the same version in the configuration file and the POMS editor is the first requirement for the automatic triggering of the next stage in the production workflow. Other problems can still disrupt the automatic triggering and require manual intervention.

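As a quick sanity check before launching, the version line in a configuration file can be inspected from the shell. The sketch below is only an illustration: @cfg_version@ and @check_cfg_version@ are hypothetical helper names (not part of POMS), and it assumes the file contains a line of the form @version = ...@ as in the example above. It cannot check the POMS editor, which still has to be verified in the web interface.

```shell
# Hypothetical helper (not part of POMS): print the value of the first
# "version = ..." line found in the given configuration file.
cfg_version() {
    sed -n 's/^[[:space:]]*version[[:space:]]*=[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*$/\1/p' "$1" | head -n 1
}

# Hypothetical helper: compare the version in a file ($1) with the expected one ($2).
# Returns non-zero and prints a message on stderr when they differ.
check_cfg_version() {
    found=$(cfg_version "$1")
    if [ "$found" = "$2" ]; then
        echo "OK: $1 uses $found"
    else
        echo "MISMATCH: $1 uses '$found', expected '$2'" >&2
        return 1
    fi
}
```

For example, @check_cfg_version icarus_workshop_standard_singles_neutrino.cfg v08_37_00@ run from the configuration directory reports whether that file has been updated.
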
* Create a campaign workflow. If you are re-running a previously requested SBN sample, the campaign already exists.
If you want to run the same campaign with a different tag, use the clone function: click the clone icon of the campaign (the blue highlighted box in the picture below) and rename the campaign. In the example below, I copied the whole campaign name and added "with_CRTgeomfix" at the end.

!13.png!
!12.png!

This copies the whole campaign production workflow. The next step is to make sure that the new sample is written to a new directory. Click the GUI editor icon of the campaign to open an editor that displays the campaign's production workflow, then click on the stage that you want to edit. In this case, set the @Oglobal.sample@ parameter for each stage of the production workflow to the tag for the new sample (e.g. "cosmics_muon_3ms_fixedCRTgeom").

!8.png!
!10.png!
!11.png!

If you forget to add this parameter, the files will be written to a “default” directory. Currently, this default directory is @/pnfs/icarus/scratch/users/icaruspro/dropbox/mc1/poms_production/MCC1_poms_icarus_prod_numu_bnb_v08_13_02@, because the default @sample@ parameter in the configuration file is "numu_bnb". Remember to use exactly the same name for every stage of a sample produced _within a campaign_. This keeps all of the sample files for the different stages under the same directory.
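
To check where a sample should end up, the directory name can be composed from the sample tag and the software version. This is a rough sketch only: the pattern @MCC1_poms_icarus_prod_<sample>_<version>@ is inferred from the single default path quoted above and is an assumption, and @sample_dir@ is a hypothetical helper name. Check the output settings in the configuration file before relying on it.

```shell
# Base directory taken from the default path quoted in the text.
POMS_OUT_BASE="/pnfs/icarus/scratch/users/icaruspro/dropbox/mc1/poms_production"

# Hypothetical helper: print the expected output directory for a sample tag ($1)
# and software version ($2). The naming pattern is an assumption inferred from
# one example path, not a documented POMS convention.
sample_dir() {
    echo "${POMS_OUT_BASE}/MCC1_poms_icarus_prod_${1}_${2}"
}
```

For instance, @ls "$(sample_dir cosmics_muon_3ms_fixedCRTgeom v08_37_00)"@ would list the files of the re-tagged cosmic sample if the pattern holds.
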
** Specify the memory needed for each stage by changing the parameter @Osubmit.memory@. For the gen and g4 stages of single-particle and neutrino samples, 1000 MB to 2500 MB is usually sufficient to run a job. Cosmic samples usually need much more memory and wall time. The memory profiles for the different samples are available on these pages:

*** cosmic 3ms gen stage: https://fifemon.fnal.gov/monitor/d/otZRzhImk/poms-campaign?from=now-30d&to=now&var-Campaign=3005
*** cosmic 3ms g4 stage: https://fifemon.fnal.gov/monitor/d/otZRzhImk/poms-campaign?from=now-30d&to=now&var-Campaign=3004
!14.png!
*** cosmic 3ms detsim stage: https://fifemon.fnal.gov/monitor/d/otZRzhImk/poms-campaign?from=now-30d&to=now&var-Campaign=3006
*** cosmic 3ms reco stage: https://fifemon.fnal.gov/monitor/d/otZRzhImk/poms-campaign?from=now-30d&to=now&var-Campaign=3007

These pages give some idea of the memory and disk that you should request when running this sample. A good rule of thumb is to run a test sample of ~10 jobs with the new software version through the whole production flow and record the maximum memory and wall time (@Osubmit.expected-lifetime@) as a baseline for each stage. This gives a better estimate of the wall time and memory to request for the production jobs.

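The rule of thumb above can be turned into a number with a few lines of shell. The following is a minimal sketch, assuming you have copied the peak memory (or wall time) of each test job into a list, one value per line; @peak_with_margin@ is a hypothetical helper name and the 20% default safety margin is an arbitrary assumption, not a POMS recommendation.

```shell
# Hypothetical helper: read one number per line on stdin and print the largest
# value increased by a safety margin in percent ($1, default 20), rounded up
# to the next integer. Prints nothing when the input is empty.
peak_with_margin() {
    awk -v pct="${1:-20}" '
        NR == 1 || $1 > max { max = $1 }
        END {
            if (NR > 0) {
                v = max * (100 + pct) / 100
                r = int(v)
                if (r < v) r++
                print r
            }
        }'
}
```

For example, with measured peaks of 1800, 2100 and 1950 MB, @printf '1800\n2100\n1950\n' | peak_with_margin 20@ prints 2520, which could serve as the starting value for @Osubmit.memory@ of that stage.
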
** (Not required, but it will make your production life less complicated): use the POMS recovery options for jobs that are held due to memory. POMS runs the number of jobs specified at the gen stage. For each stage downstream of that, I have added the following line to the configuration file: @n_files_per_job = 1@. This ensures that, once the files from the previous stage are complete, POMS only runs jobs for the files that were located, and that a recovery launch only re-submits the missing files rather than the number of jobs submitted at the gen stage.

* Now that you have everything in place, start the campaign production by clicking “Launch” or the rocket symbol in POMS.
!15.png!