How to manually add metadata information for tape-backed dcache bookkeeping

========

These steps are only needed for production files created with icaruscode versions earlier than v08_56_00.

icaruscode >= v08_56_00 already has the ICARUS-specific FileCatalog Metadata framework implemented, so this metadata is injected automatically when a file is written to the dcache:/pnfs/icarus/scratch area and then declared to the samweb database. The additional metadata is important for the bookkeeping because the File Transfer Service (FTS) will try to extract the file metadata, add it to the SAM catalogue, and then transfer, archive, and delete the file according to its configuration.

For example, all the MC production samples will be sent to the tape-backed area following this configuration:

enstore:/pnfs/icarus/archive/sam_managed_users/icaruspro/data/${file_type}/${icarus_project.stage}/${file_format}/${production.type}/${production.name}/${icarus_project.name}/${icarus_project.software}/${icarus_project.version}
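To make the template concrete, here is a minimal sketch of how the metadata fields could expand into a destination path. The function name build_archive_path is hypothetical and is not part of the FTS. The values are copied from the example destination shown later on this page, and mapping them to the template fields by position is an assumption.

```shell
#!/usr/bin/env bash
# Sketch: expand the destination template from eight metadata values.
# Shell variable names cannot contain dots, so fields such as
# icarus_project.stage are passed as positional arguments instead.
build_archive_path() {
    local file_type="$1" stage="$2" file_format="$3" prod_type="$4"
    local prod_name="$5" proj_name="$6" proj_software="$7" proj_version="$8"
    # Base directory followed by the eight fields, in template order.
    printf '%s/%s/%s/%s/%s/%s/%s/%s/%s\n' \
        "/pnfs/icarus/archive/sam_managed_users/icaruspro/data" \
        "$file_type" "$stage" "$file_format" "$prod_type" \
        "$prod_name" "$proj_name" "$proj_software" "$proj_version"
}

# Values taken (positionally, as an assumption) from the example on this page.
build_archive_path mc reco2 root Production2020 \
    poms_prod_purity_infinite_sce_measurednoise MCC v08_49_00 1.1
```

When a field is missing from the file metadata, the corresponding path component is filled with a placeholder such as None or unknown, which is why files without the extra metadata end up in a directory like data/unknown/None/artroot/None/...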

========

Step-by-step instructions:

  1. Log in to icarusgpvm01 as icaruspro:
    $ ssh icaruspro@icarusgpvm01.fnal.gov
    
  2. Launch the icarus software setup script as icaruspro (note that this example sets up icaruscode v09_20_00 with qualifiers e19:prof):
    $ setup_icaruspro v09_20_00 e19
    
  3. Go into the following directory:
    $ cd /icarus/app/poms_test/json
    
  4. Run the create_md_json.sh script, passing the output directory as its argument:
    $ . create_md_json.sh <full path to the directory in the scratch area>
    

    e.g.
    $ . create_md_json.sh /pnfs/icarus/scratch/users/icaruspro/dropbox/mc1/poms_production/MCC1_poms_icarus_prod_purity_infinite_sce_measurednoise_v08_49_00/reco2
    

    Note that it's important to exclude the / at the end of the full path.
    This script will:
    • create the json files that can be used to modify the metadata of the output files in the dataset,
    • create a new sam definition for that dataset,
    • remove duplicated files (if present).
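Because the trailing / matters, it can help to normalize the path before passing it to the script. The helper below is a small sketch for that purpose. strip_trailing_slash is a hypothetical name and is not part of create_md_json.sh.

```shell
#!/usr/bin/env bash
# Sketch: remove any trailing slashes from a directory path.
strip_trailing_slash() {
    local path="$1"
    # Drop trailing slashes one at a time, but leave a bare "/" alone.
    while [ "$path" != "/" ] && [ "${path%/}" != "$path" ]; do
        path="${path%/}"
    done
    printf '%s\n' "$path"
}

# Example with the directory used on this page, given with a trailing slash.
strip_trailing_slash "/pnfs/icarus/scratch/users/icaruspro/dropbox/mc1/poms_production/MCC1_poms_icarus_prod_purity_infinite_sce_measurednoise_v08_49_00/reco2/"
```

The cleaned path could then be passed on, for example as . create_md_json.sh "$(strip_trailing_slash "$dir")", where dir holds the full path to the directory in the scratch area.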
      ========
  5. You should see the following output on the screen if the script completed without errors:
    ...
    Metadata has been updated for file 'prodcorsika_overburden_icarus_20200727T095922_5-0052_gen_20200804T191349_g4_20200805T005206_detsim_20200813T055406_reco1_20200816T054501_reco2.root'
    Metadata has been updated for file 'prodcorsika_overburden_icarus_20200727T095922_5-0052_gen_20200804T191349_g4_20200805T005206_detsim_20200813T055406_reco1_20200816T072223_reco2.root'
    Metadata has been updated for file 'prodcorsika_overburden_icarus_20200727T100101_5-0027_gen_20200804T192008_g4_20200805T005242_detsim_20200813T055352_reco1_20200816T042728_reco2.root'
    Metadata has been updated for file 'prodcorsika_overburden_icarus_20200727T100101_5-0027_gen_20200804T192008_g4_20200805T005242_detsim_20200813T055352_reco1_20200816T061923_reco2.root'
    Metadata has been updated for file 'prodcorsika_overburden_icarus_20200727T100229_3-0094_gen_20200804T195622_g4_20200805T002251_detsim_20200813T055640_reco1_20200816T060156_reco2.root'
    Metadata has been updated for file 'prodcorsika_overburden_icarus_20200727T100435_1-0052_gen_20200804T194358_g4_20200805T004821_detsim_20200813T054322_reco1_20200815T234815_reco2.root'
    Metadata has been updated for file 'prodcorsika_overburden_icarus_20200727T101113_5-0020_gen_20200804T194652_g4_20200805T005254_detsim_20200813T055716_reco1_20200815T215932_reco2.root'
    Definition 'poms_prod_purity_infinite_sce_measurednoise_v08_49_00_reco2' deleted
    Dataset definition 'poms_prod_purity_infinite_sce_measurednoise_v08_49_00_reco2' has been created with id 105561
    

    A sam definition named poms_prod_purity_infinite_sce_measurednoise_v08_49_00_reco2 has been created for this dataset.
    If the dataset has duplicates, you will also see the following:
    parent prodcorsika_overburden_icarus_20200727T092905_4-0026_gen_20200804T183732_g4_20200805T002303_detsim_20200813T060306_reco1.root:
      duplicates of prodcorsika_overburden_icarus_20200727T092905_4-0026_gen_20200804T183732_g4_20200805T002303_detsim_20200813T060306_reco1_20200816T051733_reco2.root:
        prodcorsika_overburden_icarus_20200727T092905_4-0026_gen_20200804T183732_g4_20200805T002303_detsim_20200813T060306_reco1_20200816T060135_reco2.root (deleted)(deleted)(retired)
    parent prodcorsika_overburden_icarus_20200727T091208_3-0062_gen_20200804T175253_g4_20200805T012304_detsim_20200813T060226_reco1.root:
      duplicates of prodcorsika_overburden_icarus_20200727T091208_3-0062_gen_20200804T175253_g4_20200805T012304_detsim_20200813T060226_reco1_20200816T054738_reco2.root:
        prodcorsika_overburden_icarus_20200727T091208_3-0062_gen_20200804T175253_g4_20200805T012304_detsim_20200813T060226_reco1_20200816T054052_reco2.root (deleted)(deleted)(retired)
    parent prodcorsika_overburden_icarus_20200727T085010_5-0039_gen_20200804T191317_g4_20200805T005503_detsim_20200813T053844_reco1.root:
      duplicates of prodcorsika_overburden_icarus_20200727T085010_5-0039_gen_20200804T191317_g4_20200805T005503_detsim_20200813T053844_reco1_20200816T071629_reco2.root:
        prodcorsika_overburden_icarus_20200727T085010_5-0039_gen_20200804T191317_g4_20200805T005503_detsim_20200813T053844_reco1_20200816T034931_reco2.root (deleted)(deleted)(retired)
    parent prodcorsika_overburden_icarus_20200727T031236_4-0084_gen_20200804T192157_g4_20200805T010545_detsim_20200813T055228_reco1.root:
      duplicates of prodcorsika_overburden_icarus_20200727T031236_4-0084_gen_20200804T192157_g4_20200805T010545_detsim_20200813T055228_reco1_20200816T042038_reco2.root:
        prodcorsika_overburden_icarus_20200727T031236_4-0084_gen_20200804T192157_g4_20200805T010545_detsim_20200813T055228_reco1_20200816T060301_reco2.root (deleted)(deleted)(retired)
    parent prodcorsika_overburden_icarus_20200727T091038_2-0046_gen_20200804T174502_g4_20200805T002021_detsim_20200813T055658_reco1.root:
      duplicates of prodcorsika_overburden_icarus_20200727T091038_2-0046_gen_20200804T174502_g4_20200805T002021_detsim_20200813T055658_reco1_20200816T054527_reco2.root:
        prodcorsika_overburden_icarus_20200727T091038_2-0046_gen_20200804T174502_g4_20200805T002021_detsim_20200813T055658_reco1_20200816T054525_reco2.root (deleted)(deleted)(retired)
    parent prodcorsika_overburden_icarus_20200727T090706_2-0034_gen_20200804T173043_g4_20200805T002151_detsim_20200813T055731_reco1.root:
      duplicates of prodcorsika_overburden_icarus_20200727T090706_2-0034_gen_20200804T173043_g4_20200805T002151_detsim_20200813T055731_reco1_20200816T060403_reco2.root:
        prodcorsika_overburden_icarus_20200727T090706_2-0034_gen_20200804T173043_g4_20200805T002151_detsim_20200813T055731_reco1_20200816T042254_reco2.root (deleted)(deleted)(retired)
    ...
    
  6. Check that the number of files in the dataset now matches the number of input files by using the samweb list-files command:
    $ samweb list-files --summary "defname:poms_prod_purity_infinite_sce_measurednoise_v08_49_00_reco2" 
    File count:    972
    Total size:    7968258203224
    Event count:    9720
    

    Compare that to the number of reco1 files in the Campaign Stage Submissions page. The File count at the reco2 stage should be less than or equal to the number of files at the reco1 stage.
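The comparison can also be scripted. The sketch below reads the summary text from standard input, extracts the File count value, and checks it against a reco1 count typed in by hand. Both function names are hypothetical, and the reco1 count of 1000 is a made-up number for illustration only.

```shell
#!/usr/bin/env bash
# Sketch: extract the "File count" value from summary text on stdin.
summary_file_count() {
    awk -F: '/File count/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Sketch: succeed if the reco2 count ($1) does not exceed the reco1 count ($2).
check_reco2_count() {
    if [ "$1" -le "$2" ]; then
        echo "OK: $1 reco2 files <= $2 reco1 files"
    else
        echo "MISMATCH: $1 reco2 files > $2 reco1 files"
        return 1
    fi
}

# Sample summary text copied from the output above. In practice the text
# would come from the samweb list-files --summary command itself.
count=$(printf 'File count:\t972\nTotal size:\t7968258203224\nEvent count:\t9720\n' | summary_file_count)
check_reco2_count "$count" 1000   # 1000 is a hypothetical reco1 count
```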
    ========
    If everything looks good, we just have to wait until the FTS finds these files and moves them from this tape-backed directory:
    enstore:/pnfs/icarus/archive/sam_managed_users/icaruspro/data/unknown/None/artroot/None/None/None/None/None
    

    to
    enstore:/pnfs/icarus/archive/sam_managed_users/icaruspro/data/mc/reco2/root/Production2020/poms_prod_purity_infinite_sce_measurednoise/MCC/v08_49_00/1.1
    

    In other words, you should see files listed under
    /pnfs/icarus/archive/sam_managed_users/icaruspro/data/mc/reco2/root/Production2020/poms_prod_purity_infinite_sce_measurednoise/MCC/v08_49_00/1.1 
    

    and the number of files should match the File count of the dataset above.
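One way to check the number of files is to count them with find, as in the sketch below. count_files_in_dir is a hypothetical helper. It counts recursively, on the assumption that the files may sit in subdirectories of the destination.

```shell
#!/usr/bin/env bash
# Sketch: count regular files under a directory (recursively).
count_files_in_dir() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Destination directory from the example above. The check only runs on a
# machine where that directory is visible.
dest="/pnfs/icarus/archive/sam_managed_users/icaruspro/data/mc/reco2/root/Production2020/poms_prod_purity_infinite_sce_measurednoise/MCC/v08_49_00/1.1"
if [ -d "$dest" ]; then
    count_files_in_dir "$dest"
fi
```

The printed number can then be compared with the File count reported by samweb list-files --summary.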
    ========
    If this has not happened after 24 hours, you need to do the following:
    ========
    • Open a new terminal and log in to the icarussamgpvm01 machine as icaruspro:
      $ ssh icaruspro@icarussamgpvm01.fnal.gov
      
    • Go to the FTS configuration folder for icarussamgpvm01:
      $ cd /home/icaruspro/FTS/icarussamgpvm01/
      
    • Source the FTS setup script:
      $ . setup_fts.sh
      
    • Stop the FTS process that is currently running on icarussamgpvm01:
      $ stop_fts .
      

      screen output:
      Stopping FTS process 29430
      Waiting for FTS to exit..
      
    • Start the FTS process:
      $ start_fts . icarussamgpvm01_fts_config.ini
      

      screen output:
      Starting FTS process
      

      The FTS should then start processing the datasets immediately, and you should see the location of the files updated within the next hour.