How to make definitions

Definitions are made using the defgen script, a frontend to samweb that builds in a number of metadata constraints so we don't have to remember them. It takes arguments that describe the files and can perform several actions based on that description. Its non-modifying modes let you test that you are making the correct definitions before actually creating them.

Note: It is best to make definitions once some files already exist. Without some files to test with it is extremely difficult to create correct definitions.

How to define <options>

You must always specify the following:

--detector=nd (neardet) / fd (fardet)
--release=RXX-XX-XX-yyyy
--mode=genie/numi/cosmic/cry

The other options you need will depend upon the type of job you are making definitions for. Usually you will also need to specify:

  • Specific periods with the -p option. It can be used multiple times to specify multiple periods.
  • Output tiers depending on the type of job. -t artdaq for MC generation and -t reco -t pid -t caf -t decaf for full chain jobs.
  • If creating definitions for data you will generally need to make definitions both without and with goodruns. When making goodruns definitions, as of production4, you will need to specify a good runs SAM definition (we no longer use file metadata for this). For instance:
     -g isgood_prod4 

    If you're not sure what the right goodruns definition name is, please ask your production convener(s).
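
As an illustration of how these options combine, here is a minimal sketch for a full-chain far-detector job with and without goodruns. The release string is the placeholder from above, and the mode and periods are arbitrary example values (assumptions, not a recommendation). The commands are only printed, not run, so the sketch can be tried anywhere.

```shell
#!/bin/sh
# Placeholder values: substitute the real release string and the goodruns
# definition name confirmed by your production convener(s).
RELEASE="RXX-XX-XX-yyyy"
GOODRUNS="isgood_prod4"

# Build the option string once so every defgen mode sees identical options.
# Mode and periods here are example choices only.
OPTS="--detector=fd --mode=numi --release=${RELEASE} -p period1 -p period2 -t reco -t pid -t caf -t decaf"

# Print the two variants (without and with goodruns) instead of running them.
echo "defgen ${OPTS} list-names"
echo "defgen ${OPTS} -g ${GOODRUNS} list-names"
```

Keeping the options in one variable makes it harder for the list, count, and DEFINE steps to drift apart.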

NOTE: A common problem is that count-def reports 0 files for every definition. You may simply be running it too early, before any child (output) files exist, in which case you have to wait a while. More likely, though, you have not given defgen enough options. defgen searches and counts using the option values you give and falls back to default values for everything else, so if any metadata value on your child files differs from the default, count-def returns zero. To find out which options to add, run "samweb describe-definition <parent-definition>" to see the non-default values of the parent dataset; parent and child datasets normally share the same values (except tier and release). Compare that output with the output of "defgen <your guessed options> list-defs": the differences are the options you need to add.
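
The comparison described in the note can be sketched as a small helper that splits two dimension strings into individual constraints and prints those present only in the parent. This is a hypothetical helper, not part of defgen or samweb, and it assumes the constraints are joined by the word "and"; the metadata values in the usage comment are made up for illustration.

```shell
#!/bin/sh
# Split a dimension string into one constraint per line, sorted.
# Assumption: constraints are separated by " and ".
split_constraints() {
    printf '%s\n' "$1" | awk '{ gsub(/ and /, "\n"); print }' | LC_ALL=C sort
}

# Print the constraints that appear in the parent string ($1) but not in the
# generated string ($2). Expect tier and release to show up as well, since
# those normally differ between parent and child.
diff_constraints() {
    parent_file=$(mktemp)
    child_file=$(mktemp)
    split_constraints "$1" > "$parent_file"
    split_constraints "$2" > "$child_file"
    # comm -23 keeps only the lines unique to the first (parent) file.
    LC_ALL=C comm -23 "$parent_file" "$child_file"
    rm -f "$parent_file" "$child_file"
}

# Usage (paste in the dimension strings by hand; values are illustrative):
#   diff_constraints "data_tier caf and nova.systematic example_shift" \
#                    "data_tier caf and nova.systematic none"
```

Any constraint it prints, other than tier and release, is a candidate for an extra defgen option.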

For specific examples, look in the ECL.

How to use defgen

You can look back in previous novapro ECL entries (dbweb0.fnal.gov/ECL/novapro) with the FCL and Definitions topic to get some idea of what the arguments might look like. defgen --help is also useful.

When you believe you have the right options (see previous section), first, run this command to see the names of the definitions. This is a first opportunity to make sure your arguments are creating the definitions you intend:

defgen <options> list-names

Then, list the names with the definitions themselves and check them by eye:

defgen <options> list

If those look reasonable, check that the right number of files appears in each definition (you can figure out the "right" number from the "sam station monitor" link on the Grafana page for your jobs). The count can take up to 15 or 20 minutes depending on the number of definitions being checked and the number of files included. If the count for a definition is 0, you probably need to fix a mistake in the <options>.

defgen <options> count-def

Once the file counts make sense, create the definitions:

defgen <options> DEFINE
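
The checking sequence above can be sketched as a small wrapper that prints the three non-modifying commands for one set of options. The function name is hypothetical, and DEFINE is deliberately left out so that it is only ever run by hand once the counts make sense.

```shell
#!/bin/sh
# Print the non-modifying defgen commands, in the order described above,
# for whatever options are passed in. Nothing is executed.
defgen_checks() {
    for mode in list-names list count-def; do
        echo "defgen $* $mode"
    done
}

# Example with the placeholder release string and example option values.
defgen_checks --detector=nd --mode=genie --release=RXX-XX-XX-yyyy -t artdaq
```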

Once the samples are defined, please copy-paste the command and terminal output for both the count-def and DEFINE steps into the ECL.
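
One way to make that copy-paste easier is to record each command and its output in a file as you go. The helper below is a hypothetical sketch using only standard shell tools; the log file name in the usage comment is an assumption.

```shell
#!/bin/sh
# Run a command, writing the command line and all of its output to the
# terminal and appending the same text to a log file.
# Note: because of the pipe into tee, the command's exit status is not
# returned to the caller.
log_step() {
    logfile=$1
    shift
    {
        echo "\$ $*"
        "$@" 2>&1
    } | tee -a "$logfile"
}

# Usage:
#   log_step defgen_ecl.log defgen <options> count-def
#   log_step defgen_ecl.log defgen <options> DEFINE
```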

Conveners also use this tool to update definitions listed on the production website (Working with the Production Website), taking advantage of a special mode in defgen which puts the definitions into the right format for the website config files:

defgen <options> website

Full defgen help

$ defgen --help
Usage: defgen -d detector -m mode -r release [OPTIONS] [output-mode]
Procedurally generate standard dataset definitions.

Positional argument:
    output-mode  Non-modifying modes:
                     list        List generated names and definitions.
                     list-names  (default) List generated names.
                     list-defs   List generated definitions.
                     website     List definitions formatted for the website
                                   .cfg files.
                     count-name  Count files by generated name
                                   (samweb count-definition-files).
                     count-def   Count files by generated definition
                                   (samweb count-files)
                     check       Compare generated definitions to definitions
                                   looked up by generated names
                                   (samweb describe-definition).
                                   Identical results omitted.

                 Modifying modes:  
                     DEFINE      Create definitions (samweb create-definition).
                     DELETE      Delete definitions (samweb delete-definition).

                 Experimental mode:
                     count-def-PARALLEL
                                 Like count-def, but each samweb process
                                   started in background. Likely to trigger
                                   samweb rate-limits.

Required options:
    -d, --detector=DETECTOR   Specify DETECTOR. Supported: fd / fardet or nd / neardet (or NDOS).
    -m, --mode=MODE           Specify MODE. Supported: numi, genie, cosmic, cry, ddactivity1.
    -r, --release=RELEASE     Specify RELEASE string.

Basic options:
    --help                    Output this message and exit.
    -f, --flavors=FLAVOR      Add FLAVOR to list of flavors (genie mode).
                                Supported aliases:
                                  2 = -f nonswap -f fluxswap
                                  3 = -f nonswap -f fluxswap -f tau.
                                Default (if none specified):
                                  fd: -f nonswap -f fluxswap, nd: -f nonswap.
    -g, --goodruns=DEFN       Add constraint on goodruns definition DEFN.
                                  Example: --goodruns isgood_prod4
    -h, --horn=HORN           Specify HORN current. Default: fhc.
    -p, --periods=PERIOD      Add PERIOD to list of periods.
                                Default (if none specified): -p full.
                                Supported aliases:
                                  sa    = -p full -p period1 -p period2
                                          -p epoch3b -p epoch3c -p epochs1-3c
                                  sa-mc = -p full -p period1 -p period2
                                          -p epoch3b -p epoch3c.
                                  prod3 = -p full -p period1 -p period2
                                          -p period3 -p period5
    -t, --tiers=TIER          Add TIER to list of tiers.
                                Default (if none specified):
                                   -t pid -t caf -t decaf.
                                Supported aliases:
                                  base       = -t pid -t caf -t decaf
                                  nolem      = -t pidpart
                                               -t limitedcaf -t limiteddecaf
                                  addlem     = -t pid -t caf -t decaf
                                  restricted = -t restrictedcaf
                                               -t restricteddecaf
    -v, --version=VERSION     Set VERSION (nova.subversion). Default: v1.

Extended options:
    --decaf-skim=DECAF_SKIM  Add DECAF_SKIM to list of decaf skims.
                                Default (if none specified):
                                  --decaf-skim nue_or_numu_or_nus_contain.
    --genierw=GENIERW        Set GENIERW (simulated.genierw) (genie mode).
                               Default: (not set).
    --mixing=MIXING          Set MIXING (simulated.mixingtype) (genie mode).
                               Default: (not set).
    --volume=VOLUME          Set VOLUME (simulated.volume) (genie mode).
                               Default: (not set).
    --skim=SKIM              Set SKIM (nova.skim). Default: none.
    --special=SPECIAL        Set SPECIAL (nova.special). Default: none.
    --systematic=SYSTEMATIC  Set SYSTEMATIC (nova.systematic). Default: none.
    --fcl-version=VERSION    Set the FCL.version used to generate the initial fcl files.

Compatibility options:
    --missing-genierw-metadata  Omit constraint on simulated.genierw field.
    --missing-skim-metadata     Omit constraint on nova.skim field.
    --missing-spec-metadata     Omit constraint on nova.special field.
    --missing-syst-metadata     Omit constraint on nova.systematic field.
    --missing-nova-standard     Omit constraint on nova.standard field.
    --use-nova-release          Use nova.release instead of a tier-specific release.