What to do if dCache/enstore go down (no access to pnfs area)

While dCache/enstore is down, we must not attempt to transfer binary files to the dCache area.
Two actions are needed: one shortly before the downtime begins and one after it ends.

NOTE: There are two projects registered in PUBS as "prod_transfer_binary_evb2dropbox_XXX": one with XXX="evb" and the other with XXX="near1". Do NOT enable the project "prod_transfer_binary_evb2dropbox_near1".

First action: ~30 minutes before the scheduled downtime begins

This requires 2 operations that can be done in 1 step.

a) Change the RESOURCE parameter "BYPASS" value from "False" to "True" for the evb=>dropbox transfer project

Under

PROJECT_BEGIN
NAME prod_transfer_binary_evb2dropbox_evb

look for RESOURCE BYPASS and set it to True

RESOURCE BYPASS => True

b) Disable the near1=>dropbox transfer project

Under

PROJECT_BEGIN
NAME prod_transfer_binary_near12dropbox_near1

look for ENABLE and set it to False

ENABLE False

How does this work? Step a) changes the destination of the file transfer from dropbox to near1.
This way a file produced at evb by the DAQ is moved to near1 disk space, which keeps the evb area available for more data taking.
Meanwhile, since dCache is unavailable, the project that constantly cleans up the near1 area by draining files into dCache must be stopped.
That is what step b) does: it disables this project.

As of the date of this writing, relevant project names for a) and b) are:
Project name for a) ... prod_transfer_binary_evb2dropbox_evb
Project name for b) ... prod_transfer_binary_near12dropbox_near1

How can we do this in 1 step?
0) Log into either evb or near1 as uboonepro, then set up the environment and dump the currently running configuration:

source $HOME/pubs/config/setup_uboonepro_online.sh 
cfg_dump_project current_${USER}.cfg

1) Edit project configuration. The easiest way is to dump the currently running configuration, alter, and upload.

alias vi="emacs -nw" 
vi current_${USER}.cfg

2) Upload project configuration

$PUB_TOP_DIR/sbin/register_project current_${USER}.cfg
At the command prompt, type "y" if you agree with the modification.
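
Before uploading in step 2, it may help to double-check the two edited values. The sketch below prints ENABLE and RESOURCE BYPASS for every project in the dumped file. It assumes the cfg layout shown in the snippets above (a PROJECT_BEGIN line followed by whitespace-separated NAME, ENABLE and RESOURCE BYPASS => lines); the function name show_transfer_flags is made up for illustration and is not part of PUBS.

```shell
# Print "<project> ENABLE=<value> BYPASS=<value>" for each project block.
# Assumption: every block starts with a PROJECT_BEGIN line, as in the
# snippets above. A "-" means the parameter was not found in that block.
show_transfer_flags() {
    awk '
        function emit() {
            if (name != "") printf "%s ENABLE=%s BYPASS=%s\n", name, enable, bypass
        }
        $1 == "PROJECT_BEGIN" { emit(); name = ""; enable = "-"; bypass = "-" }
        $1 == "NAME"          { name = $2 }
        $1 == "ENABLE"        { enable = $2 }
        $1 == "RESOURCE" && $2 == "BYPASS" { bypass = $NF }
        END { emit() }
    ' "$1"
}
```

For example, "show_transfer_flags current_${USER}.cfg | grep prod_transfer_binary_" lists only the transfer projects, so the values for a) and b) can be compared at a glance.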

How to confirm the effect is in place?
Confirm that b) took effect on the GUI (check that the "Binary Transfer [Near1]" project color has turned gray).
Then take a look at a log file:

tail -n3000 $PUB_TOP_DIR/log/ubdaq-prod-near1.fnal.gov/prod_transfer_binary_evb2dropbox_evb.log

This log file usually shows lines like this:
[ INFO    ] transfer (L: 147) >> {transfer_file} Start transfer_file @ 2016-01-21 07:02:03
...
[ INFO    ] transfer (L: 256) >> {process_files} Transferring /data/uboonedaq/rawdata/PhysicsRun-2016_1_16_19_8_22-0004537-00708.ubdaq @ 2016-01-21 07:08:46
[ INFO    ] transfer (L: 256) >> {process_files} Transferring /data/uboonedaq/rawdata/PhysicsRun-2016_1_16_19_8_22-0004537-00707.ubdaq @ 2016-01-21 07:08:47
[ INFO    ] transfer (L: 275) >> {process_files} Waiting for 6/100 process to finish...
[ INFO    ] transfer (L: 256) >> {process_files} Transferring /data/uboonedaq/rawdata/PhysicsRun-2016_1_16_19_8_22-0004537-00706.ubdaq @ 2016-01-21 07:09:19
[ INFO    ] transfer (L: 256) >> {process_files} Transferring /data/uboonedaq/rawdata/PhysicsRun-2016_1_16_19_8_22-0004537-00705.ubdaq @ 2016-01-21 07:09:19
[ INFO    ] transfer (L: 256) >> {process_files} Transferring /data/uboonedaq/rawdata/PhysicsRun-2016_1_16_19_8_22-0004537-00704.ubdaq @ 2016-01-21 07:09:19
[ INFO    ] transfer (L: 240) >> {transfer_file} Finished copy (4541, 31) @ 2016-01-21 07:09:41
[ INFO    ] transfer (L: 240) >> {transfer_file} Finished copy (4541, 30) @ 2016-01-21 07:09:41
[ INFO    ] transfer (L: 240) >> {transfer_file} Finished copy (4541, 29) @ 2016-01-21 07:09:41
...
[ INFO    ] transfer (L: 245) >> {transfer_file} All finished @ 2016-01-21 07:09:42

However with a) in place you should see lines like this:
[ INFO    ] transfer (L: 147) >> {transfer_file} Start transfer_file @ 2016-01-21 07:21:19
[ INFO    ] transfer (L: 176) >> {transfer_file} Configured to bypass transfer: run=4625, subrun=0 ...
[ INFO    ] transfer (L: 176) >> {transfer_file} Configured to bypass transfer: run=4513, subrun=114 ...
...
[ INFO    ] transfer (L: 176) >> {transfer_file} Configured to bypass transfer: run=4511, subrun=2404 ...
[ INFO    ] transfer (L: 176) >> {transfer_file} Configured to bypass transfer: run=4511, subrun=2403 ...
[ INFO    ] transfer (L: 245) >> {transfer_file} All finished @ 2016-01-21 07:21:31
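
Rather than scanning the tail by eye, the bypass messages can be counted. This is a minimal sketch: the message text is copied from the excerpt above, while the function name count_bypass_lines and the default of 3000 lines are illustrative choices.

```shell
# Count "Configured to bypass transfer" messages in the last N lines of a log.
# Usage: count_bypass_lines <log file> [number of lines, default 3000]
count_bypass_lines() {
    log_file=$1
    n_lines=${2:-3000}
    tail -n "$n_lines" "$log_file" | grep -c "Configured to bypass transfer"
}
```

A non-zero count for prod_transfer_binary_evb2dropbox_evb.log suggests that a) is in effect. Note that the last 3000 lines may still include messages from before the change, so check the timestamps if in doubt.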

Also you may check another log file:
tail -n3000 $PUB_TOP_DIR/log/ubdaq-prod-near1.fnal.gov/prod_transfer_binary_evb2near1_near1.log

which usually looks like this:
[ INFO    ] mv_assembler_daq_files (L: 94 ) >> {process_newruns} Starting a parallel (5) transfer process for 50 runs...
[ INFO    ] mv_assembler_daq_files (L: 176) >> {process_newruns} Finished all @ 2016-01-21 07:09:50

However, with a) in place this project starts draining files from evb to near1, and you should see a log like this:
[ INFO    ] mv_assembler_daq_files (L: 94 ) >> {process_newruns} Starting a parallel (5) transfer process for 50 runs...
[ INFO    ] mv_assembler_daq_files (L: 128) >> {process_newruns} processing new run: run=4618, subrun=1245 ...
[ INFO    ] mv_assembler_daq_files (L: 128) >> {process_newruns} processing new run: run=4537, subrun=603 ...
...
[ INFO    ] mv_assembler_daq_files (L: 196) >> {process_files} Copying /data/uboonedaq/rawdata/PhysicsRun-2016_1_21_2_53_56-0004618-01245.ubdaq @ 2016-01-21 07:10:03
[ INFO    ] mv_assembler_daq_files (L: 196) >> {process_files} Copying /data/uboonedaq/rawdata/PhysicsRun-2016_1_16_19_8_22-0004537-00603.ubdaq @ 2016-01-21 07:10:03
[ INFO    ] mv_assembler_daq_files (L: 196) >> {process_files} Copying /data/uboonedaq/rawdata/PhysicsRun-2016_1_16_19_8_22-0004537-00602.ubdaq @ 2016-01-21 07:10:03
[ INFO    ] mv_assembler_daq_files (L: 196) >> {process_files} Copying /data/uboonedaq/rawdata/PhysicsRun-2016_1_16_19_8_22-0004537-00601.ubdaq @ 2016-01-21 07:10:03
[ INFO    ] mv_assembler_daq_files (L: 196) >> {process_files} Copying /data/uboonedaq/rawdata/PhysicsRun-2016_1_16_19_8_22-0004537-00600.ubdaq @ 2016-01-21 07:10:03
[ INFO    ] mv_assembler_daq_files (L: 196) >> {process_files} Copying /data/uboonedaq/rawdata/PhysicsRun-2016_1_16_19_8_22-0004537-00599.ubdaq @ 2016-01-21 07:10:03
[ INFO    ] mv_assembler_daq_files (L: 215) >> {process_files} Waiting for 6/50 process to finish...
...
[ INFO    ] mv_assembler_daq_files (L: 176) >> {process_newruns} Finished all @ 2016-01-21 07:14:14
[ INFO    ] mv_assembler_daq_files (L: 321) >> {validate} validated run: run=4618, subrun=1245 ...
[ INFO    ] mv_assembler_daq_files (L: 321) >> {validate} validated run: run=4537, subrun=603 ...
[ INFO    ] mv_assembler_daq_files (L: 321) >> {validate} validated run: run=4537, subrun=602 ...
...

as expected.
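
The same check can be summarized in one line. The sketch below counts the "Copying" and "validated run" messages seen in the excerpt above; the function name summarize_drain_log and the default of 3000 lines are assumptions for illustration only.

```shell
# Summarize draining activity in the last N lines of the evb2near1 log.
# Usage: summarize_drain_log <log file> [number of lines, default 3000]
summarize_drain_log() {
    tail -n "${2:-3000}" "$1" | awk '
        index($0, "{process_files} Copying ")  { copied++ }
        index($0, "{validate} validated run:") { validated++ }
        END { printf "copying=%d validated=%d\n", copied, validated }
    '
}
```

If both counts stay at zero after a) has been uploaded, the draining has probably not started yet, and it is worth looking at the log again a few minutes later.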

IMPORTANT
Make sure to discard current_${USER}.cfg afterwards to avoid confusing others and yourself in the future.

Second action: at the end of downtime

You basically have to revert what you have done.

a) Change the RESOURCE parameter "BYPASS" value from "True" to "False" for the evb=>dropbox transfer project

Under

PROJECT_BEGIN
NAME prod_transfer_binary_evb2dropbox_evb

look for RESOURCE BYPASS and set it to False

RESOURCE BYPASS => False

b) Enable the near1=>dropbox transfer project

Under

PROJECT_BEGIN
NAME prod_transfer_binary_near12dropbox_near1

look for ENABLE and set it to True

ENABLE True

NOTE: You cannot necessarily validate this step by expecting the log behavior described above to reverse immediately. The prod_transfer_binary_evb2near1_near1.log behavior will not change until the backlog of files has been copied from evb. Even though you have just undone the BYPASS change, all files that were marked for bypass but not yet transferred will still be transferred in the BYPASSed manner.

Refer to the previous sub-section for how to make this change and how to validate it.
Remember to discard current_${USER}.cfg.