Correcting Failed Metadata Registration

Errors in registering file metadata can occur because of SSL problems when declaring the metadata to the SAM database, or because of missing crontab entries.

When you have problems with metadata registration, the error can be seen in the prod_register_binary_evb log files as something like this:

[ INFO ] reg_files_to_sam (L: 352) >> {<module>} End project 2016-02-27 11:59:59
[ INFO ] reg_files_to_sam (L: 345) >> {<module>} Start project 2016-02-27 12:00:05
[ INFO ] reg_files_to_sam (L: 149) >> {process_runs} Declaring a file to SAM: run=5179, subrun=186 ...
[ ERROR ] reg_files_to_sam (L: 241) >> {process_files} Unexpected error: samweb declareFile problem:
[ ERROR ] reg_files_to_sam (L: 242) >> {process_files} SSL error: [SSL: TLSV1_ALERT_DECRYPT_ERROR] tlsv1 alert decrypt error (_ssl.c:581)
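
Log excerpts like this one can be scanned mechanically for the SSL signature. The following is a minimal sketch, assuming the log lines follow the layout shown above; the function name `find_ssl_errors` and the regular expression are illustrative and not part of PUBS.

```python
import re

# Assumed layout, taken from the excerpt above:
# [ LEVEL ] project (L: NNN) >> {function} message
LOG_LINE = re.compile(
    r"^\[\s*(?P<level>\w+)\s*\]\s+(?P<project>\S+)\s+\(L:\s*(?P<line>\d+)\)"
    r"\s+>>\s+\{(?P<func>[^}]*)\}\s*(?P<msg>.*)$"
)

def find_ssl_errors(lines):
    """Return the messages of ERROR lines that mention SSL (hypothetical helper)."""
    hits = []
    for raw in lines:
        m = LOG_LINE.match(raw.strip())
        if m and m.group("level") == "ERROR" and "SSL" in m.group("msg"):
            hits.append(m.group("msg"))
    return hits

sample = [
    "[ INFO ] reg_files_to_sam (L: 149) >> {process_runs} "
    "Declaring a file to SAM: run=5179, subrun=186 ...",
    "[ ERROR ] reg_files_to_sam (L: 241) >> {process_files} "
    "Unexpected error: samweb declareFile problem:",
    "[ ERROR ] reg_files_to_sam (L: 242) >> {process_files} "
    "SSL error: [SSL: TLSV1_ALERT_DECRYPT_ERROR] tlsv1 alert decrypt error (_ssl.c:581)",
]
ssl_hits = find_ssl_errors(sample)
for msg in ssl_hits:
    print(msg)
```

To scan a real log, pass an open file object instead of `sample`; the path to the prod_register_binary_evb log is site-specific and is not assumed here.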

Note the SSL error in the log excerpt. It tells you that the problem is with authentication and the VOMS grid proxy. The solution is to log into near1 as uboonepro, then list the crontab jobs using `crontab -l`.

[uboonepro@ubdaq-prod-near1 ~]$ crontab -l
05 */6 * * * /usr/bin/voms-proxy-init -rfc -key /home/uboonepro/uboonepro-ubdaq-prod-near1.fnal.gov.incommon_key.pem -cert /home/uboonepro/uboonepro-ubdaq-prod-near1.fnal.gov.incommon_cert.pem -valid 48:0 -voms fermilab:/fermilab/uboone/Role=Production -out /home/uboonepro/uboonepro_production_near1_proxy_file

If you don't see that entry, copy the command from above and run it in a uboonepro session on near1. Also make sure to add it to the crontab for that account so that it runs every 6 hours. The grid proxy location is set in the configuration script that runs when you log in as uboonepro.

[uboonepro@ubdaq-prod-evb pubs]$ more config/setup_uboonepro_online.sh

<snip>
#The SSL_CERT_DIR variable is being set on the advice of Robert Illingworth and a potential
#mismatch between the version of python SSL authentication and sam_web_client. If the project
#reg_binary_to_sam gives error 102 it is likely SSL problems in samweb.declareFile and may
#require an update to this setting.
export SSL_CERT_DIR=/etc/grid-security/certificates;
export X509_USER_PROXY=/home/uboonepro/uboonepro_production_near1_proxy_file;
<snip>
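
The proxy file written by the crontab entry and the file named in X509_USER_PROXY should be the same path, otherwise a freshly renewed proxy is never picked up. A minimal sketch of that consistency check follows, assuming the `crontab -l` output and the setup script are available as text; the helper names `proxy_out_path` and `exported_value` are hypothetical.

```python
import shlex

def proxy_out_path(crontab_text):
    """Return the -out path of the first voms-proxy-init entry, or None."""
    for line in crontab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "voms-proxy-init" not in line:
            continue
        tokens = shlex.split(line)
        if "-out" in tokens:
            idx = tokens.index("-out")
            if idx + 1 < len(tokens):
                return tokens[idx + 1]
    return None

def exported_value(script_text, name):
    """Return the value from a line of the form 'export NAME=value;', or None."""
    prefix = "export " + name + "="
    for line in script_text.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            return line[len(prefix):].rstrip(";").strip()
    return None

# Sample text copied from the crontab entry and setup script shown in this page.
crontab = (
    "05 */6 * * * /usr/bin/voms-proxy-init -rfc "
    "-key /home/uboonepro/uboonepro-ubdaq-prod-near1.fnal.gov.incommon_key.pem "
    "-cert /home/uboonepro/uboonepro-ubdaq-prod-near1.fnal.gov.incommon_cert.pem "
    "-valid 48:0 -voms fermilab:/fermilab/uboone/Role=Production "
    "-out /home/uboonepro/uboonepro_production_near1_proxy_file\n"
)
setup = (
    "export SSL_CERT_DIR=/etc/grid-security/certificates;\n"
    "export X509_USER_PROXY=/home/uboonepro/uboonepro_production_near1_proxy_file;\n"
)

paths_match = proxy_out_path(crontab) == exported_value(setup, "X509_USER_PROXY")
print("proxy paths match:", paths_match)
```

The check only compares strings; it does not verify that the proxy file exists or is still valid.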

Note that ubdaq-prod-evb should also have a crontab entry that looks like this:

00 12 * * * export KRB5CCNAME=FILE:/tmp/krb5cc_uboonepro_evb; kinit -A -k -t /var/adm/krb5/uboonepro.keytab uboonepro/cron/ubdaq-prod-evb.fnal.gov

If that entry is missing, please add it back to the crontab of the uboonepro account on ubdaq-prod-evb.

Getting back to the metadata registration failures, to select all of the failed runs you should log into ubdaq-prod-smc.fnal.gov and query the database.

$> psql -d procdb

procdb=> SELECT * from prod_register_binary_evb where status >1;
run | subrun | seq | projectver | status | data
-------+--------+-----+------------+--------+------
12880 | 26 | 0 | 0 | 112 |
12880 | 25 | 0 | 0 | 112 |
12880 | 24 | 0 | 0 | 112 |
<snip>

Use pubs/dstream_online/correct_file_status.py to change the failed registration files back to status 1. Thus, the command would be:

./correct_file_status.py prod_register_binary_evb 112 1 12880

where 12880 is the run number of the files that were affected by the failed voms-proxy-init command.
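
When several runs are affected, the reset commands can be generated from the query output rather than typed by hand. The sketch below is an illustration under assumptions: it expects psql output in the six-column layout shown above, takes the argument order from the correct_file_status.py command in this section, and only prints commands without running them. The helper names `failed_runs` and `reset_commands` are hypothetical.

```python
def failed_runs(psql_output, status):
    """Collect the distinct run numbers that appear with the given status."""
    runs = []
    for line in psql_output.splitlines():
        cells = [c.strip() for c in line.split("|")]
        # Data rows have six columns: run, subrun, seq, projectver, status, data.
        if len(cells) != 6 or not cells[0].isdigit() or not cells[4].isdigit():
            continue
        run = int(cells[0])
        if int(cells[4]) == status and run not in runs:
            runs.append(run)
    return runs

def reset_commands(table, runs, old_status, new_status):
    """Build one correct_file_status.py command line per run."""
    return [
        "./correct_file_status.py %s %d %d %d" % (table, old_status, new_status, run)
        for run in runs
    ]

# Sample rows copied from the query output shown above.
output = """\
run | subrun | seq | projectver | status | data
-------+--------+-----+------------+--------+------
12880 | 26 | 0 | 0 | 112 |
12880 | 25 | 0 | 0 | 112 |
12880 | 24 | 0 | 0 | 112 |
"""

commands = reset_commands("prod_register_binary_evb", failed_runs(output, 112), 112, 1)
for cmd in commands:
    print(cmd)
```

Printing the commands first leaves room to review them before anything is changed in the database.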

There might be cases where the files were registered by SAM but were not found on near1:

source /cvmfs/uboone.opensciencegrid.org/products/setup_uboone.sh
setup uboonecode v05_08_00_06 -q e9:prof
samweb locate-file `samweb list-files "file_format binary% and run_number = 15969.80"`
enstore:/pnfs/uboone/data/uboone/raw/online/assembler/v6_00_00/00/01/59/69(1323@vpm599)
samweb locate-file `samweb list-files "file_format binary% and run_number = 15969.77"`
enstore:/pnfs/uboone/data/uboone/raw/online/assembler/v6_00_00/00/01/59/69(2768@vpm574)

select * from prod_transfer_binary_near12dropbox_near1 where run=15969 and status!=0 and status!=1006;
  run  | subrun | seq | projectver | status | data 
 -------+--------+-----+------------+--------+------
 15969 |     77 |   0 |          0 |   1002 | 
 15969 |     80 |   0 |          0 |   1002 | 
 (2 rows)

In such a case, the status of the files has to be set to 0.

./correct_file_status.py prod_register_binary_evb 1002 0 15969

where the arguments used above are the project table, the initial status, the final status, and the run number of the files.