Wiki » History » Version 33

Leon Mualem, 04/16/2020 06:10 PM
clarified need to be root to install python27


Current (NOvA) FTS Instances

There are currently three FTS instances serving different aspects/needs of the NOvA DAQ, Offline and Assembly.

Instance | Host | Purpose
DAQ (Far Detector) | http://novadaq-far-datadisk-01.fnal.gov:8888/fts/status | DAQ Raw Data and logs
DAQ (Far Detector) | http://novadaq-far-datadisk-02.fnal.gov:8888/fts/status | DAQ Raw Data and logs
DAQ (Far Detector) | http://novadaq-far-datadisk-03.fnal.gov:8888/fts/status | DAQ Raw Data and logs, Assembly Laser Files
DAQ (Far Detector) | http://novadaq-far-datadisk-04.fnal.gov:8888/fts/status | DAQ Raw Data
DAQ (Far Detector) | http://novadaq-far-datadisk-05.fnal.gov:8888/fts/status | DAQ Raw Data
DAQ (Near Detector) | http://novadaq-near-datadisk-01.fnal.gov:8888/fts/status | DAQ Raw Data and logs
DAQ (Near Detector) | http://novadaq-near-datadisk-02.fnal.gov:8888/fts/status | DAQ Raw Data and logs
FNAL Relay/Offline | http://novasamgpvm01.fnal.gov:8889/fts/status | DAQ Relay from far site
FNAL Offline | http://novasamgpvm01.fnal.gov:8888/fts/status | Offline production files

FTS Destinations

Instances | Destinations | Notes
DAQ (Near Det.) | Enstore, Bluearc | Automatic cleanup enabled for log files
DAQ (Far Det.) | FNAL Relay station (Bluearc) |
FNAL Relay/Offline | Enstore, Bluearc | Automatic cleanup enabled for Monte Carlo and Laser files

Notes: The Far Detector -> FNAL Relay path works as follows. The Far Detector station transfers files to a dropbox directory on the Bluearc, which is then used as the input dropbox area for the FTS Relay/Offline instance. The final SAM destination is checked against the SAM database. Only when the files are registered correctly there are the original files on the source machine flagged for cleanup.

Configuration of the Raid Hardware for use with FTS

The FTS system is I/O intensive by nature (copying files back and forth, calculating checksums, etc.) and as a result can consume large amounts of system resources if the underlying hardware is not configured correctly.

For the far detector raid arrays (novadaq-far-datadisk-01, 02, 03) the following should be done:

  • The IO scheduling for the raid should be set to "deadline"

On an SLF6 machine this is done through sysfs with the commands shown below (here the raid device is /dev/sdb).

To check the current scheduler you can do:

[root@novadaq-far-datadisk-03 ~]# cat /sys/block/sdb/queue/scheduler 
noop anticipatory [deadline] cfq 

The default is cfq (completely fair queuing) which defeats the raid controller's own scheduling and results in lower data rates.

To set the scheduler type to "deadline" do:

> echo deadline > /sys/block/sdb/queue/scheduler

This does not persist across reboots, so a line has been added to the rc.local files for these machines.
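
As a rough sketch of what such an rc.local addition could look like, the helpers below read and set the scheduler for one device. The function names and the optional sysfs-root argument are illustrative assumptions, not copied from the rc.local files on these machines; the extra argument exists only so the helpers can be tried against a scratch directory.

```shell
# Print the active scheduler, i.e. the bracketed entry in a line such as
# "noop anticipatory [deadline] cfq".
current_io_scheduler() {
    local dev="$1" sysroot="${2:-/sys}"
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$sysroot/block/$dev/queue/scheduler"
}

# Select a scheduler for a device (needs root on a real system).
set_io_scheduler() {
    local dev="$1" sched="$2" sysroot="${3:-/sys}"
    local knob="$sysroot/block/$dev/queue/scheduler"
    if [ ! -w "$knob" ]; then
        echo "cannot write $knob" >&2
        return 1
    fi
    echo "$sched" > "$knob"
}

# Hypothetical rc.local usage (device name as in the example above):
#   set_io_scheduler sdb deadline
```

After a reboot, current_io_scheduler sdb should print deadline if the rc.local line took effect.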

Adding Files to the FTS System

Adding files to the far detector setup is a simple matter of adding a configuration block to the setup files used by the system.

Making a new File Family

From novasamgpvm01 (or any other machine that can see the pnfs):

> setup encp -q stken:x86_64
> enstore sfs --file-family
nova

While in the NDOS area we would get:

> enstore sfs --file-family
rawdata_NDOS

So we need to define a new directory and family for the far detector data. From the raw data directory (/pnfs/nova/rawdata/)

> mkdir FarDet
> cd FarDet
> enstore sfs --file-family
nova
> enstore sfs --file-family rawdata_FarDet

Handled File Classes

Each of the instances is configured to handle a different set of file classes. Currently the following matrix

Nova FTS Guide

The File Transfer System (FTS) has been installed and is currently running for the NOvA experiment. The FTS is run on machines that are part of the data acquisition (DAQ) computing cluster and is currently being used to process raw data files and DAQ log files for the general DAQ system.

The FTS system is also being used as a transport layer to assist the assembly and alignment groups. Information on the setup of the FTS for the laser scanner can be found here: FTS for the Laser Scanner

Instructions on starting and stopping the FTS system for the Laser scanner can be found here: Starting/Stopping the Laser Scanner FTS

The FTS system is also being used by the offline groups for transfers of analysis files. Information on the FTS setup for offline can be found here: FTS for Offline Use

Installation

FTS installation is changing (2020-Feb). It can now be done by installing the virtualenv package to control the Python environment and then installing the Python packages for the file transfer service, instead of the old way of doing it via UPS products.

FTS now requires Python >=2.7.6. This is problematic for SL6 systems, like novadaq. The workaround is to install a Python version from the "Software Collections" project. Details on the project can be found here: https://www.softwarecollections.org/en/scls/rhscl/python27/

It installs in a /opt/rh/ area on the node, so it has to be done on each node.

Installing Python into /opt/rh requires that you are authenticated as the root user.

Access to the installs and subsequent pip usage needs to use a proxy since outgoing connections are generally forbidden. The squid proxy is

squid.fnal.gov:3128

and can be set up for the shell session for both http and https as follows:
export http_proxy="http://squid.fnal.gov:3128/" 
export https_proxy="http://squid.fnal.gov:3128/" 

Installation requires enabling centos mirrors with a new file in /etc/yum.repos.d/centos-sclo.repo

The file should look like:

#additional packages that may be useful
[sclo]
name=CentOS-$releasever - Extras
mirrorlist=http://mirrorlist.centos.org/?release=6&arch=$basearch&repo=extras
#baseurl=http://mirror.centos.org/centos/$releasever/extras/$basearch/
enabled=1
gpgcheck=0
priority=91

Enable scl with

yum install centos-release-scl-rh

Technically, you should probably delete these things after you are done with them, but I think puppet cleans up /etc/yum.repos.d/ regularly, so it will probably get cleaned up for you.

List python27 version with

yum list python27 

and you should get something like:
python27.x86_64                           1.1-25.el6            centos-sclo-rh  

Try to install python27. If all goes well, this installs a large number of packages.

yum install python27             

Once this is done you should have access to python27 with

scl enable python27 bash

check with
python -V 

which should give you version 2.7.17 or greater.
There is an issue with setuptools for virtualenv (which is installed with the python27 collection). When you install setuptools you may get version 45.0 or later, which no longer supports Python 2. If that happens, install the highest version below 45.0.0 instead. This is part of the installation instructions for fts below.
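
A minimal sketch of that pin, meant to be run inside the python27 environment. The helper names are illustrative assumptions; the linked FTS installation instructions remain the authoritative procedure.

```shell
# Succeeds if the given setuptools version is below 45, i.e. still usable
# with Python 2.7. Only the major version number is inspected.
setuptools_supports_py27() {
    local major="${1%%.*}"
    [ "$major" -lt 45 ]
}

# Ask pip for the newest release below 45. pip goes through the squid proxy
# if http_proxy/https_proxy have been exported as described earlier.
pin_setuptools_for_py27() {
    pip install "setuptools<45"
}
```

For example, setuptools_supports_py27 44.1.1 succeeds while setuptools_supports_py27 45.0.0 fails.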

The rest of the process will run as the novaraw user, so make sure you are user novaraw before proceeding

In the current FD setup, fts 6.3.6 is installed in the virtualenv in ~novaraw/fts2020apr/, so the whole system can be configured for the FD datadisks from the novaraw account with

source /home/novaraw/fts2020apr/setup_nova_forfts.sh 

Check with

which start_fts

The start scripts and logdirs should probably live in this subdirectory to help keep things organized.

Installation of FTS in general with python and pip can be found here:
https://cdcvs.fnal.gov/redmine/projects/filetransferservice/wiki/Wiki#Installation


Installation from ~2013 found below

The FTS system and its external dependencies can be installed via the UPS/UPD mechanism. This process allows new instances of the FTS to be brought up quickly and minimizes version conflicts with existing packages that may be installed on a system.

The current FTS installation relies on the following external products which are available from UPS/UPD for the Linux 2.6 i386 and x86_64 architectures:

(Run ups depend FileTransferService to list the dependencies.)

  • python v2_4_6
  • twisted_python v11_0
  • sam v8_8_2
  • sam_config v7_1_7
  • sam_ns_ior v7_2_3
  • samgrid_batch_adapter v7_2_7

The SAM station system:

  • sam_station v8_8_20 Linux64bit+2.6

To generate a current listing of dependencies for the FTS system, you can directly query the UPS database with the following command:

> ups depend FileTransferService

Authentication

The FTS uses a set of x509 certificates to authenticate to the storage servers that it communicates with. These certificates must be installed, and proxies must be generated from them on a regular basis, for the system to function properly.

To obtain a proxy:

Generate a certificate request

openssl req -nodes -newkey rsa:2048 -keyout $(hostname --fqdn)_key.pem -out $(hostname --fqdn).csr -subj "/DC=org/DC=incommon/C=US/ST=IL/L=Batavia/O=Fermi Research Alliance/OU=Fermilab/CN=$(hostname --fqdn)" 

This will create a csr file as well as a private key file. Save these both!

Next use your csr file to obtain an actual certificate from your certificate authority (Note: At the current time this part of the process is not well defined)

Once you obtain the actual certificate the distinguished name (DN) or subject must be registered in different places. First it must be registered with the VOMS servers under an individual's name.
  • Go to the Fermilab service desk https://fermi.service-now.com/wp/
  • Select "Request Something" (seriously, that is what the button is called)
  • Then select "scientific job management"
  • This will show a button labeled "Add a certificate DN to user account", select this.
  • Fill out the form and include the subject of the certificate

To extract the subject from a certificate:

> openssl x509 -in fts-novadaq-far-datadisk-2019-cert.pem -noout -subject
subject= /DC=org/DC=incommon/C=US/ST=IL/L=Batavia/O=Fermi Research Alliance/OU=Fermilab/CN=novadaq-far-datadisk-04.fnal.gov

Next the DN also needs to be registered with SAM.

Next, copy the certificate and key into the fts/certs directory under the novaraw account. Make sure that the key is only user readable (i.e. chmod 600 thekey.pem)

Attempt to obtain a valid proxy using the certificate and key:

voms-proxy-init -rfc --cert ~/fts/certs/fts-novadaq-far-datadisk-2019-cert.pem --key ~/fts/certs/fts-novadaq-far-datadisk-2019-key.pem -voms fermilab:/fermilab/nova/Role=raw -order /fermilab/nova/Role=Raw -out ~/myproxy

This won't work at first. There is currently a delay of up to an hour between when you register the DN and when VOMS picks it up. So wait; eventually it will work.

Now make the proper symlinks for each of the FTS machines that are going to use this.

Restart the FTS on each machine that now has a current certificate.

Make a note of when this was done. Certificates are good for approximately 1 year. Near the end of this time you will want to renew your cert for another year. If your cert is still valid (and the certificate provider hasn't changed) it may be possible to renew the certificate using the existing certificate. If the certificate has expired, the process will need to be repeated, but if the DN hasn't changed there is no need to update the VOMS or SAM admin servers.
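
One way to avoid relying on a note alone is to ask openssl how long the certificate has left. The sketch below uses the standard -checkend option of openssl x509; the function name and the 30-day default are assumptions chosen for illustration.

```shell
# Succeeds if the certificate is still valid for at least <days> more days.
# Fails if it expires sooner, or if the file cannot be read.
cert_valid_for_days() {
    local cert="$1" days="${2:-30}"
    openssl x509 -in "$cert" -noout -checkend $(( days * 86400 )) > /dev/null 2>&1
}

# Example with the certificate name used above (could be run from cron):
#   cert_valid_for_days ~/fts/certs/fts-novadaq-far-datadisk-2019-cert.pem 30 \
#       || echo "FTS certificate expires within 30 days" >&2
```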

FTS Setup

Because the FTS uses UPS with a proper dependency chain, all that is required to initialize and set up the FTS software system is a single ups setup command:

> setup FileTransferService

Once the system has been set up, the FTS daemon can be started.

The FTS system currently relies on several external tools for extracting metadata from the files that it processes. These packages are experiment specific and must be made available via the standard execution path, and must be able to link any additional libraries via the library paths.

To set up the NOvA DAQ-specific software that the FTS will use, source the standard online scripts and optionally define a test release to use for local modifications. For the FTS installation on novadaq-datadisk-01, the initialization sequence is:

source /nova/novadaq/setup/setup_novadaq_nt1.sh
cd /home/novaraw/build/dev_tools/
srt_setup -a

This will set up the novadaq environment, correctly set the $SRT_PUBLIC_CONTEXT and $SRT_PRIVATE_CONTEXT environment variables, and push them onto the correct paths.

Then to initialize the FTS and SAM products do the following:

export PRODUCTS=$HOME/sam/db:$PRODUCTS
export SETUPS_DIR=$HOME/sam/etc

This adds the UPS database with the SAM products to the UPS paths. Now actually set up the products:

setup ups
setup sam
setup FileTransferService

Then you can start the actual daemon using the config files (e.g. fts_config.txt) found in the config directory (/home/novaraw/fts_rundata/):

start_fts /home/novaraw/fts_rundata fts_config.txt

The FTS system will spin up and start processing files.

Transfer Delays

When the FTS system starts, it immediately checks for new files in the directories that have been registered with it. If a file is new and has not already been transferred to an appropriate location, the FTS starts the data migration process. However, the FTS must not attempt to transfer a file that is still open by a DAQ process (e.g. a file still being written by the datalogger, or a log file that is being held open by an active DAQ system). To prevent this, the FTS provides a "hold-off" setting that lets the user select a mandatory delay between the modification time of the file and the current system time of the machine that the FTS is running on.

By default this hold-off is set to 25 hours to ensure that files being synced between the DAQ spool disks are not accidentally sent to archival storage while incomplete.

This hold-off can be adjusted through the "scan-delay" configuration parameter. See Configuration.
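
The hold-off rule amounts to comparing a file's modification time with the current time. The sketch below illustrates that comparison only; it is not how the FTS implements scan-delay, and the function name and arguments are hypothetical. It relies on stat -c from GNU coreutils.

```shell
# Succeeds if the file was last modified at least <hours> ago (default 25).
# The optional third argument overrides "now" (epoch seconds), which makes
# the check easy to exercise without waiting.
past_holdoff() {
    local file="$1" hours="${2:-25}" now="${3:-$(date +%s)}"
    local mtime
    mtime=$(stat -c %Y "$file") || return 2
    [ $(( now - mtime )) -ge $(( hours * 3600 )) ]
}

# Example: list raw files in a directory that are already past the hold-off.
#   for f in /data2/NDOS/*.raw; do past_holdoff "$f" && echo "$f"; done
```

A file that is not yet past the hold-off is simply expected to be picked up on a later scan.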

Automatic FTS Start Up

FTS is designed to be an "always running" system that operates independently of other DAQ systems. The FTS does not require the normal NOvA DAQ systems to be running to accomplish its job of migrating files. As a result, most users will have very little actual interaction with the FTS system aside from noting that it is running and not reporting errors.

For the current deployment of the FTS at the NOvA near detector computing cluster, the FTS is installed on the second data spool disk (novadaq-ctrl-datadisk-02.fnal.gov) and is automatically started via the cron facility.

The actual startup sequence is placed in the crontab for the novaraw user and is executed on reboot.

Because the FTS relies on parts of the NOvA DAQ software distribution, it requires that the area that houses the SRT releases be mounted prior to FTS system startup. In particular on the NOvA cluster, /nova/novadaq needs to have been mounted prior to attempting to start the FTS. The crontab entry for the FTS on datadisk-02 has the following form which takes into account this requirement:

# Crontab for user novaraw
@reboot while ! [ -d /nova/novadaq/ ]; do sleep 60; done && (other commands to execute)

In particular, both the SAM station and the FTS are started this way. The current crontab, installed for the novaraw user, contains:

@reboot while ! [ -d /nova/novadaq/ ]; do sleep 60; done && . /home/novaraw/setup_sam_prd.sh && ups start sam_bootstrap >& /dev/null
# The below setup puts a local build dir on the path, as we need the MetaDataRunTool. Eventually it will become part of the release
@reboot while ! [ -d /nova/novadaq/ ]; do sleep 60; done && . /home/novaraw/setup_sam_prd.sh && setup FileTransferService && start_fts /home/novaraw/fts_rundata fts_config.txt >& /dev/null
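
The inline loop in these entries waits indefinitely for the mount. As a sketch only, the same wait can be written as a function that gives up after a bounded number of checks, so that a permanently missing mount produces an error instead of a silent hang. The function name, defaults, and wrapper script are hypothetical, and whether giving up is preferable to waiting forever is a judgment call for the operators.

```shell
# Wait until a directory appears, polling every <interval> seconds,
# and give up after <max_tries> checks instead of looping forever.
wait_for_dir() {
    local dir="$1" interval="${2:-60}" max_tries="${3:-30}"
    local n=0
    while [ ! -d "$dir" ]; do
        n=$(( n + 1 ))
        [ "$n" -ge "$max_tries" ] && return 1
        sleep "$interval"
    done
    return 0
}

# Hypothetical use in a wrapper script called from the @reboot entry:
#   wait_for_dir /nova/novadaq 60 30 || { echo "/nova/novadaq not mounted" >&2; exit 1; }
```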

File processing

Under the current configuration that is installed for the NOvA DAQ, the FTS system performs the following combination of functions:

  1. Copies all raw nova data files to central disk storage (Bluearc)
  2. Copies all raw nova data files to archival tape storage (Enstore)
  3. Copies all NOvA DAQ log files to archival tape storage (Enstore)
  4. Bundles data files and log files into appropriate tar balls to meet the Enstore storage requirements
  5. Deletes log files which have been copied to archival storage and are more than 60 days old

The system also catalogs all files that undergo copy/transfer into the SAM metadata catalog.

In order to process each type of file the FTS system is required to have a method of generating or extracting metadata related to the file. For the current system the following tools are used to obtain the metadata:

Raw Data:

  • MetaDataRunTool from the NOvADAQ MetaDataTools package.

Log Files:

  • Internal generation from the FTS system itself

Note: the executable name MetaDataRunTool is currently hardwired into the FTS system. This should be moved to a config file to allow easy configuration of new data types and metadata extraction tools.

File Locations and filename patterns

Currently the FTS system looks in the following file tree locations for new files:

  • /daqlog/NDOS/
  • /data2/NDOS/

Only files that match predefined patterns are considered valid targets for the FTS to transfer and catalog. Under the current configuration the following filename patterns are used:

  • Data files: *.raw
  • Log Files: *.log *.gz

The system can additionally be configured to handle other types of files by adding additional filename masks to the appropriate sections of the FTS configuration file. See Configuration.

Filename patterns can also be excluded from FTS processing using a similar mechanism. Under the current setup the following filename patterns are excluded from FTS processing:

For raw data:

  • Excluded: SingleEventAtom*.raw

For Log files:

  • Excluded: ospl-*.log
  • Excluded: ospl-*.log.gz
  • Excluded: css-*.log
  • Excluded: alarmServer*.log
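
To make the combined effect of the include and exclude masks concrete, here is a small sketch that applies the patterns listed above to a bare file name. It assumes that exclusions take precedence over inclusions and that patterns are matched against the file name without its directory; the real behaviour is whatever the FTS configuration file specifies, and the function names are illustrative.

```shell
# Succeeds if a log file name would be picked up under the patterns above.
is_fts_log_candidate() {
    case "$1" in
        ospl-*.log|ospl-*.log.gz|css-*.log|alarmServer*.log) return 1 ;;  # excluded
        *.log|*.gz) return 0 ;;                                            # included
        *) return 1 ;;                                                     # no match
    esac
}

# Same idea for raw data files.
is_fts_raw_candidate() {
    case "$1" in
        SingleEventAtom*.raw) return 1 ;;  # excluded
        *.raw) return 0 ;;                 # included
        *) return 1 ;;                     # no match
    esac
}
```

For instance, is_fts_log_candidate ospl-info.log fails, while a name ending in .log that matches none of the exclusions succeeds.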

FTS Status and Logging

The FTS system provides two general interfaces for communicating its operating state. More reporting options will be provided in the second release version, which will integrate with the existing NOvA DAQ system (Ganglia monitoring and Message Logging facilities).

General Logging

General logging is handled through a set of log files that are generated and rotated by the FTS system on a daily basis. The log files are written to the destination specified in the "log-file" variable in the FTS configuration files. For the NOvA DAQ deployment of the FTS this variable is left at its default value, which points to the local disk array that also houses the mirror copy of the raw data.

Log File:

  • Base directory: /data2/fts_logs
  • Base file name: fts.${hostname}

The fully enumerated file name that the FTS daemon generates has a date string appended to the base file name to give a final naming convention:

fts.${hostname}.YYYY-MM-DD.log

(e.g. fts.novadaq-ctrl-datadisk-02.fnal.gov.2011-09-28.log for the log file generated on 28SEP2011 while running on datadisk-02)
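
The naming convention can be reproduced in a couple of lines, which is handy when locating the log for a given day. This is a sketch: the function name is hypothetical, and the default base directory is the one listed above.

```shell
# Build the expected FTS log path for a host and a date (YYYY-MM-DD).
# Defaults: this machine's hostname, today's date, /data2/fts_logs.
fts_log_path() {
    local host="${1:-$(hostname)}" day="${2:-$(date +%Y-%m-%d)}" base="${3:-/data2/fts_logs}"
    printf '%s/fts.%s.%s.log\n' "$base" "$host" "$day"
}

# Example: follow today's log on the current machine.
#   tail -f "$(fts_log_path "$(hostname --fqdn)")"
```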

Status Webpage

The FTS system runs a simple embedded web server on port 8888, which can provide run time information to network clients. Under this interface the FTS provides a simple status page that shows a snapshot of the system and the files that it has in its error and pending queues.

The status page can be accessed from a machine on the FNAL network through the URL:

http://novadaq-ctrl-datadisk-02.fnal.gov:8888/fts/status/

This provides a simple HTML table based representation of the data that is suitable for browsing.

For scripts or programs that may want to access this status data directly, there is also a JSON based output format that can be requested from the server. To obtain the JSON output go to:

http://novadaq-ctrl-datadisk-02.fnal.gov:8888/fts/status?format=json
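
For scripted access, a small wrapper around curl is usually enough. The sketch below only builds the URL and fetches it; the function names and the 10-second timeout are assumptions, and no particular layout of the returned JSON is assumed.

```shell
# Build the JSON status URL for an FTS instance (port defaults to 8888).
fts_status_url() {
    local host="$1" port="${2:-8888}"
    printf 'http://%s:%s/fts/status?format=json\n' "$host" "$port"
}

# Fetch the JSON status; a non-zero exit status means the instance
# could not be reached or returned an HTTP error.
fetch_fts_status() {
    curl --silent --fail --max-time 10 "$(fts_status_url "$@")"
}

# Example (from a machine on the FNAL network), pretty-printed:
#   fetch_fts_status novadaq-ctrl-datadisk-02.fnal.gov | python -m json.tool
```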

Clearing Errors and Retrying Failed Files

When the FTS system locates a new file it attempts to catalog and transfer the file to an appropriate location. If any portion of this process fails, the file is placed in an error queue. The error state of the file is reported through the status interface and any information from external programs is included in that report.

When a file is placed in the error queue no further processing will be performed on it. To change its state, and have processing resume, the FTS must be manually told to reattempt processing of the file.

Retries can be instructed by a post to the FTS daemon's web server with the appropriate file information.

From a commandline, the curl utility can be used to make the simple post operation. (see cURL and LibcURL documentation for full option sets and usage)

curl --data filename=<name of file to retry> http://novadaq-ctrl-datadisk-02.fnal.gov:8888/fts/retryFiles

In this example the filename can be either the fully qualified location of the file (including directory prefix) or the plain filename (including any type suffix such as .log or .raw).

To retry all files in error:

curl --data all=1 http://novadaq-far-datadisk-01.fnal.gov:8888/fts/retryFiles

If the retry operation is successful, the file will typically be moved from the error to the pending section of the status report as it waits to be merged or copied. If the operation was unsuccessful the status page will be updated with the details of the failure.

Remove a file

First remove the file.

Run the retry. This will clear the file from the lists.

Status

General Status is available from:

http://novadaq-ctrl-datadisk-02.fnal.gov:8888/fts/status

To get a JSON-formatted response, append the format parameter:

/fts/status?format=json

Reboot

After a reboot, if the /nova/novadaq mount is not present when the FTS tries to start, the start script gets confused.

The same is true of the Bluearc mounts, but in that case the FTS just lists them in the errors on the status page.
The same is true of the daqlogs source area.

Config

The configuration lives under the home directory in:
fts_rundata

fts_config.txt holds the configuration.

The FTS must be restarted for changes to take effect.

Restarting

Source the setup files

source bin/setup_sam_prd.sh
stop_fts <fts config directory>
start_fts <fts config directory> <fts config file>

Example:

start_fts fts/fts_rundir-novadaq-far-datadisk-04.fnal.gov/ fts-novadaq-far-datadisk-04.fnal.gov.ini

(path then config file)

Tarballs

Currently tarballs are being written to the local disk out of paranoia.

They are in:

/data2/merge_logs/archive/
/data2/merge_raw/archive/

If there is a problem then this directory may fill up:
/data2/merged_raw/build_tar/

For log files they are being built on /scratch in:

/scratch/mergedlogs/

......................................................
REWORK
......................................................

SAM Metadata definitions for the NOvA data files

Data Tier
This describes the format of the data file:
  • raw
  • root

Run Type

Application Family

Application Name

Name of the application used to create the file.

  • online - for the raw data from the DataLogger.

Application Version

Version of the application
  • for online, the DataLogger version from the RawRunHeader is used.

Physical DataStream

The name for the datastream is taken from the Trigger mask. If the name for the trigger doesn't exist the trigger number is used (getTriggerCtrlID)

The name of the data stream must exist in the data_streams list in samweb for the experiment. Currently (2019-02-06) there are streams for many DDT triggers, all numbers from 0 to 100, and others. If a new stream needs to be recorded but does not exist in the list, it must be added.

  • Preparation for checking the streams:
    • Login to a novagpvm machine
    • run setup_nova
  • Check for stream:
    • samweb list-values data_streams
  • Add the needed stream (requires admin privileges: Andrew, Pengfei, Peter, the production group, maybe Leon):
    • samweb add-value data_streams <new stream name>
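
The check-then-add steps above can be combined as in the sketch below. It assumes that samweb list-values data_streams prints one stream name per line; the function names are hypothetical, and the add step still requires the admin privileges noted above.

```shell
# Reads stream names on stdin (one per line) and succeeds if the given
# name is among them. Matching is exact, not a substring match.
stream_in_list() {
    grep -Fxq -- "$1"
}

# Add a data stream to SAM only if it is not already defined.
ensure_data_stream() {
    local stream="$1"
    if samweb list-values data_streams | stream_in_list "$stream"; then
        echo "stream '$stream' already defined"
    else
        samweb add-value data_streams "$stream"
    fi
}
```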