- Table of contents
- Current (NOvA) FTS Instances
- Nova FTS Guide
- SAM Metadata definitions for the NOvA data files
Current (NOvA) FTS Instances¶
There are currently multiple FTS instances serving different aspects/needs of the NOvA DAQ, Offline and Assembly:
| Instance | Host | Purpose |
| DAQ (Far Detector) | http://novadaq-far-datadisk-01:8888/fts/status | DAQ Raw Data and logs |
| DAQ (Far Detector) | http://novadaq-far-datadisk-02:8888/fts/status | DAQ Raw Data and logs |
| DAQ (Far Detector) | http://novadaq-far-datadisk-03:8888/fts/status | DAQ Raw Data and logs, Assembly Laser Files |
| DAQ (Far Detector) | http://novadaq-far-datadisk-04:8888/fts/status | DAQ Raw Data |
| DAQ (Far Detector) | http://novadaq-far-datadisk-05:8888/fts/status | DAQ Raw Data |
| DAQ (Near Detector) | http://novadaq-near-datadisk-01:8888/fts/status | DAQ Raw Data and logs |
| DAQ (Near Detector) | http://novadaq-near-datadisk-02:8888/fts/status | DAQ Raw Data and logs |
| FNAL Relay | http://novasamgpvm01.fnal.gov:8889/fts/status | DAQ Relay from far site |
| Offline | http://novasamgpvm01.fnal.gov:8888/fts/status | Offline production files |
FTS Destinations¶
| Instances | Destinations | Notes |
| DAQ (Near Det.) | Enstore, Bluearc | Automatic cleanup enabled for log files |
| DAQ (Far Det.) | FNAL Relay station (Bluearc) | |
| FNAL Relay/Offline | Enstore, Bluearc | Automatic cleanup enabled for Monte Carlo and Laser files |
Notes: The Far detector -> FNAL Relay path works as follows: the Far detector station transfers files to a dropbox directory on the Bluearc, which is then used as the input dropbox area for the FTS Relay/Offline instance. The final SAM destination is checked against the SAM database. Only when these are registered correctly are the original files on the source machine flagged for cleanup.
Configuration of the Raid Hardware for use with FTS¶
The FTS system is an IO intensive system by its nature (copying files back and forth, calculating checksums etc...) and as a result can consume large amounts of system resources if the underlying hardware is not configured correctly.
For the far detector raid arrays (novadaq-far-datadisk-01, 02, 03) the following should be done:
- The IO scheduling for the raid should be set to "deadline"
On an SLF6 machine this is done with the following commands (here the RAID device is /dev/sdb).
To check the current scheduler you can do:
[root@novadaq-far-datadisk-03 ~]# cat /sys/block/sdb/queue/scheduler
noop anticipatory [deadline] cfq
The default is cfq (completely fair queuing) which defeats the raid controller's own scheduling and results in lower data rates.
To set the scheduler type to "deadline" do:
> echo deadline > /sys/block/sdb/queue/scheduler
This does not persist across reboots, so a line has been added to the rc.local files for these machines.
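For reference, the added line simply re-applies the setting at boot. A minimal sketch of the rc.local addition (assuming the RAID device is /dev/sdb, as above):
# /etc/rc.local addition: re-apply the deadline IO scheduler for the raid device at boot
echo deadline > /sys/block/sdb/queue/scheduler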
Adding Files to the FTS System¶
Adding files to the far detector setup is a simple matter of adding a configuration block to the setup files used by the system.
Making a new File Family¶
From novasamgpvm01 (or any other machine that can see the pnfs):
> setup encp -q stken:x86_64
> enstore sfs --file-family nova
While in the NDOS area we would get:
> enstore sfs --file-family rawdata_NDOS
So we need to define a new directory and family for the far detector data. From the raw data directory (/pnfs/nova/rawdata/)
> mkdir FarDet
> cd FarDet
> enstore sfs --file-family nova
> enstore sfs --file-family rawdata_FarDet
Handled File Classes¶
Each of the instances is configured to handle a different set of file classes.
Nova FTS Guide¶
The File Transfer System (FTS) has been installed and is currently running for the NOvA experiment. The FTS runs on machines that are part of the data acquisition (DAQ) computing cluster and is currently being used to process raw data files and DAQ log files for the general DAQ system.
The FTS system is also being used as a transport layer to assist the assembly and alignment groups. Information on the setup of the FTS for the laser scanner can be found here: FTS for the Laser Scanner
Instructions on starting and stopping the FTS system for the Laser scanner can be found here: Starting/Stopping the Laser Scanner FTS
The FTS system is also being used by the offline groups for transfers of analysis files. Information on the FTS setup for offline can be found here: FTS for Offline Use
Installation¶
The FTS system and its external dependencies can be installed via the UPS/UPD mechanism. This process allows new instances of the FTS to be brought up quickly and minimizes version conflicts with existing packages that may be installed on a system.
The current FTS installation relies on the following external products which are available from UPS/UPD for the Linux 2.6 i386 and x86_64 architectures:
- python v2_4_6
- twisted_python v11_0
- sam v8_8_2
- sam_config v7_1_7
- sam_ns_ior v7_2_3
- samgrid_batch_adapter v7_2_7
The SAM station system:
- sam_station v8_8_20 Linux64bit+2.6
To generate a current listing of dependencies for the FTS system, you can directly query the UPS database with the following command:
> ups depend FileTransferService
Authentication¶
The FTS uses a set of x509 certificates to provide authentication to the storage servers that it communicates with. These certificates must be installed, and proxies must be generated from them on a regular basis for the system to function properly.
To obtain a proxy:
Generate a certificate request
openssl req -nodes -newkey rsa:2048 -keyout $(hostname --fqdn)_key.pem -out $(hostname --fqdn).csr -subj "/DC=org/DC=incommon/C=US/ST=IL/L=Batavia/O=Fermi Research Alliance/OU=Fermilab/CN=$(hostname --fqdn)"
This will create a csr file as well as a private key file. Save these both!
Next use your csr file to obtain an actual certificate from your certificate authority (Note: At the current time this part of the process is not well defined)
Once you obtain the actual certificate, the distinguished name (DN) or subject must be registered in different places. First it must be registered with the VOMS servers under an individual's name.
- Go to the Fermilab service desk https://fermi.service-now.com/wp/
- Select the "Request Something" button (seriously, that is what it is called)
- Then select "scientific job management"
- This will show a button labeled "Add a certificate DN to user account", select this.
- Fill out the form and include the subject of the certificate
To extract the subject from a certificate:
> openssl x509 -in fts-novadaq-far-datadisk-2019-cert.pem -noout -subject
subject= /DC=org/DC=incommon/C=US/ST=IL/L=Batavia/O=Fermi Research Alliance/OU=Fermilab/CN=novadaq-far-datadisk-04.fnal.gov
Next the DN also needs to be registered with SAM.
- Go to https://samweb.fnal.gov:8483/sam/nova/admin/users/
- You will need to be a SAM admin
- Add the DN to the "novaraw" user
Next, copy the certificate and key into the fts/certs directory under the novaraw account. Make sure that the key is only user readable (i.e. chmod 600 thekey.pem)
Attempt to obtain a valid proxy using the certificate and key:
voms-proxy-init -rfc --cert ~/fts/certs/fts-novadaq-far-datadisk-2019-cert.pem --key ~/fts/certs/fts-novadaq-far-datadisk-2019-key.pem -voms fermilab:/fermilab/nova/Role=raw -order /fermilab/nova/Role=Raw -out ~/myproxy
This won't work. There is up to an hour delay currently between when you register the DN and when the VOMS picks it up. So wait...eventually it will work.
Now make the proper symlinks for each of the FTS machines that are going to use this.
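A minimal sketch of the copy, permissions, and symlink steps (the generic link names fts-cert.pem and fts-key.pem below are assumptions for illustration; use whatever names the FTS configuration actually references):
# Hypothetical example -- adjust file and link names to match the FTS configuration
cp fts-novadaq-far-datadisk-2019-cert.pem fts-novadaq-far-datadisk-2019-key.pem ~/fts/certs/
chmod 600 ~/fts/certs/fts-novadaq-far-datadisk-2019-key.pem
cd ~/fts/certs
ln -sf fts-novadaq-far-datadisk-2019-cert.pem fts-cert.pem
ln -sf fts-novadaq-far-datadisk-2019-key.pem fts-key.pem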
Restart the FTS on each machine that now has a current certificate.
Make a note of when this was done. Certificates are good for approximately 1 year. Near the end of this time you will want to renew your cert for another year. If your cert is still valid (and the certificate provider hasn't changed) it may be possible to renew the certificate using the existing certificate itself. If the certificate is expired then the process will need to be repeated, but if the DN hasn't changed then there is no need to update the VOMS or SAM admin servers.
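To check when the installed certificate expires (using the certificate file name from the examples above):
openssl x509 -in ~/fts/certs/fts-novadaq-far-datadisk-2019-cert.pem -noout -enddate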
FTS Setup¶
Because the FTS uses UPS with a proper dependency chain, all that is required to initialize and setup the FTS software system is a single ups setup command:
> setup FileTransferService
Once the system has been setup, the FTS daemon can be started.
The FTS system currently relies on several external tools for extracting metadata from the files that it processes. These packages are experiment specific and must be made available via the standard execution path, and must be able to link any additional libraries via the library paths.
To setup the NOvA DAQ specific software that the FTS will use, source the standard online scripts and optionally define a test release to use for local modifications. In the case of the installation of the FTS on novadaq-datadisk-01 the initialization sequence is:
source /nova/novadaq/setup/setup_novadaq_nt1.sh
cd /home/novaraw/build/dev_tools/
srt_setup -a
This will setup the novadaq environment and correctly set the $SRT_PUBLIC_CONTEXT and the $SRT_PRIVATE_CONTEXT environment variables and push them onto the correct paths.
Then to initialize the FTS and SAM products do the following:
export PRODUCTS=$HOME/sam/db:$PRODUCTS
export SETUPS_DIR=$HOME/sam/etc
These add the UPS database containing the SAM products to the UPS paths. Now actually set up the products:
setup ups
setup sam
setup FileTransferService
Then you can start the actual daemon using the config files (e.g. fts_config.txt) found in the config directory (/home/novaraw/fts_rundata/):
start_fts /home/novaraw/fts_rundata fts_config.txt
The FTS system will spin up and start processing files.
Transfer Delays¶
When the FTS system starts, it will immediately check for new files in the directories that have been registered with it. If the file is new, and has not already been transferred to an appropriate location, then the FTS will start the data migration process. HOWEVER, to prevent the FTS from attempting to transfer a file that is still open by a DAQ process (e.g. a file still being written by the datalogger, or a log file that is being held open by an active DAQ system) the FTS provides a "hold-off" setting that allows the user to select a mandatory delay between the modification time of the file and the current system time of the machine that the FTS is running on.
By default this hold-off is set to 25hrs to ensure that files being synced between the DAQ spool disks are not accidentally sent to archival storage while incomplete.
This hold off can be adjusted through the "scan-delay" configuration parameter. See Configuration.
Automatic FTS Start Up¶
FTS is designed to be an "always running" system that operates independently of other DAQ systems. The FTS does not require the normal NOvA DAQ systems to be running to accomplish its job of migrating files. As a result most users will have very little actual interaction with the FTS system aside from noting that it is running and not reporting errors.
For the current deployment of the FTS at the NOvA near detector computing cluster, the FTS is installed on the second data spool disk (novadaq-ctrl-datadisk-02.fnal.gov) and is automatically started via the cron facility.
The actual startup sequence is placed in the crontab for the novaraw user and is executed on reboot.
Because the FTS relies on parts of the NOvA DAQ software distribution, it requires that the area that houses the SRT releases be mounted prior to FTS system startup. In particular on the NOvA cluster, /nova/novadaq needs to have been mounted prior to attempting to start the FTS. The crontab entry for the FTS on datadisk-02 has the following form which takes into account this requirement:
# Crontab for user novaraw
@reboot while ! [ -d /nova/novadaq/ ]; do sleep 60; done && (other commands to execute)
In particular both the SAM station and FTS are started this way with the crontab entries:
@reboot while ! [ -d /nova/novadaq/ ]; do sleep 60; done && . /home/novaraw/setup_sam_prd.sh && ups start sam_bootstrap >& /dev/null
# The below setup puts a local build dir on the path, as we need the MetaDataRunTool. Eventually it will become part of the release
@reboot while ! [ -d /nova/novadaq/ ]; do sleep 60; done && . /home/novaraw/setup_sam_prd.sh && setup FileTransferService && start_fts /home/novaraw/fts_rundata fts_config.txt >& /dev/null
File processing¶
Under the current configuration that is installed for the NOvA DAQ, the FTS system performs the following combination of functions:
- Copies all raw NOvA data files to central disk storage (Bluearc)
- Copies all raw NOvA data files to archival tape storage (Enstore)
- Copies all NOvA DAQ log files to archival tape storage (Enstore)
- Bundles data files and log files into appropriate tar balls to meet the Enstore storage requirements
- Deletes log files which have been copied to archival storage and are more than 60 days old
The system also catalogs all files that undergo copy/transfer into the SAM metadata catalog.
In order to process each type of file the FTS system is required to have a method of generating or extracting metadata related to the file. For the current system the following tools are used to obtain the metadata:
Raw Data:
- MetaDataRunTool from the NOvADAQ MetaDataTools package.
Log Files:
- Internal generation from the FTS system itself
Note: the executable name MetaDataRunTool is currently hardwired into the FTS system. This should be moved to a config file to allow easy configuration of new data types and metadata extraction tools.
File Locations and filename patterns¶
Currently the FTS system looks in the following file tree locations for new files:
- /daqlog/NDOS/
- /data2/NDOS/
Only files that match predefined patterns are considered valid targets for the FTS to transfer and catalog. Under the current configuration the following filename patterns are used:
- Data files: *.raw
- Log Files: *.log *.gz
The system can additionally be configured to handle other types of files by adding additional filename masks to the appropriate sections of the FTS configuration file. See Configuration.
Filename patterns can also be excluded from FTS processing using a similar mechanism. Under the current setup the following filename patterns are excluded from FTS processing:
For raw data:
- Excluded: SingleEventAtom*.raw
For Log files:
- Excluded: ospl-*.log
- Excluded: ospl-*.log.gz
- Excluded: css-*.log
- Excluded: alarmServer*.log
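Putting the include and exclude patterns together, the sketch below illustrates what such a configuration section might look like. This is an illustration only: apart from scan-delay (described under Transfer Delays above), the section and key names here are hypothetical and should be checked against the actual fts_config.txt before use.
# Illustrative sketch only -- key names other than scan-delay are assumptions
[filetype rawdata]
scan-dirs = /data2/NDOS/
file-patterns = *.raw
exclude-patterns = SingleEventAtom*.raw
scan-delay = 90000    # hold-off before transfer; value and unit assumed (~25 hours if seconds)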
FTS Status and Logging¶
The FTS system provides two general interfaces for communicating the state that the system is operating in, and will provide more reporting options in the second release version, which will integrate with the existing NOvA DAQ facilities (Ganglia monitoring and Message Logging).
General Logging¶
General logging is handled through a set of log files that are generated and rotated by the FTS system on a daily basis. The log files are written to the destination specified by the "log-file" variable in the FTS configuration files. For the NOvA DAQ deployment of the FTS, the default value points to the local disk array that also houses the mirror copy of the raw data.
Log File:
- Base directory: /data2/fts_logs
- Base file name: fts.${hostname}
The fully enumerated file name that the FTS daemon generates has a date string appended to the base file name to give a final naming convention:
fts.${hostname}.YYYY-MM-DD.log
(e.g. fts.novadaq-ctrl-datadisk-02.fnal.gov.2011-09-28.log for the log file generated on 28SEP2011 while running on datadisk-02)
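For example, to follow the current day's log on the FTS host (assuming the default base directory above):
tail -f /data2/fts_logs/fts.$(hostname --fqdn).$(date +%Y-%m-%d).log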
Status Webpage¶
The FTS system runs a simple embedded web server on port 8888, which can provide run time information to network clients. Under this interface the FTS provides a simple status page that shows a snapshot of the system and the files that it has in its error and pending queues.
The status page can be accessed from a machine on the FNAL network through the URL:
http://novadaq-ctrl-datadisk-02.fnal.gov:8888/fts/status/
This will provide a simple HTML table based representation of the data that is appropriate for browsing.
For scripts or programs that may want to access this status data directly, there is also a JSON based output format that can be requested from the server. To obtain the JSON output go to:
http://novadaq-ctrl-datadisk-02.fnal.gov:8888/fts/status?format=json
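For example, to fetch and pretty-print the JSON status from the command line (standard curl and Python; adjust the host name to the instance of interest):
curl -s 'http://novadaq-ctrl-datadisk-02.fnal.gov:8888/fts/status?format=json' | python -m json.tool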
Clearing Errors and Retrying Failed Files¶
When the FTS system locates a new file it attempts to catalog and transfer the file to an appropriate location. If any portion of this process fails, the file is placed in an error queue. The error state of the file is reported through the status interface and any information from external programs is included in that report.
When a file is placed in the error queue no further processing will be performed on it. To change its state, and have processing resume, the FTS must be manually told to reattempt processing of the file.
Retries are requested by a POST to the FTS daemon's web server with the appropriate file information.
From a command line, the curl utility can be used to make this simple POST operation (see the cURL and libcurl documentation for full option sets and usage):
curl --data filename=<name of file to retry> http://novadaq-ctrl-datadisk-02.fnal.gov:8888/fts/retryFiles
To retry all files in error:
curl --data all=1 http://novadaq-far-datadisk-01.fnal.gov:8888/fts/retryFiles
In this example the filename can be either the fully qualified location of the file (including directory prefix) or the plain filename (including any type suffix such as .log or .raw).
If the retry operation is successful, the file will typically be moved from the error to the pending section of the status report as it waits to be merged or copied. If the operation was unsuccessful the status page will be updated with the details of the failure.
Remove a file¶
First remove the file itself. Then issue a retry for it; this will clear the file from the FTS lists.
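A minimal sketch of the sequence, using the retryFiles endpoint documented above (the filename below is purely illustrative):
# hypothetical file name used for illustration
rm /data2/NDOS/badfile_r00012345_s01.raw
curl --data filename=badfile_r00012345_s01.raw http://novadaq-ctrl-datadisk-02.fnal.gov:8888/fts/retryFiles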
Status¶
General Status is available from:
http://novadaq-ctrl-datadisk-02.fnal.gov:8888/fts/status
To get a JSON formatted return, use:
/fts/status?format=json
Reboot¶
In the case of a reboot, if the /nova/novadaq mount is not present when the FTS tries to start, then the start script gets confused.
The same is true of the Bluearc mounts, but in that case the FTS just lists them in the errors on the status page.
The same is true of the daqlogs source area.
Config¶
The configuration lives in the home directory under:
fts_rundata
fts_config.txt holds the configuration files.
The FTS needs to be restarted for changes to take effect.
Restarting¶
Source the setup files
source bin/setup_sam_prd.sh
stop_fts <fts config directory>
start_fts <fts config directory> <fts config file>
Example:
start_fts fts/fts_rundir-novadaq-far-datadisk-04.fnal.gov/ fts-novadaq-far-datadisk-04.fnal.gov.ini
(path then config file)
Tarballs¶
Currently tarballs are being written to the local disk out of paranoia.
They are in:
/data2/merge_logs/archive/
/data2/merge_raw/archive/
If there is a problem then this directory may fill up:
/data2/merged_raw/build_tar/
For log files they are being built on /scratch in:
/scratch/mergedlogs/
SAM Metadata definitions for the NOvA data files¶
Data Tier¶
This describes the format of the data file:
- raw
- root
Run Type¶
Application Family¶
Application Name¶
Name of the application used to create the file.
- online - for the raw data from the DataLogger.
Application Version¶
Version of the application. For the online case, the DataLogger version from the RawRunHeader is used.
Physical DataStream¶
The name for the datastream is taken from the Trigger mask. If the name for the trigger doesn't exist, the trigger number is used (getTriggerCtrlID).
The name of the data stream must exist in the data_streams list in samweb for the experiment. Currently (2019-02-06) there are streams for many DDT triggers, all numbers from 0 to 100, and others. If a new stream needs to be recorded but does not exist in the list, it must be added.
- Preparation for checking the streams:
- Login to a novagpvm machine
- run setup_nova
- Check for stream:
- samweb list-values data_streams
- Add the stream needed (requires admin privileges: Andrew, Pengfei, Peter, the production group, maybe Leon):
- samweb add-value data_streams <new stream name>
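As an illustration, a typical session on a novagpvm machine might look like the following (the stream name ddsnews is purely hypothetical):
# hypothetical stream name used for illustration
setup_nova
samweb list-values data_streams | grep ddsnews
samweb add-value data_streams ddsnews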