Using CVMFS » History » Version 7

Ashley Timmons, 02/29/2016 11:25 AM


Using CVMFS

Welcome to MINOS CVMFS!

The CERN Virtual Machine File System (CVMFS) is a distributed file system for providing an experiment's code and libraries to interactive
nodes and grid sites worldwide. It is used by CMS and ATLAS as well as most experiments at FNAL.

The code manager copies a code release to a CVMFS work space and "publishes" it. This process examines the code, compresses it,
and inserts it in a database. The original database is called the tier 0 copy. Remote sites may support tier 1 copies of the
database, synced to the tier 0.
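The publish step above can be sketched with the standard cvmfs_server tool. This is a hypothetical illustration, not the actual MINOS procedure: the repository name and paths are assumptions.

```shell
# Hypothetical sketch of publishing a release on the tier 0 (release
# manager) machine; the repository name and paths are illustrative.
publish_release() {
    local release_dir=$1
    # Open a writable transaction on the repository.
    cvmfs_server transaction minos.opensciencegrid.org || return 1
    # Copy the new release into the repository mount point.
    cp -r "$release_dir" /cvmfs/minos.opensciencegrid.org/releases/
    # Examine, compress, and catalogue the new content.
    cvmfs_server publish minos.opensciencegrid.org
}
```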

The user's grid job sees a CVMFS disk mounted and containing a copy of the experiment's code, which can be accessed in any way
the code would be accessed on a standard disk. The disk is actually a custom network file system with a small (~8 GB) local cache
on the node and a backend that sends file requests to a squid web cache. The squid may get its data from the tier 1 database, if
available, or from the tier 0. As a practical matter, most grid jobs do not access much in a release, usually just a small set
of shared object libraries, and these end up cached on the worker node or on the squid, thereby avoiding a long-distance
network transfer.
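From the job's point of view the mount is just a read-only directory, so a plain existence check is enough before sourcing any setup scripts. A minimal sketch, assuming the MINOS repository sits at the usual /cvmfs path (the repository name here is an assumption):

```shell
# The repository path is an assumption; override CVMFS_REPO if yours differs.
CVMFS_REPO=${CVMFS_REPO:-/cvmfs/minos.opensciencegrid.org}

cvmfs_available() {
    # CVMFS appears as an ordinary read-only directory on the worker
    # node, so a simple directory test confirms the mount is visible.
    [ -d "$CVMFS_REPO" ]
}
```

With the mount confirmed, `cvmfs_available && ls "$CVMFS_REPO"` lists the release area exactly as it would appear on a local disk.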

CVMFS is efficient only for distributing code and small data files which are required by a large number of nodes on the grid.
Datasets such as event data files, on the other hand, consist of many files, each sent to only one node during a grid job.
CVMFS is not efficient for this type of data distribution or for this data volume. Data files should be distributed
through dCache, which is designed to deliver each file to one node and to handle the volume. A single large file which
is to be distributed to all nodes should also be avoided, since it would churn or overflow the small local caches.
Examples of this sort of file are the GENIE flux files or a large analysis fit template library.
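For event data, the per-file copy goes through dCache instead. A minimal sketch using ifdh cp, the standard FIFE data-handling copy command; the /pnfs path below is purely illustrative, not a real MINOS dataset:

```shell
# Copy a single input file from dCache to the worker node's local disk.
# The /pnfs path in the usage example is made up for illustration.
fetch_input() {
    local remote=$1
    ifdh cp "$remote" "./$(basename "$remote")"
}
```

Each job fetches only its own file, e.g. `fetch_input /pnfs/minos/data/example.root`, which is exactly the one-file-to-one-node pattern dCache is designed for.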

The limitations of CVMFS are documented here (don't worry too much about this unless you maintain CVMFS):
http://cernvm.cern.ch/portal/filesystem/repository-limits

How to submit a job for OFFSITE

This section explains what your job.sh and submit.sh scripts should look like. Get into the habit of submitting jobs this way; it works for both ONSITE and OFFSITE.

I have made some changes to the setup script that is loaded when performing

setup_jobsub

The first change is a new command

jobsub_offsite

This adds the conditions

--resource-provides=usage_model=OFFSITE 
--site=Caltech,BNL,Michigan

The sites recommended by FIFE are --site=Wisconsin,Nebraska,Omaha,SU-OG,NotreDame,Caltech,BNL,UCSD,Michigan. However, I have been unable to get the other sites to work; there are minor errors which still need addressing.

A useful website to see a list of available sites is
https://cdcvs.fnal.gov/redmine/projects/fife/wiki/Information_about_job_submission_to_OSG_sites

This has a list of requirements that a job must meet to land successfully on each site. By default, for now, jobsub_offsite uses

--disk=5GB --memory=1GB --expected-lifetime=2h
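Putting the pieces above together, jobsub_offsite could be defined as a thin wrapper around plain jobsub. This is a hypothetical reconstruction, not the actual script; the flags are the ones quoted on this page.

```shell
# Hypothetical sketch of the jobsub_offsite wrapper: plain jobsub plus
# the offsite site list and resource defaults quoted above.
jobsub_offsite() {
    jobsub \
        --resource-provides=usage_model=OFFSITE \
        --site=Caltech,BNL,Michigan \
        --disk=5GB --memory=1GB --expected-lifetime=2h \
        "$@"
}
```

Any extra arguments (job count, submit script) pass straight through, so a wrapper like this keeps the submission interface identical to onsite.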

To submit a job, the process is identical to onsite. Inside grid.sh the only command is

jobsub_offsite -100 ${path}submit.sh

This will send 100 jobs offsite, currently to BNL, Caltech, and Michigan. The same submit.sh script works both offsite and onsite, so you could split your jobs up 50/50:

jobsub_offsite -50 ${path}submit.sh
jobsub -50 ${path}submit.sh

Offsite jobs have a tendency to take a long time to ramp up, but they will start eventually. If you have only a few jobs and want them done fast, I suggest running them onsite; offsite is effective for large volumes.

Submit script