
Milestone #7100

Fermigrid Bluearc Unmount Task Force

Added by Arthur Kreymer almost 5 years ago. Updated over 2 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: -
Target version: -
Start date: 10/02/2014
Due date: 07/04/2015
% Done: 90%
Estimated time: 10.00 h
Spent time:
Duration: 276

Description

This issue is to track overall progress on the
Fermigrid Bluearc Unmount Task Force,
https://cdcvs.fnal.gov/redmine/projects/fife/wiki/FermiGridBlue

History

#1 Updated by Arthur Kreymer almost 5 years ago

  • % Done changed from 0 to 10

#2 Updated by Arthur Kreymer almost 5 years ago

Date: Wed, 8 Oct 2014 10:55:49 -0500
From: Arthur Kreymer <>
To:
Subject: Fermigrid Bluearc Unmount task force documents

FYI, the working area for the Bluearc Unmount task force is

https://cdcvs.fnal.gov/redmine/projects/fife/wiki/FermiGridBlue

Input from all liaisons is particularly important,
especially after I am back from vacation (now through Oct 20).
See you then.

But talk to the other members and each other early and often.

#3 Updated by Arthur Kreymer almost 5 years ago

Let's have a first full Taskforce meeting this Thursday Nov 13
at 15:00 after the FIFE meeting, in the same room, WH3NW.

Agenda

o Review of the unmount impact summary, especially feedback from Liaisons
I will prepare a summary table early this week;
please feel free to update it.
o Review of future Taskforce workplans and organization

Notes

This is a summary of my notes from the Nov 13 meeting.
Please feel free to revise and extend as appropriate.

  • Add a link to the taskforce near the top of the FIFE wiki
  • Should update customer table with groups mentioned by Timm
    • marslbne
    • APC
    • Genie
  • The existing Fermicloud-based Experiment FTP servers are limited.
    • supported 'best effort'
    • cannot presently handle the full rate of Bluearc traffic
    • each has a 60 MB/sec capacity limit?
  • Should document various ftp servers
    • rates
    • authorization
  • Mention the direct Bluearc FTP option in the Strawman
  • Should document dCache capacity and capabilities

#4 Updated by Arthur Kreymer over 4 years ago

  • Status changed from New to Assigned
  • % Done changed from 10 to 20

I have added summaries for the Storage Services:
Bluearc, dCache, GridFTP, ifdh

I have added a light overview in FGBDataImpact

I will continue to work to fill in:

FGBUsage
should describe the proper usage model

FGBMounts
has a link to the FG mount inventory, somewhat outdated
NEED a link to the current export map
and a summary of non-FG mounts

FGBMon
Update to give links to the actual monitoring pages

FGBIssues
Give a summary of the classic problem patterns

FGBStraw
Discuss the strawman in more detail.
Timeline

#5 Updated by Arthur Kreymer over 4 years ago

  • % Done changed from 20 to 50

I have made many updates to the WIKI pages.

WIKI

Updated language from Future to Present/Past
Updated APP discussion
Clarified CDF/D0 mounts
Summarized monitoring
Cleaned up impact table format
FGBcapBA - added outline
FGBcapDC - added outline
FGBUsage - trimmed to outline
FGBMon - added outline

FGBUsage
copied from the top WIKI and expanded

FGBcapDC - cannot modify/append, noted tape and SFA support

FGBMon
Updated to give links to the actual monitoring pages

FGBMounts
added a link to the current export map
added a summary of the Bluearc hierarchy

#6 Updated by Arthur Kreymer over 4 years ago

The main things needed before taking action to deploy some of the
measures discussed in the plan are:

o More input from CS Liaisons
o Release of ifdhc v1_7_1 as the 'current' UPS version.

I will draft a CSL contact email to solicit input,
and post it here for review.

Unfortunately I leave on vacation Jan 21 for 2 weeks,
just when we're ready to move ahead on a broader front.

#7 Updated by Arthur Kreymer over 4 years ago

Node gpgtest has been deployed for user tests of OSG readiness.

Users can run their scripts interactively to verify that they use no Bluearc data.
All GPCF users have access, with their normal home areas.

gpgtest is like a Fermigrid worker, but without Bluearc data mounted,
so it is more like a generic OSG node.
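
As a rough illustration, a user could pre-scan a job script for Bluearc
paths before trying it on gpgtest. This is only a sketch, not an official
tool, and the mount point list is just the Bluearc areas named elsewhere
in this report:

    # bluearc_scan.py - flag references to Bluearc paths in a job script.
    import sys

    # Bluearc mount points mentioned in this report; extend as needed.
    BLUEARC_PREFIXES = ("/grid/data", "/grid/app", "/grid/fermiapp")

    def find_bluearc_refs(path):
        hits = []
        with open(path) as f:
            for lineno, line in enumerate(f, 1):
                if any(p in line for p in BLUEARC_PREFIXES):
                    hits.append((lineno, line.rstrip()))
        return hits

    if __name__ == "__main__":
        for script in sys.argv[1:]:
            for lineno, text in find_bluearc_refs(script):
                print("%s:%d: %s" % (script, lineno, text))

Anything the script flags would fail on a node without Bluearc mounts;
running the job interactively on gpgtest is still the real test.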

#8 Updated by Arthur Kreymer over 4 years ago

  • % Done changed from 50 to 80

The first draft report is at
http://cd-docdb.fnal.gov/cgi-bin/ShowDocument?docid=5522

Needs some cleanup and wordsmithing, and a somewhat clearer introduction.
I would like to get this approved next week, so we can proceed with unmounts.

#9 Updated by Arthur Kreymer over 4 years ago

  • Estimated time set to 10.00 h

Highly revised draft, please review. Much closer to final.

#10 Updated by Arthur Kreymer over 4 years ago

We should have a meeting this week to prepare the report for release.
I am in an ITIL Certification class this week, Wed-Fri 08:30-16:30.

I bet we don't want to meet at 07:30.
Would 16:30 today work?

Also, please send me mail if you got this message from Redmine.
Sometimes there have been delivery problems.

#11 Updated by Arthur Kreymer over 4 years ago

Updated the draft with improved Aux file description,
per Dave Dykstra.

I will connect to Readytalk at 16:45 today (10 minutes),
866 740 1260, meeting 395 1664,
for a meeting with those available.

#12 Updated by Arthur Kreymer over 4 years ago

mengel and timm connected to the mini-meeting
and suggested a couple of updates to the document,
which are in internal version 0.2, DocDB version 5.

Added the Storage System summary section, copied from the WIKI.
Corrected typos and language per mengel.
Corrected FTP capability per timm.
Corrected AUX file language per dwd ( Marco is away )

#13 Updated by Gerard Bernabeu Altayo over 4 years ago

Hi,

Sorry I couldn't make it to the meeting; my 2 cents on this:

1. We should include a section describing what data will need to remain on Bluearc (or a similar solution) for each experiment. This should be only data that needs to be modified; all the rest should go in dCache, and this should be clearly stated in the document.

2. The current gridftp-based approach means that files cannot be modified from the workernodes. To me this means that we could make all workernode-related workflows go to dCache, and add workflows that move data between dCache <-> Bluearc whenever the data needs to be modified. We should detail where the data is modified from and what the performance expectations are.

3. CVMFS alien cache may not be available in the short/mid term on most clusters (including FNAL's). Other federated solutions (like XrootD) seem to provide a more proven and standard (HEP-centric) solution for data distribution. It's important to remember that the network is (much) faster than (most) WorkerNode disks nowadays!

4. What is RBF? (page #4) I couldn't find any acronym that makes sense at http://en.wikipedia.org/wiki/RBF or https://www.google.com/search?q=RBF

Gerard

#14 Updated by Arthur Kreymer over 4 years ago

I have posted version 0.8.
This is in .docx format, as requested.

o RBF is clarified.
o Metrics are now up to date
o Added an Historical perspective
o Added Task Force membership

Detailed use cases for data remaining on BA should come from the Liaisons,
who first need to see the draft report.
It should remain in Draft until we have CSL feedback.

I think we have no use cases in which files are modified from grid nodes.
Direct access has never been permitted,
and the Aux file read exception is all we plan to deal with.
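
For reference, here is a minimal sketch of the copy-in/copy-out usage
model this implies for grid jobs, assuming only the standard 'ifdh cp'
command from the ifdhc tools; the paths are hypothetical placeholders.

    # Copy-in / copy-out pattern using ifdh; paths are hypothetical.
    import subprocess

    def ifdh_cp(src, dst):
        # ifdh cp picks a suitable transfer mechanism for the job
        # environment, so the worker node needs no direct Bluearc mount.
        subprocess.run(["ifdh", "cp", src, dst], check=True)

    # Stage the input from dCache to local scratch.
    ifdh_cp("/pnfs/fnal.gov/usr/myexp/data/input.root", "input.root")

    # ... run the experiment executable on the local copy ...

    # Copy the result back out; no file is ever modified in place.
    ifdh_cp("output.root", "/pnfs/fnal.gov/usr/myexp/scratch/output.root")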

#15 Updated by Arthur Kreymer over 4 years ago

DocDB Version 9, Draft 0.82

Used votava's improved format in docx
Replied to votava's comments
Added motivation to the Charge
Trimmed and bulleted APP inaction language
Removed the moot GPWN item from the strawman
Added the Client Impact table
Moved technical content to an APPENDIX
Moved Task Force Membership to a table in the APPENDIX

IOU - Description of data usage before the Proposal section

#16 Updated by Arthur Kreymer over 4 years ago

The file usage description to precede the Plan is in progress.
It should be there early tomorrow.

#17 Updated by Arthur Kreymer over 4 years ago

Added Background before the Proposal

Created the Fermigrid Bluearc mount summary table in the WIKI and Report,
with AUX status columns and an overview of AUX files.

Corrected bullets back to dots from an odd 2-character code,
\uf0f1

Corrected some typos in Storage Summary
sysetm -> system
procols -> protocols

#18 Updated by Arthur Kreymer over 4 years ago

CS-doc-5522 Rev 11 Version 0.9

Background
described BA volumes and usage ( grid/project, app/data )
Proposal
added bullet list overall ( added /grid/data )
clarified OSG incentive to use cvmfs for apps
Strawman
added text ... on Fermigrid Worker nodes
added /grid/data move from app to data head
clarified timing and reordered in time order
Table
added Impact column ( SML )
added microboone ( private grid )
added D0 prj_root BA and non-BA
noted nusoft app hosts CWS web pages

#19 Updated by Arthur Kreymer over 4 years ago

  • Assignee set to Arthur Kreymer

I would like to invite CS Liaisons to comment on the Report,
before the CSL meeting this Wednesday.

Is this OK with everyone? The email would be like:

To:

Subject: Fermigrid Bluearc Unmount plan - request for comments

The Fermigrid Bluearc Unmount Taskforce draft report is available at
http://cd-docdb.fnal.gov/cgi-bin/ShowDocument?docid=5522

We think this is complete except for official feedback from the Liaisons.

Please review the estimated impact on your experiment,
so that we can start scheduling unmounts.

We have opened a Redmine Issue to track this feedback.

Feel free to contact any member of the Task Force,
or contribute directly to
https://cdcvs.fnal.gov/redmine/issues/8157

#20 Updated by Arthur Kreymer over 4 years ago

Draft 0.91

Changed the header to:
CS Liaisons - please review this.
Contact any Task Force member or <>,
or directly update
https://cdcvs.fnal.gov/redmine/issues/8157

Charge
missing space ( demonstratethat )

Historical Summary -
D0 to d0

Performance Metrics
missing space ( 5MB/secas )

#21 Updated by Arthur Kreymer over 4 years ago

  • % Done changed from 80 to 90

Hearing no objection, I have sent the email inviting CSL comment.

I added all affected liaisons to the watch list of
https://cdcvs.fnal.gov/redmine/issues/8157

#22 Updated by Arthur Kreymer over 4 years ago

Date: Thu, 26 Mar 2015 13:14:21 -0500
From: Steven C Timm <>

Art--here are my comments on the doc.

Page 2:  "Many of these files is being moved to DCache now"

should be "are being"
also I believe the correct capitalization is dCache

page 3 "Blueac" is misspelled

page 4--you left in the part of "due to present lack of appropriate ftp
servers"
even though the gridftp servers have now been upgraded to be on faster
hardware

page 5--I know for a fact that numix does use some flux files. I will try
to find out how many.
I'm pretty sure that genie does too.

page 5--whatever is large impact for marsmu2e will be large impact for
the other marses (marslbne and marsgm2) as well. We should make sure Eric Stern
is on board and aware of this; he is the liaison for mars now.

page 6--miniboone also still runs on fermigrid from time to time, and when
they do, they are doing direct bluearc writes, although not very big. This is
pre-jobsub usage.

page 9--note that in fact there are both /grid/app and /grid/fermiapp. The
two areas exist because they have different permissions. Unlike all the
other app areas, /grid/app is writable from the worker nodes for long-standing
legacy reasons; it is meant to be so that OSG users coming from
outside can update their application as the first part of the job.

Steve Timm
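
As an editorial aside (not part of Steve's mail), here is a minimal sketch
of the legacy pattern he describes, in which an incoming OSG job stages
its own application into the worker-node-writable /grid/app area as its
first step. All names here are hypothetical:

    # Hypothetical first step of an OSG job using the writable /grid/app.
    import os
    import subprocess
    import tarfile

    APP_AREA = "/grid/app/myvo/myapp"   # hypothetical VO directory
    TARBALL = "myapp.tar.gz"            # application shipped with the job

    os.makedirs(APP_AREA, exist_ok=True)
    with tarfile.open(TARBALL) as tar:
        tar.extractall(APP_AREA)        # unpack the application first
    subprocess.run([os.path.join(APP_AREA, "bin", "run_analysis")],
                   check=True)          # then run the job payload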

#23 Updated by Arthur Kreymer over 4 years ago

Version 13 - Draft 0.92 per timm feedback

Corrected various typos
Corrected FTP server language to note need for SLA
Changed impact label from Size to Impact
Moved numix, genie to Medium impact
Changed large impact marsmu2e to mars*
Noted miniboone non-jobsub use of Fermigrid
Added /grid/app description in Background section
Added Issue 8157 - CSL comments
Added Task Force members as Authors

#24 Updated by Arthur Kreymer over 4 years ago

I have a request from Mike Kirby to present the final Unmount report
at the FIFE meeting this Thu Apr 23.

CSL feedback is summarized in https://cdcvs.fnal.gov/redmine/issues/8157

We have direct input from Minos, Minerva, Mu2e, and Microboone.
I take silence as consent from the Liaisons.
I will add page numbers when converting the document to V1.0

Task force members -
please confirm your consent or disapproval by mail to me ( I will summarize ),
or in this Issue.
---------------------------
Date: Tue, 21 Apr 2015 14:26:56 -0500
From: Michael H Kirby <>
To: Arthur E Kreymer <>
Subject: BlueArc task force report this Thursday

With the FIFE projects meeting coming up this Thursday,
I believe that you had agreed to present the finalized version of the
BlueArc Unmounting impact paper at that meeting.
I wanted to make sure that you had gotten everything updated,
liaison feedback incorporated,
and sign off from all of the members of the working group.

Let me know if there are any issues or delays going forward,
otherwise I’d like to add you to the agenda for this Thursday.

#25 Updated by Arthur Kreymer over 4 years ago

To clarify, from the Liaisons,
Silence is Consent for the purpose of publishing the Report V1.0.

Unmounts will be change-managed, via Standard Changes,
and will require detailed plans and close coordination with Liaisons.

#26 Updated by Arthur Kreymer over 4 years ago

  • Due date set to 07/04/2015

I need to hear from each Task Force member before declaring the report V1.0,
http://cd-docdb.fnal.gov/cgi-bin/ShowDocument?docid=5522

The V1.0 title is needed before the first Unmounts happen.

The latest change was made over 3 weeks ago, on March 30.

Please send your reply via email, phone, in person, or in
https://cdcvs.fnal.gov/redmine/issues/7100

Marco is away from the lab, so Dave Dykstra will reply for him.

So far :

OK - from mengel, romero

Taking a final look now - tamsett, dwd, litvinse, gerard1

Need to hear from - timm

#27 Updated by Arthur Kreymer over 4 years ago

There is an internal page summarizing approvals for V1.0.
It is not linked from the Unmount WIKI at present.

Use this URL:
https://cdcvs.fnal.gov/redmine/projects/fife/wiki/FGBTASK

#28 Updated by Arthur Kreymer over 4 years ago

We have OK/NO from all but dwd ( for slyz ) and tamsett.
They are reviewing the document.

#29 Updated by Arthur Kreymer over 4 years ago

Date: Mon, 27 Apr 2015 11:17:41 -0500
From: Dave Dykstra <>

Hi Art,

It looks fine to me. The mention of using CVMFS alien cache for
auxiliary data is clearly indicated as being in testing, not production,
as it should be.

Dave

------------------------------------------
Date: Tue, 28 Apr 2015 13:48:26 -0500
From: Matthew Tamsett <>

Hi Art,

Sorry for the slow response. I’ve taken a look now and it all looks good to me.

Thanks for your patience!

Matthew

#30 Updated by Arthur Kreymer over 4 years ago

Date: Fri, 24 Apr 2015 17:06:30 -0500
From: Gerard Bernabeu <>
To: Arthur E Kreymer <>
Cc: "" <>, Steven C Timm <>, David Dykstra <>,
Marko J. Slyz <>, Andrew J. Romero <>, Margaret Votava <>,
Michael H Kirby <>, Dmitry O Litvintsev <>
Subject: Re: Fermigrid Bluearc Report - V1.0 approval needed

Hi,

I have some issues with the document as it stands today:

1. The report should be renamed to something like 'Avoiding direct access to the Bluearc from the WorkerNodes'. The
reason is that we are not really unmounting the BlueArc from the Workernodes.

2. The mid-term plan relies on certain 'if-gridftp' servers that move data in and out of the BlueArc. Although these
servers are used and probably critical for some experiments' workflows, the if-gridftp service is not properly
defined, so I'm not sure what support level it really has (best effort by Steve Timm?). If the mid-term plan
relies on this service, the plan should include a service status review.
3. I'm not sure we can count on 'Alien Cache CVMFS' as a solution for the FNAL clusters, as of today there is no plan
to deploy it.

Overall I think the plan described in the report points in the right direction of avoiding direct POSIX access from
the workernodes to a non-scalable system; however, we still keep a non-scalable system (the Bluearc) in the picture.

In my opinion we should be a bit more ambitious with the mid- and long-term vision and try to move the data stored in
the BlueArc to dCache (or another system that meets the requirements) and, if needed, establish a mechanism to move
the data back and forth between the dCache highly scalable WORM (Write Once Read Many) space and some other space (I'm
assuming here a use case in which users need to modify/append files or something like that). If we do this, then we
can unmount the Bluearc and actually call this report a 'Bluearc Unmount effort'.

The reasons stated in the document to keep data in the Bluearc are the following (page 2):

There remains a need for Bluearc style storage, with quota,
persistence, and low overhead for cases needing many small files.

I would like DMS to look at this and evaluate whether dCache would be able to deal with this use case; probably a better
description of 'many small files' is needed.

have a good weekend,
  Gerard

#31 Updated by Arthur Kreymer over 4 years ago

The official release V1.0 is posted to DocDB.

Added a subtitle for Gerard:
Avoiding direct User access to Bluearc Data on Fermigrid

Added page numbers

#32 Updated by Arthur Kreymer over 3 years ago

Here is a draft note for the Dec FIFE newsletter.
I'll work to clean it up; suggestions are welcome:

Ending grid access to the Fermilab Bluearc data area -
So Long and Thanks for All the Files.

A long time ago in a cluster far, far away,
it was a period of rebellion against the limitations of
local batch clusters.

In 2009, the 3000 cores of the GP Grid Farm were
a vast improvement over the 50-core FNALU batch system.
GPGrid was connected to the then-new Bluearc data system with
a 2 Gbit network link.

A simple lock system deployed in late 2009, still in use today,
avoided head contention on the underlying Bluearc data system,
improving uptime in 2010 from 97% to 99.9997%.

Alas, this was not to last. As the new Minerva and Nova experiments
came online in 2011/12, uptime dropped to 99.8%.

Deployment of the ifdhc tools brought us back to 99.95%,
but there are new issues that locks cannot fix.

Our Bluearc servers have about 1 GByte/second of service capacity.
Single GPGrid nodes now have that much capacity.

We are now running as many as 30,000 user processes on Fermigrid,
sustaining over 3 GBytes/sec locally.

The dCache storage elements deployed in 2015
can handle this load; Bluearc cannot.

We will proceed this year with the Bluearc Unmount process described in
http://cd-docdb.fnal.gov/cgi-bin/ShowDocument?docid=5522
and
https://cdcvs.fnal.gov/redmine/projects/fife/wiki/FermiGridBlue

We will go farther, removing GridFTP access to Bluearc data areas,
and removing all access to Bluearc data areas from all grid workers.
See https://cdcvs.fnal.gov/redmine/projects/fife/wiki/FGB-DATASCHED

#33 Updated by Arthur Kreymer over 3 years ago

Bluearc unmounting from GPGrid nodes -
So Long and Thanks for All the Files.

A long time ago in a cluster far, far away,
it was a period of rebellion against the limitations of
local batch clusters.

In 2009, the 3000 cores of the GP Grid Farm were
a vast improvement over the 50-core FNALU batch system.
GPGrid was connected to the then-new Bluearc data system with
a 2 Gbit network link.

A simple lock system deployed in late 2009, still in use today,
avoided head contention on the underlying Bluearc data system,
improving uptime in 2010 from 97% to 99.9997%.

Deployment of the ifdhc tools as new projects came online
kept uptime fairly good, at 99.95% in 2014.

But there are new issues that locks cannot fix.

Our Bluearc servers have about 1 GByte/second of service capacity.
Single GPGrid worker nodes now have that much capacity.

We are now running as many as 30,000 user processes on Fermigrid,
sustaining over 3 GBytes/sec locally.

The dCache storage elements deployed in 2015 can handle this load;
Bluearc cannot.

We need to proceed this year with the Bluearc Unmount process described in
http://cd-docdb.fnal.gov/cgi-bin/ShowDocument?docid=5522
and
https://cdcvs.fnal.gov/redmine/projects/fife/wiki/FermiGridBlue

We need to go farther, removing even GridFTP access to Bluearc data.
See https://cdcvs.fnal.gov/redmine/projects/fife/wiki/FGB-DATASCHED

The existing Bluearc data areas remain a valuable resource for
interactive work, where full POSIX file access may be needed.

#34 Updated by Arthur Kreymer over 3 years ago

Added if-gridftp-dune and if-gridftp-lar1nd to the schedule per timm.

These are not mounted on Fermigrid, but are using if-gridftp servers.

lar1nd usage seems to be for app access, which is not presently being cut off.
We will need to clarify this in the plan.

#35 Updated by Arthur Kreymer over 3 years ago

Katherine -

Having received no suggested revisions,
please consider the draft FIFE article to be submitted,
as posted at
https://cdcvs.fnal.gov/redmine/issues/7100#note-33

Advice on wordsmithing and optimizing would be welcome.

#36 Updated by Tanya Levshina over 2 years ago

  • Assignee changed from Arthur Kreymer to Michael Kirby

#37 Updated by Tanya Levshina over 2 years ago

  • Status changed from Assigned to Closed

It looks like this task failed, because we are still discussing the unmount process.


